The LLVM Project Blog

LLVM Project News and Details from the Trenches

Map LLVM Values to corresponding source level expression, GSoC'23 Project

Hi, My name is Shivam, I involved with the LLVM Foundation in 2023 GSoC edition and worked on an interesting project Map LLVM Values to corresponding source level expression.

Project Scope

Programmers frequently rely on compiler-generated remarks and analysis reports to enhance the efficiency of their code. While compilers excel at including source code positions (such as line and column numbers) in these generated messages, it would be advantageous if these reports also contained the corresponding source-level expressions. The LLVM implementation presently employs a limited set of intrinsic functions to establish a connection between LLVM program elements and source-level expressions. This project’s objective is to leverage the data embedded in these intrinsic functions to either generate source expressions that correspond to LLVM values. The optimization of memory accesses within a program is crucial for achieving optimal application performance. Specifically, our goal is to utilize compiler analysis messages that detail source-level memory accesses associated with LLVM load/store pointer values, which can impede compiler optimizations. As an illustration, this information can be used to identify memory access dependencies that hinder vectorization.

Expected result was to provide an interface which takes an LLVM value at any point in the LLVM transformations pipeline and returns a string corresponding to the equivalent source-level expression. We are especially interested in using this interface to map addresses used in load/store instructions to equivalent source-level memory references.

What we did

The core achievement of the project is the development of an analysis pass that operates on LLVM intermediate representation (IR). This analysis pass identifies load and store instructions, and then conducts a recursive traversal to construct source expressions that represent equivalent source-level memory references. This is achieved by utilizing the metadata and debug intrinsics available in the LLVM IR. This pass was integrated into the loop vectorizer framework, which is a significant step towards practical application. Accompanying the implementation, a comprehensive suite of tests was developed to ensure the accuracy and expected behavior of the analysis pass. Analysis pass exist at llvm/lib/Analysis/SourceExpressionAnalysis.cpp

Implementation Overview

Debug Metadata Handling:

The implementation effectively processes debug metadata associated with instructions. It leverages debug value and declare instructions to retrieve variable names, which are then used to construct source expressions. This enables accurate mapping of LLVM values to their corresponding source-level expressions.

Instruction Types Handling:

The implementation covers a range of instruction types, including binary operators, GetElementPtr, sign extension instructions, LoadInst, and StoreInst. This comprehensive coverage ensures that a wide array of LLVM instructions can be translated into meaningful source-level expressions.

Type and Tag Handling:

The implementation utilizes type information from DIType to determine the type tag, which aids in constructing accurate source-level expressions. Different types are handled appropriately, enhancing the fidelity of the generated expressions.

Expression Construction:

The implementation constructs source-level expressions using the provided LLVM instructions. It combines operand names, operator symbols, and other relevant components to create expressions that closely resemble the original source code.

LoadInst and StoreInst Processing:

The implementation effectively processes LoadInst and StoreInst instructions. It generates source expressions for the loaded and stored values, considering both instruction operands and their associated debug metadata.

Mapping Storage:

The SourceExpressionsMap efficiently stores generated source expressions for various LLVM values. This storage mechanism helps in avoiding redundant calculations and ensures consistent results throughout the analysis.

However, It’s important to note that the generated source expressions are in C/C++ Style. Accounting for different source languages and their peculiarities has been beyond the scope of this initial attempt. In addition to developing a separate analysis pass for translating LLVM values into source-level expressions, the implementation was further enhanced by integrating this pass with the Loop Vectorizer. This integration allows for the reporting of source expressions for dependence source and destination pointers in the context of the loop vectorization process. This feature provides valuable insights to developers, aiding in their understanding of memory access patterns and facilitating optimizations.

The Current State

The project has successfully delivered the core functionality of generating source expressions for load and store instructions, covering array and pointer memory references. While initial attempts were made to handle complex structures like structs, this aspect is currently outside the project’s scope.

Structs pose a unique challenge due to their intricate representation within the LLVM Intermediate Representation (IR). While the project did make initial attempts to incorporate basic support for handling structs, the complexity of nested structures presented significant difficulties. As a result, we encountered obstacles in accurately extracting source expressions for structs and their complex compositions.

The code still didn’t get merge, we still need review on the patch from other community members, the pull request is trackable on Github now Map LLVM Values to source level expression

Let’s look at how the analysis pass able to provide useful source level expression for the memory dependencies in the loop.

//test.c

void test_backward_dep(int n, int *A) {
  for (int i = 1; i <= n-3; i += 3) {
    A[i] = A[i-1];
    A[i+1] = A[i+3];
  }
}

Generate LLVM file (*.ll) using

clang -O3 -S -g emitllvm test.c

(Assuming test.ll file gets generated)

Using below clang command to compile and emit remarks related to loop vectorization along with source expression

opt -report-source-expr=true -passes='function(loop-vectorize,require<access-info>)' -disable-output -pass-remarks-analysis=loop-vectorize test.ll

Note : A command-line option, ReportSourceExpr, was introduced to control the reporting of source expressions. This option allows users to toggle the reporting of source expressions for Load/Store pointers. By setting this option to true (-report-source-expr=true), developers can receive additional information about the source expressions associated with dependence source and destination pointers, enhancing the quality and depth of the optimization reports.

Output remarks

remark: test.c:3:12: loop not vectorized: unsafe dependent memory operations in the loop. Use #pragma loop distribute(enable) to allow loop distribution to attempt to isolate the offending operation into a separate loop 
Dependence source: &A[(i + -3)] Dependence destination: &A[(i + 1)]

This output includes the information about the loop that wasn’t vectorized due to unsafe dependent memory operations. And the interesting part for us is the Dependence Source and Destination source expressions.

This quick demonstration shows how the analysis can be integrated into the compilation process to provide valuable insights into the memory access patterns and their implications for Loop Vectorization.

Challenges and learnings

One of the challenges faced during the project was integrating support for complex structures like structs. These structures require specialized handling due to their intricacies in the LLVM IR. However, this aspect revealed the depth of understanding needed for successful interaction with LLVM’s IR and debug metadata. The project was an interesting journey, allowing for deep exploration of LLVM IR and a practical understanding of optimization remarks and metadata. Additionally, working with the loop vectorizer provided insights into its functionality and integration with custom analyses. Overall, the project served as a stepping stone for me to becoming an active contributor to the LLVM community. It provided invaluable learning opportunities and practical insights into compiler optimizations and LLVM’s architecture.

Future Work

• Handling Structs and Complex Data Types
• Support for Other LLVM Instructions
• Accurately build the source expression  when the Optimizations alters the source level data in the IR rigorously. 
• Possible integration with LLVM debugger.
• Support multiple source languages, we would need to define mappings from LLVM constructs to constructs in each target language. 

Final Words

I really wanted to thanks LLVM Foundation and my mentors Karthik Senthil and Satish Guggila for guiding over the project. It was amazing experience for me to working on this project. I am hoping that I’ll keep myself active in LLVM and Compilers. More details of this project can be found in this final report. Feel free to reach out to me at physhivam@gmail.com for discussing this patch or anything else.