GSoC 2025: Rich Disassembler for LLDB
Hello! I’m Abdullah Amin, and this summer I had the exciting opportunity to participate in Google Summer of Code (GSoC) 2025 with the LLVM Compiler Infrastructure project. My project focused on extending LLDB with a Rich Disassembler that annotates machine instructions with source-level variable information.
Mentors: Adrian Prantl and Jonas Devlieghere
Project Background
LLDB is LLVM’s debugger, capable of source-level debugging across multiple platforms and architectures. While LLDB’s disassembler shows machine instructions, it doesn’t provide much insight into how variables from the original program map onto registers or memory.
The goal of this project was to use DWARF debug information to enhance LLDB’s disassembly with variable lifetime and location annotations. This allows developers to better understand what each register contains at a given point in time, and how variables flow across instructions.
For example, instead of just seeing register usage:
0x100000f80 <+32>: movq (%rbx,%r15,8), %rdi
…the rich disassembler can add context:
0x100000f80 <+32>: movq (%rbx,%r15,8), %rdi ; i = r15
This makes it much easier to reason about code, especially in optimized builds.
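To make the idea of a location range concrete, here is a minimal Python sketch of the lookup the annotations rely on: given the address ranges in which a variable has a known location, find where it lives at one instruction address. This is not LLDB code. The LocationEntry type, the location_at helper, and the addresses are all hypothetical, and real DWARF location expressions are far richer than a single string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocationEntry:
    """One row of a simplified, hypothetical variable location list."""
    low_pc: int     # first address covered (inclusive)
    high_pc: int    # end of the range (exclusive)
    location: str   # human-readable location, e.g. "RAX" or a constant


def location_at(entries: list[LocationEntry], pc: int) -> Optional[str]:
    """Return the variable's location at `pc`, or None if no range covers it."""
    for entry in entries:
        if entry.low_pc <= pc < entry.high_pc:
            return entry.location
    return None


# Illustrative ranges for a loop counter; the addresses are made up.
i_locations = [
    LocationEntry(0x07, 0x17, "0"),    # known constant before the loop
    LocationEntry(0x17, 0x3D, "RAX"),  # lives in a register inside the loop
]

print(location_at(i_locations, 0x20))  # RAX
print(location_at(i_locations, 0x50))  # None
```

An address that falls outside every range returns None, which corresponds to the case where the debug info says nothing about the variable at that point.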
What We Accomplished
Over the summer, I implemented a prototype that integrates DWARF variable location ranges into LLDB’s disassembly pipeline. The key accomplishments are:
DWARFExpressionEntry API
Added a new helper (GetExpressionEntryAtAddress) to expose variable location ranges from DWARF debug info.
PR: #144238
Register-Resident Variable Annotations
Updated the disassembler to annotate instructions when variables enter, change, or leave registers.
PR: #147460
Stateful Variable Tracking
Extended Disassembler::PrintInstructions() to track live variable states across instructions, emitting transitions such as:
- var = RDI when a variable becomes live in a register
- var = <undef> when a variable goes out of scope
PR: #152887
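The stateful tracking described above can be sketched in a few lines of Python. This is an illustration of the idea only, not the C++ implementation in LLDB. The function name annotate_transitions and the input shape (one dict of variable name to location per instruction) are assumptions made for this example.

```python
UNDEF = "<undef>"


def annotate_transitions(live_by_instruction):
    """Return one annotation string per instruction, listing only changes.

    `live_by_instruction` is a list of dicts mapping variable name -> location
    for each instruction, in address order (a hypothetical input shape).
    """
    previous = {}
    annotations = []
    for live in live_by_instruction:
        changes = []
        # Variables that became live or moved to a different location.
        for name, location in live.items():
            if previous.get(name) != location:
                changes.append(f"{name} = {location}")
        # Variables that had a location before but have none now.
        for name in previous:
            if name not in live:
                changes.append(f"{name} = {UNDEF}")
        annotations.append(", ".join(changes))
        previous = dict(live)
    return annotations


states = [
    {"n": "RDI", "seed": "RSI"},
    {"n": "RDI", "seed": "RSI"},   # nothing changed: empty annotation
    {"n": "RDI", "i": "0"},        # seed ends, i starts as a constant
    {"n": "RDI", "i": "RAX"},      # i moves into a register
    {"n": "RDI"},                  # i ends
]
for text in annotate_transitions(states):
    print(text)
```

Comparing each instruction against the previous state is what keeps the output short: an instruction gets an annotation only when something starts, moves, or ends.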
Portable Tests
Added new LLDB API tests under lldb/test/API/functionalities/disassembler-variables/. These use stable .s files (with original C seeds as comments) to generate .o files for disassembly. This ensures reliable, portable tests independent of compiler optimizations.
PRs: #152887, #155942, #156026
Example test coverage includes:
- Function parameters passed in registers (integer, floating point, and mixed).
- Variables live across function calls.
- Loop-based register rotation.
- Constants and undefined ranges.
How to use it
The annotations are available from LLDB’s disassembler. Enable them with:
(lldb) disassemble --variable-annotations
or
(lldb) disassemble -v
You can also target a specific symbol:
(lldb) disassemble -n loop_reg_rotate --variable-annotations
or
(lldb) disassemble -n loop_reg_rotate -v
Example
C seed (kept minimal but forces interesting register reshuffles):
__attribute__((noinline))
int loop_reg_rotate(int n, int seed) {
  volatile int acc = seed; // keep as a named local
  int i = 0, j = 1, k = 2; // extra pressure but not enough to spill
  for (int t = 0; t < n; ++t) {
    // Mix uses so the allocator may reshuffle regs for 'acc'
    acc = acc + i;
    asm volatile("" :: "r"(acc)); // pin 'acc' live here
    acc = acc ^ j;
    asm volatile("" :: "r"(acc)); // and here
    acc = acc + k;
    i ^= acc; j += acc; k ^= j;
  }
  asm volatile("" :: "r"(acc));
  return acc + i + j + k;
}
Disassembly with variable annotations (excerpt):
loop_reg_rotate.o`loop_reg_rotate:
0x0 <+0>: pushq %rbp ; n = RDI, seed = RSI
0x1 <+1>: movq %rsp, %rbp
0x4 <+4>: movl %esi, -0x4(%rbp)
0x7 <+7>: testl %edi, %edi ; j = 1, k = 2, t = 0, i = 0
0x9 <+9>: jle 0x3d ; <+61> at loop_reg_rotate.c
0xb <+11>: xorl %eax, %eax
0xd <+13>: movl $0x1, %edx
0x12 <+18>: movl $0x2, %ecx
0x17 <+23>: nopw (%rax,%rax) ; j = RDX, k = RCX, i = RAX, t = <undef>
0x20 <+32>: addl %eax, -0x4(%rbp)
0x23 <+35>: movl -0x4(%rbp), %esi
0x26 <+38>: xorl %edx, -0x4(%rbp)
0x29 <+41>: movl -0x4(%rbp), %esi
0x2c <+44>: addl %ecx, -0x4(%rbp)
0x2f <+47>: xorl -0x4(%rbp), %eax
0x32 <+50>: addl -0x4(%rbp), %edx
0x35 <+53>: xorl %edx, %ecx
0x37 <+55>: decl %edi
0x39 <+57>: jne 0x20 ; <+32> at loop_reg_rotate.c:8:9
0x3d <+61>: movl $0x2, %ecx ; j = 1, k = 2, i = 0
0x42 <+66>: movl $0x1, %edx
0x47 <+71>: xorl %eax, %eax
0x49 <+73>: movl -0x4(%rbp), %esi ; j = <undef>, k = <undef>, i = <undef>
0x4c <+76>: addl %edx, %eax
0x4e <+78>: addl %ecx, %eax
0x50 <+80>: addl -0x4(%rbp), %eax
0x53 <+83>: popq %rbp
0x54 <+84>: retq
In this example:
- Function params are annotated at entry (n = RDI, seed = RSI).
- Local temporaries (i, j, k) become live in specific registers and later go <undef> when they leave scope or change location.
- Only transitions are printed (start/change/end), keeping the output concise.
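Because only transitions are printed, knowing every live variable at a given instruction means replaying the annotations from the top of the function. The Python sketch below does that over the textual output. The annotation format it parses is inferred from the excerpt above and is an assumption, not a stable interface, and the replay helper is hypothetical.

```python
import re

# Matches "name = LOCATION" pairs in the comment part of a disassembly line.
# The format is inferred from the excerpt above and is an assumption.
ASSIGNMENT = re.compile(r"(\w+) = (<undef>|\w+)")


def replay(lines):
    """Return, per line, the full variable -> location map after that line."""
    live = {}
    snapshots = []
    for line in lines:
        # Everything after the first ';' is treated as the comment.
        _, _, comment = line.partition(";")
        for name, location in ASSIGNMENT.findall(comment):
            if location == "<undef>":
                live.pop(name, None)   # the variable is no longer tracked
            else:
                live[name] = location  # start or change of location
        snapshots.append(dict(live))
    return snapshots


lines = [
    "0x0 <+0>: pushq %rbp ; n = RDI, seed = RSI",
    "0x7 <+7>: testl %edi, %edi ; j = 1, k = 2, t = 0, i = 0",
    "0x9 <+9>: jle 0x3d ; <+61> at loop_reg_rotate.c",
    "0x17 <+23>: nopw (%rax,%rax) ; j = RDX, k = RCX, i = RAX, t = <undef>",
]
for line, state in zip(lines, replay(lines)):
    print(line.split(":")[0], state)
```

Branch-target comments such as "<+61> at loop_reg_rotate.c" contain no "name = value" pair, so they leave the state unchanged.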
Current State
Working prototype complete:
Rich disassembly annotations are now functional for variables that reside entirely in registers or are constants.
Tested and validated:
A comprehensive set of tests confirms correctness across multiple scenarios, including register rotation, constants, and live-across-call variables.
Upstreamed into LLVM:
The core implementation, supporting infrastructure, and final refactoring/formatting changes have all been merged into the main LLVM repository. This means the feature is available in the latest development builds of LLDB.
What’s Left to Do
One original goal of the project was to expose the rich disassembly annotations as structured data through LLDB’s scripting API, so that tooling can build on top of it.
While the textual annotations and stateful tracking are complete, this structured API exposure remains future work. I plan to continue working on this beyond GSoC as a follow-up contribution.
Challenges and Learnings
- DWARF complexity: Navigating DWARF location expressions and ranges was challenging, but I gained a deep understanding of how debuggers map source variables to registers and memory.
- Testing portability: Early attempts at hand-writing DWARF with yaml2obj proved too fragile. Switching to compiler-generated .s files provided stable and portable tests.
- Collaboration: Working with my mentors taught me the value of incremental, reviewable patches and iterative design.
Conclusion
LLDB’s disassembler is a feature aimed at advanced programmers who need detailed insights into optimized machine code. With the new variable annotations, it becomes easier to understand how source-level variables map to registers and how their lifetimes evolve, bridging the gap between source code and raw assembly.
Future work will focus on structured API exposure, enabling new tooling to build on these annotations.
I am grateful to my mentors, Adrian Prantl and Jonas Devlieghere, for their guidance and support throughout the project, and to the LLVM community for reviewing and testing my work.
Related Links
- PR #144238: Add DWARFExpressionEntry API
- PR #147460: Annotate disassembly with register-resident variables
- PR #152887: Stateful variable-location annotations
- PR #155942: Fix workflow testing issues (part 1)
- PR #156026: Fix workflow testing issues (part 2)
- PR #156118: Final code formatting and refactoring
- LLVM Repository
- My GitHub Profile