Annotate disassembly with register‐resident variable locations #147460

UltimateForce21 · 2025-07-08T04:08:59Z

This PR builds upon the functionality introduced in PR #144238, which adds support for annotating disassembly output with DWARF variable locations.

This patch includes:

Additional test coverage for various DWARF location cases (e.g., stack variables, control flow edges, partial decoding).

A refined implementation that uses the new DWARFExpressionEntry API.

Logic to gracefully skip fragments while preserving valid annotations.

⚠️ Note: This PR depends on #144238 and should be rebased once it lands.

Summary

This patch adds a first‐pass prototype of “rich disassembly” annotations into Instruction::Dump(). For each instruction, it will:

Query our new GetExpressionEntryAtAddress() API for any in‐scope variable live‐ranges covering this PC.

If a variable’s live‐range begins exactly at the current PC that is being iterated in the disassembly, call expr->DumpLocation(...) to render its DWARF expression (e.g. “DW_OP_reg5 RDI”, “DW_OP_reg3 RBX”, or constant) and append it as a ; var = … comment after the instruction.

Currently this only handles simple cases:

Variables whose location expression is a single register (e.g. DW_OP_reg14 R14)

Variables whose location expression is a constant (e.g. DW_OP_consts +0)

Example test program

// test.cpp
#include <iostream>

void testFunction(int value);

int main(int argc, char** argv) {
    int total = 0;
    for (int my_special_variable = 0; my_special_variable < 5; ++my_special_variable){
        puts(argv[my_special_variable]);
        int new_variable = my_special_variable + 1;
        testFunction(new_variable); 
        total += my_special_variable;
    }
    return total;
}


void testFunction(int value) {
    std::cout << "Local variable value: " << value << std::endl;
}

Disassembly output with annotations

(lldb) target create "/home/ultimateforce21/lldb_test/test"
Current executable set to '/home/ultimateforce21/lldb_test/test' (x86_64).
(lldb) b main
Breakpoint 1: where = test`main + 16 at test.cpp:10:14, address = 0x0000000000001200
(lldb) r
Process 26674 launched: '/home/ultimateforce21/lldb_test/test' (x86_64)
Process 26674 stopped
* thread #1, name = 'test', stop reason = breakpoint 1.1
    frame #0: 0x0000555555555200 test`main(argc=<unavailable>, argv=0x00007fffffffdc68) at test.cpp:10:14
   7    int main(int argc, char** argv) {
   8        int total = 0;
   9        for (int my_special_variable = 0; my_special_variable < 5; ++my_special_variable){
-> 10           puts(argv[my_special_variable]);
   11           int new_variable = my_special_variable + 1;
   12           testFunction(new_variable); 
   13           total += my_special_variable;
(lldb) dis
test`main:
    0x5555555551f0 <+0>:  pushq  %r14           ; argc = DW_OP_reg5 RDI ; argv = DW_OP_reg4 RSI ; std::__ioinit = DW_OP_addrx 0x0
    0x5555555551f2 <+2>:  pushq  %rbx
    0x5555555551f3 <+3>:  pushq  %rax
    0x5555555551f4 <+4>:  movq   %rsi, %r14
    0x5555555551f7 <+7>:  xorl   %ebx, %ebx ; total = DW_OP_consts +0, DW_OP_stack_value
    0x5555555551f9 <+9>:  nopl   (%rax) ; my_special_variable = DW_OP_reg3 RBX ; argc = DW_OP_entry_value(DW_OP_reg5 RDI), DW_OP_stack_value ; argv = DW_OP_reg14 R14
->  0x555555555200 <+16>: movq   (%r14,%rbx,8), %rdi
    0x555555555204 <+20>: callq  0x5555555550a0 ; symbol stub for: puts
    0x555555555209 <+25>: addq   $0x1, %rbx
    0x55555555520d <+29>: movl   %ebx, %edi ; new_variable = DW_OP_reg3 RBX
    0x55555555520f <+31>: callq  0x555555555230 ; testFunction at test.cpp:19
    0x555555555214 <+36>: cmpq   $0x5, %rbx ; my_special_variable = DW_OP_reg3 RBX
    0x555555555218 <+40>: jne    0x555555555200 ; <+16> at test.cpp:10:14
    0x55555555521a <+42>: movl   $0xa, %eax
    0x55555555521f <+47>: addq   $0x8, %rsp
    0x555555555223 <+51>: popq   %rbx
    0x555555555224 <+52>: popq   %r14
    0x555555555226 <+54>: retq                             ; argv = DW_OP_entry_value(DW_OP_reg4 RSI), DW_OP_stack_value

Next steps

Cleanup & style (DONE)

Tidy up any styling according to LLDB’s conventions, and handle any off‐by‐one column issues.

Test coverage (DONE)

Add a lldb-test cases exercising single-register variables at known instructions.

…DWARFExpressionList This introduces a new API for retrieving DWARF expression metadata associated with variable location entries at a given PC address. It provides the base, end, and expression pointer for downstream consumers such as disassembler annotations. Intended for use in richer instruction annotations in Instruction::Dump().

Co-authored-by: Jonas Devlieghere <[email protected]>

Updated comment for GetExpressionEntryAtAddress to directly refer to struct DWARFExpressionEntry Co-authored-by: Jonas Devlieghere <[email protected]>

updating code style for function GetExpressionEntryAtAddress Co-authored-by: Jonas Devlieghere <[email protected]>

Replace raw base/end with `AddressRange` in `DWARFExpressionEntry` and cleans up helper comments to follow Doxygen convention. Using `AddressRange` makes the intent clearer, avoids duplication of basic `AddressRange` logic usage

Converts `GetExpressionEntryAtAddress` to return `llvm::Expected<DWARFExpressionEntry>` using the updated `DWARFExpressionEntry`. Updates the implementation to compute a single `AddressRange file_range` for each DWARF location interval.

Updated commenting style for struct DWARFExpressionEntry

Updated function `llvm::Expected<DWARFExpressionList::DWARFExpressionEntry> DWARFExpressionList::GetExpressionEntryAtAddress` to use `FindEntryThatContains` instead of `FindEntryIndexThatContains`

… llvm::Expected

Co-authored-by: Adrian Prantl <[email protected]>

This patch adds explicit checks: - ensure `load_addr >= func_load_addr` to avoid underflow, - compute and verify a temporary delta variable, then verify `delta + m_func_file_addr` does not exceed `addr_t` max to avoid overflow.

…bly output. Right now just checks if DW annotations show up for a basic program and that a variable location is annotated (i.e 'a = DW_OP_reg...').

- Fixed an issue where variable location annotations were not shown if the current instruction address did not exactly match the DWARF base address. Now, annotations are shown as long as the PC is within the valid range. - Improved alignment of annotation comments in Instruction::Dump(). While `FillLastLineToColumn` can sometimes overcompensate due to internal formatting or byte-width mismatches, the overall alignment is now significantly more consistent and readable.

Previously, when a DWARF expression contained any decoding error, the entire variable location annotation was printed with the error, e.g. `c = DW_OP_addr 0x0, <decoding error> 00 00 00`. This was misleading and cluttered the disassembly view. This patch improves the formatting by stripping out the `<decoding error ...>` fragments while preserving the valid portions of the expression, so that partial information like `c = DW_OP_addr 0x0` can still be shown. This allows the rich disassembler to give more meaningful variable annotations, especially in optimized (-O1/-O2) builds where partial DWARF corruption or unsupported expressions may occur.

Handled edge case where the entire DWARF expression is a `<decoding error>`, ensuring no misleading or empty annotations are printed for such variables.

This patch adds API tests to verify that DWARF variable location annotations are correctly displayed in the disassembly output. The tests cover: - Local variables in loops and functions - Multiple stack variables - Control flow edge cases - Different optimization levels (-O0, -O1, -O2) - Ensuring decoding errors are excluded from output

…try API This rebases the `add-disassembler-annotations` work onto the latest `add-dwarfexprentry-api` branch so that the instruction annotation patches sit cleanly atop the new DWARFExpressionEntry struct and helper API. All conflicts have been resolved and the annotation code now integrates with the updated std::optional<AddressRange>-based GetExpressionEntryAtAddress signature.

github-actions · 2025-07-08T04:09:18Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

UltimateForce21 · 2025-07-08T04:10:48Z

@JDevlieghere @adrian-prantl I have created the PR #144238 dependent Rich Disassembler Annotations PR.

…w `DWARFExpression::DumpLocationWithOptions` for simplified expression printing This patch introduces a PrintRegisterOnly flag to the DIDumpOptions struct, enabling concise rendering of DWARF expressions for disassembler annotations. Key changes: - Added DumpLocationWithOptions to DWARFExpression for flexible dumping with DIDumpOptions. - Updated DWARFExpression::print and Operation::print to respect PrintRegisterOnly, rendering registers like RDI without DW_OP_ or llvm: prefixes. - Suppressed <decoding error> output when PrintRegisterOnly is true to avoid clutter during register-only disassembly output. These changes are motivated by LLDB’s rich disassembler feature, where annotations should match user-facing register names without DWARF-level detail. Test impact: Some rich-disassembler tests that relied on DW_OP_ for validation were deprecated. Updated tests aligned with the new formatting will be added next.

adrian-prantl · 2025-07-16T12:42:06Z

lldb/source/Core/Disassembler.cpp


+  const size_t annotation_column = 150;
+
+  if (exe_ctx && exe_ctx->GetFramePtr()) {


Can you add a high-level comment here to explain what this block of code is doing?

Added now in commit: 09c4d04b33e16e25da3fbfb826c609c8bb42264e

adrian-prantl · 2025-07-16T12:42:37Z

lldb/source/Core/Disassembler.cpp

+    if (frame && target_sp) {
+      addr_t current_pc = m_address.GetLoadAddress(target_sp.get());
+      addr_t original_pc = frame->GetFrameCodeAddress().GetLoadAddress(target_sp.get());
+      if (frame->ChangePC(current_pc)) {


What is m_address here? Is it really necessary to modify the PC of frame?

m_address here refers to the address of the current instruction being dumped in the disassembler output. Since Instruction::Dump is called per-instruction, m_address gives us the load address of the instruction currently being printed.

I used frame->ChangePC(current_pc) to temporarily update the frame's PC to this instruction address. This is necessary because GetInScopeVariableList(true) relies on the frame's current PC to determine which variables are in scope and what their locations are at that exact address. Without setting the PC to the instruction being dumped, the variable list might reflect a different point in the function, which could lead to incorrect or missing annotations.

That said, I make sure to save the original PC with frame->GetFrameCodeAddress() and restore it afterward with frame->ChangePC(original_pc) to avoid any persistent side effects on the frame state.

If there’s a better way to get the in-scope variables for a given address without mutating the frame, I’m happy to revise the approach.

adrian-prantl · 2025-07-16T12:44:09Z

lldb/source/Core/Disassembler.cpp

+          func_load_addr = sc.function->GetAddress().GetLoadAddress(target_sp.get());
+
+
+        if(ss.GetSizeOfLastLine() < annotation_column) {


Does this check if there is enough space for an annotation? Maybe comment that?

Yes, exactly. I have now added the comment in commit: 6e17f77c4300654c7664b52efc045983bb59721d

adrian-prantl · 2025-07-16T12:45:49Z

lldb/source/Core/Disassembler.cpp

+
+          std::vector<std::string> annotations;
+
+          if (var_list_sp) {


LLVM coding style prefers early exits over checks. Maybe you can wrap this whole block in a lambda or static function, then this can become

if (!var_list_sp) return;

Thanks for the suggestion! I've pushed the change in 31431c0, which moves the annotation logic into a local annotate_variables lambda with early exits.

adrian-prantl · 2025-07-16T12:46:38Z

lldb/source/Core/Disassembler.cpp

+          std::vector<std::string> annotations;
+
+          if (var_list_sp) {
+            for (size_t i = 0; i < var_list_sp->GetSize(); ++i) {


can this be a a range-based for? (i.e, for (VariableSP var_sp : *var_list_sp))

I've updated the loop to use a range-based for as recommended. The change is now committed in 9c5cb8f.

adrian-prantl · 2025-07-16T12:51:24Z

lldb/source/Core/Disassembler.cpp

+            if (!annotations.empty()) {
+              ss.FillLastLineToColumn(annotation_column, ' ');
+              ss.PutCString(" ; ");
+              for (size_t i = 0; i < annotations.size(); ++i) {


range-based for

there is also a helper in LLVM to join a list with a separator (https://llvm.org/doxygen/StringExtras_8h_source.html)

I replaced the manual string concatenation now with llvm::join as suggested in commit
ffefe5f.

adrian-prantl · 2025-07-16T12:52:06Z

lldb/include/lldb/Expression/DWARFExpression.h

  }

  void DumpLocation(Stream *s, lldb::DescriptionLevel level, ABI *abi) const;
+  void DumpLocationWithOptions(Stream *s, lldb::DescriptionLevel level, ABI *abi, llvm::DIDumpOptions options) const;


why not have one version with a default argument?

void DumpLocationWithOptions(Stream *s, lldb::DescriptionLevel level, ABI *abi, llvm::DIDumpOptions options = {}) const;

I've updated the code now to use a single DumpLocation method with a default DIDumpOptions argument, as recommended. The DumpLocationWithOptions variant has been removed in commit ca8510c.

adrian-prantl · 2025-07-16T12:53:59Z

lldb/source/Expression/DWARFExpressionList.cpp

+
+  // Guard against overflow.
+  lldb::addr_t delta = load_addr - func_load_addr;
+  if (delta > std::numeric_limits<lldb::addr_t>::max() - m_func_file_addr)


I think it would be better to use the add with overflow methods from MathExtras here.

From when I dug into MathExtras last time, I found that its AddOverflow and SubOverflow helpers are only defined for signed types.

I they rely on two’s-complement signed overflow semantics and return the truncated signed result. Since lldb::addr_t is an unsigned address, I think using those helpers would either fail to compile or silently convert to signed semantics.

That's why I tried to implement the signed overflow check into the function.

adrian-prantl · 2025-07-16T13:01:19Z

As high-level comments:

I would filter out the entry values, since this feature is meant to help a user understand the disassembly, and the entry values don't really exist in the function.
It would help readability if all annotations started at the same column
in a follow-up patch it would be great to outline the valid range of each expression
is it possible to hide the DW_OP... part and just show the human-readable register name?
DW_OP_consts +0, DW_OP_stack_value -> I would filter out anything that isn't a pure register. Then we can later recognize certain kinds of expression, for example, this is a constant, so it should just print = 0

UltimateForce21 · 2025-07-16T18:54:37Z

As high-level comments:

I would filter out the entry values, since this feature is meant to help a user understand the disassembly, and the entry values don't really exist in the function.

It would help readability if all annotations started at the same column

in a follow-up patch it would be great to outline the valid range of each expression

is it possible to hide the DW_OP... part and just show the human-readable register name?

DW_OP_consts +0, DW_OP_stack_value -> I would filter out anything that isn't a pure register. Then we can later recognize certain kinds of expression, for example, this is a constant, so it should just print = 0

Thank you for the thoughtful feedback! Here's a breakdown of what has been addressed so far and what I’m currently working on:

DW_OP_ suppression and human-readable register names
The most recent commit 912ba6d introduces a new PrintRegisterOnly flag in DIDumpOptions, along with a DumpLocationWithOptions() API. This allows us to display clean, architecture-specific register names (e.g. RDI) without the DW_OP_ prefix or LLVM-internal formatting.

It also suppresses <decoding error> messages when PrintRegisterOnly is enabled, helping keep output clean for disassembly annotations.

Column alignment for annotations
I attempted to align annotations to a fixed column, and this works well in non-colored output. However, with color enabled, the byte-length of character sequences skews alignment. I'm investigating this, and suspect that the issue stems from the color codes affecting character counting without accounting for their visual width.
Variable invalidation (end of live range)
I'm currently working on modifying Disassembler::PrintInstructions() to allow tracking of variable live ranges across instructions. The goal is to emit annotations like var = <invalid> at the exact PC where a variable’s location range ends—while ensuring this is printed only once per variable.
DW_OP_consts filtering
I’ll follow up with a commit to recognize and handle DW_OP_consts and similar constant expressions more gracefully (e.g. print var = 0), while continuing to filter out more complex or mixed expressions for now, as suggested.

Thanks again—will continue to iterate on this based on your guidance and on all the other comments you have made!

jasonmolenda · 2025-07-17T00:55:49Z

Interesting PR, thanks for looking at adding these annotations. I think exposing these dwarf location expressions in a more user-visible way is going to put a lot of pressure on making them more meaningful to users, as opposed to debugging prints for debugger/compiler developers. Another example of an unfortunate behavior on AArch64 where x0 is the 64-bit register name, w0 is the 32-bit register name, but the DWARF register number doesn't specify which it is (it depends on the type of the value to read 32- or 64-bits), so the expression parser always prints the "w" register by default which might be a little confusing. The UI of making the dwarf expression parser a human readable format instead of a simple dump of dwarf opcodes and arguments is maybe the most interesting one to think about/enhance.

One thing which is obviously correct, but was unintuitive at first for me, was that the DWARF location is only known after the instruction initializing it, e.g.

    0x5555555551f7 <+7>:  xorl   %ebx, %ebx
    0x5555555551f9 <+9>:  nopl   (%rax) ; my_special_variable = DW_OP_reg3 RBX

for uint32_t my_special_variable = 0. The comment on +9 is really reflecting what has been done when +7 has executed. It's CORRECT, but at first blush it's not what I'd have expected. I think changing this would be more confusing when people are examining edge cases than otherwise. Also in the sample codegen, we see my_special_variable cease to exist for a little when that register is reused as new_variable == my_special_variable+1 and then that same incr value becomes my_special_variable when we loop and we avoid incrementing it again. Oh well, that's the fun of reading assembly code, no way to make that easier.

jasonmolenda · 2025-07-17T06:00:12Z

Incidentally, this is always something people get a little confused by when reading assembly. We see it with backtraces especially. A CALL instruction executes and the return address is the next instruction. That confuses people looking at the assembly language -- the "pc" pointer is pointing to the next instruction! -- but we cheat it for source symbolication, subtracting 1 from the return address when we do source line lookup so we show the function call as "currently executing" when we really have the return address which may point to the next source line. We have to do the same thing with dwarf location list lookups, where we might have a noreturn function call, and the variables available at the return address (which is the target of a branch from another part of the function) are drastically different.

I'm aways unsure of how much to lie to the user in these scenarios. The general call is "at source level lie, at assembly level tell the truth of what the return address shows us". It gets a little messier on a function that ends in a noreturn CALL as the last instruction, we don't want to show the return address as the next function in memory, even though that's what it really shows us.

…n Instruction::Dump

…:Dump

…tate_variable` lambda function

…Dump

…IDumpOptions instead of having to introduce new function

adrian-prantl · 2025-07-25T16:43:00Z

lldb/source/Core/Disassembler.cpp

+  // disassembled instruction stream, similar to how debug information
+  // enhances source-level debugging.
+
+  const size_t annotation_column = 150;


@JDevlieghere Question for a future PR: Should we consider threading through and checking the actual width of the Terminal to see if the annotations would fit?

Resolve conflicts in DWARFExpression files after upstream refactor

github-actions · 2025-08-04T19:56:01Z

✅ With the latest revision this PR passed the Python code formatter.

github-actions · 2025-08-04T19:56:01Z

✅ With the latest revision this PR passed the C/C++ code formatter.

UltimateForce21 · 2025-08-06T05:10:04Z

Hi @adrian-prantl,

While debugging the current CI test failures, I found that a number of existing LLDB API tests (e.g., TestFrameSelect, TestInferiorAssert) depend on the exact disassembly output.
Since my changes add new annotations in Instruction::Dump, they alter the default output and cause these tests to fail.

To address this, I’m considering gating the annotation output behind a new LLDB setting, for example:

settings set enable-rich-disassembly true

This would default to false, so the default disassembly output would be unchanged for tests and existing users, but developers/users could explicitly enable the richer annotated disassembly when desired.

My thinking is:

CI and default workflows see the same output as before (tests continue to pass).
Rich annotations are still available when activated.

Does that sound like a reasonable approach or is there a better/easier solution?

adrian-prantl · 2025-08-06T23:31:13Z

Hi @adrian-prantl,

While debugging the current CI test failures, I found that a number of existing LLDB API tests (e.g., TestFrameSelect, TestInferiorAssert) depend on the exact disassembly output. Since my changes add new annotations in Instruction::Dump, they alter the default output and cause these tests to fail.

To address this, I’m considering gating the annotation output behind a new LLDB setting, for example:
settings set enable-rich-disassembly true

This, or adding an option to the dissemble CommandObject is the right solution. Personally I think I would prefer an option to the CommandObject. Do you think this is feasible, or are there more many points through which the disassembly is reachable, making an option infeasible?

This would default to false, so the default disassembly output would be unchanged for tests and existing users, but developers/users could explicitly enable the richer annotated disassembly when desired.

I think that is correct while the feature is under development, but I think the goal should be for it to be on by default, with the few tests relying on it not being there opting out.

My thinking is:

CI and default workflows see the same output as before (tests continue to pass).

Rich annotations are still available when activated.

Does that sound like a reasonable approach or is there a better/easier solution?

Yes, this is the right approach!

UltimateForce21 · 2025-08-07T01:45:00Z

adding an option to the dissemble CommandObject is the right solution. Personally I think I would prefer an option to the CommandObject. Do you think this is feasible, or are there more many points through which the disassembly is reachable, making an option infeasible?

After looking more closely into one of the failing test cases, TestFrameDisassemble.py, I noticed that its entry point for disassembly is through SBFrame::Disassemble(), which bypasses the CLI entirely and doesn’t go through CommandObjectDisassemble.

Since this means command-line options like --rich wouldn’t apply in this context, would it be okay if I start by introducing a new setting (e.g., target.enable-rich-disassembly) to gate the annotation output globally?

Once that’s in place and CI is passing, I can follow up with a separate change to add the --rich option to CommandObjectDisassemble, which would override the setting for CLI use cases and give users more fine-grained control.

Does that sound reasonable?

adrian-prantl · 2025-08-07T15:07:51Z

adding an option to the dissemble CommandObject is the right solution. Personally I think I would prefer an option to the CommandObject. Do you think this is feasible, or are there more many points through which the disassembly is reachable, making an option infeasible?

After looking more closely into one of the failing test cases, TestFrameDisassemble.py, I noticed that its entry point for disassembly is through SBFrame::Disassemble(), which bypasses the CLI entirely and doesn’t go through CommandObjectDisassemble.

There are multiple options:

keep the original behavior and call the internal API from SBFrame::Disassemble with the flag to turn off the annotations
(optional) add an SBFrame::Disassemble(class DisassembleOptions) overload

Since this means command-line options like --rich wouldn’t apply in this context, would it be okay if I start by introducing a new setting (e.g., target.enable-rich-disassembly) to gate the annotation output globally?

I think I would prefer implementing option (1) and worry about exposing the new behavior through the API later. I would like to avoid global or target-specific settings if possible, because they can introduce unwanted side effects. (Imagine an IDE that depends on the traditional disassemble format, and a user that turns on the setting in their lldbinit file).

Once that’s in place and CI is passing, I can follow up with a separate change to add the --rich option to CommandObjectDisassemble, which would override the setting for CLI use cases and give users more fine-grained control.

Does that sound reasonable?

UltimateForce21 · 2025-08-08T01:37:14Z

Thanks, that makes sense.

Given what I found in the failing tests (some go through SBFrame::Disassemble(), others through the disassemble CLI), I see two viable rollouts:

Phase 1 (minimal blast radius):

Thread an enable_rich boolean into the disassembly path.
Pass false from both SBFrame::Disassemble() and CommandObjectDisassemble by default, so current output stays identical and CI/IDEs remain unaffected.
Add the CLI override (e.g. disassemble --rich) that sets enable_rich=true for that invocation.

Phase 2 (once we’re confident):

Flip the default to on in the CLI path (and/or provide an SB overload later), and update the handful of tests that rely on the old format to opt out (e.g. disassemble --no-rich). This matches our long-term goal of making it on by default while keeping legacy tests/tools stable.

I can start with Phase 1 now (SB path passes false; CommandObjectDisassemble passes false by default; new --rich flag), then follow up with a small PR to flip the default and adjust affected tests.

…e path** * Added a new `--rich` (`-R`) command-line option to `CommandObjectDisassemble` to enable rich disassembly annotations for the current invocation. * Plumbed a new `enable_rich_annotations` flag through: * `Disassembler::Disassemble` overloads * `Disassembler::PrintInstructions` * `Instruction::Dump` * `StackFrame::Disassemble` * Updated `StackFrame::Disassemble` to take an optional `bool enable_rich_annotations` (default `false`) so the SB API can request annotated output without CLI involvement. * Ensured annotations are only added when `enable_rich_annotations` is `true`; preserved caching for the non-rich path. * Modified `Options.td` to define the new `--rich` option. * Added/updated API test `TestRichDisassembler.py` to run `disassemble --rich -f` and check annotated output. * Kept default behavior unchanged so existing scripts and IDE integrations are unaffected.

UltimateForce21 · 2025-08-09T05:38:42Z

I’ve addressed all the requested changes and fixed the formatting issues.
The implementation is now complete and all tests are passing locally.

@adrian.prantl, when you have time, could you please review the final version and merge if everything looks good?

Thanks!

…ions() (follow-up to #147460) (#152887) **Context** Follow-up to [#147460](#147460), which added the ability to surface register-resident variable locations. This PR moves the annotation logic out of `Instruction::Dump()` and into `Disassembler::PrintInstructions()`, and adds lightweight state tracking so we only print changes at range starts and when variables go out of scope. --- ## What this does While iterating the instructions for a function, we maintain a “live variable map” keyed by `lldb::user_id_t` (the `Variable`’s ID) to remember each variable’s last emitted location string. For each instruction: - **New (or newly visible) variable** → print `name = <location>` once at the start of its DWARF location range, cache it. - **Location changed** (e.g., DWARF range switched to a different register/const) → print the updated mapping. - **Out of scope** (was tracked previously but not found for the current PC) → print `name = <undef>` and drop it. This produces **concise, stateful annotations** that highlight variable lifetime transitions without spamming every line. --- ## Why in `PrintInstructions()`? - Keeps `Instruction` stateless and avoids changing the `Instruction::Dump()` virtual API. - Makes it straightforward to diff state across instructions (`prev → current`) inside the single driver loop. --- ## How it works (high-level) 1. For the current PC, get in-scope variables via `StackFrame::GetInScopeVariableList(/*get_parent=*/true)`. 2. For each `Variable`, query `DWARFExpressionList::GetExpressionEntryAtAddress(func_load_addr, current_pc)` (added in #144238). 3. If the entry exists, call `DumpLocation(..., eDescriptionLevelBrief, abi)` to get a short, ABI-aware location string (e.g., `DW_OP_reg3 RBX → RBX`). 4. Compare against the last emitted location in the live map: - If not present → emit `name = <location>` and record it. - If different → emit updated mapping and record it. 5. After processing current in-scope variables, compute the set difference vs. the previous map and emit `name = <undef>` for any that disappeared. Internally: - We respect file↔load address translation already provided by `DWARFExpressionList`. - We reuse the ABI to map LLVM register numbers to arch register names. --- ## Example output (x86_64, simplified) ``` -> 0x55c6f5f6a140 <+0>: cmpl $0x2, %edi ; argc = RDI, argv = RSI 0x55c6f5f6a143 <+3>: jl 0x55c6f5f6a176 ; <+54> at d_original_example.c:6:3 0x55c6f5f6a145 <+5>: pushq %r15 0x55c6f5f6a147 <+7>: pushq %r14 0x55c6f5f6a149 <+9>: pushq %rbx 0x55c6f5f6a14a <+10>: movq %rsi, %rbx 0x55c6f5f6a14d <+13>: movl %edi, %r14d 0x55c6f5f6a150 <+16>: movl $0x1, %r15d ; argc = R14 0x55c6f5f6a156 <+22>: nopw %cs:(%rax,%rax) ; i = R15, argv = RBX 0x55c6f5f6a160 <+32>: movq (%rbx,%r15,8), %rdi 0x55c6f5f6a164 <+36>: callq 0x55c6f5f6a030 ; symbol stub for: puts 0x55c6f5f6a169 <+41>: incq %r15 0x55c6f5f6a16c <+44>: cmpq %r15, %r14 0x55c6f5f6a16f <+47>: jne 0x55c6f5f6a160 ; <+32> at d_original_example.c:5:10 0x55c6f5f6a171 <+49>: popq %rbx ; i = <undef> 0x55c6f5f6a172 <+50>: popq %r14 ; argv = RSI 0x55c6f5f6a174 <+52>: popq %r15 ; argc = RDI 0x55c6f5f6a176 <+54>: xorl %eax, %eax 0x55c6f5f6a178 <+56>: retq ``` Only transitions are shown: the start of a location, changes, and end-of-lifetime. --- ## Scope & limitations (by design) - Handles **simple locations** first (registers, const-in-register cases surfaced by `DumpLocation`). - **Memory/composite locations** are out of scope for this PR. - Annotations appear **only at range boundaries** (start/change/end) to minimize noise. - Output is **target-independent**; register names come from the target ABI. ## Implementation notes - All annotation printing now happens in `Disassembler::PrintInstructions()`. - Uses `std::unordered_map<lldb::user_id_t, std::string>` as the live map. - No persistent state across calls; the map is rebuilt while walking instruction by instruction. - **No changes** to the `Instruction` interface. --- ## Requested feedback - Placement and wording of the `<undef>` marker. - Whether we should optionally gate this behind a setting (currently always on when disassembling with an `ExecutionContext`). - Preference for immediate inclusion of tests vs. follow-up patch. --- Thanks for reviewing! Happy to adjust behavior/format based on feedback. --------- Co-authored-by: Jonas Devlieghere <[email protected]> Co-authored-by: Adrian Prantl <[email protected]>

…intInstructions() (follow-up to #147460) (#152887) **Context** Follow-up to [#147460](llvm/llvm-project#147460), which added the ability to surface register-resident variable locations. This PR moves the annotation logic out of `Instruction::Dump()` and into `Disassembler::PrintInstructions()`, and adds lightweight state tracking so we only print changes at range starts and when variables go out of scope. --- ## What this does While iterating the instructions for a function, we maintain a “live variable map” keyed by `lldb::user_id_t` (the `Variable`’s ID) to remember each variable’s last emitted location string. For each instruction: - **New (or newly visible) variable** → print `name = <location>` once at the start of its DWARF location range, cache it. - **Location changed** (e.g., DWARF range switched to a different register/const) → print the updated mapping. - **Out of scope** (was tracked previously but not found for the current PC) → print `name = <undef>` and drop it. This produces **concise, stateful annotations** that highlight variable lifetime transitions without spamming every line. --- ## Why in `PrintInstructions()`? - Keeps `Instruction` stateless and avoids changing the `Instruction::Dump()` virtual API. - Makes it straightforward to diff state across instructions (`prev → current`) inside the single driver loop. --- ## How it works (high-level) 1. For the current PC, get in-scope variables via `StackFrame::GetInScopeVariableList(/*get_parent=*/true)`. 2. For each `Variable`, query `DWARFExpressionList::GetExpressionEntryAtAddress(func_load_addr, current_pc)` (added in #144238). 3. If the entry exists, call `DumpLocation(..., eDescriptionLevelBrief, abi)` to get a short, ABI-aware location string (e.g., `DW_OP_reg3 RBX → RBX`). 4. Compare against the last emitted location in the live map: - If not present → emit `name = <location>` and record it. - If different → emit updated mapping and record it. 5. After processing current in-scope variables, compute the set difference vs. the previous map and emit `name = <undef>` for any that disappeared. Internally: - We respect file↔load address translation already provided by `DWARFExpressionList`. - We reuse the ABI to map LLVM register numbers to arch register names. --- ## Example output (x86_64, simplified) ``` -> 0x55c6f5f6a140 <+0>: cmpl $0x2, %edi ; argc = RDI, argv = RSI 0x55c6f5f6a143 <+3>: jl 0x55c6f5f6a176 ; <+54> at d_original_example.c:6:3 0x55c6f5f6a145 <+5>: pushq %r15 0x55c6f5f6a147 <+7>: pushq %r14 0x55c6f5f6a149 <+9>: pushq %rbx 0x55c6f5f6a14a <+10>: movq %rsi, %rbx 0x55c6f5f6a14d <+13>: movl %edi, %r14d 0x55c6f5f6a150 <+16>: movl $0x1, %r15d ; argc = R14 0x55c6f5f6a156 <+22>: nopw %cs:(%rax,%rax) ; i = R15, argv = RBX 0x55c6f5f6a160 <+32>: movq (%rbx,%r15,8), %rdi 0x55c6f5f6a164 <+36>: callq 0x55c6f5f6a030 ; symbol stub for: puts 0x55c6f5f6a169 <+41>: incq %r15 0x55c6f5f6a16c <+44>: cmpq %r15, %r14 0x55c6f5f6a16f <+47>: jne 0x55c6f5f6a160 ; <+32> at d_original_example.c:5:10 0x55c6f5f6a171 <+49>: popq %rbx ; i = <undef> 0x55c6f5f6a172 <+50>: popq %r14 ; argv = RSI 0x55c6f5f6a174 <+52>: popq %r15 ; argc = RDI 0x55c6f5f6a176 <+54>: xorl %eax, %eax 0x55c6f5f6a178 <+56>: retq ``` Only transitions are shown: the start of a location, changes, and end-of-lifetime. --- ## Scope & limitations (by design) - Handles **simple locations** first (registers, const-in-register cases surfaced by `DumpLocation`). - **Memory/composite locations** are out of scope for this PR. - Annotations appear **only at range boundaries** (start/change/end) to minimize noise. - Output is **target-independent**; register names come from the target ABI. ## Implementation notes - All annotation printing now happens in `Disassembler::PrintInstructions()`. - Uses `std::unordered_map<lldb::user_id_t, std::string>` as the live map. - No persistent state across calls; the map is rebuilt while walking instruction by instruction. - **No changes** to the `Instruction` interface. --- ## Requested feedback - Placement and wording of the `<undef>` marker. - Whether we should optionally gate this behind a setting (currently always on when disassembling with an `ExecutionContext`). - Preference for immediate inclusion of tests vs. follow-up patch. --- Thanks for reviewing! Happy to adjust behavior/format based on feedback. --------- Co-authored-by: Jonas Devlieghere <[email protected]> Co-authored-by: Adrian Prantl <[email protected]>

UltimateForce21 and others added 19 commits June 11, 2025 00:29

Update lldb/include/lldb/Expression/DWARFExpressionList.h

1db5002

Co-authored-by: Jonas Devlieghere <[email protected]>

Update lldb/include/lldb/Expression/DWARFExpressionList.h

a26010b

Updated comment for GetExpressionEntryAtAddress to directly refer to struct DWARFExpressionEntry Co-authored-by: Jonas Devlieghere <[email protected]>

Update lldb/source/Expression/DWARFExpressionList.cpp

72237b7

updating code style for function GetExpressionEntryAtAddress Co-authored-by: Jonas Devlieghere <[email protected]>

Update DWARFExpressionList.h

94e4951

Replace raw base/end with `AddressRange` in `DWARFExpressionEntry` and cleans up helper comments to follow Doxygen convention. Using `AddressRange` makes the intent clearer, avoids duplication of basic `AddressRange` logic usage

Update DWARFExpressionList.cpp

e8142da

Converts `GetExpressionEntryAtAddress` to return `llvm::Expected<DWARFExpressionEntry>` using the updated `DWARFExpressionEntry`. Updates the implementation to compute a single `AddressRange file_range` for each DWARF location interval.

Update DWARFExpressionList.h

7e8741e

Updated commenting style for struct DWARFExpressionEntry

Update DWARFExpressionList.cpp

c4cd77f

Updated function `llvm::Expected<DWARFExpressionList::DWARFExpressionEntry> DWARFExpressionList::GetExpressionEntryAtAddress` to use `FindEntryThatContains` instead of `FindEntryIndexThatContains`

Change GetExpressionEntryAtAddress to return std::optional instead of…

62c02a9

… llvm::Expected

Update DWARFExpressionList.cpp

d015971

Co-authored-by: Adrian Prantl <[email protected]>

Make file_range optional in DWARFExpressionEntry for always-valid expr

3462165

Annotate Instruction::Dump() with DWARF variable locations

2ed8443

Added Initial Basic API test for rich variable annotation in disassem…

8c6b22d

…bly output. Right now just checks if DW annotations show up for a basic program and that a variable location is annotated (i.e 'a = DW_OP_reg...').

Ignore annotations with only decoding errors

6bbc8aa

Handled edge case where the entire DWARF expression is a `<decoding error>`, ensuring no misleading or empty annotations are printed for such variables.

UltimateForce21 requested a review from JDevlieghere as a code owner July 8, 2025 04:09

JDevlieghere requested a review from adrian-prantl July 14, 2025 15:55

adrian-prantl reviewed Jul 16, 2025

View reviewed changes

Add high-level comment explaining rich disassembly annotation logic i…

09c4d04

…n Instruction::Dump

UltimateForce21 added 5 commits July 20, 2025 15:38

Add comment clarifying annotation column length check in Instruction:…

6e17f77

…:Dump

Refactor variable annotation logic in Instruction::Dump using `anno…

31431c0

…tate_variable` lambda function

Use range-based for loop for variable list iteration in Instruction::…

9c5cb8f

…Dump

Consolidated DumpLocation and DumpLocationWithOptions using default D…

ca8510c

…IDumpOptions instead of having to introduce new function

Use llvm::join to simplify annotation output formatting

ffefe5f

adrian-prantl approved these changes Jul 25, 2025

View reviewed changes

Merge branch 'main' into add-disassembler-annotations

fae745a

Resolve conflicts in DWARFExpression files after upstream refactor

UltimateForce21 added 3 commits August 4, 2025 16:01

Fix formatting to match LLVM style

dcddf16

More formatting fixes

7bac074

Fix formatting for code and tests

79c0a9e

UltimateForce21 and others added 4 commits August 8, 2025 11:14

Formatting changes.

6ca4bb6

Merge branch 'main' into add-disassembler-annotations

4bf584e

Redo Workflow tests

b1f13e7

UltimateForce21 mentioned this pull request Aug 10, 2025

Stateful variable-location annotations in Disassembler::PrintInstructions() (follow-up to #147460) #152887

Merged


		const size_t annotation_column = 150;

		if (exe_ctx && exe_ctx->GetFramePtr()) {

		func_load_addr = sc.function->GetAddress().GetLoadAddress(target_sp.get());


		if(ss.GetSizeOfLastLine() < annotation_column) {

Annotate disassembly with register‐resident variable locations #147460

Are you sure you want to change the base?

Annotate disassembly with register‐resident variable locations #147460

Uh oh!

Conversation

UltimateForce21 commented Jul 8, 2025

Summary

Example test program

Disassembly output with annotations

Next steps

Uh oh!

github-actions bot commented Jul 8, 2025

Uh oh!

UltimateForce21 commented Jul 8, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

UltimateForce21 Jul 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

UltimateForce21 Jul 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

UltimateForce21 Jul 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

adrian-prantl commented Jul 16, 2025

Uh oh!

UltimateForce21 commented Jul 16, 2025

Uh oh!

jasonmolenda commented Jul 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jasonmolenda commented Jul 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Aug 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Aug 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

UltimateForce21 commented Aug 6, 2025

Uh oh!

adrian-prantl commented Aug 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

UltimateForce21 commented Aug 7, 2025

Uh oh!

adrian-prantl commented Aug 7, 2025

Uh oh!

UltimateForce21 Jul 20, 2025 •

edited

Loading

UltimateForce21 Jul 20, 2025 •

edited

Loading

UltimateForce21 Jul 20, 2025 •

edited

Loading

jasonmolenda commented Jul 17, 2025 •

edited

Loading

jasonmolenda commented Jul 17, 2025 •

edited

Loading

github-actions bot commented Aug 4, 2025 •

edited

Loading

github-actions bot commented Aug 4, 2025 •

edited

Loading

adrian-prantl commented Aug 6, 2025 •

edited

Loading