-
Notifications
You must be signed in to change notification settings - Fork 15.2k
Annotate disassembly with register‐resident variable locations #147460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Annotate disassembly with register‐resident variable locations #147460
Conversation
…DWARFExpressionList This introduces a new API for retrieving DWARF expression metadata associated with variable location entries at a given PC address. It provides the base, end, and expression pointer for downstream consumers such as disassembler annotations. Intended for use in richer instruction annotations in Instruction::Dump().
Co-authored-by: Jonas Devlieghere <[email protected]>
Updated comment for GetExpressionEntryAtAddress to directly refer to struct DWARFExpressionEntry Co-authored-by: Jonas Devlieghere <[email protected]>
updating code style for function GetExpressionEntryAtAddress Co-authored-by: Jonas Devlieghere <[email protected]>
Replace raw base/end with `AddressRange` in `DWARFExpressionEntry` and cleans up helper comments to follow Doxygen convention. Using `AddressRange` makes the intent clearer, avoids duplication of basic `AddressRange` logic usage
Converts `GetExpressionEntryAtAddress` to return `llvm::Expected<DWARFExpressionEntry>` using the updated `DWARFExpressionEntry`. Updates the implementation to compute a single `AddressRange file_range` for each DWARF location interval.
Updated commenting style for struct DWARFExpressionEntry
Updated function `llvm::Expected<DWARFExpressionList::DWARFExpressionEntry> DWARFExpressionList::GetExpressionEntryAtAddress` to use `FindEntryThatContains` instead of `FindEntryIndexThatContains`
Co-authored-by: Adrian Prantl <[email protected]>
This patch adds explicit checks:
- ensure `load_addr >= func_load_addr` to avoid underflow,
- compute and verify a temporary delta variable, then verify `delta + m_func_file_addr` does
not exceed `addr_t` max to avoid overflow.
…bly output. Right now just checks if DW annotations show up for a basic program and that a variable location is annotated (i.e 'a = DW_OP_reg...').
- Fixed an issue where variable location annotations were not shown if the current instruction address did not exactly match the DWARF base address. Now, annotations are shown as long as the PC is within the valid range. - Improved alignment of annotation comments in Instruction::Dump(). While `FillLastLineToColumn` can sometimes overcompensate due to internal formatting or byte-width mismatches, the overall alignment is now significantly more consistent and readable.
Previously, when a DWARF expression contained any decoding error, the entire variable location annotation was printed with the error, e.g. `c = DW_OP_addr 0x0, <decoding error> 00 00 00`. This was misleading and cluttered the disassembly view. This patch improves the formatting by stripping out the `<decoding error ...>` fragments while preserving the valid portions of the expression, so that partial information like `c = DW_OP_addr 0x0` can still be shown. This allows the rich disassembler to give more meaningful variable annotations, especially in optimized (-O1/-O2) builds where partial DWARF corruption or unsupported expressions may occur.
Handled edge case where the entire DWARF expression is a `<decoding error>`, ensuring no misleading or empty annotations are printed for such variables.
This patch adds API tests to verify that DWARF variable location annotations are correctly displayed in the disassembly output. The tests cover: - Local variables in loops and functions - Multiple stack variables - Control flow edge cases - Different optimization levels (-O0, -O1, -O2) - Ensuring decoding errors are excluded from output
…try API This rebases the `add-disassembler-annotations` work onto the latest `add-dwarfexprentry-api` branch so that the instruction annotation patches sit cleanly atop the new DWARFExpressionEntry struct and helper API. All conflicts have been resolved and the annotation code now integrates with the updated std::optional<AddressRange>-based GetExpressionEntryAtAddress signature.
|
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
|
@JDevlieghere @adrian-prantl I have created the PR #144238 dependent Rich Disassembler Annotations PR. |
…w `DWARFExpression::DumpLocationWithOptions` for simplified expression printing This patch introduces a PrintRegisterOnly flag to the DIDumpOptions struct, enabling concise rendering of DWARF expressions for disassembler annotations. Key changes: - Added DumpLocationWithOptions to DWARFExpression for flexible dumping with DIDumpOptions. - Updated DWARFExpression::print and Operation::print to respect PrintRegisterOnly, rendering registers like RDI without DW_OP_ or llvm: prefixes. - Suppressed <decoding error> output when PrintRegisterOnly is true to avoid clutter during register-only disassembly output. These changes are motivated by LLDB’s rich disassembler feature, where annotations should match user-facing register names without DWARF-level detail. Test impact: Some rich-disassembler tests that relied on DW_OP_ for validation were deprecated. Updated tests aligned with the new formatting will be added next.
lldb/source/Core/Disassembler.cpp
Outdated
|
|
||
| const size_t annotation_column = 150; | ||
|
|
||
| if (exe_ctx && exe_ctx->GetFramePtr()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a high-level comment here to explain what this block of code is doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added now in commit: 09c4d04b33e16e25da3fbfb826c609c8bb42264e
lldb/source/Core/Disassembler.cpp
Outdated
| if (frame && target_sp) { | ||
| addr_t current_pc = m_address.GetLoadAddress(target_sp.get()); | ||
| addr_t original_pc = frame->GetFrameCodeAddress().GetLoadAddress(target_sp.get()); | ||
| if (frame->ChangePC(current_pc)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is m_address here? Is it really necessary to modify the PC of frame?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m_address here refers to the address of the current instruction being dumped in the disassembler output. Since Instruction::Dump is called per-instruction, m_address gives us the load address of the instruction currently being printed.
I used frame->ChangePC(current_pc) to temporarily update the frame's PC to this instruction address. This is necessary because GetInScopeVariableList(true) relies on the frame's current PC to determine which variables are in scope and what their locations are at that exact address. Without setting the PC to the instruction being dumped, the variable list might reflect a different point in the function, which could lead to incorrect or missing annotations.
That said, I make sure to save the original PC with frame->GetFrameCodeAddress() and restore it afterward with frame->ChangePC(original_pc) to avoid any persistent side effects on the frame state.
If there’s a better way to get the in-scope variables for a given address without mutating the frame, I’m happy to revise the approach.
lldb/source/Core/Disassembler.cpp
Outdated
| func_load_addr = sc.function->GetAddress().GetLoadAddress(target_sp.get()); | ||
|
|
||
|
|
||
| if(ss.GetSizeOfLastLine() < annotation_column) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this check if there is enough space for an annotation? Maybe comment that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly. I have now added the comment in commit: 6e17f77c4300654c7664b52efc045983bb59721d
lldb/source/Core/Disassembler.cpp
Outdated
|
|
||
| std::vector<std::string> annotations; | ||
|
|
||
| if (var_list_sp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLVM coding style prefers early exits over checks. Maybe you can wrap this whole block in a lambda or static function, then this can become
if (!var_list_sp)
return;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion! I've pushed the change in 31431c0, which moves the annotation logic into a local annotate_variables lambda with early exits.
lldb/source/Core/Disassembler.cpp
Outdated
| std::vector<std::string> annotations; | ||
|
|
||
| if (var_list_sp) { | ||
| for (size_t i = 0; i < var_list_sp->GetSize(); ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be a a range-based for? (i.e, for (VariableSP var_sp : *var_list_sp))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the loop to use a range-based for as recommended. The change is now committed in 9c5cb8f.
lldb/source/Core/Disassembler.cpp
Outdated
| if (!annotations.empty()) { | ||
| ss.FillLastLineToColumn(annotation_column, ' '); | ||
| ss.PutCString(" ; "); | ||
| for (size_t i = 0; i < annotations.size(); ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- range-based for
- there is also a helper in LLVM to join a list with a separator (https://llvm.org/doxygen/StringExtras_8h_source.html)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replaced the manual string concatenation now with llvm::join as suggested in commit
ffefe5f.
| } | ||
|
|
||
| void DumpLocation(Stream *s, lldb::DescriptionLevel level, ABI *abi) const; | ||
| void DumpLocationWithOptions(Stream *s, lldb::DescriptionLevel level, ABI *abi, llvm::DIDumpOptions options) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not have one version with a default argument?
void DumpLocationWithOptions(Stream *s, lldb::DescriptionLevel level, ABI *abi, llvm::DIDumpOptions options = {}) const;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the code now to use a single DumpLocation method with a default DIDumpOptions argument, as recommended. The DumpLocationWithOptions variant has been removed in commit ca8510c.
|
|
||
| // Guard against overflow. | ||
| lldb::addr_t delta = load_addr - func_load_addr; | ||
| if (delta > std::numeric_limits<lldb::addr_t>::max() - m_func_file_addr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to use the add with overflow methods from MathExtras here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From when I dug into MathExtras last time, I found that its AddOverflow and SubOverflow helpers are only defined for signed types.
I they rely on two’s-complement signed overflow semantics and return the truncated signed result. Since lldb::addr_t is an unsigned address, I think using those helpers would either fail to compile or silently convert to signed semantics.
That's why I tried to implement the signed overflow check into the function.
|
As high-level comments:
|
Thank you for the thoughtful feedback! Here's a breakdown of what has been addressed so far and what I’m currently working on:
Thanks again—will continue to iterate on this based on your guidance and on all the other comments you have made! |
|
Interesting PR, thanks for looking at adding these annotations. I think exposing these dwarf location expressions in a more user-visible way is going to put a lot of pressure on making them more meaningful to users, as opposed to debugging prints for debugger/compiler developers. Another example of an unfortunate behavior on AArch64 where x0 is the 64-bit register name, w0 is the 32-bit register name, but the DWARF register number doesn't specify which it is (it depends on the type of the value to read 32- or 64-bits), so the expression parser always prints the "w" register by default which might be a little confusing. The UI of making the dwarf expression parser a human readable format instead of a simple dump of dwarf opcodes and arguments is maybe the most interesting one to think about/enhance. One thing which is obviously correct, but was unintuitive at first for me, was that the DWARF location is only known after the instruction initializing it, e.g. for |
|
Incidentally, this is always something people get a little confused by when reading assembly. We see it with backtraces especially. A CALL instruction executes and the return address is the next instruction. That confuses people looking at the assembly language -- the "pc" pointer is pointing to the next instruction! -- but we cheat it for source symbolication, subtracting 1 from the return address when we do source line lookup so we show the function call as "currently executing" when we really have the return address which may point to the next source line. We have to do the same thing with dwarf location list lookups, where we might have a I'm aways unsure of how much to lie to the user in these scenarios. The general call is "at source level lie, at assembly level tell the truth of what the return address shows us". It gets a little messier on a function that ends in a noreturn CALL as the last instruction, we don't want to show the return address as the next function in memory, even though that's what it really shows us. |
…n Instruction::Dump
…tate_variable` lambda function
…IDumpOptions instead of having to introduce new function
| // disassembled instruction stream, similar to how debug information | ||
| // enhances source-level debugging. | ||
|
|
||
| const size_t annotation_column = 150; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JDevlieghere Question for a future PR: Should we consider threading through and checking the actual width of the Terminal to see if the annotations would fit?
Resolve conflicts in DWARFExpression files after upstream refactor
|
✅ With the latest revision this PR passed the Python code formatter. |
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
|
Hi @adrian-prantl, While debugging the current CI test failures, I found that a number of existing LLDB API tests (e.g., To address this, I’m considering gating the annotation output behind a new LLDB setting, for example: This would default to false, so the default disassembly output would be unchanged for tests and existing users, but developers/users could explicitly enable the richer annotated disassembly when desired. My thinking is:
Does that sound like a reasonable approach or is there a better/easier solution? |
This, or adding an option to the dissemble CommandObject is the right solution. Personally I think I would prefer an option to the CommandObject. Do you think this is feasible, or are there more many points through which the disassembly is reachable, making an option infeasible?
I think that is correct while the feature is under development, but I think the goal should be for it to be on by default, with the few tests relying on it not being there opting out.
Yes, this is the right approach! |
After looking more closely into one of the failing test cases, TestFrameDisassemble.py, I noticed that its entry point for disassembly is through Since this means command-line options like Once that’s in place and CI is passing, I can follow up with a separate change to add the Does that sound reasonable? |
There are multiple options:
I think I would prefer implementing option (1) and worry about exposing the new behavior through the API later. I would like to avoid global or target-specific settings if possible, because they can introduce unwanted side effects. (Imagine an IDE that depends on the traditional disassemble format, and a user that turns on the setting in their lldbinit file).
|
|
Thanks, that makes sense. Given what I found in the failing tests (some go through Phase 1 (minimal blast radius):
Phase 2 (once we’re confident):
I can start with Phase 1 now (SB path passes false; |
…e path** * Added a new `--rich` (`-R`) command-line option to `CommandObjectDisassemble` to enable rich disassembly annotations for the current invocation. * Plumbed a new `enable_rich_annotations` flag through: * `Disassembler::Disassemble` overloads * `Disassembler::PrintInstructions` * `Instruction::Dump` * `StackFrame::Disassemble` * Updated `StackFrame::Disassemble` to take an optional `bool enable_rich_annotations` (default `false`) so the SB API can request annotated output without CLI involvement. * Ensured annotations are only added when `enable_rich_annotations` is `true`; preserved caching for the non-rich path. * Modified `Options.td` to define the new `--rich` option. * Added/updated API test `TestRichDisassembler.py` to run `disassemble --rich -f` and check annotated output. * Kept default behavior unchanged so existing scripts and IDE integrations are unaffected.
|
I’ve addressed all the requested changes and fixed the formatting issues. @adrian.prantl, when you have time, could you please review the final version and merge if everything looks good? Thanks! |
…ions() (follow-up to #147460) (#152887) **Context** Follow-up to [#147460](#147460), which added the ability to surface register-resident variable locations. This PR moves the annotation logic out of `Instruction::Dump()` and into `Disassembler::PrintInstructions()`, and adds lightweight state tracking so we only print changes at range starts and when variables go out of scope. --- ## What this does While iterating the instructions for a function, we maintain a “live variable map” keyed by `lldb::user_id_t` (the `Variable`’s ID) to remember each variable’s last emitted location string. For each instruction: - **New (or newly visible) variable** → print `name = <location>` once at the start of its DWARF location range, cache it. - **Location changed** (e.g., DWARF range switched to a different register/const) → print the updated mapping. - **Out of scope** (was tracked previously but not found for the current PC) → print `name = <undef>` and drop it. This produces **concise, stateful annotations** that highlight variable lifetime transitions without spamming every line. --- ## Why in `PrintInstructions()`? - Keeps `Instruction` stateless and avoids changing the `Instruction::Dump()` virtual API. - Makes it straightforward to diff state across instructions (`prev → current`) inside the single driver loop. --- ## How it works (high-level) 1. For the current PC, get in-scope variables via `StackFrame::GetInScopeVariableList(/*get_parent=*/true)`. 2. For each `Variable`, query `DWARFExpressionList::GetExpressionEntryAtAddress(func_load_addr, current_pc)` (added in #144238). 3. If the entry exists, call `DumpLocation(..., eDescriptionLevelBrief, abi)` to get a short, ABI-aware location string (e.g., `DW_OP_reg3 RBX → RBX`). 4. Compare against the last emitted location in the live map: - If not present → emit `name = <location>` and record it. - If different → emit updated mapping and record it. 5. After processing current in-scope variables, compute the set difference vs. the previous map and emit `name = <undef>` for any that disappeared. Internally: - We respect file↔load address translation already provided by `DWARFExpressionList`. - We reuse the ABI to map LLVM register numbers to arch register names. --- ## Example output (x86_64, simplified) ``` -> 0x55c6f5f6a140 <+0>: cmpl $0x2, %edi ; argc = RDI, argv = RSI 0x55c6f5f6a143 <+3>: jl 0x55c6f5f6a176 ; <+54> at d_original_example.c:6:3 0x55c6f5f6a145 <+5>: pushq %r15 0x55c6f5f6a147 <+7>: pushq %r14 0x55c6f5f6a149 <+9>: pushq %rbx 0x55c6f5f6a14a <+10>: movq %rsi, %rbx 0x55c6f5f6a14d <+13>: movl %edi, %r14d 0x55c6f5f6a150 <+16>: movl $0x1, %r15d ; argc = R14 0x55c6f5f6a156 <+22>: nopw %cs:(%rax,%rax) ; i = R15, argv = RBX 0x55c6f5f6a160 <+32>: movq (%rbx,%r15,8), %rdi 0x55c6f5f6a164 <+36>: callq 0x55c6f5f6a030 ; symbol stub for: puts 0x55c6f5f6a169 <+41>: incq %r15 0x55c6f5f6a16c <+44>: cmpq %r15, %r14 0x55c6f5f6a16f <+47>: jne 0x55c6f5f6a160 ; <+32> at d_original_example.c:5:10 0x55c6f5f6a171 <+49>: popq %rbx ; i = <undef> 0x55c6f5f6a172 <+50>: popq %r14 ; argv = RSI 0x55c6f5f6a174 <+52>: popq %r15 ; argc = RDI 0x55c6f5f6a176 <+54>: xorl %eax, %eax 0x55c6f5f6a178 <+56>: retq ``` Only transitions are shown: the start of a location, changes, and end-of-lifetime. --- ## Scope & limitations (by design) - Handles **simple locations** first (registers, const-in-register cases surfaced by `DumpLocation`). - **Memory/composite locations** are out of scope for this PR. - Annotations appear **only at range boundaries** (start/change/end) to minimize noise. - Output is **target-independent**; register names come from the target ABI. ## Implementation notes - All annotation printing now happens in `Disassembler::PrintInstructions()`. - Uses `std::unordered_map<lldb::user_id_t, std::string>` as the live map. - No persistent state across calls; the map is rebuilt while walking instruction by instruction. - **No changes** to the `Instruction` interface. --- ## Requested feedback - Placement and wording of the `<undef>` marker. - Whether we should optionally gate this behind a setting (currently always on when disassembling with an `ExecutionContext`). - Preference for immediate inclusion of tests vs. follow-up patch. --- Thanks for reviewing! Happy to adjust behavior/format based on feedback. --------- Co-authored-by: Jonas Devlieghere <[email protected]> Co-authored-by: Adrian Prantl <[email protected]>
…intInstructions() (follow-up to #147460) (#152887) **Context** Follow-up to [#147460](llvm/llvm-project#147460), which added the ability to surface register-resident variable locations. This PR moves the annotation logic out of `Instruction::Dump()` and into `Disassembler::PrintInstructions()`, and adds lightweight state tracking so we only print changes at range starts and when variables go out of scope. --- ## What this does While iterating the instructions for a function, we maintain a “live variable map” keyed by `lldb::user_id_t` (the `Variable`’s ID) to remember each variable’s last emitted location string. For each instruction: - **New (or newly visible) variable** → print `name = <location>` once at the start of its DWARF location range, cache it. - **Location changed** (e.g., DWARF range switched to a different register/const) → print the updated mapping. - **Out of scope** (was tracked previously but not found for the current PC) → print `name = <undef>` and drop it. This produces **concise, stateful annotations** that highlight variable lifetime transitions without spamming every line. --- ## Why in `PrintInstructions()`? - Keeps `Instruction` stateless and avoids changing the `Instruction::Dump()` virtual API. - Makes it straightforward to diff state across instructions (`prev → current`) inside the single driver loop. --- ## How it works (high-level) 1. For the current PC, get in-scope variables via `StackFrame::GetInScopeVariableList(/*get_parent=*/true)`. 2. For each `Variable`, query `DWARFExpressionList::GetExpressionEntryAtAddress(func_load_addr, current_pc)` (added in #144238). 3. If the entry exists, call `DumpLocation(..., eDescriptionLevelBrief, abi)` to get a short, ABI-aware location string (e.g., `DW_OP_reg3 RBX → RBX`). 4. Compare against the last emitted location in the live map: - If not present → emit `name = <location>` and record it. - If different → emit updated mapping and record it. 5. After processing current in-scope variables, compute the set difference vs. the previous map and emit `name = <undef>` for any that disappeared. Internally: - We respect file↔load address translation already provided by `DWARFExpressionList`. - We reuse the ABI to map LLVM register numbers to arch register names. --- ## Example output (x86_64, simplified) ``` -> 0x55c6f5f6a140 <+0>: cmpl $0x2, %edi ; argc = RDI, argv = RSI 0x55c6f5f6a143 <+3>: jl 0x55c6f5f6a176 ; <+54> at d_original_example.c:6:3 0x55c6f5f6a145 <+5>: pushq %r15 0x55c6f5f6a147 <+7>: pushq %r14 0x55c6f5f6a149 <+9>: pushq %rbx 0x55c6f5f6a14a <+10>: movq %rsi, %rbx 0x55c6f5f6a14d <+13>: movl %edi, %r14d 0x55c6f5f6a150 <+16>: movl $0x1, %r15d ; argc = R14 0x55c6f5f6a156 <+22>: nopw %cs:(%rax,%rax) ; i = R15, argv = RBX 0x55c6f5f6a160 <+32>: movq (%rbx,%r15,8), %rdi 0x55c6f5f6a164 <+36>: callq 0x55c6f5f6a030 ; symbol stub for: puts 0x55c6f5f6a169 <+41>: incq %r15 0x55c6f5f6a16c <+44>: cmpq %r15, %r14 0x55c6f5f6a16f <+47>: jne 0x55c6f5f6a160 ; <+32> at d_original_example.c:5:10 0x55c6f5f6a171 <+49>: popq %rbx ; i = <undef> 0x55c6f5f6a172 <+50>: popq %r14 ; argv = RSI 0x55c6f5f6a174 <+52>: popq %r15 ; argc = RDI 0x55c6f5f6a176 <+54>: xorl %eax, %eax 0x55c6f5f6a178 <+56>: retq ``` Only transitions are shown: the start of a location, changes, and end-of-lifetime. --- ## Scope & limitations (by design) - Handles **simple locations** first (registers, const-in-register cases surfaced by `DumpLocation`). - **Memory/composite locations** are out of scope for this PR. - Annotations appear **only at range boundaries** (start/change/end) to minimize noise. - Output is **target-independent**; register names come from the target ABI. ## Implementation notes - All annotation printing now happens in `Disassembler::PrintInstructions()`. - Uses `std::unordered_map<lldb::user_id_t, std::string>` as the live map. - No persistent state across calls; the map is rebuilt while walking instruction by instruction. - **No changes** to the `Instruction` interface. --- ## Requested feedback - Placement and wording of the `<undef>` marker. - Whether we should optionally gate this behind a setting (currently always on when disassembling with an `ExecutionContext`). - Preference for immediate inclusion of tests vs. follow-up patch. --- Thanks for reviewing! Happy to adjust behavior/format based on feedback. --------- Co-authored-by: Jonas Devlieghere <[email protected]> Co-authored-by: Adrian Prantl <[email protected]>
This PR builds upon the functionality introduced in PR #144238, which adds support for annotating disassembly output with DWARF variable locations.
This patch includes:
Additional test coverage for various DWARF location cases (e.g., stack variables, control flow edges, partial decoding).
A refined implementation that uses the new DWARFExpressionEntry API.
Logic to gracefully skip fragments while preserving valid annotations.
Summary
This patch adds a first‐pass prototype of “rich disassembly” annotations into Instruction::Dump(). For each instruction, it will:
GetExpressionEntryAtAddress()API for any in‐scope variable live‐ranges covering this PC.If a variable’s live‐range begins exactly at the current PC that is being iterated in the disassembly, call
expr->DumpLocation(...)to render its DWARF expression (e.g. “DW_OP_reg5 RDI”, “DW_OP_reg3 RBX”, or constant) and append it as a; var = …comment after the instruction.Currently this only handles simple cases:
Variables whose location expression is a single register (e.g.
DW_OP_reg14 R14)Variables whose location expression is a constant (e.g.
DW_OP_consts +0)Example test program
Disassembly output with annotations
Next steps
Cleanup & style (DONE)
Test coverage (DONE)