Stateful variable-location annotations in Disassembler::PrintInstructions() (follow-up to #147460) #152887

Open
wants to merge 42 commits into
base: main

UltimateForce21
Contributor

Context
Follow-up to #147460, which added the ability to surface register-resident variable locations.
This PR moves the annotation logic out of Instruction::Dump() and into Disassembler::PrintInstructions(), and adds lightweight state tracking so we only print changes at range starts and when variables go out of scope.


What this does

While iterating the instructions for a function, we maintain a “live variable map” keyed by lldb::user_id_t (the Variable’s ID) to remember each variable’s last emitted location string. For each instruction:

  • New (or newly visible) variable → print name = <location> once at the start of its DWARF location range, cache it.
  • Location changed (e.g., DWARF range switched to a different register/const) → print the updated mapping.
  • Out of scope (was tracked previously but not found for the current PC) → print name = <undef> and drop it.

This produces concise, stateful annotations that highlight variable lifetime transitions without spamming every line.


Why in PrintInstructions()?

  • Keeps Instruction stateless and avoids changing the Instruction::Dump() virtual API.
  • Makes it straightforward to diff state across instructions (prev → current) inside the single driver loop.

How it works (high-level)

  1. For the current PC, get in-scope variables via StackFrame::GetInScopeVariableList(/*get_file_globals=*/true).
  2. For each Variable, query DWARFExpressionList::GetExpressionEntryAtAddress(func_load_addr, current_pc) (added in [lldb] Add DWARFExpressionEntry and GetExpressionEntryAtAddress() to … #144238).
  3. If the entry exists, call DumpLocation(..., eDescriptionLevelBrief, abi) to get a short, ABI-aware location string (e.g., DW_OP_reg3 RBX → RBX).
  4. Compare against the last emitted location in the live map:
    • If not present → emit name = <location> and record it.
    • If different → emit updated mapping and record it.
  5. After processing current in-scope variables, compute the set difference vs. the previous map and emit name = <undef> for any that disappeared.
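The diffing steps above can be sketched in isolation. This is a minimal illustration, not the code from the patch: `VarId`, `InScopeVar`, `LiveMap`, and `UpdateLiveVars` are hypothetical names, plain standard-library types stand in for `lldb::user_id_t` and `Variable`, and the location strings are assumed to have been rendered already (steps 1 to 3).

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for lldb::user_id_t.
using VarId = std::uint64_t;

struct VarState {
  std::string name;     ///< Display name.
  std::string location; ///< Last emitted location string.
  bool seen;            ///< True if found at the current PC.
};

// Hypothetical view of one in-scope variable at the current PC.
struct InScopeVar {
  VarId id;
  std::string name;
  std::string location; // Already rendered, e.g. "RDI".
};

using LiveMap = std::unordered_map<VarId, VarState>;

// Returns the annotations to print for one instruction and updates the map.
std::vector<std::string> UpdateLiveVars(LiveMap &live_vars,
                                        const std::vector<InScopeVar> &current) {
  std::vector<std::string> events;

  // Reset the "seen" flags for this instruction.
  for (auto &entry : live_vars)
    entry.second.seen = false;

  for (const InScopeVar &var : current) {
    auto it = live_vars.find(var.id);
    if (it == live_vars.end()) {
      // New variable: emit it once and start tracking it.
      events.push_back(var.name + " = " + var.location);
      live_vars.emplace(var.id, VarState{var.name, var.location, true});
      continue;
    }
    it->second.seen = true;
    if (it->second.location != var.location) {
      // Location changed: emit the updated mapping.
      events.push_back(var.name + " = " + var.location);
      it->second.location = var.location;
    }
  }

  // Anything tracked but not seen at this PC has gone out of scope.
  for (auto it = live_vars.begin(); it != live_vars.end();) {
    if (!it->second.seen) {
      events.push_back(it->second.name + " = <undef>");
      it = live_vars.erase(it);
    } else {
      ++it;
    }
  }
  return events;
}
```

Calling `UpdateLiveVars` once per instruction with the same map yields no events while nothing changes, which is what keeps the annotations limited to range boundaries. The order of `<undef>` events follows the map's iteration order and is therefore unspecified in this sketch.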

Internally:

  • We respect file↔load address translation already provided by DWARFExpressionList.
  • We reuse the ABI to map LLVM register numbers to arch register names.

Example output (x86_64, simplified)

->  0x55c6f5f6a140 <+0>:  cmpl   $0x2, %edi                                                         ; argc = RDI, argv = RSI
    0x55c6f5f6a143 <+3>:  jl     0x55c6f5f6a176            ; <+54> at d_original_example.c:6:3
    0x55c6f5f6a145 <+5>:  pushq  %r15
    0x55c6f5f6a147 <+7>:  pushq  %r14
    0x55c6f5f6a149 <+9>:  pushq  %rbx
    0x55c6f5f6a14a <+10>: movq   %rsi, %rbx
    0x55c6f5f6a14d <+13>: movl   %edi, %r14d
    0x55c6f5f6a150 <+16>: movl   $0x1, %r15d                                                        ; argc = R14
    0x55c6f5f6a156 <+22>: nopw   %cs:(%rax,%rax)                                                    ; i = R15, argv = RBX
    0x55c6f5f6a160 <+32>: movq   (%rbx,%r15,8), %rdi
    0x55c6f5f6a164 <+36>: callq  0x55c6f5f6a030            ; symbol stub for: puts
    0x55c6f5f6a169 <+41>: incq   %r15
    0x55c6f5f6a16c <+44>: cmpq   %r15, %r14
    0x55c6f5f6a16f <+47>: jne    0x55c6f5f6a160            ; <+32> at d_original_example.c:5:10
    0x55c6f5f6a171 <+49>: popq   %rbx                                                               ; i = <undef>
    0x55c6f5f6a172 <+50>: popq   %r14                                                               ; argv = RSI
    0x55c6f5f6a174 <+52>: popq   %r15                                                               ; argc = RDI
    0x55c6f5f6a176 <+54>: xorl   %eax, %eax
    0x55c6f5f6a178 <+56>: retq  

Only transitions are shown: the start of a location, changes, and end-of-lifetime.


Scope & limitations (by design)

  • Handles simple locations first (registers, const-in-register cases surfaced by DumpLocation).
  • Memory/composite locations are out of scope for this PR.
  • Annotations appear only at range boundaries (start/change/end) to minimize noise.
  • The annotation format is target-independent; only the register names come from the target ABI.

Implementation notes

  • All annotation printing now happens in Disassembler::PrintInstructions().
  • Uses std::unordered_map<lldb::user_id_t, std::string> as the live map.
  • No persistent state across calls; the map is rebuilt while walking instruction by instruction.
  • No changes to the Instruction interface.

Requested feedback

  • Placement and wording of the <undef> marker.
  • Whether we should optionally gate this behind a setting (currently always on when disassembling with an ExecutionContext).
  • Preference for immediate inclusion of tests vs. follow-up patch.

Thanks for reviewing! Happy to adjust behavior/format based on feedback.

UltimateForce21 and others added 30 commits June 11, 2025 00:29
…DWARFExpressionList

This introduces a new API for retrieving DWARF expression metadata associated with variable location entries at a given PC address. It provides the base, end, and expression pointer for downstream consumers such as disassembler annotations.

Intended for use in richer instruction annotations in Instruction::Dump().
Updated comment for GetExpressionEntryAtAddress to directly refer to struct DWARFExpressionEntry

Co-authored-by: Jonas Devlieghere <[email protected]>
updating code style for function GetExpressionEntryAtAddress

Co-authored-by: Jonas Devlieghere <[email protected]>
Replace raw base/end with `AddressRange` in `DWARFExpressionEntry` and clean up helper comments to follow Doxygen convention.

Using `AddressRange` makes the intent clearer and avoids duplicating basic `AddressRange` logic.
Converts `GetExpressionEntryAtAddress` to return `llvm::Expected<DWARFExpressionEntry>` using the updated `DWARFExpressionEntry`. Updates the implementation to compute a single `AddressRange file_range` for each DWARF location interval.
Updated commenting style for struct DWARFExpressionEntry
Updated function `llvm::Expected<DWARFExpressionList::DWARFExpressionEntry> DWARFExpressionList::GetExpressionEntryAtAddress` to use `FindEntryThatContains` instead of `FindEntryIndexThatContains`
Co-authored-by: Adrian Prantl <[email protected]>
This patch adds explicit checks:
  - ensure `load_addr >= func_load_addr` to avoid underflow,
  - compute and verify a temporary delta variable, then verify `delta + m_func_file_addr` does
    not exceed `addr_t` max to avoid overflow.
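The two checks described in this commit message can be sketched as follows. This is an illustration under assumptions: `LoadToFileAddress` is a hypothetical helper name, `addr_t` is a local alias standing in for `lldb::addr_t`, and `std::optional` is used for brevity where the patch itself reports failure through `llvm::Expected`.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Local stand-in for lldb::addr_t.
using addr_t = std::uint64_t;

// Translates a load address inside a function to the matching file address.
// Returns std::nullopt if the arithmetic would underflow or overflow.
std::optional<addr_t> LoadToFileAddress(addr_t load_addr, addr_t func_load_addr,
                                        addr_t func_file_addr) {
  // Guard the subtraction: a PC below the function start would wrap around.
  if (load_addr < func_load_addr)
    return std::nullopt;
  const addr_t delta = load_addr - func_load_addr;

  // Guard the addition: delta + func_file_addr must fit in addr_t.
  if (delta > std::numeric_limits<addr_t>::max() - func_file_addr)
    return std::nullopt;
  return func_file_addr + delta;
}
```

Comparing `delta` against `max - func_file_addr` checks for overflow without ever performing the overflowing addition.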
…bly output. For now it only checks that DWARF annotations show up for a basic program and that a variable location is annotated (e.g., 'a = DW_OP_reg...').
- Fixed an issue where variable location annotations were not shown if the current instruction address did not exactly match the DWARF base address. Now, annotations are shown as long as the PC is within the valid range.
- Improved alignment of annotation comments in Instruction::Dump(). While `FillLastLineToColumn` can sometimes overcompensate due to internal formatting or byte-width mismatches, the overall alignment is now significantly more consistent and readable.
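The range fix in this commit (annotate whenever the PC lies inside the location range, not only at its base address) amounts to a half-open interval test. A minimal sketch, assuming a hypothetical `LocationRange` type rather than LLDB's `AddressRange`:

```cpp
#include <cstdint>

// Local stand-in for lldb::addr_t.
using addr_t = std::uint64_t;

// Hypothetical half-open range [base, base + size) for one location entry.
struct LocationRange {
  addr_t base;
  addr_t size;

  // True for any PC inside the range, not only for pc == base.
  bool Contains(addr_t pc) const { return pc >= base && pc - base < size; }
};
```

Writing the upper bound as `pc - base < size` avoids computing `base + size`, which could wrap for ranges near the top of the address space.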
Previously, when a DWARF expression contained any decoding error,
the entire variable location annotation was printed with the error,
e.g. `c = DW_OP_addr 0x0, <decoding error> 00 00 00`. This was
misleading and cluttered the disassembly view.

This patch improves the formatting by stripping out the
`<decoding error ...>` fragments while preserving the valid portions
of the expression, so that partial information like
`c = DW_OP_addr 0x0` can still be shown.

This allows the rich disassembler to give more meaningful variable
annotations, especially in optimized (-O1/-O2) builds where partial
DWARF corruption or unsupported expressions may occur.
Handled edge case where the entire DWARF expression is a `<decoding error>`, ensuring no misleading or empty annotations are printed for such variables.
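One way to picture the stripping described in these two commits is a small string filter. This is only a sketch: `StripDecodingError` is a hypothetical helper, and it assumes that everything from the first `<decoding error` marker to the end of the string is unusable, which matches the example above but may not match the patch's actual logic.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Returns the valid prefix of a dumped DWARF expression, or std::nullopt when
// nothing usable is left and the annotation should be skipped.
std::optional<std::string> StripDecodingError(std::string expr) {
  static const std::string kMarker = "<decoding error";
  const std::size_t pos = expr.find(kMarker);
  if (pos != std::string::npos)
    expr.erase(pos); // Assumption: the rest of the string is unusable.

  // Trim the separator that preceded the marker (", " in the example).
  const std::size_t last = expr.find_last_not_of(", \t");
  if (last == std::string::npos)
    return std::nullopt; // The whole expression was a decoding error.
  expr.erase(last + 1);
  return expr;
}
```

With this filter, `DW_OP_addr 0x0, <decoding error> 00 00 00` is reduced to `DW_OP_addr 0x0`, while an expression that consists only of a decoding error produces no annotation at all.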
This patch adds API tests to verify that DWARF variable location
annotations are correctly displayed in the disassembly output.

The tests cover:
- Local variables in loops and functions
- Multiple stack variables
- Control flow edge cases
- Different optimization levels (-O0, -O1, -O2)
- Ensuring decoding errors are excluded from output
…try API

This rebases the `add-disassembler-annotations` work onto the
latest `add-dwarfexprentry-api` branch so that the instruction
annotation patches sit cleanly atop the new DWARFExpressionEntry
struct and helper API. All conflicts have been resolved and the
annotation code now integrates with the updated std::optional<AddressRange>-based
GetExpressionEntryAtAddress signature.
…w `DWARFExpression::DumpLocationWithOptions` for simplified expression printing

This patch introduces a PrintRegisterOnly flag to the DIDumpOptions struct, enabling concise rendering of DWARF expressions for disassembler annotations.

Key changes:
- Added DumpLocationWithOptions to DWARFExpression for flexible dumping with DIDumpOptions.
- Updated DWARFExpression::print and Operation::print to respect PrintRegisterOnly, rendering registers like RDI without DW_OP_ or llvm: prefixes.
- Suppressed <decoding error> output when PrintRegisterOnly is true to avoid clutter during register-only disassembly output.

These changes are motivated by LLDB’s rich disassembler feature, where annotations should match user-facing register names without DWARF-level detail.

Test impact:
Some rich-disassembler tests that relied on DW_OP_ for validation were deprecated. Updated tests aligned with the new formatting will be added next.
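The effect of the PrintRegisterOnly flag can be illustrated with a toy renderer. Everything here is hypothetical: `DumpOptions` is a trimmed-down stand-in for `DIDumpOptions`, and `RenderRegisterOp` is not an LLVM function; the verbose form simply mirrors the `DW_OP_reg3 RBX` example from the PR description.

```cpp
#include <string>

// Hypothetical, trimmed-down stand-in for the dump options discussed above.
struct DumpOptions {
  bool PrintRegisterOnly = false;
};

// Renders a single register-location operation.
std::string RenderRegisterOp(unsigned dwarf_reg, const std::string &reg_name,
                             const DumpOptions &opts) {
  // Register-only mode: just the user-facing register name.
  if (opts.PrintRegisterOnly)
    return reg_name;
  // Verbose mode: keep the DWARF opcode in front of the name.
  return "DW_OP_reg" + std::to_string(dwarf_reg) + " " + reg_name;
}
```

The disassembler annotations would use the register-only form, while other consumers of the dump keep the DWARF-level detail.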
…IDumpOptions instead of having to introduce new function
Resolve conflicts in DWARFExpression files after upstream refactor
UltimateForce21 and others added 7 commits August 6, 2025 20:41
…e path**

* Added a new `--rich` (`-R`) command-line option to `CommandObjectDisassemble` to enable rich disassembly annotations for the current invocation.

* Plumbed a new `enable_rich_annotations` flag through:
  * `Disassembler::Disassemble` overloads
  * `Disassembler::PrintInstructions`
  * `Instruction::Dump`
  * `StackFrame::Disassemble`

* Updated `StackFrame::Disassemble` to take an optional `bool enable_rich_annotations` (default `false`) so the SB API can request annotated output without CLI involvement.

* Ensured annotations are only added when `enable_rich_annotations` is `true`; preserved caching for the non-rich path.

* Modified `Options.td` to define the new `--rich` option.

* Added/updated API test `TestRichDisassembler.py` to run `disassemble --rich -f` and check annotated output.

* Kept default behavior unchanged so existing scripts and IDE integrations are unaffected.
This change introduces a simple live-variable tracking system for annotated
disassembly. While iterating over instructions, we now maintain an
unordered_map keyed by `lldb::user_id_t` to remember each in-scope variable's
last known location string.

For each instruction:
  * If a variable is new, print `name = location` and add it to the map.
  * If a variable's location has changed, print the updated mapping.
  * If a previously tracked variable is no longer found, print
    `name = <undef>` and remove it.

This produces concise, stateful annotations that only update when needed,
reducing noise in the disassembly while still showing variable lifetimes.

github-actions bot commented Aug 10, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@UltimateForce21
Contributor Author

@adrian-prantl @JDevlieghere PR posted for the variable location tracking and annotations moved to Disassembler::PrintInstructions. Looking forward to your feedback and thank you in advance!

@@ -445,10 +444,11 @@ class Disassembler : public std::enable_shared_from_this<Disassembler>,
const ExecutionContext &exe_ctx, const Address &start,
Limit limit, bool mixed_source_and_assembly,
uint32_t num_mixed_context_lines, uint32_t options,
Collaborator

Why is the new flag not part of options?

Collaborator

They are defined in Disassembler.h

Contributor Author

Ah, I see. I misunderstood earlier when you mentioned “options” and thought you were referring only to CommandObjectDisassemble’s CommandOptions. I missed that you meant the existing Disassembler options bitmask.

I’ll update the patch so that --rich sets a new eOptionRichAnnotations flag in the Disassembler options enum instead of plumbing a separate boolean through the APIs, and then read that flag in PrintInstructions to enable annotations.

Collaborator

No I wasn't being quite as specific ;-)
I did mean to add a user-visible option to the command (which you implemented as --rich), and I didn't bother checking how to thread that flag through all layers of the API. At the lldb_private::Disassembler level, reusing the existing option set seems like the right choice.

Collaborator

As an aside: given that there is an existing -r (--raw) option to the disassemble command, what do you think about naming it --variables <boolean> to be both more descriptive and avoid confusion with the -r option?

Contributor Author

@UltimateForce21 UltimateForce21 Aug 12, 2025

Seems like a good idea to me. Right now --rich is synonymous with -R, but I agree --variables would make it clearer to users that the option is different.

};

// Track live variables across instructions (keyed by stable LLDB user_id_t)
std::unordered_map<lldb::user_id_t, VarState> live_vars;
Collaborator

Not a big deal, but we typically prefer using llvm::DenseMap<> or llvm::SmallDenseMap<>

Collaborator

I see, the erase loop you're doing depends on this.

Contributor Author

@UltimateForce21 UltimateForce21 Aug 11, 2025

Thanks for the pointer, I have switched to using llvm::SmallDenseMap<> now.

if (!frame || !target_sp || !process_sp)
return events;

// Reset "seen" flags for this instruction
Collaborator

FYI: LLVM coding style wants full sentences in comments with a . at the end. (This applies to all comments)

Contributor Author

Sorry, I keep forgetting to do this, thank you for all the reminders. Updating them now.

@@ -0,0 +1,6 @@

# CXX_SOURCES := a_loop_with_local_variable.c b_multiple_stack_variables.c c_variable_passed_to_another_function.c d_original_example.c e_control_flow_edge.c
Collaborator

stale comment

Contributor Author

Removed now

// enhances source-level debugging.

struct VarState {
std::string name; // display name
Collaborator

Suggested change
std::string name; // display name
std::string name; ///< Display name.

Contributor Author

Fixed now

This change refactors how the `--rich` flag is handled, based on code review feedback.

- Removed the `enable_rich_annotations` boolean from the API signatures of:
  - Disassembler::Disassemble(...)
  - Disassembler::PrintInstructions(...)
  - StackFrame::Disassemble(...)
- Added a new Disassembler::Option enum value: eOptionRichAnnotations.
- The `--rich` CLI flag now sets the new option bit in CommandObjectDisassemble::DoExecute:
    options |= Disassembler::eOptionRichAnnotations;
- Disassembler::PrintInstructions checks the bit to determine whether to enable rich annotations:
    const bool enable_rich = (options & eOptionRichAnnotations) != 0;

The SB API remains unchanged and defaults to non-rich output.

Tested via the existing test using `disassemble --rich -f`.
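The bitmask approach from this commit can be sketched on its own. The enum below is illustrative only: the bit positions are made up for the example and are not the values in Disassembler.h, and `BuildOptions` and `RichAnnotationsEnabled` are hypothetical helpers standing in for the logic in CommandObjectDisassemble::DoExecute and Disassembler::PrintInstructions.

```cpp
#include <cstdint>

// Illustrative option bits; the real values live in Disassembler.h.
enum Option : std::uint32_t {
  eOptionNone = 0u,
  eOptionShowBytes = 1u << 0,
  eOptionRichAnnotations = 1u << 1, // Hypothetical bit position.
};

// Command side: translate parsed flags into the options bitmask.
std::uint32_t BuildOptions(bool show_bytes, bool rich) {
  std::uint32_t options = eOptionNone;
  if (show_bytes)
    options |= eOptionShowBytes;
  if (rich)
    options |= eOptionRichAnnotations;
  return options;
}

// Printing side: test the bit instead of taking a separate boolean parameter.
bool RichAnnotationsEnabled(std::uint32_t options) {
  return (options & eOptionRichAnnotations) != 0;
}
```

Because the flag travels inside the existing `options` argument, none of the intermediate function signatures need to change when a new option is added.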
Address code review feedback suggesting the use of LLVM's DenseMap family
over std::unordered_map for consistency and potential performance benefits
within LLDB.

Replaced:
  std::unordered_map<lldb::user_id_t, VarState>
with:
  llvm::SmallDenseMap<lldb::user_id_t, VarState, 8>

The small buffer size of 8 is a heuristic for typical numbers of live
variables in scope, reducing allocations for common cases.
2 participants