Skip to content

Conversation

@eronnen
Copy link
Contributor

@eronnen eronnen commented May 18, 2025

Fix the handling of the instructionOffset parameter, which resulted in always returning the wrong disassembly because VSCode always uses instructionOffset = -50 and expects 50 instructions before the given address, instead of 50 bytes before

@eronnen eronnen force-pushed the lldb-dap-fix-disassembly-request-instruction-offset-handling branch from cf7600d to b9465b5 Compare May 19, 2025 23:49
@eronnen eronnen marked this pull request as ready for review May 19, 2025 23:51
@eronnen eronnen requested a review from JDevlieghere as a code owner May 19, 2025 23:51
@llvmbot
Copy link
Member

llvmbot commented May 19, 2025

@llvm/pr-subscribers-lldb

Author: Ely Ronnen (eronnen)

Changes

Patch is 21.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/140486.diff

9 Files Affected:

  • (modified) lldb/include/lldb/API/SBTarget.h (+4)
  • (modified) lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py (+7-6)
  • (modified) lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py (+7-3)
  • (modified) lldb/source/API/SBTarget.cpp (+20)
  • (modified) lldb/test/API/tools/lldb-dap/disassemble/TestDAP_disassemble.py (+36-9)
  • (modified) lldb/test/API/tools/lldb-dap/disassemble/main.c (+1-1)
  • (modified) lldb/test/API/tools/lldb-dap/instruction-breakpoint/TestDAP_instruction_breakpoint.py (+2-3)
  • (modified) lldb/tools/lldb-dap/Handler/DisassembleRequestHandler.cpp (+182-80)
  • (modified) lldb/tools/lldb-dap/Handler/RequestHandler.h (+9)
diff --git a/lldb/include/lldb/API/SBTarget.h b/lldb/include/lldb/API/SBTarget.h
index 17735fdca6559..30dc337f02a2e 100644
--- a/lldb/include/lldb/API/SBTarget.h
+++ b/lldb/include/lldb/API/SBTarget.h
@@ -349,6 +349,10 @@ class LLDB_API SBTarget {
 
   SBError SetLabel(const char *label);
 
+  uint32_t GetMinimumOpcodeByteSize() const;
+
+  uint32_t GetMaximumOpcodeByteSize() const;
+
   /// Architecture data byte width accessor
   ///
   /// \return
diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
index 70fd0b0c419db..9937618a2cf54 100644
--- a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
+++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
@@ -136,7 +136,6 @@ def __init__(
         self.initialized = False
         self.frame_scopes = {}
         self.init_commands = init_commands
-        self.disassembled_instructions = {}
 
     @classmethod
     def encode_content(cls, s: str) -> bytes:
@@ -735,11 +734,15 @@ def request_disconnect(self, terminateDebuggee=None):
         return self.send_recv(command_dict)
 
     def request_disassemble(
-        self, memoryReference, offset=-50, instructionCount=200, resolveSymbols=True
+        self,
+        memoryReference,
+        instructionOffset=-50,
+        instructionCount=200,
+        resolveSymbols=True,
     ):
         args_dict = {
             "memoryReference": memoryReference,
-            "offset": offset,
+            "instructionOffset": instructionOffset,
             "instructionCount": instructionCount,
             "resolveSymbols": resolveSymbols,
         }
@@ -748,9 +751,7 @@ def request_disassemble(
             "type": "request",
             "arguments": args_dict,
         }
-        instructions = self.send_recv(command_dict)["body"]["instructions"]
-        for inst in instructions:
-            self.disassembled_instructions[inst["address"]] = inst
+        return self.send_recv(command_dict)["body"]["instructions"]
 
     def request_readMemory(self, memoryReference, offset, count):
         args_dict = {
diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
index afdc746ed0d0d..4028ae4a2525f 100644
--- a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
+++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
@@ -346,10 +346,14 @@ def disassemble(self, threadId=None, frameIndex=None):
         memoryReference = stackFrames[0]["instructionPointerReference"]
         self.assertIsNotNone(memoryReference)
 
-        if memoryReference not in self.dap_server.disassembled_instructions:
-            self.dap_server.request_disassemble(memoryReference=memoryReference)
+        instructions = self.dap_server.request_disassemble(
+            memoryReference=memoryReference
+        )
+        disassembled_instructions = {}
+        for inst in instructions:
+            disassembled_instructions[inst["address"]] = inst
 
-        return self.dap_server.disassembled_instructions[memoryReference]
+        return disassembled_instructions, disassembled_instructions[memoryReference]
 
     def attach(
         self,
diff --git a/lldb/source/API/SBTarget.cpp b/lldb/source/API/SBTarget.cpp
index b42ada42b0931..1de8579500776 100644
--- a/lldb/source/API/SBTarget.cpp
+++ b/lldb/source/API/SBTarget.cpp
@@ -1668,6 +1668,26 @@ SBError SBTarget::SetLabel(const char *label) {
   return Status::FromError(target_sp->SetLabel(label));
 }
 
+uint32_t SBTarget::GetMinimumOpcodeByteSize() const {
+  LLDB_INSTRUMENT_VA(this);
+
+  TargetSP target_sp(GetSP());
+  if (target_sp) {
+    return target_sp->GetArchitecture().GetMinimumOpcodeByteSize();
+  }
+  return 0;
+}
+
+uint32_t SBTarget::GetMaximumOpcodeByteSize() const {
+  LLDB_INSTRUMENT_VA(this);
+
+  TargetSP target_sp(GetSP());
+  if (target_sp) {
+    return target_sp->GetArchitecture().GetMaximumOpcodeByteSize();
+  }
+  return 0;
+}
+
 uint32_t SBTarget::GetDataByteSize() {
   LLDB_INSTRUMENT_VA(this);
 
diff --git a/lldb/test/API/tools/lldb-dap/disassemble/TestDAP_disassemble.py b/lldb/test/API/tools/lldb-dap/disassemble/TestDAP_disassemble.py
index 9e8ef5b289f2e..a09d5f111bf23 100644
--- a/lldb/test/API/tools/lldb-dap/disassemble/TestDAP_disassemble.py
+++ b/lldb/test/API/tools/lldb-dap/disassemble/TestDAP_disassemble.py
@@ -20,21 +20,48 @@ def test_disassemble(self):
         program = self.getBuildArtifact("a.out")
         self.build_and_launch(program)
         source = "main.c"
-        self.source_path = os.path.join(os.getcwd(), source)
-        self.set_source_breakpoints(
-            source,
-            [
-                line_number(source, "// breakpoint 1"),
-            ],
-        )
+        self.set_source_breakpoints(source, [line_number(source, "// breakpoint 1")])
         self.continue_to_next_stop()
 
-        pc_assembly = self.disassemble(frameIndex=0)
+        _, pc_assembly = self.disassemble(frameIndex=0)
         self.assertIn("location", pc_assembly, "Source location missing.")
         self.assertIn("instruction", pc_assembly, "Assembly instruction missing.")
 
         # The calling frame (qsort) is coming from a system library, as a result
         # we should not have a source location.
-        qsort_assembly = self.disassemble(frameIndex=1)
+        _, qsort_assembly = self.disassemble(frameIndex=1)
         self.assertNotIn("location", qsort_assembly, "Source location not expected.")
         self.assertIn("instruction", pc_assembly, "Assembly instruction missing.")
+
+    def test_disassemble_backwards(self):
+        """
+        Tests the 'disassemble' request with a backwards disassembly range.
+        """
+        program = self.getBuildArtifact("a.out")
+        self.build_and_launch(program)
+        source = "main.c"
+        self.set_source_breakpoints(source, [line_number(source, "// breakpoint 1")])
+        self.continue_to_next_stop()
+
+        instruction_pointer_reference = self.get_stackFrames()[1][
+            "instructionPointerReference"
+        ]
+        backwards_instructions = 50
+        instructions = self.dap_server.request_disassemble(
+            memoryReference=instruction_pointer_reference,
+            instructionOffset=-backwards_instructions,
+        )
+
+        frame_instruction_index = next(
+            (
+                i
+                for i, instruction in enumerate(instructions)
+                if instruction["address"] == instruction_pointer_reference
+            ),
+            -1,
+        )
+        self.assertEqual(
+            frame_instruction_index,
+            backwards_instructions,
+            f"requested instruction should be preceeded by {backwards_instructions} instructions. Actual index: {frame_instruction_index}",
+        )
diff --git a/lldb/test/API/tools/lldb-dap/disassemble/main.c b/lldb/test/API/tools/lldb-dap/disassemble/main.c
index 6609a4c37a70f..9da119ef70262 100644
--- a/lldb/test/API/tools/lldb-dap/disassemble/main.c
+++ b/lldb/test/API/tools/lldb-dap/disassemble/main.c
@@ -27,4 +27,4 @@ int main(void) {
 
   printf("\n");
   return 0;
-}
\ No newline at end of file
+}
diff --git a/lldb/test/API/tools/lldb-dap/instruction-breakpoint/TestDAP_instruction_breakpoint.py b/lldb/test/API/tools/lldb-dap/instruction-breakpoint/TestDAP_instruction_breakpoint.py
index 53b7df9e54af2..07d2059b2749b 100644
--- a/lldb/test/API/tools/lldb-dap/instruction-breakpoint/TestDAP_instruction_breakpoint.py
+++ b/lldb/test/API/tools/lldb-dap/instruction-breakpoint/TestDAP_instruction_breakpoint.py
@@ -66,7 +66,7 @@ def instruction_breakpoint_test(self):
         )
 
         # Check disassembly view
-        instruction = self.disassemble(frameIndex=0)
+        disassembled_instructions, instruction = self.disassemble(frameIndex=0)
         self.assertEqual(
             instruction["address"],
             intstructionPointerReference[0],
@@ -74,8 +74,7 @@ def instruction_breakpoint_test(self):
         )
 
         # Get next instruction address to set instruction breakpoint
-        disassembled_instruction_list = self.dap_server.disassembled_instructions
-        instruction_addr_list = list(disassembled_instruction_list.keys())
+        instruction_addr_list = list(disassembled_instructions.keys())
         index = instruction_addr_list.index(intstructionPointerReference[0])
         if len(instruction_addr_list) >= (index + 1):
             next_inst_addr = instruction_addr_list[index + 1]
diff --git a/lldb/tools/lldb-dap/Handler/DisassembleRequestHandler.cpp b/lldb/tools/lldb-dap/Handler/DisassembleRequestHandler.cpp
index d779de1e8e834..2a0ae8d8e9e12 100644
--- a/lldb/tools/lldb-dap/Handler/DisassembleRequestHandler.cpp
+++ b/lldb/tools/lldb-dap/Handler/DisassembleRequestHandler.cpp
@@ -26,8 +26,6 @@ namespace lldb_dap {
 /// `supportsDisassembleRequest` is true.
 llvm::Expected<DisassembleResponseBody>
 DisassembleRequestHandler::Run(const DisassembleArguments &args) const {
-  std::vector<DisassembledInstruction> instructions;
-
   std::optional<lldb::addr_t> addr_opt =
       DecodeMemoryReference(args.memoryReference);
   if (!addr_opt.has_value())
@@ -35,7 +33,7 @@ DisassembleRequestHandler::Run(const DisassembleArguments &args) const {
                                       args.memoryReference);
 
   lldb::addr_t addr_ptr = *addr_opt;
-  addr_ptr += args.instructionOffset.value_or(0);
+  addr_ptr += args.offset.value_or(0);
   lldb::SBAddress addr(addr_ptr, dap.target);
   if (!addr.IsValid())
     return llvm::make_error<DAPError>(
@@ -56,100 +54,204 @@ DisassembleRequestHandler::Run(const DisassembleArguments &args) const {
     }
   }
 
+  int64_t instructionOffset = args.instructionOffset.value_or(0);
+  if (instructionOffset > 0) {
+    lldb::SBInstructionList forward_insts = dap.target.ReadInstructions(
+        addr, instructionOffset + 1, flavor_string.c_str());
+    if (forward_insts.GetSize() != static_cast<size_t>(instructionOffset + 1)) {
+      return llvm::make_error<DAPError>(
+          "Failed to disassemble instructions after " +
+          std::to_string(instructionOffset) +
+          " instructions from the given address.");
+    }
+
+    addr = forward_insts.GetInstructionAtIndex(instructionOffset).GetAddress();
+  }
+
+  const bool resolve_symbols = args.resolveSymbols.value_or(false);
+  std::vector<DisassembledInstruction> instructions;
+  if (instructionOffset < 0)
+    instructions = disassembleBackwards(addr, std::abs(instructionOffset),
+                                        flavor_string.c_str(), resolve_symbols);
+
+  const auto instructions_left = args.instructionCount - instructions.size();
   lldb::SBInstructionList insts = dap.target.ReadInstructions(
-      addr, args.instructionCount, flavor_string.c_str());
+      addr, instructions_left, flavor_string.c_str());
 
   if (!insts.IsValid())
     return llvm::make_error<DAPError>(
         "Failed to find instructions for memory address.");
 
-  const bool resolve_symbols = args.resolveSymbols.value_or(false);
+  // add the disassembly from the given address forward
   const auto num_insts = insts.GetSize();
-  for (size_t i = 0; i < num_insts; ++i) {
+  for (size_t i = 0;
+       i < num_insts && instructions.size() < args.instructionCount; ++i) {
     lldb::SBInstruction inst = insts.GetInstructionAtIndex(i);
-    auto addr = inst.GetAddress();
-    const auto inst_addr = addr.GetLoadAddress(dap.target);
-    const char *m = inst.GetMnemonic(dap.target);
-    const char *o = inst.GetOperands(dap.target);
-    const char *c = inst.GetComment(dap.target);
-    auto d = inst.GetData(dap.target);
-
-    std::string bytes;
-    llvm::raw_string_ostream sb(bytes);
-    for (unsigned i = 0; i < inst.GetByteSize(); i++) {
-      lldb::SBError error;
-      uint8_t b = d.GetUnsignedInt8(error, i);
-      if (error.Success()) {
-        sb << llvm::format("%2.2x ", b);
-      }
-    }
+    instructions.push_back(
+        SBInstructionToDisassembledInstruction(inst, resolve_symbols));
+  }
 
-    DisassembledInstruction disassembled_inst;
-    disassembled_inst.address = inst_addr;
-    disassembled_inst.instructionBytes =
-        bytes.size() > 0 ? bytes.substr(0, bytes.size() - 1) : "";
-
-    std::string instruction;
-    llvm::raw_string_ostream si(instruction);
-
-    lldb::SBSymbol symbol = addr.GetSymbol();
-    // Only add the symbol on the first line of the function.
-    if (symbol.IsValid() && symbol.GetStartAddress() == addr) {
-      // If we have a valid symbol, append it as a label prefix for the first
-      // instruction. This is so you can see the start of a function/callsite
-      // in the assembly, at the moment VS Code (1.80) does not visualize the
-      // symbol associated with the assembly instruction.
-      si << (symbol.GetMangledName() != nullptr ? symbol.GetMangledName()
-                                                : symbol.GetName())
-         << ": ";
-
-      if (resolve_symbols)
-        disassembled_inst.symbol = symbol.GetDisplayName();
-    }
+  // Pad the instructions with invalid instructions if needed.
+  if (instructions.size() < args.instructionCount)
+    for (size_t i = instructions.size(); i < args.instructionCount; ++i)
+      instructions.push_back(GetInvalidInstruction());
 
-    si << llvm::formatv("{0,7} {1,12}", m, o);
-    if (c && c[0]) {
-      si << " ; " << c;
-    }
+  return DisassembleResponseBody{std::move(instructions)};
+}
+
+std::vector<protocol::DisassembledInstruction>
+DisassembleRequestHandler::disassembleBackwards(
+    lldb::SBAddress &addr, const uint32_t instruction_count,
+    const char *flavor_string, bool resolve_symbols) const {
+  std::vector<DisassembledInstruction> instructions;
 
-    disassembled_inst.instruction = instruction;
-
-    auto line_entry = addr.GetLineEntry();
-    // If the line number is 0 then the entry represents a compiler generated
-    // location.
-    if (line_entry.GetStartAddress() == addr && line_entry.IsValid() &&
-        line_entry.GetFileSpec().IsValid() && line_entry.GetLine() != 0) {
-      auto source = CreateSource(line_entry);
-      disassembled_inst.location = std::move(source);
-
-      const auto line = line_entry.GetLine();
-      if (line != 0 && line != LLDB_INVALID_LINE_NUMBER)
-        disassembled_inst.line = line;
-
-      const auto column = line_entry.GetColumn();
-      if (column != 0 && column != LLDB_INVALID_COLUMN_NUMBER)
-        disassembled_inst.column = column;
-
-      auto end_line_entry = line_entry.GetEndAddress().GetLineEntry();
-      if (end_line_entry.IsValid() &&
-          end_line_entry.GetFileSpec() == line_entry.GetFileSpec()) {
-        const auto end_line = end_line_entry.GetLine();
-        if (end_line != 0 && end_line != LLDB_INVALID_LINE_NUMBER &&
-            end_line != line) {
-          disassembled_inst.endLine = end_line;
-
-          const auto end_column = end_line_entry.GetColumn();
-          if (end_column != 0 && end_column != LLDB_INVALID_COLUMN_NUMBER &&
-              end_column != column)
-            disassembled_inst.endColumn = end_column - 1;
+  if (dap.target.GetMinimumOpcodeByteSize() ==
+      dap.target.GetMaximumOpcodeByteSize()) {
+    // If the target has a fixed opcode size, we can disassemble backwards
+    // directly.
+    lldb::addr_t disassemble_start_load_addr =
+        addr.GetLoadAddress(dap.target) -
+        (instruction_count * dap.target.GetMinimumOpcodeByteSize());
+    lldb::SBAddress disassemble_start_addr(disassemble_start_load_addr,
+                                           dap.target);
+    lldb::SBInstructionList backwards_insts =
+        dap.target.ReadInstructions(addr, instruction_count, flavor_string);
+    if (backwards_insts.IsValid()) {
+      for (size_t i = 0; i < backwards_insts.GetSize(); ++i) {
+        lldb::SBInstruction inst = backwards_insts.GetInstructionAtIndex(i);
+        instructions.push_back(
+            SBInstructionToDisassembledInstruction(inst, resolve_symbols));
+      }
+      return instructions;
+    }
+  } else {
+    // There is no opcode fixed size so we have no idea where are the valid
+    // instructions before the current address. let's try from the start of the
+    // symbol if available.
+    auto symbol = addr.GetSymbol();
+    if (symbol.IsValid()) {
+      // add valid instructions before the current instruction using the symbol.
+      lldb::SBInstructionList symbol_insts = dap.target.ReadInstructions(
+          symbol.GetStartAddress(), addr, flavor_string);
+      if (symbol_insts.IsValid()) {
+        size_t backwards_insts_start =
+            symbol_insts.GetSize() >= instruction_count
+                ? symbol_insts.GetSize() - instruction_count
+                : 0;
+        for (size_t i = backwards_insts_start;
+             i < symbol_insts.GetSize() &&
+             instructions.size() < instruction_count;
+             ++i) {
+          lldb::SBInstruction inst = symbol_insts.GetInstructionAtIndex(i);
+          instructions.push_back(
+              SBInstructionToDisassembledInstruction(inst, resolve_symbols));
         }
       }
     }
+  }
 
-    instructions.push_back(std::move(disassembled_inst));
+  // pad the instructions with invalid instructions if needed.
+  while (instructions.size() < instruction_count) {
+    instructions.insert(instructions.begin(), GetInvalidInstruction());
   }
 
-  return DisassembleResponseBody{std::move(instructions)};
+  return instructions;
+}
+
+DisassembledInstruction
+DisassembleRequestHandler::SBInstructionToDisassembledInstruction(
+    lldb::SBInstruction &inst, bool resolve_symbols) const {
+  if (!inst.IsValid())
+    return GetInvalidInstruction();
+
+  auto addr = inst.GetAddress();
+  const auto inst_addr = addr.GetLoadAddress(dap.target);
+  const char *m = inst.GetMnemonic(dap.target);
+  const char *o = inst.GetOperands(dap.target);
+  const char *c = inst.GetComment(dap.target);
+  auto d = inst.GetData(dap.target);
+
+  std::string bytes;
+  llvm::raw_string_ostream sb(bytes);
+  for (unsigned i = 0; i < inst.GetByteSize(); i++) {
+    lldb::SBError error;
+    uint8_t b = d.GetUnsignedInt8(error, i);
+    if (error.Success())
+      sb << llvm::format("%2.2x ", b);
+  }
+
+  DisassembledInstruction disassembled_inst;
+  disassembled_inst.address = inst_addr;
+  disassembled_inst.instructionBytes =
+      bytes.size() > 0 ? bytes.substr(0, bytes.size() - 1) : "";
+
+  std::string instruction;
+  llvm::raw_string_ostream si(instruction);
+
+  lldb::SBSymbol symbol = addr.GetSymbol();
+  // Only add the symbol on the first line of the function.
+  if (symbol.IsValid() && symbol.GetStartAddress() == addr) {
+    // If we have a valid symbol, append it as a label prefix for the first
+    // instruction. This is so you can see the start of a function/callsite
+    // in the assembly, at the moment VS Code (1.80) does not visualize the
+    // symbol associated with the assembly instruction.
+    si << (symbol.GetMangledName() != nullptr ? symbol.GetMangledName()
+                                              : symbol.GetName())
+       << ": ";
+
+    if (resolve_symbols)
+      disassembled_inst.symbol = symbol.GetDisplayName();
+  }
+
+  si << llvm::formatv("{0,7} {1,12}", m, o);
+  if (c && c[0]) {
+    si << " ; " << c;
+  }
+
+  disassembled_inst.instruction = std::move(instruction);
+
+  auto line_entry = addr.GetLineEntry();
+  // If the line number is 0 then the entry represents a compiler generated
+  // location.
+
+  if (line_entry.GetStartAddress() == addr && line_entry.IsValid() &&
+      line_entry.GetFileSpec().IsValid() && line_entry.GetLine() != 0) {
+    auto source = CreateSource(line_entry);
+    disassembled_inst.location = std::move(source);
+
+    const auto line = line_entry.GetLine();
+    if (line != 0 && line != LLDB_INVALID_LINE_NUMBER)
+      disassembled_inst.line = line;
+
+    const auto column = line_entry.GetColumn();
+    if (column != 0 && column != LLDB_INVALID_COLUMN_NUMBER)
+      disassembled_inst.column = column;
+
+    auto end_line_entry = line_entry.GetEndAddress().GetLineEntry();
+    if (end_line_entry.IsValid() &&
+        end_line_entry.GetFileSpec() == line_entry.GetFileSpec()) {
+      const auto end_line = end_line_entry.GetLine();
+      if (...
[truncated]

@eronnen eronnen requested review from ashgti and da-viper May 19, 2025 23:56
Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general comment, the code is becoming quite complicated and I think it would help if we can break things up more. Looking at the protocol arguments we have and offset in bytes, an offset in instructions, and then an instruction count. If possible please try to mimic that logic in the structure of this function:

  1. Take the address
  2. Apply the byte offset
  3. Compute the instruction offset and apply it (the logic to do that is complex and therefore this should be factored out into a helper function)
  4. Get the number of instructions at the computed offset and create a list of SBInstructions to disassemble. You can use a default constructed SBInstruction for padding.
  5. Disassemble the list of SBInstructions

Looking at the code, I think that's doable, albeit maybe slightly less efficiently?

Comment on lines 1675 to 1677
if (target_sp) {
return target_sp->GetArchitecture().GetMinimumOpcodeByteSize();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (target_sp) {
return target_sp->GetArchitecture().GetMinimumOpcodeByteSize();
}
if (target_sp)
return target_sp->GetArchitecture().GetMinimumOpcodeByteSize();

Comment on lines 1685 to 1687
if (target_sp) {
return target_sp->GetArchitecture().GetMaximumOpcodeByteSize();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (target_sp) {
return target_sp->GetArchitecture().GetMaximumOpcodeByteSize();
}
if (target_sp)
return target_sp->GetArchitecture().GetMaximumOpcodeByteSize();

"Failed to find instructions for memory address.");

const bool resolve_symbols = args.resolveSymbols.value_or(false);
// add the disassembly from the given address forward
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// add the disassembly from the given address forward
// Add the disassembly starting at the given address.

}
}

int64_t instructionOffset = args.instructionOffset.value_or(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment saying that this is the number of instructions.

Suggested change
int64_t instructionOffset = args.instructionOffset.value_or(0);
// Offset (in instructions) to be applied after the byte offset (if any) before disassembling. Can be negative.
int64_t instruction_offset = args.instructionOffset.value_or(0);

Comment on lines 62 to 65
return llvm::make_error<DAPError>(
"Failed to disassemble instructions after " +
std::to_string(instructionOffset) +
" instructions from the given address.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure I understand: we're erroring out here because if there are already not enough instructions to cover the instruction offset, there's no way we can assemble anymore (i.e. the ones covered by instructionCount)? Might be worth a comment. My initial reaction was: why not do a best case effort.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I can always pad with invalid instructions if there are not enough instructions

Comment on lines 73 to 75
if (instructionOffset < 0)
instructions = disassembleBackwards(addr, std::abs(instructionOffset),
flavor_string.c_str(), resolve_symbols);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more readable if you handled the positive instruction offset case in the else case here instead of breaking them up between line 58 and here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

instructions = disassembleBackwards(addr, std::abs(instructionOffset),
flavor_string.c_str(), resolve_symbols);

const auto instructions_left = args.instructionCount - instructions.size();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use remaining instead of left, the latter could give the impression you mean left (before) as opposed to right (after).

Comment on lines 153 to 155
// pad the instructions with invalid instructions if needed.
while (instructions.size() < instruction_count) {
instructions.insert(instructions.begin(), GetInvalidInstruction());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we're only padding when we don't have a fixed opcode size because of the early return on line 125 (unless the instruction isn't valid).

Suggested change
// pad the instructions with invalid instructions if needed.
while (instructions.size() < instruction_count) {
instructions.insert(instructions.begin(), GetInvalidInstruction());
// Pad the instructions with invalid instructions if needed.
while (instructions.size() < instruction_count)
instructions.insert(instructions.begin(), GetInvalidInstruction());

// symbol if available.
auto symbol = addr.GetSymbol();
if (symbol.IsValid()) {
// add valid instructions before the current instruction using the symbol.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// add valid instructions before the current instruction using the symbol.
// Add valid instructions before the current instruction using the symbol.

@eronnen
Copy link
Contributor Author

eronnen commented May 20, 2025

@JDevlieghere I Tried to simplify the logics as much as I can in the way you asked, handling the case of unfixed opcode size and disassembling backwards is still a bit complex because there is the edge case where padding is needed, but I hope it's readable enough now.

@eronnen eronnen requested a review from JDevlieghere May 20, 2025 21:25
Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. This is a big improvement compared to the previous revision and makes it much easier to follow. Great job!

Copy link
Contributor

@ashgti ashgti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Comment on lines 352 to 354
uint32_t GetMinimumOpcodeByteSize() const;

uint32_t GetMaximumOpcodeByteSize() const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment?

Architecture opcode byte size width accessor...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@eronnen eronnen merged commit 9d0614e into llvm:main May 21, 2025
6 of 10 checks passed
@medismailben
Copy link
Member

Hey @eronnen, I think this change broke the macOS lldb incremental bot: https://green.lab.llvm.org/job/llvm.org/view/LLDB/job/as-lldb-cmake/26259/execution/node/106/log/?consoleFull

Let me know if you need help to investigate it or if we should revert it if you don't have time to take a look.

@eronnen
Copy link
Contributor Author

eronnen commented May 22, 2025

@medismailben Unfortunately I dont own a mac so I can't really test but I think this patch should fix it: #140975

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants