Member

@lenary lenary commented Nov 5, 2025

This adds RISC-V support for --symbolize-operands to llvm-objdump, so that local branch targets are printed as labels, which makes the disassembly of a linked object easier to follow.

When using --symbolize-operands, the numeric branch target address is elided and only the referenced label is printed:

# Without --symbolize-operands
       0: 04a05263      blez    a0, 0x44 <.text+0x44>
...
      40: fd1ff06f      j       0x10 <.text+0x10>
      44: 00000613      li      a2, 0x0

# With --symbolize-operands
       0: 04a05263      blez    a0,  <L3>
...
      40: fd1ff06f      j        <L0>
<L3>:
      44: 00000613      li      a2, 0x0
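
The target addresses in the "without" dump can be checked by hand from the raw encodings. The following is a minimal sketch, not code from this PR: it decodes the B-type and J-type immediates as laid out in the RISC-V base ISA and adds them to the instruction address. All function names here are hypothetical and do not correspond to anything in LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low `Bits` bits of `Value` to a 64-bit signed integer.
constexpr int64_t signExtend(uint64_t Value, unsigned Bits) {
  const uint64_t SignBit = uint64_t{1} << (Bits - 1);
  const uint64_t Mask = (uint64_t{1} << Bits) - 1;
  Value &= Mask;
  return static_cast<int64_t>(Value ^ SignBit) - static_cast<int64_t>(SignBit);
}

// B-type (conditional branch): imm[12|10:5] sits in bits 31..25 and
// imm[4:1|11] in bits 11..7. Bit 0 of the offset is always zero.
constexpr int64_t decodeBTypeOffset(uint32_t Insn) {
  const uint64_t Imm = (((Insn >> 31) & 0x1) << 12) |
                       (((Insn >> 7) & 0x1) << 11) |
                       (((Insn >> 25) & 0x3f) << 5) |
                       (((Insn >> 8) & 0xf) << 1);
  return signExtend(Imm, 13);
}

// J-type (jal, which `j` expands to): imm[20|10:1|11|19:12] in bits 31..12.
constexpr int64_t decodeJTypeOffset(uint32_t Insn) {
  const uint64_t Imm = (((Insn >> 31) & 0x1) << 20) |
                       (((Insn >> 12) & 0xff) << 12) |
                       (((Insn >> 20) & 0x1) << 11) |
                       (((Insn >> 21) & 0x3ff) << 1);
  return signExtend(Imm, 21);
}

// Branch and jump offsets are relative to the address of the instruction.
constexpr uint64_t branchTarget(uint64_t Address, int64_t Offset) {
  return Address + static_cast<uint64_t>(Offset);
}

// The two encodings from the dump above.
inline void checkDumpExamples() {
  assert(branchTarget(0x0, decodeBTypeOffset(0x04a05263)) == 0x44);  // blez a0
  assert(branchTarget(0x40, decodeJTypeOffset(0xfd1ff06f)) == 0x10); // j
}
```

These are the addresses that the symbolized dump shows as <L3> and <L0>.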

Member

llvmbot commented Nov 5, 2025

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-binary-utilities

Author: Sam Elliott (lenary)

Full diff: https://github.com/llvm/llvm-project/pull/166656.diff

5 Files Affected:

  • (modified) llvm/docs/CommandGuide/llvm-objdump.rst (+1-1)
  • (modified) llvm/docs/ReleaseNotes.md (+1)
  • (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVInstPrinter.cpp (+5)
  • (added) llvm/test/MC/RISCV/symbolize-operands.s (+43)
  • (modified) llvm/tools/llvm-objdump/llvm-objdump.cpp (+2-1)
diff --git a/llvm/docs/CommandGuide/llvm-objdump.rst b/llvm/docs/CommandGuide/llvm-objdump.rst
index aaf38f84b92e5..44649c670dd42 100644
--- a/llvm/docs/CommandGuide/llvm-objdump.rst
+++ b/llvm/docs/CommandGuide/llvm-objdump.rst
@@ -284,7 +284,7 @@ OPTIONS
   any analysis with a special representation (i.e. BlockFrequency,
   BranchProbability, etc) are printed as raw hex values.
 
-  Only supported for AArch64, BPF, PowerPC, and X86.
+  Only supported for AArch64, BPF, PowerPC, RISC-V, and X86.
 
   Example:
     A non-symbolized branch instruction with a local target and pc-relative memory access like
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index bfe68274eae3f..61ff14f7f6255 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -135,6 +135,7 @@ Changes to the RISC-V Backend
 * Adds experimental support for the 'Zibi` (Branch with Immediate) extension.
 * Add support for Zvfofp8min (OFP8 conversion extension)
 * Adds assembler support for the Andes `XAndesvsinth` (Andes Vector Small Int Handling Extension).
+* `llvm-objdump` now has support for `--symbolize-operands` with RISC-V.
 
 Changes to the WebAssembly Backend
 ----------------------------------
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVInstPrinter.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVInstPrinter.cpp
index 7b9c4b3e800cd..02bdbb8c5155c 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVInstPrinter.cpp
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVInstPrinter.cpp
@@ -16,6 +16,7 @@
 #include "llvm/MC/MCExpr.h"
 #include "llvm/MC/MCInst.h"
 #include "llvm/MC/MCInstPrinter.h"
+#include "llvm/MC/MCInstrAnalysis.h"
 #include "llvm/MC/MCSubtargetInfo.h"
 #include "llvm/MC/MCSymbol.h"
 #include "llvm/Support/CommandLine.h"
@@ -108,6 +109,10 @@ void RISCVInstPrinter::printBranchOperand(const MCInst *MI, uint64_t Address,
                                           unsigned OpNo,
                                           const MCSubtargetInfo &STI,
                                           raw_ostream &O) {
+  // Do not print the numeric target address when symbolizing.
+  if (SymbolizeOperands)
+    return;
+
   const MCOperand &MO = MI->getOperand(OpNo);
   if (!MO.isImm())
     return printOperand(MI, OpNo, STI, O);
diff --git a/llvm/test/MC/RISCV/symbolize-operands.s b/llvm/test/MC/RISCV/symbolize-operands.s
new file mode 100644
index 0000000000000..cad1f3d265342
--- /dev/null
+++ b/llvm/test/MC/RISCV/symbolize-operands.s
@@ -0,0 +1,43 @@
+# RUN: llvm-mc -triple=riscv32 < %s -mattr=-relax -filetype=obj -o - \
+# RUN: | llvm-objdump -d --no-leading-addr --no-show-raw-insn --symbolize-operands - \
+# RUN: | FileCheck %s
+
+# CHECK-LABEL: <.text>:
+  .text
+  .p2align  2
+# CHECK: blez a0, <L3>
+  blez a0, .LBB0_6
+  li a3, 0
+  li a2, 0
+# CHECK: j <L1>
+  j .LBB0_3
+# CHECK-NEXT: <L0>:
+.LBB0_2:
+  addi a3, a3, 1
+# CHECK: beq a3, a0, <L4>
+  beq a3, a0, .LBB0_7
+# CHECK-NEXT: <L1>:
+.LBB0_3:
+  slli a4, a3, 2
+  add a4, a1, a4
+  lw a5, 0(a4)
+  lbu a4, 0(a5)
+# CHECK: beqz a4, <L0>
+  beqz a4, .LBB0_2
+  addi a5, a5, 1
+# CHECK: <L2>
+.LBB0_5:
+  add a2, a2, a4
+  lbu a4, 0(a5)
+  addi a5, a5, 1
+# CHECK: bnez a4, <L2>
+  bnez a4, .LBB0_5
+# CHECK-NEXT: j <L0>
+  j .LBB0_2
+# CHECK-NEXT: <L3>:
+.LBB0_6:
+  li a2, 0
+# CHECK: <L4>:
+.LBB0_7:
+  mv a0, a2
+  ret
diff --git a/llvm/tools/llvm-objdump/llvm-objdump.cpp b/llvm/tools/llvm-objdump/llvm-objdump.cpp
index 3ec644a472bfc..0153badb6603e 100644
--- a/llvm/tools/llvm-objdump/llvm-objdump.cpp
+++ b/llvm/tools/llvm-objdump/llvm-objdump.cpp
@@ -1571,7 +1571,8 @@ collectLocalBranchTargets(ArrayRef<uint8_t> Bytes, MCInstrAnalysis *MIA,
   const bool isX86 = STI->getTargetTriple().isX86();
   const bool isAArch64 = STI->getTargetTriple().isAArch64();
   const bool isBPF = STI->getTargetTriple().isBPF();
-  if (!isPPC && !isX86 && !isAArch64 && !isBPF)
+  const bool isRISCV = STI->getTargetTriple().isRISCV();
+  if (!isPPC && !isX86 && !isAArch64 && !isBPF && !isRISCV)
     return;
 
   if (MIA)
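
The label names in the test expectations (L0 through L4) are consistent with local branch targets being numbered in ascending address order. The following is a rough sketch of that numbering under stated assumptions: it is not the body of collectLocalBranchTargets, the helper name is hypothetical, and the addresses in the usage note assume every instruction in the test is 4 bytes long.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: give each distinct branch target a label L0, L1, ...
// in ascending address order. Duplicate targets share one label.
std::map<uint64_t, std::string>
numberLabels(const std::vector<uint64_t> &Targets) {
  // std::set both removes duplicates and iterates in sorted order.
  const std::set<uint64_t> Unique(Targets.begin(), Targets.end());
  std::map<uint64_t, std::string> Labels;
  unsigned Next = 0;
  for (uint64_t Addr : Unique)
    Labels[Addr] = "L" + std::to_string(Next++);
  return Labels;
}
```

Under the 4-byte assumption, the six branches in symbolize-operands.s target 0x44, 0x18, 0x48, 0x10, 0x30 and 0x10 in program order. Passing those to numberLabels maps 0x10 to L0, 0x18 to L1, 0x30 to L2, 0x44 to L3 and 0x48 to L4, which matches the CHECK lines.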

Collaborator

@jh7370 jh7370 left a comment

No objections here, but this needs input from one or more people with RISC-V knowledge.

You probably also want to add llvm-objdump to the PR title somewhere for the benefit of casual readers of the commit list.

Member

@arichardson arichardson left a comment

This is great, just a few suggestions for more test coverage.

# CHECK-LABEL: <.text>:
.text
.p2align 2
# CHECK: blez a0, <L3>
Member

Does this currently only support branches or also instructions like jalr/ld?

Would be good to add a test showing how those behave as well. And maybe one test case where symbolization fails, something like blez a0, 0x123?

Member

Based on the diff below it looks like this only affects branches, but I think it might still be good to have something like

foo:
   j bar
bar:
   ...

in the test?

Member Author

We don't yet have support for addi/loads/stores, as far as I understand. That is in #144620.

I will look at the cases where symbolization might fail, but I'm not sure they're covered well in other architectures either. LLVM would interpret blez a0, 0x122 (0x123 is not encodable) as a branch to pc+0x122, which it might still create a symbol for even if it's not dumped for that function, so I don't know what the answer is there.

I will try some cases, but I think the "don't print the number" signal doesn't depend on whether something symbolized correctly, which might be good or bad.

I also don't mind if we print both the number and the symbol - it does make it harder to diff the output, but post-processing is normally needed before a dump can be diffed well anyway.
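
As background on why 0x123 is not encodable while 0x122 is: a conditional branch offset is a 13-bit signed immediate whose lowest bit is implicitly zero, so it has to be even and lie between -4096 and 4094. A minimal sketch of that check follows; the function name is hypothetical and is not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check: can `Offset` be stored in a B-type branch immediate?
// The immediate is 13 bits, signed, with bit 0 fixed to zero.
constexpr bool isEncodableBranchOffset(int64_t Offset) {
  return Offset % 2 == 0 && Offset >= -4096 && Offset <= 4094;
}

// 0x123 is odd, so it cannot be encoded; 0x122 can.
static_assert(!isEncodableBranchOffset(0x123), "odd offsets are rejected");
static_assert(isEncodableBranchOffset(0x122), "even in-range offsets are accepted");
```

This only covers the encoding constraint. Whether pc+0x122 ends up with a label in the dump is the open question raised above.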

Comment on lines +12 to +13
# CHECK: j <L1>
j .LBB0_3
Member Author

@arichardson this includes a j for jumps?

Member

Oh, I missed that one, so it looks like we do handle jal (at least for local symbols?). Could you add another one calling a non-local symbol?
