Skip to content

Conversation

@maksfb
Copy link
Contributor

@maksfb maksfb commented Feb 20, 2025

Add AArch64MCSymbolizer that symbolizes MCInst operands during disassembly. The symbolization was previously done in BinaryFunction::disassemble(), but it is also required by scanExternalRefs() for "lite" mode functionality. Hence, similar to x86, I've implemented the symbolizer interface that uses BinaryFunction relocations to properly create instruction operands. I expect the result of the disassembly to be identical after the change.

AArch64 disassembler was not calling tryAddingSymbolicOperand() for MOV instructions. Fix that. Additionally, the disassembler marks ldr instructions as branches by setting IsBranch parameter to true. Ignore the parameter and rely on MCPlusBuilder interface instead.

I've modified --check-encoding flag to check symolization of operands of instructions that have relocations against them.

Add AArch64MCSymbolizer that symbolizes MCInst operands during
disassembly. The symbolization was previously done in
`BinaryFunction::disassemble()`, but it is also required by
`scanExternalRefs()` for "lite" mode functionality. Hence, similar to
x86, I've implemented the symbolizer interface that uses BinaryFunction
relocations to properly create instruction operands. I expect the result
of the disassembly to be identical after the change.

So far, the only quirk of AArch64 disassembler that I found is that it
marks `ldr` instructions as branch by setting `IsBranch` parameter to
true. Ignore the parameter and rely on `MCPlusBuilder` interface
instead.
@llvmbot
Copy link
Member

llvmbot commented Feb 20, 2025

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-bolt

Author: Maksim Panchenko (maksfb)

Changes

Add AArch64MCSymbolizer that symbolizes MCInst operands during disassembly. The symbolization was previously done in BinaryFunction::disassemble(), but it is also required by scanExternalRefs() for "lite" mode functionality. Hence, similar to x86, I've implemented the symbolizer interface that uses BinaryFunction relocations to properly create instruction operands. I expect the result of the disassembly to be identical after the change.

So far, the only quirk of AArch64 disassembler that I found is that it marks ldr instructions as branch by setting IsBranch parameter to true. Ignore the parameter and rely on MCPlusBuilder interface instead.


Full diff: https://github.com/llvm/llvm-project/pull/127969.diff

5 Files Affected:

  • (modified) bolt/lib/Core/BinaryFunction.cpp (+1-20)
  • (modified) bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp (+7)
  • (added) bolt/lib/Target/AArch64/AArch64MCSymbolizer.cpp (+86)
  • (added) bolt/lib/Target/AArch64/AArch64MCSymbolizer.h (+44)
  • (modified) bolt/lib/Target/AArch64/CMakeLists.txt (+1)
diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp
index bc45caf3ec8b7..f64f2a3591061 100644
--- a/bolt/lib/Core/BinaryFunction.cpp
+++ b/bolt/lib/Core/BinaryFunction.cpp
@@ -1434,9 +1434,8 @@ Error BinaryFunction::disassemble() {
         if (BC.isAArch64())
           handleAArch64IndirectCall(Instruction, Offset);
       }
-    } else if (BC.isAArch64() || BC.isRISCV()) {
+    } else if (BC.isRISCV()) {
       // Check if there's a relocation associated with this instruction.
-      bool UsedReloc = false;
       for (auto Itr = Relocations.lower_bound(Offset),
                 ItrE = Relocations.lower_bound(Offset + Size);
            Itr != ItrE; ++Itr) {
@@ -1461,24 +1460,6 @@ Error BinaryFunction::disassemble() {
             Relocation.Type);
         (void)Result;
         assert(Result && "cannot replace immediate with relocation");
-
-        // For aarch64, if we replaced an immediate with a symbol from a
-        // relocation, we mark it so we do not try to further process a
-        // pc-relative operand. All we need is the symbol.
-        UsedReloc = true;
-      }
-
-      if (!BC.isRISCV() && MIB->hasPCRelOperand(Instruction) && !UsedReloc) {
-        if (auto NewE = handleErrors(
-                handlePCRelOperand(Instruction, AbsoluteInstrAddr, Size),
-                [&](const BOLTError &E) -> Error {
-                  if (E.isFatal())
-                    return Error(std::make_unique<BOLTError>(std::move(E)));
-                  if (!E.getMessage().empty())
-                    E.log(BC.errs());
-                  return Error::success();
-                }))
-          return Error(std::move(NewE));
       }
     }
 
diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
index 4b21ff719b3ab..60efa25c54ffa 100644
--- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
@@ -10,6 +10,7 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include "AArch64MCSymbolizer.h"
 #include "MCTargetDesc/AArch64AddressingModes.h"
 #include "MCTargetDesc/AArch64FixupKinds.h"
 #include "MCTargetDesc/AArch64MCExpr.h"
@@ -133,6 +134,12 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
 public:
   using MCPlusBuilder::MCPlusBuilder;
 
+  std::unique_ptr<MCSymbolizer>
+  createTargetSymbolizer(BinaryFunction &Function,
+                         bool CreateNewSymbols) const override {
+    return std::make_unique<AArch64MCSymbolizer>(Function, CreateNewSymbols);
+  }
+
   MCPhysReg getStackPointer() const override { return AArch64::SP; }
   MCPhysReg getFramePointer() const override { return AArch64::FP; }
 
diff --git a/bolt/lib/Target/AArch64/AArch64MCSymbolizer.cpp b/bolt/lib/Target/AArch64/AArch64MCSymbolizer.cpp
new file mode 100644
index 0000000000000..de92a5a493e27
--- /dev/null
+++ b/bolt/lib/Target/AArch64/AArch64MCSymbolizer.cpp
@@ -0,0 +1,86 @@
+//===- bolt/Target/AArch64/AArch64MCSymbolizer.cpp ------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "AArch64MCSymbolizer.h"
+#include "bolt/Core/BinaryContext.h"
+#include "bolt/Core/BinaryFunction.h"
+#include "bolt/Core/MCPlusBuilder.h"
+#include "bolt/Core/Relocation.h"
+#include "llvm/MC/MCInst.h"
+#include "llvm/MC/MCRegisterInfo.h"
+#include "llvm/Support/Debug.h"
+
+#define DEBUG_TYPE "bolt-symbolizer"
+
+namespace llvm {
+namespace bolt {
+
+AArch64MCSymbolizer::~AArch64MCSymbolizer() {}
+
+bool AArch64MCSymbolizer::tryAddingSymbolicOperand(
+    MCInst &Inst, raw_ostream &CStream, int64_t Value, uint64_t InstAddress,
+    bool IsBranch, uint64_t ImmOffset, uint64_t ImmSize, uint64_t InstSize) {
+  BinaryContext &BC = Function.getBinaryContext();
+  MCContext *Ctx = BC.Ctx.get();
+
+  // NOTE: the callee may incorrectly set IsBranch.
+  if (BC.MIB->isBranch(Inst) || BC.MIB->isCall(Inst))
+    return false;
+
+  // TODO: add handling for linker "relaxation". At the moment, relocations
+  // corresponding to "relaxed" instructions are excluded from BinaryFunction
+  // relocation list.
+
+  const uint64_t InstOffset = InstAddress - Function.getAddress();
+  const Relocation *Relocation = Function.getRelocationAt(InstOffset);
+
+  /// Add symbolic operand to the instruction with an optional addend.
+  auto addOperand = [&](const MCSymbol *Symbol, uint64_t Addend,
+                        uint64_t RelType) {
+    const MCExpr *Expr = MCSymbolRefExpr::create(Symbol, *Ctx);
+    if (Addend)
+      Expr = MCBinaryExpr::createAdd(Expr, MCConstantExpr::create(Addend, *Ctx),
+                                     *Ctx);
+    Inst.addOperand(MCOperand::createExpr(
+        BC.MIB->getTargetExprFor(Inst, Expr, *Ctx, RelType)));
+  };
+
+  if (Relocation) {
+    addOperand(Relocation->Symbol, Relocation->Addend, Relocation->Type);
+    return true;
+  }
+
+  if (!BC.MIB->hasPCRelOperand(Inst))
+    return false;
+
+  Value += InstAddress;
+  const MCSymbol *TargetSymbol;
+  uint64_t TargetOffset;
+  if (!CreateNewSymbols) {
+    if (BinaryData *BD = BC.getBinaryDataContainingAddress(Value)) {
+      TargetSymbol = BD->getSymbol();
+      TargetOffset = Value - BD->getAddress();
+    } else {
+      return false;
+    }
+  } else {
+    std::tie(TargetSymbol, TargetOffset) =
+        BC.handleAddressRef(Value, Function, /*IsPCRel*/ true);
+  }
+
+  addOperand(TargetSymbol, TargetOffset, 0);
+
+  return true;
+}
+
+void AArch64MCSymbolizer::tryAddingPcLoadReferenceComment(raw_ostream &CStream,
+                                                          int64_t Value,
+                                                          uint64_t Address) {}
+
+} // namespace bolt
+} // namespace llvm
diff --git a/bolt/lib/Target/AArch64/AArch64MCSymbolizer.h b/bolt/lib/Target/AArch64/AArch64MCSymbolizer.h
new file mode 100644
index 0000000000000..56ba4fbcaf275
--- /dev/null
+++ b/bolt/lib/Target/AArch64/AArch64MCSymbolizer.h
@@ -0,0 +1,44 @@
+//===- bolt/Target/AArch64/AArch64MCSymbolizer.cpp --------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef BOLT_TARGET_AARCH64_AARCH64MCSYMBOLIZER_H
+#define BOLT_TARGET_AARCH64_AARCH64MCSYMBOLIZER_H
+
+#include "bolt/Core/BinaryFunction.h"
+#include "llvm/MC/MCDisassembler/MCSymbolizer.h"
+
+namespace llvm {
+namespace bolt {
+
+class AArch64MCSymbolizer : public MCSymbolizer {
+protected:
+  BinaryFunction &Function;
+  bool CreateNewSymbols{true};
+
+public:
+  AArch64MCSymbolizer(BinaryFunction &Function, bool CreateNewSymbols = true)
+      : MCSymbolizer(*Function.getBinaryContext().Ctx.get(), nullptr),
+        Function(Function), CreateNewSymbols(CreateNewSymbols) {}
+
+  AArch64MCSymbolizer(const AArch64MCSymbolizer &) = delete;
+  AArch64MCSymbolizer &operator=(const AArch64MCSymbolizer &) = delete;
+  virtual ~AArch64MCSymbolizer();
+
+  bool tryAddingSymbolicOperand(MCInst &Inst, raw_ostream &CStream,
+                                int64_t Value, uint64_t Address, bool IsBranch,
+                                uint64_t Offset, uint64_t OpSize,
+                                uint64_t InstSize) override;
+
+  void tryAddingPcLoadReferenceComment(raw_ostream &CStream, int64_t Value,
+                                       uint64_t Address) override;
+};
+
+} // namespace bolt
+} // namespace llvm
+
+#endif
diff --git a/bolt/lib/Target/AArch64/CMakeLists.txt b/bolt/lib/Target/AArch64/CMakeLists.txt
index 8435ea7245e7e..6cf50ae2df9eb 100644
--- a/bolt/lib/Target/AArch64/CMakeLists.txt
+++ b/bolt/lib/Target/AArch64/CMakeLists.txt
@@ -18,6 +18,7 @@ endif()
 
 add_llvm_library(LLVMBOLTTargetAArch64
   AArch64MCPlusBuilder.cpp
+  AArch64MCSymbolizer.cpp
 
   NO_EXPORT
   DISABLE_LLVM_LINK_LLVM_DYLIB

Copy link
Member

@paschalis-mpeis paschalis-mpeis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Maks,

Thanks for working on the scanExternalRefs/lite-mode area.
Not fully confident, but it does looks equivalent to the X86 implementation to me.

I do get the following test failures on AArch64:

  1. max-funcs.test: crashes at extractFixupExpr:1814
  2. issue177.s: it core dumps with illegal hardware instruction
    This looks like it's triggered by --trap-old-code but also a couple of movk operands in main are different from before this patch.

#include "bolt/Core/MCPlusBuilder.h"
#include "bolt/Core/Relocation.h"
#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCRegisterInfo.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDE warns that this 'is not used directly'

Suggested change
#include "llvm/MC/MCRegisterInfo.h"

add_public_tablegen_target(AArch64CommonTableGen)
include_directories(${CMAKE_CURRENT_BINARY_DIR})
endif()

Copy link
Member

@paschalis-mpeis paschalis-mpeis Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On AArch64 I get linking errors. I believe on LLVM_LINK_COMPONENTS (line 2) we also need MCDisassembler.

@maksfb
Copy link
Contributor Author

maksfb commented Feb 20, 2025

Hi Paschalis,

Thanks for identifying the issues. The first one is caused by a currently limited support for AArch64 relocations in createRelocation() function. For this PR, I might have to disable the symbolizer in scanExternalRefs() while I work on adding new relocations.

The second issue is caused by LLVM AArch64 disassembler. It does not invoke tryAddingSymbolicOperand() for movz and movk instructions. Will need to dig into why.

@maksfb
Copy link
Contributor Author

maksfb commented Mar 1, 2025

All issues with this PR should be fixed. I have a stack on top that handles symbolization in scanExternalRefs().

Copy link
Member

@paschalis-mpeis paschalis-mpeis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks Maksim. The changes look good.

@maksfb maksfb changed the title [BOLT] Add symbolizer for AArch64 disassembler. NFCI [BOLT][AArch64] Add symbolizer for AArch64 disassembler. NFCI Mar 3, 2025
@maksfb
Copy link
Contributor Author

maksfb commented Mar 3, 2025

@yota9, does this look fine to you? I need someone to stamp the PR.

Copy link
Member

@yota9 yota9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thank you!

@maksfb maksfb merged commit b971d4d into llvm:main Mar 3, 2025
11 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Mar 4, 2025

LLVM Buildbot has detected a new failure on builder lld-x86_64-win running on as-worker-93 while building bolt,llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/146/builds/2403

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM-Unit :: Support/./SupportTests.exe/38/87' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe-LLVM-Unit-10428-38-87.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=87 GTEST_SHARD_INDEX=38 C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe
--

Script:
--
C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe --gtest_filter=ProgramEnvTest.CreateProcessLongPath
--
C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(160): error: Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(163): error: fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied



C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:160
Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:163
fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied




********************


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants