-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[BOLT][AArch64] Add symbolizer for AArch64 disassembler. NFCI #127969
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add AArch64MCSymbolizer that symbolizes MCInst operands during disassembly. The symbolization was previously done in `BinaryFunction::disassemble()`, but it is also required by `scanExternalRefs()` for "lite" mode functionality. Hence, similar to x86, I've implemented the symbolizer interface that uses BinaryFunction relocations to properly create instruction operands. I expect the result of the disassembly to be identical after the change. So far, the only quirk of AArch64 disassembler that I found is that it marks `ldr` instructions as branch by setting `IsBranch` parameter to true. Ignore the parameter and rely on `MCPlusBuilder` interface instead.
|
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-bolt Author: Maksim Panchenko (maksfb) ChangesAdd AArch64MCSymbolizer that symbolizes So far, the only quirk of AArch64 disassembler that I found is that it marks Full diff: https://github.com/llvm/llvm-project/pull/127969.diff 5 Files Affected:
diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp
index bc45caf3ec8b7..f64f2a3591061 100644
--- a/bolt/lib/Core/BinaryFunction.cpp
+++ b/bolt/lib/Core/BinaryFunction.cpp
@@ -1434,9 +1434,8 @@ Error BinaryFunction::disassemble() {
if (BC.isAArch64())
handleAArch64IndirectCall(Instruction, Offset);
}
- } else if (BC.isAArch64() || BC.isRISCV()) {
+ } else if (BC.isRISCV()) {
// Check if there's a relocation associated with this instruction.
- bool UsedReloc = false;
for (auto Itr = Relocations.lower_bound(Offset),
ItrE = Relocations.lower_bound(Offset + Size);
Itr != ItrE; ++Itr) {
@@ -1461,24 +1460,6 @@ Error BinaryFunction::disassemble() {
Relocation.Type);
(void)Result;
assert(Result && "cannot replace immediate with relocation");
-
- // For aarch64, if we replaced an immediate with a symbol from a
- // relocation, we mark it so we do not try to further process a
- // pc-relative operand. All we need is the symbol.
- UsedReloc = true;
- }
-
- if (!BC.isRISCV() && MIB->hasPCRelOperand(Instruction) && !UsedReloc) {
- if (auto NewE = handleErrors(
- handlePCRelOperand(Instruction, AbsoluteInstrAddr, Size),
- [&](const BOLTError &E) -> Error {
- if (E.isFatal())
- return Error(std::make_unique<BOLTError>(std::move(E)));
- if (!E.getMessage().empty())
- E.log(BC.errs());
- return Error::success();
- }))
- return Error(std::move(NewE));
}
}
diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
index 4b21ff719b3ab..60efa25c54ffa 100644
--- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
@@ -10,6 +10,7 @@
//
//===----------------------------------------------------------------------===//
+#include "AArch64MCSymbolizer.h"
#include "MCTargetDesc/AArch64AddressingModes.h"
#include "MCTargetDesc/AArch64FixupKinds.h"
#include "MCTargetDesc/AArch64MCExpr.h"
@@ -133,6 +134,12 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
public:
using MCPlusBuilder::MCPlusBuilder;
+ std::unique_ptr<MCSymbolizer>
+ createTargetSymbolizer(BinaryFunction &Function,
+ bool CreateNewSymbols) const override {
+ return std::make_unique<AArch64MCSymbolizer>(Function, CreateNewSymbols);
+ }
+
MCPhysReg getStackPointer() const override { return AArch64::SP; }
MCPhysReg getFramePointer() const override { return AArch64::FP; }
diff --git a/bolt/lib/Target/AArch64/AArch64MCSymbolizer.cpp b/bolt/lib/Target/AArch64/AArch64MCSymbolizer.cpp
new file mode 100644
index 0000000000000..de92a5a493e27
--- /dev/null
+++ b/bolt/lib/Target/AArch64/AArch64MCSymbolizer.cpp
@@ -0,0 +1,86 @@
+//===- bolt/Target/AArch64/AArch64MCSymbolizer.cpp ------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "AArch64MCSymbolizer.h"
+#include "bolt/Core/BinaryContext.h"
+#include "bolt/Core/BinaryFunction.h"
+#include "bolt/Core/MCPlusBuilder.h"
+#include "bolt/Core/Relocation.h"
+#include "llvm/MC/MCInst.h"
+#include "llvm/MC/MCRegisterInfo.h"
+#include "llvm/Support/Debug.h"
+
+#define DEBUG_TYPE "bolt-symbolizer"
+
+namespace llvm {
+namespace bolt {
+
+AArch64MCSymbolizer::~AArch64MCSymbolizer() {}
+
+bool AArch64MCSymbolizer::tryAddingSymbolicOperand(
+ MCInst &Inst, raw_ostream &CStream, int64_t Value, uint64_t InstAddress,
+ bool IsBranch, uint64_t ImmOffset, uint64_t ImmSize, uint64_t InstSize) {
+ BinaryContext &BC = Function.getBinaryContext();
+ MCContext *Ctx = BC.Ctx.get();
+
+ // NOTE: the callee may incorrectly set IsBranch.
+ if (BC.MIB->isBranch(Inst) || BC.MIB->isCall(Inst))
+ return false;
+
+ // TODO: add handling for linker "relaxation". At the moment, relocations
+ // corresponding to "relaxed" instructions are excluded from BinaryFunction
+ // relocation list.
+
+ const uint64_t InstOffset = InstAddress - Function.getAddress();
+ const Relocation *Relocation = Function.getRelocationAt(InstOffset);
+
+ /// Add symbolic operand to the instruction with an optional addend.
+ auto addOperand = [&](const MCSymbol *Symbol, uint64_t Addend,
+ uint64_t RelType) {
+ const MCExpr *Expr = MCSymbolRefExpr::create(Symbol, *Ctx);
+ if (Addend)
+ Expr = MCBinaryExpr::createAdd(Expr, MCConstantExpr::create(Addend, *Ctx),
+ *Ctx);
+ Inst.addOperand(MCOperand::createExpr(
+ BC.MIB->getTargetExprFor(Inst, Expr, *Ctx, RelType)));
+ };
+
+ if (Relocation) {
+ addOperand(Relocation->Symbol, Relocation->Addend, Relocation->Type);
+ return true;
+ }
+
+ if (!BC.MIB->hasPCRelOperand(Inst))
+ return false;
+
+ Value += InstAddress;
+ const MCSymbol *TargetSymbol;
+ uint64_t TargetOffset;
+ if (!CreateNewSymbols) {
+ if (BinaryData *BD = BC.getBinaryDataContainingAddress(Value)) {
+ TargetSymbol = BD->getSymbol();
+ TargetOffset = Value - BD->getAddress();
+ } else {
+ return false;
+ }
+ } else {
+ std::tie(TargetSymbol, TargetOffset) =
+ BC.handleAddressRef(Value, Function, /*IsPCRel*/ true);
+ }
+
+ addOperand(TargetSymbol, TargetOffset, 0);
+
+ return true;
+}
+
+void AArch64MCSymbolizer::tryAddingPcLoadReferenceComment(raw_ostream &CStream,
+ int64_t Value,
+ uint64_t Address) {}
+
+} // namespace bolt
+} // namespace llvm
diff --git a/bolt/lib/Target/AArch64/AArch64MCSymbolizer.h b/bolt/lib/Target/AArch64/AArch64MCSymbolizer.h
new file mode 100644
index 0000000000000..56ba4fbcaf275
--- /dev/null
+++ b/bolt/lib/Target/AArch64/AArch64MCSymbolizer.h
@@ -0,0 +1,44 @@
+//===- bolt/Target/AArch64/AArch64MCSymbolizer.cpp --------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef BOLT_TARGET_AARCH64_AARCH64MCSYMBOLIZER_H
+#define BOLT_TARGET_AARCH64_AARCH64MCSYMBOLIZER_H
+
+#include "bolt/Core/BinaryFunction.h"
+#include "llvm/MC/MCDisassembler/MCSymbolizer.h"
+
+namespace llvm {
+namespace bolt {
+
+class AArch64MCSymbolizer : public MCSymbolizer {
+protected:
+ BinaryFunction &Function;
+ bool CreateNewSymbols{true};
+
+public:
+ AArch64MCSymbolizer(BinaryFunction &Function, bool CreateNewSymbols = true)
+ : MCSymbolizer(*Function.getBinaryContext().Ctx.get(), nullptr),
+ Function(Function), CreateNewSymbols(CreateNewSymbols) {}
+
+ AArch64MCSymbolizer(const AArch64MCSymbolizer &) = delete;
+ AArch64MCSymbolizer &operator=(const AArch64MCSymbolizer &) = delete;
+ virtual ~AArch64MCSymbolizer();
+
+ bool tryAddingSymbolicOperand(MCInst &Inst, raw_ostream &CStream,
+ int64_t Value, uint64_t Address, bool IsBranch,
+ uint64_t Offset, uint64_t OpSize,
+ uint64_t InstSize) override;
+
+ void tryAddingPcLoadReferenceComment(raw_ostream &CStream, int64_t Value,
+ uint64_t Address) override;
+};
+
+} // namespace bolt
+} // namespace llvm
+
+#endif
diff --git a/bolt/lib/Target/AArch64/CMakeLists.txt b/bolt/lib/Target/AArch64/CMakeLists.txt
index 8435ea7245e7e..6cf50ae2df9eb 100644
--- a/bolt/lib/Target/AArch64/CMakeLists.txt
+++ b/bolt/lib/Target/AArch64/CMakeLists.txt
@@ -18,6 +18,7 @@ endif()
add_llvm_library(LLVMBOLTTargetAArch64
AArch64MCPlusBuilder.cpp
+ AArch64MCSymbolizer.cpp
NO_EXPORT
DISABLE_LLVM_LINK_LLVM_DYLIB
|
paschalis-mpeis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Maks,
Thanks for working on the scanExternalRefs/lite-mode area.
Not fully confident, but it does looks equivalent to the X86 implementation to me.
I do get the following test failures on AArch64:
max-funcs.test: crashes at extractFixupExpr:1814issue177.s: it core dumps with illegal hardware instruction
This looks like it's triggered by--trap-old-codebut also a couple ofmovkoperands in main are different from before this patch.
| #include "bolt/Core/MCPlusBuilder.h" | ||
| #include "bolt/Core/Relocation.h" | ||
| #include "llvm/MC/MCInst.h" | ||
| #include "llvm/MC/MCRegisterInfo.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDE warns that this 'is not used directly'
| #include "llvm/MC/MCRegisterInfo.h" |
| add_public_tablegen_target(AArch64CommonTableGen) | ||
| include_directories(${CMAKE_CURRENT_BINARY_DIR}) | ||
| endif() | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On AArch64 I get linking errors. I believe on LLVM_LINK_COMPONENTS (line 2) we also need MCDisassembler.
|
Hi Paschalis, Thanks for identifying the issues. The first one is caused by a currently limited support for AArch64 relocations in The second issue is caused by LLVM AArch64 disassembler. It does not invoke |
|
All issues with this PR should be fixed. I have a stack on top that handles symbolization in |
paschalis-mpeis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks Maksim. The changes look good.
|
@yota9, does this look fine to you? I need someone to stamp the PR. |
yota9
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thank you!
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/146/builds/2403 Here is the relevant piece of the build log for the reference |
Add AArch64MCSymbolizer that symbolizes
MCInstoperands during disassembly. The symbolization was previously done inBinaryFunction::disassemble(), but it is also required byscanExternalRefs()for "lite" mode functionality. Hence, similar to x86, I've implemented the symbolizer interface that usesBinaryFunctionrelocations to properly create instruction operands. I expect the result of the disassembly to be identical after the change.AArch64 disassembler was not calling
tryAddingSymbolicOperand()forMOVinstructions. Fix that. Additionally, the disassembler marksldrinstructions as branches by settingIsBranchparameter to true. Ignore the parameter and rely onMCPlusBuilderinterface instead.I've modified
--check-encodingflag to check symolization of operands of instructions that have relocations against them.