-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[BOLT][AArch64] Add support for long absolute LLD thunks/veneers #113408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Absolute thunks generated by LLD reference function addresses recorded as data in code. Since they are generated by the linker, they don't have relocations associated with them and thus the addresses are left undetected. Use pattern matching to detect such thunks and handle them in VeneerElimination pass.
|
@llvm/pr-subscribers-bolt Author: Maksim Panchenko (maksfb) ChangesAbsolute thunks generated by LLD reference function addresses recorded as data in code. Since they are generated by the linker, they don't have relocations associated with them and thus the addresses are left undetected. Use pattern matching to detect such thunks and handle them in VeneerElimination pass. Full diff: https://github.com/llvm/llvm-project/pull/113408.diff 4 Files Affected:
diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h
index 32eda0b283b883..b9a13968535a5c 100644
--- a/bolt/include/bolt/Core/MCPlusBuilder.h
+++ b/bolt/include/bolt/Core/MCPlusBuilder.h
@@ -1536,6 +1536,11 @@ class MCPlusBuilder {
llvm_unreachable("not implemented");
}
+ virtual bool matchAbsLongVeneer(const BinaryFunction &BF,
+ uint64_t &TargetAddress) const {
+ llvm_unreachable("not implemented");
+ }
+
virtual bool matchAdrpAddPair(const MCInst &Adrp, const MCInst &Add) const {
llvm_unreachable("not implemented");
return false;
diff --git a/bolt/lib/Passes/VeneerElimination.cpp b/bolt/lib/Passes/VeneerElimination.cpp
index 8bf0359477c658..2736448df50cc3 100644
--- a/bolt/lib/Passes/VeneerElimination.cpp
+++ b/bolt/lib/Passes/VeneerElimination.cpp
@@ -29,30 +29,49 @@ static llvm::cl::opt<bool>
namespace llvm {
namespace bolt {
+static bool isPossibleVeneer(const BinaryFunction &BF) {
+ return BF.isAArch64Veneer() || BF.getOneName().starts_with("__AArch64");
+}
+
Error VeneerElimination::runOnFunctions(BinaryContext &BC) {
if (!opts::EliminateVeneers || !BC.isAArch64())
return Error::success();
std::map<uint64_t, BinaryFunction> &BFs = BC.getBinaryFunctions();
std::unordered_map<const MCSymbol *, const MCSymbol *> VeneerDestinations;
- uint64_t VeneersCount = 0;
- for (auto &It : BFs) {
- BinaryFunction &VeneerFunction = It.second;
- if (!VeneerFunction.isAArch64Veneer())
+ uint64_t NumEliminatedVeneers = 0;
+ for (BinaryFunction &BF : llvm::make_second_range(BC.getBinaryFunctions())) {
+ if (!isPossibleVeneer(BF))
+ continue;
+
+ if (BF.isIgnored())
continue;
- VeneersCount++;
- VeneerFunction.setPseudo(true);
- MCInst &FirstInstruction = *(VeneerFunction.begin()->begin());
- const MCSymbol *VeneerTargetSymbol =
- BC.MIB->getTargetSymbol(FirstInstruction, 1);
- assert(VeneerTargetSymbol && "Expecting target symbol for instruction");
- for (const MCSymbol *Symbol : VeneerFunction.getSymbols())
+ MCInst &FirstInstruction = *(BF.begin()->begin());
+ const MCSymbol *VeneerTargetSymbol = 0;
+ uint64_t TargetAddress;
+ if (BC.MIB->matchAbsLongVeneer(BF, TargetAddress)) {
+ if (BinaryFunction *TargetBF =
+ BC.getBinaryFunctionAtAddress(TargetAddress))
+ VeneerTargetSymbol = TargetBF->getSymbol();
+ } else {
+ if (!BC.MIB->hasAnnotation(FirstInstruction, "AArch64Veneer"))
+ continue;
+ VeneerTargetSymbol = BC.MIB->getTargetSymbol(FirstInstruction, 1);
+ }
+
+ if (!VeneerTargetSymbol)
+ continue;
+
+ for (const MCSymbol *Symbol : BF.getSymbols())
VeneerDestinations[Symbol] = VeneerTargetSymbol;
+
+ NumEliminatedVeneers++;
+ BF.setPseudo(true);
}
BC.outs() << "BOLT-INFO: number of removed linker-inserted veneers: "
- << VeneersCount << "\n";
+ << NumEliminatedVeneers << '\n';
// Handle veneers to veneers in case they occur
for (auto &Entry : VeneerDestinations) {
diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
index f58f7857e28aeb..1012ff77d6f66a 100644
--- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
@@ -15,6 +15,8 @@
#include "MCTargetDesc/AArch64MCExpr.h"
#include "MCTargetDesc/AArch64MCTargetDesc.h"
#include "Utils/AArch64BaseInfo.h"
+#include "bolt/Core/BinaryBasicBlock.h"
+#include "bolt/Core/BinaryFunction.h"
#include "bolt/Core/MCPlusBuilder.h"
#include "llvm/BinaryFormat/ELF.h"
#include "llvm/MC/MCContext.h"
@@ -22,6 +24,7 @@
#include "llvm/MC/MCInstBuilder.h"
#include "llvm/MC/MCInstrInfo.h"
#include "llvm/MC/MCRegisterInfo.h"
+#include "llvm/Support/DataExtractor.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"
@@ -1320,6 +1323,67 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
return 3;
}
+ /// Match the following pattern:
+ ///
+ /// ADR x16, .L1
+ /// BR x16
+ /// L1:
+ /// .quad Target
+ ///
+ /// Populate \p TargetAddress with the Target value on successful match.
+ bool matchAbsLongVeneer(const BinaryFunction &BF,
+ uint64_t &TargetAddress) const override {
+ if (BF.size() != 1 || BF.getMaxSize() < 16)
+ return false;
+
+ if (!BF.hasConstantIsland())
+ return false;
+
+ const BinaryBasicBlock &BB = BF.front();
+ if (BB.size() != 2)
+ return false;
+
+ const MCInst &LDRInst = BB.getInstructionAtIndex(0);
+ if (LDRInst.getOpcode() != AArch64::LDRXl)
+ return false;
+
+ if (!LDRInst.getOperand(0).isReg() ||
+ LDRInst.getOperand(0).getReg() != AArch64::X16)
+ return false;
+
+ const MCSymbol *TargetSym = getTargetSymbol(LDRInst, 1);
+ if (!TargetSym)
+ return false;
+
+ const MCInst &BRInst = BB.getInstructionAtIndex(1);
+ if (BRInst.getOpcode() != AArch64::BR)
+ return false;
+ if (!BRInst.getOperand(0).isReg() ||
+ BRInst.getOperand(0).getReg() != AArch64::X16)
+ return false;
+
+ const BinaryFunction::IslandInfo &IInfo = BF.getIslandInfo();
+ if (IInfo.HasDynamicRelocations)
+ return false;
+
+ auto Iter = IInfo.Offsets.find(8);
+ if (Iter == IInfo.Offsets.end() || Iter->second != TargetSym)
+ return false;
+
+ // Extract the absolute value stored inside the island.
+ StringRef SectionContents = BF.getOriginSection()->getContents();
+ StringRef FunctionContents = SectionContents.substr(
+ BF.getAddress() - BF.getOriginSection()->getAddress(), BF.getMaxSize());
+
+ const BinaryContext &BC = BF.getBinaryContext();
+ DataExtractor DE(FunctionContents, BC.AsmInfo->isLittleEndian(),
+ BC.AsmInfo->getCodePointerSize());
+ uint64_t Offset = 8;
+ TargetAddress = DE.getAddress(&Offset);
+
+ return true;
+ }
+
bool matchAdrpAddPair(const MCInst &Adrp, const MCInst &Add) const override {
if (!isADRP(Adrp) || !isAddXri(Add))
return false;
diff --git a/bolt/test/AArch64/veneer-lld-abs.s b/bolt/test/AArch64/veneer-lld-abs.s
new file mode 100644
index 00000000000000..d10ff46e2cb016
--- /dev/null
+++ b/bolt/test/AArch64/veneer-lld-abs.s
@@ -0,0 +1,51 @@
+## Check that llvm-bolt correctly recognizes long absolute thunks generated
+## by LLD.
+
+# RUN: llvm-mc -filetype=obj -triple aarch64-unknown-unknown %s -o %t.o
+# RUN: %clang %cflags -fno-PIC -no-pie %t.o -o %t.exe -nostdlib \
+# RUN: -fuse-ld=lld -Wl,-q
+# RUN: llvm-objdump -d %t.exe | FileCheck --check-prefix=CHECK-INPUT %s
+# RUN: llvm-objcopy --remove-section .rela.mytext %t.exe
+# RUN: llvm-bolt %t.exe -o %t.bolt --elim-link-veneers=true --lite=0
+# RUN: llvm-objdump -d -j .text %t.bolt | \
+# RUN: FileCheck --check-prefix=CHECK-OUTPUT %s
+
+.text
+.balign 4
+.global foo
+.type foo, %function
+foo:
+ adrp x1, foo
+ ret
+.size foo, .-foo
+
+.section ".mytext", "ax"
+.balign 4
+
+.global __AArch64AbsLongThunk_foo
+.type __AArch64AbsLongThunk_foo, %function
+__AArch64AbsLongThunk_foo:
+ ldr x16, .L1
+ br x16
+# CHECK-INPUT-LABEL: <__AArch64AbsLongThunk_foo>:
+# CHECK-INPUT-NEXT: ldr
+# CHECK-INPUT-NEXT: br
+.L1:
+ .quad foo
+.size __AArch64AbsLongThunk_foo, .-__AArch64AbsLongThunk_foo
+
+## Check that the thunk was removed from .text and _start() calls foo()
+## directly.
+
+# CHECK-OUTPUT-NOT: __AArch64AbsLongThunk_foo
+
+.global _start
+.type _start, %function
+_start:
+# CHECK-INPUT-LABEL: <_start>:
+# CHECK-OUTPUT-LABEL: <_start>:
+ bl __AArch64AbsLongThunk_foo
+# CHECK-INPUT-NEXT: bl {{.*}} <__AArch64AbsLongThunk_foo>
+# CHECK-OUTPUT-NEXT: bl {{.*}} <foo>
+ ret
+.size _start, .-_start
|
paschalis-mpeis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @maksfb, thanks for your PR.
Added a few suggestions. It looks good.
| namespace bolt { | ||
|
|
||
| static bool isPossibleVeneer(const BinaryFunction &BF) { | ||
| return BF.isAArch64Veneer() || BF.getOneName().starts_with("__AArch64"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to also check that this is not PI? eg:
BF.isAArch64Veneer() ||
(BF.getOneName().starts_with("__AArch64") && BF.getBinaryContext().HasFixedLoadAddress);There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can possibly add a check later when we detect an absolute veneer, but then it should be legal to call an absolute-address function even from PIC, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smithp35 made a similar suggestion with mine here (2nd paragraph).
But it looks like there is a more complex scenario: a hypothetical non-PI binary that has mixed PIC and non-PIC segments?
Not sure if my understanding is correct.
In theory at least, it should be possible/legal to do such a call.
But if we have such a veneer in an otherwise PIC segment, wouldn't that make the entire segment a non-PIC? (at least in theory).
Regarding BOLT itself, TMU it assumes that a binary is PIC if the ELF file is not ET_EXEC; it does not keep such information per segment.
| } else { | ||
| MCInst &FirstInstruction = *(BF.begin()->begin()); | ||
| if (BC.MIB->hasAnnotation(FirstInstruction, "AArch64Veneer")) | ||
| VeneerTargetSymbol = BC.MIB->getTargetSymbol(FirstInstruction, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could keep the assertion just for the else case?
assert(VeneerTargetSymbol && "Expecting target symbol for instruction");There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect we will get more veneers which I'll add the support for in later PRs.
|
|
||
| BC.outs() << "BOLT-INFO: number of removed linker-inserted veneers: " | ||
| << VeneersCount << "\n"; | ||
| << NumEliminatedVeneers << '\n'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not directly related, but was there a particular reason the other statistic at L102 was a debug message?
In case the output is helpful to understand whether the number of eliminated veneers does match or not the number of patched call-sites, we could maybe combine them and emit them both at L102?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if there was a reason for that separation, but indeed we can combine the messages in a new PR.
|
|
||
| /// Match the following pattern: | ||
| /// | ||
| /// ADR x16, .L1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this was meant to be LDR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching it.
smithp35
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see anything wrong with the patch. I expect the __AArch64ADRPThunk_ to be more common, as these will be used when the code is build position independent.
It is a bit of a pity that BOLT has to detect veneers, as there are many possible code-sequences across all static linkers. While it is too late to do anything now, I'm wondering if there could be some extra hints that could be generated by static linkers to make it easier for binary analysis tools to handle veneers. Possibly written up in the AArch64 sysvabi so that other linker's could choose to adopt them.
One possible case that you may need to handle in the veneer to a veneer case is the newly added __AArch64BTIThunk_. These are needed when LLD inserts a veneer with an indirect branch to a destination that doesn't have a BTI at the destination. In this case the linker adds a "landing pad" veneer with a BTI which then transfers control to the destination, either by falling through or with a direct branch.
For example:
bl __AArch64AbsLongThunkFoo
__AArch64AbsLongThunk_Foo
...
br x16
...
__AArch64BTIThunk_Foo
bti c
b Foo // This can be elided if Foo follows after the veneer.
...
Foo:
// some instruction other than one of the BTI (or compatible instructions)
I don't think it is critical that these are detected as these veneers are alternative entry points for a function, but BOLT may have been able to rearrange the code such that an indirect branch to Foo (which requires a BTI at the destination) is no longer necessary.
I wouldn't expect BOLT to encounter these for a while as clang currently outputs BTI instructions for all functions, but this may change in the future.
As an aside, a possible optimisation for BOLT, assuming it can detect all indirect branch targets is to remove BTI c instructions that aren't needed.
| namespace bolt { | ||
|
|
||
| static bool isPossibleVeneer(const BinaryFunction &BF) { | ||
| return BF.isAArch64Veneer() || BF.getOneName().starts_with("__AArch64"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think lld is the only tool putting a prefix of __AArch64 onto the symbol, if other tools start doing that too then LLD also has a Thunk later on in the name, although scanning for that is not likely profitable.
The AbsLongThunks must only be generated in the absence of PIC or PIE so if that can be detected by Bolt then it could be used to rule these out. Whether it is worth doing so or not I don't know.
| /// .quad Target | ||
| /// | ||
| /// Populate \p TargetAddress with the Target value on successful match. | ||
| bool matchAbsLongVeneer(const BinaryFunction &BF, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLD will attempt to write veneers as "short" veneers whenever possible. This is true for all AArch64 LLD generated veneers. If the distance from the veneer to the target is < 128 MiB then LLD will use
B target instead of the sequence above. LLD won't change the name of the veneer when it does that.
I'd expect "short" veneers to be common for programs that are between 128 MiB and 256 MiB in size.
Maybe a possible enhancement to detect this case.
|
@smithp35, thank you for the review. I agree that making veneers/thunks part of the ABI would have made job easier for tools. For now, we have to rely on name/pattern matching. I plan to add support for more thunk types (particularly short ones) after this PR. Regarding absolute thunks in PIE/PIC, I believe you are right that a linker wouldn't generate such thunks even for absolute symbols. @paschalis-mpeis mentioned it in his review as well. |
Absolute thunks generated by LLD reference function addresses recorded as data in code. Since they are generated by the linker, they don't have relocations associated with them and thus the addresses are left undetected. Use pattern matching to detect such thunks and handle them in VeneerElimination pass.