-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[lld-macho] Handle InputSection targets in branch range extension logic #126347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-lld @llvm/pr-subscribers-lld-macho Author: None (alx32) ChangesBranch Extension Support for InputSection-type RelocationsProblemThe branch extension algorithm was originally designed to handle branch relocations that point directly to symbols. However, when linking Rust code, we encountered edge cases where branch target relocations point to
SolutionThis patch adds support for
Implementation Details
This change ensures proper branch extension support for edge case Rust-generated code while maintaining compatibility with existing symbol-based relocations. TestingThis was tested to handle the known problematic scenario - but there doesn't seem to be a way for us to generate a branch relocation pointing to an InputSection as part of a test. Full diff: https://github.com/llvm/llvm-project/pull/126347.diff 1 Files Affected:
diff --git a/lld/MachO/ConcatOutputSection.cpp b/lld/MachO/ConcatOutputSection.cpp
index d64816ec695ad52..375f81d632f2389 100644
--- a/lld/MachO/ConcatOutputSection.cpp
+++ b/lld/MachO/ConcatOutputSection.cpp
@@ -116,6 +116,30 @@ void ConcatOutputSection::addInput(ConcatInputSection *input) {
DenseMap<Symbol *, ThunkInfo> lld::macho::thunkMap;
+// Returns the target Symbol that a relocation refers to.
+// A Reloc can refer to either a Symbol directly, or to an InputSection.
+// For InputSection referents, we return the first Symbol at offset 0.
+// This conversion is necessary because the thunk generation algorithm
+// can only handle Symbols as branch targets, not InputSections.
+static Symbol *getReferentSymbol(const Reloc &r) {
+ if (auto *sym = r.referent.dyn_cast<Symbol *>()) {
+ return sym;
+ } else if (auto *isec = r.referent.dyn_cast<InputSection *>()) {
+ // Use the first symbol at offset 0 in the InputSection
+ for (Defined *sym : isec->symbols) {
+ if (sym->value == 0) {
+ return sym;
+ }
+ }
+ // Handle absence of suitable symbol
+ warn("Branch-range extension: No symbol at offset 0 in InputSection '" +
+ toString(isec) + "', possible branch out of range errors may occur.");
+ return nullptr;
+ } else {
+ llvm_unreachable("Unexpected referent type");
+ }
+}
+
// Determine whether we need thunks, which depends on the target arch -- RISC
// (i.e., ARM) generally does because it has limited-range branch/call
// instructions, whereas CISC (i.e., x86) generally doesn't. RISC only needs
@@ -145,7 +169,10 @@ bool TextOutputSection::needsThunks() const {
for (Reloc &r : isec->relocs) {
if (!target->hasAttr(r.type, RelocAttrBits::BRANCH))
continue;
- auto *sym = cast<Symbol *>(r.referent);
+ // Get the Symbol that the relocation targets.
+ Symbol *sym = getReferentSymbol(r);
+ if (!sym)
+ continue;
// Pre-populate the thunkMap and memoize call site counts for every
// InputSection and ThunkInfo. We do this for the benefit of
// estimateStubsInRangeVA().
@@ -325,7 +352,10 @@ void TextOutputSection::finalize() {
backwardBranchRange < callVA ? callVA - backwardBranchRange : 0;
uint64_t highVA = callVA + forwardBranchRange;
// Calculate our call referent address
- auto *funcSym = cast<Symbol *>(r.referent);
+ Symbol *funcSym = getReferentSymbol(r);
+ if (!funcSym)
+ continue;
+
ThunkInfo &thunkInfo = thunkMap[funcSym];
// The referent is not reachable, so we need to use a thunk ...
if (funcSym->isInStubs() && callVA >= stubsInRangeVA) {
|
| // This conversion is necessary because the thunk generation algorithm | ||
| // can only handle Symbols as branch targets, not InputSections. | ||
| static Symbol *getReferentSymbol(const Reloc &r) { | ||
| if (auto *sym = r.referent.dyn_cast<Symbol *>()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are migrating away from PointerUnion::dyn_cast:
llvm-project/llvm/include/llvm/ADT/PointerUnion.h
Lines 146 to 147 in c89735d
| // FIXME: Replace the uses of is(), get() and dyn_cast() with | |
| // isa<T>, cast<T> and the llvm::dyn_cast<T> |
AFAICT from llvm_unrechable below, you expect r.referent.isNull() to be false, so may I suggest the following?
| if (auto *sym = r.referent.dyn_cast<Symbol *>()) { | |
| if (auto *sym = llvm::dyn_cast<Symbol *>(r.referent)) { |
| static Symbol *getReferentSymbol(const Reloc &r) { | ||
| if (auto *sym = r.referent.dyn_cast<Symbol *>()) { | ||
| return sym; | ||
| } else if (auto *isec = r.referent.dyn_cast<InputSection *>()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, may I suggest the following?
| } else if (auto *isec = r.referent.dyn_cast<InputSection *>()) { | |
| } else if (auto *isec = llvm::dyn_cast<InputSection *>(r.referent)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, don't use else after a return
https://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return
|
Overall, it looks good to me. |
|
LGTM |
| } | ||
| } | ||
| // Handle absence of suitable symbol | ||
| warn("Branch-range extension: No symbol at offset 0 in InputSection '" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If no symbol at offset 0, could we create a local fake symbol at offset 0, and later after relocation is done skip emitting this created symbol into symbol table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could but I haven't encountered this scenario yet so we might not want to add the extra complexity without a way of testing it.
Is there any way to generate a reduced Rust assembly or LLVM IR from the problematic scenario as a test? |
|
If you're not able to generate the test case though assembly, can you use yaml2obj? |
| static Symbol *getReferentSymbol(const Reloc &r) { | ||
| if (auto *sym = r.referent.dyn_cast<Symbol *>()) { | ||
| return sym; | ||
| } else if (auto *isec = r.referent.dyn_cast<InputSection *>()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, don't use else after a return
https://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return
| for (Defined *sym : isec->symbols) { | ||
| if (sym->value == 0) { | ||
| return sym; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| for (Defined *sym : isec->symbols) { | |
| if (sym->value == 0) { | |
| return sym; | |
| } | |
| } | |
| for (Defined *sym : isec->symbols) | |
| if (sym->value == 0) | |
| return sym; |
| } else { | ||
| llvm_unreachable("Unexpected referent type"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| } else { | |
| llvm_unreachable("Unexpected referent type"); | |
| } | |
| } | |
| llvm_unreachable("Unexpected referent type"); |
| } | ||
| // Handle absence of suitable symbol | ||
| warn("Branch-range extension: No symbol at offset 0 in InputSection '" + | ||
| toString(isec) + "', possible branch out of range errors may occur."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| toString(isec) + "', possible branch out of range errors may occur."); | |
| toString(isec) + "', branch out of range errors may occur."); |
"possible" and "may" are redundant.
|
While trying to get a Rust repro for this - it turns out that I was misdiagnosing the root issue. The issue comes from While this PR still fixes the issue - it seems like fixing the root cause would be better: #126835 I'll close this PR after landing the alternate one above. |
|
Another approach to fixing the issue (#126835) was merged - closing this PR. Thank you everyone for your feedback ! |
Branch Extension Support for InputSection-type Relocations
Problem
The branch extension algorithm was originally designed to handle branch relocations that point directly to symbols. However, when linking Rust code, we encountered edge cases where branch target relocations point to
InputSectionsinstead of symbols. This caused issues because:InputSection-based relocationsSolution
This patch adds support for
InputSection-based relocations by:InputSectioninstead of a SymbolInputSectionImplementation Details
getReferentSymbol()to get the branch target symbol, regardless of wether a relocation points to aSymbolor to anInputSectionInputSection, a warning is emitted to alert about potential branch range issuesThis change ensures proper branch extension support for edge case Rust-generated code while maintaining compatibility with existing symbol-based relocations.
Testing
This was tested to handle the known problematic scenario - but there doesn't seem to be a way for us to generate a branch relocation pointing to an InputSection as part of a test.