-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[MachineOutliner] Don't outline ADRP pair to avoid incorrect ICF #160232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Fixes llvm#131660 Earlier attempts to fix this in the linker were not accepted. Current attempts is pending at llvm#139493
@llvm/pr-subscribers-backend-aarch64 Author: Pranav Kant (pranavk) ChangesFixes #131660 Earlier attempts to fix this in the linker were not accepted. Current linker attempts is pending at #139493. Full diff: https://github.com/llvm/llvm-project/pull/160232.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index 5a51c812732e6..8880ca455c1f6 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -10179,11 +10179,33 @@ AArch64InstrInfo::getOutliningTypeImpl(const MachineModuleInfo &MMI,
return outliner::InstrType::Illegal;
}
- // Special cases for instructions that can always be outlined, but will fail
- // the later tests. e.g, ADRPs, which are PC-relative use LR, but can always
- // be outlined because they don't require a *specific* value to be in LR.
- if (MI.getOpcode() == AArch64::ADRP)
+ // An ADRP instruction referencing a GOT should not be outlined.
+ // This is to avoid splitting ADRP/(LDR/ADD/etc.) pair into different
+ // functions which can lead to linker ICF merging sections incorrectly.
+ if (MI.getOpcode() == AArch64::ADRP) {
+ bool IsPage = (MI.getOperand(1).getTargetFlags() & AArch64II::MO_PAGE) != 0;
+ bool IsGot = (MI.getOperand(1).getTargetFlags() & AArch64II::MO_GOT) != 0;
+ if (IsPage && IsGot)
+ return outliner::InstrType::Illegal;
+
+ // Special cases for instructions that can always be outlined, but will fail
+ // the later tests. e.g, ADRPs, which are PC-relative use LR, but can always
+ // be outlined because they don't require a *specific* value to be in LR.
return outliner::InstrType::Legal;
+ }
+
+ // Similarly, any user of ADRP instruction referencing a GOT should not be
+ // outlined. It's hard/costly to check exact users of ADRP. So we use check
+ // all operands and reject any that's a page offset and references a GOT.
+ const auto &F = MI.getMF()->getFunction();
+ for (const auto &MO : MI.operands()) {
+ bool IsPageOff = (MO.getTargetFlags() & AArch64II::MO_PAGEOFF) != 0;
+ bool IsGot = (MO.getTargetFlags() & AArch64II::MO_GOT) != 0;
+ if (IsPageOff && IsGot &&
+ (MI.getMF()->getTarget().getFunctionSections() || F.hasComdat() ||
+ F.hasSection() || F.getSectionPrefix()))
+ return outliner::InstrType::Illegal;
+ }
// If MI is a call we might be able to outline it. We don't want to outline
// any calls that rely on the position of items on the stack. When we outline
diff --git a/llvm/test/CodeGen/AArch64/machine-outliner-adrp-got-split.mir b/llvm/test/CodeGen/AArch64/machine-outliner-adrp-got-split.mir
new file mode 100644
index 0000000000000..169835809d6ba
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/machine-outliner-adrp-got-split.mir
@@ -0,0 +1,130 @@
+# RUN: llc -mtriple=aarch64--- -run-pass=machine-outliner -verify-machineinstrs %s -o - | FileCheck %s
+--- |
+
+ @x = common global i32 0, align 4
+
+ define i32 @adrp_add() #0 {
+ ret i32 0
+ }
+
+ define i32 @adrp_ldr() #0 {
+ ret i32 0
+ }
+
+ define void @bar(i32 %a) #0 {
+ ret void
+ }
+
+ attributes #0 = { noinline noredzone }
+...
+---
+# This test ensures that we do not outline ADRP / ADD pair when it's referencing
+# a GOT entry.
+#
+# CHECK-LABEL: name: adrp_add
+# CHECK-DAG: bb.0:
+# CHECK: $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+# CHECK: $x12 = ADDXri $x9, target-flags(aarch64-pageoff, aarch64-got) @x, 0
+
+# CHECK-DAG: bb.1
+# CHECK: $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+# CHECK: $x12 = ADDXri $x9, target-flags(aarch64-pageoff, aarch64-got) @x, 0
+
+# CHECK-DAG: bb.2
+# CHECK: $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+# CHECK: $x12 = ADDXri $x9, target-flags(aarch64-pageoff, aarch64-got) @x, 0
+name: adrp_add
+tracksRegLiveness: true
+body: |
+ bb.0:
+ liveins: $lr
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+ $x12 = ADDXri $x9, target-flags(aarch64-pageoff, aarch64-got) @x, 0
+ $lr = ORRXri $xzr, 1
+ bb.1:
+ liveins: $lr
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+ $x12 = ADDXri $x9, target-flags(aarch64-pageoff, aarch64-got) @x, 0
+ $lr = ORRXri $xzr, 1
+ bb.2:
+ liveins: $lr
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+ $x12 = ADDXri $x9, target-flags(aarch64-pageoff, aarch64-got) @x, 0
+ $lr = ORRXri $xzr, 1
+ bb.3:
+ liveins: $lr
+ RET undef $lr
+...
+---
+# This test ensures that we do not outline ADRP / LDR pair when it's referencing
+# a GOT entry.
+#
+# CHECK-LABEL: name: adrp_ldr
+# CHECK-DAG: bb.0:
+# CHECK: $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+# CHECK: $x12 = LDRXui $x9, target-flags(aarch64-pageoff, aarch64-got) @x
+
+# CHECK-DAG: bb.1
+# CHECK: $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+# CHECK: $x12 = LDRXui $x9, target-flags(aarch64-pageoff, aarch64-got) @x
+
+# CHECK-DAG: bb.2
+# CHECK: $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+# CHECK: $x12 = LDRXui $x9, target-flags(aarch64-pageoff, aarch64-got) @x
+name: adrp_ldr
+tracksRegLiveness: true
+body: |
+ bb.0:
+ liveins: $lr
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+ $x12 = LDRXui $x9, target-flags(aarch64-pageoff, aarch64-got) @x
+ $lr = ORRXri $xzr, 1
+ bb.1:
+ liveins: $lr
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+ $x12 = LDRXui $x9, target-flags(aarch64-pageoff, aarch64-got) @x
+ $lr = ORRXri $xzr, 1
+ bb.2:
+ liveins: $lr
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $w12 = ORRWri $wzr, 1
+ $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+ $x12 = LDRXui $x9, target-flags(aarch64-pageoff, aarch64-got) @x
+ $lr = ORRXri $xzr, 1
+ bb.3:
+ liveins: $lr
+ RET undef $lr
\ No newline at end of file
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for having a go at this. I'm not an expert on the outliner, ideally we can get someone who is as a reviewer.
I've made some suggestions based on what I know of the AArch64 instruction set.
Please could you add a full description (for the commit message) it will be really useful to see that in place from git log on a terminal.
bool IsPageOff = (MO.getTargetFlags() & AArch64II::MO_PAGEOFF) != 0; | ||
bool IsGot = (MO.getTargetFlags() & AArch64II::MO_GOT) != 0; | ||
if (IsPageOff && IsGot && | ||
(MI.getMF()->getTarget().getFunctionSections() || F.hasComdat() || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment above says:
So we use check all operands and reject any that's a page offset and references a GOT.
This part seems to further constrain to only functions that are, roughly speaking, compiled in their own section.
However the ADRP is not constrained. Can you explain why this is needed? If there is a good reason, I think we should have a test case.
Could we end up with a case where there is an attempt to outline an ADRP, ADD sequence from .text where the ADRP is illegal, but the ADD is not, and this causes problems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. My code was partially motivated by @lenary's code pointer to RISCV backend here: #131660 (comment)
Maybe I am missing something there but now that I look at it again, I agree that this should just be
if (isPageOff && IsGot) { ... }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The additional conditions are to detect the cases where the outlined function will be in a separate section to the original function. If you're outlining from a function to create a function in the same section, then you shouldn't need to worry about ICF (which only works with -ffunction-sections, iirc). This is the approach we made on RISC-V, but I note that ADRP does work slightly differently to AUIPC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the additional context. I had made the assumption that the outliner could always create a new section when outlining, but if it is in the same section then I agree we won't see the ICF problem.
I think it would be worth a quick comment to explain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The outliner creates new functions (there will be a little LLVM IR function stub for them, because there has to be, but the MachineFunction contents is the important bit), which might inherit section-related metadata, but the section creation still happens as usual in the AsmPrinter, which is at the end of the codegen pipeline.
return outliner::InstrType::Legal; | ||
} | ||
|
||
// Similarly, any user of ADRP instruction referencing a GOT should not be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can constrain this to just the opcodes that can follow an ADRP in a GOT sequence. There should be only one ADD and one LDR opcode.
The tiny code-model can use a single pc-relative load to the GOT which could be outlined. This is a different encoding to the base relative ldr used after an ADRP in the small code model.
// tiny
ldr x0, :got:var
// small
adrp x0, :got:var
ldr x0, [x0 :got_lo12: var]
It is possible that there is a different flag to MO_GOT used though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will address this in next comment.
// the later tests. e.g, ADRPs, which are PC-relative use LR, but can always | ||
// be outlined because they don't require a *specific* value to be in LR. | ||
if (MI.getOpcode() == AArch64::ADRP) | ||
// An ADRP instruction referencing a GOT should not be outlined. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand it, this will prevent a sequence of ADRP, ADD or ADRP, LDR from being outlined as a whole sequence. Do we have any data on whether that significantly affects the effectiveness of the outliner?
I'm not familiar with the outliner code so I'm not sure I can offer a direct alternative. I can see some code that evaluates the RepeatedSequences. Is there an alternative that scans the sequences looking for LDR or ADD instructions with MO_GOT that appear before an ADRP with a MO_PAGE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an alternative that scans the sequences looking for LDR or ADD instructions with MO_GOT that appear before an ADRP with a MO_PAGE.
You mean that appears after an ADRP with MO_PAGE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc: @davemgreen as this relates to his comment as well.
I wasn't sure ADRP and ADD would always appear together. It seems that's a reasonable assumption as David mentioned we fuse them together. I could std::prev(MIT)
and check if ADD/LDR is being followed by an ADRP but not yet sure how to allow outlining of both of them as a whole sequence while preventing them from outlining into separate function. Perhaps this function is not the right place to do this sort of thing and it needs to be done somewhere else. But then how does RISCV code that @lenary pointed out aims to fix this. That RISCV code also aims to prevent separation of LO/HI instructions. It appears that code is also going to prevent outlining of both HI/LO as a whole sequence and is buggy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they will often be laid out next to one another, so if we need to check that the next instruction is a adrp/add/etc then it should be quick to check. They are not always guaranteed to be next to one another though, in case the check fails. In which case it is probably fine to bail out.
If there are no candidates that ever mark one of the adrp/add individually as outlinable, would it ever be possible to outline a single instruction from them? That might not be the most reliable solution though.
It appears that code is also going to prevent outlining of both HI/LO as a whole sequence and is buggy?
Maybe just a little inefficient? I'm not sure how much is lost from not outlining adrp+adds. If it is very little then marking them always invalid could be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an alternative that scans the sequences looking for LDR or ADD instructions with MO_GOT that appear before an ADRP with a MO_PAGE.
You mean that appears after an ADRP with MO_PAGE?
I'm thinking of the case where there are candidates for outlining, like in https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp#L9550 that are scanned for validity.
If we have in an outline candidate:
add x0, x0, :got_lo12: sym
or ldr x0, [x0, :got_lo12:sym]
but no preceeding adrp x0 :got: sym
in the outline candidate, then we know that an adrp, add
or adrp, ldr
sequence has been split up as the add x0, x0, :got_lo12: sym
and ldr x0, [x0, :got_lo12:sym]
don't make any sense on their own.
I agree with @davemgreen that if not much is lost from outlining adrp (add/ldr) then it is fine to keep this simple. I don't have a representative set of programs to know that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi. The idea sounds OK to me. Would it be valid to outline so long as we always outlined both instructions (adrp+add, adrp+ldr) together? We fuse the adrp+add so they should generally be scheduled next to one another.
Yes, as long as the adrp+add, adrp+ldr are in the same section, we don't get the problem, as ICF will remove both or none of them. The comment in #129122 (comment) has a good summary of the chain of events that lead to the problem. |
I have been wondering on the RISC-V side if we can add back our ADRP-like instruction to the outlined segment if we find we have only outlined the instruction with the lo operand. This would potentially add overhead to sequences where only the lo instruction has been outlined, and might cause redundant ADRP-likes, but I think it might allow more outlining? I think this should be doable in I don't know how this approach would cope if the outliner tries to only outline the ADRP-like, rather than the lo instruction. You'd presumably want to replicate the ADRPs from inside the outlined part to immediately after the call to the outlined sequence? That sounds like it might be too much overhead, but the cost model might catch that. |
I'm wondering if it would make sense to do this in For detecting that a sequence has been split, you can do what @smithp35 mentioned earlier.
|
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
@@ -0,0 +1,133 @@ | |||
# RUN: llc -mtriple=aarch64--- -run-pass=machine-outliner -verify-machineinstrs %s -o - | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test for the whole adrp/add/ldr sequence?
On AArch64, ADRP and its user instructions (LDR, ADD, etc.), that are referencing a GOT symbol, when separated into different functions by machine outliner exposes a correctness issue in the linker ICF. In such cases, user instructions can end up pointing to a folded section (with its canonical folded symbol), while ADRP instruction point to a GOT entry corresponding to the original symbol. This leads to loading from incorrect memory address after ICF. #129122 explains how this can happen in detail.
This addresses #131660 which should fix two things: