[MachineOutliner] Don't outline ADRP pair to avoid incorrect ICF #160232

pranavk · 2025-09-23T04:56:58Z

On AArch64, when the machine outliner separates an ADRP from its user instructions (LDR, ADD, etc.) that reference a GOT symbol and places them in different functions, it exposes a correctness issue in the linker's ICF. In such cases, the user instructions can end up pointing to a folded section (with its canonical folded symbol), while the ADRP instruction points to the GOT entry corresponding to the original symbol. This leads to loading from an incorrect memory address after ICF. #129122 explains in detail how this can happen.

This addresses #131660, which should do two things:

Hide the correctness issue described above in the LLVM linker.
Allow optimizations that could relax GOT addressing to PC-relative addressing.

Fixes llvm#131660. Earlier attempts to fix this in the linker were not accepted. The current attempt is pending at llvm#139493.

llvmbot · 2025-09-23T22:08:17Z

@llvm/pr-subscribers-backend-aarch64

Author: Pranav Kant (pranavk)

Changes

Fixes #131660

Earlier attempts to fix this in the linker were not accepted. The current linker attempt is pending at #139493.

Full diff: https://github.com/llvm/llvm-project/pull/160232.diff

2 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (+26-4)
(added) llvm/test/CodeGen/AArch64/machine-outliner-adrp-got-split.mir (+130)

diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index 5a51c812732e6..8880ca455c1f6 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -10179,11 +10179,33 @@ AArch64InstrInfo::getOutliningTypeImpl(const MachineModuleInfo &MMI,
       return outliner::InstrType::Illegal;
   }
 
-  // Special cases for instructions that can always be outlined, but will fail
-  // the later tests. e.g, ADRPs, which are PC-relative use LR, but can always
-  // be outlined because they don't require a *specific* value to be in LR.
-  if (MI.getOpcode() == AArch64::ADRP)
+  // An ADRP instruction referencing a GOT should not be outlined.
+  // This is to avoid splitting ADRP/(LDR/ADD/etc.) pair into different
+  // functions which can lead to linker ICF merging sections incorrectly.
+  if (MI.getOpcode() == AArch64::ADRP) {
+    bool IsPage = (MI.getOperand(1).getTargetFlags() & AArch64II::MO_PAGE) != 0;
+    bool IsGot = (MI.getOperand(1).getTargetFlags() & AArch64II::MO_GOT) != 0;
+    if (IsPage && IsGot)
+      return outliner::InstrType::Illegal;
+
+    // Special cases for instructions that can always be outlined, but will fail
+    // the later tests. e.g, ADRPs, which are PC-relative use LR, but can always
+    // be outlined because they don't require a *specific* value to be in LR.
     return outliner::InstrType::Legal;
+  }
+
+  // Similarly, any user of ADRP instruction referencing a GOT should not be
+  // outlined. It's hard/costly to check exact users of ADRP. So we use check
+  // all operands and reject any that's a page offset and references a GOT.
+  const auto &F = MI.getMF()->getFunction();
+  for (const auto &MO : MI.operands()) {
+    bool IsPageOff = (MO.getTargetFlags() & AArch64II::MO_PAGEOFF) != 0;
+    bool IsGot = (MO.getTargetFlags() & AArch64II::MO_GOT) != 0;
+    if (IsPageOff && IsGot &&
+        (MI.getMF()->getTarget().getFunctionSections() || F.hasComdat() ||
+         F.hasSection() || F.getSectionPrefix()))
+      return outliner::InstrType::Illegal;
+  }
 
   // If MI is a call we might be able to outline it. We don't want to outline
   // any calls that rely on the position of items on the stack. When we outline
diff --git a/llvm/test/CodeGen/AArch64/machine-outliner-adrp-got-split.mir b/llvm/test/CodeGen/AArch64/machine-outliner-adrp-got-split.mir
new file mode 100644
index 0000000000000..169835809d6ba
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/machine-outliner-adrp-got-split.mir
@@ -0,0 +1,130 @@
+# RUN: llc -mtriple=aarch64---  -run-pass=machine-outliner -verify-machineinstrs %s -o - | FileCheck %s
+--- |
+
+  @x = common global i32 0, align 4
+
+  define i32 @adrp_add() #0 {
+    ret i32 0
+  }
+
+  define i32 @adrp_ldr() #0 {
+    ret i32 0
+  }
+
+  define void @bar(i32 %a) #0 {
+    ret void
+  }
+
+  attributes #0 = { noinline noredzone }
+...
+---
+# This test ensures that we do not outline ADRP / ADD pair when it's referencing 
+# a GOT entry.
+#
+# CHECK-LABEL: name: adrp_add
+# CHECK-DAG: bb.0:
+# CHECK: $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+# CHECK: $x12 = ADDXri $x9, target-flags(aarch64-pageoff, aarch64-got) @x, 0
+
+# CHECK-DAG: bb.1
+# CHECK: $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+# CHECK: $x12 = ADDXri $x9, target-flags(aarch64-pageoff, aarch64-got) @x, 0
+
+# CHECK-DAG: bb.2
+# CHECK: $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+# CHECK: $x12 = ADDXri $x9, target-flags(aarch64-pageoff, aarch64-got) @x, 0
+name:            adrp_add
+tracksRegLiveness: true
+body:             |
+  bb.0:
+  liveins: $lr
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+    $x12 = ADDXri $x9, target-flags(aarch64-pageoff, aarch64-got) @x, 0
+    $lr = ORRXri $xzr, 1
+  bb.1:
+  liveins: $lr
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+    $x12 = ADDXri $x9, target-flags(aarch64-pageoff, aarch64-got) @x, 0
+    $lr = ORRXri $xzr, 1
+  bb.2:
+  liveins: $lr
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+    $x12 = ADDXri $x9, target-flags(aarch64-pageoff, aarch64-got) @x, 0
+    $lr = ORRXri $xzr, 1
+  bb.3:
+  liveins: $lr
+    RET undef $lr
+...
+---
+# This test ensures that we do not outline ADRP / LDR pair when it's referencing 
+# a GOT entry.
+#
+# CHECK-LABEL: name: adrp_ldr
+# CHECK-DAG: bb.0:
+# CHECK: $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+# CHECK: $x12 = LDRXui $x9, target-flags(aarch64-pageoff, aarch64-got) @x
+
+# CHECK-DAG: bb.1
+# CHECK: $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+# CHECK: $x12 = LDRXui $x9, target-flags(aarch64-pageoff, aarch64-got) @x
+
+# CHECK-DAG: bb.2
+# CHECK: $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+# CHECK: $x12 = LDRXui $x9, target-flags(aarch64-pageoff, aarch64-got) @x
+name:            adrp_ldr
+tracksRegLiveness: true
+body:             |
+  bb.0:
+  liveins: $lr
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+    $x12 = LDRXui $x9, target-flags(aarch64-pageoff, aarch64-got) @x
+    $lr = ORRXri $xzr, 1
+  bb.1:
+  liveins: $lr
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+    $x12 = LDRXui $x9, target-flags(aarch64-pageoff, aarch64-got) @x
+    $lr = ORRXri $xzr, 1
+  bb.2:
+  liveins: $lr
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $w12 = ORRWri $wzr, 1
+    $x9 = ADRP target-flags(aarch64-page, aarch64-got) @x
+    $x12 = LDRXui $x9, target-flags(aarch64-pageoff, aarch64-got) @x
+    $lr = ORRXri $xzr, 1
+  bb.3:
+  liveins: $lr
+    RET undef $lr
\ No newline at end of file

smithp35

Thanks for having a go at this. I'm not an expert on the outliner; ideally we can get someone who is as a reviewer.

I've made some suggestions based on what I know of the AArch64 instruction set.

Please could you add a full description (for the commit message)? It will be really useful to see that in place from git log on a terminal.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp

davemgreen

Hi. The idea sounds OK to me. Would it be valid to outline so long as we always outlined both instructions (adrp+add, adrp+ldr) together? We fuse the adrp+add so they should generally be scheduled next to one another.

smithp35 · 2025-09-25T12:15:57Z

> Hi. The idea sounds OK to me. Would it be valid to outline so long as we always outlined both instructions (adrp+add, adrp+ldr) together? We fuse the adrp+add so they should generally be scheduled next to one another.

Yes, as long as the adrp+add or adrp+ldr pair is in the same section, we don't get the problem, as ICF will remove both or none of them. The comment in #129122 (comment) has a good summary of the chain of events that leads to the problem.

lenary · 2025-10-01T06:51:55Z

I have been wondering on the RISC-V side if we can add back our ADRP-like instruction to the outlined segment if we find we have only outlined the instruction with the lo operand. This would potentially add overhead to sequences where only the lo instruction has been outlined, and might cause redundant ADRP-likes, but I think it might allow more outlining? I think this should be doable in buildOutlinedFrame, but additional work would also have to be done in getOutliningCandidateInfo to correctly set the FrameOverhead.

I don't know how this approach would cope if the outliner tries to only outline the ADRP-like, rather than the lo instruction. You'd presumably want to replicate the ADRPs from inside the outlined part to immediately after the call to the outlined sequence? That sounds like it might be too much overhead, but the cost model might catch that.
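To make the overhead trade-off above concrete, here is a minimal sketch of a size-based benefit comparison. It is an illustrative model under stated assumptions, not the outliner's actual cost code: the struct, both function names, and the abstract size units are hypothetical, and the only idea taken from the comment is that re-adding the ADRP-like instruction to the outlined function would increase its frame overhead.

```cpp
#include <cassert>

// Hypothetical, simplified cost inputs in abstract size units.
struct OutlineCost {
  unsigned occurrences;   // how many times the repeated sequence appears
  unsigned sequenceSize;  // size of one copy of the sequence
  unsigned callOverhead;  // size added at each call site
  unsigned frameOverhead; // extra size inside the outlined function
};

// Size saved by outlining, or 0 if outlining would not shrink the code.
unsigned benefit(const OutlineCost &c) {
  assert(c.occurrences >= 2 && "outlining needs a repeated sequence");
  unsigned notOutlined = c.occurrences * c.sequenceSize;
  unsigned outlined =
      c.occurrences * c.callOverhead + c.sequenceSize + c.frameOverhead;
  return notOutlined > outlined ? notOutlined - outlined : 0;
}

// Assumption: re-materializing the page instruction inside the outlined
// function costs one extra instruction, charged to the frame overhead.
OutlineCost withReaddedPageInstr(OutlineCost c, unsigned pageInstrSize) {
  c.frameOverhead += pageInstrSize;
  return c;
}
```

In this model the extra instruction lowers the benefit but does not necessarily remove it, which is the sense in which a cost model could decide case by case whether the re-added instruction is worth it.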

dtellenbach · 2025-10-04T21:18:35Z

I'm wondering if it would make sense to do this in getOutliningCandidateInfo and just remove the candidates that split up an adrp, add, ldr sequence. getOutliningTypeImpl is better suited for preventing individual instructions from being considered as part of candidates, but doesn't seem a perfect fit if you want to allow the whole sequence to be outlined but not parts of it.

For detecting that a sequence has been split, you can do what @smithp35 mentioned earlier.

If an outline candidate contains
add x0, x0, :got_lo12:sym or ldr x0, [x0, :got_lo12:sym] but no preceding adrp x0, :got:sym, then we know that an adrp, add or adrp, ldr sequence has been split up, as add x0, x0, :got_lo12:sym and ldr x0, [x0, :got_lo12:sym] don't make any sense on their own.
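The detection rule above can be sketched as a standalone check. This is a minimal illustration under assumptions, not the code that was merged: the Kind enum, the Instr struct, and hasOrphanedGotUser are hypothetical stand-ins for inspecting MachineInstr operands and their target flags, and pairs are matched by symbol name only.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified stand-in for a machine instruction. Real LLVM code
// would inspect MachineInstr operands and their target flags instead.
enum class Kind {
  AdrpGotPage,    // adrp xN, :got:sym
  GotPageOffUser, // add/ldr using :got_lo12:sym
  Other
};

struct Instr {
  Kind kind;
  std::string symbol; // referenced GOT symbol; empty for Kind::Other
};

// Returns true if the candidate contains a :got_lo12: user with no earlier
// ADRP for the same symbol inside the candidate, i.e. the pair was split.
// Simplification: pairs are matched by symbol name only, not by register.
bool hasOrphanedGotUser(const std::vector<Instr> &candidate) {
  std::unordered_set<std::string> pagesSeen;
  for (const Instr &instr : candidate) {
    if (instr.kind == Kind::Other)
      continue;
    assert(!instr.symbol.empty() && "GOT references must name a symbol");
    if (instr.kind == Kind::AdrpGotPage)
      pagesSeen.insert(instr.symbol);
    else if (pagesSeen.count(instr.symbol) == 0)
      return true; // user without a preceding ADRP in this candidate
  }
  return false;
}
```

A candidate for which this returns true would be dropped from the candidate list. The sketch deliberately covers only the case described here; an ADRP whose user falls outside the candidate is not detected and would need a separate check.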

pranavk · 2025-10-09T16:58:43Z

> I'm wondering if it would make sense to do this in getOutliningCandidateInfo and just remove the candidates that split up an adrp, add, ldr sequence. getOutliningTypeImpl is better suited for preventing individual instructions from being considered as part of candidates, but doesn't seem a perfect fit if you want to allow the whole sequence to be outlined but not parts of it.

getOutliningCandidateInfo certainly sounds like a better place to do what @smithp35 suggested above. This is under the assumption that LDR/ADD always follows ADRP, which should generally be the case, as suggested above. @davemgreen I am curious under what conditions they won't be scheduled together. I'd like to avoid the correctness issue exposed by linker ICF, so it may be worth handling that case as well if possible.

github-actions · 2025-10-09T23:25:57Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvm/test/CodeGen/AArch64/machine-outliner-adrp-got-split.mir

dtellenbach

LGTM, thanks!

[MachineOutliner] Don't outline ADRP pair to avoid incorrect ICF

b4216f1

Fixes llvm#131660. Earlier attempts to fix this in the linker were not accepted. The current attempt is pending at llvm#139493.

pranavk force-pushed the outline branch from e774618 to b4216f1 September 23, 2025 22:07

llvm deleted a comment from github-actions bot Sep 23, 2025

pranavk marked this pull request as ready for review September 23, 2025 22:07

llvmbot added the backend:AArch64 label Sep 23, 2025

pranavk requested review from MaskRay, rnk and smithp35 September 23, 2025 22:08

fhahn requested review from aemerson and ornata September 24, 2025 09:25

smithp35 reviewed Sep 24, 2025

davemgreen reviewed Sep 24, 2025

dtellenbach self-requested a review October 4, 2025 21:18

Allow outlining ADRP pair as a whole

48d7e62

pranavk added 2 commits October 9, 2025 16:29

restore accidently deleted snippet

eed2542

clang-format

f5f9f16

dtellenbach reviewed Oct 10, 2025

dtellenbach approved these changes Nov 7, 2025

pranavk merged commit 2f54efd into llvm:main Nov 10, 2025
10 checks passed
