Skip to content

Conversation

@peterwaller-arm
Copy link
Contributor

@peterwaller-arm peterwaller-arm commented Nov 26, 2024

_init is used during startup of binaires. Unfortunately, its
address can be shared (at least on AArch64 glibc static binaries) with a data
reference that lives in the GOT. The GOT rewriting is currently unable
to distinguish between data addresses and function addresses. This leads
to the data address being incorrectly rewritten, causing a crash on
startup of the binary:

Unexpected reloc type in static binary.

To avoid this, don't consider _init for being moved, by skipping it.

We could add further conditions to narrow the skipped case for known
crashes, but as a straw man I thought it'd be best to keep the condition
as simple as possible and see if there any objections to this.

(Edit: this broke the test bolt/test/runtime/X86/retpoline-synthetic.test,
because _init was skipped from the retpoline pass and it has an indirect
call in it, so I include a check for static binaries now, which avoids the test failure,
but perhaps this could/should be narrowed further?)

For now, skip _init for static binaries on any architecture; we could
add further conditions to narrow the skipped case for known crashes, but
as a straw man I thought it'd be best to keep the condition as simple as
possible and see if there any objections to this.

Updates #100096.

@github-actions
Copy link

github-actions bot commented Nov 26, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@llvmbot
Copy link
Member

llvmbot commented Nov 27, 2024

@llvm/pr-subscribers-bolt

Author: Peter Waller (peterwaller-arm)

Changes

_init is used during startup of binaires. Unfortunately, its
address can be shared (at least on AArch64 glibc static binaries) with a data
reference that lives in the GOT. The GOT rewriting is currently unable
to distinguish between data addresses and function addresses. This leads
to the data address being incorrectly rewritten, causing a crash on
startup of the binary:

Unexpected reloc type in static binary.

To avoid this, don't consider _init for being moved, by skipping it.

We could add further conditions to narrow the skipped case for known
crashes, but as a straw man I thought it'd be best to keep the condition
as simple as possible and see if there any objections to this.

(Edit: this broke the test bolt/test/runtime/X86/retpoline-synthetic.test,
because _init was skipped from the retpoline pass and it has an indirect
call in it, so I include a check for static binaries now, which avoids the test failure,
but perhaps this could/should be narrowed further?)

For now, skip _init for static binaries on any architecture; we could
add further conditions to narrow the skipped case for known crashes, but
as a straw man I thought it'd be best to keep the condition as simple as
possible and see if there any objections to this.

Updates #100096.


Full diff: https://github.com/llvm/llvm-project/pull/117751.diff

2 Files Affected:

  • (modified) bolt/lib/Rewrite/RewriteInstance.cpp (+21-1)
  • (added) bolt/test/AArch64/check-init-not-moved.s (+43)
diff --git a/bolt/lib/Rewrite/RewriteInstance.cpp b/bolt/lib/Rewrite/RewriteInstance.cpp
index 7059a3dd231099..078e4a44ddd13e 100644
--- a/bolt/lib/Rewrite/RewriteInstance.cpp
+++ b/bolt/lib/Rewrite/RewriteInstance.cpp
@@ -2927,6 +2927,24 @@ void RewriteInstance::handleRelocation(const SectionRef &RelocatedSection,
     LLVM_DEBUG(dbgs() << "BOLT-DEBUG: ignoring relocation from data to data\n");
 }
 
+static bool isStaticInitFunction(const BinaryContext &BC,
+                                 const BinaryFunction &Function) {
+  // Workaround for https://github.com/llvm/llvm-project/issues/100096
+  // ("[BOLT] GOT array pointer incorrectly rewritten"). In aarch64
+  // static glibc binaries, the .init section's _init function pointer can
+  // alias with a data pointer for the end of an array. GOT rewriting
+  // currently can't detect this and updates the data pointer to the
+  // moved _init, causing a runtime crash. Skipping _init on the other
+  // hand should be harmless.
+  if (!BC.IsStaticExecutable)
+    return false;
+  bool ShouldSkip =
+      Function.hasName("_init") && Function.getOriginSectionName() == ".init";
+  if (ShouldSkip)
+    LLVM_DEBUG(dbgs() << "BOLT-DEBUG: skip _init in for GOT workaround.\n");
+  return ShouldSkip;
+}
+
 void RewriteInstance::selectFunctionsToProcess() {
   // Extend the list of functions to process or skip from a file.
   auto populateFunctionNames = [](cl::opt<std::string> &FunctionNamesFile,
@@ -3061,7 +3079,9 @@ void RewriteInstance::selectFunctionsToProcess() {
     if (Function.isFragment())
       continue;
 
-    if (!shouldProcess(Function)) {
+    if (isStaticInitFunction(*BC, Function)) {
+      Function.setIgnored();
+    } else if (!shouldProcess(Function)) {
       if (opts::Verbosity >= 1) {
         BC->outs() << "BOLT-INFO: skipping processing " << Function
                    << " per user request\n";
diff --git a/bolt/test/AArch64/check-init-not-moved.s b/bolt/test/AArch64/check-init-not-moved.s
new file mode 100644
index 00000000000000..ad4b80d2e60e23
--- /dev/null
+++ b/bolt/test/AArch64/check-init-not-moved.s
@@ -0,0 +1,43 @@
+# Regression test for https://github.com/llvm/llvm-project/issues/100096
+# static glibc binaries crash on startup because _init is moved and
+# shares its address with an array end pointer. The GOT rewriting can't
+# tell the two pointers apart and incorrectly updates the _array_end
+# address. Test checks that _init is not moved.
+
+# RUN: llvm-mc -filetype=obj -triple aarch64-unknown-unknown %s -o %t.o
+# RUN: %clang %cflags %t.o -o %t.exe -Wl,-q -static -Wl,--section-start=.data=0x1000 -Wl,--section-start=.init=0x1004
+# RUN: llvm-bolt %t.exe -o %t.bolt
+# RUN: llvm-nm %t.exe | FileCheck --check-prefix=CHECK-ORIGINAL %s
+# RUN: llvm-nm %t.bolt | FileCheck --check-prefix=CHECK-BOLTED %s
+
+.section .data
+.globl _array_end
+_array_start:
+    .word 0x0
+
+_array_end:
+.section .init,"ax",@progbits
+.globl _init
+
+# Check that bolt doesn't move _init.
+#
+# CHECK-ORIGINAL: 0000000000001004 T _init
+# CHECK-BOLTED:   0000000000001004 T _init
+_init:
+    ret
+
+.section .text,"ax",@progbits
+.globl _start
+
+# Check that bolt is moving some other functions.
+#
+# CHECK-ORIGINAL:   0000000000001008 T _start
+# CHECK-BOLTED-NOT: 0000000000001008 T _start
+_start:
+    bl _init
+    adrp x0, #:got:_array_end
+    ldr x0, [x0, #:gotpage_lo15:_array_end]
+    adrp x0, #:got:_init
+    ldr x0, [x0, #:gotpage_lo15:_init]
+    ret
+

Copy link
Contributor

@aaupov aaupov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for fixing this. Please address a small comment.

_init is used during startup of binaires. Unfortunately, its address can
be shared (at least on AArch64 glibc static binaries) with a data reference
that lives in the GOT. The GOT rewriting is currently unable to
distinguish between data addresses and function addresses. This leads to
the data address being incorrectly rewritten, causing a crash on startup
of the binary:

  Unexpected reloc type in static binary.

To avoid this, don't consider _init for being moved, by skipping it.

For now, skip _init for static binaries on any architecture; we could
add further conditions to narrow the skipped case for known crashes, but
as a straw man I thought it'd be best to keep the condition as simple as
possible and see if there any objections to this.

Updates llvm#100096.
@peterwaller-arm peterwaller-arm merged commit b5ed375 into llvm:main Nov 28, 2024
7 checks passed
@peterwaller-arm peterwaller-arm deleted the skip-init branch November 28, 2024 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants