Skip to content

Conversation

@yozhu
Copy link
Contributor

@yozhu yozhu commented Apr 15, 2025

Some functions have their sizes as zero in input binary's symbol table, like
those compiled by assembler. When figuring out function sizes, we may create
label symbol if it doesn't point to any constant island. However, before
function size is known, marker symbol cannot be correctly associated to a
function and therefore all such checks would fail and we could end up adding
a code label pointing to constant island as secondary entry point and later
mistakenly marking the function as not simple.

Querying the global marker symbol array has big throughput overhead. Instead
we can run an extra check when post processing entry points to identify such
label symbols that actually point to constant islands.

Some functions have their sizes as zero in input binary's symbol table, like
those compiled by assembler. When figuring out function sizes, we may create
label symbol if it doesn't point to any constant island. However, before
function size is known, marker symbol cannot be correctly associated to a
function and therefore all such checks would fail and we could end up adding
a code label pointing to constant island as secondary entry point and later
mistakenly marking the function as not simple.

Querying the global marker symbol array has big throughput overhead. Instead
we can run an extra check when post processing entry points to identify such
label symbols that actually point to constant islands.
@llvmbot
Copy link
Member

llvmbot commented Apr 15, 2025

@llvm/pr-subscribers-bolt

Author: YongKang Zhu (yozhu)

Changes

Some functions have their sizes as zero in input binary's symbol table, like
those compiled by assembler. When figuring out function sizes, we may create
label symbol if it doesn't point to any constant island. However, before
function size is known, marker symbol cannot be correctly associated to a
function and therefore all such checks would fail and we could end up adding
a code label pointing to constant island as secondary entry point and later
mistakenly marking the function as not simple.

Querying the global marker symbol array has big throughput overhead. Instead
we can run an extra check when post processing entry points to identify such
label symbols that actually point to constant islands.


Full diff: https://github.com/llvm/llvm-project/pull/135731.diff

4 Files Affected:

  • (modified) bolt/include/bolt/Core/BinaryFunction.h (+9)
  • (modified) bolt/lib/Core/BinaryFunction.cpp (+9)
  • (added) bolt/test/AArch64/validate-secondary-entry-point.s (+34)
  • (added) bolt/test/RISCV/validate-secondary-entry-point.s (+34)
diff --git a/bolt/include/bolt/Core/BinaryFunction.h b/bolt/include/bolt/Core/BinaryFunction.h
index d3d11f8c5fb73..a52998564ee1b 100644
--- a/bolt/include/bolt/Core/BinaryFunction.h
+++ b/bolt/include/bolt/Core/BinaryFunction.h
@@ -1174,6 +1174,11 @@ class BinaryFunction {
     return getSecondaryEntryPointSymbol(BB.getLabel());
   }
 
+  /// Remove a label from the secondary entry point map.
+  void removeSymbolFromSecondaryEntryPointMap(const MCSymbol *Label) {
+    SecondaryEntryPoints.erase(Label);
+  }
+
   /// Return true if the basic block is an entry point into the function
   /// (either primary or secondary).
   bool isEntryPoint(const BinaryBasicBlock &BB) const {
@@ -2126,6 +2131,10 @@ class BinaryFunction {
     return Islands && !Islands->DataOffsets.empty();
   }
 
+  bool isStartOfConstantIsland(uint64_t Offset) const {
+    return hasConstantIsland() && Islands->DataOffsets.count(Offset);
+  }
+
   /// Return true iff the symbol could be seen inside this function otherwise
   /// it is probably another function.
   bool isSymbolValidInScope(const SymbolRef &Symbol, uint64_t SymbolSize) const;
diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp
index c4f4d234b30c0..184a4462b356a 100644
--- a/bolt/lib/Core/BinaryFunction.cpp
+++ b/bolt/lib/Core/BinaryFunction.cpp
@@ -1896,6 +1896,15 @@ void BinaryFunction::postProcessEntryPoints() {
     if (BC.isAArch64() && Offset == getSize())
       continue;
 
+    // If we have grabbed a wrong code label which actually points to some
+    // constant island inside the function, ignore this label and remove it
+    // from the secondary entry point map.
+    if (isStartOfConstantIsland(Offset)) {
+      BC.SymbolToFunctionMap.erase(Label);
+      removeSymbolFromSecondaryEntryPointMap(Label);
+      continue;
+    }
+
     BC.errs() << "BOLT-WARNING: reference in the middle of instruction "
                  "detected in function "
               << *this << " at offset 0x" << Twine::utohexstr(Offset) << '\n';
diff --git a/bolt/test/AArch64/validate-secondary-entry-point.s b/bolt/test/AArch64/validate-secondary-entry-point.s
new file mode 100644
index 0000000000000..0099a0ee4fe99
--- /dev/null
+++ b/bolt/test/AArch64/validate-secondary-entry-point.s
@@ -0,0 +1,34 @@
+# This test is to verify that BOLT won't take a label pointing to constant
+# island as a secondary entry point (function `_start` doesn't have ELF size
+# set originally) and the function won't otherwise be mistaken as non-simple.
+
+# RUN: %clang %cflags -pie %s -o %t.so -Wl,-q -Wl,--init=_foo -Wl,--fini=_foo
+# RUN: llvm-bolt %t.so -o %t.bolt.so --print-cfg 2>&1 | FileCheck %s
+# CHECK-NOT: BOLT-WARNING: reference in the middle of instruction detected \
+# CHECK-NOT:   function _start at offset 0x{{[0-9a-f]+}}
+# CHECK: Binary Function "_start" after building cfg
+
+  .text
+
+  .global _foo
+  .type _foo, %function
+_foo:
+  ret
+
+  .global _start
+  .type _start, %function
+_start:
+  b _foo
+
+  .balign 16
+_random_consts:
+  .long 0x12345678
+  .long 0x90abcdef
+
+  .global _bar
+  .type _bar, %function
+_bar:
+  ret
+
+  # Dummy relocation to force relocation mode
+  .reloc 0, R_AARCH64_NONE
diff --git a/bolt/test/RISCV/validate-secondary-entry-point.s b/bolt/test/RISCV/validate-secondary-entry-point.s
new file mode 100644
index 0000000000000..0c29f5c97c689
--- /dev/null
+++ b/bolt/test/RISCV/validate-secondary-entry-point.s
@@ -0,0 +1,34 @@
+# This test is to verify that BOLT won't take a label pointing to constant
+# island as a secondary entry point (function `_start` doesn't have ELF size
+# set originally) and the function won't otherwise be mistaken as non-simple.
+
+# RUN: %clang %cflags -pie %s -o %t.so -Wl,-q -Wl,--init=_foo -Wl,--fini=_foo
+# RUN: llvm-bolt %t.so -o %t.bolt.so --print-cfg 2>&1 | FileCheck %s
+# CHECK-NOT: BOLT-WARNING: reference in the middle of instruction detected \
+# CHECK-NOT:   function _start at offset 0x{{[0-9a-f]+}}
+# CHECK: Binary Function "_start" after building cfg
+
+  .text
+
+  .global _foo
+  .type _foo, %function
+_foo:
+  ret
+
+  .global _start
+  .type _start, %function
+_start:
+  j _foo
+
+  .balign 16
+_random_consts:
+  .long 0x12345678
+  .long 0x90abcdef
+
+  .global _bar
+  .type _bar, %function
+_bar:
+  ret
+
+  # Dummy relocation to force relocation mode
+  .reloc 0, R_RISCV_NONE

Copy link
Contributor

@maksfb maksfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@yozhu yozhu merged commit 823adc7 into llvm:main Apr 15, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants