@maksfb maksfb commented Jul 29, 2025

Code written in assembly can have missing code markers. In BOLT, we can compensate by recognizing that a function entry point should start a code sequence.

Seen such code in the LuaJIT library.
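
The idea in the description can be sketched outside of BOLT as follows. This is a simplified model, not BOLT's implementation: the `Symbol`, `Marker`, `classifyMappingSymbol`, and `collectMarkers` names are hypothetical, and the only facts taken from the AArch64 ELF convention are that `$x` marks the start of code and `$d` the start of data. A function symbol is always treated as a code marker, and a warning is recorded when it appears while the scan is still inside a data region.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for BOLT's internal types.
enum class MarkerKind { None, Code, Data };
enum class SymbolKind { Function, Object, Other };

struct Symbol {
  std::string Name;
  uint64_t Address;
  SymbolKind Kind;
};

struct Marker {
  uint64_t Address;
  MarkerKind Kind;
};

// AArch64 mapping symbols: "$x" starts code, "$d" starts data.
// Suffixed forms such as "$x.1" are accepted as well.
MarkerKind classifyMappingSymbol(const std::string &Name) {
  auto Matches = [&](const char *Prefix) {
    return Name.rfind(Prefix, 0) == 0 &&
           (Name.size() == 2 || Name[2] == '.');
  };
  if (Matches("$x"))
    return MarkerKind::Code;
  if (Matches("$d"))
    return MarkerKind::Data;
  return MarkerKind::None;
}

// Symbols must be sorted by address. Names of function symbols that were
// found inside a data region are appended to Warnings.
std::vector<Marker> collectMarkers(const std::vector<Symbol> &Sorted,
                                   std::vector<std::string> &Warnings) {
  std::vector<Marker> Markers;
  bool InData = false;
  for (const Symbol &S : Sorted) {
    assert((Markers.empty() || Markers.back().Address <= S.Address) &&
           "symbols must be sorted by address");
    MarkerKind Kind = classifyMappingSymbol(S.Name);
    if (S.Kind == SymbolKind::Function) {
      // A function entry point has to start a code sequence.
      if (InData)
        Warnings.push_back(S.Name);
      Kind = MarkerKind::Code;
    }
    if (Kind == MarkerKind::None)
      continue;
    Markers.push_back({S.Address, Kind});
    InData = (Kind == MarkerKind::Data);
  }
  return Markers;
}
```

Fed a symbol list shaped like the added test (a `$d` at offset 0, `foo` at 4 without a `$x`, then `$x` and `_start` at 8), this model records a single warning for `foo` and places a code marker at its address.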

llvmbot commented Jul 29, 2025

@llvm/pr-subscribers-bolt

Author: Maksim Panchenko (maksfb)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/151060.diff

2 Files Affected:

  • (modified) bolt/lib/Rewrite/RewriteInstance.cpp (+14)
  • (added) bolt/test/AArch64/missing-code-marker.s (+26)
diff --git a/bolt/lib/Rewrite/RewriteInstance.cpp b/bolt/lib/Rewrite/RewriteInstance.cpp
index 9f243a1366928..fe4a23cc01382 100644
--- a/bolt/lib/Rewrite/RewriteInstance.cpp
+++ b/bolt/lib/Rewrite/RewriteInstance.cpp
@@ -896,6 +896,20 @@ void RewriteInstance::discoverFileObjects() {
         continue;
 
       MarkerSymType MarkerType = BC->getMarkerType(SymInfo.Symbol);
+
+      // Treat ST_Function as code.
+      Expected<object::SymbolRef::Type> TypeOrError = SymInfo.Symbol.getType();
+      consumeError(TypeOrError.takeError());
+      if (TypeOrError && *TypeOrError == SymbolRef::ST_Function) {
+        if (IsData) {
+          Expected<StringRef> NameOrError = SymInfo.Symbol.getName();
+          consumeError(NameOrError.takeError());
+          BC->errs() << "BOLT-WARNING: function symbol " << *NameOrError
+                     << " lacks code marker\n";
+        }
+        MarkerType = MarkerSymType::CODE;
+      }
+
       if (MarkerType != MarkerSymType::NONE) {
         SortedMarkerSymbols.push_back(MarkerSym{SymInfo.Address, MarkerType});
         LastAddr = SymInfo.Address;
diff --git a/bolt/test/AArch64/missing-code-marker.s b/bolt/test/AArch64/missing-code-marker.s
new file mode 100644
index 0000000000000..591c9abd34c23
--- /dev/null
+++ b/bolt/test/AArch64/missing-code-marker.s
@@ -0,0 +1,26 @@
+## Check that llvm-bolt is able to recover a missing code marker.
+
+# RUN: %clang %cflags %s -o %t.exe -nostdlib -fuse-ld=lld -Wl,-q
+# RUN: llvm-bolt %t.exe -o %t.bolt 2>&1 | FileCheck %s
+
+# CHECK: BOLT-WARNING: function symbol foo lacks code marker
+
+.text
+.balign 4
+
+.word 0
+
+## Function foo starts immediately after a data object and does not have
+## a matching "$x" symbol to indicate the start of code.
+.global foo
+.type foo, %function
+foo:
+  .word 0xd65f03c0
+.size foo, .-foo
+
+.global _start
+.type _start, %function
+_start:
+  bl foo
+  ret
+.size _start, .-_start
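
A note on the test above: the body of `foo` is emitted with `.word`, so the assembler sees data and produces no `$x` symbol, yet the word itself is a valid instruction. The value `0xd65f03c0` is the A64 encoding of `ret` (return via `x30`). A minimal sketch of that arithmetic, with the helper names `encodeRet` and `isRet` being illustrative and not part of BOLT or LLVM:

```cpp
#include <cstdint>

// Base opcode of the A64 "RET Xn" instruction with the Rn field cleared.
constexpr uint32_t RetBase = 0xd65f0000;

// Encodes "ret xN"; Rn occupies bits [9:5] of the instruction word.
constexpr uint32_t encodeRet(unsigned Rn = 30) {
  return RetBase | ((Rn & 0x1f) << 5);
}

// True if Word is any "ret xN" encoding.
constexpr bool isRet(uint32_t Word) {
  return (Word & ~(0x1fu << 5)) == RetBase;
}

static_assert(encodeRet() == 0xd65f03c0, "plain `ret` returns via x30");
static_assert(isRet(0xd65f03c0), "the .word in foo is a RET");
static_assert(!isRet(0), "the preceding `.word 0` is not");
```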

@maksfb maksfb requested a review from yozhu July 29, 2025 03:40
@paschalis-mpeis paschalis-mpeis left a comment

Thanks Maksim. We've encountered this issue a few times with some JITs.
It looks good, and I'm happy with the warning emitted.

BTW, before applying your patch, I thought BOLT would present an error when encountering code that was missing a marker, but that isn't the case with the added test?

AFAIU, the error is ultimately on the source code side (the JIT's codegen). So I am wondering whether colleagues would prefer to guard this behaviour behind a flag as a way of encouraging fixes to those projects. I'll follow up if that's the case.

maksfb commented Jul 29, 2025

Thanks for the reviews.

> BTW, before applying your patch, I thought BOLT would present an error when encountering code that was missing a marker, but that isn't the case with the added test?

I haven't seen an error regarding the missing marker. In my case, the error came later from JITLink, after a chain of events caused by treating code as data.

> AFAIU, the error is ultimately on the source code side (the JIT's codegen). So I am wondering whether colleagues would prefer to guard this behaviour behind a flag as a way of encouraging fixes to those projects. I'll follow up if that's the case.

LuaJIT has its own assembler, which is most likely the cause of the missing marker. On the BOLT side, if we refactor the code that prints user messages, we could add something like a -Werror option.

@maksfb maksfb merged commit 1e0edb0 into llvm:main Jul 29, 2025
11 checks passed