@bgergely0
Contributor

In the Inliner pass, tail calls in the basic block being inlined must be
converted to normal calls.
These tail calls can be indirect (BR). Converting one produces an indirect call
(BLR), so the BTI at its target would have to be updated to preserve correctness.

Because the targets of indirect tail calls are not known, we skip inlining
callees that contain them.
@bgergely0 marked this pull request as ready for review November 17, 2025 16:56
@llvmbot added the BOLT label Nov 17, 2025
@llvmbot
Member

llvmbot commented Nov 17, 2025

@llvm/pr-subscribers-bolt

Author: Gergely Bálint (bgergely0)

Changes

In the Inliner pass, tail calls in the basic block being inlined must be
converted to normal calls.
These tail calls can be indirect (BR). Converting one produces an indirect call
(BLR), so the BTI at its target would have to be updated to preserve correctness.

Because the targets of indirect tail calls are not known, we skip inlining
callees that contain them.


Full diff: https://github.com/llvm/llvm-project/pull/168403.diff

2 Files Affected:

  • (modified) bolt/lib/Passes/Inliner.cpp (+23)
  • (added) bolt/test/AArch64/inline-bti.s (+38)
diff --git a/bolt/lib/Passes/Inliner.cpp b/bolt/lib/Passes/Inliner.cpp
index 9b28c7efde5bf..900b787e3e106 100644
--- a/bolt/lib/Passes/Inliner.cpp
+++ b/bolt/lib/Passes/Inliner.cpp
@@ -472,6 +472,29 @@ bool Inliner::inlineCallsInFunction(BinaryFunction &Function) {
         }
       }
 
+      // AArch64 BTI:
+      // If the callee has an indirect tailcall (BR), we would transform it to
+      // an indirect call (BLR) in InlineCall. Because of this, we would have to
+      // update the BTI at the target of the tailcall. However, these targets
+      // are not known. Instead, we skip inlining blocks with indirect
+      // tailcalls.
+      auto HasIndirectTailCall = [&](const BinaryFunction &BF) -> bool {
+        for (const auto &BB : BF) {
+          for (const auto &II : BB) {
+            if (BC.MIB->isIndirectBranch(II) && BC.MIB->isTailCall(II)) {
+              return true;
+            }
+          }
+        }
+        return false;
+      };
+
+      if (BC.isAArch64() && BC.usesBTI() &&
+          HasIndirectTailCall(*TargetFunction)) {
+        ++InstIt;
+        continue;
+      }
+
       LLVM_DEBUG(dbgs() << "BOLT-DEBUG: inlining call to " << *TargetFunction
                         << " in " << Function << " : " << BB->getName()
                         << ". Count: " << BB->getKnownExecutionCount()
diff --git a/bolt/test/AArch64/inline-bti.s b/bolt/test/AArch64/inline-bti.s
new file mode 100644
index 0000000000000..62f6ea6f4b63a
--- /dev/null
+++ b/bolt/test/AArch64/inline-bti.s
@@ -0,0 +1,38 @@
+## This test checks that for AArch64 binaries with BTI, we do not inline blocks with indirect tailcalls.
+
+# REQUIRES: system-linux
+
+# RUN: llvm-mc -filetype=obj -triple aarch64-unknown-unknown %s -o %t.o
+# RUN: %clang %cflags -O0 %t.o -o %t.exe -Wl,-q -Wl,-z,force-bti
+# RUN: llvm-bolt --inline-all %t.exe -o %t.bolt  | FileCheck %s
+
+# For BTI, we should not inline foo.
+# CHECK-NOT: BOLT-INFO: inlined {{[0-9]+}} calls at {{[0-9]+}} call sites in {{[0-9]+}} iteration(s). Change in binary size: {{[0-9]+}} bytes.
+
+	.text
+	.globl	_Z3fooP1A
+	.type	_Z3fooP1A,@function
+_Z3fooP1A:
+	ldr	x8, [x0]
+	ldr	w0, [x8]
+	br x30
+	.size	_Z3fooP1A, .-_Z3fooP1A
+
+	.globl	_Z3barP1A
+	.type	_Z3barP1A,@function
+_Z3barP1A:
+	stp	x29, x30, [sp, #-16]!
+	mov	x29, sp
+	bl	_Z3fooP1A
+	mul	w0, w0, w0
+	ldp	x29, x30, [sp], #16
+	ret
+	.size	_Z3barP1A, .-_Z3barP1A
+
+	.globl	main
+	.p2align	2
+	.type	main,@function
+main:
+	mov	w0, wzr
+	ret
+	.size	main, .-main

@bgergely0
Contributor Author

A possible extra check would be whether the tail call uses the x16 or x17 register. If so, the target likely has a BTI c, which accepts both indirect calls (BLR) and BR x16/x17.
However, a BR x16/x17 is also accepted by a BTI j, which does not accept BLR. Because of this, I chose to be conservative and skip inlining in this case as well.
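
To make the reasoning above concrete, here is a minimal sketch of the BTI acceptance rules as a small lookup. It is a simplified model, not BOLT code: the enums and the functions `accepts` and `survivesTailCallToCallRewrite` are hypothetical names introduced only for illustration. It assumes the target sits on a guarded page and ignores PACIASP/PACIBSP, which can also act as landing pads.

```cpp
// Simplified model of AArch64 BTI landing-pad checks on a guarded page.
// All names here are illustrative; none of them are BOLT or LLVM APIs.

// How control arrives at the target.
enum class BranchKind {
  Blr,      // indirect call (BLR Xn)
  BrX16X17, // indirect branch through x16 or x17
  BrOther,  // indirect branch through any other register
};

// First instruction at the target.
enum class LandingPad { None, BtiC, BtiJ, BtiJC };

// Returns true if arriving at Pad via Branch does not raise a BTI fault.
constexpr bool accepts(LandingPad Pad, BranchKind Branch) {
  switch (Pad) {
  case LandingPad::BtiC:
    return Branch == BranchKind::Blr || Branch == BranchKind::BrX16X17;
  case LandingPad::BtiJ:
    return Branch == BranchKind::BrX16X17 || Branch == BranchKind::BrOther;
  case LandingPad::BtiJC:
    return true;
  case LandingPad::None:
    return false;
  }
  return false;
}

// Inlining turns an indirect tail call (BR) into an indirect call (BLR).
// The rewrite is only safe if the unchanged landing pad also accepts BLR.
constexpr bool survivesTailCallToCallRewrite(LandingPad Pad) {
  return accepts(Pad, BranchKind::Blr);
}
```

In this model a `BR x16` is accepted by both `BtiC` and `BtiJ`, but only `BtiC` still passes after the branch becomes a `BLR`. Since the register alone cannot tell the two landing pads apart, a check based on x16/x17 would not be sufficient.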

@bgergely0
Contributor Author

@tamaspetz FYI
