Skip to content

Conversation

@yavtuk
Copy link
Contributor

@yavtuk yavtuk commented May 29, 2025

Indirect call instrumentation snippet uses x16 register in exit handler to go to destination target

__bolt_instr_ind_call_handler_func:
        msr  nzcv, x1
        ldp  x0, x1, [sp], #16
        ldr  x16, [sp], #16
        ldp  x0, x1, [sp], #16
        br   x16	<----- go to indirect target

Depends on compiler x16 register used to store something in cross function calling. This patch adds the instrumentation snippet by calling instrumentation runtime library through indirect call instruction and adding the wrapper to store/load target value and the register for original indirect instruction without touching registers except x0, x1

Example:
mov x16, foo

infirectCall:
        adrp x8, Label
        add  x8, x8, #:lo12:Lable
        blr x8

Before:

Instrumented indirect call:
        stp     x0, x1, [sp, #-16]!
        mov     x0, x8
        movk    x1, #0x0, lsl #48
        movk    x1, #0x0, lsl #32
        movk    x1, #0x0, lsl #16
        movk    x1, #0x0
        stp     x0, x1, [sp, #-16]!
        adrp    x0, __bolt_instr_ind_call_handler_func
        add     x0, x0, #:lo12:__bolt_instr_ind_call_handler_func
        blr     x0

__bolt_instr_ind_call_handler:  (exit snippet)
        msr     nzcv, x1
        ldp     x0, x1, [sp], #16
        ldr     x16, [sp], #16
        ldp     x0, x1, [sp], #16
        br      x16    <- overwrites the original value in X16

__bolt_instr_ind_call_handler_func:  (entry snippet)
        stp     x0, x1, [sp, #-16]!
        mrs     x1, nzcv
        adrp    x0, __bolt_instr_ind_call_handler
        add     x0, x0, x0, #:lo12:__bolt_instr_ind_call_handler
        ldr     x0, [x0]
        cmp     x0, #0x0
        b.eq    __bolt_instr_ind_call_handler
        str     x30, [sp, #-16]!
        blr     x0     <--- runtime lib store/load all regs
        ldr     x30, [sp], #16
        b       __bolt_instr_ind_call_handler

After:

        mov     x16, foo
infirectCall:
        adrp    x8, Label
        add     x8, x8, #:lo12:Lable
        blr     x8

Instrumented indirect call:
        stp     x0, x1, [sp, #-16]!
        mov     x0, x8
        movk    x1, #0x0, lsl #48
        movk    x1, #0x0, lsl #32
        movk    x1, #0x0, lsl #16
        movk    x1, #0x0
        stp     x0, x0, [sp, #-16]!
        adrp    x8, __bolt_instr_ind_call_handler_func
        add     x8, x8, #:lo12:__bolt_instr_ind_call_handler_func
        str     x30, [sp, #-16]!
        blr     x8       <--- call trampoline instr lib
        ldr     x30, [sp], #16
        ldp     x0, x1, [sp], #16
        mov     x8, x0   <---- restore original target
        ldp     x0, x1, [sp], #16
        blr     x8       <--- original indirect call instruction

// don't touch regs besides x0, x1
__bolt_instr_ind_call_handler:  (exit snippet)
        ldr     x1, sp, 16
        msr     nzcv, x1
        ldp     x0, x1, [sp], #16
        ret     <---- return to original function with indirect call

__bolt_instr_ind_call_handler_func: (entry snippet)
        stp     x0, x1, [sp, #-16]!
        mrs     x1, nzcv
        str     x1, [sp, #-16]!
        adrp    x0, __bolt_instr_ind_call_handler
        add     x0, x0, #:lo12:__bolt_instr_ind_call_handler
        ldr     x0, [x0]
        cmp     x0, #0x0
        b.eq    __bolt_instr_ind_call_handler
        str     x30, [sp, #-16]!
        blr     x0     <--- runtime lib store/load all regs
        ldr     x30, [sp], #16
        b       __bolt_instr_ind_call_handler

@github-actions
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the BOLT label May 29, 2025
@llvmbot
Copy link
Member

llvmbot commented May 29, 2025

@llvm/pr-subscribers-bolt

Author: Alexey Moksyakov (yavtuk)

Changes

Indirect call instrumentation snippet uses x16 register in exit handler to go to destination target

__bolt_instr_ind_call_handler_func:
        msr  nzcv, x1
        ldp  x0, x1, [sp], #<!-- -->16
        ldr  x16, [sp], #<!-- -->16
        ldp  x0, x1, [sp], #<!-- -->16
        br   x16	&lt;----- go to indirect target

Depends on compiler x16 register used to store something in cross function calling. This patch adds the instrumentation snippet by calling instrumentation runtime library through indirect call instruction and adding the wrapper to store/load target value and the register for original indirect instruction without touching registers except x0, x1

Example:
mov x16, foo

infirectCall:
        adrp x8, Label
        add  x8, x8, #:lo12:Lable
        blr x8

Before:

Instrumented indirect call:
        stp     x0, x1, [sp, #-16]!
        mov     x0, x8
        movk    x1, #<!-- -->0x0, lsl #<!-- -->48
        movk    x1, #<!-- -->0x0, lsl #<!-- -->32
        movk    x1, #<!-- -->0x0, lsl #<!-- -->16
        movk    x1, #<!-- -->0x0
        stp     x0, x1, [sp, #-16]!
        adrp    x0, __bolt_instr_ind_call_handler_func
        add     x0, x0, #:lo12:__bolt_instr_ind_call_handler_func
        blr     x0

__bolt_instr_ind_call_handler:  (exit snippet)
        msr     nzcv, x1
        ldp     x0, x1, [sp], #<!-- -->16
        ldr     x16, [sp], #<!-- -->16
        ldp     x0, x1, [sp], #<!-- -->16
        br      x16    &lt;- overwrites the original value in X16

__bolt_instr_ind_call_handler_func:  (entry snippet)
        stp     x0, x1, [sp, #-16]!
        mrs     x1, nzcv
        adrp    x0, __bolt_instr_ind_call_handler
        add     x0, x0, x0, #:lo12:__bolt_instr_ind_call_handler
        ldr     x0, [x0]
        cmp     x0, #<!-- -->0x0
        b.eq    __bolt_instr_ind_call_handler
        str     x30, [sp, #-16]!
        blr     x0     &lt;--- runtime lib store/load all regs
        ldr     x30, [sp], #<!-- -->16
        b       __bolt_instr_ind_call_handler

After:

        mov     x16, foo
infirectCall:
        adrp    x8, Label
        add     x8, x8, #:lo12:Lable
        blr     x8

Instrumented indirect call:
        stp     x0, x1, [sp, #-16]!
        mov     x0, x8
        movk    x1, #<!-- -->0x0, lsl #<!-- -->48
        movk    x1, #<!-- -->0x0, lsl #<!-- -->32
        movk    x1, #<!-- -->0x0, lsl #<!-- -->16
        movk    x1, #<!-- -->0x0
        stp     x0, x0, [sp, #-16]!
        adrp    x8, __bolt_instr_ind_call_handler_func
        add     x8, x8, #:lo12:__bolt_instr_ind_call_handler_func
        str     x30, [sp, #-16]!
        blr     x8       &lt;--- call trampoline instr lib
        ldr     x30, [sp], #<!-- -->16
        ldp     x0, x1, [sp], #<!-- -->16
        mov     x8, x0   &lt;---- restore original target
        ldp     x0, x1, [sp], #<!-- -->16
        blr     x8       &lt;--- original indirect call instruction

// don't touch regs besides x0, x1
__bolt_instr_ind_call_handler:  (exit snippet)
        ldr     x1, sp, 16
        msr     nzcv, x1
        ldp     x0, x1, [sp], #<!-- -->16
        ret     &lt;---- return to original function with indirect call

__bolt_instr_ind_call_handler_func: (entry snippet)
        stp     x0, x1, [sp, #-16]!
        mrs     x1, nzcv
        str     x1, [sp, #-16]!
        adrp    x0, __bolt_instr_ind_call_handler
        add     x0, x0, #:lo12:__bolt_instr_ind_call_handler
        ldr     x0, [x0]
        cmp     x0, #<!-- -->0x0
        b.eq    __bolt_instr_ind_call_handler
        str     x30, [sp, #-16]!
        blr     x0     &lt;--- runtime lib store/load all regs
        ldr     x30, [sp], #<!-- -->16
        b       __bolt_instr_ind_call_handler

Full diff: https://github.com/llvm/llvm-project/pull/141918.diff

4 Files Affected:

  • (modified) bolt/include/bolt/Core/MCPlusBuilder.h (+5)
  • (modified) bolt/lib/Passes/Instrumentation.cpp (+6-5)
  • (modified) bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp (+92-34)
  • (modified) bolt/runtime/instr.cpp (+2-2)
diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h
index b233452985502..a8a3a58dba836 100644
--- a/bolt/include/bolt/Core/MCPlusBuilder.h
+++ b/bolt/include/bolt/Core/MCPlusBuilder.h
@@ -511,6 +511,11 @@ class MCPlusBuilder {
     llvm_unreachable("not implemented");
   }
 
+  virtual void createDirectBranch(MCInst &Inst, const MCSymbol *Target,
+                                  MCContext *Ctx) {
+    llvm_unreachable("not implemented");
+  }
+
   virtual MCPhysReg getX86R11() const { llvm_unreachable("not implemented"); }
 
   virtual unsigned getShortBranchOpcode(unsigned Opcode) const {
diff --git a/bolt/lib/Passes/Instrumentation.cpp b/bolt/lib/Passes/Instrumentation.cpp
index fbf889279f1c0..8dc5482c44651 100644
--- a/bolt/lib/Passes/Instrumentation.cpp
+++ b/bolt/lib/Passes/Instrumentation.cpp
@@ -293,9 +293,12 @@ void Instrumentation::instrumentIndirectTarget(BinaryBasicBlock &BB,
                                                BinaryBasicBlock::iterator &Iter,
                                                BinaryFunction &FromFunction,
                                                uint32_t From) {
-  auto L = FromFunction.getBinaryContext().scopeLock();
-  const size_t IndCallSiteID = Summary->IndCallDescriptions.size();
-  createIndCallDescription(FromFunction, From);
+  size_t IndCallSiteID;
+  {
+    auto L = FromFunction.getBinaryContext().scopeLock();
+    IndCallSiteID = Summary->IndCallDescriptions.size();
+    createIndCallDescription(FromFunction, From);
+  }
 
   BinaryContext &BC = FromFunction.getBinaryContext();
   bool IsTailCall = BC.MIB->isTailCall(*Iter);
@@ -305,9 +308,7 @@ void Instrumentation::instrumentIndirectTarget(BinaryBasicBlock &BB,
                  : IndCallHandlerExitBBFunction->getSymbol(),
       IndCallSiteID, &*BC.Ctx);
 
-  Iter = BB.eraseInstruction(Iter);
   Iter = insertInstructions(CounterInstrs, BB, Iter);
-  --Iter;
 }
 
 bool Instrumentation::instrumentOneTarget(
diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
index 9d5a578cfbdff..4895df33bec81 100644
--- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
@@ -1966,6 +1966,15 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
       convertJmpToTailCall(Inst);
   }
 
+  void createDirectBranch(MCInst &Inst, const MCSymbol *Target,
+                          MCContext *Ctx) override {
+    Inst.setOpcode(AArch64::B);
+    Inst.clear();
+    Inst.addOperand(MCOperand::createExpr(getTargetExprFor(
+        Inst, MCSymbolRefExpr::create(Target, MCSymbolRefExpr::VK_None, *Ctx),
+        *Ctx, 0)));
+  }
+
   bool analyzeBranch(InstructionIterator Begin, InstructionIterator End,
                      const MCSymbol *&TBB, const MCSymbol *&FBB,
                      MCInst *&CondBranch,
@@ -2328,21 +2337,26 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
   }
 
   InstructionListType createInstrumentedIndCallHandlerExitBB() const override {
-    InstructionListType Insts(5);
     // Code sequence for instrumented indirect call handler:
+    //   ldr  x1, [sp, #16]
     //   msr  nzcv, x1
     //   ldp  x0, x1, [sp], #16
-    //   ldr  x16, [sp], #16
-    //   ldp  x0, x1, [sp], #16
-    //   br   x16
-    setSystemFlag(Insts[0], AArch64::X1);
-    createPopRegisters(Insts[1], AArch64::X0, AArch64::X1);
-    // Here we load address of the next function which should be called in the
-    // original binary to X16 register. Writing to X16 is permitted without
-    // needing to restore.
-    loadReg(Insts[2], AArch64::X16, AArch64::SP);
-    createPopRegisters(Insts[3], AArch64::X0, AArch64::X1);
-    createIndirectBranch(Insts[4], AArch64::X16, 0);
+    //   ret
+
+    InstructionListType Insts;
+
+    Insts.emplace_back();
+    loadReg(Insts.back(), AArch64::X1, AArch64::SP);
+
+    Insts.emplace_back();
+    setSystemFlag(Insts.back(), AArch64::X1);
+
+    Insts.emplace_back();
+    createPopRegisters(Insts.back(), AArch64::X0, AArch64::X1);
+
+    Insts.emplace_back();
+    createReturn(Insts.back());
+
     return Insts;
   }
 
@@ -2418,39 +2432,69 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
                                                      MCSymbol *HandlerFuncAddr,
                                                      int CallSiteID,
                                                      MCContext *Ctx) override {
-    InstructionListType Insts;
     // Code sequence used to enter indirect call instrumentation helper:
-    //   stp x0, x1, [sp, #-16]! createPushRegisters
+    //   stp x0, x1, [sp, #-16]! createPushRegisters  (1)
     //   mov target x0  convertIndirectCallToLoad -> orr x0 target xzr
     //   mov x1 CallSiteID createLoadImmediate ->
     //   movk    x1, #0x0, lsl #48
     //   movk    x1, #0x0, lsl #32
     //   movk    x1, #0x0, lsl #16
     //   movk    x1, #0x0
-    //   stp x0, x1, [sp, #-16]!
-    //   bl *HandlerFuncAddr createIndirectCall ->
+    //   stp x0, x1, [sp, #-16]!    (2)
     //   adr x0 *HandlerFuncAddr -> adrp + add
-    //   blr x0
+    //   str x30, [sp, #-16]!  (3)
+    //   blr x0   (__bolt_instr_ind_call_handler_func)
+    //   ldr x30, sp, #16      (3)
+    //   ldp x0, x1, [sp], #16   (2)
+    //   mov x0, x0  ; move target address to used register
+    //   ldp x0, x1, [sp], #16   (1)
+
+    InstructionListType Insts;
     Insts.emplace_back();
-    createPushRegisters(Insts.back(), AArch64::X0, AArch64::X1);
+    createPushRegisters(Insts.back(), getIntArgRegister(0),
+                        getIntArgRegister(1));
     Insts.emplace_back(CallInst);
-    convertIndirectCallToLoad(Insts.back(), AArch64::X0);
+    convertIndirectCallToLoad(Insts.back(), getIntArgRegister(0));
     InstructionListType LoadImm =
         createLoadImmediate(getIntArgRegister(1), CallSiteID);
     Insts.insert(Insts.end(), LoadImm.begin(), LoadImm.end());
     Insts.emplace_back();
-    createPushRegisters(Insts.back(), AArch64::X0, AArch64::X1);
+    createPushRegisters(Insts.back(), getIntArgRegister(0),
+                        getIntArgRegister(1));
     Insts.resize(Insts.size() + 2);
-    InstructionListType Addr =
-        materializeAddress(HandlerFuncAddr, Ctx, AArch64::X0);
+    InstructionListType Addr = materializeAddress(
+        HandlerFuncAddr, Ctx, CallInst.getOperand(0).getReg());
     assert(Addr.size() == 2 && "Invalid Addr size");
     std::copy(Addr.begin(), Addr.end(), Insts.end() - Addr.size());
+
+    Insts.emplace_back();
+    storeReg(Insts.back(), AArch64::LR, getSpRegister(/*Size*/ 8));
+
+    Insts.emplace_back();
+    createIndirectCallInst(Insts.back(), false,
+                           CallInst.getOperand(0).getReg());
+
     Insts.emplace_back();
-    createIndirectCallInst(Insts.back(), isTailCall(CallInst), AArch64::X0);
+    loadReg(Insts.back(), AArch64::LR, getSpRegister(/*Size*/ 8));
 
-    // Carry over metadata including tail call marker if present.
-    stripAnnotations(Insts.back());
-    moveAnnotations(std::move(CallInst), Insts.back());
+    Insts.emplace_back();
+    createPopRegisters(Insts.back(), getIntArgRegister(0),
+                       getIntArgRegister(1));
+
+    // move x0 to indirect call register
+    Insts.emplace_back();
+    Insts.back().setOpcode(AArch64::ORRXrs);
+    Insts.back().insert(Insts.back().begin(),
+                        MCOperand::createReg(CallInst.getOperand(0).getReg()));
+    Insts.back().insert(Insts.back().begin() + 1,
+                        MCOperand::createReg(AArch64::XZR));
+    Insts.back().insert(Insts.back().begin() + 2,
+                        MCOperand::createReg(getIntArgRegister(0)));
+    Insts.back().insert(Insts.back().begin() + 3, MCOperand::createImm(0));
+
+    Insts.emplace_back();
+    createPopRegisters(Insts.back(), getIntArgRegister(0),
+                       getIntArgRegister(1));
 
     return Insts;
   }
@@ -2472,30 +2516,44 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
     //   ldr     x30, [sp], #16
     //   b       IndCallHandler
     InstructionListType Insts;
+
     Insts.emplace_back();
-    createPushRegisters(Insts.back(), AArch64::X0, AArch64::X1);
+    createPushRegisters(Insts.back(), getIntArgRegister(0),
+                        getIntArgRegister(1));
+
     Insts.emplace_back();
     getSystemFlag(Insts.back(), getIntArgRegister(1));
+
+    Insts.emplace_back();
+    storeReg(Insts.back(), getIntArgRegister(1), getSpRegister(/*Size*/ 8));
+
     Insts.emplace_back();
     Insts.emplace_back();
     InstructionListType Addr =
-        materializeAddress(InstrTrampoline, Ctx, AArch64::X0);
+        materializeAddress(InstrTrampoline, Ctx, getIntArgRegister(0));
     std::copy(Addr.begin(), Addr.end(), Insts.end() - Addr.size());
     assert(Addr.size() == 2 && "Invalid Addr size");
+
     Insts.emplace_back();
-    loadReg(Insts.back(), AArch64::X0, AArch64::X0);
+    loadReg(Insts.back(), getIntArgRegister(0), getIntArgRegister(0));
+
     InstructionListType cmpJmp =
-        createCmpJE(AArch64::X0, 0, IndCallHandler, Ctx);
+        createCmpJE(getIntArgRegister(0), 0, IndCallHandler, Ctx);
     Insts.insert(Insts.end(), cmpJmp.begin(), cmpJmp.end());
+
     Insts.emplace_back();
-    storeReg(Insts.back(), AArch64::LR, AArch64::SP);
+    storeReg(Insts.back(), AArch64::LR, getSpRegister(/*Size*/ 8));
+
     Insts.emplace_back();
     Insts.back().setOpcode(AArch64::BLR);
-    Insts.back().addOperand(MCOperand::createReg(AArch64::X0));
+    Insts.back().addOperand(MCOperand::createReg(getIntArgRegister(0)));
+
     Insts.emplace_back();
-    loadReg(Insts.back(), AArch64::LR, AArch64::SP);
+    loadReg(Insts.back(), AArch64::LR, getSpRegister(/*Size*/ 8));
+
     Insts.emplace_back();
-    createDirectCall(Insts.back(), IndCallHandler, Ctx, /*IsTailCall*/ true);
+    createDirectBranch(Insts.back(), IndCallHandler, Ctx);
+
     return Insts;
   }
 
diff --git a/bolt/runtime/instr.cpp b/bolt/runtime/instr.cpp
index ae356e71cbe41..a174b982cbb84 100644
--- a/bolt/runtime/instr.cpp
+++ b/bolt/runtime/instr.cpp
@@ -1668,7 +1668,7 @@ extern "C" __attribute((naked)) void __bolt_instr_indirect_call()
 #if defined(__aarch64__)
   // clang-format off
   __asm__ __volatile__(SAVE_ALL
-                       "ldp x0, x1, [sp, #288]\n"
+                       "ldp x0, x1, [sp, #320]\n"
                        "bl instrumentIndirectCall\n"
                        RESTORE_ALL
                        "ret\n"
@@ -1705,7 +1705,7 @@ extern "C" __attribute((naked)) void __bolt_instr_indirect_tailcall()
 #if defined(__aarch64__)
   // clang-format off
   __asm__ __volatile__(SAVE_ALL
-                       "ldp x0, x1, [sp, #288]\n"
+                       "ldp x0, x1, [sp, #320]\n"
                        "bl instrumentIndirectCall\n"
                        RESTORE_ALL
                        "ret\n"

@yavtuk
Copy link
Contributor Author

yavtuk commented May 29, 2025

@yota9 @ilinpv @paschalis-mpeis Hi folks, can you please verify this patch related to instrumentation indirect call/branch.

@yavtuk
Copy link
Contributor Author

yavtuk commented May 29, 2025

Related #133465

@yavtuk yavtuk force-pushed the fix/indirect-call-instrumentation-snippet branch 2 times, most recently from 9e898b9 to cce28ed Compare May 29, 2025 14:58
@yavtuk
Copy link
Contributor Author

yavtuk commented May 29, 2025

@yavtuk
Copy link
Contributor Author

yavtuk commented Oct 31, 2025

Fixed #165664

Copy link
Member

@paschalis-mpeis paschalis-mpeis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Alexey,

Sorry, for the delay. I've only just come back to this due to your recent comment. I think the logic checks out, but I'd wait for more eyes.

Shall we add a test to cover the assembly sequences (callsite and entry / exit stubs) based on your example? eg:

_start:
  adrp x8, someTarget
  add  x8, x8, :lo12:someTarget
  blr  x8
  ret

Clarification: In the description, on your first code snippet of __bolt_instr_ind_call_handler_func, you mean __bolt_instr_ind_call_handler, right?


TMU, the original flow was:

  • callsite code calls entry handler
  • entry-handler
    • if the hook is set up: it runs, preserving LR
    • then it calls exit-handler, which:
      • restores condition flags and x0/x1
      • clobbers x16 to branch to the original function

The new code preserves the original callsite, and updates the hook to preserve both original x0/x1 and the ind.tgt+callsite id?

The new flow is:

  • callsite code calls the entry handler
  • entry handler
    • if the hook is set up: it runs, preserving LR
    • then it calls the exit handler, which:
      • restores condition flags and x0/x1
      • returns to the original callsite to do the (preserved) call

@paschalis-mpeis paschalis-mpeis changed the title "[bolt][aarch64] Fixed indirect call instrumentation snippet" [BOLT][AArch64] Fixed indirect call instrumentation snippet Nov 4, 2025
@yavtuk
Copy link
Contributor Author

yavtuk commented Nov 5, 2025

@paschalis-mpeis thanks for review,

"Clarification: In the #141918 (comment), on your first code snippet of __bolt_instr_ind_call_handler_func, you mean __bolt_instr_ind_call_handler, right?"
yes, you are right, it's about __bolt_instr_ind_call_handler

@yavtuk
Copy link
Contributor Author

yavtuk commented Nov 5, 2025

"
The new flow is:

callsite code calls the entry handler
entry handler
if the hook is set up: it runs, preserving LR
then it calls the exit handler, which:
restores condition flags and x0/x1
returns to the original callsite to do the (preserved) call
"

yes, you are right here, the main difference that before we called indirect target inside instrumented library and used x16 to return back. in the patch indirect call leave as-is, inside instrumentation library only counters for target function is increased with store/load all touch registers values

@yavtuk
Copy link
Contributor Author

yavtuk commented Nov 5, 2025

I can extend the checking for currect instrumentation-ind-call.c test

@yavtuk yavtuk requested a review from yozhu as a code owner November 5, 2025 11:22
@github-actions
Copy link

github-actions bot commented Nov 5, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@yavtuk
Copy link
Contributor Author

yavtuk commented Nov 5, 2025

I'll fixup commits messages after tests

@yozhu
Copy link
Contributor

yozhu commented Nov 5, 2025

@yavtuk Thanks for the patch!

We have some similar internal changes on rewriting indirect call instrumentation code sequence. I compared the changes and would like to check with you whether the differences are correct:

Instrumented indirect call:
stp x0, x1, [sp, #-16]!
mov x0, x8
movk x1, #0x0, lsl #48
movk x1, #0x0, lsl #32
movk x1, #0x0, lsl #16
movk x1, #0x0
stp x0, x0, [sp, #-16]!
adrp x8, __bolt_instr_ind_call_handler_func
add x8, x8, #:lo12:__bolt_instr_ind_call_handler_func
str x30, [sp, #-16]!
blr x8 <--- call trampoline instr lib
ldr x30, [sp], #16
[x] ldp x0, x1, [sp], #16
[x] mov x8, x0 <---- restore original target
ldp x0, x1, [sp], #16
blr x8 <--- original indirect call instruction

There is a typo in the stp instruction immediately after the four movk instructions - it should be stp x0, x1, [sp, #-16]!, right?

For the call to trampoline sequence, we can also just use x0 to hold the trampoline address. It doesn't need to be x8, the original register used for indirect call. But this doesn't impact functionality or code sequence length.

The two instructions marked with [x] can be replaced by one ldr instruction ldr x8, [sp], #0x10, because we just need to restore x8, and x0 and x1 will be loaded from stack after this and before the final blr x8.

// don't touch regs besides x0, x1
__bolt_instr_ind_call_handler: (exit snippet)
[x] ldr x1, sp, 16
msr nzcv, x1
[x] ldp x0, x1, [sp], #16
ret <---- return to original function with indirect call

__bolt_instr_ind_call_handler_func: (entry snippet)
[x] stp x0, x1, [sp, #-16]!
mrs x1, nzcv
[x] str x1, [sp, #-16]!
adrp x0, __bolt_instr_ind_call_handler
add x0, x0, #:lo12:__bolt_instr_ind_call_handler
ldr x0, [x0]
cmp x0, #0x0
b.eq __bolt_instr_ind_call_handler
str x30, [sp, #-16]!
blr x0 <--- runtime lib store/load all regs
ldr x30, [sp], #16
b __bolt_instr_ind_call_handler

The adrp+add+ldr sequence can be changed to adrp+ldr sequence. If needed, we can make ind_call_handler at least 8 byte aligned:

      adrp    x0, __bolt_instr_ind_call_handler
      ldr        x0, [x0, #:lo12:__bolt_instr_ind_call_handler]

The str x1, [sp, #-16]! instruction immediately after mrs x1, nzcv can be removed, so is the ldr x1, sp, 16 instruction at the entry of exit snippet, since indirect call handler defined in runtime library will save and restore all the registers, so we don't need to save and restore x1 here.

Similar reason, we can also remove the stp at the beginning of entry snippet, and ldp before the ret in exit snippet.

@yavtuk
Copy link
Contributor Author

yavtuk commented Nov 5, 2025

Thanks for the review, yes, the first one is the typo, the rest I will check

@paschalis-mpeis
Copy link
Member

Shall we also check the entry / exit stubs in the amended lit test?
ie objdump and check code output of __bolt_instr _ind_call_handler, bolt_instr_ind_call_handler_func

We could also check the BOLT output, probably with:

BOLT-INSTRUMENTER: Number of indirect call site descriptors: N

@yavtuk
Copy link
Contributor Author

yavtuk commented Nov 6, 2025

Shall we also check the entry / exit stubs in the amended lit test? ie objdump and check code output of __bolt_instr _ind_call_handler, bolt_instr_ind_call_handler_func

We could also check the BOLT output, probably with:

BOLT-INSTRUMENTER: Number of indirect call site descriptors: N

yes, sure

@yavtuk
Copy link
Contributor Author

yavtuk commented Nov 6, 2025

@paschalis-mpeis please wait a while a rewrite code snippet to store/load registers, i will let you and yozhu know when need to check latest changes

@paschalis-mpeis
Copy link
Member

Yes, of course, please take your time. We also want to get a chance to test this a bit.

yavtuk and others added 4 commits November 6, 2025 15:02
Indirect call instrumentation snippet uses x16 register in exit
handler to go to destination target

    __bolt_instr_ind_call_handler_func:
            msr  nzcv, x1
            ldp  x0, x1, [sp], llvm#16
            ldr  x16, [sp], llvm#16
            ldp  x0, x1, [sp], llvm#16
            br   x16	<-----

Depend on compiler x16 register used to store smtng in cross function calling.
This patch adds the instrumentation snippet by calling instrumentation
runtime library through indirect call instruction and adding the wrapper
to store/load target value and the register for original indirect instruction.

Example:
            mov x16, foo

    infirectCall:
            adrp x8, Label
            add  x8, x8, #:lo12:Lable
            blr x8

Before:

    Instrumented indirect call:
            stp     x0, x1, [sp, #-16]!
            mov     x0, x8
            movk    x1, #0x0, lsl llvm#48
            movk    x1, #0x0, lsl llvm#32
            movk    x1, #0x0, lsl llvm#16
            movk    x1, #0x0
            stp     x0, x1, [sp, #-16]!
            adrp    x0, __bolt_instr_ind_call_handler_func
            add     x0, x0, #:lo12:__bolt_instr_ind_call_handler_func
            blr     x0

    __bolt_instr_ind_call_handler:  (exit snippet)
            msr     nzcv, x1
            ldp     x0, x1, [sp], llvm#16
            ldr     x16, [sp], llvm#16
            ldp     x0, x1, [sp], llvm#16
            br      x16    <- overwrites the original value in X16

    __bolt_instr_ind_call_handler_func:  (entry snippet)
            stp     x0, x1, [sp, #-16]!
            mrs     x1, nzcv
            adrp    x0, __bolt_instr_ind_call_handler
            add     x0, x0, x0, #:lo12:__bolt_instr_ind_call_handler
            ldr     x0, [x0]
            cmp     x0, #0x0
            b.eq    __bolt_instr_ind_call_handler
            str     x30, [sp, #-16]!
            blr     x0     <--- runtime lib store/load all regs
            ldr     x30, [sp], llvm#16
            b       __bolt_instr_ind_call_handler

_________________________________________________________________________

After:

            mov     x16, foo
    infirectCall:
            adrp    x8, Label
            add     x8, x8, #:lo12:Lable
            blr     x8

    Instrumented indirect call:
            stp     x0, x1, [sp, #-16]!
            mov     x0, x8
            movk    x1, #0x0, lsl llvm#48
            movk    x1, #0x0, lsl llvm#32
            movk    x1, #0x0, lsl llvm#16
            movk    x1, #0x0
            stp     x0, x0, [sp, #-16]!
            adrp    x8, __bolt_instr_ind_call_handler_func
            add     x8, x8, #:lo12:__bolt_instr_ind_call_handler_func
            str     x30, [sp, #-16]!
            blr     x8       <--- call trampoline instr lib
            ldr     x30, [sp], llvm#16
            ldp     x0, x1, [sp], llvm#16
            mov     x8, x0   <---- restore original target
            ldp     x0, x1, [sp], llvm#16
            blr     x8       <--- original indirect call instruction

    // don't touch regs besides x0, x1
    __bolt_instr_ind_call_handler:  (exit snippet)
            ldr     x1, sp, 16
            msr     nzcv, x1
            ldp     x0, x1, [sp], llvm#16
            ret     <---- return to original function with indirect call

    __bolt_instr_ind_call_handler_func: (entry snippet)
            stp     x0, x1, [sp, #-16]!
            mrs     x1, nzcv
            str     x1, [sp, #-16]!
            adrp    x0, __bolt_instr_ind_call_handler
            add     x0, x0, #:lo12:__bolt_instr_ind_call_handler
            ldr     x0, [x0]
            cmp     x0, #0x0
            b.eq    __bolt_instr_ind_call_handler
            str     x30, [sp, #-16]!
            blr     x0     <--- runtime lib store/load all regs
            ldr     x30, [sp], llvm#16
            b       __bolt_instr_ind_call_handler
@yavtuk yavtuk force-pushed the fix/indirect-call-instrumentation-snippet branch from 507f00c to e26b400 Compare November 6, 2025 15:29
Remove redundant load/store instructions in entryBB/exitBB
indirect call instrumentation snippet, move msr/mrs to
SAVE_ALL/RESTORE_ALL

Signed-off-by: Moksyakov Alexey <[email protected]>
@yavtuk
Copy link
Contributor Author

yavtuk commented Nov 6, 2025

@paschalis-mpeis @yozhu please take a look one more time to test and instrumentation snippets, I remove redundant ldp/stp instruction, move msr/mrs to SAVE_ALL/RESTORE_ALL

I will rewrite first commit message if you agree with current snippets

@yavtuk yavtuk force-pushed the fix/indirect-call-instrumentation-snippet branch from e26b400 to cc807f9 Compare November 6, 2025 15:41
// blr x0 (__bolt_instr_ind_call_handler_func)
// ldr x30, sp, #16 (3)
// ldp x0, x1, [sp], #16 (2)
// mov x0, x0 ; move target address to used register
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two instructions, ldp x0, x1, [sp], #16 followed by mov xN, x0, can be replaced with one instruction ldr xN, [sp], #16, since x0 and x1 will be loaded again from stack anyway, and here we just need to pop stack getting the original indirect call target into xN.

// Code sequence used to check whether InstrTampoline was initialized
// Code sequence used to check whether InstrTrampoline was initialized
// and call it if so, returns via IndCallHandler
// stp x0, x1, [sp, #-16]!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also remove this stp instruction, as well as the corresponding ldr in the exit snippet.

And because the save and restore of nzcv have been moved into the SAVE_ALL and RESTORE_ALL macro's, the exit snippet will contain only one ret instruction. We can replace the b IndCallHandler with ret, replace the b.eq IndCallHandler with b.eq <label_to_the_ret_instruction>, and remove the exit snippet, like

adrp x0, InstrTrampoline
ldr x0, [x0, #lo12:InstrTrampoline]
subs x0, x0, #0x0
b.eq .LtmpRet
str x30, [p, #-16]!
blr x0
ldr x30, [sp], #16
.LtmpRet:
ret

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants