@kees kees commented Oct 16, 2025

Implement KCFI (Kernel Control Flow Integrity) backend support for
ARM32, Thumb2, and Thumb1. The Linux kernel has supported ARM KCFI via
Clang's generic KCFI implementation, but this has finally started to cause
problems (see ClangBuiltLinux/linux#2124), so it's time to get the KCFI
operand bundle lowering working on ARM.

Supports patchable-function-prefix with adjusted load offsets. Provides
a worst-case estimate of the KCFI bundle's instruction size so that
range-limited instructions (e.g. cbz) know how large the indirect calls
can become.

ARM implementation notes:

  • Four-instruction EOR sequence builds the 32-bit type ID byte-by-byte
    to work within ARM's modified immediate encoding constraints.
  • Scratch register selection: r12 (IP) is preferred, r3 used as fallback
    when r12 holds the call target. r3 gets spilled/reloaded if it is
    being used as a call argument.
  • UDF trap encoding: 0x8000 | (0x1F << 5) | target_reg_index, similar
    to aarch64's trap encoding.
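The byte-by-byte EOR decomposition from the first bullet can be sanity-checked with a short Python sketch (helper names are mine, not from the patch): each byte of the type ID, shifted by a multiple of 8, always fits ARM's modified immediate encoding (an 8-bit value rotated right by an even amount), and XORing all four immediates reassembles the full type ID.

```python
def encode_modified_immediate(imm):
    """Return (value, rotation) if imm fits ARM's modified immediate
    encoding (8-bit value rotated right by an even amount within
    32 bits), else None. Hypothetical helper for illustration."""
    imm &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating imm left by rot undoes a rotate-right of rot.
        v = ((imm << rot) | (imm >> (32 - rot))) & 0xFFFFFFFF
        if v < 256:
            return v, rot
    return None

def kcfi_eor_immediates(type_id):
    """The four immediates XORed into the scratch register,
    one byte of the type ID per EOR instruction."""
    imms = [((type_id >> (i * 8)) & 0xFF) << (i * 8) for i in range(4)]
    # A single byte shifted by a multiple of 8 is always encodable.
    assert all(encode_modified_immediate(imm) is not None for imm in imms)
    return imms

imms = kcfi_eor_immediates(12345678)
print(imms)  # [78, 24832, 12320768, 0] for type ID 0x00BC614E

# XORing all four immediates into a register holding the expected
# type leaves zero, so the final "eors" sets the Z flag on a match.
acc = 12345678
for imm in imms:
    acc ^= imm
print(acc)  # 0
```

These are exactly the `eor r12, r12, #78` / `#24832` / `#12320768` / `#0` immediates that the new kcfi-arm.ll test checks for.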

Thumb2 implementation notes:

  • Logically the same as ARM
  • UDF trap encoding: 0x80 | target_reg_index
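The two trap encodings above differ only because ARM-mode `udf` takes a 16-bit immediate while the Thumb encoding only has 8 bits. A sketch of both (function names are mine), checked against the constants that appear in the new tests:

```python
def arm_kcfi_esr(target_reg_index):
    """ARM-mode UDF immediate: a KCFI marker bit plus the register
    holding the call target. The scratch-register field is fixed at
    0x1F because the EOR sequence has clobbered it by trap time."""
    return 0x8000 | (0x1F << 5) | (target_reg_index & 0x1F)

def thumb2_kcfi_esr(target_reg_index):
    """Thumb2 udf only has an 8-bit immediate, so the encoding
    shrinks to a flag bit plus the target register index."""
    return 0x80 | (target_reg_index & 0xF)

print(hex(arm_kcfi_esr(0)), arm_kcfi_esr(0))    # 0x83e0 33760 (blx r0)
print(hex(arm_kcfi_esr(12)), arm_kcfi_esr(12))  # 0x83ec 33772 (blx r12)
```

The `udf #33760` and `udf #33772` values in kcfi-arm.ll and kcfi-r3-spill.ll correspond to targets in r0 and r12 respectively.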

Thumb1 implementation notes:

  • Due to register pressure, 2 scratch registers are needed: r3 and r2,
    which get spilled/reloaded if they are being used as call args.
  • Instead of EOR, an add/lsl sequence loads the immediate, followed by
    a compare.
  • No trap encoding.
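The Thumb1 add/lsl idea can be sketched as follows (a hypothetical rendering of the approach described above, not the patch's exact emitted sequence): since Thumb1 immediates are limited to 8 bits, the 32-bit type ID is built up byte by byte in a scratch register, then compared against the type word loaded from before the function.

```python
def thumb1_build_constant(value):
    """Sketch: materialize a 32-bit constant with 8-bit immediates
    only, using movs for the top byte and lsls/adds for the rest."""
    seq = [("movs", (value >> 24) & 0xFF)]
    for shift in (16, 8, 0):
        seq.append(("lsls", 8))                  # make room for the next byte
        seq.append(("adds", (value >> shift) & 0xFF))
    return seq

def simulate(seq):
    """Execute the sketched sequence on a virtual 32-bit register."""
    reg = 0
    for op, imm in seq:
        if op == "movs":
            reg = imm
        elif op == "lsls":
            reg = (reg << imm) & 0xFFFFFFFF
        else:  # adds
            reg = (reg + imm) & 0xFFFFFFFF
    return reg

seq = thumb1_build_constant(12345678)
print(simulate(seq) == 12345678)  # True
# A cmp against the loaded type word then replaces the ARM/Thumb2
# eors-based zero check.
```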

Update tests to validate all three subtargets.

cc @samitolvanen @nathanchance @bwendling

@llvmbot llvmbot added the clang, backend:ARM, and clang:codegen labels Oct 16, 2025
llvmbot commented Oct 16, 2025

@llvm/pr-subscribers-backend-arm

@llvm/pr-subscribers-clang

Author: Kees Cook (kees)

Changes

Implement KCFI (Kernel Control Flow Integrity) backend support for ARM32 (ARM mode only, not Thumb), as is already supported for x86, aarch64, and riscv. The Linux kernel has supported ARM KCFI via Clang's generic KCFI implementation, but this has finally started to cause problems, so it's time to get the KCFI operand bundle lowering working on ARM.

Implementation notes:

  • Four-instruction EOR sequence builds the 32-bit type ID byte-by-byte to work within ARM's modified immediate encoding constraints.
  • Scratch register selection: r12 (IP) is preferred, r3 used as fallback when r12 holds the call target
  • Automatic r3 spill/reload when r3 is live as a call argument (5+ args)
  • UDF trap encoding: 0x8000 | (0x1F << 5) | target_reg_index, similar to aarch64's trap encoding.
  • Support for patchable-function-prefix with adjusted load offsets
  • Only enabled for ARM mode

Frontend integration updated to skip the KCFI IR pass for ARM targets, allowing the backend to handle KCFI operand bundle lowering directly, matching the implementation used by the other architectures.

cc @samitolvanen @nathanchance @bwendling


Patch is 20.68 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/163698.diff

12 Files Affected:

  • (modified) clang/lib/CodeGen/BackendUtil.cpp (+2-1)
  • (modified) llvm/lib/Target/ARM/ARMAsmPrinter.cpp (+118)
  • (modified) llvm/lib/Target/ARM/ARMAsmPrinter.h (+3)
  • (modified) llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp (+2)
  • (modified) llvm/lib/Target/ARM/ARMISelLowering.cpp (+47)
  • (modified) llvm/lib/Target/ARM/ARMISelLowering.h (+6)
  • (modified) llvm/lib/Target/ARM/ARMInstrInfo.td (+8)
  • (modified) llvm/lib/Target/ARM/ARMTargetMachine.cpp (+7)
  • (added) llvm/test/CodeGen/ARM/kcfi-arm.ll (+65)
  • (added) llvm/test/CodeGen/ARM/kcfi-patchable-function-prefix.ll (+46)
  • (added) llvm/test/CodeGen/ARM/kcfi-r3-spill.ll (+121)
  • (renamed) llvm/test/CodeGen/ARM/kcfi-thumb.ll ()
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index 602068436101b..91a0fdfea96a0 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -687,7 +687,8 @@ static void addKCFIPass(const Triple &TargetTriple, const LangOptions &LangOpts,
                         PassBuilder &PB) {
   // If the back-end supports KCFI operand bundle lowering, skip KCFIPass.
   if (TargetTriple.getArch() == llvm::Triple::x86_64 ||
-      TargetTriple.isAArch64(64) || TargetTriple.isRISCV())
+      TargetTriple.isAArch64(64) || TargetTriple.isRISCV() ||
+      TargetTriple.isARM())
     return;
 
   // Ensure we lower KCFI operand bundles with -O0.
diff --git a/llvm/lib/Target/ARM/ARMAsmPrinter.cpp b/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
index 1f773e2a7e0fc..295a2479228a3 100644
--- a/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
+++ b/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
@@ -1471,6 +1471,121 @@ void ARMAsmPrinter::EmitUnwindingInstruction(const MachineInstr *MI) {
 // instructions) auto-generated.
 #include "ARMGenMCPseudoLowering.inc"
 
+void ARMAsmPrinter::LowerKCFI_CHECK(const MachineInstr &MI) {
+  Register AddrReg = MI.getOperand(0).getReg();
+  const int64_t Type = MI.getOperand(1).getImm();
+
+  // Get the call instruction that follows this KCFI_CHECK.
+  assert(std::next(MI.getIterator())->isCall() &&
+         "KCFI_CHECK not followed by a call instruction");
+  const MachineInstr &Call = *std::next(MI.getIterator());
+
+  // Choose scratch register (r12 or r3): r12 (IP) is the primary choice;
+  // use r3 if r12 is the target.
+  unsigned ScratchReg = ARM::R12;
+  bool NeedSpillR3 = false;
+
+  if (AddrReg == ARM::R12) {
+    ScratchReg = ARM::R3;
+
+    // Check if r3 is live (used as implicit operand in the call).
+    // If so, we need to spill/restore it.
+    for (const MachineOperand &MO : Call.implicit_operands()) {
+      if (MO.isReg() && MO.getReg() == ARM::R3 && MO.isUse()) {
+        NeedSpillR3 = true;
+        break;
+      }
+    }
+  }
+
+  // Adjust the offset for patchable-function-prefix.
+  int64_t PrefixNops = 0;
+  MI.getMF()
+      ->getFunction()
+      .getFnAttribute("patchable-function-prefix")
+      .getValueAsString()
+      .getAsInteger(10, PrefixNops);
+
+  // If we need to spill r3, push it first.
+  if (NeedSpillR3) {
+    // push {r3}
+    EmitToStreamer(*OutStreamer,
+      MCInstBuilder(ARM::STMDB_UPD)
+        .addReg(ARM::SP)
+        .addReg(ARM::SP)
+        .addImm(ARMCC::AL)
+        .addReg(0)
+        .addReg(ARM::R3));
+  }
+
+  // ldr scratch, [target, #-(PrefixNops * 4 + 4)]
+  EmitToStreamer(*OutStreamer,
+    MCInstBuilder(ARM::LDRi12)
+      .addReg(ScratchReg)
+      .addReg(AddrReg)
+      .addImm(-(PrefixNops * 4 + 4))
+      .addImm(ARMCC::AL)
+      .addReg(0));
+
+  // Each EOR instruction XORs one byte of the type, shifted to its position.
+  for (int i = 0; i < 4; i++) {
+    uint8_t byte = (Type >> (i * 8)) & 0xFF;
+    uint32_t imm = byte << (i * 8);
+    bool isLast = (i == 3);
+
+    // Encode as ARM modified immediate.
+    int SOImmVal = ARM_AM::getSOImmVal(imm);
+    assert(SOImmVal != -1 && "Cannot encode immediate as ARM modified immediate");
+
+    // eor[s] scratch, scratch, #imm (last one sets flags with CPSR)
+    EmitToStreamer(*OutStreamer,
+      MCInstBuilder(ARM::EORri)
+        .addReg(ScratchReg)
+        .addReg(ScratchReg)
+        .addImm(SOImmVal)
+        .addImm(ARMCC::AL)
+        .addReg(0)
+        .addReg(isLast ? ARM::CPSR : 0));
+  }
+
+  // If we spilled r3, restore it immediately after the comparison.
+  // This must happen before the branch so r3 is valid on both paths.
+  if (NeedSpillR3) {
+    // pop {r3}
+    EmitToStreamer(*OutStreamer,
+      MCInstBuilder(ARM::LDMIA_UPD)
+        .addReg(ARM::SP)
+        .addReg(ARM::SP)
+        .addImm(ARMCC::AL)
+        .addReg(0)
+        .addReg(ARM::R3));
+  }
+
+  // beq .Lpass (branch if types match, i.e., scratch is zero)
+  MCSymbol *Pass = OutContext.createTempSymbol();
+  EmitToStreamer(*OutStreamer,
+    MCInstBuilder(ARM::Bcc)
+      .addExpr(MCSymbolRefExpr::create(Pass, OutContext))
+      .addImm(ARMCC::EQ)
+      .addReg(ARM::CPSR));
+
+  // udf #ESR (trap with encoded diagnostic)
+  // ESR encoding: 0x8000 | (scratch_reg << 5) | addr_reg.
+  // Note: scratch_reg is always 0x1F since the EOR sequence clobbers it
+  // and it contains no useful information at trap time.
+  const ARMBaseRegisterInfo *TRI =
+      static_cast<const ARMBaseRegisterInfo *>(
+          MI.getMF()->getSubtarget().getRegisterInfo());
+  unsigned AddrIndex = TRI->getEncodingValue(AddrReg);
+  unsigned ESR = 0x8000 | (31 << 5) | (AddrIndex & 31);
+
+  EmitToStreamer(*OutStreamer,
+    MCInstBuilder(ARM::UDF)
+      .addImm(ESR));
+
+  OutStreamer->emitLabel(Pass);
+}
+
 void ARMAsmPrinter::emitInstruction(const MachineInstr *MI) {
   ARM_MC::verifyInstructionPredicates(MI->getOpcode(),
                                       getSubtargetInfo().getFeatureBits());
@@ -1504,6 +1619,9 @@ void ARMAsmPrinter::emitInstruction(const MachineInstr *MI) {
   switch (Opc) {
   case ARM::t2MOVi32imm: llvm_unreachable("Should be lowered by thumb2it pass");
   case ARM::DBG_VALUE: llvm_unreachable("Should be handled by generic printing");
+  case ARM::KCFI_CHECK:
+    LowerKCFI_CHECK(*MI);
+    return;
   case ARM::LEApcrel:
   case ARM::tLEApcrel:
   case ARM::t2LEApcrel: {
diff --git a/llvm/lib/Target/ARM/ARMAsmPrinter.h b/llvm/lib/Target/ARM/ARMAsmPrinter.h
index 2b067c753264f..cb44d882243cc 100644
--- a/llvm/lib/Target/ARM/ARMAsmPrinter.h
+++ b/llvm/lib/Target/ARM/ARMAsmPrinter.h
@@ -123,6 +123,9 @@ class LLVM_LIBRARY_VISIBILITY ARMAsmPrinter : public AsmPrinter {
   void LowerPATCHABLE_FUNCTION_EXIT(const MachineInstr &MI);
   void LowerPATCHABLE_TAIL_CALL(const MachineInstr &MI);
 
+  // KCFI check lowering
+  void LowerKCFI_CHECK(const MachineInstr &MI);
+
 private:
   void EmitSled(const MachineInstr &MI, SledKind Kind);
 
diff --git a/llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp b/llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
index 0d7b6d1236442..fffb63738166d 100644
--- a/llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
+++ b/llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
@@ -2301,6 +2301,8 @@ bool ARMExpandPseudo::ExpandMI(MachineBasicBlock &MBB,
       for (unsigned i = 2, e = MBBI->getNumOperands(); i != e; ++i)
         NewMI->addOperand(MBBI->getOperand(i));
 
+      NewMI->setCFIType(*MBB.getParent(), MI.getCFIType());
+
       // Update call info and delete the pseudo instruction TCRETURN.
       if (MI.isCandidateForAdditionalCallInfo())
         MI.getMF()->moveAdditionalCallInfo(&MI, &*NewMI);
diff --git a/llvm/lib/Target/ARM/ARMISelLowering.cpp b/llvm/lib/Target/ARM/ARMISelLowering.cpp
index 67ea2dd3df792..44cafd1854fc6 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.cpp
+++ b/llvm/lib/Target/ARM/ARMISelLowering.cpp
@@ -2848,6 +2848,8 @@ ARMTargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
   if (isTailCall) {
     MF.getFrameInfo().setHasTailCall();
     SDValue Ret = DAG.getNode(ARMISD::TC_RETURN, dl, MVT::Other, Ops);
+    if (CLI.CFIType)
+      Ret.getNode()->setCFIType(CLI.CFIType->getZExtValue());
     DAG.addNoMergeSiteInfo(Ret.getNode(), CLI.NoMerge);
     DAG.addCallSiteInfo(Ret.getNode(), std::move(CSInfo));
     return Ret;
@@ -2855,6 +2857,8 @@ ARMTargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
 
   // Returns a chain and a flag for retval copy to use.
   Chain = DAG.getNode(CallOpc, dl, {MVT::Other, MVT::Glue}, Ops);
+  if (CLI.CFIType)
+    Chain.getNode()->setCFIType(CLI.CFIType->getZExtValue());
   DAG.addNoMergeSiteInfo(Chain.getNode(), CLI.NoMerge);
   InGlue = Chain.getValue(1);
   DAG.addCallSiteInfo(Chain.getNode(), std::move(CSInfo));
@@ -12007,6 +12011,49 @@ static void genTPLoopBody(MachineBasicBlock *TpLoopBody,
       .add(predOps(ARMCC::AL));
 }
 
+bool ARMTargetLowering::supportKCFIBundles() const {
+  // KCFI is only supported in ARM mode, not Thumb mode
+  return !Subtarget->isThumb();
+}
+
+MachineInstr *
+ARMTargetLowering::EmitKCFICheck(MachineBasicBlock &MBB,
+                                 MachineBasicBlock::instr_iterator &MBBI,
+                                 const TargetInstrInfo *TII) const {
+  assert(MBBI->isCall() && MBBI->getCFIType() &&
+         "Invalid call instruction for a KCFI check");
+
+  // KCFI is only supported in ARM mode, not Thumb mode
+  assert(!Subtarget->isThumb() && "KCFI not supported in Thumb mode");
+
+  MachineOperand *TargetOp = nullptr;
+  switch (MBBI->getOpcode()) {
+  case ARM::BLX:
+  case ARM::BLX_pred:
+  case ARM::BLX_noip:
+  case ARM::BLX_pred_noip:
+  case ARM::BX_CALL:
+    TargetOp = &MBBI->getOperand(0);
+    break;
+  case ARM::TCRETURNri:
+  case ARM::TCRETURNrinotr12:
+  case ARM::TAILJMPr:
+  case ARM::TAILJMPr4:
+    TargetOp = &MBBI->getOperand(0);
+    break;
+  default:
+    llvm_unreachable("Unexpected CFI call opcode");
+  }
+
+  assert(TargetOp && TargetOp->isReg() && "Invalid target operand");
+  TargetOp->setIsRenamable(false);
+
+  return BuildMI(MBB, MBBI, MBBI->getDebugLoc(), TII->get(ARM::KCFI_CHECK))
+      .addReg(TargetOp->getReg())
+      .addImm(MBBI->getCFIType())
+      .getInstr();
+}
+
 MachineBasicBlock *
 ARMTargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
                                                MachineBasicBlock *BB) const {
diff --git a/llvm/lib/Target/ARM/ARMISelLowering.h b/llvm/lib/Target/ARM/ARMISelLowering.h
index 70aa001a41885..8c5e0cfbfda1b 100644
--- a/llvm/lib/Target/ARM/ARMISelLowering.h
+++ b/llvm/lib/Target/ARM/ARMISelLowering.h
@@ -447,6 +447,12 @@ class VectorType;
     void AdjustInstrPostInstrSelection(MachineInstr &MI,
                                        SDNode *Node) const override;
 
+    bool supportKCFIBundles() const override;
+
+    MachineInstr *EmitKCFICheck(MachineBasicBlock &MBB,
+                                MachineBasicBlock::instr_iterator &MBBI,
+                                const TargetInstrInfo *TII) const override;
+
     SDValue PerformCMOVCombine(SDNode *N, SelectionDAG &DAG) const;
     SDValue PerformBRCONDCombine(SDNode *N, SelectionDAG &DAG) const;
     SDValue PerformCMOVToBFICombine(SDNode *N, SelectionDAG &DAG) const;
diff --git a/llvm/lib/Target/ARM/ARMInstrInfo.td b/llvm/lib/Target/ARM/ARMInstrInfo.td
index 282ff534fc112..f7d471d84fa94 100644
--- a/llvm/lib/Target/ARM/ARMInstrInfo.td
+++ b/llvm/lib/Target/ARM/ARMInstrInfo.td
@@ -6535,6 +6535,14 @@ def CMP_SWAP_64 : PseudoInst<(outs GPRPair:$Rd, GPRPair:$addr_temp_out),
 
 def : Pat<(atomic_fence (timm), 0), (MEMBARRIER)>;
 
+//===----------------------------------------------------------------------===//
+// KCFI check pseudo-instruction.
+//===----------------------------------------------------------------------===//
+let isPseudo = 1 in {
+def KCFI_CHECK : PseudoInst<
+  (outs), (ins GPR:$ptr, i32imm:$type), NoItinerary, []>, Sched<[]>;
+}
+
 //===----------------------------------------------------------------------===//
 // Instructions used for emitting unwind opcodes on Windows.
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/ARM/ARMTargetMachine.cpp b/llvm/lib/Target/ARM/ARMTargetMachine.cpp
index 86740a92b32c5..62c7eac0d8fca 100644
--- a/llvm/lib/Target/ARM/ARMTargetMachine.cpp
+++ b/llvm/lib/Target/ARM/ARMTargetMachine.cpp
@@ -111,6 +111,7 @@ extern "C" LLVM_ABI LLVM_EXTERNAL_VISIBILITY void LLVMInitializeARMTarget() {
   initializeMVELaneInterleavingPass(Registry);
   initializeARMFixCortexA57AES1742098Pass(Registry);
   initializeARMDAGToDAGISelLegacyPass(Registry);
+  initializeKCFIPass(Registry);
 }
 
 static std::unique_ptr<TargetLoweringObjectFile> createTLOF(const Triple &TT) {
@@ -487,6 +488,9 @@ void ARMPassConfig::addPreSched2() {
   // proper scheduling.
   addPass(createARMExpandPseudoPass());
 
+  // Emit KCFI checks for indirect calls.
+  addPass(createKCFIPass());
+
   if (getOptLevel() != CodeGenOptLevel::None) {
     // When optimising for size, always run the Thumb2SizeReduction pass before
     // IfConversion. Otherwise, check whether IT blocks are restricted
@@ -530,6 +534,9 @@ void ARMPassConfig::addPreEmitPass() {
 }
 
 void ARMPassConfig::addPreEmitPass2() {
+  // Unpack KCFI bundles before AsmPrinter
+  addPass(createUnpackMachineBundles(nullptr));
+
   // Inserts fixup instructions before unsafe AES operations. Instructions may
   // be inserted at the start of blocks and at within blocks so this pass has to
   // come before those below.
diff --git a/llvm/test/CodeGen/ARM/kcfi-arm.ll b/llvm/test/CodeGen/ARM/kcfi-arm.ll
new file mode 100644
index 0000000000000..80a4654a733be
--- /dev/null
+++ b/llvm/test/CodeGen/ARM/kcfi-arm.ll
@@ -0,0 +1,65 @@
+; RUN: llc -mtriple=armv7-linux-gnueabi -verify-machineinstrs < %s | FileCheck %s --check-prefix=ASM
+; RUN: llc -mtriple=armv7-linux-gnueabi -verify-machineinstrs -stop-after=finalize-isel < %s | FileCheck %s --check-prefixes=MIR,ISEL
+; RUN: llc -mtriple=armv7-linux-gnueabi -verify-machineinstrs -stop-after=kcfi < %s | FileCheck %s --check-prefixes=MIR,KCFI
+
+; ASM:       .long 12345678
+define void @f1(ptr noundef %x) !kcfi_type !1 {
+; ASM-LABEL: f1:
+; ASM:       @ %bb.0:
+; ASM:         ldr r12, [r0, #-4]
+; ASM-NEXT:    eor r12, r12, #78
+; ASM-NEXT:    eor r12, r12, #24832
+; ASM-NEXT:    eor r12, r12, #12320768
+; ASM-NEXT:    eors r12, r12, #0
+; ASM-NEXT:    beq .Ltmp{{[0-9]+}}
+; UDF encoding: 0x8000 | (0x1F << 5) | r0 = 0x83e0 = 33760
+; ASM-NEXT:    udf #33760
+; ASM-NEXT:  .Ltmp{{[0-9]+}}:
+; ASM-NEXT:    blx r0
+
+; MIR-LABEL: name: f1
+; MIR: body:
+
+; ISEL:     BLX %0, csr_aapcs,{{.*}} cfi-type 12345678
+
+; KCFI:       BUNDLE{{.*}} {
+; KCFI-NEXT:    KCFI_CHECK $r0, 12345678
+; KCFI-NEXT:    BLX killed $r0, csr_aapcs,{{.*}}
+; KCFI-NEXT:  }
+
+  call void %x() [ "kcfi"(i32 12345678) ]
+  ret void
+}
+
+; Test with tail call
+define void @f2(ptr noundef %x) !kcfi_type !1 {
+; ASM-LABEL: f2:
+; ASM:       @ %bb.0:
+; ASM:         ldr r12, [r0, #-4]
+; ASM:         eor r12, r12, #78
+; ASM:         eor r12, r12, #24832
+; ASM:         eor r12, r12, #12320768
+; ASM:         eors r12, r12, #0
+; ASM:         beq .Ltmp{{[0-9]+}}
+; UDF encoding: 0x8000 | (0x1F << 5) | r0 = 0x83e0 = 33760
+; ASM:         udf #33760
+; ASM:       .Ltmp{{[0-9]+}}:
+; ASM:         bx r0
+
+; MIR-LABEL: name: f2
+; MIR: body:
+
+; ISEL:     TCRETURNri %0, 0, csr_aapcs, implicit $sp, cfi-type 12345678
+
+; KCFI:       BUNDLE{{.*}} {
+; KCFI-NEXT:    KCFI_CHECK $r0, 12345678
+; KCFI-NEXT:    TAILJMPr killed $r0, csr_aapcs, implicit $sp, implicit $sp
+; KCFI-NEXT:  }
+
+  tail call void %x() [ "kcfi"(i32 12345678) ]
+  ret void
+}
+
+!llvm.module.flags = !{!0}
+!0 = !{i32 4, !"kcfi", i32 1}
+!1 = !{i32 12345678}
diff --git a/llvm/test/CodeGen/ARM/kcfi-patchable-function-prefix.ll b/llvm/test/CodeGen/ARM/kcfi-patchable-function-prefix.ll
new file mode 100644
index 0000000000000..771be9ea9ad44
--- /dev/null
+++ b/llvm/test/CodeGen/ARM/kcfi-patchable-function-prefix.ll
@@ -0,0 +1,46 @@
+; RUN: llc -mtriple=armv7-linux-gnueabi -verify-machineinstrs < %s | FileCheck %s
+
+; CHECK:          .p2align 2
+; CHECK-NOT:        nop
+; CHECK:          .long   12345678
+; CHECK-LABEL:    f1:
+define void @f1(ptr noundef %x) !kcfi_type !1 {
+; CHECK:            ldr r12, [r0, #-4]
+  call void %x() [ "kcfi"(i32 12345678) ]
+  ret void
+}
+
+; CHECK:          .p2align 2
+; CHECK-NOT:       .long
+; CHECK-NOT:        nop
+; CHECK-LABEL:    f2:
+define void @f2(ptr noundef %x) {
+; CHECK:            ldr r12, [r0, #-4]
+  call void %x() [ "kcfi"(i32 12345678) ]
+  ret void
+}
+
+; CHECK:          .p2align 2
+; CHECK:          .long   12345678
+; CHECK-COUNT-11:   nop
+; CHECK-LABEL:    f3:
+define void @f3(ptr noundef %x) #0 !kcfi_type !1 {
+; CHECK:            ldr r12, [r0, #-48]
+  call void %x() [ "kcfi"(i32 12345678) ]
+  ret void
+}
+
+; CHECK:          .p2align 2
+; CHECK-COUNT-11:   nop
+; CHECK-LABEL:    f4:
+define void @f4(ptr noundef %x) #0 {
+; CHECK:            ldr r12, [r0, #-48]
+  call void %x() [ "kcfi"(i32 12345678) ]
+  ret void
+}
+
+attributes #0 = { "patchable-function-prefix"="11" }
+
+!llvm.module.flags = !{!0}
+!0 = !{i32 4, !"kcfi", i32 1}
+!1 = !{i32 12345678}
diff --git a/llvm/test/CodeGen/ARM/kcfi-r3-spill.ll b/llvm/test/CodeGen/ARM/kcfi-r3-spill.ll
new file mode 100644
index 0000000000000..be079def354a0
--- /dev/null
+++ b/llvm/test/CodeGen/ARM/kcfi-r3-spill.ll
@@ -0,0 +1,121 @@
+; RUN: llc -mtriple=armv7-linux-gnueabi -verify-machineinstrs < %s | FileCheck %s
+
+; Test r3 spill/reload when target is r12 and r3 is a call argument.
+; With 5+ arguments (target + 4 args), r0-r3 are all used for arguments,
+; forcing r3 to be spilled when we need it as scratch register.
+
+define void @test_r3_spill(ptr noundef %target, i32 %a, i32 %b, i32 %c, i32 %d) {
+; CHECK-LABEL: test_r3_spill:
+; CHECK:       @ %bb.0:
+; Arguments: r0=%target, r1=%a, r2=%b, r3=%c, [sp]=%d
+; Call needs: r0=%a, r1=%b, r2=%c, r3=%d, target in r12
+; Compiler shuffles arguments into place, saving r3 (c) in lr, loading d from stack
+; CHECK:         mov lr, r3
+; CHECK-NEXT:    ldr r3, [sp, #8]
+; CHECK-NEXT:    mov r12, r0
+; CHECK-NEXT:    mov r0, r1
+; CHECK-NEXT:    mov r1, r2
+; CHECK-NEXT:    mov r2, lr
+; r3 is live as 4th argument, so push it before KCFI check
+; CHECK-NEXT:    stmdb sp!, {r3}
+; CHECK-NEXT:    ldr r3, [r12, #-4]
+; CHECK-NEXT:    eor r3, r3, #78
+; CHECK-NEXT:    eor r3, r3, #24832
+; CHECK-NEXT:    eor r3, r3, #12320768
+; CHECK-NEXT:    eors r3, r3, #0
+; Restore r3 immediately after comparison, before branch
+; CHECK-NEXT:    ldm sp!, {r3}
+; CHECK-NEXT:    beq .Ltmp{{[0-9]+}}
+; UDF encoding: 0x8000 | (0x1F << 5) | r12 = 0x83ec = 33772
+; CHECK-NEXT:    udf #33772
+; CHECK-NEXT:  .Ltmp{{[0-9]+}}:
+; CHECK-NEXT:    blx r12
+
+  call void %target(i32 %a, i32 %b, i32 %c, i32 %d) [ "kcfi"(i32 12345678) ]
+  ret void
+}
+
+; Test with 4 arguments - r3 not needed as argument
+
+define void @test_r3_target(ptr noundef %target, i32 %a, i32 %b, i32 %c) {
+; CHECK-LABEL: test_r3_target:
+; CHECK:       @ %bb.0:
+; Compiler shuffles registers: c→r12, target→r3, a→r0, b→r1, r12→r2
+; CHECK:         mov r12, r3
+; CHECK-NEXT:    mov r3, r0
+; CHECK-NEXT:    mov r0, r1
+; CHECK-NEXT:    mov r1, r2
+; CHECK-NEXT:    mov r2, r12
+; Since target is in r3 (not r12), we use r12 as scratch (no spill needed)
+; CHECK-NEXT:    ldr r12, [r3, #-4]
+; CHECK-NEXT:    eor r12, r12, #78
+; CHECK-NEXT:    eor r12, r12, #24832
+; CHECK-NEXT:    eor r12, r12, #12320768
+; CHECK-NEXT:    eors r12, r12, #0
+; CHECK-NEXT:    beq .Ltmp{{[0-9]+}}
+; UDF encoding: 0x8000 | (0x1F << 5) | r3 = 0x83e3 = 33763
+; CHECK-NEXT:    udf #33763
+; CHECK-NEXT:  .Ltmp{{[0-9]+}}:
+; CHECK-NEXT:    blx r3
+
+  call void %target(i32 %a, i32 %b, i32 %c) [ "kcfi"(i32 12345678) ]
+  ret void
+}
+
+; Test case where call target is r12 but r3 is NOT used as an argument.
+; KCFI should use r3 without spilling.
+
+define void @test_r3_unused(ptr noundef %target, i32 %a, i32 %b) {
+; CHECK-LABEL: test_r3_unused:
+; CHECK:       @ %bb.0:
+; Only 3 arguments total, so r3 is not used as call argument
+; Compiler puts target→r3, a→r0, b→r1
+; CHECK:         mov r3, r0
+; CHECK-NEXT:    mov r0, r1
+; CHECK-NEXT:    mov r1, r2
+; r3 is the target, so we use r12 as scratch (no spill needed)
+; CHECK-NEXT:    ldr r12, [r3, #-4]
+; CHECK-NEXT:    eor r12, r12, #78
+; CHECK-NEXT:    eor r12, r12, #24832
+; CHECK-NEXT:    eor r12, r12, #12320768
+; CHECK-NEXT:    eors r12, r12, #0
+; CHECK-NEXT:    beq .Ltmp{{[0-9]+}}
+; UDF encoding: 0x8000 | (0x1F << 5) | r3 = 0x83e3 = 33763
+; CHECK-NEXT:    udf #33763
+; CHECK-NEXT:  .Ltmp{{[0-9]+}}:
+; CHECK-NEXT:    blx r3
+
+  call void %target(i32 %a, i32 %b) [ "kcfi"(i32 12345678) ]
+  ret void
+}
+
+; Test case where call target is NOT r12, so we use r12 as scratch
+; and don't need to worry about r3 at all.
+
+define void @test_r12_scratch(ptr noundef %target, i32 %a, i32 %b, i32 %c) {
+; CHECK-LABEL: test_r12_scratch:
+; CHECK:       @ %bb.0:
+; Compiler shuffles: c→r12, target→r3, a→r0, b→r1, r12→r2
+; CHECK:         mov r12, r3
+; CHECK-NEXT:    mov r3, r0
+; CHECK-NEXT:    mov r0, r1
+; CHECK-NEXT:    mov r1, r2
+; CHECK-NEXT:    mov r2, r12
+; Target is in r3, so we use r12 as scratch (no spill needed)
+; CHECK-NEXT:    ldr...
[truncated]

github-actions bot commented Oct 16, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@kees kees changed the title Add backend support for Kernel Control-Flow Integrity [ARM][KCFI] Add backend support for Kernel Control-Flow Integrity Oct 16, 2025

kees commented Oct 16, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

Whoops! Yes, fixed now.

@nathanchance
nathanchance commented Oct 16, 2025

With CONFIG_CFI=y and CONFIG_THUMB2_KERNEL=y, I see many

error: out of range pc-relative fixup value

with the latest revision. Normal ARM mode works fine still from what I can tell.


kees commented Oct 17, 2025

With CONFIG_CFI=y and CONFIG_THUMB2_KERNEL=y, I see many

error: out of range pc-relative fixup value

with the latest revision. Normal ARM mode works fine still from what I can tell.

I forgot we could even compile Linux with Thumb2! I did all my testing of Thumb2 under userspace. Thanks for finding that; I will take a look.

@kees kees force-pushed the arm-kcfi branch 2 times, most recently from eab5934 to 49a5b43, on October 17, 2025 23:49

kees commented Oct 17, 2025

With CONFIG_CFI=y and CONFIG_THUMB2_KERNEL=y, I see many

error: out of range pc-relative fixup value

with the latest revision. Normal ARM mode works fine still from what I can tell.

I forgot we could even compile Linux with Thumb2! I did all my testing of Thumb2 under userspace. Thanks for finding that; I will take a look.

Okay, this should be fixed now, and a new regression test added. I've validated a CONFIG_THUMB2_KERNEL=y Linux build, boot, and LKDTM test run.

