[AArch64] Fix the emission of WinCFI for pac-ret+leaf and SCS #147518

atrosinenko · 2025-07-08T13:17:58Z

This commit fixes WinCFI opcodes being incorrectly emitted for test cases in sign-return-address.ll.

Emit SEH_Nop opcode in emitShadowCallStackEpilogue the same way it is done in emitShadowCallStackPrologue function - this fixes

12 bytes of instructions in range, but .seh directives corresponding to 8 bytes

error being reported for the epilogue of non_leaf_scs function.

Emit SEH_PrologEnd on the code path in emitPrologue function that may be taken when pac-ret protection is emitted for a leaf function - this fixes errors like the following:

starting epilogue (.seh_startepilogue) before prologue has ended (.seh_endprologue) in leaf_sign_all_v83
Stray .seh_endepilogue in leaf_sign_all_v83

efriedma-quic · 2025-07-08T20:22:06Z

Please also FileCheck the assembly output on Windows.

This commit fixes WinCFI opcodes being incorrectly emitted for test cases in sign-return-address.ll. Emit SEH_Nop opcode in emitShadowCallStackEpilogue the same way it is done in emitShadowCallStackPrologue function - this fixes 12 bytes of instructions in range, but .seh directives corresponding to 8 bytes error being reported for the epilogue of non_leaf_scs function. Emit SEH_PrologEnd on the code path in emitPrologue function that may be taken when pac-ret protection is emitted for a leaf function - this fixes errors like the following: starting epilogue (.seh_startepilogue) before prologue has ended (.seh_endprologue) in leaf_sign_all_v83 Stray .seh_endepilogue in leaf_sign_all_v83

atrosinenko · 2025-07-09T16:46:23Z

For now, I rebased and un-drafted this PR after #140895 was merged - will add checks for assembly output soon.

llvmbot · 2025-07-09T16:46:53Z

@llvm/pr-subscribers-backend-aarch64

Author: Anatoly Trosinenko (atrosinenko)

Changes

This commit fixes WinCFI opcodes being incorrectly emitted for test cases in sign-return-address.ll.

Emit SEH_Nop opcode in emitShadowCallStackEpilogue the same way it is done in emitShadowCallStackPrologue function - this fixes

12 bytes of instructions in range, but .seh directives corresponding to 8 bytes

error being reported for the epilogue of non_leaf_scs function.

Emit SEH_PrologEnd on the code path in emitPrologue function that may be taken when pac-ret protection is emitted for a leaf function - this fixes errors like the following:

starting epilogue (.seh_startepilogue) before prologue has ended (.seh_endprologue) in leaf_sign_all_v83
Stray .seh_endepilogue in leaf_sign_all_v83

Full diff: https://github.com/llvm/llvm-project/pull/147518.diff

2 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64FrameLowering.cpp (+23-9)
(modified) llvm/test/CodeGen/AArch64/sign-return-address.ll (+5)

diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
index 666ff8bbab42a..00fbeef9d35df 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -1746,7 +1746,7 @@ static void emitShadowCallStackEpilogue(const TargetInstrInfo &TII,
                                         MachineFunction &MF,
                                         MachineBasicBlock &MBB,
                                         MachineBasicBlock::iterator MBBI,
-                                        const DebugLoc &DL) {
+                                        const DebugLoc &DL, bool NeedsWinCFI) {
   // Shadow call stack epilog: ldr x30, [x18, #-8]!
   BuildMI(MBB, MBBI, DL, TII.get(AArch64::LDRXpre))
       .addReg(AArch64::X18, RegState::Define)
@@ -1755,6 +1755,10 @@ static void emitShadowCallStackEpilogue(const TargetInstrInfo &TII,
       .addImm(-8)
       .setMIFlag(MachineInstr::FrameDestroy);
 
+  if (NeedsWinCFI)
+    BuildMI(MBB, MBBI, DL, TII.get(AArch64::SEH_Nop))
+        .setMIFlag(MachineInstr::FrameDestroy);
+
   if (MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF))
     CFIInstBuilder(MBB, MBBI, MachineInstr::FrameDestroy)
         .buildRestore(AArch64::X18);
@@ -1899,13 +1903,15 @@ void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
       BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
           .setMIFlag(MachineInstr::FrameSetup);
     }
-    if (NeedsWinCFI)
-      HasWinCFI = true; // AArch64PointerAuth pass will insert SEH_PACSignLR
+    // AArch64PointerAuth pass will insert SEH_PACSignLR
+    HasWinCFI |= NeedsWinCFI;
   }
 
-  if (MFnI.needsShadowCallStackPrologueEpilogue(MF))
+  if (MFnI.needsShadowCallStackPrologueEpilogue(MF)) {
     emitShadowCallStackPrologue(*TII, MF, MBB, MBBI, DL, NeedsWinCFI,
                                 MFnI.needsDwarfUnwindInfo(MF));
+    HasWinCFI |= NeedsWinCFI;
+  }
 
   if (EmitCFI && MFnI.isMTETagged()) {
     BuildMI(MBB, MBBI, DL, TII->get(AArch64::EMITMTETAGGED))
@@ -1990,8 +1996,13 @@ void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
            "unexpected function without stack frame but with SVE objects");
     // All of the stack allocation is for locals.
     AFI->setLocalStackSize(NumBytes);
-    if (!NumBytes)
+    if (!NumBytes) {
+      if (NeedsWinCFI && HasWinCFI) {
+        BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
+            .setMIFlag(MachineInstr::FrameSetup);
+      }
       return;
+    }
     // REDZONE: If the stack size is less than 128 bytes, we don't need
     // to actually allocate.
     if (canUseRedZone(MF)) {
@@ -2460,8 +2471,11 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
   MachineBasicBlock::iterator EpilogStartI = MBB.end();
 
   auto FinishingTouches = make_scope_exit([&]() {
-    if (AFI->needsShadowCallStackPrologueEpilogue(MF))
-      emitShadowCallStackEpilogue(*TII, MF, MBB, MBB.getFirstTerminator(), DL);
+    if (AFI->needsShadowCallStackPrologueEpilogue(MF)) {
+      emitShadowCallStackEpilogue(*TII, MF, MBB, MBB.getFirstTerminator(), DL,
+                                  NeedsWinCFI);
+      HasWinCFI |= NeedsWinCFI;
+    }
     if (EmitCFI)
       emitCalleeSavedGPRRestores(MBB, MBB.getFirstTerminator());
     if (AFI->shouldSignReturnAddress(MF)) {
@@ -2472,8 +2486,8 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
                 TII->get(AArch64::PAUTH_EPILOGUE))
             .setMIFlag(MachineInstr::FrameDestroy);
       }
-      if (NeedsWinCFI)
-        HasWinCFI = true; // AArch64PointerAuth pass will insert SEH_PACSignLR
+      // AArch64PointerAuth pass will insert SEH_PACSignLR
+      HasWinCFI |= NeedsWinCFI;
     }
     if (HasWinCFI) {
       BuildMI(MBB, MBB.getFirstTerminator(), DL,
diff --git a/llvm/test/CodeGen/AArch64/sign-return-address.ll b/llvm/test/CodeGen/AArch64/sign-return-address.ll
index b0ab4775cb388..a1e819240f888 100644
--- a/llvm/test/CodeGen/AArch64/sign-return-address.ll
+++ b/llvm/test/CodeGen/AArch64/sign-return-address.ll
@@ -5,6 +5,11 @@
 ; v9.5-A is not expected to change codegen without -mbranch-protection=+pc, so reuse V83A.
 ; RUN: llc -mtriple=aarch64 -mattr=v9.5a < %s | FileCheck --check-prefixes=CHECK,V83A %s
 
+; Make sure no errors are detected when emitting SEH opcodes.
+; Errors are only checked for when generating a binary, so emit a dummy object
+; file and make sure llc produces zero exit code.
+; RUN: llc -mtriple=aarch64-windows -o %t.dummy.o --filetype=obj < %s
+
 define i32 @leaf(i32 %x) {
 ; CHECK-LABEL: leaf:
 ; CHECK:       // %bb.0:

atrosinenko · 2025-07-11T11:15:29Z

@efriedma-quic Added FileCheck assertion lines to sign-return-address.ll in 8283078. I'm not very familiar with WinCFI, but the output looks sensible (except that I'm not sure if signing LR with IA vs. IB key should influence the opcodes).

efriedma-quic

LGTM

llvm-ci · 2025-07-12T10:54:58Z

LLVM Buildbot has detected a new failure on builder mlir-nvidia running on mlir-nvidia while building llvm at step 7 "test-build-check-mlir-build-only-check-mlir".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/138/builds/15984

Here is the relevant piece of the build log for the reference

Step 7 (test-build-check-mlir-build-only-check-mlir) failure: test (failure)
******************** TEST 'MLIR :: Integration/GPU/CUDA/async.mlir' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -gpu-kernel-outlining  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary="format=fatbin"  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-runner    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_cuda_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_async_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_runner_utils.so    --entry-point-result=void -O0  | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -gpu-kernel-outlining
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt '-pass-pipeline=builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary=format=fatbin
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-runner --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_cuda_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_async_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_runner_utils.so --entry-point-result=void -O0
# .---command stderr------------
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventSynchronize(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# `-----------------------------
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# .---command stderr------------
# | /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir:68:12: error: CHECK: expected string not found in input
# |  // CHECK: [84, 84]
# |            ^
# | <stdin>:1:1: note: scanning from here
# | Unranked Memref base@ = 0x582b625968b0 rank = 1 offset = 0 sizes = [2] strides = [1] data = 
# | ^
# | <stdin>:2:1: note: possible intended match here
# | [42, 42]
# | ^
# | 
# | Input file: <stdin>
# | Check file: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             1: Unranked Memref base@ = 0x582b625968b0 rank = 1 offset = 0 sizes = [2] strides = [1] data =  
# | check:68'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |             2: [42, 42] 
# | check:68'0     ~~~~~~~~~
# | check:68'1     ?         possible intended match
...

atrosinenko mentioned this pull request Jul 8, 2025

[AArch64][PAC] Protect the entire function if pac-ret+leaf is requested #140895

Merged

atrosinenko requested review from DanielKristofKiss, asl and efriedma-quic July 8, 2025 13:26

atrosinenko force-pushed the users/atrosinenko/pauth-wincfi-fixes branch from ef5d3b9 to a36ad29 Compare July 9, 2025 16:06

atrosinenko marked this pull request as ready for review July 9, 2025 16:46

llvmbot added the backend:AArch64 label Jul 9, 2025

atrosinenko added 2 commits July 11, 2025 13:54

Add CHECK lines for WinCFI assembly output

8283078

Add DWARFCFI- to COMPAT and V83A check lines

030554d

efriedma-quic approved these changes Jul 11, 2025

View reviewed changes

atrosinenko merged commit b717568 into main Jul 12, 2025
9 checks passed

atrosinenko deleted the users/atrosinenko/pauth-wincfi-fixes branch July 12, 2025 10:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AArch64] Fix the emission of WinCFI for pac-ret+leaf and SCS #147518

[AArch64] Fix the emission of WinCFI for pac-ret+leaf and SCS #147518

Uh oh!

atrosinenko commented Jul 8, 2025

Uh oh!

efriedma-quic commented Jul 8, 2025

Uh oh!

atrosinenko commented Jul 9, 2025

Uh oh!

llvmbot commented Jul 9, 2025

Uh oh!

atrosinenko commented Jul 11, 2025 •

edited

Loading

Uh oh!

efriedma-quic left a comment

Uh oh!

Uh oh!

llvm-ci commented Jul 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

[AArch64] Fix the emission of WinCFI for pac-ret+leaf and SCS #147518

[AArch64] Fix the emission of WinCFI for pac-ret+leaf and SCS #147518

Uh oh!

Conversation

atrosinenko commented Jul 8, 2025

Uh oh!

efriedma-quic commented Jul 8, 2025

Uh oh!

atrosinenko commented Jul 9, 2025

Uh oh!

llvmbot commented Jul 9, 2025

Uh oh!

atrosinenko commented Jul 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

efriedma-quic left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

llvm-ci commented Jul 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

atrosinenko commented Jul 11, 2025 •

edited

Loading