Conversation

@sdesmalen-arm (Collaborator) commented Sep 2, 2025

To reduce the number of combinations to support, always use the same
prologue/epilogue lowering on windows regardless of whether unwind info is
required.

This also fixes an issue where a function with SVE callee-saves and
nounwind led to a compilation failure, because the SVE lowering makes
assumptions that only hold when using the Windows style prologue/epilogue.
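The ordering difference behind this can be sketched with a toy model (an illustration only: the name `assignOffsets`, the `WindowsStyle` flag, and the example sizes are hypothetical, and the real computeCalleeSaveRegisterPairs also handles register pairing, scalable slots, and alignment gaps):

```cpp
#include <cassert>
#include <vector>

// Toy model of the callee-save offset assignment: the default lowering
// fills the save area top down and records the offset *after* stepping,
// while the Windows-style lowering fills bottom up and records the offset
// *before* stepping. This is what reverses the on-stack order of the saves.
std::vector<int> assignOffsets(const std::vector<int> &SpillSizes,
                               bool WindowsStyle) {
  int Total = 0;
  for (int S : SpillSizes)
    Total += S;
  int ByteOffset = WindowsStyle ? 0 : Total; // where the fill starts
  int Dir = WindowsStyle ? 1 : -1;           // fill direction
  std::vector<int> Offsets;
  for (int S : SpillSizes) {
    int OffsetPre = ByteOffset;
    ByteOffset += Dir * S;
    int OffsetPost = ByteOffset;
    Offsets.push_back(WindowsStyle ? OffsetPre : OffsetPost);
  }
  return Offsets;
}
```

With spill sizes {16, 16, 8}, the Windows-style fill yields offsets {0, 16, 32} and the default yields {24, 8, 0}: the first register in the save list lands at the bottom of the area in one case and near the top in the other, which is why the two configurations need different assumptions downstream.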

@MacDue (Member) left a comment
LGTM, but I'd wait for someone who works on the Windows side to approve.

Suggestion: change "code makes the assumption that CFI is available" to "makes assumptions that only hold when NeedsWinCFI is true"? AFAIK it doesn't depend on the CFI itself, but on changes that only happen when emitting Windows CFI (such as reversing the callee-saves).

@efriedma-quic (Collaborator) left a comment

I think in practice needsWinCFI will almost always be true on Windows; clang will add "uwtable" attributes.

In order to reduce the number of different configurations, I think I'd prefer to always use Windows-style prologue/epilogue lowering on Windows, even if it isn't strictly required.

@llvmbot (Member) commented Sep 3, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Sander de Smalen (sdesmalen-arm)

Changes

Without this change, functions with 'nounwind' don't compile (correctly), because the frame-lowering code makes the assumption that CFI is available when the function has SVE callee-saves.


Patch is 37.19 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/156467.diff

7 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64FrameLowering.cpp (+47-46)
  • (modified) llvm/test/CodeGen/AArch64/arm64ec-reservedregs.ll (+38-38)
  • (modified) llvm/test/CodeGen/AArch64/framelayout-sve-win.mir (+28)
  • (modified) llvm/test/CodeGen/AArch64/llvm.frexp.ll (+24-24)
  • (modified) llvm/test/CodeGen/AArch64/preserve_nonecc_varargs_win64.ll (+20-20)
  • (modified) llvm/test/CodeGen/AArch64/swift-async-win.ll (+6-6)
  • (modified) llvm/test/CodeGen/AArch64/win64-no-uwtable.ll (+16-15)
diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
index 7725fa4f1ccb1..0fe0845b04fdb 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -355,6 +355,27 @@ static bool isLikelyToHaveSVEStack(const MachineFunction &MF) {
   return false;
 }
 
+static bool isTargetWindows(const MachineFunction &MF) {
+  return MF.getSubtarget<AArch64Subtarget>().isTargetWindows();
+}
+
+// Windows unwind can't represent the required stack adjustments if we have
+// both SVE callee-saves and dynamic stack allocations, and the frame
+// pointer is before the SVE spills.  The allocation of the frame pointer
+// must be the last instruction in the prologue so the unwinder can restore
+// the stack pointer correctly. (And there isn't any unwind opcode for
+// `addvl sp, x29, -17`.)
+//
+// Because of this, we do spills in the opposite order on Windows: first SVE,
+// then GPRs. The main side-effect of this is that it makes accessing
+// parameters passed on the stack more expensive.
+//
+// We could consider rearranging the spills for simpler cases.
+static bool hasSVECalleeSavesAboveFrameRecord(const MachineFunction &MF) {
+  auto *AFI = MF.getInfo<AArch64FunctionInfo>();
+  return isTargetWindows(MF) && AFI->getSVECalleeSavedStackSize();
+}
+
 /// Returns true if a homogeneous prolog or epilog code can be emitted
 /// for the size optimization. If possible, a frame helper call is injected.
 /// When Exit block is given, this check is for epilog.
@@ -368,7 +389,7 @@ bool AArch64FrameLowering::homogeneousPrologEpilog(
     return false;
 
   // TODO: Window is supported yet.
-  if (needsWinCFI(MF))
+  if (isTargetWindows(MF))
     return false;
 
   // TODO: SVE is not supported yet.
@@ -1229,7 +1250,7 @@ bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(
   // unwind format for functions that both have a local area and callee saved
   // registers. Using the packed unwind format notably reduces the size of
   // the unwind info.
-  if (needsWinCFI(MF) && AFI->getCalleeSavedStackSize() > 0 &&
+  if (isTargetWindows(MF) && AFI->getCalleeSavedStackSize() > 0 &&
       MF.getFunction().hasOptSize())
     return false;
 
@@ -1694,10 +1715,6 @@ static void fixupCalleeSaveRestoreStackOffset(MachineInstr &MI,
   }
 }
 
-static bool isTargetWindows(const MachineFunction &MF) {
-  return MF.getSubtarget<AArch64Subtarget>().isTargetWindows();
-}
-
 static unsigned getStackHazardSize(const MachineFunction &MF) {
   return MF.getSubtarget<AArch64Subtarget>().getStreamingHazardSize();
 }
@@ -2052,21 +2069,7 @@ void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
   bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
   unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
 
-  // Windows unwind can't represent the required stack adjustments if we have
-  // both SVE callee-saves and dynamic stack allocations, and the frame
-  // pointer is before the SVE spills.  The allocation of the frame pointer
-  // must be the last instruction in the prologue so the unwinder can restore
-  // the stack pointer correctly. (And there isn't any unwind opcode for
-  // `addvl sp, x29, -17`.)
-  //
-  // Because of this, we do spills in the opposite order on Windows: first SVE,
-  // then GPRs. The main side-effect of this is that it makes accessing
-  // parameters passed on the stack more expensive.
-  //
-  // We could consider rearranging the spills for simpler cases.
-  bool FPAfterSVECalleeSaves =
-      Subtarget.isTargetWindows() && AFI->getSVECalleeSavedStackSize();
-
+  bool FPAfterSVECalleeSaves = hasSVECalleeSavesAboveFrameRecord(MF);
   if (FPAfterSVECalleeSaves && AFI->hasStackHazardSlotIndex())
     reportFatalUsageError("SME hazard padding is not supported on Windows");
 
@@ -2566,8 +2569,7 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
     return;
   }
 
-  bool FPAfterSVECalleeSaves =
-      Subtarget.isTargetWindows() && AFI->getSVECalleeSavedStackSize();
+  bool FPAfterSVECalleeSaves = hasSVECalleeSavesAboveFrameRecord(MF);
 
   bool CombineSPBump = shouldCombineCSRLocalStackBumpInEpilogue(MBB, NumBytes);
   // Assume we can't combine the last pop with the sp restore.
@@ -2895,8 +2897,7 @@ AArch64FrameLowering::getFrameIndexReferenceFromSP(const MachineFunction &MF,
     return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
 
   const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
-  bool FPAfterSVECalleeSaves =
-      isTargetWindows(MF) && AFI->getSVECalleeSavedStackSize();
+  bool FPAfterSVECalleeSaves = hasSVECalleeSavesAboveFrameRecord(MF);
   if (MFI.getStackID(FI) == TargetStackID::ScalableVector) {
     if (FPAfterSVECalleeSaves &&
         -ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize())
@@ -3053,8 +3054,7 @@ StackOffset AArch64FrameLowering::resolveFrameOffsetReference(
       "In the presence of dynamic stack pointer realignment, "
       "non-argument/CSR objects cannot be accessed through the frame pointer");
 
-  bool FPAfterSVECalleeSaves =
-      isTargetWindows(MF) && AFI->getSVECalleeSavedStackSize();
+  bool FPAfterSVECalleeSaves = hasSVECalleeSavesAboveFrameRecord(MF);
 
   if (isSVE) {
     StackOffset FPOffset =
@@ -3253,7 +3253,6 @@ static void computeCalleeSaveRegisterPairs(
     return;
 
   bool IsWindows = isTargetWindows(MF);
-  bool NeedsWinCFI = needsWinCFI(MF);
   AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
   unsigned StackHazardSize = getStackHazardSize(MF);
   MachineFrameInfo &MFI = MF.getFrameInfo();
@@ -3270,7 +3269,7 @@ static void computeCalleeSaveRegisterPairs(
   int StackFillDir = -1;
   int RegInc = 1;
   unsigned FirstReg = 0;
-  if (NeedsWinCFI) {
+  if (IsWindows) {
     // For WinCFI, fill the stack from the bottom up.
     ByteOffset = 0;
     StackFillDir = 1;
@@ -3279,9 +3278,9 @@ static void computeCalleeSaveRegisterPairs(
     RegInc = -1;
     FirstReg = Count - 1;
   }
-  bool FPAfterSVECalleeSaves = IsWindows && AFI->getSVECalleeSavedStackSize();
-  int ScalableByteOffset =
-      FPAfterSVECalleeSaves ? 0 : AFI->getSVECalleeSavedStackSize();
+  int ScalableByteOffset = hasSVECalleeSavesAboveFrameRecord(MF)
+                               ? 0
+                               : AFI->getSVECalleeSavedStackSize();
   bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
   Register LastReg = 0;
 
@@ -3319,6 +3318,7 @@ static void computeCalleeSaveRegisterPairs(
       ByteOffset += StackFillDir * StackHazardSize;
     LastReg = RPI.Reg1;
 
+    bool NeedsWinCFI = needsWinCFI(MF);
     int Scale = TRI->getSpillSize(*RPI.RC);
     // Add the next reg to the pair if it is in the same register class.
     if (unsigned(i + RegInc) < Count && !AFI->hasStackHazardSlotIndex()) {
@@ -3334,8 +3334,9 @@ static void computeCalleeSaveRegisterPairs(
         break;
       case RegPairInfo::FPR64:
         if (AArch64::FPR64RegClass.contains(NextReg) &&
-            !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
-                                              IsFirst, TRI))
+            !invalidateRegisterPairing(
+                RPI.Reg1, NextReg, IsWindows, NeedsWinCFI,
+                /*NeedsFrameRecord=*/false, IsFirst, TRI))
           RPI.Reg2 = NextReg;
         break;
       case RegPairInfo::FPR128:
@@ -3389,7 +3390,7 @@ static void computeCalleeSaveRegisterPairs(
            "Callee-save registers not saved as adjacent register pair!");
 
     RPI.FrameIdx = CSI[i].getFrameIdx();
-    if (NeedsWinCFI &&
+    if (IsWindows &&
         RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
       RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
 
@@ -3416,7 +3417,7 @@ static void computeCalleeSaveRegisterPairs(
 
     // Round up size of non-pair to pair size if we need to pad the
     // callee-save area to ensure 16-byte alignment.
-    if (NeedGapToAlignStack && !NeedsWinCFI && !RPI.isScalable() &&
+    if (NeedGapToAlignStack && !IsWindows && !RPI.isScalable() &&
         RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
         ByteOffset % 16 != 0) {
       ByteOffset += 8 * StackFillDir;
@@ -3432,7 +3433,7 @@ static void computeCalleeSaveRegisterPairs(
     assert(OffsetPost % Scale == 0);
     // If filling top down (default), we want the offset after incrementing it.
     // If filling bottom up (WinCFI) we need the original offset.
-    int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
+    int Offset = IsWindows ? OffsetPre : OffsetPost;
 
     // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
     // Swift context can directly precede FP.
@@ -3471,7 +3472,7 @@ static void computeCalleeSaveRegisterPairs(
     if (RPI.isPaired())
       i += RegInc;
   }
-  if (NeedsWinCFI) {
+  if (IsWindows) {
     // If we need an alignment gap in the stack, align the topmost stack
     // object. A stack frame with a gap looks like this, bottom up:
     // x19, d8. d9, gap.
@@ -3606,14 +3607,15 @@ bool AArch64FrameLowering::spillCalleeSavedRegisters(
                if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
                dbgs() << ")\n");
 
-    assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
+    assert((!isTargetWindows(MF) ||
+            !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
            "Windows unwdinding requires a consecutive (FP,LR) pair");
     // Windows unwind codes require consecutive registers if registers are
     // paired.  Make the switch here, so that the code below will save (x,x+1)
     // and not (x+1,x).
     unsigned FrameIdxReg1 = RPI.FrameIdx;
     unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
-    if (NeedsWinCFI && RPI.isPaired()) {
+    if (isTargetWindows(MF) && RPI.isPaired()) {
       std::swap(Reg1, Reg2);
       std::swap(FrameIdxReg1, FrameIdxReg2);
     }
@@ -3775,7 +3777,7 @@ bool AArch64FrameLowering::restoreCalleeSavedRegisters(
     // and not (x+1,x).
     unsigned FrameIdxReg1 = RPI.FrameIdx;
     unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
-    if (NeedsWinCFI && RPI.isPaired()) {
+    if (isTargetWindows(MF) && RPI.isPaired()) {
       std::swap(Reg1, Reg2);
       std::swap(FrameIdxReg1, FrameIdxReg2);
     }
@@ -4191,14 +4193,14 @@ bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
     MachineFunction &MF, const TargetRegisterInfo *RegInfo,
     std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
     unsigned &MaxCSFrameIndex) const {
-  bool NeedsWinCFI = needsWinCFI(MF);
+  bool IsWindows = isTargetWindows(MF);
   unsigned StackHazardSize = getStackHazardSize(MF);
   // To match the canonical windows frame layout, reverse the list of
   // callee saved registers to get them laid out by PrologEpilogInserter
   // in the right order. (PrologEpilogInserter allocates stack objects top
   // down. Windows canonical prologs store higher numbered registers at
   // the top, thus have the CSI array start from the highest registers.)
-  if (NeedsWinCFI)
+  if (IsWindows)
     std::reverse(CSI.begin(), CSI.end());
 
   if (CSI.empty())
@@ -4209,8 +4211,7 @@ bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
   MachineFrameInfo &MFI = MF.getFrameInfo();
   auto *AFI = MF.getInfo<AArch64FunctionInfo>();
 
-  bool UsesWinAAPCS = isTargetWindows(MF);
-  if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
+  if (IsWindows && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
     int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
     AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
     if ((unsigned)FrameIdx < MinCSFrameIndex)
@@ -4263,7 +4264,7 @@ bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
       MaxCSFrameIndex = FrameIdx;
 
     // Grab 8 bytes below FP for the extended asynchronous frame info.
-    if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
+    if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !IsWindows &&
         Reg == AArch64::FP) {
       FrameIdx = MFI.CreateStackObject(8, Alignment, true);
       AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
diff --git a/llvm/test/CodeGen/AArch64/arm64ec-reservedregs.ll b/llvm/test/CodeGen/AArch64/arm64ec-reservedregs.ll
index 91ec870dd6d0c..4e6cbccb1a5ae 100644
--- a/llvm/test/CodeGen/AArch64/arm64ec-reservedregs.ll
+++ b/llvm/test/CodeGen/AArch64/arm64ec-reservedregs.ll
@@ -9,20 +9,20 @@
 define i32 @no_int_regs(i32 %x) nounwind {
 ; CHECK-LABEL: no_int_regs:
 ; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    stp x30, x29, [sp, #-80]! // 16-byte Folded Spill
-; CHECK-NEXT:    str x27, [sp, #16] // 8-byte Folded Spill
-; CHECK-NEXT:    stp x26, x25, [sp, #32] // 16-byte Folded Spill
-; CHECK-NEXT:    stp x22, x21, [sp, #48] // 16-byte Folded Spill
-; CHECK-NEXT:    stp x20, x19, [sp, #64] // 16-byte Folded Spill
-; CHECK-NEXT:    str w0, [sp, #28] // 4-byte Folded Spill
+; CHECK-NEXT:    stp x19, x20, [sp, #-80]! // 16-byte Folded Spill
+; CHECK-NEXT:    stp x21, x22, [sp, #16] // 16-byte Folded Spill
+; CHECK-NEXT:    stp x25, x26, [sp, #32] // 16-byte Folded Spill
+; CHECK-NEXT:    str x27, [sp, #48] // 8-byte Folded Spill
+; CHECK-NEXT:    stp x29, x30, [sp, #56] // 16-byte Folded Spill
+; CHECK-NEXT:    str w0, [sp, #76] // 4-byte Folded Spill
 ; CHECK-NEXT:    //APP
 ; CHECK-NEXT:    //NO_APP
-; CHECK-NEXT:    ldp x20, x19, [sp, #64] // 16-byte Folded Reload
-; CHECK-NEXT:    ldr w0, [sp, #28] // 4-byte Folded Reload
-; CHECK-NEXT:    ldp x22, x21, [sp, #48] // 16-byte Folded Reload
-; CHECK-NEXT:    ldr x27, [sp, #16] // 8-byte Folded Reload
-; CHECK-NEXT:    ldp x26, x25, [sp, #32] // 16-byte Folded Reload
-; CHECK-NEXT:    ldp x30, x29, [sp], #80 // 16-byte Folded Reload
+; CHECK-NEXT:    ldp x29, x30, [sp, #56] // 16-byte Folded Reload
+; CHECK-NEXT:    ldr w0, [sp, #76] // 4-byte Folded Reload
+; CHECK-NEXT:    ldp x25, x26, [sp, #32] // 16-byte Folded Reload
+; CHECK-NEXT:    ldr x27, [sp, #48] // 8-byte Folded Reload
+; CHECK-NEXT:    ldp x21, x22, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT:    ldp x19, x20, [sp], #80 // 16-byte Folded Reload
 ; CHECK-NEXT:    ret
 entry:
   tail call void asm sideeffect "", "~{x0},~{x1},~{x2},~{x3},~{x4},~{x5},~{x6},~{x7},~{x8},~{x9},~{x10},~{x11},~{x12},~{x15},~{x16},~{x17},~{x19},~{x20},~{x21},~{x22},~{x25},~{x26},~{x27},~{fp},~{lr}"()
@@ -32,20 +32,20 @@ entry:
 define i32 @one_int_reg(i32 %x) nounwind {
 ; CHECK-LABEL: one_int_reg:
 ; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    stp x30, x29, [sp, #-80]! // 16-byte Folded Spill
-; CHECK-NEXT:    str x27, [sp, #16] // 8-byte Folded Spill
+; CHECK-NEXT:    stp x19, x20, [sp, #-80]! // 16-byte Folded Spill
+; CHECK-NEXT:    stp x21, x22, [sp, #16] // 16-byte Folded Spill
+; CHECK-NEXT:    stp x25, x26, [sp, #32] // 16-byte Folded Spill
+; CHECK-NEXT:    str x27, [sp, #48] // 8-byte Folded Spill
+; CHECK-NEXT:    stp x29, x30, [sp, #56] // 16-byte Folded Spill
 ; CHECK-NEXT:    mov w30, w0
-; CHECK-NEXT:    stp x26, x25, [sp, #32] // 16-byte Folded Spill
-; CHECK-NEXT:    stp x22, x21, [sp, #48] // 16-byte Folded Spill
-; CHECK-NEXT:    stp x20, x19, [sp, #64] // 16-byte Folded Spill
 ; CHECK-NEXT:    //APP
 ; CHECK-NEXT:    //NO_APP
-; CHECK-NEXT:    ldp x20, x19, [sp, #64] // 16-byte Folded Reload
-; CHECK-NEXT:    ldr x27, [sp, #16] // 8-byte Folded Reload
-; CHECK-NEXT:    ldp x22, x21, [sp, #48] // 16-byte Folded Reload
 ; CHECK-NEXT:    mov w0, w30
-; CHECK-NEXT:    ldp x26, x25, [sp, #32] // 16-byte Folded Reload
-; CHECK-NEXT:    ldp x30, x29, [sp], #80 // 16-byte Folded Reload
+; CHECK-NEXT:    ldp x29, x30, [sp, #56] // 16-byte Folded Reload
+; CHECK-NEXT:    ldp x25, x26, [sp, #32] // 16-byte Folded Reload
+; CHECK-NEXT:    ldr x27, [sp, #48] // 8-byte Folded Reload
+; CHECK-NEXT:    ldp x21, x22, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT:    ldp x19, x20, [sp], #80 // 16-byte Folded Reload
 ; CHECK-NEXT:    ret
 entry:
   tail call void asm sideeffect "", "~{x0},~{x1},~{x2},~{x3},~{x4},~{x5},~{x6},~{x7},~{x8},~{x9},~{x10},~{x11},~{x12},~{x15},~{x16},~{x17},~{x19},~{x20},~{x21},~{x22},~{x25},~{x26},~{x27},~{fp}"()
@@ -56,18 +56,18 @@ define float @no_float_regs(float %x) nounwind {
 ; CHECK-LABEL: no_float_regs:
 ; CHECK:       // %bb.0: // %entry
 ; CHECK-NEXT:    sub sp, sp, #80
-; CHECK-NEXT:    stp d15, d14, [sp, #16] // 16-byte Folded Spill
-; CHECK-NEXT:    stp d13, d12, [sp, #32] // 16-byte Folded Spill
-; CHECK-NEXT:    stp d11, d10, [sp, #48] // 16-byte Folded Spill
-; CHECK-NEXT:    stp d9, d8, [sp, #64] // 16-byte Folded Spill
+; CHECK-NEXT:    stp d8, d9, [sp, #16] // 16-byte Folded Spill
+; CHECK-NEXT:    stp d10, d11, [sp, #32] // 16-byte Folded Spill
+; CHECK-NEXT:    stp d12, d13, [sp, #48] // 16-byte Folded Spill
+; CHECK-NEXT:    stp d14, d15, [sp, #64] // 16-byte Folded Spill
 ; CHECK-NEXT:    str s0, [sp, #12] // 4-byte Folded Spill
 ; CHECK-NEXT:    //APP
 ; CHECK-NEXT:    //NO_APP
-; CHECK-NEXT:    ldp d9, d8, [sp, #64] // 16-byte Folded Reload
+; CHECK-NEXT:    ldp d14, d15, [sp, #64] // 16-byte Folded Reload
 ; CHECK-NEXT:    ldr s0, [sp, #12] // 4-byte Folded Reload
-; CHECK-NEXT:    ldp d11, d10, [sp, #48] // 16-byte Folded Reload
-; CHECK-NEXT:    ldp d13, d12, [sp, #32] // 16-byte Folded Reload
-; CHECK-NEXT:    ldp d15, d14, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT:    ldp d12, d13, [sp, #48] // 16-byte Folded Reload
+; CHECK-NEXT:    ldp d10, d11, [sp, #32] // 16-byte Folded Reload
+; CHECK-NEXT:    ldp d8, d9, [sp, #16] // 16-byte Folded Reload
 ; CHECK-NEXT:    add sp, sp, #80
 ; CHECK-NEXT:    ret
 entry:
@@ -78,18 +78,18 @@ entry:
 define float @one_float_reg(float %x) nounwind {
 ; CHECK-LABEL: one_float_reg:
 ; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    stp d15, d14, [sp, #-64]! // 16-byte Folded Spill
+; CHECK-NEXT:    stp d8, d9, [sp, #-64]! // 16-byte Folded Spill
+; CHECK-NEXT:    stp d14, d15, [sp, #48] // 16-byte Folded Spill
 ; CHECK-NEXT:    fmov s15, s0
-; CHECK-NEXT:    stp d13, d12, [sp, #16] // 16-byte Folded Spill
-; CHECK-NEXT:    stp d11, d10, [sp, #32] // 16-byte Folded Spill
-; CHECK-NEXT:    stp d9, d8, [sp, #48] // 16-byte Folded Spill
+; CHECK-NEXT:    stp d10, d11, [sp, #16] // 16-byte Folded Spill
+; CHECK-NEXT:    stp d12, d13, [sp, #32] // 16-byte Folded Spill
 ; CHECK-NEXT:    //APP
 ; CHECK-NEXT:    //NO_APP
-; CHECK-NEXT:    ldp d9, d8, [sp, #48] // 16-byte Folded Reload
-; CHECK-NEXT:    ldp d11, d10, [sp, #32] // 16-byte Folded Reload
+; CHECK-NEXT:    ldp d12, d13, [sp, #32] // 16-byte Folded Reload
 ; CHECK-NEXT:    fmov s0, s15
-; CHECK-NEXT:    ldp d13, d12, [sp, #16] // 16-byte Folded Reload
-; CHECK-NEXT:    ldp d15, d14, [sp], #64 // 16-byte Folded Reload
+; CHECK-NEXT:    ldp d14, d15, [sp, #48] // 16-byte Folded Reload
+; CHECK-NEXT:    ldp d10, d11, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT:    ldp d8, d9, [sp], #64 // 16-byte Folded Reload
 ; CHECK-NEXT:    ret
 entry:
   tail call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9},~{v10},~{v11},~{v12},~{v13},~{v14}"()
diff --git a/llvm/test/CodeGen/AArch64/framelayout-sve-win.mir b/llvm/test/CodeGen/AArch64/framelayout-sve-win.mir
index 5933c5daa67ed..d07d377f4cda2 100644
--- a/llvm/test/CodeGen/AArch64/framelayout-sve-win.mir
+++ b/llvm/test/CodeGen/AArch64/framelayout-sve-win.mir
@@ -17,6 +17,7 @@
   define aarch64_sve_vector_pcs void @save_restore_sve() uwtable { entry: unreachable }
   define aarch64_sve_vector_pcs void @save_restore_sve_realign() uwtable { entry: unreachable }
   define aarch64_sve_vector_pcs void @frame_layout() uwtable { entry: unreachable }
+  define aarch64_sve_vector_pcs void @test_nounwind_layout() nounwind { entry: unreachable }
 ...
 ---
 name:            test_allocate_sve
@@ -892,3 +893,30 @@ body:             |
 
     RET_ReallyLR
 ...
+---
+name:            test_nounwind_layout
+stack:
+body:             |
+  bb.0.entry:
+    ; CHECK-LABEL: name: test_nounwind_layout
+    ; CHECK: fixedStack:
+    ; CHECK: liveins: $p8, $z8, $lr, $x20
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: $sp = frame-setup ADDVL_XXI $sp, -2, implicit $vg
+    ; CHECK-NEXT: frame-setup STR_PXI killed $p8, $sp, 0 :: (store (s16) into %stack.3)
+    ; CHECK-NEXT: frame-setup STR_ZXI killed $z8, $sp, 1 :: (store (s128) into %stack.2)
+    ; CHECK-NEXT: early-clobber $sp = frame-setup STPXpre killed $x20, killed $lr, $sp, -2 :: (store (s64) into %stack.0), (...
[truncated]

@sdesmalen-arm (Collaborator, Author) replied:

> I think in practice needsWinCFI will almost always be true on Windows; clang will add "uwtable" attributes.
>
> In order to reduce the number of different configurations, I think I'd prefer to always use Windows-style prologue/epilogue lowering on Windows, even if it isn't strictly required.

Before I update the commit message, I just want to check that this is what you intended?

@efriedma-quic (Collaborator) left a comment

Yes, this looks right.

@efriedma-quic (Collaborator) left a comment

Maybe we should use MF.getTarget().getMCAsmInfo()->usesWindowsCFI() to check for "Windows" in this context, though. UEFI targets use Windows unwind, but technically aren't Windows.

@efriedma-quic (Collaborator) left a comment

LGTM with fixed commit message

@MacDue (Member) left a comment

Still LGTM, with a couple of minor comments.

Comment on lines 3337 to 3339
!invalidateRegisterPairing(
RPI.Reg1, NextReg, IsWindows, NeedsWinCFI,
/*NeedsFrameRecord=*/false, IsFirst, TRI))
@MacDue (Member):

Is there a reason to change this to invalidateRegisterPairing? IIUC the only difference is the NeedsFrameRecord case (which is set to false here).

@sdesmalen-arm (Collaborator, Author):

It just seemed odd to handle this differently because of Windows, as there might also be other reasons to want to invalidate this register pairing. I probably shouldn't have set NeedsFrameRecord to false though. FWIW the change is NFC.

@MacDue (Member):

Looks like sme-lazy-save-windows.ll has changed now that we pass NeedsFrameRecord (it's now forcing x29, x30 for the frame record, rather than x30, x29).
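The encoding constraint behind the (x29, x30) ordering can be sketched with a toy check (a simplification: `windowsCanPairGPR` is hypothetical, and the real invalidateWindowsRegisterPairing handles more cases such as the first pair in the prologue). Windows ARM64 unwind pair opcodes such as save_regp describe a pair as x(19+n), x(19+n+1), so paired saves must use consecutive, ascending registers, and the frame record must be the consecutive (x29, x30) pair rather than (x30, x29):

```cpp
#include <cassert>

// Toy model of the Windows pairing rule: pair save opcodes encode a pair as
// (x(19+n), x(19+n+1)), so the two registers must be consecutive and in
// ascending order. Registers are given by GPR number: x29 = 29, x30 = 30.
bool windowsCanPairGPR(unsigned Reg1, unsigned Reg2) {
  return Reg2 == Reg1 + 1;
}
```

Under this model (x29, x30) is encodable, while the reversed (x30, x29) and any non-consecutive pairing are not.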

@sdesmalen-arm sdesmalen-arm changed the title [AArch64] Fix SVE callee-save layout for nounwind functions on Windows [AArch64] Use Windows-style prologue/epilogue regardless of CFI. Sep 10, 2025
@efriedma-quic (Collaborator):

Ping

sdesmalen-arm and others added 5 commits November 3, 2025 10:01
@MacDue MacDue force-pushed the fix-win-sve-nounwind branch from 848855f to 57af3d2 Compare November 3, 2025 10:06
@MacDue (Member) commented Nov 3, 2025

> Ping

I've pushed a rebase to this PR (to account for the recent frame lowering changes). I also updated some tests (but they still seem okay to me).
