Conversation

@sdesmalen-arm (Collaborator) commented Sep 2, 2025

To reduce the number of combinations to support, always use the same
prologue/epilogue lowering on windows regardless of whether unwind info is
required.

This also fixes an issue where a function with SVE callee-saves and
nounwind led to a compilation failure, because the SVE lowering makes
assumptions that only hold when using the Windows style prologue/epilogue.
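The ordering difference behind this can be sketched with a toy model (an illustration only: the name `assignOffsets`, the `WindowsStyle` flag, and the example sizes are hypothetical, and the real computeCalleeSaveRegisterPairs also handles register pairing, scalable slots, and alignment gaps):

```cpp
#include <cassert>
#include <vector>

// Toy model of the callee-save offset assignment: the default lowering
// fills the save area top down and records the offset *after* stepping,
// while the Windows-style lowering fills bottom up and records the offset
// *before* stepping. This is what reverses the on-stack order of the saves.
std::vector<int> assignOffsets(const std::vector<int> &SpillSizes,
                               bool WindowsStyle) {
  int Total = 0;
  for (int S : SpillSizes)
    Total += S;
  int ByteOffset = WindowsStyle ? 0 : Total; // where the fill starts
  int Dir = WindowsStyle ? 1 : -1;           // fill direction
  std::vector<int> Offsets;
  for (int S : SpillSizes) {
    int OffsetPre = ByteOffset;
    ByteOffset += Dir * S;
    int OffsetPost = ByteOffset;
    Offsets.push_back(WindowsStyle ? OffsetPre : OffsetPost);
  }
  return Offsets;
}
```

With spill sizes {16, 16, 8}, the Windows-style fill yields offsets {0, 16, 32} and the default yields {24, 8, 0}: the first register in the save list lands at the bottom of the area in one case and near the top in the other, which is why the two configurations need different assumptions downstream.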

@MacDue (Member) left a comment
LGTM, but I'd wait for someone who works on the Windows side to approve.

Suggestion: change "code makes the assumption that CFI is available" to "makes assumptions that only hold when NeedsWinCFI is true"? AFAIK it doesn't depend on the CFI itself, but on changes that only happen when emitting Windows CFI (such as reversing the callee-saves).

@efriedma-quic (Collaborator) left a comment

I think in practice needsWinCFI will almost always be true on Windows; clang will add "uwtable" attributes.

In order to reduce the number of different configurations, I think I'd prefer to always use Windows-style prologue/epilogue lowering on Windows, even if it isn't strictly required.

@llvmbot (Member) commented Sep 3, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Sander de Smalen (sdesmalen-arm)

Changes

Without this change, functions with 'nounwind' don't compile (correctly), because the frame-lowering code makes the assumption that CFI is available when the function has SVE callee-saves.


Patch is 37.19 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/156467.diff

7 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64FrameLowering.cpp (+47-46)
  • (modified) llvm/test/CodeGen/AArch64/arm64ec-reservedregs.ll (+38-38)
  • (modified) llvm/test/CodeGen/AArch64/framelayout-sve-win.mir (+28)
  • (modified) llvm/test/CodeGen/AArch64/llvm.frexp.ll (+24-24)
  • (modified) llvm/test/CodeGen/AArch64/preserve_nonecc_varargs_win64.ll (+20-20)
  • (modified) llvm/test/CodeGen/AArch64/swift-async-win.ll (+6-6)
  • (modified) llvm/test/CodeGen/AArch64/win64-no-uwtable.ll (+16-15)
diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
index 7725fa4f1ccb1..0fe0845b04fdb 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -355,6 +355,27 @@ static bool isLikelyToHaveSVEStack(const MachineFunction &MF) {
   return false;
 }
 
+static bool isTargetWindows(const MachineFunction &MF) {
+  return MF.getSubtarget<AArch64Subtarget>().isTargetWindows();
+}
+
+// Windows unwind can't represent the required stack adjustments if we have
+// both SVE callee-saves and dynamic stack allocations, and the frame
+// pointer is before the SVE spills.  The allocation of the frame pointer
+// must be the last instruction in the prologue so the unwinder can restore
+// the stack pointer correctly. (And there isn't any unwind opcode for
+// `addvl sp, x29, -17`.)
+//
+// Because of this, we do spills in the opposite order on Windows: first SVE,
+// then GPRs. The main side-effect of this is that it makes accessing
+// parameters passed on the stack more expensive.
+//
+// We could consider rearranging the spills for simpler cases.
+static bool hasSVECalleeSavesAboveFrameRecord(const MachineFunction &MF) {
+  auto *AFI = MF.getInfo<AArch64FunctionInfo>();
+  return isTargetWindows(MF) && AFI->getSVECalleeSavedStackSize();
+}
+
 /// Returns true if a homogeneous prolog or epilog code can be emitted
 /// for the size optimization. If possible, a frame helper call is injected.
 /// When Exit block is given, this check is for epilog.
@@ -368,7 +389,7 @@ bool AArch64FrameLowering::homogeneousPrologEpilog(
     return false;
 
   // TODO: Window is supported yet.
-  if (needsWinCFI(MF))
+  if (isTargetWindows(MF))
     return false;
 
   // TODO: SVE is not supported yet.
@@ -1229,7 +1250,7 @@ bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(
   // unwind format for functions that both have a local area and callee saved
   // registers. Using the packed unwind format notably reduces the size of
   // the unwind info.
-  if (needsWinCFI(MF) && AFI->getCalleeSavedStackSize() > 0 &&
+  if (isTargetWindows(MF) && AFI->getCalleeSavedStackSize() > 0 &&
       MF.getFunction().hasOptSize())
     return false;
 
@@ -1694,10 +1715,6 @@ static void fixupCalleeSaveRestoreStackOffset(MachineInstr &MI,
   }
 }
 
-static bool isTargetWindows(const MachineFunction &MF) {
-  return MF.getSubtarget<AArch64Subtarget>().isTargetWindows();
-}
-
 static unsigned getStackHazardSize(const MachineFunction &MF) {
   return MF.getSubtarget<AArch64Subtarget>().getStreamingHazardSize();
 }
@@ -2052,21 +2069,7 @@ void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
   bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
   unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
 
-  // Windows unwind can't represent the required stack adjustments if we have
-  // both SVE callee-saves and dynamic stack allocations, and the frame
-  // pointer is before the SVE spills.  The allocation of the frame pointer
-  // must be the last instruction in the prologue so the unwinder can restore
-  // the stack pointer correctly. (And there isn't any unwind opcode for
-  // `addvl sp, x29, -17`.)
-  //
-  // Because of this, we do spills in the opposite order on Windows: first SVE,
-  // then GPRs. The main side-effect of this is that it makes accessing
-  // parameters passed on the stack more expensive.
-  //
-  // We could consider rearranging the spills for simpler cases.
-  bool FPAfterSVECalleeSaves =
-      Subtarget.isTargetWindows() && AFI->getSVECalleeSavedStackSize();
-
+  bool FPAfterSVECalleeSaves = hasSVECalleeSavesAboveFrameRecord(MF);
   if (FPAfterSVECalleeSaves && AFI->hasStackHazardSlotIndex())
     reportFatalUsageError("SME hazard padding is not supported on Windows");
 
@@ -2566,8 +2569,7 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
     return;
   }
 
-  bool FPAfterSVECalleeSaves =
-      Subtarget.isTargetWindows() && AFI->getSVECalleeSavedStackSize();
+  bool FPAfterSVECalleeSaves = hasSVECalleeSavesAboveFrameRecord(MF);
 
   bool CombineSPBump = shouldCombineCSRLocalStackBumpInEpilogue(MBB, NumBytes);
   // Assume we can't combine the last pop with the sp restore.
@@ -2895,8 +2897,7 @@ AArch64FrameLowering::getFrameIndexReferenceFromSP(const MachineFunction &MF,
     return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
 
   const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
-  bool FPAfterSVECalleeSaves =
-      isTargetWindows(MF) && AFI->getSVECalleeSavedStackSize();
+  bool FPAfterSVECalleeSaves = hasSVECalleeSavesAboveFrameRecord(MF);
   if (MFI.getStackID(FI) == TargetStackID::ScalableVector) {
     if (FPAfterSVECalleeSaves &&
         -ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize())
@@ -3053,8 +3054,7 @@ StackOffset AArch64FrameLowering::resolveFrameOffsetReference(
       "In the presence of dynamic stack pointer realignment, "
       "non-argument/CSR objects cannot be accessed through the frame pointer");
 
-  bool FPAfterSVECalleeSaves =
-      isTargetWindows(MF) && AFI->getSVECalleeSavedStackSize();
+  bool FPAfterSVECalleeSaves = hasSVECalleeSavesAboveFrameRecord(MF);
 
   if (isSVE) {
     StackOffset FPOffset =
@@ -3253,7 +3253,6 @@ static void computeCalleeSaveRegisterPairs(
     return;
 
   bool IsWindows = isTargetWindows(MF);
-  bool NeedsWinCFI = needsWinCFI(MF);
   AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
   unsigned StackHazardSize = getStackHazardSize(MF);
   MachineFrameInfo &MFI = MF.getFrameInfo();
@@ -3270,7 +3269,7 @@ static void computeCalleeSaveRegisterPairs(
   int StackFillDir = -1;
   int RegInc = 1;
   unsigned FirstReg = 0;
-  if (NeedsWinCFI) {
+  if (IsWindows) {
     // For WinCFI, fill the stack from the bottom up.
     ByteOffset = 0;
     StackFillDir = 1;
@@ -3279,9 +3278,9 @@ static void computeCalleeSaveRegisterPairs(
     RegInc = -1;
     FirstReg = Count - 1;
   }
-  bool FPAfterSVECalleeSaves = IsWindows && AFI->getSVECalleeSavedStackSize();
-  int ScalableByteOffset =
-      FPAfterSVECalleeSaves ? 0 : AFI->getSVECalleeSavedStackSize();
+  int ScalableByteOffset = hasSVECalleeSavesAboveFrameRecord(MF)
+                               ? 0
+                               : AFI->getSVECalleeSavedStackSize();
   bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
   Register LastReg = 0;
 
@@ -3319,6 +3318,7 @@ static void computeCalleeSaveRegisterPairs(
       ByteOffset += StackFillDir * StackHazardSize;
     LastReg = RPI.Reg1;
 
+    bool NeedsWinCFI = needsWinCFI(MF);
     int Scale = TRI->getSpillSize(*RPI.RC);
     // Add the next reg to the pair if it is in the same register class.
     if (unsigned(i + RegInc) < Count && !AFI->hasStackHazardSlotIndex()) {
@@ -3334,8 +3334,9 @@ static void computeCalleeSaveRegisterPairs(
         break;
       case RegPairInfo::FPR64:
         if (AArch64::FPR64RegClass.contains(NextReg) &&
-            !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
-                                              IsFirst, TRI))
+            !invalidateRegisterPairing(
+                RPI.Reg1, NextReg, IsWindows, NeedsWinCFI,
+                /*NeedsFrameRecord=*/false, IsFirst, TRI))
           RPI.Reg2 = NextReg;
         break;
       case RegPairInfo::FPR128:
@@ -3389,7 +3390,7 @@ static void computeCalleeSaveRegisterPairs(
            "Callee-save registers not saved as adjacent register pair!");
 
     RPI.FrameIdx = CSI[i].getFrameIdx();
-    if (NeedsWinCFI &&
+    if (IsWindows &&
         RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
       RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
 
@@ -3416,7 +3417,7 @@ static void computeCalleeSaveRegisterPairs(
 
     // Round up size of non-pair to pair size if we need to pad the
     // callee-save area to ensure 16-byte alignment.
-    if (NeedGapToAlignStack && !NeedsWinCFI && !RPI.isScalable() &&
+    if (NeedGapToAlignStack && !IsWindows && !RPI.isScalable() &&
         RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
         ByteOffset % 16 != 0) {
       ByteOffset += 8 * StackFillDir;
@@ -3432,7 +3433,7 @@ static void computeCalleeSaveRegisterPairs(
     assert(OffsetPost % Scale == 0);
     // If filling top down (default), we want the offset after incrementing it.
     // If filling bottom up (WinCFI) we need the original offset.
-    int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
+    int Offset = IsWindows ? OffsetPre : OffsetPost;
 
     // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
     // Swift context can directly precede FP.
@@ -3471,7 +3472,7 @@ static void computeCalleeSaveRegisterPairs(
     if (RPI.isPaired())
       i += RegInc;
   }
-  if (NeedsWinCFI) {
+  if (IsWindows) {
     // If we need an alignment gap in the stack, align the topmost stack
     // object. A stack frame with a gap looks like this, bottom up:
     // x19, d8. d9, gap.
@@ -3606,14 +3607,15 @@ bool AArch64FrameLowering::spillCalleeSavedRegisters(
                if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
                dbgs() << ")\n");
 
-    assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
+    assert((!isTargetWindows(MF) ||
+            !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
            "Windows unwdinding requires a consecutive (FP,LR) pair");
     // Windows unwind codes require consecutive registers if registers are
     // paired.  Make the switch here, so that the code below will save (x,x+1)
     // and not (x+1,x).
     unsigned FrameIdxReg1 = RPI.FrameIdx;
     unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
-    if (NeedsWinCFI && RPI.isPaired()) {
+    if (isTargetWindows(MF) && RPI.isPaired()) {
       std::swap(Reg1, Reg2);
       std::swap(FrameIdxReg1, FrameIdxReg2);
     }
@@ -3775,7 +3777,7 @@ bool AArch64FrameLowering::restoreCalleeSavedRegisters(
     // and not (x+1,x).
     unsigned FrameIdxReg1 = RPI.FrameIdx;
     unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
-    if (NeedsWinCFI && RPI.isPaired()) {
+    if (isTargetWindows(MF) && RPI.isPaired()) {
       std::swap(Reg1, Reg2);
       std::swap(FrameIdxReg1, FrameIdxReg2);
     }
@@ -4191,14 +4193,14 @@ bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
     MachineFunction &MF, const TargetRegisterInfo *RegInfo,
     std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
     unsigned &MaxCSFrameIndex) const {
-  bool NeedsWinCFI = needsWinCFI(MF);
+  bool IsWindows = isTargetWindows(MF);
   unsigned StackHazardSize = getStackHazardSize(MF);
   // To match the canonical windows frame layout, reverse the list of
   // callee saved registers to get them laid out by PrologEpilogInserter
   // in the right order. (PrologEpilogInserter allocates stack objects top
   // down. Windows canonical prologs store higher numbered registers at
   // the top, thus have the CSI array start from the highest registers.)
-  if (NeedsWinCFI)
+  if (IsWindows)
     std::reverse(CSI.begin(), CSI.end());
 
   if (CSI.empty())
@@ -4209,8 +4211,7 @@ bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
   MachineFrameInfo &MFI = MF.getFrameInfo();
   auto *AFI = MF.getInfo<AArch64FunctionInfo>();
 
-  bool UsesWinAAPCS = isTargetWindows(MF);
-  if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
+  if (IsWindows && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
     int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
     AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
     if ((unsigned)FrameIdx < MinCSFrameIndex)
@@ -4263,7 +4264,7 @@ bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
       MaxCSFrameIndex = FrameIdx;
 
     // Grab 8 bytes below FP for the extended asynchronous frame info.
-    if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
+    if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !IsWindows &&
         Reg == AArch64::FP) {
       FrameIdx = MFI.CreateStackObject(8, Alignment, true);
       AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
diff --git a/llvm/test/CodeGen/AArch64/arm64ec-reservedregs.ll b/llvm/test/CodeGen/AArch64/arm64ec-reservedregs.ll
index 91ec870dd6d0c..4e6cbccb1a5ae 100644
--- a/llvm/test/CodeGen/AArch64/arm64ec-reservedregs.ll
+++ b/llvm/test/CodeGen/AArch64/arm64ec-reservedregs.ll
@@ -9,20 +9,20 @@
 define i32 @no_int_regs(i32 %x) nounwind {
 ; CHECK-LABEL: no_int_regs:
 ; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    stp x30, x29, [sp, #-80]! // 16-byte Folded Spill
-; CHECK-NEXT:    str x27, [sp, #16] // 8-byte Folded Spill
-; CHECK-NEXT:    stp x26, x25, [sp, #32] // 16-byte Folded Spill
-; CHECK-NEXT:    stp x22, x21, [sp, #48] // 16-byte Folded Spill
-; CHECK-NEXT:    stp x20, x19, [sp, #64] // 16-byte Folded Spill
-; CHECK-NEXT:    str w0, [sp, #28] // 4-byte Folded Spill
+; CHECK-NEXT:    stp x19, x20, [sp, #-80]! // 16-byte Folded Spill
+; CHECK-NEXT:    stp x21, x22, [sp, #16] // 16-byte Folded Spill
+; CHECK-NEXT:    stp x25, x26, [sp, #32] // 16-byte Folded Spill
+; CHECK-NEXT:    str x27, [sp, #48] // 8-byte Folded Spill
+; CHECK-NEXT:    stp x29, x30, [sp, #56] // 16-byte Folded Spill
+; CHECK-NEXT:    str w0, [sp, #76] // 4-byte Folded Spill
 ; CHECK-NEXT:    //APP
 ; CHECK-NEXT:    //NO_APP
-; CHECK-NEXT:    ldp x20, x19, [sp, #64] // 16-byte Folded Reload
-; CHECK-NEXT:    ldr w0, [sp, #28] // 4-byte Folded Reload
-; CHECK-NEXT:    ldp x22, x21, [sp, #48] // 16-byte Folded Reload
-; CHECK-NEXT:    ldr x27, [sp, #16] // 8-byte Folded Reload
-; CHECK-NEXT:    ldp x26, x25, [sp, #32] // 16-byte Folded Reload
-; CHECK-NEXT:    ldp x30, x29, [sp], #80 // 16-byte Folded Reload
+; CHECK-NEXT:    ldp x29, x30, [sp, #56] // 16-byte Folded Reload
+; CHECK-NEXT:    ldr w0, [sp, #76] // 4-byte Folded Reload
+; CHECK-NEXT:    ldp x25, x26, [sp, #32] // 16-byte Folded Reload
+; CHECK-NEXT:    ldr x27, [sp, #48] // 8-byte Folded Reload
+; CHECK-NEXT:    ldp x21, x22, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT:    ldp x19, x20, [sp], #80 // 16-byte Folded Reload
 ; CHECK-NEXT:    ret
 entry:
   tail call void asm sideeffect "", "~{x0},~{x1},~{x2},~{x3},~{x4},~{x5},~{x6},~{x7},~{x8},~{x9},~{x10},~{x11},~{x12},~{x15},~{x16},~{x17},~{x19},~{x20},~{x21},~{x22},~{x25},~{x26},~{x27},~{fp},~{lr}"()
@@ -32,20 +32,20 @@ entry:
 define i32 @one_int_reg(i32 %x) nounwind {
 ; CHECK-LABEL: one_int_reg:
 ; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    stp x30, x29, [sp, #-80]! // 16-byte Folded Spill
-; CHECK-NEXT:    str x27, [sp, #16] // 8-byte Folded Spill
+; CHECK-NEXT:    stp x19, x20, [sp, #-80]! // 16-byte Folded Spill
+; CHECK-NEXT:    stp x21, x22, [sp, #16] // 16-byte Folded Spill
+; CHECK-NEXT:    stp x25, x26, [sp, #32] // 16-byte Folded Spill
+; CHECK-NEXT:    str x27, [sp, #48] // 8-byte Folded Spill
+; CHECK-NEXT:    stp x29, x30, [sp, #56] // 16-byte Folded Spill
 ; CHECK-NEXT:    mov w30, w0
-; CHECK-NEXT:    stp x26, x25, [sp, #32] // 16-byte Folded Spill
-; CHECK-NEXT:    stp x22, x21, [sp, #48] // 16-byte Folded Spill
-; CHECK-NEXT:    stp x20, x19, [sp, #64] // 16-byte Folded Spill
 ; CHECK-NEXT:    //APP
 ; CHECK-NEXT:    //NO_APP
-; CHECK-NEXT:    ldp x20, x19, [sp, #64] // 16-byte Folded Reload
-; CHECK-NEXT:    ldr x27, [sp, #16] // 8-byte Folded Reload
-; CHECK-NEXT:    ldp x22, x21, [sp, #48] // 16-byte Folded Reload
 ; CHECK-NEXT:    mov w0, w30
-; CHECK-NEXT:    ldp x26, x25, [sp, #32] // 16-byte Folded Reload
-; CHECK-NEXT:    ldp x30, x29, [sp], #80 // 16-byte Folded Reload
+; CHECK-NEXT:    ldp x29, x30, [sp, #56] // 16-byte Folded Reload
+; CHECK-NEXT:    ldp x25, x26, [sp, #32] // 16-byte Folded Reload
+; CHECK-NEXT:    ldr x27, [sp, #48] // 8-byte Folded Reload
+; CHECK-NEXT:    ldp x21, x22, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT:    ldp x19, x20, [sp], #80 // 16-byte Folded Reload
 ; CHECK-NEXT:    ret
 entry:
   tail call void asm sideeffect "", "~{x0},~{x1},~{x2},~{x3},~{x4},~{x5},~{x6},~{x7},~{x8},~{x9},~{x10},~{x11},~{x12},~{x15},~{x16},~{x17},~{x19},~{x20},~{x21},~{x22},~{x25},~{x26},~{x27},~{fp}"()
@@ -56,18 +56,18 @@ define float @no_float_regs(float %x) nounwind {
 ; CHECK-LABEL: no_float_regs:
 ; CHECK:       // %bb.0: // %entry
 ; CHECK-NEXT:    sub sp, sp, #80
-; CHECK-NEXT:    stp d15, d14, [sp, #16] // 16-byte Folded Spill
-; CHECK-NEXT:    stp d13, d12, [sp, #32] // 16-byte Folded Spill
-; CHECK-NEXT:    stp d11, d10, [sp, #48] // 16-byte Folded Spill
-; CHECK-NEXT:    stp d9, d8, [sp, #64] // 16-byte Folded Spill
+; CHECK-NEXT:    stp d8, d9, [sp, #16] // 16-byte Folded Spill
+; CHECK-NEXT:    stp d10, d11, [sp, #32] // 16-byte Folded Spill
+; CHECK-NEXT:    stp d12, d13, [sp, #48] // 16-byte Folded Spill
+; CHECK-NEXT:    stp d14, d15, [sp, #64] // 16-byte Folded Spill
 ; CHECK-NEXT:    str s0, [sp, #12] // 4-byte Folded Spill
 ; CHECK-NEXT:    //APP
 ; CHECK-NEXT:    //NO_APP
-; CHECK-NEXT:    ldp d9, d8, [sp, #64] // 16-byte Folded Reload
+; CHECK-NEXT:    ldp d14, d15, [sp, #64] // 16-byte Folded Reload
 ; CHECK-NEXT:    ldr s0, [sp, #12] // 4-byte Folded Reload
-; CHECK-NEXT:    ldp d11, d10, [sp, #48] // 16-byte Folded Reload
-; CHECK-NEXT:    ldp d13, d12, [sp, #32] // 16-byte Folded Reload
-; CHECK-NEXT:    ldp d15, d14, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT:    ldp d12, d13, [sp, #48] // 16-byte Folded Reload
+; CHECK-NEXT:    ldp d10, d11, [sp, #32] // 16-byte Folded Reload
+; CHECK-NEXT:    ldp d8, d9, [sp, #16] // 16-byte Folded Reload
 ; CHECK-NEXT:    add sp, sp, #80
 ; CHECK-NEXT:    ret
 entry:
@@ -78,18 +78,18 @@ entry:
 define float @one_float_reg(float %x) nounwind {
 ; CHECK-LABEL: one_float_reg:
 ; CHECK:       // %bb.0: // %entry
-; CHECK-NEXT:    stp d15, d14, [sp, #-64]! // 16-byte Folded Spill
+; CHECK-NEXT:    stp d8, d9, [sp, #-64]! // 16-byte Folded Spill
+; CHECK-NEXT:    stp d14, d15, [sp, #48] // 16-byte Folded Spill
 ; CHECK-NEXT:    fmov s15, s0
-; CHECK-NEXT:    stp d13, d12, [sp, #16] // 16-byte Folded Spill
-; CHECK-NEXT:    stp d11, d10, [sp, #32] // 16-byte Folded Spill
-; CHECK-NEXT:    stp d9, d8, [sp, #48] // 16-byte Folded Spill
+; CHECK-NEXT:    stp d10, d11, [sp, #16] // 16-byte Folded Spill
+; CHECK-NEXT:    stp d12, d13, [sp, #32] // 16-byte Folded Spill
 ; CHECK-NEXT:    //APP
 ; CHECK-NEXT:    //NO_APP
-; CHECK-NEXT:    ldp d9, d8, [sp, #48] // 16-byte Folded Reload
-; CHECK-NEXT:    ldp d11, d10, [sp, #32] // 16-byte Folded Reload
+; CHECK-NEXT:    ldp d12, d13, [sp, #32] // 16-byte Folded Reload
 ; CHECK-NEXT:    fmov s0, s15
-; CHECK-NEXT:    ldp d13, d12, [sp, #16] // 16-byte Folded Reload
-; CHECK-NEXT:    ldp d15, d14, [sp], #64 // 16-byte Folded Reload
+; CHECK-NEXT:    ldp d14, d15, [sp, #48] // 16-byte Folded Reload
+; CHECK-NEXT:    ldp d10, d11, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT:    ldp d8, d9, [sp], #64 // 16-byte Folded Reload
 ; CHECK-NEXT:    ret
 entry:
   tail call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9},~{v10},~{v11},~{v12},~{v13},~{v14}"()
diff --git a/llvm/test/CodeGen/AArch64/framelayout-sve-win.mir b/llvm/test/CodeGen/AArch64/framelayout-sve-win.mir
index 5933c5daa67ed..d07d377f4cda2 100644
--- a/llvm/test/CodeGen/AArch64/framelayout-sve-win.mir
+++ b/llvm/test/CodeGen/AArch64/framelayout-sve-win.mir
@@ -17,6 +17,7 @@
   define aarch64_sve_vector_pcs void @save_restore_sve() uwtable { entry: unreachable }
   define aarch64_sve_vector_pcs void @save_restore_sve_realign() uwtable { entry: unreachable }
   define aarch64_sve_vector_pcs void @frame_layout() uwtable { entry: unreachable }
+  define aarch64_sve_vector_pcs void @test_nounwind_layout() nounwind { entry: unreachable }
 ...
 ---
 name:            test_allocate_sve
@@ -892,3 +893,30 @@ body:             |
 
     RET_ReallyLR
 ...
+---
+name:            test_nounwind_layout
+stack:
+body:             |
+  bb.0.entry:
+    ; CHECK-LABEL: name: test_nounwind_layout
+    ; CHECK: fixedStack:
+    ; CHECK: liveins: $p8, $z8, $lr, $x20
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: $sp = frame-setup ADDVL_XXI $sp, -2, implicit $vg
+    ; CHECK-NEXT: frame-setup STR_PXI killed $p8, $sp, 0 :: (store (s16) into %stack.3)
+    ; CHECK-NEXT: frame-setup STR_ZXI killed $z8, $sp, 1 :: (store (s128) into %stack.2)
+    ; CHECK-NEXT: early-clobber $sp = frame-setup STPXpre killed $x20, killed $lr, $sp, -2 :: (store (s64) into %stack.0), (...
[truncated]

@sdesmalen-arm (Collaborator, Author) replied:

> I think in practice needsWinCFI will almost always be true on Windows; clang will add "uwtable" attributes.
>
> In order to reduce the number of different configurations, I think I'd prefer to always use Windows-style prologue/epilogue lowering on Windows, even if it isn't strictly required.

Before I update the commit message, I just want to check that this is what you intended?

@efriedma-quic (Collaborator) left a comment

Yes, this looks right.

@efriedma-quic (Collaborator) left a comment

Maybe we should use MF.getTarget().getMCAsmInfo()->usesWindowsCFI() to check for "Windows" in this context, though. UEFI targets use Windows unwind, but technically aren't Windows.

@efriedma-quic (Collaborator) left a comment

LGTM with fixed commit message

@MacDue (Member) left a comment

Still LGTM, with a couple of minor comments.

Comment on lines 3337 to 3339
!invalidateRegisterPairing(
RPI.Reg1, NextReg, IsWindows, NeedsWinCFI,
/*NeedsFrameRecord=*/false, IsFirst, TRI))
@MacDue (Member):

Is there a reason to change this to invalidateRegisterPairing? IIUC the only difference is the NeedsFrameRecord case (which is set to false here).

@sdesmalen-arm (Collaborator, Author):

It just seemed odd to handle this differently because of Windows, as there might also be other reasons to want to invalidate this register pairing. I probably shouldn't have set NeedsFrameRecord to false though. FWIW the change is NFC.

@MacDue (Member):

Looks like sme-lazy-save-windows.ll has changed now that we pass NeedsFrameRecord (it's now forcing x29, x30 for the frame record, rather than x30, x29).
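The encoding constraint behind the (x29, x30) ordering can be sketched with a toy check (a simplification: `windowsCanPairGPR` is hypothetical, and the real invalidateWindowsRegisterPairing handles more cases such as the first pair in the prologue). Windows ARM64 unwind pair opcodes such as save_regp describe a pair as x(19+n), x(19+n+1), so paired saves must use consecutive, ascending registers, and the frame record must be the consecutive (x29, x30) pair rather than (x30, x29):

```cpp
#include <cassert>

// Toy model of the Windows pairing rule: pair save opcodes encode a pair as
// (x(19+n), x(19+n+1)), so the two registers must be consecutive and in
// ascending order. Registers are given by GPR number: x29 = 29, x30 = 30.
bool windowsCanPairGPR(unsigned Reg1, unsigned Reg2) {
  return Reg2 == Reg1 + 1;
}
```

Under this model (x29, x30) is encodable, while the reversed (x30, x29) and any non-consecutive pairing are not.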

@sdesmalen-arm sdesmalen-arm changed the title [AArch64] Fix SVE callee-save layout for nounwind functions on Windows [AArch64] Use Windows-style prologue/epilogue regardless of CFI. Sep 10, 2025
@efriedma-quic (Collaborator):

Ping

sdesmalen-arm and others added 5 commits November 3, 2025 10:01
@MacDue MacDue force-pushed the fix-win-sve-nounwind branch from 848855f to 57af3d2 Compare November 3, 2025 10:06
@MacDue (Member) commented Nov 3, 2025

> Ping

I've pushed a rebase to this PR (to account for the recent frame lowering changes). I also updated some tests (but they still seem okay to me).
