[AArch64] Use Windows-style prologue/epilogue regardless of CFI. #156467
Conversation
MacDue
left a comment
LGTM, but I'd wait for someone that works on the Windows side to approve.
"code makes the assumption that CFI is available" => "makes assumptions that only hold when NeedsWinCFI is true"? AFAIK, it's not dependent on the CFI, but changes that only happen when using Windows CFI (such as reversing the callee-saves).
efriedma-quic
left a comment
I think in practice needsWinCFI will almost always be true on Windows; clang will add "uwtable" attributes.
In order to reduce the number of different configurations, I think I'd prefer to always use Windows-style prologue/epilogue lowering on Windows, even if it isn't strictly required.
@llvm/pr-subscribers-backend-aarch64

Author: Sander de Smalen (sdesmalen-arm)

Changes

Without this change, functions with 'nounwind' don't compile (correctly), because the frame-lowering code makes the assumption that CFI is available when the function has SVE callee-saves.

Patch is 37.19 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/156467.diff

7 Files Affected:
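The predicate the patch factors out can be modeled in isolation. This is a minimal sketch with mock structs and hypothetical field names, not the real `MachineFunction`/`AArch64FunctionInfo` classes; it only captures the inputs to the new `hasSVECalleeSavesAboveFrameRecord()` helper shown in the diff below.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins for MachineFunction / AArch64FunctionInfo; the real
// LLVM classes are far richer, this only models the predicate's inputs.
struct MockFunctionInfo {
  uint64_t SVECalleeSavedStackSize = 0; // bytes of SVE callee-saved spills
};

struct MockMachineFunction {
  bool IsTargetWindows = false;
  MockFunctionInfo AFI;
};

// Mirrors the factored-out predicate: on Windows targets with SVE
// callee-saves, the SVE spills sit above the GPR frame record so the
// frame-pointer setup can remain the last instruction in the prologue.
bool hasSVECalleeSavesAboveFrameRecord(const MockMachineFunction &MF) {
  return MF.IsTargetWindows && MF.AFI.SVECalleeSavedStackSize != 0;
}
```

Note that the condition is independent of whether unwind info (WinCFI) is actually requested, which is the point of the patch: the layout decision depends only on the target and the presence of SVE callee-saves.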
diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
index 7725fa4f1ccb1..0fe0845b04fdb 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -355,6 +355,27 @@ static bool isLikelyToHaveSVEStack(const MachineFunction &MF) {
return false;
}
+static bool isTargetWindows(const MachineFunction &MF) {
+ return MF.getSubtarget<AArch64Subtarget>().isTargetWindows();
+}
+
+// Windows unwind can't represent the required stack adjustments if we have
+// both SVE callee-saves and dynamic stack allocations, and the frame
+// pointer is before the SVE spills. The allocation of the frame pointer
+// must be the last instruction in the prologue so the unwinder can restore
+// the stack pointer correctly. (And there isn't any unwind opcode for
+// `addvl sp, x29, -17`.)
+//
+// Because of this, we do spills in the opposite order on Windows: first SVE,
+// then GPRs. The main side-effect of this is that it makes accessing
+// parameters passed on the stack more expensive.
+//
+// We could consider rearranging the spills for simpler cases.
+static bool hasSVECalleeSavesAboveFrameRecord(const MachineFunction &MF) {
+ auto *AFI = MF.getInfo<AArch64FunctionInfo>();
+ return isTargetWindows(MF) && AFI->getSVECalleeSavedStackSize();
+}
+
/// Returns true if a homogeneous prolog or epilog code can be emitted
/// for the size optimization. If possible, a frame helper call is injected.
/// When Exit block is given, this check is for epilog.
@@ -368,7 +389,7 @@ bool AArch64FrameLowering::homogeneousPrologEpilog(
return false;
// TODO: Window is supported yet.
- if (needsWinCFI(MF))
+ if (isTargetWindows(MF))
return false;
// TODO: SVE is not supported yet.
@@ -1229,7 +1250,7 @@ bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(
// unwind format for functions that both have a local area and callee saved
// registers. Using the packed unwind format notably reduces the size of
// the unwind info.
- if (needsWinCFI(MF) && AFI->getCalleeSavedStackSize() > 0 &&
+ if (isTargetWindows(MF) && AFI->getCalleeSavedStackSize() > 0 &&
MF.getFunction().hasOptSize())
return false;
@@ -1694,10 +1715,6 @@ static void fixupCalleeSaveRestoreStackOffset(MachineInstr &MI,
}
}
-static bool isTargetWindows(const MachineFunction &MF) {
- return MF.getSubtarget<AArch64Subtarget>().isTargetWindows();
-}
-
static unsigned getStackHazardSize(const MachineFunction &MF) {
return MF.getSubtarget<AArch64Subtarget>().getStreamingHazardSize();
}
@@ -2052,21 +2069,7 @@ void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
- // Windows unwind can't represent the required stack adjustments if we have
- // both SVE callee-saves and dynamic stack allocations, and the frame
- // pointer is before the SVE spills. The allocation of the frame pointer
- // must be the last instruction in the prologue so the unwinder can restore
- // the stack pointer correctly. (And there isn't any unwind opcode for
- // `addvl sp, x29, -17`.)
- //
- // Because of this, we do spills in the opposite order on Windows: first SVE,
- // then GPRs. The main side-effect of this is that it makes accessing
- // parameters passed on the stack more expensive.
- //
- // We could consider rearranging the spills for simpler cases.
- bool FPAfterSVECalleeSaves =
- Subtarget.isTargetWindows() && AFI->getSVECalleeSavedStackSize();
-
+ bool FPAfterSVECalleeSaves = hasSVECalleeSavesAboveFrameRecord(MF);
if (FPAfterSVECalleeSaves && AFI->hasStackHazardSlotIndex())
reportFatalUsageError("SME hazard padding is not supported on Windows");
@@ -2566,8 +2569,7 @@ void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
return;
}
- bool FPAfterSVECalleeSaves =
- Subtarget.isTargetWindows() && AFI->getSVECalleeSavedStackSize();
+ bool FPAfterSVECalleeSaves = hasSVECalleeSavesAboveFrameRecord(MF);
bool CombineSPBump = shouldCombineCSRLocalStackBumpInEpilogue(MBB, NumBytes);
// Assume we can't combine the last pop with the sp restore.
@@ -2895,8 +2897,7 @@ AArch64FrameLowering::getFrameIndexReferenceFromSP(const MachineFunction &MF,
return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
- bool FPAfterSVECalleeSaves =
- isTargetWindows(MF) && AFI->getSVECalleeSavedStackSize();
+ bool FPAfterSVECalleeSaves = hasSVECalleeSavesAboveFrameRecord(MF);
if (MFI.getStackID(FI) == TargetStackID::ScalableVector) {
if (FPAfterSVECalleeSaves &&
-ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize())
@@ -3053,8 +3054,7 @@ StackOffset AArch64FrameLowering::resolveFrameOffsetReference(
"In the presence of dynamic stack pointer realignment, "
"non-argument/CSR objects cannot be accessed through the frame pointer");
- bool FPAfterSVECalleeSaves =
- isTargetWindows(MF) && AFI->getSVECalleeSavedStackSize();
+ bool FPAfterSVECalleeSaves = hasSVECalleeSavesAboveFrameRecord(MF);
if (isSVE) {
StackOffset FPOffset =
@@ -3253,7 +3253,6 @@ static void computeCalleeSaveRegisterPairs(
return;
bool IsWindows = isTargetWindows(MF);
- bool NeedsWinCFI = needsWinCFI(MF);
AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
unsigned StackHazardSize = getStackHazardSize(MF);
MachineFrameInfo &MFI = MF.getFrameInfo();
@@ -3270,7 +3269,7 @@ static void computeCalleeSaveRegisterPairs(
int StackFillDir = -1;
int RegInc = 1;
unsigned FirstReg = 0;
- if (NeedsWinCFI) {
+ if (IsWindows) {
// For WinCFI, fill the stack from the bottom up.
ByteOffset = 0;
StackFillDir = 1;
@@ -3279,9 +3278,9 @@ static void computeCalleeSaveRegisterPairs(
RegInc = -1;
FirstReg = Count - 1;
}
- bool FPAfterSVECalleeSaves = IsWindows && AFI->getSVECalleeSavedStackSize();
- int ScalableByteOffset =
- FPAfterSVECalleeSaves ? 0 : AFI->getSVECalleeSavedStackSize();
+ int ScalableByteOffset = hasSVECalleeSavesAboveFrameRecord(MF)
+ ? 0
+ : AFI->getSVECalleeSavedStackSize();
bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
Register LastReg = 0;
@@ -3319,6 +3318,7 @@ static void computeCalleeSaveRegisterPairs(
ByteOffset += StackFillDir * StackHazardSize;
LastReg = RPI.Reg1;
+ bool NeedsWinCFI = needsWinCFI(MF);
int Scale = TRI->getSpillSize(*RPI.RC);
// Add the next reg to the pair if it is in the same register class.
if (unsigned(i + RegInc) < Count && !AFI->hasStackHazardSlotIndex()) {
@@ -3334,8 +3334,9 @@ static void computeCalleeSaveRegisterPairs(
break;
case RegPairInfo::FPR64:
if (AArch64::FPR64RegClass.contains(NextReg) &&
- !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
- IsFirst, TRI))
+ !invalidateRegisterPairing(
+ RPI.Reg1, NextReg, IsWindows, NeedsWinCFI,
+ /*NeedsFrameRecord=*/false, IsFirst, TRI))
RPI.Reg2 = NextReg;
break;
case RegPairInfo::FPR128:
@@ -3389,7 +3390,7 @@ static void computeCalleeSaveRegisterPairs(
"Callee-save registers not saved as adjacent register pair!");
RPI.FrameIdx = CSI[i].getFrameIdx();
- if (NeedsWinCFI &&
+ if (IsWindows &&
RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
@@ -3416,7 +3417,7 @@ static void computeCalleeSaveRegisterPairs(
// Round up size of non-pair to pair size if we need to pad the
// callee-save area to ensure 16-byte alignment.
- if (NeedGapToAlignStack && !NeedsWinCFI && !RPI.isScalable() &&
+ if (NeedGapToAlignStack && !IsWindows && !RPI.isScalable() &&
RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
ByteOffset % 16 != 0) {
ByteOffset += 8 * StackFillDir;
@@ -3432,7 +3433,7 @@ static void computeCalleeSaveRegisterPairs(
assert(OffsetPost % Scale == 0);
// If filling top down (default), we want the offset after incrementing it.
// If filling bottom up (WinCFI) we need the original offset.
- int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
+ int Offset = IsWindows ? OffsetPre : OffsetPost;
// The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
// Swift context can directly precede FP.
@@ -3471,7 +3472,7 @@ static void computeCalleeSaveRegisterPairs(
if (RPI.isPaired())
i += RegInc;
}
- if (NeedsWinCFI) {
+ if (IsWindows) {
// If we need an alignment gap in the stack, align the topmost stack
// object. A stack frame with a gap looks like this, bottom up:
// x19, d8. d9, gap.
@@ -3606,14 +3607,15 @@ bool AArch64FrameLowering::spillCalleeSavedRegisters(
if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
dbgs() << ")\n");
- assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
+ assert((!isTargetWindows(MF) ||
+ !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
"Windows unwdinding requires a consecutive (FP,LR) pair");
// Windows unwind codes require consecutive registers if registers are
// paired. Make the switch here, so that the code below will save (x,x+1)
// and not (x+1,x).
unsigned FrameIdxReg1 = RPI.FrameIdx;
unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
- if (NeedsWinCFI && RPI.isPaired()) {
+ if (isTargetWindows(MF) && RPI.isPaired()) {
std::swap(Reg1, Reg2);
std::swap(FrameIdxReg1, FrameIdxReg2);
}
@@ -3775,7 +3777,7 @@ bool AArch64FrameLowering::restoreCalleeSavedRegisters(
// and not (x+1,x).
unsigned FrameIdxReg1 = RPI.FrameIdx;
unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
- if (NeedsWinCFI && RPI.isPaired()) {
+ if (isTargetWindows(MF) && RPI.isPaired()) {
std::swap(Reg1, Reg2);
std::swap(FrameIdxReg1, FrameIdxReg2);
}
@@ -4191,14 +4193,14 @@ bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
MachineFunction &MF, const TargetRegisterInfo *RegInfo,
std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
unsigned &MaxCSFrameIndex) const {
- bool NeedsWinCFI = needsWinCFI(MF);
+ bool IsWindows = isTargetWindows(MF);
unsigned StackHazardSize = getStackHazardSize(MF);
// To match the canonical windows frame layout, reverse the list of
// callee saved registers to get them laid out by PrologEpilogInserter
// in the right order. (PrologEpilogInserter allocates stack objects top
// down. Windows canonical prologs store higher numbered registers at
// the top, thus have the CSI array start from the highest registers.)
- if (NeedsWinCFI)
+ if (IsWindows)
std::reverse(CSI.begin(), CSI.end());
if (CSI.empty())
@@ -4209,8 +4211,7 @@ bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
MachineFrameInfo &MFI = MF.getFrameInfo();
auto *AFI = MF.getInfo<AArch64FunctionInfo>();
- bool UsesWinAAPCS = isTargetWindows(MF);
- if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
+ if (IsWindows && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
if ((unsigned)FrameIdx < MinCSFrameIndex)
@@ -4263,7 +4264,7 @@ bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
MaxCSFrameIndex = FrameIdx;
// Grab 8 bytes below FP for the extended asynchronous frame info.
- if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
+ if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !IsWindows &&
Reg == AArch64::FP) {
FrameIdx = MFI.CreateStackObject(8, Alignment, true);
AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
diff --git a/llvm/test/CodeGen/AArch64/arm64ec-reservedregs.ll b/llvm/test/CodeGen/AArch64/arm64ec-reservedregs.ll
index 91ec870dd6d0c..4e6cbccb1a5ae 100644
--- a/llvm/test/CodeGen/AArch64/arm64ec-reservedregs.ll
+++ b/llvm/test/CodeGen/AArch64/arm64ec-reservedregs.ll
@@ -9,20 +9,20 @@
define i32 @no_int_regs(i32 %x) nounwind {
; CHECK-LABEL: no_int_regs:
; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: stp x30, x29, [sp, #-80]! // 16-byte Folded Spill
-; CHECK-NEXT: str x27, [sp, #16] // 8-byte Folded Spill
-; CHECK-NEXT: stp x26, x25, [sp, #32] // 16-byte Folded Spill
-; CHECK-NEXT: stp x22, x21, [sp, #48] // 16-byte Folded Spill
-; CHECK-NEXT: stp x20, x19, [sp, #64] // 16-byte Folded Spill
-; CHECK-NEXT: str w0, [sp, #28] // 4-byte Folded Spill
+; CHECK-NEXT: stp x19, x20, [sp, #-80]! // 16-byte Folded Spill
+; CHECK-NEXT: stp x21, x22, [sp, #16] // 16-byte Folded Spill
+; CHECK-NEXT: stp x25, x26, [sp, #32] // 16-byte Folded Spill
+; CHECK-NEXT: str x27, [sp, #48] // 8-byte Folded Spill
+; CHECK-NEXT: stp x29, x30, [sp, #56] // 16-byte Folded Spill
+; CHECK-NEXT: str w0, [sp, #76] // 4-byte Folded Spill
; CHECK-NEXT: //APP
; CHECK-NEXT: //NO_APP
-; CHECK-NEXT: ldp x20, x19, [sp, #64] // 16-byte Folded Reload
-; CHECK-NEXT: ldr w0, [sp, #28] // 4-byte Folded Reload
-; CHECK-NEXT: ldp x22, x21, [sp, #48] // 16-byte Folded Reload
-; CHECK-NEXT: ldr x27, [sp, #16] // 8-byte Folded Reload
-; CHECK-NEXT: ldp x26, x25, [sp, #32] // 16-byte Folded Reload
-; CHECK-NEXT: ldp x30, x29, [sp], #80 // 16-byte Folded Reload
+; CHECK-NEXT: ldp x29, x30, [sp, #56] // 16-byte Folded Reload
+; CHECK-NEXT: ldr w0, [sp, #76] // 4-byte Folded Reload
+; CHECK-NEXT: ldp x25, x26, [sp, #32] // 16-byte Folded Reload
+; CHECK-NEXT: ldr x27, [sp, #48] // 8-byte Folded Reload
+; CHECK-NEXT: ldp x21, x22, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT: ldp x19, x20, [sp], #80 // 16-byte Folded Reload
; CHECK-NEXT: ret
entry:
tail call void asm sideeffect "", "~{x0},~{x1},~{x2},~{x3},~{x4},~{x5},~{x6},~{x7},~{x8},~{x9},~{x10},~{x11},~{x12},~{x15},~{x16},~{x17},~{x19},~{x20},~{x21},~{x22},~{x25},~{x26},~{x27},~{fp},~{lr}"()
@@ -32,20 +32,20 @@ entry:
define i32 @one_int_reg(i32 %x) nounwind {
; CHECK-LABEL: one_int_reg:
; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: stp x30, x29, [sp, #-80]! // 16-byte Folded Spill
-; CHECK-NEXT: str x27, [sp, #16] // 8-byte Folded Spill
+; CHECK-NEXT: stp x19, x20, [sp, #-80]! // 16-byte Folded Spill
+; CHECK-NEXT: stp x21, x22, [sp, #16] // 16-byte Folded Spill
+; CHECK-NEXT: stp x25, x26, [sp, #32] // 16-byte Folded Spill
+; CHECK-NEXT: str x27, [sp, #48] // 8-byte Folded Spill
+; CHECK-NEXT: stp x29, x30, [sp, #56] // 16-byte Folded Spill
; CHECK-NEXT: mov w30, w0
-; CHECK-NEXT: stp x26, x25, [sp, #32] // 16-byte Folded Spill
-; CHECK-NEXT: stp x22, x21, [sp, #48] // 16-byte Folded Spill
-; CHECK-NEXT: stp x20, x19, [sp, #64] // 16-byte Folded Spill
; CHECK-NEXT: //APP
; CHECK-NEXT: //NO_APP
-; CHECK-NEXT: ldp x20, x19, [sp, #64] // 16-byte Folded Reload
-; CHECK-NEXT: ldr x27, [sp, #16] // 8-byte Folded Reload
-; CHECK-NEXT: ldp x22, x21, [sp, #48] // 16-byte Folded Reload
; CHECK-NEXT: mov w0, w30
-; CHECK-NEXT: ldp x26, x25, [sp, #32] // 16-byte Folded Reload
-; CHECK-NEXT: ldp x30, x29, [sp], #80 // 16-byte Folded Reload
+; CHECK-NEXT: ldp x29, x30, [sp, #56] // 16-byte Folded Reload
+; CHECK-NEXT: ldp x25, x26, [sp, #32] // 16-byte Folded Reload
+; CHECK-NEXT: ldr x27, [sp, #48] // 8-byte Folded Reload
+; CHECK-NEXT: ldp x21, x22, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT: ldp x19, x20, [sp], #80 // 16-byte Folded Reload
; CHECK-NEXT: ret
entry:
tail call void asm sideeffect "", "~{x0},~{x1},~{x2},~{x3},~{x4},~{x5},~{x6},~{x7},~{x8},~{x9},~{x10},~{x11},~{x12},~{x15},~{x16},~{x17},~{x19},~{x20},~{x21},~{x22},~{x25},~{x26},~{x27},~{fp}"()
@@ -56,18 +56,18 @@ define float @no_float_regs(float %x) nounwind {
; CHECK-LABEL: no_float_regs:
; CHECK: // %bb.0: // %entry
; CHECK-NEXT: sub sp, sp, #80
-; CHECK-NEXT: stp d15, d14, [sp, #16] // 16-byte Folded Spill
-; CHECK-NEXT: stp d13, d12, [sp, #32] // 16-byte Folded Spill
-; CHECK-NEXT: stp d11, d10, [sp, #48] // 16-byte Folded Spill
-; CHECK-NEXT: stp d9, d8, [sp, #64] // 16-byte Folded Spill
+; CHECK-NEXT: stp d8, d9, [sp, #16] // 16-byte Folded Spill
+; CHECK-NEXT: stp d10, d11, [sp, #32] // 16-byte Folded Spill
+; CHECK-NEXT: stp d12, d13, [sp, #48] // 16-byte Folded Spill
+; CHECK-NEXT: stp d14, d15, [sp, #64] // 16-byte Folded Spill
; CHECK-NEXT: str s0, [sp, #12] // 4-byte Folded Spill
; CHECK-NEXT: //APP
; CHECK-NEXT: //NO_APP
-; CHECK-NEXT: ldp d9, d8, [sp, #64] // 16-byte Folded Reload
+; CHECK-NEXT: ldp d14, d15, [sp, #64] // 16-byte Folded Reload
; CHECK-NEXT: ldr s0, [sp, #12] // 4-byte Folded Reload
-; CHECK-NEXT: ldp d11, d10, [sp, #48] // 16-byte Folded Reload
-; CHECK-NEXT: ldp d13, d12, [sp, #32] // 16-byte Folded Reload
-; CHECK-NEXT: ldp d15, d14, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT: ldp d12, d13, [sp, #48] // 16-byte Folded Reload
+; CHECK-NEXT: ldp d10, d11, [sp, #32] // 16-byte Folded Reload
+; CHECK-NEXT: ldp d8, d9, [sp, #16] // 16-byte Folded Reload
; CHECK-NEXT: add sp, sp, #80
; CHECK-NEXT: ret
entry:
@@ -78,18 +78,18 @@ entry:
define float @one_float_reg(float %x) nounwind {
; CHECK-LABEL: one_float_reg:
; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: stp d15, d14, [sp, #-64]! // 16-byte Folded Spill
+; CHECK-NEXT: stp d8, d9, [sp, #-64]! // 16-byte Folded Spill
+; CHECK-NEXT: stp d14, d15, [sp, #48] // 16-byte Folded Spill
; CHECK-NEXT: fmov s15, s0
-; CHECK-NEXT: stp d13, d12, [sp, #16] // 16-byte Folded Spill
-; CHECK-NEXT: stp d11, d10, [sp, #32] // 16-byte Folded Spill
-; CHECK-NEXT: stp d9, d8, [sp, #48] // 16-byte Folded Spill
+; CHECK-NEXT: stp d10, d11, [sp, #16] // 16-byte Folded Spill
+; CHECK-NEXT: stp d12, d13, [sp, #32] // 16-byte Folded Spill
; CHECK-NEXT: //APP
; CHECK-NEXT: //NO_APP
-; CHECK-NEXT: ldp d9, d8, [sp, #48] // 16-byte Folded Reload
-; CHECK-NEXT: ldp d11, d10, [sp, #32] // 16-byte Folded Reload
+; CHECK-NEXT: ldp d12, d13, [sp, #32] // 16-byte Folded Reload
; CHECK-NEXT: fmov s0, s15
-; CHECK-NEXT: ldp d13, d12, [sp, #16] // 16-byte Folded Reload
-; CHECK-NEXT: ldp d15, d14, [sp], #64 // 16-byte Folded Reload
+; CHECK-NEXT: ldp d14, d15, [sp, #48] // 16-byte Folded Reload
+; CHECK-NEXT: ldp d10, d11, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT: ldp d8, d9, [sp], #64 // 16-byte Folded Reload
; CHECK-NEXT: ret
entry:
tail call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9},~{v10},~{v11},~{v12},~{v13},~{v14}"()
diff --git a/llvm/test/CodeGen/AArch64/framelayout-sve-win.mir b/llvm/test/CodeGen/AArch64/framelayout-sve-win.mir
index 5933c5daa67ed..d07d377f4cda2 100644
--- a/llvm/test/CodeGen/AArch64/framelayout-sve-win.mir
+++ b/llvm/test/CodeGen/AArch64/framelayout-sve-win.mir
@@ -17,6 +17,7 @@
define aarch64_sve_vector_pcs void @save_restore_sve() uwtable { entry: unreachable }
define aarch64_sve_vector_pcs void @save_restore_sve_realign() uwtable { entry: unreachable }
define aarch64_sve_vector_pcs void @frame_layout() uwtable { entry: unreachable }
+ define aarch64_sve_vector_pcs void @test_nounwind_layout() nounwind { entry: unreachable }
...
---
name: test_allocate_sve
@@ -892,3 +893,30 @@ body: |
RET_ReallyLR
...
+---
+name: test_nounwind_layout
+stack:
+body: |
+ bb.0.entry:
+ ; CHECK-LABEL: name: test_nounwind_layout
+ ; CHECK: fixedStack:
+ ; CHECK: liveins: $p8, $z8, $lr, $x20
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: $sp = frame-setup ADDVL_XXI $sp, -2, implicit $vg
+ ; CHECK-NEXT: frame-setup STR_PXI killed $p8, $sp, 0 :: (store (s16) into %stack.3)
+ ; CHECK-NEXT: frame-setup STR_ZXI killed $z8, $sp, 1 :: (store (s128) into %stack.2)
+ ; CHECK-NEXT: early-clobber $sp = frame-setup STPXpre killed $x20, killed $lr, $sp, -2 :: (store (s64) into %stack.0), (...
[truncated]
Before I update the commit message, I just want to check that this is what you intended?
efriedma-quic
left a comment
Yes, this looks right.
efriedma-quic
left a comment
Maybe should use MF.getTarget().getMCAsmInfo()->usesWindowsCFI() to check for "Windows" in this context, though. UEFI targets use Windows unwind, but technically aren't Windows.
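The distinction can be sketched with a tiny mock model (hypothetical structs, not the real `MCAsmInfo` API — in LLVM the actual check would be `MF.getTarget().getMCAsmInfo()->usesWindowsCFI()`): a UEFI-style target emits Windows unwind codes even though its triple's OS is not Windows, so keying the prologue style off the unwind format is the broader check.

```cpp
#include <cassert>

// Hypothetical model of the two target queries being compared.
struct MockTarget {
  bool IsTargetWindows; // the triple's OS is Windows
  bool UsesWindowsCFI;  // emits Windows-style unwind codes (includes UEFI)
};

// Selecting the Windows-style prologue/epilogue based on the unwind
// format rather than the OS also covers UEFI-like targets.
bool useWindowsPrologue(const MockTarget &T) { return T.UsesWindowsCFI; }
```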
efriedma-quic
left a comment
LGTM with fixed commit message
MacDue
left a comment
Still LGTM, with a couple of minor comments.
    !invalidateRegisterPairing(
        RPI.Reg1, NextReg, IsWindows, NeedsWinCFI,
        /*NeedsFrameRecord=*/false, IsFirst, TRI))
Is there a reason to change this to invalidateRegisterPairing? IIUC the only difference is the NeedsFrameRecord case (which is set to false here).
It just seemed odd to handle this differently because of Windows, as there might also be other reasons to want to invalidate this register pairing. I probably shouldn't have set NeedsFrameRecord to false though. FWIW the change is NFC.
Looks like sme-lazy-save-windows.ll has changed now that we pass NeedsFrameRecord (it's now forcing x29, x30 for the frame record, rather than x30, x29).
Ping
Without this change, functions with 'nounwind' don't compile (correctly), because the frame-lowering code makes the assumption that CFI is available when the function has SVE callee-saves.
I've pushed a rebase to this PR (to account for the recent frame lowering changes). I also updated some tests (but they still seem okay to me).
To reduce the number of combinations to support, always use the same
prologue/epilogue lowering on Windows regardless of whether unwind info is
required.

This also fixes an issue where a function with SVE callee-saves and
nounwind led to a compilation failure, because the SVE lowering makes
assumptions that only hold when using the Windows-style prologue/epilogue.