-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[AArch64][SVE] Avoid extra pop of "FixedObject" with FPAfterSVECalleeSaves #156452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic here makes sense.
Kind ping |
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-debuginfo Author: Benjamin Maxwell (MacDue) ChangesPreviously, we would pop This patch removes the erroneous pop, and refactors Full diff: https://github.com/llvm/llvm-project/pull/156452.diff 4 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64PrologueEpilogue.cpp b/llvm/lib/Target/AArch64/AArch64PrologueEpilogue.cpp
index 7947469b6c04f..2cf88651132c4 100644
--- a/llvm/lib/Target/AArch64/AArch64PrologueEpilogue.cpp
+++ b/llvm/lib/Target/AArch64/AArch64PrologueEpilogue.cpp
@@ -1278,14 +1278,22 @@ void AArch64EpilogueEmitter::emitEpilogue() {
Subtarget.isTargetWindows() && AFI->getSVECalleeSavedStackSize();
bool CombineSPBump = shouldCombineCSRLocalStackBump(NumBytes);
- // Assume we can't combine the last pop with the sp restore.
- bool CombineAfterCSRBump = false;
+
+ unsigned ProloguePopSize = PrologueSaveSize;
if (FPAfterSVECalleeSaves) {
+ // With FPAfterSVECalleeSaves ProloguePopSize is the amount of stack that
+ // needs to be popped until we reach the start of the SVE save area. The
+ // "FixedObject" stack occurs after the SVE area and must be popped later.
+ ProloguePopSize -= FixedObject;
AfterCSRPopSize += FixedObject;
- } else if (!CombineSPBump && PrologueSaveSize != 0) {
+ }
+
+ // Assume we can't combine the last pop with the sp restore.
+ if (!CombineSPBump && ProloguePopSize != 0) {
MachineBasicBlock::iterator Pop = std::prev(MBB.getFirstTerminator());
while (Pop->getOpcode() == TargetOpcode::CFI_INSTRUCTION ||
- AArch64InstrInfo::isSEHInstruction(*Pop))
+ AArch64InstrInfo::isSEHInstruction(*Pop) ||
+ (FPAfterSVECalleeSaves && isSVECalleeSave(Pop)))
Pop = std::prev(Pop);
// Converting the last ldp to a post-index ldp is valid only if the last
// ldp's offset is 0.
@@ -1295,18 +1303,24 @@ void AArch64EpilogueEmitter::emitEpilogue() {
// may clobber), convert it to a post-index ldp.
if (OffsetOp.getImm() == 0 && AfterCSRPopSize >= 0) {
convertCalleeSaveRestoreToSPPrePostIncDec(
- Pop, DL, PrologueSaveSize, EmitCFI, MachineInstr::FrameDestroy,
- PrologueSaveSize);
+ Pop, DL, ProloguePopSize, EmitCFI, MachineInstr::FrameDestroy,
+ ProloguePopSize);
+ } else if (FPAfterSVECalleeSaves) {
+ // If not, and FPAfterSVECalleeSaves is enabled, deallocate callee-save
+ // non-SVE registers to move the stack pointer to the start of the SVE
+ // area.
+ emitFrameOffset(MBB, std::next(Pop), DL, AArch64::SP, AArch64::SP,
+ StackOffset::getFixed(ProloguePopSize), TII,
+ MachineInstr::FrameDestroy, false, NeedsWinCFI,
+ &HasWinCFI);
} else {
- // If not, make sure to emit an add after the last ldp.
+ // Otherwise, make sure to emit an add after the last ldp.
// We're doing this by transferring the size to be restored from the
// adjustment *before* the CSR pops to the adjustment *after* the CSR
// pops.
- AfterCSRPopSize += PrologueSaveSize;
- CombineAfterCSRBump = true;
+ AfterCSRPopSize += ProloguePopSize;
}
}
-
// Move past the restores of the callee-saved registers.
// If we plan on combining the sp bump of the local stack size and the callee
// save stack size, we might need to adjust the CSR save and restore offsets.
@@ -1394,16 +1408,6 @@ void AArch64EpilogueEmitter::emitEpilogue() {
NeedsWinCFI, &HasWinCFI);
}
- // Deallocate callee-save non-SVE registers.
- emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
- StackOffset::getFixed(AFI->getCalleeSavedStackSize()), TII,
- MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
-
- // Deallocate fixed objects.
- emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
- StackOffset::getFixed(FixedObject), TII,
- MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
-
// Deallocate callee-save SVE registers.
emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
@@ -1522,7 +1526,7 @@ void AArch64EpilogueEmitter::emitEpilogue() {
MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
StackOffset::getFixed(AfterCSRPopSize), TII, MachineInstr::FrameDestroy,
false, NeedsWinCFI, &HasWinCFI, EmitCFI,
- StackOffset::getFixed(CombineAfterCSRBump ? PrologueSaveSize : 0));
+ StackOffset::getFixed(AfterCSRPopSize - ArgumentStackToRestore));
}
}
diff --git a/llvm/test/CodeGen/AArch64/framelayout-sve-win.mir b/llvm/test/CodeGen/AArch64/framelayout-sve-win.mir
index 5933c5daa67ed..b8302e64f282d 100644
--- a/llvm/test/CodeGen/AArch64/framelayout-sve-win.mir
+++ b/llvm/test/CodeGen/AArch64/framelayout-sve-win.mir
@@ -380,10 +380,8 @@ body: |
; CHECK-NEXT: frame-destroy SEH_EpilogStart
; CHECK-NEXT: $sp = frame-destroy ADDXri $sp, 32, 0
; CHECK-NEXT: frame-destroy SEH_StackAlloc 32
- ; CHECK-NEXT: $lr = frame-destroy LDRXui $sp, 0 :: (load (s64) from %stack.1)
- ; CHECK-NEXT: frame-destroy SEH_SaveReg 30, 0
- ; CHECK-NEXT: $sp = frame-destroy ADDXri $sp, 16, 0
- ; CHECK-NEXT: frame-destroy SEH_StackAlloc 16
+ ; CHECK-NEXT: early-clobber $sp, $lr = frame-destroy LDRXpost $sp, 16 :: (load (s64) from %stack.1)
+ ; CHECK-NEXT: frame-destroy SEH_SaveReg_X 30, -16
; CHECK-NEXT: $p4 = frame-destroy LDR_PXI $sp, 0 :: (load (s16) from %stack.4)
; CHECK-NEXT: frame-destroy SEH_SavePReg 4, 0
; CHECK-NEXT: $p5 = frame-destroy LDR_PXI $sp, 1 :: (load (s16) from %stack.3)
@@ -430,10 +428,8 @@ body: |
; CHECK-NEXT: frame-destroy SEH_EpilogStart
; CHECK-NEXT: $sp = frame-destroy ADDXri $sp, 32, 0
; CHECK-NEXT: frame-destroy SEH_StackAlloc 32
- ; CHECK-NEXT: $lr = frame-destroy LDRXui $sp, 0 :: (load (s64) from %stack.1)
- ; CHECK-NEXT: frame-destroy SEH_SaveReg 30, 0
- ; CHECK-NEXT: $sp = frame-destroy ADDXri $sp, 16, 0
- ; CHECK-NEXT: frame-destroy SEH_StackAlloc 16
+ ; CHECK-NEXT: early-clobber $sp, $lr = frame-destroy LDRXpost $sp, 16 :: (load (s64) from %stack.1)
+ ; CHECK-NEXT: frame-destroy SEH_SaveReg_X 30, -16
; CHECK-NEXT: $z8 = frame-destroy LDR_ZXI $sp, 0 :: (load (s128) from %stack.4)
; CHECK-NEXT: frame-destroy SEH_SaveZReg 8, 0
; CHECK-NEXT: $z9 = frame-destroy LDR_ZXI $sp, 1 :: (load (s128) from %stack.3)
@@ -557,10 +553,8 @@ body: |
; CHECK-NEXT: frame-destroy SEH_StackAlloc 32
; CHECK-NEXT: $x21, $lr = frame-destroy LDPXi $sp, 2 :: (load (s64) from %stack.2), (load (s64) from %stack.3)
; CHECK-NEXT: frame-destroy SEH_SaveRegP 21, 30, 16
- ; CHECK-NEXT: $x19, $x20 = frame-destroy LDPXi $sp, 0 :: (load (s64) from %stack.4), (load (s64) from %stack.5)
- ; CHECK-NEXT: frame-destroy SEH_SaveRegP 19, 20, 0
- ; CHECK-NEXT: $sp = frame-destroy ADDXri $sp, 32, 0
- ; CHECK-NEXT: frame-destroy SEH_StackAlloc 32
+ ; CHECK-NEXT: early-clobber $sp, $x19, $x20 = frame-destroy LDPXpost $sp, 4 :: (load (s64) from %stack.4), (load (s64) from %stack.5)
+ ; CHECK-NEXT: frame-destroy SEH_SaveRegP_X 19, 20, -32
; CHECK-NEXT: $z8 = frame-destroy LDR_ZXI $sp, 2 :: (load (s128) from %stack.21)
; CHECK-NEXT: frame-destroy SEH_SaveZReg 8, 2
; CHECK-NEXT: $z9 = frame-destroy LDR_ZXI $sp, 3 :: (load (s128) from %stack.20)
@@ -745,10 +739,8 @@ body: |
; CHECK-NEXT: frame-destroy SEH_EpilogStart
; CHECK-NEXT: $sp = frame-destroy ADDXri $fp, 0, 0
; CHECK-NEXT: frame-destroy SEH_SetFP
- ; CHECK-NEXT: $fp, $lr = frame-destroy LDPXi $sp, 0 :: (load (s64) from %stack.2), (load (s64) from %stack.3)
- ; CHECK-NEXT: frame-destroy SEH_SaveFPLR 0
- ; CHECK-NEXT: $sp = frame-destroy ADDXri $sp, 16, 0
- ; CHECK-NEXT: frame-destroy SEH_StackAlloc 16
+ ; CHECK-NEXT: early-clobber $sp, $fp, $lr = frame-destroy LDPXpost $sp, 2 :: (load (s64) from %stack.2), (load (s64) from %stack.3)
+ ; CHECK-NEXT: frame-destroy SEH_SaveFPLR_X -16
; CHECK-NEXT: $z8 = frame-destroy LDR_ZXI $sp, 2 :: (load (s128) from %stack.19)
; CHECK-NEXT: frame-destroy SEH_SaveZReg 8, 2
; CHECK-NEXT: $z9 = frame-destroy LDR_ZXI $sp, 3 :: (load (s128) from %stack.18)
@@ -869,10 +861,8 @@ body: |
; CHECK-NEXT: frame-destroy SEH_EpilogStart
; CHECK-NEXT: $sp = frame-destroy ADDVL_XXI $sp, 7, implicit $vg
; CHECK-NEXT: frame-destroy SEH_AllocZ 7
- ; CHECK-NEXT: $lr = frame-destroy LDRXui $sp, 0 :: (load (s64) from %stack.6)
- ; CHECK-NEXT: frame-destroy SEH_SaveReg 30, 0
- ; CHECK-NEXT: $sp = frame-destroy ADDXri $sp, 16, 0
- ; CHECK-NEXT: frame-destroy SEH_StackAlloc 16
+ ; CHECK-NEXT: early-clobber $sp, $lr = frame-destroy LDRXpost $sp, 16 :: (load (s64) from %stack.6)
+ ; CHECK-NEXT: frame-destroy SEH_SaveReg_X 30, -16
; CHECK-NEXT: $z8 = frame-destroy LDR_ZXI $sp, 1 :: (load (s128) from %stack.8)
; CHECK-NEXT: frame-destroy SEH_SaveZReg 8, 1
; CHECK-NEXT: $z23 = frame-destroy LDR_ZXI $sp, 2 :: (load (s128) from %stack.7)
diff --git a/llvm/test/CodeGen/AArch64/win-sve.ll b/llvm/test/CodeGen/AArch64/win-sve.ll
index 53ac9344175a3..8446de1a59cf9 100644
--- a/llvm/test/CodeGen/AArch64/win-sve.ll
+++ b/llvm/test/CodeGen/AArch64/win-sve.ll
@@ -75,10 +75,8 @@ define i32 @f(<vscale x 2 x i64> %x) {
; CHECK-NEXT: .seh_startepilogue
; CHECK-NEXT: ldr x30, [sp, #8] // 8-byte Folded Reload
; CHECK-NEXT: .seh_save_reg x30, 8
-; CHECK-NEXT: ldr x28, [sp] // 8-byte Folded Reload
-; CHECK-NEXT: .seh_save_reg x28, 0
-; CHECK-NEXT: add sp, sp, #16
-; CHECK-NEXT: .seh_stackalloc 16
+; CHECK-NEXT: ldr x28, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: .seh_save_reg_x x28, 16
; CHECK-NEXT: ldr z8, [sp, #2, mul vl] // 16-byte Folded Reload
; CHECK-NEXT: .seh_save_zreg z8, 2
; CHECK-NEXT: ldr z9, [sp, #3, mul vl] // 16-byte Folded Reload
@@ -234,10 +232,8 @@ define void @f2(i64 %n, <vscale x 2 x i64> %x) {
; CHECK-NEXT: .seh_save_fplr 16
; CHECK-NEXT: ldr x28, [sp, #8] // 8-byte Folded Reload
; CHECK-NEXT: .seh_save_reg x28, 8
-; CHECK-NEXT: ldr x19, [sp] // 8-byte Folded Reload
-; CHECK-NEXT: .seh_save_reg x19, 0
-; CHECK-NEXT: add sp, sp, #32
-; CHECK-NEXT: .seh_stackalloc 32
+; CHECK-NEXT: ldr x19, [sp], #32 // 8-byte Folded Reload
+; CHECK-NEXT: .seh_save_reg_x x19, 32
; CHECK-NEXT: ldr z8, [sp, #2, mul vl] // 16-byte Folded Reload
; CHECK-NEXT: .seh_save_zreg z8, 2
; CHECK-NEXT: ldr z9, [sp, #3, mul vl] // 16-byte Folded Reload
@@ -384,10 +380,8 @@ define void @f3(i64 %n, <vscale x 2 x i64> %x) {
; CHECK-NEXT: .seh_stackalloc 16
; CHECK-NEXT: ldr x30, [sp, #8] // 8-byte Folded Reload
; CHECK-NEXT: .seh_save_reg x30, 8
-; CHECK-NEXT: ldr x28, [sp] // 8-byte Folded Reload
-; CHECK-NEXT: .seh_save_reg x28, 0
-; CHECK-NEXT: add sp, sp, #16
-; CHECK-NEXT: .seh_stackalloc 16
+; CHECK-NEXT: ldr x28, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: .seh_save_reg_x x28, 16
; CHECK-NEXT: ldr z8, [sp, #2, mul vl] // 16-byte Folded Reload
; CHECK-NEXT: .seh_save_zreg z8, 2
; CHECK-NEXT: ldr z9, [sp, #3, mul vl] // 16-byte Folded Reload
@@ -538,10 +532,8 @@ define void @f4(i64 %n, <vscale x 2 x i64> %x) {
; CHECK-NEXT: .seh_stackalloc 16
; CHECK-NEXT: ldr x30, [sp, #8] // 8-byte Folded Reload
; CHECK-NEXT: .seh_save_reg x30, 8
-; CHECK-NEXT: ldr x28, [sp] // 8-byte Folded Reload
-; CHECK-NEXT: .seh_save_reg x28, 0
-; CHECK-NEXT: add sp, sp, #16
-; CHECK-NEXT: .seh_stackalloc 16
+; CHECK-NEXT: ldr x28, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: .seh_save_reg_x x28, 16
; CHECK-NEXT: ldr z8, [sp, #2, mul vl] // 16-byte Folded Reload
; CHECK-NEXT: .seh_save_zreg z8, 2
; CHECK-NEXT: ldr z9, [sp, #3, mul vl] // 16-byte Folded Reload
@@ -702,10 +694,8 @@ define void @f5(i64 %n, <vscale x 2 x i64> %x) {
; CHECK-NEXT: .seh_save_fplr 16
; CHECK-NEXT: ldr x28, [sp, #8] // 8-byte Folded Reload
; CHECK-NEXT: .seh_save_reg x28, 8
-; CHECK-NEXT: ldr x19, [sp] // 8-byte Folded Reload
-; CHECK-NEXT: .seh_save_reg x19, 0
-; CHECK-NEXT: add sp, sp, #32
-; CHECK-NEXT: .seh_stackalloc 32
+; CHECK-NEXT: ldr x19, [sp], #32 // 8-byte Folded Reload
+; CHECK-NEXT: .seh_save_reg_x x19, 32
; CHECK-NEXT: ldr z8, [sp, #2, mul vl] // 16-byte Folded Reload
; CHECK-NEXT: .seh_save_zreg z8, 2
; CHECK-NEXT: ldr z9, [sp, #3, mul vl] // 16-byte Folded Reload
@@ -860,10 +850,10 @@ define void @f6(<vscale x 2 x i64> %x, [8 x i64] %pad, i64 %n9) personality ptr
; CHECK-NEXT: stur x0, [x8, #16]
; CHECK-NEXT: addvl x8, x29, #18
; CHECK-NEXT: ldr x1, [x8, #32]
-; CHECK-NEXT: .Ltmp0:
+; CHECK-NEXT: .Ltmp0: // EH_LABEL
; CHECK-NEXT: add x0, x19, #0
; CHECK-NEXT: bl g6
-; CHECK-NEXT: .Ltmp1:
+; CHECK-NEXT: .Ltmp1: // EH_LABEL
; CHECK-NEXT: // %bb.1: // %invoke.cont
; CHECK-NEXT: .seh_startepilogue
; CHECK-NEXT: add sp, sp, #64
@@ -872,10 +862,8 @@ define void @f6(<vscale x 2 x i64> %x, [8 x i64] %pad, i64 %n9) personality ptr
; CHECK-NEXT: .seh_save_fplr 16
; CHECK-NEXT: ldr x28, [sp, #8] // 8-byte Folded Reload
; CHECK-NEXT: .seh_save_reg x28, 8
-; CHECK-NEXT: ldr x19, [sp] // 8-byte Folded Reload
-; CHECK-NEXT: .seh_save_reg x19, 0
-; CHECK-NEXT: add sp, sp, #32
-; CHECK-NEXT: .seh_stackalloc 32
+; CHECK-NEXT: ldr x19, [sp], #32 // 8-byte Folded Reload
+; CHECK-NEXT: .seh_save_reg_x x19, 32
; CHECK-NEXT: ldr z8, [sp, #2, mul vl] // 16-byte Folded Reload
; CHECK-NEXT: .seh_save_zreg z8, 2
; CHECK-NEXT: ldr z9, [sp, #3, mul vl] // 16-byte Folded Reload
@@ -932,8 +920,6 @@ define void @f6(<vscale x 2 x i64> %x, [8 x i64] %pad, i64 %n9) personality ptr
; CHECK-NEXT: .seh_save_preg p14, 10
; CHECK-NEXT: ldr p15, [sp, #11, mul vl] // 2-byte Folded Reload
; CHECK-NEXT: .seh_save_preg p15, 11
-; CHECK-NEXT: add sp, sp, #16
-; CHECK-NEXT: .seh_stackalloc 16
; CHECK-NEXT: addvl sp, sp, #18
; CHECK-NEXT: .seh_allocz 18
; CHECK-NEXT: add sp, sp, #16
@@ -1024,10 +1010,8 @@ define void @f6(<vscale x 2 x i64> %x, [8 x i64] %pad, i64 %n9) personality ptr
; CHECK-NEXT: .seh_save_fplr 16
; CHECK-NEXT: ldr x28, [sp, #8] // 8-byte Folded Reload
; CHECK-NEXT: .seh_save_reg x28, 8
-; CHECK-NEXT: ldr x19, [sp] // 8-byte Folded Reload
-; CHECK-NEXT: .seh_save_reg x19, 0
-; CHECK-NEXT: add sp, sp, #32
-; CHECK-NEXT: .seh_stackalloc 32
+; CHECK-NEXT: ldr x19, [sp], #32 // 8-byte Folded Reload
+; CHECK-NEXT: .seh_save_reg_x x19, 32
; CHECK-NEXT: ldr z8, [sp, #2, mul vl] // 16-byte Folded Reload
; CHECK-NEXT: .seh_save_zreg z8, 2
; CHECK-NEXT: ldr z9, [sp, #3, mul vl] // 16-byte Folded Reload
@@ -1144,10 +1128,8 @@ define void @f8(<vscale x 2 x i64> %v) {
; CHECK-NEXT: //APP
; CHECK-NEXT: //NO_APP
; CHECK-NEXT: .seh_startepilogue
-; CHECK-NEXT: ldr x30, [sp] // 8-byte Folded Reload
-; CHECK-NEXT: .seh_save_reg x30, 0
-; CHECK-NEXT: add sp, sp, #16
-; CHECK-NEXT: .seh_stackalloc 16
+; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: .seh_save_reg_x x30, 16
; CHECK-NEXT: ldr z8, [sp] // 16-byte Folded Reload
; CHECK-NEXT: .seh_save_zreg z8, 0
; CHECK-NEXT: addvl sp, sp, #1
@@ -1196,14 +1178,10 @@ define void @f9(<vscale x 2 x i64> %v, ...) {
; CHECK-NEXT: //APP
; CHECK-NEXT: //NO_APP
; CHECK-NEXT: .seh_startepilogue
-; CHECK-NEXT: ldr x30, [sp] // 8-byte Folded Reload
-; CHECK-NEXT: .seh_save_reg x30, 0
-; CHECK-NEXT: add sp, sp, #16
-; CHECK-NEXT: .seh_stackalloc 16
+; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: .seh_save_reg_x x30, 16
; CHECK-NEXT: ldr z8, [sp] // 16-byte Folded Reload
; CHECK-NEXT: .seh_save_zreg z8, 0
-; CHECK-NEXT: add sp, sp, #64
-; CHECK-NEXT: .seh_stackalloc 64
; CHECK-NEXT: addvl sp, sp, #1
; CHECK-NEXT: .seh_allocz 1
; CHECK-NEXT: add sp, sp, #64
@@ -1301,10 +1279,8 @@ define void @f10(i64 %n, <vscale x 2 x i64> %x) "frame-pointer"="all" {
; CHECK-NEXT: .seh_stackalloc 16
; CHECK-NEXT: ldp x29, x30, [sp, #8] // 16-byte Folded Reload
; CHECK-NEXT: .seh_save_fplr 8
-; CHECK-NEXT: ldr x28, [sp] // 8-byte Folded Reload
-; CHECK-NEXT: .seh_save_reg x28, 0
-; CHECK-NEXT: add sp, sp, #32
-; CHECK-NEXT: .seh_stackalloc 32
+; CHECK-NEXT: ldr x28, [sp], #32 // 8-byte Folded Reload
+; CHECK-NEXT: .seh_save_reg_x x28, 32
; CHECK-NEXT: ldr z8, [sp, #2, mul vl] // 16-byte Folded Reload
; CHECK-NEXT: .seh_save_zreg z8, 2
; CHECK-NEXT: ldr z9, [sp, #3, mul vl] // 16-byte Folded Reload
@@ -1390,10 +1366,8 @@ define i32 @f11(double %d, <vscale x 4 x i32> %vs) "aarch64_pstate_sm_compatible
; CHECK-NEXT: //NO_APP
; CHECK-NEXT: str d0, [sp, #8]
; CHECK-NEXT: .seh_startepilogue
-; CHECK-NEXT: ldr x30, [sp] // 8-byte Folded Reload
-; CHECK-NEXT: .seh_save_reg x30, 0
-; CHECK-NEXT: add sp, sp, #16
-; CHECK-NEXT: .seh_stackalloc 16
+; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: .seh_save_reg_x x30, 16
; CHECK-NEXT: ldr z8, [sp] // 16-byte Folded Reload
; CHECK-NEXT: .seh_save_zreg z8, 0
; CHECK-NEXT: addvl sp, sp, #1
@@ -1431,10 +1405,8 @@ define i32 @f12(double %d, <vscale x 4 x i32> %vs) "aarch64_pstate_sm_compatible
; CHECK-NEXT: .seh_startepilogue
; CHECK-NEXT: addvl sp, sp, #1
; CHECK-NEXT: .seh_allocz 1
-; CHECK-NEXT: ldr x30, [sp] // 8-byte Folded Reload
-; CHECK-NEXT: .seh_save_reg x30, 0
-; CHECK-NEXT: add sp, sp, #16
-; CHECK-NEXT: .seh_stackalloc 16
+; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: .seh_save_reg_x x30, 16
; CHECK-NEXT: ldr z8, [sp] // 16-byte Folded Reload
; CHECK-NEXT: .seh_save_zreg z8, 0
; CHECK-NEXT: addvl sp, sp, #1
@@ -1475,10 +1447,8 @@ define i32 @f13(double %d, <vscale x 4 x i32> %vs) "frame-pointer"="all" {
; CHECK-NEXT: .seh_startepilogue
; CHECK-NEXT: ldp x29, x30, [sp, #8] // 16-byte Folded Reload
; CHECK-NEXT: .seh_save_fplr 8
-; CHECK-NEXT: ldr x28, [sp] // 8-byte Folded Reload
-; CHECK-NEXT: .seh_save_reg x28, 0
-; CHECK-NEXT: add sp, sp, #32
-; CHECK-NEXT: .seh_stackalloc 32
+; CHECK-NEXT: ldr x28, [sp], #32 // 8-byte Folded Reload
+; CHECK-NEXT: .seh_save_reg_x x28, 32
; CHECK-NEXT: ldr z8, [sp] // 16-byte Folded Reload
; CHECK-NEXT: .seh_save_zreg z8, 0
; CHECK-NEXT: addvl sp, sp, #1
@@ -1521,10 +1491,8 @@ define i32 @f14(double %d, <vscale x 4 x i32> %vs) "frame-pointer"="all" {
; CHECK-NEXT: .seh_allocz 1
; CHECK-NEXT: ldp x29, x30, [sp, #8] // 16-byte Folded Reload
; CHECK-NEXT: .seh_save_fplr 8
-; CHECK-NEXT: ldr x28, [sp] // 8-byte Folded Reload
-; CHECK-NEXT: .seh_save_reg x28, 0
-; CHECK-NEXT: add sp, sp, #32
-; CHECK-NEXT: .seh_stackalloc 32
+; CHECK-NEXT: ldr x28, [sp], #32 // 8-byte Folded Reload
+; CHECK-NEXT: .seh_save_reg_x x28, 32
; CHECK-NEXT: ldr z8, [sp] // 16-byte Folded Reload
; CHECK-NEXT: .seh_save_zreg z8, 0
; CHECK-NEXT: addvl sp, sp, #1
@@ -1572,10 +1540,8 @@ define tailcc void @f15(double %d, <vscale x 4 x i32> %vs, [9 x i64], i32 %i) {
; CHECK-NEXT: .seh_stackalloc 16
; CHECK-NEXT: ldr x30, [sp, #8] // 8-byte Folded Reload
; CHECK-NEXT: .seh_save_reg x30, 8
-; CHECK-NEXT: ldr x28, [sp] // 8-byte Folded Reload
-; CHECK-NEXT: .seh_save_reg x28, 0
-; CHECK-NEXT: add sp, sp, #16
-; CHECK-NEXT: .seh_stackalloc 16
+; CHECK-NEXT: ldr x28, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: .seh_save_reg_x x28, 16
; CHECK-NEXT: ldr z8, [sp] // 16-byte Folded Reload
; CHECK-NEXT: .seh_save_zreg z8, 0
; CHECK-NEXT: addvl sp, sp, #1
diff --git a/llvm/test/DebugInfo/COFF/AArch64/codeview-sve.ll b/llvm/test/DebugInfo/COFF/AArch64/codeview-sve.ll
index 446a84dc0294c..ffdc80a350a24 100644
--- a/llvm/test/DebugInfo/COFF/AArch64/codeview-sve.ll
+++ b/llvm/test/DebugInfo/COFF/AArch64/codeview-sve.ll
@@ -101,7 +101,7 @@
; CHECK-NEXT: LocalVariableAddrRange {
; CHECK-NEXT: OffsetStart: .text+0x0
; CHECK-NEXT: ISectStart: 0x0
-; CHECK-NEXT: Range: 0xBC
+; CHECK-NEXT: Range: 0xB8
; CHECK-NEXT: }
; CHECK-NEXT: }
; CHECK-NEXT: ProcEnd {
|
…Saves Previously, we would pop `FixedObject`-bytes after deallocating the SVE area, then again as part of the "AfterCSRPopSize". This could be seen in the tests `@f6` and `@f9`. This patch removes the erroneous pop, and refactors `FPAfterSVECalleeSaves` to reuse more of the existing GPR deallocation logic, which allows for post-decrements.
Kind ping. Rebased which fixed the diff. I think it's probably worth fixing this miscompile. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This looks like a nice simplification, in addition to the bugfix.
llvm/test/CodeGen/AArch64/win-sve.ll
Outdated
; CHECK-NEXT: ldp x29, x30, [sp, #8] // 16-byte Folded Reload | ||
; CHECK-NEXT: add sp, sp, #32 | ||
; CHECK-NEXT: .seh_stackalloc 32 | ||
; CHECK-NEXT: .seh_save_fplr 8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is related to your patch, but this is in the wrong order: the seh_save_fplr needs to come before the seh_stackalloc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll check this tomorrow. Shouldn't it also come before the add sp, sp, #32
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ordering of seh_* intrinsics relatively to the corresponding instructions doesn't strictly matter: the encoding just counts the number of opcodes. But yes, we usually try to emit the seh_* opcode after the corresponding instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed this 👍 I was not skipping the SEH opcode after the last pop, in the case the SP increment could not be folded into the pop.
Previously, we would pop
FixedObject
-bytes after deallocating the SVE area, then again as part of the "AfterCSRPopSize". This could be seen in the tests@f6
and@f9
.This patch removes the erroneous pop, and refactors
FPAfterSVECalleeSaves
to reuse more of the existing GPR deallocation logic, which allows for post-decrements.