-
Notifications
You must be signed in to change notification settings - Fork 15.4k
Fix SIFixSGPRCopies To Handle Physical Registers #149859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
|
@llvm/pr-subscribers-llvm-regalloc Author: Patrick Simmons (linuxrocks123) ChangesThis PR fixes AMDGPU's SIFixSGPRCopies to handle COPY instructions where the SGPR is a physical rather than virtual register. It also fixes target-independent LiveVariables to add live-ins This PR also adds a fuzzer-generated CodeGen testcase which now outputs assembly instead of crashing because of these three changes and deletes three CodeGen testcases which were failing bec This is my first PR in the AMDGPU backend, so there may be things I'm missing. Full diff: https://github.com/llvm/llvm-project/pull/149859.diff 9 Files Affected:
diff --git a/llvm/lib/CodeGen/LiveVariables.cpp b/llvm/lib/CodeGen/LiveVariables.cpp
index 1f23418642bc6..dc041ac5aac0a 100644
--- a/llvm/lib/CodeGen/LiveVariables.cpp
+++ b/llvm/lib/CodeGen/LiveVariables.cpp
@@ -246,6 +246,22 @@ LiveVariables::FindLastPartialDef(Register Reg,
return LastDef;
}
+static void fixupLiveIns(MachineInstr &MI, MCPhysReg SubReg) {
+ MachineBasicBlock &MBB = *MI.getParent();
+ if (MBB.isLiveIn(SubReg))
+ return;
+
+ for (MachineBasicBlock::reverse_iterator RIt = MI.getReverseIterator();
+ RIt != MBB.rend(); RIt++)
+ if (RIt->definesRegister(SubReg, nullptr))
+ return;
+
+ MBB.addLiveIn(SubReg);
+ for (const auto &PredMBB : MBB.predecessors())
+ if (!PredMBB->empty())
+ fixupLiveIns(PredMBB->back(), SubReg);
+}
+
/// HandlePhysRegUse - Turn previous partial def's into read/mod/writes. Add
/// implicit defs to a machine instruction if there was an earlier def of its
/// super-register.
@@ -279,6 +295,7 @@ void LiveVariables::HandlePhysRegUse(Register Reg, MachineInstr &MI) {
LastPartialDef->addOperand(MachineOperand::CreateReg(SubReg,
false/*IsDef*/,
true/*IsImp*/));
+ fixupLiveIns(MI, SubReg);
PhysRegDef[SubReg] = LastPartialDef;
Processed.insert_range(TRI->subregs(SubReg));
}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp b/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
index 8101c68986241..9d48e821fa084 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
@@ -256,17 +256,18 @@ AMDGPUResourceUsageAnalysisImpl::analyzeResourceUsage(
const Function *Callee = getCalleeFunction(*CalleeOp);
+ auto isSameFunction = [](const MachineFunction &MF, const Function *F) {
+ return F == &MF.getFunction();
+ };
+
// Avoid crashing on undefined behavior with an illegal call to a
// kernel. If a callsite's calling convention doesn't match the
// function's, it's undefined behavior. If the callsite calling
// convention does match, that would have errored earlier.
- if (Callee && AMDGPU::isEntryFunctionCC(Callee->getCallingConv()))
+ if (Callee && AMDGPU::isEntryFunctionCC(Callee->getCallingConv()) &&
+ !isSameFunction(MF, Callee))
report_fatal_error("invalid call to entry function");
- auto isSameFunction = [](const MachineFunction &MF, const Function *F) {
- return F == &MF.getFunction();
- };
-
if (Callee && !isSameFunction(MF, Callee))
Info.Callees.push_back(Callee);
diff --git a/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp b/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
index f018f77bc83e1..45580fd01725c 100644
--- a/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
@@ -898,12 +898,13 @@ bool SIFixSGPRCopies::lowerSpecialCase(MachineInstr &MI,
TII->get(AMDGPU::V_READFIRSTLANE_B32), TmpReg)
.add(MI.getOperand(1));
MI.getOperand(1).setReg(TmpReg);
+ return true;
} else if (tryMoveVGPRConstToSGPR(MI.getOperand(1), DstReg, MI.getParent(),
MI, MI.getDebugLoc())) {
I = std::next(I);
MI.eraseFromParent();
+ return true;
}
- return true;
}
if (!SrcReg.isVirtual() || TRI->isAGPR(*MRI, SrcReg)) {
SIInstrWorklist worklist;
@@ -929,7 +930,9 @@ void SIFixSGPRCopies::analyzeVGPRToSGPRCopy(MachineInstr* MI) {
if (PHISources.contains(MI))
return;
Register DstReg = MI->getOperand(0).getReg();
- const TargetRegisterClass *DstRC = MRI->getRegClass(DstReg);
+ const TargetRegisterClass *DstRC = DstReg.isVirtual()
+ ? MRI->getRegClass(DstReg)
+ : TRI->getPhysRegBaseClass(DstReg);
V2SCopyInfo Info(getNextVGPRToSGPRCopyId(), MI,
TRI->getRegSizeInBits(*DstRC));
diff --git a/llvm/test/CodeGen/AMDGPU/511263.ll b/llvm/test/CodeGen/AMDGPU/511263.ll
new file mode 100644
index 0000000000000..3f2b65ad22af0
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/511263.ll
@@ -0,0 +1,44 @@
+; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -O1 < %s
+source_filename = "i1-copy-from-loop.ll"
+
+@G = global ptr addrspace(8) poison
+
+define amdgpu_ps void @i1_copy_from_loop(ptr addrspace(8) inreg %rsrc, i32 %tid) {
+entry:
+ br label %for.body
+
+for.body: ; preds = %end.loop, %entry
+ %i = phi i32 [ 0, %entry ], [ %i.inc, %end.loop ]
+ %LGV = load ptr addrspace(8), ptr @G, align 8
+ %cc = icmp ult i32 %i, 4
+ call void @i1_copy_from_loop(ptr addrspace(8) %LGV, i32 -2147483648)
+ br i1 %cc, label %mid.loop, label %for.end
+
+mid.loop: ; preds = %for.body
+ %v = call float @llvm.amdgcn.struct.ptr.buffer.load.f32(ptr addrspace(8) %rsrc, i32 %tid, i32 %i, i32 0, i32 0)
+ %cc2 = fcmp oge float %v, 0.000000e+00
+ br i1 %cc2, label %end.loop, label %for.end
+
+end.loop: ; preds = %mid.loop
+ %i.inc = add i32 %i, 1
+ br label %for.body
+
+for.end: ; preds = %mid.loop, %for.body
+ br i1 %cc, label %if, label %end
+
+if: ; preds = %for.end
+ call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float undef, float undef, float undef, float undef, i1 true, i1 true)
+ br label %end
+
+end: ; preds = %if, %for.end
+ ret void
+}
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: read)
+declare float @llvm.amdgcn.struct.ptr.buffer.load.f32(ptr addrspace(8) nocapture readonly, i32, i32, i32, i32 immarg) #0
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write)
+declare void @llvm.amdgcn.exp.f32(i32 immarg, i32 immarg, float, float, float, float, i1 immarg, i1 immarg) #1
+
+attributes #0 = { nocallback nofree nosync nounwind willreturn memory(argmem: read) }
+attributes #1 = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) }
diff --git a/llvm/test/CodeGen/AMDGPU/call-args-inreg-no-sgpr-for-csrspill-xfail.ll b/llvm/test/CodeGen/AMDGPU/call-args-inreg-no-sgpr-for-csrspill-xfail.ll
deleted file mode 100644
index 34f4476f7fd6a..0000000000000
--- a/llvm/test/CodeGen/AMDGPU/call-args-inreg-no-sgpr-for-csrspill-xfail.ll
+++ /dev/null
@@ -1,27 +0,0 @@
-; RUN: not llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs=0 -filetype=null %s 2>&1 | FileCheck -enable-var-scope %s
-
-; CHECK: illegal VGPR to SGPR copy
-
-declare hidden void @external_void_func_a15i32_inreg([15 x i32] inreg) #0
-declare hidden void @external_void_func_a16i32_inreg([16 x i32] inreg) #0
-declare hidden void @external_void_func_a15i32_inreg_i32_inreg([15 x i32] inreg, i32 inreg) #0
-
-define void @test_call_external_void_func_a15i32_inreg([15 x i32] inreg %arg0) #0 {
- call void @external_void_func_a15i32_inreg([15 x i32] inreg %arg0)
- ret void
-}
-
-define void @test_call_external_void_func_a16i32_inreg([16 x i32] inreg %arg0) #0 {
- call void @external_void_func_a16i32_inreg([16 x i32] inreg %arg0)
- ret void
-}
-
-define void @test_call_external_void_func_a15i32_inreg_i32_inreg([15 x i32] inreg %arg0, i32 inreg %arg1) #0 {
- call void @external_void_func_a15i32_inreg_i32_inreg([15 x i32] inreg %arg0, i32 inreg %arg1)
- ret void
-}
-
-attributes #0 = { nounwind }
-
-!llvm.module.flags = !{!0}
-!0 = !{i32 1, !"amdhsa_code_object_version", i32 400}
diff --git a/llvm/test/CodeGen/AMDGPU/illegal-sgpr-to-vgpr-copy.ll b/llvm/test/CodeGen/AMDGPU/illegal-sgpr-to-vgpr-copy.ll
deleted file mode 100644
index 597f90c0f4e84..0000000000000
--- a/llvm/test/CodeGen/AMDGPU/illegal-sgpr-to-vgpr-copy.ll
+++ /dev/null
@@ -1,65 +0,0 @@
-; RUN: not llc -mtriple=amdgcn -verify-machineinstrs=0 < %s 2>&1 | FileCheck -check-prefix=ERR %s
-; RUN: not llc -mtriple=amdgcn -verify-machineinstrs=0 < %s 2>&1 | FileCheck -check-prefix=GCN %s
-
-; ERR: error: <unknown>:0:0: in function illegal_vgpr_to_sgpr_copy_i32 void (): illegal VGPR to SGPR copy
-; GCN: ; illegal copy v1 to s9
-
-define amdgpu_kernel void @illegal_vgpr_to_sgpr_copy_i32() #0 {
- %vgpr = call i32 asm sideeffect "; def $0", "=${v1}"()
- call void asm sideeffect "; use $0", "${s9}"(i32 %vgpr)
- ret void
-}
-
-; ERR: error: <unknown>:0:0: in function illegal_vgpr_to_sgpr_copy_v2i32 void (): illegal VGPR to SGPR copy
-; GCN: ; illegal copy v[0:1] to s[10:11]
-define amdgpu_kernel void @illegal_vgpr_to_sgpr_copy_v2i32() #0 {
- %vgpr = call <2 x i32> asm sideeffect "; def $0", "=${v[0:1]}"()
- call void asm sideeffect "; use $0", "${s[10:11]}"(<2 x i32> %vgpr)
- ret void
-}
-
-; ERR: error: <unknown>:0:0: in function illegal_vgpr_to_sgpr_copy_v4i32 void (): illegal VGPR to SGPR copy
-; GCN: ; illegal copy v[0:3] to s[8:11]
-define amdgpu_kernel void @illegal_vgpr_to_sgpr_copy_v4i32() #0 {
- %vgpr = call <4 x i32> asm sideeffect "; def $0", "=${v[0:3]}"()
- call void asm sideeffect "; use $0", "${s[8:11]}"(<4 x i32> %vgpr)
- ret void
-}
-
-; ERR: error: <unknown>:0:0: in function illegal_vgpr_to_sgpr_copy_v8i32 void (): illegal VGPR to SGPR copy
-; GCN: ; illegal copy v[0:7] to s[8:15]
-define amdgpu_kernel void @illegal_vgpr_to_sgpr_copy_v8i32() #0 {
- %vgpr = call <8 x i32> asm sideeffect "; def $0", "=${v[0:7]}"()
- call void asm sideeffect "; use $0", "${s[8:15]}"(<8 x i32> %vgpr)
- ret void
-}
-
-; ERR: error: <unknown>:0:0: in function illegal_vgpr_to_sgpr_copy_v16i32 void (): illegal VGPR to SGPR copy
-; GCN: ; illegal copy v[0:15] to s[16:31]
-define amdgpu_kernel void @illegal_vgpr_to_sgpr_copy_v16i32() #0 {
- %vgpr = call <16 x i32> asm sideeffect "; def $0", "=${v[0:15]}"()
- call void asm sideeffect "; use $0", "${s[16:31]}"(<16 x i32> %vgpr)
- ret void
-}
-
-; ERR: error: <unknown>:0:0: in function illegal_agpr_to_sgpr_copy_i32 void (): illegal VGPR to SGPR copy
-; GCN: v_accvgpr_read_b32 [[COPY1:v[0-9]+]], a1
-; GCN: ; illegal copy [[COPY1]] to s9
-define amdgpu_kernel void @illegal_agpr_to_sgpr_copy_i32() #1 {
- %agpr = call i32 asm sideeffect "; def $0", "=${a1}"()
- call void asm sideeffect "; use $0", "${s9}"(i32 %agpr)
- ret void
-}
-
-; ERR: error: <unknown>:0:0: in function illegal_agpr_to_sgpr_copy_v2i32 void (): illegal VGPR to SGPR copy
-; GCN-DAG: v_accvgpr_read_b32 v[[COPY1L:[0-9]+]], a0
-; GCN-DAG: v_accvgpr_read_b32 v[[COPY1H:[0-9]+]], a1
-; GCN: ; illegal copy v[[[COPY1L]]:[[COPY1H]]] to s[10:11]
-define amdgpu_kernel void @illegal_agpr_to_sgpr_copy_v2i32() #1 {
- %vgpr = call <2 x i32> asm sideeffect "; def $0", "=${a[0:1]}"()
- call void asm sideeffect "; use $0", "${s[10:11]}"(<2 x i32> %vgpr)
- ret void
-}
-
-attributes #0 = { nounwind }
-attributes #1 = { nounwind "target-cpu"="gfx908" }
diff --git a/llvm/test/CodeGen/AMDGPU/swdev503538-move-to-valu-stack-srd-physreg.ll b/llvm/test/CodeGen/AMDGPU/swdev503538-move-to-valu-stack-srd-physreg.ll
deleted file mode 100644
index f0b3d334af67d..0000000000000
--- a/llvm/test/CodeGen/AMDGPU/swdev503538-move-to-valu-stack-srd-physreg.ll
+++ /dev/null
@@ -1,23 +0,0 @@
-; RUN: not llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs=0 -O0 2> %t.err < %s | FileCheck %s
-; RUN: FileCheck -check-prefix=ERR %s < %t.err
-
-; FIXME: This error will be fixed by supporting arbitrary divergent
-; dynamic allocas by performing a wave umax of the size.
-
-; ERR: error: <unknown>:0:0: in function move_to_valu_assert_srd_is_physreg_swdev503538 i32 (ptr addrspace(1)): illegal VGPR to SGPR copy
-
-; CHECK: ; illegal copy v0 to s32
-
-define i32 @move_to_valu_assert_srd_is_physreg_swdev503538(ptr addrspace(1) %ptr) {
-entry:
- %idx = load i32, ptr addrspace(1) %ptr, align 4
- %zero = extractelement <4 x i32> zeroinitializer, i32 %idx
- %alloca = alloca [2048 x i8], i32 %zero, align 8, addrspace(5)
- %ld = load i32, ptr addrspace(5) %alloca, align 8
- call void @llvm.memset.p5.i32(ptr addrspace(5) %alloca, i8 0, i32 2048, i1 false)
- ret i32 %ld
-}
-
-declare void @llvm.memset.p5.i32(ptr addrspace(5) nocapture writeonly, i8, i32, i1 immarg) #0
-
-attributes #0 = { nocallback nofree nounwind willreturn memory(argmem: write) }
diff --git a/llvm/test/CodeGen/AMDGPU/tail-call-inreg-arguments.error.ll b/llvm/test/CodeGen/AMDGPU/tail-call-inreg-arguments.error.ll
index 242b5e9aeaf42..b73b0d7296da2 100644
--- a/llvm/test/CodeGen/AMDGPU/tail-call-inreg-arguments.error.ll
+++ b/llvm/test/CodeGen/AMDGPU/tail-call-inreg-arguments.error.ll
@@ -1,7 +1,5 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
-; RUN: not llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs=0 2> %t.err < %s | FileCheck %s
-; RUN: FileCheck -check-prefix=ERR %s < %t.err
-; FIXME: These tests cannot be tail called, and should be executed in a waterfall loop.
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck %s
declare hidden void @void_func_i32_inreg(i32 inreg)
@@ -20,11 +18,11 @@ define void @tail_call_i32_inreg_divergent(i32 %vgpr) {
; CHECK-NEXT: v_writelane_b32 v40, s16, 2
; CHECK-NEXT: s_addk_i32 s32, 0x400
; CHECK-NEXT: v_writelane_b32 v40, s30, 0
-; CHECK-NEXT: v_writelane_b32 v40, s31, 1
; CHECK-NEXT: s_getpc_b64 s[16:17]
; CHECK-NEXT: s_add_u32 s16, s16, void_func_i32_inreg@rel32@lo+4
; CHECK-NEXT: s_addc_u32 s17, s17, void_func_i32_inreg@rel32@hi+12
-; CHECK-NEXT: ; illegal copy v0 to s0
+; CHECK-NEXT: v_readfirstlane_b32 s0, v0
+; CHECK-NEXT: v_writelane_b32 v40, s31, 1
; CHECK-NEXT: s_swappc_b64 s[30:31], s[16:17]
; CHECK-NEXT: v_readlane_b32 s31, v40, 1
; CHECK-NEXT: v_readlane_b32 s30, v40, 0
@@ -58,8 +56,8 @@ define void @indirect_tail_call_i32_inreg_divergent(i32 %vgpr) {
; CHECK-NEXT: s_addc_u32 s17, s17, constant@rel32@hi+12
; CHECK-NEXT: s_load_dwordx2 s[16:17], s[16:17], 0x0
; CHECK-NEXT: v_writelane_b32 v40, s30, 0
+; CHECK-NEXT: v_readfirstlane_b32 s0, v0
; CHECK-NEXT: v_writelane_b32 v40, s31, 1
-; CHECK-NEXT: ; illegal copy v0 to s0
; CHECK-NEXT: s_waitcnt lgkmcnt(0)
; CHECK-NEXT: s_swappc_b64 s[30:31], s[16:17]
; CHECK-NEXT: v_readlane_b32 s31, v40, 1
diff --git a/llvm/test/CodeGen/AMDGPU/write-register-vgpr-into-sgpr.ll b/llvm/test/CodeGen/AMDGPU/write-register-vgpr-into-sgpr.ll
index de3b1d5bf78b3..9355e60ecd321 100644
--- a/llvm/test/CodeGen/AMDGPU/write-register-vgpr-into-sgpr.ll
+++ b/llvm/test/CodeGen/AMDGPU/write-register-vgpr-into-sgpr.ll
@@ -1,4 +1,3 @@
-; XFAIL: *
; REQUIRES: asserts
; RUN: llc -mtriple=amdgcn -mcpu=bonaire -verify-machineinstrs < %s
|
|
@llvm/pr-subscribers-backend-amdgpu Author: Patrick Simmons (linuxrocks123) ChangesThis PR fixes AMDGPU's SIFixSGPRCopies to handle COPY instructions where the SGPR is a physical rather than virtual register. It also fixes target-independent LiveVariables to add live-ins This PR also adds a fuzzer-generated CodeGen testcase which now outputs assembly instead of crashing because of these three changes and deletes three CodeGen testcases which were failing bec This is my first PR in the AMDGPU backend, so there may be things I'm missing. Full diff: https://github.com/llvm/llvm-project/pull/149859.diff 9 Files Affected:
diff --git a/llvm/lib/CodeGen/LiveVariables.cpp b/llvm/lib/CodeGen/LiveVariables.cpp
index 1f23418642bc6..dc041ac5aac0a 100644
--- a/llvm/lib/CodeGen/LiveVariables.cpp
+++ b/llvm/lib/CodeGen/LiveVariables.cpp
@@ -246,6 +246,22 @@ LiveVariables::FindLastPartialDef(Register Reg,
return LastDef;
}
+static void fixupLiveIns(MachineInstr &MI, MCPhysReg SubReg) {
+ MachineBasicBlock &MBB = *MI.getParent();
+ if (MBB.isLiveIn(SubReg))
+ return;
+
+ for (MachineBasicBlock::reverse_iterator RIt = MI.getReverseIterator();
+ RIt != MBB.rend(); RIt++)
+ if (RIt->definesRegister(SubReg, nullptr))
+ return;
+
+ MBB.addLiveIn(SubReg);
+ for (const auto &PredMBB : MBB.predecessors())
+ if (!PredMBB->empty())
+ fixupLiveIns(PredMBB->back(), SubReg);
+}
+
/// HandlePhysRegUse - Turn previous partial def's into read/mod/writes. Add
/// implicit defs to a machine instruction if there was an earlier def of its
/// super-register.
@@ -279,6 +295,7 @@ void LiveVariables::HandlePhysRegUse(Register Reg, MachineInstr &MI) {
LastPartialDef->addOperand(MachineOperand::CreateReg(SubReg,
false/*IsDef*/,
true/*IsImp*/));
+ fixupLiveIns(MI, SubReg);
PhysRegDef[SubReg] = LastPartialDef;
Processed.insert_range(TRI->subregs(SubReg));
}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp b/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
index 8101c68986241..9d48e821fa084 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
@@ -256,17 +256,18 @@ AMDGPUResourceUsageAnalysisImpl::analyzeResourceUsage(
const Function *Callee = getCalleeFunction(*CalleeOp);
+ auto isSameFunction = [](const MachineFunction &MF, const Function *F) {
+ return F == &MF.getFunction();
+ };
+
// Avoid crashing on undefined behavior with an illegal call to a
// kernel. If a callsite's calling convention doesn't match the
// function's, it's undefined behavior. If the callsite calling
// convention does match, that would have errored earlier.
- if (Callee && AMDGPU::isEntryFunctionCC(Callee->getCallingConv()))
+ if (Callee && AMDGPU::isEntryFunctionCC(Callee->getCallingConv()) &&
+ !isSameFunction(MF, Callee))
report_fatal_error("invalid call to entry function");
- auto isSameFunction = [](const MachineFunction &MF, const Function *F) {
- return F == &MF.getFunction();
- };
-
if (Callee && !isSameFunction(MF, Callee))
Info.Callees.push_back(Callee);
diff --git a/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp b/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
index f018f77bc83e1..45580fd01725c 100644
--- a/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
@@ -898,12 +898,13 @@ bool SIFixSGPRCopies::lowerSpecialCase(MachineInstr &MI,
TII->get(AMDGPU::V_READFIRSTLANE_B32), TmpReg)
.add(MI.getOperand(1));
MI.getOperand(1).setReg(TmpReg);
+ return true;
} else if (tryMoveVGPRConstToSGPR(MI.getOperand(1), DstReg, MI.getParent(),
MI, MI.getDebugLoc())) {
I = std::next(I);
MI.eraseFromParent();
+ return true;
}
- return true;
}
if (!SrcReg.isVirtual() || TRI->isAGPR(*MRI, SrcReg)) {
SIInstrWorklist worklist;
@@ -929,7 +930,9 @@ void SIFixSGPRCopies::analyzeVGPRToSGPRCopy(MachineInstr* MI) {
if (PHISources.contains(MI))
return;
Register DstReg = MI->getOperand(0).getReg();
- const TargetRegisterClass *DstRC = MRI->getRegClass(DstReg);
+ const TargetRegisterClass *DstRC = DstReg.isVirtual()
+ ? MRI->getRegClass(DstReg)
+ : TRI->getPhysRegBaseClass(DstReg);
V2SCopyInfo Info(getNextVGPRToSGPRCopyId(), MI,
TRI->getRegSizeInBits(*DstRC));
diff --git a/llvm/test/CodeGen/AMDGPU/511263.ll b/llvm/test/CodeGen/AMDGPU/511263.ll
new file mode 100644
index 0000000000000..3f2b65ad22af0
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/511263.ll
@@ -0,0 +1,44 @@
+; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -O1 < %s
+source_filename = "i1-copy-from-loop.ll"
+
+@G = global ptr addrspace(8) poison
+
+define amdgpu_ps void @i1_copy_from_loop(ptr addrspace(8) inreg %rsrc, i32 %tid) {
+entry:
+ br label %for.body
+
+for.body: ; preds = %end.loop, %entry
+ %i = phi i32 [ 0, %entry ], [ %i.inc, %end.loop ]
+ %LGV = load ptr addrspace(8), ptr @G, align 8
+ %cc = icmp ult i32 %i, 4
+ call void @i1_copy_from_loop(ptr addrspace(8) %LGV, i32 -2147483648)
+ br i1 %cc, label %mid.loop, label %for.end
+
+mid.loop: ; preds = %for.body
+ %v = call float @llvm.amdgcn.struct.ptr.buffer.load.f32(ptr addrspace(8) %rsrc, i32 %tid, i32 %i, i32 0, i32 0)
+ %cc2 = fcmp oge float %v, 0.000000e+00
+ br i1 %cc2, label %end.loop, label %for.end
+
+end.loop: ; preds = %mid.loop
+ %i.inc = add i32 %i, 1
+ br label %for.body
+
+for.end: ; preds = %mid.loop, %for.body
+ br i1 %cc, label %if, label %end
+
+if: ; preds = %for.end
+ call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float undef, float undef, float undef, float undef, i1 true, i1 true)
+ br label %end
+
+end: ; preds = %if, %for.end
+ ret void
+}
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: read)
+declare float @llvm.amdgcn.struct.ptr.buffer.load.f32(ptr addrspace(8) nocapture readonly, i32, i32, i32, i32 immarg) #0
+
+; Function Attrs: nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write)
+declare void @llvm.amdgcn.exp.f32(i32 immarg, i32 immarg, float, float, float, float, i1 immarg, i1 immarg) #1
+
+attributes #0 = { nocallback nofree nosync nounwind willreturn memory(argmem: read) }
+attributes #1 = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) }
diff --git a/llvm/test/CodeGen/AMDGPU/call-args-inreg-no-sgpr-for-csrspill-xfail.ll b/llvm/test/CodeGen/AMDGPU/call-args-inreg-no-sgpr-for-csrspill-xfail.ll
deleted file mode 100644
index 34f4476f7fd6a..0000000000000
--- a/llvm/test/CodeGen/AMDGPU/call-args-inreg-no-sgpr-for-csrspill-xfail.ll
+++ /dev/null
@@ -1,27 +0,0 @@
-; RUN: not llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs=0 -filetype=null %s 2>&1 | FileCheck -enable-var-scope %s
-
-; CHECK: illegal VGPR to SGPR copy
-
-declare hidden void @external_void_func_a15i32_inreg([15 x i32] inreg) #0
-declare hidden void @external_void_func_a16i32_inreg([16 x i32] inreg) #0
-declare hidden void @external_void_func_a15i32_inreg_i32_inreg([15 x i32] inreg, i32 inreg) #0
-
-define void @test_call_external_void_func_a15i32_inreg([15 x i32] inreg %arg0) #0 {
- call void @external_void_func_a15i32_inreg([15 x i32] inreg %arg0)
- ret void
-}
-
-define void @test_call_external_void_func_a16i32_inreg([16 x i32] inreg %arg0) #0 {
- call void @external_void_func_a16i32_inreg([16 x i32] inreg %arg0)
- ret void
-}
-
-define void @test_call_external_void_func_a15i32_inreg_i32_inreg([15 x i32] inreg %arg0, i32 inreg %arg1) #0 {
- call void @external_void_func_a15i32_inreg_i32_inreg([15 x i32] inreg %arg0, i32 inreg %arg1)
- ret void
-}
-
-attributes #0 = { nounwind }
-
-!llvm.module.flags = !{!0}
-!0 = !{i32 1, !"amdhsa_code_object_version", i32 400}
diff --git a/llvm/test/CodeGen/AMDGPU/illegal-sgpr-to-vgpr-copy.ll b/llvm/test/CodeGen/AMDGPU/illegal-sgpr-to-vgpr-copy.ll
deleted file mode 100644
index 597f90c0f4e84..0000000000000
--- a/llvm/test/CodeGen/AMDGPU/illegal-sgpr-to-vgpr-copy.ll
+++ /dev/null
@@ -1,65 +0,0 @@
-; RUN: not llc -mtriple=amdgcn -verify-machineinstrs=0 < %s 2>&1 | FileCheck -check-prefix=ERR %s
-; RUN: not llc -mtriple=amdgcn -verify-machineinstrs=0 < %s 2>&1 | FileCheck -check-prefix=GCN %s
-
-; ERR: error: <unknown>:0:0: in function illegal_vgpr_to_sgpr_copy_i32 void (): illegal VGPR to SGPR copy
-; GCN: ; illegal copy v1 to s9
-
-define amdgpu_kernel void @illegal_vgpr_to_sgpr_copy_i32() #0 {
- %vgpr = call i32 asm sideeffect "; def $0", "=${v1}"()
- call void asm sideeffect "; use $0", "${s9}"(i32 %vgpr)
- ret void
-}
-
-; ERR: error: <unknown>:0:0: in function illegal_vgpr_to_sgpr_copy_v2i32 void (): illegal VGPR to SGPR copy
-; GCN: ; illegal copy v[0:1] to s[10:11]
-define amdgpu_kernel void @illegal_vgpr_to_sgpr_copy_v2i32() #0 {
- %vgpr = call <2 x i32> asm sideeffect "; def $0", "=${v[0:1]}"()
- call void asm sideeffect "; use $0", "${s[10:11]}"(<2 x i32> %vgpr)
- ret void
-}
-
-; ERR: error: <unknown>:0:0: in function illegal_vgpr_to_sgpr_copy_v4i32 void (): illegal VGPR to SGPR copy
-; GCN: ; illegal copy v[0:3] to s[8:11]
-define amdgpu_kernel void @illegal_vgpr_to_sgpr_copy_v4i32() #0 {
- %vgpr = call <4 x i32> asm sideeffect "; def $0", "=${v[0:3]}"()
- call void asm sideeffect "; use $0", "${s[8:11]}"(<4 x i32> %vgpr)
- ret void
-}
-
-; ERR: error: <unknown>:0:0: in function illegal_vgpr_to_sgpr_copy_v8i32 void (): illegal VGPR to SGPR copy
-; GCN: ; illegal copy v[0:7] to s[8:15]
-define amdgpu_kernel void @illegal_vgpr_to_sgpr_copy_v8i32() #0 {
- %vgpr = call <8 x i32> asm sideeffect "; def $0", "=${v[0:7]}"()
- call void asm sideeffect "; use $0", "${s[8:15]}"(<8 x i32> %vgpr)
- ret void
-}
-
-; ERR: error: <unknown>:0:0: in function illegal_vgpr_to_sgpr_copy_v16i32 void (): illegal VGPR to SGPR copy
-; GCN: ; illegal copy v[0:15] to s[16:31]
-define amdgpu_kernel void @illegal_vgpr_to_sgpr_copy_v16i32() #0 {
- %vgpr = call <16 x i32> asm sideeffect "; def $0", "=${v[0:15]}"()
- call void asm sideeffect "; use $0", "${s[16:31]}"(<16 x i32> %vgpr)
- ret void
-}
-
-; ERR: error: <unknown>:0:0: in function illegal_agpr_to_sgpr_copy_i32 void (): illegal VGPR to SGPR copy
-; GCN: v_accvgpr_read_b32 [[COPY1:v[0-9]+]], a1
-; GCN: ; illegal copy [[COPY1]] to s9
-define amdgpu_kernel void @illegal_agpr_to_sgpr_copy_i32() #1 {
- %agpr = call i32 asm sideeffect "; def $0", "=${a1}"()
- call void asm sideeffect "; use $0", "${s9}"(i32 %agpr)
- ret void
-}
-
-; ERR: error: <unknown>:0:0: in function illegal_agpr_to_sgpr_copy_v2i32 void (): illegal VGPR to SGPR copy
-; GCN-DAG: v_accvgpr_read_b32 v[[COPY1L:[0-9]+]], a0
-; GCN-DAG: v_accvgpr_read_b32 v[[COPY1H:[0-9]+]], a1
-; GCN: ; illegal copy v[[[COPY1L]]:[[COPY1H]]] to s[10:11]
-define amdgpu_kernel void @illegal_agpr_to_sgpr_copy_v2i32() #1 {
- %vgpr = call <2 x i32> asm sideeffect "; def $0", "=${a[0:1]}"()
- call void asm sideeffect "; use $0", "${s[10:11]}"(<2 x i32> %vgpr)
- ret void
-}
-
-attributes #0 = { nounwind }
-attributes #1 = { nounwind "target-cpu"="gfx908" }
diff --git a/llvm/test/CodeGen/AMDGPU/swdev503538-move-to-valu-stack-srd-physreg.ll b/llvm/test/CodeGen/AMDGPU/swdev503538-move-to-valu-stack-srd-physreg.ll
deleted file mode 100644
index f0b3d334af67d..0000000000000
--- a/llvm/test/CodeGen/AMDGPU/swdev503538-move-to-valu-stack-srd-physreg.ll
+++ /dev/null
@@ -1,23 +0,0 @@
-; RUN: not llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -verify-machineinstrs=0 -O0 2> %t.err < %s | FileCheck %s
-; RUN: FileCheck -check-prefix=ERR %s < %t.err
-
-; FIXME: This error will be fixed by supporting arbitrary divergent
-; dynamic allocas by performing a wave umax of the size.
-
-; ERR: error: <unknown>:0:0: in function move_to_valu_assert_srd_is_physreg_swdev503538 i32 (ptr addrspace(1)): illegal VGPR to SGPR copy
-
-; CHECK: ; illegal copy v0 to s32
-
-define i32 @move_to_valu_assert_srd_is_physreg_swdev503538(ptr addrspace(1) %ptr) {
-entry:
- %idx = load i32, ptr addrspace(1) %ptr, align 4
- %zero = extractelement <4 x i32> zeroinitializer, i32 %idx
- %alloca = alloca [2048 x i8], i32 %zero, align 8, addrspace(5)
- %ld = load i32, ptr addrspace(5) %alloca, align 8
- call void @llvm.memset.p5.i32(ptr addrspace(5) %alloca, i8 0, i32 2048, i1 false)
- ret i32 %ld
-}
-
-declare void @llvm.memset.p5.i32(ptr addrspace(5) nocapture writeonly, i8, i32, i1 immarg) #0
-
-attributes #0 = { nocallback nofree nounwind willreturn memory(argmem: write) }
diff --git a/llvm/test/CodeGen/AMDGPU/tail-call-inreg-arguments.error.ll b/llvm/test/CodeGen/AMDGPU/tail-call-inreg-arguments.error.ll
index 242b5e9aeaf42..b73b0d7296da2 100644
--- a/llvm/test/CodeGen/AMDGPU/tail-call-inreg-arguments.error.ll
+++ b/llvm/test/CodeGen/AMDGPU/tail-call-inreg-arguments.error.ll
@@ -1,7 +1,5 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
-; RUN: not llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs=0 2> %t.err < %s | FileCheck %s
-; RUN: FileCheck -check-prefix=ERR %s < %t.err
-; FIXME: These tests cannot be tail called, and should be executed in a waterfall loop.
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck %s
declare hidden void @void_func_i32_inreg(i32 inreg)
@@ -20,11 +18,11 @@ define void @tail_call_i32_inreg_divergent(i32 %vgpr) {
; CHECK-NEXT: v_writelane_b32 v40, s16, 2
; CHECK-NEXT: s_addk_i32 s32, 0x400
; CHECK-NEXT: v_writelane_b32 v40, s30, 0
-; CHECK-NEXT: v_writelane_b32 v40, s31, 1
; CHECK-NEXT: s_getpc_b64 s[16:17]
; CHECK-NEXT: s_add_u32 s16, s16, void_func_i32_inreg@rel32@lo+4
; CHECK-NEXT: s_addc_u32 s17, s17, void_func_i32_inreg@rel32@hi+12
-; CHECK-NEXT: ; illegal copy v0 to s0
+; CHECK-NEXT: v_readfirstlane_b32 s0, v0
+; CHECK-NEXT: v_writelane_b32 v40, s31, 1
; CHECK-NEXT: s_swappc_b64 s[30:31], s[16:17]
; CHECK-NEXT: v_readlane_b32 s31, v40, 1
; CHECK-NEXT: v_readlane_b32 s30, v40, 0
@@ -58,8 +56,8 @@ define void @indirect_tail_call_i32_inreg_divergent(i32 %vgpr) {
; CHECK-NEXT: s_addc_u32 s17, s17, constant@rel32@hi+12
; CHECK-NEXT: s_load_dwordx2 s[16:17], s[16:17], 0x0
; CHECK-NEXT: v_writelane_b32 v40, s30, 0
+; CHECK-NEXT: v_readfirstlane_b32 s0, v0
; CHECK-NEXT: v_writelane_b32 v40, s31, 1
-; CHECK-NEXT: ; illegal copy v0 to s0
; CHECK-NEXT: s_waitcnt lgkmcnt(0)
; CHECK-NEXT: s_swappc_b64 s[30:31], s[16:17]
; CHECK-NEXT: v_readlane_b32 s31, v40, 1
diff --git a/llvm/test/CodeGen/AMDGPU/write-register-vgpr-into-sgpr.ll b/llvm/test/CodeGen/AMDGPU/write-register-vgpr-into-sgpr.ll
index de3b1d5bf78b3..9355e60ecd321 100644
--- a/llvm/test/CodeGen/AMDGPU/write-register-vgpr-into-sgpr.ll
+++ b/llvm/test/CodeGen/AMDGPU/write-register-vgpr-into-sgpr.ll
@@ -1,4 +1,3 @@
-; XFAIL: *
; REQUIRES: asserts
; RUN: llc -mtriple=amdgcn -mcpu=bonaire -verify-machineinstrs < %s
|
|
@doru1004 please let me know what you think. |
arsenm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't simply insert a readfirstlane to make this correct. In general you need to emit a waterfall loop, and you need more context than a single physical register copy. @Shoreshen is working on introducing waterfall loops
| return LastDef; | ||
| } | ||
|
|
||
| static void fixupLiveIns(MachineInstr &MI, MCPhysReg SubReg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't belong in this PR (and we should also delete LiveVariables)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code was necessary to get the added testcase to pass; otherwise it would crash in LiveVariables.
| TII->get(AMDGPU::V_READFIRSTLANE_B32), TmpReg) | ||
| .add(MI.getOperand(1)); | ||
| MI.getOperand(1).setReg(TmpReg); | ||
| return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No else after return
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind I understand now.
| @@ -1,4 +1,3 @@ | |||
| ; XFAIL: * | |||
| ; REQUIRES: asserts | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't require asserts, and also needs to check what is produced
| if (Callee && AMDGPU::isEntryFunctionCC(Callee->getCallingConv()) && | ||
| !isSameFunction(MF, Callee)) | ||
| report_fatal_error("invalid call to entry function"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I may be missing something but why is this error message being constrained?
Or, to phrase it differently, why are self-recursive (entry) functions allowed beyond this error now?
Also, is there a test case that showcases this? If this change to ResourceUsageAnalysis is worth it, it may be better off in its own PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The added testcase, from a fuzzer, uses self-recursion. If that's not allowed, please let me know so the fuzzer can be updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIU, the isEntryFunctionCC-ness of this is the blocker on why it cannot be a callee, regardless of (self-)recursion. Recursion itself is possible (albeit dubious in terms of resource allocation and runtime use, but I digress) so long as it isn't marked as an entry function.
| @@ -0,0 +1,44 @@ | |||
| ; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -O1 < %s | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to be piped to FileCheck + add checks on expected output (can probably just use update_llc_test_check.py)
I think this test file could use some cleanup (manually or automated through llvm-reduce) and a file rename to also add the swdev prefix + what's being checked.
|
@arsenm wrote:
I have a question, if you don't mind my asking, since I'm new to this architecture and this is my first issue: why do VGPR to SGPR copies differ in what is needed to make them correct if the registers are virtual versus physical? SIFixSGPRCopies can apparently handle virtual VGPR to SGPR copies with existing code, so why is something different needed for physical VGPR to SGPR copies? Also, if @Shoreshen is working on waterfall loops, does that mean he is likely to create a PR that would duplicate the work of fixing this fuzzer testcase? If so, I should probably drop this and take another issue instead. |
Hi @linuxrocks123 the difference of physical and virtual vgpr lies in the procedure of handling. The basic idea of this pass in my understanding is to use vgpr to replace sgpr and use v_ instruction to replace s_ instructions, and propagate them forward (users of SGPR) If it is a physical sgpr at this stage, we cannot do that since for instance, the physical sgpr passing a parameter to inreg function call (which means parameter has to be in determined physical spgr required by the callee). Simply saying, if it is a copy to some physical SGPR, then we cannot use other register to replace it...... My PR is #146997 but it also has problems need to be fixed...... Thanks |
|
Converting to draft pending merge of PR #146997. |
This PR fixes AMDGPU's SIFixSGPRCopies to handle COPY instructions where the SGPR is a physical rather than virtual register. It also fixes target-independent LiveVariables to add live-ins
when it adds implicit uses of physical registers. Finally, it checks whether a function is calling itself recursively before erroring out because the function used the same calling conventi
on as the caller, which I speculate must normally be a bad thing on AMDGPU or else that check wouldn't be there.
This PR also adds a fuzzer-generated CodeGen testcase which now outputs assembly instead of crashing because of these three changes and deletes three CodeGen testcases which were failing bec
ause SIFixSGPRCopies didn't work when SGPR was a physical register.
This is my first PR in the AMDGPU backend, so there may be things I'm missing.