AMDGPU: Select vector reg class for divergent build_vector #168169
Conversation
@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

AMDGPU: Select vector reg class for divergent build_vector

The main improvement is to the mfma tests. There are some mild regressions scattered around, and a few major ones.

Patch is 7.13 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/168169.diff

114 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
index 9308934c8baf8..ac0cb549d020b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
@@ -726,10 +726,14 @@ void AMDGPUDAGToDAGISel::Select(SDNode *N) {
break;
}
+ const SIRegisterInfo *TRI = Subtarget->getRegisterInfo();
assert(VT.getVectorElementType().bitsEq(MVT::i32));
- unsigned RegClassID =
- SIRegisterInfo::getSGPRClassForBitWidth(NumVectorElts * 32)->getID();
- SelectBuildVector(N, RegClassID);
+ const TargetRegisterClass *RegClass =
+ N->isDivergent()
+ ? TRI->getDefaultVectorSuperClassForBitWidth(NumVectorElts * 32)
+ : SIRegisterInfo::getSGPRClassForBitWidth(NumVectorElts * 32);
+
+ SelectBuildVector(N, RegClass->getID());
return;
}
case ISD::VECTOR_SHUFFLE:
diff --git a/llvm/test/CodeGen/AMDGPU/a-v-flat-atomic-cmpxchg.ll b/llvm/test/CodeGen/AMDGPU/a-v-flat-atomic-cmpxchg.ll
index bc341f2baa804..e882769f97ac1 100644
--- a/llvm/test/CodeGen/AMDGPU/a-v-flat-atomic-cmpxchg.ll
+++ b/llvm/test/CodeGen/AMDGPU/a-v-flat-atomic-cmpxchg.ll
@@ -95,13 +95,13 @@ define void @flat_atomic_cmpxchg_i32_ret_a_a__a(ptr %ptr) #0 {
; CHECK: ; %bb.0:
; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; CHECK-NEXT: ;;#ASMSTART
-; CHECK-NEXT: ; def a0
+; CHECK-NEXT: ; def a1
; CHECK-NEXT: ;;#ASMEND
; CHECK-NEXT: ;;#ASMSTART
-; CHECK-NEXT: ; def a1
+; CHECK-NEXT: ; def a0
; CHECK-NEXT: ;;#ASMEND
-; CHECK-NEXT: v_accvgpr_read_b32 v2, a1
-; CHECK-NEXT: v_accvgpr_read_b32 v3, a0
+; CHECK-NEXT: v_accvgpr_read_b32 v3, a1
+; CHECK-NEXT: v_accvgpr_read_b32 v2, a0
; CHECK-NEXT: buffer_wbl2
; CHECK-NEXT: flat_atomic_cmpswap v0, v[0:1], v[2:3] offset:40 glc
; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -126,13 +126,13 @@ define void @flat_atomic_cmpxchg_i32_ret_a_a__v(ptr %ptr) #0 {
; CHECK: ; %bb.0:
; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; CHECK-NEXT: ;;#ASMSTART
-; CHECK-NEXT: ; def a0
+; CHECK-NEXT: ; def a1
; CHECK-NEXT: ;;#ASMEND
; CHECK-NEXT: ;;#ASMSTART
-; CHECK-NEXT: ; def a1
+; CHECK-NEXT: ; def a0
; CHECK-NEXT: ;;#ASMEND
-; CHECK-NEXT: v_accvgpr_read_b32 v2, a1
-; CHECK-NEXT: v_accvgpr_read_b32 v3, a0
+; CHECK-NEXT: v_accvgpr_read_b32 v3, a1
+; CHECK-NEXT: v_accvgpr_read_b32 v2, a0
; CHECK-NEXT: buffer_wbl2
; CHECK-NEXT: flat_atomic_cmpswap v0, v[0:1], v[2:3] offset:40 glc
; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -156,12 +156,14 @@ define void @flat_atomic_cmpxchg_i32_ret_v_a__v(ptr %ptr) #0 {
; CHECK: ; %bb.0:
; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; CHECK-NEXT: ;;#ASMSTART
-; CHECK-NEXT: ; def a0
+; CHECK-NEXT: ; def v2
; CHECK-NEXT: ;;#ASMEND
-; CHECK-NEXT: v_accvgpr_read_b32 v2, a0
+; CHECK-NEXT: v_accvgpr_write_b32 a1, v2
; CHECK-NEXT: ;;#ASMSTART
-; CHECK-NEXT: ; def v3
+; CHECK-NEXT: ; def a0
; CHECK-NEXT: ;;#ASMEND
+; CHECK-NEXT: v_accvgpr_read_b32 v3, a1
+; CHECK-NEXT: v_accvgpr_read_b32 v2, a0
; CHECK-NEXT: buffer_wbl2
; CHECK-NEXT: flat_atomic_cmpswap v0, v[0:1], v[2:3] offset:40 glc
; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -299,12 +301,13 @@ define void @flat_atomic_cmpxchg_i32_ret_av_a__av(ptr %ptr) #0 {
; CHECK: ; %bb.0:
; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; CHECK-NEXT: ;;#ASMSTART
-; CHECK-NEXT: ; def a0
+; CHECK-NEXT: ; def a1
; CHECK-NEXT: ;;#ASMEND
-; CHECK-NEXT: v_accvgpr_read_b32 v2, a0
; CHECK-NEXT: ;;#ASMSTART
-; CHECK-NEXT: ; def v3
+; CHECK-NEXT: ; def a0
; CHECK-NEXT: ;;#ASMEND
+; CHECK-NEXT: v_accvgpr_read_b32 v3, a1
+; CHECK-NEXT: v_accvgpr_read_b32 v2, a0
; CHECK-NEXT: buffer_wbl2
; CHECK-NEXT: flat_atomic_cmpswap v0, v[0:1], v[2:3] offset:40 glc
; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -328,12 +331,13 @@ define void @flat_atomic_cmpxchg_i32_ret_a_av__av(ptr %ptr) #0 {
; CHECK: ; %bb.0:
; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; CHECK-NEXT: ;;#ASMSTART
-; CHECK-NEXT: ; def a0
+; CHECK-NEXT: ; def a1
; CHECK-NEXT: ;;#ASMEND
-; CHECK-NEXT: v_accvgpr_read_b32 v3, a0
; CHECK-NEXT: ;;#ASMSTART
-; CHECK-NEXT: ; def v2
+; CHECK-NEXT: ; def a0
; CHECK-NEXT: ;;#ASMEND
+; CHECK-NEXT: v_accvgpr_read_b32 v3, a1
+; CHECK-NEXT: v_accvgpr_read_b32 v2, a0
; CHECK-NEXT: buffer_wbl2
; CHECK-NEXT: flat_atomic_cmpswap v0, v[0:1], v[2:3] offset:40 glc
; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
@@ -533,50 +537,55 @@ define void @flat_atomic_cmpxchg_i64_ret_a_a__a(ptr %ptr) #0 {
; CHECK-LABEL: flat_atomic_cmpxchg_i64_ret_a_a__a:
; CHECK: ; %bb.0:
; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT: v_add_co_u32_e32 v4, vcc, 0x50, v0
+; CHECK-NEXT: v_add_co_u32_e32 v0, vcc, 0x50, v0
+; CHECK-NEXT: s_mov_b64 s[4:5], src_private_base
+; CHECK-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; CHECK-NEXT: ;;#ASMSTART
-; CHECK-NEXT: ; def a[0:1]
+; CHECK-NEXT: ; def a[2:3]
; CHECK-NEXT: ;;#ASMEND
-; CHECK-NEXT: v_accvgpr_read_b32 v3, a1
-; CHECK-NEXT: s_mov_b64 s[4:5], src_private_base
-; CHECK-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v1, vcc
-; CHECK-NEXT: v_accvgpr_read_b32 v2, a0
+; CHECK-NEXT: v_accvgpr_read_b32 v5, a3
; CHECK-NEXT: ;;#ASMSTART
-; CHECK-NEXT: ; def a[0:1]
+; CHECK-NEXT: ; def a[4:5]
; CHECK-NEXT: ;;#ASMEND
-; CHECK-NEXT: v_accvgpr_read_b32 v0, a0
-; CHECK-NEXT: v_accvgpr_read_b32 v1, a1
-; CHECK-NEXT: v_cmp_ne_u32_e32 vcc, s5, v5
+; CHECK-NEXT: v_accvgpr_read_b32 v2, a4
+; CHECK-NEXT: v_accvgpr_read_b32 v4, a2
+; CHECK-NEXT: v_accvgpr_read_b32 v3, a5
+; CHECK-NEXT: v_cmp_ne_u32_e32 vcc, s5, v1
; CHECK-NEXT: ; implicit-def: $agpr0_agpr1
; CHECK-NEXT: s_and_saveexec_b64 s[4:5], vcc
; CHECK-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
; CHECK-NEXT: s_cbranch_execz .LBB15_2
; CHECK-NEXT: ; %bb.1: ; %atomicrmw.global
+; CHECK-NEXT: v_accvgpr_read_b32 v2, a4
+; CHECK-NEXT: v_accvgpr_read_b32 v3, a5
+; CHECK-NEXT: v_accvgpr_read_b32 v4, a2
+; CHECK-NEXT: v_accvgpr_read_b32 v5, a3
; CHECK-NEXT: buffer_wbl2
-; CHECK-NEXT: flat_atomic_cmpswap_x2 v[0:1], v[4:5], v[0:3] glc
+; CHECK-NEXT: flat_atomic_cmpswap_x2 v[0:1], v[0:1], v[2:5] glc
; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; CHECK-NEXT: buffer_invl2
; CHECK-NEXT: buffer_wbinvl1_vol
; CHECK-NEXT: ; implicit-def: $vgpr4_vgpr5
+; CHECK-NEXT: ; implicit-def: $vgpr2_vgpr3
; CHECK-NEXT: v_accvgpr_write_b32 a0, v0
; CHECK-NEXT: v_accvgpr_write_b32 a1, v1
-; CHECK-NEXT: ; implicit-def: $vgpr2_vgpr3
+; CHECK-NEXT: ; implicit-def: $vgpr0_vgpr1
; CHECK-NEXT: .LBB15_2: ; %Flow
; CHECK-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
; CHECK-NEXT: s_cbranch_execz .LBB15_4
; CHECK-NEXT: ; %bb.3: ; %atomicrmw.private
-; CHECK-NEXT: v_cmp_ne_u64_e32 vcc, 0, v[4:5]
-; CHECK-NEXT: v_cndmask_b32_e32 v6, -1, v4, vcc
-; CHECK-NEXT: buffer_load_dword v4, v6, s[0:3], 0 offen
-; CHECK-NEXT: buffer_load_dword v5, v6, s[0:3], 0 offen offset:4
+; CHECK-NEXT: v_cmp_ne_u64_e32 vcc, 0, v[0:1]
+; CHECK-NEXT: v_cndmask_b32_e32 v6, -1, v0, vcc
+; CHECK-NEXT: buffer_load_dword v0, v6, s[0:3], 0 offen
+; CHECK-NEXT: buffer_load_dword v1, v6, s[0:3], 0 offen offset:4
; CHECK-NEXT: s_waitcnt vmcnt(1)
-; CHECK-NEXT: v_accvgpr_write_b32 a0, v4
+; CHECK-NEXT: v_accvgpr_write_b32 a0, v0
; CHECK-NEXT: s_waitcnt vmcnt(0)
-; CHECK-NEXT: v_cmp_eq_u64_e32 vcc, v[4:5], v[2:3]
-; CHECK-NEXT: v_cndmask_b32_e32 v1, v5, v1, vcc
-; CHECK-NEXT: v_accvgpr_write_b32 a1, v5
-; CHECK-NEXT: v_cndmask_b32_e32 v0, v4, v0, vcc
-; CHECK-NEXT: buffer_store_dword v1, v6, s[0:3], 0 offen offset:4
+; CHECK-NEXT: v_cmp_eq_u64_e32 vcc, v[0:1], v[4:5]
+; CHECK-NEXT: v_cndmask_b32_e32 v3, v1, v3, vcc
+; CHECK-NEXT: v_accvgpr_write_b32 a1, v1
+; CHECK-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc
+; CHECK-NEXT: buffer_store_dword v3, v6, s[0:3], 0 offen offset:4
; CHECK-NEXT: buffer_store_dword v0, v6, s[0:3], 0 offen
; CHECK-NEXT: .LBB15_4: ; %atomicrmw.phi
; CHECK-NEXT: s_or_b64 exec, exec, s[4:5]
@@ -598,50 +607,55 @@ define void @flat_atomic_cmpxchg_i64_ret_a_a__v(ptr %ptr) #0 {
; CHECK-LABEL: flat_atomic_cmpxchg_i64_ret_a_a__v:
; CHECK: ; %bb.0:
; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT: v_add_co_u32_e32 v6, vcc, 0x50, v0
+; CHECK-NEXT: v_add_co_u32_e32 v0, vcc, 0x50, v0
+; CHECK-NEXT: s_mov_b64 s[4:5], src_private_base
+; CHECK-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; CHECK-NEXT: ;;#ASMSTART
; CHECK-NEXT: ; def a[0:1]
; CHECK-NEXT: ;;#ASMEND
-; CHECK-NEXT: v_accvgpr_read_b32 v3, a1
-; CHECK-NEXT: s_mov_b64 s[4:5], src_private_base
-; CHECK-NEXT: v_addc_co_u32_e32 v7, vcc, 0, v1, vcc
-; CHECK-NEXT: v_accvgpr_read_b32 v2, a0
+; CHECK-NEXT: v_accvgpr_read_b32 v5, a1
; CHECK-NEXT: ;;#ASMSTART
-; CHECK-NEXT: ; def a[0:1]
+; CHECK-NEXT: ; def a[2:3]
; CHECK-NEXT: ;;#ASMEND
-; CHECK-NEXT: v_accvgpr_read_b32 v0, a0
-; CHECK-NEXT: v_accvgpr_read_b32 v1, a1
-; CHECK-NEXT: v_cmp_ne_u32_e32 vcc, s5, v7
-; CHECK-NEXT: ; implicit-def: $vgpr4_vgpr5
+; CHECK-NEXT: v_accvgpr_read_b32 v7, a3
+; CHECK-NEXT: v_accvgpr_read_b32 v4, a0
+; CHECK-NEXT: v_accvgpr_read_b32 v6, a2
+; CHECK-NEXT: v_cmp_ne_u32_e32 vcc, s5, v1
+; CHECK-NEXT: ; implicit-def: $vgpr2_vgpr3
; CHECK-NEXT: s_and_saveexec_b64 s[4:5], vcc
; CHECK-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
; CHECK-NEXT: s_cbranch_execz .LBB16_2
; CHECK-NEXT: ; %bb.1: ; %atomicrmw.global
+; CHECK-NEXT: v_accvgpr_read_b32 v2, a2
+; CHECK-NEXT: v_accvgpr_read_b32 v3, a3
+; CHECK-NEXT: v_accvgpr_read_b32 v4, a0
+; CHECK-NEXT: v_accvgpr_read_b32 v5, a1
; CHECK-NEXT: buffer_wbl2
-; CHECK-NEXT: flat_atomic_cmpswap_x2 v[4:5], v[6:7], v[0:3] glc
+; CHECK-NEXT: flat_atomic_cmpswap_x2 v[2:3], v[0:1], v[2:5] glc
; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; CHECK-NEXT: buffer_invl2
; CHECK-NEXT: buffer_wbinvl1_vol
+; CHECK-NEXT: ; implicit-def: $vgpr0_vgpr1
+; CHECK-NEXT: ; implicit-def: $vgpr4_vgpr5
; CHECK-NEXT: ; implicit-def: $vgpr6_vgpr7
-; CHECK-NEXT: ; implicit-def: $vgpr2_vgpr3
; CHECK-NEXT: .LBB16_2: ; %Flow
; CHECK-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
; CHECK-NEXT: s_cbranch_execz .LBB16_4
; CHECK-NEXT: ; %bb.3: ; %atomicrmw.private
-; CHECK-NEXT: v_cmp_ne_u64_e32 vcc, 0, v[6:7]
-; CHECK-NEXT: v_cndmask_b32_e32 v6, -1, v6, vcc
-; CHECK-NEXT: buffer_load_dword v4, v6, s[0:3], 0 offen
-; CHECK-NEXT: buffer_load_dword v5, v6, s[0:3], 0 offen offset:4
+; CHECK-NEXT: v_cmp_ne_u64_e32 vcc, 0, v[0:1]
+; CHECK-NEXT: v_cndmask_b32_e32 v0, -1, v0, vcc
+; CHECK-NEXT: buffer_load_dword v2, v0, s[0:3], 0 offen
+; CHECK-NEXT: buffer_load_dword v3, v0, s[0:3], 0 offen offset:4
; CHECK-NEXT: s_waitcnt vmcnt(0)
-; CHECK-NEXT: v_cmp_eq_u64_e32 vcc, v[4:5], v[2:3]
-; CHECK-NEXT: v_cndmask_b32_e32 v0, v4, v0, vcc
-; CHECK-NEXT: v_cndmask_b32_e32 v1, v5, v1, vcc
-; CHECK-NEXT: buffer_store_dword v0, v6, s[0:3], 0 offen
-; CHECK-NEXT: buffer_store_dword v1, v6, s[0:3], 0 offen offset:4
+; CHECK-NEXT: v_cmp_eq_u64_e32 vcc, v[2:3], v[4:5]
+; CHECK-NEXT: v_cndmask_b32_e32 v4, v2, v6, vcc
+; CHECK-NEXT: v_cndmask_b32_e32 v1, v3, v7, vcc
+; CHECK-NEXT: buffer_store_dword v4, v0, s[0:3], 0 offen
+; CHECK-NEXT: buffer_store_dword v1, v0, s[0:3], 0 offen offset:4
; CHECK-NEXT: .LBB16_4: ; %atomicrmw.phi
; CHECK-NEXT: s_or_b64 exec, exec, s[4:5]
; CHECK-NEXT: ;;#ASMSTART
-; CHECK-NEXT: ; use v[4:5]
+; CHECK-NEXT: ; use v[2:3]
; CHECK-NEXT: ;;#ASMEND
; CHECK-NEXT: s_waitcnt vmcnt(0)
; CHECK-NEXT: s_setpc_b64 s[30:31]
@@ -658,48 +672,51 @@ define void @flat_atomic_cmpxchg_i64_ret_v_a__v(ptr %ptr) #0 {
; CHECK-LABEL: flat_atomic_cmpxchg_i64_ret_v_a__v:
; CHECK: ; %bb.0:
; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT: v_add_co_u32_e32 v6, vcc, 0x50, v0
+; CHECK-NEXT: v_add_co_u32_e32 v4, vcc, 0x50, v0
; CHECK-NEXT: s_mov_b64 s[4:5], src_private_base
-; CHECK-NEXT: v_addc_co_u32_e32 v7, vcc, 0, v1, vcc
+; CHECK-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v1, vcc
; CHECK-NEXT: ;;#ASMSTART
; CHECK-NEXT: ; def a[0:1]
; CHECK-NEXT: ;;#ASMEND
-; CHECK-NEXT: v_accvgpr_read_b32 v0, a0
-; CHECK-NEXT: v_accvgpr_read_b32 v1, a1
-; CHECK-NEXT: v_cmp_ne_u32_e32 vcc, s5, v7
+; CHECK-NEXT: v_accvgpr_read_b32 v7, a1
+; CHECK-NEXT: v_accvgpr_read_b32 v6, a0
+; CHECK-NEXT: v_cmp_ne_u32_e32 vcc, s5, v5
; CHECK-NEXT: ;;#ASMSTART
; CHECK-NEXT: ; def v[2:3]
; CHECK-NEXT: ;;#ASMEND
-; CHECK-NEXT: ; implicit-def: $vgpr4_vgpr5
+; CHECK-NEXT: ; implicit-def: $vgpr0_vgpr1
; CHECK-NEXT: s_and_saveexec_b64 s[4:5], vcc
; CHECK-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
; CHECK-NEXT: s_cbranch_execz .LBB17_2
; CHECK-NEXT: ; %bb.1: ; %atomicrmw.global
+; CHECK-NEXT: v_accvgpr_read_b32 v0, a0
+; CHECK-NEXT: v_accvgpr_read_b32 v1, a1
; CHECK-NEXT: buffer_wbl2
-; CHECK-NEXT: flat_atomic_cmpswap_x2 v[4:5], v[6:7], v[0:3] glc
+; CHECK-NEXT: flat_atomic_cmpswap_x2 v[0:1], v[4:5], v[0:3] glc
; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; CHECK-NEXT: buffer_invl2
; CHECK-NEXT: buffer_wbinvl1_vol
-; CHECK-NEXT: ; implicit-def: $vgpr6_vgpr7
+; CHECK-NEXT: ; implicit-def: $vgpr4_vgpr5
; CHECK-NEXT: ; implicit-def: $vgpr2_vgpr3
+; CHECK-NEXT: ; implicit-def: $vgpr6_vgpr7
; CHECK-NEXT: .LBB17_2: ; %Flow
; CHECK-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
; CHECK-NEXT: s_cbranch_execz .LBB17_4
; CHECK-NEXT: ; %bb.3: ; %atomicrmw.private
-; CHECK-NEXT: v_cmp_ne_u64_e32 vcc, 0, v[6:7]
-; CHECK-NEXT: v_cndmask_b32_e32 v6, -1, v6, vcc
-; CHECK-NEXT: buffer_load_dword v4, v6, s[0:3], 0 offen
-; CHECK-NEXT: buffer_load_dword v5, v6, s[0:3], 0 offen offset:4
+; CHECK-NEXT: v_cmp_ne_u64_e32 vcc, 0, v[4:5]
+; CHECK-NEXT: v_cndmask_b32_e32 v4, -1, v4, vcc
+; CHECK-NEXT: buffer_load_dword v0, v4, s[0:3], 0 offen
+; CHECK-NEXT: buffer_load_dword v1, v4, s[0:3], 0 offen offset:4
; CHECK-NEXT: s_waitcnt vmcnt(0)
-; CHECK-NEXT: v_cmp_eq_u64_e32 vcc, v[4:5], v[2:3]
-; CHECK-NEXT: v_cndmask_b32_e32 v0, v4, v0, vcc
-; CHECK-NEXT: v_cndmask_b32_e32 v1, v5, v1, vcc
-; CHECK-NEXT: buffer_store_dword v0, v6, s[0:3], 0 offen
-; CHECK-NEXT: buffer_store_dword v1, v6, s[0:3], 0 offen offset:4
+; CHECK-NEXT: v_cmp_eq_u64_e32 vcc, v[0:1], v[2:3]
+; CHECK-NEXT: v_cndmask_b32_e32 v3, v0, v6, vcc
+; CHECK-NEXT: v_cndmask_b32_e32 v2, v1, v7, vcc
+; CHECK-NEXT: buffer_store_dword v3, v4, s[0:3], 0 offen
+; CHECK-NEXT: buffer_store_dword v2, v4, s[0:3], 0 offen offset:4
; CHECK-NEXT: .LBB17_4: ; %atomicrmw.phi
; CHECK-NEXT: s_or_b64 exec, exec, s[4:5]
; CHECK-NEXT: ;;#ASMSTART
-; CHECK-NEXT: ; use v[4:5]
+; CHECK-NEXT: ; use v[0:1]
; CHECK-NEXT: ;;#ASMEND
; CHECK-NEXT: s_waitcnt vmcnt(0)
; CHECK-NEXT: s_setpc_b64 s[30:31]
@@ -716,48 +733,51 @@ define void @flat_atomic_cmpxchg_i64_ret_a_v__v(ptr %ptr) #0 {
; CHECK-LABEL: flat_atomic_cmpxchg_i64_ret_a_v__v:
; CHECK: ; %bb.0:
; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT: v_add_co_u32_e32 v6, vcc, 0x50, v0
+; CHECK-NEXT: v_add_co_u32_e32 v4, vcc, 0x50, v0
; CHECK-NEXT: s_mov_b64 s[4:5], src_private_base
-; CHECK-NEXT: v_addc_co_u32_e32 v7, vcc, 0, v1, vcc
+; CHECK-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v1, vcc
; CHECK-NEXT: ;;#ASMSTART
; CHECK-NEXT: ; def a[0:1]
; CHECK-NEXT: ;;#ASMEND
-; CHECK-NEXT: v_accvgpr_read_b32 v3, a1
-; CHECK-NEXT: v_accvgpr_read_b32 v2, a0
-; CHECK-NEXT: v_cmp_ne_u32_e32 vcc, s5, v7
+; CHECK-NEXT: v_accvgpr_read_b32 v7, a1
+; CHECK-NEXT: v_accvgpr_read_b32 v6, a0
+; CHECK-NEXT: v_cmp_ne_u32_e32 vcc, s5, v5
; CHECK-NEXT: ;;#ASMSTART
; CHECK-NEXT: ; def v[0:1]
; CHECK-NEXT: ;;#ASMEND
-; CHECK-NEXT: ; implicit-def: $vgpr4_vgpr5
+; CHECK-NEXT: ; implicit-def: $vgpr2_vgpr3
; CHECK-NEXT: s_and_saveexec_b64 s[4:5], vcc
; CHECK-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
; CHECK-NEXT: s_cbranch_execz .LBB18_2
; CHECK-NEXT: ; %bb.1: ; %atomicrmw.global
+; CHECK-NEXT: v_accvgpr_read_b32 v2, a0
+; CHECK-NEXT: v_accvgpr_read_b32 v3, a1
; CHECK-NEXT: buffer_wbl2
-; CHECK-NEXT: flat_atomic_cmpswap_x2 v[4:5], v[6:7], v[0:3] glc
+; CHECK-NEXT: flat_atomic_cmpswap_x2 v[2:3], v[4:5], v[0:3] glc
; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; CHECK-NEXT: buffer_invl2
; CHECK-NEXT: buffer_wbinvl1_vol
+; CHECK-NEXT: ; implicit-def: $vgpr4_vgpr5
; CHECK-NEXT: ; implicit-def: $vgpr6_vgpr7
-; CHECK-NEXT: ; implicit-def: $vgpr2_vgpr3
+; CHECK-NEXT: ; implicit-def: $vgpr0_vgpr1
; CHECK-NEXT: .LBB18_2: ; %Flow
; CHECK-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
; CHECK-NEXT: s_cbranch_execz .LBB18_4
; CHECK-NEXT: ; %bb.3: ; %atomicrmw.private
-; CHECK-NEXT: v_cmp_ne_u64_e32 vcc, 0, v[6:7]
-; CHECK-NEXT: v_cndmask_b32_e32 v6, -1, v6, vcc
-; CHECK-NEXT: buffer_load_dword v4, v6, s[0:3], 0 offen
-; CHECK-NEXT: buffer_load_dword v5, v6, s[0:3], 0 offen offset:4
+; CHECK-NEXT: v_cmp_ne_u64_e32 vcc, 0, v[4:5]
+; CHECK-NEXT: v_cndmask_b32_e32 v4, -1, v4, vcc
+; CHECK-NEXT: buffer_load_dword v2, v4, s[0:3], 0 offen
+; CHECK-NEXT: buffer_load_dword v3, v4, s[0:3], 0 offen offset:4
; CHECK-NEXT: s_waitcnt vmcnt(0)
-; CHECK-NEXT: v_cmp_eq_u64_e32 vcc, v[4:5], v[2:3]
-; CHECK-NEXT: v_cndmask_b32_e32 v0, v4, v0, vcc
-; CHECK-NEXT: v_cndmask_b32_e32 v1, v5, v1, vcc
-; CHECK-NEXT: buffer_store_dword v0, v6, s[0:3], 0 offen
-; CHECK-NEXT: buffer_store_dword v1, v6, s[0:3], 0 offen offset:4
+; CHECK-NEXT: v_cmp_eq_u64_e32 vcc, v[2:3], v[6:7]
+; CHECK-NEXT: v_cndmask_b32_e32 v0, v2, v0, vcc
+; CHECK-NEXT: v_cndmask_b32_e32 v1, v3, v1, vcc
+; CHECK-NEXT: buffer_store_dword v0, v4, s[0:3], 0 offen
+; CHECK-NEXT: buffer_store_dword v1, v4, s[0:3], 0 offen offset:4
; CHECK-NEXT: .LBB18_4: ; %atomicrmw.phi
; CHECK-NEXT: s_or_b64 exec, exec, s[4:5]
; CHECK-NEXT: ;;#ASMSTART
-; CHECK-NEXT: ; use v[4:5]
+; CHECK-NEXT: ; use v[2:3]
; CHECK-NEXT: ;;#ASMEND
; CHECK-NEXT: s_waitcnt vmcnt(0)
; CHECK-NEXT: s_setpc_b64 s[30:31]
@@ -947,48 +967,51 @@ define void @flat_atomic_cmpxchg_i64_ret_av_a__av(ptr %ptr) #0 {
; CHECK-LABEL: flat_atomic_cmpxchg_i64_ret_av_a__av:
; CHECK: ; %bb.0:
; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT: v_add_co_u32_e32 v6, vcc, 0x50, v0
+; CHECK-NEXT: v_add_co_u32_e32 v4, vcc, 0x50, v0
; CHECK-NEXT: s_mov_b64 s[4:5], src_private_base
-; CHECK-NEXT: v_addc_co_u32_e32 v7, vcc, 0, v1, vcc
+; CHECK-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v1, vcc
; CHECK-NEXT: ;;#ASMSTART
; CHECK-NEXT: ; def a[0:1]
; CHECK-NEXT: ;;#ASMEND
-; CHECK-NEXT: v_accvgpr_read_b32 v0, a0
-; CHECK-NEXT: v_accvgpr_read_b32 v1, a1
-; CHECK-NEXT: v_cmp_ne_u32_e32 vcc, s5, v7
+; CHECK-NEXT: v_accvgpr_read_b32 v7, a1
+; CHECK-NEXT: v_accvgpr_read_b32 v6, a0
+; CHECK-NEXT: v_cmp_ne_u32_e32 vcc, s5, v5
; CHECK-NEXT: ;;#ASMSTART
; CHECK-NEXT: ; def v[2:3]
; CHECK-NEXT: ;;#ASMEND
-; CHECK-NEXT: ; implicit-def: $vgpr4_vgpr5
+; CHECK-NEXT: ; implicit-def: $vgpr0_vgpr1
; CHECK-NEXT: s_and_saveexec_b64 s[4:5], vcc
; CHECK-NEXT: s_xor_b64 s[4:5], exec, s[4:5]
; CHECK-NEXT: s_cbranch_execz .LBB22_2
; CHECK-NEXT: ; %bb.1: ; %atomicrmw.global
+; CHECK-NEXT: v_accvgpr_read_b32 v0, a0
+; CHECK-NEXT: v_accvgpr_read_b32 v1, a1
; CHECK-NEXT: buffer_wbl2
-; CHECK-NEXT: flat_atomic_cmpswap_x2 v[4:5], v[6:7], v[0:3] glc
+; CHECK-NEXT: flat_atomic_cmpswap_x2 v[0:1], v[4:5], v[0:3] glc
; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; CHECK-NEXT: buffer_invl2
; CHECK-NEXT: buffer_wbinvl1_vol
-; CHECK-NEXT: ; implicit-def: $vgpr6_vgpr7
+; CHECK-NEXT: ; implicit-def: $vgpr4_vgpr5
; CHECK-NEXT: ; implicit-def: $vgpr2_vgpr3
+; CHECK-NEXT: ; implicit-def: $vgpr6_...
[truncated]
shiltian left a comment:
It doesn't seem like there is anything significantly different, but in some cases GISel and SelectionDAG generate the same code, which is nice.
This probably should have turned into a regular integer constant earlier. This is to defend against future regressions.
The main improvement is to the mfma tests. There are some mild regressions scattered around, and a few major ones. The worst regressions are in some of the bitcast tests; these are cases where the SGPR argument list runs out and uses VGPRs, and the copies-from-VGPR are misidentified as divergent. Most of the shufflevector tests are also regressions. These end up with cleaner MIR, but then get poor regalloc decisions.
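The decision the patch makes in `AMDGPUDAGToDAGISel::Select` can be summarized as: a divergent `build_vector` of N 32-bit elements should land in a VGPR/AGPR vector superclass, while a uniform one stays in SGPRs. Below is a minimal stand-alone sketch of that decision, with hypothetical string stand-ins for the real `SIRegisterInfo` class lookups (`getDefaultVectorSuperClassForBitWidth` / `getSGPRClassForBitWidth`); it is illustrative only, not the actual selector code.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the SIRegisterInfo lookups: the real selector
// returns TargetRegisterClass pointers, not names.
std::string getDefaultVectorSuperClassForBitWidth(unsigned Bits) {
  return "AV_" + std::to_string(Bits); // VGPR/AGPR superclass
}
std::string getSGPRClassForBitWidth(unsigned Bits) {
  return "SReg_" + std::to_string(Bits); // scalar register class
}

// Mirrors the patched logic: each build_vector element is 32 bits wide,
// and divergence picks the vector superclass over the SGPR class.
std::string selectBuildVectorClass(unsigned NumVectorElts, bool IsDivergent) {
  unsigned Bits = NumVectorElts * 32;
  return IsDivergent ? getDefaultVectorSuperClassForBitWidth(Bits)
                     : getSGPRClassForBitWidth(Bits);
}
```

For example, a divergent 2-element vector would select an `AV_64`-style superclass, while a uniform 4-element vector would stay in `SReg_128`-style scalar registers.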