AMDGPU: Add agpr variants of multi-data DS instructions #156420
LGTM
defm DS_READ_U16_D16_HI : DS_Real_gfx11_gfx12<0x0a7, "ds_load_u16_d16_hi">;
defm DS_WRITE_ADDTID_B32 : DS_Real_gfx11_gfx12<0x0b0, "ds_store_addtid_b32">;
defm DS_READ_ADDTID_B32 : DS_Real_gfx11_gfx12<0x0b1, "ds_load_addtid_b32">;
defm DS_WRITE_B8_D16_HI : DS_Real_gfx11_gfx12<0x0a0, "ds_store_b8_d16_hi", DS_WRITE_B8_D16_HI>;
I wish there were a way to avoid this.
The instruction definitions for loads and stores do not accurately model the operand constraints of loads and stores with AGPRs. They use AV register classes, plus a hack in getRegClass/getOpRegClass to avoid using AGPRs or AV classes with the multiple operand cases, but it did not consider the 3 operand case.
Model this correctly by using separate all-VGPR and all-AGPR variants for the cases with multiple data operands.
This does regress the assembler errors on gfx908 for the multi-operand cases. It now reports a generic "operand invalid for GPU" error instead of the specific message that AGPR loads and stores aren't supported.
In the future AMDGPURewriteAGPRCopyMFMA should be taught to replace the VGPR forms with the AGPR ones.
Most of the diff is fighting the DS pseudo structure. The mnemonic was being used as the key to SIMCInstr, which is a collision in the AGPR case. We also need to go out of our way to make sure we are using the gfx9+ variants of the pseudos without the m0 use. The DS multiclasses could use a lot of cleanup.
Fixes #155777
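For readers unfamiliar with why the multi-data forms need separate all-VGPR and all-AGPR variants, here is a minimal C++ sketch of the acc-bit selection that the DS_Real change in the diff expresses with nested !if. It is only a model, not code from the patch: OperandKind, isAGPREncoding, accForOperand, accBit and canShareAcc are hypothetical names, and the only encoding detail assumed is the one the patch itself relies on (bit 9 of the register encoding, as in vdst{9} and data0{9}, with 8-bit register fields). The case where an instruction has neither vdst nor data0 is left out for brevity.

```cpp
#include <cstdint>

// Hypothetical stand-in for the operand kinds the patch distinguishes:
// a plain VGPR class, a plain AGPR class, or an AV_* class that the
// register allocator may assign to either bank.
enum class OperandKind { VGPR, AGPR, AV };

// Assumed model: bit 9 of the register encoding marks an AGPR (the patch
// reads vdst{9} / data0{9}); the instruction field only holds the low 8 bits.
constexpr bool isAGPREncoding(std::uint16_t Enc) {
  return ((Enc >> 9) & 1) != 0;
}

// acc contribution of one operand: an AV operand defers to the register
// that was actually assigned, any other class decides statically.
constexpr bool accForOperand(OperandKind Kind, std::uint16_t Enc) {
  return Kind == OperandKind::AV ? isAGPREncoding(Enc)
                                 : Kind == OperandKind::AGPR;
}

// Mirrors the shape of the nested !if in DS_Real: the destination decides
// when present, otherwise data0 does.
constexpr bool accBit(bool HasVDst, OperandKind Dst, std::uint16_t DstEnc,
                      OperandKind Data0, std::uint16_t Data0Enc) {
  return HasVDst ? accForOperand(Dst, DstEnc)
                 : accForOperand(Data0, Data0Enc);
}

// Two 8-bit data fields share the single acc bit, so a mix of one AGPR and
// one VGPR data operand has no valid encoding.
constexpr bool canShareAcc(std::uint16_t Data0Enc, std::uint16_t Data1Enc) {
  return isAGPREncoding(Data0Enc) == isAGPREncoding(Data1Enc);
}
```

Under this model, an AV class on each of data0 and data1 would let the allocator pick registers for which canShareAcc is false, which is the situation the separate _agpr pseudos rule out by construction.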
@llvm/pr-subscribers-backend-amdgpu
Author: Matt Arsenault (arsenm)
Patch is 127.51 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/156420.diff
6 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/DSInstructions.td b/llvm/lib/Target/AMDGPU/DSInstructions.td
index e6a07ebe1cafb..7552326c39468 100644
--- a/llvm/lib/Target/AMDGPU/DSInstructions.td
+++ b/llvm/lib/Target/AMDGPU/DSInstructions.td
@@ -8,7 +8,7 @@
class DS_Pseudo <string opName, dag outs, dag ins, string asmOps, list<dag> pattern=[]> :
InstSI <outs, ins, "", pattern>,
- SIMCInstr <opName, SIEncodingFamily.NONE> {
+ SIMCInstr <NAME, SIEncodingFamily.NONE> {
let LGKM_CNT = 1;
let DS = 1;
@@ -51,6 +51,22 @@ class DS_Pseudo <string opName, dag outs, dag ins, string asmOps, list<dag> patt
let Uses = !if(has_m0_read, [M0, EXEC], [EXEC]);
}
+class DstOperandIsAV<dag OperandList> {
+ bit ret = OperandIsAV<!getdagarg<DAGOperand>(OperandList, "vdst")>.ret;
+}
+
+class DstOperandIsAGPR<dag OperandList> {
+ bit ret = OperandIsAGPR<!getdagarg<DAGOperand>(OperandList, "vdst")>.ret;
+}
+
+class DataOperandIsAV<dag OperandList> {
+ bit ret = OperandIsAV<!getdagarg<DAGOperand>(OperandList, "data0")>.ret;
+}
+
+class DataOperandIsAGPR<dag OperandList> {
+ bit ret = OperandIsAGPR<!getdagarg<DAGOperand>(OperandList, "data0")>.ret;
+}
+
class DS_Real <DS_Pseudo ps, string opName = ps.Mnemonic> :
InstSI <ps.OutOperandList, ps.InOperandList, opName # ps.AsmOperands>,
Enc64 {
@@ -91,8 +107,25 @@ class DS_Real <DS_Pseudo ps, string opName = ps.Mnemonic> :
let offset0 = !if(ps.has_offset, offset{7-0}, ?);
let offset1 = !if(ps.has_offset, offset{15-8}, ?);
- bits<1> acc = !if(ps.has_vdst, vdst{9},
- !if(!or(ps.has_data0, ps.has_gws_data0), data0{9}, 0));
+ // Figure out if we should set the acc bit. Simple load and store
+ // instructions with a single data operand can use AV_* classes, in
+ // which case the encoding comes from the assigned register field.
+
+ // For more compliated cases with multiple data operands, since the
+ // register fields are only 8-bit, so data operands must all be AGPR
+ // or VGPR.
+ defvar DstOpIsAV = !if(ps.has_vdst,
+ DstOperandIsAV<ps.OutOperandList>.ret, 0);
+ defvar DstOpIsAGPR = !if(ps.has_vdst,
+ DstOperandIsAGPR<ps.OutOperandList>.ret, 0);
+ defvar DataOpIsAV = !if(!or(ps.has_data0, ps.has_gws_data0),
+ DataOperandIsAV<ps.InOperandList>.ret, 0);
+ defvar DataOpIsAGPR = !if(!or(ps.has_data0, ps.has_gws_data0),
+ DataOperandIsAGPR<ps.InOperandList>.ret, 0);
+
+ bits<1> acc = !if(ps.has_vdst,
+ !if(DstOpIsAV, vdst{9}, DstOpIsAGPR),
+ !if(DataOpIsAV, data0{9}, DataOpIsAGPR));
}
// DS Pseudo instructions
@@ -143,8 +176,7 @@ multiclass DS_1A1D_NORET_mc_gfx9<string opName, RegisterClass rc = VGPR_32> {
}
}
-class DS_1A2D_NORET<string opName, RegisterClass rc = VGPR_32,
- RegisterOperand data_op = getLdStRegisterOperand<rc>.ret>
+class DS_1A2D_NORET<string opName, RegisterClass data_op = VGPR_32>
: DS_Pseudo<opName,
(outs),
(ins VGPR_32:$addr, data_op:$data0, data_op:$data1, Offset:$offset, gds:$gds),
@@ -159,11 +191,15 @@ multiclass DS_1A2D_NORET_mc<string opName, RegisterClass rc = VGPR_32> {
let has_m0_read = 0 in {
def _gfx9 : DS_1A2D_NORET<opName, rc>;
+
+ // All data operands are replaced with AGPRs in this form.
+ let SubtargetPredicate = isGFX90APlus in {
+ def _agpr : DS_1A2D_NORET<opName, getEquivalentAGPRClass<rc>.ret>;
+ }
}
}
-class DS_1A2D_Off8_NORET <string opName, RegisterClass rc = VGPR_32,
- RegisterOperand data_op = getLdStRegisterOperand<rc>.ret>
+class DS_1A2D_Off8_NORET <string opName, RegisterClass data_op = VGPR_32>
: DS_Pseudo<opName,
(outs),
(ins VGPR_32:$addr, data_op:$data0, data_op:$data1,
@@ -179,6 +215,10 @@ multiclass DS_1A2D_Off8_NORET_mc <string opName, RegisterClass rc = VGPR_32> {
let has_m0_read = 0 in {
def _gfx9 : DS_1A2D_Off8_NORET<opName, rc>;
+
+ let SubtargetPredicate = isGFX90APlus in {
+ def _agpr : DS_1A2D_Off8_NORET<opName, getEquivalentAGPRClass<rc>.ret>;
+ }
}
}
@@ -223,48 +263,47 @@ multiclass DS_1A1D_RET_mc_gfx9 <string opName, RegisterClass rc = VGPR_32> {
}
class DS_1A2D_RET<string opName,
- RegisterClass rc = VGPR_32,
- RegisterClass src = rc,
- RegisterOperand dst_op = getLdStRegisterOperand<rc>.ret,
- RegisterOperand src_op = getLdStRegisterOperand<src>.ret>
-: DS_Pseudo<opName,
- (outs dst_op:$vdst),
- (ins VGPR_32:$addr, src_op:$data0, src_op:$data1, Offset:$offset, gds:$gds),
+ RegisterClass dst_rc = VGPR_32,
+ RegisterClass src_rc = dst_rc>: DS_Pseudo<opName,
+ (outs dst_rc:$vdst),
+ (ins VGPR_32:$addr, src_rc:$data0, src_rc:$data1, Offset:$offset, gds:$gds),
" $vdst, $addr, $data0, $data1$offset$gds"> {
let IsAtomicRet = 1;
}
multiclass DS_1A2D_RET_mc<string opName,
- RegisterClass rc = VGPR_32,
- RegisterClass src = rc> {
- def "" : DS_1A2D_RET<opName, rc, src>;
+ RegisterClass dst_rc = VGPR_32,
+ RegisterClass src_rc = dst_rc> {
+ def "" : DS_1A2D_RET<opName, dst_rc, src_rc>;
let has_m0_read = 0 in {
- def _gfx9 : DS_1A2D_RET<opName, rc, src>;
+ def _gfx9 : DS_1A2D_RET<opName, dst_rc, src_rc>;
+ def _agpr : DS_1A2D_RET<opName, getEquivalentAGPRClass<dst_rc>.ret,
+ getEquivalentAGPRClass<src_rc>.ret>;
}
}
class DS_1A2D_Off8_RET<string opName,
- RegisterClass rc = VGPR_32,
- RegisterClass src = rc,
- RegisterOperand dst_op = getLdStRegisterOperand<rc>.ret,
- RegisterOperand src_op = getLdStRegisterOperand<src>.ret>
+ RegisterClass dst_rc = VGPR_32,
+ RegisterClass src_rc = dst_rc>
: DS_Pseudo<opName,
- (outs dst_op:$vdst),
- (ins VGPR_32:$addr, src_op:$data0, src_op:$data1, Offset0:$offset0, Offset1:$offset1, gds:$gds),
+ (outs dst_rc:$vdst),
+ (ins VGPR_32:$addr, src_rc:$data0, src_rc:$data1, Offset0:$offset0, Offset1:$offset1, gds:$gds),
" $vdst, $addr, $data0, $data1$offset0$offset1$gds"> {
let has_offset = 0;
}
multiclass DS_1A2D_Off8_RET_mc<string opName,
- RegisterClass rc = VGPR_32,
- RegisterClass src = rc> {
- def "" : DS_1A2D_Off8_RET<opName, rc, src>;
+ RegisterClass dst_rc = VGPR_32,
+ RegisterClass src_rc = dst_rc> {
+ def "" : DS_1A2D_Off8_RET<opName, dst_rc, src_rc>;
let has_m0_read = 0 in {
- def _gfx9 : DS_1A2D_Off8_RET<opName, rc, src>;
+ def _gfx9 : DS_1A2D_Off8_RET<opName, dst_rc, src_rc>;
+ def _agpr : DS_1A2D_Off8_RET<opName, getEquivalentAGPRClass<dst_rc>.ret,
+ getEquivalentAGPRClass<src_rc>.ret>;
}
}
@@ -305,7 +344,7 @@ multiclass DS_1A_RET_mc<string opName, RegisterClass rc = VGPR_32, bit HasTiedOu
}
}
-multiclass DS_1A_RET_t16<string opName, RegisterClass rc = VGPR_32, bit HasTiedOutput = 0, Operand ofs = Offset>
+multiclass DS_1A_RET_t16<string opName, RegisterClass rc = VGPR_32, bit HasTiedOutput = 0, Operand ofs = Offset>
: DS_1A_RET_mc<opName, rc, HasTiedOutput, ofs> {
let has_m0_read = 0 in {
let True16Predicate = UseRealTrue16Insts in {
@@ -1379,7 +1418,7 @@ multiclass DS_Real_gfx12<bits<8> op,
// Helper to avoid repeating the pseudo-name if we only need to set
// the gfx12 name.
multiclass DS_Real_gfx12_with_name<bits<8> op, string name> {
- defm "" : DS_Real_gfx12<op, !cast<DS_Pseudo>(NAME), name>;
+ defm "" : DS_Real_gfx12<op, !cast<DS_Pseudo>(NAME#"_gfx9"), name>;
}
defm DS_MIN_F32 : DS_Real_gfx12_with_name<0x012, "ds_min_num_f32">;
@@ -1405,8 +1444,8 @@ defm DS_LOAD_TR6_B96 : DS_Real_gfx12<0x0fb>;
defm DS_LOAD_TR16_B128 : DS_Real_gfx12<0x0fc>;
defm DS_LOAD_TR8_B64 : DS_Real_gfx12<0x0fd>;
-defm DS_BVH_STACK_RTN_B32 : DS_Real_gfx12_with_name<0x0e0,
- "ds_bvh_stack_push4_pop1_rtn_b32">;
+defm DS_BVH_STACK_RTN_B32 : DS_Real_gfx12<0x0e0, DS_BVH_STACK_RTN_B32,
+ "ds_bvh_stack_push4_pop1_rtn_b32">;
defm DS_BVH_STACK_PUSH8_POP1_RTN_B32 : DS_Real_gfx12<0x0e1>;
defm DS_BVH_STACK_PUSH8_POP2_RTN_B64 : DS_Real_gfx12<0x0e2>;
@@ -1434,7 +1473,7 @@ def : MnemonicAlias<"ds_load_tr_b128", "ds_load_tr16_b128">, Requires<[isGFX1250
// GFX11.
//===----------------------------------------------------------------------===//
-multiclass DS_Real_gfx11<bits<8> op, DS_Pseudo ps = !cast<DS_Pseudo>(NAME),
+multiclass DS_Real_gfx11<bits<8> op, DS_Pseudo ps = !cast<DS_Pseudo>(NAME#"_gfx9"),
string name = !tolower(NAME)> {
let AssemblerPredicate = isGFX11Only in {
let DecoderNamespace = "GFX11" in
@@ -1448,7 +1487,7 @@ multiclass DS_Real_gfx11<bits<8> op, DS_Pseudo ps = !cast<DS_Pseudo>(NAME),
multiclass DS_Real_gfx11_gfx12<bits<8> op,
string name = !tolower(NAME),
- DS_Pseudo ps = !cast<DS_Pseudo>(NAME)>
+ DS_Pseudo ps = !cast<DS_Pseudo>(NAME#"_gfx9")>
: DS_Real_gfx11<op, ps, name>,
DS_Real_gfx12<op, ps, name>;
@@ -1476,16 +1515,16 @@ defm DS_WRXCHG2ST64_RTN_B64 : DS_Real_gfx11_gfx12<0x06f, "ds_storexchg_2addr_str
defm DS_READ_B64 : DS_Real_gfx11_gfx12<0x076, "ds_load_b64">;
defm DS_READ2_B64 : DS_Real_gfx11_gfx12<0x077, "ds_load_2addr_b64">;
defm DS_READ2ST64_B64 : DS_Real_gfx11_gfx12<0x078, "ds_load_2addr_stride64_b64">;
-defm DS_WRITE_B8_D16_HI : DS_Real_gfx11_gfx12<0x0a0, "ds_store_b8_d16_hi">;
-defm DS_WRITE_B16_D16_HI : DS_Real_gfx11_gfx12<0x0a1, "ds_store_b16_d16_hi">;
-defm DS_READ_U8_D16 : DS_Real_gfx11_gfx12<0x0a2, "ds_load_u8_d16">;
-defm DS_READ_U8_D16_HI : DS_Real_gfx11_gfx12<0x0a3, "ds_load_u8_d16_hi">;
-defm DS_READ_I8_D16 : DS_Real_gfx11_gfx12<0x0a4, "ds_load_i8_d16">;
-defm DS_READ_I8_D16_HI : DS_Real_gfx11_gfx12<0x0a5, "ds_load_i8_d16_hi">;
-defm DS_READ_U16_D16 : DS_Real_gfx11_gfx12<0x0a6, "ds_load_u16_d16">;
-defm DS_READ_U16_D16_HI : DS_Real_gfx11_gfx12<0x0a7, "ds_load_u16_d16_hi">;
-defm DS_WRITE_ADDTID_B32 : DS_Real_gfx11_gfx12<0x0b0, "ds_store_addtid_b32">;
-defm DS_READ_ADDTID_B32 : DS_Real_gfx11_gfx12<0x0b1, "ds_load_addtid_b32">;
+defm DS_WRITE_B8_D16_HI : DS_Real_gfx11_gfx12<0x0a0, "ds_store_b8_d16_hi", DS_WRITE_B8_D16_HI>;
+defm DS_WRITE_B16_D16_HI : DS_Real_gfx11_gfx12<0x0a1, "ds_store_b16_d16_hi", DS_WRITE_B16_D16_HI>;
+defm DS_READ_U8_D16 : DS_Real_gfx11_gfx12<0x0a2, "ds_load_u8_d16", DS_READ_U8_D16>;
+defm DS_READ_U8_D16_HI : DS_Real_gfx11_gfx12<0x0a3, "ds_load_u8_d16_hi", DS_READ_U8_D16_HI>;
+defm DS_READ_I8_D16 : DS_Real_gfx11_gfx12<0x0a4, "ds_load_i8_d16", DS_READ_I8_D16>;
+defm DS_READ_I8_D16_HI : DS_Real_gfx11_gfx12<0x0a5, "ds_load_i8_d16_hi", DS_READ_I8_D16_HI>;
+defm DS_READ_U16_D16 : DS_Real_gfx11_gfx12<0x0a6, "ds_load_u16_d16", DS_READ_U16_D16>;
+defm DS_READ_U16_D16_HI : DS_Real_gfx11_gfx12<0x0a7, "ds_load_u16_d16_hi", DS_READ_U16_D16_HI>;
+defm DS_WRITE_ADDTID_B32 : DS_Real_gfx11_gfx12<0x0b0, "ds_store_addtid_b32", DS_WRITE_ADDTID_B32>;
+defm DS_READ_ADDTID_B32 : DS_Real_gfx11_gfx12<0x0b1, "ds_load_addtid_b32", DS_READ_ADDTID_B32>;
defm DS_WRITE_B96 : DS_Real_gfx11_gfx12<0x0de, "ds_store_b96">;
defm DS_WRITE_B128 : DS_Real_gfx11_gfx12<0x0df, "ds_store_b128">;
defm DS_READ_B96 : DS_Real_gfx11_gfx12<0x0fe, "ds_load_b96">;
@@ -1505,22 +1544,22 @@ defm DS_CMPSTORE_RTN_B64 : DS_Real_gfx11_gfx12<0x070>;
defm DS_CMPSTORE_RTN_F64 : DS_Real_gfx11<0x071>;
defm DS_ADD_RTN_F32 : DS_Real_gfx11_gfx12<0x079>;
-defm DS_ADD_GS_REG_RTN : DS_Real_gfx11<0x07a>;
-defm DS_SUB_GS_REG_RTN : DS_Real_gfx11<0x07b>;
-defm DS_BVH_STACK_RTN_B32 : DS_Real_gfx11<0x0ad>;
+defm DS_ADD_GS_REG_RTN : DS_Real_gfx11<0x07a, DS_ADD_GS_REG_RTN>;
+defm DS_SUB_GS_REG_RTN : DS_Real_gfx11<0x07b, DS_SUB_GS_REG_RTN>;
+defm DS_BVH_STACK_RTN_B32 : DS_Real_gfx11<0x0ad, DS_BVH_STACK_RTN_B32>;
//===----------------------------------------------------------------------===//
// GFX10.
//===----------------------------------------------------------------------===//
let AssemblerPredicate = isGFX10Only, DecoderNamespace = "GFX10" in {
- multiclass DS_Real_gfx10<bits<8> op> {
+ multiclass DS_Real_gfx10<bits<8> op, DS_Pseudo ps = !cast<DS_Pseudo>(NAME)> {
def _gfx10 : Base_DS_Real_gfx6_gfx7_gfx10_gfx11_gfx12<op,
- !cast<DS_Pseudo>(NAME), SIEncodingFamily.GFX10>;
+ ps, SIEncodingFamily.GFX10>;
}
} // End AssemblerPredicate = isGFX10Only, DecoderNamespace = "GFX10"
-defm DS_ADD_RTN_F32 : DS_Real_gfx10<0x055>;
+defm DS_ADD_RTN_F32 : DS_Real_gfx10<0x055, DS_ADD_RTN_F32_gfx9>;
defm DS_WRITE_B8_D16_HI : DS_Real_gfx10<0x0a0>;
defm DS_WRITE_B16_D16_HI : DS_Real_gfx10<0x0a1>;
defm DS_READ_U8_D16 : DS_Real_gfx10<0x0a2>;
@@ -1536,39 +1575,48 @@ defm DS_READ_ADDTID_B32 : DS_Real_gfx10<0x0b1>;
// GFX10, GFX11, GFX12.
//===----------------------------------------------------------------------===//
-multiclass DS_Real_gfx10_gfx11_gfx12<bits<8> op> :
- DS_Real_gfx10<op>, DS_Real_gfx11<op>, DS_Real_gfx12<op>;
+multiclass DS_Real_gfx10_gfx11_gfx12<bits<8> op, DS_Pseudo ps = !cast<DS_Pseudo>(NAME#"_gfx9")> :
+ DS_Real_gfx10<op, ps>,
+ DS_Real_gfx11<op, ps>,
+ DS_Real_gfx12<op, ps>;
-multiclass DS_Real_gfx10_gfx11<bits<8> op> :
- DS_Real_gfx10<op>, DS_Real_gfx11<op>;
+multiclass DS_Real_gfx10_gfx11<bits<8> op, DS_Pseudo ps = !cast<DS_Pseudo>(NAME#"_gfx9")> :
+ DS_Real_gfx10<op, ps>, DS_Real_gfx11<op, ps>;
defm DS_ADD_F32 : DS_Real_gfx10_gfx11_gfx12<0x015>;
defm DS_ADD_SRC2_F32 : DS_Real_gfx10<0x095>;
-defm DS_PERMUTE_B32 : DS_Real_gfx10_gfx11_gfx12<0x0b2>;
-defm DS_BPERMUTE_B32 : DS_Real_gfx10_gfx11_gfx12<0x0b3>;
+defm DS_PERMUTE_B32 : DS_Real_gfx10_gfx11_gfx12<0x0b2, DS_PERMUTE_B32>;
+defm DS_BPERMUTE_B32 : DS_Real_gfx10_gfx11_gfx12<0x0b3, DS_BPERMUTE_B32>;
//===----------------------------------------------------------------------===//
// GFX7, GFX10, GFX11, GFX12.
//===----------------------------------------------------------------------===//
let AssemblerPredicate = isGFX7Only, DecoderNamespace = "GFX7" in {
- multiclass DS_Real_gfx7<bits<8> op> {
+ multiclass DS_Real_gfx7<bits<8> op, DS_Pseudo ps> {
def _gfx7 : Base_DS_Real_gfx6_gfx7_gfx10_gfx11_gfx12<op,
- !cast<DS_Pseudo>(NAME), SIEncodingFamily.SI>;
+ ps, SIEncodingFamily.SI>;
}
} // End AssemblerPredicate = isGFX7Only, DecoderNamespace = "GFX7"
-multiclass DS_Real_gfx7_gfx10_gfx11_gfx12<bits<8> op> :
- DS_Real_gfx7<op>, DS_Real_gfx10_gfx11_gfx12<op>;
+multiclass DS_Real_gfx7_gfx10_gfx11_gfx12<bits<8> op,
+ DS_Pseudo ps_gfx6 = !cast<DS_Pseudo>(NAME),
+ DS_Pseudo ps_gfx9 = !cast<DS_Pseudo>(NAME#"_gfx9")> :
+ DS_Real_gfx7<op, ps_gfx6>,
+ DS_Real_gfx10_gfx11_gfx12<op, ps_gfx9>;
-multiclass DS_Real_gfx7_gfx10_gfx11<bits<8> op> :
- DS_Real_gfx7<op>, DS_Real_gfx10_gfx11<op>;
+multiclass DS_Real_gfx7_gfx10_gfx11<bits<8> op,
+ DS_Pseudo ps_gfx6 = !cast<DS_Pseudo>(NAME),
+ DS_Pseudo ps_gfx9 = !cast<DS_Pseudo>(NAME#"_gfx9")> :
+ DS_Real_gfx7<op, ps_gfx6>, DS_Real_gfx10_gfx11<op, ps_gfx9>;
-multiclass DS_Real_gfx7_gfx10<bits<8> op> :
- DS_Real_gfx7<op>, DS_Real_gfx10<op>;
+multiclass DS_Real_gfx7_gfx10<bits<8> op,
+ DS_Pseudo ps_gfx6 = !cast<DS_Pseudo>(NAME),
+ DS_Pseudo ps_gfx9 = !cast<DS_Pseudo>(NAME#"_gfx9")> :
+ DS_Real_gfx7<op, ps_gfx6>, DS_Real_gfx10<op, ps_gfx9>;
// FIXME-GFX7: Add tests when upstreaming this part.
-defm DS_GWS_SEMA_RELEASE_ALL : DS_Real_gfx7_gfx10_gfx11<0x018>;
+defm DS_GWS_SEMA_RELEASE_ALL : DS_Real_gfx7_gfx10_gfx11<0x018, DS_GWS_SEMA_RELEASE_ALL, DS_GWS_SEMA_RELEASE_ALL>;
defm DS_WRAP_RTN_B32 : DS_Real_gfx7_gfx10_gfx11<0x034>;
defm DS_CONDXCHG32_RTN_B64 : DS_Real_gfx7_gfx10_gfx11_gfx12<0x07e>;
defm DS_WRITE_B96 : DS_Real_gfx7_gfx10<0x0de>;
@@ -1581,20 +1629,27 @@ defm DS_READ_B128 : DS_Real_gfx7_gfx10<0x0ff>;
//===----------------------------------------------------------------------===//
let AssemblerPredicate = isGFX6GFX7, DecoderNamespace = "GFX6GFX7" in {
- multiclass DS_Real_gfx6_gfx7<bits<8> op> {
+ multiclass DS_Real_gfx6_gfx7<bits<8> op, DS_Pseudo ps> {
def _gfx6_gfx7 : Base_DS_Real_gfx6_gfx7_gfx10_gfx11_gfx12<op,
- !cast<DS_Pseudo>(NAME), SIEncodingFamily.SI>;
+ ps, SIEncodingFamily.SI>;
}
} // End AssemblerPredicate = isGFX6GFX7, DecoderNamespace = "GFX6GFX7"
-multiclass DS_Real_gfx6_gfx7_gfx10_gfx11_gfx12<bits<8> op> :
- DS_Real_gfx6_gfx7<op>, DS_Real_gfx10_gfx11_gfx12<op>;
+multiclass DS_Real_gfx6_gfx7_gfx10_gfx11_gfx12<bits<8> op,
+ DS_Pseudo ps_gfx6 = !cast<DS_Pseudo>(NAME),
+ DS_Pseudo ps_gfx9 = !cast<DS_Pseudo>(NAME#"_gfx9")> :
+ DS_Real_gfx6_gfx7<op, ps_gfx6>,
+ DS_Real_gfx10_gfx11_gfx12<op, ps_gfx9>;
-multiclass DS_Real_gfx6_gfx7_gfx10_gfx11<bits<8> op> :
- DS_Real_gfx6_gfx7<op>, DS_Real_gfx10_gfx11<op>;
+multiclass DS_Real_gfx6_gfx7_gfx10_gfx11<bits<8> op,
+ DS_Pseudo ps_gfx6 = !cast<DS_Pseudo>(NAME),
+ DS_Pseudo ps_gfx9 = !cast<DS_Pseudo>(NAME#"_gfx9")> :
+ DS_Real_gfx6_gfx7<op, ps_gfx6>, DS_Real_gfx10_gfx11<op, ps_gfx9>;
-multiclass DS_Real_gfx6_gfx7_gfx10<bits<8> op> :
- DS_Real_gfx6_gfx7<op>, DS_Real_gfx10<op>;
+multiclass DS_Real_gfx6_gfx7_gfx10<bits<8> op,
+ DS_Pseudo ps_gfx6 = !cast<DS_Pseudo>(NAME),
+ DS_Pseudo ps_gfx9 = !cast<DS_Pseudo>(NAME#"_gfx9")> :
+ DS_Real_gfx6_gfx7<op, ps_gfx6>, DS_Real_gfx10<op, ps_gfx9>;
defm DS_ADD_U32 : DS_Real_gfx6_gfx7_gfx10_gfx11_gfx12<0x000>;
defm DS_SUB_U32 : DS_Real_gfx6_gfx7_gfx10_gfx11_gfx12<0x001>;
@@ -1618,12 +1673,12 @@ defm DS_CMPST_F32 : DS_Real_gfx6_gfx7_gfx10<0x011>;
defm DS_MIN_F32 : DS_Real_gfx6_gfx7_gfx10_gfx11<0x012>;
defm DS_MAX_F32 : DS_Real_gfx6_gfx7_gfx10_gfx11<0x013>;
-defm DS_NOP : DS_Real_gfx6_gfx7_gfx10_gfx11_gfx12<0x014>;
-defm DS_GWS_INIT : DS_Real_gfx6_gfx7_gfx10_gfx11<0x019>;
-defm DS_GWS_SEMA_V : DS_Real_gfx6_gfx7_gfx10_gfx11<0x01a>;
-defm DS_GWS_SEMA_BR : DS_Real_gfx6_gfx7_gfx10_gfx11<0x01b>;
-defm DS_GWS_SEMA_P : DS_Real_gfx6_gfx7_gfx10_gfx11<0x01c>;
-defm DS_GWS_BARRIER : DS_Real_gfx6_gfx7_gfx10_gfx11<0x01d>;
+defm DS_NOP : DS_Real_gfx6_gfx7_gfx10_gfx11_gfx12<0x014, DS_NOP, DS_NOP>;
+defm DS_GWS_INIT : DS_Real_gfx6_gfx7_gfx10_gfx11<0x019, DS_GWS_INIT, DS_GWS_INIT>;
+defm DS_GWS_SEMA_V : DS_Real_gfx6_gfx7_gfx10_gfx11<0x01a, DS_GWS_SEMA_V, DS_GWS_SEMA_V>;
+defm DS_GWS_SEMA_BR : DS_Real_gfx6_gfx7_gfx10_gfx11<0x01b, DS_GWS_SEMA_BR, DS_GWS_SEMA_BR>;
+defm DS_GWS_SEMA_P : DS_Real_gfx6_gfx7_gfx10_gfx11<0x01c, DS_GWS_SEMA_P, DS_GWS_SEMA_P>;
+defm DS_GWS_BARRIER : DS_Real_gfx6_gfx7_gfx10_gfx11<0x01d, DS_GWS_BARRIER, DS_GWS_BARRIER>;
defm DS_WRITE_B8 : DS_Real_gfx6_gfx7_gfx10<0x01e>;
defm DS_WRITE_B16 : DS_Real_gfx6_gfx7_gfx10<0x01f>;
@@ -1650,7 +1705,7 @@ defm DS_CMPST_RTN_F32 : DS_Real_gfx6_gfx7_gfx10<0x031>;
defm DS_MIN_RTN_F32 : DS_Real_gfx6_gfx7_gfx10_gfx11<0x032>;
defm DS_MAX_RTN_F32 : DS_Real_gfx6_gfx7_gfx10_gfx11<0x033>;
-defm DS_SWIZZLE_B32 : DS_Real_gfx6_gfx7_gfx10_gfx11_gfx12<0x035>;
+defm DS_SWIZZLE_B32 : DS_Real_gfx6_gfx7_gfx10_gfx11_gfx12<0x035, DS_SWIZZLE_B32, DS_SWIZZLE_B32>;
defm DS_READ_B32 : DS_Real_gfx6_gfx7_gfx10<0x036>;
defm DS_READ2_B32 : DS_Real_gfx6_gfx7_gfx10<0x037>;
@@ -1660,9 +1715,9 @@ defm DS_READ_U8 : DS_Real_gfx6_gfx7_gfx10<0x03a>;
defm DS_READ_I16 : DS_Real_gfx6_gfx7_gfx10<0x03b>;
defm DS_READ_U16 : DS_Real_gfx6_gfx7_gfx10<0x03c>;
-defm DS_CONSUME ...
[truncated]