@arsenm arsenm commented Sep 2, 2025

The instruction definitions for loads and stores do not
accurately model the operand constraints of loads and stores
with AGPRs. They use AV register classes, plus a hack in
getRegClass/getOpRegClass to avoid using AGPRs or AV classes
in the multiple-operand cases, but that hack did not consider
the 3-operand case.

Model this correctly by using separate all-VGPR and all-AGPR
variants for the cases with multiple data operands.
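
As a rough illustration of why the multi-data forms need all-VGPR or all-AGPR variants, here is a small Python model (not TableGen) of how the single `acc` bit is chosen, loosely following the `!if` chain in the `DS_Real` hunk of the diff. `RegKind`, `acc_from_operand`, `ds_acc_bit`, and `AGPR_FLAG` are hypothetical names for this sketch, and treating bit 9 of the encoded register as the AGPR marker is an assumption read off `vdst{9}` / `data0{9}`.

```python
from enum import Enum
from typing import Optional


class RegKind(Enum):
    """Which registers an operand's class admits (names are illustrative)."""
    VGPR = "vgpr"  # VGPRs only
    AGPR = "agpr"  # AGPRs only
    AV = "av"      # either; decided by the allocated register


def acc_from_operand(kind: Optional[RegKind], encoded_reg: int) -> int:
    """Return the acc bit implied by one operand.

    For an AV operand the bit comes from the assigned register (bit 9 of
    its encoding, assumed here). For fixed classes it is a constant.
    """
    if kind is None:  # instruction has no such operand
        return 0
    if kind is RegKind.AV:
        return (encoded_reg >> 9) & 1
    return 1 if kind is RegKind.AGPR else 0


def ds_acc_bit(dst: Optional[RegKind], data0: Optional[RegKind],
               vdst_enc: int = 0, data0_enc: int = 0) -> int:
    """The destination decides when present; otherwise data0 does."""
    if dst is not None:
        return acc_from_operand(dst, vdst_enc)
    return acc_from_operand(data0, data0_enc)


AGPR_FLAG = 1 << 9  # assumed position of the AGPR marker bit

# Single-data store through an AV operand: the allocated register decides.
print(ds_acc_bit(None, RegKind.AV, data0_enc=AGPR_FLAG | 5))
# Two-data store, all-AGPR variant: the bit is fixed by the variant itself.
print(ds_acc_bit(None, RegKind.AGPR))
```

Since there is only one `acc` bit and the register fields are 8-bit, a two-data form cannot mix a VGPR and an AGPR, which is the constraint the separate variants encode.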

This does regress the assembler errors on gfx908 for the
multi-operand cases. The assembler now reports a generic
invalid-operand error for the GPU instead of the specific message
that AGPR loads and stores aren't supported.

In the future AMDGPURewriteAGPRCopyMFMA should be taught
to replace the VGPR forms with the AGPR ones.

Most of the diff is fighting the DS pseudo structure. The
mnemonic was being used as the key to SIMCInstr, which collides
in the AGPR case, where the VGPR and AGPR pseudos share a
mnemonic. We also need to go out of our way
to make sure we are using the gfx9+ variants of the pseudos
without the m0 use. The DS multiclasses could use a lot of
cleanup.
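
The key collision can be illustrated with a toy lookup table in Python. This is only a sketch: the real mapping is generated by TableGen, and the record layout, `build_table`, and the two example pseudo names below are assumptions made for illustration.

```python
def build_table(pseudos, key_of):
    """Index pseudos by key_of(p), recording any key that is claimed twice."""
    table, collisions = {}, []
    for p in pseudos:
        key = key_of(p)
        if key in table:
            collisions.append((table[key]["name"], p["name"]))
        else:
            table[key] = p
    return table, collisions


# Two hypothetical pseudos that share one mnemonic but differ in register class.
pseudos = [
    {"name": "DS_WRITE2_B32_gfx9", "mnemonic": "ds_write2_b32"},
    {"name": "DS_WRITE2_B32_agpr", "mnemonic": "ds_write2_b32"},
]

_, by_mnemonic = build_table(pseudos, lambda p: p["mnemonic"])
_, by_name = build_table(pseudos, lambda p: p["name"])

print("keyed by mnemonic:", by_mnemonic)
print("keyed by record name:", by_name)
```

Keying on the record name, as `SIMCInstr <NAME, ...>` does in the diff, keeps the two variants distinct.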

Fixes #155777

@rampitec rampitec left a comment

LGTM

defm DS_READ_U16_D16_HI : DS_Real_gfx11_gfx12<0x0a7, "ds_load_u16_d16_hi">;
defm DS_WRITE_ADDTID_B32 : DS_Real_gfx11_gfx12<0x0b0, "ds_store_addtid_b32">;
defm DS_READ_ADDTID_B32 : DS_Real_gfx11_gfx12<0x0b1, "ds_load_addtid_b32">;
defm DS_WRITE_B8_D16_HI : DS_Real_gfx11_gfx12<0x0a0, "ds_store_b8_d16_hi", DS_WRITE_B8_D16_HI>;

I wish there were a way to avoid this.

@arsenm arsenm force-pushed the users/arsenm/amdgpu/define-agpr-variants-ds-write2-insts branch from 76a6947 to 570dc4b Compare September 3, 2025 00:04
@arsenm arsenm force-pushed the users/arsenm/amdgpu/fix-true16-d16-table-pseudo-entry branch from 849b10b to 21fa4b7 Compare September 3, 2025 00:04
Base automatically changed from users/arsenm/amdgpu/fix-true16-d16-table-pseudo-entry to main September 3, 2025 00:45
@arsenm arsenm force-pushed the users/arsenm/amdgpu/define-agpr-variants-ds-write2-insts branch from 570dc4b to fe4b601 Compare September 3, 2025 03:39
llvmbot commented Sep 3, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

Patch is 127.51 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/156420.diff

6 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/DSInstructions.td (+366-284)
  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.td (+11)
  • (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.td (+32)
  • (modified) llvm/test/CodeGen/AMDGPU/a-v-ds-atomic-cmpxchg.ll (+103-40)
  • (modified) llvm/test/MC/AMDGPU/gfx90a_err.s (+3-3)
  • (modified) llvm/test/MC/AMDGPU/gfx90a_ldst_acc.s (+182-182)
diff --git a/llvm/lib/Target/AMDGPU/DSInstructions.td b/llvm/lib/Target/AMDGPU/DSInstructions.td
index e6a07ebe1cafb..7552326c39468 100644
--- a/llvm/lib/Target/AMDGPU/DSInstructions.td
+++ b/llvm/lib/Target/AMDGPU/DSInstructions.td
@@ -8,7 +8,7 @@
 
 class DS_Pseudo <string opName, dag outs, dag ins, string asmOps, list<dag> pattern=[]> :
   InstSI <outs, ins, "", pattern>,
-  SIMCInstr <opName, SIEncodingFamily.NONE> {
+  SIMCInstr <NAME, SIEncodingFamily.NONE> {
 
   let LGKM_CNT = 1;
   let DS = 1;
@@ -51,6 +51,22 @@ class DS_Pseudo <string opName, dag outs, dag ins, string asmOps, list<dag> patt
   let Uses = !if(has_m0_read, [M0, EXEC], [EXEC]);
 }
 
+class DstOperandIsAV<dag OperandList> {
+  bit ret = OperandIsAV<!getdagarg<DAGOperand>(OperandList, "vdst")>.ret;
+}
+
+class DstOperandIsAGPR<dag OperandList> {
+  bit ret = OperandIsAGPR<!getdagarg<DAGOperand>(OperandList, "vdst")>.ret;
+}
+
+class DataOperandIsAV<dag OperandList> {
+  bit ret = OperandIsAV<!getdagarg<DAGOperand>(OperandList, "data0")>.ret;
+}
+
+class DataOperandIsAGPR<dag OperandList> {
+  bit ret = OperandIsAGPR<!getdagarg<DAGOperand>(OperandList, "data0")>.ret;
+}
+
 class DS_Real <DS_Pseudo ps, string opName = ps.Mnemonic> :
   InstSI <ps.OutOperandList, ps.InOperandList, opName # ps.AsmOperands>,
   Enc64 {
@@ -91,8 +107,25 @@ class DS_Real <DS_Pseudo ps, string opName = ps.Mnemonic> :
   let offset0 = !if(ps.has_offset, offset{7-0}, ?);
   let offset1 = !if(ps.has_offset, offset{15-8}, ?);
 
-  bits<1> acc = !if(ps.has_vdst, vdst{9},
-                    !if(!or(ps.has_data0, ps.has_gws_data0), data0{9}, 0));
+  // Figure out if we should set the acc bit. Simple load and store
+  // instructions with a single data operand can use AV_* classes, in
+  // which case the encoding comes from the assigned register field.
+
+  // For more complicated cases with multiple data operands, the
+  // register fields are only 8-bit, so data operands must all be AGPR
+  // or VGPR.
+  defvar DstOpIsAV = !if(ps.has_vdst,
+                         DstOperandIsAV<ps.OutOperandList>.ret, 0);
+  defvar DstOpIsAGPR = !if(ps.has_vdst,
+                           DstOperandIsAGPR<ps.OutOperandList>.ret, 0);
+  defvar DataOpIsAV = !if(!or(ps.has_data0, ps.has_gws_data0),
+                          DataOperandIsAV<ps.InOperandList>.ret, 0);
+  defvar DataOpIsAGPR = !if(!or(ps.has_data0, ps.has_gws_data0),
+                            DataOperandIsAGPR<ps.InOperandList>.ret, 0);
+
+  bits<1> acc = !if(ps.has_vdst,
+                    !if(DstOpIsAV, vdst{9}, DstOpIsAGPR),
+                    !if(DataOpIsAV, data0{9}, DataOpIsAGPR));
 }
 
 // DS Pseudo instructions
@@ -143,8 +176,7 @@ multiclass DS_1A1D_NORET_mc_gfx9<string opName, RegisterClass rc = VGPR_32> {
   }
 }
 
-class DS_1A2D_NORET<string opName, RegisterClass rc = VGPR_32,
-                    RegisterOperand data_op = getLdStRegisterOperand<rc>.ret>
+class DS_1A2D_NORET<string opName, RegisterClass data_op = VGPR_32>
 : DS_Pseudo<opName,
   (outs),
   (ins VGPR_32:$addr, data_op:$data0, data_op:$data1, Offset:$offset, gds:$gds),
@@ -159,11 +191,15 @@ multiclass DS_1A2D_NORET_mc<string opName, RegisterClass rc = VGPR_32> {
 
   let has_m0_read = 0 in {
     def _gfx9 : DS_1A2D_NORET<opName, rc>;
+
+    // All data operands are replaced with AGPRs in this form.
+    let SubtargetPredicate = isGFX90APlus in {
+      def _agpr : DS_1A2D_NORET<opName, getEquivalentAGPRClass<rc>.ret>;
+    }
   }
 }
 
-class DS_1A2D_Off8_NORET <string opName, RegisterClass rc = VGPR_32,
-                          RegisterOperand data_op = getLdStRegisterOperand<rc>.ret>
+class DS_1A2D_Off8_NORET <string opName, RegisterClass data_op = VGPR_32>
 : DS_Pseudo<opName,
   (outs),
   (ins VGPR_32:$addr, data_op:$data0, data_op:$data1,
@@ -179,6 +215,10 @@ multiclass DS_1A2D_Off8_NORET_mc <string opName, RegisterClass rc = VGPR_32> {
 
   let has_m0_read = 0 in {
     def _gfx9 : DS_1A2D_Off8_NORET<opName, rc>;
+
+    let SubtargetPredicate = isGFX90APlus in {
+      def _agpr : DS_1A2D_Off8_NORET<opName, getEquivalentAGPRClass<rc>.ret>;
+    }
   }
 }
 
@@ -223,48 +263,47 @@ multiclass DS_1A1D_RET_mc_gfx9 <string opName, RegisterClass rc = VGPR_32> {
 }
 
 class DS_1A2D_RET<string opName,
-                  RegisterClass rc = VGPR_32,
-                  RegisterClass src = rc,
-                  RegisterOperand dst_op = getLdStRegisterOperand<rc>.ret,
-                  RegisterOperand src_op = getLdStRegisterOperand<src>.ret>
-: DS_Pseudo<opName,
-  (outs dst_op:$vdst),
-  (ins VGPR_32:$addr, src_op:$data0, src_op:$data1, Offset:$offset, gds:$gds),
+                  RegisterClass dst_rc = VGPR_32,
+                  RegisterClass src_rc = dst_rc>: DS_Pseudo<opName,
+  (outs dst_rc:$vdst),
+  (ins VGPR_32:$addr, src_rc:$data0, src_rc:$data1, Offset:$offset, gds:$gds),
   " $vdst, $addr, $data0, $data1$offset$gds"> {
 
   let IsAtomicRet = 1;
 }
 
 multiclass DS_1A2D_RET_mc<string opName,
-                          RegisterClass rc = VGPR_32,
-                          RegisterClass src = rc> {
-  def "" : DS_1A2D_RET<opName, rc, src>;
+                          RegisterClass dst_rc = VGPR_32,
+                          RegisterClass src_rc = dst_rc> {
+  def "" : DS_1A2D_RET<opName, dst_rc, src_rc>;
 
   let has_m0_read = 0 in {
-    def _gfx9 : DS_1A2D_RET<opName, rc, src>;
+    def _gfx9 : DS_1A2D_RET<opName, dst_rc, src_rc>;
+    def _agpr : DS_1A2D_RET<opName, getEquivalentAGPRClass<dst_rc>.ret,
+                                    getEquivalentAGPRClass<src_rc>.ret>;
   }
 }
 
 class DS_1A2D_Off8_RET<string opName,
-                       RegisterClass rc = VGPR_32,
-                       RegisterClass src = rc,
-                       RegisterOperand dst_op = getLdStRegisterOperand<rc>.ret,
-                       RegisterOperand src_op = getLdStRegisterOperand<src>.ret>
+                       RegisterClass dst_rc = VGPR_32,
+                       RegisterClass src_rc = dst_rc>
 : DS_Pseudo<opName,
-  (outs dst_op:$vdst),
-  (ins VGPR_32:$addr, src_op:$data0, src_op:$data1, Offset0:$offset0, Offset1:$offset1, gds:$gds),
+  (outs dst_rc:$vdst),
+  (ins VGPR_32:$addr, src_rc:$data0, src_rc:$data1, Offset0:$offset0, Offset1:$offset1, gds:$gds),
   " $vdst, $addr, $data0, $data1$offset0$offset1$gds"> {
 
   let has_offset = 0;
 }
 
 multiclass DS_1A2D_Off8_RET_mc<string opName,
-                               RegisterClass rc = VGPR_32,
-                               RegisterClass src = rc> {
-  def "" : DS_1A2D_Off8_RET<opName, rc, src>;
+                               RegisterClass dst_rc = VGPR_32,
+                               RegisterClass src_rc = dst_rc> {
+  def "" : DS_1A2D_Off8_RET<opName, dst_rc, src_rc>;
 
   let has_m0_read = 0 in {
-    def _gfx9 : DS_1A2D_Off8_RET<opName, rc, src>;
+    def _gfx9 : DS_1A2D_Off8_RET<opName, dst_rc, src_rc>;
+    def _agpr : DS_1A2D_Off8_RET<opName, getEquivalentAGPRClass<dst_rc>.ret,
+                                         getEquivalentAGPRClass<src_rc>.ret>;
   }
 }
 
@@ -305,7 +344,7 @@ multiclass DS_1A_RET_mc<string opName, RegisterClass rc = VGPR_32, bit HasTiedOu
   }
 }
 
-multiclass DS_1A_RET_t16<string opName, RegisterClass rc = VGPR_32, bit HasTiedOutput = 0, Operand ofs = Offset> 
+multiclass DS_1A_RET_t16<string opName, RegisterClass rc = VGPR_32, bit HasTiedOutput = 0, Operand ofs = Offset>
 : DS_1A_RET_mc<opName, rc, HasTiedOutput, ofs> {
   let has_m0_read = 0 in {
     let True16Predicate = UseRealTrue16Insts in {
@@ -1379,7 +1418,7 @@ multiclass DS_Real_gfx12<bits<8> op,
 // Helper to avoid repeating the pseudo-name if we only need to set
 // the gfx12 name.
 multiclass DS_Real_gfx12_with_name<bits<8> op, string name> {
-  defm "" : DS_Real_gfx12<op, !cast<DS_Pseudo>(NAME), name>;
+  defm "" : DS_Real_gfx12<op, !cast<DS_Pseudo>(NAME#"_gfx9"), name>;
 }
 
 defm DS_MIN_F32           : DS_Real_gfx12_with_name<0x012, "ds_min_num_f32">;
@@ -1405,8 +1444,8 @@ defm DS_LOAD_TR6_B96      : DS_Real_gfx12<0x0fb>;
 defm DS_LOAD_TR16_B128    : DS_Real_gfx12<0x0fc>;
 defm DS_LOAD_TR8_B64      : DS_Real_gfx12<0x0fd>;
 
-defm DS_BVH_STACK_RTN_B32             : DS_Real_gfx12_with_name<0x0e0,
-  "ds_bvh_stack_push4_pop1_rtn_b32">;
+defm DS_BVH_STACK_RTN_B32 : DS_Real_gfx12<0x0e0, DS_BVH_STACK_RTN_B32,
+                                          "ds_bvh_stack_push4_pop1_rtn_b32">;
 defm DS_BVH_STACK_PUSH8_POP1_RTN_B32  : DS_Real_gfx12<0x0e1>;
 defm DS_BVH_STACK_PUSH8_POP2_RTN_B64  : DS_Real_gfx12<0x0e2>;
 
@@ -1434,7 +1473,7 @@ def : MnemonicAlias<"ds_load_tr_b128", "ds_load_tr16_b128">, Requires<[isGFX1250
 // GFX11.
 //===----------------------------------------------------------------------===//
 
-multiclass DS_Real_gfx11<bits<8> op, DS_Pseudo ps = !cast<DS_Pseudo>(NAME),
+multiclass DS_Real_gfx11<bits<8> op, DS_Pseudo ps = !cast<DS_Pseudo>(NAME#"_gfx9"),
                                      string name = !tolower(NAME)> {
   let AssemblerPredicate = isGFX11Only in {
     let DecoderNamespace = "GFX11" in
@@ -1448,7 +1487,7 @@ multiclass DS_Real_gfx11<bits<8> op, DS_Pseudo ps = !cast<DS_Pseudo>(NAME),
 
 multiclass DS_Real_gfx11_gfx12<bits<8> op,
                                string name = !tolower(NAME),
-                               DS_Pseudo ps = !cast<DS_Pseudo>(NAME)>
+                               DS_Pseudo ps = !cast<DS_Pseudo>(NAME#"_gfx9")>
   : DS_Real_gfx11<op, ps, name>,
     DS_Real_gfx12<op, ps, name>;
 
@@ -1476,16 +1515,16 @@ defm DS_WRXCHG2ST64_RTN_B64 : DS_Real_gfx11_gfx12<0x06f, "ds_storexchg_2addr_str
 defm DS_READ_B64            : DS_Real_gfx11_gfx12<0x076, "ds_load_b64">;
 defm DS_READ2_B64           : DS_Real_gfx11_gfx12<0x077, "ds_load_2addr_b64">;
 defm DS_READ2ST64_B64       : DS_Real_gfx11_gfx12<0x078, "ds_load_2addr_stride64_b64">;
-defm DS_WRITE_B8_D16_HI     : DS_Real_gfx11_gfx12<0x0a0, "ds_store_b8_d16_hi">;
-defm DS_WRITE_B16_D16_HI    : DS_Real_gfx11_gfx12<0x0a1, "ds_store_b16_d16_hi">;
-defm DS_READ_U8_D16         : DS_Real_gfx11_gfx12<0x0a2, "ds_load_u8_d16">;
-defm DS_READ_U8_D16_HI      : DS_Real_gfx11_gfx12<0x0a3, "ds_load_u8_d16_hi">;
-defm DS_READ_I8_D16         : DS_Real_gfx11_gfx12<0x0a4, "ds_load_i8_d16">;
-defm DS_READ_I8_D16_HI      : DS_Real_gfx11_gfx12<0x0a5, "ds_load_i8_d16_hi">;
-defm DS_READ_U16_D16        : DS_Real_gfx11_gfx12<0x0a6, "ds_load_u16_d16">;
-defm DS_READ_U16_D16_HI     : DS_Real_gfx11_gfx12<0x0a7, "ds_load_u16_d16_hi">;
-defm DS_WRITE_ADDTID_B32    : DS_Real_gfx11_gfx12<0x0b0, "ds_store_addtid_b32">;
-defm DS_READ_ADDTID_B32     : DS_Real_gfx11_gfx12<0x0b1, "ds_load_addtid_b32">;
+defm DS_WRITE_B8_D16_HI     : DS_Real_gfx11_gfx12<0x0a0, "ds_store_b8_d16_hi", DS_WRITE_B8_D16_HI>;
+defm DS_WRITE_B16_D16_HI    : DS_Real_gfx11_gfx12<0x0a1, "ds_store_b16_d16_hi", DS_WRITE_B16_D16_HI>;
+defm DS_READ_U8_D16         : DS_Real_gfx11_gfx12<0x0a2, "ds_load_u8_d16", DS_READ_U8_D16>;
+defm DS_READ_U8_D16_HI      : DS_Real_gfx11_gfx12<0x0a3, "ds_load_u8_d16_hi", DS_READ_U8_D16_HI>;
+defm DS_READ_I8_D16         : DS_Real_gfx11_gfx12<0x0a4, "ds_load_i8_d16", DS_READ_I8_D16>;
+defm DS_READ_I8_D16_HI      : DS_Real_gfx11_gfx12<0x0a5, "ds_load_i8_d16_hi", DS_READ_I8_D16_HI>;
+defm DS_READ_U16_D16        : DS_Real_gfx11_gfx12<0x0a6, "ds_load_u16_d16", DS_READ_U16_D16>;
+defm DS_READ_U16_D16_HI     : DS_Real_gfx11_gfx12<0x0a7, "ds_load_u16_d16_hi", DS_READ_U16_D16_HI>;
+defm DS_WRITE_ADDTID_B32    : DS_Real_gfx11_gfx12<0x0b0, "ds_store_addtid_b32", DS_WRITE_ADDTID_B32>;
+defm DS_READ_ADDTID_B32     : DS_Real_gfx11_gfx12<0x0b1, "ds_load_addtid_b32", DS_READ_ADDTID_B32>;
 defm DS_WRITE_B96           : DS_Real_gfx11_gfx12<0x0de, "ds_store_b96">;
 defm DS_WRITE_B128          : DS_Real_gfx11_gfx12<0x0df, "ds_store_b128">;
 defm DS_READ_B96            : DS_Real_gfx11_gfx12<0x0fe, "ds_load_b96">;
@@ -1505,22 +1544,22 @@ defm DS_CMPSTORE_RTN_B64                 : DS_Real_gfx11_gfx12<0x070>;
 defm DS_CMPSTORE_RTN_F64                 : DS_Real_gfx11<0x071>;
 
 defm DS_ADD_RTN_F32                      : DS_Real_gfx11_gfx12<0x079>;
-defm DS_ADD_GS_REG_RTN                   : DS_Real_gfx11<0x07a>;
-defm DS_SUB_GS_REG_RTN                   : DS_Real_gfx11<0x07b>;
-defm DS_BVH_STACK_RTN_B32                : DS_Real_gfx11<0x0ad>;
+defm DS_ADD_GS_REG_RTN                   : DS_Real_gfx11<0x07a, DS_ADD_GS_REG_RTN>;
+defm DS_SUB_GS_REG_RTN                   : DS_Real_gfx11<0x07b, DS_SUB_GS_REG_RTN>;
+defm DS_BVH_STACK_RTN_B32                : DS_Real_gfx11<0x0ad, DS_BVH_STACK_RTN_B32>;
 
 //===----------------------------------------------------------------------===//
 // GFX10.
 //===----------------------------------------------------------------------===//
 
 let AssemblerPredicate = isGFX10Only, DecoderNamespace = "GFX10" in {
-  multiclass DS_Real_gfx10<bits<8> op>  {
+  multiclass DS_Real_gfx10<bits<8> op, DS_Pseudo ps = !cast<DS_Pseudo>(NAME)>  {
     def _gfx10 : Base_DS_Real_gfx6_gfx7_gfx10_gfx11_gfx12<op,
-      !cast<DS_Pseudo>(NAME), SIEncodingFamily.GFX10>;
+      ps, SIEncodingFamily.GFX10>;
   }
 } // End AssemblerPredicate = isGFX10Only, DecoderNamespace = "GFX10"
 
-defm DS_ADD_RTN_F32      : DS_Real_gfx10<0x055>;
+defm DS_ADD_RTN_F32      : DS_Real_gfx10<0x055, DS_ADD_RTN_F32_gfx9>;
 defm DS_WRITE_B8_D16_HI  : DS_Real_gfx10<0x0a0>;
 defm DS_WRITE_B16_D16_HI : DS_Real_gfx10<0x0a1>;
 defm DS_READ_U8_D16      : DS_Real_gfx10<0x0a2>;
@@ -1536,39 +1575,48 @@ defm DS_READ_ADDTID_B32  : DS_Real_gfx10<0x0b1>;
 // GFX10, GFX11, GFX12.
 //===----------------------------------------------------------------------===//
 
-multiclass DS_Real_gfx10_gfx11_gfx12<bits<8> op> :
-  DS_Real_gfx10<op>, DS_Real_gfx11<op>, DS_Real_gfx12<op>;
+multiclass DS_Real_gfx10_gfx11_gfx12<bits<8> op, DS_Pseudo ps = !cast<DS_Pseudo>(NAME#"_gfx9")> :
+  DS_Real_gfx10<op, ps>,
+  DS_Real_gfx11<op, ps>,
+  DS_Real_gfx12<op, ps>;
 
-multiclass DS_Real_gfx10_gfx11<bits<8> op> :
-  DS_Real_gfx10<op>, DS_Real_gfx11<op>;
+multiclass DS_Real_gfx10_gfx11<bits<8> op, DS_Pseudo ps = !cast<DS_Pseudo>(NAME#"_gfx9")> :
+  DS_Real_gfx10<op, ps>, DS_Real_gfx11<op, ps>;
 
 defm DS_ADD_F32          : DS_Real_gfx10_gfx11_gfx12<0x015>;
 defm DS_ADD_SRC2_F32     : DS_Real_gfx10<0x095>;
-defm DS_PERMUTE_B32      : DS_Real_gfx10_gfx11_gfx12<0x0b2>;
-defm DS_BPERMUTE_B32     : DS_Real_gfx10_gfx11_gfx12<0x0b3>;
+defm DS_PERMUTE_B32      : DS_Real_gfx10_gfx11_gfx12<0x0b2, DS_PERMUTE_B32>;
+defm DS_BPERMUTE_B32     : DS_Real_gfx10_gfx11_gfx12<0x0b3, DS_BPERMUTE_B32>;
 
 //===----------------------------------------------------------------------===//
 // GFX7, GFX10, GFX11, GFX12.
 //===----------------------------------------------------------------------===//
 
 let AssemblerPredicate = isGFX7Only, DecoderNamespace = "GFX7" in {
-  multiclass DS_Real_gfx7<bits<8> op> {
+  multiclass DS_Real_gfx7<bits<8> op, DS_Pseudo ps> {
     def _gfx7 : Base_DS_Real_gfx6_gfx7_gfx10_gfx11_gfx12<op,
-      !cast<DS_Pseudo>(NAME), SIEncodingFamily.SI>;
+      ps, SIEncodingFamily.SI>;
   }
 } // End AssemblerPredicate = isGFX7Only, DecoderNamespace = "GFX7"
 
-multiclass DS_Real_gfx7_gfx10_gfx11_gfx12<bits<8> op> :
-  DS_Real_gfx7<op>, DS_Real_gfx10_gfx11_gfx12<op>;
+multiclass DS_Real_gfx7_gfx10_gfx11_gfx12<bits<8> op,
+           DS_Pseudo ps_gfx6 = !cast<DS_Pseudo>(NAME),
+           DS_Pseudo ps_gfx9 = !cast<DS_Pseudo>(NAME#"_gfx9")> :
+  DS_Real_gfx7<op, ps_gfx6>,
+  DS_Real_gfx10_gfx11_gfx12<op, ps_gfx9>;
 
-multiclass DS_Real_gfx7_gfx10_gfx11<bits<8> op> :
-  DS_Real_gfx7<op>, DS_Real_gfx10_gfx11<op>;
+multiclass DS_Real_gfx7_gfx10_gfx11<bits<8> op,
+           DS_Pseudo ps_gfx6 = !cast<DS_Pseudo>(NAME),
+           DS_Pseudo ps_gfx9 = !cast<DS_Pseudo>(NAME#"_gfx9")> :
+  DS_Real_gfx7<op, ps_gfx6>, DS_Real_gfx10_gfx11<op, ps_gfx9>;
 
-multiclass DS_Real_gfx7_gfx10<bits<8> op> :
-  DS_Real_gfx7<op>, DS_Real_gfx10<op>;
+multiclass DS_Real_gfx7_gfx10<bits<8> op,
+           DS_Pseudo ps_gfx6 = !cast<DS_Pseudo>(NAME),
+           DS_Pseudo ps_gfx9 = !cast<DS_Pseudo>(NAME#"_gfx9")> :
+  DS_Real_gfx7<op, ps_gfx6>, DS_Real_gfx10<op, ps_gfx9>;
 
 // FIXME-GFX7: Add tests when upstreaming this part.
-defm DS_GWS_SEMA_RELEASE_ALL : DS_Real_gfx7_gfx10_gfx11<0x018>;
+defm DS_GWS_SEMA_RELEASE_ALL : DS_Real_gfx7_gfx10_gfx11<0x018, DS_GWS_SEMA_RELEASE_ALL, DS_GWS_SEMA_RELEASE_ALL>;
 defm DS_WRAP_RTN_B32         : DS_Real_gfx7_gfx10_gfx11<0x034>;
 defm DS_CONDXCHG32_RTN_B64   : DS_Real_gfx7_gfx10_gfx11_gfx12<0x07e>;
 defm DS_WRITE_B96            : DS_Real_gfx7_gfx10<0x0de>;
@@ -1581,20 +1629,27 @@ defm DS_READ_B128            : DS_Real_gfx7_gfx10<0x0ff>;
 //===----------------------------------------------------------------------===//
 
 let AssemblerPredicate = isGFX6GFX7, DecoderNamespace = "GFX6GFX7" in {
-  multiclass DS_Real_gfx6_gfx7<bits<8> op> {
+  multiclass DS_Real_gfx6_gfx7<bits<8> op, DS_Pseudo ps> {
     def _gfx6_gfx7 : Base_DS_Real_gfx6_gfx7_gfx10_gfx11_gfx12<op,
-      !cast<DS_Pseudo>(NAME), SIEncodingFamily.SI>;
+      ps, SIEncodingFamily.SI>;
   }
 } // End AssemblerPredicate = isGFX6GFX7, DecoderNamespace = "GFX6GFX7"
 
-multiclass DS_Real_gfx6_gfx7_gfx10_gfx11_gfx12<bits<8> op> :
-  DS_Real_gfx6_gfx7<op>, DS_Real_gfx10_gfx11_gfx12<op>;
+multiclass DS_Real_gfx6_gfx7_gfx10_gfx11_gfx12<bits<8> op,
+           DS_Pseudo ps_gfx6 = !cast<DS_Pseudo>(NAME),
+           DS_Pseudo ps_gfx9 = !cast<DS_Pseudo>(NAME#"_gfx9")> :
+  DS_Real_gfx6_gfx7<op, ps_gfx6>,
+  DS_Real_gfx10_gfx11_gfx12<op, ps_gfx9>;
 
-multiclass DS_Real_gfx6_gfx7_gfx10_gfx11<bits<8> op> :
-  DS_Real_gfx6_gfx7<op>, DS_Real_gfx10_gfx11<op>;
+multiclass DS_Real_gfx6_gfx7_gfx10_gfx11<bits<8> op,
+           DS_Pseudo ps_gfx6 = !cast<DS_Pseudo>(NAME),
+           DS_Pseudo ps_gfx9 = !cast<DS_Pseudo>(NAME#"_gfx9")> :
+  DS_Real_gfx6_gfx7<op, ps_gfx6>, DS_Real_gfx10_gfx11<op, ps_gfx9>;
 
-multiclass DS_Real_gfx6_gfx7_gfx10<bits<8> op> :
-  DS_Real_gfx6_gfx7<op>, DS_Real_gfx10<op>;
+multiclass DS_Real_gfx6_gfx7_gfx10<bits<8> op,
+                                   DS_Pseudo ps_gfx6 = !cast<DS_Pseudo>(NAME),
+                                   DS_Pseudo ps_gfx9 = !cast<DS_Pseudo>(NAME#"_gfx9")> :
+  DS_Real_gfx6_gfx7<op, ps_gfx6>, DS_Real_gfx10<op, ps_gfx9>;
 
 defm DS_ADD_U32             : DS_Real_gfx6_gfx7_gfx10_gfx11_gfx12<0x000>;
 defm DS_SUB_U32             : DS_Real_gfx6_gfx7_gfx10_gfx11_gfx12<0x001>;
@@ -1618,12 +1673,12 @@ defm DS_CMPST_F32           : DS_Real_gfx6_gfx7_gfx10<0x011>;
 
 defm DS_MIN_F32             : DS_Real_gfx6_gfx7_gfx10_gfx11<0x012>;
 defm DS_MAX_F32             : DS_Real_gfx6_gfx7_gfx10_gfx11<0x013>;
-defm DS_NOP                 : DS_Real_gfx6_gfx7_gfx10_gfx11_gfx12<0x014>;
-defm DS_GWS_INIT            : DS_Real_gfx6_gfx7_gfx10_gfx11<0x019>;
-defm DS_GWS_SEMA_V          : DS_Real_gfx6_gfx7_gfx10_gfx11<0x01a>;
-defm DS_GWS_SEMA_BR         : DS_Real_gfx6_gfx7_gfx10_gfx11<0x01b>;
-defm DS_GWS_SEMA_P          : DS_Real_gfx6_gfx7_gfx10_gfx11<0x01c>;
-defm DS_GWS_BARRIER         : DS_Real_gfx6_gfx7_gfx10_gfx11<0x01d>;
+defm DS_NOP                 : DS_Real_gfx6_gfx7_gfx10_gfx11_gfx12<0x014, DS_NOP, DS_NOP>;
+defm DS_GWS_INIT            : DS_Real_gfx6_gfx7_gfx10_gfx11<0x019, DS_GWS_INIT, DS_GWS_INIT>;
+defm DS_GWS_SEMA_V          : DS_Real_gfx6_gfx7_gfx10_gfx11<0x01a, DS_GWS_SEMA_V, DS_GWS_SEMA_V>;
+defm DS_GWS_SEMA_BR         : DS_Real_gfx6_gfx7_gfx10_gfx11<0x01b, DS_GWS_SEMA_BR, DS_GWS_SEMA_BR>;
+defm DS_GWS_SEMA_P          : DS_Real_gfx6_gfx7_gfx10_gfx11<0x01c, DS_GWS_SEMA_P, DS_GWS_SEMA_P>;
+defm DS_GWS_BARRIER         : DS_Real_gfx6_gfx7_gfx10_gfx11<0x01d, DS_GWS_BARRIER, DS_GWS_BARRIER>;
 
 defm DS_WRITE_B8            : DS_Real_gfx6_gfx7_gfx10<0x01e>;
 defm DS_WRITE_B16           : DS_Real_gfx6_gfx7_gfx10<0x01f>;
@@ -1650,7 +1705,7 @@ defm DS_CMPST_RTN_F32       : DS_Real_gfx6_gfx7_gfx10<0x031>;
 
 defm DS_MIN_RTN_F32         : DS_Real_gfx6_gfx7_gfx10_gfx11<0x032>;
 defm DS_MAX_RTN_F32         : DS_Real_gfx6_gfx7_gfx10_gfx11<0x033>;
-defm DS_SWIZZLE_B32         : DS_Real_gfx6_gfx7_gfx10_gfx11_gfx12<0x035>;
+defm DS_SWIZZLE_B32         : DS_Real_gfx6_gfx7_gfx10_gfx11_gfx12<0x035, DS_SWIZZLE_B32, DS_SWIZZLE_B32>;
 
 defm DS_READ_B32            : DS_Real_gfx6_gfx7_gfx10<0x036>;
 defm DS_READ2_B32           : DS_Real_gfx6_gfx7_gfx10<0x037>;
@@ -1660,9 +1715,9 @@ defm DS_READ_U8             : DS_Real_gfx6_gfx7_gfx10<0x03a>;
 defm DS_READ_I16            : DS_Real_gfx6_gfx7_gfx10<0x03b>;
 defm DS_READ_U16            : DS_Real_gfx6_gfx7_gfx10<0x03c>;
 
-defm DS_CONSUME             ...
[truncated]

@arsenm arsenm merged commit 1959e12 into main Sep 4, 2025
9 checks passed
@arsenm arsenm deleted the users/arsenm/amdgpu/define-agpr-variants-ds-write2-insts branch September 4, 2025 00:13
ckoparkar added a commit to ckoparkar/llvm-project that referenced this pull request Sep 4, 2025
* main: (1483 commits)
  [clang] fix error recovery for invalid nested name specifiers (llvm#156772)
  Revert "[lldb] Add count for errors of DWO files in statistics and combine DWO file count functions" (llvm#156777)
  AMDGPU: Add agpr variants of multi-data DS instructions (llvm#156420)
  [libc][NFC] disable localtime on aarch64/baremetal (llvm#156776)
  [win/asan] Improve SharedReAlloc with HEAP_REALLOC_IN_PLACE_ONLY. (llvm#132558)
  [LLDB] Make internal shell the default for running LLDB lit tests. (llvm#156729)
  [lldb][debugserver] Max response size for qSpeedTest (llvm#156099)
  [AMDGPU] Define 1024 VGPRs on gfx1250 (llvm#156765)
  [flang] Check for BIND(C) name conflicts with alternate entries (llvm#156563)
  [RISCV] Add exhausted_gprs_fprs test to calling-conv-half.ll. NFC (llvm#156586)
  [NFC] Remove trailing whitespaces from `clang/include/clang/Basic/AttrDocs.td`
  [lldb] Mark scripted frames as synthetic instead of artificial (llvm#153117)
  [docs] Refine some of the wording in the quality developer policy (llvm#156555)
  [MLIR] Apply clang-tidy fixes for readability-identifier-naming in TransformOps.cpp (NFC)
  [MLIR] Add LDBG() tracing to VectorTransferOpTransforms.cpp (NFC)
  [NFC] Apply clang-format to PPCInstrFutureMMA.td (llvm#156749)
  [libc] implement template functions for localtime (llvm#110363)
  [llvm-objcopy][COFF] Update .symidx values after stripping (llvm#153322)
  Add documentation on debugging LLVM.
  [lldb] Add count for errors of DWO files in statistics and combine DWO file count functions (llvm#155023)
  ...
Development

Successfully merging this pull request may close these issues.

AMDGPU DS cmpxhg fails machine verifier with AGPR inputs