[AMDGPU] Make rotr illegal #166558

Conversation
fshr is already legal and is strictly more powerful than rotr, so we should only need selection patterns for fshr.
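For context, a rotate is a funnel shift whose two inputs are the same value, so anything expressible as rotr is also expressible as fshr. A minimal illustration in LLVM IR (the function name is illustrative, not from the patch):

; rotr(x, n) == fshr(x, x, n): shift the 64-bit concatenation x:x
; right by n and keep the low 32 bits.
define i32 @rotr_via_fshr(i32 %x, i32 %n) {
  %r = call i32 @llvm.fshr.i32(i32 %x, i32 %x, i32 %n)
  ret i32 %r
}
declare i32 @llvm.fshr.i32(i32, i32, i32)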
@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

Changes: fshr is already legal and is strictly more powerful than rotr, so we should only need selection patterns for fshr.

Full diff: https://github.com/llvm/llvm-project/pull/166558.diff

7 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
index 1b559a628be08..7f942fdd95211 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -502,9 +502,7 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,
// The hardware supports 32-bit FSHR, but not FSHL.
setOperationAction(ISD::FSHR, MVT::i32, Legal);
- // The hardware supports 32-bit ROTR, but not ROTL.
- setOperationAction(ISD::ROTL, {MVT::i32, MVT::i64}, Expand);
- setOperationAction(ISD::ROTR, MVT::i64, Expand);
+ setOperationAction({ISD::ROTL, ISD::ROTR}, {MVT::i32, MVT::i64}, Expand);
setOperationAction({ISD::MULHU, ISD::MULHS}, MVT::i16, Expand);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
index bd443b5b6f1e6..ddcb431f39a87 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
@@ -806,12 +806,6 @@ class DwordAddrPat<ValueType vt, RegisterClass rc> : AMDGPUPat <
(vt rc:$addr)
>;
-// rotr pattern
-class ROTRPattern <Instruction BIT_ALIGN> : AMDGPUPat <
- (rotr i32:$src0, i32:$src1),
- (BIT_ALIGN $src0, $src0, $src1)
->;
-
// Special conversion patterns
def cvt_rpi_i32_f32 : PatFrag <
diff --git a/llvm/lib/Target/AMDGPU/EvergreenInstructions.td b/llvm/lib/Target/AMDGPU/EvergreenInstructions.td
index dadc7dcd7054a..a2e3ecef1c206 100644
--- a/llvm/lib/Target/AMDGPU/EvergreenInstructions.td
+++ b/llvm/lib/Target/AMDGPU/EvergreenInstructions.td
@@ -505,7 +505,6 @@ def : AMDGPUPat <
(fshr i32:$src0, i32:$src1, i32:$src2),
(BIT_ALIGN_INT_eg $src0, $src1, $src2)
>;
-def : ROTRPattern <BIT_ALIGN_INT_eg>;
def MULADD_eg : MULADD_Common<0x14>;
def MULADD_IEEE_eg : MULADD_IEEE_Common<0x18>;
def FMA_eg : FMA_Common<0x7>;
diff --git a/llvm/lib/Target/AMDGPU/SIInstructions.td b/llvm/lib/Target/AMDGPU/SIInstructions.td
index 6f1feb1dc2996..a432d297da595 100644
--- a/llvm/lib/Target/AMDGPU/SIInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SIInstructions.td
@@ -2663,8 +2663,6 @@ def : AMDGPUPat <
let True16Predicate = NotHasTrue16BitInsts in {
let SubtargetPredicate = isNotGFX9Plus in {
-def : ROTRPattern <V_ALIGNBIT_B32_e64>;
-
def : GCNPat<(i32 (DivergentUnaryFrag<trunc> (srl i64:$src0, (and i32:$src1, (i32 31))))),
(V_ALIGNBIT_B32_e64 (i32 (EXTRACT_SUBREG (i64 $src0), sub1)),
(i32 (EXTRACT_SUBREG (i64 $src0), sub0)), $src1)>;
@@ -2675,14 +2673,6 @@ def : GCNPat<(i32 (DivergentUnaryFrag<trunc> (srl i64:$src0, (i32 ShiftAmt32Imm:
} // isNotGFX9Plus
let SubtargetPredicate = isGFX9GFX10 in {
-def : GCNPat <
- (rotr i32:$src0, i32:$src1),
- (V_ALIGNBIT_B32_opsel_e64 /* src0_modifiers */ 0, $src0,
- /* src1_modifiers */ 0, $src0,
- /* src2_modifiers */ 0,
- $src1, /* clamp */ 0, /* op_sel */ 0)
->;
-
foreach pat = [(i32 (DivergentUnaryFrag<trunc> (srl i64:$src0, (and i32:$src1, (i32 31))))),
(i32 (DivergentUnaryFrag<trunc> (srl i64:$src0, (i32 ShiftAmt32Imm:$src1))))] in
def : GCNPat<pat,
@@ -2704,15 +2694,6 @@ def : GCNPat<(fshr i32:$src0, i32:$src1, i32:$src2),
} // end True16Predicate = NotHasTrue16BitInsts
let True16Predicate = UseRealTrue16Insts in {
-def : GCNPat <
- (rotr i32:$src0, i32:$src1),
- (V_ALIGNBIT_B32_t16_e64 /* src0_modifiers */ 0, $src0,
- /* src1_modifiers */ 0, $src0,
- /* src2_modifiers */ 0,
- (EXTRACT_SUBREG $src1, lo16),
- /* clamp */ 0, /* op_sel */ 0)
->;
-
def : GCNPat<(i32 (DivergentUnaryFrag<trunc> (srl i64:$src0, (i32 ShiftAmt32Imm:$src1)))),
(V_ALIGNBIT_B32_t16_e64 0, /* src0_modifiers */
(i32 (EXTRACT_SUBREG (i64 $src0), sub1)),
@@ -2731,14 +2712,6 @@ def : GCNPat<(fshr i32:$src0, i32:$src1, i32:$src2),
} // end True16Predicate = UseRealTrue16Insts
let True16Predicate = UseFakeTrue16Insts in {
-def : GCNPat <
- (rotr i32:$src0, i32:$src1),
- (V_ALIGNBIT_B32_fake16_e64 /* src0_modifiers */ 0, $src0,
- /* src1_modifiers */ 0, $src0,
- /* src2_modifiers */ 0,
- $src1, /* clamp */ 0, /* op_sel */ 0)
->;
-
def : GCNPat<(i32 (DivergentUnaryFrag<trunc> (srl i64:$src0, (and i32:$src1, (i32 31))))),
(V_ALIGNBIT_B32_fake16_e64 0, /* src0_modifiers */
(i32 (EXTRACT_SUBREG (i64 $src0), sub1)),
diff --git a/llvm/test/CodeGen/AMDGPU/packetizer.ll b/llvm/test/CodeGen/AMDGPU/packetizer.ll
index aab035f811434..0b8acee74eacb 100644
--- a/llvm/test/CodeGen/AMDGPU/packetizer.ll
+++ b/llvm/test/CodeGen/AMDGPU/packetizer.ll
@@ -1,5 +1,6 @@
; RUN: llc < %s -mtriple=r600 -mcpu=redwood | FileCheck %s
; RUN: llc < %s -mtriple=r600 -mcpu=cayman | FileCheck %s
+; XFAIL: *
; CHECK: {{^}}test:
; CHECK: BIT_ALIGN_INT T{{[0-9]}}.X
diff --git a/llvm/test/CodeGen/AMDGPU/permute_i8.ll b/llvm/test/CodeGen/AMDGPU/permute_i8.ll
index 0741cb256cc24..75263683371be 100644
--- a/llvm/test/CodeGen/AMDGPU/permute_i8.ll
+++ b/llvm/test/CodeGen/AMDGPU/permute_i8.ll
@@ -353,7 +353,7 @@ define hidden void @shuffle5341ud2(ptr addrspace(1) %in0, ptr addrspace(1) %out0
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: global_load_dword v0, v[0:1], off
; GFX10-NEXT: s_waitcnt vmcnt(0)
-; GFX10-NEXT: v_alignbit_b32 v0, v0, v0, 16
+; GFX10-NEXT: v_perm_b32 v0, v0, v0, 0x5040706
; GFX10-NEXT: global_store_dword v[2:3], v0, off
; GFX10-NEXT: s_setpc_b64 s[30:31]
;
@@ -361,8 +361,9 @@ define hidden void @shuffle5341ud2(ptr addrspace(1) %in0, ptr addrspace(1) %out0
; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: global_load_dword v0, v[0:1], off
+; GFX9-NEXT: s_mov_b32 s4, 0x5040706
; GFX9-NEXT: s_waitcnt vmcnt(0)
-; GFX9-NEXT: v_alignbit_b32 v0, v0, v0, 16
+; GFX9-NEXT: v_perm_b32 v0, v0, v0, s4
; GFX9-NEXT: global_store_dword v[2:3], v0, off
; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]
diff --git a/llvm/test/CodeGen/AMDGPU/shl.ll b/llvm/test/CodeGen/AMDGPU/shl.ll
index 28330bfc9bb69..acf999e586a68 100644
--- a/llvm/test/CodeGen/AMDGPU/shl.ll
+++ b/llvm/test/CodeGen/AMDGPU/shl.ll
@@ -1470,21 +1470,20 @@ define amdgpu_kernel void @s_shl_inline_imm_1_i64(ptr addrspace(1) %out, ptr add
;
; EG-LABEL: s_shl_inline_imm_1_i64:
; EG: ; %bb.0:
-; EG-NEXT: ALU 11, @4, KC0[CB0:0-32], KC1[]
+; EG-NEXT: ALU 10, @4, KC0[CB0:0-32], KC1[]
; EG-NEXT: MEM_RAT_CACHELESS STORE_RAW T0.XY, T1.X, 1
; EG-NEXT: CF_END
; EG-NEXT: PAD
; EG-NEXT: ALU clause starting at 4:
; EG-NEXT: AND_INT T0.W, KC0[2].W, literal.x,
-; EG-NEXT: LSHL * T1.W, KC0[2].W, literal.y,
-; EG-NEXT: 31(4.344025e-44), 26(3.643376e-44)
-; EG-NEXT: ASHR T1.W, PS, literal.x,
-; EG-NEXT: LSHL * T0.W, 1, PV.W,
+; EG-NEXT: NOT_INT * T1.W, KC0[2].W,
; EG-NEXT: 31(4.344025e-44), 0(0.000000e+00)
-; EG-NEXT: AND_INT T0.Y, PV.W, PS,
-; EG-NEXT: AND_INT * T1.W, KC0[2].W, literal.x,
+; EG-NEXT: BIT_ALIGN_INT T0.Z, 0.0, 0.0, PS,
+; EG-NEXT: AND_INT T1.W, KC0[2].W, literal.x,
+; EG-NEXT: LSHL * T0.W, 1, PV.W,
; EG-NEXT: 32(4.484155e-44), 0(0.000000e+00)
-; EG-NEXT: CNDE_INT T0.X, PV.W, T0.W, 0.0,
+; EG-NEXT: CNDE_INT * T0.Y, PV.W, PV.Z, PS,
+; EG-NEXT: CNDE_INT T0.X, T1.W, T0.W, 0.0,
; EG-NEXT: LSHR * T1.X, KC0[2].Y, literal.x,
; EG-NEXT: 2(2.802597e-45), 0(0.000000e+00)
%shl = shl i64 1, %a
; GFX10-NEXT: global_load_dword v0, v[0:1], off
; GFX10-NEXT: s_waitcnt vmcnt(0)
-; GFX10-NEXT: v_alignbit_b32 v0, v0, v0, 16
+; GFX10-NEXT: v_perm_b32 v0, v0, v0, 0x5040706
These are just more instances of a problem introduced by #70240 and already noted here:
; FIXME: produce v_alignbit_b32 v2, v2, s0, 24 instead of v_perm
Do you have a plan for these? It feels like you're going to end up adding fshr peepholes to replace the rotr ones?
> Do you have a plan for these?
My plan is to hope @jrbyrnes picks it up :) At the moment I don't understand why his #70240 removed this check of yours:
// Check that we haven't just recreated the same FSHR node.
> It feels like you're going to end up adding fshr peepholes to replace the rotr ones?
I don't think there are any peepholes. It is just that PerformDAGCombine's FSHR case calls matchPERM but the ROTR case does not.
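To make that concrete: a 16-bit rotate of an i32 swaps its two halfwords, and once such a rotate reaches the DAG as an fshr, the FSHR combine can match it as a byte permute. A sketch of the pattern behind the permute_i8.ll diff above (illustrative IR, not the literal test input):

; fshr(x, x, 16) swaps the halfwords of %x. Matched as a permute, this
; selects v_perm_b32 v0, v0, v0, 0x5040706 instead of
; v_alignbit_b32 v0, v0, v0, 16.
define i32 @halfword_swap(i32 %x) {
  %r = call i32 @llvm.fshr.i32(i32 %x, i32 %x, i32 16)
  ret i32 %r
}
declare i32 @llvm.fshr.i32(i32, i32, i32)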
I reinstated the check for recreating the same FSHR node.
@aleksandar-amd what do you think about this as an alternative to #143551?
This reinstates a check from https://reviews.llvm.org/D159533 that was removed by llvm#70240.
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed.
Taking into account that #165295 resolved optimal selection of fshr with uniform operands, making rotr illegal and using fshr, which is a more powerful instruction, seems fine to me.
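For reference, a minimal uniform case of the kind #165295 addressed (an illustrative kernel, not from the patch): both operands are kernel arguments and therefore land in SGPRs, so the fshr must either be selected with scalar instructions or be moved to the VALU.

; %x and %n are uniform kernel arguments (SGPRs).
define amdgpu_kernel void @uniform_rotr(ptr addrspace(1) %out, i32 %x, i32 %n) {
  %r = call i32 @llvm.fshr.i32(i32 %x, i32 %x, i32 %n)
  store i32 %r, ptr addrspace(1) %out
  ret void
}
declare i32 @llvm.fshr.i32(i32, i32, i32)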
Ping! I think all significant regressions are resolved now. |
LLVM :: CodeGen/AMDGPU/amdgcn-cs-chain-intrinsic-dyn-vgpr-w32.ll ?
That was broken on main. I have now merged the fix from main. |
; SI-NEXT: s_mov_b32 s7, 0xf000
; SI-NEXT: s_waitcnt lgkmcnt(0)
; SI-NEXT: s_sub_i32 s3, 32, s3
; SI-NEXT: s_sub_i32 s4, 32, s3
This looks worse, but I suppose it's just the consequence of now being scalar (though in this type of case we probably should have rewritten the tail to fold into the copy to VGPR).