

@LU-JOHN
Contributor

@LU-JOHN LU-JOHN commented Oct 17, 2025

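Mark 32-bit ISD::ABS as Legal. Instruction selection already handles it (S_ABS_I32 for uniform values, and an isel pattern expanding to V_SUB_U32 + V_MAX_I32 for divergent ones), so it no longer needs to be expanded during legalization.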

@llvmbot
Member

llvmbot commented Oct 17, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: None (LU-JOHN)

Changes

32-bit ABS can be lowered legally.


Patch is 108.10 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/163907.diff

10 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp (+2-2)
  • (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+7)
  • (modified) llvm/test/CodeGen/AMDGPU/abs_i16.ll (+474-506)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fold-binop-select.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll (+65-75)
  • (modified) llvm/test/CodeGen/AMDGPU/bypass-div.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/divergence-driven-abs.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/sdiv.ll (+388-400)
  • (modified) llvm/test/CodeGen/AMDGPU/sminmax.ll (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/srem.ll (+13-13)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
index 1b559a628be08..8ed4062e43946 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -514,8 +514,8 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,
       MVT::i64, Custom);
   setOperationAction(ISD::SELECT_CC, MVT::i64, Expand);
 
-  setOperationAction({ISD::SMIN, ISD::UMIN, ISD::SMAX, ISD::UMAX}, MVT::i32,
-                     Legal);
+  setOperationAction({ISD::ABS, ISD::SMIN, ISD::UMIN, ISD::SMAX, ISD::UMAX},
+                     MVT::i32, Legal);
 
   setOperationAction(
       {ISD::CTTZ, ISD::CTTZ_ZERO_UNDEF, ISD::CTLZ, ISD::CTLZ_ZERO_UNDEF},
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a2841c114a698..8ded201f03055 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -14945,6 +14945,13 @@ SDValue SITargetLowering::performMinMaxCombine(SDNode *N,
     }
   }
 
+  // max(x, neg(x)) -> abs(x)
+  if (Opc == ISD::SMAX && VT == MVT::i32) {
+    SDValue Value;
+    if (sd_match(N, m_SMax(m_Value(Value), m_Neg(m_Deferred(Value)))))
+      return DAG.getNode(ISD::ABS, SDLoc(N), VT, Value);
+  }
+
   // min(max(x, K0), K1), K0 < K1 -> med3(x, K0, K1)
   // max(min(x, K0), K1), K1 < K0 -> med3(x, K1, K0)
   if (Opc == ISD::SMIN && Op0.getOpcode() == ISD::SMAX && Op0.hasOneUse()) {
diff --git a/llvm/test/CodeGen/AMDGPU/abs_i16.ll b/llvm/test/CodeGen/AMDGPU/abs_i16.ll
index 7633ba0eb4f9c..66cc7f3db03c2 100644
--- a/llvm/test/CodeGen/AMDGPU/abs_i16.ll
+++ b/llvm/test/CodeGen/AMDGPU/abs_i16.ll
@@ -15,7 +15,7 @@ define i16 @abs_i16(i16 %arg) {
 ; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX6-NEXT:    v_bfe_i32 v0, v0, 0, 16
 ; GFX6-NEXT:    v_sub_i32_e32 v1, vcc, 0, v0
-; GFX6-NEXT:    v_max_i32_e32 v0, v0, v1
+; GFX6-NEXT:    v_max_i32_e32 v0, v1, v0
 ; GFX6-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX7-LABEL: abs_i16:
@@ -23,7 +23,7 @@ define i16 @abs_i16(i16 %arg) {
 ; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX7-NEXT:    v_bfe_i32 v0, v0, 0, 16
 ; GFX7-NEXT:    v_sub_i32_e32 v1, vcc, 0, v0
-; GFX7-NEXT:    v_max_i32_e32 v0, v0, v1
+; GFX7-NEXT:    v_max_i32_e32 v0, v1, v0
 ; GFX7-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-LABEL: abs_i16:
@@ -97,9 +97,9 @@ define <2 x i16> @v_abs_v2i16(<2 x i16> %arg) {
 ; GFX6-NEXT:    v_bfe_i32 v0, v0, 0, 16
 ; GFX6-NEXT:    v_bfe_i32 v1, v1, 0, 16
 ; GFX6-NEXT:    v_sub_i32_e32 v2, vcc, 0, v0
-; GFX6-NEXT:    v_max_i32_e32 v0, v0, v2
+; GFX6-NEXT:    v_max_i32_e32 v0, v2, v0
 ; GFX6-NEXT:    v_sub_i32_e32 v2, vcc, 0, v1
-; GFX6-NEXT:    v_max_i32_e32 v1, v1, v2
+; GFX6-NEXT:    v_max_i32_e32 v1, v2, v1
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v2, 16, v1
 ; GFX6-NEXT:    v_or_b32_e32 v0, v0, v2
 ; GFX6-NEXT:    s_setpc_b64 s[30:31]
@@ -110,9 +110,9 @@ define <2 x i16> @v_abs_v2i16(<2 x i16> %arg) {
 ; GFX7-NEXT:    v_bfe_i32 v0, v0, 0, 16
 ; GFX7-NEXT:    v_bfe_i32 v1, v1, 0, 16
 ; GFX7-NEXT:    v_sub_i32_e32 v2, vcc, 0, v0
-; GFX7-NEXT:    v_max_i32_e32 v0, v0, v2
+; GFX7-NEXT:    v_max_i32_e32 v0, v2, v0
 ; GFX7-NEXT:    v_sub_i32_e32 v2, vcc, 0, v1
-; GFX7-NEXT:    v_max_i32_e32 v1, v1, v2
+; GFX7-NEXT:    v_max_i32_e32 v1, v2, v1
 ; GFX7-NEXT:    v_lshlrev_b32_e32 v2, 16, v1
 ; GFX7-NEXT:    v_or_b32_e32 v0, v0, v2
 ; GFX7-NEXT:    s_setpc_b64 s[30:31]
@@ -172,15 +172,15 @@ define <3 x i16> @v_abs_v3i16(<3 x i16> %arg) {
 ; GFX6-NEXT:    v_bfe_i32 v0, v0, 0, 16
 ; GFX6-NEXT:    v_bfe_i32 v1, v1, 0, 16
 ; GFX6-NEXT:    v_sub_i32_e32 v3, vcc, 0, v0
-; GFX6-NEXT:    v_max_i32_e32 v0, v0, v3
-; GFX6-NEXT:    v_sub_i32_e32 v3, vcc, 0, v1
-; GFX6-NEXT:    v_max_i32_e32 v1, v1, v3
 ; GFX6-NEXT:    v_bfe_i32 v2, v2, 0, 16
+; GFX6-NEXT:    v_max_i32_e32 v0, v3, v0
+; GFX6-NEXT:    v_sub_i32_e32 v3, vcc, 0, v1
+; GFX6-NEXT:    v_max_i32_e32 v1, v3, v1
+; GFX6-NEXT:    v_sub_i32_e32 v3, vcc, 0, v2
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
+; GFX6-NEXT:    v_max_i32_e32 v2, v3, v2
 ; GFX6-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX6-NEXT:    v_sub_i32_e32 v1, vcc, 0, v2
-; GFX6-NEXT:    v_max_i32_e32 v2, v2, v1
-; GFX6-NEXT:    v_alignbit_b32 v1, v2, v0, 16
+; GFX6-NEXT:    v_alignbit_b32 v1, v2, v1, 16
 ; GFX6-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX7-LABEL: v_abs_v3i16:
@@ -189,15 +189,15 @@ define <3 x i16> @v_abs_v3i16(<3 x i16> %arg) {
 ; GFX7-NEXT:    v_bfe_i32 v0, v0, 0, 16
 ; GFX7-NEXT:    v_bfe_i32 v1, v1, 0, 16
 ; GFX7-NEXT:    v_sub_i32_e32 v3, vcc, 0, v0
-; GFX7-NEXT:    v_max_i32_e32 v0, v0, v3
-; GFX7-NEXT:    v_sub_i32_e32 v3, vcc, 0, v1
-; GFX7-NEXT:    v_max_i32_e32 v1, v1, v3
 ; GFX7-NEXT:    v_bfe_i32 v2, v2, 0, 16
+; GFX7-NEXT:    v_max_i32_e32 v0, v3, v0
+; GFX7-NEXT:    v_sub_i32_e32 v3, vcc, 0, v1
+; GFX7-NEXT:    v_max_i32_e32 v1, v3, v1
+; GFX7-NEXT:    v_sub_i32_e32 v3, vcc, 0, v2
 ; GFX7-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
+; GFX7-NEXT:    v_max_i32_e32 v2, v3, v2
 ; GFX7-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX7-NEXT:    v_sub_i32_e32 v1, vcc, 0, v2
-; GFX7-NEXT:    v_max_i32_e32 v2, v2, v1
-; GFX7-NEXT:    v_alignbit_b32 v1, v2, v0, 16
+; GFX7-NEXT:    v_alignbit_b32 v1, v2, v1, 16
 ; GFX7-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-LABEL: v_abs_v3i16:
@@ -262,47 +262,45 @@ define <4 x i16> @v_abs_v4i16(<4 x i16> %arg) {
 ; GFX6-LABEL: v_abs_v4i16:
 ; GFX6:       ; %bb.0:
 ; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX6-NEXT:    v_bfe_i32 v0, v0, 0, 16
+; GFX6-NEXT:    v_bfe_i32 v1, v1, 0, 16
+; GFX6-NEXT:    v_sub_i32_e32 v4, vcc, 0, v0
 ; GFX6-NEXT:    v_bfe_i32 v2, v2, 0, 16
+; GFX6-NEXT:    v_max_i32_e32 v0, v4, v0
+; GFX6-NEXT:    v_sub_i32_e32 v4, vcc, 0, v1
 ; GFX6-NEXT:    v_bfe_i32 v3, v3, 0, 16
+; GFX6-NEXT:    v_max_i32_e32 v1, v4, v1
 ; GFX6-NEXT:    v_sub_i32_e32 v4, vcc, 0, v2
-; GFX6-NEXT:    v_max_i32_e32 v2, v2, v4
+; GFX6-NEXT:    v_max_i32_e32 v2, v4, v2
 ; GFX6-NEXT:    v_sub_i32_e32 v4, vcc, 0, v3
-; GFX6-NEXT:    v_max_i32_e32 v3, v3, v4
-; GFX6-NEXT:    v_bfe_i32 v0, v0, 0, 16
-; GFX6-NEXT:    v_lshlrev_b32_e32 v3, 16, v3
-; GFX6-NEXT:    v_bfe_i32 v1, v1, 0, 16
-; GFX6-NEXT:    v_or_b32_e32 v2, v2, v3
-; GFX6-NEXT:    v_sub_i32_e32 v3, vcc, 0, v0
-; GFX6-NEXT:    v_max_i32_e32 v0, v0, v3
-; GFX6-NEXT:    v_sub_i32_e32 v3, vcc, 0, v1
-; GFX6-NEXT:    v_max_i32_e32 v1, v1, v3
+; GFX6-NEXT:    v_max_i32_e32 v3, v4, v3
+; GFX6-NEXT:    v_lshlrev_b32_e32 v4, 16, v3
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
+; GFX6-NEXT:    v_or_b32_e32 v2, v2, v4
 ; GFX6-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX6-NEXT:    v_alignbit_b32 v1, v2, v0, 16
-; GFX6-NEXT:    v_lshrrev_b32_e32 v3, 16, v2
+; GFX6-NEXT:    v_alignbit_b32 v1, v2, v1, 16
 ; GFX6-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX7-LABEL: v_abs_v4i16:
 ; GFX7:       ; %bb.0:
 ; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-NEXT:    v_bfe_i32 v0, v0, 0, 16
+; GFX7-NEXT:    v_bfe_i32 v1, v1, 0, 16
+; GFX7-NEXT:    v_sub_i32_e32 v4, vcc, 0, v0
 ; GFX7-NEXT:    v_bfe_i32 v2, v2, 0, 16
+; GFX7-NEXT:    v_max_i32_e32 v0, v4, v0
+; GFX7-NEXT:    v_sub_i32_e32 v4, vcc, 0, v1
 ; GFX7-NEXT:    v_bfe_i32 v3, v3, 0, 16
+; GFX7-NEXT:    v_max_i32_e32 v1, v4, v1
 ; GFX7-NEXT:    v_sub_i32_e32 v4, vcc, 0, v2
-; GFX7-NEXT:    v_max_i32_e32 v2, v2, v4
+; GFX7-NEXT:    v_max_i32_e32 v2, v4, v2
 ; GFX7-NEXT:    v_sub_i32_e32 v4, vcc, 0, v3
-; GFX7-NEXT:    v_max_i32_e32 v3, v3, v4
-; GFX7-NEXT:    v_bfe_i32 v0, v0, 0, 16
-; GFX7-NEXT:    v_lshlrev_b32_e32 v3, 16, v3
-; GFX7-NEXT:    v_bfe_i32 v1, v1, 0, 16
-; GFX7-NEXT:    v_or_b32_e32 v2, v2, v3
-; GFX7-NEXT:    v_sub_i32_e32 v3, vcc, 0, v0
-; GFX7-NEXT:    v_max_i32_e32 v0, v0, v3
-; GFX7-NEXT:    v_sub_i32_e32 v3, vcc, 0, v1
-; GFX7-NEXT:    v_max_i32_e32 v1, v1, v3
+; GFX7-NEXT:    v_max_i32_e32 v3, v4, v3
+; GFX7-NEXT:    v_lshlrev_b32_e32 v4, 16, v3
 ; GFX7-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
+; GFX7-NEXT:    v_or_b32_e32 v2, v2, v4
 ; GFX7-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX7-NEXT:    v_alignbit_b32 v1, v2, v0, 16
-; GFX7-NEXT:    v_lshrrev_b32_e32 v3, 16, v2
+; GFX7-NEXT:    v_alignbit_b32 v1, v2, v1, 16
 ; GFX7-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-LABEL: v_abs_v4i16:
@@ -370,63 +368,61 @@ define <6 x i16> @v_abs_v6i16(<6 x i16> %arg) {
 ; GFX6-LABEL: v_abs_v6i16:
 ; GFX6:       ; %bb.0:
 ; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX6-NEXT:    v_bfe_i32 v0, v0, 0, 16
+; GFX6-NEXT:    v_bfe_i32 v1, v1, 0, 16
+; GFX6-NEXT:    v_sub_i32_e32 v6, vcc, 0, v0
+; GFX6-NEXT:    v_bfe_i32 v4, v4, 0, 16
+; GFX6-NEXT:    v_max_i32_e32 v0, v6, v0
+; GFX6-NEXT:    v_sub_i32_e32 v6, vcc, 0, v1
+; GFX6-NEXT:    v_bfe_i32 v5, v5, 0, 16
+; GFX6-NEXT:    v_max_i32_e32 v1, v6, v1
+; GFX6-NEXT:    v_sub_i32_e32 v6, vcc, 0, v4
+; GFX6-NEXT:    v_max_i32_e32 v4, v6, v4
+; GFX6-NEXT:    v_sub_i32_e32 v6, vcc, 0, v5
+; GFX6-NEXT:    v_max_i32_e32 v5, v6, v5
 ; GFX6-NEXT:    v_bfe_i32 v2, v2, 0, 16
+; GFX6-NEXT:    v_lshlrev_b32_e32 v6, 16, v5
 ; GFX6-NEXT:    v_bfe_i32 v3, v3, 0, 16
+; GFX6-NEXT:    v_or_b32_e32 v4, v4, v6
 ; GFX6-NEXT:    v_sub_i32_e32 v6, vcc, 0, v2
-; GFX6-NEXT:    v_max_i32_e32 v2, v2, v6
+; GFX6-NEXT:    v_max_i32_e32 v2, v6, v2
 ; GFX6-NEXT:    v_sub_i32_e32 v6, vcc, 0, v3
-; GFX6-NEXT:    v_max_i32_e32 v3, v3, v6
-; GFX6-NEXT:    v_bfe_i32 v0, v0, 0, 16
-; GFX6-NEXT:    v_lshlrev_b32_e32 v3, 16, v3
-; GFX6-NEXT:    v_bfe_i32 v1, v1, 0, 16
-; GFX6-NEXT:    v_or_b32_e32 v2, v2, v3
-; GFX6-NEXT:    v_sub_i32_e32 v3, vcc, 0, v0
-; GFX6-NEXT:    v_max_i32_e32 v0, v0, v3
-; GFX6-NEXT:    v_sub_i32_e32 v3, vcc, 0, v1
-; GFX6-NEXT:    v_bfe_i32 v5, v5, 0, 16
-; GFX6-NEXT:    v_max_i32_e32 v1, v1, v3
-; GFX6-NEXT:    v_bfe_i32 v4, v4, 0, 16
+; GFX6-NEXT:    v_max_i32_e32 v3, v6, v3
+; GFX6-NEXT:    v_lshlrev_b32_e32 v6, 16, v3
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
-; GFX6-NEXT:    v_sub_i32_e32 v3, vcc, 0, v5
+; GFX6-NEXT:    v_or_b32_e32 v2, v2, v6
 ; GFX6-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX6-NEXT:    v_sub_i32_e32 v1, vcc, 0, v4
-; GFX6-NEXT:    v_max_i32_e32 v5, v5, v3
-; GFX6-NEXT:    v_max_i32_e32 v1, v4, v1
-; GFX6-NEXT:    v_lshlrev_b32_e32 v3, 16, v5
-; GFX6-NEXT:    v_or_b32_e32 v4, v1, v3
-; GFX6-NEXT:    v_alignbit_b32 v1, v2, v0, 16
-; GFX6-NEXT:    v_lshrrev_b32_e32 v3, 16, v2
+; GFX6-NEXT:    v_alignbit_b32 v1, v2, v1, 16
 ; GFX6-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX7-LABEL: v_abs_v6i16:
 ; GFX7:       ; %bb.0:
 ; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-NEXT:    v_bfe_i32 v0, v0, 0, 16
+; GFX7-NEXT:    v_bfe_i32 v1, v1, 0, 16
+; GFX7-NEXT:    v_sub_i32_e32 v6, vcc, 0, v0
+; GFX7-NEXT:    v_bfe_i32 v4, v4, 0, 16
+; GFX7-NEXT:    v_max_i32_e32 v0, v6, v0
+; GFX7-NEXT:    v_sub_i32_e32 v6, vcc, 0, v1
+; GFX7-NEXT:    v_bfe_i32 v5, v5, 0, 16
+; GFX7-NEXT:    v_max_i32_e32 v1, v6, v1
+; GFX7-NEXT:    v_sub_i32_e32 v6, vcc, 0, v4
+; GFX7-NEXT:    v_max_i32_e32 v4, v6, v4
+; GFX7-NEXT:    v_sub_i32_e32 v6, vcc, 0, v5
+; GFX7-NEXT:    v_max_i32_e32 v5, v6, v5
 ; GFX7-NEXT:    v_bfe_i32 v2, v2, 0, 16
+; GFX7-NEXT:    v_lshlrev_b32_e32 v6, 16, v5
 ; GFX7-NEXT:    v_bfe_i32 v3, v3, 0, 16
+; GFX7-NEXT:    v_or_b32_e32 v4, v4, v6
 ; GFX7-NEXT:    v_sub_i32_e32 v6, vcc, 0, v2
-; GFX7-NEXT:    v_max_i32_e32 v2, v2, v6
+; GFX7-NEXT:    v_max_i32_e32 v2, v6, v2
 ; GFX7-NEXT:    v_sub_i32_e32 v6, vcc, 0, v3
-; GFX7-NEXT:    v_max_i32_e32 v3, v3, v6
-; GFX7-NEXT:    v_bfe_i32 v0, v0, 0, 16
-; GFX7-NEXT:    v_lshlrev_b32_e32 v3, 16, v3
-; GFX7-NEXT:    v_bfe_i32 v1, v1, 0, 16
-; GFX7-NEXT:    v_or_b32_e32 v2, v2, v3
-; GFX7-NEXT:    v_sub_i32_e32 v3, vcc, 0, v0
-; GFX7-NEXT:    v_max_i32_e32 v0, v0, v3
-; GFX7-NEXT:    v_sub_i32_e32 v3, vcc, 0, v1
-; GFX7-NEXT:    v_bfe_i32 v5, v5, 0, 16
-; GFX7-NEXT:    v_max_i32_e32 v1, v1, v3
-; GFX7-NEXT:    v_bfe_i32 v4, v4, 0, 16
+; GFX7-NEXT:    v_max_i32_e32 v3, v6, v3
+; GFX7-NEXT:    v_lshlrev_b32_e32 v6, 16, v3
 ; GFX7-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
-; GFX7-NEXT:    v_sub_i32_e32 v3, vcc, 0, v5
+; GFX7-NEXT:    v_or_b32_e32 v2, v2, v6
 ; GFX7-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX7-NEXT:    v_sub_i32_e32 v1, vcc, 0, v4
-; GFX7-NEXT:    v_max_i32_e32 v5, v5, v3
-; GFX7-NEXT:    v_max_i32_e32 v1, v4, v1
-; GFX7-NEXT:    v_lshlrev_b32_e32 v3, 16, v5
-; GFX7-NEXT:    v_or_b32_e32 v4, v1, v3
-; GFX7-NEXT:    v_alignbit_b32 v1, v2, v0, 16
-; GFX7-NEXT:    v_lshrrev_b32_e32 v3, 16, v2
+; GFX7-NEXT:    v_alignbit_b32 v1, v2, v1, 16
 ; GFX7-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-LABEL: v_abs_v6i16:
@@ -509,83 +505,79 @@ define <8 x i16> @v_abs_v8i16(<8 x i16> %arg) {
 ; GFX6-LABEL: v_abs_v8i16:
 ; GFX6:       ; %bb.0:
 ; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX6-NEXT:    v_bfe_i32 v0, v0, 0, 16
+; GFX6-NEXT:    v_bfe_i32 v1, v1, 0, 16
+; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, 0, v0
+; GFX6-NEXT:    v_bfe_i32 v4, v4, 0, 16
+; GFX6-NEXT:    v_max_i32_e32 v0, v8, v0
+; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, 0, v1
+; GFX6-NEXT:    v_bfe_i32 v5, v5, 0, 16
+; GFX6-NEXT:    v_max_i32_e32 v1, v8, v1
+; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, 0, v4
 ; GFX6-NEXT:    v_bfe_i32 v6, v6, 0, 16
+; GFX6-NEXT:    v_max_i32_e32 v4, v8, v4
+; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, 0, v5
 ; GFX6-NEXT:    v_bfe_i32 v7, v7, 0, 16
+; GFX6-NEXT:    v_max_i32_e32 v5, v8, v5
 ; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, 0, v6
-; GFX6-NEXT:    v_max_i32_e32 v6, v6, v8
+; GFX6-NEXT:    v_max_i32_e32 v6, v8, v6
 ; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, 0, v7
-; GFX6-NEXT:    v_max_i32_e32 v7, v7, v8
-; GFX6-NEXT:    v_bfe_i32 v4, v4, 0, 16
-; GFX6-NEXT:    v_lshlrev_b32_e32 v7, 16, v7
-; GFX6-NEXT:    v_bfe_i32 v5, v5, 0, 16
-; GFX6-NEXT:    v_or_b32_e32 v6, v6, v7
-; GFX6-NEXT:    v_sub_i32_e32 v7, vcc, 0, v4
-; GFX6-NEXT:    v_max_i32_e32 v4, v4, v7
-; GFX6-NEXT:    v_sub_i32_e32 v7, vcc, 0, v5
-; GFX6-NEXT:    v_max_i32_e32 v5, v5, v7
+; GFX6-NEXT:    v_max_i32_e32 v7, v8, v7
 ; GFX6-NEXT:    v_bfe_i32 v2, v2, 0, 16
-; GFX6-NEXT:    v_lshlrev_b32_e32 v5, 16, v5
+; GFX6-NEXT:    v_lshlrev_b32_e32 v8, 16, v7
 ; GFX6-NEXT:    v_bfe_i32 v3, v3, 0, 16
-; GFX6-NEXT:    v_or_b32_e32 v4, v4, v5
-; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, 0, v2
-; GFX6-NEXT:    v_max_i32_e32 v2, v2, v5
-; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, 0, v3
-; GFX6-NEXT:    v_max_i32_e32 v3, v3, v5
-; GFX6-NEXT:    v_bfe_i32 v0, v0, 0, 16
-; GFX6-NEXT:    v_lshlrev_b32_e32 v3, 16, v3
-; GFX6-NEXT:    v_bfe_i32 v1, v1, 0, 16
-; GFX6-NEXT:    v_or_b32_e32 v2, v2, v3
-; GFX6-NEXT:    v_sub_i32_e32 v3, vcc, 0, v0
-; GFX6-NEXT:    v_max_i32_e32 v0, v0, v3
-; GFX6-NEXT:    v_sub_i32_e32 v3, vcc, 0, v1
-; GFX6-NEXT:    v_max_i32_e32 v1, v1, v3
+; GFX6-NEXT:    v_or_b32_e32 v6, v6, v8
+; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, 0, v2
+; GFX6-NEXT:    v_max_i32_e32 v2, v8, v2
+; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, 0, v3
+; GFX6-NEXT:    v_max_i32_e32 v3, v8, v3
+; GFX6-NEXT:    v_lshlrev_b32_e32 v8, 16, v3
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
+; GFX6-NEXT:    v_lshlrev_b32_e32 v5, 16, v5
+; GFX6-NEXT:    v_or_b32_e32 v2, v2, v8
 ; GFX6-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX6-NEXT:    v_alignbit_b32 v1, v2, v0, 16
-; GFX6-NEXT:    v_alignbit_b32 v5, v6, v4, 16
-; GFX6-NEXT:    v_lshrrev_b32_e32 v3, 16, v2
-; GFX6-NEXT:    v_lshrrev_b32_e32 v7, 16, v6
+; GFX6-NEXT:    v_or_b32_e32 v4, v4, v5
+; GFX6-NEXT:    v_alignbit_b32 v1, v2, v1, 16
+; GFX6-NEXT:    v_alignbit_b32 v5, v6, v5, 16
 ; GFX6-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX7-LABEL: v_abs_v8i16:
 ; GFX7:       ; %bb.0:
 ; GFX7-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-NEXT:    v_bfe_i32 v0, v0, 0, 16
+; GFX7-NEXT:    v_bfe_i32 v1, v1, 0, 16
+; GFX7-NEXT:    v_sub_i32_e32 v8, vcc, 0, v0
+; GFX7-NEXT:    v_bfe_i32 v4, v4, 0, 16
+; GFX7-NEXT:    v_max_i32_e32 v0, v8, v0
+; GFX7-NEXT:    v_sub_i32_e32 v8, vcc, 0, v1
+; GFX7-NEXT:    v_bfe_i32 v5, v5, 0, 16
+; GFX7-NEXT:    v_max_i32_e32 v1, v8, v1
+; GFX7-NEXT:    v_sub_i32_e32 v8, vcc, 0, v4
 ; GFX7-NEXT:    v_bfe_i32 v6, v6, 0, 16
+; GFX7-NEXT:    v_max_i32_e32 v4, v8, v4
+; GFX7-NEXT:    v_sub_i32_e32 v8, vcc, 0, v5
 ; GFX7-NEXT:    v_bfe_i32 v7, v7, 0, 16
+; GFX7-NEXT:    v_max_i32_e32 v5, v8, v5
 ; GFX7-NEXT:    v_sub_i32_e32 v8, vcc, 0, v6
-; GFX7-NEXT:    v_max_i32_e32 v6, v6, v8
+; GFX7-NEXT:    v_max_i32_e32 v6, v8, v6
 ; GFX7-NEXT:    v_sub_i32_e32 v8, vcc, 0, v7
-; GFX7-NEXT:    v_max_i32_e32 v7, v7, v8
-; GFX7-NEXT:    v_bfe_i32 v4, v4, 0, 16
-; GFX7-NEXT:    v_lshlrev_b32_e32 v7, 16, v7
-; GFX7-NEXT:    v_bfe_i32 v5, v5, 0, 16
-; GFX7-NEXT:    v_or_b32_e32 v6, v6, v7
-; GFX7-NEXT:    v_sub_i32_e32 v7, vcc, 0, v4
-; GFX7-NEXT:    v_max_i32_e32 v4, v4, v7
-; GFX7-NEXT:    v_sub_i32_e32 v7, vcc, 0, v5
-; GFX7-NEXT:    v_max_i32_e32 v5, v5, v7
+; GFX7-NEXT:    v_max_i32_e32 v7, v8, v7
 ; GFX7-NEXT:    v_bfe_i32 v2, v2, 0, 16
-; GFX7-NEXT:    v_lshlrev_b32_e32 v5, 16, v5
+; GFX7-NEXT:    v_lshlrev_b32_e32 v8, 16, v7
 ; GFX7-NEXT:    v_bfe_i32 v3, v3, 0, 16
-; GFX7-NEXT:    v_or_b32_e32 v4, v4, v5
-; GFX7-NEXT:    v_sub_i32_e32 v5, vcc, 0, v2
-; GFX7-NEXT:    v_max_i32_e32 v2, v2, v5
-; GFX7-NEXT:    v_sub_i32_e32 v5, vcc, 0, v3
-; GFX7-NEXT:    v_max_i32_e32 v3, v3, v5
-; GFX7-NEXT:    v_bfe_i32 v0, v0, 0, 16
-; GFX7-NEXT:    v_lshlrev_b32_e32 v3, 16, v3
-; GFX7-NEXT:    v_bfe_i32 v1, v1, 0, 16
-; GFX7-NEXT:    v_or_b32_e32 v2, v2, v3
-; GFX7-NEXT:    v_sub_i32_e32 v3, vcc, 0, v0
-; GFX7-NEXT:    v_max_i32_e32 v0, v0, v3
-; GFX7-NEXT:    v_sub_i32_e32 v3, vcc, 0, v1
-; GFX7-NEXT:    v_max_i32_e32 v1, v1, v3
+; GFX7-NEXT:    v_or_b32_e32 v6, v6, v8
+; GFX7-NEXT:    v_sub_i32_e32 v8, vcc, 0, v2
+; GFX7-NEXT:    v_max_i32_e32 v2, v8, v2
+; GFX7-NEXT:    v_sub_i32_e32 v8, vcc, 0, v3
+; GFX7-NEXT:    v_max_i32_e32 v3, v8, v3
+; GFX7-NEXT:    v_lshlrev_b32_e32 v8, 16, v3
 ; GFX7-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
+; GFX7-NEXT:    v_lshlrev_b32_e32 v5, 16, v5
+; GFX7-NEXT:    v_or_b32_e32 v2, v2, v8
 ; GFX7-NEXT:    v_or_b32_e32 v0, v0, v1
-; GFX7-NEXT:    v_alignbit_b32 v1, v2, v0, 16
-; GFX7-NEXT:    v_alignbit_b32 v5, v6, v4, 16
-; GFX7-NEXT:    v_lshrrev_b32_e32 v3, 16, v2
-; GFX7-NEXT:    v_lshrrev_b32_e32 v7, 16, v6
+; GFX7-NEXT:    v_or_b32_e32 v4, v4, v5
+; GFX7-NEXT:    v_alignbit_b32 v1, v2, v1, 16
+; GFX7-NEXT:    v_alignbit_b32 v5, v6, v5, 16
 ; GFX7-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-LABEL: v_abs_v8i16:
@@ -682,155 +674,147 @@ define <16 x i16> @v_abs_v16i16(<16 x i16> %arg) {
 ; GFX6-LABEL: v_abs_v16i16:
 ; GFX6:       ; %bb.0:
 ; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX6-NEXT:    v_bfe_i32 v0, v0, 0, 16
+; GFX6-NEXT:    v_bfe_i32 v1, v1, 0, 16
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0, v0
+; GFX6-NEXT:    v_bfe_i32 v4, v4, 0, 16
+; GFX6-NEXT:    v_max_i32_e32 v0, v16, v0
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0, v1
+; GFX6-NEXT:    v_bfe_i32 v5, v5, 0, 16
+; GFX6-NEXT:    v_max_i32_e32 v1, v16, v1
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0, v4
+; GFX6-NEXT:    v_bfe_i32 v8, v8, 0, 16
+; GFX6-NEXT:    v_max_i32_e32 v4, v16, v4
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0, v5
+; GFX6-NEXT:    v_bfe_i32 v9, v9, 0, 16
+; GFX6-NEXT:    v_max_i32_e32 v5, v16, v5
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0, v8
+; GFX6-NEXT:    v_bfe_i32 v12, v12, 0, 16
+; GFX6-NEXT:    v_max_i32_e32 v8, v16, v8
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0, v9
+; GFX6-NEXT:    v_bfe_i32 v13, v13, 0, 16
+; GFX6-NEXT:    v_max_i32_e32 v9, v16, v9
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0, v12
 ; GFX6-NEXT:    v_bfe_i32 v14, v14, 0, 16
+; GFX6-NEXT:    v_max_i32_e32 v12, v16, v12
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0, v13
 ; GFX6-NEXT:    v_bfe_i32 v15, v15, 0, 16
+; GFX6-NEXT:    v_max_i32_e32 v13, v16, v13
 ; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0, v14
-; GFX6-NEXT:    v_max_i32_e32 v14, v14, v16
+; GFX6-NEXT:    v_max_i32_e32 v14, v16, v14
 ; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0, v15
-; GFX6-NEXT:    v_max_i32_e32 v15, v15, v16
-; GFX6-NEXT:    v_bfe_i32 v12, v12, 0, 16
-; GFX6-NEXT:    v_lshlrev_b32_e32 v15, 16, v15
-; GFX6-NEXT:    v_bfe_i32 v13, v13, 0, 16
-; GFX6-NEXT:    v_or_b32_e32 v14, v14, v15
-; GFX6-NEXT:    v_sub_i32_e32 v15, vcc, 0, v12
-; GFX6-NEXT:    v_max_i32_e32 v12, v12, v15
-; GFX6-NEXT:    v_sub_i32_e32 v15, vcc, 0, v13
-; GFX6-NEXT:    v_max_i32_e32 v13, v13, v15
+; GFX6-NEXT:    v_max_i32_e32 v15, v16, v15
 ; GFX6-NEXT:    v_bfe_i32 v10, v10, 0, 16
-; GFX6-NEXT:    v_lshlrev_b32_e32 v13, 16, v13
+; GFX6-NEXT:    v_lshlrev_b32_e32 v16, 16, v15
 ; GFX6-NEXT:    v_bfe_i32 v11, v11, 0, 16
-; ...
[truncated]

}
}

// max(x, neg(x)) -> abs(x)
Contributor

Can we do this in the generic part?

Contributor Author

Can we do this in the generic part?

All ISD::SMAX nodes are sent to this function from the switch in PerformDAGCombine. I don't see where else to put it. This function is intended to process min or max nodes so it makes sense to put it here.

Contributor

I don't see any AMDGPU-specific ISD here, so why can't this be done in the generic combine part?

Contributor Author

Removed smax transformation from this PR. Will do it in a separate generic PR.
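For reference, the fold that was dropped from this PR, max(x, neg(x)) -> abs(x), only involves generic node kinds, which is why the reviewer suggested the target-independent combiner. The sketch below is a toy C++ model of that match over a hypothetical mini-DAG (not LLVM's SelectionDAG or its SDPatternMatch API; all names are made up for illustration). It requires the negated operand to be the very same node, mirroring what m_Deferred enforces, and it tries both operand orders since smax is commutative.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical mini-DAG for illustration only.
enum class Op { Leaf, Constant, Sub, SMax, Abs };

struct Node;
using NodeRef = std::shared_ptr<const Node>;

struct Node {
  Op op;
  int64_t value = 0;              // constant value, used only by Op::Constant
  std::vector<NodeRef> operands;
};

NodeRef leaf() { return std::make_shared<Node>(Node{Op::Leaf, 0, {}}); }
NodeRef constant(int64_t v) { return std::make_shared<Node>(Node{Op::Constant, v, {}}); }
NodeRef binop(Op op, NodeRef a, NodeRef b) {
  return std::make_shared<Node>(Node{op, 0, {a, b}});
}

// True if n is (sub 0, x), i.e. neg(x), for this exact node x.
bool isNegOf(const NodeRef& n, const NodeRef& x) {
  return n->op == Op::Sub && n->operands[0]->op == Op::Constant &&
         n->operands[0]->value == 0 && n->operands[1] == x;
}

// smax(x, neg(x)) or smax(neg(x), x) -> abs(x); anything else is returned as-is.
NodeRef combineSMaxToAbs(const NodeRef& n) {
  if (n->op != Op::SMax) return n;
  const NodeRef& a = n->operands[0];
  const NodeRef& b = n->operands[1];
  if (isNegOf(b, a)) return std::make_shared<Node>(Node{Op::Abs, 0, {a}});
  if (isNegOf(a, b)) return std::make_shared<Node>(Node{Op::Abs, 0, {b}});
  return n;
}
```

Nothing in the matcher depends on the target, which is the substance of the review comment; only the legality of ABS for the result type would.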

@jayfoad
Contributor

jayfoad commented Oct 17, 2025

Interesting. We don't have a single VALU instruction for ABS, but we do have an isel pattern that expands it to V_SUB_U32+V_MAX_I32, so I guess it's OK to say that ABS is legal. Are there any significant improvements or regressions in the lit tests?
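A minimal C++ model of what that V_SUB + V_MAX_I32 expansion computes may make the semantics concrete. This is plain C++, not LLVM code, and the helper names are hypothetical. It assumes 32-bit two's-complement wraparound, so abs(INT32_MIN) stays INT32_MIN, which is consistent with ISD::ABS returning INT_MIN for INT_MIN rather than saturating.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: 0 - x with 32-bit wraparound (the V_SUB step).
// Unsigned arithmetic avoids signed-overflow UB when x == INT32_MIN; the
// uint32_t -> int32_t conversion is modular in C++20 and on mainstream
// compilers before that.
inline int32_t negWrap32(int32_t x) {
  return static_cast<int32_t>(0u - static_cast<uint32_t>(x));
}

// Hypothetical model of the divergent ABS expansion: smax(0 - x, x).
inline int32_t absViaSubMax(int32_t x) {
  const int32_t negX = negWrap32(x);  // V_SUB: 0 - x
  return std::max(negX, x);           // V_MAX_I32: signed maximum
}
```

For example, absViaSubMax(-5) yields 5, and absViaSubMax(INT32_MIN) yields INT32_MIN because the negation wraps back to the same value.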

@LU-JOHN
Contributor Author

LU-JOHN commented Oct 17, 2025

Interesting. We don't have a single VALU instruction for ABS, but we do have an isel pattern that expands it to V_SUB_U32+V_MAX_I32, so I guess it's OK to say that ABS is legal. Are there any significant improvements or regressions in the lit tests?

In amdgpu-codegenprepare-idiv.ll, the expansion for sdiv used to generate:

; GFX9-NEXT:    s_ashr_i32 s4, s3, 31
; GFX9-NEXT:    s_add_i32 s3, s3, s4
; GFX9-NEXT:    s_xor_b32 s3, s3, s4

Now it generates:

; GFX9-NEXT: s_abs_i32 s4, s3

In abs_i16.ll, v_abs_v*i16 are implemented without v_lshrrev_b32_e32, accounting for 32 fewer check lines.
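The removed s_ashr_i32 / s_add_i32 / s_xor_b32 sequence is the usual branch-free absolute-value idiom. A small C++ model (hypothetical helper name, unsigned arithmetic for well-defined wraparound) suggests why a single s_abs_i32 can stand in for it: the idiom computes |x| modulo 2^32, with INT32_MIN mapping to itself, assuming s_abs_i32 wraps the same way for INT32_MIN.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the old three-instruction expansion:
//   s_ashr_i32 s4, s3, 31  ; mask = x >> 31 (arithmetic): 0 or all ones
//   s_add_i32  s3, s3, s4  ; x + mask
//   s_xor_b32  s3, s3, s4  ; (x + mask) ^ mask
inline int32_t absViaAshrAddXor(int32_t x) {
  const uint32_t ux = static_cast<uint32_t>(x);
  // Emulate the arithmetic shift by 31 without relying on signed >> behavior.
  const uint32_t mask = 0u - (ux >> 31);
  return static_cast<int32_t>((ux + mask) ^ mask);
}
```

When x is negative, mask is all ones, so (x - 1) ^ ~0 equals -x; when x is non-negative, mask is 0 and x passes through unchanged.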

This PR is in preparation for generating absdiff.

@jayfoad
Contributor

jayfoad commented Oct 17, 2025

Interesting. We don't have a single VALU instruction for ABS, but we do have an isel pattern that expands it to V_SUB_U32+V_MAX_I32, so I guess it's OK to say that ABS is legal. Are there any significant improvements or regressions in the lit tests?

In amdgpu-codegenprepare-idiv.ll, the expansion for sdiv used to generate:

; GFX9-NEXT:    s_ashr_i32 s4, s3, 31
; GFX9-NEXT:    s_add_i32 s3, s3, s4
; GFX9-NEXT:    s_xor_b32 s3, s3, s4

Now it generates:

; GFX9-NEXT: s_abs_i32 s4, s3

In abs_i16.ll, v_abs_v*i16 are implemented without v_lshrrev_b32_e32, accounting for 32 fewer check lines.

OK, that's good. But another way to implement that would be to add another pattern similar to this one, that matches the ashr/add/xor version:

def : GCNPat <
  (i32 (UniformBinFrag<smax> i32:$x, (i32 (ineg i32:$x)))),
  (S_ABS_I32 SReg_32:$x)
>;
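The alternative suggested here would recognize the branch-free idiom itself during instruction selection. As a rough illustration of the shape such a pattern has to match, here is a toy C++ recognizer over a hypothetical mini-DAG (not TableGen and not an LLVM API; all names are made up). It looks for (xor (add x, m), m) where m is (sra x, 31), trying both operand orders of the commutative add and xor.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical mini-DAG for illustration only.
enum class Opc { Leaf, Const, Add, Sra, Xor };

struct Node {
  Opc opc;
  int64_t imm = 0;                               // used only by Opc::Const
  std::vector<std::shared_ptr<const Node>> ops;  // operands
};
using Ref = std::shared_ptr<const Node>;

Ref mk(Opc opc, std::vector<Ref> ops = {}, int64_t imm = 0) {
  return std::make_shared<Node>(Node{opc, imm, std::move(ops)});
}

// Is n == (sra X, 31)? On success, store X in *x.
bool isSignMask(const Ref& n, Ref* x) {
  if (n->opc != Opc::Sra || n->ops[1]->opc != Opc::Const || n->ops[1]->imm != 31)
    return false;
  *x = n->ops[0];
  return true;
}

// Match (xor (add X, M), M) with M == (sra X, 31); returns X or nullptr.
Ref matchAbsIdiom(const Ref& n) {
  if (n->opc != Opc::Xor) return nullptr;
  for (int i = 0; i < 2; ++i) {
    const Ref& add = n->ops[i];
    const Ref& mask = n->ops[1 - i];
    Ref x;
    if (add->opc != Opc::Add || !isSignMask(mask, &x)) continue;
    // The add must combine X with the same mask node, in either order.
    if ((add->ops[0] == x && add->ops[1] == mask) ||
        (add->ops[1] == x && add->ops[0] == mask))
      return x;
  }
  return nullptr;
}
```

Requiring both uses of m to be the same node is reasonable in a CSE'd DAG, where identical (sra x, 31) nodes are shared.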

@jayfoad
Contributor

jayfoad commented Oct 17, 2025

I have no objection to the current patch. I just wanted to explore the alternatives.

@LU-JOHN LU-JOHN merged commit f7c9618 into llvm:main Oct 17, 2025
10 checks passed
Contributor

@arsenm arsenm left a comment

These test changes seem incidental? Missing test that shows a targeted change?

LU-JOHN added a commit that referenced this pull request Oct 19, 2025
Fix bug introduced in #163907.
32-bit abs is not legal on R600.

---------

Signed-off-by: John Lu <[email protected]>
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 19, 2025
Fix bug introduced in llvm/llvm-project#163907.
32-bit abs is not legal on R600.

---------

Signed-off-by: John Lu <[email protected]>
@LU-JOHN
Contributor Author

LU-JOHN commented Oct 23, 2025

These test changes seem incidental? Missing test that shows a targeted change?

Making ABS legal facilitates generating s_absdiff (see #164835).
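An absolute difference is naturally expressed as abs(sub a, b), which is easier to match once ABS survives legalization. A minimal C++ model of that value follows, assuming 32-bit wraparound for both the subtraction and the absolute value; the helper name is hypothetical, and the exact s_absdiff_i32 semantics should be checked against the ISA documentation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model of abs(sub a, b) with 32-bit wraparound at each step.
inline int32_t absDiffModel(int32_t a, int32_t b) {
  // sub a, b computed modulo 2^32 to avoid signed-overflow UB.
  const int32_t diff =
      static_cast<int32_t>(static_cast<uint32_t>(a) - static_cast<uint32_t>(b));
  // abs(diff) as smax(0 - diff, diff), again with wraparound.
  const int32_t negDiff = static_cast<int32_t>(0u - static_cast<uint32_t>(diff));
  return std::max(negDiff, diff);
}
```

Note that when a - b overflows, the result is not the mathematical |a - b|: for example, absDiffModel(INT32_MAX, -1) wraps to INT32_MIN.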
