-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[AMDGPU] 32-bit ABS is a legal DAG node #163907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: John Lu <[email protected]>
|
@llvm/pr-subscribers-backend-amdgpu Author: None (LU-JOHN) Changes32-bit ABS can be lowered legally. Patch is 108.10 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/163907.diff 10 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
index 1b559a628be08..8ed4062e43946 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -514,8 +514,8 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,
MVT::i64, Custom);
setOperationAction(ISD::SELECT_CC, MVT::i64, Expand);
- setOperationAction({ISD::SMIN, ISD::UMIN, ISD::SMAX, ISD::UMAX}, MVT::i32,
- Legal);
+ setOperationAction({ISD::ABS, ISD::SMIN, ISD::UMIN, ISD::SMAX, ISD::UMAX},
+ MVT::i32, Legal);
setOperationAction(
{ISD::CTTZ, ISD::CTTZ_ZERO_UNDEF, ISD::CTLZ, ISD::CTLZ_ZERO_UNDEF},
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a2841c114a698..8ded201f03055 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -14945,6 +14945,13 @@ SDValue SITargetLowering::performMinMaxCombine(SDNode *N,
}
}
+ // max(x, neg(x)) -> abs(x)
+ if (Opc == ISD::SMAX && VT == MVT::i32) {
+ SDValue Value;
+ if (sd_match(N, m_SMax(m_Value(Value), m_Neg(m_Deferred(Value)))))
+ return DAG.getNode(ISD::ABS, SDLoc(N), VT, Value);
+ }
+
// min(max(x, K0), K1), K0 < K1 -> med3(x, K0, K1)
// max(min(x, K0), K1), K1 < K0 -> med3(x, K1, K0)
if (Opc == ISD::SMIN && Op0.getOpcode() == ISD::SMAX && Op0.hasOneUse()) {
diff --git a/llvm/test/CodeGen/AMDGPU/abs_i16.ll b/llvm/test/CodeGen/AMDGPU/abs_i16.ll
index 7633ba0eb4f9c..66cc7f3db03c2 100644
--- a/llvm/test/CodeGen/AMDGPU/abs_i16.ll
+++ b/llvm/test/CodeGen/AMDGPU/abs_i16.ll
@@ -15,7 +15,7 @@ define i16 @abs_i16(i16 %arg) {
; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX6-NEXT: v_bfe_i32 v0, v0, 0, 16
; GFX6-NEXT: v_sub_i32_e32 v1, vcc, 0, v0
-; GFX6-NEXT: v_max_i32_e32 v0, v0, v1
+; GFX6-NEXT: v_max_i32_e32 v0, v1, v0
; GFX6-NEXT: s_setpc_b64 s[30:31]
;
; GFX7-LABEL: abs_i16:
@@ -23,7 +23,7 @@ define i16 @abs_i16(i16 %arg) {
; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX7-NEXT: v_bfe_i32 v0, v0, 0, 16
; GFX7-NEXT: v_sub_i32_e32 v1, vcc, 0, v0
-; GFX7-NEXT: v_max_i32_e32 v0, v0, v1
+; GFX7-NEXT: v_max_i32_e32 v0, v1, v0
; GFX7-NEXT: s_setpc_b64 s[30:31]
;
; GFX8-LABEL: abs_i16:
@@ -97,9 +97,9 @@ define <2 x i16> @v_abs_v2i16(<2 x i16> %arg) {
; GFX6-NEXT: v_bfe_i32 v0, v0, 0, 16
; GFX6-NEXT: v_bfe_i32 v1, v1, 0, 16
; GFX6-NEXT: v_sub_i32_e32 v2, vcc, 0, v0
-; GFX6-NEXT: v_max_i32_e32 v0, v0, v2
+; GFX6-NEXT: v_max_i32_e32 v0, v2, v0
; GFX6-NEXT: v_sub_i32_e32 v2, vcc, 0, v1
-; GFX6-NEXT: v_max_i32_e32 v1, v1, v2
+; GFX6-NEXT: v_max_i32_e32 v1, v2, v1
; GFX6-NEXT: v_lshlrev_b32_e32 v2, 16, v1
; GFX6-NEXT: v_or_b32_e32 v0, v0, v2
; GFX6-NEXT: s_setpc_b64 s[30:31]
@@ -110,9 +110,9 @@ define <2 x i16> @v_abs_v2i16(<2 x i16> %arg) {
; GFX7-NEXT: v_bfe_i32 v0, v0, 0, 16
; GFX7-NEXT: v_bfe_i32 v1, v1, 0, 16
; GFX7-NEXT: v_sub_i32_e32 v2, vcc, 0, v0
-; GFX7-NEXT: v_max_i32_e32 v0, v0, v2
+; GFX7-NEXT: v_max_i32_e32 v0, v2, v0
; GFX7-NEXT: v_sub_i32_e32 v2, vcc, 0, v1
-; GFX7-NEXT: v_max_i32_e32 v1, v1, v2
+; GFX7-NEXT: v_max_i32_e32 v1, v2, v1
; GFX7-NEXT: v_lshlrev_b32_e32 v2, 16, v1
; GFX7-NEXT: v_or_b32_e32 v0, v0, v2
; GFX7-NEXT: s_setpc_b64 s[30:31]
@@ -172,15 +172,15 @@ define <3 x i16> @v_abs_v3i16(<3 x i16> %arg) {
; GFX6-NEXT: v_bfe_i32 v0, v0, 0, 16
; GFX6-NEXT: v_bfe_i32 v1, v1, 0, 16
; GFX6-NEXT: v_sub_i32_e32 v3, vcc, 0, v0
-; GFX6-NEXT: v_max_i32_e32 v0, v0, v3
-; GFX6-NEXT: v_sub_i32_e32 v3, vcc, 0, v1
-; GFX6-NEXT: v_max_i32_e32 v1, v1, v3
; GFX6-NEXT: v_bfe_i32 v2, v2, 0, 16
+; GFX6-NEXT: v_max_i32_e32 v0, v3, v0
+; GFX6-NEXT: v_sub_i32_e32 v3, vcc, 0, v1
+; GFX6-NEXT: v_max_i32_e32 v1, v3, v1
+; GFX6-NEXT: v_sub_i32_e32 v3, vcc, 0, v2
; GFX6-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX6-NEXT: v_max_i32_e32 v2, v3, v2
; GFX6-NEXT: v_or_b32_e32 v0, v0, v1
-; GFX6-NEXT: v_sub_i32_e32 v1, vcc, 0, v2
-; GFX6-NEXT: v_max_i32_e32 v2, v2, v1
-; GFX6-NEXT: v_alignbit_b32 v1, v2, v0, 16
+; GFX6-NEXT: v_alignbit_b32 v1, v2, v1, 16
; GFX6-NEXT: s_setpc_b64 s[30:31]
;
; GFX7-LABEL: v_abs_v3i16:
@@ -189,15 +189,15 @@ define <3 x i16> @v_abs_v3i16(<3 x i16> %arg) {
; GFX7-NEXT: v_bfe_i32 v0, v0, 0, 16
; GFX7-NEXT: v_bfe_i32 v1, v1, 0, 16
; GFX7-NEXT: v_sub_i32_e32 v3, vcc, 0, v0
-; GFX7-NEXT: v_max_i32_e32 v0, v0, v3
-; GFX7-NEXT: v_sub_i32_e32 v3, vcc, 0, v1
-; GFX7-NEXT: v_max_i32_e32 v1, v1, v3
; GFX7-NEXT: v_bfe_i32 v2, v2, 0, 16
+; GFX7-NEXT: v_max_i32_e32 v0, v3, v0
+; GFX7-NEXT: v_sub_i32_e32 v3, vcc, 0, v1
+; GFX7-NEXT: v_max_i32_e32 v1, v3, v1
+; GFX7-NEXT: v_sub_i32_e32 v3, vcc, 0, v2
; GFX7-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX7-NEXT: v_max_i32_e32 v2, v3, v2
; GFX7-NEXT: v_or_b32_e32 v0, v0, v1
-; GFX7-NEXT: v_sub_i32_e32 v1, vcc, 0, v2
-; GFX7-NEXT: v_max_i32_e32 v2, v2, v1
-; GFX7-NEXT: v_alignbit_b32 v1, v2, v0, 16
+; GFX7-NEXT: v_alignbit_b32 v1, v2, v1, 16
; GFX7-NEXT: s_setpc_b64 s[30:31]
;
; GFX8-LABEL: v_abs_v3i16:
@@ -262,47 +262,45 @@ define <4 x i16> @v_abs_v4i16(<4 x i16> %arg) {
; GFX6-LABEL: v_abs_v4i16:
; GFX6: ; %bb.0:
; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX6-NEXT: v_bfe_i32 v0, v0, 0, 16
+; GFX6-NEXT: v_bfe_i32 v1, v1, 0, 16
+; GFX6-NEXT: v_sub_i32_e32 v4, vcc, 0, v0
; GFX6-NEXT: v_bfe_i32 v2, v2, 0, 16
+; GFX6-NEXT: v_max_i32_e32 v0, v4, v0
+; GFX6-NEXT: v_sub_i32_e32 v4, vcc, 0, v1
; GFX6-NEXT: v_bfe_i32 v3, v3, 0, 16
+; GFX6-NEXT: v_max_i32_e32 v1, v4, v1
; GFX6-NEXT: v_sub_i32_e32 v4, vcc, 0, v2
-; GFX6-NEXT: v_max_i32_e32 v2, v2, v4
+; GFX6-NEXT: v_max_i32_e32 v2, v4, v2
; GFX6-NEXT: v_sub_i32_e32 v4, vcc, 0, v3
-; GFX6-NEXT: v_max_i32_e32 v3, v3, v4
-; GFX6-NEXT: v_bfe_i32 v0, v0, 0, 16
-; GFX6-NEXT: v_lshlrev_b32_e32 v3, 16, v3
-; GFX6-NEXT: v_bfe_i32 v1, v1, 0, 16
-; GFX6-NEXT: v_or_b32_e32 v2, v2, v3
-; GFX6-NEXT: v_sub_i32_e32 v3, vcc, 0, v0
-; GFX6-NEXT: v_max_i32_e32 v0, v0, v3
-; GFX6-NEXT: v_sub_i32_e32 v3, vcc, 0, v1
-; GFX6-NEXT: v_max_i32_e32 v1, v1, v3
+; GFX6-NEXT: v_max_i32_e32 v3, v4, v3
+; GFX6-NEXT: v_lshlrev_b32_e32 v4, 16, v3
; GFX6-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX6-NEXT: v_or_b32_e32 v2, v2, v4
; GFX6-NEXT: v_or_b32_e32 v0, v0, v1
-; GFX6-NEXT: v_alignbit_b32 v1, v2, v0, 16
-; GFX6-NEXT: v_lshrrev_b32_e32 v3, 16, v2
+; GFX6-NEXT: v_alignbit_b32 v1, v2, v1, 16
; GFX6-NEXT: s_setpc_b64 s[30:31]
;
; GFX7-LABEL: v_abs_v4i16:
; GFX7: ; %bb.0:
; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-NEXT: v_bfe_i32 v0, v0, 0, 16
+; GFX7-NEXT: v_bfe_i32 v1, v1, 0, 16
+; GFX7-NEXT: v_sub_i32_e32 v4, vcc, 0, v0
; GFX7-NEXT: v_bfe_i32 v2, v2, 0, 16
+; GFX7-NEXT: v_max_i32_e32 v0, v4, v0
+; GFX7-NEXT: v_sub_i32_e32 v4, vcc, 0, v1
; GFX7-NEXT: v_bfe_i32 v3, v3, 0, 16
+; GFX7-NEXT: v_max_i32_e32 v1, v4, v1
; GFX7-NEXT: v_sub_i32_e32 v4, vcc, 0, v2
-; GFX7-NEXT: v_max_i32_e32 v2, v2, v4
+; GFX7-NEXT: v_max_i32_e32 v2, v4, v2
; GFX7-NEXT: v_sub_i32_e32 v4, vcc, 0, v3
-; GFX7-NEXT: v_max_i32_e32 v3, v3, v4
-; GFX7-NEXT: v_bfe_i32 v0, v0, 0, 16
-; GFX7-NEXT: v_lshlrev_b32_e32 v3, 16, v3
-; GFX7-NEXT: v_bfe_i32 v1, v1, 0, 16
-; GFX7-NEXT: v_or_b32_e32 v2, v2, v3
-; GFX7-NEXT: v_sub_i32_e32 v3, vcc, 0, v0
-; GFX7-NEXT: v_max_i32_e32 v0, v0, v3
-; GFX7-NEXT: v_sub_i32_e32 v3, vcc, 0, v1
-; GFX7-NEXT: v_max_i32_e32 v1, v1, v3
+; GFX7-NEXT: v_max_i32_e32 v3, v4, v3
+; GFX7-NEXT: v_lshlrev_b32_e32 v4, 16, v3
; GFX7-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX7-NEXT: v_or_b32_e32 v2, v2, v4
; GFX7-NEXT: v_or_b32_e32 v0, v0, v1
-; GFX7-NEXT: v_alignbit_b32 v1, v2, v0, 16
-; GFX7-NEXT: v_lshrrev_b32_e32 v3, 16, v2
+; GFX7-NEXT: v_alignbit_b32 v1, v2, v1, 16
; GFX7-NEXT: s_setpc_b64 s[30:31]
;
; GFX8-LABEL: v_abs_v4i16:
@@ -370,63 +368,61 @@ define <6 x i16> @v_abs_v6i16(<6 x i16> %arg) {
; GFX6-LABEL: v_abs_v6i16:
; GFX6: ; %bb.0:
; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX6-NEXT: v_bfe_i32 v0, v0, 0, 16
+; GFX6-NEXT: v_bfe_i32 v1, v1, 0, 16
+; GFX6-NEXT: v_sub_i32_e32 v6, vcc, 0, v0
+; GFX6-NEXT: v_bfe_i32 v4, v4, 0, 16
+; GFX6-NEXT: v_max_i32_e32 v0, v6, v0
+; GFX6-NEXT: v_sub_i32_e32 v6, vcc, 0, v1
+; GFX6-NEXT: v_bfe_i32 v5, v5, 0, 16
+; GFX6-NEXT: v_max_i32_e32 v1, v6, v1
+; GFX6-NEXT: v_sub_i32_e32 v6, vcc, 0, v4
+; GFX6-NEXT: v_max_i32_e32 v4, v6, v4
+; GFX6-NEXT: v_sub_i32_e32 v6, vcc, 0, v5
+; GFX6-NEXT: v_max_i32_e32 v5, v6, v5
; GFX6-NEXT: v_bfe_i32 v2, v2, 0, 16
+; GFX6-NEXT: v_lshlrev_b32_e32 v6, 16, v5
; GFX6-NEXT: v_bfe_i32 v3, v3, 0, 16
+; GFX6-NEXT: v_or_b32_e32 v4, v4, v6
; GFX6-NEXT: v_sub_i32_e32 v6, vcc, 0, v2
-; GFX6-NEXT: v_max_i32_e32 v2, v2, v6
+; GFX6-NEXT: v_max_i32_e32 v2, v6, v2
; GFX6-NEXT: v_sub_i32_e32 v6, vcc, 0, v3
-; GFX6-NEXT: v_max_i32_e32 v3, v3, v6
-; GFX6-NEXT: v_bfe_i32 v0, v0, 0, 16
-; GFX6-NEXT: v_lshlrev_b32_e32 v3, 16, v3
-; GFX6-NEXT: v_bfe_i32 v1, v1, 0, 16
-; GFX6-NEXT: v_or_b32_e32 v2, v2, v3
-; GFX6-NEXT: v_sub_i32_e32 v3, vcc, 0, v0
-; GFX6-NEXT: v_max_i32_e32 v0, v0, v3
-; GFX6-NEXT: v_sub_i32_e32 v3, vcc, 0, v1
-; GFX6-NEXT: v_bfe_i32 v5, v5, 0, 16
-; GFX6-NEXT: v_max_i32_e32 v1, v1, v3
-; GFX6-NEXT: v_bfe_i32 v4, v4, 0, 16
+; GFX6-NEXT: v_max_i32_e32 v3, v6, v3
+; GFX6-NEXT: v_lshlrev_b32_e32 v6, 16, v3
; GFX6-NEXT: v_lshlrev_b32_e32 v1, 16, v1
-; GFX6-NEXT: v_sub_i32_e32 v3, vcc, 0, v5
+; GFX6-NEXT: v_or_b32_e32 v2, v2, v6
; GFX6-NEXT: v_or_b32_e32 v0, v0, v1
-; GFX6-NEXT: v_sub_i32_e32 v1, vcc, 0, v4
-; GFX6-NEXT: v_max_i32_e32 v5, v5, v3
-; GFX6-NEXT: v_max_i32_e32 v1, v4, v1
-; GFX6-NEXT: v_lshlrev_b32_e32 v3, 16, v5
-; GFX6-NEXT: v_or_b32_e32 v4, v1, v3
-; GFX6-NEXT: v_alignbit_b32 v1, v2, v0, 16
-; GFX6-NEXT: v_lshrrev_b32_e32 v3, 16, v2
+; GFX6-NEXT: v_alignbit_b32 v1, v2, v1, 16
; GFX6-NEXT: s_setpc_b64 s[30:31]
;
; GFX7-LABEL: v_abs_v6i16:
; GFX7: ; %bb.0:
; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-NEXT: v_bfe_i32 v0, v0, 0, 16
+; GFX7-NEXT: v_bfe_i32 v1, v1, 0, 16
+; GFX7-NEXT: v_sub_i32_e32 v6, vcc, 0, v0
+; GFX7-NEXT: v_bfe_i32 v4, v4, 0, 16
+; GFX7-NEXT: v_max_i32_e32 v0, v6, v0
+; GFX7-NEXT: v_sub_i32_e32 v6, vcc, 0, v1
+; GFX7-NEXT: v_bfe_i32 v5, v5, 0, 16
+; GFX7-NEXT: v_max_i32_e32 v1, v6, v1
+; GFX7-NEXT: v_sub_i32_e32 v6, vcc, 0, v4
+; GFX7-NEXT: v_max_i32_e32 v4, v6, v4
+; GFX7-NEXT: v_sub_i32_e32 v6, vcc, 0, v5
+; GFX7-NEXT: v_max_i32_e32 v5, v6, v5
; GFX7-NEXT: v_bfe_i32 v2, v2, 0, 16
+; GFX7-NEXT: v_lshlrev_b32_e32 v6, 16, v5
; GFX7-NEXT: v_bfe_i32 v3, v3, 0, 16
+; GFX7-NEXT: v_or_b32_e32 v4, v4, v6
; GFX7-NEXT: v_sub_i32_e32 v6, vcc, 0, v2
-; GFX7-NEXT: v_max_i32_e32 v2, v2, v6
+; GFX7-NEXT: v_max_i32_e32 v2, v6, v2
; GFX7-NEXT: v_sub_i32_e32 v6, vcc, 0, v3
-; GFX7-NEXT: v_max_i32_e32 v3, v3, v6
-; GFX7-NEXT: v_bfe_i32 v0, v0, 0, 16
-; GFX7-NEXT: v_lshlrev_b32_e32 v3, 16, v3
-; GFX7-NEXT: v_bfe_i32 v1, v1, 0, 16
-; GFX7-NEXT: v_or_b32_e32 v2, v2, v3
-; GFX7-NEXT: v_sub_i32_e32 v3, vcc, 0, v0
-; GFX7-NEXT: v_max_i32_e32 v0, v0, v3
-; GFX7-NEXT: v_sub_i32_e32 v3, vcc, 0, v1
-; GFX7-NEXT: v_bfe_i32 v5, v5, 0, 16
-; GFX7-NEXT: v_max_i32_e32 v1, v1, v3
-; GFX7-NEXT: v_bfe_i32 v4, v4, 0, 16
+; GFX7-NEXT: v_max_i32_e32 v3, v6, v3
+; GFX7-NEXT: v_lshlrev_b32_e32 v6, 16, v3
; GFX7-NEXT: v_lshlrev_b32_e32 v1, 16, v1
-; GFX7-NEXT: v_sub_i32_e32 v3, vcc, 0, v5
+; GFX7-NEXT: v_or_b32_e32 v2, v2, v6
; GFX7-NEXT: v_or_b32_e32 v0, v0, v1
-; GFX7-NEXT: v_sub_i32_e32 v1, vcc, 0, v4
-; GFX7-NEXT: v_max_i32_e32 v5, v5, v3
-; GFX7-NEXT: v_max_i32_e32 v1, v4, v1
-; GFX7-NEXT: v_lshlrev_b32_e32 v3, 16, v5
-; GFX7-NEXT: v_or_b32_e32 v4, v1, v3
-; GFX7-NEXT: v_alignbit_b32 v1, v2, v0, 16
-; GFX7-NEXT: v_lshrrev_b32_e32 v3, 16, v2
+; GFX7-NEXT: v_alignbit_b32 v1, v2, v1, 16
; GFX7-NEXT: s_setpc_b64 s[30:31]
;
; GFX8-LABEL: v_abs_v6i16:
@@ -509,83 +505,79 @@ define <8 x i16> @v_abs_v8i16(<8 x i16> %arg) {
; GFX6-LABEL: v_abs_v8i16:
; GFX6: ; %bb.0:
; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX6-NEXT: v_bfe_i32 v0, v0, 0, 16
+; GFX6-NEXT: v_bfe_i32 v1, v1, 0, 16
+; GFX6-NEXT: v_sub_i32_e32 v8, vcc, 0, v0
+; GFX6-NEXT: v_bfe_i32 v4, v4, 0, 16
+; GFX6-NEXT: v_max_i32_e32 v0, v8, v0
+; GFX6-NEXT: v_sub_i32_e32 v8, vcc, 0, v1
+; GFX6-NEXT: v_bfe_i32 v5, v5, 0, 16
+; GFX6-NEXT: v_max_i32_e32 v1, v8, v1
+; GFX6-NEXT: v_sub_i32_e32 v8, vcc, 0, v4
; GFX6-NEXT: v_bfe_i32 v6, v6, 0, 16
+; GFX6-NEXT: v_max_i32_e32 v4, v8, v4
+; GFX6-NEXT: v_sub_i32_e32 v8, vcc, 0, v5
; GFX6-NEXT: v_bfe_i32 v7, v7, 0, 16
+; GFX6-NEXT: v_max_i32_e32 v5, v8, v5
; GFX6-NEXT: v_sub_i32_e32 v8, vcc, 0, v6
-; GFX6-NEXT: v_max_i32_e32 v6, v6, v8
+; GFX6-NEXT: v_max_i32_e32 v6, v8, v6
; GFX6-NEXT: v_sub_i32_e32 v8, vcc, 0, v7
-; GFX6-NEXT: v_max_i32_e32 v7, v7, v8
-; GFX6-NEXT: v_bfe_i32 v4, v4, 0, 16
-; GFX6-NEXT: v_lshlrev_b32_e32 v7, 16, v7
-; GFX6-NEXT: v_bfe_i32 v5, v5, 0, 16
-; GFX6-NEXT: v_or_b32_e32 v6, v6, v7
-; GFX6-NEXT: v_sub_i32_e32 v7, vcc, 0, v4
-; GFX6-NEXT: v_max_i32_e32 v4, v4, v7
-; GFX6-NEXT: v_sub_i32_e32 v7, vcc, 0, v5
-; GFX6-NEXT: v_max_i32_e32 v5, v5, v7
+; GFX6-NEXT: v_max_i32_e32 v7, v8, v7
; GFX6-NEXT: v_bfe_i32 v2, v2, 0, 16
-; GFX6-NEXT: v_lshlrev_b32_e32 v5, 16, v5
+; GFX6-NEXT: v_lshlrev_b32_e32 v8, 16, v7
; GFX6-NEXT: v_bfe_i32 v3, v3, 0, 16
-; GFX6-NEXT: v_or_b32_e32 v4, v4, v5
-; GFX6-NEXT: v_sub_i32_e32 v5, vcc, 0, v2
-; GFX6-NEXT: v_max_i32_e32 v2, v2, v5
-; GFX6-NEXT: v_sub_i32_e32 v5, vcc, 0, v3
-; GFX6-NEXT: v_max_i32_e32 v3, v3, v5
-; GFX6-NEXT: v_bfe_i32 v0, v0, 0, 16
-; GFX6-NEXT: v_lshlrev_b32_e32 v3, 16, v3
-; GFX6-NEXT: v_bfe_i32 v1, v1, 0, 16
-; GFX6-NEXT: v_or_b32_e32 v2, v2, v3
-; GFX6-NEXT: v_sub_i32_e32 v3, vcc, 0, v0
-; GFX6-NEXT: v_max_i32_e32 v0, v0, v3
-; GFX6-NEXT: v_sub_i32_e32 v3, vcc, 0, v1
-; GFX6-NEXT: v_max_i32_e32 v1, v1, v3
+; GFX6-NEXT: v_or_b32_e32 v6, v6, v8
+; GFX6-NEXT: v_sub_i32_e32 v8, vcc, 0, v2
+; GFX6-NEXT: v_max_i32_e32 v2, v8, v2
+; GFX6-NEXT: v_sub_i32_e32 v8, vcc, 0, v3
+; GFX6-NEXT: v_max_i32_e32 v3, v8, v3
+; GFX6-NEXT: v_lshlrev_b32_e32 v8, 16, v3
; GFX6-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX6-NEXT: v_lshlrev_b32_e32 v5, 16, v5
+; GFX6-NEXT: v_or_b32_e32 v2, v2, v8
; GFX6-NEXT: v_or_b32_e32 v0, v0, v1
-; GFX6-NEXT: v_alignbit_b32 v1, v2, v0, 16
-; GFX6-NEXT: v_alignbit_b32 v5, v6, v4, 16
-; GFX6-NEXT: v_lshrrev_b32_e32 v3, 16, v2
-; GFX6-NEXT: v_lshrrev_b32_e32 v7, 16, v6
+; GFX6-NEXT: v_or_b32_e32 v4, v4, v5
+; GFX6-NEXT: v_alignbit_b32 v1, v2, v1, 16
+; GFX6-NEXT: v_alignbit_b32 v5, v6, v5, 16
; GFX6-NEXT: s_setpc_b64 s[30:31]
;
; GFX7-LABEL: v_abs_v8i16:
; GFX7: ; %bb.0:
; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-NEXT: v_bfe_i32 v0, v0, 0, 16
+; GFX7-NEXT: v_bfe_i32 v1, v1, 0, 16
+; GFX7-NEXT: v_sub_i32_e32 v8, vcc, 0, v0
+; GFX7-NEXT: v_bfe_i32 v4, v4, 0, 16
+; GFX7-NEXT: v_max_i32_e32 v0, v8, v0
+; GFX7-NEXT: v_sub_i32_e32 v8, vcc, 0, v1
+; GFX7-NEXT: v_bfe_i32 v5, v5, 0, 16
+; GFX7-NEXT: v_max_i32_e32 v1, v8, v1
+; GFX7-NEXT: v_sub_i32_e32 v8, vcc, 0, v4
; GFX7-NEXT: v_bfe_i32 v6, v6, 0, 16
+; GFX7-NEXT: v_max_i32_e32 v4, v8, v4
+; GFX7-NEXT: v_sub_i32_e32 v8, vcc, 0, v5
; GFX7-NEXT: v_bfe_i32 v7, v7, 0, 16
+; GFX7-NEXT: v_max_i32_e32 v5, v8, v5
; GFX7-NEXT: v_sub_i32_e32 v8, vcc, 0, v6
-; GFX7-NEXT: v_max_i32_e32 v6, v6, v8
+; GFX7-NEXT: v_max_i32_e32 v6, v8, v6
; GFX7-NEXT: v_sub_i32_e32 v8, vcc, 0, v7
-; GFX7-NEXT: v_max_i32_e32 v7, v7, v8
-; GFX7-NEXT: v_bfe_i32 v4, v4, 0, 16
-; GFX7-NEXT: v_lshlrev_b32_e32 v7, 16, v7
-; GFX7-NEXT: v_bfe_i32 v5, v5, 0, 16
-; GFX7-NEXT: v_or_b32_e32 v6, v6, v7
-; GFX7-NEXT: v_sub_i32_e32 v7, vcc, 0, v4
-; GFX7-NEXT: v_max_i32_e32 v4, v4, v7
-; GFX7-NEXT: v_sub_i32_e32 v7, vcc, 0, v5
-; GFX7-NEXT: v_max_i32_e32 v5, v5, v7
+; GFX7-NEXT: v_max_i32_e32 v7, v8, v7
; GFX7-NEXT: v_bfe_i32 v2, v2, 0, 16
-; GFX7-NEXT: v_lshlrev_b32_e32 v5, 16, v5
+; GFX7-NEXT: v_lshlrev_b32_e32 v8, 16, v7
; GFX7-NEXT: v_bfe_i32 v3, v3, 0, 16
-; GFX7-NEXT: v_or_b32_e32 v4, v4, v5
-; GFX7-NEXT: v_sub_i32_e32 v5, vcc, 0, v2
-; GFX7-NEXT: v_max_i32_e32 v2, v2, v5
-; GFX7-NEXT: v_sub_i32_e32 v5, vcc, 0, v3
-; GFX7-NEXT: v_max_i32_e32 v3, v3, v5
-; GFX7-NEXT: v_bfe_i32 v0, v0, 0, 16
-; GFX7-NEXT: v_lshlrev_b32_e32 v3, 16, v3
-; GFX7-NEXT: v_bfe_i32 v1, v1, 0, 16
-; GFX7-NEXT: v_or_b32_e32 v2, v2, v3
-; GFX7-NEXT: v_sub_i32_e32 v3, vcc, 0, v0
-; GFX7-NEXT: v_max_i32_e32 v0, v0, v3
-; GFX7-NEXT: v_sub_i32_e32 v3, vcc, 0, v1
-; GFX7-NEXT: v_max_i32_e32 v1, v1, v3
+; GFX7-NEXT: v_or_b32_e32 v6, v6, v8
+; GFX7-NEXT: v_sub_i32_e32 v8, vcc, 0, v2
+; GFX7-NEXT: v_max_i32_e32 v2, v8, v2
+; GFX7-NEXT: v_sub_i32_e32 v8, vcc, 0, v3
+; GFX7-NEXT: v_max_i32_e32 v3, v8, v3
+; GFX7-NEXT: v_lshlrev_b32_e32 v8, 16, v3
; GFX7-NEXT: v_lshlrev_b32_e32 v1, 16, v1
+; GFX7-NEXT: v_lshlrev_b32_e32 v5, 16, v5
+; GFX7-NEXT: v_or_b32_e32 v2, v2, v8
; GFX7-NEXT: v_or_b32_e32 v0, v0, v1
-; GFX7-NEXT: v_alignbit_b32 v1, v2, v0, 16
-; GFX7-NEXT: v_alignbit_b32 v5, v6, v4, 16
-; GFX7-NEXT: v_lshrrev_b32_e32 v3, 16, v2
-; GFX7-NEXT: v_lshrrev_b32_e32 v7, 16, v6
+; GFX7-NEXT: v_or_b32_e32 v4, v4, v5
+; GFX7-NEXT: v_alignbit_b32 v1, v2, v1, 16
+; GFX7-NEXT: v_alignbit_b32 v5, v6, v5, 16
; GFX7-NEXT: s_setpc_b64 s[30:31]
;
; GFX8-LABEL: v_abs_v8i16:
@@ -682,155 +674,147 @@ define <16 x i16> @v_abs_v16i16(<16 x i16> %arg) {
; GFX6-LABEL: v_abs_v16i16:
; GFX6: ; %bb.0:
; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX6-NEXT: v_bfe_i32 v0, v0, 0, 16
+; GFX6-NEXT: v_bfe_i32 v1, v1, 0, 16
+; GFX6-NEXT: v_sub_i32_e32 v16, vcc, 0, v0
+; GFX6-NEXT: v_bfe_i32 v4, v4, 0, 16
+; GFX6-NEXT: v_max_i32_e32 v0, v16, v0
+; GFX6-NEXT: v_sub_i32_e32 v16, vcc, 0, v1
+; GFX6-NEXT: v_bfe_i32 v5, v5, 0, 16
+; GFX6-NEXT: v_max_i32_e32 v1, v16, v1
+; GFX6-NEXT: v_sub_i32_e32 v16, vcc, 0, v4
+; GFX6-NEXT: v_bfe_i32 v8, v8, 0, 16
+; GFX6-NEXT: v_max_i32_e32 v4, v16, v4
+; GFX6-NEXT: v_sub_i32_e32 v16, vcc, 0, v5
+; GFX6-NEXT: v_bfe_i32 v9, v9, 0, 16
+; GFX6-NEXT: v_max_i32_e32 v5, v16, v5
+; GFX6-NEXT: v_sub_i32_e32 v16, vcc, 0, v8
+; GFX6-NEXT: v_bfe_i32 v12, v12, 0, 16
+; GFX6-NEXT: v_max_i32_e32 v8, v16, v8
+; GFX6-NEXT: v_sub_i32_e32 v16, vcc, 0, v9
+; GFX6-NEXT: v_bfe_i32 v13, v13, 0, 16
+; GFX6-NEXT: v_max_i32_e32 v9, v16, v9
+; GFX6-NEXT: v_sub_i32_e32 v16, vcc, 0, v12
; GFX6-NEXT: v_bfe_i32 v14, v14, 0, 16
+; GFX6-NEXT: v_max_i32_e32 v12, v16, v12
+; GFX6-NEXT: v_sub_i32_e32 v16, vcc, 0, v13
; GFX6-NEXT: v_bfe_i32 v15, v15, 0, 16
+; GFX6-NEXT: v_max_i32_e32 v13, v16, v13
; GFX6-NEXT: v_sub_i32_e32 v16, vcc, 0, v14
-; GFX6-NEXT: v_max_i32_e32 v14, v14, v16
+; GFX6-NEXT: v_max_i32_e32 v14, v16, v14
; GFX6-NEXT: v_sub_i32_e32 v16, vcc, 0, v15
-; GFX6-NEXT: v_max_i32_e32 v15, v15, v16
-; GFX6-NEXT: v_bfe_i32 v12, v12, 0, 16
-; GFX6-NEXT: v_lshlrev_b32_e32 v15, 16, v15
-; GFX6-NEXT: v_bfe_i32 v13, v13, 0, 16
-; GFX6-NEXT: v_or_b32_e32 v14, v14, v15
-; GFX6-NEXT: v_sub_i32_e32 v15, vcc, 0, v12
-; GFX6-NEXT: v_max_i32_e32 v12, v12, v15
-; GFX6-NEXT: v_sub_i32_e32 v15, vcc, 0, v13
-; GFX6-NEXT: v_max_i32_e32 v13, v13, v15
+; GFX6-NEXT: v_max_i32_e32 v15, v16, v15
; GFX6-NEXT: v_bfe_i32 v10, v10, 0, 16
-; GFX6-NEXT: v_lshlrev_b32_e32 v13, 16, v13
+; GFX6-NEXT: v_lshlrev_b32_e32 v16, 16, v15
; GFX6-NEXT: v_bfe_i32 v11, v11, 0, 16
-; ...
[truncated]
|
| } | ||
| } | ||
|
|
||
| // max(x, neg(x)) -> abs(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this in the generic part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this in the generic part?
All ISD::SMAX nodes are sent to this function from the switch in PerformDAGCombine. I don't see where else to put it. This function is intended to process min or max nodes so it makes sense to put it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any AMDGPU specific ISD here so why can't this be done in the generic combine part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed smax transformation from this PR. Will do it in a separate generic PR.
Signed-off-by: John Lu <[email protected]>
|
Interesting. We don't have single VALU instruction for ABS, but we do have an isel pattern that expands it to V_SUB_U32+V_MAX_I32, so I guess it's OK to say that ABS is legal. Are there any significant improvements or regressions in the lit tests? |
In amdgpu-codegenprepare-idiv.ll, the expansion for sdiv used to generate: Now it generates:
In abs_i16.ll, v_abs_v*i16 are implemented without v_lshrrev_b32_e32, accounting for 32 fewer check lines. |
This PR is in preparation for generating absdiff. |
OK, that's good. But another way to implement that would be to add another pattern similar to this one, that matches the ashr/add/xor version: |
|
I have no objection to the current patch. I just wanted to explore the alternatives. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These test changes seem incidental? Missing test that shows a targeted change?
Fix bug introduced in #163907. 32-bit abs is not legal on R600. --------- Signed-off-by: John Lu <[email protected]>
Fix bug introduced in llvm/llvm-project#163907. 32-bit abs is not legal on R600. --------- Signed-off-by: John Lu <[email protected]>
Making ABS legal facilitates generation of s_absdiff #164835. |
32-bit ABS can be lowered legally.