[AMDGPU][SDAG] Support source modifiers on select integer operands #147325
Conversation
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-backend-amdgpu
Author: Chris Jackson (chrisjbris)
Changes: Extend the DAGCombine() for select to directly support fneg and fabs for i32, v2i32 and i64. Precursor PR to simplify #140694.
Full diff: https://github.com/llvm/llvm-project/pull/147325.diff (2 files affected):
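In other words, the combine targets integer selects whose chosen operand only flips or clears the float sign bit. A minimal before/after sketch in LLVM IR (illustrative only; it mirrors the tests added below and is not code from the patch):

  ; Before: the xor touches only the sign bit of %a.
  %neg.a = xor i32 %a, u0x80000000
  %select = select i1 %cmp, i32 %neg.a, i32 %b

  ; After (conceptually): the xor becomes an fneg, which the backend can
  ; fold into a source modifier on the select operand.
  %a.f = bitcast i32 %a to float
  %b.f = bitcast i32 %b to float
  %neg.f = fneg float %a.f
  %sel.f = select i1 %cmp, float %neg.f, float %b.f
  %select = bitcast float %sel.f to i32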
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
index e32accaba85fc..bec86877e1705 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -4830,6 +4830,64 @@ AMDGPUTargetLowering::foldFreeOpFromSelect(TargetLowering::DAGCombinerInfo &DCI,
return SDValue();
}
+static EVT IntToFloatVT(EVT VT) {
+  return VT.isVector()
+             ? MVT::getVectorVT(
+                   MVT::getFloatingPointVT(VT.getScalarSizeInBits()),
+                   VT.getVectorNumElements())
+             : MVT::getFloatingPointVT(VT.getFixedSizeInBits());
+}
+
+static SDValue BitwiseToSrcModifierOp(SDValue N,
+ TargetLowering::DAGCombinerInfo &DCI) {
+
+  unsigned Opc = N.getNode()->getOpcode();
+  if (Opc != ISD::AND && Opc != ISD::OR && Opc != ISD::XOR)
+    return SDValue();
+
+ SelectionDAG &DAG = DCI.DAG;
+ SDValue LHS = N.getNode()->getOperand(0);
+ SDValue RHS = N.getNode()->getOperand(1);
+ ConstantSDNode *CRHS = isConstOrConstSplat(RHS);
+
+ if (!CRHS)
+ return SDValue();
+
+ EVT VT = RHS.getValueType();
+
+ assert((VT == MVT::i32 || VT == MVT::v2i32 || VT == MVT::i64) &&
+ "Expected i32, v2i32 or i64 value type.");
+
+  uint64_t Mask = 0;
+  if (VT.isVector()) {
+    // isConstOrConstSplat succeeded above, so the splat value must be a
+    // constant; cast<> asserts that instead of risking a null dereference.
+    SDValue Splat = DAG.getSplatValue(RHS);
+    Mask = cast<ConstantSDNode>(Splat)->getZExtValue();
+  } else {
+    Mask = CRHS->getZExtValue();
+  }
+
+ EVT FVT = IntToFloatVT(VT);
+ SDValue BC = DAG.getNode(ISD::BITCAST, SDLoc(N), FVT, LHS);
+
+  // The sign bit for the value's scalar width. Comparing against this keeps
+  // a 32-bit mask from being misread as an f64 modifier, and vice versa.
+  const uint64_t SignBit = 1ULL << (VT.getScalarSizeInBits() - 1);
+
+  switch (Opc) {
+  case ISD::XOR:
+    // xor with the sign bit flips the sign: fneg.
+    if (Mask == SignBit)
+      return DAG.getNode(ISD::FNEG, SDLoc(N), FVT, BC);
+    return SDValue();
+  case ISD::OR:
+    // or with the sign bit forces the sign negative: fneg(fabs(x)).
+    if (Mask == SignBit) {
+      SDValue Abs = DAG.getNode(ISD::FABS, SDLoc(N), FVT, BC);
+      return DAG.getNode(ISD::FNEG, SDLoc(N), FVT, Abs);
+    }
+    return SDValue();
+  case ISD::AND:
+    // and that clears only the sign bit: fabs.
+    if (Mask == SignBit - 1)
+      return DAG.getNode(ISD::FABS, SDLoc(N), FVT, BC);
+    return SDValue();
+  default:
+    return SDValue();
+  }
+}
+
SDValue AMDGPUTargetLowering::performSelectCombine(SDNode *N,
DAGCombinerInfo &DCI) const {
if (SDValue Folded = foldFreeOpFromSelect(DCI, SDValue(N, 0)))
@@ -4864,12 +4922,25 @@ SDValue AMDGPUTargetLowering::performSelectCombine(SDNode *N,
}
if (VT == MVT::f32 && Subtarget->hasFminFmaxLegacy()) {
- SDValue MinMax
- = combineFMinMaxLegacy(SDLoc(N), VT, LHS, RHS, True, False, CC, DCI);
+ SDValue MinMax =
+ combineFMinMaxLegacy(SDLoc(N), VT, LHS, RHS, True, False, CC, DCI);
// Revisit this node so we can catch min3/max3/med3 patterns.
- //DCI.AddToWorklist(MinMax.getNode());
+ // DCI.AddToWorklist(MinMax.getNode());
return MinMax;
}
+
+ // Support source modifiers as integer.
+ if (VT == MVT::i32 || VT == MVT::v2i32 || VT == MVT::i64) {
+ SDLoc SL(N);
+ SDValue LHS = N->getOperand(1);
+ SDValue RHS = N->getOperand(2);
+    if (SDValue SrcMod = BitwiseToSrcModifierOp(LHS, DCI)) {
+      // Select in the floating-point type so the modifier survives, then
+      // cast the result back to the original integer type.
+      EVT FVT = SrcMod.getValueType();
+      SDValue FRHS = DAG.getNode(ISD::BITCAST, SL, FVT, RHS);
+      SDValue FSelect = DAG.getNode(ISD::SELECT, SL, FVT, Cond, SrcMod, FRHS);
+      return DAG.getNode(ISD::BITCAST, SL, VT, FSelect);
+    }
+ }
}
// There's no reason to not do this if the condition has other uses.
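For reference, the masks used above are the standard IEEE-754 sign-bit identities. A standalone C++ check of the f32 cases (an illustration using C++20's std::bit_cast, not part of the patch):

  #include <bit>
  #include <cassert>
  #include <cstdint>

  int main() {
    float f = -3.5f;
    uint32_t bits = std::bit_cast<uint32_t>(f);
    // Flipping the sign bit is fneg.
    assert(std::bit_cast<float>(bits ^ 0x80000000u) == 3.5f);
    // Clearing the sign bit is fabs.
    assert(std::bit_cast<float>(bits & 0x7fffffffu) == 3.5f);
    // Setting the sign bit is fneg(fabs(x)).
    assert(std::bit_cast<float>(bits | 0x80000000u) == -3.5f);
    return 0;
  }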
diff --git a/llvm/test/CodeGen/AMDGPU/integer-select-source-modifiers.ll b/llvm/test/CodeGen/AMDGPU/integer-select-source-modifiers.ll
new file mode 100644
index 0000000000000..2db20e672c303
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/integer-select-source-modifiers.ll
@@ -0,0 +1,222 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 < %s | FileCheck -check-prefixes=GCN,GFX7 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -check-prefixes=GCN,GFX9 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -mattr=+real-true16 < %s | FileCheck -check-prefixes=GFX11,GFX11-TRUE16 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 -mattr=-real-true16 < %s | FileCheck -check-prefixes=GFX11,GFX11-FAKE16 %s
+
+define i32 @fneg_select_i32(i32 %cond, i32 %a, i32 %b) {
+; GCN-LABEL: fneg_select_i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
+; GCN-NEXT: v_cndmask_b32_e64 v0, v2, -v1, vcc
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fneg_select_i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX11-NEXT: v_cndmask_b32_e64 v0, v2, -v1, vcc_lo
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %neg.a = xor i32 %a, u0x80000000
+ %cmp = icmp eq i32 %cond, zeroinitializer
+ %select = select i1 %cmp, i32 %neg.a, i32 %b
+ ret i32 %select
+}
+
+define <2 x i32> @fneg_select_v2i32(<2 x i32> %cond, <2 x i32> %a, <2 x i32> %b) {
+; GCN-LABEL: fneg_select_v2i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
+; GCN-NEXT: v_cndmask_b32_e64 v0, v4, -v2, vcc
+; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v1
+; GCN-NEXT: v_cndmask_b32_e64 v1, v5, -v3, vcc
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fneg_select_v2i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX11-NEXT: v_cndmask_b32_e64 v0, v4, -v2, vcc_lo
+; GFX11-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v1
+; GFX11-NEXT: v_cndmask_b32_e64 v1, v5, -v3, vcc_lo
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %neg.a = xor <2 x i32> %a, splat (i32 u0x80000000)
+ %cmp = icmp eq <2 x i32> %cond, zeroinitializer
+ %select = select <2 x i1> %cmp, <2 x i32> %neg.a, <2 x i32> %b
+ ret <2 x i32> %select
+}
+
+define i32 @fabs_select_i32(i32 %cond, i32 %a, i32 %b) {
+; GCN-LABEL: fabs_select_i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
+; GCN-NEXT: v_cndmask_b32_e64 v0, v2, |v1|, vcc
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fabs_select_i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX11-NEXT: v_cndmask_b32_e64 v0, v2, |v1|, vcc_lo
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %neg.a = and i32 %a, u0x7fffffff
+ %cmp = icmp eq i32 %cond, zeroinitializer
+ %select = select i1 %cmp, i32 %neg.a, i32 %b
+ ret i32 %select
+}
+
+define <2 x i32> @fabs_select_v2i32(<2 x i32> %cond, <2 x i32> %a, <2 x i32> %b) {
+; GCN-LABEL: fabs_select_v2i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
+; GCN-NEXT: v_cndmask_b32_e64 v0, v4, |v2|, vcc
+; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v1
+; GCN-NEXT: v_cndmask_b32_e64 v1, v5, |v3|, vcc
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fabs_select_v2i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX11-NEXT: v_cndmask_b32_e64 v0, v4, |v2|, vcc_lo
+; GFX11-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v1
+; GFX11-NEXT: v_cndmask_b32_e64 v1, v5, |v3|, vcc_lo
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %neg.a = and <2 x i32> %a, splat (i32 u0x7fffffff)
+ %cmp = icmp eq <2 x i32> %cond, zeroinitializer
+ %select = select <2 x i1> %cmp, <2 x i32> %neg.a, <2 x i32> %b
+ ret <2 x i32> %select
+}
+
+define i32 @fneg_fabs_select_i32(i32 %cond, i32 %a, i32 %b) {
+; GCN-LABEL: fneg_fabs_select_i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_or_b32_e32 v1, 0x80000000, v1
+; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
+; GCN-NEXT: v_cndmask_b32_e32 v0, v2, v1, vcc
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fneg_fabs_select_i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: v_or_b32_e32 v1, 0x80000000, v1
+; GFX11-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v2, v1, vcc_lo
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %neg.a = or i32 %a, u0x80000000
+ %cmp = icmp eq i32 %cond, zeroinitializer
+ %select = select i1 %cmp, i32 %neg.a, i32 %b
+ ret i32 %select
+}
+
+define <2 x i32> @fneg_fabs_select_v2i32(<2 x i32> %cond, <2 x i32> %a, <2 x i32> %b) {
+; GCN-LABEL: fneg_fabs_select_v2i32:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_or_b32_e32 v2, 0x80000000, v2
+; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
+; GCN-NEXT: v_or_b32_e32 v3, 0x80000000, v3
+; GCN-NEXT: v_cndmask_b32_e32 v0, v4, v2, vcc
+; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v1
+; GCN-NEXT: v_cndmask_b32_e32 v1, v5, v3, vcc
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fneg_fabs_select_v2i32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: v_or_b32_e32 v2, 0x80000000, v2
+; GFX11-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
+; GFX11-NEXT: v_or_b32_e32 v3, 0x80000000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_3)
+; GFX11-NEXT: v_cndmask_b32_e32 v0, v4, v2, vcc_lo
+; GFX11-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v1
+; GFX11-NEXT: v_cndmask_b32_e32 v1, v5, v3, vcc_lo
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %neg.a = or <2 x i32> %a, splat (i32 u0x80000000)
+ %cmp = icmp eq <2 x i32> %cond, zeroinitializer
+ %select = select <2 x i1> %cmp, <2 x i32> %neg.a, <2 x i32> %b
+ ret <2 x i32> %select
+}
+
+define i64 @fneg_select_i64(i64 %cond, i64 %a, i64 %b) {
+; GCN-LABEL: fneg_select_i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_eq_u64_e32 vcc, 0, v[0:1]
+; GCN-NEXT: v_xor_b32_e32 v3, 0x80000000, v3
+; GCN-NEXT: v_cndmask_b32_e32 v0, v4, v2, vcc
+; GCN-NEXT: v_cndmask_b32_e32 v1, v5, v3, vcc
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fneg_select_i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: v_cmp_eq_u64_e32 vcc_lo, 0, v[0:1]
+; GFX11-NEXT: v_xor_b32_e32 v1, 0x80000000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_dual_cndmask_b32 v0, v4, v2 :: v_dual_cndmask_b32 v1, v5, v1
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %neg.a = xor i64 %a, u0x8000000000000000
+ %cmp = icmp eq i64 %cond, zeroinitializer
+ %select = select i1 %cmp, i64 %neg.a, i64 %b
+ ret i64 %select
+}
+
+define i64 @fabs_select_i64(i64 %cond, i64 %a, i64 %b) {
+; GCN-LABEL: fabs_select_i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_eq_u64_e32 vcc, 0, v[0:1]
+; GCN-NEXT: v_and_b32_e32 v3, 0x7fffffff, v3
+; GCN-NEXT: v_cndmask_b32_e32 v0, v4, v2, vcc
+; GCN-NEXT: v_cndmask_b32_e32 v1, v5, v3, vcc
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fabs_select_i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: v_cmp_eq_u64_e32 vcc_lo, 0, v[0:1]
+; GFX11-NEXT: v_dual_cndmask_b32 v0, v4, v2 :: v_dual_and_b32 v1, 0x7fffffff, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_cndmask_b32_e32 v1, v5, v1, vcc_lo
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %neg.a = and i64 %a, u0x7fffffffffffffff
+ %cmp = icmp eq i64 %cond, zeroinitializer
+ %select = select i1 %cmp, i64 %neg.a, i64 %b
+ ret i64 %select
+}
+
+define i64 @fneg_fabs_select_i64(i64 %cond, i64 %a, i64 %b) {
+; GCN-LABEL: fneg_fabs_select_i64:
+; GCN: ; %bb.0:
+; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GCN-NEXT: v_cmp_eq_u64_e32 vcc, 0, v[0:1]
+; GCN-NEXT: v_or_b32_e32 v3, 0x80000000, v3
+; GCN-NEXT: v_cndmask_b32_e32 v0, v4, v2, vcc
+; GCN-NEXT: v_cndmask_b32_e32 v1, v5, v3, vcc
+; GCN-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX11-LABEL: fneg_fabs_select_i64:
+; GFX11: ; %bb.0:
+; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT: v_cmp_eq_u64_e32 vcc_lo, 0, v[0:1]
+; GFX11-NEXT: v_or_b32_e32 v1, 0x80000000, v3
+; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-NEXT: v_dual_cndmask_b32 v0, v4, v2 :: v_dual_cndmask_b32 v1, v5, v1
+; GFX11-NEXT: s_setpc_b64 s[30:31]
+ %neg.a = or i64 %a, u0x8000000000000000
+ %cmp = icmp eq i64 %cond, zeroinitializer
+ %select = select i1 %cmp, i64 %neg.a, i64 %b
+ ret i64 %select
+}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; GFX11-FAKE16: {{.*}}
+; GFX11-TRUE16: {{.*}}
+; GFX7: {{.*}}
+; GFX9: {{.*}}
There are two ways to go about this. Option 1 is what you've done here, where you hack up the integer operations to use the FP types. Option 2 would be to start directly selecting the modifiers through the integer patterns. There are pluses and minuses to each, and the main deciding factor would be how this interferes with other combines.
The examples here don't have any other context to make a judgement on. We also now have SimplifyDemandedBits through the FP sign-bit operators, so maybe it matters less now.
Thanks. However, I'm still struggling with i64: the DAG produces the correct operations, but ISel then lowers to bitwise or/xor/and after the i64 has been moved to v2i32.
✅ With the latest revision this PR passed the C/C++ code formatter.
So the i64 fneg case is failing. However, i64 fabs and fneg+fabs are succeeding. Quick analysis shows a combine somewhere adding an extra fneg, creating a double negative, which cancels out. I'll try to identify the offending combine now.
Once the i64 fneg case is fixed I'll refactor the solution for i64, as it is currently verbose.
So what's happening here is that, after the source-modifier folding for the select has occurred, the combiner later rewrites the remaining 64-bit xor, presumably realising that only the high word of its operands is significant.
This results in fneg being applied twice, so the VGPRs in the fneg i64 test cases are not negated. Note that it's target-independent code that does this.
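An illustrative reconstruction of the problem (my paraphrase; the actual DAG dump was elided above):

  ; After the select combine, the select already carries an fneg:
  ;   select %cc, (fneg f64 %a), %b
  ; The target-independent combiner then narrows the surviving i64 xor to
  ; its high word:
  ;   (xor i64 %a, 0x8000000000000000)
  ;     -> build_pair (lo %a), (xor i32 (hi %a), 0x80000000)
  ; and ISel folds that 32-bit xor into a second fneg, cancelling the first.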
I'm currently attempting to move a simplified version of the select combine into the target-independent select combine, to take advantage of the combiner splitting the i64 op for us. This will require careful use of shouldFoldSelectWithIdentityConstant() from TargetLowering.h.
Extend the DAGCombine() for select to directly support fneg and fabs for i32, v2i32 and i64.
Also extend the test. Still struggling with 64-bit though as the legalizer is splitting some 64-bit ops into v2i32.
While this is functional, it can be refactored and simplified; I'm working on that now.
- Allows removal of i64-specific code: the target-independent combine splits to i32 ops.
- Update quite a few AMDGPU tests; these all appear to be improvements in codegen. Need to double-check.
Force-pushed from 6153f46 to 686919f.
I've moved a simplified combine into the target-independent DAGCombiner::visitSelect(). For AMDGPU this appears to have improved codegen across a number of tests, though it is probably worth inspection by more experienced AMDGPU programmers. Thanks, but let's see what CI thinks, as locally I only test against X86 and AMDGPU.
It's possible I'm misusing shouldFoldSelectWithIdentityConstant(); if that is the case, I propose adding a similar function to TargetLowering that fulfills the intended purpose of this PR.
(Remove references to source modifiers).
Although moving to a target-independent combine enabled a simpler solution and resulted in improvements in other tests, I'm not sure this was the best way of working around the problem with the ordering of the combines. I'm considering other options.
Thanks to all reviewers, but I think I need to deprecate this PR in favour of the simpler #149110.
On average it's probably better to do it that way. However, we don't have any of the combines trying to make the form match for integers. So on this path, there's additional work to actively try to form this pattern.
Yes I've already begun this work by rebasing on top of the new srcmod patch and adapting to legal v2i32, also modifying SITargetLowering::performXorCombine where appropriate. |
Extend the DAGCombine() for select to directly support fneg and fabs for i32, v2i32 and i64.
This combine enables select operands that are sign-bit-altering bitmask operations to be replaced with fneg, fabs or fneg+fabs. The AMDGPU backend supports source modifiers for these FP operations, so this combine allows more efficient lowering of such select instructions.
Precursor PR to simplify #140694.
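Concretely, in the fneg_select_i32 test above the xor is absorbed into a negate source modifier on the conditional move, rather than being emitted as a separate v_xor_b32:

  v_cndmask_b32_e64 v0, v2, -v1, vcc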