-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[AArch64] canCreateUndefOrPoisonForTargetNode - AArch64ISD::VASHR\VLSHR\VSHL can't create undef/poison #156445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…'t create undef/poison We can always fold freeze(vashr(x,y)) -> vashr(freeze(x),freeze(y)) as VASHR has defined behaviour for out-of-range shift amounts. Test coverage can be tricky, so I've hijacked a ComputeNumSignBits test to show that value tracking can still analyse the VASHR node as the FREEZE will have been discarded by the canCreateUndefOrPoison/isGuaranteedNotToBeUndefOrPoison logic in getFreeze(). If this AArch64SelectionDAGTest.cpp approach is OK I'm intending to use it in llvm#149323 once llvm#155696 has landed.
; CHECK-NEXT: fmov x9, d1 | ||
; CHECK-NEXT: bfi x8, x9, #3, #1 | ||
; CHECK-NEXT: and x9, x9, #0x8 | ||
; CHECK-NEXT: orr x8, x8, x9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't entirely worked out what's going on here - but we've gone from:
t24: v2i64 = AArch64ISD::VASHR t23, Constant:i32<63>
t55: v2i64 = freeze t24
t32: i64 = extract_vector_elt t55, Constant:i64<0>
t53: i64 = shl t32, Constant:i64<3>
t54: i64 = and t53, Constant:i64<8>
t49: i64 = or disjoint FrameIndex:i64<0>, t54
to (what should be better):
t56: v2i64 = freeze t23
t24: v2i64 = AArch64ISD::VASHR t56, Constant:i32<63>
t32: i64 = extract_vector_elt t24, Constant:i64<0>
t54: i64 = and t32, Constant:i64<8>
t49: i64 = or FrameIndex:i64<0>, t54
I suppose we now know the extracted element is all sign bits - but that screws up the BFI isel somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to say this looks OK (and there are VLSHR/VSHL which are similar too). The second parameter will always be a constant within range.
VASHR can have exact flags that we copy from shifts and use to perform certain optimizations though. Does that alter things for this opcode?
OK - should I assert an inrange ConstantSDNode value or just assume it? I can add VLSHR/VSHL as well but don't currently have test coverage for them - I'll see if we can add it to AArch64SelectionDAGTest.cpp
I'll see if I can add test coverage for this - some of the freeze folds will strip poison flags like Exact when they push the freeze through the node - could that be whats causing the regression? |
@llvm/pr-subscribers-backend-aarch64 Author: Simon Pilgrim (RKSimon) ChangesWe can always fold freeze(vashr(x,y)) -> vashr(freeze(x),freeze(y)) as VASHR has defined behaviour for out-of-range shift amounts. Test coverage can be tricky, so I've hijacked a ComputeNumSignBits test to show that value tracking can still analyse the VASHR node as the FREEZE will have been discarded by the canCreateUndefOrPoison/isGuaranteedNotToBeUndefOrPoison logic in getFreeze(). If this AArch64SelectionDAGTest.cpp approach is OK I'm intending to use it in #149323 once #155696 has landed. Full diff: https://github.com/llvm/llvm-project/pull/156445.diff 4 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index d3515cf81f443..f2aea5f795d60 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -30959,6 +30959,19 @@ bool AArch64TargetLowering::SimplifyDemandedBitsForTargetNode(
Op, OriginalDemandedBits, OriginalDemandedElts, Known, TLO, Depth);
}
+bool AArch64TargetLowering::canCreateUndefOrPoisonForTargetNode(
+ SDValue Op, const APInt &DemandedElts, const SelectionDAG &DAG,
+ bool PoisonOnly, bool ConsiderFlags, unsigned Depth) const {
+
+ // TODO: Add more target nodes.
+ switch (Op.getOpcode()) {
+ case AArch64ISD::VASHR:
+ return false;
+ }
+ return TargetLowering::canCreateUndefOrPoisonForTargetNode(
+ Op, DemandedElts, DAG, PoisonOnly, ConsiderFlags, Depth);
+}
+
bool AArch64TargetLowering::isTargetCanonicalConstantNode(SDValue Op) const {
return Op.getOpcode() == AArch64ISD::DUP ||
Op.getOpcode() == AArch64ISD::MOVI ||
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index b840af36be7e7..1988ee82880a8 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -869,6 +869,12 @@ class AArch64TargetLowering : public TargetLowering {
TargetLoweringOpt &TLO,
unsigned Depth) const override;
+ bool canCreateUndefOrPoisonForTargetNode(SDValue Op,
+ const APInt &DemandedElts,
+ const SelectionDAG &DAG,
+ bool PoisonOnly, bool ConsiderFlags,
+ unsigned Depth) const override;
+
bool isTargetCanonicalConstantNode(SDValue Op) const override;
// With the exception of data-predicate transitions, no instructions are
diff --git a/llvm/test/CodeGen/AArch64/vector-compress.ll b/llvm/test/CodeGen/AArch64/vector-compress.ll
index a580913d40d95..67a0379d05244 100644
--- a/llvm/test/CodeGen/AArch64/vector-compress.ll
+++ b/llvm/test/CodeGen/AArch64/vector-compress.ll
@@ -12,16 +12,15 @@ define <4 x i32> @test_compress_v4i32(<4 x i32> %vec, <4 x i1> %mask) {
; CHECK-NEXT: shl.4s v1, v1, #31
; CHECK-NEXT: cmlt.4s v1, v1, #0
; CHECK-NEXT: mov.s w9, v1[1]
-; CHECK-NEXT: mov.s w10, v1[2]
; CHECK-NEXT: fmov w11, s1
+; CHECK-NEXT: mov.s w10, v1[2]
+; CHECK-NEXT: and x12, x11, #0x1
; CHECK-NEXT: bfi x8, x11, #2, #1
-; CHECK-NEXT: and x11, x11, #0x1
-; CHECK-NEXT: and x9, x9, #0x1
-; CHECK-NEXT: and w10, w10, #0x1
-; CHECK-NEXT: add x9, x11, x9
; CHECK-NEXT: mov x11, sp
+; CHECK-NEXT: and x9, x9, #0x1
+; CHECK-NEXT: add x9, x12, x9
; CHECK-NEXT: st1.s { v0 }[1], [x8]
-; CHECK-NEXT: add w10, w9, w10
+; CHECK-NEXT: sub w10, w9, w10
; CHECK-NEXT: orr x9, x11, x9, lsl #2
; CHECK-NEXT: bfi x11, x10, #2, #2
; CHECK-NEXT: st1.s { v0 }[2], [x9]
@@ -93,7 +92,8 @@ define <2 x double> @test_compress_v2f64(<2 x double> %vec, <2 x i1> %mask) {
; CHECK-NEXT: shl.2d v1, v1, #63
; CHECK-NEXT: cmlt.2d v1, v1, #0
; CHECK-NEXT: fmov x9, d1
-; CHECK-NEXT: bfi x8, x9, #3, #1
+; CHECK-NEXT: and x9, x9, #0x8
+; CHECK-NEXT: orr x8, x8, x9
; CHECK-NEXT: st1.d { v0 }[1], [x8]
; CHECK-NEXT: ldr q0, [sp], #16
; CHECK-NEXT: ret
@@ -420,16 +420,15 @@ define <3 x i32> @test_compress_narrow(<3 x i32> %vec, <3 x i1> %mask) {
; CHECK-NEXT: shl.4s v1, v1, #31
; CHECK-NEXT: cmlt.4s v1, v1, #0
; CHECK-NEXT: mov.s w8, v1[1]
-; CHECK-NEXT: mov.s w9, v1[2]
; CHECK-NEXT: fmov w10, s1
+; CHECK-NEXT: mov.s w9, v1[2]
+; CHECK-NEXT: and x12, x10, #0x1
; CHECK-NEXT: bfi x11, x10, #2, #1
-; CHECK-NEXT: and x10, x10, #0x1
-; CHECK-NEXT: and x8, x8, #0x1
-; CHECK-NEXT: and w9, w9, #0x1
-; CHECK-NEXT: add x8, x10, x8
; CHECK-NEXT: mov x10, sp
+; CHECK-NEXT: and x8, x8, #0x1
+; CHECK-NEXT: add x8, x12, x8
; CHECK-NEXT: st1.s { v0 }[1], [x11]
-; CHECK-NEXT: add w9, w8, w9
+; CHECK-NEXT: sub w9, w8, w9
; CHECK-NEXT: orr x8, x10, x8, lsl #2
; CHECK-NEXT: bfi x10, x9, #2, #2
; CHECK-NEXT: st1.s { v0 }[2], [x8]
diff --git a/llvm/unittests/Target/AArch64/AArch64SelectionDAGTest.cpp b/llvm/unittests/Target/AArch64/AArch64SelectionDAGTest.cpp
index 18c8d4a69e7a8..af4debb93aefa 100644
--- a/llvm/unittests/Target/AArch64/AArch64SelectionDAGTest.cpp
+++ b/llvm/unittests/Target/AArch64/AArch64SelectionDAGTest.cpp
@@ -172,6 +172,9 @@ TEST_F(AArch64SelectionDAGTest, ComputeNumSignBits_VASHR) {
auto VecA = DAG->getConstant(0xaa, Loc, VecVT);
auto Op2 = DAG->getNode(AArch64ISD::VASHR, Loc, VecVT, VecA, Shift);
EXPECT_EQ(DAG->ComputeNumSignBits(Op2), 5u);
+ // VASHR can't create undef/poison - FREEZE(VASHR(C1,C2)) -> VASHR(C1,C2).
+ auto Fr2 = DAG->getFreeze(Op2);
+ EXPECT_EQ(DAG->ComputeNumSignBits(Fr2), 5u);
}
TEST_F(AArch64SelectionDAGTest, SimplifyDemandedVectorElts_EXTRACT_SUBVECTOR) {
|
We select these nodes to the instructions with constant shifts in the range of 1-BW, so slightly outside of normal ranges but all produce valid values (no poisons). Assuming this is true should be fine, it should fail elsewhere if not.
The |
OK - by the looks of it computeKnownBits etc. already assume its all OK, so I'll do the same.
DAG.visitFREEZE will call canCreateUndefOrPoison with ConsiderFlags=false, which means it ignores the Exact flag. If it succeeds in pushing the FREEZE through it will create a new VASHR node but without the Exact flag. Its a hack but we could always treat ConsiderFlags=true for these node and fail if they have the Exact flag. |
Does that mean it should be |
That's what happens already - SelectionDAG::canCreateUndefOrPoison handles this for us: bool SelectionDAG::canCreateUndefOrPoison(SDValue Op, const APInt &DemandedElts,
bool PoisonOnly, bool ConsiderFlags,
unsigned Depth) const {
if (ConsiderFlags && Op->hasPoisonGeneratingFlags())
return true;
... |
Ah I see. That was what I was missing. It feels a it odd, there can be instructions with flags that cannot generate poison due to the range of the arguments or whatnot. But if that is already covered then this LGTM. Thanks. |
We can always fold freeze(vashr(x,c)) -> vashr(freeze(x),c) as VASHR\VLSHR\VSHL shold always have an in-range constant shift amount.
Test coverage can be tricky, so I've hijacked some computeKnownBits/ComputeNumSignBits tests to show that value tracking can still analyse the shift node as the FREEZE will have been discarded by the canCreateUndefOrPoison/isGuaranteedNotToBeUndefOrPoison logic in getFreeze().