[AArch64] canCreateUndefOrPoisonForTargetNode - AArch64ISD::VASHR\VLSHR\VSHL can't create undef/poison #156445

RKSimon · 2025-09-02T11:32:00Z

We can always fold freeze(vashr(x,c)) -> vashr(freeze(x),c) as VASHR\VLSHR\VSHL shold always have an in-range constant shift amount.

Test coverage can be tricky, so I've hijacked some computeKnownBits/ComputeNumSignBits tests to show that value tracking can still analyse the shift node as the FREEZE will have been discarded by the canCreateUndefOrPoison/isGuaranteedNotToBeUndefOrPoison logic in getFreeze().

…'t create undef/poison We can always fold freeze(vashr(x,y)) -> vashr(freeze(x),freeze(y)) as VASHR has defined behaviour for out-of-range shift amounts. Test coverage can be tricky, so I've hijacked a ComputeNumSignBits test to show that value tracking can still analyse the VASHR node as the FREEZE will have been discarded by the canCreateUndefOrPoison/isGuaranteedNotToBeUndefOrPoison logic in getFreeze(). If this AArch64SelectionDAGTest.cpp approach is OK I'm intending to use it in llvm#149323 once llvm#155696 has landed.

RKSimon · 2025-09-02T11:44:08Z

llvm/test/CodeGen/AArch64/vector-compress.ll

 ; CHECK-NEXT:    fmov x9, d1
-; CHECK-NEXT:    bfi x8, x9, #3, #1
+; CHECK-NEXT:    and x9, x9, #0x8
+; CHECK-NEXT:    orr x8, x8, x9


I haven't entirely worked out what's going on here - but we've gone from:

t24: v2i64 = AArch64ISD::VASHR t23, Constant:i32<63> t55: v2i64 = freeze t24 t32: i64 = extract_vector_elt t55, Constant:i64<0> t53: i64 = shl t32, Constant:i64<3> t54: i64 = and t53, Constant:i64<8> t49: i64 = or disjoint FrameIndex:i64<0>, t54

to (what should be better):

t56: v2i64 = freeze t23 t24: v2i64 = AArch64ISD::VASHR t56, Constant:i32<63> t32: i64 = extract_vector_elt t24, Constant:i64<0> t54: i64 = and t32, Constant:i64<8> t49: i64 = or FrameIndex:i64<0>, t54

I suppose we now know the extracted element is all sign bits - but that screws up the BFI isel somehow?

davemgreen

I was going to say this looks OK (and there are VLSHR/VSHL which are similar too). The second parameter will always be a constant within range.

VASHR can have exact flags that we copy from shifts and use to perform certain optimizations though. Does that alter things for this opcode?

RKSimon · 2025-09-03T08:53:19Z

I was going to say this looks OK (and there are VLSHR/VSHL which are similar too). The second parameter will always be a constant within range.

OK - should I assert an inrange ConstantSDNode value or just assume it? I can add VLSHR/VSHL as well but don't currently have test coverage for them - I'll see if we can add it to AArch64SelectionDAGTest.cpp

VASHR can have exact flags that we copy from shifts and use to perform certain optimizations though. Does that alter things for this opcode?

I'll see if I can add test coverage for this - some of the freeze folds will strip poison flags like Exact when they push the freeze through the node - could that be whats causing the regression?

…HR/VLSHR/VSHL (#156631) Base tests so we can add additional FREEZE tests on top in #156445

llvmbot · 2025-09-03T13:34:13Z

@llvm/pr-subscribers-backend-aarch64

Author: Simon Pilgrim (RKSimon)

Changes

We can always fold freeze(vashr(x,y)) -> vashr(freeze(x),freeze(y)) as VASHR has defined behaviour for out-of-range shift amounts.

Test coverage can be tricky, so I've hijacked a ComputeNumSignBits test to show that value tracking can still analyse the VASHR node as the FREEZE will have been discarded by the canCreateUndefOrPoison/isGuaranteedNotToBeUndefOrPoison logic in getFreeze().

If this AArch64SelectionDAGTest.cpp approach is OK I'm intending to use it in #149323 once #155696 has landed.

Full diff: https://github.com/llvm/llvm-project/pull/156445.diff

4 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+13)
(modified) llvm/lib/Target/AArch64/AArch64ISelLowering.h (+6)
(modified) llvm/test/CodeGen/AArch64/vector-compress.ll (+12-13)
(modified) llvm/unittests/Target/AArch64/AArch64SelectionDAGTest.cpp (+3)

diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index d3515cf81f443..f2aea5f795d60 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -30959,6 +30959,19 @@ bool AArch64TargetLowering::SimplifyDemandedBitsForTargetNode(
       Op, OriginalDemandedBits, OriginalDemandedElts, Known, TLO, Depth);
 }
 
+bool AArch64TargetLowering::canCreateUndefOrPoisonForTargetNode(
+    SDValue Op, const APInt &DemandedElts, const SelectionDAG &DAG,
+    bool PoisonOnly, bool ConsiderFlags, unsigned Depth) const {
+
+  // TODO: Add more target nodes.
+  switch (Op.getOpcode()) {
+  case AArch64ISD::VASHR:
+    return false;
+  }
+  return TargetLowering::canCreateUndefOrPoisonForTargetNode(
+      Op, DemandedElts, DAG, PoisonOnly, ConsiderFlags, Depth);
+}
+
 bool AArch64TargetLowering::isTargetCanonicalConstantNode(SDValue Op) const {
   return Op.getOpcode() == AArch64ISD::DUP ||
          Op.getOpcode() == AArch64ISD::MOVI ||
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index b840af36be7e7..1988ee82880a8 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -869,6 +869,12 @@ class AArch64TargetLowering : public TargetLowering {
                                          TargetLoweringOpt &TLO,
                                          unsigned Depth) const override;
 
+  bool canCreateUndefOrPoisonForTargetNode(SDValue Op,
+                                           const APInt &DemandedElts,
+                                           const SelectionDAG &DAG,
+                                           bool PoisonOnly, bool ConsiderFlags,
+                                           unsigned Depth) const override;
+
   bool isTargetCanonicalConstantNode(SDValue Op) const override;
 
   // With the exception of data-predicate transitions, no instructions are
diff --git a/llvm/test/CodeGen/AArch64/vector-compress.ll b/llvm/test/CodeGen/AArch64/vector-compress.ll
index a580913d40d95..67a0379d05244 100644
--- a/llvm/test/CodeGen/AArch64/vector-compress.ll
+++ b/llvm/test/CodeGen/AArch64/vector-compress.ll
@@ -12,16 +12,15 @@ define <4 x i32> @test_compress_v4i32(<4 x i32> %vec, <4 x i1> %mask) {
 ; CHECK-NEXT:    shl.4s v1, v1, #31
 ; CHECK-NEXT:    cmlt.4s v1, v1, #0
 ; CHECK-NEXT:    mov.s w9, v1[1]
-; CHECK-NEXT:    mov.s w10, v1[2]
 ; CHECK-NEXT:    fmov w11, s1
+; CHECK-NEXT:    mov.s w10, v1[2]
+; CHECK-NEXT:    and x12, x11, #0x1
 ; CHECK-NEXT:    bfi x8, x11, #2, #1
-; CHECK-NEXT:    and x11, x11, #0x1
-; CHECK-NEXT:    and x9, x9, #0x1
-; CHECK-NEXT:    and w10, w10, #0x1
-; CHECK-NEXT:    add x9, x11, x9
 ; CHECK-NEXT:    mov x11, sp
+; CHECK-NEXT:    and x9, x9, #0x1
+; CHECK-NEXT:    add x9, x12, x9
 ; CHECK-NEXT:    st1.s { v0 }[1], [x8]
-; CHECK-NEXT:    add w10, w9, w10
+; CHECK-NEXT:    sub w10, w9, w10
 ; CHECK-NEXT:    orr x9, x11, x9, lsl #2
 ; CHECK-NEXT:    bfi x11, x10, #2, #2
 ; CHECK-NEXT:    st1.s { v0 }[2], [x9]
@@ -93,7 +92,8 @@ define <2 x double> @test_compress_v2f64(<2 x double> %vec, <2 x i1> %mask) {
 ; CHECK-NEXT:    shl.2d v1, v1, #63
 ; CHECK-NEXT:    cmlt.2d v1, v1, #0
 ; CHECK-NEXT:    fmov x9, d1
-; CHECK-NEXT:    bfi x8, x9, #3, #1
+; CHECK-NEXT:    and x9, x9, #0x8
+; CHECK-NEXT:    orr x8, x8, x9
 ; CHECK-NEXT:    st1.d { v0 }[1], [x8]
 ; CHECK-NEXT:    ldr q0, [sp], #16
 ; CHECK-NEXT:    ret
@@ -420,16 +420,15 @@ define <3 x i32> @test_compress_narrow(<3 x i32> %vec, <3 x i1> %mask) {
 ; CHECK-NEXT:    shl.4s v1, v1, #31
 ; CHECK-NEXT:    cmlt.4s v1, v1, #0
 ; CHECK-NEXT:    mov.s w8, v1[1]
-; CHECK-NEXT:    mov.s w9, v1[2]
 ; CHECK-NEXT:    fmov w10, s1
+; CHECK-NEXT:    mov.s w9, v1[2]
+; CHECK-NEXT:    and x12, x10, #0x1
 ; CHECK-NEXT:    bfi x11, x10, #2, #1
-; CHECK-NEXT:    and x10, x10, #0x1
-; CHECK-NEXT:    and x8, x8, #0x1
-; CHECK-NEXT:    and w9, w9, #0x1
-; CHECK-NEXT:    add x8, x10, x8
 ; CHECK-NEXT:    mov x10, sp
+; CHECK-NEXT:    and x8, x8, #0x1
+; CHECK-NEXT:    add x8, x12, x8
 ; CHECK-NEXT:    st1.s { v0 }[1], [x11]
-; CHECK-NEXT:    add w9, w8, w9
+; CHECK-NEXT:    sub w9, w8, w9
 ; CHECK-NEXT:    orr x8, x10, x8, lsl #2
 ; CHECK-NEXT:    bfi x10, x9, #2, #2
 ; CHECK-NEXT:    st1.s { v0 }[2], [x8]
diff --git a/llvm/unittests/Target/AArch64/AArch64SelectionDAGTest.cpp b/llvm/unittests/Target/AArch64/AArch64SelectionDAGTest.cpp
index 18c8d4a69e7a8..af4debb93aefa 100644
--- a/llvm/unittests/Target/AArch64/AArch64SelectionDAGTest.cpp
+++ b/llvm/unittests/Target/AArch64/AArch64SelectionDAGTest.cpp
@@ -172,6 +172,9 @@ TEST_F(AArch64SelectionDAGTest, ComputeNumSignBits_VASHR) {
   auto VecA = DAG->getConstant(0xaa, Loc, VecVT);
   auto Op2 = DAG->getNode(AArch64ISD::VASHR, Loc, VecVT, VecA, Shift);
   EXPECT_EQ(DAG->ComputeNumSignBits(Op2), 5u);
+  // VASHR can't create undef/poison - FREEZE(VASHR(C1,C2)) -> VASHR(C1,C2).
+  auto Fr2 = DAG->getFreeze(Op2);
+  EXPECT_EQ(DAG->ComputeNumSignBits(Fr2), 5u);
 }
 
 TEST_F(AArch64SelectionDAGTest, SimplifyDemandedVectorElts_EXTRACT_SUBVECTOR) {

davemgreen · 2025-09-03T14:18:02Z

OK - should I assert an inrange ConstantSDNode value or just assume it? I can add VLSHR/VSHL as well but don't currently have test coverage for them - I'll see if we can add it to AArch64SelectionDAGTest.cpp

We select these nodes to the instructions with constant shifts in the range of 1-BW, so slightly outside of normal ranges but all produce valid values (no poisons). Assuming this is true should be fine, it should fail elsewhere if not.

I'll see if I can add test coverage for this - some of the freeze folds will strip poison flags like Exact when they push the freeze through the node - could that be whats causing the regression?

The exact flag is used just to produce SCVTF _shift instructions as far as I understand. Am I correct that a shift with an exact flag will produce poison if non-zeroes are shifted out? It doesn't seem to be part of the ValueTracking canCreateUndefOrPoison or SelectionDAG::canCreateUndefOrPoison.

RKSimon · 2025-09-03T14:44:45Z

OK - should I assert an inrange ConstantSDNode value or just assume it? I can add VLSHR/VSHL as well but don't currently have test coverage for them - I'll see if we can add it to AArch64SelectionDAGTest.cpp

We select these nodes to the instructions with constant shifts in the range of 1-BW, so slightly outside of normal ranges but all produce valid values (no poisons). Assuming this is true should be fine, it should fail elsewhere if not.

OK - by the looks of it computeKnownBits etc. already assume its all OK, so I'll do the same.

I'll see if I can add test coverage for this - some of the freeze folds will strip poison flags like Exact when they push the freeze through the node - could that be whats causing the regression?

The exact flag is used just to produce SCVTF _shift instructions as far as I understand. Am I correct that a shift with an exact flag will produce poison if non-zeroes are shifted out? It doesn't seem to be part of the ValueTracking canCreateUndefOrPoison or SelectionDAG::canCreateUndefOrPoison.

DAG.visitFREEZE will call canCreateUndefOrPoison with ConsiderFlags=false, which means it ignores the Exact flag. If it succeeds in pushing the FREEZE through it will create a new VASHR node but without the Exact flag. Its a hack but we could always treat ConsiderFlags=true for these node and fail if they have the Exact flag.

davemgreen · 2025-09-04T09:07:25Z

DAG.visitFREEZE will call canCreateUndefOrPoison with ConsiderFlags=false, which means it ignores the Exact flag. If it succeeds in pushing the FREEZE through it will create a new VASHR node but without the Exact flag. Its a hack but we could always treat ConsiderFlags=true for these node and fail if they have the Exact flag.

Does that mean it should be return ConsiderFlags ? !Op.hasExact() : false;?

RKSimon · 2025-09-04T14:42:53Z

DAG.visitFREEZE will call canCreateUndefOrPoison with ConsiderFlags=false, which means it ignores the Exact flag. If it succeeds in pushing the FREEZE through it will create a new VASHR node but without the Exact flag. Its a hack but we could always treat ConsiderFlags=true for these node and fail if they have the Exact flag.

Does that mean it should be return ConsiderFlags ? !Op.hasExact() : false;?

That's what happens already - SelectionDAG::canCreateUndefOrPoison handles this for us:

bool SelectionDAG::canCreateUndefOrPoison(SDValue Op, const APInt &DemandedElts,
                                          bool PoisonOnly, bool ConsiderFlags,
                                          unsigned Depth) const {
  if (ConsiderFlags && Op->hasPoisonGeneratingFlags())
    return true;
...

davemgreen · 2025-09-04T20:38:44Z

Ah I see. That was what I was missing. It feels a it odd, there can be instructions with flags that cannot generate poison due to the range of the arguments or whatnot. But if that is already covered then this LGTM. Thanks.

RKSimon requested a review from davemgreen September 2, 2025 11:32

RKSimon commented Sep 2, 2025

View reviewed changes

davemgreen reviewed Sep 2, 2025

View reviewed changes

This was referenced Sep 3, 2025

[AArch64] canCreateUndefOrPoisonForTargetNode - AArch64ISD::MOVI opcodes can't create undef/poison #149323

Merged

[AArch64] Add computeKnownBits unit test coverage for AArch64ISD::VASHR/VLSHR/VSHL #156631

Merged

RKSimon added a commit that referenced this pull request Sep 3, 2025

[AArch64] Add computeKnownBits unit test coverage for AArch64ISD::VAS…

9f9b480

…HR/VLSHR/VSHL (#156631) Base tests so we can add additional FREEZE tests on top in #156445

Merge branch 'main' into aarch64-vashr-nopoison

545159b

llvmbot added the backend:AArch64 label Sep 3, 2025

canCreateUndefOrPoisonForTargetNode - add VLSHR/VSHL handling

2cec029

RKSimon changed the title ~~[AArch64] canCreateUndefOrPoisonForTargetNode - AArch64ISD::VASHR can't create undef/poison~~ [AArch64] canCreateUndefOrPoisonForTargetNode - AArch64ISD::VASHR\VLSHR\VSHL can't create undef/poison Sep 3, 2025

Merge branch 'upstream' into aarch64-vashr-nopoison

7a890b5

davemgreen approved these changes Sep 4, 2025

View reviewed changes

Merge branch 'main' into aarch64-vashr-nopoison

d48151f

RKSimon enabled auto-merge (squash) September 5, 2025 07:56

RKSimon merged commit 6711099 into llvm:main Sep 5, 2025
9 checks passed

RKSimon deleted the aarch64-vashr-nopoison branch September 5, 2025 08:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AArch64] canCreateUndefOrPoisonForTargetNode - AArch64ISD::VASHR\VLSHR\VSHL can't create undef/poison #156445

[AArch64] canCreateUndefOrPoisonForTargetNode - AArch64ISD::VASHR\VLSHR\VSHL can't create undef/poison #156445

RKSimon commented Sep 2, 2025 •

edited

Loading

Uh oh!

RKSimon Sep 2, 2025

Uh oh!

davemgreen left a comment

Uh oh!

RKSimon commented Sep 3, 2025

Uh oh!

llvmbot commented Sep 3, 2025

Uh oh!

davemgreen commented Sep 3, 2025

Uh oh!

RKSimon commented Sep 3, 2025 •

edited

Loading

Uh oh!

davemgreen commented Sep 4, 2025 •

edited

Loading

Uh oh!

RKSimon commented Sep 4, 2025 •

edited

Loading

Uh oh!

davemgreen commented Sep 4, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[AArch64] canCreateUndefOrPoisonForTargetNode - AArch64ISD::VASHR\VLSHR\VSHL can't create undef/poison #156445

[AArch64] canCreateUndefOrPoisonForTargetNode - AArch64ISD::VASHR\VLSHR\VSHL can't create undef/poison #156445

Conversation

RKSimon commented Sep 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RKSimon Sep 2, 2025

Choose a reason for hiding this comment

Uh oh!

davemgreen left a comment

Choose a reason for hiding this comment

Uh oh!

RKSimon commented Sep 3, 2025

Uh oh!

llvmbot commented Sep 3, 2025

Uh oh!

davemgreen commented Sep 3, 2025

Uh oh!

RKSimon commented Sep 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

davemgreen commented Sep 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RKSimon commented Sep 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

davemgreen commented Sep 4, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

RKSimon commented Sep 2, 2025 •

edited

Loading

RKSimon commented Sep 3, 2025 •

edited

Loading

davemgreen commented Sep 4, 2025 •

edited

Loading

RKSimon commented Sep 4, 2025 •

edited

Loading