Skip to content

Conversation

RKSimon
Copy link
Collaborator

@RKSimon RKSimon commented Sep 2, 2025

We can always fold freeze(vashr(x,c)) -> vashr(freeze(x),c) as VASHR\VLSHR\VSHL shold always have an in-range constant shift amount.

Test coverage can be tricky, so I've hijacked some computeKnownBits/ComputeNumSignBits tests to show that value tracking can still analyse the shift node as the FREEZE will have been discarded by the canCreateUndefOrPoison/isGuaranteedNotToBeUndefOrPoison logic in getFreeze().

…'t create undef/poison

We can always fold freeze(vashr(x,y)) -> vashr(freeze(x),freeze(y)) as VASHR has defined behaviour for out-of-range shift amounts.

Test coverage can be tricky, so I've hijacked a ComputeNumSignBits test to show that value tracking can still analyse the VASHR node as the FREEZE will have been discarded by the canCreateUndefOrPoison/isGuaranteedNotToBeUndefOrPoison logic in getFreeze().

If this AArch64SelectionDAGTest.cpp approach is OK I'm intending to use it in llvm#149323 once llvm#155696 has landed.
@RKSimon RKSimon requested a review from davemgreen September 2, 2025 11:32
; CHECK-NEXT: fmov x9, d1
; CHECK-NEXT: bfi x8, x9, #3, #1
; CHECK-NEXT: and x9, x9, #0x8
; CHECK-NEXT: orr x8, x8, x9
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't entirely worked out what's going on here - but we've gone from:

    t24: v2i64 = AArch64ISD::VASHR t23, Constant:i32<63>
    t55: v2i64 = freeze t24
    t32: i64 = extract_vector_elt t55, Constant:i64<0>
    t53: i64 = shl t32, Constant:i64<3>
    t54: i64 = and t53, Constant:i64<8>
    t49: i64 = or disjoint FrameIndex:i64<0>, t54

to (what should be better):

    t56: v2i64 = freeze t23
    t24: v2i64 = AArch64ISD::VASHR t56, Constant:i32<63>
	t32: i64 = extract_vector_elt t24, Constant:i64<0>
    t54: i64 = and t32, Constant:i64<8>
    t49: i64 = or FrameIndex:i64<0>, t54

I suppose we now know the extracted element is all sign bits - but that screws up the BFI isel somehow?

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to say this looks OK (and there are VLSHR/VSHL which are similar too). The second parameter will always be a constant within range.

VASHR can have exact flags that we copy from shifts and use to perform certain optimizations though. Does that alter things for this opcode?

@RKSimon
Copy link
Collaborator Author

RKSimon commented Sep 3, 2025

I was going to say this looks OK (and there are VLSHR/VSHL which are similar too). The second parameter will always be a constant within range.

OK - should I assert an inrange ConstantSDNode value or just assume it? I can add VLSHR/VSHL as well but don't currently have test coverage for them - I'll see if we can add it to AArch64SelectionDAGTest.cpp

VASHR can have exact flags that we copy from shifts and use to perform certain optimizations though. Does that alter things for this opcode?

I'll see if I can add test coverage for this - some of the freeze folds will strip poison flags like Exact when they push the freeze through the node - could that be whats causing the regression?

@llvmbot
Copy link
Member

llvmbot commented Sep 3, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Simon Pilgrim (RKSimon)

Changes

We can always fold freeze(vashr(x,y)) -> vashr(freeze(x),freeze(y)) as VASHR has defined behaviour for out-of-range shift amounts.

Test coverage can be tricky, so I've hijacked a ComputeNumSignBits test to show that value tracking can still analyse the VASHR node as the FREEZE will have been discarded by the canCreateUndefOrPoison/isGuaranteedNotToBeUndefOrPoison logic in getFreeze().

If this AArch64SelectionDAGTest.cpp approach is OK I'm intending to use it in #149323 once #155696 has landed.


Full diff: https://github.com/llvm/llvm-project/pull/156445.diff

4 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+13)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.h (+6)
  • (modified) llvm/test/CodeGen/AArch64/vector-compress.ll (+12-13)
  • (modified) llvm/unittests/Target/AArch64/AArch64SelectionDAGTest.cpp (+3)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index d3515cf81f443..f2aea5f795d60 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -30959,6 +30959,19 @@ bool AArch64TargetLowering::SimplifyDemandedBitsForTargetNode(
       Op, OriginalDemandedBits, OriginalDemandedElts, Known, TLO, Depth);
 }
 
+bool AArch64TargetLowering::canCreateUndefOrPoisonForTargetNode(
+    SDValue Op, const APInt &DemandedElts, const SelectionDAG &DAG,
+    bool PoisonOnly, bool ConsiderFlags, unsigned Depth) const {
+
+  // TODO: Add more target nodes.
+  switch (Op.getOpcode()) {
+  case AArch64ISD::VASHR:
+    return false;
+  }
+  return TargetLowering::canCreateUndefOrPoisonForTargetNode(
+      Op, DemandedElts, DAG, PoisonOnly, ConsiderFlags, Depth);
+}
+
 bool AArch64TargetLowering::isTargetCanonicalConstantNode(SDValue Op) const {
   return Op.getOpcode() == AArch64ISD::DUP ||
          Op.getOpcode() == AArch64ISD::MOVI ||
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index b840af36be7e7..1988ee82880a8 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -869,6 +869,12 @@ class AArch64TargetLowering : public TargetLowering {
                                          TargetLoweringOpt &TLO,
                                          unsigned Depth) const override;
 
+  bool canCreateUndefOrPoisonForTargetNode(SDValue Op,
+                                           const APInt &DemandedElts,
+                                           const SelectionDAG &DAG,
+                                           bool PoisonOnly, bool ConsiderFlags,
+                                           unsigned Depth) const override;
+
   bool isTargetCanonicalConstantNode(SDValue Op) const override;
 
   // With the exception of data-predicate transitions, no instructions are
diff --git a/llvm/test/CodeGen/AArch64/vector-compress.ll b/llvm/test/CodeGen/AArch64/vector-compress.ll
index a580913d40d95..67a0379d05244 100644
--- a/llvm/test/CodeGen/AArch64/vector-compress.ll
+++ b/llvm/test/CodeGen/AArch64/vector-compress.ll
@@ -12,16 +12,15 @@ define <4 x i32> @test_compress_v4i32(<4 x i32> %vec, <4 x i1> %mask) {
 ; CHECK-NEXT:    shl.4s v1, v1, #31
 ; CHECK-NEXT:    cmlt.4s v1, v1, #0
 ; CHECK-NEXT:    mov.s w9, v1[1]
-; CHECK-NEXT:    mov.s w10, v1[2]
 ; CHECK-NEXT:    fmov w11, s1
+; CHECK-NEXT:    mov.s w10, v1[2]
+; CHECK-NEXT:    and x12, x11, #0x1
 ; CHECK-NEXT:    bfi x8, x11, #2, #1
-; CHECK-NEXT:    and x11, x11, #0x1
-; CHECK-NEXT:    and x9, x9, #0x1
-; CHECK-NEXT:    and w10, w10, #0x1
-; CHECK-NEXT:    add x9, x11, x9
 ; CHECK-NEXT:    mov x11, sp
+; CHECK-NEXT:    and x9, x9, #0x1
+; CHECK-NEXT:    add x9, x12, x9
 ; CHECK-NEXT:    st1.s { v0 }[1], [x8]
-; CHECK-NEXT:    add w10, w9, w10
+; CHECK-NEXT:    sub w10, w9, w10
 ; CHECK-NEXT:    orr x9, x11, x9, lsl #2
 ; CHECK-NEXT:    bfi x11, x10, #2, #2
 ; CHECK-NEXT:    st1.s { v0 }[2], [x9]
@@ -93,7 +92,8 @@ define <2 x double> @test_compress_v2f64(<2 x double> %vec, <2 x i1> %mask) {
 ; CHECK-NEXT:    shl.2d v1, v1, #63
 ; CHECK-NEXT:    cmlt.2d v1, v1, #0
 ; CHECK-NEXT:    fmov x9, d1
-; CHECK-NEXT:    bfi x8, x9, #3, #1
+; CHECK-NEXT:    and x9, x9, #0x8
+; CHECK-NEXT:    orr x8, x8, x9
 ; CHECK-NEXT:    st1.d { v0 }[1], [x8]
 ; CHECK-NEXT:    ldr q0, [sp], #16
 ; CHECK-NEXT:    ret
@@ -420,16 +420,15 @@ define <3 x i32> @test_compress_narrow(<3 x i32> %vec, <3 x i1> %mask) {
 ; CHECK-NEXT:    shl.4s v1, v1, #31
 ; CHECK-NEXT:    cmlt.4s v1, v1, #0
 ; CHECK-NEXT:    mov.s w8, v1[1]
-; CHECK-NEXT:    mov.s w9, v1[2]
 ; CHECK-NEXT:    fmov w10, s1
+; CHECK-NEXT:    mov.s w9, v1[2]
+; CHECK-NEXT:    and x12, x10, #0x1
 ; CHECK-NEXT:    bfi x11, x10, #2, #1
-; CHECK-NEXT:    and x10, x10, #0x1
-; CHECK-NEXT:    and x8, x8, #0x1
-; CHECK-NEXT:    and w9, w9, #0x1
-; CHECK-NEXT:    add x8, x10, x8
 ; CHECK-NEXT:    mov x10, sp
+; CHECK-NEXT:    and x8, x8, #0x1
+; CHECK-NEXT:    add x8, x12, x8
 ; CHECK-NEXT:    st1.s { v0 }[1], [x11]
-; CHECK-NEXT:    add w9, w8, w9
+; CHECK-NEXT:    sub w9, w8, w9
 ; CHECK-NEXT:    orr x8, x10, x8, lsl #2
 ; CHECK-NEXT:    bfi x10, x9, #2, #2
 ; CHECK-NEXT:    st1.s { v0 }[2], [x8]
diff --git a/llvm/unittests/Target/AArch64/AArch64SelectionDAGTest.cpp b/llvm/unittests/Target/AArch64/AArch64SelectionDAGTest.cpp
index 18c8d4a69e7a8..af4debb93aefa 100644
--- a/llvm/unittests/Target/AArch64/AArch64SelectionDAGTest.cpp
+++ b/llvm/unittests/Target/AArch64/AArch64SelectionDAGTest.cpp
@@ -172,6 +172,9 @@ TEST_F(AArch64SelectionDAGTest, ComputeNumSignBits_VASHR) {
   auto VecA = DAG->getConstant(0xaa, Loc, VecVT);
   auto Op2 = DAG->getNode(AArch64ISD::VASHR, Loc, VecVT, VecA, Shift);
   EXPECT_EQ(DAG->ComputeNumSignBits(Op2), 5u);
+  // VASHR can't create undef/poison - FREEZE(VASHR(C1,C2)) -> VASHR(C1,C2).
+  auto Fr2 = DAG->getFreeze(Op2);
+  EXPECT_EQ(DAG->ComputeNumSignBits(Fr2), 5u);
 }
 
 TEST_F(AArch64SelectionDAGTest, SimplifyDemandedVectorElts_EXTRACT_SUBVECTOR) {

@RKSimon RKSimon changed the title [AArch64] canCreateUndefOrPoisonForTargetNode - AArch64ISD::VASHR can't create undef/poison [AArch64] canCreateUndefOrPoisonForTargetNode - AArch64ISD::VASHR\VLSHR\VSHL can't create undef/poison Sep 3, 2025
@davemgreen
Copy link
Collaborator

OK - should I assert an inrange ConstantSDNode value or just assume it? I can add VLSHR/VSHL as well but don't currently have test coverage for them - I'll see if we can add it to AArch64SelectionDAGTest.cpp

We select these nodes to the instructions with constant shifts in the range of 1-BW, so slightly outside of normal ranges but all produce valid values (no poisons). Assuming this is true should be fine, it should fail elsewhere if not.

I'll see if I can add test coverage for this - some of the freeze folds will strip poison flags like Exact when they push the freeze through the node - could that be whats causing the regression?

The exact flag is used just to produce SCVTF _shift instructions as far as I understand. Am I correct that a shift with an exact flag will produce poison if non-zeroes are shifted out? It doesn't seem to be part of the ValueTracking canCreateUndefOrPoison or SelectionDAG::canCreateUndefOrPoison.

@RKSimon
Copy link
Collaborator Author

RKSimon commented Sep 3, 2025

OK - should I assert an inrange ConstantSDNode value or just assume it? I can add VLSHR/VSHL as well but don't currently have test coverage for them - I'll see if we can add it to AArch64SelectionDAGTest.cpp

We select these nodes to the instructions with constant shifts in the range of 1-BW, so slightly outside of normal ranges but all produce valid values (no poisons). Assuming this is true should be fine, it should fail elsewhere if not.

OK - by the looks of it computeKnownBits etc. already assume its all OK, so I'll do the same.

I'll see if I can add test coverage for this - some of the freeze folds will strip poison flags like Exact when they push the freeze through the node - could that be whats causing the regression?

The exact flag is used just to produce SCVTF _shift instructions as far as I understand. Am I correct that a shift with an exact flag will produce poison if non-zeroes are shifted out? It doesn't seem to be part of the ValueTracking canCreateUndefOrPoison or SelectionDAG::canCreateUndefOrPoison.

DAG.visitFREEZE will call canCreateUndefOrPoison with ConsiderFlags=false, which means it ignores the Exact flag. If it succeeds in pushing the FREEZE through it will create a new VASHR node but without the Exact flag. Its a hack but we could always treat ConsiderFlags=true for these node and fail if they have the Exact flag.

@davemgreen
Copy link
Collaborator

davemgreen commented Sep 4, 2025

DAG.visitFREEZE will call canCreateUndefOrPoison with ConsiderFlags=false, which means it ignores the Exact flag. If it succeeds in pushing the FREEZE through it will create a new VASHR node but without the Exact flag. Its a hack but we could always treat ConsiderFlags=true for these node and fail if they have the Exact flag.

Does that mean it should be return ConsiderFlags ? !Op.hasExact() : false;?

@RKSimon
Copy link
Collaborator Author

RKSimon commented Sep 4, 2025

DAG.visitFREEZE will call canCreateUndefOrPoison with ConsiderFlags=false, which means it ignores the Exact flag. If it succeeds in pushing the FREEZE through it will create a new VASHR node but without the Exact flag. Its a hack but we could always treat ConsiderFlags=true for these node and fail if they have the Exact flag.

Does that mean it should be return ConsiderFlags ? !Op.hasExact() : false;?

That's what happens already - SelectionDAG::canCreateUndefOrPoison handles this for us:

bool SelectionDAG::canCreateUndefOrPoison(SDValue Op, const APInt &DemandedElts,
                                          bool PoisonOnly, bool ConsiderFlags,
                                          unsigned Depth) const {
  if (ConsiderFlags && Op->hasPoisonGeneratingFlags())
    return true;
...

@davemgreen
Copy link
Collaborator

Ah I see. That was what I was missing. It feels a it odd, there can be instructions with flags that cannot generate poison due to the range of the arguments or whatnot. But if that is already covered then this LGTM. Thanks.

@RKSimon RKSimon enabled auto-merge (squash) September 5, 2025 07:56
@RKSimon RKSimon merged commit 6711099 into llvm:main Sep 5, 2025
9 checks passed
@RKSimon RKSimon deleted the aarch64-vashr-nopoison branch September 5, 2025 08:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants