[SDAG] Teach FoldConstantArithmetic to match splats inserted into vectors #163984
Conversation
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-llvm-selectiondag

Author: Benjamin Maxwell (MacDue)

Changes: This allows for more constant folding when inserting fixed-length vector splats into scalable vectors.

Full diff: https://github.com/llvm/llvm-project/pull/163984.diff

4 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index c97300d64d455..8e74bcb25da63 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -28014,9 +28014,13 @@ SDValue DAGCombiner::visitINSERT_SUBVECTOR(SDNode *N) {
   // Simplify scalar inserts into an undef vector:
   // insert_subvector undef, (splat X), N2 -> splat X
-  if (N0.isUndef() && N1.getOpcode() == ISD::SPLAT_VECTOR)
-    if (DAG.isConstantValueOfAnyType(N1.getOperand(0)) || N1.hasOneUse())
+  auto *BV0 = dyn_cast<BuildVectorSDNode>(N1);
+  if (N0.isUndef() && (N1.getOpcode() == ISD::SPLAT_VECTOR || BV0)) {
+    SDValue Splat = BV0 ? BV0->getSplatValue() : N1.getOperand(0);
+    if (Splat &&
+        (N1.hasOneUse() || (!BV0 && DAG.isConstantValueOfAnyType(Splat))))
       return DAG.getNode(ISD::SPLAT_VECTOR, SDLoc(N), VT, N1.getOperand(0));
+  }
// insert_subvector (splat X), (splat X), N2 -> splat X
if (N0.getOpcode() == ISD::SPLAT_VECTOR && N0.getOpcode() == N1.getOpcode() &&
diff --git a/llvm/test/CodeGen/AArch64/fixed-subvector-insert-into-scalable.ll b/llvm/test/CodeGen/AArch64/fixed-subvector-insert-into-scalable.ll
new file mode 100644
index 0000000000000..5ce5baf42f5f5
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/fixed-subvector-insert-into-scalable.ll
@@ -0,0 +1,46 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+
+define <vscale x 4 x i32> @insert_div() {
+; CHECK-LABEL: insert_div:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: mov z0.s, #3 // =0x3
+; CHECK-NEXT: ret
+entry:
+ %0 = tail call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> undef, <4 x i32> splat (i32 9), i64 0)
+ %div = udiv <vscale x 4 x i32> %0, splat (i32 3)
+ ret <vscale x 4 x i32> %div
+}
+
+define <vscale x 4 x i32> @insert_mul() {
+; CHECK-LABEL: insert_mul:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: mov z0.s, #7 // =0x7
+; CHECK-NEXT: ret
+entry:
+ %0 = tail call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> undef, <4 x i32> splat (i32 1), i64 0)
+ %mul = mul <vscale x 4 x i32> %0, splat (i32 7)
+ ret <vscale x 4 x i32> %mul
+}
+
+define <vscale x 4 x i32> @insert_add() {
+; CHECK-LABEL: insert_add:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: mov z0.s, #16 // =0x10
+; CHECK-NEXT: ret
+entry:
+ %0 = tail call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> undef, <4 x i32> splat (i32 5), i64 0)
+ %add = add <vscale x 4 x i32> %0, splat (i32 11)
+ ret <vscale x 4 x i32> %add
+}
+
+define <vscale x 4 x i32> @insert_sub() {
+; CHECK-LABEL: insert_sub:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: movi v0.2d, #0000000000000000
+; CHECK-NEXT: ret
+entry:
+ %0 = tail call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> undef, <4 x i32> splat (i32 11), i64 0)
+ %sub = sub <vscale x 4 x i32> %0, splat (i32 11)
+ ret <vscale x 4 x i32> %sub
+}
diff --git a/llvm/test/CodeGen/AArch64/vecreduce-add.ll b/llvm/test/CodeGen/AArch64/vecreduce-add.ll
index 2d0df562b9a4b..f56e1680f79c6 100644
--- a/llvm/test/CodeGen/AArch64/vecreduce-add.ll
+++ b/llvm/test/CodeGen/AArch64/vecreduce-add.ll
@@ -4778,7 +4778,7 @@ entry:
define i64 @extract_scalable(<2 x i32> %0) "target-features"="+sve2" {
; CHECK-SD-LABEL: extract_scalable:
; CHECK-SD: // %bb.0:
-; CHECK-SD-NEXT: movi v1.2s, #1
+; CHECK-SD-NEXT: mov z1.s, #1 // =0x1
; CHECK-SD-NEXT: ptrue p0.s, vl2
; CHECK-SD-NEXT: // kill: def $d0 killed $d0 def $z0
; CHECK-SD-NEXT: sdivr z0.s, p0/m, z0.s, z1.s
diff --git a/llvm/test/CodeGen/X86/pr35443.ll b/llvm/test/CodeGen/X86/pr35443.ll
index 430a1380c7c8c..8438b73acac23 100644
--- a/llvm/test/CodeGen/X86/pr35443.ll
+++ b/llvm/test/CodeGen/X86/pr35443.ll
@@ -8,7 +8,7 @@
define void @pr35443() {
; CHECK-LABEL: pr35443:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vpbroadcastb ac+4(%rip), %xmm0
+; CHECK-NEXT: vpbroadcastb ac+4(%rip), %ymm0
; CHECK-NEXT: vpmovzxbq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero,xmm0[2],zero,zero,zero,zero,zero,zero,zero,xmm0[3],zero,zero,zero,zero,zero,zero,zero
; CHECK-NEXT: vpxor %xmm1, %xmm1, %xmm1
; CHECK-NEXT: vpsubq %ymm0, %ymm1, %ymm0
✅ With the latest revision this PR passed the undef deprecator.
auto *BV0 = dyn_cast<BuildVectorSDNode>(N1);
if (N0.isUndef() && (N1.getOpcode() == ISD::SPLAT_VECTOR || BV0)) {
  SDValue Splat = BV0 ? BV0->getSplatValue() : N1.getOperand(0);
  bool SplatLegal = TLI.isOperationLegalOrCustom(ISD::SPLAT_VECTOR, VT);
Note: the legality check avoids a regression (where the splat is just lowered to another BUILD_VECTOR, which seems to duplicate some code).
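For reference, a minimal sketch of how the guarded combine could read, assuming the visitINSERT_SUBVECTOR scope (N, N0, N1, VT, DAG, TLI) from the diff above; it mirrors the quoted snippet but is not the exact PR revision:

auto *BV0 = dyn_cast<BuildVectorSDNode>(N1);
if (N0.isUndef() && (N1.getOpcode() == ISD::SPLAT_VECTOR || BV0) &&
    TLI.isOperationLegalOrCustom(ISD::SPLAT_VECTOR, VT)) {
  // Without the legality query, targets that must expand SPLAT_VECTOR would
  // just rebuild an equivalent BUILD_VECTOR, duplicating code.
  SDValue Splat = BV0 ? BV0->getSplatValue() : N1.getOperand(0);
  if (Splat && (N1.hasOneUse() || DAG.isConstantValueOfAnyType(Splat)))
    return DAG.getNode(ISD::SPLAT_VECTOR, SDLoc(N), VT, Splat);
}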
llvm/test/CodeGen/X86/pr35443.ll (outdated)
 ; CHECK-LABEL: pr35443:
 ; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vpbroadcastb ac+4(%rip), %xmm0
+; CHECK-NEXT: vpbroadcastb ac+4(%rip), %ymm0
This is unfortunate - SimplifyDemandedVectorElts should narrow this to xmm
I may move this fold to AArch64 only; it looks like RISC-V may also have some trouble with this form, so I'm not sure applying it generally is worthwhile.
I'm pretty sure we've tried this before with the same outcome, because the combine removes potentially valuable information. I wonder if, instead of removing it, you could teach the relevant is_splat/get_splat functions that take allow_undefs-like parameters to do a better job with this idiom for scalable vectors? (See the sketch below.)
The same goes for matching splats during isel, as most of the relevant patterns should just work by looking through the insert_subvector.
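A hypothetical sketch of that suggestion (the helper name is illustrative, not an existing SelectionDAG API): a splat query can look through inserts into undef, since the lanes outside the inserted subvector carry no defined value:

// Return the scalar X if V is an insert_subvector-into-undef (possibly
// chained) wrapping a splat of X; otherwise return an empty SDValue.
static SDValue getSplatThroughInsertSubvector(SDValue V) {
  while (V.getOpcode() == ISD::INSERT_SUBVECTOR && V.getOperand(0).isUndef())
    V = V.getOperand(1); // Undef lanes may be treated as the splat value.
  if (V.getOpcode() == ISD::SPLAT_VECTOR)
    return V.getOperand(0);
  if (auto *BV = dyn_cast<BuildVectorSDNode>(V))
    return BV->getSplatValue(); // Empty if the elements are not uniform.
  return SDValue();
}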
I'll look into it 👍
I've come up with an alternate approach that updates FoldConstantArithmetic to handle this case (it does not appear to cause any regressions).
I looked into the various IsSplat/GetSplat functions, but I didn't see any uses that looked like they would result in these cases being constant-folded. Waiting until ISel also looked like it might be too late, as some nodes, like UDIV, may have been expanded to a more complex sequence by that point.
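A hedged sketch of what that alternate approach might look like (the helper name and placement are assumptions, not the PR's exact code): when FoldConstantArithmetic scans an operand for a constant splat, it can additionally peel an insert-into-undef down to the inserted splat's scalar:

static SDValue peelInsertedConstantSplat(SDValue Op, const SelectionDAG &DAG) {
  // insert_subvector undef, (splat X), Idx leaves every other lane undef, so
  // for constant-folding purposes the whole vector may be treated as splat X.
  if (Op.getOpcode() == ISD::INSERT_SUBVECTOR && Op.getOperand(0).isUndef())
    Op = Op.getOperand(1);
  SDValue Splat;
  if (Op.getOpcode() == ISD::SPLAT_VECTOR)
    Splat = Op.getOperand(0);
  else if (auto *BV = dyn_cast<BuildVectorSDNode>(Op))
    Splat = BV->getSplatValue();
  return (Splat && DAG.isConstantValueOfAnyType(Splat)) ? Splat : SDValue();
}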
As this path has proven successful, I now think the combine is better done at the IR level.
My ISel suggestion is more for the next logical step, when only the value inserted into poison is constant. While we want to preserve the information that only part of the vector holds real data, we'll still want to match compatible variants as scalable-vector splats during selection so they can use the instructions that take an immediate.
[SDAG] Teach FoldConstantArithmetic to match splats inserted into vectors
This teaches FoldConstantArithmetic to match `insert_subvector undef, (splat X), N2` as a splat of X. This pattern can occur for scalable vectors when a fixed-length splat is inserted into an undef vector. This allows the cases in `fixed-subvector-insert-into-scalable.ll` to be constant-folded (where previously they would all be computed at runtime).
✅ With the latest revision this PR passed the C/C++ code formatter.
(CI failure looks unrelated)