
Conversation

MacDue (Member) commented Oct 17, 2025

This teaches `FoldConstantArithmetic` to match `insert_subvector undef, (splat X), N2` as a splat of X. This pattern can occur for scalable vectors when a fixed-length splat is inserted into an undef vector.

This allows the cases in `fixed-subvector-insert-into-scalable.ll` to be constant-folded (where previously they would all be computed at runtime).
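For a concrete picture of why this helps (a minimal sketch under assumptions, not part of the patch; the helper name is made up and only the SelectionDAG calls are real): once the inserted operand is recognised as a splat, a binary op of two constant splats folds by folding the two scalars and re-splatting the result, e.g. the `insert_div` case below becomes `udiv (splat 9), (splat 3)` -> `splat 3`.

```cpp
// Minimal sketch (hypothetical helper, illustration only): fold a binary op of
// two constant splats by folding the scalar constants and splatting the result.
static SDValue foldSplatBinOp(SelectionDAG &DAG, const SDLoc &DL,
                              unsigned Opcode, EVT VT, SDValue LHSSplat,
                              SDValue RHSSplat) {
  EVT EltVT = VT.getVectorElementType();
  // Fold the two scalar constants (e.g. udiv 9, 3 -> 3)...
  if (SDValue Scalar =
          DAG.FoldConstantArithmetic(Opcode, DL, EltVT, {LHSSplat, RHSSplat}))
    // ...then broadcast the folded scalar back out as a scalable splat.
    return DAG.getNode(ISD::SPLAT_VECTOR, DL, VT, Scalar);
  return SDValue();
}
```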

llvmbot (Member) commented Oct 17, 2025

@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-selectiondag

Author: Benjamin Maxwell (MacDue)

Changes

This allows for more constant folding when inserting fixed-length vector splats into scalable vectors.


Full diff: https://github.com/llvm/llvm-project/pull/163984.diff

4 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+6-2)
  • (added) llvm/test/CodeGen/AArch64/fixed-subvector-insert-into-scalable.ll (+46)
  • (modified) llvm/test/CodeGen/AArch64/vecreduce-add.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr35443.ll (+1-1)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index c97300d64d455..8e74bcb25da63 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -28014,9 +28014,13 @@ SDValue DAGCombiner::visitINSERT_SUBVECTOR(SDNode *N) {
 
   // Simplify scalar inserts into an undef vector:
   // insert_subvector undef, (splat X), N2 -> splat X
-  if (N0.isUndef() && N1.getOpcode() == ISD::SPLAT_VECTOR)
-    if (DAG.isConstantValueOfAnyType(N1.getOperand(0)) || N1.hasOneUse())
+  auto *BV0 = dyn_cast<BuildVectorSDNode>(N1);
+  if (N0.isUndef() && (N1.getOpcode() == ISD::SPLAT_VECTOR || BV0)) {
+    SDValue Splat = BV0 ? BV0->getSplatValue() : N1.getOperand(0);
+    if (Splat &&
+        (N1.hasOneUse() || (!BV0 && DAG.isConstantValueOfAnyType(Splat))))
       return DAG.getNode(ISD::SPLAT_VECTOR, SDLoc(N), VT, N1.getOperand(0));
+  }
 
   // insert_subvector (splat X), (splat X), N2 -> splat X
   if (N0.getOpcode() == ISD::SPLAT_VECTOR && N0.getOpcode() == N1.getOpcode() &&
diff --git a/llvm/test/CodeGen/AArch64/fixed-subvector-insert-into-scalable.ll b/llvm/test/CodeGen/AArch64/fixed-subvector-insert-into-scalable.ll
new file mode 100644
index 0000000000000..5ce5baf42f5f5
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/fixed-subvector-insert-into-scalable.ll
@@ -0,0 +1,46 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+
+define <vscale x 4 x i32> @insert_div() {
+; CHECK-LABEL: insert_div:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    mov z0.s, #3 // =0x3
+; CHECK-NEXT:    ret
+entry:
+  %0 = tail call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> undef, <4 x i32> splat (i32 9), i64 0)
+  %div = udiv <vscale x 4 x i32> %0, splat (i32 3)
+  ret <vscale x 4 x i32> %div
+}
+
+define <vscale x 4 x i32> @insert_mul() {
+; CHECK-LABEL: insert_mul:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    mov z0.s, #7 // =0x7
+; CHECK-NEXT:    ret
+entry:
+  %0 = tail call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> undef, <4 x i32> splat (i32 1), i64 0)
+  %mul = mul <vscale x 4 x i32> %0, splat (i32 7)
+  ret <vscale x 4 x i32> %mul
+}
+
+define <vscale x 4 x i32> @insert_add() {
+; CHECK-LABEL: insert_add:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    mov z0.s, #16 // =0x10
+; CHECK-NEXT:    ret
+entry:
+  %0 = tail call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> undef, <4 x i32> splat (i32 5), i64 0)
+  %add = add <vscale x 4 x i32> %0, splat (i32 11)
+  ret <vscale x 4 x i32> %add
+}
+
+define <vscale x 4 x i32> @insert_sub() {
+; CHECK-LABEL: insert_sub:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    movi v0.2d, #0000000000000000
+; CHECK-NEXT:    ret
+entry:
+  %0 = tail call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> undef, <4 x i32> splat (i32 11), i64 0)
+  %sub = add <vscale x 4 x i32> %0, splat (i32 -11)
+  ret <vscale x 4 x i32> %sub
+}
diff --git a/llvm/test/CodeGen/AArch64/vecreduce-add.ll b/llvm/test/CodeGen/AArch64/vecreduce-add.ll
index 2d0df562b9a4b..f56e1680f79c6 100644
--- a/llvm/test/CodeGen/AArch64/vecreduce-add.ll
+++ b/llvm/test/CodeGen/AArch64/vecreduce-add.ll
@@ -4778,7 +4778,7 @@ entry:
 define i64 @extract_scalable(<2 x i32> %0) "target-features"="+sve2" {
 ; CHECK-SD-LABEL: extract_scalable:
 ; CHECK-SD:       // %bb.0:
-; CHECK-SD-NEXT:    movi v1.2s, #1
+; CHECK-SD-NEXT:    mov z1.s, #1 // =0x1
 ; CHECK-SD-NEXT:    ptrue p0.s, vl2
 ; CHECK-SD-NEXT:    // kill: def $d0 killed $d0 def $z0
 ; CHECK-SD-NEXT:    sdivr z0.s, p0/m, z0.s, z1.s
diff --git a/llvm/test/CodeGen/X86/pr35443.ll b/llvm/test/CodeGen/X86/pr35443.ll
index 430a1380c7c8c..8438b73acac23 100644
--- a/llvm/test/CodeGen/X86/pr35443.ll
+++ b/llvm/test/CodeGen/X86/pr35443.ll
@@ -8,7 +8,7 @@
 define void @pr35443() {
 ; CHECK-LABEL: pr35443:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    vpbroadcastb ac+4(%rip), %xmm0
+; CHECK-NEXT:    vpbroadcastb ac+4(%rip), %ymm0
 ; CHECK-NEXT:    vpmovzxbq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero,xmm0[2],zero,zero,zero,zero,zero,zero,zero,xmm0[3],zero,zero,zero,zero,zero,zero,zero
 ; CHECK-NEXT:    vpxor %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    vpsubq %ymm0, %ymm1, %ymm0


github-actions bot commented Oct 17, 2025

✅ With the latest revision this PR passed the undef deprecator.

  auto *BV0 = dyn_cast<BuildVectorSDNode>(N1);
  if (N0.isUndef() && (N1.getOpcode() == ISD::SPLAT_VECTOR || BV0)) {
    SDValue Splat = BV0 ? BV0->getSplatValue() : N1.getOperand(0);
    bool SplatLegal = TLI.isOperationLegalOrCustom(ISD::SPLAT_VECTOR, VT);
MacDue (Member Author):

Note: The legality check avoids a regression (where the splat is just lowered to another BUILD_VECTOR, which seems to duplicate some code).
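For context, a hedged sketch of how that guard might read in full, reconstructed from the fragment quoted above rather than copied from the final patch: `SplatLegal` gates the rewrite so a `SPLAT_VECTOR` is only formed when the target can keep it, instead of being lowered straight back into another `BUILD_VECTOR`.

```cpp
// Sketch (reconstructed, not the exact final code): only rewrite to
// SPLAT_VECTOR when the target supports it, avoiding a round-trip back to a
// (possibly duplicated) BUILD_VECTOR during lowering.
auto *BV0 = dyn_cast<BuildVectorSDNode>(N1);
if (N0.isUndef() && (N1.getOpcode() == ISD::SPLAT_VECTOR || BV0)) {
  SDValue Splat = BV0 ? BV0->getSplatValue() : N1.getOperand(0);
  bool SplatLegal = TLI.isOperationLegalOrCustom(ISD::SPLAT_VECTOR, VT);
  if (Splat && SplatLegal &&
      (N1.hasOneUse() || DAG.isConstantValueOfAnyType(Splat)))
    return DAG.getNode(ISD::SPLAT_VECTOR, SDLoc(N), VT, Splat);
}
```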

; CHECK-LABEL: pr35443:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vpbroadcastb ac+4(%rip), %xmm0
; CHECK-NEXT: vpbroadcastb ac+4(%rip), %ymm0
Collaborator:

This is unfortunate - SimplifyDemandedVectorElts should narrow this to xmm

MacDue (Member Author):

I may move this fold to AArch64 only; it looks like RISCV may also have some trouble with this form, so I'm not sure applying it generally is worthwhile.

Collaborator:

I'm pretty sure we've tried this before with the same outcome, because the combine is removing potentially valuable information. I wonder if, instead of removing it, you can teach the relevant is_splat/get_splat functions that take allow_undefs-like parameters to do a better job of this idiom for scalable vectors?

Same goes for matching splats during isel, as most of the relevant patterns should just work by looking through the insert_subvector.

MacDue (Member Author):

I'll look into it 👍

MacDue (Member Author):

I've come up with an alternative approach that updates FoldConstantArithmetic to handle this case (it does not appear to cause any regressions).

I looked into the various IsSplat/GetSplat functions, but I didn't see any uses that looked like they would result in these cases being constant-folded. Waiting until ISel also looked like it might be too late, as some nodes, like UDIV, may expand to a more complex sequence by that point.
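As an illustration of that direction (assumed shape only; the helper name is hypothetical and the actual change may differ), the look-through needed is small: when gathering constant operands, an insert of a splat into undef can be treated as the splat itself.

```cpp
// Hypothetical helper (illustration only, not the upstream change): peek
// through `insert_subvector undef, (splat X), 0` and return the scalar X, so
// constant folding can treat the value as a full splat. Conservatively limited
// to inserts at index 0 here.
static SDValue peekThroughInsertOfSplatIntoUndef(SDValue V) {
  if (V.getOpcode() == ISD::INSERT_SUBVECTOR && V.getOperand(0).isUndef() &&
      isNullConstant(V.getOperand(2))) {
    SDValue Sub = V.getOperand(1);
    if (auto *BV = dyn_cast<BuildVectorSDNode>(Sub))
      if (SDValue SplatVal = BV->getSplatValue())
        return SplatVal; // Fixed-length splat: build_vector X, X, ...
    if (Sub.getOpcode() == ISD::SPLAT_VECTOR)
      return Sub.getOperand(0); // Already a scalable splat of X.
  }
  return SDValue();
}
```

With both operands then visible as constant splats, the scalar fold plus re-splat handles cases like the `udiv` at DAG-combine time, before legalization turns it into a longer sequence.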

Collaborator:

As this path has proven successful, I now think the combine is better done at the IR level.

My ISEL suggestion is more for the next logical step when only the "value inserted into poison" is constant. While we want to preserve the information that only part of the vector has real data, we'll still want to match compatible variants as scalable vector splats during selection to match against the instructions that take an immediate.


This teaches FoldConstantArithmetic to match `insert_subvector undef, (splat X), N2`
as a splat of X. This pattern can occur for scalable vectors when a
fixed-length splat is inserted into an undef vector.

This allows the cases in `fixed-subvector-insert-into-scalable.ll` to be
constant-folded (where previously they would all be computed at runtime).
@MacDue MacDue changed the title [SDAG] Match BUILD_VECTOR in INSERT_SUBVECTOR to SPLAT_VECTOR fold [SDAG] Teach FoldConstantArithmetic to match splats inserted into vectors Oct 20, 2025

github-actions bot commented Oct 20, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

MacDue (Member Author) commented Oct 20, 2025

(CI failure looks unrelated)
