
Conversation

MacDue (Member) commented Oct 17, 2025

This teaches `FoldConstantArithmetic` to match `insert_subvector undef, (splat X), N2` as a splat of X. This pattern can occur for scalable vectors when a fixed-length splat is inserted into an undef vector.

This allows the cases in `fixed-subvector-insert-into-scalable.ll` to be constant-folded (where previously they would all be computed at runtime).
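For a concrete picture of why this helps (a minimal sketch under assumptions, not part of the patch; the helper name is made up and only the SelectionDAG calls are real): once the inserted operand is recognised as a splat, a binary op of two constant splats folds by folding the two scalars and re-splatting the result, e.g. the `insert_div` case below becomes `udiv (splat 9), (splat 3)` -> `splat 3`.

```cpp
// Minimal sketch (hypothetical helper, illustration only): fold a binary op of
// two constant splats by folding the scalar constants and splatting the result.
static SDValue foldSplatBinOp(SelectionDAG &DAG, const SDLoc &DL,
                              unsigned Opcode, EVT VT, SDValue LHSSplat,
                              SDValue RHSSplat) {
  EVT EltVT = VT.getVectorElementType();
  // Fold the two scalar constants (e.g. udiv 9, 3 -> 3)...
  if (SDValue Scalar =
          DAG.FoldConstantArithmetic(Opcode, DL, EltVT, {LHSSplat, RHSSplat}))
    // ...then broadcast the folded scalar back out as a scalable splat.
    return DAG.getNode(ISD::SPLAT_VECTOR, DL, VT, Scalar);
  return SDValue();
}
```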

llvmbot (Member) commented Oct 17, 2025

@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-selectiondag

Author: Benjamin Maxwell (MacDue)

Changes

This allows for more constant folding when inserting fixed-length vector splats into scalable vectors.


Full diff: https://github.com/llvm/llvm-project/pull/163984.diff

4 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+6-2)
  • (added) llvm/test/CodeGen/AArch64/fixed-subvector-insert-into-scalable.ll (+46)
  • (modified) llvm/test/CodeGen/AArch64/vecreduce-add.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr35443.ll (+1-1)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index c97300d64d455..8e74bcb25da63 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -28014,9 +28014,13 @@ SDValue DAGCombiner::visitINSERT_SUBVECTOR(SDNode *N) {
 
   // Simplify scalar inserts into an undef vector:
   // insert_subvector undef, (splat X), N2 -> splat X
-  if (N0.isUndef() && N1.getOpcode() == ISD::SPLAT_VECTOR)
-    if (DAG.isConstantValueOfAnyType(N1.getOperand(0)) || N1.hasOneUse())
+  auto *BV0 = dyn_cast<BuildVectorSDNode>(N1);
+  if (N0.isUndef() && (N1.getOpcode() == ISD::SPLAT_VECTOR || BV0)) {
+    SDValue Splat = BV0 ? BV0->getSplatValue() : N1.getOperand(0);
+    if (Splat &&
+        (N1.hasOneUse() || (!BV0 && DAG.isConstantValueOfAnyType(Splat))))
       return DAG.getNode(ISD::SPLAT_VECTOR, SDLoc(N), VT, N1.getOperand(0));
+  }
 
   // insert_subvector (splat X), (splat X), N2 -> splat X
   if (N0.getOpcode() == ISD::SPLAT_VECTOR && N0.getOpcode() == N1.getOpcode() &&
diff --git a/llvm/test/CodeGen/AArch64/fixed-subvector-insert-into-scalable.ll b/llvm/test/CodeGen/AArch64/fixed-subvector-insert-into-scalable.ll
new file mode 100644
index 0000000000000..5ce5baf42f5f5
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/fixed-subvector-insert-into-scalable.ll
@@ -0,0 +1,46 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+
+define <vscale x 4 x i32> @insert_div() {
+; CHECK-LABEL: insert_div:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    mov z0.s, #3 // =0x3
+; CHECK-NEXT:    ret
+entry:
+  %0 = tail call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> undef, <4 x i32> splat (i32 9), i64 0)
+  %div = udiv <vscale x 4 x i32> %0, splat (i32 3)
+  ret <vscale x 4 x i32> %div
+}
+
+define <vscale x 4 x i32> @insert_mul() {
+; CHECK-LABEL: insert_mul:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    mov z0.s, #7 // =0x7
+; CHECK-NEXT:    ret
+entry:
+  %0 = tail call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> undef, <4 x i32> splat (i32 1), i64 0)
+  %mul = mul <vscale x 4 x i32> %0, splat (i32 7)
+  ret <vscale x 4 x i32> %mul
+}
+
+define <vscale x 4 x i32> @insert_add() {
+; CHECK-LABEL: insert_add:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    mov z0.s, #16 // =0x10
+; CHECK-NEXT:    ret
+entry:
+  %0 = tail call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> undef, <4 x i32> splat (i32 5), i64 0)
+  %add = add <vscale x 4 x i32> %0, splat (i32 11)
+  ret <vscale x 4 x i32> %add
+}
+
+define <vscale x 4 x i32> @insert_sub() {
+; CHECK-LABEL: insert_sub:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    movi v0.2d, #0000000000000000
+; CHECK-NEXT:    ret
+entry:
+  %0 = tail call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> undef, <4 x i32> splat (i32 11), i64 0)
+  %sub = add <vscale x 4 x i32> %0, splat (i32 -11)
+  ret <vscale x 4 x i32> %sub
+}
diff --git a/llvm/test/CodeGen/AArch64/vecreduce-add.ll b/llvm/test/CodeGen/AArch64/vecreduce-add.ll
index 2d0df562b9a4b..f56e1680f79c6 100644
--- a/llvm/test/CodeGen/AArch64/vecreduce-add.ll
+++ b/llvm/test/CodeGen/AArch64/vecreduce-add.ll
@@ -4778,7 +4778,7 @@ entry:
 define i64 @extract_scalable(<2 x i32> %0) "target-features"="+sve2" {
 ; CHECK-SD-LABEL: extract_scalable:
 ; CHECK-SD:       // %bb.0:
-; CHECK-SD-NEXT:    movi v1.2s, #1
+; CHECK-SD-NEXT:    mov z1.s, #1 // =0x1
 ; CHECK-SD-NEXT:    ptrue p0.s, vl2
 ; CHECK-SD-NEXT:    // kill: def $d0 killed $d0 def $z0
 ; CHECK-SD-NEXT:    sdivr z0.s, p0/m, z0.s, z1.s
diff --git a/llvm/test/CodeGen/X86/pr35443.ll b/llvm/test/CodeGen/X86/pr35443.ll
index 430a1380c7c8c..8438b73acac23 100644
--- a/llvm/test/CodeGen/X86/pr35443.ll
+++ b/llvm/test/CodeGen/X86/pr35443.ll
@@ -8,7 +8,7 @@
 define void @pr35443() {
 ; CHECK-LABEL: pr35443:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    vpbroadcastb ac+4(%rip), %xmm0
+; CHECK-NEXT:    vpbroadcastb ac+4(%rip), %ymm0
 ; CHECK-NEXT:    vpmovzxbq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero,xmm0[2],zero,zero,zero,zero,zero,zero,zero,xmm0[3],zero,zero,zero,zero,zero,zero,zero
 ; CHECK-NEXT:    vpxor %xmm1, %xmm1, %xmm1
 ; CHECK-NEXT:    vpsubq %ymm0, %ymm1, %ymm0


github-actions bot commented Oct 17, 2025

✅ With the latest revision this PR passed the undef deprecator.

  auto *BV0 = dyn_cast<BuildVectorSDNode>(N1);
  if (N0.isUndef() && (N1.getOpcode() == ISD::SPLAT_VECTOR || BV0)) {
    SDValue Splat = BV0 ? BV0->getSplatValue() : N1.getOperand(0);
    bool SplatLegal = TLI.isOperationLegalOrCustom(ISD::SPLAT_VECTOR, VT);
MacDue (Member Author):

Note: The legality check avoids a regression (where the splat is just lowered to another BUILD_VECTOR, which seems to duplicate some code).
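For context, a hedged sketch of how that guard might read in full, reconstructed from the fragment quoted above rather than copied from the final patch: `SplatLegal` gates the rewrite so a `SPLAT_VECTOR` is only formed when the target can keep it, instead of being lowered straight back into another `BUILD_VECTOR`.

```cpp
// Sketch (reconstructed, not the exact final code): only rewrite to
// SPLAT_VECTOR when the target supports it, avoiding a round-trip back to a
// (possibly duplicated) BUILD_VECTOR during lowering.
auto *BV0 = dyn_cast<BuildVectorSDNode>(N1);
if (N0.isUndef() && (N1.getOpcode() == ISD::SPLAT_VECTOR || BV0)) {
  SDValue Splat = BV0 ? BV0->getSplatValue() : N1.getOperand(0);
  bool SplatLegal = TLI.isOperationLegalOrCustom(ISD::SPLAT_VECTOR, VT);
  if (Splat && SplatLegal &&
      (N1.hasOneUse() || DAG.isConstantValueOfAnyType(Splat)))
    return DAG.getNode(ISD::SPLAT_VECTOR, SDLoc(N), VT, Splat);
}
```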

; CHECK-LABEL: pr35443:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vpbroadcastb ac+4(%rip), %xmm0
; CHECK-NEXT: vpbroadcastb ac+4(%rip), %ymm0
Collaborator:

This is unfortunate - SimplifyDemandedVectorElts should narrow this to xmm

MacDue (Member Author):

I may move this fold to AArch64 only; it looks like RISCV may also have some trouble with this form, so I'm not sure applying it generally is worthwhile.

Collaborator:

I'm pretty sure we've tried this before with the same outcome, because the combine is removing potentially valuable information. I wonder if, instead of removing it, you can teach the relevant is_splat/get_splat functions that take allow_undefs-like parameters to do a better job of this idiom for scalable vectors?

Same goes for matching splats during isel, as most of the relevant patterns should just work by looking through the insert_subvector.

MacDue (Member Author):

I'll look into it 👍

MacDue (Member Author):

I've come up with an alternative approach that updates FoldConstantArithmetic to handle this case (it does not appear to cause any regressions).

I looked into the various IsSplat/GetSplat functions, but I didn't see any uses that looked like they would result in these cases being constant-folded. Waiting until ISel also looked like it might be too late, as some nodes, like UDIV, may expand to a more complex sequence by that point.
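As an illustration of that direction (assumed shape only; the helper name is hypothetical and the actual change may differ), the look-through needed is small: when gathering constant operands, an insert of a splat into undef can be treated as the splat itself.

```cpp
// Hypothetical helper (illustration only, not the upstream change): peek
// through `insert_subvector undef, (splat X), 0` and return the scalar X, so
// constant folding can treat the value as a full splat. Conservatively limited
// to inserts at index 0 here.
static SDValue peekThroughInsertOfSplatIntoUndef(SDValue V) {
  if (V.getOpcode() == ISD::INSERT_SUBVECTOR && V.getOperand(0).isUndef() &&
      isNullConstant(V.getOperand(2))) {
    SDValue Sub = V.getOperand(1);
    if (auto *BV = dyn_cast<BuildVectorSDNode>(Sub))
      if (SDValue SplatVal = BV->getSplatValue())
        return SplatVal; // Fixed-length splat: build_vector X, X, ...
    if (Sub.getOpcode() == ISD::SPLAT_VECTOR)
      return Sub.getOperand(0); // Already a scalable splat of X.
  }
  return SDValue();
}
```

With both operands then visible as constant splats, the scalar fold plus re-splat handles cases like the `udiv` at DAG-combine time, before legalization turns it into a longer sequence.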

Collaborator:

As this path has proven successful, I now think the combine is better done at the IR level.

My ISEL suggestion is more for the next logical step when only the "value inserted into poison" is constant. While we want to preserve the information that only part of the vector has real data, we'll still want to match compatible variants as scalable vector splats during selection to match against the instructions that take an immediate.


This teaches FoldConstantArithmetic to match `insert_subvector undef, (splat X), N2`
as a splat of X. This pattern can occur for scalable vectors when a
fixed-length splat is inserted into an undef vector.

This allows the cases in `fixed-subvector-insert-into-scalable.ll` to be
constant-folded (where previously they would all be computed at runtime).
@MacDue MacDue changed the title [SDAG] Match BUILD_VECTOR in INSERT_SUBVECTOR to SPLAT_VECTOR fold [SDAG] Teach FoldConstantArithmetic to match splats inserted into vectors Oct 20, 2025

github-actions bot commented Oct 20, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

MacDue (Member Author) commented Oct 20, 2025

(CI failure looks unrelated)
