[MLIR][Conversion] Vector to LLVM: Remove unneeded vector shuffle #162946

silee2 · 2025-10-11T00:13:21Z

Remove the unneeded vector shuffle when the vector.broadcast source is a scalar and the target is a single-element 1D vector.

llvmbot · 2025-10-11T00:13:59Z

@llvm/pr-subscribers-mlir-llvm

@llvm/pr-subscribers-mlir

Author: Sang Ik Lee (silee2)

Changes

Remove the unneeded vector shuffle when the vector.broadcast source is a scalar and the target is a single-element 1D vector.

Full diff: https://github.com/llvm/llvm-project/pull/162946.diff

2 Files Affected:

(modified) mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp (+5-1)
(modified) mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir (+11)

diff --git a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
index 5355909b62a7f..2bab94f82723e 100644
--- a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
+++ b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
@@ -1723,12 +1723,16 @@ struct VectorBroadcastScalarToLowRankLowering
       return success();
     }
 
-    // For 1-d vector, we additionally do a `vectorshuffle`.
+    // For 1-d vector, we additionally do a `vectorshuffle` if vector width > 1.
     auto v =
         LLVM::InsertElementOp::create(rewriter, broadcast.getLoc(), vectorType,
                                       poison, adaptor.getSource(), zero);
 
     int64_t width = cast<VectorType>(broadcast.getType()).getDimSize(0);
+    if (width == 1) {
+      rewriter.replaceOp(broadcast, v);
+      return success();
+    }
     SmallVector<int32_t> zeroValues(width, 0);
 
     // Shuffle the value across the desired number of elements.
diff --git a/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir b/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir
index 2d33888854ea7..f704b8dba5eed 100644
--- a/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir
+++ b/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir
@@ -76,6 +76,17 @@ func.func @broadcast_vec1d_from_f32(%arg0: f32) -> vector<2xf32> {
 
 // -----
 
+func.func @broadcast_single_elem_vec1d_from_f32(%arg0: f32) -> vector<1xf32> {
+  %0 = vector.broadcast %arg0 : f32 to vector<1xf32>
+  return %0 : vector<1xf32>
+}
+// CHECK-LABEL: @broadcast_single_elem_vec1d_from_f32
+// CHECK-SAME:  %[[A:.*]]: f32)
+// CHECK:       %[[T0:.*]] = llvm.insertelement %[[A]]
+// CHECK:       return %[[T0]] : vector<1xf32>
+
+// -----
+
 func.func @broadcast_vec1d_from_f32_scalable(%arg0: f32) -> vector<[2]xf32> {
   %0 = vector.broadcast %arg0 : f32 to vector<[2]xf32>
   return %0 : vector<[2]xf32>
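The following toy C++ model shows why the shuffle is redundant in the width-1 case. It models the two LLVM ops the lowering emits. It is a sketch, not MLIR code: `Lane`, `Vec`, `insertElement`, `shuffleVector`, and `broadcastScalar` are hypothetical names. Poison lanes are modelled as `std::nullopt`, and the shuffle only handles masks that index into the first operand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy model of an LLVM vector value (not MLIR code): std::nullopt marks a
// poison lane.
using Lane = std::optional<float>;
using Vec = std::vector<Lane>;

// Models llvm.insertelement: a copy of `v` with lane `idx` set to `x`.
Vec insertElement(Vec v, float x, std::size_t idx) {
  v.at(idx) = x;
  return v;
}

// Models llvm.shufflevector when every mask index selects from the first
// operand (the only form the broadcast lowering uses): result[i] = v1[mask[i]].
Vec shuffleVector(const Vec &v1, const std::vector<int32_t> &mask) {
  Vec result;
  result.reserve(mask.size());
  for (int32_t m : mask)
    result.push_back(v1.at(static_cast<std::size_t>(m)));
  return result;
}

// Scalar-to-1-D broadcast following the lowering in the diff. It inserts the
// scalar into lane 0 of a poison vector, then splats lane 0 with an all-zero
// mask. The shuffle is skipped when the vector has a single lane.
Vec broadcastScalar(float x, int64_t width) {
  Vec poison(static_cast<std::size_t>(width), std::nullopt);
  Vec v = insertElement(poison, x, 0);
  if (width == 1)
    return v; // a shuffle with mask [0] would return `v` unchanged
  std::vector<int32_t> zeroValues(static_cast<std::size_t>(width), 0);
  return shuffleVector(v, zeroValues);
}
```

For a single lane, `shuffleVector(v, {0})` yields a value equal to `v`, so emitting the shuffle adds an op without changing the value. For wider vectors, the all-zero mask is what splats lane 0 across the result.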

banach-space

LG % minor suggestions

Thanks!

mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp

mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir

dcaballe

LGTM, thanks!

Groverkss · 2025-10-14T11:25:49Z

mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp

+    // For 1-d vector, if vector width > 1, we additionally do a
+    // `vector shuffle`
    int64_t width = cast<VectorType>(broadcast.getType()).getDimSize(0);
+    if (width == 1) {
+      rewriter.replaceOp(broadcast, v);
+      return success();
+    }


Shouldn't this really just be a canonicalization for the shufflevector operation? Does this actually matter in practice, i.e., is there a case LLVM isn't able to canonicalize? I'm sure we can pre-canonicalize every piece of IR, but why are we doing it?

Imagine if we started doing this for every operation that generates a shufflevector, adding special cases to always generate the canonicalized form. Why not just add it to the LLVM operation, or check whether LLVM does it for you?

Why not implement it as a folding pattern on shufflevector and use createOrFold when creating the shufflevector operation?

Added a folding pattern for llvm.shufflevector and switched to createOrFold.
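The fold-plus-`createOrFold` approach suggested here can be sketched with a toy IR. This is neither the MLIR API nor the fold that landed in the PR. `Op`, `foldShuffle`, `Builder`, and `createOrFoldShuffle` are hypothetical names. The builder only mimics the idea behind `OpBuilder::createOrFold`: build the op, try to fold it, and use the folded value if that succeeds.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy IR (not the MLIR API): a value is the op that defines it,
// and every value is a 1-D vector of `width` lanes.
struct Op {
  int64_t width;             // lanes in the result vector
  const Op *v1 = nullptr;    // first shufflevector operand (null otherwise)
  std::vector<int32_t> mask; // shufflevector mask (empty otherwise)
};

// Sketch of a fold for the case discussed in this PR. Shuffling a
// single-element vector with mask [0] into a single-element result is a no-op,
// so the shuffle folds to its first operand. Returns nullptr if it can't fold.
const Op *foldShuffle(const Op &shuffle) {
  const Op *v1 = shuffle.v1;
  if (v1 && v1->width == 1 && shuffle.width == 1 &&
      shuffle.mask == std::vector<int32_t>{0})
    return v1;
  return nullptr;
}

// Toy builder that owns every op added to the "IR".
struct Builder {
  std::vector<std::unique_ptr<Op>> ops;

  const Op *createVector(int64_t width) {
    ops.push_back(std::make_unique<Op>(Op{width, nullptr, {}}));
    return ops.back().get();
  }

  // In the spirit of createOrFold: build the shuffle and try to fold it. If
  // the fold succeeds, return the folded value instead of keeping the new op.
  const Op *createOrFoldShuffle(const Op *v1, std::vector<int32_t> mask) {
    auto candidate = std::make_unique<Op>(
        Op{static_cast<int64_t>(mask.size()), v1, std::move(mask)});
    if (const Op *folded = foldShuffle(*candidate))
      return folded; // `candidate` is discarded; the IR is unchanged
    ops.push_back(std::move(candidate));
    return ops.back().get();
  }
};
```

This shows the design point from the thread. The lowering calls one generic entry point, and the single-element special case lives in the fold, where any other code that creates a shuffle can reuse it.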

Groverkss

Blocking to get a better reasoning for the PR, happy to unblock after agreeing on a reasoning (not a strong block, just want to make sure we discuss before landing). Currently, I think either we should implement this as a canonicalization for llvm.shufflevector or check if llvm already cleans this up.

banach-space · 2025-10-14T12:54:49Z

> Currently, I think either we should implement this as a canonicalization for llvm.shufflevector or check if llvm already cleans this up.

I’d argue we shouldn’t rely on canonicalization or post-processing to clean up poor code when a cheap and straightforward alternative exists.

This fix falls squarely into that category - it’s simple, local, and makes the generated IR strictly better. We should aim to produce good code by construction whenever it’s practical, rather than depend on later passes to clean things up.

Groverkss · 2025-10-14T13:02:38Z

> > Currently, I think either we should implement this as a canonicalization for llvm.shufflevector or check if llvm already cleans this up.
>
> I’d argue we shouldn’t rely on canonicalization or post-processing to clean up poor code when a cheap and straightforward alternative exists.
>
> This fix falls squarely into that category - it’s simple, local, and makes the generated IR strictly better. We should aim to produce good code by construction whenever it’s practical, rather than depend on later passes to clean things up.

(I'm not blocking this change; the block is just to get clarification from the author on why they originally intended this change before we land. Does LLVM have problems with this? I'm actually curious.)

Do we have any docs on what the correct thing to do here is? Should every path trying to generate a vector<1xf32> canonicalize it to some nicer form when a canonicalization on the operation would've done it?

I don't know what the correct answer is but this just seems like something that can grow very easily where every single pattern is trying to generate a better form of the operation.

If implemented on the LLVM op, this would just be a fold pattern. It would cost no more than the current in-pattern implementation and would also be reusable, because the pattern would simply call createOrFold.

> This fix falls squarely into that category - it’s simple, local, and makes the generated IR strictly better. We should aim to produce good code by construction whenever it’s practical, rather than depend on later passes to clean things up.

We should also not reimplement everything when it could be a fold pattern and a call to createOrFold with no additional overhead.

I'll remove my block, but this should be a fold pattern in the shufflevector op and we should use createOrFold if we care about doing it in the pattern.

reason in comment


silee2 · 2025-10-15T17:10:39Z

> Blocking to get a better reasoning for the PR, happy to unblock after agreeing on a reasoning (not a strong block, just want to make sure we discuss before landing). Currently, I think either we should implement this as a canonicalization for llvm.shufflevector or check if llvm already cleans this up.

The reason for this PR is to better support the source materialization cast required in the following case.
The XeVM dialect type system does not allow single-element vectors.
Meanwhile, in the XeGPU dialect, single-element vectors are legal.
To bridge this gap, the XeGPU to XeVM conversion adds:

a target materialization cast to convert a single-element vector to a scalar
a source materialization cast to convert a scalar to a single-element vector ([MLIR][Conversion] XeGPU to XeVM: Remove unused type converter source materializations. #162947)

While working on #162947, I found that a redundant shufflevector is generated during source materialization.
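The type rule and the two casts described above can be sketched as follows. This is a toy model, not the XeGPU-to-XeVM `TypeConverter`. `Type`, `convertToXeVM`, `Materialization`, and `bridge` are hypothetical names. Treating any shape with exactly one element (for example, `vector<1x1xf32>`) as a single-element vector is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical toy type: element type name plus shape; an empty shape is a scalar.
struct Type {
  std::string elem;
  std::vector<int64_t> shape;
  bool operator==(const Type &o) const {
    return elem == o.elem && shape == o.shape;
  }
};

// Assumption: any vector whose shape multiplies out to 1 counts as single-element.
bool isSingleElementVector(const Type &t) {
  int64_t n = std::accumulate(t.shape.begin(), t.shape.end(), int64_t{1},
                              std::multiplies<int64_t>());
  return !t.shape.empty() && n == 1;
}

// XeGPU allows vector<1xT>; XeVM does not, so such types convert to scalar T.
Type convertToXeVM(const Type &t) {
  return isSingleElementVector(t) ? Type{t.elem, {}} : t;
}

// Which cast bridges a value of type `have` to a use that expects `want`.
enum class Materialization {
  None,
  Target, // original single-element vector -> converted scalar
  Source  // converted scalar -> original single-element vector
};

Materialization bridge(const Type &have, const Type &want) {
  if (isSingleElementVector(have) && want == convertToXeVM(have))
    return Materialization::Target;
  if (isSingleElementVector(want) && have == convertToXeVM(want))
    return Materialization::Source;
  return Materialization::None;
}
```

The source direction is the one relevant to this PR. According to the thread, producing a `vector<1xf32>` from an `f32` is where the redundant shufflevector appeared.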

Updated the implementation and added a canonicalization for llvm.shufflevector.

dcaballe · 2025-10-15T18:34:01Z

The canonicalization version makes sense to me. Thanks!

Groverkss

Nicely implemented!

[MLIR][Conversion] Vector to LLVM: Remove unneeded vectorshuffle

c44404b

if vector.broadcast source is a scalar and target is a single element 1D vector.

silee2 requested review from banach-space, dcaballe and nicolasvasilache as code owners October 11, 2025 00:13

llvmbot added the mlir label Oct 11, 2025

banach-space approved these changes Oct 13, 2025

mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp (outdated)

mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir

silee2 added 2 commits October 13, 2025 10:49

Update comment.

f75fe90

Update test.

b0d476e

dcaballe approved these changes Oct 13, 2025

Groverkss reviewed Oct 14, 2025

Groverkss previously requested changes Oct 14, 2025

Add op fold pattern for llvm.shufflevector and use it for optimizing single element vector.

a33e198

llvmbot added the mlir:llvm label Oct 14, 2025

Groverkss approved these changes Oct 15, 2025

silee2 merged commit 856de05 into llvm:main Oct 15, 2025
11 checks passed

5 participants