Skip to content

Conversation

silee2
Copy link
Contributor

@silee2 silee2 commented Oct 11, 2025

if vector.broadcast source is a scalar and target is a single element 1D vector.

if vector.broadcast source is a scalar and target is a single element 1D vector.
@llvmbot
Copy link
Member

llvmbot commented Oct 11, 2025

@llvm/pr-subscribers-mlir-llvm

@llvm/pr-subscribers-mlir

Author: Sang Ik Lee (silee2)

Changes

if vector.broadcast source is a scalar and target is a single element 1D vector.


Full diff: https://github.com/llvm/llvm-project/pull/162946.diff

2 Files Affected:

  • (modified) mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp (+5-1)
  • (modified) mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir (+11)
diff --git a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
index 5355909b62a7f..2bab94f82723e 100644
--- a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
+++ b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
@@ -1723,12 +1723,16 @@ struct VectorBroadcastScalarToLowRankLowering
       return success();
     }
 
-    // For 1-d vector, we additionally do a `vectorshuffle`.
+    // For 1-d vector, we additionally do a `vectorshuffle` if vector width > 1.
     auto v =
         LLVM::InsertElementOp::create(rewriter, broadcast.getLoc(), vectorType,
                                       poison, adaptor.getSource(), zero);
 
     int64_t width = cast<VectorType>(broadcast.getType()).getDimSize(0);
+    if (width == 1) {
+      rewriter.replaceOp(broadcast, v);
+      return success();
+    }
     SmallVector<int32_t> zeroValues(width, 0);
 
     // Shuffle the value across the desired number of elements.
diff --git a/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir b/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir
index 2d33888854ea7..f704b8dba5eed 100644
--- a/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir
+++ b/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir
@@ -76,6 +76,17 @@ func.func @broadcast_vec1d_from_f32(%arg0: f32) -> vector<2xf32> {
 
 // -----
 
+func.func @broadcast_single_elem_vec1d_from_f32(%arg0: f32) -> vector<1xf32> {
+  %0 = vector.broadcast %arg0 : f32 to vector<1xf32>
+  return %0 : vector<1xf32>
+}
+// CHECK-LABEL: @broadcast_single_elem_vec1d_from_f32
+// CHECK-SAME:  %[[A:.*]]: f32)
+// CHECK:       %[[T0:.*]] = llvm.insertelement %[[A]]
+// CHECK:       return %[[T0]] : vector<1xf32>
+
+// -----
+
 func.func @broadcast_vec1d_from_f32_scalable(%arg0: f32) -> vector<[2]xf32> {
   %0 = vector.broadcast %arg0 : f32 to vector<[2]xf32>
   return %0 : vector<[2]xf32>

Copy link
Contributor

@banach-space banach-space left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG % minor suggestions

Thanks!

Copy link
Contributor

@dcaballe dcaballe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Comment on lines 1730 to 1736
// For 1-d vector, if vector width > 1, we additionally do a
// `vector shuffle`
int64_t width = cast<VectorType>(broadcast.getType()).getDimSize(0);
if (width == 1) {
rewriter.replaceOp(broadcast, v);
return success();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this really just be a canonicalization for the shufflevector operation? Does this actually matter in practice where llvm isn't able to canonicalize it? I'm sure we can pre-canonicalize every piece of IR, but why are we doing it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imagine if we start doing this for every operation that generates shufflevector, where we add special casing to always generate the canonicalized form. Why not just add it on the llvm operation or check if llvm does it for you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not implement it as a folding pattern on shufflevector and use createOrFold when creating the shufflevecotr operation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added folding pattern for llvm.shufflevector and using createOrFold.

Copy link
Member

@Groverkss Groverkss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking to get a better reasoning for the PR, happy to unblock after agreeing on a reasoning (not a strong block, just want to make sure we discuss before landing). Currently, I think either we should implement this as a canonicalization for llvm.shufflevector or check if llvm already cleans this up.

@banach-space
Copy link
Contributor

Currently, I think either we should implement this as a canonicalization for llvm.shufflevector or check if llvm already cleans this up.

I’d argue we shouldn’t rely on canonicalization or post-processing to clean up poor code when a cheap and straightforward alternative exists.

This fix falls squarely into that category - it’s simple, local, and makes the generated IR strictly better. We should aim to produce good code by construction whenever it’s practical, rather than depend on later passes to clean things up.

@Groverkss
Copy link
Member

Groverkss commented Oct 14, 2025

Currently, I think either we should implement this as a canonicalization for llvm.shufflevector or check if llvm already cleans this up.

I’d argue we shouldn’t rely on canonicalization or post-processing to clean up poor code when a cheap and straightforward alternative exists.

This fix falls squarely into that category - it’s simple, local, and makes the generated IR strictly better. We should aim to produce good code by construction whenever it’s practical, rather than depend on later passes to clean things up.

(I'm not blocking this change, the block is just to get clarification from the author why they originally intended to do this change before we land. Does LLVM have problems with this? I'm actually curious.)

Do we have any docs on what the correct thing to do here is? Should every path trying to generate a vector<1xf32> canonicalize it to some nicer form when a canonicalization on the operation wouldve done it?

I don't know what the correct answer is but this just seems like something that can grow very easily where every single pattern is trying to generate a better form of the operation.

If implemented on the LLVM op, this would just be a fold pattern, which would be as expensive as the current way of implementing it in the pattern, just reusable. This is just calling createOrFold on it.

This fix falls squarely into that category - it’s simple, local, and makes the generated IR strictly better. We should aim to produce good code by construction whenever it’s practical, rather than depend on later passes to clean things up.

We should also not reimplement everything when it could be fold pattern and a call to createOrFold with no additional overhead.

I'll remove my block, but this should be a fold pattern in the shufflevector op and we should use createOrFold if we care about doing it in the pattern.

@Groverkss Groverkss dismissed their stale review October 14, 2025 13:02

reason in comment

@silee2
Copy link
Contributor Author

silee2 commented Oct 15, 2025

Blocking to get a better reasoning for the PR, happy to unblock after agreeing on a reasoning (not a strong block, just want to make sure we discuss before landing). Currently, I think either we should implement this as a canonicalization for llvm.shufflevector or check if llvm already cleans this up.

The reason for this PR is to better support source materialization cast required in the following case.
XeVM dialect type system does not allow single element vector.
Meanwhile in XeGPU dialect, single element vector is legal.
To bridge this gap, XeGPU to XeVM conversion adds

And found that a redundant shufflevector is generated during source target materialization while working on #162947

Updated implementation and added canonicalization for llvm.shufflevector.

@dcaballe
Copy link
Contributor

The canonicalization version makes sense to me. Thanks!

Copy link
Member

@Groverkss Groverkss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely implemented!

@silee2 silee2 merged commit 856de05 into llvm:main Oct 15, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants