-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[MLIR][Conversion] Vector to LLVM: Remove unneeded vector shuffle #162946
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MLIR][Conversion] Vector to LLVM: Remove unneeded vector shuffle #162946
Conversation
if vector.broadcast source is a scalar and target is a single element 1D vector.
@llvm/pr-subscribers-mlir-llvm @llvm/pr-subscribers-mlir Author: Sang Ik Lee (silee2) Changesif vector.broadcast source is a scalar and target is a single element 1D vector. Full diff: https://github.com/llvm/llvm-project/pull/162946.diff 2 Files Affected:
diff --git a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
index 5355909b62a7f..2bab94f82723e 100644
--- a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
+++ b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
@@ -1723,12 +1723,16 @@ struct VectorBroadcastScalarToLowRankLowering
return success();
}
- // For 1-d vector, we additionally do a `vectorshuffle`.
+ // For 1-d vector, we additionally do a `vectorshuffle` if vector width > 1.
auto v =
LLVM::InsertElementOp::create(rewriter, broadcast.getLoc(), vectorType,
poison, adaptor.getSource(), zero);
int64_t width = cast<VectorType>(broadcast.getType()).getDimSize(0);
+ if (width == 1) {
+ rewriter.replaceOp(broadcast, v);
+ return success();
+ }
SmallVector<int32_t> zeroValues(width, 0);
// Shuffle the value across the desired number of elements.
diff --git a/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir b/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir
index 2d33888854ea7..f704b8dba5eed 100644
--- a/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir
+++ b/mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir
@@ -76,6 +76,17 @@ func.func @broadcast_vec1d_from_f32(%arg0: f32) -> vector<2xf32> {
// -----
+func.func @broadcast_single_elem_vec1d_from_f32(%arg0: f32) -> vector<1xf32> {
+ %0 = vector.broadcast %arg0 : f32 to vector<1xf32>
+ return %0 : vector<1xf32>
+}
+// CHECK-LABEL: @broadcast_single_elem_vec1d_from_f32
+// CHECK-SAME: %[[A:.*]]: f32)
+// CHECK: %[[T0:.*]] = llvm.insertelement %[[A]]
+// CHECK: return %[[T0]] : vector<1xf32>
+
+// -----
+
func.func @broadcast_vec1d_from_f32_scalable(%arg0: f32) -> vector<[2]xf32> {
%0 = vector.broadcast %arg0 : f32 to vector<[2]xf32>
return %0 : vector<[2]xf32>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG % minor suggestions
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
// For 1-d vector, if vector width > 1, we additionally do a | ||
// `vector shuffle` | ||
int64_t width = cast<VectorType>(broadcast.getType()).getDimSize(0); | ||
if (width == 1) { | ||
rewriter.replaceOp(broadcast, v); | ||
return success(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this really just be a canonicalization for the shufflevector operation? Does this actually matter in practice where llvm isn't able to canonicalize it? I'm sure we can pre-canonicalize every piece of IR, but why are we doing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imagine if we start doing this for every operation that generates shufflevector, where we add special casing to always generate the canonicalized form. Why not just add it on the llvm operation or check if llvm does it for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not implement it as a folding pattern on shufflevector and use createOrFold when creating the shufflevecotr operation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added folding pattern for llvm.shufflevector and using createOrFold.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blocking to get a better reasoning for the PR, happy to unblock after agreeing on a reasoning (not a strong block, just want to make sure we discuss before landing). Currently, I think either we should implement this as a canonicalization for llvm.shufflevector or check if llvm already cleans this up.
I’d argue we shouldn’t rely on canonicalization or post-processing to clean up poor code when a cheap and straightforward alternative exists. This fix falls squarely into that category - it’s simple, local, and makes the generated IR strictly better. We should aim to produce good code by construction whenever it’s practical, rather than depend on later passes to clean things up. |
(I'm not blocking this change, the block is just to get clarification from the author why they originally intended to do this change before we land. Does LLVM have problems with this? I'm actually curious.) Do we have any docs on what the correct thing to do here is? Should every path trying to generate a vector<1xf32> canonicalize it to some nicer form when a canonicalization on the operation wouldve done it? I don't know what the correct answer is but this just seems like something that can grow very easily where every single pattern is trying to generate a better form of the operation. If implemented on the LLVM op, this would just be a fold pattern, which would be as expensive as the current way of implementing it in the pattern, just reusable. This is just calling createOrFold on it.
We should also not reimplement everything when it could be fold pattern and a call to createOrFold with no additional overhead. I'll remove my block, but this should be a fold pattern in the shufflevector op and we should use createOrFold if we care about doing it in the pattern. |
…single element vector.
The reason for this PR is to better support source materialization cast required in the following case.
And found that a redundant shufflevector is generated during source target materialization while working on #162947 Updated implementation and added canonicalization for llvm.shufflevector. |
The canonicalization version makes sense to me. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely implemented!
if vector.broadcast source is a scalar and target is a single element 1D vector.