[mlir][Vector] Add patterns to lower `vector.shuffle` #157611

dcaballe · 2025-09-09T04:12:55Z

This PR adds patterns to lower vector.shuffle with inputs with different vector sizes more efficiently. The current LLVM lowering for these cases degenerates to a sequence of vector.extract and vector.insert operations. With this PR, the smaller input is promoted to larger vector size by introducing an extra vector.shuffle.

llvmbot · 2025-09-09T04:13:25Z

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-vector

Author: Diego Caballero (dcaballe)

Changes

This PR adds patterns to lower vector.shuffle with inputs with different vector sizes more efficiently. The current LLVM lowering for these cases degenerates to a sequence of vector.extract and vector.insert operations. With this PR, the smaller input is promoted to larger vector size by introducing an extra vector.shuffle.

Full diff: https://github.com/llvm/llvm-project/pull/157611.diff

5 Files Affected:

(modified) mlir/include/mlir/Dialect/Vector/Transforms/LoweringPatterns.h (+3)
(modified) mlir/lib/Dialect/Vector/Transforms/CMakeLists.txt (+1)
(added) mlir/lib/Dialect/Vector/Transforms/LowerVectorShuffle.cpp (+104)
(added) mlir/test/Dialect/Vector/vector-shuffle-lowering.mlir (+77)
(modified) mlir/test/lib/Dialect/Vector/TestVectorTransforms.cpp (+18)

diff --git a/mlir/include/mlir/Dialect/Vector/Transforms/LoweringPatterns.h b/mlir/include/mlir/Dialect/Vector/Transforms/LoweringPatterns.h
index 47f96112a9433..5bd745582a362 100644
--- a/mlir/include/mlir/Dialect/Vector/Transforms/LoweringPatterns.h
+++ b/mlir/include/mlir/Dialect/Vector/Transforms/LoweringPatterns.h
@@ -293,6 +293,9 @@ void populateVectorBitCastLoweringPatterns(RewritePatternSet &patterns,
                                            int64_t targetRank = 1,
                                            PatternBenefit benefit = 1);
 
+void populateVectorShuffleLoweringPatterns(RewritePatternSet &patterns,
+                                           PatternBenefit benefit = 1);
+
 /// Populates a pattern that rank-reduces n-D FMAs into (n-1)-D FMAs where
 /// n > 1.
 void populateVectorRankReducingFMAPattern(RewritePatternSet &patterns);
diff --git a/mlir/lib/Dialect/Vector/Transforms/CMakeLists.txt b/mlir/lib/Dialect/Vector/Transforms/CMakeLists.txt
index acbf2b746037b..80d4f5a38d231 100644
--- a/mlir/lib/Dialect/Vector/Transforms/CMakeLists.txt
+++ b/mlir/lib/Dialect/Vector/Transforms/CMakeLists.txt
@@ -10,6 +10,7 @@ add_mlir_dialect_library(MLIRVectorTransforms
   LowerVectorMultiReduction.cpp
   LowerVectorScan.cpp
   LowerVectorShapeCast.cpp
+  LowerVectorShuffle.cpp
   LowerVectorStep.cpp
   LowerVectorToFromElementsToShuffleTree.cpp
   LowerVectorTransfer.cpp
diff --git a/mlir/lib/Dialect/Vector/Transforms/LowerVectorShuffle.cpp b/mlir/lib/Dialect/Vector/Transforms/LowerVectorShuffle.cpp
new file mode 100644
index 0000000000000..2ef6401d137c9
--- /dev/null
+++ b/mlir/lib/Dialect/Vector/Transforms/LowerVectorShuffle.cpp
@@ -0,0 +1,104 @@
+//===- LowerVectorShuffle.cpp - Lower 'vector.shuffle' operation ----------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the lowering of complex `vector.shuffle` operation to a
+// set of simpler operations supported by LLVM/SPIR-V.
+//
+//===----------------------------------------------------------------------===//
+
+#include "mlir/Dialect/Arith/IR/Arith.h"
+#include "mlir/Dialect/Vector/IR/VectorOps.h"
+#include "mlir/Dialect/Vector/Transforms/LoweringPatterns.h"
+#include "mlir/IR/PatternMatch.h"
+
+#define DEBUG_TYPE "vector-shuffle-lowering"
+
+using namespace mlir;
+using namespace mlir::vector;
+
+namespace {
+
+/// Lowers a `vector.shuffle` operation with mix-size inputs to a new `vector.shuffle`
+/// which promotes the smaller input to the larger vector size and an updated version
+/// of the original `vector.shuffle`.
+///
+/// Example:
+///
+///     %0 = vector.shuffle %v1, %v2 [0, 2, 1, 3] : vector<2xf32>, vector<4xf32>
+///
+///   is lowered to:
+///
+///     %0 = vector.shuffle %v1, %v1 [0, 1, -1, -1] : vector<2xf32>, vector<2xf32>
+///     %1 = vector.shuffle %0, %v2 [0, 4, 1, 5] : vector<4xf32>, vector<4xf32>
+///
+struct MixSizeInputShuffleOpRewrite final
+    : OpRewritePattern<vector::ShuffleOp> {
+  using OpRewritePattern::OpRewritePattern;
+
+  LogicalResult matchAndRewrite(vector::ShuffleOp shuffleOp,
+                                PatternRewriter &rewriter) const override {
+    auto v1Type = shuffleOp.getV1VectorType();
+    auto v2Type = shuffleOp.getV2VectorType();
+
+    // Only support 1-D shuffle for now.
+    if (v1Type.getRank() != 1 || v2Type.getRank() != 1)
+      return failure();
+
+    // No mix-size inputs.
+    int64_t v1OrigNumElems = v1Type.getNumElements();
+    int64_t v2OrigNumElems = v2Type.getNumElements();
+    if (v1OrigNumElems == v2OrigNumElems)
+      return failure();
+
+    // Determine which input needs promotion.
+    bool promoteV1 = v1OrigNumElems < v2OrigNumElems;
+    Value inputToPromote = promoteV1 ? shuffleOp.getV1() : shuffleOp.getV2();
+    VectorType promotedType = promoteV1 ? v2Type : v1Type;
+    int64_t origNumElems = promoteV1 ? v1OrigNumElems : v2OrigNumElems;
+    int64_t promotedNumElems = promoteV1 ? v2OrigNumElems : v1OrigNumElems;
+
+    // Create a shuffle with a mask that preserves existing elements and fills
+    // up with poison.
+    SmallVector<int64_t> promoteMask(promotedNumElems, ShuffleOp::kPoisonIndex);
+    for (int64_t i = 0; i < origNumElems; ++i)
+      promoteMask[i] = i;
+
+    Value promotedInput = rewriter.create<vector::ShuffleOp>(
+        shuffleOp.getLoc(), promotedType, inputToPromote, inputToPromote,
+        promoteMask);
+
+    // Create the final shuffle with the promoted inputs.
+    Value promotedV1 = promoteV1 ? promotedInput : shuffleOp.getV1();
+    Value promotedV2 = promoteV1 ? shuffleOp.getV2() : promotedInput;
+
+    SmallVector<int64_t> newMask;
+    if (!promoteV1) {
+      newMask = to_vector(shuffleOp.getMask());
+    } else {
+      // Adjust V2 indices to account for the new V1 size.
+      for (auto idx : shuffleOp.getMask()) {
+        int64_t newIdx = idx;
+        if (idx >= v1OrigNumElems) {
+          newIdx += promotedNumElems - v1OrigNumElems;
+        }
+        newMask.push_back(newIdx);
+      }
+    }
+
+    rewriter.replaceOpWithNewOp<vector::ShuffleOp>(
+        shuffleOp, shuffleOp.getResultVectorType(), promotedV1, promotedV2,
+        newMask);
+    return success();
+  }
+};
+} // namespace
+
+void mlir::vector::populateVectorShuffleLoweringPatterns(
+    RewritePatternSet &patterns, PatternBenefit benefit) {
+  patterns.add<MixSizeInputShuffleOpRewrite>(patterns.getContext(), benefit);
+}
diff --git a/mlir/test/Dialect/Vector/vector-shuffle-lowering.mlir b/mlir/test/Dialect/Vector/vector-shuffle-lowering.mlir
new file mode 100644
index 0000000000000..3acc9aa9bffa0
--- /dev/null
+++ b/mlir/test/Dialect/Vector/vector-shuffle-lowering.mlir
@@ -0,0 +1,77 @@
+// RUN: mlir-opt %s --test-vector-shuffle-lowering --split-input-file | FileCheck %s
+
+// CHECK-LABEL: func.func @shuffle_v1_smaller_arbitrary
+// CHECK-SAME:    %[[V1:.*]]: vector<2xf32>, %[[V2:.*]]: vector<4xf32>
+func.func @shuffle_v1_smaller_arbitrary(%v1: vector<2xf32>, %v2: vector<4xf32>) -> vector<5xf32> {
+  // CHECK: %[[PROMOTE_V1:.*]] = vector.shuffle %[[V1]], %[[V1]] [0, 1, -1, -1] : vector<2xf32>, vector<2xf32>
+  // CHECK: %[[RESULT:.*]] = vector.shuffle %[[PROMOTE_V1]], %[[V2]] [1, 5, 0, 6, 7] : vector<4xf32>, vector<4xf32>
+  // CHECK: return %[[RESULT]] : vector<5xf32>
+  %0 = vector.shuffle %v1, %v2 [1, 3, 0, 4, 5] : vector<2xf32>, vector<4xf32>
+  return %0 : vector<5xf32>
+}
+
+// -----
+
+// CHECK-LABEL: func.func @shuffle_v2_smaller_arbitrary
+// CHECK-SAME:    %[[V1:.*]]: vector<4xi32>, %[[V2:.*]]: vector<2xi32>
+func.func @shuffle_v2_smaller_arbitrary(%v1: vector<4xi32>, %v2: vector<2xi32>) -> vector<6xi32> {
+  // CHECK: %[[PROMOTE_V2:.*]] = vector.shuffle %[[V2]], %[[V2]] [0, 1, -1, -1] : vector<2xi32>, vector<2xi32>
+  // CHECK: %[[RESULT:.*]] = vector.shuffle %[[V1]], %[[PROMOTE_V2]] [3, 5, 1, 4, 0, 2] : vector<4xi32>, vector<4xi32>
+  // CHECK: return %[[RESULT]] : vector<6xi32>
+  %0 = vector.shuffle %v1, %v2 [3, 5, 1, 4, 0, 2] : vector<4xi32>, vector<2xi32>
+  return %0 : vector<6xi32>
+}
+
+// -----
+
+// CHECK-LABEL: func.func @shuffle_v1_smaller_concat
+// CHECK-SAME:    %[[V1:.*]]: vector<3xf64>, %[[V2:.*]]: vector<5xf64>
+func.func @shuffle_v1_smaller_concat(%v1: vector<3xf64>, %v2: vector<5xf64>) -> vector<8xf64> {
+  // CHECK: %[[PROMOTE_V1:.*]] = vector.shuffle %[[V1]], %[[V1]] [0, 1, 2, -1, -1] : vector<3xf64>, vector<3xf64>
+  // CHECK: %[[RESULT:.*]] = vector.shuffle %[[PROMOTE_V1]], %[[V2]] [0, 1, 2, 5, 6, 7, 8, 9] : vector<5xf64>, vector<5xf64>
+  // CHECK: return %[[RESULT]] : vector<8xf64>
+  %0 = vector.shuffle %v1, %v2 [0, 1, 2, 3, 4, 5, 6, 7] : vector<3xf64>, vector<5xf64>
+  return %0 : vector<8xf64>
+}
+
+// -----
+
+// CHECK-LABEL: func.func @shuffle_v2_smaller_concat
+// CHECK-SAME:    %[[V1:.*]]: vector<4xi16>, %[[V2:.*]]: vector<2xi16>
+func.func @shuffle_v2_smaller_concat(%v1: vector<4xi16>, %v2: vector<2xi16>) -> vector<6xi16> {
+  // CHECK: %[[PROMOTE_V2:.*]] = vector.shuffle %[[V2]], %[[V2]] [0, 1, -1, -1] : vector<2xi16>, vector<2xi16>
+  // CHECK: %[[RESULT:.*]] = vector.shuffle %[[V1]], %[[PROMOTE_V2]] [0, 1, 2, 3, 4, 5] : vector<4xi16>, vector<4xi16>
+  // CHECK: return %[[RESULT]] : vector<6xi16>
+  %0 = vector.shuffle %v1, %v2 [0, 1, 2, 3, 4, 5] : vector<4xi16>, vector<2xi16>
+  return %0 : vector<6xi16>
+}
+
+// -----
+
+// Test that shuffles with same size inputs are not modified.
+
+// CHECK-LABEL: func.func @shuffle_same_input_sizes
+// CHECK-SAME:    %[[V1:.*]]: vector<4xf32>, %[[V2:.*]]: vector<4xf32>
+func.func @shuffle_same_input_sizes(%v1: vector<4xf32>, %v2: vector<4xf32>) -> vector<6xf32> {
+  // CHECK-NOT: vector.shuffle %[[V1]], %[[V1]]
+  // CHECK-NOT: vector.shuffle %[[V2]], %[[V2]]
+  // CHECK: %[[RESULT:.*]] = vector.shuffle %[[V1]], %[[V2]] [0, 1, 4, 5, 2, 6] : vector<4xf32>, vector<4xf32>
+  // CHECK: return %[[RESULT]] : vector<6xf32>
+  %0 = vector.shuffle %v1, %v2 [0, 1, 4, 5, 2, 6] : vector<4xf32>, vector<4xf32>
+  return %0 : vector<6xf32>
+}
+
+// -----
+
+// Test that multi-dimensional shuffles are not modified.
+
+// CHECK-LABEL: func.func @shuffle_2d_vectors_no_change
+// CHECK-SAME:    %[[V1:.*]]: vector<2x4xf32>, %[[V2:.*]]: vector<3x4xf32>
+func.func @shuffle_2d_vectors_no_change(%v1: vector<2x4xf32>, %v2: vector<3x4xf32>) -> vector<4x4xf32> {
+  // CHECK-NOT: vector.shuffle %[[V1]], %[[V1]]
+  // CHECK-NOT: vector.shuffle %[[V2]], %[[V2]]
+  // CHECK: %[[RESULT:.*]] = vector.shuffle %[[V1]], %[[V2]] [0, 1, 2, 3] : vector<2x4xf32>, vector<3x4xf32>
+  // CHECK: return %[[RESULT]] : vector<4x4xf32>
+  %0 = vector.shuffle %v1, %v2 [0, 1, 2, 3] : vector<2x4xf32>, vector<3x4xf32>
+  return %0 : vector<4x4xf32>
+}
diff --git a/mlir/test/lib/Dialect/Vector/TestVectorTransforms.cpp b/mlir/test/lib/Dialect/Vector/TestVectorTransforms.cpp
index bb1598ee3efe5..d4a490cb83672 100644
--- a/mlir/test/lib/Dialect/Vector/TestVectorTransforms.cpp
+++ b/mlir/test/lib/Dialect/Vector/TestVectorTransforms.cpp
@@ -1040,6 +1040,22 @@ struct TestEliminateVectorMasks
                          VscaleRange{vscaleMin, vscaleMax});
   }
 };
+
+struct TestVectorShuffleLowering
+    : public PassWrapper<TestVectorShuffleLowering,
+                         OperationPass<func::FuncOp>> {
+  MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID(TestVectorShuffleLowering)
+
+  StringRef getArgument() const final { return "test-vector-shuffle-lowering"; }
+  StringRef getDescription() const final {
+    return "Test lowering patterns for vector.shuffle with mixed-size inputs";
+  }
+  void runOnOperation() override {
+    RewritePatternSet patterns(&getContext());
+    populateVectorShuffleLoweringPatterns(patterns);
+    (void)applyPatternsGreedily(getOperation(), std::move(patterns));
+  }
+};
 } // namespace
 
 namespace mlir {
@@ -1071,6 +1087,8 @@ void registerTestVectorLowerings() {
 
   PassRegistration<TestVectorScanLowering>();
 
+  PassRegistration<TestVectorShuffleLowering>();
+
   PassRegistration<TestVectorDistribution>();
 
   PassRegistration<TestVectorExtractStridedSliceLowering>();

github-actions · 2025-09-09T04:16:24Z

✅ With the latest revision this PR passed the C/C++ code formatter.

banach-space

Nice, thank you! Just some minor points.

banach-space · 2025-09-15T08:52:06Z

mlir/lib/Dialect/Vector/Transforms/LowerVectorShuffle.cpp

Is this comment correct?

Not clear enough if you are asking :) Let me improve it

banach-space · 2025-09-15T08:55:42Z

mlir/test/Dialect/Vector/vector-shuffle-lowering.mlir

Suggested change

func.func @shuffle_2d_vectors_no_change(%v1: vector<2x4xf32>, %v2: vector<3x4xf32>) -> vector<4x4xf32> {

func.func @negative_shuffle_2d_vectors_no_change(%v1: vector<2x4xf32>, %v2: vector<3x4xf32>) -> vector<4x4xf32> {

https://mlir.llvm.org/getting_started/TestingGuide/#test-naming-convention

banach-space · 2025-09-15T08:57:41Z

mlir/test/Dialect/Vector/vector-shuffle-lowering.mlir

This would be a bit more descriptive IMHO

Suggested change

func.func @shuffle_v1_smaller_arbitrary(%v1: vector<2xf32>, %v2: vector<4xf32>) -> vector<5xf32> {

// CHECK: %[[PROMOTE_V1:.*]] = vector.shuffle %[[V1]], %[[V1]] [0, 1, -1, -1] : vector<2xf32>, vector<2xf32>

// CHECK: %[[RESULT:.*]] = vector.shuffle %[[PROMOTE_V1]], %[[V2]] [1, 5, 0, 6, 7] : vector<4xf32>, vector<4xf32>

// CHECK: return %[[RESULT]] : vector<5xf32>

%0 = vector.shuffle %v1, %v2 [1, 3, 0, 4, 5] : vector<2xf32>, vector<4xf32>

return %0 : vector<5xf32>

}

// -----

// CHECK-LABEL: func.func @shuffle_v2_smaller_arbitrary

// CHECK-SAME: %[[V1:.*]]: vector<4xi32>, %[[V2:.*]]: vector<2xi32>

func.func @shuffle_v2_smaller_arbitrary(%v1: vector<4xi32>, %v2: vector<2xi32>) -> vector<6xi32> {

// CHECK: %[[PROMOTE_V2:.*]] = vector.shuffle %[[V2]], %[[V2]] [0, 1, -1, -1] : vector<2xi32>, vector<2xi32>

// CHECK: %[[RESULT:.*]] = vector.shuffle %[[V1]], %[[PROMOTE_V2]] [3, 5, 1, 4, 0, 2] : vector<4xi32>, vector<4xi32>

// CHECK: return %[[RESULT]] : vector<6xi32>

%0 = vector.shuffle %v1, %v2 [3, 5, 1, 4, 0, 2] : vector<4xi32>, vector<2xi32>

return %0 : vector<6xi32>

}

func.func @shuffle_smaller_rhs_arbitrary(%v1: vector<2xf32>, %v2: vector<4xf32>) -> vector<5xf32> {

// CHECK: %[[PROMOTE_V1:.*]] = vector.shuffle %[[V1]], %[[V1]] [0, 1, -1, -1] : vector<2xf32>, vector<2xf32>

// CHECK: %[[RESULT:.*]] = vector.shuffle %[[PROMOTE_V1]], %[[V2]] [1, 5, 0, 6, 7] : vector<4xf32>, vector<4xf32>

// CHECK: return %[[RESULT]] : vector<5xf32>

%0 = vector.shuffle %v1, %v2 [1, 3, 0, 4, 5] : vector<2xf32>, vector<4xf32>

return %0 : vector<5xf32>

}

// -----

// CHECK-LABEL: func.func @shuffle_smaller_lhs_arbitrary

// CHECK-SAME: %[[V1:.*]]: vector<4xi32>, %[[V2:.*]]: vector<2xi32>

func.func @shuffle_smaller_lhs_arbitrary(%v1: vector<4xi32>, %v2: vector<2xi32>) -> vector<6xi32> {

// CHECK: %[[PROMOTE_V2:.*]] = vector.shuffle %[[V2]], %[[V2]] [0, 1, -1, -1] : vector<2xi32>, vector<2xi32>

// CHECK: %[[RESULT:.*]] = vector.shuffle %[[V1]], %[[PROMOTE_V2]] [3, 5, 1, 4, 0, 2] : vector<4xi32>, vector<4xi32>

// CHECK: return %[[RESULT]] : vector<6xi32>

%0 = vector.shuffle %v1, %v2 [3, 5, 1, 4, 0, 2] : vector<4xi32>, vector<2xi32>

return %0 : vector<6xi32>

}

banach-space · 2025-09-15T09:49:35Z

mlir/test/Dialect/Vector/vector-shuffle-lowering.mlir

Suggested change

func.func @shuffle_same_input_sizes(%v1: vector<4xf32>, %v2: vector<4xf32>) -> vector<6xf32> {

func.func @negative_shuffle_same_input_sizes(%v1: vector<4xf32>, %v2: vector<4xf32>) -> vector<6xf32> {

https://mlir.llvm.org/getting_started/TestingGuide/#test-naming-convention

banach-space · 2025-09-15T09:58:56Z

mlir/lib/Dialect/Vector/Transforms/LowerVectorShuffle.cpp

General remark about this PR - "lowering" would imply that some high-level representation is replaced with something lower level. But that's not quite what is happening here ATM, is it?

Even if this is to evolve towards "lowering", I'd choose a name more adequate to the current state of things and then rename later as things evolve.

We are lowering generic vector.shuffle ops into simpler/more constained versions of them (supported by LLVM/SPIR-V). We follow this approach/naming for other ops (e.g., shape_cast) so I was trying to be consistent with that pattern.

To me, this:

%0 = vector.shuffle %v1, %v1 [0, 1, -1, -1] : vector<2xf32>, vector<2xf32> %1 = vector.shuffle %0, %v2 [0, 4, 1, 5] : vector<4xf32>, vector<4xf32>

... is actually more complex than:

%0 = vector.shuffle %v1, %v2 [0, 2, 1, 3] : vector<2xf32>, vector<4xf32>

:) (the former involves 2 ops rather than one). Yes, the former form allows us to improve the lowering to LLVM/SPIR-V code, but IMHO that's just one possible use case.

Anyway, naming is hard 🤷🏻 If you want to keep the current name, could you expand the comment for MixSizeInputShuffleOpRewrite to clarify that that's to help with lowering to SPIR-V/LLVM? I just genuinely don't see it as "lowering" (might be just me), so some additional note would help 🙏🏻

banach-space · 2025-09-15T10:09:28Z

mlir/test/lib/Dialect/Vector/TestVectorTransforms.cpp

Is this added merely for testing?

My personal view is that TD ops scale better when it comes to testing - especially when a single populate* method is to be used (e.g. less boiler plate C++ required). Also, with TD Ops you can write (and nicely annotate) complex lowering pipelines, e.g.

llvm-project/mlir/test/Integration/Dialect/Linalg/CPU/ArmSVE/pack-unpack-scalable-inner-tile.mlir

Lines 142 to 183 in 5437d90

// 1.1 Tile the linalg.pack Op so that we can decompose it into e.g. tensor.pad

// and other lower-level Ops (see step 2.1)

%tiled_pack_op_p, %loops_pack:2 = transform.structured.tile_using_for %pack tile_sizes [1, 1]

: (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op)

// 1.2 Tile the linalg.unpack Op so that we can decompose it into e.g. tensor.pad

// and other lower-level Ops (see step 2)

%tiled_unpack_op_p, %loops_unpack:2 = transform.structured.tile_using_for %unpack tile_sizes [4, 1]

: (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op)

// 2.1. Decompose tiled PackOp into lower-level Ops

%func_op_pack = transform.get_parent_op %tiled_pack_op_p {isolated_from_above} : (!transform.any_op) -> !transform.op<"func.func">

transform.apply_patterns to %func_op_pack {

transform.apply_patterns.linalg.decompose_pack_unpack

transform.apply_patterns.linalg.decompose_pad

} : !transform.op<"func.func">

transform.apply_patterns to %func_op_pack {

transform.apply_patterns.tensor.fold_tensor_subset_ops

transform.apply_patterns.canonicalization

} : !transform.op<"func.func">

// 2.1. Decompose tiled UnpackOp into lower-level Ops

%func_op_unpack = transform.get_parent_op %tiled_unpack_op_p {isolated_from_above} : (!transform.any_op) -> !transform.op<"func.func">

transform.apply_patterns to %func_op_unpack {

transform.apply_patterns.linalg.decompose_pack_unpack

} : !transform.op<"func.func">

transform.apply_patterns to %func_op_unpack {

transform.apply_patterns.tensor.fold_tensor_subset_ops

transform.apply_patterns.canonicalization

} : !transform.op<"func.func">

// 3. Bufferize before lowering to LLVM

%bufferize = transform.bufferization.one_shot_bufferize %module

{bufferize_function_boundaries=true} : (!transform.any_op) -> !transform.any_op

// 4. Canonicalize

%func_op_bufferized = transform.structured.match ops{["func.func"]} in %bufferize : (!transform.any_op) -> !transform.op<"func.func">

transform.apply_patterns to %func_op_bufferized {

transform.apply_patterns.canonicalization

} : !transform.op<"func.func">

Basically, I do end-up re-using TD Ops in other places. I very rarely re-use these "test passes". We don't have any official guidelines, but if you don't mind, I'd prefer a TD Op for testing :)

I understand the preference but I'd stick to the existing guidelines. I personally don't think adding a TD op for every populate scales better or reduces boilerplate code.

I'd stick to the existing guidelines.

Do we have guidelines for this? 🤔

Implicit, I guess? We can ask on Discourse

banach-space

LG, thanks for addressing my comments!

My other outstanding comment is more of a general discussion. Though I would appreciate some extra documentation near where MixSizeInputShuffleOpRewrite is defined (see my inline comment re "lowering").

This PR adds patterns to lower `vector.shuffle` with inputs with different vector sizes more efficiently. The current LLVM lowering for these cases degenerates to a sequence of `vector.extract` and `vector.insert` operations. With this PR, the smaller input is promoted to larger vector size by introducing an extra `vector.shuffle`.

dcaballe requested a review from newling September 9, 2025 04:12

dcaballe requested review from Groverkss, banach-space, hanhanW and nicolasvasilache as code owners September 9, 2025 04:12

llvmbot added mlir:vectorops mlir mlir:vector labels Sep 9, 2025

dcaballe force-pushed the vector-shuffle-lowering branch from d316bec to 0ebcf00 Compare September 9, 2025 04:21

banach-space reviewed Sep 15, 2025

View reviewed changes

dcaballe force-pushed the vector-shuffle-lowering branch from 0ebcf00 to a9da663 Compare September 15, 2025 19:11

banach-space approved these changes Sep 16, 2025

View reviewed changes

dcaballe added 3 commits September 17, 2025 00:04

Feedback

76cf138

Feedback

82ac739

dcaballe force-pushed the vector-shuffle-lowering branch from a9da663 to 82ac739 Compare September 17, 2025 00:17

dcaballe merged commit 7bdd88c into llvm:main Sep 17, 2025
9 checks passed

	func.func @shuffle_2d_vectors_no_change(%v1: vector<2x4xf32>, %v2: vector<3x4xf32>) -> vector<4x4xf32> {
	func.func @negative_shuffle_2d_vectors_no_change(%v1: vector<2x4xf32>, %v2: vector<3x4xf32>) -> vector<4x4xf32> {

	func.func @shuffle_same_input_sizes(%v1: vector<4xf32>, %v2: vector<4xf32>) -> vector<6xf32> {
	func.func @negative_shuffle_same_input_sizes(%v1: vector<4xf32>, %v2: vector<4xf32>) -> vector<6xf32> {

	// 1.1 Tile the linalg.pack Op so that we can decompose it into e.g. tensor.pad
	// and other lower-level Ops (see step 2.1)
	%tiled_pack_op_p, %loops_pack:2 = transform.structured.tile_using_for %pack tile_sizes [1, 1]
	: (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op)

	// 1.2 Tile the linalg.unpack Op so that we can decompose it into e.g. tensor.pad
	// and other lower-level Ops (see step 2)
	%tiled_unpack_op_p, %loops_unpack:2 = transform.structured.tile_using_for %unpack tile_sizes [4, 1]
	: (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op)

	// 2.1. Decompose tiled PackOp into lower-level Ops
	%func_op_pack = transform.get_parent_op %tiled_pack_op_p {isolated_from_above} : (!transform.any_op) -> !transform.op<"func.func">
	transform.apply_patterns to %func_op_pack {
	transform.apply_patterns.linalg.decompose_pack_unpack
	transform.apply_patterns.linalg.decompose_pad
	} : !transform.op<"func.func">

	transform.apply_patterns to %func_op_pack {
	transform.apply_patterns.tensor.fold_tensor_subset_ops
	transform.apply_patterns.canonicalization
	} : !transform.op<"func.func">

	// 2.1. Decompose tiled UnpackOp into lower-level Ops
	%func_op_unpack = transform.get_parent_op %tiled_unpack_op_p {isolated_from_above} : (!transform.any_op) -> !transform.op<"func.func">
	transform.apply_patterns to %func_op_unpack {
	transform.apply_patterns.linalg.decompose_pack_unpack
	} : !transform.op<"func.func">

	transform.apply_patterns to %func_op_unpack {
	transform.apply_patterns.tensor.fold_tensor_subset_ops
	transform.apply_patterns.canonicalization
	} : !transform.op<"func.func">

	// 3. Bufferize before lowering to LLVM
	%bufferize = transform.bufferization.one_shot_bufferize %module
	{bufferize_function_boundaries=true} : (!transform.any_op) -> !transform.any_op

	// 4. Canonicalize
	%func_op_bufferized = transform.structured.match ops{["func.func"]} in %bufferize : (!transform.any_op) -> !transform.op<"func.func">
	transform.apply_patterns to %func_op_bufferized {
	transform.apply_patterns.canonicalization
	} : !transform.op<"func.func">

[mlir][Vector] Add patterns to lower vector.shuffle #157611

[mlir][Vector] Add patterns to lower vector.shuffle #157611

Uh oh!

Conversation

dcaballe commented Sep 9, 2025

Uh oh!

llvmbot commented Sep 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Sep 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

banach-space left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

banach-space left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[mlir][Vector] Add patterns to lower `vector.shuffle` #157611

[mlir][Vector] Add patterns to lower `vector.shuffle` #157611

llvmbot commented Sep 9, 2025 •

edited

Loading

github-actions bot commented Sep 9, 2025 •

edited

Loading