Commit 672e1de

Copilot authored and hanhanW committed
[Codegen] Add vector size inference for ukernel operations. (iree-org#22440)
## Summary

This change enables vector size inference for `iree_codegen.ukernel.generic` operations, allowing downstream operations like `linalg.unpack` to properly vectorize when consuming ukernel outputs.

## Problem

The vector size inference mechanism in `inferSizesFromIR` did not handle `UKernelGenericOp`, so operations that consume ukernel outputs failed to vectorize. The result was scalar code generation instead of vectorized operations, leading to suboptimal performance.

Example: A `linalg.unpack` operation following an `iree_uk_mmt4d` ukernel would use vector sizes of `[1, 1]` instead of the correct `[16, 16]` based on the ukernel's output shape.

## Solution

Added a case in the `inferSizesFromIR` TypeSwitch to handle `UKernelGenericOp`:

- Infer vector sizes directly from the result tensor type shape
- Handle dynamic dimensions using `ValueBoundsConstraintSet`
- Return `VectorizationTileSizes` with both `vectorSizes` and `destShape` populated

## Changes

**compiler/src/iree/compiler/Codegen/Utils/Utils.cpp:**

- Added `#include "iree/compiler/Codegen/Dialect/Codegen/IR/UKernelOps.h"`
- Added standalone `inferSizesFromIR(UKernelGenericOp, OpResult)` function following the pattern of other operations
- Updated TypeSwitch in `inferSizesFromIR(Value val)` to call the new function
- Used `llvm::enumerate` with `static_cast<unsigned>` for dimension indexing

**compiler/src/iree/compiler/Codegen/Utils/Utils.h:**

- Added `#include "iree/compiler/Codegen/Dialect/Codegen/IR/UKernelOps.h"` in proper alphabetical order
- Added public function declaration for `inferSizesFromIR(UKernelGenericOp, OpResult)`

**compiler/src/iree/compiler/Codegen/Common/test/generic_vectorization.mlir:**

- Added test `@ukernel_unpack_infer_vector_sizes` verifying vectorization of unpack after ukernel

## Testing

The test verifies that:

1. A ukernel operation produces a `tensor<1x1x16x16xf32>`
2. The following unpack operation correctly infers vector sizes and vectorizes to `vector<16x16xf32>`
3. The vectorization uses `vector.transfer_read`, `vector.shape_cast`, and `vector.transfer_write` operations

Run test with:

```bash
iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-generic-vectorization{enable-vector-masking=true}))" test.mlir
```

This change bridges the gap in the vector size inference mechanism, enabling proper vectorization for any operation that consumes ukernel outputs, not just unpack operations.

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: hanhanW <[email protected]>
1 parent c1a373b commit 672e1de

3 files changed, +73 -0 lines changed

compiler/src/iree/compiler/Codegen/Common/test/generic_vectorization.mlir

Lines changed: 25 additions & 0 deletions
@@ -811,3 +811,28 @@ func.func @negative_no_vectorize_large_vector(%arg0 : tensor<1x9007199254740991x
 // CHECK-MASK: } -> tensor<1x9007199254740991xf32>
 // CHECK-MASK: return %[[VAL_2]] : tensor<1x9007199254740991xf32>
 // CHECK-MASK: }
+
+// -----
+
+// Test that unpack operations following ukernel operations can properly infer
+// vector sizes from the ukernel output shape.
+
+func.func @ukernel_unpack_infer_vector_sizes(%lhs: tensor<1x8x16x1xf32>, %rhs: tensor<1x8x16x1xf32>, %dest: tensor<16x16xf32>) -> tensor<16x16xf32> {
+  %init = tensor.empty() : tensor<1x1x16x16xf32>
+  %ukernel = iree_codegen.ukernel.generic "foo"
+    ins(%lhs, %rhs : tensor<1x8x16x1xf32>, tensor<1x8x16x1xf32>)
+    outs(%init : tensor<1x1x16x16xf32>)
+    -> tensor<1x1x16x16xf32>
+  %unpack = linalg.unpack %ukernel
+    outer_dims_perm = [0, 1]
+    inner_dims_pos = [0, 1]
+    inner_tiles = [16, 16]
+    into %dest
+    : tensor<1x1x16x16xf32> -> tensor<16x16xf32>
+  return %unpack : tensor<16x16xf32>
+}
+// CHECK-MASK-LABEL: func.func @ukernel_unpack_infer_vector_sizes
+// CHECK-MASK: %[[UKERNEL:.*]] = iree_codegen.ukernel.generic "foo"
+// CHECK-MASK: %[[READ:.*]] = vector.transfer_read %[[UKERNEL]]{{.*}} : tensor<1x1x16x16xf32>, vector<1x1x16x16xf32>
+// CHECK-MASK: %[[CAST:.*]] = vector.shape_cast %[[READ]] : vector<1x1x16x16xf32> to vector<16x16xf32>
+// CHECK-MASK: vector.transfer_write %[[CAST]]{{.*}} : vector<16x16xf32>, tensor<16x16xf32>
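
The shapes in the test above follow from simple unpack arithmetic: the packed `tensor<1x1x16x16xf32>` has outer dimensions `1x1` and inner tiles `16x16`, so the unpacked result is `16x16`. The following is a minimal standalone sketch of that arithmetic, not IREE or MLIR code. The function name `unpackedShape` is hypothetical, and the sketch assumes static sizes, trailing inner-tile dimensions, no padding, and that packed outer dimension `i` corresponds to source dimension `outerDimsPerm[i]`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of MLIR or IREE): computes the full, unpadded
// shape obtained by unpacking a statically shaped packed tensor.
// Assumption: the inner tile dimensions are the trailing dimensions of
// `packedShape`, and packed outer dim i maps to source dim outerDimsPerm[i].
std::vector<int64_t> unpackedShape(const std::vector<int64_t> &packedShape,
                                   const std::vector<int64_t> &outerDimsPerm,
                                   const std::vector<int64_t> &innerDimsPos,
                                   const std::vector<int64_t> &innerTiles) {
  const std::size_t rank = packedShape.size() - innerTiles.size();
  std::vector<int64_t> shape(rank);
  // Undo the outer permutation; an empty permutation means identity.
  for (std::size_t i = 0; i < rank; ++i) {
    const std::size_t srcDim =
        outerDimsPerm.empty() ? i : static_cast<std::size_t>(outerDimsPerm[i]);
    shape[srcDim] = packedShape[i];
  }
  // Each tiled source dimension grows by its inner tile size.
  for (std::size_t i = 0; i < innerTiles.size(); ++i)
    shape[static_cast<std::size_t>(innerDimsPos[i])] *= innerTiles[i];
  return shape;
}
```

With the values from the test, `unpackedShape({1, 1, 16, 16}, {0, 1}, {0, 1}, {16, 16})` yields `{16, 16}`, which matches the `vector<16x16xf32>` written by `vector.transfer_write` in the CHECK lines.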

compiler/src/iree/compiler/Codegen/Utils/Utils.cpp

Lines changed: 41 additions & 0 deletions
@@ -7,6 +7,7 @@
 #include "iree/compiler/Codegen/Utils/Utils.h"

 #include "iree/compiler/Codegen/Dialect/Codegen/IR/IREECodegenAttrs.h"
+#include "iree/compiler/Codegen/Dialect/Codegen/IR/UKernelOps.h"
 #include "iree/compiler/Codegen/Dialect/GPU/IR/IREEGPUDialect.h"
 #include "iree/compiler/Codegen/Interfaces/ProcessorOpInterfaces.h"
 #include "iree/compiler/Codegen/Interfaces/UKernelOpInterface.h"
@@ -1912,6 +1913,44 @@ std::optional<VectorizationTileSizes> inferSizesFromIR(linalg::UnPackOp op) {
   return result;
 }

+std::optional<VectorizationTileSizes>
+inferSizesFromIR(IREE::Codegen::UKernelGenericOp ukernelOp, OpResult opResult) {
+  LDBG() << "Inferring dest sizes for: " << ukernelOp;
+  auto resultType = dyn_cast<RankedTensorType>(opResult.getType());
+  if (!resultType) {
+    LDBG()
+        << "failed to infer sizes because result type is not a ranked tensor";
+    return std::nullopt;
+  }
+
+  VectorizationTileSizes result;
+  for (auto [idx, dim] : llvm::enumerate(resultType.getShape())) {
+    if (ShapedType::isDynamic(dim)) {
+      FailureOr<int64_t> maybeDimBound =
+          ValueBoundsConstraintSet::computeConstantBound(
+              presburger::BoundType::UB, {opResult, static_cast<unsigned>(idx)},
+              /*stopCondition=*/nullptr, /*closedUB=*/true);
+      if (failed(maybeDimBound)) {
+        LDBG() << "failed to infer bounds for dynamic dim";
+        return std::nullopt;
+      }
+      result.vectorSizes.push_back(maybeDimBound.value());
+    } else {
+      result.vectorSizes.push_back(dim);
+    }
+  }
+  result.destShape = result.vectorSizes;
+
+  LLVM_DEBUG({
+    LDBG() << "Inferred vector sizes:";
+    for (auto [idx, val] : llvm::enumerate(result.vectorSizes)) {
+      LDBG() << "Dim #" << idx << ": " << val;
+    }
+  });
+
+  return result;
+}
+
 std::optional<VectorizationTileSizes> static inferSizesFromMixedSizes(
     SmallVector<OpFoldResult> shape) {
   VectorizationTileSizes result;
@@ -1948,6 +1987,8 @@ std::optional<VectorizationTileSizes> inferSizesFromIR(Value val) {
         // the values.
         result = inferSizesFromMixedSizes(op.getMixedSizes());
       })
+      .Case<IREE::Codegen::UKernelGenericOp>(
+          [&](auto op) { result = inferSizesFromIR(op, cast<OpResult>(val)); })
       .Default([&](Operation *) {});

   return result;
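
The inference rule added to `Utils.cpp` above can be illustrated without any MLIR dependencies. The sketch below is a simplified model, not the IREE implementation: `TileSizes`, `kDynamicDim`, `UpperBoundFn`, and `inferSizesFromShape` are hypothetical stand-ins for `VectorizationTileSizes`, the dynamic-dimension sentinel, the `ValueBoundsConstraintSet` query, and the new `inferSizesFromIR` overload. It copies static dimensions as they are, replaces each dynamic dimension with a proven upper bound, and gives up if any bound is unavailable.

```cpp
#include <cstdint>
#include <functional>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical sentinel for a dynamic dimension (not the MLIR definition).
constexpr int64_t kDynamicDim = std::numeric_limits<int64_t>::min();

// Hypothetical stand-in for VectorizationTileSizes.
struct TileSizes {
  std::vector<int64_t> vectorSizes;
  std::vector<int64_t> destShape;
};

// Hypothetical bound oracle: returns a closed upper bound for dimension `idx`,
// or std::nullopt when no constant bound can be proven.
using UpperBoundFn = std::function<std::optional<int64_t>(unsigned)>;

std::optional<TileSizes> inferSizesFromShape(const std::vector<int64_t> &shape,
                                             const UpperBoundFn &upperBound) {
  TileSizes result;
  for (unsigned idx = 0; idx < shape.size(); ++idx) {
    int64_t dim = shape[idx];
    if (dim == kDynamicDim) {
      // Dynamic dimension: fall back to a constant upper bound, if any.
      std::optional<int64_t> bound = upperBound(idx);
      if (!bound)
        return std::nullopt; // Inference fails as a whole.
      dim = *bound;
    }
    result.vectorSizes.push_back(dim);
  }
  // The destination shape mirrors the inferred vector sizes.
  result.destShape = result.vectorSizes;
  return result;
}
```

For the fully static shape in the test, `{1, 1, 16, 16}`, the oracle is never consulted and both `vectorSizes` and `destShape` come back as `{1, 1, 16, 16}`. Failing the whole inference on one unbounded dimension mirrors the early `return std::nullopt` in the diff.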

compiler/src/iree/compiler/Codegen/Utils/Utils.h

Lines changed: 7 additions & 0 deletions
@@ -9,6 +9,7 @@

 #include "iree/compiler/Codegen/Dialect/Codegen/IR/IREECodegenAttrs.h"
 #include "iree/compiler/Codegen/Dialect/Codegen/IR/IREECodegenInterfaces.h"
+#include "iree/compiler/Codegen/Dialect/Codegen/IR/UKernelOps.h"
 #include "iree/compiler/Dialect/HAL/IR/HALOps.h"
 #include "iree/compiler/Dialect/LinalgExt/IR/LinalgExtOps.h"
 #include "iree/compiler/Dialect/TensorExt/IR/TensorExtOps.h"
@@ -365,6 +366,12 @@ inferSizesFromIR(linalg::LinalgOp linalgOp, std::optional<OpResult> opResult);
 std::optional<VectorizationTileSizes> inferSizesFromIR(scf::ForOp forOp,
                                                        OpResult opResult);

+/// Returns the result sizes and vector input sizes of the ukernel.generic op.
+/// The inferred bounding size is returned if it is dynamic shape. Returns
+/// std::nullopt if the shape inference failed.
+std::optional<VectorizationTileSizes>
+inferSizesFromIR(IREE::Codegen::UKernelGenericOp ukernelOp, OpResult opResult);
+
 /// Returns the underlying index if the given value is a constant index.
 std::optional<int64_t> getConstantIndex(Value value);