[mlir] Clamp UnPackOp tiling sizes from operand tile #112429
Conversation
@llvm/pr-subscribers-mlir-scf @llvm/pr-subscribers-mlir @llvm/pr-subscribers-mlir-tensor

Author: None (Max191)

Changes

The getIterationDomainTileFromOperandTile implementation for tensor.unpack did not clamp sizes when the unpack op had extract_slice semantics. This PR fixes the bug.

The PR also makes a minor change to tileAndFuseConsumerOfSlice. When replacing DPS inits, the iteration domain is needed, and it is computed from the tiled version of the operation after the initial tiling transformation. This can result in some extra indexing computation, so the PR changes it to use the original full-sized cloned consumer op.

Full diff: https://github.com/llvm/llvm-project/pull/112429.diff

3 Files Affected:
diff --git a/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp b/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
index e2feb10b314540..1f53d595c372cd 100644
--- a/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
+++ b/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
@@ -1880,13 +1880,17 @@ mlir::scf::tileAndFuseConsumerOfSlice(RewriterBase &rewriter,
candidateSliceOp, "containingOp's result yield with stride");
}
- // 10. Try to get iter domain position from input position.
+ // 10. Try to get iter domain position from input position. Use
+ // clonedConsumerOp instead of tiledConsumerOp, because the iteration domain
+ // may require index computation based on the result size. The sizes and
+ // offsets should be the same either way, but using tiledConsumerOp could
+ // lead to some chained unnecessary extra index computation.
SmallVector<OpFoldResult> iterDomainOffsets, iterDomainSizes;
- if (failed(tiledConsumerOp.getIterationDomainTileFromOperandTile(
+ if (failed(clonedConsumerOp.getIterationDomainTileFromOperandTile(
rewriter, operandNumber, offsets, sizes, iterDomainOffsets,
iterDomainSizes))) {
return rewriter.notifyMatchFailure(
- tiledConsumerOp,
+ clonedConsumerOp,
"can't get iter domain position from input position");
}
diff --git a/mlir/lib/Dialect/Tensor/IR/TensorTilingInterfaceImpl.cpp b/mlir/lib/Dialect/Tensor/IR/TensorTilingInterfaceImpl.cpp
index 104d6ae1f9f6b5..7cfd608508537b 100644
--- a/mlir/lib/Dialect/Tensor/IR/TensorTilingInterfaceImpl.cpp
+++ b/mlir/lib/Dialect/Tensor/IR/TensorTilingInterfaceImpl.cpp
@@ -16,6 +16,7 @@
#include "mlir/Dialect/Tensor/IR/Tensor.h"
#include "mlir/Dialect/Tensor/Utils/Utils.h"
#include "mlir/Dialect/Utils/IndexingUtils.h"
+#include "mlir/Interfaces/InferTypeOpInterface.h"
#include "mlir/Interfaces/TilingInterface.h"
#include "mlir/Interfaces/ValueBoundsOpInterface.h"
@@ -621,6 +622,12 @@ struct UnPackOpTiling
SmallVectorImpl<OpFoldResult> &resultOffsets,
SmallVectorImpl<OpFoldResult> &resultSizes) const {
auto unPackOp = cast<UnPackOp>(op);
+ // If the operand tile is the dest, then no adjustment is needed.
+ if (operandNumber == unPackOp.getDestMutable().getOperandNumber()) {
+ resultOffsets = llvm::to_vector(offsets);
+ resultSizes = llvm::to_vector(sizes);
+ return success();
+ }
Location loc = unPackOp.getLoc();
int64_t numTiles = unPackOp.getInnerDimsPos().size();
@@ -629,6 +636,11 @@ struct UnPackOpTiling
// The tiling is applied on interchanged dimensions. We have to undo the
// interchange to map sizes and offsets to the original input.
int64_t outputRank = unPackOp.getDestRank();
+ ReifiedRankedShapedTypeDims reifiedReturnShapes;
+ if (failed(reifyResultShapes(b, unPackOp, reifiedReturnShapes))) {
+ return failure();
+ }
+ SmallVector<OpFoldResult> outputMixedSizes = reifiedReturnShapes.front();
SmallVector<OpFoldResult> origOffsets(destOffsets);
SmallVector<OpFoldResult> origSizes(destSizes);
applyPermToRange(origOffsets, origSizes,
@@ -640,18 +652,21 @@ struct UnPackOpTiling
for (auto dim : llvm::seq<int64_t>(0, outputRank)) {
using AV = affine::AffineValueExpr;
affine::AffineBuilder ab(b, loc);
- AffineExpr dim0, dim1, sym;
+ AffineExpr dim0, dim1, sym0;
bindDims(b.getContext(), dim0, dim1);
- bindSymbols(b.getContext(), sym);
+ bindSymbols(b.getContext(), sym0);
if (dimAndTileMapping.count(dim)) {
// If the data dimension is tiled, the i-th index is the product of
// offset_i and tile_i, and the i-th size is the product of sizes_i and
- // tile_i.
+ // tile_i. The sizes must be clamped to the sizes of the unpack result.
auto avOffset = AV(dim0).bind(origOffsets[dim]);
auto avSize = AV(dim0).bind(origSizes[dim]);
- auto avTileSize = AV(sym).bind(dimAndTileMapping[dim]);
+ auto avTileSize = AV(sym0).bind(dimAndTileMapping[dim]);
+ auto avResultSize = AV(dim0).bind(outputMixedSizes[dim]);
resultOffsets.push_back(ab.mul(avOffset, avTileSize));
- resultSizes.push_back(ab.mul(avSize, avTileSize));
+ auto avResultOffset = AV(dim1).bind(resultOffsets.back());
+ resultSizes.push_back(ab.min({ab.mul(avSize, avTileSize),
+ ab.sub(avResultSize, avResultOffset)}));
} else {
resultOffsets.push_back(origOffsets[dim]);
resultSizes.push_back(origSizes[dim]);
diff --git a/mlir/test/Interfaces/TilingInterface/tile-and-fuse-consumer.mlir b/mlir/test/Interfaces/TilingInterface/tile-and-fuse-consumer.mlir
index f5f703d95e2d5b..d4ed3e3419a70a 100644
--- a/mlir/test/Interfaces/TilingInterface/tile-and-fuse-consumer.mlir
+++ b/mlir/test/Interfaces/TilingInterface/tile-and-fuse-consumer.mlir
@@ -265,7 +265,7 @@ module {
%c4 = arith.constant 4 : index
%c64 = arith.constant 64 : index
%c0 = arith.constant 0 : index
- %1 = scf.forall (%arg3, %arg4) in (2, 2) shared_outs(%arg5 = %arg2) -> (tensor<64x32xf32>) {
+ %1 = scf.forall (%arg3, %arg4) = (0, 0) to (64, 32) step (32, 32) shared_outs(%arg5 = %arg2) -> (tensor<64x32xf32>) {
%extracted_slice = tensor.extract_slice %arg5[%arg3, %arg4] [32, 32] [1, 1] : tensor<64x32xf32> to tensor<32x32xf32>
%3 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel"]} ins(%arg0, %arg1 : tensor<32x32xf32>, tensor<32x32xf32>) outs(%extracted_slice : tensor<32x32xf32>) {
^bb0(%in: f32, %in_16: f32, %out: f32):
@@ -292,26 +292,28 @@ module attributes {transform.with_named_sequence} {
transform.yield
}
}
-// CHECK: #[[UNPACK_RESULT_MAP:.*]] = affine_map<(d0) -> (d0 * 32)>
+// CHECK-DAG: #[[UNPACK_RESULT_OFFSET_MAP:.*]] = affine_map<(d0) -> (d0 * 32)>
+// CHECK-DAG: #[[UNPACK_RESULT_SIZE_MAP:.*]] = affine_map<(d0) -> (1024, d0 * -32 + 2048)>
// CHECK: func.func @fuse_unpack_consumer_into_scf_forall(
// CHECK-SAME: %[[ARG0:[a-zA-Z0-9]+]]: tensor<32x32xf32>
// CHECK-SAME: %[[ARG1:[a-zA-Z0-9]+]]: tensor<32x32xf32>
// CHECK-SAME: %[[ARG2:[a-zA-Z0-9]+]]: tensor<64x32xf32>)
// CHECK: %[[OUT_INIT:.*]] = tensor.empty() : tensor<2048xf32>
-// CHECK: %[[FINAL_RESULT:.*]]:2 = scf.forall (%[[IV1:.*]], %[[IV2:.*]]) in (2, 2)
+// CHECK: %[[FINAL_RESULT:.*]]:2 = scf.forall (%[[IV1:.*]], %[[IV2:.*]]) = (0, 0) to (64, 32) step (32, 32)
// CHECK-SAME: shared_outs(%[[FIRST_OUT_ARG:.*]] = %[[ARG2]], %[[UNPACK_OUT_ARG:.*]] = %[[OUT_INIT]])
// CHECK-SAME: {
// CHECK: %[[GENERIC_OUT_SLICE:.*]] = tensor.extract_slice %[[FIRST_OUT_ARG]][%[[IV1]], %[[IV2]]] [32, 32] [1, 1]
// CHECK: %[[GENERIC_OUT:.*]] = linalg.generic
// CHECK-SAME: outs(%[[GENERIC_OUT_SLICE]] :
-// CHECK: %[[UNPACK_RESULT_OFFSET:.*]] = affine.apply #[[UNPACK_RESULT_MAP]](%[[IV1]])
-// CHECK: %[[TILED_UNPACK_DEST:.*]] = tensor.extract_slice %[[UNPACK_OUT_ARG]][%[[UNPACK_RESULT_OFFSET]]] [1024] [1]
+// CHECK-DAG: %[[UNPACK_RESULT_OFFSET:.*]] = affine.apply #[[UNPACK_RESULT_OFFSET_MAP]](%[[IV1]])
+// CHECK-DAG: %[[UNPACK_RESULT_SIZE:.*]] = affine.min #[[UNPACK_RESULT_SIZE_MAP]](%[[IV1]])
+// CHECK: %[[TILED_UNPACK_DEST:.*]] = tensor.extract_slice %[[UNPACK_OUT_ARG]][%[[UNPACK_RESULT_OFFSET]]] [%[[UNPACK_RESULT_SIZE]]] [1]
// CHECK: %[[TILED_UNPACK_OUT:.*]] = tensor.unpack %[[GENERIC_OUT]]
// CHECK-SAME: outer_dims_perm = [0] inner_dims_pos = [0] inner_tiles = [32]
// CHECK-SAME: into %[[TILED_UNPACK_DEST]]
// CHECK: scf.forall.in_parallel {
// CHECK: tensor.parallel_insert_slice %[[GENERIC_OUT]] into %[[FIRST_OUT_ARG]][%[[IV1]], %[[IV2]]] [32, 32] [1, 1]
-// CHECK: tensor.parallel_insert_slice %[[TILED_UNPACK_OUT]] into %[[UNPACK_OUT_ARG]][%[[UNPACK_RESULT_OFFSET]]] [1024] [1]
+// CHECK: tensor.parallel_insert_slice %[[TILED_UNPACK_OUT]] into %[[UNPACK_OUT_ARG]][%[[UNPACK_RESULT_OFFSET]]] [%[[UNPACK_RESULT_SIZE]]] [1]
// CHECK: }
// CHECK: }
// CHECK: return %[[FINAL_RESULT]]#1 :
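To make the updated CHECK lines concrete, here is a hand-written MLIR sketch of the fused loop body they describe; the func wrapper, the SSA names, and the ?-typed slice are assumptions added for readability, not copied from the test output.

#offset_map = affine_map<(d0) -> (d0 * 32)>
#size_map   = affine_map<(d0) -> (1024, d0 * -32 + 2048)>

func.func @tiled_unpack_body(%iv1: index, %generic_out: tensor<32x32xf32>,
                             %unpack_out: tensor<2048xf32>) -> tensor<?xf32> {
  %off  = affine.apply #offset_map(%iv1)
  %size = affine.min #size_map(%iv1)
  // The destination slice is sized by the clamped affine.min value instead of
  // a constant 1024, so it can never extend past the 2048-element result.
  %dest = tensor.extract_slice %unpack_out[%off] [%size] [1]
      : tensor<2048xf32> to tensor<?xf32>
  %unpacked = tensor.unpack %generic_out
      outer_dims_perm = [0] inner_dims_pos = [0] inner_tiles = [32]
      into %dest : tensor<32x32xf32> -> tensor<?xf32>
  return %unpacked : tensor<?xf32>
}

With the test's 2048-element result both iterations still get a 1024-element slice, so the clamp only changes the form of the IR here; the same expression stays correct when the result is not an exact multiple of the tile footprint.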
hanhanW
left a comment
Sorry for the late review, thanks!
Signed-off-by: Max Dawkins <[email protected]>
Signed-off-by: Max Dawkins <[email protected]>
The getIterationDomainTileFromOperandTile implementation for tensor.unpack did not clamp sizes when the unpack op had extract_slice semantics. This PR fixes the bug.

The PR also makes a minor change to tileAndFuseConsumerOfSlice. When replacing DPS inits, the iteration domain is needed, and it is computed from the tiled version of the operation after the initial tiling transformation. This can result in some extra indexing computation, so the PR changes it to use the original full-sized cloned consumer op.
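For intuition on the clamping itself: with extract_slice semantics, the unpack destination can be shorter than a whole number of inner-tile footprints, so the old size computation (operand tile size times inner tile size) could overrun it. The sketch below reuses the shape of the test's size map but shrinks the result to a hypothetical 2000 elements; the map and function names and all of these numbers are illustrative, not taken from the patch.

// Same shape of size map as in the test above, but with a hypothetical
// 2000-element unpack result instead of 2048.
#hypothetical_size_map = affine_map<(d0) -> (1024, d0 * -32 + 2000)>

func.func @clamped_size(%iv: index) -> index {
  // For the second tile (%iv = 32): the result offset is 32 * 32 = 1024 and
  // the clamped size is min(32 * 32, 2000 - 1024) = 976, whereas the
  // unclamped size of 1024 would run past the 2000-element destination.
  %size = affine.min #hypothetical_size_map(%iv)
  return %size : index
}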