
Conversation

@nithinsubbiah commented Oct 16, 2024:

This patch adds a canonicalization pattern that folds away a tensor.extract_slice whose source is produced by a linalg.fill, provided the linalg.fill has only one consumer.


@llvmbot commented Oct 16, 2024:

@llvm/pr-subscribers-mlir

Author: Nithin Meganathan (nithinsubbiah)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/112619.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp (+31-1)
  • (modified) mlir/test/Dialect/Linalg/canonicalize.mlir (+14)
diff --git a/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp b/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
index 730c478c2883ef..658445f19e91aa 100644
--- a/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
+++ b/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
@@ -806,6 +806,36 @@ struct FoldFillWithTensorExtract : public OpRewritePattern<tensor::ExtractOp> {
   }
 };
 
+/// Fold tensor.extract_slice(linalg.fill(<input>)) into <input>
+struct FoldFillWithTensorExtractSlice
+    : public OpRewritePattern<tensor::ExtractSliceOp> {
+public:
+  using OpRewritePattern<tensor::ExtractSliceOp>::OpRewritePattern;
+
+  LogicalResult matchAndRewrite(tensor::ExtractSliceOp extractSliceOp,
+                                PatternRewriter &rewriter) const override {
+    // See if tensor input of tensor.extract_slice op is the result of a
+    // linalg.fill op.
+    auto fillOp = extractSliceOp.getSource().getDefiningOp<linalg::FillOp>();
+    if (!fillOp)
+      return failure();
+
+    Value fillInput = fillOp.getInputs()[0];
+
+    Location loc = extractSliceOp.getLoc();
+    SmallVector<OpFoldResult> mixedSizes = extractSliceOp.getMixedSizes();
+    auto emptyOp = rewriter.create<tensor::EmptyOp>(
+        loc, mixedSizes, extractSliceOp.getType().getElementType());
+
+    // Replace tensor.extract_slice op with new linalg.fillOp (former's result
+    // type and shape).
+    rewriter.replaceOpWithNewOp<linalg::FillOp>(
+        extractSliceOp, extractSliceOp.getResultType(), ValueRange{fillInput},
+        ValueRange{emptyOp});
+    return success();
+  }
+};
+
 /// Folds pack(fill) into a single fill op if
 ///   1. The pack op does not have padding value, or
 ///   2. The filled value and padding value are the same.
@@ -936,7 +966,7 @@ struct FoldConcatsOfFill : public OpRewritePattern<tensor::ConcatOp> {
 void FillOp::getCanonicalizationPatterns(RewritePatternSet &results,
                                          MLIRContext *context) {
   results.add<FoldConcatsOfFill, FoldFillWithCopy, FoldFillWithTensorExtract,
-              FoldFillWithPack, FoldFillWithPad,
+              FoldFillWithTensorExtractSlice, FoldFillWithPack, FoldFillWithPad,
               FoldFillWithTensorReshape<tensor::CollapseShapeOp>,
               FoldFillWithTensorReshape<tensor::ExpandShapeOp>,
               FoldInsertPadIntoFill, FoldFillWithTranspose>(context);
diff --git a/mlir/test/Dialect/Linalg/canonicalize.mlir b/mlir/test/Dialect/Linalg/canonicalize.mlir
index 4bc2ed140da91a..763cf80241fcc0 100644
--- a/mlir/test/Dialect/Linalg/canonicalize.mlir
+++ b/mlir/test/Dialect/Linalg/canonicalize.mlir
@@ -352,6 +352,20 @@ func.func @fold_fill_extract(%arg0 : i1) -> i1 {
 
 // -----
 
+func.func @fold_fill_extract_slice() -> tensor<2x1920x64x66xf32> {
+  %c0 = arith.constant 0. : f32
+  %0 = tensor.empty() : tensor<2x1920x66x66xf32>
+  %1 = linalg.fill ins(%c0 : f32) outs(%0 : tensor<2x1920x66x66xf32>) -> tensor<2x1920x66x66xf32>
+  %extracted_slice = tensor.extract_slice %1[0, 0, 1, 0] [2, 1920, 64, 66] [1, 1, 1, 1] : tensor<2x1920x66x66xf32> to tensor<2x1920x64x66xf32>
+  return %extracted_slice : tensor<2x1920x64x66xf32>
+}
+// CHECK-LABEL: func.func @fold_fill_extract_slice
+// CHECK:         %[[EMPTY_TENSOR:.+]] = tensor.empty() : tensor<2x1920x64x66xf32>
+// CHECK:         %[[FILL:.+]] = linalg.fill ins(%{{.+}}) outs(%[[EMPTY_TENSOR]]
+// CHECK:         return %[[FILL]]
+
+// -----
+
 func.func @fill_pack() -> tensor<24x32x16x16xf32> {
   %dest = tensor.empty() : tensor<384x512xf32>
   %cst = arith.constant 0.000000e+00 : f32


// -----

func.func @fold_fill_extract_slice() -> tensor<2x1920x64x66xf32> {

Contributor:

Please add a negative test, e.g. linalg.fill followed by tensor.insert? Provided that doesn't trigger some other canonicalization.

Author:

Could you elaborate why we may need such a test here? To test that such a pattern wouldn't get folded?

Contributor:

> To test that such a pattern wouldn't get folded?

Indeed. IMHO, we should always try covering "interesting" corner cases (within reason). These tests effectively document the code and its design - a negative test is an example of where the fold should not work. And if anything changes, we will know almost immediately (the test will start failing).

}
};

/// Fold tensor.extract_slice(linalg.fill(<input>)) into <input>

Contributor:

Hm, this comment is not quite correct, is it? You are replacing tensor.extract_slice with linalg.fill, right?

What if there are multiple uses of the original linalg.fill? Then this is adding another linalg.fill (and tensor.empty) rather than updating the original linalg.fill? Is that still more canonical?

Contributor:

I think even with multiple users it might be OK to introduce a new fill of the smaller shape. That's an opinion though. I think all such slice propagation patterns need to be kept separate, but fill -> slice might be worth adding to canonicalization.

func.func @fold_fill_extract_slice() -> tensor<2x1920x64x66xf32> {
%c0 = arith.constant 0. : f32
%0 = tensor.empty() : tensor<2x1920x66x66xf32>
%1 = linalg.fill ins(%c0 : f32) outs(%0 : tensor<2x1920x66x66xf32>) -> tensor<2x1920x66x66xf32>

Contributor:

What if this linalg.fill has multiple uses?

Contributor:

It doesn't matter. You will have two fills, one larger and one smaller. Fills are always better, I think (especially in tensor semantics).

@MaheshRavishankar:

We probably want to have a separate set of patterns that "bubble up slices" this way. In IREE we have a pass that does this: https://github.com/iree-org/iree/blob/main/compiler/src/iree/compiler/DispatchCreation/BubbleUpExtractSlices.cpp . I think we want to just collect such patterns into a single populate* method and not put this in canonicalization.
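
For illustration, a minimal sketch of what such a dedicated entry point could look like (the function name and pattern grouping are hypothetical, not existing MLIR API); a pass that wants the bubbling behavior would opt in explicitly instead of getting it from the canonicalizer:

// Hypothetical populate* entry point collecting slice-bubbling patterns.
void populateBubbleUpExtractSliceOfFillPatterns(RewritePatternSet &patterns) {
  patterns.add<FoldFillWithTensorExtractSlice>(patterns.getContext());
}

// Typical use from a pass that opts in:
//   RewritePatternSet patterns(&getContext());
//   populateBubbleUpExtractSliceOfFillPatterns(patterns);
//   if (failed(applyPatternsAndFoldGreedily(getOperation(), std::move(patterns))))
//     signalPassFailure();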

@joker-eph:

Can you elaborate why this isn’t a good canonicalization? Is this losing semantics or anything?

@MaheshRavishankar:

> Can you elaborate why this isn’t a good canonicalization? Is this losing semantics or anything?

Like Andrej said, it depends on whether for a compiler stack a linalg.fill -> extract_slice is better or two linalg.fills are better. I think it is an opinionated decision. So I can see reasons not to do it as a canonicalization. Fill is a bit of a corner case though. IMO it is always better to do this, but that is also based on my use case. I am not necessarily pushing back against this as a canonicalization, but if someone has an issue with it being one, then I'd be fine moving it out of canonicalization.

@joker-eph:

> Like Andrej said, it depends on whether for a compiler stack a linalg.fill -> extract_slice is better or two linalg.fills are better. I think it is an opinionated decision.

Sorry but that's not a principled way to think about canonicalization: this is actually a perfect example of why we need canonicalization. If the two forms can be equally well matched, the canonicalization is here so that a compiler does not have to match both but can match a single one.
A "compiler stack" cannot just expect something to be "better" without a principled argument, hence the question about semantic loss.

> if someone has an issue with it being one, then I'd be fine moving it out of canonicalization.

Sure, I'd want to hear a principled reason why it's not a good one (that is, "it is not always possible to match the new canonical form", or "we lost some important information from the original program"). This should always be possible to express with an example of input IR and the intended lowering.

@banach-space:

First, a note:

BEFORE (1 use):

%empty = tensor.empty
%fill = linalg.fill %empty
%res = tensor.extract_slice %fill

AFTER (1 use):

%empty = tensor.empty
%res = linalg.fill %empty

BEFORE (2 uses):

%empty = tensor.empty
%fill = linalg.fill %empty
%res_1 = tensor.extract_slice %fill
%res_2 = tensor.extract_slice %fill

AFTER (2 uses):

%empty_1 = tensor.empty
%res_1 = linalg.fill %empty_1
%empty_2 = tensor.empty
%res_2 = linalg.fill %empty_2

Looking at the case with 1 use, this feels like a canonicalization to me:

  • no information is lost and, with fewer Ops, the AFTER variant is basically simpler.

This is no longer so clear to me in the case with 2 uses. Perhaps this should be limited to cases with 1 use?

On a related note, going by the Tensor Op docs

[tensor.empty]
This op can be lowered to a bufferization.alloc_tensor, at which point it turns into an explicit buffer allocation.

[tensor.extract_slice]
After buffer allocation, the “extract_slice” op is expected to lower into a memref.subview op.

So, with 2 uses, this would potentially replace (after bufferization):

  • 1 allocation (bufferization.alloc_tensor) followed by 2 subviews (memref.subview), with
  • 2 allocations.

It's not immediately obvious to me whether one is more "canonical" or "better" than the other. If aliasing is an issue, separate allocations would help. If aliasing is not a problem, re-using one buffer is probably better?

No strong opinion here, mostly just analysing out loud.

@nithinsubbiah:

Thanks for the comments. I agree with @banach-space that this might not fit into a canonicalization pattern when the FillOp has multiple consumers, since that introduces an additional buffer allocation.

I've changed the pattern to fold only when there's one consumer of the FillOp.
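
For reference, the diff quoted above predates that change, so here is a minimal sketch (not the author's actual updated code) of how such a single-consumer guard is typically added to the pattern:

LogicalResult matchAndRewrite(tensor::ExtractSliceOp extractSliceOp,
                              PatternRewriter &rewriter) const override {
  auto fillOp = extractSliceOp.getSource().getDefiningOp<linalg::FillOp>();
  if (!fillOp)
    return failure();

  // Bail out unless the fill result has exactly one consumer; with multiple
  // consumers the rewrite introduces an extra tensor.empty (and, after
  // bufferization, potentially an extra allocation).
  if (!fillOp->hasOneUse())
    return failure();

  // Remainder as in the diff above: create a tensor.empty of the slice's
  // shape and fill it with the original fill value.
  Location loc = extractSliceOp.getLoc();
  auto emptyOp = rewriter.create<tensor::EmptyOp>(
      loc, extractSliceOp.getMixedSizes(),
      extractSliceOp.getType().getElementType());
  rewriter.replaceOpWithNewOp<linalg::FillOp>(
      extractSliceOp, extractSliceOp.getResultType(),
      ValueRange{fillOp.getInputs()[0]}, ValueRange{emptyOp});
  return success();
}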

@stellaraccident commented Oct 21, 2024:

> Thanks for the comments. I agree with @banach-space that this might not fit into a canonicalization pattern when the FillOp has multiple consumers, since that introduces an additional buffer allocation.
>
> I've changed the pattern to fold only when there's one consumer of the FillOp.

I really don't want to add to this thread, but I am skeptical that this qualifies if restricted to a single consumer: that doesn't actually simplify any downstream matching, because a compiler still has to pattern match all of the variants. I could imagine arguments why this might be the case, but having run into a lot of these fragile single-consumer optimizations trying to fly as a canonicalization, I am skeptical by default.

If the multi-use case is a canonical form, then so be it. But I expect that Mahesh is right that the actual optimization being sought here is a full graph transformation where the graph is put into a form such that some of these ops are maximally "bubbled". That is an optimization. The effect can be approximated by the canonicalizer, but it is typically not the right tool for the job, in my experience.

@nithinsubbiah:

Let me try to summarize based on the discussions we've had so far.

Folding of extract_slice into linalg.fill when the latter op has a single vs. multiple consumers. Do we consider either/both of them as part of canonicalization? I think there's agreement that when there's a single consumer of the FillOp, this pattern produces a simpler representation of the same set of input instructions.

When there are multiple consumers, an additional tensor.empty needs to be introduced, since replacing the old FillOp with a new sliced FillOp might not always work. An example of where it will not work:

%1 = linalg.fill
%2 = tensor.extract_slice %1
%3 = scf.for [%1]

I see two options here:

  1. Don't enforce the single-consumer constraint and introduce an additional tensor.empty.
  2. I can also create a separate pattern like https://github.com/llvm/llvm-project/blob/78e026f845fb4d924673a9d534cc36cf7b55473c/mlir/lib/Dialect/Linalg/Transforms/SwapExtractSliceWithFillPatterns.cpp

@MaheshRavishankar:

> Let me try to summarize based on the discussions we've had so far. [...] I see two options here: [...]

I think Andrej's point is good. If we want to add a canonicalization, we add it for a single user of the linalg.fill. But mainly I think we want to add a separate pattern that does the "bubbling up" of extract_slice irrespective of number of uses and that is a separate pattern + populate method.

@MaheshRavishankar left a comment:

I think you should avoid using a tensor.empty for the destination of the fill op. It is better to just use a slice of the destination.

Come to think of it, that way you could drop the multiple-use constraint too.
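
A minimal sketch of that suggestion (illustrative only, not code from this PR): instead of materializing a fresh tensor.empty, take the same slice of the original fill's destination and use it as the outs of the new fill:

// Inside matchAndRewrite, replacing the tensor.empty-based rewrite from the
// diff: slice the original fill's destination and fill that slice instead.
Location loc = extractSliceOp.getLoc();
Value fillDest = fillOp.getOutputs()[0];
Value slicedDest = rewriter.create<tensor::ExtractSliceOp>(
    loc, fillDest, extractSliceOp.getMixedOffsets(),
    extractSliceOp.getMixedSizes(), extractSliceOp.getMixedStrides());
rewriter.replaceOpWithNewOp<linalg::FillOp>(
    extractSliceOp, extractSliceOp.getResultType(),
    ValueRange{fillOp.getInputs()[0]}, ValueRange{slicedDest});

Since the new fill writes into a slice of the original destination rather than a new tensor.empty, the rewrite would no longer hinge on the fill having a single use; after bufferization the slice is expected to become a subview of the same buffer (per the tensor.extract_slice docs quoted earlier).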

@MaheshRavishankar:

> BEFORE (2 uses):
>
> %empty = tensor.empty
> %fill = linalg.fill %empty
> %res_1 = tensor.extract_slice %fill
> %res_2 = tensor.extract_slice %fill
>
> AFTER (2 uses):
>
> %empty_1 = tensor.empty
> %res_1 = linalg.fill %empty_1
> %empty_2 = tensor.empty
> %res_2 = linalg.fill %empty_2

@banach-space, how about this change for this example:

%empty = tensor.empty
%extract_slice1 = tensor.extract_slice %empty
%extract_slice2 = tensor.extract_slice %empty
%res1 = linalg.fill (...) outs(%extract_slice1)
%res2 = linalg.fill (...) outs(%extract_slice2)

That seems like it should fall in the "always good to do" category. Again, I am not advocating one way or another, just thinking aloud.

@banach-space:

> @banach-space, how about this change for this example:
>
> %empty = tensor.empty
> %extract_slice1 = tensor.extract_slice %empty
> %extract_slice2 = tensor.extract_slice %empty
> %res1 = linalg.fill (...) outs(%extract_slice1)
> %res2 = linalg.fill (...) outs(%extract_slice2)
>
> That seems like it should fall in the "always good to do" category. Again, I am not advocating one way or another, just thinking aloud.

I see 4 Ops before and 5 Ops after, so it's not immediately obvious to me that AFTER is better :) What metric do you use to decide what's better/worse? Not claiming my metric is perfect, just sharing how I reason about these things.

I tried this simple experiment to better understand the tradeoffs (from my biased viewpoint):

// 4 Ops - empty + fill + extract_slice + extract_slice
func.func @foo() -> (tensor<2x1920x64x66xf32>, tensor<2x1920x64x66xf32>) {
  %c0 = arith.constant 0. : f32
  %0 = tensor.empty() : tensor<2x1920x66x66xf32>
  %1 = linalg.fill ins(%c0 : f32) outs(%0 : tensor<2x1920x66x66xf32>) -> tensor<2x1920x66x66xf32>
  %extracted_slice = tensor.extract_slice %1[0, 0, 0, 0] [2, 1920, 64, 66] [1, 1, 1, 1] : tensor<2x1920x66x66xf32> to tensor<2x1920x64x66xf32>
  %extracted_slice_2 = tensor.extract_slice %1[0, 0, 2, 0] [2, 1920, 64, 66] [1, 1, 1, 1] : tensor<2x1920x66x66xf32> to tensor<2x1920x64x66xf32>
  return %extracted_slice,%extracted_slice_2 : tensor<2x1920x64x66xf32>, tensor<2x1920x64x66xf32>
}

// 5 Ops - empty + extract_slice + extract_slice + fill + fill
func.func @bar() -> (tensor<2x1920x64x66xf32>, tensor<2x1920x64x66xf32>) {
  %c0 = arith.constant 0. : f32
  %0 = tensor.empty() : tensor<2x1920x66x66xf32>
  %extracted_slice = tensor.extract_slice %0[0, 0, 0, 0] [2, 1920, 64, 66] [1, 1, 1, 1] : tensor<2x1920x66x66xf32> to tensor<2x1920x64x66xf32>
  %extracted_slice_2 = tensor.extract_slice %0[0, 0, 2, 0] [2, 1920, 64, 66] [1, 1, 1, 1] : tensor<2x1920x66x66xf32> to tensor<2x1920x64x66xf32>
  %res_1 = linalg.fill ins(%c0 : f32) outs(%extracted_slice : tensor<2x1920x64x66xf32>) -> tensor<2x1920x64x66xf32>
  %res_2 = linalg.fill ins(%c0 : f32) outs(%extracted_slice_2 : tensor<2x1920x64x66xf32>) -> tensor<2x1920x64x66xf32>
  return %res_1, %res_2 : tensor<2x1920x64x66xf32>, tensor<2x1920x64x66xf32>
}

After bufferization (-one-shot-bufferize="bufferize-function-boundaries"):

  func.func @foo() -> (memref<2x1920x64x66xf32, strided<[8363520, 4356, 66, 1]>>, memref<2x1920x64x66xf32, strided<[8363520, 4356, 66, 1], offset: 132>>) {
    %cst = arith.constant 0.000000e+00 : f32
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<2x1920x66x66xf32>
    linalg.fill ins(%cst : f32) outs(%alloc : memref<2x1920x66x66xf32>)
    %subview = memref.subview %alloc[0, 0, 0, 0] [2, 1920, 64, 66] [1, 1, 1, 1] : memref<2x1920x66x66xf32> to memref<2x1920x64x66xf32, strided<[8363520, 4356, 66, 1]>>
    %subview_0 = memref.subview %alloc[0, 0, 2, 0] [2, 1920, 64, 66] [1, 1, 1, 1] : memref<2x1920x66x66xf32> to memref<2x1920x64x66xf32, strided<[8363520, 4356, 66, 1], offset: 132>>
    %cast = memref.cast %subview : memref<2x1920x64x66xf32, strided<[8363520, 4356, 66, 1]>> to memref<2x1920x64x66xf32, strided<[?, ?, ?, ?], offset: ?>>
    %cast_1 = memref.cast %subview_0 : memref<2x1920x64x66xf32, strided<[8363520, 4356, 66, 1], offset: 132>> to memref<2x1920x64x66xf32, strided<[?, ?, ?, ?], offset: ?>>
    return %subview, %subview_0 : memref<2x1920x64x66xf32, strided<[8363520, 4356, 66, 1]>>, memref<2x1920x64x66xf32, strided<[8363520, 4356, 66, 1], offset: 132>>
  }
  func.func @bar() -> (memref<2x1920x64x66xf32>, memref<2x1920x64x66xf32, strided<[8363520, 4356, 66, 1], offset: 132>>) {
    %cst = arith.constant 0.000000e+00 : f32
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<2x1920x66x66xf32>
    %subview = memref.subview %alloc[0, 0, 0, 0] [2, 1920, 64, 66] [1, 1, 1, 1] : memref<2x1920x66x66xf32> to memref<2x1920x64x66xf32, strided<[8363520, 4356, 66, 1]>>
    %alloc_0 = memref.alloc() {alignment = 64 : i64} : memref<2x1920x64x66xf32>
    %subview_1 = memref.subview %alloc[0, 0, 2, 0] [2, 1920, 64, 66] [1, 1, 1, 1] : memref<2x1920x66x66xf32> to memref<2x1920x64x66xf32, strided<[8363520, 4356, 66, 1], offset: 132>>
    linalg.fill ins(%cst : f32) outs(%alloc_0 : memref<2x1920x64x66xf32>)
    linalg.fill ins(%cst : f32) outs(%subview_1 : memref<2x1920x64x66xf32, strided<[8363520, 4356, 66, 1], offset: 132>>)
    %cast = memref.cast %alloc_0 : memref<2x1920x64x66xf32> to memref<2x1920x64x66xf32, strided<[?, ?, ?, ?], offset: ?>>
    %cast_2 = memref.cast %subview_1 : memref<2x1920x64x66xf32, strided<[8363520, 4356, 66, 1], offset: 132>> to memref<2x1920x64x66xf32, strided<[?, ?, ?, ?], offset: ?>>
    return %alloc_0, %subview_1 : memref<2x1920x64x66xf32>, memref<2x1920x64x66xf32, strided<[8363520, 4356, 66, 1], offset: 132>>
  }

So, with 5 Ops there's one additional big allocation and an additional FillOp, which is not ideal. However, I appreciate that a lot can happen between the proposed canonicalization and bufferization (i.e. my example is a bit academic).

> I think you should avoid using a tensor.empty for the destination of the fill op. It is better to just use a slice of the destination.

Could you elaborate? I'd like to better follow the rationale here.

Just to clarify, I am not against this change. And it should be OK to keep it as a canonicalization - no semantic context is lost, and matching should be similar. IIUC, you are trying to get rid of tensor.extract_slice? I am trying to better grasp what it is that we are aiming for 😅

@MaheshRavishankar:

Thanks for checking... I hadn't connected things correctly. After seeing the output you have, I think it makes more sense to me to just do this for the single-use case for now as a canonicalization.

@banach-space left a comment:

LGTM modulo my final (minor) comments.

Great technical discussion, thank you everyone!

}
};

/// Fold tensor.extract_slice(linalg.fill(<input>)) into linalg.fill(<input>)

Contributor:

[nit] Add a note that this triggers, by design, only when there is a single use (with one use this is obviously beneficial; with multiple uses it is not so clear).
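
For instance, the requested note could read something like this (wording is only a suggestion, not the author's):

/// Fold tensor.extract_slice(linalg.fill(<input>)) into linalg.fill(<input>).
/// By design this only applies when the linalg.fill result has a single use:
/// with one use the rewrite is clearly simpler, while with multiple uses it
/// would introduce an extra tensor.empty + linalg.fill pair, which is not
/// obviously more canonical.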

