[mlir][Linalg] Fix fusing of indexed linalg consumer with different axes #140892
@llvm/pr-subscribers-mlir
Author: Simone Pellegrini (simpel01)
Changes: When fusing two `linalg.genericOp`, where the producer has index semantics, invalid `affine.apply` ops can be generated where the number of indices does not match the number of loops in the fused genericOp. This patch fixes the issue by directly using the number of loops from the generated fused op.
Full diff: https://github.com/llvm/llvm-project/pull/140892.diff (2 Files Affected)
diff --git a/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp b/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
index 1f5af39e604e7..f97ed3d6d5111 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
@@ -231,8 +231,7 @@ static void generateFusedElementwiseOpRegion(
// `consumerToProducerLoopsMap` to map the producer indices.
if (producer.hasIndexSemantics()) {
// Add an index operation for every fused loop dimension.
- unsigned numFusedOpLoops =
- std::max(producer.getNumLoops(), consumer.getNumLoops());
+ unsigned numFusedOpLoops = fusedOp.getNumLoops();
SmallVector<Value> fusedIndices;
fusedIndices.reserve(numFusedOpLoops);
llvm::transform(llvm::seq<uint64_t>(0, numFusedOpLoops),
diff --git a/mlir/test/Dialect/Linalg/fusion-elementwise-ops.mlir b/mlir/test/Dialect/Linalg/fusion-elementwise-ops.mlir
index 28e1291bce1fa..031cb350bfba4 100644
--- a/mlir/test/Dialect/Linalg/fusion-elementwise-ops.mlir
+++ b/mlir/test/Dialect/Linalg/fusion-elementwise-ops.mlir
@@ -860,6 +860,46 @@ func.func @fusion_different_axes(%arg0 : tensor<5000xi64>, %arg1 : tensor<5000xi
// -----
+func.func @fusion_different_axes_indexed(%arg0: tensor<2x2xi32>) -> tensor<4xi32> {
+ %0 = tensor.empty() : tensor<2x2xi32>
+ %1 = linalg.generic {
+ indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>],
+ iterator_types = ["parallel", "parallel"]}
+ ins(%arg0 : tensor<2x2xi32>) outs(%0 : tensor<2x2xi32>) {
+ ^bb0(%in: i32, %out: i32):
+ %2 = linalg.index 1 : index
+ %3 = arith.index_cast %2 : index to i32
+ linalg.yield %3 : i32
+ } -> tensor<2x2xi32>
+ %4 = tensor.empty() : tensor<4xi32>
+ %5 = linalg.generic {
+ indexing_maps = [affine_map<(d0) -> (d0 floordiv 2, d0 mod 2)>, affine_map<(d0) -> (d0)>],
+ iterator_types = ["parallel"]}
+ ins(%1 : tensor<2x2xi32>) outs(%4 : tensor<4xi32>) {
+ ^bb0(%in: i32, %out: i32):
+ linalg.yield %in : i32
+ } -> tensor<4xi32>
+ return %5 : tensor<4xi32>
+}
+
+// CHECK-DAG: #[[MAP0:.+]] = affine_map<(d0) -> (d0)>
+// CHECK-DAG: #[[MAP1:.+]] = affine_map<()[s0] -> (s0 mod 2)>
+// CHECK: func @fusion_different_axes_indexed(
+// CHECK-SAME: %[[ARG0:.+]]: tensor<2x2xi32>
+// CHECK-DAG: %[[INIT:.+]] = tensor.empty() : tensor<4xi32>
+// CHECK: %[[RESULT:.+]] = linalg.generic
+// CHECK-SAME: indexing_maps = [#[[MAP0]]]
+// CHECK-SAME: outs(%[[INIT]] :
+// CHECK-NEXT: ^bb0(
+// CHECK-SAME: %[[B0:.+]]: i32
+// CHECK-DAG: %[[T0:.+]] = linalg.index 0
+// CHECK-DAG: %[[T1:.+]] = affine.apply #[[MAP1]]()[%[[T0]]]
+// CHECK-DAG: %[[CAST:.+]] = arith.index_cast %[[T1]] : index to i32
+// CHECK: linalg.yield %[[CAST]]
+// CHECK: return %[[RESULT]]
+
+// -----
+
// CHECK-LABEL: func @fold_fill_generic_basic
// CHECK-SAME: (%[[ARG0:.*]]: tensor<?xf32>) -> tensor<?xf32> {
// CHECK-NOT: linalg.fill
MaheshRavishankar left a comment:
Thanks for the fix. The fix itself looks fine, but could you change the tests? The op you use is not explicitly caught as an illegal op, but very few things in Linalg work well with `floordiv` and `mod` in indexing maps. Could you add an example where the indexing maps are strictly affine?
When fusing two `linalg.genericOp`, where the producer has index semantics, invalid `affine.apply` ops can be generated where the number of indices does not match the number of loops in the fused genericOp. This patch fixes the issue by directly using the number of loops from the generated fused op.
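The one-line fix above can be modeled in a small sketch. This is an illustrative Python model of the loop-count computation, not the actual MLIR C++ API; the function names and loop counts are assumptions drawn from this PR's test case.

```python
# Illustrative model (not the actual MLIR C++ API) of the one-line fix in
# generateFusedElementwiseOpRegion: how many linalg.index ops to emit.

def num_index_ops_buggy(producer_loops: int, consumer_loops: int) -> int:
    # Old behavior: max of producer and consumer loop counts.
    return max(producer_loops, consumer_loops)

def num_index_ops_fixed(fused_op_loops: int) -> int:
    # Fixed behavior: ask the generated fused op for its loop count.
    return fused_op_loops

# In this PR's test case the producer iterates a 2-D space while the
# consumer (and therefore the fused op) iterates a 1-D space.
producer_loops, consumer_loops, fused_op_loops = 2, 1, 1

print(num_index_ops_buggy(producer_loops, consumer_loops))  # 2: would emit
# linalg.index 1 inside a one-loop op, which is invalid IR.
print(num_index_ops_fixed(fused_op_loops))                  # 1: only index 0
```

The point of the sketch: whenever the fused iteration space is smaller than the producer's, `max()` over-counts and materializes out-of-range `linalg.index` ops.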
MaheshRavishankar left a comment:
Thanks for the change. I think the one-liner change may be OK, but I am concerned about the test added here.
%5 = linalg.generic {
indexing_maps = [affine_map<(d0) -> (d0, 1)>, affine_map<(d0) -> (d0)>],
iterator_types = ["parallel"]}
ins(%1 : tensor<2x2xi32>) outs(%4 : tensor<2xi32>) {
^bb0(%in: i32, %out: i32):
linalg.yield %in : i32
} -> tensor<2xi32>
return %5 : tensor<2xi32>
is effectively just a
%5 = tensor.extract_slice %1[0, 1][.., 1] [1, 1] : tensor<2x2xi32> to tensor<2xi32>
Linalg operations are not meant to represent slice semantics this way. It is not caught by the verifier because that would require the heavy hammer of disallowing indexing maps with constants in their range, and that is not practical right now given how Linalg is used in general.
If you have a different example that motivates the change that might help.
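The reviewer's equivalence claim can be sketched in plain Python. The semantics of the `linalg.generic` and the slice are modeled by hand here (assumptions, not any MLIR API): an input map with the constant `1` in its range only ever reads column 1.

```python
# Plain-Python sketch of the reviewer's equivalence claim. Names and
# semantics are modeled by hand, not taken from any MLIR API.

def apply_generic(inp, n):
    # Mimics a 1-D linalg.generic with indexing_maps
    # [(d0) -> (d0, 1), (d0) -> (d0)]: the input map has the constant 1
    # in its range, so the body only ever reads column 1.
    out = []
    for d0 in range(n):        # iterator_types = ["parallel"]
        i, j = d0, 1           # input map (d0) -> (d0, 1)
        out.append(inp[i][j])  # body: linalg.yield %in
    return out

def extract_slice_col1(inp):
    # The tensor.extract_slice the reviewer says this reduces to:
    # take column 1 of a 2x2 input, yielding a 1-D result.
    return [row[1] for row in inp]

inp = [[10, 11], [20, 21]]
assert apply_generic(inp, 2) == extract_slice_col1(inp) == [11, 21]
```

Since the generic never touches column 0, it carries slice semantics rather than elementwise semantics, which is the reviewer's objection to the original test.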
Unrelated to this PR, but I fell into the same "trap" (?) because I couldn't find anything in the Linalg documentation that would forbid this or explain why it is a bad idea.
Hi, thanks for the comments. In order to trigger the compiler assert that this patch fixes, I need an N-D …
I haven't mentioned this, but without the fix the compiler would generate the following linalg (out of the example I added): … which is illegal, and later the compiler hits the following assert: …
The affine expression per se is not really important, as long as it is not trivial and requires remapping. For example, using something trivial like this: … won't cause the assert to trigger. The expression I used in my latest example is the simplest strictly affine map that causes the assert to trigger. The example that motivated this change was what I used in the original patch, based on the …
I think it is going to be difficult to come up with something else that triggers the assert whilst remaining in the "strictly affine realm". But I am not an expert in this subject, so suggestions for suitable 1-D into 2-D strictly affine expressions are welcome! :)
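The remapping the author describes can be modeled by hand. This sketch (assumptions, not MLIR API) shows how the consumer's map `(d0) -> (d0 floordiv 2, d0 mod 2)` recovers the producer's 2-D indices from the single fused loop index, so only `linalg.index 0` is needed in the fused op.

```python
# Hand-modeled sketch of the index remapping in this example; not MLIR API.
# The consumer's input map (d0) -> (d0 floordiv 2, d0 mod 2) lets fusion
# recover the producer's 2-D indices from the single fused loop index.

def remap_producer_indices(d0):
    # Models consumerToProducerLoopsMap for this test case.
    return (d0 // 2, d0 % 2)

# Every fused iteration maps to a valid producer coordinate in the 2x2
# iteration space -- no second linalg.index op is required.
for d0 in range(4):
    i, j = remap_producer_indices(d0)
    assert 0 <= i < 2 and 0 <= j < 2

# The buggy max(producer, consumer) = 2 would additionally emit
# `linalg.index 1` inside this one-loop op, the invalid IR described above.
```

A trivial map like `(d0) -> (d0)` needs no such remapping, which matches the author's observation that trivial expressions don't trigger the assert.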
MaheshRavishankar left a comment:
Overall this looks right. Let's see if this breaks anything downstream.
@simpel01 Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
…xes (llvm#140892) When fusing two `linalg.genericOp`, where the producer has index semantics, invalid `affine.apply` ops can be generated where the number of indices does not match the number of loops in the fused genericOp. This patch fixes the issue by directly using the number of loops from the generated fused op.