Conversation

@simpel01
Contributor

When fusing two linalg.generic ops where the producer has index semantics, invalid affine.apply ops can be generated whose number of indices does not match the number of loops of the fused linalg.generic.

This patch fixes the issue by directly using the number of loops from the generated fused op.
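The mismatch is easy to see with concrete loop counts. The sketch below models the before/after logic in plain Python (the counts are assumed from the 2-D producer / 1-D consumer test case in this PR, not taken from the actual C++):

```python
# Sketch of the loop-count computation before and after the fix.
producer_num_loops = 2   # producer iterates over tensor<2x2xi32>
consumer_num_loops = 1   # consumer iterates over tensor<4xi32>

# Elementwise fusion builds the fused op over the consumer's loop nest.
fused_num_loops = consumer_num_loops

# Old code: max(...) over-counts, emitting too many linalg.index ops.
old_num_indices = max(producer_num_loops, consumer_num_loops)  # 2
# New code: ask the fused op itself for its loop count.
new_num_indices = fused_num_loops                              # 1

assert old_num_indices == 2
assert new_num_indices == 1
```

With the old computation, a `linalg.index 1` would be emitted inside a fused op that has only one loop, which is exactly the verifier failure discussed later in this thread.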

@github-actions

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository; in that case, you can instead tag reviewers by name in a comment using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "pinging" the PR, i.e. adding a comment saying "Ping". The common courtesy ping rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Member

llvmbot commented May 21, 2025

@llvm/pr-subscribers-mlir

Author: Simone Pellegrini (simpel01)

Changes

When fusing two linalg.generic ops where the producer has index semantics, invalid affine.apply ops can be generated whose number of indices does not match the number of loops of the fused linalg.generic.

This patch fixes the issue by directly using the number of loops from the generated fused op.


Full diff: https://github.com/llvm/llvm-project/pull/140892.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp (+1-2)
  • (modified) mlir/test/Dialect/Linalg/fusion-elementwise-ops.mlir (+40)
diff --git a/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp b/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
index 1f5af39e604e7..f97ed3d6d5111 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
@@ -231,8 +231,7 @@ static void generateFusedElementwiseOpRegion(
   // `consumerToProducerLoopsMap` to map the producer indices.
   if (producer.hasIndexSemantics()) {
     // Add an index operation for every fused loop dimension.
-    unsigned numFusedOpLoops =
-        std::max(producer.getNumLoops(), consumer.getNumLoops());
+    unsigned numFusedOpLoops = fusedOp.getNumLoops();
     SmallVector<Value> fusedIndices;
     fusedIndices.reserve(numFusedOpLoops);
     llvm::transform(llvm::seq<uint64_t>(0, numFusedOpLoops),
diff --git a/mlir/test/Dialect/Linalg/fusion-elementwise-ops.mlir b/mlir/test/Dialect/Linalg/fusion-elementwise-ops.mlir
index 28e1291bce1fa..031cb350bfba4 100644
--- a/mlir/test/Dialect/Linalg/fusion-elementwise-ops.mlir
+++ b/mlir/test/Dialect/Linalg/fusion-elementwise-ops.mlir
@@ -860,6 +860,46 @@ func.func @fusion_different_axes(%arg0 : tensor<5000xi64>, %arg1 : tensor<5000xi
 
 // -----
 
+func.func @fusion_different_axes_indexed(%arg0: tensor<2x2xi32>) ->  tensor<4xi32> {
+  %0 = tensor.empty() : tensor<2x2xi32>
+  %1 = linalg.generic {
+        indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>],
+        iterator_types = ["parallel", "parallel"]}
+        ins(%arg0 : tensor<2x2xi32>) outs(%0 : tensor<2x2xi32>) {
+          ^bb0(%in: i32, %out: i32):
+            %2 = linalg.index 1 : index
+            %3 = arith.index_cast %2 : index to i32
+            linalg.yield %3 : i32
+        } -> tensor<2x2xi32>
+  %4 = tensor.empty() : tensor<4xi32>
+  %5 = linalg.generic {
+        indexing_maps = [affine_map<(d0) -> (d0 floordiv 2, d0 mod 2)>, affine_map<(d0) -> (d0)>],
+        iterator_types = ["parallel"]}
+        ins(%1 : tensor<2x2xi32>) outs(%4 : tensor<4xi32>) {
+          ^bb0(%in: i32, %out: i32):
+            linalg.yield %in : i32
+        } -> tensor<4xi32>
+  return %5 : tensor<4xi32>
+}
+
+//  CHECK-DAG: #[[MAP0:.+]] = affine_map<(d0) -> (d0)>
+//  CHECK-DAG: #[[MAP1:.+]] = affine_map<()[s0] -> (s0 mod 2)>
+//      CHECK: func @fusion_different_axes_indexed(
+// CHECK-SAME:     %[[ARG0:.+]]: tensor<2x2xi32>
+//  CHECK-DAG:   %[[INIT:.+]] = tensor.empty() : tensor<4xi32>
+//      CHECK:   %[[RESULT:.+]] = linalg.generic
+// CHECK-SAME:       indexing_maps = [#[[MAP0]]]
+// CHECK-SAME:       outs(%[[INIT]] :
+// CHECK-NEXT:   ^bb0(
+// CHECK-SAME:       %[[B0:.+]]: i32
+//  CHECK-DAG:     %[[T0:.+]] = linalg.index 0
+//  CHECK-DAG:     %[[T1:.+]] = affine.apply #[[MAP1]]()[%[[T0]]]
+//  CHECK-DAG:     %[[CAST:.+]] = arith.index_cast %[[T1]] : index to i32
+//      CHECK:     linalg.yield %[[CAST]]
+//      CHECK:   return %[[RESULT]]
+
+// -----
+
 // CHECK-LABEL: func @fold_fill_generic_basic
 //  CHECK-SAME: (%[[ARG0:.*]]: tensor<?xf32>) -> tensor<?xf32> {
 //   CHECK-NOT: linalg.fill

@llvmbot
Member

llvmbot commented May 21, 2025

@llvm/pr-subscribers-mlir-linalg

Contributor

@MaheshRavishankar left a comment

Thanks for the fix. The fix itself looks fine, but could you change the tests?
The op you use is not explicitly caught as illegal, but very few things in Linalg work well with floordiv and mod in indexing maps. Could you add an example where the indexing maps are strictly affine?

Contributor

@MaheshRavishankar left a comment

Thanks for the change. I think the one-liner change may be OK, but I am concerned about the test added here.

%5 = linalg.generic {
        indexing_maps = [affine_map<(d0) -> (d0, 1)>, affine_map<(d0) -> (d0)>],
        iterator_types = ["parallel"]}
        ins(%1 : tensor<2x2xi32>) outs(%4 : tensor<2xi32>) {
          ^bb0(%in: i32, %out: i32):
            linalg.yield %in : i32
        } -> tensor<2xi32>
  return %5 : tensor<2xi32>

is effectively just a

%5 = tensor.extract_slice %1[0, 1][.., 1] [1, 1] : tensor<2x2xi32> to tensor<2xi32>

Linalg operations are not meant to represent slice semantics this way. It is not caught by the verifier because that would require the heavy hammer of disallowing indexing maps with constants in the range, and that is not practical right now given how Linalg is used in general.

If you have a different example that motivates the change, that might help.
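The slice equivalence described above can be sketched in plain Python (a hedged illustration with made-up values, not MLIR semantics): an indexing map with a constant in its range, such as (d0) -> (d0, 1), reads a fixed column of the input, which is exactly what a slice extraction does.

```python
# Sketch: indexing map (d0) -> (d0, 1) on a 2x2 input reads column 1,
# so the linalg.generic behaves like a tensor.extract_slice of that column.
t = [[10, 11],
     [20, 21]]

generic_result = [t[d0][1] for d0 in range(2)]  # map (d0) -> (d0, 1)
slice_result = [row[1] for row in t]            # extract column 1 as a slice

assert generic_result == slice_result == [11, 21]
```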

@mgehre-amd
Contributor

Linalg operations are not meant to represent slice semantics this way. It is not caught by the verifier because that would require the heavy hammer of disallowing indexing maps with constants in the range, and that is not practical right now given how Linalg is used in general.

Unrelated to this PR, but I fell into the same "trap" (?) because I couldn't find anything in the Linalg documentation that would forbid this or explain why it is a bad idea.

@simpel01
Contributor Author

simpel01 commented Jun 2, 2025

Hi, thanks for the comments.

In order to trigger the compiler assert that this patch fixes, I need an N-D linalg.generic to be folded into some M-D linalg.generic with M < N.

I haven't mentioned this before, but without the fix the compiler would generate the following IR (from the example I added):

  %3 = "linalg.generic"(%2) <{indexing_maps = [affine_map<(d0) -> (d0)>], iterator_types = [#linalg.iterator_type<parallel>], operandSegmentSizes = array<i32: 0, 1>}> ({
  ^bb0(%arg1: i32):
    %4 = "linalg.index"() <{dim = 0 : i64}> : () -> index
    %5 = "linalg.index"() <{dim = 1 : i64}> : () -> index
    %6 = "affine.apply"(%4, %5) <{map = affine_map<(d0) -> (1)>}> : (index, index) -> index
    %7 = "arith.index_cast"(%6) : (index) -> i32
    "linalg.yield"(%7) : (i32) -> ()
  }) : (tensor<2xi32>) -> tensor<2xi32>

which is illegal as:

'linalg.index' op expected dim (1) to be lower than the number of loops (1) of the enclosing LinalgOp

and later the compiler hits the following assert:

mlir-opt: llvm-project/mlir/lib/IR/AffineMap.cpp:456: mlir::AffineMap mlir::AffineMap::partialConstantFold(llvm::ArrayRef<mlir::Attribute>, llvm::SmallVectorImpl<long int>*, bool*) const: Assertion `getNumInputs() == operandConstants.size()' failed.

The affine expression per se is not really important, as long as it is not trivial and requires remapping. For example, using something trivial like this:

affine_map<(d0) -> (d0, d0)>

won't cause the assert to trigger.

The expression I used in my latest example is the simplest strictly affine map that causes the assert to trigger. The example that motivated this change was what I used in the original patch based on the mod and floordiv expressions.

I think it is going to be difficult to come up with something else that triggers the assert whilst remaining in the "strictly affine realm". But I am not an expert in this subject, so suggestions for suitable 1D into 2D strictly affine expressions are welcome! :)
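The remapping that trips the assert can be sketched in plain Python (an illustration of the test's indexing maps, not MLIR semantics): the map affine_map<(d0) -> (d0 floordiv 2, d0 mod 2)> delinearizes the consumer's single index into the producer's two indices, so the producer's former `linalg.index 1` must become an affine.apply of the one index that actually exists in the fused op.

```python
# Sketch: delinearize the consumer's single loop index d0 into the
# producer's (i, j), per affine_map<(d0) -> (d0 floordiv 2, d0 mod 2)>.
def producer_indices(d0, inner=2):
    return (d0 // inner, d0 % inner)

# The fused op has one loop, so only linalg.index 0 exists; the producer's
# old `linalg.index 1` is rewritten as d0 mod 2 (the map in the CHECK lines).
assert [producer_indices(i) for i in range(4)] == [(0, 0), (0, 1), (1, 0), (1, 1)]
assert [i % 2 for i in range(4)] == [0, 1, 0, 1]
```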

Contributor

@MaheshRavishankar left a comment

Overall this looks right. Let's see if this breaks anything downstream.

@RoboTux RoboTux merged commit 4b59b7b into llvm:main Jun 13, 2025
7 checks passed
@github-actions

@simpel01 Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@simpel01 simpel01 deleted the linalg-fix branch June 13, 2025 09:12
tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
…xes (llvm#140892)

akuhlens pushed a commit to akuhlens/llvm-project that referenced this pull request Jun 24, 2025
…xes (llvm#140892)
