Conversation

@simpel01
Contributor

When fusing two linalg.generic ops where the producer has index semantics, invalid affine.apply ops can be generated whose number of indices does not match the number of loops of the fused linalg.generic.

This patch fixes the issue by directly using the number of loops from the generated fused op.
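The mismatch is easy to see with concrete loop counts. The sketch below models the before/after logic in plain Python (the counts are assumed from the 2-D producer / 1-D consumer test case in this PR, not taken from the actual C++):

```python
# Sketch of the loop-count computation before and after the fix.
producer_num_loops = 2   # producer iterates over tensor<2x2xi32>
consumer_num_loops = 1   # consumer iterates over tensor<4xi32>

# Elementwise fusion builds the fused op over the consumer's loop nest.
fused_num_loops = consumer_num_loops

# Old code: max(...) over-counts, emitting too many linalg.index ops.
old_num_indices = max(producer_num_loops, consumer_num_loops)  # 2
# New code: ask the fused op itself for its loop count.
new_num_indices = fused_num_loops                              # 1

assert old_num_indices == 2
assert new_num_indices == 1
```

With the old computation, a `linalg.index 1` would be emitted inside a fused op that has only one loop, which is exactly the verifier failure discussed later in this thread.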

@github-actions

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository; in that case, you can instead tag reviewers by name in a comment using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "pinging" the PR, i.e. adding a comment saying "Ping". The common courtesy ping rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Member

llvmbot commented May 21, 2025

@llvm/pr-subscribers-mlir

Author: Simone Pellegrini (simpel01)

Changes

When fusing two linalg.generic ops where the producer has index semantics, invalid affine.apply ops can be generated whose number of indices does not match the number of loops of the fused linalg.generic.

This patch fixes the issue by directly using the number of loops from the generated fused op.


Full diff: https://github.com/llvm/llvm-project/pull/140892.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp (+1-2)
  • (modified) mlir/test/Dialect/Linalg/fusion-elementwise-ops.mlir (+40)
diff --git a/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp b/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
index 1f5af39e604e7..f97ed3d6d5111 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
@@ -231,8 +231,7 @@ static void generateFusedElementwiseOpRegion(
   // `consumerToProducerLoopsMap` to map the producer indices.
   if (producer.hasIndexSemantics()) {
     // Add an index operation for every fused loop dimension.
-    unsigned numFusedOpLoops =
-        std::max(producer.getNumLoops(), consumer.getNumLoops());
+    unsigned numFusedOpLoops = fusedOp.getNumLoops();
     SmallVector<Value> fusedIndices;
     fusedIndices.reserve(numFusedOpLoops);
     llvm::transform(llvm::seq<uint64_t>(0, numFusedOpLoops),
diff --git a/mlir/test/Dialect/Linalg/fusion-elementwise-ops.mlir b/mlir/test/Dialect/Linalg/fusion-elementwise-ops.mlir
index 28e1291bce1fa..031cb350bfba4 100644
--- a/mlir/test/Dialect/Linalg/fusion-elementwise-ops.mlir
+++ b/mlir/test/Dialect/Linalg/fusion-elementwise-ops.mlir
@@ -860,6 +860,46 @@ func.func @fusion_different_axes(%arg0 : tensor<5000xi64>, %arg1 : tensor<5000xi
 
 // -----
 
+func.func @fusion_different_axes_indexed(%arg0: tensor<2x2xi32>) ->  tensor<4xi32> {
+  %0 = tensor.empty() : tensor<2x2xi32>
+  %1 = linalg.generic {
+        indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>],
+        iterator_types = ["parallel", "parallel"]}
+        ins(%arg0 : tensor<2x2xi32>) outs(%0 : tensor<2x2xi32>) {
+          ^bb0(%in: i32, %out: i32):
+            %2 = linalg.index 1 : index
+            %3 = arith.index_cast %2 : index to i32
+            linalg.yield %3 : i32
+        } -> tensor<2x2xi32>
+  %4 = tensor.empty() : tensor<4xi32>
+  %5 = linalg.generic {
+        indexing_maps = [affine_map<(d0) -> (d0 floordiv 2, d0 mod 2)>, affine_map<(d0) -> (d0)>],
+        iterator_types = ["parallel"]}
+        ins(%1 : tensor<2x2xi32>) outs(%4 : tensor<4xi32>) {
+          ^bb0(%in: i32, %out: i32):
+            linalg.yield %in : i32
+        } -> tensor<4xi32>
+  return %5 : tensor<4xi32>
+}
+
+//  CHECK-DAG: #[[MAP0:.+]] = affine_map<(d0) -> (d0)>
+//  CHECK-DAG: #[[MAP1:.+]] = affine_map<()[s0] -> (s0 mod 2)>
+//      CHECK: func @fusion_different_axes_indexed(
+// CHECK-SAME:     %[[ARG0:.+]]: tensor<2x2xi32>
+//  CHECK-DAG:   %[[INIT:.+]] = tensor.empty() : tensor<4xi32>
+//      CHECK:   %[[RESULT:.+]] = linalg.generic
+// CHECK-SAME:       indexing_maps = [#[[MAP0]]]
+// CHECK-SAME:       outs(%[[INIT]] :
+// CHECK-NEXT:   ^bb0(
+// CHECK-SAME:       %[[B0:.+]]: i32
+//  CHECK-DAG:     %[[T0:.+]] = linalg.index 0
+//  CHECK-DAG:     %[[T1:.+]] = affine.apply #[[MAP1]]()[%[[T0]]]
+//  CHECK-DAG:     %[[CAST:.+]] = arith.index_cast %[[T1]] : index to i32
+//      CHECK:     linalg.yield %[[CAST]]
+//      CHECK:   return %[[RESULT]]
+
+// -----
+
 // CHECK-LABEL: func @fold_fill_generic_basic
 //  CHECK-SAME: (%[[ARG0:.*]]: tensor<?xf32>) -> tensor<?xf32> {
 //   CHECK-NOT: linalg.fill

@llvmbot
Member

llvmbot commented May 21, 2025

@llvm/pr-subscribers-mlir-linalg

Contributor

@MaheshRavishankar left a comment

Thanks for the fix. The fix itself looks fine, but could you change the tests?
The op you use is not explicitly caught as illegal, but very few things in Linalg work well with floordiv and mod in indexing maps. Could you add an example where the indexing maps are strictly affine?

Contributor

@MaheshRavishankar left a comment

Thanks for the change. I think the one-liner change may be OK, but I am concerned about the test added here.

%5 = linalg.generic {
        indexing_maps = [affine_map<(d0) -> (d0, 1)>, affine_map<(d0) -> (d0)>],
        iterator_types = ["parallel"]}
        ins(%1 : tensor<2x2xi32>) outs(%4 : tensor<2xi32>) {
          ^bb0(%in: i32, %out: i32):
            linalg.yield %in : i32
        } -> tensor<2xi32>
  return %5 : tensor<2xi32>

is effectively just a

%5 = tensor.extract_slice %1[0, 1][.., 1] [1, 1] : tensor<2x2xi32> to tensor<2xi32>

Linalg operations are not meant to represent slice semantics this way. It is not caught by the verifier because that would require the heavy hammer of disallowing indexing maps with constants in the range, and that is not practical right now given how Linalg is used in general.

If you have a different example that motivates the change, that might help.
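The slice equivalence described above can be sketched in plain Python (a hedged illustration with made-up values, not MLIR semantics): an indexing map with a constant in its range, such as (d0) -> (d0, 1), reads a fixed column of the input, which is exactly what a slice extraction does.

```python
# Sketch: indexing map (d0) -> (d0, 1) on a 2x2 input reads column 1,
# so the linalg.generic behaves like a tensor.extract_slice of that column.
t = [[10, 11],
     [20, 21]]

generic_result = [t[d0][1] for d0 in range(2)]  # map (d0) -> (d0, 1)
slice_result = [row[1] for row in t]            # extract column 1 as a slice

assert generic_result == slice_result == [11, 21]
```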

@mgehre-amd
Contributor

Linalg operations are not meant to represent slice semantics this way. It is not caught by the verifier because that would require the heavy hammer of disallowing indexing maps with constants in the range, and that is not practical right now given how Linalg is used in general.

Unrelated to this PR, but I fell into the same "trap" (?) because I couldn't find anything in the Linalg documentation that would forbid this or explain why it is a bad idea.

@simpel01
Contributor Author

simpel01 commented Jun 2, 2025

Hi, thanks for the comments.

In order to trigger the compiler assert that this patch fixes, I need an N-D linalg.generic to be folded into some M-D linalg.generic with M < N.

I haven't mentioned this before, but without the fix the compiler would generate the following IR (from the example I added):

  %3 = "linalg.generic"(%2) <{indexing_maps = [affine_map<(d0) -> (d0)>], iterator_types = [#linalg.iterator_type<parallel>], operandSegmentSizes = array<i32: 0, 1>}> ({
  ^bb0(%arg1: i32):
    %4 = "linalg.index"() <{dim = 0 : i64}> : () -> index
    %5 = "linalg.index"() <{dim = 1 : i64}> : () -> index
    %6 = "affine.apply"(%4, %5) <{map = affine_map<(d0) -> (1)>}> : (index, index) -> index
    %7 = "arith.index_cast"(%6) : (index) -> i32
    "linalg.yield"(%7) : (i32) -> ()
  }) : (tensor<2xi32>) -> tensor<2xi32>

which is illegal as:

'linalg.index' op expected dim (1) to be lower than the number of loops (1) of the enclosing LinalgOp

and later the compiler hits the following assert:

mlir-opt: llvm-project/mlir/lib/IR/AffineMap.cpp:456: mlir::AffineMap mlir::AffineMap::partialConstantFold(llvm::ArrayRef<mlir::Attribute>, llvm::SmallVectorImpl<long int>*, bool*) const: Assertion `getNumInputs() == operandConstants.size()' failed.

The affine expression per se is not really important, as long as it is not trivial and requires remapping. For example, using something trivial like this:

affine_map<(d0) -> (d0, d0)>

won't cause the assert to trigger.

The expression I used in my latest example is the simplest strictly affine map that causes the assert to trigger. The example that motivated this change was what I used in the original patch based on the mod and floordiv expressions.

I think it is going to be difficult to come up with something else that triggers the assert whilst remaining in the "strictly affine realm". But I am not an expert in this subject, so suggestions for suitable 1D into 2D strictly affine expressions are welcome! :)
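The remapping that trips the assert can be sketched in plain Python (an illustration of the test's indexing maps, not MLIR semantics): the map affine_map<(d0) -> (d0 floordiv 2, d0 mod 2)> delinearizes the consumer's single index into the producer's two indices, so the producer's former `linalg.index 1` must become an affine.apply of the one index that actually exists in the fused op.

```python
# Sketch: delinearize the consumer's single loop index d0 into the
# producer's (i, j), per affine_map<(d0) -> (d0 floordiv 2, d0 mod 2)>.
def producer_indices(d0, inner=2):
    return (d0 // inner, d0 % inner)

# The fused op has one loop, so only linalg.index 0 exists; the producer's
# old `linalg.index 1` is rewritten as d0 mod 2 (the map in the CHECK lines).
assert [producer_indices(i) for i in range(4)] == [(0, 0), (0, 1), (1, 0), (1, 1)]
assert [i % 2 for i in range(4)] == [0, 1, 0, 1]
```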

Contributor

@MaheshRavishankar left a comment

Overall this looks right. Let's see if this breaks anything downstream.

@RoboTux RoboTux merged commit 4b59b7b into llvm:main Jun 13, 2025
7 checks passed
@github-actions

@simpel01 Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@simpel01 simpel01 deleted the linalg-fix branch June 13, 2025 09:12
tomtor pushed a commit to tomtor/llvm-project that referenced this pull request Jun 14, 2025
…xes (llvm#140892)

akuhlens pushed a commit to akuhlens/llvm-project that referenced this pull request Jun 24, 2025
…xes (llvm#140892)
