
Conversation

@srcarroll
Contributor

@srcarroll srcarroll commented Jun 19, 2025

This patch modifies FuseElementwiseOps to support fusing named ops, such as linalg.map and linalg.elementwise, which are always elementwise. The fundamental logic remains the same and, for the most part, we only need to change GenericOp to LinalgOp.

@srcarroll srcarroll changed the title Generalize fusion [mlir][linalg] Elementwise fusion for linalg.map op Jun 19, 2025
@srcarroll srcarroll changed the title [mlir][linalg] Elementwise fusion for linalg.map op [mlir][linalg] Elementwise fusion for any LinalgOp Jun 19, 2025
@github-actions

github-actions bot commented Jun 19, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@srcarroll
Contributor Author

srcarroll commented Jun 19, 2025

The current changes in this PR will fuse linalg.map ops, but produce a linalg.generic even when it is technically possible to generate another linalg.map. I don't know enough to modify the code in a sane way to get linalg.map outputs (when appropriate), so if anyone has any ideas, I would love to hear them. I don't think it's important enough to go crazy with, but it would be nice.

@srcarroll
Contributor Author

srcarroll commented Jun 19, 2025

I didn't expect differences between implementing this for generic vs. map, but I am seeing a failure in a case with linalg.map that passes for the equivalent linalg.generic version.

map version

func.func @main() -> (tensor<1xi64>, tensor<1xi64>) {
    %cst_16 = arith.constant dense<0> : tensor<1xi64>
    %cst_17 = arith.constant dense<1> : tensor<1xi64>
    %cst_3 = arith.constant dense<26> : tensor<1xi64>
    %c0_i64 = arith.constant 0 : i64
    %c1_i64 = arith.constant 1 : i64
    %9 = tensor.empty() : tensor<1xi1>
    %mapped_25 = linalg.map ins(%cst_17 : tensor<1xi64>) outs(%9 : tensor<1xi1>)
      (%in: i64) {
        %255 = arith.cmpi slt, %in, %c0_i64 : i64
        linalg.yield %255 : i1
      }
    %11 = tensor.empty() : tensor<1xi64>
    %mapped_27 = linalg.map ins(%cst_17 : tensor<1xi64>) outs(%11 : tensor<1xi64>)
      (%in: i64) {
        %255 = arith.subi %c0_i64, %in : i64
        linalg.yield %255 : i64
      }
    %mapped_28 = linalg.map ins(%cst_3 : tensor<1xi64>) outs(%11 : tensor<1xi64>)
      (%in: i64) {
        %255 = arith.addi %in, %c1_i64 : i64
        linalg.yield %255 : i64
      }
    %mapped_29 = linalg.map { arith.select } ins(%mapped_25, %mapped_28, %cst_16 : tensor<1xi1>, tensor<1xi64>, tensor<1xi64>) outs(%11 : tensor<1xi64>)
    %mapped_30 = linalg.map { arith.select } ins(%mapped_25, %mapped_27, %cst_17 : tensor<1xi1>, tensor<1xi64>, tensor<1xi64>) outs(%11 : tensor<1xi64>)
    return %mapped_29, %mapped_30 : tensor<1xi64>, tensor<1xi64>
}

fails with

failed-fusion.mlir:25:18: error: 'linalg.generic' op expected type of bb argument #0 ('i64') to match element or self type of the corresponding operand ('i1')
    %mapped_30 = linalg.map { arith.select } ins(%mapped_25, %mapped_27, %cst_17 : tensor<1xi1>, tensor<1xi64>, tensor<1xi64>) outs(%11 : tensor<1xi64>)
                 ^
failed-fusion.mlir:25:18: note: see current operation: 
%5:2 = "linalg.generic"(%2, %3) <{indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>], iterator_types = [#linalg.iterator_type<parallel>], operandSegmentSizes = array<i32: 0, 2>}> ({
^bb0(%arg0: i64, %arg1: i64):
  %6 = "arith.cmpi"(%arg0, %1) <{predicate = 2 : i64}> : (i64, i64) -> i1
  %7 = "arith.select"(%6, %0, %arg1) : (i1, i64, i64) -> i64
  "linalg.yield"(%6, %7) : (i1, i64) -> ()
}) : (tensor<1xi1>, tensor<1xi64>) -> (tensor<1xi1>, tensor<1xi64>)

generic version

#map = affine_map<(d0)->(d0)>
func.func @main() -> (tensor<1xi64>, tensor<1xi64>) {
    %cst_16 = arith.constant dense<0> : tensor<1xi64>
    %cst_17 = arith.constant dense<1> : tensor<1xi64>
    %cst_3 = arith.constant dense<26> : tensor<1xi64>
    %c0_i64 = arith.constant 0 : i64
    %c1_i64 = arith.constant 1 : i64
    %9 = tensor.empty() : tensor<1xi1>
    %mapped_25 = linalg.generic {
      indexing_maps = [#map, #map],
      iterator_types = ["parallel"]}
      ins(%cst_17 : tensor<1xi64>) outs(%9 : tensor<1xi1>) {
    ^bb0(%b0: i64, %b1: i1):
        %255 = arith.cmpi slt, %b0, %c0_i64 : i64
        linalg.yield %255 : i1
    } -> tensor<1xi1>
    %11 = tensor.empty() : tensor<1xi64>
    %mapped_27 = linalg.generic {
      indexing_maps = [#map, #map],
      iterator_types = ["parallel"]}
      ins(%cst_17 : tensor<1xi64>) outs(%11 : tensor<1xi64>) {
    ^bb0(%b0: i64, %b1: i64):
        %255 = arith.subi %c0_i64, %b0 : i64
        linalg.yield %255 : i64
    } -> tensor<1xi64>
    %mapped_28 = linalg.generic {
      indexing_maps = [#map, #map],
      iterator_types = ["parallel"]}
      ins(%cst_3 : tensor<1xi64>) outs(%11 : tensor<1xi64>) {
    ^bb0(%b0: i64, %b1: i64):
        %255 = arith.addi %b0, %c1_i64 : i64
        linalg.yield %255 : i64
    } -> tensor<1xi64>
    %mapped_29 = linalg.generic {
      indexing_maps = [#map, #map, #map, #map],
      iterator_types = ["parallel"]}
      ins(%mapped_25, %mapped_28, %cst_16 : tensor<1xi1>, tensor<1xi64>, tensor<1xi64>) outs(%11 : tensor<1xi64>) {
    ^bb0(%b0: i1, %b1: i64, %b2: i64, %b3: i64):
        %255 = arith.select %b0, %b1, %b2 : i64
        linalg.yield %255 : i64
    } -> tensor<1xi64>
    %mapped_30 = linalg.generic {
      indexing_maps = [#map, #map, #map, #map],
      iterator_types = ["parallel"]}
      ins(%mapped_25, %mapped_27, %cst_17 : tensor<1xi1>, tensor<1xi64>, tensor<1xi64>) outs(%11 : tensor<1xi64>) {
    ^bb0(%b0: i1, %b1: i64, %b2: i64, %b3: i64):
        %255 = arith.select %b0, %b1, %b2 : i64
        linalg.yield %255 : i64
    } -> tensor<1xi64>
    return %mapped_29, %mapped_30 : tensor<1xi64>, tensor<1xi64>
}

successfully fuses and generates

  func.func @main() -> (tensor<1xi64>, tensor<1xi64>) {
    %c1_i64 = arith.constant 1 : i64
    %c0_i64 = arith.constant 0 : i64
    %0 = tensor.empty() : tensor<1xi64>
    %1 = linalg.generic {indexing_maps = [#map], iterator_types = ["parallel"]} outs(%0 : tensor<1xi64>) {
    ^bb0(%out: i64):
      linalg.yield %c0_i64 : i64
    } -> tensor<1xi64>
    %2 = linalg.generic {indexing_maps = [#map], iterator_types = ["parallel"]} outs(%0 : tensor<1xi64>) {
    ^bb0(%out: i64):
      linalg.yield %c1_i64 : i64
    } -> tensor<1xi64>
    return %1, %2 : tensor<1xi64>, tensor<1xi64>
  }

Since linalg-generalize-named-ops doesn't convert linalg.map, I had to translate by hand, so hopefully I didn't mess it up.

It's strange that this case doesn't constant fold in the first place. The first three generics can definitely be constant folded to dense<false>, dense<-1>, and dense<27>, respectively, which means the last two ops equate to dense<0> and dense<1>. That matches what the fused result for the generics above suggests (but, again, that could be constant folded too).

@srcarroll
Contributor Author

srcarroll commented Jun 19, 2025

I think I know why this fails. The fundamental difference between linalg.map and linalg.generic is that the former doesn't have block arguments for the init operands. So any logic involving changing/mapping output block args isn't guaranteed to work generically in the current state. I'll try to figure something out.
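One workaround I can think of (and roughly what the diff below ends up doing in generateFusedElementwiseOpRegion) is to append block arguments for the init operands when they're missing, so a linalg.map body looks like a linalg.generic body before the regions get merged. A rough sketch:

    // Sketch: if the producer block only has arguments for the inputs (as with
    // linalg.map), append one block argument per init operand so the block can
    // be treated like a linalg.generic block when the regions get merged.
    if (producer.getOpOperandsMatchingBBargs() == producer.getDpsInputOperands()) {
      for (Value init : producer.getDpsInits()) {
        Type bbType = isa<ShapedType>(init.getType())
                          ? cast<ShapedType>(init.getType()).getElementType()
                          : init.getType();
        producerBlock.addArgument(bbType, producer.getLoc());
      }
    }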

// CHECK-NEXT: %[[EXP0:.*]] = math.exp %[[IN1]]
// CHECK-NEXT: %[[SQRT0:.*]] = math.sqrt %[[IN0]]
// CHECK-NEXT: %[[EXP1:.*]] = math.exp %[[IN1]]
// CHECK-NEXT: %[[SQRT1:.*]] = math.sqrt %[[IN0]]
Contributor Author

I think the duplicate computations are a pre-existing artifact. They do go away with CSE, but let me know if this is something that should be looked at.

Member

This is indeed odd. Looks like a bug in the fuser. Could be related to the map vs generic issue you've seen above.

Contributor Author

I made a generic version of this, ran the old version of the pass, and got the same results, confirming it's a pre-existing issue.

Contributor Author

Here's the generic version:

#map = affine_map<(d0)->(d0)>
func.func @map_ops_mixed_types(%arg0: tensor<8xf32>, %arg1: tensor<8xf32>, %arg3: tensor<8xf32>) -> tensor<8xf32> {
  %init = tensor.empty() : tensor<8xi1>
  %initf = tensor.empty() : tensor<8xf32>
  %0 = linalg.generic {
      indexing_maps = [#map, #map],
      iterator_types = ["parallel"]} ins(%arg0 : tensor<8xf32>) outs(%initf : tensor<8xf32>) {
    ^bb0(%in0 : f32, %out : f32):
        %sqrt = math.sqrt %in0 : f32
        linalg.yield %sqrt : f32
    } -> tensor<8xf32>
  %1 = linalg.generic {
      indexing_maps = [#map, #map],
      iterator_types = ["parallel"]} ins(%arg1 : tensor<8xf32>) outs(%initf : tensor<8xf32>) {
    ^bb0(%in0 : f32, %out : f32):
        %sqrt = math.exp %in0 : f32
        linalg.yield %sqrt : f32
    } -> tensor<8xf32>
  %2 = linalg.generic {
      indexing_maps = [#map, #map, #map],
      iterator_types = ["parallel"]} ins(%0, %1 : tensor<8xf32>, tensor<8xf32>) outs(%init : tensor<8xi1>) {
    ^bb0(%in0 : f32, %in1 : f32, %out: i1):
      %cmp = arith.cmpf olt, %in0, %in1 : f32
      linalg.yield %cmp : i1
  } -> tensor<8xi1>
  %3 = linalg.generic {
      indexing_maps = [#map, #map, #map, #map],
      iterator_types = ["parallel"]} ins(%2, %0, %1 : tensor<8xi1>, tensor<8xf32>, tensor<8xf32>) outs(%initf : tensor<8xf32>) { 
    ^bb0(%in0 : i1, %in1 : f32, %in2 : f32, %out: f32):
      %select = arith.select %in0, %in1, %in2 : f32
      linalg.yield %select : f32
  } -> tensor<8xf32>
  return %3 : tensor<8xf32>
}

@llvmbot
Member

llvmbot commented Jun 19, 2025

@llvm/pr-subscribers-mlir-linalg

@llvm/pr-subscribers-mlir

Author: None (srcarroll)

Changes

This patch modifies FuseElementwiseOps and related code to work on any LinalgOp rather than only GenericOps. For example, this enables fusion for MapOps, which are always elementwise. The fundamental logic is unchanged and, for the most part, the changes simply replace GenericOp with LinalgOp.


Full diff: https://github.com/llvm/llvm-project/pull/144922.diff

4 Files Affected:

  • (modified) mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h (+2-2)
  • (modified) mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp (+42-21)
  • (modified) mlir/test/Dialect/Linalg/fusion-elementwise-ops.mlir (+21)
  • (modified) mlir/test/Dialect/Linalg/fusion-elementwise.mlir (+54)
diff --git a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
index 147a2907f52e4..f0c8f0de06637 100644
--- a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
+++ b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
@@ -529,8 +529,8 @@ fuseElementwiseOps(RewriterBase &rewriter, OpOperand *fusedOperand);
 /// * There is a chance that the implementation of the transformation does not
 /// agree with the result of this method. This function gives a prediction based
 /// on an optimized fusion.
-llvm::SmallDenseSet<int> getPreservedProducerResults(GenericOp producer,
-                                                     GenericOp consumer,
+llvm::SmallDenseSet<int> getPreservedProducerResults(LinalgOp producer,
+                                                     LinalgOp consumer,
                                                      OpOperand *fusedOperand);
 
 /// Try to peel and canonicalize loop `op` and return the new result.
diff --git a/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp b/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
index f97ed3d6d5111..fc435b47f5977 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
@@ -77,11 +77,11 @@ static AffineMap getIndexingMapOfProducerOperandsInCoordinatesOfFusedOp(
 // of the fused producer & consumer after the fusion can still compute the
 // bounds of the op.
 static bool isOpOperandCanBeDroppedAfterFusedLinalgs(
-    GenericOp producer, GenericOp consumer,
+    LinalgOp producer, LinalgOp consumer,
     ArrayRef<OpOperand *> opOperandsToIgnore) {
   SmallVector<AffineMap> indexingMaps;
 
-  SmallVector<GenericOp> ops = {producer, consumer};
+  SmallVector<LinalgOp> ops = {producer, consumer};
   for (auto &op : ops) {
     for (auto &opOperand : op->getOpOperands()) {
       if (llvm::is_contained(opOperandsToIgnore, &opOperand)) {
@@ -109,8 +109,9 @@ static bool isOpOperandCanBeDroppedAfterFusedLinalgs(
 /// * There is a chance that the implementation of the transformation does not
 /// agree with the result of this method. This function gives a prediction based
 /// on an optimized fusion.
-llvm::SmallDenseSet<int> mlir::linalg::getPreservedProducerResults(
-    GenericOp producer, GenericOp consumer, OpOperand *fusedOperand) {
+llvm::SmallDenseSet<int>
+mlir::linalg::getPreservedProducerResults(LinalgOp producer, LinalgOp consumer,
+                                          OpOperand *fusedOperand) {
   llvm::SmallDenseSet<int> preservedProducerResults;
   llvm::SmallVector<OpOperand *> opOperandsToIgnore;
 
@@ -140,8 +141,8 @@ bool mlir::linalg::areElementwiseOpsFusable(OpOperand *fusedOperand) {
   if (!fusedOperand)
     return false;
 
-  auto producer = fusedOperand->get().getDefiningOp<GenericOp>();
-  auto consumer = dyn_cast<GenericOp>(fusedOperand->getOwner());
+  auto producer = fusedOperand->get().getDefiningOp<LinalgOp>();
+  auto consumer = dyn_cast<LinalgOp>(fusedOperand->getOwner());
 
   // Check producer and consumer are generic ops.
   if (!producer || !consumer)
@@ -215,16 +216,39 @@ bool mlir::linalg::areElementwiseOpsFusable(OpOperand *fusedOperand) {
 /// Generate the region of the fused tensor operation. The region of the fused
 /// op must be empty.
 static void generateFusedElementwiseOpRegion(
-    RewriterBase &rewriter, GenericOp fusedOp,
+    RewriterBase &rewriter, LinalgOp fusedOp,
     AffineMap consumerToProducerLoopsMap, OpOperand *fusedOperand,
     unsigned nloops, llvm::SmallDenseSet<int> &preservedProducerResults) {
-  auto producer = cast<GenericOp>(fusedOperand->get().getDefiningOp());
-  auto consumer = cast<GenericOp>(fusedOperand->getOwner());
+  auto producer = cast<LinalgOp>(fusedOperand->get().getDefiningOp());
+  auto consumer = cast<LinalgOp>(fusedOperand->getOwner());
   // Build the region of the fused op.
+
+  // Since some ops, like `linalg.map`, do not have block arguments for init operands
+  // then we first "generalize" the block by adding arguments for init operands when
+  // they aren't present. We detect this case by checking if
+  // `getOpOperandsMatchingBBargs() == getDpsInputOperands(); 
   Block &producerBlock = producer->getRegion(0).front();
+  if (producer.getOpOperandsMatchingBBargs() ==
+      producer.getDpsInputOperands()) {
+    for (auto init : producer.getDpsInits()) {
+      Type bbType = isa<ShapedType>(init.getType())
+                        ? cast<ShapedType>(init.getType()).getElementType()
+                        : init.getType();
+      producerBlock.addArgument(bbType, producer.getLoc());
+    }
+  }
   Block &consumerBlock = consumer->getRegion(0).front();
+  if (consumer.getOpOperandsMatchingBBargs() ==
+      consumer.getDpsInputOperands()) {
+    for (auto init : consumer.getDpsInits()) {
+      Type bbType = isa<ShapedType>(init.getType())
+                        ? cast<ShapedType>(init.getType()).getElementType()
+                        : init.getType();
+      consumerBlock.addArgument(bbType, consumer.getLoc());
+    }
+  }
   OpBuilder::InsertionGuard guard(rewriter);
-  Block *fusedBlock = rewriter.createBlock(&fusedOp.getRegion());
+  Block *fusedBlock = rewriter.createBlock(&fusedOp->getRegion(0));
   IRMapping mapper;
 
   // 2. Add an index operation for every fused loop dimension and use the
@@ -330,7 +354,7 @@ static void generateFusedElementwiseOpRegion(
   rewriter.create<YieldOp>(fusedOp.getLoc(), fusedYieldValues);
 
   // Sanity checks.
-  assert(fusedBlock->getNumArguments() == fusedOp.getNumOperands() &&
+  assert(fusedBlock->getNumArguments() == fusedOp->getNumOperands() &&
          "Ill-formed GenericOp region");
 }
 
@@ -340,8 +364,8 @@ mlir::linalg::fuseElementwiseOps(RewriterBase &rewriter,
   assert(areElementwiseOpsFusable(fusedOperand) &&
          "expected elementwise operation pre-conditions to pass");
   auto producerResult = cast<OpResult>(fusedOperand->get());
-  auto producer = cast<GenericOp>(producerResult.getOwner());
-  auto consumer = cast<GenericOp>(fusedOperand->getOwner());
+  auto producer = cast<LinalgOp>(producerResult.getOwner());
+  auto consumer = cast<LinalgOp>(fusedOperand->getOwner());
   // TODO: allow fusing the producer of an output operand.
   assert(consumer.isDpsInput(fusedOperand) &&
          "expected producer of input operand");
@@ -418,10 +442,7 @@ mlir::linalg::fuseElementwiseOps(RewriterBase &rewriter,
   // Generate the fused op.
   auto fusedOp = rewriter.create<GenericOp>(
       consumer.getLoc(), fusedResultTypes, fusedInputOperands,
-      fusedOutputOperands, rewriter.getAffineMapArrayAttr(fusedIndexMaps),
-      consumer.getIteratorTypes(),
-      /*doc=*/nullptr,
-      /*library_call=*/nullptr);
+      fusedOutputOperands, fusedIndexMaps, consumer.getIteratorTypesArray());
   if (!fusedOp.getShapesToLoopsMap()) {
     // Fused op has invalid indexing maps. Typically this means something is off
     // in the input, but going ahead here would result in verification errors.
@@ -460,14 +481,14 @@ mlir::linalg::fuseElementwiseOps(RewriterBase &rewriter,
 
 namespace {
 /// Patterns to fuse a generic op, with the producer of its operands.
-class FuseElementwiseOps : public OpRewritePattern<GenericOp> {
+class FuseElementwiseOps : public OpInterfaceRewritePattern<LinalgOp> {
 public:
   FuseElementwiseOps(MLIRContext *context, ControlFusionFn fun,
                      PatternBenefit benefit = 1)
-      : OpRewritePattern<GenericOp>(context, benefit),
+      : OpInterfaceRewritePattern<LinalgOp>(context, benefit),
         controlFn(std::move(fun)) {}
 
-  LogicalResult matchAndRewrite(GenericOp genericOp,
+  LogicalResult matchAndRewrite(LinalgOp genericOp,
                                 PatternRewriter &rewriter) const override {
     // Find the first operand that is defined by another generic op on tensors.
     for (OpOperand &opOperand : genericOp->getOpOperands()) {
@@ -494,7 +515,7 @@ class FuseElementwiseOps : public OpRewritePattern<GenericOp> {
       rewriter.eraseOp(genericOp);
       return success();
     }
-    return failure();
+    return rewriter.notifyMatchFailure(genericOp, "no fusable operands");
   }
 
 private:
diff --git a/mlir/test/Dialect/Linalg/fusion-elementwise-ops.mlir b/mlir/test/Dialect/Linalg/fusion-elementwise-ops.mlir
index 66fc55fadf8fa..b581567cf57a7 100644
--- a/mlir/test/Dialect/Linalg/fusion-elementwise-ops.mlir
+++ b/mlir/test/Dialect/Linalg/fusion-elementwise-ops.mlir
@@ -1014,3 +1014,24 @@ module {
 //   CHECK-DAG:     %[[T3:.+]] = arith.addf %[[T2]], %[[B1]]
 //       CHECK:     linalg.yield %[[T3]] : f32
 //       CHECK:   return %[[GENERIC]]
+
+// -----
+
+func.func @map_ops(%in1: tensor<8xf32>, %in2: tensor<8xf32>) -> tensor<8xf32> {
+    %fill = tensor.empty() : tensor<8xf32>
+    %add = linalg.map {arith.addf} ins(%in1, %in2: tensor<8xf32>, tensor<8xf32>) outs(%fill: tensor<8xf32>)
+    %mapped_65 = linalg.map { math.sqrt } ins(%add : tensor<8xf32>) outs(%fill : tensor<8xf32>)
+    return %mapped_65 : tensor<8xf32>
+}
+
+// CHECK-LABEL: func @map_ops
+//  CHECK-SAME:   %[[ARG0:[a-zA-Z0-9]+]]: tensor<8xf32>
+//  CHECK-SAME:   %[[ARG1:[a-zA-Z0-9]+]]: tensor<8xf32>
+//       CHECK:   %[[EMPTY:.+]] = tensor.empty() : tensor<8xf32>
+//       CHECK:   %[[FUSED_OP:.+]] = linalg.generic
+//  CHECK-SAME:       ins(%[[ARG0]], %[[ARG1]] : {{.*}}) outs(%[[EMPTY]] :
+//  CHECK-NEXT:   ^bb0(%[[IN0:.*]]: f32, %[[IN1:.*]]: f32, %[[OUT:.*]]: f32):
+//  CHECK-NEXT:     %[[ADD:.*]] = arith.addf %[[IN0]], %[[IN1]]
+//  CHECK-NEXT:     %[[SQRT:.*]] = math.sqrt %[[ADD]]
+//  CHECK-NEXT:     linalg.yield %[[SQRT]] 
+//   CHECK-NOT:   linalg.generic
diff --git a/mlir/test/Dialect/Linalg/fusion-elementwise.mlir b/mlir/test/Dialect/Linalg/fusion-elementwise.mlir
index bd9977f1410b9..18ca8b42fa79c 100644
--- a/mlir/test/Dialect/Linalg/fusion-elementwise.mlir
+++ b/mlir/test/Dialect/Linalg/fusion-elementwise.mlir
@@ -59,3 +59,57 @@ func.func @handle_unused_operands(%arg0: tensor<8xf32>, %arg1: tensor<8xf32>) ->
 //       CHECK:   %[[FUSED_OP:.+]] = linalg.generic
 //  CHECK-SAME:       outs(%[[EMPTY]] :
 //   CHECK-NOT:   linalg.generic
+
+// -----
+
+func.func @map_ops(%in1: tensor<8xf32>, %in2: tensor<8xf32>) -> tensor<8xf32> {
+    %fill = tensor.empty() : tensor<8xf32>
+    %add = linalg.map {arith.addf} ins(%in1, %in2: tensor<8xf32>, tensor<8xf32>) outs(%fill: tensor<8xf32>)
+    %mapped_65 = linalg.map { math.sqrt } ins(%add : tensor<8xf32>) outs(%fill : tensor<8xf32>)
+    return %mapped_65 : tensor<8xf32>
+}
+
+// CHECK-LABEL: func @map_ops
+//  CHECK-SAME:   %[[ARG0:[a-zA-Z0-9]+]]: tensor<8xf32>
+//  CHECK-SAME:   %[[ARG1:[a-zA-Z0-9]+]]: tensor<8xf32>
+//       CHECK:   %[[EMPTY:.+]] = tensor.empty() : tensor<8xf32>
+//       CHECK:   %[[FUSED_OP:.+]] = linalg.generic
+//  CHECK-SAME:       ins(%[[ARG0]], %[[ARG1]] : {{.*}}) outs(%[[EMPTY]] :
+//  CHECK-NEXT:   ^bb0(%[[IN0:.*]]: f32, %[[IN1:.*]]: f32, %[[OUT:.*]]: f32):
+//  CHECK-NEXT:     %[[ADD:.*]] = arith.addf %[[IN0]], %[[IN1]]
+//  CHECK-NEXT:     %[[SQRT:.*]] = math.sqrt %[[ADD]]
+//  CHECK-NEXT:     linalg.yield %[[SQRT]] 
+//   CHECK-NOT:   linalg.map
+
+// -----
+
+func.func @map_ops_mixed_types(%arg0: tensor<8xf32>, %arg1: tensor<8xf32>) -> tensor<8xf32> {
+  %init = tensor.empty() : tensor<8xi1>
+  %initf = tensor.empty() : tensor<8xf32>
+  %0 = linalg.map {math.sqrt} ins(%arg0 : tensor<8xf32>) outs(%initf : tensor<8xf32>)
+  %1 = linalg.map {math.exp} ins(%arg1 : tensor<8xf32>) outs(%initf : tensor<8xf32>)
+  %2 = linalg.map ins(%0, %1 : tensor<8xf32>, tensor<8xf32>) outs (%init : tensor<8xi1>)
+    (%in0 : f32, %in1 : f32) {
+      %cmp = arith.cmpf olt, %in0, %in1 : f32
+      linalg.yield %cmp : i1
+  }
+  %3 = linalg.map { arith.select } ins(%2, %0, %1 : tensor<8xi1>, tensor<8xf32>, tensor<8xf32>) outs(%initf : tensor<8xf32>) 
+  return %3 : tensor<8xf32>
+}
+
+// CHECK-LABEL: func @map_ops_mixed_types
+//  CHECK-SAME:   %[[ARG0:[a-zA-Z0-9]+]]: tensor<8xf32>
+//  CHECK-SAME:   %[[ARG1:[a-zA-Z0-9]+]]: tensor<8xf32>
+//       CHECK:   %[[EMPTY:.+]] = tensor.empty() : tensor<8xf32>
+//       CHECK:   %[[FUSED_OP:.+]] = linalg.generic
+//  CHECK-SAME:       ins(%[[ARG0]], %[[ARG1]] : {{.*}}) outs(%[[EMPTY]] :
+//  CHECK-NEXT:   ^bb0(%[[IN0:.*]]: f32, %[[IN1:.*]]: f32, %[[OUT:.*]]: f32):
+//  CHECK-NEXT:     %[[EXP0:.*]] = math.exp %[[IN1]]
+//  CHECK-NEXT:     %[[SQRT0:.*]] = math.sqrt %[[IN0]]
+//  CHECK-NEXT:     %[[EXP1:.*]] = math.exp %[[IN1]]
+//  CHECK-NEXT:     %[[SQRT1:.*]] = math.sqrt %[[IN0]]
+//  CHECK-NEXT:     %[[CMP:.*]] = arith.cmpf olt, %[[SQRT1]], %[[EXP1]]
+//  CHECK-NEXT:     %[[RES:.*]] = arith.select %[[CMP]], %[[SQRT0]], %[[EXP0]]
+//  CHECK-NEXT:     linalg.yield %[[RES]] 
+//   CHECK-NOT:   linalg.map
+

@srcarroll
Contributor Author

The current changes in this PR will fuse linalg.map ops, but produce a linalg.generic even when it is technically possible to generate another linalg.map. I don't know enough to modify the code in a sane way to get linalg.map outputs (when appropriate), so if anyone has any ideas, I would love to hear them. I don't think it's important enough to go crazy with, but it would be nice.

After a chat with @rengolin here, in particular this point:

"In this case, a fusion of two elementwise operations would yield a map" (rengolin, post 6, topic 83927)

it's probably important enough to warrant discussion.

@srcarroll
Contributor Author

srcarroll commented Jun 20, 2025

It occurs to me now that options like fuse-generic-ops no longer make sense (although it didn't make much sense to me in the first place, since generic ops were the only ops being fused). I think we can get rid of this option altogether in favor of fuse-generic-ops-control. However, I think the functionality of that option should change: since it always just uses the setFusedOpOperandLimit<4> control function, how about we repurpose it to set the limit number, e.g. fuse-operand-limit=4?

Any thoughts?

Edit: Actually, I was just confused by the naming, but I see now how fuse-generic-ops is intended to be used in the context of the rest of the options. I still think it isn't necessary if absorbed into fuse-generic-ops-control. So, for example, fuse-operand-limit=0 and fuse-operand-limit=max could cover the same functionality as fuse-generic-ops.
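For illustration, a runtime version of that control could look roughly like this (hypothetical helper; the exact operand count being bounded is approximate):

    // Hypothetical sketch: a runtime-configurable operand limit that a
    // fuse-operand-limit=N pass option could forward to, instead of the
    // compile-time setFusedOpOperandLimit<4>.
    ControlFusionFn makeOperandLimitControlFn(unsigned limit) {
      return [limit](OpOperand *fusedOperand) {
        Operation *producer = fusedOperand->get().getDefiningOp();
        if (!producer)
          return false;
        Operation *consumer = fusedOperand->getOwner();
        // Roughly bound the operand count of the would-be fused op.
        return producer->getNumOperands() + consumer->getNumOperands() <= limit;
      };
    }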

@srcarroll srcarroll changed the title [mlir][linalg] Elementwise fusion for any LinalgOp [mlir][linalg] Extend FuseElementwiseOps pattern to work with named ops Jun 20, 2025
@srcarroll
Contributor Author

I'm now realizing I should extend all the patterns in populateElementwiseOpsFusionPatterns. It should be similarly straightforward. Since the scope is larger than I initially thought, I'll wait for feedback before making any major changes.

@rengolin rengolin requested a review from adam-smnk June 20, 2025 09:31
@rengolin
Member

Another question: If we just assume LinalgOp, then wouldn't that also match matmul, contract, softmax and other non-ew operations? Or do we have an explicit check for EW maps?

Would be good to prove that with some negative tests (at least one for contract and one for softmax).

@srcarroll
Contributor Author

srcarroll commented Oct 31, 2025

Another question: If we just assume LinalgOp, then wouldn't that also match matmul, contract, softmax and other non-ew operations? Or do we have an explicit check for EW maps?

Would be good to prove that with some negative tests (at least one for contract and one for softmax).

Good question. The name of this pattern is misleading: it already works for non-elementwise ops. You can see in the existing tests that contraction ops in linalg.generic form get fused; see https://github.com/llvm/llvm-project/blob/c620d074a3bded54072db8a3bef14430accd007b/mlir/test/Dialect/Linalg/fusion-elementwise-ops.mlir. In my opinion, if it's allowed for the generic form it should also be allowed for the named form. This pattern has its own rules for whether it fuses operations or not, regardless of whether they are elementwise. If matmul + some other op obey those rules, then it should be fused. If that's undesirable, then the pattern logic should change, but that's not really an issue specific to enabling the pattern to any LinalgOp.

I'll do a little more investigating on this

@srcarroll
Contributor Author

srcarroll commented Oct 31, 2025

@rengolin just a bit more info

This imposes that the producer is elementwise. But here they allow consumers to have reductions.

@rengolin
Member

rengolin commented Nov 1, 2025

If matmul + some other op obey those rules, then it should be fused. If that's undesirable, then the pattern logic should change, but that's not really an issue specific to enabling the pattern to any LinalgOp.

Not directly, I agree, but the current state would limit the damage, if this was really a problem. Widening the scope needs to be done with care. @ftynse @nicolasvasilache, any insight here?

One way to at least document the intention is to actually add tests for the things that make sense (to work and not to work) in this PR, so that when this breaks something, people have something to go on.

@srcarroll
Contributor Author

srcarroll commented Nov 1, 2025

If matmul + some other op obey those rules, then it should be fused. If that's undesirable, then the pattern logic should change, but that's not really an issue specific to enabling the pattern to any LinalgOp.

Not directly, I agree, but the current state would limit the damage, if this was really a problem. Widening the scope needs to be done with care. @ftynse @nicolasvasilache, any insight here?

One way to at least document the intention is to actually add tests for the things that make sense (to work and not to work) in this PR, so that when this breaks something, people have something to go on.

Sure, I'll add some more test cases. Is there anything in particular you would like to see, in case I miss something of real concern?

Also, to be clear, do you think enabling fusion for, say, linalg.matmul is a bad thing, or is it ok? I personally think it's a good thing, but only in the context of having more options. With the pattern only defined on linalg.generic, my options are limited; on the other hand, fusing ANY op that satisfies the conditions also limits choices. The nice thing is that this pattern allows for custom control logic via the controlFn callback. I don't actually use the built-in passes; rather, I make my own pass by reusing the pattern with my own control logic. So, for example, if I don't want to fuse any reduction consumers, I can do something like:

    // Don't allow elementwise + reduction fusion
    auto controlFn = [](OpOperand *operand) {
      auto owner = cast<LinalgOp>(operand->getOwner());
      return !llvm::any_of(owner.getIteratorTypesArray(), isReductionIterator);
    };
    RewritePatternSet fusionPatterns(ctx);
    populateElementwiseOpsFusionPatterns(fusionPatterns, controlFn);

So this, together with enabling the pattern for any LinalgOp, gives us the most options.

@srcarroll
Contributor Author

srcarroll commented Nov 1, 2025

@rengolin Here's another idea. Maybe, to start, I could just modify the controlFn for the linalg-fuse-elementwise-ops pass specifically to only allow linalg.generic. This would keep the current behavior for now, while also giving people more options with the pattern, so everyone would win. Then we would have more time to think about whether linalg-fuse-elementwise-ops should change.

I certainly don't mind adding more tests as you requested, and could still do so for the test-linalg-elementwise-fusion-patterns test pass regardless of what we decide to do with the linalg-fuse-elementwise-ops pass. But this is just a thought.
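Concretely, the control function for that could be as small as something like this (rough sketch; `patterns` is just a placeholder RewritePatternSet):

    // Sketch: restrict fusion to today's behavior by only fusing when both the
    // producer and the consumer are linalg.generic ops.
    ControlFusionFn restrictToGenerics = [](OpOperand *fusedOperand) {
      Operation *producer = fusedOperand->get().getDefiningOp();
      Operation *consumer = fusedOperand->getOwner();
      return isa_and_nonnull<GenericOp>(producer) && isa<GenericOp>(consumer);
    };
    populateElementwiseOpsFusionPatterns(patterns, restrictToGenerics);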

@MaheshRavishankar
Contributor

@rengolin just a bit more info

This imposes that the producer is elementwise. But here they allow consumers to have reductions.

Both the links are the same. Did you intend that?

Maybe, to start, I could just modify the controlFn for the linalg-fuse-elementwise-ops pass specifically to only allow linalg.generic. This would keep the current behavior for now, while also giving people more options with the pattern, so everyone would win. Then we would have more time to think about whether linalg-fuse-elementwise-ops should change.

That might not be enough. For the most part, I think of passes like linalg-fuse-elementwise-ops as mostly legacy. I think a lot of users use the fuseElementwiseOps method directly, and making it handle named ops is an unexpected change, since the fusion would now also generalize while fusing.

The best way forward might be to add a new method fuseNamedOpsAsElementwiseOps and make the core implementation templatized. That way the original method stays as is and only works on generic ops, and the new method works on named ops + generic ops.
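Roughly something like this (sketch only; the names, and whether the shared piece is a template or a plain helper, are up for discussion; the return type assumes the existing fuseElementwiseOps signature):

    // Shared implementation that works on any LinalgOp producer/consumer.
    FailureOr<ElementwiseOpFusionResult>
    fuseElementwiseOpsImpl(RewriterBase &rewriter, OpOperand *fusedOperand);

    // Existing entry point keeps its current contract: generic ops only.
    FailureOr<ElementwiseOpFusionResult>
    fuseElementwiseOps(RewriterBase &rewriter, OpOperand *fusedOperand) {
      if (!isa_and_nonnull<GenericOp>(fusedOperand->get().getDefiningOp()) ||
          !isa<GenericOp>(fusedOperand->getOwner()))
        return failure();
      return fuseElementwiseOpsImpl(rewriter, fusedOperand);
    }

    // New opt-in entry point: also accepts named ops and generalizes while fusing.
    FailureOr<ElementwiseOpFusionResult>
    fuseNamedOpsAsElementwiseOps(RewriterBase &rewriter, OpOperand *fusedOperand) {
      return fuseElementwiseOpsImpl(rewriter, fusedOperand);
    }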

@srcarroll
Contributor Author

srcarroll commented Nov 4, 2025

@rengolin just a bit more info
This imposes that the producer is elementwise. But here they allow consumers to have reductions.

Both the links are the same. Did you intend that?

No, just a mistake. Updated to fix.

Maybe, to start, I could just modify the controlFn for the linalg-fuse-elementwise-ops pass specifically to only allow linalg.generic. This would keep the current behavior for now, while also giving people more options with the pattern, so everyone would win. Then we would have more time to think about whether linalg-fuse-elementwise-ops should change.

That might not be enough. For the most part, I think of passes like linalg-fuse-elementwise-ops as mostly legacy. I think a lot of users use the fuseElementwiseOps method directly, and making it handle named ops is an unexpected change, since the fusion would now also generalize while fusing.

Ok, makes sense. Although they would already have to generalize named ops to get any use out of the previous implementation anyway. So the end result would be the same, besides the change to MapOp that allows it to be generalized and, thus, fused.

The best way forward might be to add a new method fuseNamedOpsAsElementwiseOps and make the core implementation templatized. That way the original method stays as is and only works on generic ops, and the new method works on named ops + generic ops.

I think I disagree. Users can already change their controlFn when calling populateElementwiseOpsFusionPatterns if they don't want to opt into this. Or if they aren't using this, but rather fuseElementwiseOps directly, they can still have their own logic to avoid calling this when they don't want to (just like populateElementwiseOpsFusionPatterns does). So I don't see any benefit in adding more code

@MaheshRavishankar
Contributor

Ok, makes sense. Although they would already have to generalize named ops to get any use out of the previous implementation anyway. So the end result would be the same, besides the change to MapOp that allows it to be generalized and, thus, fused.

Not everyone goes down the named-op path for all elementwise operations. The most common use case is to use named ops for compute ops like matmul/convs/etc. and generic ops for elementwise operations.

I think I disagree. Users can already change their controlFn when calling populateElementwiseOpsFusionPatterns if they don't want to opt into this. Or if they aren't using this, but rather fuseElementwiseOps directly, they can still have their own logic to avoid calling this when they don't want to (just like populateElementwiseOpsFusionPatterns does). So I don't see any benefit in adding more code

There are many users of this functionality, all using this for elementwise operations. There is not much code duplication. The implementation is common, just a separate entry point without causing all downstream users to HAVE to update the controlFn. It would be good to not break the world.

@srcarroll
Contributor Author

There are many users of this functionality, all using this for elementwise operations. There is not much code duplication. The implementation is common, just a separate entry point without causing all downstream users to HAVE to update the controlFn. It would be good to not break the world.

Having to change user code due to upstream changes is just how using open source code works. That's not a good reason to avoid useful changes in upstream. I don't consider this "breaking the world", and that's a pretty alarmist response that isn't helpful code review. And by the way I have to make non-trivial changes all the time when updating upstream, and I'm the only compiler engineer in my company. So I have no sympathy for other users having to change code, especially those from big companies that have plenty of engineering resources.

If there's a specific concern, then please elaborate. Don't just expect the worst and guess that things will break.

@MaheshRavishankar
Contributor

Having to change user code due to upstream changes is just how using open source code works. That's not a good reason to avoid useful changes in upstream. I don't consider this "breaking the world", and that's a pretty alarmist response that isn't helpful code review. And by the way I have to make non-trivial changes all the time when updating upstream, and I'm the only compiler engineer in my company. So I have no sympathy for other users having to change code, especially those from big companies that have plenty of engineering resources.

If there's a specific concern, then please elaborate. Don't just expect the worst and guess that things will break.

I don't understand the problem here. It's just a new entry point that reuses the code. I don't see why there's so much pushback against something that straightforward. You are saying this is a small change. As a long-time maintainer (and someone who wrote most of this code), I am saying this is a fairly substantial change, and it's better to just add a new entry point that handles named ops + generic ops. I think that's quite reasonable.

@srcarroll
Contributor Author

srcarroll commented Nov 4, 2025

Having to change user code due to upstream changes is just how using open source code works. That's not a good reason to avoid useful changes in upstream. I don't consider this "breaking the world", and that's a pretty alarmist response that isn't helpful code review. And by the way I have to make non-trivial changes all the time when updating upstream, and I'm the only compiler engineer in my company. So I have no sympathy for other users having to change code, especially those from big companies that have plenty of engineering resources.
If there's a specific concern, then please elaborate. Don't just expect the worst and guess that things will break.

I don't understand the problem here. It's just a new entry point that reuses the code. I don't see why there's so much pushback against something that straightforward. You are saying this is a small change. As a long-time maintainer (and someone who wrote most of this code), I am saying this is a fairly substantial change, and it's better to just add a new entry point that handles named ops + generic ops. I think that's quite reasonable.

I'm objecting more to the reason for your suggestion than pushing back on the change itself. Your suggestion is predicated on the fact that you don't want to make changes in your downstream project. This will never be a valid reason to me, and I think most MLIR contributors that aren't invested in their org's self-interests would agree with me as well.

If the reason was more of a matter of actual functionality, then fine. If this PR in its current state makes it impossible for you to make changes to keep the same behavior, or the kinds of changes required are unreasonable, then I'd be happy make further changes to avoid that. Moreover, if your suggestion is useful in isolation as more of a code organization thing, then I'm also happy to accommodate.

But the only reason you've given me for the change is that "I don't want to update my downstream code", and sorry, but I couldn't care less about that weak excuse. I get this excuse all the time, and I've long since lost my patience for hearing it, so I'm sorry for being blunt about this.

Nevertheless, I will put in some effort to think about your suggestion more and decide for myself if it makes sense on its own outside the context of "I dont want to update my code". And again, it would help to see concrete reasons why this won't work for downstream users. I'm under no delusion that I've thought of everything that could possibly happen so would definitely oblige anyone who is willing to talk about legitimate concrete implications of the changes in this PR.

@MaheshRavishankar
Contributor

Having to change user code due to upstream changes is just how using open source code works. That's not a good reason to avoid useful changes in upstream. I don't consider this "breaking the world", and that's a pretty alarmist response that isn't helpful code review. And by the way I have to make non-trivial changes all the time when updating upstream, and I'm the only compiler engineer in my company. So I have no sympathy for other users having to change code, especially those from big companies that have plenty of engineering resources.
If there's a specific concern, then please elaborate. Don't just expect the worst and guess that things will break.

I don't understand the problem here. It's just a new entry point that reuses the code. I don't see why there's so much pushback against something that straightforward. You are saying this is a small change. As a long-time maintainer (and someone who wrote most of this code), I am saying this is a fairly substantial change, and it's better to just add a new entry point that handles named ops + generic ops. I think that's quite reasonable.

I'm objecting more to the reason for your suggestion than pushing back on the change itself. Your suggestion is predicated on the fact that you don't want to make changes in your downstream project. This will never be a valid reason to me, and I think most MLIR contributors that aren't invested in their org's self-interests would agree with me as well.

Sorry, this isn't true. MY downstream project does not need to do anything; it already handles this change (and I can fix whatever needs to be fixed downstream for MY project). I am asking for a change so as not to break all OTHER downstream projects. The tone of the response reads unnecessarily hostile to me. In my opinion this is a fundamental change in what this transform does. Earlier it would take two linalg.generics and generate one linalg.generic. Now it takes any linalg named ops and produces a linalg.generic, so it is generalizing and fusing. That is a significant change in my book. You can reuse the implementation, but it needs to be a separate entry point.

If the reason was more of a matter of actual functionality, then fine. If this PR in its current state makes it impossible for you to make changes to keep the same behavior, or the kinds of changes required are unreasonable, then I'd be happy make further changes to avoid that. Moreover, if your suggestion is useful in isolation as more of a code organization thing, then I'm also happy to accommodate.

But the only reason you've given me for the change is that "I don't want to update my downstream code", and sorry, but I couldn't care less about that weak excuse. I get this excuse all the time, and I've long since lost my patience for hearing it, so I'm sorry for being blunt about this.

Nevertheless, I will put in some effort to think about your suggestion more and decide for myself if it makes sense on its own outside the context of "I dont want to update my code". And again, it would help to see concrete reasons why this won't work for downstream users. I'm under no delusion that I've thought of everything that could possibly happen so would definitely oblige anyone who is willing to talk about legitimate concrete implications of the changes in this PR.

Again, please be mindful about the tone of your responses. Upstream code affects a lot of projects and not just your particular use case. I am not concerned about MY downstream project; I know enough to mitigate any impact of this change. But I know these APIs are used in many places, and I think it would be a significant change for users to have to adapt to. This is why I am asking for this change to be opt-in, rather than just breaking things when people bump past it.

Please be mindful of the tone. It is true that I work downstream, but I have worked on MLIR enough and am very capable of separating my downstream project concerns from my concerns as an MLIR developer.

@srcarroll
Contributor Author

srcarroll commented Nov 5, 2025

@MaheshRavishankar My choice of words does seem directed at you personally. I mean no disrespect; that's my mistake and not my true intention. When I say "you", I mean it more as a placeholder for any person trying to use the argument I'm objecting to ("I don't want to change my code"). For that I apologize. But, whether you are speaking for yourself or for other people, I still stand behind my statement that it's a weak argument.

Upstream code affects a lot of projects and not just your particular use case.

I'm aware of that, I do care about it, and I do give it significant thought, not just for my own use case. If I only cared about my own use case, then I would keep these changes to myself; I don't need to upstream them. I do it when I firmly believe it's a benefit to the community. That's not to say I'm always right, so I'm very open to feedback from others to help me understand things I miss. And again, "I don't want to change my code" provides zero value in helping me understand.

You can reuse the implementation, but it needs to be a separate entry point

I'm not opposed to this for the right reason. Again, I will look into your suggestion, or think about how this should be abstracted in other ways that make sense outside the context of "I don't want to change my code". I just have to see it myself first to think through the implications.

@joker-eph
Collaborator

joker-eph commented Nov 5, 2025

@srcarroll your take seems reasonable to me. This is aligned with the usual value of working upstream: we bias towards future users rather than incumbents. It is also a matter of encouraging upstream evolution and maintenance: having to jump through too many hoops just because of "fears" of causing downstream to update some APIs discourages upstream contributors, in particular those who aren't necessarily paid to work upstream. Reducing the overhead/friction to improve the codebase is critical, and the willingness to break APIs (with reasons) is part of it.

Edit: I'm not judging whether your patch here in particular is good or not, just the angle at which you're looking at it and the kind of rationale you've been trying to apply.

Block &consumerBlock = consumer->getRegion(0).front();
OpBuilder::InsertionGuard guard(rewriter);
Block *fusedBlock = rewriter.createBlock(&fusedOp.getRegion());
Block *fusedBlock = rewriter.createBlock(&fusedOp->getRegion(0));
Member

@Groverkss Groverkss Nov 5, 2025

This is wrong. Please use interface methods to do it correctly:

/*methodName=*/"getRegionBuilder",

or

Contributor Author

getRegionBuilder is wrong since that just gets a function ref, but I will use getBlock. Thanks for pointing it out.

Contributor Author

I'm actually unsure how to apply this suggestion. The other one above makes sense since I'm just getting the blocks that have already been created. However, this one is creating a block in the op's region, and I don't see an interface method for getting a region. If region 0 isn't guaranteed, shouldn't the interface have a method for that?

Anyway, fusedOp doesn't have to be a LinalgOp for these initial changes since the function that calls this one explicitly creates a GenericOp (see https://github.com/llvm/llvm-project/pull/144922/files#diff-a7543973103a3f3abb605911ca6d141dc4ffd4782b2bc0ad57890d11ab72e2c1R422). So it's probably better to just declare fusedOp as GenericOp for this function and revert this line. Any thoughts?

@rengolin and I had a discussion back when this PR first went up about generating named ops post-fusion when possible. I think it was agreed that it makes sense to leave this as a TODO (see #144922 (comment)). So when we get there we can revisit how to do this, unless there's an obvious solution now.

Member

Anyway, fusedOp doesn't have to be a LinalgOp for these initial changes since the function that calls this one explicitly creates a GenericOp (see https://github.com/llvm/llvm-project/pull/144922/files#diff-a7543973103a3f3abb605911ca6d141dc4ffd4782b2bc0ad57890d11ab72e2c1R422). So it's probably better to just declare fusedOp as GenericOp for this function and revert this line. Any thoughts?

Probably better to declare it as a GenericOp, yes, since the transformation (today) always returns a GenericOp anyway.

@rengolin and I had a discussion back when this PR first went up about generating named ops post-fusion when possible. I think it was agreed that it makes sense to leave this as a TODO (see #144922 (comment)). So when we get there we can revisit how to do this, unless there's an obvious solution now.

I think that is a different problem. My main concern is expecting something from an interface when the interface doesn't guarantee it.

Comment on lines 227 to 228
Block &producerBlock = producer->getRegion(0).front();
Block &consumerBlock = consumer->getRegion(0).front();
Member

@Groverkss Groverkss Nov 5, 2025

You cannot do this on an interface. There is no guarantee that the op implementing this interface has region 0 as the region of the op. You can use the interface methods to get this:

Contributor Author

Ah yes, thanks.
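For reference, a minimal sketch of what the interface-based access could look like, assuming the getBlock() accessor mentioned above:

    // Sketch: use the LinalgOp interface accessor instead of assuming the body
    // lives in region 0 of the op.
    Block &producerBlock = *producer.getBlock();
    Block &consumerBlock = *consumer.getBlock();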

@Groverkss Groverkss dismissed their stale review November 5, 2025 04:58

I just had technical comments that were resolved.

Contributor

@MaheshRavishankar MaheshRavishankar left a comment

I have serious concerns about the change as is.

@MaheshRavishankar
Contributor

@srcarroll your take seems reasonable to me. This is aligned with the usual value of working upstream: we bias towards future users rather than incumbents. It is also a matter of encouraging upstream evolution and maintenance: having to jump through too many hoops just because of "fears" of causing downstream to update some APIs discourages upstream contributors, in particular those who aren't necessarily paid to work upstream. Reducing the overhead/friction to improve the codebase is critical, and the willingness to break APIs (with reasons) is part of it.

Please read through the entire thread. It's not just about breakage; it's about expected functionality. This is doing two things at once, and that is not what this transform is intended for. This is new functionality that combines two transformations, "generalization" and "fusion". I think they should be kept separate.

Edit: I'm not judging whether your patch here in particular is good or not, just the angle at which you're looking at it and the kind of rationale you've been trying to apply.

@ftynse
Member

ftynse commented Nov 5, 2025

I suspect we will need @llvm/mlir-area-team on this.

My personal opinion is that the change may go through as is, though we may need some notice-related process around it. LLVM de facto does not have any guarantee on C++ API stability. I don't think this was ever codified in the policy, but it was discussed repeatedly, e.g. here: https://discourse.llvm.org/t/explicitly-spelling-out-the-lack-of-stability-for-the-c-api-in-the-developer-policy/55952. I will go as far as to claim that this is an incentive to put code upstream, to shift the update burden. I am also rather reluctant to bloat the upstream API with similar-but-different calls.

That being said, since all contributors work on MLIR because they use it downstream, I think it is common courtesy to minimize extra workload for everyone by staging the change when reasonable. The upstream committer may have to expend extra effort, but they will ultimately save time if everyone else does the same. Usually we do this through a PSA on Discourse that points to the commit, explains how to adapt to it, and provides some time frame after which the change will be effective. This may mean that the PR is approved without landing immediately, or that the main implementation lands but the switch of the default behavior happens after some period, depending on the complexity of the change. Maybe we need to adapt this policy https://llvm.org/docs/DeveloperPolicy.html#making-potentially-breaking-changes and establish a vendors team for MLIR to get relevant folks notified.

An alternative suggestion is to (1) land this with the default control function disabling fusion of named ops, (2) post a PSA on the forum that the default will change to fusing named ops in ~2 weeks, given that the change is rather minor, and (3) resolve any comments and land the switch of the default. If there are downstreams other than the one @MaheshRavishankar maintains that have concerns, they should chime in. If they don't put in the effort of doing so, why should we put in the effort to adapt to them?

@ftynse
Member

ftynse commented Nov 5, 2025

On the change itself, the rationale https://mlir.llvm.org/docs/Rationale/RationaleLinalgDialect/#progressive-lowering-dont-lose-information-too-quicklya-nameprogressive_loweringa explicitly lists fusion as a transformation that drops high-level information, which can be interpreted as a desire to give users control over generalization behavior during fusion. This would be aligned with @MaheshRavishankar's suggestion, though not because it breaks downstreams but because it would better reflect the design rationale of linalg. The control function does so, but maybe we can have several pre-defined ones for an easy switch?
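For example, something along these lines (purely illustrative; hypothetical helper names):

    // Fuse only when producer and consumer are already linalg.generic, i.e.
    // never generalize a named op as part of fusion.
    inline ControlFusionFn fuseGenericOpsOnly() {
      return [](OpOperand *fusedOperand) {
        return isa_and_nonnull<GenericOp>(fusedOperand->get().getDefiningOp()) &&
               isa<GenericOp>(fusedOperand->getOwner());
      };
    }

    // Allow named ops, but only when both ops are purely elementwise
    // (all-parallel iterators), so e.g. matmul or softmax never get pulled in.
    inline ControlFusionFn fuseElementwiseLinalgOpsOnly() {
      return [](OpOperand *fusedOperand) {
        auto producer = fusedOperand->get().getDefiningOp<LinalgOp>();
        auto consumer = dyn_cast<LinalgOp>(fusedOperand->getOwner());
        return producer && consumer &&
               producer.getNumParallelLoops() == producer.getNumLoops() &&
               consumer.getNumParallelLoops() == consumer.getNumLoops();
      };
    }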
