Conversation

@oowekyala
Contributor

Fix #93973. This allows using linalg.reduce to, e.g., reduce several tensors into one. The current implementation requires the same number of inputs and outputs.

@github-actions

github-actions bot commented Sep 2, 2024

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository; in that case, you can instead tag reviewers by name in a comment using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by pinging the PR with a comment saying "Ping". The common courtesy ping rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Member

llvmbot commented Sep 2, 2024

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-linalg

Author: Clément Fournier (oowekyala)

Changes

Fix #93973. This allows using linalg.reduce to, e.g., reduce several tensors into one. The current implementation requires the same number of inputs and outputs.


Full diff: https://github.com/llvm/llvm-project/pull/107005.diff

3 Files Affected:

  • (modified) mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td (+1-1)
  • (modified) mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp (+11-5)
  • (modified) mlir/test/Dialect/Linalg/roundtrip.mlir (+42)
diff --git a/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td b/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td
index ac61117c3d6e36..f20f036d6fe480 100644
--- a/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td
+++ b/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td
@@ -311,7 +311,7 @@ def MapOp : LinalgStructuredBase_Op<"map", [
 def ReduceOp : LinalgStructuredBase_Op<"reduce", [
     DeclareOpInterfaceMethods<OpAsmOpInterface, ["getAsmResultNames"]>,
     DeclareOpInterfaceMethods<OpAsmOpInterface, ["getAsmBlockArgumentNames"]>,
-    SameVariadicOperandSize,
+    AttrSizedOperandSegments,
     SingleBlockImplicitTerminator<"YieldOp">]> {
   let summary = "Reduce operator";
   let description = [{
diff --git a/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp b/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
index 76df3ecf2d2bd4..9c6c36075b55bd 100644
--- a/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
+++ b/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
@@ -1301,11 +1301,12 @@ LogicalResult GenericOp::fold(FoldAdaptor, SmallVectorImpl<OpFoldResult> &) {
 static ParseResult parseDstStyleOp(
     OpAsmParser &parser, OperationState &result,
     function_ref<ParseResult(OpAsmParser &, NamedAttrList &)> parseAttrsFn =
-        nullptr) {
+        nullptr,
+    bool addOperandSegmentSizes = false) {
   // Parse `ins` and `outs`.
   SmallVector<Type, 4> inputTypes, outputTypes;
   if (parseCommonStructuredOpParts(parser, result, inputTypes, outputTypes,
-                                   /*addOperandSegmentSizes=*/false))
+                                   addOperandSegmentSizes))
     return failure();
 
   // Add result types.
@@ -1646,9 +1647,12 @@ ParseResult ReduceOp::parse(OpAsmParser &parser, OperationState &result) {
   }
 
   if (parseDstStyleOp(
-          parser, result, [&](OpAsmParser &parser, NamedAttrList &attributes) {
+          parser, result,
+          [&](OpAsmParser &parser, NamedAttrList &attributes) {
             return parseDenseI64ArrayAttr(parser, attributes, "dimensions");
-          }))
+          },
+          /*addOperandSegmentSizes=*/true))
+
     return failure();
 
   if (payloadOpName.has_value()) {
@@ -1683,7 +1687,9 @@ void ReduceOp::print(OpAsmPrinter &p) {
 
   printCommonStructuredOpParts(p, getDpsInputs(), getDpsInits());
   printDenseI64ArrayAttr(p, getDimensionsAttrName(), getDimensions());
-  p.printOptionalAttrDict((*this)->getAttrs(), {getDimensionsAttrName()});
+  p.printOptionalAttrDict(
+      (*this)->getAttrs(),
+      {getDimensionsAttrName(), getOperandSegmentSizesAttrName()});
   if (!payloadOp) {
     // Print region if the payload op was not detected.
     p.increaseIndent();
diff --git a/mlir/test/Dialect/Linalg/roundtrip.mlir b/mlir/test/Dialect/Linalg/roundtrip.mlir
index 146e9780b8ebbe..802de7c335d9b1 100644
--- a/mlir/test/Dialect/Linalg/roundtrip.mlir
+++ b/mlir/test/Dialect/Linalg/roundtrip.mlir
@@ -485,6 +485,48 @@ func.func @variadic_reduce_memref(%input1: memref<16x32x64xf32>,
 
 // -----
 
+func.func @reduce_asymmetric(%input: tensor<16x32x64xi32>, %input2: tensor<16x32x64xi32>,
+                  %init: tensor<16x64xi32>) -> tensor<16x64xi32> {
+  %reduce = linalg.reduce
+      ins(%input, %input2:tensor<16x32x64xi32>, tensor<16x32x64xi32>)
+      outs(%init:tensor<16x64xi32>)
+      dimensions = [1]
+      (%in: i32, %in2: i32, %out: i32) {
+        %0 = arith.muli %in, %in2: i32
+        %1 = arith.addi %out, %0: i32
+        linalg.yield %1: i32
+      }
+  func.return %reduce : tensor<16x64xi32>
+}
+// CHECK-LABEL: func @reduce_asymmetric
+//       CHECK:   linalg.reduce ins(%{{.*}}, %{{.*}}: tensor<16x32x64xi32>, tensor<16x32x64xi32>)
+//  CHECK-NOT:    operandSegmentSize
+//  CHECK-SAME:   outs(%{{.*}}: tensor<16x64xi32>)
+//  CHECK-SAME:   dimensions = [1]
+
+// -----
+
+func.func @reduce_asymmetric_memref(%input: memref<16x32x64xi32>, %input2: memref<16x32x64xi32>,
+                  %init: memref<16x64xi32>) {
+  linalg.reduce
+      ins(%input, %input2:memref<16x32x64xi32>, memref<16x32x64xi32>)
+      outs(%init:memref<16x64xi32>)
+      dimensions = [1]
+      (%in: i32, %in2: i32, %out: i32) {
+        %0 = arith.muli %in, %in2: i32
+        %1 = arith.addi %out, %0: i32
+        linalg.yield %1: i32
+      }
+  func.return
+}
+// CHECK-LABEL: func @reduce_asymmetric_memref
+//       CHECK:   linalg.reduce ins(%{{.*}}, %{{.*}}: memref<16x32x64xi32>, memref<16x32x64xi32>)
+//  CHECK-NOT:    operandSegmentSize
+//  CHECK-SAME:   outs(%{{.*}}: memref<16x64xi32>)
+//  CHECK-SAME:   dimensions = [1]
+
+// -----
+
 func.func @transpose(%input: tensor<16x32x64xf32>,
                      %init: tensor<32x64x16xf32>) -> tensor<32x64x16xf32> {
   %transpose = linalg.transpose

@CoTinker
Contributor

Ping~ Is this PR reasonable?

@dcaballe
Contributor

Thanks for the PR!

  linalg.reduce
      ins(%input, %input2:memref<16x32x64xi32>, memref<16x32x64xi32>)
      outs(%init:memref<16x64xi32>)
      dimensions = [1]
      (%in: i32, %in2: i32, %out: i32) {
        %0 = arith.muli %in, %in2: i32
        %1 = arith.addi %out, %0: i32
        linalg.yield %1: i32

Not sure I follow this change. Shouldn't we just combine the two tensors with a mul before the linalg.reduce, and then add-reduce the result of that?

@oowekyala
Contributor Author

oowekyala commented Sep 20, 2024

Thanks for taking a look :)

> Not sure I follow this change. Shouldn't we just combine the two tensors with a mul before the linalg.reduce, and then add-reduce the result of that?

It probably is possible, but I think it would be much more involved. How would you combine the two tensors into one 16x32x64 tensor whose elements are the products, short of using a linalg.generic? And if we have to use a generic, then we could just fuse the reduce into it directly, but then we no longer have nice structured ops.

For a compiler that has to generate this code, it is simpler to just generate one reduce op, and the generated code is higher level than a generic, or a sequence of generic + reduce. To me at least it is very natural to think about the reduce operator this way.
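For concreteness, the two-op alternative under discussion would look roughly like this (an illustrative sketch, not code from the PR; the `%tmp` init tensor and value names are hypothetical):

```mlir
#id = affine_map<(d0, d1, d2) -> (d0, d1, d2)>

// Step 1: elementwise multiply, which needs a linalg.generic
// (or a named elementwise op) to materialize the combined tensor.
%mul = linalg.generic
    {indexing_maps = [#id, #id, #id],
     iterator_types = ["parallel", "parallel", "parallel"]}
    ins(%input, %input2 : tensor<16x32x64xi32>, tensor<16x32x64xi32>)
    outs(%tmp : tensor<16x32x64xi32>) {
^bb0(%a: i32, %b: i32, %out: i32):
  %0 = arith.muli %a, %b : i32
  linalg.yield %0 : i32
} -> tensor<16x32x64xi32>

// Step 2: a unary add-reduction over the combined tensor.
%reduced = linalg.reduce
    ins(%mul : tensor<16x32x64xi32>)
    outs(%init : tensor<16x64xi32>)
    dimensions = [1]
    (%in: i32, %acc: i32) {
      %1 = arith.addi %acc, %in : i32
      linalg.yield %1 : i32
    }
```

Note that this variant materializes the full 16x32x64 intermediate tensor, whereas the single asymmetric reduce does not.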

I did a usage search for linalg.reduce on GH: https://github.com/search?q=linalg.reduce+language%3AMLIR&type=code

It seems nobody is using reduce with several operands (except our compiler, with the semantics that this PR gives it ^^). This hints that the current behavior is not useful to anyone so far. On the other hand, these updated semantics are more general but support the older behavior just fine, so I don't see the harm in them :)

@MaheshRavishankar
Contributor

The PR description is misleading. It says "fixing a crash", but it is actually changing the semantics of the operation.

Is there anything more to reduction than just saying that the iterator types are "reduction"? I don't see it providing any value beyond that.

@oowekyala oowekyala changed the title [mlir] Fix #93973 - linalg::ReduceOp verifier crash [mlir] Fix semantics of linalg::ReduceOp with several inputs Nov 1, 2024
@oowekyala
Contributor Author

> The PR description is misleading. It says "fixing a crash", but it is actually changing the semantics of the operation.

You're right, although there was a logic error in the verifier code that caused a crash, so it does also fix a crash. Since the semantics of this op with several input operands are neither documented nor tested, I interpreted what the semantics should be. In my view, the number of inputs and the number of outputs should not be related; no other linalg op has this restriction, as far as I can tell.

It seems like the only consistent alternative semantics is to say that the op must have exactly one input and one output, which is restrictive for no good reason.

> Is there anything more to reduction than just saying that the iterator types are "reduction"? I don't see it providing any value beyond that.

Sure, but this can be said of every structured linalg op. linalg.map doesn't require the same number of inputs and outputs, which allows some useful patterns to be written (its documentation even shows an example similar to the one in this comment). Giving reduce the same freedom makes the linalg dialect more consistent.
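For reference, a linalg.map with two inputs and one output, in the style of the op's documented syntax, looks roughly like this (an illustrative sketch; the value names and shapes are hypothetical):

```mlir
// Elementwise binary map: two inputs feed a single output.
%add = linalg.map
    ins(%lhs, %rhs : tensor<64xf32>, tensor<64xf32>)
    outs(%init : tensor<64xf32>)
    (%l: f32, %r: f32) {
      %0 = arith.addf %l, %r : f32
      linalg.yield %0 : f32
    }
```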

@MaheshRavishankar
Contributor

> > The PR description is misleading. It says "fixing a crash", but it is actually changing the semantics of the operation.
>
> You're right, although there was a logic error in the verifier code that caused a crash, so it does also fix a crash. Since the semantics of this op with several input operands are neither documented nor tested, I interpreted what the semantics should be. In my view, the number of inputs and the number of outputs should not be related; no other linalg op has this restriction, as far as I can tell.

That is not entirely true. There is a very specific relationship between all the inputs and outputs of a linalg op. They all have indexing maps with the same domain, and there is a relationship between how the different dimensions are constrained based on the ranges of the indexing maps used. Also, a dimension that is marked as reduction is not allowed to show up in the result indexing map. We can have multiple inputs and multiple outputs for a reduction, that's totally OK, but you are saying it's one op running independent reductions to generate different outputs, with the same iteration space and dependences? That works.

> It seems like the only consistent alternative semantics is to say that the op must have exactly one input and one output, which is restrictive for no good reason.
>
> > Is there anything more to reduction than just saying that the iterator types are "reduction"? I don't see it providing any value beyond that.
>
> Sure, but this can be said of every structured linalg op. linalg.map doesn't require the same number of inputs and outputs, which allows some useful patterns to be written (its documentation even shows an example similar to the one in this comment). Giving reduce the same freedom makes the linalg dialect more consistent.

I am fine with that, but we just need to document the semantics that you expect.


func.func @reduce_asymmetric(%input: tensor<16x32x64xi32>, %input2: tensor<16x32x64xi32>,
%init: tensor<16x64xi32>) -> tensor<16x64xi32> {
%reduce = linalg.reduce
Contributor


I don't understand what this is trying to do. You are effectively doing a multiply and a reduce? How is this different from just using a linalg.generic? Why is using linalg.reduce better than linalg.generic here?

Contributor Author


Yes, this is a multiply-accumulate type operation. The difference is just that it's higher-level than a linalg.generic, which is nice when you want to generate this op in a compiler pass, or when you want to pattern-match it. All linalg ops are just sugar over a generic, so I don't think this use case should be disqualified just because it can be implemented with a generic.

@oowekyala
Contributor Author

> We can have multiple inputs and multiple outputs for a reduction. That's totally OK, but you are saying it's one op running independent reductions to generate different outputs, with the same iteration space and dependences? That works.

Could you clarify what you mean here? I'm not sure I follow. The main point of this change is to be able to reduce several tensors into one, as in my examples, so the reductions are not independent; rather, the computation uses all inputs.

@MaheshRavishankar
Contributor

> > We can have multiple inputs and multiple outputs for a reduction. That's totally OK, but you are saying it's one op running independent reductions to generate different outputs, with the same iteration space and dependences? That works.
>
> Could you clarify what you mean here? I'm not sure I follow. The main point of this change is to be able to reduce several tensors into one, as in my examples, so the reductions are not independent; rather, the computation uses all inputs.

Sorry to reverse the question on you. Can you write the loop version or linalg.generic version of what the reduce is doing? What are the constraints on the shapes of the inputs? Should they all be the same shape? Are they all accessed using the same indexing map, and is this checked in the verifier?

@oowekyala
Contributor Author

Here is the affine version:

    affine.for %i = 0 to 16 {
      affine.for %k = 0 to 32 {
        affine.for %j = 0 to 64 {
          %4 = affine.load %1[%i, %k, %j] : memref<16x32x64xi32>
          %5 = affine.load %0[%i, %k, %j] : memref<16x32x64xi32>
          %6 = affine.load %alloc[%i, %j] : memref<16x64xi32>
          %7 = arith.muli %4, %5 : i32
          %8 = arith.addi %6, %7 : i32
          affine.store %8, %alloc[%i, %j] : memref<16x64xi32>
        }
      }
    }

and the generic version:

#map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
#map1 = affine_map<(d0, d1, d2) -> (d0, d2)>

    %0 = linalg.generic
        {indexing_maps = [#map, #map, #map1],
         iterator_types = ["parallel", "reduction", "parallel"]}
        ins(%arg0, %arg1 : tensor<16x32x64xi32>, tensor<16x32x64xi32>)
        outs(%arg2 : tensor<16x64xi32>) {
    ^bb0(%in: i32, %in_0: i32, %out: i32):
      %1 = arith.muli %in, %in_0 : i32
      %2 = arith.addi %out, %1 : i32
      linalg.yield %2 : i32
    } -> tensor<16x64xi32>
 

> What are the constraints on the shapes of the inputs? Should they all be the same shape? Are they all accessed using the same indexing map, and is this checked in the verifier?

The inputs should all have the same shape. This is because the shape of the output is the shape of the inputs with the reduction dimensions removed: the reduction dimensions must match in extent, and the parallel dimensions must match because they determine the shape of the output, so in total the full shapes must match. That is already checked by the verifier, yes. The code samples above were generated with mlir-opt built with this PR, so the way linalg.reduce maps to linalg.generic is not changed by this PR.
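As an illustration of that check, here is a sketch of an op the verifier would reject under these rules (hypothetical values; the second input's shape deliberately disagrees with the first's):

```mlir
// Invalid: %b has shape 16x8x64 while %a has 16x32x64, so the shared
// iteration space (and thus the output shape) is ill-defined.
%r = linalg.reduce
    ins(%a, %b : tensor<16x32x64xi32>, tensor<16x8x64xi32>)
    outs(%init : tensor<16x64xi32>)
    dimensions = [1]
    (%in: i32, %in2: i32, %out: i32) {
      %0 = arith.muli %in, %in2 : i32
      %1 = arith.addi %out, %0 : i32
      linalg.yield %1 : i32
    }
```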

Member

@rengolin rengolin left a comment


As @MaheshRavishankar said, the name and description of this PR are misleading. This is adding new semantics to an existing operation without proper consideration and discussion in the forum.

I updated your issue to reflect the changes we're making to linalg as a whole and the new contract operation that you could use instead. Please change this PR to make the verifier more restrictive, and we can discuss your use case with the existing operations.

@oowekyala
Contributor Author

oowekyala commented Dec 13, 2024

I wasn't aware of this restructuring effort of linalg. It seems the new contract operation would fit my use case most of the time (although it is binary and not variadic)...

I will restrict the verifier in this PR to make reduce only unary. I think there is disagreement over what the semantics of the reduce with several arguments should be. It seems to me your restructuring of linalg also considers reduce to be unary, so I think we should just disallow variadic reduce until we clear that up in the forum.

I will reopen a PR with the verifier fix.

@oowekyala
Contributor Author

oowekyala commented Dec 13, 2024

A couple more thoughts, to make sure this change is not misunderstood.

I have not actually been proposing to extend the semantics of linalg.reduce. It is already possible to write the MAC code that I presented above like so:

  linalg.reduce
      ins(%input, %input2:memref<16x32x64xi32>, memref<16x32x64xi32>)
      outs(%init, %init2: memref<16x64xi32>, memref<16x64xi32>)
      dimensions = [1]
      (%in: i32, %in2: i32, %out: i32, %out2: i32) {
        %0 = arith.muli %in, %in2: i32
        %1 = arith.addi %out, %0: i32
        linalg.yield %1, %1: i32, i32
     }

But in current linalg, if you have two inputs then you must have two outputs, so here we have to return a second result that we don't care about. To me this restriction is most likely an accident: it makes no sense to require it, unless you restrict each input to flow only to its corresponding output, and that would be better expressed as several unary reduces. It is also not documented why this is the case or what the intended semantics should be.

On the other hand, there is evidence in the code and documentation that this operator was designed to support variadic operands. My change lifts an artificial restriction, but it does not make reduce more powerful; it also fixes the bugs in the verifier, so I believe this is the semantics the original authors intended. All in all, as far as semantic changes go, this one is IMO very minor.

Of course, if reduce must be unary to fit within @rengolin's new linalg design, then the current semantics are indeed too general... So there is a choice to be made.

Anyway I will close this and reopen a PR with just the verifier fix.

@oowekyala oowekyala changed the title [mlir] Fix semantics of linalg::ReduceOp with several inputs [mlir] Stop restricting linalg::ReduceOp to have same number of inputs and outputs Dec 13, 2024
@oowekyala oowekyala closed this Dec 13, 2024
Successfully merging this pull request may close these issues.

MLIR crash in linalg::ReduceOp::verify
