[mlir] Stop restricting linalg::ReduceOp to have same number of inputs and outputs #107005
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-mlir-linalg

Author: Clément Fournier (oowekyala)

Changes: Fix #93973. This allows using `linalg.reduce` to e.g. reduce several tensors into one.

Full diff: https://github.com/llvm/llvm-project/pull/107005.diff (3 files affected)
diff --git a/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td b/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td
index ac61117c3d6e36..f20f036d6fe480 100644
--- a/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td
+++ b/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td
@@ -311,7 +311,7 @@ def MapOp : LinalgStructuredBase_Op<"map", [
def ReduceOp : LinalgStructuredBase_Op<"reduce", [
DeclareOpInterfaceMethods<OpAsmOpInterface, ["getAsmResultNames"]>,
DeclareOpInterfaceMethods<OpAsmOpInterface, ["getAsmBlockArgumentNames"]>,
- SameVariadicOperandSize,
+ AttrSizedOperandSegments,
SingleBlockImplicitTerminator<"YieldOp">]> {
let summary = "Reduce operator";
let description = [{
diff --git a/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp b/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
index 76df3ecf2d2bd4..9c6c36075b55bd 100644
--- a/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
+++ b/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
@@ -1301,11 +1301,12 @@ LogicalResult GenericOp::fold(FoldAdaptor, SmallVectorImpl<OpFoldResult> &) {
static ParseResult parseDstStyleOp(
OpAsmParser &parser, OperationState &result,
function_ref<ParseResult(OpAsmParser &, NamedAttrList &)> parseAttrsFn =
- nullptr) {
+ nullptr,
+ bool addOperandSegmentSizes = false) {
// Parse `ins` and `outs`.
SmallVector<Type, 4> inputTypes, outputTypes;
if (parseCommonStructuredOpParts(parser, result, inputTypes, outputTypes,
- /*addOperandSegmentSizes=*/false))
+ addOperandSegmentSizes))
return failure();
// Add result types.
@@ -1646,9 +1647,12 @@ ParseResult ReduceOp::parse(OpAsmParser &parser, OperationState &result) {
}
if (parseDstStyleOp(
- parser, result, [&](OpAsmParser &parser, NamedAttrList &attributes) {
+ parser, result,
+ [&](OpAsmParser &parser, NamedAttrList &attributes) {
return parseDenseI64ArrayAttr(parser, attributes, "dimensions");
- }))
+ },
+ /*addOperandSegmentSizes=*/true))
+
return failure();
if (payloadOpName.has_value()) {
@@ -1683,7 +1687,9 @@ void ReduceOp::print(OpAsmPrinter &p) {
printCommonStructuredOpParts(p, getDpsInputs(), getDpsInits());
printDenseI64ArrayAttr(p, getDimensionsAttrName(), getDimensions());
- p.printOptionalAttrDict((*this)->getAttrs(), {getDimensionsAttrName()});
+ p.printOptionalAttrDict(
+ (*this)->getAttrs(),
+ {getDimensionsAttrName(), getOperandSegmentSizesAttrName()});
if (!payloadOp) {
// Print region if the payload op was not detected.
p.increaseIndent();
diff --git a/mlir/test/Dialect/Linalg/roundtrip.mlir b/mlir/test/Dialect/Linalg/roundtrip.mlir
index 146e9780b8ebbe..802de7c335d9b1 100644
--- a/mlir/test/Dialect/Linalg/roundtrip.mlir
+++ b/mlir/test/Dialect/Linalg/roundtrip.mlir
@@ -485,6 +485,48 @@ func.func @variadic_reduce_memref(%input1: memref<16x32x64xf32>,
// -----
+func.func @reduce_asymmetric(%input: tensor<16x32x64xi32>, %input2: tensor<16x32x64xi32>,
+ %init: tensor<16x64xi32>) -> tensor<16x64xi32> {
+ %reduce = linalg.reduce
+ ins(%input, %input2:tensor<16x32x64xi32>, tensor<16x32x64xi32>)
+ outs(%init:tensor<16x64xi32>)
+ dimensions = [1]
+ (%in: i32, %in2: i32, %out: i32) {
+ %0 = arith.muli %in, %in2: i32
+ %1 = arith.addi %out, %0: i32
+ linalg.yield %1: i32
+ }
+ func.return %reduce : tensor<16x64xi32>
+}
+// CHECK-LABEL: func @reduce_asymmetric
+// CHECK: linalg.reduce ins(%{{.*}}, %{{.*}}: tensor<16x32x64xi32>, tensor<16x32x64xi32>)
+// CHECK-NOT: operandSegmentSize
+// CHECK-SAME: outs(%{{.*}}: tensor<16x64xi32>)
+// CHECK-SAME: dimensions = [1]
+
+// -----
+
+func.func @reduce_asymmetric_memref(%input: memref<16x32x64xi32>, %input2: memref<16x32x64xi32>,
+ %init: memref<16x64xi32>) {
+ linalg.reduce
+ ins(%input, %input2:memref<16x32x64xi32>, memref<16x32x64xi32>)
+ outs(%init:memref<16x64xi32>)
+ dimensions = [1]
+ (%in: i32, %in2: i32, %out: i32) {
+ %0 = arith.muli %in, %in2: i32
+ %1 = arith.addi %out, %0: i32
+ linalg.yield %1: i32
+ }
+ func.return
+}
+// CHECK-LABEL: func @reduce_asymmetric_memref
+// CHECK: linalg.reduce ins(%{{.*}}, %{{.*}}: memref<16x32x64xi32>, memref<16x32x64xi32>)
+// CHECK-NOT: operandSegmentSize
+// CHECK-SAME: outs(%{{.*}}: memref<16x64xi32>)
+// CHECK-SAME: dimensions = [1]
+
+// -----
+
func.func @transpose(%input: tensor<16x32x64xf32>,
%init: tensor<32x64x16xf32>) -> tensor<32x64x16xf32> {
%transpose = linalg.transpose
Ping. Is this PR reasonable?
Thanks for the PR! Not sure I follow this change. Shouldn't we just combine the two tensors before the `linalg.reduce`?
Thanks for taking a look :)
It probably is possible, but it would be much more involved, I think. How would you combine the two tensors into one tensor? For a compiler that has to generate this code, it is simpler to just generate one `linalg.reduce`. I did a usage search for `linalg.reduce`: it seems nobody is using reduce with several operands (except our compiler, with the semantics that this PR gives it ^^). This hints that the current behavior is not useful to anyone so far. OTOH these updated semantics are more general, and would support the older behavior just fine, so I don't see the harm in them :)
The PR description is misleading. It says "fixing a crash", but it is actually changing the semantics of the operation. Is there anything more to reduction than just saying that the iterator types are "reduction"? I don't see it providing any more value other than that.
You're right, although there was a logic error in the verifier code that caused a crash, so it also does fix a crash. Since the semantics of this op in case it has several input operands are not documented or tested, I interpreted what the semantics should be. In my view, the number of inputs and number of outputs should not be related. No other linalg op has this restriction afaict. It seems like the only consistent alternative semantics is to say that the op must have exactly one input and output, which is restrictive for no good reason.
Sure, but this can be said of every structured linalg op.
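The two candidate semantics under discussion can be sketched in NumPy (an illustrative sketch with made-up values, not code from this PR): with paired inputs and outputs, each input reduces independently into its own output, while under the semantics proposed here all inputs feed a single combining reduction.

```python
import numpy as np

# Hypothetical example values, for illustration only.
a = np.arange(6, dtype=np.int64).reshape(2, 3)       # [[0,1,2],[3,4,5]]
b = np.arange(6, 12, dtype=np.int64).reshape(2, 3)   # [[6,7,8],[9,10,11]]

# Paired interpretation: each input reduces independently into its own output.
out_a = a.sum(axis=1)       # [3, 12]
out_b = b.sum(axis=1)       # [21, 30]

# Proposed interpretation: both inputs feed one combining reduction
# (here a multiply-accumulate into a single output).
out = (a * b).sum(axis=1)   # [23, 122]
```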
That is not entirely true. There is a very specific relationship between all the inputs and outputs of a […]
I am fine with that, but we just need to document the semantics that you expect.
> func.func @reduce_asymmetric(%input: tensor<16x32x64xi32>, %input2: tensor<16x32x64xi32>,
>                              %init: tensor<16x64xi32>) -> tensor<16x64xi32> {
>   %reduce = linalg.reduce
I don't understand what this is trying to do. You are effectively doing a multiply and a reduce? How is this different from just using a `linalg.generic`? As in, why is using `linalg.reduce` better than `linalg.generic`?
Yes this is a multiply-accumulate type operation. The difference is just that it's higher-level than a linalg generic. That is nice when you want to generate this op in a compiler pass, or when you want to pattern-match it. All linalg ops are just sugar over a generic, so I don't think this use case should be disqualified because it can be implemented with a generic.
Could you clarify what you mean here? I'm not sure I follow. The main point of this change is to be able to reduce several tensors into one, as in my examples, so the reductions are not independent; rather, the reduction uses all inputs.
Sorry to reverse the question on you. Can you write the loop version or `linalg.generic` version of this?
Here is the affine version:

```mlir
affine.for %i = 0 to 16 {
  affine.for %k = 0 to 32 {
    affine.for %j = 0 to 64 {
      %4 = affine.load %1[%i, %k, %j] : memref<16x32x64xi32>
      %5 = affine.load %0[%i, %k, %j] : memref<16x32x64xi32>
      %6 = affine.load %alloc[%i, %j] : memref<16x64xi32>
      %7 = arith.muli %4, %5 : i32
      %8 = arith.addi %6, %7 : i32
      affine.store %8, %alloc[%i, %j] : memref<16x64xi32>
    }
  }
}
```

and the generic version:

```mlir
#map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
#map1 = affine_map<(d0, d1, d2) -> (d0, d2)>
%0 = linalg.generic {indexing_maps = [#map, #map, #map1],
                     iterator_types = ["parallel", "reduction", "parallel"]}
    ins(%arg0, %arg1 : tensor<16x32x64xi32>, tensor<16x32x64xi32>)
    outs(%arg2 : tensor<16x64xi32>) {
^bb0(%in: i32, %in_0: i32, %out: i32):
  %1 = arith.muli %in, %in_0 : i32
  %2 = arith.addi %out, %1 : i32
  linalg.yield %2 : i32
} -> tensor<16x64xi32>
```
The inputs should all have the same shape. This is because the shape of the output is the shape of the inputs with the reduction dimensions removed. So the reduction dimensions must match in extent, and the parallel dimensions must match because they determine the shape of the output; in total, the full shapes must match. That is already checked by the verifier, yes. The above code samples are generated with the […]
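For reference, the same multiply-accumulate reduction can be written in NumPy (a sketch mirroring the affine loop nest above; the input values are made up for illustration):

```python
import numpy as np

# Two 16x32x64 inputs are multiplied elementwise and reduced over
# dimension 1 (extent 32), accumulating into a 16x64 init tensor.
input1 = (np.arange(16 * 32 * 64, dtype=np.int32) % 7).reshape(16, 32, 64)
input2 = np.full((16, 32, 64), 2, dtype=np.int32)
init = np.zeros((16, 64), dtype=np.int32)

result = init + (input1 * input2).sum(axis=1)
print(result.shape)  # (16, 64)
```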
rengolin left a comment:
As @MaheshRavishankar said, the name and description of this PR are misleading. This is adding new semantics to an existing operation without proper consideration and discussion in the forum.
I updated your issue to reflect the changes we're making to linalg as a whole and the new contract operation that you could use instead. Please change this PR to make the verifier stricter, and we can discuss your use case with the existing operations.
I wasn't aware of this restructuring effort of linalg. It seems the new […]
I will reopen a PR with the verifier fix.
A couple more thoughts, to make sure this change is not misunderstood. I have not actually been proposing to extend the semantics of `linalg.reduce`. It is already possible to write the MAC code that I presented above like so:

```mlir
linalg.reduce
    ins(%input, %input2: memref<16x32x64xi32>, memref<16x32x64xi32>)
    outs(%init, %init2: memref<16x64xi32>, memref<16x64xi32>)
    dimensions = [1]
    (%in: i32, %in2: i32, %out: i32, %out2: i32) {
      %0 = arith.muli %in, %in2: i32
      %1 = arith.addi %out, %0: i32
      linalg.yield %1, %1: i32, i32
    }
```

But in current linalg, if you have two inputs then you must have two outputs, so here we have to return a second result that we don't care about. To me this restriction is most likely an accident. It makes no sense to require it, unless perhaps you restrict each input to flow only to its corresponding output, but that would be better expressed as several unary reduces. It is also not documented why this is the case or what the intended semantics should be. On the other hand, there is evidence in the code and documentation that this operator is designed to support variadic operands. My change lifts an artificial restriction, but it is not making […]. Of course, to fit within @rengolin's new linalg design, […]. Anyway, I will close this and reopen a PR with just the verifier fix.
Fix #93973. This allows using `linalg.reduce` to e.g. reduce several tensors into one. The current implementation is limited to having the same number of inputs and outputs.