[mlir][scf]: Avoid inserting affine.min when tiling dynamic operation sizes if possible #113819

AviadCo · 2024-10-27T15:53:21Z

During operation tiling using scf, we may avoid inserting affine.min to handle the last tile where upper_bound = step * k where stride may be a constant or a dynamic.

llvmbot · 2024-10-27T15:53:54Z

@llvm/pr-subscribers-mlir-scf
@llvm/pr-subscribers-mlir-linalg

@llvm/pr-subscribers-mlir

Author: Aviad Cohen (AviadCo)

Changes

During operation tiling using scf, we may avoid inserting affine.min to handle the last tile where upper_bound = step * k where stride may be a constant or a dynamic.

Full diff: https://github.com/llvm/llvm-project/pull/113819.diff

2 Files Affected:

(modified) mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp (+24)
(modified) mlir/test/Dialect/Linalg/transform-op-tile.mlir (+47)

diff --git a/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp b/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
index e2feb10b314540..54afbe4b032bd4 100644
--- a/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
+++ b/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
@@ -186,11 +186,35 @@ static void checkSafeToTileToForall(TilingInterface op,
   }
 }
 
+/// Returns true if `size` is dynamic multiplication of `stride`.
+/// i.e. , `size = stride * k` where stride may be a constant or a dynamic.
+static bool dynamiclyDivisible(OpFoldResult size, OpFoldResult stride) {
+  Value dynamicSize = dyn_cast_if_present<Value>(size);
+  if (!dynamicSize)
+    return false;
+  auto mulOp = dynamicSize.getDefiningOp<arith::MulIOp>();
+  if (!mulOp)
+    return false;
+  if (Value dynamicStride = dyn_cast_if_present<Value>(stride))
+    return mulOp.getLhs() == dynamicStride || mulOp.getRhs() == dynamicStride;
+  std::optional<int64_t> strideAsInt = getConstantIntValue(stride);
+  std::optional<int64_t> lhsAsInt = getConstantIntValue(mulOp.getLhs());
+  std::optional<int64_t> rhsAsInt = getConstantIntValue(mulOp.getRhs());
+  if (strideAsInt && lhsAsInt && *strideAsInt == *lhsAsInt)
+    return true;
+  if (strideAsInt && rhsAsInt && *strideAsInt == *rhsAsInt)
+    return true;
+  
+  return false;
+}
+
 /// Check if `stride` evenly divides the trip count `size - offset`.
 static bool tileDividesIterationDomain(Range loopRange) {
   std::optional<int64_t> offsetAsInt = getConstantIntValue(loopRange.offset);
   if (!offsetAsInt)
     return false;
+  if (*offsetAsInt == 0 && dynamiclyDivisible(loopRange.size, loopRange.stride))
+    return true;
   std::optional<int64_t> sizeAsInt = getConstantIntValue(loopRange.size);
   if (!sizeAsInt)
     return false;
diff --git a/mlir/test/Dialect/Linalg/transform-op-tile.mlir b/mlir/test/Dialect/Linalg/transform-op-tile.mlir
index 7bac850d0b7fe9..ade523ef378f36 100644
--- a/mlir/test/Dialect/Linalg/transform-op-tile.mlir
+++ b/mlir/test/Dialect/Linalg/transform-op-tile.mlir
@@ -266,3 +266,50 @@ func.func @tile_linalg_matmul(
     -> tensor<128x128xf32>
   return %0 : tensor<128x128xf32>
 }
+
+// -----
+
+#map = affine_map<(d0) -> (d0)>
+
+// CHECK-LABEL: splited_dynamic_linalg_generic
+func.func @splited_dynamic_linalg_generic(%arg0: tensor<?xi16>, %arg1: tensor<?xi16>) -> tensor<?xi16> {
+  %c80 = arith.constant 80 : index
+  %c0 = arith.constant 0 : index
+  %dim = tensor.dim %arg1, %c0 : tensor<?xi16>
+  %0 = tensor.empty(%dim) : tensor<?xi16>
+  %1 = arith.divui %dim, %c80 : index
+  %2 = arith.muli %1, %c80 : index
+  %3 = arith.remui %dim, %c80 : index
+  %extracted_slice = tensor.extract_slice %arg0[0] [%2] [1] : tensor<?xi16> to tensor<?xi16>
+  %extracted_slice_0 = tensor.extract_slice %arg1[0] [%2] [1] : tensor<?xi16> to tensor<?xi16>
+  %extracted_slice_1 = tensor.extract_slice %0[0] [%2] [1] : tensor<?xi16> to tensor<?xi16>
+  // CHECK: scf.for
+  // CHECK-NOT: affine.min
+  %4 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel"]} ins(%extracted_slice, %extracted_slice_0 : tensor<?xi16>, tensor<?xi16>) outs(%extracted_slice_1 : tensor<?xi16>) {
+  ^bb0(%in_1: i16, %in_2: i16, %out: i16):
+    %6 = arith.addi %in_1, %in_2 : i16
+    linalg.yield %6 : i16
+  } -> tensor<?xi16>
+  %inserted_slice = tensor.insert_slice %4 into %0[%2] [%2] [1] : tensor<?xi16> into tensor<?xi16>
+  %extracted_slice_2 = tensor.extract_slice %arg0[%2] [%3] [1] : tensor<?xi16> to tensor<?xi16>
+  %extracted_slice_3 = tensor.extract_slice %arg1[%2] [%3] [1] : tensor<?xi16> to tensor<?xi16>
+  %extracted_slice_4 = tensor.extract_slice %0[%2] [%3] [1] : tensor<?xi16> to tensor<?xi16>
+  // CHECK-NOT: scf.for
+  %5 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel"]} ins(%extracted_slice_2, %extracted_slice_3 : tensor<?xi16>, tensor<?xi16>) outs(%extracted_slice_4 : tensor<?xi16>) {
+  ^bb0(%in_1: i16, %in_2: i16, %out: i16):
+    %7 = arith.addi %in_1, %in_2 : i16
+    linalg.yield %7 : i16
+  } -> tensor<?xi16>
+  %inserted_slice_0 = tensor.insert_slice %5 into %inserted_slice[%2] [%3] [1] : tensor<?xi16> into tensor<?xi16>
+  return %inserted_slice_0 : tensor<?xi16>
+}
+
+
+module attributes {transform.with_named_sequence} {
+transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
+    %0 = transform.structured.match ops{["linalg.generic"]} in %arg1 : (!transform.any_op) -> !transform.any_op
+    %const = transform.structured.match ops{["arith.constant"]} in %arg1 : (!transform.any_op) -> !transform.any_op
+    %1, %loop = transform.structured.tile_using_for %0 tile_sizes [%const] : (!transform.any_op, !transform.any_op) -> (!transform.any_op, !transform.any_op)
+    transform.yield
+}
+}

github-actions · 2024-10-27T15:56:45Z

✅ With the latest revision this PR passed the C/C++ code formatter.

MaheshRavishankar

Would it be possible to this as a cleanup on `

MaheshRavishankar · 2024-10-27T21:36:17Z

mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp

  }
 }

+/// Returns true if `size` is dynamic multiplication of `stride`.


Not sure I follow. The tileDividesIterationDomain is checking for (ub - lb) % tilesize = 0. Why doesnt that catch the case you are trying to cover here?

@MaheshRavishankar
you are right, my change adds dynamic support for (ub - lb) % tilesize where lb is zero, and ub is dynamicly defined by possibly_dynamic_ub = possibly_dynamic_tilesize * possibly_dynamic_stride.
Some background for this change - I am using split on dynamic operations (i.e. linalg::generic) where the I use arith.divui to calculate a tile which fits my cache and another part of arith.remui if the dynamic size won't be perfectly divided.
After the split, the first part size is defined by arith.muli where one of its operands is the tile size I am using to tile the operation - but the the ub is now considered dynamic although we can know the tile size divides it.

Ok, I see what you are trying. I am not a fan of this kind of piecemeal approaches. I tried something. I changed your input to use affine maps. Still didnt work. I get something like this

%1 = affine.apply affine_map<()[s0] -> ((s0 floordiv 80) * 80)>()[%dim] %3 = scf.for %arg2 = %c0 to %1 step %c80 iter_args(%arg3 = %extracted_slice_1) -> (tensor<?xi16>) { %5 = affine.min affine_map<(d0)[s0] -> (-d0 + (s0 floordiv 80) * 80, 80)>(%arg2)[%dim]

IIRC adding --scf-for-loop-canonicalization should have fixed the problem, but it didnt

cc @matthias-springer I thought this was fixed at some time.

Anyway, I am OK with this patch overall given where things are, but this needs to account for the lower bound of the loop as well.

I haven't looked into this pass for a long time. What I vaguely remember is that -scf-for-loop-canonicalization has always been a bit fragile due to limitations around semi-affine expressions in FlatAffineValueConstraints.

@MaheshRavishankar thanks for the replay!
I tried to take a look at scf-for-loop-canonicalization which I was unfamiliar with, as for now I didn't manage to find the issue why it does not manage to remove the affine.min in this case. I updated the MR to also handle none 0 offset.

@MaheshRavishankar can you please take a look on the latest patch?

… sizes if possible * During operation tiling using scf, we may avoid inserting affine.min to handle the last tile where `upper_bound = step * k` where stride may be a constant or a dynamic.

MaheshRavishankar

Ok, this seems to be doing much more than what the original PR set out to do. I am fine with it, but I'd really like this logic simpler. I cant gaurantee that as written this is maintainable.

MaheshRavishankar · 2024-11-06T17:06:09Z

mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp

-  if (!strideAsInt)
-    return false;
-  return ((sizeAsInt.value() - offsetAsInt.value()) % strideAsInt.value() == 0);
+  if (strideAsInt && offsetAsInt && sizeAsInt)


nit: Please add { } around multi-line statements.

MaheshRavishankar · 2024-11-06T17:09:01Z

mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp

+    return;
+
+  // Given `ofr` = `x` * `y`, all dividers of `x` and `y` are dividers of `ofr`.
+  collectDividers(mulOp.getLhs(), dividers);


Recursions are generally discouraged.

MaheshRavishankar · 2024-11-06T17:36:43Z

mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp

+    std::optional<int64_t> dividerAsInt = getConstantIntValue(divider);
+    return dividerAsInt && *dividerAsInt % *strideAsInt == 0;
+  };
+  return llvm::any_of(dividersOfSize, isStrideDividesDivider) &&


Dont you think this logic is more complicated that you want it to be. I'd like to scope this a bit more narrowly. Just look for the immediate operations being a arith.mul and avoid the recursion.

The more general solution should be using something the IntegerDivisibilityAnalysis (here) that is very similar to the range inference analysis that can be used to fold this away. Doing this in transformations like this becomes unmaintainable in the long run.

AviadCo · 2024-11-14T07:48:35Z

@MaheshRavishankar per the discussion with you I decided to abandon the approach of searching patterns in arith operations as it may be too fragile like you mentioned. Instead I am avoiding the "split" before tiling and handling the affine.min accordingly.
I will also investigate IntegerDivisibilityAnalysis &scf-for-loop-canonicalization as those seems intresting for my flow, I appreciate your review!

AviadCo self-assigned this Oct 27, 2024

llvmbot added mlir:linalg mlir mlir:scf labels Oct 27, 2024

AviadCo force-pushed the tiling/avoid-min-on-dynamic branch from d88f749 to 33ba8d6 Compare October 27, 2024 18:49

AviadCo requested review from MaheshRavishankar and ftynse October 27, 2024 18:49

MaheshRavishankar requested changes Oct 27, 2024

View reviewed changes

AviadCo force-pushed the tiling/avoid-min-on-dynamic branch from 33ba8d6 to d90fa69 Compare October 29, 2024 12:38

[mlir][scf]: Avoid inserting affine.min when tiling dynamic operation…

7a5ab7c

… sizes if possible * During operation tiling using scf, we may avoid inserting affine.min to handle the last tile where `upper_bound = step * k` where stride may be a constant or a dynamic.

AviadCo force-pushed the tiling/avoid-min-on-dynamic branch from d90fa69 to 7a5ab7c Compare October 29, 2024 12:42

MaheshRavishankar approved these changes Nov 6, 2024

View reviewed changes

AviadCo closed this Nov 14, 2024

[mlir][scf]: Avoid inserting affine.min when tiling dynamic operation sizes if possible #113819

[mlir][scf]: Avoid inserting affine.min when tiling dynamic operation sizes if possible #113819

Uh oh!

Conversation

AviadCo commented Oct 27, 2024

Uh oh!

llvmbot commented Oct 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Oct 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MaheshRavishankar left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MaheshRavishankar left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AviadCo commented Nov 14, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

llvmbot commented Oct 27, 2024 •

edited

Loading

github-actions bot commented Oct 27, 2024 •

edited

Loading