
Conversation

@mgehre-amd
Contributor

`computeSliceParameters` maps the loop range (given by lower bounds `lbs` and upper bounds `ubs`) into the operand tensor to compute the input slices that contain all indexed values.

The computation currently assumes that the affine expressions mapping from the loop iteration domain to the tensor domain are monotonically increasing, and uses that information by simply evaluating them at `lbs` and `ubs` to obtain the slice of the operand tensor. For non-monotonic expressions, the maximum and minimum values might not be at the boundaries. Here, we detect that case and then use the full slice to be safe.

Fixes #111830 and is related to #113021.

If shapes are static, we could be cleverer: sometimes we can prove that even though the affine expression is not monotonic in itself, it is monotonic on the range `(lbs, ubs)`, for example when the affine expression contains a `d0 mod 8` and the range is `(0, 4)`. This can be a follow-up PR.
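To make the failure mode concrete, here is a minimal standalone sketch (illustrative only, not part of the patch; the `d0 mod 3` map and the bounds mirror the new test case below) showing that the extreme values of a non-monotonic map are not attained at the loop bounds:

```cpp
#include <algorithm>
#include <cstdio>

int main() {
  const int lb = 0, ub = 100; // loop range of the tiled dimension

  // Evaluating the map (d0) -> (d0 mod 3) only at the bounds suggests a
  // slice covering [0, 1], since 0 mod 3 == 0 and 100 mod 3 == 1.
  int atLb = lb % 3;
  int atUb = ub % 3;

  // Scanning the whole range shows the expression actually reaches 2, so a
  // slice derived from the bounds alone would miss some indexed values.
  int trueMax = 0;
  for (int d0 = lb; d0 < ub; ++d0)
    trueMax = std::max(trueMax, d0 % 3);

  std::printf("at lb: %d, at ub: %d, true max: %d\n", atLb, atUb, trueMax);
  return 0;
}
```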

@llvmbot
Member

llvmbot commented Dec 3, 2024

@llvm/pr-subscribers-mlir-core
@llvm/pr-subscribers-mlir-linalg

@llvm/pr-subscribers-mlir

Author: Matthias Gehre (mgehre-amd)

Changes

`computeSliceParameters` maps the loop range (given by lower bounds `lbs` and upper bounds `ubs`) into the operand tensor to compute the input slices that contain all indexed values.

The computation currently assumes that the affine expressions mapping from the loop iteration domain to the tensor domain are monotonically increasing, and uses that information by simply evaluating them at `lbs` and `ubs` to obtain the slice of the operand tensor. For non-monotonic expressions, the maximum and minimum values might not be at the boundaries. Here, we detect that case and then use the full slice to be safe.

Fixes #111830 and is related to #113021.

If shapes are static, we could be cleverer: sometimes we can prove that even though the affine expression is not monotonic in itself, it is monotonic on the range `(lbs, ubs)`, for example when the affine expression contains a `d0 mod 8` and the range is `(0, 4)`. This can be a follow-up PR.


Full diff: https://github.com/llvm/llvm-project/pull/118502.diff

6 Files Affected:

  • (modified) mlir/include/mlir/IR/AffineExpr.h (+5)
  • (modified) mlir/include/mlir/IR/AffineMap.h (+4)
  • (modified) mlir/lib/Dialect/Linalg/Utils/Utils.cpp (+5-2)
  • (modified) mlir/lib/IR/AffineExpr.cpp (+36)
  • (modified) mlir/lib/IR/AffineMap.cpp (+5)
  • (modified) mlir/test/Dialect/Linalg/tile-tensors.mlir (+32)
diff --git a/mlir/include/mlir/IR/AffineExpr.h b/mlir/include/mlir/IR/AffineExpr.h
index a93e74b449ceed..28d00f1299f2f9 100644
--- a/mlir/include/mlir/IR/AffineExpr.h
+++ b/mlir/include/mlir/IR/AffineExpr.h
@@ -110,6 +110,11 @@ class AffineExpr {
   /// floordiv, ceildiv, and mod is only allowed w.r.t constants.
   bool isPureAffine() const;
 
+  /// Returns true if this expression is monotonically increasing with respect
+  /// to the AffineDimExprs, i.e. increasing the value of any AffineDimExpr will
+  /// never decrease the value of the result.
+  bool isMonotonicallyIncreasing() const;
+
   /// Returns the greatest known integral divisor of this affine expression. The
   /// result is always positive.
   int64_t getLargestKnownDivisor() const;
diff --git a/mlir/include/mlir/IR/AffineMap.h b/mlir/include/mlir/IR/AffineMap.h
index 4bc40a7d4091a9..6a4d9f690f8bb7 100644
--- a/mlir/include/mlir/IR/AffineMap.h
+++ b/mlir/include/mlir/IR/AffineMap.h
@@ -382,6 +382,10 @@ class AffineMap {
   /// Returns true if the AffineMap represents a symbol-less permutation map.
   bool isPermutation() const;
 
+  // Returns true if every result is monotonically increasing.
+  // See AffineExpr::isMonotonicallyIncreasing().
+  bool isComponentWiseMonotonicallyIncreasing() const;
+
   /// Returns the map consisting of the `resultPos` subset.
   AffineMap getSubMap(ArrayRef<unsigned> resultPos) const;
 
diff --git a/mlir/lib/Dialect/Linalg/Utils/Utils.cpp b/mlir/lib/Dialect/Linalg/Utils/Utils.cpp
index 38e427af1c4846..8712aadcfad62e 100644
--- a/mlir/lib/Dialect/Linalg/Utils/Utils.cpp
+++ b/mlir/lib/Dialect/Linalg/Utils/Utils.cpp
@@ -581,7 +581,11 @@ computeSliceParameters(OpBuilder &builder, Location loc, Value valueToTile,
   sliceParams.strides.reserve(rank);
   for (unsigned r = 0; r < rank; ++r) {
     LLVM_DEBUG(llvm::dbgs() << "computeSliceParameters: for dim#" << r);
-    if (!isTiled(map.getSubMap({r}), tileSizes)) {
+    auto m = map.getSubMap({r});
+    // The offset & size computation below only handles the case when
+    // the map is monotonically increasing, i.e. the min and max values are
+    // attained at the lower and upper bounds of the iteration domain.
+    if (!isTiled(m, tileSizes) || !m.isComponentWiseMonotonicallyIncreasing()) {
       sliceParams.offsets.push_back(builder.getIndexAttr(0));
       OpFoldResult dim = createFoldedDimOp(builder, loc, valueToTile, r);
       sliceParams.sizes.push_back(dim);
@@ -593,7 +597,6 @@ computeSliceParameters(OpBuilder &builder, Location loc, Value valueToTile,
 
     // Tiling creates a new slice at the proper index, the slice step is 1
     // (i.e. the op does not subsample, stepping occurs in the loop).
-    auto m = map.getSubMap({r});
     LLVM_DEBUG(llvm::dbgs() << "computeSliceParameters: submap: " << m << "\n");
     IRRewriter rewriter(builder);
     OpFoldResult offset = makeComposedFoldedAffineApply(rewriter, loc, m, lbs);
diff --git a/mlir/lib/IR/AffineExpr.cpp b/mlir/lib/IR/AffineExpr.cpp
index 2291d64c50a560..d1ec15048758cb 100644
--- a/mlir/lib/IR/AffineExpr.cpp
+++ b/mlir/lib/IR/AffineExpr.cpp
@@ -239,6 +239,42 @@ bool AffineExpr::isPureAffine() const {
   llvm_unreachable("Unknown AffineExpr");
 }
 
+static bool isNonNegativeConstant(AffineExpr expr) {
+  auto constant = dyn_cast<AffineConstantExpr>(expr);
+  return constant && constant.getValue() >= 0;
+}
+
+bool AffineExpr::isMonotonicallyIncreasing() const {
+  switch (getKind()) {
+  case AffineExprKind::SymbolId:
+  case AffineExprKind::DimId:
+  case AffineExprKind::Constant:
+    return true;
+  case AffineExprKind::Add: {
+    auto op = llvm::cast<AffineBinaryOpExpr>(*this);
+    return op.getLHS().isMonotonicallyIncreasing() &&
+           op.getRHS().isMonotonicallyIncreasing();
+  }
+  case AffineExprKind::Mul: {
+    // One operand must be a non-negative constant.
+    auto op = llvm::cast<AffineBinaryOpExpr>(*this);
+    return op.getLHS().isMonotonicallyIncreasing() &&
+           op.getRHS().isMonotonicallyIncreasing() &&
+           (isNonNegativeConstant(op.getLHS()) ||
+            isNonNegativeConstant(op.getRHS()));
+  }
+  case AffineExprKind::FloorDiv:
+  case AffineExprKind::CeilDiv: {
+    auto op = llvm::cast<AffineBinaryOpExpr>(*this);
+    return op.getLHS().isMonotonicallyIncreasing() &&
+           isNonNegativeConstant(op.getRHS());
+  }
+  case AffineExprKind::Mod:
+    return false;
+  }
+  llvm_unreachable("Unknown AffineExpr");
+}
+
 // Returns the greatest known integral divisor of this affine expression.
 int64_t AffineExpr::getLargestKnownDivisor() const {
   AffineBinaryOpExpr binExpr(nullptr);
diff --git a/mlir/lib/IR/AffineMap.cpp b/mlir/lib/IR/AffineMap.cpp
index 719a81ec057f9f..5174e038461630 100644
--- a/mlir/lib/IR/AffineMap.cpp
+++ b/mlir/lib/IR/AffineMap.cpp
@@ -651,6 +651,11 @@ bool AffineMap::isPermutation() const {
   return isProjectedPermutation();
 }
 
+bool AffineMap::isComponentWiseMonotonicallyIncreasing() const {
+  return all_of(getResults(),
+                [](auto expr) { return expr.isMonotonicallyIncreasing(); });
+}
+
 AffineMap AffineMap::getSubMap(ArrayRef<unsigned> resultPos) const {
   SmallVector<AffineExpr, 4> exprs;
   exprs.reserve(resultPos.size());
diff --git a/mlir/test/Dialect/Linalg/tile-tensors.mlir b/mlir/test/Dialect/Linalg/tile-tensors.mlir
index 8f13c690704572..cacca8a637ab77 100644
--- a/mlir/test/Dialect/Linalg/tile-tensors.mlir
+++ b/mlir/test/Dialect/Linalg/tile-tensors.mlir
@@ -167,3 +167,35 @@ module attributes {transform.with_named_sequence} {
     transform.yield
   }
 }
+
+// -----
+
+// CHECK-LABEL: func @non_monotonic_affine_expr
+//  CHECK-SAME:   %[[ARG0:[a-zA-Z0-9_]+]]: tensor<?xf32>
+func.func @non_monotonic_affine_expr(%arg0 : tensor<?xf32>) -> tensor<?xf32> {
+  %c0 = arith.constant 0 : index
+  %0 = tensor.dim %arg0, %c0 : tensor<?xf32>
+  %empty = tensor.empty(%0) : tensor<?xf32>
+
+  // CHECK: scf.for
+  // CHECK: %[[SIZE:[a-zA-Z0-9_]+]] = tensor.dim %[[ARG0]],
+  // CHECK: tensor.extract_slice %[[ARG0]][0] [%[[SIZE]]] [1] : tensor<?xf32> to tensor<?xf32>
+  %generic = linalg.generic
+    {indexing_maps = [affine_map<(d0) -> (d0 mod 3)>,
+                      affine_map<(d0) -> (d0)>],
+     iterator_types = ["parallel"]}
+    ins(%arg0: tensor<?xf32>)
+    outs(%empty : tensor<?xf32>) {
+    ^bb0(%in : f32, %out: f32):
+      linalg.yield %in : f32
+    } -> tensor<?xf32>
+  return %generic : tensor<?xf32>
+}
+
+module attributes {transform.with_named_sequence} {
+  transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
+    %0 = transform.structured.match ops{["linalg.generic"]} in %arg1 : (!transform.any_op) -> !transform.any_op
+    %1, %loop = transform.structured.tile_using_for %0 tile_sizes [100] : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
+    transform.yield
+  }
+}
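For reference, a minimal usage sketch of the two new helpers (it assumes this patch is applied; the `main` driver and the chosen expressions are illustrative only, not part of the PR):

```cpp
#include "mlir/IR/AffineExpr.h"
#include "mlir/IR/AffineMap.h"
#include "mlir/IR/MLIRContext.h"
#include "llvm/Support/raw_ostream.h"

using namespace mlir;

int main() {
  MLIRContext ctx;
  AffineExpr d0 = getAffineDimExpr(0, &ctx);

  // d0 * 2 + 1: monotonic (Mul with a non-negative constant, then Add).
  AffineExpr monotonic = d0 * 2 + 1;
  // d0 mod 3: non-monotonic (Mod is always rejected).
  AffineExpr nonMonotonic = d0 % 3;

  llvm::outs() << (monotonic.isMonotonicallyIncreasing() ? "true" : "false")
               << "\n"; // true
  llvm::outs() << (nonMonotonic.isMonotonicallyIncreasing() ? "true" : "false")
               << "\n"; // false

  // A map qualifies only if every result expression is monotonic, so mixing
  // the two expressions makes the map non-monotonic component-wise.
  AffineMap map = AffineMap::get(/*dimCount=*/1, /*symbolCount=*/0,
                                 {monotonic, nonMonotonic}, &ctx);
  llvm::outs() << (map.isComponentWiseMonotonicallyIncreasing() ? "true"
                                                                : "false")
               << "\n"; // false
  return 0;
}
```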

@mgehre-amd mgehre-amd changed the title computeSliceParameters: use full slice if affine exprs are non-monotonic [MLIR] computeSliceParameters: use full slice if affine exprs are non-monotonic Dec 3, 2024
// The offset & size computation below only handles the case when
// the map is monotonically increasing, i.e. the min and max values are
// attained at the lower and upper bounds of the iteration domain.
if (!isTiled(m, tileSizes) || !m.isComponentWiseMonotonicallyIncreasing()) {
Contributor


IIUC this will just silently not tile? I think it might be better to error out to avoid having users expect one thing and it silently doing something else?

Contributor Author

@mgehre-amd mgehre-amd Dec 4, 2024


Yes, it will take the full slice of that dimension of the input tensor, but it will tile the other dimensions of the input tensor (if their expressions are monotonic).

We could also error out in this case, and require the user to instead use a tile size of 0 for each dim with a non-monotonic expression.
This makes sense to me if there is just a single op being tiled.

If we tile & fuse through a whole chain of ops, and only the last one has non-monotonic maps, maybe it would be nicer to allow tiling to succeed even if the last op in the chain has some dimensions un-tiled?

Contributor


If we tile & fuse through a whole chain of ops, and only the last one has non-monotonic maps, maybe it would be nicer to allow tiling to succeed even if the last op in the chain has some dimensions un-tiled?

I am not sure I can visualize this without an example.

Contributor Author


I realized that my check is too conservative (I actually want to tile affine_maps that contain mods, with safe tile sizes). I will abandon this PR.

@mgehre-amd mgehre-amd closed this Dec 16, 2024
