
Conversation

sherylll

This patch adds a special-case optimization to the affine-to-standard lowering pass: modulo operations whose divisor is a constant power of two are replaced with a single bitwise AND. This reduces the instruction count and improves performance for common cases like x mod 2.
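For background, the check and rewrite the patch relies on can be sketched in plain C++ (a standalone illustration only, not the MLIR code; `isPowerOfTwo` and `fastMod` are hypothetical helper names):

```cpp
#include <cassert>
#include <cstdint>

// A value n > 0 is a power of two iff exactly one bit is set,
// i.e. clearing the lowest set bit (n & (n - 1)) yields zero.
bool isPowerOfTwo(int64_t n) { return n > 0 && (n & (n - 1)) == 0; }

// For non-negative x and power-of-two n, x mod n == x & (n - 1):
// the mask keeps exactly the low log2(n) bits, which form the remainder.
int64_t fastMod(int64_t x, int64_t n) {
  assert(x >= 0 && isPowerOfTwo(n));
  return x & (n - 1);
}
```

For example, 13 mod 8 = 5, and 13 & 7 = 0b1101 & 0b0111 = 0b0101 = 5.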

@github-actions

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository; in that case, you can instead tag reviewers by name in a comment, using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "pinging" the PR, i.e., adding a comment "Ping". The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Member

llvmbot commented Jun 30, 2025

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-affine

Author: Yuxi Sun (sherylll)

Changes

This patch adds a special-case optimization to the affine-to-standard lowering pass: modulo operations whose divisor is a constant power of two are replaced with a single bitwise AND. This reduces the instruction count and improves performance for common cases like x mod 2.


Full diff: https://github.com/llvm/llvm-project/pull/146311.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/Affine/Utils/Utils.cpp (+12)
  • (modified) mlir/test/Conversion/AffineToStandard/lower-affine.mlir (+9)
diff --git a/mlir/lib/Dialect/Affine/Utils/Utils.cpp b/mlir/lib/Dialect/Affine/Utils/Utils.cpp
index 66b3f2a4f93a5..de9c7874767e4 100644
--- a/mlir/lib/Dialect/Affine/Utils/Utils.cpp
+++ b/mlir/lib/Dialect/Affine/Utils/Utils.cpp
@@ -80,12 +80,24 @@ class AffineApplyExpander
   ///         let remainder = srem a, b;
   ///             negative = a < 0 in
   ///         select negative, remainder + b, remainder.
+  ///
+  /// Special case for power of 2: use bitwise AND (x & (n-1)) for non-negative x.
   Value visitModExpr(AffineBinaryOpExpr expr) {
     if (auto rhsConst = dyn_cast<AffineConstantExpr>(expr.getRHS())) {
       if (rhsConst.getValue() <= 0) {
         emitError(loc, "modulo by non-positive value is not supported");
         return nullptr;
       }
+      
+      // Special case: x mod n where n is a power of 2 can be optimized to x & (n-1)
+      int64_t rhsValue = rhsConst.getValue();
+      if (rhsValue > 0 && (rhsValue & (rhsValue - 1)) == 0) {
+        auto lhs = visit(expr.getLHS());
+        assert(lhs && "unexpected affine expr lowering failure");
+        
+        Value maskCst = builder.create<arith::ConstantIndexOp>(loc, rhsValue - 1);
+        return builder.create<arith::AndIOp>(loc, lhs, maskCst);
+      }
     }
 
     auto lhs = visit(expr.getLHS());
diff --git a/mlir/test/Conversion/AffineToStandard/lower-affine.mlir b/mlir/test/Conversion/AffineToStandard/lower-affine.mlir
index 550ea71882e14..07f7c64fe6ea5 100644
--- a/mlir/test/Conversion/AffineToStandard/lower-affine.mlir
+++ b/mlir/test/Conversion/AffineToStandard/lower-affine.mlir
@@ -927,3 +927,12 @@ func.func @affine_parallel_with_reductions_i64(%arg0: memref<3x3xi64>, %arg1: me
 // CHECK:      scf.reduce.return %[[RES]] : i64
 // CHECK:    }
 // CHECK:  }
+
+#map_mod_8 = affine_map<(i) -> (i mod 8)>
+// CHECK-LABEL: func @affine_apply_mod_8
+func.func @affine_apply_mod_8(%arg0 : index) -> (index) {
+  // CHECK-NEXT: %[[c7:.*]] = arith.constant 7 : index
+  // CHECK-NEXT: %[[v0:.*]] = arith.andi %arg0, %[[c7]] : index
+  %0 = affine.apply #map_mod_8 (%arg0)
+  return %0 : index
+}

@github-actions

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions cpp -- mlir/lib/Dialect/Affine/Utils/Utils.cpp
View the diff from clang-format here.
diff --git a/mlir/lib/Dialect/Affine/Utils/Utils.cpp b/mlir/lib/Dialect/Affine/Utils/Utils.cpp
index de9c78747..0cffe52dd 100644
--- a/mlir/lib/Dialect/Affine/Utils/Utils.cpp
+++ b/mlir/lib/Dialect/Affine/Utils/Utils.cpp
@@ -81,21 +81,24 @@ public:
   ///             negative = a < 0 in
   ///         select negative, remainder + b, remainder.
   ///
-  /// Special case for power of 2: use bitwise AND (x & (n-1)) for non-negative x.
+  /// Special case for power of 2: use bitwise AND (x & (n-1)) for non-negative
+  /// x.
   Value visitModExpr(AffineBinaryOpExpr expr) {
     if (auto rhsConst = dyn_cast<AffineConstantExpr>(expr.getRHS())) {
       if (rhsConst.getValue() <= 0) {
         emitError(loc, "modulo by non-positive value is not supported");
         return nullptr;
       }
-      
-      // Special case: x mod n where n is a power of 2 can be optimized to x & (n-1)
+
+      // Special case: x mod n where n is a power of 2 can be optimized to x &
+      // (n-1)
       int64_t rhsValue = rhsConst.getValue();
       if (rhsValue > 0 && (rhsValue & (rhsValue - 1)) == 0) {
         auto lhs = visit(expr.getLHS());
         assert(lhs && "unexpected affine expr lowering failure");
-        
-        Value maskCst = builder.create<arith::ConstantIndexOp>(loc, rhsValue - 1);
+
+        Value maskCst =
+            builder.create<arith::ConstantIndexOp>(loc, rhsValue - 1);
         return builder.create<arith::AndIOp>(loc, lhs, maskCst);
       }
     }

@sherylll sherylll force-pushed the affine/modulo-optimization branch from 9792917 to db4de47 Compare June 30, 2025 08:22
Member

@Groverkss Groverkss left a comment


Why would we not do this on arith.remsi / arith.remui?

@sherylll
Author

sherylll commented Jul 1, 2025

@Groverkss you mean add a canonicalizer to arith.remsi / arith.remui? In that case, do we also need to handle cmp + select?

The case I have in mind is incrementing x starting from 0 with a step of 1, where x mod 2 is used to index one of the ping/pong buffers. The current canonicalization doesn't seem to optimize for the LHS being non-negative, or the RHS being a power of 2.

@sherylll
Author

sherylll commented Jul 4, 2025

According to https://mlir.llvm.org/docs/Dialects/Affine/#affinefor-affineaffineforop: mod is the modulo operation: since its second argument is always positive, its results are always positive in our usage.

mod gets conservatively lowered into remsi, cmp, add, select. remsi produces a negative result for a negative LHS, so the power-of-2 optimization to a single and cannot be applied. Optimizing the power-of-2 case for remsi would require right shift, left shift, sub, and, select.
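To make that concrete: in two's complement, the single AND computes the always-non-negative floor-mod result even for a negative LHS, while srem alone does not, which is why affine mod can lower straight to andi while a remsi canonicalization would have to match the full remsi + cmp + add + select pattern. A small C++ sketch (`floorMod` is a hypothetical helper mirroring the cmp/add/select the current lowering emits):

```cpp
#include <cassert>
#include <cstdint>

// Floor-mod as affine `mod` defines it: the result is in [0, n) for n > 0.
int64_t floorMod(int64_t a, int64_t n) {
  int64_t r = a % n;         // C++ % is srem: the sign follows the dividend.
  return r < 0 ? r + n : r;  // The cmp + add + select the lowering emits.
}
```

For example, floorMod(-3, 8) is 5, and in two's complement (-3) & 7 is also 5, whereas (-3) % 8 is -3.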

I guess we could implement different canonicalizers for arith.remsi and arith.remui, but optimizing directly on mod seems a cleaner solution to me, if the power-of-2 optimization is considered necessary...

@bondhugula
Contributor

bondhugula commented Oct 6, 2025

According to https://mlir.llvm.org/docs/Dialects/Affine/#affinefor-affineaffineforop: mod is the modulo operation: since its second argument is always positive, its results are always positive in our usage.

mod gets conservatively lowered into remsi, cmp, add, select. remsi produces a negative result for a negative LHS, so the power-of-2 optimization to a single and cannot be applied. Optimizing the power-of-2 case for remsi would require right shift, left shift, sub, and, select.

I guess we could implement different canonicalizers for arith.remsi and arith.remui, but optimizing directly on mod seems a cleaner solution to me, if the power-of-2 optimization is considered necessary...

You are right - this is accurate. The result of an affine mod expression is guaranteed to be positive. So, exploiting this information right away will lead to a single andi and ensure that we won't need a multi-op canonicalization involving rem + cmp + add + select.

@bondhugula
Contributor

@Groverkss you mean add a canonicalizer to arith.remsi / arith.remui? In that case, do we also need to handle cmp + select?

The case I have in mind is incrementing x starting from 0 with a step of 1, where x mod 2 is used to index one of the ping/pong buffers. The current canonicalization doesn't seem to optimize for the LHS being non-negative, or the RHS being a power of 2.

If you had x mod 2, with x being a non-negative IV, the existing range analysis should conclude that the result of remsi is always positive, optimizing away the cmp + select. But that doesn't accomplish the better lowering you are enabling here, because even if the LHS is negative, we can still generate a single andi for a mod expression in affine maps. So, if it's a canonicalization, it will have to look at remsi + cmp + add + select.

I'm actually surprised we missed this optimization for 6 years! :-) I'm in favor of adding this - perhaps under a flag if needed.

CC: @ftynse for review as well.

@bondhugula bondhugula dismissed Groverkss’s stale review October 7, 2025 13:10

Change request comment already responded to, and it's time to take another look. Rerequesting review.

@bondhugula bondhugula requested a review from Groverkss October 7, 2025 13:10
func.func @affine_apply_mod_8(%arg0 : index) -> (index) {
// CHECK-NEXT: %[[c7:.*]] = arith.constant 7 : index
// CHECK-NEXT: %[[v0:.*]] = arith.andi %arg0, %[[c7]] : index
%0 = affine.apply #map_mod_8 (%arg0)
Contributor


You can put the map inline for better readability.

// CHECK: }

#map_mod_8 = affine_map<(i) -> (i mod 8)>
// CHECK-LABEL: func @affine_apply_mod_8
Contributor


Can you also check that using ... mod 1 doesn't lead to any unexpected behavior?
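For what it's worth, `mod 1` also falls into the power-of-two path (1 & 0 == 0 passes the check), and the resulting mask is 0, so `x & 0` yields 0 for every `x`, which matches `x mod 1 = 0`. A quick C++ sanity check of that reasoning (`maskMod` is a hypothetical stand-in for the lowered `andi`):

```cpp
#include <cassert>
#include <cstdint>

// With n = 1, the mask n - 1 is 0, so x & 0 == 0 for every x,
// which agrees with x mod 1 == 0 (even for negative x).
int64_t maskMod(int64_t x, int64_t n) { return x & (n - 1); }
```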

}

// Special case: x mod n where n is a power of 2 can be optimized to x &
// (n-1)
Contributor


Terminate all comments with a full stop. LLVM style.

/// negative = a < 0 in
/// select negative, remainder + b, remainder.
///
/// Special case for power of 2: use bitwise AND (x & (n-1)) for non-negative
Contributor


power of 2 RHS

// Special case: x mod n where n is a power of 2 can be optimized to x &
// (n-1)
int64_t rhsValue = rhsConst.getValue();
if (rhsValue > 0 && (rhsValue & (rhsValue - 1)) == 0) {
Contributor


But we've already returned for all RHS values <= 0 at L90; rhsValue is guaranteed to be positive here, so this check is unnecessary.


// Special case: x mod n where n is a power of 2 can be optimized to x &
// (n-1)
int64_t rhsValue = rhsConst.getValue();
Contributor


Move this assignment to above L88 to avoid multiple calls to getValue().

@Groverkss
Member

Sorry, I didn't look at this until I got a "rerequest review" notification. Thanks for answering why it cannot be done on remsi/remui; that sounds fair, and I would've dismissed my review.

If you had x mod 2, with x being a non-negative IV, the existing range analysis should conclude that the result of remsi is always positive, optimizing away the cmp + select. But that doesn't accomplish the better lowering you are enabling here, because even if the LHS is negative, we can still generate a single andi for a mod expression in affine maps. So, if it's a canonicalization, it will have to look at remsi + cmp + add + select.

So this optimization works for x mod 2 even if we have no range information on x? Then we should plumb a math.mod operation and implement this canonicalization on that. We already have floor: https://mlir.llvm.org/docs/Dialects/MathOps/#mathfloor-mathfloorop, so it should be okay to have mod too.

I usually prefer lowering to an op and then letting the op canonicalize itself to something better, based on the lowered op's properties, rather than special-casing the lowering. But that's a preference and I don't know what the correct way to do it is, so I'll not block this.

@bondhugula
Contributor

bondhugula commented Oct 7, 2025

Sorry, I didn't look at this until I got a "rerequest review" notification. Thanks for answering why it cannot be done on remsi/remui; that sounds fair, and I would've dismissed my review.

If you had x mod 2, with x being a non-negative IV, the range analysis that exists should conclude that the result of remsi is always positive, optimizing away the cmp + select. But that doesn't accomplish the better lowering you are enabling here -- because if the LHS is negative, we can still generate a single andi for a mod expression in affine maps. So, if it's a canonicalization, it will have to look at remsi + cmp + add + select.

So this optimization works for x mod 2 even if we have no range information on x?

Absolutely. No range info is needed - this PR already does it.

Then we should plumb a math.mod operation and implement this canonicalization on that. We already have floor: https://mlir.llvm.org/docs/Dialects/MathOps/#mathfloor-mathfloorop, so it should be okay to have mod too.

I usually prefer lowering to an op and then letting the op canonicalize itself to something better, based on the lowered op's properties, rather than special-casing the lowering. But that's a preference and I don't know what the correct way to do it is, so I'll not block this.

Typically, an op shouldn't be added to such a dialect (like math) without a lowering to LLVM implemented. Adding to math would also require a separate discussion. I wouldn't block this straightforward and elegant improvement on that. Instead, when such an op is ready to be added, with its LLVM lowering also implemented, we can move this lowering to it. That won't break current paths. Canonicalization on that op would ideally be one more PR.

@bondhugula
Contributor

@sherylll - are you available to take this forward?
