[InstCombine] Avoid unprofitable add with remainder transform #147319
Conversation
@llvm/pr-subscribers-llvm-transforms

Author: Nikita Popov (nikic)

Changes

If C1 is 1, this will end up replacing the remainder with a multiply and a longer dependency chain. This is clearly unprofitable in the case where the remainder is an `and`.

Fixes #147176.
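To make the pattern concrete: the code touched below matches `(X / C0) * C1 + (X % C0) * C2` and rewrites it to `X * C2 + (X / C0) * (C1 - C2 * C0)` (hence `NewC = C1 - C2 * C0`). Here is a minimal sketch of the problematic `C1 == 1`, power-of-two-divisor case, hand-written for illustration rather than taken from the PR's tests:

```llvm
; Input: C0 = 16, C1 = C2 = 1, i.e. (X / 16) + (X % 16).
; The lshr and the and execute in parallel, and the and is a cheap op.
define i32 @before(i32 %x) {
  %div = lshr i32 %x, 4
  %rem = and i32 %x, 15
  %add = add i32 %div, %rem
  ret i32 %add
}

; What the transform would produce: X + (X >> 4) * (1 - 16).
; The and is gone, but the new mul depends on the lshr, so the critical
; path grows from lshr -> add to lshr -> mul -> add.
define i32 @after(i32 %x) {
  %div = lshr i32 %x, 4
  %mul = mul i32 %div, -15
  %add = add i32 %mul, %x
  ret i32 %add
}
```

Full diff: https://github.com/llvm/llvm-project/pull/147319.diff

2 Files Affected: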
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
index 1ba548b6ff062..1ad0a9c488e80 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
@@ -1193,8 +1193,10 @@ Value *InstCombinerImpl::SimplifyAddWithRemainder(BinaryOperator &I) {
}
Value *DivOpV;
APInt DivOpC;
+ // The transform is valid for C1==1, but not profitable.
if (MatchRem(Rem, X, C0, IsSigned) &&
- MatchDiv(Div, DivOpV, DivOpC, IsSigned) && X == DivOpV && C0 == DivOpC) {
+ MatchDiv(Div, DivOpV, DivOpC, IsSigned) && X == DivOpV && C0 == DivOpC &&
+ !C1.isOne()) {
APInt NewC = C1 - C2 * C0;
if (!NewC.isZero() && !Rem->hasOneUse())
return nullptr;
diff --git a/llvm/test/Transforms/InstCombine/add4.ll b/llvm/test/Transforms/InstCombine/add4.ll
index 0e97deb4d98ad..f766ccf651aa2 100644
--- a/llvm/test/Transforms/InstCombine/add4.ll
+++ b/llvm/test/Transforms/InstCombine/add4.ll
@@ -289,3 +289,89 @@ entry:
%add = add i32 %shl, %rem
ret i32 %add
}
+
+define i32 @fold_add_udiv_urem_no_mul(i32 noundef %val) {
+; CHECK-LABEL: @fold_add_udiv_urem_no_mul(
+; CHECK-NEXT: [[DIV:%.*]] = udiv i32 [[VAL:%.*]], 10
+; CHECK-NEXT: [[REM:%.*]] = urem i32 [[VAL]], 10
+; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i32 [[DIV]], [[REM]]
+; CHECK-NEXT: ret i32 [[ADD]]
+;
+ %div = udiv i32 %val, 10
+ %rem = urem i32 %val, 10
+ %add = add i32 %div, %rem
+ ret i32 %add
+}
+
+define i32 @fold_add_udiv_urem_rem_mul(i32 noundef %val) {
+; CHECK-LABEL: @fold_add_udiv_urem_rem_mul(
+; CHECK-NEXT: [[DIV:%.*]] = udiv i32 [[VAL:%.*]], 10
+; CHECK-NEXT: [[REM:%.*]] = urem i32 [[VAL]], 10
+; CHECK-NEXT: [[MUL:%.*]] = mul nuw nsw i32 [[REM]], 3
+; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i32 [[DIV]], [[MUL]]
+; CHECK-NEXT: ret i32 [[ADD]]
+;
+ %div = udiv i32 %val, 10
+ %rem = urem i32 %val, 10
+ %mul = mul i32 %rem, 3
+ %add = add i32 %div, %mul
+ ret i32 %add
+}
+
+define i32 @fold_add_udiv_urem_pow2_no_mul(i32 noundef %arg) {
+; CHECK-LABEL: @fold_add_udiv_urem_pow2_no_mul(
+; CHECK-NEXT: [[LSHR:%.*]] = lshr i32 [[ARG:%.*]], 4
+; CHECK-NEXT: [[AND:%.*]] = and i32 [[ARG]], 15
+; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i32 [[LSHR]], [[AND]]
+; CHECK-NEXT: ret i32 [[ADD]]
+;
+ %lshr = lshr i32 %arg, 4
+ %and = and i32 %arg, 15
+ %add = add i32 %lshr, %and
+ ret i32 %add
+}
+
+define i32 @fold_add_udiv_urem_pow2_div_mul(i32 noundef %arg) {
+; CHECK-LABEL: @fold_add_udiv_urem_pow2_div_mul(
+; CHECK-NEXT: [[LSHR:%.*]] = lshr i32 [[ARG:%.*]], 4
+; CHECK-NEXT: [[TMP1:%.*]] = mul i32 [[LSHR]], -13
+; CHECK-NEXT: [[ADD:%.*]] = add i32 [[TMP1]], [[ARG]]
+; CHECK-NEXT: ret i32 [[ADD]]
+;
+ %lshr = lshr i32 %arg, 4
+ %mul = mul i32 %lshr, 3
+ %and = and i32 %arg, 15
+ %add = add i32 %mul, %and
+ ret i32 %add
+}
+
+define i32 @fold_add_udiv_urem_pow2_rem_mul(i32 noundef %arg) {
+; CHECK-LABEL: @fold_add_udiv_urem_pow2_rem_mul(
+; CHECK-NEXT: [[LSHR:%.*]] = lshr i32 [[ARG:%.*]], 4
+; CHECK-NEXT: [[AND:%.*]] = and i32 [[ARG]], 15
+; CHECK-NEXT: [[MUL:%.*]] = mul nuw nsw i32 [[AND]], 3
+; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i32 [[LSHR]], [[MUL]]
+; CHECK-NEXT: ret i32 [[ADD]]
+;
+ %lshr = lshr i32 %arg, 4
+ %and = and i32 %arg, 15
+ %mul = mul i32 %and, 3
+ %add = add i32 %lshr, %mul
+ ret i32 %add
+}
+
+define i32 @fold_add_udiv_urem_pow2_both_mul(i32 noundef %arg) {
+; CHECK-LABEL: @fold_add_udiv_urem_pow2_both_mul(
+; CHECK-NEXT: [[LSHR:%.*]] = lshr i32 [[ARG:%.*]], 4
+; CHECK-NEXT: [[TMP1:%.*]] = mul i32 [[ARG]], 3
+; CHECK-NEXT: [[TMP2:%.*]] = mul i32 [[LSHR]], -41
+; CHECK-NEXT: [[ADD:%.*]] = add i32 [[TMP2]], [[TMP1]]
+; CHECK-NEXT: ret i32 [[ADD]]
+;
+ %lshr = lshr i32 %arg, 4
+ %mul1 = mul i32 %lshr, 7
+ %and = and i32 %arg, 15
+ %mul2 = mul i32 %and, 3
+ %add = add i32 %mul1, %mul2
+ ret i32 %add
+}
I've added a special case where the divisor is 2, as that gets beneficial followup folds. On further consideration, I think it probably generally makes sense to do the transform for the cases using actual div/rem. For constant divisors we're not going to use actual divrem instructions anyway, and after playing around with codegen for different cases a bit, I think the mul ends up being marginally better in most cases. |
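To see why a divisor of 2 is special, here is a sketch of the follow-up fold the comment alludes to (my reading; the exact fold sequence is assumed, not verified against InstCombine's actual output):

```llvm
; (X / 2) + (X % 2): three instructions.
define i32 @before(i32 %x) {
  %div = lshr i32 %x, 1
  %rem = and i32 %x, 1
  %add = add i32 %div, %rem
  ret i32 %add
}

; The transform yields X + (X >> 1) * (1 - 2), and the multiply by -1
; is expected to fold into a subtract, leaving just two instructions.
define i32 @after(i32 %x) {
  %div = lshr i32 %x, 1
  %sub = sub i32 %x, %div
  ret i32 %sub
}
```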
Updated to narrow the exclusion to just the unsigned, power-of-two (but not 2) divisor case, so this should now only exclude the specific case of converting an `and` to a `mul`.
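Concretely, the narrowed rule should behave like this sketch (inferred from the comment above; the revised diff itself is not shown on this page):

```llvm
; Unsigned power-of-two divisor: the remainder is a single and, so the
; transform is now skipped and this stays as written.
define i32 @still_excluded(i32 %x) {
  %div = lshr i32 %x, 4
  %rem = and i32 %x, 15
  %add = add i32 %div, %rem
  ret i32 %add
}

; Non-power-of-two constant divisor: there is no cheap and to lose, and
; codegen expands the urem via multiplication anyway, so the fold fires
; even with C1 == 1: (X /u 10) + (X %u 10) becomes X + (X /u 10) * -9.
define i32 @now_folded(i32 %x) {
  %div = udiv i32 %x, 10
  %mul = mul i32 %div, -9
  %add = add i32 %mul, %x
  ret i32 %add
}
```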
dtcxzyw left a comment:
LGTM.
Updated description: If C1 is 1 and we're working with a power-of-two divisor, this will end up replacing the `and` for the remainder with a multiply and a longer dependency chain. Fixes #147176.