@fhahn fhahn commented Sep 21, 2025

Currently we generate (S|U)Max(1, Op) for Op >= 1. This may discard divisibility info of Op. This patch rewrites such SMax/UMax expressions to use a common power-of-two multiple of all non-constant operands instead of 1.

@llvmbot llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label Sep 21, 2025
llvmbot commented Sep 21, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Florian Hahn (fhahn)

Changes

Currently we generate (S|U)Max(1, Op) for Op >= 1. This may discard divisibility info of Op. This patch rewrites such SMax/UMax expressions to use a common power-of-two multiple of all non-constant operands instead of 1.


Full diff: https://github.com/llvm/llvm-project/pull/160012.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/ScalarEvolution.cpp (+22-2)
  • (modified) llvm/test/Analysis/ScalarEvolution/trip-count-minmax.ll (+2-2)
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index b08399b381f34..ee1f92a4197e8 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -15850,12 +15850,17 @@ void ScalarEvolution::LoopGuards::collectFromBlock(
         To = SE.getUMaxExpr(FromRewritten, RHS);
         if (auto *UMin = dyn_cast<SCEVUMinExpr>(FromRewritten))
           EnqueueOperands(UMin);
+        if (RHS->isOne())
+          ExprsToRewrite.push_back(From);
         break;
       case CmpInst::ICMP_SGT:
       case CmpInst::ICMP_SGE:
         To = SE.getSMaxExpr(FromRewritten, RHS);
-        if (auto *SMin = dyn_cast<SCEVSMinExpr>(FromRewritten))
+        if (auto *SMin = dyn_cast<SCEVSMinExpr>(FromRewritten)) {
           EnqueueOperands(SMin);
+        }
+        if (RHS->isOne())
+          ExprsToRewrite.push_back(From);
         break;
       case CmpInst::ICMP_EQ:
         if (isa<SCEVConstant>(RHS))
@@ -15986,7 +15991,22 @@ void ScalarEvolution::LoopGuards::collectFromBlock(
     for (const SCEV *Expr : ExprsToRewrite) {
       const SCEV *RewriteTo = Guards.RewriteMap[Expr];
       Guards.RewriteMap.erase(Expr);
-      Guards.RewriteMap.insert({Expr, Guards.rewrite(RewriteTo)});
+      const SCEV *Rewritten = Guards.rewrite(RewriteTo);
+
+      // Try to strengthen divisibility of SMax/UMax expressions coming from >=
+      // 1 conditions.
+      if (auto *SMax = dyn_cast<SCEVSMaxExpr>(Rewritten)) {
+        unsigned MinTrailingZeros = SE.getMinTrailingZeros(SMax->getOperand(1));
+        for (const SCEV *Op : drop_begin(SMax->operands(), 2))
+          MinTrailingZeros =
+              std::min(MinTrailingZeros, SE.getMinTrailingZeros(Op));
+        if (MinTrailingZeros != 0)
+          Rewritten = SE.getSMaxExpr(
+              SE.getConstant(APInt(SMax->getType()->getScalarSizeInBits(), 1)
+                                 .shl(MinTrailingZeros)),
+              SMax);
+      }
+      Guards.RewriteMap.insert({Expr, Rewritten});
     }
   }
 }
diff --git a/llvm/test/Analysis/ScalarEvolution/trip-count-minmax.ll b/llvm/test/Analysis/ScalarEvolution/trip-count-minmax.ll
index 8d091a00ed4b9..d38010403dad7 100644
--- a/llvm/test/Analysis/ScalarEvolution/trip-count-minmax.ll
+++ b/llvm/test/Analysis/ScalarEvolution/trip-count-minmax.ll
@@ -61,7 +61,7 @@ define void @umin(i32 noundef %a, i32 noundef %b) {
 ; CHECK-NEXT:  Loop %for.body: backedge-taken count is (-1 + ((2 * %a) umin (4 * %b)))
 ; CHECK-NEXT:  Loop %for.body: constant max backedge-taken count is i32 2147483646
 ; CHECK-NEXT:  Loop %for.body: symbolic max backedge-taken count is (-1 + ((2 * %a) umin (4 * %b)))
-; CHECK-NEXT:  Loop %for.body: Trip multiple is 1
+; CHECK-NEXT:  Loop %for.body: Trip multiple is 2
 ;
 ; void umin(unsigned a, unsigned b) {
 ;   a *= 2;
@@ -157,7 +157,7 @@ define void @smin(i32 noundef %a, i32 noundef %b) {
 ; CHECK-NEXT:  Loop %for.body: backedge-taken count is (-1 + ((2 * %a)<nsw> smin (4 * %b)<nsw>))
 ; CHECK-NEXT:  Loop %for.body: constant max backedge-taken count is i32 2147483646
 ; CHECK-NEXT:  Loop %for.body: symbolic max backedge-taken count is (-1 + ((2 * %a)<nsw> smin (4 * %b)<nsw>))
-; CHECK-NEXT:  Loop %for.body: Trip multiple is 1
+; CHECK-NEXT:  Loop %for.body: Trip multiple is 2
 ;
 ; void smin(signed a, signed b) {
 ;   a *= 2;

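The strengthening step in the diff above can be illustrated outside LLVM with plain integers. This is only a sketch, not the patch's code: `Operand`, `strengthenedConstant`, and `sameMax` are hypothetical names, and `Operand::MinTrailingZeros` stands in for what `SE.getMinTrailingZeros` would report for a SCEV operand.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a SCEV operand: only the number of low bits
// known to be zero is tracked.
struct Operand {
  unsigned MinTrailingZeros;
};

// Returns the constant that can replace the `1` in max(1, Ops...) when the
// max is known to be >= 1: the largest power of two dividing every operand.
inline uint64_t strengthenedConstant(const std::vector<Operand> &Ops) {
  if (Ops.empty())
    return 1;
  unsigned MinTZ = 63; // cap so the shift below stays well defined
  for (const Operand &Op : Ops)
    MinTZ = std::min(MinTZ, Op.MinTrailingZeros);
  return uint64_t{1} << MinTZ;
}

// Concrete check on one value: does max(1, V) equal max(2^k, V)?
// This holds for nonzero multiples of 2^k and fails for V == 0.
inline bool sameMax(uint64_t Value, uint64_t PowerOfTwo) {
  return std::max<uint64_t>(1, Value) == std::max(PowerOfTwo, Value);
}
```

For operands shaped like (2 * %a) and (4 * %b) from the test file, the known trailing zeros are 1 and 2, so the sketch yields the constant 2, which matches the trip multiple of 2 in the updated CHECK lines.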
fhahn added a commit to fhahn/llvm-project that referenced this pull request Sep 22, 2025
When re-writing SCEVAddExprs to apply information from guards, check if
we have information for the expression itself. If so, apply it.

When we have an expression of the form (Const + A), check if we
have guard info for (Const + 1 + A) and use it. This is needed to avoid
regressions in a few cases, where we have BTCs with a subtracted
constant.

Rewriting expressions could cause regressions, e.g. when comparing 2
SCEV expressions where we are only able to rewrite one side, but I could
not find any cases where this happens more with this patch in practice.

Depends on llvm#160012 (included in
PR)

Proofs for some of the test changes: https://alive2.llvm.org/ce/z/RPX6t_
fhahn commented Sep 22, 2025

No differences on llvm-opt-benchmark (dtcxzyw/llvm-opt-benchmark#2846), but there are a few changes on a large C/C++ corpus with unrolling and vectorization enabled.

@nikic nikic left a comment

The idea makes sense to me, but TBH I'm pretty lost in the code structure here. Why are we fixing this up after the fact rather than creating the umax/umin with the larger value from the start?

Also would it make sense to do something more generic during SCEV construction here? Or do we expect this to only be useful for guards?

fhahn added a commit that referenced this pull request Sep 22, 2025
@fhahn fhahn force-pushed the scev-guards-smax-umax-divisibility branch from 7d05774 to 941d620 Compare September 22, 2025 20:03
@fhahn fhahn left a comment

> The idea makes sense to me, but TBH I'm pretty lost in the code structure here. Why are we fixing this up after the fact rather than creating the umax/umin with the larger value from the start?

The benefit of delaying the rewrite until we have collected information from all loop guards is that the code becomes independent of the order of the guards.

We could have 3 guards, establishing

  • umax(%a, %b) > 0
  • %a multiple of 2
  • %b multiple of 2

When we construct umax(1, %a, %b) for the first condition, we may not yet have the information available that %a and %b are multiples of 2.

But once we have collected all information, we can rewrite umax(1, %a, %b) to something like umax(1, 2 * %a / 2, 2 * %b / 2) and get the common multiple using the info from the guards.

Not sure if there's a nicer way to keep things independent of the guard order.
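A minimal sketch of the collect-first, rewrite-later idea described above, using plain data instead of SCEV expressions. Everything here is hypothetical and for illustration only: `Guard`, its fields, and `umaxConstantAfterGuards` are not LLVM names, and divisibility is restricted to powers of two as in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical guard facts; names are illustrative, not LLVM's.
struct Guard {
  bool IsMaxPositive; // true: "umax(all variables) > 0"
  std::string Var;    // otherwise: Var is a multiple of 2^Log2
  unsigned Log2;
};

// Two-phase processing: collect every fact first, rewrite afterwards.
// Returns the constant C for umax(C, Vars...), or 0 if no bound is known.
inline uint64_t umaxConstantAfterGuards(const std::vector<Guard> &Guards,
                                        const std::vector<std::string> &Vars) {
  std::map<std::string, unsigned> KnownLog2; // phase 1: divisibility facts
  bool MaxPositive = false;
  for (const Guard &G : Guards) {
    if (G.IsMaxPositive)
      MaxPositive = true;
    else
      KnownLog2[G.Var] = std::max(KnownLog2[G.Var], G.Log2);
  }
  if (!MaxPositive || Vars.empty())
    return 0;
  unsigned MinLog2 = 63; // phase 2: common power of two over all operands
  for (const std::string &V : Vars)
    MinLog2 = std::min(MinLog2, KnownLog2[V]); // unknown variables count as 0
  return uint64_t{1} << MinLog2;
}
```

Because the constant is computed only after the loop over all guards, the result is the same whether the positivity guard appears before or after the divisibility guards.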

> Also would it make sense to do something more generic during SCEV construction here? Or do we expect this to only be useful for guards?

Hmm, I think the current code relies on the fact that the UMax/SMax with the constant is coming from a compare w/o the constant part on the left side.

Are there any particular cases you are thinking of on construction?

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Sep 22, 2025
nikic commented Sep 22, 2025

> Hmm, I think the current code relies on the fact that the UMax/SMax with the constant is coming from a compare w/o the constant part on the left side.
>
> Are there any particular cases you are thinking of on construction?

What I meant is that we can generally fold umax(C1, C2*x) with C2>C1 and x!=0 to umax(C2, C2*x). But I guess x!=0 is the critical part here -- we generally do not know that outside the guard context (or rather, can't ignore it outside the guard context).
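The fold mentioned above can be sanity-checked with a small brute-force sketch. The helper names `foldHolds` and `foldHoldsForNonZero` are hypothetical, and the sketch assumes that the product C2*x does not wrap, which it models by computing the product in 64 bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Checks umax(C1, C2*X) == umax(C2, C2*X) for one X.
// The product is computed in 64 bits, so it cannot wrap (an assumption of
// this sketch, not something the comment above states).
inline bool foldHolds(uint32_t C1, uint32_t C2, uint32_t X) {
  uint64_t Product = uint64_t{C2} * X;
  return std::max<uint64_t>(C1, Product) == std::max<uint64_t>(C2, Product);
}

// Checks the fold for every X in [1, MaxX]. MaxX must be below UINT32_MAX
// so the loop counter does not wrap.
inline bool foldHoldsForNonZero(uint32_t C1, uint32_t C2, uint32_t MaxX) {
  for (uint32_t X = 1; X <= MaxX; ++X)
    if (!foldHolds(C1, C2, X))
      return false;
  return true;
}
```

With C1 = 1 and C2 = 4 the fold holds for every nonzero x that was tried, but it fails for x = 0, where umax(1, 0) is 1 and umax(4, 0) is 4. That is the x != 0 condition the comment calls critical.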

fhahn commented Sep 23, 2025

> > Hmm, I think the current code relies on the fact that the UMax/SMax with the constant is coming from a compare w/o the constant part on the left side.
> > Are there any particular cases you are thinking of on construction?
>
> What I meant is that we can generally fold umax(C1, C2*x) with C2>C1 and x!=0 to umax(C2, C2*x). But I guess x!=0 is the critical part here -- we generally do not know that outside the guard context (or rather, can't ignore it outside the guard context).

Yep, I can try to see if this would also trigger in practice at construction, but we would still need the guard-specific logic w/o the != 0 check

preames commented Sep 23, 2025

> Yep, I can try to see if this would also trigger in practice at construction, but we would still need the guard-specific logic w/o the != 0 check

Your wording here triggered a thought. When phrased like this, this sounds a lot like SCEV construction under an assumption (or predicate) that x != 0. We already have a bunch of logic of this variety in PredicatedScalarEvolution and in our assumption handling. Is there a possibility for code sharing here?

(This may not be worth the work to actually do immediately. Not a blocking comment by any means.)

fhahn added a commit to fhahn/llvm-project that referenced this pull request Sep 24, 2025
When collecting information from loop guards, use UMax(1, %b - %a) for
ICMP NE %a, %b, if neither are constant.

This improves results in some cases, and will be even more useful
together with
 * llvm#160012
 * llvm#159942

https://alive2.llvm.org/ce/z/YyBvoT
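
The reasoning behind using UMax(1, %b - %a) for a not-equal guard can be sketched with wrapping 32-bit arithmetic: if %a != %b, the wrapped difference is nonzero, so taking the unsigned max with 1 does not change it. The helpers `wrappingDiff` and `guardedDiff` below are hypothetical names for illustration, not LLVM functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Models i32 arithmetic: unsigned subtraction wraps modulo 2^32.
inline uint32_t wrappingDiff(uint32_t A, uint32_t B) { return B - A; }

// umax(1, B - A), the form used when a guard establishes A != B.
inline uint32_t guardedDiff(uint32_t A, uint32_t B) {
  return std::max<uint32_t>(1u, wrappingDiff(A, B));
}
```

When the guard holds, `guardedDiff` returns the same value as `wrappingDiff`; the two differ only when A == B, which the guard rules out.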
fhahn commented Sep 24, 2025

> > Yep, I can try to see if this would also trigger in practice at construction, but we would still need the guard-specific logic w/o the != 0 check
>
> Your wording here triggered a thought. When phrased like this, this sounds a lot like SCEV construction under an assumption (or predicate) that x != 0. We already have a bunch of logic of this variety in PredicatedScalarEvolution and in our assumption handling. Is there a possibility for code sharing here?
>
> (This may not be worth the work to actually do immediately. Not a blocking comment by any means.)

Thanks, I need to think about this a bit more. With both PredicatedScalarEvolution and loop guards we rewrite SCEV expressions given extra information (runtime predicates and information from guards, respectively), but currently there doesn't seem to be much overlap in the types of expressions we rewrite, with PSE mostly focused on extends and forced AddRecs.

@fhahn fhahn force-pushed the scev-guards-smax-umax-divisibility branch from 941d620 to 59a8e0f Compare September 29, 2025 18:26
fhahn commented Sep 29, 2025

ping :)

@fhahn fhahn force-pushed the scev-guards-smax-umax-divisibility branch from 59a8e0f to 8e50ec5 Compare October 6, 2025 14:43
@fhahn fhahn force-pushed the scev-guards-smax-umax-divisibility branch from 502cef4 to 258fb8f Compare October 8, 2025 20:37
      case CmpInst::ICMP_UGE: {
        const SCEV *OpAlignedUp =
            DividesBy ? GetNextSCEVDividesByDivisor(RHS, DividesBy) : RHS;
        To = SE.getUMaxExpr(FromRewritten, OpAlignedUp);
Contributor

Actually, are the changes here necessary at all? It looks like we are already doing these next/prev divisor adjustments for RHS in the switch above this. Maybe the generalization of the divisor logic is sufficient?

Contributor Author

Yep, it looks like we get all the benefits from the patch with just the switch to getConstantMultiple: #162617

fhahn added a commit to fhahn/llvm-project that referenced this pull request Oct 9, 2025
Simplify and generalize the code to get a common constant multiple for
expressions when collecting guards, replacing the manual implementation.

Split off from llvm#160012.
fhahn added a commit that referenced this pull request Oct 9, 2025
…s. (#162617)

Simplify and generalize the code to get a common constant multiple for
expressions when collecting guards, replacing the manual implementation.

Split off from #160012.

PR: #162617
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 9, 2025
… from guards. (#162617)

Simplify and generalize the code to get a common constant multiple for
expressions when collecting guards, replacing the manual implementation.

Split off from llvm/llvm-project#160012.

PR: llvm/llvm-project#162617
fhahn commented Oct 9, 2025

Simplified version in #162617 handles all cases, closing for now

@fhahn fhahn closed this Oct 9, 2025
@fhahn fhahn deleted the scev-guards-smax-umax-divisibility branch October 9, 2025 10:23
svkeerthy pushed a commit that referenced this pull request Oct 9, 2025
…s. (#162617)

Simplify and generalize the code to get a common constant multiple for
expressions when collecting guards, replacing the manual implementation.

Split off from #160012.

PR: #162617
clingfei pushed a commit to clingfei/llvm-project that referenced this pull request Oct 10, 2025
…s. (llvm#162617)

Simplify and generalize the code to get a common constant multiple for
expressions when collecting guards, replacing the manual implementation.

Split off from llvm#160012.

PR: llvm#162617
DharuniRAcharya pushed a commit to DharuniRAcharya/llvm-project that referenced this pull request Oct 13, 2025
…s. (llvm#162617)

Simplify and generalize the code to get a common constant multiple for
expressions when collecting guards, replacing the manual implementation.

Split off from llvm#160012.

PR: llvm#162617
fhahn added a commit that referenced this pull request Oct 14, 2025
When collecting information from loop guards, use UMax(1, %b - %a) for
ICMP NE %a, %b, if neither are constant.

This improves results in some cases, and will be even more useful
together with
 * #160012
 * #159942

https://alive2.llvm.org/ce/z/YyBvoT

PR: #160500
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 14, 2025
…500)

When collecting information from loop guards, use UMax(1, %b - %a) for
ICMP NE %a, %b, if neither are constant.

This improves results in some cases, and will be even more useful
together with
 * llvm/llvm-project#160012
 * llvm/llvm-project#159942

https://alive2.llvm.org/ce/z/YyBvoT

PR: llvm/llvm-project#160500
akadutta pushed a commit to akadutta/llvm-project that referenced this pull request Oct 14, 2025
…s. (llvm#162617)

Simplify and generalize the code to get a common constant multiple for
expressions when collecting guards, replacing the manual implementation.

Split off from llvm#160012.

PR: llvm#162617
akadutta pushed a commit to akadutta/llvm-project that referenced this pull request Oct 14, 2025
When collecting information from loop guards, use UMax(1, %b - %a) for
ICMP NE %a, %b, if neither are constant.

This improves results in some cases, and will be even more useful
together with
 * llvm#160012
 * llvm#159942

https://alive2.llvm.org/ce/z/YyBvoT

PR: llvm#160500