Skip to content

Conversation

@m-sai
Copy link

@m-sai m-sai commented Jun 9, 2024

…on PHI

Add support for Induction PHI when calculating peel count.

The following cases are supported

void binary() {
  int x = 0;
  int y = 0;
  int a = 0;
  for (int i = 0; i < 1000; i++) {
    g(x);
    x = y;
    g(a);
    y = a;
    a = i + 2;
  }
}

…on PHI

Add support for Induction PHI when calculating peel count.

The following cases are supported
---
void binary() {
  int x = 0;
  int y = 0;
  int a = 0;
  for (int i = 0; i < 1000; i++) {
    g(x);
    x = y;
    g(a);
    y = a;
    a = i + 2;
  }
}
---
@github-actions
Copy link

github-actions bot commented Jun 9, 2024

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Jun 9, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Moriyuki Saito (m-sai)

Changes

…on PHI

Add support for Induction PHI when calculating peel count.

The following cases are supported

void binary() {
  int x = 0;
  int y = 0;
  int a = 0;
  for (int i = 0; i &lt; 1000; i++) {
    g(x);
    x = y;
    g(a);
    y = a;
    a = i + 2;
  }
}

Full diff: https://github.com/llvm/llvm-project/pull/94900.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/LoopPeel.cpp (+18-11)
  • (modified) llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll (+98)
diff --git a/llvm/lib/Transforms/Utils/LoopPeel.cpp b/llvm/lib/Transforms/Utils/LoopPeel.cpp
index e2516930d251b..bcf0c7b6b8960 100644
--- a/llvm/lib/Transforms/Utils/LoopPeel.cpp
+++ b/llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -13,6 +13,7 @@
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/Statistic.h"
+#include "llvm/Analysis/IVDescriptors.h"
 #include "llvm/Analysis/Loads.h"
 #include "llvm/Analysis/LoopInfo.h"
 #include "llvm/Analysis/LoopIterator.h"
@@ -160,7 +161,7 @@ class PhiAnalyzer {
 
   // Calculate the sufficient minimum number of iterations of the loop to peel
   // such that phi instructions become determined (subject to allowable limits)
-  std::optional<unsigned> calculateIterationsToPeel();
+  std::optional<unsigned> calculateIterationsToPeel(ScalarEvolution &SE);
 
 protected:
   using PeelCounter = std::optional<unsigned>;
@@ -175,7 +176,7 @@ class PhiAnalyzer {
 
   // Calculate the number of iterations after which the given value
   // becomes an invariant.
-  PeelCounter calculate(const Value &);
+  PeelCounter calculate(Value &, ScalarEvolution &SE);
 
   const Loop &L;
   const unsigned MaxIterations;
@@ -204,7 +205,7 @@ PhiAnalyzer::PhiAnalyzer(const Loop &L, unsigned MaxIterations)
 //           %y = phi(0, 5)
 //           %a = %y + 1
 //   G(%y) = Unknown otherwise (including phi not in header block)
-PhiAnalyzer::PeelCounter PhiAnalyzer::calculate(const Value &V) {
+PhiAnalyzer::PeelCounter PhiAnalyzer::calculate(Value &V, ScalarEvolution &SE) {
   // If we already know the answer, take it from the map.
   auto I = IterationsToInvariance.find(&V);
   if (I != IterationsToInvariance.end())
@@ -217,15 +218,21 @@ PhiAnalyzer::PeelCounter PhiAnalyzer::calculate(const Value &V) {
   if (L.isLoopInvariant(&V))
     // Loop invariant so known at start.
     return (IterationsToInvariance[&V] = 0);
-  if (const PHINode *Phi = dyn_cast<PHINode>(&V)) {
+  if (PHINode *Phi = dyn_cast<PHINode>(&V)) {
     if (Phi->getParent() != L.getHeader()) {
       // Phi is not in header block so Unknown.
       assert(IterationsToInvariance[&V] == Unknown && "unexpected value saved");
       return Unknown;
     }
+
+    // If Induction PHI, register as a starting point.
+    InductionDescriptor ID;
+    if (InductionDescriptor::isInductionPHI(Phi, &L, &SE, ID))
+      return (IterationsToInvariance[&V] = 0);
+
     // We need to analyze the input from the back edge and add 1.
     Value *Input = Phi->getIncomingValueForBlock(L.getLoopLatch());
-    PeelCounter Iterations = calculate(*Input);
+    PeelCounter Iterations = calculate(*Input, SE);
     assert(IterationsToInvariance[Input] == Iterations &&
            "unexpected value saved");
     return (IterationsToInvariance[Phi] = addOne(Iterations));
@@ -233,17 +240,17 @@ PhiAnalyzer::PeelCounter PhiAnalyzer::calculate(const Value &V) {
   if (const Instruction *I = dyn_cast<Instruction>(&V)) {
     if (isa<CmpInst>(I) || I->isBinaryOp()) {
       // Binary instructions get the max of the operands.
-      PeelCounter LHS = calculate(*I->getOperand(0));
+      PeelCounter LHS = calculate(*I->getOperand(0), SE);
       if (LHS == Unknown)
         return Unknown;
-      PeelCounter RHS = calculate(*I->getOperand(1));
+      PeelCounter RHS = calculate(*I->getOperand(1), SE);
       if (RHS == Unknown)
         return Unknown;
       return (IterationsToInvariance[I] = {std::max(*LHS, *RHS)});
     }
     if (I->isCast())
       // Cast instructions get the value of the operand.
-      return (IterationsToInvariance[I] = calculate(*I->getOperand(0)));
+      return (IterationsToInvariance[I] = calculate(*I->getOperand(0), SE));
   }
   // TODO: handle more expressions
 
@@ -252,10 +259,10 @@ PhiAnalyzer::PeelCounter PhiAnalyzer::calculate(const Value &V) {
   return Unknown;
 }
 
-std::optional<unsigned> PhiAnalyzer::calculateIterationsToPeel() {
+std::optional<unsigned> PhiAnalyzer::calculateIterationsToPeel(ScalarEvolution &SE) {
   unsigned Iterations = 0;
   for (auto &PHI : L.getHeader()->phis()) {
-    PeelCounter ToInvariance = calculate(PHI);
+    PeelCounter ToInvariance = calculate(PHI, SE);
     if (ToInvariance != Unknown) {
       assert(*ToInvariance <= MaxIterations && "bad result in phi analysis");
       Iterations = std::max(Iterations, *ToInvariance);
@@ -594,7 +601,7 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize,
   // Phis into invariants.
   if (MaxPeelCount > DesiredPeelCount) {
     // Check how many iterations are useful for resolving Phis
-    auto NumPeels = PhiAnalyzer(*L, MaxPeelCount).calculateIterationsToPeel();
+    auto NumPeels = PhiAnalyzer(*L, MaxPeelCount).calculateIterationsToPeel(SE);
     if (NumPeels)
       DesiredPeelCount = std::max(DesiredPeelCount, *NumPeels);
   }
diff --git a/llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll b/llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll
index e24eeef52de4e..3800c48954a6e 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll
@@ -197,3 +197,101 @@ for.body:
   %exitcond = icmp eq i32 %inc, 100000
   br i1 %exitcond, label %for.cond.cleanup, label %for.body
 }
+
+; Check that phi analysis can handle a binary operator with induction variable.
+define void @_Z6binaryv_induction() {
+; The phis become invariant through the chain of phis, with a unary
+; instruction on a loop invariant.  Check that the phis for x, a, and y
+; are removed since x is based on y, which is based on a, which is based
+; on a binary add of a phi and a constant.
+; Consider the calls to g:
+; First iteration: g(0), x=0, g(0), y=1, a=2
+; Second iteration: g(0), x=1, g(2), y=3(binary operator), a=3
+; Third iteration: g(1), x=3, g(3), y=4, a=4
+; Fourth iteration (and subsequent): g(3), x=4, g(4), y=5, a=6
+; Therefore, peeling 3 times removes the phi nodes.
+;
+; void g(int);
+; void binary() {
+;   int x = 0;
+;   int y = 0;
+;   int a = 0;
+;   for(int i = 0; i <100000; ++i) {
+;     g(x);
+;     x = y;
+;     g(a);
+;     y = a + 1;
+;     a = i + 2;
+;   }
+; }
+; CHECK-LABEL: @_Z6binaryv_induction(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[FOR_BODY_PEEL_BEGIN:%.*]]
+; CHECK:       for.body.peel.begin:
+; CHECK-NEXT:    br label [[FOR_BODY_PEEL:%.*]]
+; CHECK:       for.body.peel:
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext 0)
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext 0)
+; CHECK-NEXT:    [[ADD_PEEL:%.*]] = add nuw nsw i32 0, 2
+; CHECK-NEXT:    [[INC_PEEL:%.*]] = add nuw nsw i32 0, 1
+; CHECK-NEXT:    [[EXITCOND_PEEL:%.*]] = icmp ne i32 [[INC_PEEL]], 100000
+; CHECK-NEXT:    br i1 [[EXITCOND_PEEL]], label [[FOR_BODY_PEEL_NEXT:%.*]], label [[FOR_COND_CLEANUP:%.*]]
+; CHECK:       for.body.peel.next:
+; CHECK-NEXT:    br label [[FOR_BODY_PEEL2:%.*]]
+; CHECK:       for.body.peel2:
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext 0)
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext [[ADD_PEEL]])
+; CHECK-NEXT:    [[ADD_PEEL3:%.*]] = add nuw nsw i32 [[INC_PEEL]], 2
+; CHECK-NEXT:    [[INC_PEEL4:%.*]] = add nuw nsw i32 [[INC_PEEL]], 1
+; CHECK-NEXT:    [[EXITCOND_PEEL5:%.*]] = icmp ne i32 [[INC_PEEL4]], 100000
+; CHECK-NEXT:    br i1 [[EXITCOND_PEEL5]], label [[FOR_BODY_PEEL_NEXT1:%.*]], label [[FOR_COND_CLEANUP]]
+; CHECK:       for.body.peel.next1:
+; CHECK-NEXT:    br label [[FOR_BODY_PEEL7:%.*]]
+; CHECK:       for.body.peel7:
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext 0)
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext [[ADD_PEEL3]])
+; CHECK-NEXT:    [[ADD_PEEL8:%.*]] = add nuw nsw i32 [[INC_PEEL4]], 2
+; CHECK-NEXT:    [[INC_PEEL9:%.*]] = add nuw nsw i32 [[INC_PEEL4]], 1
+; CHECK-NEXT:    [[EXITCOND_PEEL10:%.*]] = icmp ne i32 [[INC_PEEL9]], 100000
+; CHECK-NEXT:    br i1 [[EXITCOND_PEEL10]], label [[FOR_BODY_PEEL_NEXT6:%.*]], label [[FOR_COND_CLEANUP]]
+; CHECK:       for.body.peel.next6:
+; CHECK-NEXT:    br label [[FOR_BODY_PEEL_NEXT11:%.*]]
+; CHECK:       for.body.peel.next11:
+; CHECK-NEXT:    br label [[ENTRY_PEEL_NEWPH:%.*]]
+; CHECK:       entry.peel.newph:
+; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
+; CHECK:       for.cond.cleanup.loopexit:
+; CHECK-NEXT:    br label [[FOR_COND_CLEANUP]]
+; CHECK:       for.cond.cleanup:
+; CHECK-NEXT:    ret void
+; CHECK:       for.body:
+; CHECK-NEXT:    [[I:%.*]] = phi i32 [ [[INC_PEEL9]], [[ENTRY_PEEL_NEWPH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[X:%.*]] = phi i32 [ [[ADD_PEEL]], [[ENTRY_PEEL_NEWPH]] ], [ [[Y:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[A:%.*]] = phi i32 [ [[ADD_PEEL8]], [[ENTRY_PEEL_NEWPH]] ], [ [[ADD:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[Y]] = phi i32 [ [[ADD_PEEL3]], [[ENTRY_PEEL_NEWPH]] ], [ [[A]], [[FOR_BODY]] ]
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext [[X]])
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext [[A]])
+; CHECK-NEXT:    [[ADD]] = add nuw nsw i32 [[I]], 2
+; CHECK-NEXT:    [[INC]] = add nuw nsw i32 [[I]], 1
+; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp ne i32 [[INC]], 100000
+; CHECK-NEXT:    br i1 [[EXITCOND]], label [[FOR_BODY]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], !llvm.loop [[LOOP3:![0-9]+]]
+;
+entry:
+  br label %for.body
+
+for.cond.cleanup:
+  ret void
+
+for.body:
+  %i = phi i32 [ 0, %entry ], [ %inc, %for.body ]
+  %x = phi i32 [ 0, %entry ], [ %y, %for.body ]
+  %a = phi i32 [ 0, %entry ], [ %add, %for.body ]
+  %y = phi i32 [ 0, %entry ], [ %a, %for.body ]
+  tail call void @_Z1gi(i32 signext %x)
+  tail call void @_Z1gi(i32 signext %a)
+  %add = add nuw nsw i32 %i, 2
+  %inc = add nuw nsw i32 %i, 1
+  %exitcond = icmp ne i32 %inc, 100000
+  br i1 %exitcond, label %for.body, label %for.cond.cleanup
+}
+

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really get what you're trying to achieve here. None of the induction variables involved become invariant (see https://c.godbolt.org/z/WP7KdM66d for the evolution).

Is the point that after some iterations the IVs end up being at fixed offsets from each other?

@m-sai
Copy link
Author

m-sai commented Jun 9, 2024

Thanks for your comment.

I thought of using loop peeling to convert if it can be simply expressed using induction variables after a few iterations.

Unfortunately, a small number of trip counts will result in full unrolling, which is illustrated in the following code example.
https://c.godbolt.org/z/eoecW1jvP

In that case, the status of each variable and argument per iteration is as follows
iteration 0: i=0, g(0,0,0,0), x=0, y=0, a=2
iteration 1: i=1, g(1,0,0,2), x=0, y=2, a=3
iteration 2: i=2, g(2,0,2,3), x=2, y=3, a=4
iteration 3: i=3, g(3,2,3,4), x=3, y=4, a=5
iteration 4: i=4, g(4,3,4,5), x=4, y=5, a=6

I think this is equivalent to the following code.

  // loop peeling part
  for (int i = 0; i < 3; i++) {
    g(i, x, y, a);
    x = y;
    y = a;
    a = i + 2;
  }
  // simple loop part
  for (int i = 3; i < 1000; i++) {
    g(i, i-1, i, i+1);
  }

@github-actions
Copy link

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 6b3e0002dfe0029487fc2f8f11f5d5fdc07a5e11 8cd566f056c3e31a5b535af7352f767fa95467c9 -- llvm/lib/Transforms/Utils/LoopPeel.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Transforms/Utils/LoopPeel.cpp b/llvm/lib/Transforms/Utils/LoopPeel.cpp
index bcf0c7b6b8..90d0328762 100644
--- a/llvm/lib/Transforms/Utils/LoopPeel.cpp
+++ b/llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -259,7 +259,8 @@ PhiAnalyzer::PeelCounter PhiAnalyzer::calculate(Value &V, ScalarEvolution &SE) {
   return Unknown;
 }
 
-std::optional<unsigned> PhiAnalyzer::calculateIterationsToPeel(ScalarEvolution &SE) {
+std::optional<unsigned>
+PhiAnalyzer::calculateIterationsToPeel(ScalarEvolution &SE) {
   unsigned Iterations = 0;
   for (auto &PHI : L.getHeader()->phis()) {
     PeelCounter ToInvariance = calculate(PHI, SE);

@madhur13490
Copy link
Contributor

Hi @m-sai
@nikic pointed me to this patch on https://discourse.llvm.org/t/missed-loop-peeling-and-thus-vectorization-opportunity/80616. This patch is indeed fixing the vectorization issue in the TSVC benchmark.

Overall, this patch looks good to me as I arrived at some of the changes myself in my local branch :) - adding PHI nodes as a starting point is sensible.

I have another test case from TSVC benchmark but I believe I can add it after you land this patch.

return Unknown;
}

// If Induction PHI, register as a starting point.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a comment explaining some more context would be useful. May be copy the comments from the test here too as they provide the motivation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@madhur13490 Thanks for your comment.
Comment added. How about something like this?

Copy link
Contributor

@madhur13490 madhur13490 Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Looks good.

@madhur13490
Copy link
Contributor

madhur13490 commented Aug 27, 2024

@m-sai
Have you measured the impact of the patch on key benchmarks? LLVM test-suite is a good example to make sure we don't regress.

@m-sai
Copy link
Author

m-sai commented Sep 2, 2024

@madhur13490
I found a bug that caused unnecessary peeling when running LLVM test-suite and have fixed it. I am currently checking to see if there are any problems with the fix.

@madhur13490
Copy link
Contributor

Hello @m-sai Any updates?

@madhur13490
Copy link
Contributor

Gentle ping @m-sai. Are you planning to progress this?

@m-sai
Copy link
Author

m-sai commented Oct 15, 2024

@madhur13490
Sorry for the delay in my reply. Fixing the bug is going to take a bit more time.

@hiraditya hiraditya requested a review from sebpop November 19, 2024 16:48
@kasuga-fj
Copy link
Contributor

Hi. I discussed this PR with @m-sai and we agreed that I will take over this. He also told me about the cases that cause unexpected peeling, so I'll submit a new PR once this is resolved.

kasuga-fj added a commit that referenced this pull request Aug 20, 2025
…21104)

LoopPeel currently considers PHI nodes that become loop invariants
through peeling. However, in some cases, peeling transforms PHI nodes
into induction variables (IVs), potentially enabling further
optimizations such as loop vectorization. For example:

```c
// TSVC s292
int im = N-1;
for (int i=0; i<N; i++) {
  a[i] = b[i] + b[im];
  im = i;
}
```

In this case, peeling one iteration converts `im` into an IV, allowing
it to be handled by the loop vectorizer.

This patch adds a new feature to peel loops when to convert PHIs into
IVs. At the moment this feature is disabled by default.

Enabling it allows to vectorize the above example. I have measured on
neoverse-v2 and observed a speedup of more than 60% (options: `-O3
-ffast-math -mcpu=neoverse-v2 -mllvm -enable-peeling-for-iv`).

This PR is taken over from #94900
Related #81851
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants