[LoopPeel] Added support for cases where PhiAnalyzer contains Induction PHI #94900

m-sai · 2024-06-09T12:03:05Z

…on PHI

Add support for Induction PHI when calculating peel count.

The following cases are supported

void binary() {
  int x = 0;
  int y = 0;
  int a = 0;
  for (int i = 0; i < 1000; i++) {
    g(x);
    x = y;
    g(a);
    y = a;
    a = i + 2;
  }
}

…on PHI Add support for Induction PHI when calculating peel count. The following cases are supported --- void binary() { int x = 0; int y = 0; int a = 0; for (int i = 0; i < 1000; i++) { g(x); x = y; g(a); y = a; a = i + 2; } } ---

github-actions · 2024-06-09T12:03:30Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot · 2024-06-09T12:04:02Z

@llvm/pr-subscribers-llvm-transforms

Author: Moriyuki Saito (m-sai)

Changes

…on PHI

Add support for Induction PHI when calculating peel count.

The following cases are supported

void binary() {
  int x = 0;
  int y = 0;
  int a = 0;
  for (int i = 0; i &lt; 1000; i++) {
    g(x);
    x = y;
    g(a);
    y = a;
    a = i + 2;
  }
}

Full diff: https://github.com/llvm/llvm-project/pull/94900.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Utils/LoopPeel.cpp (+18-11)
(modified) llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll (+98)

diff --git a/llvm/lib/Transforms/Utils/LoopPeel.cpp b/llvm/lib/Transforms/Utils/LoopPeel.cpp
index e2516930d251b..bcf0c7b6b8960 100644
--- a/llvm/lib/Transforms/Utils/LoopPeel.cpp
+++ b/llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -13,6 +13,7 @@
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/Statistic.h"
+#include "llvm/Analysis/IVDescriptors.h"
 #include "llvm/Analysis/Loads.h"
 #include "llvm/Analysis/LoopInfo.h"
 #include "llvm/Analysis/LoopIterator.h"
@@ -160,7 +161,7 @@ class PhiAnalyzer {
 
   // Calculate the sufficient minimum number of iterations of the loop to peel
   // such that phi instructions become determined (subject to allowable limits)
-  std::optional<unsigned> calculateIterationsToPeel();
+  std::optional<unsigned> calculateIterationsToPeel(ScalarEvolution &SE);
 
 protected:
   using PeelCounter = std::optional<unsigned>;
@@ -175,7 +176,7 @@ class PhiAnalyzer {
 
   // Calculate the number of iterations after which the given value
   // becomes an invariant.
-  PeelCounter calculate(const Value &);
+  PeelCounter calculate(Value &, ScalarEvolution &SE);
 
   const Loop &L;
   const unsigned MaxIterations;
@@ -204,7 +205,7 @@ PhiAnalyzer::PhiAnalyzer(const Loop &L, unsigned MaxIterations)
 //           %y = phi(0, 5)
 //           %a = %y + 1
 //   G(%y) = Unknown otherwise (including phi not in header block)
-PhiAnalyzer::PeelCounter PhiAnalyzer::calculate(const Value &V) {
+PhiAnalyzer::PeelCounter PhiAnalyzer::calculate(Value &V, ScalarEvolution &SE) {
   // If we already know the answer, take it from the map.
   auto I = IterationsToInvariance.find(&V);
   if (I != IterationsToInvariance.end())
@@ -217,15 +218,21 @@ PhiAnalyzer::PeelCounter PhiAnalyzer::calculate(const Value &V) {
   if (L.isLoopInvariant(&V))
     // Loop invariant so known at start.
     return (IterationsToInvariance[&V] = 0);
-  if (const PHINode *Phi = dyn_cast<PHINode>(&V)) {
+  if (PHINode *Phi = dyn_cast<PHINode>(&V)) {
     if (Phi->getParent() != L.getHeader()) {
       // Phi is not in header block so Unknown.
       assert(IterationsToInvariance[&V] == Unknown && "unexpected value saved");
       return Unknown;
     }
+
+    // If Induction PHI, register as a starting point.
+    InductionDescriptor ID;
+    if (InductionDescriptor::isInductionPHI(Phi, &L, &SE, ID))
+      return (IterationsToInvariance[&V] = 0);
+
     // We need to analyze the input from the back edge and add 1.
     Value *Input = Phi->getIncomingValueForBlock(L.getLoopLatch());
-    PeelCounter Iterations = calculate(*Input);
+    PeelCounter Iterations = calculate(*Input, SE);
     assert(IterationsToInvariance[Input] == Iterations &&
            "unexpected value saved");
     return (IterationsToInvariance[Phi] = addOne(Iterations));
@@ -233,17 +240,17 @@ PhiAnalyzer::PeelCounter PhiAnalyzer::calculate(const Value &V) {
   if (const Instruction *I = dyn_cast<Instruction>(&V)) {
     if (isa<CmpInst>(I) || I->isBinaryOp()) {
       // Binary instructions get the max of the operands.
-      PeelCounter LHS = calculate(*I->getOperand(0));
+      PeelCounter LHS = calculate(*I->getOperand(0), SE);
       if (LHS == Unknown)
         return Unknown;
-      PeelCounter RHS = calculate(*I->getOperand(1));
+      PeelCounter RHS = calculate(*I->getOperand(1), SE);
       if (RHS == Unknown)
         return Unknown;
       return (IterationsToInvariance[I] = {std::max(*LHS, *RHS)});
     }
     if (I->isCast())
       // Cast instructions get the value of the operand.
-      return (IterationsToInvariance[I] = calculate(*I->getOperand(0)));
+      return (IterationsToInvariance[I] = calculate(*I->getOperand(0), SE));
   }
   // TODO: handle more expressions
 
@@ -252,10 +259,10 @@ PhiAnalyzer::PeelCounter PhiAnalyzer::calculate(const Value &V) {
   return Unknown;
 }
 
-std::optional<unsigned> PhiAnalyzer::calculateIterationsToPeel() {
+std::optional<unsigned> PhiAnalyzer::calculateIterationsToPeel(ScalarEvolution &SE) {
   unsigned Iterations = 0;
   for (auto &PHI : L.getHeader()->phis()) {
-    PeelCounter ToInvariance = calculate(PHI);
+    PeelCounter ToInvariance = calculate(PHI, SE);
     if (ToInvariance != Unknown) {
       assert(*ToInvariance <= MaxIterations && "bad result in phi analysis");
       Iterations = std::max(Iterations, *ToInvariance);
@@ -594,7 +601,7 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize,
   // Phis into invariants.
   if (MaxPeelCount > DesiredPeelCount) {
     // Check how many iterations are useful for resolving Phis
-    auto NumPeels = PhiAnalyzer(*L, MaxPeelCount).calculateIterationsToPeel();
+    auto NumPeels = PhiAnalyzer(*L, MaxPeelCount).calculateIterationsToPeel(SE);
     if (NumPeels)
       DesiredPeelCount = std::max(DesiredPeelCount, *NumPeels);
   }
diff --git a/llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll b/llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll
index e24eeef52de4e..3800c48954a6e 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll
@@ -197,3 +197,101 @@ for.body:
   %exitcond = icmp eq i32 %inc, 100000
   br i1 %exitcond, label %for.cond.cleanup, label %for.body
 }
+
+; Check that phi analysis can handle a binary operator with induction variable.
+define void @_Z6binaryv_induction() {
+; The phis become invariant through the chain of phis, with a unary
+; instruction on a loop invariant.  Check that the phis for x, a, and y
+; are removed since x is based on y, which is based on a, which is based
+; on a binary add of a phi and a constant.
+; Consider the calls to g:
+; First iteration: g(0), x=0, g(0), y=1, a=2
+; Second iteration: g(0), x=1, g(2), y=3(binary operator), a=3
+; Third iteration: g(1), x=3, g(3), y=4, a=4
+; Fourth iteration (and subsequent): g(3), x=4, g(4), y=5, a=6
+; Therefore, peeling 3 times removes the phi nodes.
+;
+; void g(int);
+; void binary() {
+;   int x = 0;
+;   int y = 0;
+;   int a = 0;
+;   for(int i = 0; i <100000; ++i) {
+;     g(x);
+;     x = y;
+;     g(a);
+;     y = a + 1;
+;     a = i + 2;
+;   }
+; }
+; CHECK-LABEL: @_Z6binaryv_induction(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[FOR_BODY_PEEL_BEGIN:%.*]]
+; CHECK:       for.body.peel.begin:
+; CHECK-NEXT:    br label [[FOR_BODY_PEEL:%.*]]
+; CHECK:       for.body.peel:
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext 0)
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext 0)
+; CHECK-NEXT:    [[ADD_PEEL:%.*]] = add nuw nsw i32 0, 2
+; CHECK-NEXT:    [[INC_PEEL:%.*]] = add nuw nsw i32 0, 1
+; CHECK-NEXT:    [[EXITCOND_PEEL:%.*]] = icmp ne i32 [[INC_PEEL]], 100000
+; CHECK-NEXT:    br i1 [[EXITCOND_PEEL]], label [[FOR_BODY_PEEL_NEXT:%.*]], label [[FOR_COND_CLEANUP:%.*]]
+; CHECK:       for.body.peel.next:
+; CHECK-NEXT:    br label [[FOR_BODY_PEEL2:%.*]]
+; CHECK:       for.body.peel2:
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext 0)
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext [[ADD_PEEL]])
+; CHECK-NEXT:    [[ADD_PEEL3:%.*]] = add nuw nsw i32 [[INC_PEEL]], 2
+; CHECK-NEXT:    [[INC_PEEL4:%.*]] = add nuw nsw i32 [[INC_PEEL]], 1
+; CHECK-NEXT:    [[EXITCOND_PEEL5:%.*]] = icmp ne i32 [[INC_PEEL4]], 100000
+; CHECK-NEXT:    br i1 [[EXITCOND_PEEL5]], label [[FOR_BODY_PEEL_NEXT1:%.*]], label [[FOR_COND_CLEANUP]]
+; CHECK:       for.body.peel.next1:
+; CHECK-NEXT:    br label [[FOR_BODY_PEEL7:%.*]]
+; CHECK:       for.body.peel7:
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext 0)
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext [[ADD_PEEL3]])
+; CHECK-NEXT:    [[ADD_PEEL8:%.*]] = add nuw nsw i32 [[INC_PEEL4]], 2
+; CHECK-NEXT:    [[INC_PEEL9:%.*]] = add nuw nsw i32 [[INC_PEEL4]], 1
+; CHECK-NEXT:    [[EXITCOND_PEEL10:%.*]] = icmp ne i32 [[INC_PEEL9]], 100000
+; CHECK-NEXT:    br i1 [[EXITCOND_PEEL10]], label [[FOR_BODY_PEEL_NEXT6:%.*]], label [[FOR_COND_CLEANUP]]
+; CHECK:       for.body.peel.next6:
+; CHECK-NEXT:    br label [[FOR_BODY_PEEL_NEXT11:%.*]]
+; CHECK:       for.body.peel.next11:
+; CHECK-NEXT:    br label [[ENTRY_PEEL_NEWPH:%.*]]
+; CHECK:       entry.peel.newph:
+; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
+; CHECK:       for.cond.cleanup.loopexit:
+; CHECK-NEXT:    br label [[FOR_COND_CLEANUP]]
+; CHECK:       for.cond.cleanup:
+; CHECK-NEXT:    ret void
+; CHECK:       for.body:
+; CHECK-NEXT:    [[I:%.*]] = phi i32 [ [[INC_PEEL9]], [[ENTRY_PEEL_NEWPH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[X:%.*]] = phi i32 [ [[ADD_PEEL]], [[ENTRY_PEEL_NEWPH]] ], [ [[Y:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[A:%.*]] = phi i32 [ [[ADD_PEEL8]], [[ENTRY_PEEL_NEWPH]] ], [ [[ADD:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT:    [[Y]] = phi i32 [ [[ADD_PEEL3]], [[ENTRY_PEEL_NEWPH]] ], [ [[A]], [[FOR_BODY]] ]
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext [[X]])
+; CHECK-NEXT:    tail call void @_Z1gi(i32 signext [[A]])
+; CHECK-NEXT:    [[ADD]] = add nuw nsw i32 [[I]], 2
+; CHECK-NEXT:    [[INC]] = add nuw nsw i32 [[I]], 1
+; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp ne i32 [[INC]], 100000
+; CHECK-NEXT:    br i1 [[EXITCOND]], label [[FOR_BODY]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], !llvm.loop [[LOOP3:![0-9]+]]
+;
+entry:
+  br label %for.body
+
+for.cond.cleanup:
+  ret void
+
+for.body:
+  %i = phi i32 [ 0, %entry ], [ %inc, %for.body ]
+  %x = phi i32 [ 0, %entry ], [ %y, %for.body ]
+  %a = phi i32 [ 0, %entry ], [ %add, %for.body ]
+  %y = phi i32 [ 0, %entry ], [ %a, %for.body ]
+  tail call void @_Z1gi(i32 signext %x)
+  tail call void @_Z1gi(i32 signext %a)
+  %add = add nuw nsw i32 %i, 2
+  %inc = add nuw nsw i32 %i, 1
+  %exitcond = icmp ne i32 %inc, 100000
+  br i1 %exitcond, label %for.body, label %for.cond.cleanup
+}
+

nikic

I don't really get what you're trying to achieve here. None of the induction variables involved become invariant (see https://c.godbolt.org/z/WP7KdM66d for the evolution).

Is the point that after some iterations the IVs end up being at fixed offsets from each other?

m-sai · 2024-06-09T14:14:31Z

Thanks for your comment.

I thought of using loop peeling to convert if it can be simply expressed using induction variables after a few iterations.

Unfortunately, a small number of trip counts will result in full unrolling, which is illustrated in the following code example.
https://c.godbolt.org/z/eoecW1jvP

In that case, the status of each variable and argument per iteration is as follows
iteration 0: i=0, g(0,0,0,0), x=0, y=0, a=2
iteration 1: i=1, g(1,0,0,2), x=0, y=2, a=3
iteration 2: i=2, g(2,0,2,3), x=2, y=3, a=4
iteration 3: i=3, g(3,2,3,4), x=3, y=4, a=5
iteration 4: i=4, g(4,3,4,5), x=4, y=5, a=6

I think this is equivalent to the following code.

  // loop peeling part
  for (int i = 0; i < 3; i++) {
    g(i, x, y, a);
    x = y;
    y = a;
    a = i + 2;
  }
  // simple loop part
  for (int i = 3; i < 1000; i++) {
    g(i, i-1, i, i+1);
  }

github-actions · 2024-06-10T22:58:38Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff 6b3e0002dfe0029487fc2f8f11f5d5fdc07a5e11 8cd566f056c3e31a5b535af7352f767fa95467c9 -- llvm/lib/Transforms/Utils/LoopPeel.cpp

View the diff from clang-format here.

diff --git a/llvm/lib/Transforms/Utils/LoopPeel.cpp b/llvm/lib/Transforms/Utils/LoopPeel.cpp
index bcf0c7b6b8..90d0328762 100644
--- a/llvm/lib/Transforms/Utils/LoopPeel.cpp
+++ b/llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -259,7 +259,8 @@ PhiAnalyzer::PeelCounter PhiAnalyzer::calculate(Value &V, ScalarEvolution &SE) {
   return Unknown;
 }
 
-std::optional<unsigned> PhiAnalyzer::calculateIterationsToPeel(ScalarEvolution &SE) {
+std::optional<unsigned>
+PhiAnalyzer::calculateIterationsToPeel(ScalarEvolution &SE) {
   unsigned Iterations = 0;
   for (auto &PHI : L.getHeader()->phis()) {
     PeelCounter ToInvariance = calculate(PHI, SE);

madhur13490 · 2024-08-26T04:30:48Z

Hi @m-sai
@nikic pointed me to this patch on https://discourse.llvm.org/t/missed-loop-peeling-and-thus-vectorization-opportunity/80616. This patch is indeed fixing the vectorization issue in the TSVC benchmark.

Overall, this patch looks good to me as I arrived at some of the changes myself in my local branch :) - adding PHI nodes as a starting point is sensible.

I have another test case from TSVC benchmark but I believe I can add it after you land this patch.

madhur13490 · 2024-08-26T04:33:16Z

llvm/lib/Transforms/Utils/LoopPeel.cpp

      return Unknown;
    }
+
+    // If Induction PHI, register as a starting point.


I think a comment explaining some more context would be useful. May be copy the comments from the test here too as they provide the motivation?

@madhur13490 Thanks for your comment.
Comment added. How about something like this?

Thanks. Looks good.

madhur13490 · 2024-08-27T05:31:30Z

@m-sai
Have you measured the impact of the patch on key benchmarks? LLVM test-suite is a good example to make sure we don't regress.

m-sai · 2024-09-02T21:54:52Z

@madhur13490
I found a bug that caused unnecessary peeling when running LLVM test-suite and have fixed it. I am currently checking to see if there are any problems with the fix.

madhur13490 · 2024-09-16T06:40:21Z

Hello @m-sai Any updates?

madhur13490 · 2024-10-07T07:57:49Z

Gentle ping @m-sai. Are you planning to progress this?

m-sai · 2024-10-15T22:05:30Z

@madhur13490
Sorry for the delay in my reply. Fixing the bug is going to take a bit more time.

kasuga-fj · 2024-12-12T06:54:39Z

Hi. I discussed this PR with @m-sai and we agreed that I will take over this. He also told me about the cases that cause unexpected peeling, so I'll submit a new PR once this is resolved.

…21104) LoopPeel currently considers PHI nodes that become loop invariants through peeling. However, in some cases, peeling transforms PHI nodes into induction variables (IVs), potentially enabling further optimizations such as loop vectorization. For example: ```c // TSVC s292 int im = N-1; for (int i=0; i<N; i++) { a[i] = b[i] + b[im]; im = i; } ``` In this case, peeling one iteration converts `im` into an IV, allowing it to be handled by the loop vectorizer. This patch adds a new feature to peel loops when to convert PHIs into IVs. At the moment this feature is disabled by default. Enabling it allows to vectorize the above example. I have measured on neoverse-v2 and observed a speedup of more than 60% (options: `-O3 -ffast-math -mcpu=neoverse-v2 -mllvm -enable-peeling-for-iv`). This PR is taken over from #94900 Related #81851

llvmbot added the llvm:transforms label Jun 9, 2024

nikic reviewed Jun 9, 2024

View reviewed changes

madhur13490 reviewed Aug 26, 2024

View reviewed changes

m-sai added 2 commits August 26, 2024 22:02

apply clang-format

b6488f6

Add comment.

34fb403

hiraditya requested a review from sebpop November 19, 2024 16:48

kasuga-fj mentioned this pull request Dec 25, 2024

[LoopPeel] Add new option to peeling loops to convert PHI into IV #121104

Merged

[LoopPeel] Added support for cases where PhiAnalyzer contains Induction PHI #94900

Are you sure you want to change the base?

[LoopPeel] Added support for cases where PhiAnalyzer contains Induction PHI #94900

Uh oh!

Conversation

m-sai commented Jun 9, 2024

Uh oh!

github-actions bot commented Jun 9, 2024

Uh oh!

llvmbot commented Jun 9, 2024

Uh oh!

nikic left a comment

Choose a reason for hiding this comment

Uh oh!

m-sai commented Jun 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Jun 10, 2024

Uh oh!

madhur13490 commented Aug 26, 2024

Uh oh!

madhur13490 Aug 26, 2024

Choose a reason for hiding this comment

Uh oh!

m-sai Aug 26, 2024

Choose a reason for hiding this comment

Uh oh!

madhur13490 Aug 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

madhur13490 commented Aug 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

m-sai commented Sep 2, 2024

Uh oh!

madhur13490 commented Sep 16, 2024

Uh oh!

madhur13490 commented Oct 7, 2024

Uh oh!

m-sai commented Oct 15, 2024

Uh oh!

kasuga-fj commented Dec 12, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

m-sai commented Jun 9, 2024 •

edited

Loading

madhur13490 Aug 27, 2024 •

edited

Loading

madhur13490 commented Aug 27, 2024 •

edited

Loading