-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[LoopPeel] Added support for cases where PhiAnalyzer contains Induction PHI #94900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…on PHI
Add support for Induction PHI when calculating peel count.
The following cases are supported
---
void binary() {
int x = 0;
int y = 0;
int a = 0;
for (int i = 0; i < 1000; i++) {
g(x);
x = y;
g(a);
y = a;
a = i + 2;
}
}
---
|
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write If you have received no comments on your PR for a week, you can request a review If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
|
@llvm/pr-subscribers-llvm-transforms Author: Moriyuki Saito (m-sai) Changes…on PHI Add support for Induction PHI when calculating peel count. The following cases are supported Full diff: https://github.com/llvm/llvm-project/pull/94900.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Utils/LoopPeel.cpp b/llvm/lib/Transforms/Utils/LoopPeel.cpp
index e2516930d251b..bcf0c7b6b8960 100644
--- a/llvm/lib/Transforms/Utils/LoopPeel.cpp
+++ b/llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -13,6 +13,7 @@
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"
+#include "llvm/Analysis/IVDescriptors.h"
#include "llvm/Analysis/Loads.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/LoopIterator.h"
@@ -160,7 +161,7 @@ class PhiAnalyzer {
// Calculate the sufficient minimum number of iterations of the loop to peel
// such that phi instructions become determined (subject to allowable limits)
- std::optional<unsigned> calculateIterationsToPeel();
+ std::optional<unsigned> calculateIterationsToPeel(ScalarEvolution &SE);
protected:
using PeelCounter = std::optional<unsigned>;
@@ -175,7 +176,7 @@ class PhiAnalyzer {
// Calculate the number of iterations after which the given value
// becomes an invariant.
- PeelCounter calculate(const Value &);
+ PeelCounter calculate(Value &, ScalarEvolution &SE);
const Loop &L;
const unsigned MaxIterations;
@@ -204,7 +205,7 @@ PhiAnalyzer::PhiAnalyzer(const Loop &L, unsigned MaxIterations)
// %y = phi(0, 5)
// %a = %y + 1
// G(%y) = Unknown otherwise (including phi not in header block)
-PhiAnalyzer::PeelCounter PhiAnalyzer::calculate(const Value &V) {
+PhiAnalyzer::PeelCounter PhiAnalyzer::calculate(Value &V, ScalarEvolution &SE) {
// If we already know the answer, take it from the map.
auto I = IterationsToInvariance.find(&V);
if (I != IterationsToInvariance.end())
@@ -217,15 +218,21 @@ PhiAnalyzer::PeelCounter PhiAnalyzer::calculate(const Value &V) {
if (L.isLoopInvariant(&V))
// Loop invariant so known at start.
return (IterationsToInvariance[&V] = 0);
- if (const PHINode *Phi = dyn_cast<PHINode>(&V)) {
+ if (PHINode *Phi = dyn_cast<PHINode>(&V)) {
if (Phi->getParent() != L.getHeader()) {
// Phi is not in header block so Unknown.
assert(IterationsToInvariance[&V] == Unknown && "unexpected value saved");
return Unknown;
}
+
+ // If Induction PHI, register as a starting point.
+ InductionDescriptor ID;
+ if (InductionDescriptor::isInductionPHI(Phi, &L, &SE, ID))
+ return (IterationsToInvariance[&V] = 0);
+
// We need to analyze the input from the back edge and add 1.
Value *Input = Phi->getIncomingValueForBlock(L.getLoopLatch());
- PeelCounter Iterations = calculate(*Input);
+ PeelCounter Iterations = calculate(*Input, SE);
assert(IterationsToInvariance[Input] == Iterations &&
"unexpected value saved");
return (IterationsToInvariance[Phi] = addOne(Iterations));
@@ -233,17 +240,17 @@ PhiAnalyzer::PeelCounter PhiAnalyzer::calculate(const Value &V) {
if (const Instruction *I = dyn_cast<Instruction>(&V)) {
if (isa<CmpInst>(I) || I->isBinaryOp()) {
// Binary instructions get the max of the operands.
- PeelCounter LHS = calculate(*I->getOperand(0));
+ PeelCounter LHS = calculate(*I->getOperand(0), SE);
if (LHS == Unknown)
return Unknown;
- PeelCounter RHS = calculate(*I->getOperand(1));
+ PeelCounter RHS = calculate(*I->getOperand(1), SE);
if (RHS == Unknown)
return Unknown;
return (IterationsToInvariance[I] = {std::max(*LHS, *RHS)});
}
if (I->isCast())
// Cast instructions get the value of the operand.
- return (IterationsToInvariance[I] = calculate(*I->getOperand(0)));
+ return (IterationsToInvariance[I] = calculate(*I->getOperand(0), SE));
}
// TODO: handle more expressions
@@ -252,10 +259,10 @@ PhiAnalyzer::PeelCounter PhiAnalyzer::calculate(const Value &V) {
return Unknown;
}
-std::optional<unsigned> PhiAnalyzer::calculateIterationsToPeel() {
+std::optional<unsigned> PhiAnalyzer::calculateIterationsToPeel(ScalarEvolution &SE) {
unsigned Iterations = 0;
for (auto &PHI : L.getHeader()->phis()) {
- PeelCounter ToInvariance = calculate(PHI);
+ PeelCounter ToInvariance = calculate(PHI, SE);
if (ToInvariance != Unknown) {
assert(*ToInvariance <= MaxIterations && "bad result in phi analysis");
Iterations = std::max(Iterations, *ToInvariance);
@@ -594,7 +601,7 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize,
// Phis into invariants.
if (MaxPeelCount > DesiredPeelCount) {
// Check how many iterations are useful for resolving Phis
- auto NumPeels = PhiAnalyzer(*L, MaxPeelCount).calculateIterationsToPeel();
+ auto NumPeels = PhiAnalyzer(*L, MaxPeelCount).calculateIterationsToPeel(SE);
if (NumPeels)
DesiredPeelCount = std::max(DesiredPeelCount, *NumPeels);
}
diff --git a/llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll b/llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll
index e24eeef52de4e..3800c48954a6e 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-loop-phi-analysis.ll
@@ -197,3 +197,101 @@ for.body:
%exitcond = icmp eq i32 %inc, 100000
br i1 %exitcond, label %for.cond.cleanup, label %for.body
}
+
+; Check that phi analysis can handle a binary operator with induction variable.
+define void @_Z6binaryv_induction() {
+; The phis become invariant through the chain of phis, with a unary
+; instruction on a loop invariant. Check that the phis for x, a, and y
+; are removed since x is based on y, which is based on a, which is based
+; on a binary add of a phi and a constant.
+; Consider the calls to g:
+; First iteration: g(0), x=0, g(0), y=1, a=2
+; Second iteration: g(0), x=1, g(2), y=3(binary operator), a=3
+; Third iteration: g(1), x=3, g(3), y=4, a=4
+; Fourth iteration (and subsequent): g(3), x=4, g(4), y=5, a=6
+; Therefore, peeling 3 times removes the phi nodes.
+;
+; void g(int);
+; void binary() {
+; int x = 0;
+; int y = 0;
+; int a = 0;
+; for(int i = 0; i <100000; ++i) {
+; g(x);
+; x = y;
+; g(a);
+; y = a + 1;
+; a = i + 2;
+; }
+; }
+; CHECK-LABEL: @_Z6binaryv_induction(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: br label [[FOR_BODY_PEEL_BEGIN:%.*]]
+; CHECK: for.body.peel.begin:
+; CHECK-NEXT: br label [[FOR_BODY_PEEL:%.*]]
+; CHECK: for.body.peel:
+; CHECK-NEXT: tail call void @_Z1gi(i32 signext 0)
+; CHECK-NEXT: tail call void @_Z1gi(i32 signext 0)
+; CHECK-NEXT: [[ADD_PEEL:%.*]] = add nuw nsw i32 0, 2
+; CHECK-NEXT: [[INC_PEEL:%.*]] = add nuw nsw i32 0, 1
+; CHECK-NEXT: [[EXITCOND_PEEL:%.*]] = icmp ne i32 [[INC_PEEL]], 100000
+; CHECK-NEXT: br i1 [[EXITCOND_PEEL]], label [[FOR_BODY_PEEL_NEXT:%.*]], label [[FOR_COND_CLEANUP:%.*]]
+; CHECK: for.body.peel.next:
+; CHECK-NEXT: br label [[FOR_BODY_PEEL2:%.*]]
+; CHECK: for.body.peel2:
+; CHECK-NEXT: tail call void @_Z1gi(i32 signext 0)
+; CHECK-NEXT: tail call void @_Z1gi(i32 signext [[ADD_PEEL]])
+; CHECK-NEXT: [[ADD_PEEL3:%.*]] = add nuw nsw i32 [[INC_PEEL]], 2
+; CHECK-NEXT: [[INC_PEEL4:%.*]] = add nuw nsw i32 [[INC_PEEL]], 1
+; CHECK-NEXT: [[EXITCOND_PEEL5:%.*]] = icmp ne i32 [[INC_PEEL4]], 100000
+; CHECK-NEXT: br i1 [[EXITCOND_PEEL5]], label [[FOR_BODY_PEEL_NEXT1:%.*]], label [[FOR_COND_CLEANUP]]
+; CHECK: for.body.peel.next1:
+; CHECK-NEXT: br label [[FOR_BODY_PEEL7:%.*]]
+; CHECK: for.body.peel7:
+; CHECK-NEXT: tail call void @_Z1gi(i32 signext 0)
+; CHECK-NEXT: tail call void @_Z1gi(i32 signext [[ADD_PEEL3]])
+; CHECK-NEXT: [[ADD_PEEL8:%.*]] = add nuw nsw i32 [[INC_PEEL4]], 2
+; CHECK-NEXT: [[INC_PEEL9:%.*]] = add nuw nsw i32 [[INC_PEEL4]], 1
+; CHECK-NEXT: [[EXITCOND_PEEL10:%.*]] = icmp ne i32 [[INC_PEEL9]], 100000
+; CHECK-NEXT: br i1 [[EXITCOND_PEEL10]], label [[FOR_BODY_PEEL_NEXT6:%.*]], label [[FOR_COND_CLEANUP]]
+; CHECK: for.body.peel.next6:
+; CHECK-NEXT: br label [[FOR_BODY_PEEL_NEXT11:%.*]]
+; CHECK: for.body.peel.next11:
+; CHECK-NEXT: br label [[ENTRY_PEEL_NEWPH:%.*]]
+; CHECK: entry.peel.newph:
+; CHECK-NEXT: br label [[FOR_BODY:%.*]]
+; CHECK: for.cond.cleanup.loopexit:
+; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]
+; CHECK: for.cond.cleanup:
+; CHECK-NEXT: ret void
+; CHECK: for.body:
+; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[INC_PEEL9]], [[ENTRY_PEEL_NEWPH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT: [[X:%.*]] = phi i32 [ [[ADD_PEEL]], [[ENTRY_PEEL_NEWPH]] ], [ [[Y:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT: [[A:%.*]] = phi i32 [ [[ADD_PEEL8]], [[ENTRY_PEEL_NEWPH]] ], [ [[ADD:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT: [[Y]] = phi i32 [ [[ADD_PEEL3]], [[ENTRY_PEEL_NEWPH]] ], [ [[A]], [[FOR_BODY]] ]
+; CHECK-NEXT: tail call void @_Z1gi(i32 signext [[X]])
+; CHECK-NEXT: tail call void @_Z1gi(i32 signext [[A]])
+; CHECK-NEXT: [[ADD]] = add nuw nsw i32 [[I]], 2
+; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I]], 1
+; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i32 [[INC]], 100000
+; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_BODY]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], !llvm.loop [[LOOP3:![0-9]+]]
+;
+entry:
+ br label %for.body
+
+for.cond.cleanup:
+ ret void
+
+for.body:
+ %i = phi i32 [ 0, %entry ], [ %inc, %for.body ]
+ %x = phi i32 [ 0, %entry ], [ %y, %for.body ]
+ %a = phi i32 [ 0, %entry ], [ %add, %for.body ]
+ %y = phi i32 [ 0, %entry ], [ %a, %for.body ]
+ tail call void @_Z1gi(i32 signext %x)
+ tail call void @_Z1gi(i32 signext %a)
+ %add = add nuw nsw i32 %i, 2
+ %inc = add nuw nsw i32 %i, 1
+ %exitcond = icmp ne i32 %inc, 100000
+ br i1 %exitcond, label %for.body, label %for.cond.cleanup
+}
+
|
nikic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really get what you're trying to achieve here. None of the induction variables involved become invariant (see https://c.godbolt.org/z/WP7KdM66d for the evolution).
Is the point that after some iterations the IVs end up being at fixed offsets from each other?
|
Thanks for your comment. I thought of using loop peeling to convert if it can be simply expressed using induction variables after a few iterations. Unfortunately, a small number of trip counts will result in full unrolling, which is illustrated in the following code example. In that case, the status of each variable and argument per iteration is as follows I think this is equivalent to the following code. // loop peeling part
for (int i = 0; i < 3; i++) {
g(i, x, y, a);
x = y;
y = a;
a = i + 2;
}
// simple loop part
for (int i = 3; i < 1000; i++) {
g(i, i-1, i, i+1);
} |
You can test this locally with the following command:git-clang-format --diff 6b3e0002dfe0029487fc2f8f11f5d5fdc07a5e11 8cd566f056c3e31a5b535af7352f767fa95467c9 -- llvm/lib/Transforms/Utils/LoopPeel.cppView the diff from clang-format here.diff --git a/llvm/lib/Transforms/Utils/LoopPeel.cpp b/llvm/lib/Transforms/Utils/LoopPeel.cpp
index bcf0c7b6b8..90d0328762 100644
--- a/llvm/lib/Transforms/Utils/LoopPeel.cpp
+++ b/llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -259,7 +259,8 @@ PhiAnalyzer::PeelCounter PhiAnalyzer::calculate(Value &V, ScalarEvolution &SE) {
return Unknown;
}
-std::optional<unsigned> PhiAnalyzer::calculateIterationsToPeel(ScalarEvolution &SE) {
+std::optional<unsigned>
+PhiAnalyzer::calculateIterationsToPeel(ScalarEvolution &SE) {
unsigned Iterations = 0;
for (auto &PHI : L.getHeader()->phis()) {
PeelCounter ToInvariance = calculate(PHI, SE);
|
|
Hi @m-sai Overall, this patch looks good to me as I arrived at some of the changes myself in my local branch :) - adding PHI nodes as a starting point is sensible. I have another test case from TSVC benchmark but I believe I can add it after you land this patch. |
| return Unknown; | ||
| } | ||
|
|
||
| // If Induction PHI, register as a starting point. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a comment explaining some more context would be useful. May be copy the comments from the test here too as they provide the motivation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@madhur13490 Thanks for your comment.
Comment added. How about something like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Looks good.
|
@m-sai |
|
@madhur13490 |
|
Hello @m-sai Any updates? |
|
Gentle ping @m-sai. Are you planning to progress this? |
|
@madhur13490 |
|
Hi. I discussed this PR with @m-sai and we agreed that I will take over this. He also told me about the cases that cause unexpected peeling, so I'll submit a new PR once this is resolved. |
…21104) LoopPeel currently considers PHI nodes that become loop invariants through peeling. However, in some cases, peeling transforms PHI nodes into induction variables (IVs), potentially enabling further optimizations such as loop vectorization. For example: ```c // TSVC s292 int im = N-1; for (int i=0; i<N; i++) { a[i] = b[i] + b[im]; im = i; } ``` In this case, peeling one iteration converts `im` into an IV, allowing it to be handled by the loop vectorizer. This patch adds a new feature to peel loops when to convert PHIs into IVs. At the moment this feature is disabled by default. Enabling it allows to vectorize the above example. I have measured on neoverse-v2 and observed a speedup of more than 60% (options: `-O3 -ffast-math -mcpu=neoverse-v2 -mllvm -enable-peeling-for-iv`). This PR is taken over from #94900 Related #81851
…on PHI
Add support for Induction PHI when calculating peel count.
The following cases are supported