-
Notifications
You must be signed in to change notification settings - Fork 14.8k
[VPlan] Don't reset canonical IV start value. #161589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: Florian Hahn (fhahn) ChangesInstead of re-setting the start value of the canonical IV when vectorizing the epilogue we can emit an Add VPInstruction to provide canonical IV value, adjusted by the resume value from the main loop. This is in preparation to make the canonical IV a VPValue defined by loop regions. It ensures that the canonical IV always starts at 0. Full diff: https://github.com/llvm/llvm-project/pull/161589.diff 3 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index fa5be21dc2b8a..cbdedf886ee16 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -9562,7 +9562,10 @@ static SmallVector<Instruction *> preparePlanForEpilogueVectorLoop(
}) &&
"the canonical IV should only be used by its increment or "
"ScalarIVSteps when resetting the start value");
- IV->setOperand(0, VPV);
+ VPBuilder Builder(Header, Header->getFirstNonPhi());
+ VPInstruction *Add = Builder.createNaryOp(Instruction::Add, {IV, VPV});
+ IV->replaceAllUsesWith(Add);
+ Add->setOperand(0, IV);
DenseMap<Value *, Value *> ToFrozen;
SmallVector<Instruction *> InstsToMove;
@@ -9596,17 +9599,18 @@ static SmallVector<Instruction *> preparePlanForEpilogueVectorLoop(
ToFrozen[StartV] = cast<PHINode>(ResumeV)->getIncomingValueForBlock(
EPI.MainLoopIterationCountCheck);
- // VPReductionPHIRecipe for FindFirstIV/FindLastIV reductions requires
- // an adjustment to the resume value. The resume value is adjusted to
- // the sentinel value when the final value from the main vector loop
- // equals the start value. This ensures correctness when the start value
- // might not be less than the minimum value of a monotonically
- // increasing induction variable.
+ // VPReductionPHIRecipe for FindFirstIV/FindLastIV reductions
+ // requires an adjustment to the resume value. The resume value is
+ // adjusted to the sentinel value when the final value from the main
+ // vector loop equals the start value. This ensures correctness when
+ // the start value might not be less than the minimum value of a
+ // monotonically increasing induction variable.
BasicBlock *ResumeBB = cast<Instruction>(ResumeV)->getParent();
IRBuilder<> Builder(ResumeBB, ResumeBB->getFirstNonPHIIt());
Value *Cmp = Builder.CreateICmpEQ(ResumeV, ToFrozen[StartV]);
if (auto *I = dyn_cast<Instruction>(Cmp))
InstsToMove.push_back(I);
+
Value *Sentinel = RdxResult->getOperand(2)->getLiveInIRValue();
ResumeV = Builder.CreateSelect(Cmp, Sentinel, ResumeV);
if (auto *I = dyn_cast<Instruction>(ResumeV))
diff --git a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
index 555efea1ea840..de8f5f944d1a6 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
@@ -431,6 +431,12 @@ m_c_Binary(const Op0_t &Op0, const Op1_t &Op1) {
return AllRecipe_commutative_match<Opcode, Op0_t, Op1_t>(Op0, Op1);
}
+template <typename Op0_t, typename Op1_t>
+inline AllRecipe_match<Instruction::Add, Op0_t, Op1_t> m_Add(const Op0_t &Op0,
+ const Op1_t &Op1) {
+ return m_Binary<Instruction::Add, Op0_t, Op1_t>(Op0, Op1);
+}
+
template <typename Op0_t, typename Op1_t>
inline AllRecipe_commutative_match<Instruction::Add, Op0_t, Op1_t>
m_c_Add(const Op0_t &Op0, const Op1_t &Op1) {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index a73b083cff7fd..9792c3f5adc76 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1235,6 +1235,17 @@ static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) {
if (!Plan->isUnrolled())
return;
+ if (match(Def, m_Add(m_VPValue(X), m_VPValue(Y))) && Y->isLiveIn() &&
+ isa<VPPhi>(X)) {
+ auto *Phi = cast<VPPhi>(X);
+ if (Phi->getOperand(1) != Def && match(Phi->getOperand(0), m_ZeroInt()) &&
+ Phi->getNumUsers() == 1 && (*Phi->user_begin() == &R)) {
+ Phi->setOperand(0, Y);
+ Def->replaceAllUsesWith(Phi);
+ return;
+ }
+ }
+
// VPVectorPointer for part 0 can be replaced by their start pointer.
if (auto *VecPtr = dyn_cast<VPVectorPointerRecipe>(&R)) {
if (VecPtr->isFirstPart()) {
|
a1c6b3c
to
6391ca9
Compare
Instead of re-setting the start value of the canonical IV when vectorizing the epilogue we can emit an Add VPInstruction to provide canonical IV value, adjusted by the resume value from the main loop. This is in preparation to make the canonical IV a VPValue defined by loop regions. It ensures that the canonical IV always starts at 0.
6391ca9
to
84bd5ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for extracting from #156262, copied over related minor comments.
Title might be a bit more accurate, as the start value of the canonical IV continues to be reset, but later. Perhaps "Delay reset of epilog canonical IV start value". Also, NFC?
template <typename Op0_t, typename Op1_t> | ||
inline AllRecipe_match<Instruction::Add, Op0_t, Op1_t> m_Add(const Op0_t &Op0, | ||
const Op1_t &Op1) { | ||
return m_Binary<Instruction::Add, Op0_t, Op1_t>(Op0, Op1); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commutative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, can use the existing m_c_Add, removed the changes here, thanks
// below. | ||
if (!Plan->isUnrolled()) | ||
return; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Hoist an invariant increment Y of a canonical IV phi X, by having X start at Y. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added with minor edit, as this applies to any (scalar) phi
"all incoming values must be 0"); | ||
EPI.VectorTripCount = EPResumeVal->getOperand(0); | ||
} | ||
VPValue *VPV = Plan.getOrAddLiveIn(EPResumeVal); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Independent: better rename VPV
Instead of re-setting the start value of the canonical IV when vectorizing the epilogue we can emit an Add VPInstruction to provide canonical IV value, adjusted by the resume value from the main loop. This is in preparation to make the canonical IV a VPValue defined by loop regions. It ensures that the canonical IV always starts at 0. PR: llvm/llvm-project#161589
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/154/builds/22623 Here is the relevant piece of the build log for the reference
|
Instead of re-setting the start value of the canonical IV when vectorizing the epilogue we can emit an Add VPInstruction to provide canonical IV value, adjusted by the resume value from the main loop. This is in preparation to make the canonical IV a VPValue defined by loop regions. It ensures that the canonical IV always starts at 0. PR: llvm#161589
Instead of re-setting the start value of the canonical IV when vectorizing the epilogue we can emit an Add VPInstruction to provide canonical IV value, adjusted by the resume value from the main loop.
This is in preparation to make the canonical IV a VPValue defined by loop regions. It ensures that the canonical IV always starts at 0.