Skip to content

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Oct 1, 2025

Instead of re-setting the start value of the canonical IV when vectorizing the epilogue we can emit an Add VPInstruction to provide canonical IV value, adjusted by the resume value from the main loop.

This is in preparation to make the canonical IV a VPValue defined by loop regions. It ensures that the canonical IV always starts at 0.

@llvmbot
Copy link
Member

llvmbot commented Oct 1, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Instead of re-setting the start value of the canonical IV when vectorizing the epilogue we can emit an Add VPInstruction to provide canonical IV value, adjusted by the resume value from the main loop.

This is in preparation to make the canonical IV a VPValue defined by loop regions. It ensures that the canonical IV always starts at 0.


Full diff: https://github.com/llvm/llvm-project/pull/161589.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+11-7)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h (+6)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+11)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index fa5be21dc2b8a..cbdedf886ee16 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -9562,7 +9562,10 @@ static SmallVector<Instruction *> preparePlanForEpilogueVectorLoop(
                 }) &&
          "the canonical IV should only be used by its increment or "
          "ScalarIVSteps when resetting the start value");
-  IV->setOperand(0, VPV);
+  VPBuilder Builder(Header, Header->getFirstNonPhi());
+  VPInstruction *Add = Builder.createNaryOp(Instruction::Add, {IV, VPV});
+  IV->replaceAllUsesWith(Add);
+  Add->setOperand(0, IV);
 
   DenseMap<Value *, Value *> ToFrozen;
   SmallVector<Instruction *> InstsToMove;
@@ -9596,17 +9599,18 @@ static SmallVector<Instruction *> preparePlanForEpilogueVectorLoop(
         ToFrozen[StartV] = cast<PHINode>(ResumeV)->getIncomingValueForBlock(
             EPI.MainLoopIterationCountCheck);
 
-        // VPReductionPHIRecipe for FindFirstIV/FindLastIV reductions requires
-        // an adjustment to the resume value. The resume value is adjusted to
-        // the sentinel value when the final value from the main vector loop
-        // equals the start value. This ensures correctness when the start value
-        // might not be less than the minimum value of a monotonically
-        // increasing induction variable.
+        // VPReductionPHIRecipe for FindFirstIV/FindLastIV reductions
+        // requires an adjustment to the resume value. The resume value is
+        // adjusted to the sentinel value when the final value from the main
+        // vector loop equals the start value. This ensures correctness when
+        // the start value might not be less than the minimum value of a
+        // monotonically increasing induction variable.
         BasicBlock *ResumeBB = cast<Instruction>(ResumeV)->getParent();
         IRBuilder<> Builder(ResumeBB, ResumeBB->getFirstNonPHIIt());
         Value *Cmp = Builder.CreateICmpEQ(ResumeV, ToFrozen[StartV]);
         if (auto *I = dyn_cast<Instruction>(Cmp))
           InstsToMove.push_back(I);
+
         Value *Sentinel = RdxResult->getOperand(2)->getLiveInIRValue();
         ResumeV = Builder.CreateSelect(Cmp, Sentinel, ResumeV);
         if (auto *I = dyn_cast<Instruction>(ResumeV))
diff --git a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
index 555efea1ea840..de8f5f944d1a6 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
@@ -431,6 +431,12 @@ m_c_Binary(const Op0_t &Op0, const Op1_t &Op1) {
   return AllRecipe_commutative_match<Opcode, Op0_t, Op1_t>(Op0, Op1);
 }
 
+template <typename Op0_t, typename Op1_t>
+inline AllRecipe_match<Instruction::Add, Op0_t, Op1_t> m_Add(const Op0_t &Op0,
+                                                             const Op1_t &Op1) {
+  return m_Binary<Instruction::Add, Op0_t, Op1_t>(Op0, Op1);
+}
+
 template <typename Op0_t, typename Op1_t>
 inline AllRecipe_commutative_match<Instruction::Add, Op0_t, Op1_t>
 m_c_Add(const Op0_t &Op0, const Op1_t &Op1) {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index a73b083cff7fd..9792c3f5adc76 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1235,6 +1235,17 @@ static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) {
   if (!Plan->isUnrolled())
     return;
 
+  if (match(Def, m_Add(m_VPValue(X), m_VPValue(Y))) && Y->isLiveIn() &&
+      isa<VPPhi>(X)) {
+    auto *Phi = cast<VPPhi>(X);
+    if (Phi->getOperand(1) != Def && match(Phi->getOperand(0), m_ZeroInt()) &&
+        Phi->getNumUsers() == 1 && (*Phi->user_begin() == &R)) {
+      Phi->setOperand(0, Y);
+      Def->replaceAllUsesWith(Phi);
+      return;
+    }
+  }
+
   // VPVectorPointer for part 0 can be replaced by their start pointer.
   if (auto *VecPtr = dyn_cast<VPVectorPointerRecipe>(&R)) {
     if (VecPtr->isFirstPart()) {

@fhahn fhahn force-pushed the vplan-can-iv-dont-reset branch 2 times, most recently from a1c6b3c to 6391ca9 Compare October 5, 2025 12:51
Instead of re-setting the start value of the canonical IV when
vectorizing the epilogue we can emit an Add VPInstruction to provide
canonical IV value, adjusted by the resume value from the main loop.

This is in preparation to make the canonical IV a VPValue defined by
loop regions. It ensures that the canonical IV always starts at 0.
@fhahn fhahn force-pushed the vplan-can-iv-dont-reset branch from 6391ca9 to 84bd5ef Compare October 5, 2025 20:11
Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for extracting from #156262, copied over related minor comments.

Title might be a bit more accurate, as the start value of the canonical IV continues to be reset, but later. Perhaps "Delay reset of epilog canonical IV start value". Also, NFC?

Comment on lines 434 to 439
template <typename Op0_t, typename Op1_t>
inline AllRecipe_match<Instruction::Add, Op0_t, Op1_t> m_Add(const Op0_t &Op0,
const Op1_t &Op1) {
return m_Binary<Instruction::Add, Op0_t, Op1_t>(Op0, Op1);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commutative?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, can use the existing m_c_Add, removed the changes here, thanks

// below.
if (!Plan->isUnrolled())
return;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Hoist an invariant increment Y of a canonical IV phi X, by having X start at Y.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added with minor edit, as this applies to any (scalar) phi

"all incoming values must be 0");
EPI.VectorTripCount = EPResumeVal->getOperand(0);
}
VPValue *VPV = Plan.getOrAddLiveIn(EPResumeVal);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent: better rename VPV

@fhahn fhahn merged commit 4b8cac2 into llvm:main Oct 11, 2025
10 checks passed
@fhahn fhahn deleted the vplan-can-iv-dont-reset branch October 11, 2025 21:21
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 11, 2025
Instead of re-setting the start value of the canonical IV when
vectorizing the epilogue we can emit an Add VPInstruction to provide
canonical IV value, adjusted by the resume value from the main loop.

This is in preparation to make the canonical IV a VPValue defined by
loop regions. It ensures that the canonical IV always starts at 0.

PR: llvm/llvm-project#161589
@llvm-ci
Copy link
Collaborator

llvm-ci commented Oct 11, 2025

LLVM Buildbot has detected a new failure on builder clang-armv8-quick running on linaro-clang-armv8-quick while building llvm at step 5 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/154/builds/22623

Here is the relevant piece of the build log for the reference
Step 5 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'lit :: timeout-hang.py' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 18
not env -u FILECHECK_OPTS "/usr/bin/python3.10" /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/llvm/utils/lit/lit.py -j1 --order=lexical Inputs/timeout-hang/run-nonexistent.txt  --timeout=15 --param external=0 | "/usr/bin/python3.10" /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/utils/lit/tests/timeout-hang.py 5
# executed command: not env -u FILECHECK_OPTS /usr/bin/python3.10 /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/llvm/utils/lit/lit.py -j1 --order=lexical Inputs/timeout-hang/run-nonexistent.txt --timeout=15 --param external=0
# .---command stderr------------
# | lit.py: /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/llvm/utils/lit/lit/main.py:74: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 15 seconds was requested on the command line. Forcing timeout to be 15 seconds.
# `-----------------------------
# executed command: /usr/bin/python3.10 /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/utils/lit/tests/timeout-hang.py 5
# .---command stdout------------
# | Testing took 5.17s, which is beyond the grace period of 5.0s
# `-----------------------------
# error: command failed with exit status: 1

--

********************


DharuniRAcharya pushed a commit to DharuniRAcharya/llvm-project that referenced this pull request Oct 13, 2025
Instead of re-setting the start value of the canonical IV when
vectorizing the epilogue we can emit an Add VPInstruction to provide
canonical IV value, adjusted by the resume value from the main loop.

This is in preparation to make the canonical IV a VPValue defined by
loop regions. It ensures that the canonical IV always starts at 0.

PR: llvm#161589
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants