Skip to content

[VPlan] Move initial skeleton construction earlier (NFC). #150848

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Aug 9, 2025

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Jul 27, 2025

Split up the not clearly named prepareForVectorization transform into addInitialSkeleton, which adds the vector preheader, middle and scalar preheader blocks, as well as the canonical induction recipes and sets the trip count. The new transform is run directly after building the plain CFG VPlan initially.

The remaining code handling early exits and adding the branch in the middle block is renamed to handleEarlyExitsAndAddMiddleCheck and still runs at the original position.

With the code movement, we only have to add the skeleton once to the initial VPlan, and cloning will take care of the rest. It will also enable moving other construction steps to work directly on VPlan0, like adding resume phis.

Split up the not clearly named prepareForVectorization transform into
addInitialSkeleton, which adds the vector preheader, middle and scalar
preheader blocks, as well as the canonical induction recipes and sets
the trip count. The new transform is run directly after building the
plain CFG VPlan initially.

The remaining code handling early exits and adding the branch in the
middle block is renamed to handleEarlyExitsAndAddMiddleCheck and still
runs at the original position.

With the code movement, we only have to add the skeleton once to the
initial VPlan, and cloning will take care of the rest. It will also
enable moving other construction steps to work directly on VPlan0, like
adding resume phis.
@llvmbot
Copy link
Member

llvmbot commented Jul 27, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Split up the not clearly named prepareForVectorization transform into addInitialSkeleton, which adds the vector preheader, middle and scalar preheader blocks, as well as the canonical induction recipes and sets the trip count. The new transform is run directly after building the plain CFG VPlan initially.

The remaining code handling early exits and adding the branch in the middle block is renamed to handleEarlyExitsAndAddMiddleCheck and still runs at the original position.

With the code movement, we only have to add the skeleton once to the initial VPlan, and cloning will take care of the rest. It will also enable moving other construction steps to work directly on VPlan0, like adding resume phis.


Full diff: https://github.com/llvm/llvm-project/pull/150848.diff

4 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+20-15)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp (+42-34)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+14-10)
  • (modified) llvm/unittests/Transforms/Vectorize/VPlanTestBase.h (+5-2)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 6616e61f9bb84..59b276b5096fd 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8373,8 +8373,18 @@ void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
     LVer.prepareNoAliasMetadata();
   }
 
-  auto MaxVFTimes2 = MaxVF * 2;
+  // Create initial VPlan skeleton, having a basic block for the pre-header
+  // which contains SCEV expansions that need to happen before the CFG is
+  // modified; a basic block for the vector pre-header, followed by a region for
+  // the vector loop, followed by the middle basic block, connecting to the
+  // scalar preheader and exit blcoks.
   auto VPlan0 = VPlanTransforms::buildPlainCFG(OrigLoop, *LI);
+  VPlanTransforms::addInitialSkeleton(
+      *VPlan0, Legal->getWidestInductionType(),
+      getDebugLocFromInstOrOperands(Legal->getPrimaryInduction()), PSE,
+      OrigLoop);
+
+  auto MaxVFTimes2 = MaxVF * 2;
   for (ElementCount VF = MinVF; ElementCount::isKnownLT(VF, MaxVFTimes2);) {
     VFRange SubRange = {VF, MaxVFTimes2};
     if (auto Plan = tryToBuildVPlanWithVPRecipes(
@@ -8615,22 +8625,14 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
   // visit each basic block after having visited its predecessor basic blocks.
   // ---------------------------------------------------------------------------
 
-  // Create initial VPlan skeleton, having a basic block for the pre-header
-  // which contains SCEV expansions that need to happen before the CFG is
-  // modified; a basic block for the vector pre-header, followed by a region for
-  // the vector loop, followed by the middle basic block. The skeleton vector
-  // loop region contains a header and latch basic blocks.
-
   bool RequiresScalarEpilogueCheck =
       LoopVectorizationPlanner::getDecisionAndClampRange(
           [this](ElementCount VF) {
             return !CM.requiresScalarEpilogue(VF.isVector());
           },
           Range);
-  VPlanTransforms::prepareForVectorization(
-      *Plan, Legal->getWidestInductionType(), PSE, RequiresScalarEpilogueCheck,
-      CM.foldTailByMasking(), OrigLoop,
-      getDebugLocFromInstOrOperands(Legal->getPrimaryInduction()),
+  VPlanTransforms::handleEarlyExitsAndAddMiddleCheck(
+      *Plan, RequiresScalarEpilogueCheck, CM.foldTailByMasking(),
       Legal->hasUncountableEarlyExit(), Range);
   VPlanTransforms::createLoopRegions(*Plan);
   VPlanTransforms::createExtractsForLiveOuts(*Plan);
@@ -8918,10 +8920,13 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlan(VFRange &Range) {
   assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");
 
   auto Plan = VPlanTransforms::buildPlainCFG(OrigLoop, *LI);
-  VPlanTransforms::prepareForVectorization(
-      *Plan, Legal->getWidestInductionType(), PSE, true, false, OrigLoop,
-      getDebugLocFromInstOrOperands(Legal->getPrimaryInduction()), false,
-      Range);
+
+  VPlanTransforms::addInitialSkeleton(
+      *Plan, Legal->getWidestInductionType(),
+      getDebugLocFromInstOrOperands(Legal->getPrimaryInduction()), PSE,
+      OrigLoop);
+  VPlanTransforms::handleEarlyExitsAndAddMiddleCheck(*Plan, true, false, false,
+                                                     Range);
   VPlanTransforms::createLoopRegions(*Plan);
 
   for (ElementCount VF : Range)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
index 6c1f53b4eaa24..92b2d017d7a11 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
@@ -459,10 +459,10 @@ static void addCanonicalIVRecipes(VPlan &Plan, VPBasicBlock *HeaderVPBB,
                        LatchDL);
 }
 
-void VPlanTransforms::prepareForVectorization(
-    VPlan &Plan, Type *InductionTy, PredicatedScalarEvolution &PSE,
-    bool RequiresScalarEpilogueCheck, bool TailFolded, Loop *TheLoop,
-    DebugLoc IVDL, bool HasUncountableEarlyExit, VFRange &Range) {
+void VPlanTransforms::addInitialSkeleton(VPlan &Plan, Type *InductionTy,
+                                         DebugLoc IVDL,
+                                         PredicatedScalarEvolution &PSE,
+                                         Loop *TheLoop) {
   VPDominatorTree VPDT;
   VPDT.recalculate(Plan);
 
@@ -488,12 +488,46 @@ void VPlanTransforms::prepareForVectorization(
 
   addCanonicalIVRecipes(Plan, HeaderVPBB, LatchVPBB, InductionTy, IVDL);
 
-  [[maybe_unused]] bool HandledUncountableEarlyExit = false;
+  // Create SCEV and VPValue for the trip count.
+  // We use the symbolic max backedge-taken-count, which works also when
+  // vectorizing loops with uncountable early exits.
+  const SCEV *BackedgeTakenCountSCEV = PSE.getSymbolicMaxBackedgeTakenCount();
+  assert(!isa<SCEVCouldNotCompute>(BackedgeTakenCountSCEV) &&
+         "Invalid loop count");
+  ScalarEvolution &SE = *PSE.getSE();
+  const SCEV *TripCount = SE.getTripCountFromExitCount(BackedgeTakenCountSCEV,
+                                                       InductionTy, TheLoop);
+  Plan.setTripCount(
+      vputils::getOrCreateVPValueForSCEVExpr(Plan, TripCount, SE));
+
+  VPBasicBlock *ScalarPH = Plan.createVPBasicBlock("scalar.ph");
+  VPBlockUtils::connectBlocks(ScalarPH, Plan.getScalarHeader());
+
+  // The connection order corresponds to the operands of the conditional branch,
+  // with the middle block already connected to the exit block.
+  VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
+  // Also connect the entry block to the scalar preheader.
+  // TODO: Also introduce a branch recipe together with the minimum trip count
+  // check.
+  VPBlockUtils::connectBlocks(Plan.getEntry(), ScalarPH);
+  Plan.getEntry()->swapSuccessors();
+}
+
+void VPlanTransforms::handleEarlyExitsAndAddMiddleCheck(
+    VPlan &Plan, bool RequiresScalarEpilogueCheck, bool TailFolded,
+    bool HasUncountableEarlyExit, VFRange &Range) {
+  auto *MiddleVPBB = cast<VPBasicBlock>(
+      Plan.getScalarHeader()->getSinglePredecessor()->getPredecessors()[0]);
+  VPBlockBase *HeaderVPB =
+      Plan.getEntry()->getSuccessors()[1]->getSingleSuccessor();
+  auto *LatchVPBB = cast<VPBasicBlock>(HeaderVPB->getPredecessors()[1]);
+
   // Disconnect all early exits from the loop leaving it with a single exit from
   // the latch. Early exits that are countable are left for a scalar epilog. The
   // condition of uncountable early exits (currently at most one is supported)
   // is fused into the latch exit, and used to branch from middle block to the
   // early exit destination.
+  [[maybe_unused]] bool HandledUncountableEarlyExit = false;
   for (VPIRBasicBlock *EB : Plan.getExitBlocks()) {
     for (VPBlockBase *Pred : to_vector(EB->getPredecessors())) {
       if (Pred == MiddleVPBB)
@@ -502,7 +536,8 @@ void VPlanTransforms::prepareForVectorization(
         assert(!HandledUncountableEarlyExit &&
                "can handle exactly one uncountable early exit");
         handleUncountableEarlyExit(cast<VPBasicBlock>(Pred), EB, Plan,
-                                   HeaderVPBB, LatchVPBB, Range);
+                                   cast<VPBasicBlock>(HeaderVPB), LatchVPBB,
+                                   Range);
         HandledUncountableEarlyExit = true;
       } else {
         for (VPRecipeBase &R : EB->phis())
@@ -516,38 +551,11 @@ void VPlanTransforms::prepareForVectorization(
   assert((!HasUncountableEarlyExit || HandledUncountableEarlyExit) &&
          "missed an uncountable exit that must be handled");
 
-  // Create SCEV and VPValue for the trip count.
-  // We use the symbolic max backedge-taken-count, which works also when
-  // vectorizing loops with uncountable early exits.
-  const SCEV *BackedgeTakenCountSCEV = PSE.getSymbolicMaxBackedgeTakenCount();
-  assert(!isa<SCEVCouldNotCompute>(BackedgeTakenCountSCEV) &&
-         "Invalid loop count");
-  ScalarEvolution &SE = *PSE.getSE();
-  const SCEV *TripCount = SE.getTripCountFromExitCount(BackedgeTakenCountSCEV,
-                                                       InductionTy, TheLoop);
-  Plan.setTripCount(
-      vputils::getOrCreateVPValueForSCEVExpr(Plan, TripCount, SE));
-
-  VPBasicBlock *ScalarPH = Plan.createVPBasicBlock("scalar.ph");
-  VPBlockUtils::connectBlocks(ScalarPH, Plan.getScalarHeader());
-
-  // The connection order corresponds to the operands of the conditional branch,
-  // with the middle block already connected to the exit block.
-  VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
-  // Also connect the entry block to the scalar preheader.
-  // TODO: Also introduce a branch recipe together with the minimum trip count
-  // check.
-  VPBlockUtils::connectBlocks(Plan.getEntry(), ScalarPH);
-  Plan.getEntry()->swapSuccessors();
-
   // If MiddleVPBB has a single successor then the original loop does not exit
   // via the latch and the single successor must be the scalar preheader.
   // There's no need to add a runtime check to MiddleVPBB.
-  if (MiddleVPBB->getNumSuccessors() == 1) {
-    assert(MiddleVPBB->getSingleSuccessor() == ScalarPH &&
-           "must have ScalarPH as single successor");
+  if (MiddleVPBB->getNumSuccessors() == 1)
     return;
-  }
 
   assert(MiddleVPBB->getNumSuccessors() == 2 && "must have 2 successors");
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index d5af6cd73a4a0..0b24cb00ff73c 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -58,17 +58,21 @@ struct VPlanTransforms {
                                                                 LoopInfo &LI);
 
   /// Prepare the plan for vectorization. It will introduce a dedicated
-  /// VPBasicBlock for the vector pre-header as well as a VPBasicBlock as exit
-  /// block of the main vector loop (middle.block). If a check is needed to
+  /// VPBasicBlock for the vector pre-header, a VPBasicBlock as exit
+  /// block of the main vector loop (middle.block) and a VPBaiscBlock for the
+  /// scalar preheader. It also adds a canonical IV and its increment, using \p
+  /// InductionTy and \p IVDL, and creates a VPValue expression for the original
+  /// trip count.
+  LLVM_ABI_FOR_TEST static void
+  addInitialSkeleton(VPlan &Plan, Type *InductionTy, DebugLoc IVDL,
+                     PredicatedScalarEvolution &PSE, Loop *TheLoop);
+
+  /// Update \p Plan to account for all early exits. If a check is needed to
   /// guard executing the scalar epilogue loop, it will be added to the middle
-  /// block, together with VPBasicBlocks for the scalar preheader and exit
-  /// blocks. \p InductionTy is the type of the canonical induction and used for
-  /// related values, like the trip count expression.  It also creates a VPValue
-  /// expression for the original trip count.
-  LLVM_ABI_FOR_TEST static void prepareForVectorization(
-      VPlan &Plan, Type *InductionTy, PredicatedScalarEvolution &PSE,
-      bool RequiresScalarEpilogueCheck, bool TailFolded, Loop *TheLoop,
-      DebugLoc IVDL, bool HasUncountableExit, VFRange &Range);
+  /// block
+  LLVM_ABI_FOR_TEST static void handleEarlyExitsAndAddMiddleCheck(
+      VPlan &Plan, bool RequiresScalarEpilogueCheck, bool TailFolded,
+      bool HasUncountableExit, VFRange &Range);
 
   /// Replace loops in \p Plan's flat CFG with VPRegionBlocks, turning \p Plan's
   /// flat CFG into a hierarchical CFG.
diff --git a/llvm/unittests/Transforms/Vectorize/VPlanTestBase.h b/llvm/unittests/Transforms/Vectorize/VPlanTestBase.h
index 7dfd11a48b595..877394cc22ba2 100644
--- a/llvm/unittests/Transforms/Vectorize/VPlanTestBase.h
+++ b/llvm/unittests/Transforms/Vectorize/VPlanTestBase.h
@@ -74,8 +74,11 @@ class VPlanTestIRBase : public testing::Test {
     PredicatedScalarEvolution PSE(*SE, *L);
     auto Plan = VPlanTransforms::buildPlainCFG(L, *LI);
     VFRange R(ElementCount::getFixed(1), ElementCount::getFixed(2));
-    VPlanTransforms::prepareForVectorization(*Plan, IntegerType::get(*Ctx, 64),
-                                             PSE, true, false, L, {}, false, R);
+    VPlanTransforms::addInitialSkeleton(*Plan, IntegerType::get(*Ctx, 64), {},
+                                        PSE, L);
+
+    VPlanTransforms::handleEarlyExitsAndAddMiddleCheck(*Plan, true, false,
+                                                       false, R);
     VPlanTransforms::createLoopRegions(*Plan);
     return Plan;
   }

Copy link
Contributor

@artagnon artagnon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall; just a few suggestions/clarifications.

Comment on lines 8928 to 8929
VPlanTransforms::handleEarlyExitsAndAddMiddleCheck(*Plan, true, false, false,
Range);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline comments annotating the true/false args?

Comment on lines 547 to 546
assert(MiddleVPBB->getSingleSuccessor() == ScalarPH &&
"must have ScalarPH as single successor");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this assert gone?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's easy to retrieve the scalar preheader here, added back, thanks

VPlan &Plan, Type *InductionTy, PredicatedScalarEvolution &PSE,
bool RequiresScalarEpilogueCheck, bool TailFolded, Loop *TheLoop,
DebugLoc IVDL, bool HasUncountableExit, VFRange &Range);
/// block
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// block
/// block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done thanks

Copy link
Contributor

@artagnon artagnon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post approval review.

Comment on lines 8376 to 8380
// Create initial VPlan skeleton, having a basic block for the pre-header
// which contains SCEV expansions that need to happen before the CFG is
// modified; a basic block for the vector pre-header, followed by a region for
// the vector loop, followed by the middle basic block, connecting to the
// scalar preheader and exit blcoks.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to place the above documentation next to the definition of addInitialSkeleton()?

Here it would be good to explain that a "base" VPlan0 is built to serve as a common generic starting point for all candidates built later for specific VF ranges.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

Comment on lines 8634 to 8635
VPlanTransforms::handleEarlyExitsAndAddMiddleCheck(
*Plan, RequiresScalarEpilogueCheck, CM.foldTailByMasking(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential follow-up: would be good to complete the common skeleton with a canonical representation of early exits and middle check, to be specialized per each VF range according to its RequiresScalarEpilogueCheck (fold tail is currently also common?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep taht sounds good!

*Plan, Legal->getWidestInductionType(),
getDebugLocFromInstOrOperands(Legal->getPrimaryInduction()), PSE,
OrigLoop);
VPlanTransforms::handleEarlyExitsAndAddMiddleCheck(*Plan, true, false, false,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Suggested change
VPlanTransforms::handleEarlyExitsAndAddMiddleCheck(*Plan, true, false, false,
VPlanTransforms::handleEarlyExitsAndAddMiddleCheck(*Plan, true /* requires scalar epilog check */, false /*fold tail by masking */, false /*has uncountable exit */,

Comment on lines 77 to 78
VPlanTransforms::addInitialSkeleton(*Plan, IntegerType::get(*Ctx, 64), {},
PSE, L);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addInitialSkeleton() complements buildPlainCFG() in constructing the common VF-agnostic VPlan0 base, so best appear together, if not fused:

Suggested change
VPlanTransforms::addInitialSkeleton(*Plan, IntegerType::get(*Ctx, 64), {},
PSE, L);
VPlanTransforms::addInitialSkeleton(*Plan, IntegerType::get(*Ctx, 64), {},
PSE, L);
Range R(ElementCount::getFixed(1), ElementCount::getFixed(2));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done thanks

bool RequiresScalarEpilogueCheck, bool TailFolded, Loop *TheLoop,
DebugLoc IVDL, bool HasUncountableExit, VFRange &Range);
/// block
LLVM_ABI_FOR_TEST static void handleEarlyExitsAndAddMiddleCheck(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVM_ABI_FOR_TEST?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has been a recent addition, and now all symbols not to be exported to the public LLVM API but needed for unit tests need to be marked as such. It looks like this was for better supporting shared libraries with symbols hidden by default: 4e2efa5

*Plan, Legal->getWidestInductionType(), PSE, true, false, OrigLoop,
getDebugLocFromInstOrOperands(Legal->getPrimaryInduction()), false,
Range);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Should buildPlainCFG() be called buildPlainLoopCFG() or buildPlainCFGForLoop(), as it focuses on building the basic blocks of the loop along with its preheader and exit. buildInitialSkeleton()also builds plain CFG, can be calledconnectPlainLoopCFG(), as it adds new basic blocks before and after the loop connecting it(?). Should a new buildSkeletalPlan()callbuildPlainCFG()followed byaddInitialSkeleton()`? Together they build an initial CFG-based skeleton of VPlan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

combined into buildVPlan0, wdyt?

I tried to combine the comments as well as possibe

/// VPBasicBlock for the vector pre-header as well as a VPBasicBlock as exit
/// block of the main vector loop (middle.block). If a check is needed to
/// VPBasicBlock for the vector pre-header, a VPBasicBlock as exit
/// block of the main vector loop (middle.block) and a VPBaiscBlock for the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// block of the main vector loop (middle.block) and a VPBaiscBlock for the
/// block of the main vector loop (middle.block) and a VPBasicBlock for the

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be fixed in the latest version, thanks

Comment on lines 63 to 65
/// scalar preheader. It also adds a canonical IV and its increment, using \p
/// InductionTy and \p IVDL, and creates a VPValue expression for the original
/// trip count.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this "also" part that handles values and recipes rather than CFG blocks and edges be handled by a separate function?

Comment on lines 519 to 522
auto *MiddleVPBB = cast<VPBasicBlock>(
Plan.getScalarHeader()->getSinglePredecessor()->getPredecessors()[0]);
VPBlockBase *HeaderVPB =
Plan.getEntry()->getSuccessors()[1]->getSingleSuccessor();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These seem somewhat obscure and fragile. Perhaps better fuse into a common canonical skeleton form.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, next step would be to move this forward also.

Plan.getEntry()->swapSuccessors();
}

void VPlanTransforms::handleEarlyExitsAndAddMiddleCheck(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitting middle block is one part of handling certain early exits.

Suggested change
void VPlanTransforms::handleEarlyExitsAndAddMiddleCheck(
void VPlanTransforms::handleEarlyExits(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This currently also handled connecting the exit block/scalar ph.

Once the PR lands we could default to creating the branch and then replacing it with kown-true/false depending on tail-folding/requireScalarEpilogue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better split into separate handleEarlyExits() and addMiddleBranch()?
(Former is unneeded for native, although perhaps better to keep aligned.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split up thanks.

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! Adding some minor suggestions.

/// \p TheLoop being directly translated to VPBasicBlocks with VPInstruction
/// corresponding to the input IR.
///
/// The created loop is wrapped into an initial skeleton to facilitate
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// The created loop is wrapped into an initial skeleton to facilitate
/// The created loop is wrapped in an initial skeleton to facilitate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done thanks

/// corresponding to the input IR.
///
/// The created loop is wrapped into an initial skeleton to facilitate
/// vectorization, consisting of a vector pre-header, a exit block for the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// vectorization, consisting of a vector pre-header, a exit block for the
/// vectorization, consisting of a vector pre-header, an exit block for the

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done thanks

Plan.getEntry()->swapSuccessors();
}

void VPlanTransforms::handleEarlyExitsAndAddMiddleCheck(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better split into separate handleEarlyExits() and addMiddleBranch()?
(Former is unneeded for native, although perhaps better to keep aligned.)

// check.
VPBlockUtils::connectBlocks(Plan.getEntry(), ScalarPH);
Plan.getEntry()->swapSuccessors();

// If MiddleVPBB has a single successor then the original loop does not exit
// via the latch and the single successor must be the scalar preheader.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent/follow-up: this refers to how LV handles loops whose latch does not exit, i.e., by requiring a scalar epilogue. Better handle all such cases consistently - either by wiring middle block to scalar preheader only as here, or by case 1 below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this should be taken care of by consistently emitting a check if VectorTC == TC. This is the most general, the others are optimizations of cases where we know the check is either always false or true.

bool RequiresScalarEpilogueCheck, bool TailFolded, Loop *TheLoop,
DebugLoc IVDL, bool HasUncountableExit, VFRange &Range);
/// block
LLVM_ABI_FOR_TEST static void handleEarlyExitsAndAddMiddleCheck(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

@fhahn fhahn merged commit 06fd0f9 into llvm:main Aug 9, 2025
9 checks passed
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Aug 9, 2025
…(#150848)

Split up the not clearly named prepareForVectorization transform into
buildVPlan0, which adds the vector preheader, middle and scalar
preheader blocks, as well as the canonical induction recipes and sets
the trip count. The new transform is run directly after building the
plain CFG VPlan initially.

The remaining code handling early exits and adding the branch in the
middle block is renamed to handleEarlyExitsAndAddMiddleCheck and still
runs at the original position.

With the code movement, we only have to add the skeleton once to the
initial VPlan, and cloning will take care of the rest. It will also
enable moving other construction steps to work directly on VPlan0, like
adding resume phis.

PR: llvm/llvm-project#150848
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants