Merged
16 changes: 9 additions & 7 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -9312,14 +9312,15 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
return !CM.requiresScalarEpilogue(VF.isVector());
},
Range);
VPlanPtr Plan = VPlan::createInitialVPlan(Legal->getWidestInductionType(),
PSE, RequiresScalarEpilogueCheck,
CM.foldTailByMasking(), OrigLoop);

auto Plan = std::make_unique<VPlan>(OrigLoop);
Collaborator

The above explanation regarding "Create initial VPlan skeleton, having ..." remains intact, this change only affects how to get there?

Contributor Author

Yep I think so.

// Build hierarchical CFG.
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
HCFGBuilder.buildHierarchicalCFG();
Collaborator

Looks like buildHierarchicalCFG should now (as in TODO) also be a VPlanTransform.

Contributor Author

Yep that sounds good. I think it would make sense to consolidate the transforms for initial VPlan construction into a separate VPlanConstruction.cpp


VPlanTransforms::introduceTopLevelVectorLoopRegion(
Collaborator

Should this introduce all regions, i.e., lifting a flat CFG into a hierarchical one? With the inverse lowering conversion taking place at the end, to simplify code-gen.

Contributor Author

Yep, I added a TODO for a follow-up. As this is for now only needed in the native path, it's probably best to do it separately.

*Plan, Legal->getWidestInductionType(), PSE, RequiresScalarEpilogueCheck,
CM.foldTailByMasking(), OrigLoop);

// Don't use getDecisionAndClampRange here, because we don't know the UF
// so this function is better to be conservative, rather than to split
// it up into different VPlans.
@@ -9615,13 +9616,14 @@ VPlanPtr LoopVectorizationPlanner::buildVPlan(VFRange &Range) {
assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");

// Create new empty VPlan
auto Plan = VPlan::createInitialVPlan(Legal->getWidestInductionType(), PSE,
true, false, OrigLoop);

auto Plan = std::make_unique<VPlan>(OrigLoop);
// Build hierarchical CFG
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
HCFGBuilder.buildHierarchicalCFG();

VPlanTransforms::introduceTopLevelVectorLoopRegion(
*Plan, Legal->getWidestInductionType(), PSE, true, false, OrigLoop);

for (ElementCount VF : Range)
Plan->addVF(VF);

91 changes: 7 additions & 84 deletions llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -880,85 +880,6 @@ VPlan::~VPlan() {
delete BackedgeTakenCount;
}

VPlanPtr VPlan::createInitialVPlan(Type *InductionTy,
PredicatedScalarEvolution &PSE,
bool RequiresScalarEpilogueCheck,
bool TailFolded, Loop *TheLoop) {
auto Plan = std::make_unique<VPlan>(TheLoop);
VPBlockBase *ScalarHeader = Plan->getScalarHeader();

// Connect entry only to vector preheader initially. Entry will also be
// connected to the scalar preheader later, during skeleton creation when
// runtime guards are added as needed. Note that when executing the VPlan for
// an epilogue vector loop, the original entry block here will be replaced by
// a new VPIRBasicBlock wrapping the entry to the epilogue vector loop after
// generating code for the main vector loop.
VPBasicBlock *VecPreheader = Plan->createVPBasicBlock("vector.ph");
VPBlockUtils::connectBlocks(Plan->getEntry(), VecPreheader);

// Create SCEV and VPValue for the trip count.
// We use the symbolic max backedge-taken-count, which works also when
// vectorizing loops with uncountable early exits.
const SCEV *BackedgeTakenCountSCEV = PSE.getSymbolicMaxBackedgeTakenCount();
assert(!isa<SCEVCouldNotCompute>(BackedgeTakenCountSCEV) &&
"Invalid loop count");
ScalarEvolution &SE = *PSE.getSE();
const SCEV *TripCount = SE.getTripCountFromExitCount(BackedgeTakenCountSCEV,
InductionTy, TheLoop);
Plan->TripCount =
vputils::getOrCreateVPValueForSCEVExpr(*Plan, TripCount, SE);

// Create VPRegionBlock, with empty header and latch blocks, to be filled
// during processing later.
VPBasicBlock *HeaderVPBB = Plan->createVPBasicBlock("vector.body");
VPBasicBlock *LatchVPBB = Plan->createVPBasicBlock("vector.latch");
VPBlockUtils::insertBlockAfter(LatchVPBB, HeaderVPBB);
auto *TopRegion = Plan->createVPRegionBlock(
HeaderVPBB, LatchVPBB, "vector loop", false /*isReplicator*/);

VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
VPBasicBlock *MiddleVPBB = Plan->createVPBasicBlock("middle.block");
VPBlockUtils::insertBlockAfter(MiddleVPBB, TopRegion);

VPBasicBlock *ScalarPH = Plan->createVPBasicBlock("scalar.ph");
VPBlockUtils::connectBlocks(ScalarPH, ScalarHeader);
if (!RequiresScalarEpilogueCheck) {
VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
return Plan;
}

// If needed, add a check in the middle block to see if we have completed
// all of the iterations in the first vector loop. Three cases:
// 1) If (N - N%VF) == N, then we *don't* need to run the remainder.
// Thus if tail is to be folded, we know we don't need to run the
// remainder and we can set the condition to true.
// 2) If we require a scalar epilogue, there is no conditional branch as
// we unconditionally branch to the scalar preheader. Do nothing.
// 3) Otherwise, construct a runtime check.
BasicBlock *IRExitBlock = TheLoop->getUniqueLatchExitBlock();
VPIRBasicBlock *VPExitBlock = Plan->getExitBlock(IRExitBlock);
// The connection order corresponds to the operands of the conditional branch.
VPBlockUtils::insertBlockAfter(VPExitBlock, MiddleVPBB);
VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);

auto *ScalarLatchTerm = TheLoop->getLoopLatch()->getTerminator();
// Here we use the same DebugLoc as the scalar loop latch terminator instead
// of the corresponding compare because they may have ended up with
// different line numbers and we want to avoid awkward line stepping while
// debugging. Eg. if the compare has got a line number inside the loop.
VPBuilder Builder(MiddleVPBB);
VPValue *Cmp =
TailFolded
? Plan->getOrAddLiveIn(ConstantInt::getTrue(
IntegerType::getInt1Ty(TripCount->getType()->getContext())))
: Builder.createICmp(CmpInst::ICMP_EQ, Plan->getTripCount(),
&Plan->getVectorTripCount(),
ScalarLatchTerm->getDebugLoc(), "cmp.n");
Builder.createNaryOp(VPInstruction::BranchOnCond, {Cmp},
ScalarLatchTerm->getDebugLoc());
return Plan;
}

void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV,
VPTransformState &State) {
Type *TCTy = TripCountV->getType();
@@ -1135,11 +1056,13 @@ void VPlan::printLiveIns(raw_ostream &O) const {
}

O << "\n";
if (TripCount->isLiveIn())
O << "Live-in ";
TripCount->printAsOperand(O, SlotTracker);
O << " = original trip-count";
O << "\n";
if (TripCount) {
if (TripCount->isLiveIn())
O << "Live-in ";
TripCount->printAsOperand(O, SlotTracker);
O << " = original trip-count";
O << "\n";
}
}

LLVM_DUMP_METHOD
17 changes: 2 additions & 15 deletions llvm/lib/Transforms/Vectorize/VPlan.h
@@ -3505,21 +3505,6 @@ class VPlan {
VPBB->setPlan(this);
}

/// Create initial VPlan, having an "entry" VPBasicBlock (wrapping
/// original scalar pre-header) which contains SCEV expansions that need
/// to happen before the CFG is modified (when executing a VPlan for the
/// epilogue vector loop, the original entry needs to be replaced by a new
/// one); a VPBasicBlock for the vector pre-header, followed by a region for
/// the vector loop, followed by the middle VPBasicBlock. If a check is needed
/// to guard executing the scalar epilogue loop, it will be added to the
/// middle block, together with VPBasicBlocks for the scalar preheader and
/// exit blocks. \p InductionTy is the type of the canonical induction and
/// used for related values, like the trip count expression.
static VPlanPtr createInitialVPlan(Type *InductionTy,
PredicatedScalarEvolution &PSE,
bool RequiresScalarEpilogueCheck,
bool TailFolded, Loop *TheLoop);

/// Prepare the plan for execution, setting up the required live-in values.
void prepareToExecute(Value *TripCount, Value *VectorTripCount,
VPTransformState &State);
@@ -3589,6 +3574,8 @@ class VPlan {
TripCount = NewTripCount;
Collaborator

Above error message that "TripCount always must be set" needs to be updated.

Contributor Author

Done, thanks

}

void setTripCount(VPValue *NewTripCount) { TripCount = NewTripCount; }
Collaborator

Better place setTripCount() before resetTripCount().

Suggested change
void setTripCount(VPValue *NewTripCount) { TripCount = NewTripCount; }
// Set the trip count assuming it is currently null; if it is not - use resetTripCount().
void setTripCount(VPValue *NewTripCount) {
assert(!TripCount && "TripCount expected to be null");
TripCount = NewTripCount;
}

Contributor Author

Done, thanks
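
The set-versus-reset split suggested in this thread can be illustrated in isolation. The following is a minimal sketch, not the code from the patch: `PlanSketch`, `ValueSketch`, and `hasTripCount()` are hypothetical stand-ins, and the exact conditions checked by the real `resetTripCount()` are assumed rather than taken from the source.

```cpp
#include <cassert>

// Hypothetical stand-in for a VPValue; only pointer identity matters here.
struct ValueSketch {};

// Hypothetical, simplified model of a plan that owns a trip-count pointer.
class PlanSketch {
  ValueSketch *TripCount = nullptr;

public:
  // True once a trip count has been installed.
  bool hasTripCount() const { return TripCount != nullptr; }

  ValueSketch *getTripCount() const {
    assert(TripCount && "trip count must be set before it is queried");
    return TripCount;
  }

  // First-time initialization: only valid while no trip count is set.
  // Callers that want to replace an existing value use resetTripCount().
  void setTripCount(ValueSketch *NewTripCount) {
    assert(!TripCount && "TripCount expected to be null");
    TripCount = NewTripCount;
  }

  // Replacement of an already installed trip count (assumed precondition).
  void resetTripCount(ValueSketch *NewTripCount) {
    assert(TripCount && NewTripCount && "can only reset an existing trip count");
    TripCount = NewTripCount;
  }
};
```

Keeping the two entry points separate means an accidental second initialization trips an assertion in debug builds instead of silently overwriting the value.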


/// The backedge taken count of the original loop.
VPValue *getOrCreateBackedgeTakenCount() {
if (!BackedgeTakenCount)
53 changes: 20 additions & 33 deletions llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
@@ -180,7 +180,7 @@ VPBasicBlock *PlainCFGBuilder::getOrCreateVPBB(BasicBlock *BB) {

// Get or create a region for the loop containing BB.
Collaborator

Suggested change
// Get or create a region for the loop containing BB.
// Get or create a region for the loop containing BB, except for the top region of TheLoop which is created later.

Contributor Author

Done thanks!

Loop *LoopOfBB = LI->getLoopFor(BB);
if (!LoopOfBB || !doesContainLoop(LoopOfBB, TheLoop))
if (!LoopOfBB || LoopOfBB == TheLoop || !doesContainLoop(LoopOfBB, TheLoop))
Collaborator

Drop the if (LoopOfBB == TheLoop) { case below which is now dead.

Contributor Author

Done thanks!

return VPBB;

auto *RegionOfVPBB = Loop2Region.lookup(LoopOfBB);
@@ -353,29 +353,6 @@ void PlainCFGBuilder::createVPInstructionsForVPBB(VPBasicBlock *VPBB,
// Main interface to build the plain CFG.
void PlainCFGBuilder::buildPlainCFG(
DenseMap<VPBlockBase *, BasicBlock *> &VPB2IRBB) {
// 0. Reuse the top-level region, vector-preheader and exit VPBBs from the
// skeleton. These were created directly rather than via getOrCreateVPBB(),
// revisit them now to update BB2VPBB. Note that header/entry and
// latch/exiting VPBB's of top-level region have yet to be created.
VPRegionBlock *TheRegion = Plan.getVectorLoopRegion();
BasicBlock *ThePreheaderBB = TheLoop->getLoopPreheader();
assert((ThePreheaderBB->getTerminator()->getNumSuccessors() == 1) &&
"Unexpected loop preheader");
auto *VectorPreheaderVPBB =
cast<VPBasicBlock>(TheRegion->getSinglePredecessor());
// ThePreheaderBB conceptually corresponds to both Plan.getPreheader() (which
// wraps the original preheader BB) and Plan.getEntry() (which represents the
// new vector preheader); here we're interested in setting BB2VPBB to the
// latter.
BB2VPBB[ThePreheaderBB] = VectorPreheaderVPBB;
Collaborator

TheRegion and VectorPreheaderVPBB are yet to be formed, hence dropping their recording in BB2VPBB and Loop2Region?

Contributor Author

Yep, will be introduced as part of the transform.

Loop2Region[LI->getLoopFor(TheLoop->getHeader())] = TheRegion;
Collaborator

(btw, LI->getLoopFor(TheLoop->getHeader()) is aka TheLoop?)

Contributor Author

Yep


// The existing vector region's entry and exiting VPBBs correspond to the loop
// header and latch.
VPBasicBlock *VectorHeaderVPBB = TheRegion->getEntryBasicBlock();
VPBasicBlock *VectorLatchVPBB = TheRegion->getExitingBasicBlock();
BB2VPBB[TheLoop->getHeader()] = VectorHeaderVPBB;
Collaborator

The header of TheLoop need not be mapped in BB2VPBB (yet)? Its corresponding VPBB cannot be retrieved via TheRegion (yet).

Contributor Author

Yep, there's no region yet for TheLoop, only the initial entry block to the VPlan.

VectorHeaderVPBB->clearSuccessors();

// 1. Scan the body of the loop in a topological order to visit each basic
// block after having visited its predecessor basic blocks. Create a VPBB for
@@ -386,6 +363,9 @@ void PlainCFGBuilder::buildPlainCFG(

// Loop PH needs to be explicitly visited since it's not taken into account by
// LoopBlocksDFS.
BasicBlock *ThePreheaderBB = TheLoop->getLoopPreheader();
assert((ThePreheaderBB->getTerminator()->getNumSuccessors() == 1) &&
"Unexpected loop preheader");
for (auto &I : *ThePreheaderBB) {
if (I.getType()->isVoidTy())
continue;
@@ -406,18 +386,16 @@ void PlainCFGBuilder::buildPlainCFG(
} else {
// BB is a loop header, set the predecessor for the region, except for the
// top region, whose predecessor was set when creating VPlan's skeleton.
Collaborator

Comment needs to be updated - the top region itself has yet to be introduced, rather than its predecessor.

Contributor Author

Done, thanks

assert(isHeaderVPBB(VPBB) && "isHeaderBB and isHeaderVPBB disagree");
if (TheRegion != Region)
if (LoopForBB != TheLoop)
Collaborator

Fold this if with its preceding else?
Check instead if (Region)?

Contributor Author

Done thanks

setRegionPredsFromBB(Region, BB);
}

// Create VPInstructions for BB.
createVPInstructionsForVPBB(VPBB, BB);

if (TheLoop->getLoopLatch() == BB) {
VPBB->setOneSuccessor(VectorLatchVPBB);
VectorLatchVPBB->clearPredecessors();
VectorLatchVPBB->setPredecessors({VPBB});
if (BB == TheLoop->getLoopLatch()) {
VPBasicBlock *HeaderVPBB = getOrCreateVPBB(LoopForBB->getHeader());
VPBlockUtils::connectBlocks(VPBB, HeaderVPBB);
continue;
}

@@ -449,16 +427,22 @@ void PlainCFGBuilder::buildPlainCFG(
VPBasicBlock *Successor0 = getOrCreateVPBB(IRSucc0);
VPBasicBlock *Successor1 = getOrCreateVPBB(IRSucc1);
if (BB == LoopForBB->getLoopLatch()) {
// For a latch we need to set the successor of the region rather than that
// of VPBB and it should be set to the exit, i.e., non-header successor,
// For a latch we need to set the successor of the region rather
// than that
// of VPBB and it should be set to the exit, i.e., non-header
// successor,
Collaborator

Formatting.

Contributor Author

Should be fixed thanks

// except for the top region, whose successor was set when creating
// VPlan's skeleton.
Collaborator

Suggested change
// except for the top region, whose successor was set when creating
// VPlan's skeleton.
// except for the top region, which is handled elsewhere.

Contributor Author

Done thanks

assert(TheRegion != Region &&
assert(LoopForBB != TheLoop &&
"Latch of the top region should have been handled earlier");
Region->setOneSuccessor(isHeaderVPBB(Successor0) ? Successor1
: Successor0);
Region->setExiting(VPBB);
continue;

Collaborator

Drop this continue;?
How does this work with the following code being unreachable - is it really needed? tested?

Contributor Author

The code was left over from earlier iterations, removed, thanks

VPBasicBlock *HeaderVPBB = getOrCreateVPBB(LoopForBB->getHeader());
VPBlockUtils::connectBlocks(VPBB, HeaderVPBB);
Collaborator

Can be folded with handling TheLoop's latch above, with an early continue if LoopForBB == TheLoop.

Contributor Author

Code here not needed, removed, thanks

continue;
}

// Don't connect any blocks outside the current loop except the latch for
@@ -482,6 +466,9 @@ void PlainCFGBuilder::buildPlainCFG(
// corresponding VPlan operands.
fixHeaderPhis();

VPBlockUtils::connectBlocks(Plan.getEntry(),
getOrCreateVPBB(TheLoop->getHeader()));

for (const auto &[IRBB, VPB] : BB2VPBB)
VPB2IRBB[VPB] = IRBB;
}
76 changes: 76 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -32,6 +32,82 @@

using namespace llvm;

void VPlanTransforms::introduceTopLevelVectorLoopRegion(
VPlan &Plan, Type *InductionTy, PredicatedScalarEvolution &PSE,
bool RequiresScalarEpilogueCheck, bool TailFolded, Loop *TheLoop) {
auto *HeaderVPBB = cast<VPBasicBlock>(Plan.getEntry()->getSingleSuccessor());
VPBlockUtils::disconnectBlocks(Plan.getEntry(), HeaderVPBB);

VPBasicBlock *OriginalLatch =
cast<VPBasicBlock>(HeaderVPBB->getSinglePredecessor());
VPBlockUtils::disconnectBlocks(OriginalLatch, HeaderVPBB);
Collaborator

assert OriginalLatch is now free of successors to retain shallow dfs scan below rather than RPOTing until latch?

Contributor Author

Done, thanks

VPBasicBlock *VecPreheader = Plan.createVPBasicBlock("vector.ph");
VPBlockUtils::connectBlocks(Plan.getEntry(), VecPreheader);

// Create SCEV and VPValue for the trip count.
// We use the symbolic max backedge-taken-count, which works also when
// vectorizing loops with uncountable early exits.
const SCEV *BackedgeTakenCountSCEV = PSE.getSymbolicMaxBackedgeTakenCount();
assert(!isa<SCEVCouldNotCompute>(BackedgeTakenCountSCEV) &&
"Invalid loop count");
ScalarEvolution &SE = *PSE.getSE();
const SCEV *TripCount = SE.getTripCountFromExitCount(BackedgeTakenCountSCEV,
InductionTy, TheLoop);
Plan.setTripCount(
vputils::getOrCreateVPValueForSCEVExpr(Plan, TripCount, SE));

// Create VPRegionBlock, with empty header and latch blocks, to be filled
Collaborator

Suggested change
// Create VPRegionBlock, with empty header and latch blocks, to be filled
// Create VPRegionBlock, with existing header and new empty latch block, to be filled

Contributor Author

Updated, thanks!

// during processing later.
VPBasicBlock *LatchVPBB = Plan.createVPBasicBlock("vector.latch");
VPBlockUtils::insertBlockAfter(LatchVPBB, OriginalLatch);
Collaborator

Is a new LatchVPBB needed or can OriginalLatch continue to serve as the region's latch? Initially separate empty header and latch were created here to support subsequent introduction of VPBB's in between.

Contributor Author

There are some places where we rely on there being a separate latch, e.g. when adding canonical IV related recipes. Those could be updated separately.

auto *TopRegion = Plan.createVPRegionBlock(
HeaderVPBB, LatchVPBB, "vector loop", false /*isReplicator*/);
for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPBB)) {
Collaborator

Use RPOT and break when reaching LatchVPBB?

Contributor Author

Left as is for now, as this is just to set the parent, so RPOT should not be needed
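
To make the point about traversal order concrete, here is a toy model of the wrapping step, assuming a plain CFG with a single backedge from the latch to the header. `BlockSketch`, `RegionSketch`, `connect`, `disconnect`, and `wrapLoopInRegion` are hypothetical names invented for this sketch; they only mimic the shape of the VPlan utilities and are not LLVM APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_set>
#include <vector>

struct RegionSketch;

// Hypothetical, simplified block: CFG edges plus an owning region.
struct BlockSketch {
  std::vector<BlockSketch *> Succs, Preds;
  RegionSketch *Parent = nullptr;
};

// Hypothetical region, described only by its entry and exiting blocks.
struct RegionSketch {
  BlockSketch *Entry = nullptr;
  BlockSketch *Exiting = nullptr;
};

inline void connect(BlockSketch &From, BlockSketch &To) {
  From.Succs.push_back(&To);
  To.Preds.push_back(&From);
}

inline void disconnect(BlockSketch &From, BlockSketch &To) {
  auto Erase = [](std::vector<BlockSketch *> &V, BlockSketch *B) {
    V.erase(std::remove(V.begin(), V.end(), B), V.end());
  };
  Erase(From.Succs, &To);
  Erase(To.Preds, &From);
}

// Wrap the loop rooted at Header into Region. Assumes Entry -> Header and
// exactly one backedge OrigLatch -> Header, as in the plain CFG built above.
inline void wrapLoopInRegion(BlockSketch &Entry, BlockSketch &Header,
                             BlockSketch &NewLatch, RegionSketch &Region) {
  disconnect(Entry, Header);
  assert(Header.Preds.size() == 1 && "expected a single backedge");
  BlockSketch *OrigLatch = Header.Preds.front();
  disconnect(*OrigLatch, Header);
  // With the backedge gone, the latch must be a dead end; otherwise the
  // traversal below would escape the loop body.
  assert(OrigLatch->Succs.empty() && "latch still has successors");
  connect(*OrigLatch, NewLatch);
  Region.Entry = &Header;
  Region.Exiting = &NewLatch;

  // Visit order is irrelevant: every reachable block just gets its parent set.
  std::vector<BlockSketch *> Worklist{&Header};
  std::unordered_set<BlockSketch *> Seen{&Header};
  while (!Worklist.empty()) {
    BlockSketch *B = Worklist.back();
    Worklist.pop_back();
    B->Parent = &Region;
    for (BlockSketch *S : B->Succs)
      if (Seen.insert(S).second)
        Worklist.push_back(S);
  }
}
```

Because the only effect per block is assigning the parent, any traversal that reaches each block once gives the same result, which is the reason given for not switching to a reverse post-order walk.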

VPBB->setParent(TopRegion);
}

VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
VPBasicBlock *MiddleVPBB = Plan.createVPBasicBlock("middle.block");
VPBlockUtils::insertBlockAfter(MiddleVPBB, TopRegion);

VPBasicBlock *ScalarPH = Plan.createVPBasicBlock("scalar.ph");
VPBlockUtils::connectBlocks(ScalarPH, Plan.getScalarHeader());
if (!RequiresScalarEpilogueCheck) {
VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
return;
}

// If needed, add a check in the middle block to see if we have completed
// all of the iterations in the first vector loop. Three cases:
// 1) If (N - N%VF) == N, then we *don't* need to run the remainder.
// Thus if tail is to be folded, we know we don't need to run the
// remainder and we can set the condition to true.
// 2) If we require a scalar epilogue, there is no conditional branch as
// we unconditionally branch to the scalar preheader. Do nothing.
// 3) Otherwise, construct a runtime check.
BasicBlock *IRExitBlock = TheLoop->getUniqueLatchExitBlock();
auto *VPExitBlock = Plan.getExitBlock(IRExitBlock);
// The connection order corresponds to the operands of the conditional branch.
VPBlockUtils::insertBlockAfter(VPExitBlock, MiddleVPBB);
VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);

auto *ScalarLatchTerm = TheLoop->getLoopLatch()->getTerminator();
// Here we use the same DebugLoc as the scalar loop latch terminator instead
// of the corresponding compare because they may have ended up with
// different line numbers and we want to avoid awkward line stepping while
// debugging. Eg. if the compare has got a line number inside the loop.
VPBuilder Builder(MiddleVPBB);
VPValue *Cmp =
TailFolded
? Plan.getOrAddLiveIn(ConstantInt::getTrue(
IntegerType::getInt1Ty(TripCount->getType()->getContext())))
: Builder.createICmp(CmpInst::ICMP_EQ, Plan.getTripCount(),
&Plan.getVectorTripCount(),
ScalarLatchTerm->getDebugLoc(), "cmp.n");
Builder.createNaryOp(VPInstruction::BranchOnCond, {Cmp},
ScalarLatchTerm->getDebugLoc());
}
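
The three cases listed in the middle-block comment can be restated as a small decision function plus the arithmetic behind `(N - N%VF) == N`. This is an illustrative sketch only: `MiddleBranch`, `classifyMiddleBranch`, `vectorTripCount`, and `remainderCanBeSkipped` are hypothetical helpers, and the step is taken to be `VF` alone, following the comment, without modeling any unroll factor.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical labels for how the middle block ends.
enum class MiddleBranch {
  AlwaysScalarPreheader, // no check: unconditional branch to scalar.ph
  AlwaysExit,            // condition is the constant true (tail folded)
  RuntimeCompare         // compare trip count against vector trip count
};

// Mirrors the control flow of the transform: without a scalar-epilogue check
// the middle block goes straight to the scalar preheader; with tail folding
// the condition is known to be true; otherwise a runtime compare is emitted.
inline MiddleBranch classifyMiddleBranch(bool RequiresScalarEpilogueCheck,
                                         bool TailFolded) {
  if (!RequiresScalarEpilogueCheck)
    return MiddleBranch::AlwaysScalarPreheader;
  return TailFolded ? MiddleBranch::AlwaysExit : MiddleBranch::RuntimeCompare;
}

// Iterations covered by the vector loop: N rounded down to a multiple of VF.
inline std::uint64_t vectorTripCount(std::uint64_t N, std::uint64_t VF) {
  assert(VF != 0 && "vectorization factor must be non-zero");
  return N - N % VF;
}

// The runtime condition from case 1: the remainder loop can be skipped
// exactly when the vector loop already covered all N iterations.
inline bool remainderCanBeSkipped(std::uint64_t N, std::uint64_t VF) {
  return vectorTripCount(N, VF) == N;
}
```

For example, with `N = 10` and `VF = 4` the vector loop covers 8 iterations and the remainder must run, whereas with `N = 16` the compare succeeds and control can go directly to the exit block.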

void VPlanTransforms::VPInstructionsToVPRecipes(
VPlanPtr &Plan,
function_ref<const InductionDescriptor *(PHINode *)>
15 changes: 15 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -52,6 +52,21 @@ struct VPlanTransforms {
verifyVPlanIsValid(Plan);
}

/// Introduce the top-level VPRegionBlock for the main loop in \p Plan. Coming
/// in this function, \p Plan's top-level loop is modeled using a plain CFG.
Collaborator

Suggested change
/// in this function, \p Plan's top-level loop is modeled using a plain CFG.
/// into this function, \p Plan's top-level loop is modeled using a plain CFG.

Contributor Author

Updated, thanks

/// This transforms replaces the plain CFG with a VPRegionBlock wrapping the
Collaborator

Suggested change
/// This transforms replaces the plain CFG with a VPRegionBlock wrapping the
/// This transform wraps the plain CFG of the top-level loop within a VPRegionBlock

Contributor Author

Updated, thanks

/// top-level loop and creates a VPValue expressions for the original trip
Collaborator

Suggested change
/// top-level loop and creates a VPValue expressions for the original trip
/// and creates a VPValue expressions for the original trip

Contributor Author

Done thanks!

/// count. It will also introduce a dedicated VPBasicBlock for the vector
/// pre-header as well a VPBasicBlock as exit block of the region
/// (middle.block). If a check is needed to guard executing the scalar
/// epilogue loop, it will be added to the middle block, together with
/// VPBasicBlocks for the scalar preheader and exit blocks. \p InductionTy is
/// the type of the canonical induction and used for related values, like the
/// trip count expression.
static void introduceTopLevelVectorLoopRegion(
VPlan &Plan, Type *InductionTy, PredicatedScalarEvolution &PSE,
bool RequiresScalarEpilogueCheck, bool TailFolded, Loop *TheLoop);

/// Replaces the VPInstructions in \p Plan with corresponding
/// widen recipes.
static void