Skip to content
86 changes: 65 additions & 21 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -8309,7 +8309,7 @@ VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands,
: GEPNoWrapFlags::none(),
I->getDebugLoc());
}
Builder.getInsertBlock()->appendRecipe(VectorPtr);
VectorPtr->insertBefore(&*Builder.getInsertPoint());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems awkward, would be better to have VPBuilder support

Suggested change
VectorPtr->insertBefore(&*Builder.getInsertPoint());
Builder.insert(VectorPtr);

as inspired by IRBuilderBase?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Insert at insert point rather than at the end of the insert block - is this change in/dependent of rest of patch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split off to f8fa931, thanks

Ptr = VectorPtr;
}
if (LoadInst *Load = dyn_cast<LoadInst>(I))
Expand Down Expand Up @@ -9221,6 +9221,7 @@ static void addExitUsersForFirstOrderRecurrences(
VPlanPtr
LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {

using namespace llvm::VPlanPatternMatch;
SmallPtrSet<const InterleaveGroup<Instruction> *, 1> InterleaveGroups;

// ---------------------------------------------------------------------------
Expand All @@ -9244,6 +9245,10 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
PSE, RequiresScalarEpilogueCheck,
CM.foldTailByMasking(), OrigLoop);

// Build hierarchical CFG.
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
HCFGBuilder.buildHierarchicalCFG();
Comment on lines +9325 to +9327
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps VPlanHCFGBuilder::buildingHierarchicalCFG() which creates initial VPBB's inside loop region ( via PlainCFGBuilder::buildPlainCFG()) should now be combined with VPlan::createInitialVPlan() which creates initial VPBB's elsewhere?
Can be done as follow-up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, maybe best to merge as follow-up


// Don't use getDecisionAndClampRange here, because we don't know the UF
// so this function is better to be conservative, rather than to split
// it up into different VPlans.
Expand Down Expand Up @@ -9312,23 +9317,45 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
RecipeBuilder.collectScaledReductions(Range);

auto *MiddleVPBB = Plan->getMiddleBlock();
ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the above comment over here:

Suggested change
ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(
// Scan the body of the loop in a topological order to visit each basic block
// after having visited its predecessor basic blocks.
ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

Plan->getVectorLoopRegion()->getEntry());

VPBasicBlock::iterator MBIP = MiddleVPBB->getFirstNonPhi();
for (BasicBlock *BB : make_range(DFS.beginRPO(), DFS.endRPO())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also remove

  LoopBlocksDFS DFS(OrigLoop);
  DFS.perform(LI);

above which become dead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

// Relevant instructions from basic block BB will be grouped into VPRecipe
// ingredients and fill a new VPBasicBlock.
if (VPBB != HeaderVPBB)
VPBB->setName(BB->getName());
Builder.setInsertPoint(VPBB);
VPBlockBase *PrevVPBB = nullptr;
for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {
// Skip VPBBs not corresponding to any input IR basic blocks.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the nature of RPOT, if a VPBB w/o IRBB is encountered, can break instead of continue?

This should iterate over all blocks of Plan->getVectorLoopRegion(), starting with its entry and ending with its exiting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to break, thanks.

It already only visits the loops in the region. We always have a latch VPBB, which isn't associated with an IRBB, which is what this covered

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Skip VPBBs not corresponding to any input IR basic blocks.
// Handle VPBBs down to the latch.

Latch also conceptually has a corresponding IRBB, with underlying Instructions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks. Initially the latch in the IR is not the latch/exiting block in the VPlan; the exiting block is created as part of the skeleton.

if (!HCFGBuilder.getIRBBForVPB(VPBB))
continue;

if (VPBB == HeaderVPBB)
// Create mask based on the IR BB corresponding to VPBB.
// TODO: Predicate directly based on VPlan.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we set the insert point of Builder once, here, as done now, possibly to first non-Phi of VPBB?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately that doesn't play nice with remove the recipes.

if (VPBB == HeaderVPBB) {
Builder.setInsertPoint(VPBB, VPBB->getFirstNonPhi());
RecipeBuilder.createHeaderMask();
else if (NeedsMasks)
RecipeBuilder.createBlockInMask(BB);
} else if (NeedsMasks) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the header mask also be created only if NeedsMasks? (Seems independent)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently createHeaderMask itself checks if tail folding is enabled, and only then creates the mask

Builder.setInsertPoint(VPBB, VPBB->begin());
RecipeBuilder.createBlockInMask(HCFGBuilder.getIRBBForVPB(VPBB));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time to move edge masks and block masks to be cached according to VPBB's rather than IRBB's?
Rather than hacking VPlanHCFGBuilder and PlainCFGBuilder to record and retrieve IRBB for VPBB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is somewhat a chicken-and-egg problem. Without this patch, we cannot do that, because we only ever create the flattened CFG in VPlan. I might be missing another option, but I think once this patch lands as first step we can move the mask creation to be based on the VPBlocks. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this a try, but it doesn't work out with the flattening of the CFG here, because we might already removed edges earlier before querying a mask on that edge.

We could first compute all masks in a separate loop, but there also are quite a few places that need updating, include BasicBlock -> VPBasicBlock, various places that get the mask for the parent of an instruction, looking for BranchOnCond instead of BranchInst, so that would also increase the diff quite a bit unfortunately. It would probably be better to tackle that separately, also to make testing easier.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, another suggestion: getIRBBForVPB() is needed only here - for non-header (non-latch) (non-empty?) VPBB, whose recipes are all expected to have underlying Instructions, which know their parental BasicBlock? Can the latter be retrieved as in below rather than hacking HCFGBuilder to record and expose this VRBB-to-IRBB mapping?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfrotunately we don't create recipes for branches, so there may be empty blocks for which we would need to create masks (because there succesors may require them at the moment).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's go with HCFGBuilder.getIRBBForVPB(), for now ...
Following our roadmap, HCFGBuilder would be the VPlan-to-VPlan transformation producing "buildLoop" from "wrapInput", and tryToBuildVPlanWithRecipes would follow as a subsequent VPlan-to-VPlan transformation(s). Obtaining this VPBB-to-IRBB mapping from HCFGBuilder should be removed in one of two ways (or more):

  1. If the buildLoop VPlan is to maintain its correspondence with IR basic blocks then a new type of VPBB can be introduced - one providing getIRBasicBlock() similar to VPIRBasicBlock but having an execute() similar to VPBasicBlock - that generates new basic blocks, or deemed abstract/unreachable - to be materialized into a concreate VPBB prior to codegen.
  2. If, OTOH, getIRBBForVPB() is meant only as a temporary solution to support createBlockInMask() until the latter is upgraded to work directly on VPBB's, then a FIXME should be added.

(Recipes are created for internal conditional branches but not for unconditional branches (including early exits) - which imply a single successor. An empty block also implies a single predecessor due to lack of phi recipes. Figuring out the BasicBlock from the single predecessor and/or successor of an empty VPBB seems cumbersome; getIRBBForVPB() could record the mapping for empty VPBB's only; simplest to probably to record all VPBB's in loop, for now.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a FXIME to the declaration, thanks

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
RecipeBuilder.createBlockInMask(HCFGBuilder.getIRBBForVPB(VPBB));
auto *SingleDef = cast<VPSingleDefRecipe>(VPBB->begin());
Instruction *Instr = SingleDef->getUnderlyingInstr();
RecipeBuilder.createBlockInMask(Instr->getParent());

?

}

// Introduce each ingredient into VPlan.
// Convert input VPInstructions to widened recipes.
// TODO: Model and preserve debug intrinsics in VPlan.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO corresponds to traversing BB's insns w/o debug, which no longer appears here. Belongs in PlainCFGBuilder::createVPInstructionsForVPBB()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks!

for (Instruction &I : drop_end(BB->instructionsWithoutDebug(false))) {
Instruction *Instr = &I;
for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
auto *SingleDef = dyn_cast<VPSingleDefRecipe>(&R);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If dyn_cast need to also check below if SingleDef (is null).
Only SingleDef's are expected, so better replace dyn_cast with cast?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, should be cast, updated thanks!

if (!isa<VPWidenPHIRecipe>(&R) &&
(!isa<VPInstruction>(SingleDef) || !SingleDef->getUnderlyingValue()))
continue;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an assert - expecting only VPWidenPHIRecipes or VPInstructions with underlying values?

In any case, perhaps DeMorganizing would be more readable:

!(isa<VPWidenPHIRecipe>(SingleDef) ||
 (isa<VPInstruction>(SingleDef) && SingleDef->getUnderlyingValue()))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to skip only the recipes that explicitly don't need transformation + a comment and assert.


if (match(&R, m_BranchOnCond(m_VPValue())) ||
(isa<VPInstruction>(&R) &&
cast<VPInstruction>(&R)->getOpcode() == Instruction::Switch)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can match via some m_Switch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately at the moment the matchers require a fixed number of operands; need to think about how to match without a fixed number of ops.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then perhaps more consistent / simpler to do:

Suggested change
if (match(&R, m_BranchOnCond(m_VPValue())) ||
(isa<VPInstruction>(&R) &&
cast<VPInstruction>(&R)->getOpcode() == Instruction::Switch)) {
if (isa<VPInstruction>(&R) &&
(cast<VPInstruction>(&R)->getOpcode() == VPInstruction::BranchOnCond ||
(cast<VPInstruction>(&R)->getOpcode() == Instruction::Switch)) {

(once VPWidenPHIRecipe is turned into a VPInstruction this would be simpler)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

R.eraseFromParent();
break;
}

// TODO: Gradually replace uses of underlying instruction by analyses on
// VPlan.
Instruction *Instr = SingleDef->getUnderlyingInstr();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: define Instr earlier to be reused above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getUnderlyingInstr will assert if there's no underlying value, left here for now

Builder.setInsertPoint(SingleDef);
SmallVector<VPValue *, 4> Operands;
auto *Phi = dyn_cast<PHINode>(Instr);
if (Phi && Phi->getParent() == HeaderBB) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth commenting that the other operand of the header phi - the one across the back-edge, will be added later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks

Expand All @@ -9343,15 +9370,18 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
// in the exit block, a uniform store recipe will be created for the final
// invariant store of the reduction.
StoreInst *SI;
if ((SI = dyn_cast<StoreInst>(&I)) &&
if ((SI = dyn_cast<StoreInst>(Instr)) &&
Legal->isInvariantAddressOfReduction(SI->getPointerOperand())) {
// Only create recipe for the final invariant store of the reduction.
if (!Legal->isInvariantStoreOfReduction(SI))
if (!Legal->isInvariantStoreOfReduction(SI)) {
R.eraseFromParent();
continue;
}
auto *Recipe = new VPReplicateRecipe(
SI, RecipeBuilder.mapToVPValues(Instr->operands()),
true /* IsUniform */);
Recipe->insertBefore(*MiddleVPBB, MBIP);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps simpler to do:

Suggested change
if (!Legal->isInvariantStoreOfReduction(SI)) {
R.eraseFromParent();
continue;
}
auto *Recipe = new VPReplicateRecipe(
SI, RecipeBuilder.mapToVPValues(Instr->operands()),
true /* IsUniform */);
Recipe->insertBefore(*MiddleVPBB, MBIP);
if (Legal->isInvariantStoreOfReduction(SI)) {
auto *Recipe = new VPReplicateRecipe(
SI, RecipeBuilder.mapToVPValues(Instr->operands()),
true /* IsUniform */);
Recipe->insertBefore(*MiddleVPBB, MBIP);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

R.eraseFromParent();
continue;
}

Expand All @@ -9370,16 +9400,30 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
// after them)
// * Optimizing truncates to VPWidenIntOrFpInductionRecipe.

assert((HeaderVPBB->getFirstNonPhi() == VPBB->end() ||
CM.foldTailByMasking() || isa<TruncInst>(Instr)) &&
"unexpected recipe needs moving");
Comment on lines -9450 to -9452
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert is associated with the preceding explanation. Both become obsolete?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to only move truncated inductions, for which it still is needed

Recipe->insertBefore(*HeaderVPBB, HeaderVPBB->getFirstNonPhi());
} else
VPBB->appendRecipe(Recipe);
Recipe->insertBefore(&R);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Builder to place Recipe in its insert point?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

if (Recipe->getNumDefinedValues() == 1)
SingleDef->replaceAllUsesWith(Recipe->getVPSingleValue());
else
assert(Recipe->getNumDefinedValues() == 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(Recipe->getNumDefinedValues() == 0);
assert(Recipe->getNumDefinedValues() == 0 && "Unexpected multidef recipe");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added thanks

R.eraseFromParent();
}

VPBlockUtils::insertBlockAfter(Plan->createVPBasicBlock(""), VPBB);
VPBB = cast<VPBasicBlock>(VPBB->getSingleSuccessor());
// Flatten the CFG in the loop. Masks for blocks have already been generated
// and added to recipes as needed. To do so, first disconnect VPBB from its
// predecessors and successors, except the exiting block. Then connect VPBB
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's special about the exiting block, want to retain edges that leave the loop to exit blocks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing special, just needed a connect when breaking on reaching the exiting block. Remove logic here, thanks

// to the previously visited VPBB.
for (auto *Succ : to_vector(VPBB->getSuccessors())) {
if (Succ == Plan->getVectorLoopRegion()->getExiting())
continue;
VPBlockUtils::disconnectBlocks(VPBB, Succ);
}
for (auto *Pred : to_vector(VPBB->getPredecessors()))
VPBlockUtils::disconnectBlocks(Pred, VPBB);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this doing anything, given that we're already visited Pred earlier in RPOT and disconnected it from all its successors? Can assert the absence of any predecessor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed in the latest version, removed thanks

if (PrevVPBB)
VPBlockUtils::connectBlocks(PrevVPBB, VPBB);
PrevVPBB = VPBB;
}

// After here, VPBB should not be used.
Expand Down
8 changes: 5 additions & 3 deletions llvm/lib/Transforms/Vectorize/VPlan.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -587,9 +587,11 @@ static bool hasConditionalTerminator(const VPBasicBlock *VPBB) {
}

const VPRecipeBase *R = &VPBB->back();
bool IsCondBranch = isa<VPBranchOnMaskRecipe>(R) ||
match(R, m_BranchOnCond(m_VPValue())) ||
match(R, m_BranchOnCount(m_VPValue(), m_VPValue()));
bool IsCondBranch =
isa<VPBranchOnMaskRecipe>(R) || match(R, m_BranchOnCond(m_VPValue())) ||
match(R, m_BranchOnCount(m_VPValue(), m_VPValue())) ||
(isa<VPInstruction>(R) &&
cast<VPInstruction>(R)->getOpcode() == Instruction::Switch);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool IsCondBranch =
isa<VPBranchOnMaskRecipe>(R) || match(R, m_BranchOnCond(m_VPValue())) ||
match(R, m_BranchOnCount(m_VPValue(), m_VPValue())) ||
(isa<VPInstruction>(R) &&
cast<VPInstruction>(R)->getOpcode() == Instruction::Switch);
bool IsCondBranch = isa<VPBranchOnMaskRecipe>(R) ||
match(R, m_BranchOnCond(m_VPValue())) ||
match(R, m_BranchOnCount(m_VPValue(), m_VPValue()));
bool IsSwitch = isa<VPInstruction>(R) &&
cast<VPInstruction>(R)->getOpcode() == Instruction::Switch);

plus asserting below that, if, VPBB has at least 2 successors, then

    assert((IsCondBranch || IsSwitch) && "block with multiple successors not terminated by "
                           "conditional branch nor switch recipe");

Switches are allowed to have a single successor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks!

Yes I think switches can have a single successor (just the default destination)

(void)IsCondBranch;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool IsCondBranch =
isa<VPBranchOnMaskRecipe>(R) || match(R, m_BranchOnCond(m_VPValue())) ||
match(R, m_BranchOnCount(m_VPValue(), m_VPValue())) ||
(isa<VPInstruction>(R) &&
cast<VPInstruction>(R)->getOpcode() == Instruction::Switch);
(void)IsCondBranch;
bool IsCondBranch = isa<VPBranchOnMaskRecipe>(R) ||
match(R, m_BranchOnCond(m_VPValue())) ||
match(R, m_BranchOnCount(m_VPValue(), m_VPValue()));
bool IsSwitch = isa<VPInstruction>(R) &&
cast<VPInstruction>(R)->getOpcode() == Instruction::Switch);
(void)IsCondBranch;
(void)IsSwitch;

plus asserting below that, if, VPBB has at least 2 successors, then

    assert((IsCondBranch || IsSwitch) && "block with multiple successors not terminated by "
                           "conditional branch nor switch recipe");

Switches can have a single successor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done thanks


if (VPBB->getNumSuccessors() >= 2 ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can check >2 case separately expecting IsSwitch only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

Expand Down
32 changes: 26 additions & 6 deletions llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,7 @@ class PlainCFGBuilder {
: TheLoop(Lp), LI(LI), Plan(P) {}

/// Build plain CFG for TheLoop and connects it to Plan's entry.
void buildPlainCFG();
void buildPlainCFG(DenseMap<VPBlockBase *, BasicBlock *> &VPB2IRBB);
};
} // anonymous namespace

Expand Down Expand Up @@ -238,9 +238,9 @@ bool PlainCFGBuilder::isExternalDef(Value *Val) {
return false;

// Check whether Instruction definition is in the loop exit.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Check whether Instruction definition is in the loop exit.
// Check whether Instruction definition is in a loop exit.

?

Is this notion of "External Def" still valid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. I think so as

BasicBlock *Exit = TheLoop->getUniqueExitBlock();
assert(Exit && "Expected loop with single exit.");
if (InstParent == Exit) {
SmallVector<BasicBlock *> ExitBlocks;
TheLoop->getExitBlocks(ExitBlocks);
if (is_contained(ExitBlocks, InstParent)) {
// Instruction definition is in outermost loop exit.
return false;
}
Comment on lines +241 to 246
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer expecting a loop with single exit?
Teaching PlainCFGBuilder about multiple exits - does this imply that "native" can now vectorize such outerloops, and worth testing?
OTOH, PlainCFGBuilder::buildPlainCFG() does handle multiple/non-unique exits(?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed for the inner loop multi-exit support. The native path still won't vectorize early exits, due to checks in legality I think. I can add a test case, thanks

Expand Down Expand Up @@ -308,6 +308,14 @@ void PlainCFGBuilder::createVPInstructionsForVPBB(VPBasicBlock *VPBB,
continue;
}

if (auto *SI = dyn_cast<SwitchInst>(Inst)) {
SmallVector<VPValue *> Ops = {getOrCreateVPOperand(SI->getCondition())};
for (auto Case : SI->cases())
Ops.push_back(getOrCreateVPOperand(Case.getCaseValue()));
VPIRBuilder.createNaryOp(Instruction::Switch, Ops, Inst);
continue;
}

Comment on lines +312 to +319
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Teaching PlainCFGBuilder about switch statements - does this imply that "native" can now vectorize outerloops with switches, and worth testing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The native path still won't vectorize switches, due to checks in legality I think. I can add a test case, thanks

VPValue *NewVPV;
if (auto *Phi = dyn_cast<PHINode>(Inst)) {
// Phi node's operands may have not been visited at this point. We create
Expand All @@ -334,7 +342,8 @@ void PlainCFGBuilder::createVPInstructionsForVPBB(VPBasicBlock *VPBB,
}

// Main interface to build the plain CFG.
void PlainCFGBuilder::buildPlainCFG() {
void PlainCFGBuilder::buildPlainCFG(
DenseMap<VPBlockBase *, BasicBlock *> &VPB2IRBB) {
// 0. Reuse the top-level region, vector-preheader and exit VPBBs from the
// skeleton. These were created directly rather than via getOrCreateVPBB(),
// revisit them now to update BB2VPBB. Note that header/entry and
Expand Down Expand Up @@ -423,6 +432,14 @@ void PlainCFGBuilder::buildPlainCFG() {
// Set VPBB successors. We create empty VPBBs for successors if they don't
// exist already. Recipes will be created when the successor is visited
// during the RPO traversal.
if (auto *SI = dyn_cast<SwitchInst>(BB->getTerminator())) {
SmallVector<VPBlockBase *> Succs = {
getOrCreateVPBB(SI->getDefaultDest())};
for (auto Case : SI->cases())
Succs.push_back(getOrCreateVPBB(Case.getCaseSuccessor()));
VPBB->setSuccessors(Succs);
continue;
}
auto *BI = cast<BranchInst>(BB->getTerminator());
unsigned NumSuccs = succ_size(BB);
if (NumSuccs == 1) {
Expand Down Expand Up @@ -476,11 +493,14 @@ void PlainCFGBuilder::buildPlainCFG() {
// have a VPlan couterpart. Fix VPlan phi nodes by adding their corresponding
// VPlan operands.
fixPhiNodes();

for (const auto &[IRBB, VPB] : BB2VPBB)
VPB2IRBB[VPB] = IRBB;
}

void VPlanHCFGBuilder::buildPlainCFG() {
PlainCFGBuilder PCFGBuilder(TheLoop, LI, Plan);
PCFGBuilder.buildPlainCFG();
PCFGBuilder.buildPlainCFG(VPB2IRBB);
}

// Public interface to build a H-CFG.
Expand Down
10 changes: 10 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,10 @@ class VPlanHCFGBuilder {
// are introduced.
VPDominatorTree VPDomTree;

/// Map of create VP blocks to their input IR basic blocks, if they have been
/// created for a input IR basic block.
DenseMap<VPBlockBase *, BasicBlock *> VPB2IRBB;

/// Build plain CFG for TheLoop and connects it to Plan's entry.
void buildPlainCFG();

Expand All @@ -62,6 +66,12 @@ class VPlanHCFGBuilder {

/// Build H-CFG for TheLoop and update Plan accordingly.
void buildHierarchicalCFG();

/// Return the input IR BasicBlock corresponding to \p VPB. Returns nullptr if
/// there is no such corresponding block.
BasicBlock *getIRBBForVPB(const VPBlockBase *VPB) const {
return VPB2IRBB.lookup(VPB);
}
};
} // namespace llvm

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur
; CHECK-NEXT: LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !0
; CHECK-NEXT: LV: Using user VF vscale x 4.
; CHECK-NEXT: LV: Loop does not require scalar epilogue
; CHECK-NEXT: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
; CHECK: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why/Is this line no longer NEXT? Same below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now also include printing the VPlan after HCFG construction, should add a test to check that or drop the extra output?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, whatever you prefer.

; CHECK-NEXT: LV: Scalarizing: %idxprom = zext i32 %i.0 to i64
; CHECK-NEXT: LV: Scalarizing: %arrayidx = getelementptr inbounds i32, ptr %B, i64 %idxprom
; CHECK-NEXT: LV: Scalarizing: %arrayidx3 = getelementptr inbounds i32, ptr %A, i64 %idxprom
Expand Down Expand Up @@ -295,7 +295,7 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur
; CHECK-NEXT: LV: Found an estimated cost of 0 for VF vscale x 4 For instruction: br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !0
; CHECK-NEXT: LV: Using user VF vscale x 4.
; CHECK-NEXT: LV: Loop does not require scalar epilogue
; CHECK-NEXT: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
; CHECK: LV: Scalarizing: %i.0 = add nsw i32 %i.0.in8, -1
; CHECK-NEXT: LV: Scalarizing: %idxprom = zext i32 %i.0 to i64
; CHECK-NEXT: LV: Scalarizing: %arrayidx = getelementptr inbounds float, ptr %B, i64 %idxprom
; CHECK-NEXT: LV: Scalarizing: %arrayidx3 = getelementptr inbounds float, ptr %A, i64 %idxprom
Expand Down