[VPlan] Add initial CFG simplification, removing BranchOnCond true. #106748

Merged
merged 12 commits into from
Apr 4, 2025
15 changes: 9 additions & 6 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -2502,12 +2502,13 @@ void InnerLoopVectorizer::introduceCheckBlockInVPlan(BasicBlock *CheckIRBB) {
PreVectorPH->swapSuccessors();

// We just connected a new block to the scalar preheader. Update all
-// ResumePhis by adding an incoming value for it.
+// ResumePhis by adding an incoming value for it, replacing the last value.
Collaborator

Should this be

Suggested change
// ResumePhis by adding an incoming value for it, replacing the last value.
// ResumePhis by adding an incoming value for it, replicating the last value.

?

Contributor Author

Done thanks

for (VPRecipeBase &R : *cast<VPBasicBlock>(ScalarPH)) {
auto *ResumePhi = dyn_cast<VPInstruction>(&R);
if (!ResumePhi || ResumePhi->getOpcode() != VPInstruction::ResumePhi)
continue;
-ResumePhi->addOperand(ResumePhi->getOperand(1));
+ResumePhi->addOperand(
+    ResumePhi->getOperand(ResumePhi->getNumOperands() - 1));
}
}
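
To make the difference between the two `addOperand` calls above concrete, here is a minimal standalone sketch that models a ResumePhi operand list as a plain `std::vector<int>`. The `Operands` alias and both helper names are hypothetical and not part of the VPlan API. One plausible motivation, not stated in the thread, is that copying operand 1 requires at least two operands, while replicating the last operand works for any non-empty list.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a ResumePhi operand list; not the VPlan API.
using Operands = std::vector<int>;

// Old behavior: always duplicate the operand at index 1.
inline void addIncomingFromIndexOne(Operands &Ops) {
  assert(Ops.size() >= 2 && "need at least two incoming values");
  int Value = Ops[1];
  Ops.push_back(Value);
}

// New behavior: duplicate whichever operand is currently last.
inline void addIncomingReplicatingLast(Operands &Ops) {
  assert(!Ops.empty() && "need at least one incoming value");
  int Value = Ops.back();
  Ops.push_back(Value);
}
```

With exactly two operands both helpers append the same value; they diverge once the list has a different length.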

@@ -2676,7 +2677,6 @@ void InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) {
LoopScalarPreHeader =
SplitBlock(LoopVectorPreHeader, LoopVectorPreHeader->getTerminator(), DT,
LI, nullptr, Twine(Prefix) + "scalar.ph");
replaceVPBBWithIRVPBB(Plan.getScalarPreheader(), LoopScalarPreHeader);
}

/// Return the expanded step for \p ID using \p ExpandedSCEVs to look up SCEV
@@ -2809,6 +2809,7 @@ BasicBlock *InnerLoopVectorizer::createVectorizedLoopSkeleton(
// faster.
emitMemRuntimeChecks(LoopScalarPreHeader);

replaceVPBBWithIRVPBB(Plan.getScalarPreheader(), LoopScalarPreHeader);
Collaborator

Is this move dependent on the rest of the patch, and if so how? It replaces the scalar preheader VPBB with the IRBB here instead of earlier, when calling createVectorLoopSkeleton() above.
("Here" being createVectorizedLoopSkeleton() and its overrides; it would be better to give these functions more distinct names, independently of this patch.)

Contributor Author

At the original points, the scalar PH may be unreachable, which means that at the moment we cannot use getPlan(). Calling it later ensures the scalar PH will be connected, for now.

This could be improved independently, either by storing the parent plan in all VPBBs (not just the entries) or by passing Plan to replaceVPBBWithIRVPBB.

Collaborator

The original position seems (more) reasonable, being right after LoopScalarPreHeader is built. Worth leaving behind some comment why this replacement is currently done later, "for now"?

VPlan is assumed to always be connected, with all its VPBB's reachable from its entry. Can the original points maintain this connectivity, w/o storing the parental plan in all VPBBs nor passing Plan to replaceVPBBWithIRVPBB()?

BTW, would be good to clarify that VPlan::getPlanEntry() avoids going into an infinite loop, if invoked on flat region-less cyclic CFG, based on visiting operands in order, and relying on the operand associated with the preheader block to appear (first) before that of the latch when visiting header phis.

Contributor Author

Thanks, I added a note at the original place.

We could store the plan in all blocks w/o a parent region, there is already a field in all blocks to do so?

> BTW, would be good to clarify that VPlan::getPlanEntry() avoids going into an infinite loop, if invoked on flat region-less cyclic CFG, based on visiting operands in order, and relying on the operand associated with the preheader block to appear (first) before that of the latch when visiting header phis.

getPlanEntry at the moment uses a SmallSetVector for its worklist, which naturally avoids infinite cycles.
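
As an illustration of why a set-backed worklist terminates on a cyclic CFG, here is a minimal sketch using standard containers in place of LLVM's `SmallSetVector`. The `Block` struct and `findEntry` are hypothetical and are not the actual `getPlanEntry` implementation; the sketch only shows the general technique of queuing each block at most once.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical toy block: only predecessor links, no recipes.
struct Block {
  std::vector<Block *> Preds;
};

// Walk predecessor edges from Start until a block without predecessors is
// found. The Seen set ensures each block is queued at most once, so the walk
// terminates even if the graph contains cycles.
inline Block *findEntry(Block *Start) {
  assert(Start && "expected a valid start block");
  std::vector<Block *> Worklist{Start};
  std::unordered_set<Block *> Seen{Start};
  for (std::size_t I = 0; I != Worklist.size(); ++I) {
    Block *Current = Worklist[I];
    if (Current->Preds.empty())
      return Current;
    for (Block *Pred : Current->Preds)
      if (Seen.insert(Pred).second)
        Worklist.push_back(Pred);
  }
  return nullptr; // Every reachable block has a predecessor.
}
```

Because the header's latch predecessor is already in the set when it is encountered again, the back edge is simply skipped, regardless of the order in which predecessors are listed.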

Collaborator

> We could store the plan in all blocks w/o a parent region, there is already a field in all blocks to do so?

Agreed, it seems better to store the plan for orphan blocks in their existing field rather than null, at least for unreachable blocks, although it would be best to maintain connectivity rather than have a block point to a plan which in turn cannot reach it?

Contributor Author

Yep, we will be able to do so once we move to model the full skeleton independently in VPlan and not rely on legacy's skeleton creation.

return LoopVectorPreHeader;
}

@@ -7567,10 +7568,10 @@ VectorizationFactor LoopVectorizationPlanner::computeBestVF() {
// Set PlanForEarlyExitLoop to true if the BestPlan has been built from a
// loop with an uncountable early exit. The legacy cost model doesn't
// properly model costs for such loops.
+auto ExitBlocks = BestPlan.getExitBlocks();
 bool PlanForEarlyExitLoop =
-    BestPlan.getVectorLoopRegion() &&
-    BestPlan.getVectorLoopRegion()->getSingleSuccessor() !=
-        BestPlan.getMiddleBlock();
+    ExitBlocks.size() > 1 ||
+    (ExitBlocks.size() == 1 && ExitBlocks[0]->getNumPredecessors() > 1);
Collaborator

Suggested change
(ExitBlocks.size() == 1 && ExitBlocks[0]->getNumPredecessors() > 1);
ExitBlocks[0]->getNumPredecessors() > 1;

asserting that ExitBlocks.size() >= 1?

Is this change independent?

Collaborator

?

Contributor Author

This change should be gone in the latest version

Contributor

Yeah I agree with @ayalz, this does look like an independent change. Is it worth a separate NFC patch?

Contributor Author

I removed the changes, they are not needed in the latest version. I can put them up separately, as this might be slightly simpler than having to retrieve middle block and vector loop region.
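
For reference, the two forms of the exit-block predicate discussed in this thread can be compared in a small standalone sketch. `ExitBlock` and both function names are hypothetical stand-ins, not VPlan types; the sketch only shows that the reviewer's shorter form agrees with the original one whenever at least one exit block exists.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an exit block: only its predecessor count matters.
struct ExitBlock {
  unsigned NumPredecessors;
};

// Form used in the patch under review.
inline bool isEarlyExitPlan(const std::vector<ExitBlock> &ExitBlocks) {
  return ExitBlocks.size() > 1 ||
         (ExitBlocks.size() == 1 && ExitBlocks[0].NumPredecessors > 1);
}

// Form suggested in review: assert non-empty, then drop the size() == 1 test.
inline bool isEarlyExitPlanAsserting(const std::vector<ExitBlock> &ExitBlocks) {
  assert(!ExitBlocks.empty() && "expected at least one exit block");
  return ExitBlocks.size() > 1 || ExitBlocks[0].NumPredecessors > 1;
}
```

The only input on which the two differ is an empty list, which the asserting form rejects instead of returning false.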

assert((BestFactor.Width == LegacyVF.Width || PlanForEarlyExitLoop ||
planContainsAdditionalSimplifications(getPlanFor(BestFactor.Width),
CostCtx, OrigLoop) ||
@@ -7883,6 +7884,7 @@ BasicBlock *EpilogueVectorizerMainLoop::createEpilogueVectorizedLoopSkeleton(
// Generate the induction variable.
EPI.VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader);

replaceVPBBWithIRVPBB(Plan.getScalarPreheader(), LoopScalarPreHeader);
return LoopVectorPreHeader;
}

@@ -8037,6 +8039,7 @@ EpilogueVectorizerEpilogueLoop::createEpilogueVectorizedLoopSkeleton(
// resume value for the induction variable comes from the trip count of the
// main vector loop, passed as the second argument.
createInductionAdditionalBypassValues(ExpandedSCEVs, EPI.VectorTripCount);
replaceVPBBWithIRVPBB(Plan.getScalarPreheader(), LoopScalarPreHeader);
return LoopVectorPreHeader;
}

16 changes: 11 additions & 5 deletions llvm/lib/Transforms/Vectorize/VPlan.h
@@ -3529,12 +3529,18 @@ class VPlan {

/// Returns the 'middle' block of the plan, that is the block that selects
/// whether to execute the scalar tail loop or the exit block from the loop
-/// latch.
-const VPBasicBlock *getMiddleBlock() const {
-  return cast<VPBasicBlock>(getScalarPreheader()->getPredecessors().front());
-}
+/// latch. If the scalar tail loop or exit block are known to always execute,
+/// the middle block may branch directly to the block.
Collaborator

Suggested change
/// latch. If the scalar tail loop or exit block are known to always execute,
/// the middle block may branch directly to the block.
/// latch. If the middle block is known to always proceed to one of these two blocks, it may branch to it unconditionally.

Also explain about the early-exit case, where middle.block is the 2nd successor of middle.split block, as depicted in https://llvm.org/docs/Vectorizers.html#early-exit-vectorization, which corresponds to the scalar preheader being absent from RegionSucc's successors? The middle block in this case conceptually has three successors: scalar preheader, latch.exit, early.exit with the first two postponed to be a successor's successors.

Note that if the middle block branches unconditionally to exit block (or scalar preheader block), the two blocks may subsequently be merged, causing RegionSucc to have no successors (or be the scalar preheader itself).

Contributor Author

Updated the comment.

> Note that if the middle block branches unconditionally to exit block (or scalar preheader block), the two blocks may subsequently be merged, causing RegionSucc to have no successors (or be the scalar preheader itself).

Yep, for now, we don't merge VPIRBBs into other blocks.

VPBasicBlock *getMiddleBlock() {
-return cast<VPBasicBlock>(getScalarPreheader()->getPredecessors().front());
+if (!getScalarPreheader()->getPredecessors().empty())
Collaborator

Scalar preheader could also/only be reached from runtime guards that bypass the vector region and its middle block? Would it be better to retrieve the middle block from its predecessor vector loop region, than from its successors scalar preheader and/or exit blocks? Is the term "middle block" well defined, when early exits are involved which currently splits it into two blocks, rather than having a single block with three successors?

Contributor Author

Done, thanks.

The middle block selects between the scalar preheader and the exit block from the latch, which is the same with and w/o early exits.

return cast<VPBasicBlock>(
getScalarPreheader()->getPredecessors().front());
if (getExitBlocks().size() == 1)
return cast<VPBasicBlock>(getExitBlocks()[0]->getPredecessors().front());
return nullptr;
}
const VPBasicBlock *getMiddleBlock() const {
Collaborator

Suggested change
const VPBasicBlock *getMiddleBlock() const {
const VPBasicBlock *getMiddleBlock() const {

Contributor Author

Done thanks

return const_cast<VPlan *>(this)->getMiddleBlock();
}
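
The lookup order in the updated `getMiddleBlock()` above can be sketched outside of LLVM as follows. `Block` and `findMiddleBlock` are hypothetical toy names, and the assertion on the exit block's predecessors is an added assumption that the diff itself does not make.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy block with predecessor links only.
struct Block {
  std::vector<Block *> Preds;
};

// Prefer the scalar preheader's first predecessor; if the preheader has been
// disconnected, fall back to the first predecessor of a unique exit block.
inline Block *findMiddleBlock(const Block &ScalarPH,
                              const std::vector<Block *> &ExitBlocks) {
  if (!ScalarPH.Preds.empty())
    return ScalarPH.Preds.front();
  if (ExitBlocks.size() == 1) {
    // Assumption for this sketch: a unique exit block is still reachable.
    assert(!ExitBlocks[0]->Preds.empty() && "exit block has no predecessor");
    return ExitBlocks[0]->Preds.front();
  }
  return nullptr;
}
```

As the review notes, the first predecessor of the scalar preheader is only the middle block if runtime-check blocks are listed after it, so the sketch inherits that ordering assumption.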

/// Return the VPBasicBlock for the preheader of the scalar loop.
44 changes: 44 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1467,6 +1467,49 @@ void VPlanTransforms::truncateToMinimalBitwidths(
"some entries in MinBWs haven't been processed");
}

/// Remove BranchOnCond recipes with true conditions together with removing
/// dead edges to their successors.
Collaborator

The successors across the removed edges are assumed to have ResumePhi recipes, which are fixed.

Contributor Author

Yep

Collaborator

Worth adding a comment that said ResumePhis are expected to already be fixed?

Contributor Author

I am not sure, they are expected to be valid coming in and the transform will keep them valid.

static void simplifyCFG(VPlan &Plan) {
Collaborator

Perhaps a more modest name would be more accurate, at this stage?

Contributor Author

Updated to simplifyBranchOnCondTrue, thanks

using namespace llvm::VPlanPatternMatch;
for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
vp_depth_first_deep(Plan.getEntry()))) {
Collaborator

Can the traversal be shallow, at least while its candidates reside outside of regions?

Contributor Author

Yep, done, thanks

if (VPBB->getNumSuccessors() != 2 ||
!match(&VPBB->back(), m_BranchOnCond(m_True())))
continue;

VPBasicBlock *RemovedSucc = cast<VPBasicBlock>(VPBB->getSuccessors()[1]);
const auto &Preds = RemovedSucc->getPredecessors();
unsigned DeadIdx = std::distance(Preds.begin(), find(Preds, VPBB));
Collaborator

Assert that VPBB feeds a single value to RemovedSucc?

Contributor Author

Done thanks


// Remove values coming from VPBB from phi-like recipes in RemovedSucc.
Collaborator

Suggested change
// Remove values coming from VPBB from phi-like recipes in RemovedSucc.
// Values coming from VPBB into ResumePhi recipes of RemovedSucc are removed from these recipes.

clarifying double from.

Contributor Author

Updated, thanks!

for (VPRecipeBase &R : make_early_inc_range(*RemovedSucc)) {
assert((!isa<VPIRInstruction>(&R) ||
!isa<PHINode>(cast<VPIRInstruction>(&R)->getInstruction())) &&
!isa<VPHeaderPHIRecipe>(&R) &&
"Cannot update VPIRInstructions wrapping phis or header phis yet");
auto *VPI = dyn_cast<VPInstruction>(&R);
if (!VPI || VPI->getOpcode() != VPInstruction::ResumePhi)
break;
Collaborator

Better break if !isPhi(), as we're traversing all and only phi recipes which appear first in the block, and then assert that the phi is a ResumePhi recipe?

Contributor Author

Will do separately, as isPhi needs updating to consider ResumePhi.

VPBuilder B(VPI);
SmallVector<VPValue *> NewOps;
Collaborator

Suggested change
SmallVector<VPValue *> NewOps;
SmallVector<VPValue *> NewOperands;

Contributor Author

Done thanks!

// Create new operand list, with the dead incoming value filtered out.
Collaborator

Would erase()'ing the dying operand from VPI->operands be easier, perhaps with some removeOperand() API, than replacing the VPInstruction with a new one?

Contributor Author

We could, the question is how to best limit this to some recipes, as I think it only makes sense for phi-like recipes (or maybe just ResumePhi). Could do as follow-up?

Collaborator

Sure, can be limited to ResumePhi, until needed elsewhere.
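
The two options weighed in this thread, rebuilding a filtered operand list versus erasing the dead operand in place, can be compared in a small standalone sketch. Operands are modeled as `int`s, and both `withoutOperand` and `removeOperand` are hypothetical helpers; in particular, `removeOperand` here is not an existing VPlan API, only an illustration of the follow-up idea.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rebuild: copy every operand except the one at DeadIdx into a new list,
// mirroring the loop in the patch.
inline std::vector<int> withoutOperand(const std::vector<int> &Ops,
                                       std::size_t DeadIdx) {
  assert(DeadIdx < Ops.size() && "dead index out of range");
  std::vector<int> NewOperands;
  NewOperands.reserve(Ops.size() - 1);
  for (std::size_t Idx = 0; Idx != Ops.size(); ++Idx)
    if (Idx != DeadIdx)
      NewOperands.push_back(Ops[Idx]);
  return NewOperands;
}

// In place: erase the operand at DeadIdx from the existing list.
inline void removeOperand(std::vector<int> &Ops, std::size_t DeadIdx) {
  assert(DeadIdx < Ops.size() && "dead index out of range");
  Ops.erase(Ops.begin() + static_cast<std::ptrdiff_t>(DeadIdx));
}
```

Both produce the same operand sequence; the in-place variant avoids creating a replacement recipe, which is the simplification the reviewer is pointing at.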

for (const auto &[Idx, Op] : enumerate(VPI->operands())) {
if (Idx == DeadIdx)
continue;
NewOps.push_back(Op);
}
VPI->replaceAllUsesWith(B.createNaryOp(VPInstruction::ResumePhi, NewOps,
VPI->getDebugLoc(),
VPI->getName()));
VPI->eraseFromParent();
}
// Disconnect blocks and remove the terminator. RemovedSucc will be deleted
// automatically on VPlan destruction.
Collaborator

Suggested change
// automatically on VPlan destruction.
// automatically on VPlan destruction if it becomes unreachable.

If RemovedSucc becomes unreachable, i.e., Preds consists of VPBB only, do the resumePhis need to be cleared, or better bail out early?

Contributor Author

I think it would be good to clear them regardless, as the left-over users may pessimize transforms.

VPBlockUtils::disconnectBlocks(VPBB, RemovedSucc);
VPBB->back().eraseFromParent();
}
}
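
The overall shape of the transform above, finding the dead predecessor index, dropping the matching incoming value and disconnecting the edge, can be sketched on a toy CFG. Everything here (`Block`, `PhiOperands`, `BranchesOnTrue`, `removeDeadEdge`) is hypothetical and much simpler than VPlan: each block carries a single phi-like operand list whose entries line up with its predecessors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy CFG node; a stand-in for VPBasicBlock, not the real class.
struct Block {
  std::vector<Block *> Preds;
  std::vector<Block *> Succs;
  // One phi-like operand list; operand I is the value coming from Preds[I].
  std::vector<int> PhiOperands;
  // Models a terminating BranchOnCond with a constant-true condition.
  bool BranchesOnTrue = false;
};

// If BB ends in a branch on `true`, drop the edge to its second successor and
// the matching incoming phi value. Returns true if the CFG changed.
inline bool removeDeadEdge(Block &BB) {
  if (BB.Succs.size() != 2 || !BB.BranchesOnTrue)
    return false;
  Block *Removed = BB.Succs[1];
  auto It = std::find(Removed->Preds.begin(), Removed->Preds.end(), &BB);
  assert(It != Removed->Preds.end() && "successor must list BB as predecessor");
  std::ptrdiff_t DeadIdx = It - Removed->Preds.begin();
  // Phi operands are assumed to stay aligned with the predecessor list.
  assert(Removed->PhiOperands.size() == Removed->Preds.size());
  Removed->PhiOperands.erase(Removed->PhiOperands.begin() + DeadIdx);
  Removed->Preds.erase(It);
  BB.Succs.pop_back();
  BB.BranchesOnTrue = false; // The conditional terminator is gone.
  return true;
}
```

Applied to a middle block that branches on `true` to the exit, this turns a scalar-preheader phi with incoming values from the middle block and the entry into one with only the entry value, which is the pattern visible in the updated test checks.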

void VPlanTransforms::optimize(VPlan &Plan) {
runPass(removeRedundantCanonicalIVs, Plan);
runPass(removeRedundantInductionCasts, Plan);
@@ -1476,6 +1519,7 @@ void VPlanTransforms::optimize(VPlan &Plan) {
runPass(legalizeAndOptimizeInductions, Plan);
runPass(removeRedundantExpandSCEVRecipes, Plan);
runPass(simplifyRecipes, Plan, *Plan.getCanonicalIV()->getScalarType());
runPass(simplifyCFG, Plan);
runPass(removeDeadRecipes, Plan);

runPass(createAndOptimizeReplicateRegions, Plan);
14 changes: 6 additions & 8 deletions llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
@@ -15,7 +15,6 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP6:%.*]] = mul i64 [[TMP5]], 8
; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[DST]], i64 [[N_VEC]]
Collaborator

(many test changes, yet to review)

Collaborator

The above branch-on-false from entry to scalar preheader or vector preheader can/should also be eliminated, turning the scalar loop into unreachable dead code?

Contributor Author

Yep, more to clean up :)

; CHECK-NEXT: [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 0, i64 8)
; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 8 x i64> poison, i64 [[VAL]], i64 0
; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 8 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
@@ -42,10 +41,10 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
; CHECK-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 [[INDEX_NEXT]], i64 8)
; CHECK-NEXT: br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
Collaborator

This branch-on-true from vector.body to middle.block or back to itself, can/should also be eliminated - as part of optimizing a vector loop found to have a trip-count of 1?

Contributor Author

Yes, optimizeForVFAndUF removes the region in some cases, but not yet with active-lane-masks.

; CHECK: middle.block:
-; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT: br label [[FOR_COND_CLEANUP:%.*]]
 ; CHECK: scalar.ph:
-; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
-; CHECK-NEXT: [[BC_RESUME_VAL1:%.*]] = phi ptr [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[DST]], [[ENTRY]] ]
+; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT: [[BC_RESUME_VAL1:%.*]] = phi ptr [ [[DST]], [[ENTRY]] ]
; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.body:
; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
@@ -101,7 +100,6 @@ define void @clamped_tc_max_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP6:%.*]] = mul i64 [[TMP5]], 8
; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[DST]], i64 [[N_VEC]]
; CHECK-NEXT: [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 0, i64 [[WIDE_TRIP_COUNT]])
; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 8 x i64> poison, i64 [[VAL]], i64 0
; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 8 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
@@ -128,10 +126,10 @@ define void @clamped_tc_max_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range
; CHECK-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 [[INDEX_NEXT]], i64 [[WIDE_TRIP_COUNT]])
; CHECK-NEXT: br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK: middle.block:
-; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT: br label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]]
 ; CHECK: scalar.ph:
-; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
-; CHECK-NEXT: [[BC_RESUME_VAL1:%.*]] = phi ptr [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[DST]], [[FOR_BODY_PREHEADER]] ]
+; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[FOR_BODY_PREHEADER]] ]
+; CHECK-NEXT: [[BC_RESUME_VAL1:%.*]] = phi ptr [ [[DST]], [[FOR_BODY_PREHEADER]] ]
; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.body:
; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
@@ -460,17 +460,17 @@ define void @latch_branch_cost(ptr %dst) {
; PRED-NEXT: [[TMP25:%.*]] = icmp eq i64 [[INDEX_NEXT]], 104
; PRED-NEXT: br i1 [[TMP25]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; PRED: middle.block:
; PRED-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
; PRED-NEXT: br label [[EXIT:%.*]]
; PRED: scalar.ph:
; PRED-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 104, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; PRED-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ]
; PRED-NEXT: br label [[FOR_BODY:%.*]]
; PRED: loop:
; PRED-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
; PRED-NEXT: [[GEP:%.*]] = getelementptr i8, ptr [[DST]], i64 [[IV]]
; PRED-NEXT: store i8 0, ptr [[GEP]], align 1
; PRED-NEXT: [[INDVARS_IV_NEXT]] = add i64 [[IV]], 1
; PRED-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 100
; PRED-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
; PRED-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
; PRED: exit:
; PRED-NEXT: ret void
;
@@ -713,9 +713,9 @@ define i32 @header_mask_and_invariant_compare(ptr %A, ptr %B, ptr %C, ptr %D, pt
; PRED-NEXT: [[TMP24:%.*]] = extractelement <4 x i1> [[TMP28]], i32 0
; PRED-NEXT: br i1 [[TMP24]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP17:![0-9]+]]
; PRED: middle.block:
; PRED-NEXT: br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
; PRED-NEXT: br label [[EXIT:%.*]]
; PRED: scalar.ph:
; PRED-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
; PRED-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
; PRED-NEXT: br label [[LOOP_HEADER:%.*]]
; PRED: loop.header:
; PRED-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
@@ -821,9 +821,6 @@ define void @multiple_exit_conditions(ptr %src, ptr noalias %dst) #1 {
; PRED-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
; PRED-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; PRED-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 2
; PRED-NEXT: [[TMP3:%.*]] = mul i64 [[N_VEC]], 8
; PRED-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[DST]], i64 [[TMP3]]
; PRED-NEXT: [[IND_END1:%.*]] = mul i64 [[N_VEC]], 2
; PRED-NEXT: [[TMP6:%.*]] = call i64 @llvm.vscale.i64()
; PRED-NEXT: [[TMP7:%.*]] = mul i64 [[TMP6]], 2
; PRED-NEXT: [[TMP8:%.*]] = sub i64 257, [[TMP7]]
@@ -850,10 +847,10 @@ define void @multiple_exit_conditions(ptr %src, ptr noalias %dst) #1 {
; PRED-NEXT: [[TMP17:%.*]] = extractelement <vscale x 2 x i1> [[TMP16]], i32 0
; PRED-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP19:![0-9]+]]
; PRED: middle.block:
; PRED-NEXT: br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
; PRED-NEXT: br label [[EXIT:%.*]]
; PRED: scalar.ph:
; PRED-NEXT: [[BC_RESUME_VAL:%.*]] = phi ptr [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[DST]], [[ENTRY:%.*]] ]
; PRED-NEXT: [[BC_RESUME_VAL2:%.*]] = phi i64 [ [[IND_END1]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
; PRED-NEXT: [[BC_RESUME_VAL:%.*]] = phi ptr [ [[DST]], [[ENTRY:%.*]] ]
; PRED-NEXT: [[BC_RESUME_VAL2:%.*]] = phi i64 [ 0, [[ENTRY]] ]
; PRED-NEXT: br label [[LOOP:%.*]]
; PRED: loop:
; PRED-NEXT: [[PTR_IV:%.*]] = phi ptr [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[PTR_IV_NEXT:%.*]], [[LOOP]] ]
@@ -978,9 +975,9 @@ define void @low_trip_count_fold_tail_scalarized_store(ptr %dst) {
; DEFAULT-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
; DEFAULT-NEXT: br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP24:![0-9]+]]
; DEFAULT: middle.block:
; DEFAULT-NEXT: br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
; DEFAULT-NEXT: br label [[EXIT:%.*]]
; DEFAULT: scalar.ph:
; DEFAULT-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 8, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; DEFAULT-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ]
; DEFAULT-NEXT: br label [[LOOP:%.*]]
; DEFAULT: loop:
; DEFAULT-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
@@ -1080,9 +1077,9 @@ define void @low_trip_count_fold_tail_scalarized_store(ptr %dst) {
; PRED-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
; PRED-NEXT: br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP21:![0-9]+]]
; PRED: middle.block:
; PRED-NEXT: br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
; PRED-NEXT: br label [[EXIT:%.*]]
; PRED: scalar.ph:
; PRED-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 8, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; PRED-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ]
; PRED-NEXT: br label [[LOOP:%.*]]
; PRED: loop:
; PRED-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
@@ -1517,9 +1514,9 @@ define void @test_conditional_interleave_group (ptr noalias %src.1, ptr noalias
; PRED-NEXT: [[TMP85:%.*]] = extractelement <8 x i1> [[TMP84]], i32 0
; PRED-NEXT: br i1 [[TMP85]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP23:![0-9]+]]
; PRED: middle.block:
; PRED-NEXT: br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
; PRED-NEXT: br label [[EXIT:%.*]]
; PRED: scalar.ph:
; PRED-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_SCEVCHECK]] ]
; PRED-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_SCEVCHECK]] ]
; PRED-NEXT: br label [[LOOP_HEADER:%.*]]
; PRED: loop.header:
; PRED-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
@@ -1630,9 +1627,9 @@ define void @redundant_branch_and_tail_folding(ptr %dst, i1 %c) optsize {
; DEFAULT-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], 24
; DEFAULT-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP28:![0-9]+]]
; DEFAULT: middle.block:
; DEFAULT-NEXT: br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
; DEFAULT-NEXT: br label [[EXIT:%.*]]
; DEFAULT: scalar.ph:
; DEFAULT-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 24, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; DEFAULT-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ]
; DEFAULT-NEXT: br label [[LOOP_HEADER:%.*]]
; DEFAULT: loop.header:
; DEFAULT-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
@@ -1693,9 +1690,9 @@ define void @redundant_branch_and_tail_folding(ptr %dst, i1 %c) optsize {
; PRED-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], 24
; PRED-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP25:![0-9]+]]
; PRED: middle.block:
; PRED-NEXT: br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
; PRED-NEXT: br label [[EXIT:%.*]]
; PRED: scalar.ph:
; PRED-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 24, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; PRED-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ]
; PRED-NEXT: br label [[LOOP_HEADER:%.*]]
; PRED: loop.header:
; PRED-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
@@ -160,9 +160,9 @@ define void @sdiv_feeding_gep_predicated(ptr %dst, i32 %x, i64 %M, i64 %conv6, i
; CHECK-NEXT: [[TMP37:%.*]] = extractelement <vscale x 2 x i1> [[TMP36]], i32 0
; CHECK-NEXT: br i1 [[TMP37]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
; CHECK-NEXT: br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
; CHECK-NEXT: br label %[[EXIT:.*]]
; CHECK: [[SCALAR_PH]]:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_SCEVCHECK]] ]
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_SCEVCHECK]] ]
; CHECK-NEXT: br label %[[LOOP:.*]]
; CHECK: [[LOOP]]:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
@@ -287,9 +287,9 @@ define void @udiv_urem_feeding_gep(i64 %x, ptr %dst, i64 %N) {
; CHECK-NEXT: [[TMP48:%.*]] = extractelement <vscale x 2 x i1> [[TMP47]], i32 0
; CHECK-NEXT: br i1 [[TMP48]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
; CHECK-NEXT: br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
; CHECK-NEXT: br label %[[EXIT:.*]]
; CHECK: [[SCALAR_PH]]:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_SCEVCHECK]] ]
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_SCEVCHECK]] ]
; CHECK-NEXT: br label %[[LOOP:.*]]
; CHECK: [[LOOP]]:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]