Skip to content
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
Show all changes
37 commits
Select commit Hold shift + click to select a range
f85de24
[LV] Add variable trip count for test.
fhahn Aug 20, 2025
33afce8
[VPlan] Simplify Plan's entry in removeBranchOnConst.
fhahn Aug 14, 2025
208a182
Merge remote-tracking branch 'origin/main' into vplan-simplify-branch…
fhahn Sep 1, 2025
1e16872
!fixup address latest comments, thanks!
fhahn Sep 1, 2025
17fe80c
Merge remote-tracking branch 'origin/main' into vplan-simplify-branch…
fhahn Sep 1, 2025
528e463
!fixup address latest comments,t hanks!
fhahn Sep 1, 2025
03203ae
Merge remote-tracking branch 'origin/main' into vplan-simplify-branch…
fhahn Sep 1, 2025
7b19cec
!Fxiup address comments, thanks
fhahn Sep 1, 2025
c9228c1
Merge remote-tracking branch 'origin/main' into vplan-simplify-branch…
fhahn Sep 1, 2025
189b639
!fixup fix profile info
fhahn Sep 1, 2025
01e7486
!fixup fix formatting
fhahn Sep 1, 2025
3d90160
Merge remote-tracking branch 'origin/main' into vplan-simplify-branch…
fhahn Sep 1, 2025
4390c24
!fixup update new tests
fhahn Sep 1, 2025
e0f99d9
Merge remote-tracking branch 'origin/main' into vplan-simplify-branch…
fhahn Sep 3, 2025
519ae8b
!fixup move out loop metadata update
fhahn Sep 3, 2025
ca164d9
!fixup move
fhahn Sep 3, 2025
1bac0c2
Merge remote-tracking branch 'origin/main' into vplan-simplify-branch…
fhahn Sep 3, 2025
0006272
!fixup move more
fhahn Sep 4, 2025
5b28b16
Merge remote-tracking branch 'origin/main' into vplan-simplify-branch…
fhahn Sep 4, 2025
05c8386
[VPlan] Consolidate logic to update loop metadata and profile info.
fhahn Sep 4, 2025
3b47e50
Merge branch 'main' into vplan-simplify-branch-on-const-entry-tmp
fhahn Sep 4, 2025
7bcbbe6
!fixu update on top of main
fhahn Sep 4, 2025
174293a
Merge remote-tracking branch 'origin/main' into vplan-simplify-branch…
fhahn Sep 4, 2025
afdf4c2
!fixup fix formatting
fhahn Sep 4, 2025
4accca8
Merge remote-tracking branch 'origin/main' into vplan-simplify-branch…
fhahn Sep 12, 2025
df8c9da
!fixup update tests after merge
fhahn Sep 12, 2025
83cb4dc
Merge remote-tracking branch 'origin/main' into vplan-simplify-branch…
fhahn Sep 14, 2025
cae7c85
Merge remote-tracking branch 'origin/main' into vplan-simplify-branch…
fhahn Sep 15, 2025
6ea007b
!fixup address latest comments, thanks!
fhahn Sep 15, 2025
ea2db4e
!fixup restore tests
fhahn Sep 15, 2025
bef221f
Merge remote-tracking branch 'origin/main' into vplan-simplify-branch…
fhahn Sep 16, 2025
6a89924
!fixup address comments, thanks
fhahn Sep 16, 2025
0ec1a59
Merge remote-tracking branch 'origin/main' into vplan-simplify-branch…
fhahn Sep 16, 2025
44537d9
Merge remote-tracking branch 'origin/main' into vplan-simplify-branch…
fhahn Sep 16, 2025
b5405c1
!fixup adjust check and message.
fhahn Sep 16, 2025
ce766c7
Merge remote-tracking branch 'origin/main' into vplan-simplify-branch…
fhahn Sep 18, 2025
be17a75
Merge branch 'main' into vplan-simplify-branch-on-const-entry
fhahn Sep 18, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
The table of contents is too big for display.
Diff view
Diff view
  •  
  •  
  •  
40 changes: 33 additions & 7 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2352,9 +2352,9 @@ EpilogueVectorizerMainLoop::createIterationCountCheck(ElementCount VF,
/// VPBB are moved to the end of the newly created VPIRBasicBlock. VPBB must
/// have a single predecessor, which is rewired to the new VPIRBasicBlock. All
/// successors of VPBB, if any, are rewired to the new VPIRBasicBlock.
static VPIRBasicBlock *replaceVPBBWithIRVPBB(VPBasicBlock *VPBB,
static VPIRBasicBlock *replaceVPBBWithIRVPBB(VPlan &Plan, VPBasicBlock *VPBB,
BasicBlock *IRBB) {
VPIRBasicBlock *IRVPBB = VPBB->getPlan()->createVPIRBasicBlock(IRBB);
VPIRBasicBlock *IRVPBB = Plan.createVPIRBasicBlock(IRBB);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is needed because VPBB may no longer be reachable from Plan's entry. VPBB must still have a single predecessor, as documented above, but that pred might be pred free? If only some callers pass an unreachable VPBB, Plan could be an optional parameter, to keep other callers intact.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, updated the comment to say that all predecessors/successors are rewired, if any and added Plan as optional parameter, which must be passed if the may be unreachable, thanks!

auto IP = IRVPBB->begin();
for (auto &R : make_early_inc_range(VPBB->phis()))
R.moveBefore(*IRVPBB, IP);
Expand Down Expand Up @@ -2565,6 +2565,9 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) {
// Remove redundant induction instructions.
cse(HeaderBB);

if (Plan.getScalarPreheader()->getNumPredecessors() == 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like having bool VPBlockBase::hasPredecessors() could be useful; possibly implemented using empty().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in a53a5ed, as it can be used in multiple places already, thanks

return;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early exit if remainder/default scalar loop is dead, w/o setting/updating profile weights for the vector loop? Worth a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to move the code, becuase in order to preserve profile info on the vector loop when the remainder loop is dead requires retrieving the profile info before removing the remainder loop and conditionally setting the profile info on the vector/remainder loops as available.

// Set/update profile weights for the vector and remainder loops as original
// loop iterations are now distributed among them. Note that original loop
// becomes the scalar remainder loop after vectorization.
Expand Down Expand Up @@ -7220,6 +7223,12 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE);
VPlanTransforms::simplifyRecipes(BestVPlan);
VPlanTransforms::removeBranchOnConst(BestVPlan);
if (BestVPlan.getEntry()->getSingleSuccessor() ==
BestVPlan.getScalarPreheader()) {
Comment on lines +7190 to +7191
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removeBranchOnConst() could conceivably bypass the vector loop; this actually happens in few tests. Worth emitting a missed-vectorization remark.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, added an analysis remark. I am not sure if missed-vectorization would be accurate, because this is for cases where we would create a dead vector loop and should not even try to vectorize.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, it appears the loop isn't vectorized because the Trip Count guard is known to always jump to the scalar loop, i.e., where VFxUF is known to exceed TC, so conceptually a smaller VFxUF could work. But tests include unvectorizable non-loop cases where TC<=1, which should better be cleaned up before calling LV, certainly before reaching LVP::executePlan().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, we already have a TODO where we created the known True condition

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a TODO here too; wondering if the message should specify that vectorization is dead or never executes - due to insufficient trip-count.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated message to mention insufficient trip count, thanks

// TODO: Should not even try to vectorize.
return DenseMap<const SCEV *, Value *>();
}

VPlanTransforms::narrowInterleaveGroups(
BestVPlan, BestVF,
TTI.getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector));
Expand Down Expand Up @@ -7262,7 +7271,7 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
BasicBlock *EntryBB =
cast<VPIRBasicBlock>(BestVPlan.getEntry())->getIRBasicBlock();
State.CFG.PrevBB = ILV.createVectorizedLoopSkeleton();
replaceVPBBWithIRVPBB(BestVPlan.getScalarPreheader(),
replaceVPBBWithIRVPBB(BestVPlan, BestVPlan.getScalarPreheader(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent: worth documenting somewhere that createVectorizedLoopSkeleton() returns the predecessor of the newly created scalar preheader (which when created has a unique predecessor).

State.CFG.PrevBB->getSingleSuccessor());
VPlanTransforms::removeDeadRecipes(BestVPlan);

Expand Down Expand Up @@ -7345,8 +7354,9 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
} else {
// Keep all loop hints from the original loop on the vector loop (we'll
// replace the vectorizer-specific hints below).
if (MDNode *LID = OrigLoop->getLoopID())
L->setLoopID(LID);
if (BestVPlan.getScalarPreheader()->getNumPredecessors() > 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (BestVPlan.getScalarPreheader()->getNumPredecessors() > 0)
} else if (BestVPlan.getScalarPreheader()->getNumPredecessors() > 0) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code below will execute in in the general else{, so I left it as-is for now.

if (MDNode *LID = OrigLoop->getLoopID())
L->setLoopID(LID);

LoopVectorizeHints Hints(L, true, *ORE);
Hints.setAlreadyVectorized();
Expand Down Expand Up @@ -7377,6 +7387,16 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
addRuntimeUnrollDisableMetaData(L);
}

if (BestVPlan.getScalarPreheader()->getNumPredecessors() == 0) {
// If the original loop became unreachable, we need to delete it.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Native" VPlan constructs can simply be discarded when they become dead or unreached (even then lazily), whereas VPlan constructs that model the scalar loop are kept orphaned after becoming unreachable, to be processed here.

This raises an old thought: VPlan specifies code to be generated, using its blocks and recipes; how/could VPlan be extended to also dismantle existing code, perhaps using "anti-recipes" or "anti-blocks" whose execute() performs the desired clean-up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, unfortunately we cannot do this when the VPIRBasicBlock for the scalar preheader gets executed, as we only execute reachable VPBBs.

I moved it for now to VPlan::execute. Unfortunately doing in the destructor of VPIRBasicBlock won't work either, because we need access to LoopInfo.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having it in VPlan::execute is great!
As a potential follow-up: the VPIRBasicBlock of the scalar preheader and/or a VPIRRegionBlock of the scalar loop could be marked for destruction in VPlan if unreachable (as in CreatedBlocks), indicating that its loop should be removed from LoopInfo etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, they are implicitly marked for destruction (all VPIRBasicBlocks that are unreachable are tracked in CreatedBlocks) and can be destroyed in the destructor; but they need LoopInfo/DT passed somehow, to notify those about the removal

auto Blocks = OrigLoop->getBlocksVector();
Blocks.push_back(cast<VPIRBasicBlock>(BestVPlan.getScalarPreheader())
->getIRBasicBlock());
for (auto *BB : Blocks)
LI->removeBlock(BB);
LI->erase(OrigLoop);
}

// 3. Fix the vectorized code: take care of header phi's, live-outs,
// predication, updating analyses.
ILV.fixVectorizedLoop(State);
Expand Down Expand Up @@ -7452,7 +7472,8 @@ EpilogueVectorizerMainLoop::emitIterationCountCheck(BasicBlock *Bypass,
// generated here dominates the vector epilog iter check.
EPI.TripCount = Count;
} else {
VectorPHVPBB = replaceVPBBWithIRVPBB(VectorPHVPBB, LoopVectorPreHeader);
VectorPHVPBB =
replaceVPBBWithIRVPBB(Plan, VectorPHVPBB, LoopVectorPreHeader);
}

BranchInst &BI =
Expand Down Expand Up @@ -7485,7 +7506,7 @@ BasicBlock *EpilogueVectorizerEpilogueLoop::createVectorizedLoopSkeleton() {
BasicBlock *VecEpilogueIterationCountCheck =
SplitBlock(LoopVectorPreHeader, LoopVectorPreHeader->begin(), DT, LI,
nullptr, "vec.epilog.iter.check", true);
VectorPHVPBB = replaceVPBBWithIRVPBB(VectorPHVPBB, LoopVectorPreHeader);
VectorPHVPBB = replaceVPBBWithIRVPBB(Plan, VectorPHVPBB, LoopVectorPreHeader);

emitMinimumVectorEpilogueIterCountCheck(ScalarPH,
VecEpilogueIterationCountCheck);
Expand Down Expand Up @@ -10205,6 +10226,7 @@ bool LoopVectorizePass::processLoop(Loop *L) {
}

bool DisableRuntimeUnroll = false;
bool LoopRemoved = false;
MDNode *OrigLoopID = L->getLoopID();

// Report the vectorization decision.
Expand Down Expand Up @@ -10293,11 +10315,15 @@ bool LoopVectorizePass::processLoop(Loop *L) {
// rarely used is not worth unrolling.
if (!Checks.hasChecks() && !VF.Width.isScalar())
DisableRuntimeUnroll = true;
LoopRemoved = BestPlan.getScalarPreheader()->getNumPredecessors() == 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is recorded here instead of asking if BestPlan reaches the scalar loop below (next to early exiting if it doesn't), because it's relevant when vectorizing the main loop only? When also vectorizing the epilog loop, should the we check if BestEpiPlan reaches the scalar loop instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is only relevant when not vectorizing the epilogue for now, as when vectorizing the epilogue we don't materilize the check yet.

The flag is set here to still keep the assert below before exiting. It could be moved/duplicated, but eventually we also will also enable it for the epilogue vectorization case

}

assert(DT->verify(DominatorTree::VerificationLevel::Fast) &&
"DT not preserved correctly");

if (LoopRemoved)
return true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early-exit worth a comment - skip the end which updates the scalar loop if it's removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks!


std::optional<MDNode *> RemainderLoopID =
makeFollowupLoopID(OrigLoopID, {LLVMLoopVectorizeFollowupAll,
LLVMLoopVectorizeFollowupEpilogue});
Expand Down
12 changes: 7 additions & 5 deletions llvm/lib/Transforms/Vectorize/VPlan.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -972,12 +972,14 @@ void VPlan::execute(VPTransformState *State) {
setName("Final VPlan");
LLVM_DEBUG(dump());

// Disconnect scalar preheader and scalar header, as the dominator tree edge
// will be updated as part of VPlan execution. This allows keeping the DTU
// logic generic during VPlan execution.
BasicBlock *ScalarPh = State->CFG.ExitBB;
State->CFG.DTU.applyUpdates(
{{DominatorTree::Delete, ScalarPh, ScalarPh->getSingleSuccessor()}});
if (getScalarPreheader()->getNumPredecessors() > 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else, if the scalar loop is unreachable, this disconnection is redundant, as all edges associated with the loop will be disconnected, including this one from PH to header?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that the updater than would try to apply an update to a disconnected part of the CFG, which will result in a crash, as the edge we are trying to update won't exist.

// Disconnect scalar preheader and scalar header, as the dominator tree edge
// will be updated as part of VPlan execution. This allows keeping the DTU
// logic generic during VPlan execution.
State->CFG.DTU.applyUpdates(
{{DominatorTree::Delete, ScalarPh, ScalarPh->getSingleSuccessor()}});
}

ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(
Entry);
Expand Down
2 changes: 1 addition & 1 deletion llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1927,7 +1927,7 @@ void VPlanTransforms::removeBranchOnConst(VPlan &Plan) {
for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
vp_depth_first_shallow(Plan.getEntry()))) {
VPValue *Cond;
if (VPBB->getNumSuccessors() != 2 || VPBB == Plan.getEntry() ||
if (VPBB->getNumSuccessors() != 2 || VPBB->empty() ||
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any VPBB with two successors must be non-empty, suffice to assert?

(This line applies branch-on-const removal to the entry block too, potentially resulting in either scalar loop or vector loop becoming unreachable.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the entry block, we only add the branch-on-cond late (or not at all when performing epilogue vectorization), hence the check for now

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a branch-on-cond will be added only later, the following match will early-continue; but this match may fail when a block (the entry, with two successors) is currently free of recipes? Probably worth an explaining comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added thanks

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Would it suffice to check only if VPBB terminator matches a BranchOnCond? Is the emptiness check needed to prevent failure of this match? Check for two successors could then be an assert.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that in case w/o successors, a VPBB may be empty, in which case there is no terminator to retrieve.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, plus even w/ a pair of successors VPBB may be empty. Is it then (necessary and) suffice to check

    if (VPBB->empty() ||
        !match(&VPBB->back(), m_BranchOnCond(m_VPValue(Cond))))
      continue;

    assert(VPBB->getNumSuccessors() == 2 && "Two successors expected for BranchOnCond"); 

but redundant first check for two-successors may early-continue faster?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to the simpler check (together with simpler comment),thanks

!match(&VPBB->back(), m_BranchOnCond(m_VPValue(Cond))))
continue;

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -52,8 +52,8 @@ define i64 @predicated_udiv_scalarized_operand(ptr %a, i64 %x) {
; CHECK-NEXT: [[TMP17]] = add <2 x i64> [[VEC_PHI]], [[PREDPHI]]
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], 100
; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; CHECK: middle.block:
; CHECK-NEXT: br i1 [[TMP18]], label [[FOR_END:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; CHECK: for.end:
; CHECK-NEXT: [[TMP19:%.*]] = call i64 @llvm.vector.reduce.add.v2i64(<2 x i64> [[TMP17]])
; CHECK-NEXT: ret i64 [[TMP19]]
;
Expand Down
9 changes: 4 additions & 5 deletions llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
Original file line number Diff line number Diff line change
Expand Up @@ -202,8 +202,8 @@ exit:
define void @test_blend_feeding_replicated_store_2(ptr noalias %src, ptr %dst, i1 %c.0) {
; CHECK-LABEL: define void @test_blend_feeding_replicated_store_2(
; CHECK-SAME: ptr noalias [[SRC:%.*]], ptr [[DST:%.*]], i1 [[C_0:%.*]]) {
; CHECK-NEXT: [[ENTRY:.*]]:
; CHECK-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
; CHECK-NEXT: [[ENTRY:.*:]]
; CHECK-NEXT: br label %[[VECTOR_PH:.*]]
; CHECK: [[VECTOR_PH]]:
; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i1> poison, i1 [[C_0]], i64 0
; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <16 x i1> [[BROADCAST_SPLATINSERT]], <16 x i1> poison, <16 x i32> zeroinitializer
Expand Down Expand Up @@ -366,12 +366,11 @@ define void @test_blend_feeding_replicated_store_2(ptr noalias %src, ptr %dst, i
; CHECK-NEXT: [[TMP71:%.*]] = icmp eq i32 [[INDEX_NEXT]], 96
; CHECK-NEXT: br i1 [[TMP71]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
; CHECK-NEXT: br label %[[SCALAR_PH]]
; CHECK-NEXT: br label %[[SCALAR_PH:.*]]
; CHECK: [[SCALAR_PH]]:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 96, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
; CHECK-NEXT: br label %[[LOOP_HEADER:.*]]
; CHECK: [[LOOP_HEADER]]:
; CHECK-NEXT: [[IV1:%.*]] = phi i32 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
; CHECK-NEXT: [[IV1:%.*]] = phi i32 [ 96, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
; CHECK-NEXT: [[GEP_SRC1:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i32 [[IV1]]
; CHECK-NEXT: [[L:%.*]] = load i8, ptr [[GEP_SRC1]], align 1
; CHECK-NEXT: [[C_1:%.*]] = icmp eq i8 [[L]], 0
Expand Down
19 changes: 8 additions & 11 deletions llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,8 @@ target triple = "arm64-apple-macosx11.0.0"
define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) {
; CHECK-LABEL: define void @fshl_operand_first_order_recurrence(
; CHECK-SAME: ptr [[DST:%.*]], ptr noalias [[SRC:%.*]]) {
; CHECK-NEXT: [[ENTRY:.*]]:
; CHECK-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
; CHECK-NEXT: [[ENTRY:.*:]]
; CHECK-NEXT: br label %[[VECTOR_PH:.*]]
; CHECK: [[VECTOR_PH]]:
; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
; CHECK: [[VECTOR_BODY]]:
Expand All @@ -30,14 +30,12 @@ define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) {
; CHECK-NEXT: br i1 [[TMP14]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <2 x i64> [[WIDE_LOAD1]], i32 1
; CHECK-NEXT: br label %[[SCALAR_PH]]
; CHECK-NEXT: br label %[[SCALAR_PH:.*]]
; CHECK: [[SCALAR_PH]]:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 100, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ [[VECTOR_RECUR_EXTRACT]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
; CHECK-NEXT: br label %[[LOOP:.*]]
; CHECK: [[LOOP]]:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
; CHECK-NEXT: [[RECUR:%.*]] = phi i64 [ [[SCALAR_RECUR_INIT]], %[[SCALAR_PH]] ], [ [[L:%.*]], %[[LOOP]] ]
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 100, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
; CHECK-NEXT: [[RECUR:%.*]] = phi i64 [ [[VECTOR_RECUR_EXTRACT]], %[[SCALAR_PH]] ], [ [[L:%.*]], %[[LOOP]] ]
; CHECK-NEXT: [[GEP_SRC:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 [[IV]]
; CHECK-NEXT: [[L]] = load i64, ptr [[GEP_SRC]], align 8
; CHECK-NEXT: [[OR:%.*]] = tail call i64 @llvm.fshl.i64(i64 1, i64 [[RECUR]], i64 1)
Expand Down Expand Up @@ -73,7 +71,7 @@ define void @powi_call(ptr %P) {
; CHECK-LABEL: define void @powi_call(
; CHECK-SAME: ptr [[P:%.*]]) {
; CHECK-NEXT: [[ENTRY:.*:]]
; CHECK-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
; CHECK-NEXT: br label %[[VECTOR_PH:.*]]
; CHECK: [[VECTOR_PH]]:
; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
; CHECK: [[VECTOR_BODY]]:
Expand All @@ -83,7 +81,7 @@ define void @powi_call(ptr %P) {
; CHECK-NEXT: br label %[[MIDDLE_BLOCK:.*]]
; CHECK: [[MIDDLE_BLOCK]]:
; CHECK-NEXT: br label %[[EXIT:.*]]
; CHECK: [[SCALAR_PH]]:
; CHECK: [[SCALAR_PH:.*]]:
; CHECK-NEXT: br label %[[LOOP:.*]]
; CHECK: [[LOOP]]:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
Expand All @@ -93,7 +91,7 @@ define void @powi_call(ptr %P) {
; CHECK-NEXT: store double [[POWI]], ptr [[GEP]], align 8
; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], 1
; CHECK-NEXT: [[EC:%.*]] = icmp eq i64 [[IV]], 1
; CHECK-NEXT: br i1 [[EC]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP4:![0-9]+]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metadata dropped, scalar loop unreachable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

; CHECK-NEXT: br i1 [[EC]], label %[[EXIT]], label %[[LOOP]]
; CHECK: [[EXIT]]:
; CHECK-NEXT: ret void
;
Expand Down Expand Up @@ -224,5 +222,4 @@ declare i64 @llvm.fshl.i64(i64, i64, i64)
; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META2]], [[META1]]}
;.
18 changes: 8 additions & 10 deletions llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
; CHECK-LABEL: define void @clamped_tc_8(
; CHECK-SAME: ptr captures(none) [[DST:%.*]], i32 [[N:%.*]], i64 [[VAL:%.*]]) #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK-NEXT: br label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP1:%.*]] = mul nuw i64 [[TMP0]], 8
Expand Down Expand Up @@ -36,7 +36,7 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
; CHECK: scalar.ph:
; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.body:
; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[SCALAR_PH:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
; CHECK-NEXT: [[P_OUT_TAIL_09:%.*]] = phi ptr [ [[DST]], [[SCALAR_PH]] ], [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ]
; CHECK-NEXT: [[TMP19:%.*]] = shl nuw nsw i64 [[INDVARS_IV]], 3
; CHECK-NEXT: [[SHR3:%.*]] = lshr i64 [[VAL]], [[TMP19]]
Expand All @@ -45,7 +45,7 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i8, ptr [[P_OUT_TAIL_09]], i64 1
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 8
; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]]
; CHECK: for.cond.cleanup:
; CHECK-NEXT: ret void
;
Expand Down Expand Up @@ -79,7 +79,7 @@ define void @clamped_tc_max_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range
; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i32 [[REM]], 7
; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[ADD]], 3
; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[SHR]] to i64
; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK-NEXT: br label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP1:%.*]] = mul nuw i64 [[TMP0]], 8
Expand All @@ -104,13 +104,13 @@ define void @clamped_tc_max_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range
; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP1]]
; CHECK-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 [[INDEX_NEXT]], i64 [[WIDE_TRIP_COUNT]])
; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 8 x i64> [[VEC_IND]], [[DOTSPLAT]]
; CHECK-NEXT: br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK-NEXT: br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single vector iteration branch-on-const missed opportunity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, will look into that separately, probably due to other restrictions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be worth leaving a TODO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added thanks!

; CHECK: middle.block:
; CHECK-NEXT: br label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]]
; CHECK: scalar.ph:
; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.body:
; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[SCALAR_PH:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
; CHECK-NEXT: [[P_OUT_TAIL_09:%.*]] = phi ptr [ [[DST]], [[SCALAR_PH]] ], [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ]
; CHECK-NEXT: [[TMP19:%.*]] = shl nuw nsw i64 [[INDVARS_IV]], 3
; CHECK-NEXT: [[SHR3:%.*]] = lshr i64 [[VAL]], [[TMP19]]
Expand All @@ -119,7 +119,7 @@ define void @clamped_tc_max_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range
; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i8, ptr [[P_OUT_TAIL_09]], i64 1
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why has the metadata been dropped on the scalar loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scalar loop is unreachable now, which means we have to remove it from LoopInfo as an unreachable block is dominated by any other unreachable block, breaking LoopInfo verification.

From the PR description

In some cases, it may also make the original scalar loop unreachable,
because we know it will never execute. In that case, we need to remove
the loop from LoopInfo, because all unreachable blocks may dominate each
other, making LoopInfo invalid. In those cases, we can also completely
remove the loop, for which I'll share a follow-up patch.

; CHECK: for.cond.cleanup.loopexit:
; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]
; CHECK: for.cond.cleanup:
Expand Down Expand Up @@ -156,7 +156,5 @@ for.cond.cleanup: ; preds = %for.body
; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META1]], [[META2]]}
; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META2]], [[META1]]}
; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META1]], [[META2]]}
;.
Loading