[VPlan] Simplify Plan's entry in removeBranchOnConst. #154510
```diff
@@ -2352,9 +2352,9 @@ EpilogueVectorizerMainLoop::createIterationCountCheck(ElementCount VF,
 /// VPBB are moved to the end of the newly created VPIRBasicBlock. VPBB must
 /// have a single predecessor, which is rewired to the new VPIRBasicBlock. All
 /// successors of VPBB, if any, are rewired to the new VPIRBasicBlock.
-static VPIRBasicBlock *replaceVPBBWithIRVPBB(VPBasicBlock *VPBB,
+static VPIRBasicBlock *replaceVPBBWithIRVPBB(VPlan &Plan, VPBasicBlock *VPBB,
                                              BasicBlock *IRBB) {
-  VPIRBasicBlock *IRVPBB = VPBB->getPlan()->createVPIRBasicBlock(IRBB);
+  VPIRBasicBlock *IRVPBB = Plan.createVPIRBasicBlock(IRBB);
   auto IP = IRVPBB->begin();
   for (auto &R : make_early_inc_range(VPBB->phis()))
     R.moveBefore(*IRVPBB, IP);
```
```diff
@@ -2565,6 +2565,9 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) {
   // Remove redundant induction instructions.
   cse(HeaderBB);
 
+  if (Plan.getScalarPreheader()->getNumPredecessors() == 0)
+    return;
+
   // Set/update profile weights for the vector and remainder loops as original
   // loop iterations are now distributed among them. Note that original loop
   // becomes the scalar remainder loop after vectorization.
```
```diff
@@ -7220,6 +7223,12 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
   VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE);
   VPlanTransforms::simplifyRecipes(BestVPlan);
   VPlanTransforms::removeBranchOnConst(BestVPlan);
+  if (BestVPlan.getEntry()->getSingleSuccessor() ==
+      BestVPlan.getScalarPreheader()) {
+    // TODO: Should not even try to vectorize.
+    return DenseMap<const SCEV *, Value *>();
+  }
+
   VPlanTransforms::narrowInterleaveGroups(
       BestVPlan, BestVF,
       TTI.getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector));
```

Comment on lines +7190 to +7191:

Collaborator: removeBranchOnConst() could conceivably bypass the vector loop; this actually happens in a few tests. Worth emitting a missed-vectorization remark.

Author: Yep, added an analysis remark. I am not sure "missed vectorization" would be accurate, because this is for cases where we would create a dead vector loop and should not even try to vectorize.

Collaborator: OK, it appears the loop isn't vectorized because the trip-count guard is known to always jump to the scalar loop, i.e., VFxUF is known to exceed TC, so conceptually a smaller VFxUF could work. But the tests include unvectorizable non-loop cases where TC <= 1, which would better be cleaned up before calling LV, certainly before reaching LVP::executePlan().

Author: Agreed; we already have a TODO where we created the known-true condition.

Collaborator: We have a TODO here too; wondering if the message should specify that vectorization is dead or never executes, due to insufficient trip count.

Author: Updated the message to mention the insufficient trip count, thanks.
```diff
@@ -7262,7 +7271,7 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
   BasicBlock *EntryBB =
       cast<VPIRBasicBlock>(BestVPlan.getEntry())->getIRBasicBlock();
   State.CFG.PrevBB = ILV.createVectorizedLoopSkeleton();
-  replaceVPBBWithIRVPBB(BestVPlan.getScalarPreheader(),
+  replaceVPBBWithIRVPBB(BestVPlan, BestVPlan.getScalarPreheader(),
                         State.CFG.PrevBB->getSingleSuccessor());
   VPlanTransforms::removeDeadRecipes(BestVPlan);
```
```diff
@@ -7345,8 +7354,9 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
   } else {
     // Keep all loop hints from the original loop on the vector loop (we'll
     // replace the vectorizer-specific hints below).
-    if (MDNode *LID = OrigLoop->getLoopID())
-      L->setLoopID(LID);
+    if (BestVPlan.getScalarPreheader()->getNumPredecessors() > 0)
+      if (MDNode *LID = OrigLoop->getLoopID())
+        L->setLoopID(LID);
```

Collaborator suggested:

```suggestion
-    if (BestVPlan.getScalarPreheader()->getNumPredecessors() > 0)
+  } else if (BestVPlan.getScalarPreheader()->getNumPredecessors() > 0) {
```

Author: The code below will execute in the general `else {`, so I left it as-is for now.
Collaborator (outdated): "Native" VPlan constructs can simply be discarded when they become dead or unreachable (even then, lazily), whereas VPlan constructs that model the scalar loop are kept orphaned after becoming unreachable, to be processed here. This raises an old thought: VPlan specifies code to be generated, using its blocks and recipes; could VPlan be extended to also dismantle existing code, perhaps using "anti-recipes" or "anti-blocks" whose execute() performs the desired clean-up?

Author: Yep; unfortunately we cannot do this when the VPIRBasicBlock for the scalar preheader gets executed, as we only execute reachable VPBBs. I moved it for now to VPlan::execute. Doing it in the destructor of VPIRBasicBlock won't work either, because we need access to LoopInfo.

Collaborator: Having it in VPlan::execute is great! As a potential follow-up: the VPIRBasicBlock of the scalar preheader and/or a VPIRRegionBlock of the scalar loop could be marked for destruction in VPlan if unreachable (as in CreatedBlocks), indicating that its loop should be removed from LoopInfo etc.

Author: Yep, they are implicitly marked for destruction (all unreachable VPIRBasicBlocks are tracked in CreatedBlocks) and can be destroyed in the destructor, but they need LoopInfo/DT passed in somehow, to notify those about the removal.

Collaborator (outdated): This is recorded here instead of asking whether BestPlan reaches the scalar loop below (next to early exiting if it doesn't) because it's relevant only when vectorizing the main loop? When also vectorizing the epilogue loop, should we check whether BestEpiPlan reaches the scalar loop instead?

Author: It is only relevant when not vectorizing the epilogue for now, as when vectorizing the epilogue we don't materialize the check yet. The flag is set here to keep the assert below before exiting. It could be moved/duplicated, but eventually we will also enable it for the epilogue-vectorization case.

Collaborator (outdated): The early exit is worth a comment - skip the end, which updates the scalar loop, if it's removed?

Author: Done, thanks!
```diff
@@ -972,12 +972,14 @@ void VPlan::execute(VPTransformState *State) {
   setName("Final VPlan");
   LLVM_DEBUG(dump());
 
-  // Disconnect scalar preheader and scalar header, as the dominator tree edge
-  // will be updated as part of VPlan execution. This allows keeping the DTU
-  // logic generic during VPlan execution.
   BasicBlock *ScalarPh = State->CFG.ExitBB;
-  State->CFG.DTU.applyUpdates(
-      {{DominatorTree::Delete, ScalarPh, ScalarPh->getSingleSuccessor()}});
+  if (getScalarPreheader()->getNumPredecessors() > 0) {
+    // Disconnect scalar preheader and scalar header, as the dominator tree edge
+    // will be updated as part of VPlan execution. This allows keeping the DTU
+    // logic generic during VPlan execution.
+    State->CFG.DTU.applyUpdates(
+        {{DominatorTree::Delete, ScalarPh, ScalarPh->getSingleSuccessor()}});
+  }
 
   ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(
       Entry);
```
```diff
@@ -1927,7 +1927,7 @@ void VPlanTransforms::removeBranchOnConst(VPlan &Plan) {
   for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
            vp_depth_first_shallow(Plan.getEntry()))) {
     VPValue *Cond;
-    if (VPBB->getNumSuccessors() != 2 || VPBB == Plan.getEntry() ||
+    if (VPBB->getNumSuccessors() != 2 || VPBB->empty() ||
         !match(&VPBB->back(), m_BranchOnCond(m_VPValue(Cond))))
       continue;
```
```diff
@@ -6,8 +6,8 @@ target triple = "arm64-apple-macosx11.0.0"
 define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) {
 ; CHECK-LABEL: define void @fshl_operand_first_order_recurrence(
 ; CHECK-SAME: ptr [[DST:%.*]], ptr noalias [[SRC:%.*]]) {
-; CHECK-NEXT: [[ENTRY:.*]]:
-; CHECK-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: br label %[[VECTOR_PH:.*]]
 ; CHECK: [[VECTOR_PH]]:
 ; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
 ; CHECK: [[VECTOR_BODY]]:
```
```diff
@@ -30,14 +30,12 @@ define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) {
 ; CHECK-NEXT: br i1 [[TMP14]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK: [[MIDDLE_BLOCK]]:
 ; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <2 x i64> [[WIDE_LOAD1]], i32 1
-; CHECK-NEXT: br label %[[SCALAR_PH]]
+; CHECK-NEXT: br label %[[SCALAR_PH:.*]]
 ; CHECK: [[SCALAR_PH]]:
-; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 100, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
-; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ [[VECTOR_RECUR_EXTRACT]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT: br label %[[LOOP:.*]]
 ; CHECK: [[LOOP]]:
-; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
-; CHECK-NEXT: [[RECUR:%.*]] = phi i64 [ [[SCALAR_RECUR_INIT]], %[[SCALAR_PH]] ], [ [[L:%.*]], %[[LOOP]] ]
+; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 100, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
+; CHECK-NEXT: [[RECUR:%.*]] = phi i64 [ [[VECTOR_RECUR_EXTRACT]], %[[SCALAR_PH]] ], [ [[L:%.*]], %[[LOOP]] ]
 ; CHECK-NEXT: [[GEP_SRC:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 [[IV]]
 ; CHECK-NEXT: [[L]] = load i64, ptr [[GEP_SRC]], align 8
 ; CHECK-NEXT: [[OR:%.*]] = tail call i64 @llvm.fshl.i64(i64 1, i64 [[RECUR]], i64 1)
```
```diff
@@ -73,7 +71,7 @@ define void @powi_call(ptr %P) {
 ; CHECK-LABEL: define void @powi_call(
 ; CHECK-SAME: ptr [[P:%.*]]) {
 ; CHECK-NEXT: [[ENTRY:.*:]]
-; CHECK-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK-NEXT: br label %[[VECTOR_PH:.*]]
 ; CHECK: [[VECTOR_PH]]:
 ; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
 ; CHECK: [[VECTOR_BODY]]:
```
```diff
@@ -83,7 +81,7 @@ define void @powi_call(ptr %P) {
 ; CHECK-NEXT: br label %[[MIDDLE_BLOCK:.*]]
 ; CHECK: [[MIDDLE_BLOCK]]:
 ; CHECK-NEXT: br label %[[EXIT:.*]]
-; CHECK: [[SCALAR_PH]]:
+; CHECK: [[SCALAR_PH:.*]]:
 ; CHECK-NEXT: br label %[[LOOP:.*]]
 ; CHECK: [[LOOP]]:
 ; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
```

```diff
@@ -93,7 +91,7 @@ define void @powi_call(ptr %P) {
 ; CHECK-NEXT: store double [[POWI]], ptr [[GEP]], align 8
 ; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], 1
 ; CHECK-NEXT: [[EC:%.*]] = icmp eq i64 [[IV]], 1
-; CHECK-NEXT: br i1 [[EC]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK-NEXT: br i1 [[EC]], label %[[EXIT]], label %[[LOOP]]
 ; CHECK: [[EXIT]]:
 ; CHECK-NEXT: ret void
 ;
```

Collaborator: Metadata dropped, scalar loop unreachable.

Author: Yep.
```diff
@@ -224,5 +222,4 @@ declare i64 @llvm.fshl.i64(i64, i64, i64)
 ; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
 ; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
 ; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
-; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META2]], [[META1]]}
 ;.
```
```diff
@@ -5,7 +5,7 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
 ; CHECK-LABEL: define void @clamped_tc_8(
 ; CHECK-SAME: ptr captures(none) [[DST:%.*]], i32 [[N:%.*]], i64 [[VAL:%.*]]) #[[ATTR0:[0-9]+]] {
 ; CHECK-NEXT: entry:
-; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK-NEXT: br label [[VECTOR_PH:%.*]]
 ; CHECK: vector.ph:
 ; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT: [[TMP1:%.*]] = mul nuw i64 [[TMP0]], 8
```
```diff
@@ -36,7 +36,7 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
-; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[SCALAR_PH:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
 ; CHECK-NEXT: [[P_OUT_TAIL_09:%.*]] = phi ptr [ [[DST]], [[SCALAR_PH]] ], [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ]
 ; CHECK-NEXT: [[TMP19:%.*]] = shl nuw nsw i64 [[INDVARS_IV]], 3
 ; CHECK-NEXT: [[SHR3:%.*]] = lshr i64 [[VAL]], [[TMP19]]
```
```diff
@@ -45,7 +45,7 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
 ; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i8, ptr [[P_OUT_TAIL_09]], i64 1
 ; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
 ; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 8
-; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]]
 ; CHECK: for.cond.cleanup:
 ; CHECK-NEXT: ret void
 ;
```
```diff
@@ -79,7 +79,7 @@ define void @clamped_tc_max_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range
 ; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i32 [[REM]], 7
 ; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[ADD]], 3
 ; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[SHR]] to i64
-; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK-NEXT: br label [[VECTOR_PH:%.*]]
 ; CHECK: vector.ph:
 ; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT: [[TMP1:%.*]] = mul nuw i64 [[TMP0]], 8
```
```diff
@@ -104,13 +104,13 @@ define void @clamped_tc_max_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range
 ; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP1]]
 ; CHECK-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 [[INDEX_NEXT]], i64 [[WIDE_TRIP_COUNT]])
 ; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 8 x i64> [[VEC_IND]], [[DOTSPLAT]]
-; CHECK-NEXT: br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK-NEXT: br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
 ; CHECK: middle.block:
 ; CHECK-NEXT: br label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
-; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[SCALAR_PH:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
 ; CHECK-NEXT: [[P_OUT_TAIL_09:%.*]] = phi ptr [ [[DST]], [[SCALAR_PH]] ], [ [[INCDEC_PTR:%.*]], [[FOR_BODY]] ]
 ; CHECK-NEXT: [[TMP19:%.*]] = shl nuw nsw i64 [[INDVARS_IV]], 3
 ; CHECK-NEXT: [[SHR3:%.*]] = lshr i64 [[VAL]], [[TMP19]]
```

Collaborator: Single vector iteration - a branch-on-const missed opportunity?

Author: Yep, will look into that separately; probably due to other restrictions.

Collaborator: May be worth leaving a TODO.

Author: Added, thanks!
```diff
@@ -119,7 +119,7 @@ define void @clamped_tc_max_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range
 ; CHECK-NEXT: [[INCDEC_PTR]] = getelementptr inbounds i8, ptr [[P_OUT_TAIL_09]], i64 1
 ; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
 ; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
-; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]
 ; CHECK: for.cond.cleanup.loopexit:
 ; CHECK-NEXT: br label [[FOR_COND_CLEANUP]]
 ; CHECK: for.cond.cleanup:
```

Contributor: Why has the metadata been dropped on the scalar loop?

Author: The scalar loop is unreachable now, which means we have to remove it from LoopInfo, as an unreachable block is dominated by any other unreachable block, breaking LoopInfo verification (from the PR description).
```diff
@@ -156,7 +156,5 @@ for.cond.cleanup: ; preds = %for.body
 ; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
 ; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
 ; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
-; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META1]], [[META2]]}
-; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META2]], [[META1]]}
+; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META1]], [[META2]]}
 ;.
```
Collaborator: This change is needed because VPBB may no longer be reachable from the Plan's entry. VPBB must still have a single predecessor, as documented above, but that predecessor might itself have no predecessors? If only some callers pass an unreachable VPBB, Plan could be an optional parameter, to keep the other callers intact.

Author: Yep; updated the comment to say that all predecessors/successors are rewired, if any, and added Plan as an optional parameter, which must be passed if the block may be unreachable. Thanks!