[VPlan] Simplify Plan's entry in removeBranchOnConst. #154510
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2347,12 +2347,15 @@ Value *EpilogueVectorizerMainLoop::createIterationCountCheck( | |
| } | ||
|
|
||
| /// Replace \p VPBB with a VPIRBasicBlock wrapping \p IRBB. All recipes from \p | ||
| /// VPBB are moved to the end of the newly created VPIRBasicBlock. VPBB must | ||
| /// have a single predecessor, which is rewired to the new VPIRBasicBlock. All | ||
| /// successors of VPBB, if any, are rewired to the new VPIRBasicBlock. | ||
| /// VPBB are moved to the end of the newly created VPIRBasicBlock. All | ||
| /// predecessors and successors of VPBB, if any, are rewired to the new | ||
| /// VPIRBasicBlock. If \p VPBB may be unreachable, \p Plan must be passed. | ||
| static VPIRBasicBlock *replaceVPBBWithIRVPBB(VPBasicBlock *VPBB, | ||
| BasicBlock *IRBB) { | ||
| VPIRBasicBlock *IRVPBB = VPBB->getPlan()->createVPIRBasicBlock(IRBB); | ||
| BasicBlock *IRBB, | ||
| VPlan *Plan = nullptr) { | ||
| if (!Plan) | ||
| Plan = VPBB->getPlan(); | ||
| VPIRBasicBlock *IRVPBB = Plan->createVPIRBasicBlock(IRBB); | ||
| auto IP = IRVPBB->begin(); | ||
| for (auto &R : make_early_inc_range(VPBB->phis())) | ||
| R.moveBefore(*IRVPBB, IP); | ||
|
|
@@ -7191,6 +7194,18 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan( | |
| VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE); | ||
| VPlanTransforms::simplifyRecipes(BestVPlan); | ||
| VPlanTransforms::removeBranchOnConst(BestVPlan); | ||
| if (BestVPlan.getEntry()->getSingleSuccessor() == | ||
| BestVPlan.getScalarPreheader()) { | ||
|
Comment on lines +7190 to +7191
Collaborator: removeBranchOnConst() could conceivably bypass the vector loop; this actually happens in a few tests. Worth emitting a missed-vectorization remark.
Contributor (Author): Yep, added an analysis remark. I am not sure missed-vectorization would be accurate, because this covers cases where we would create a dead vector loop and should not even try to vectorize.
Collaborator: OK, it appears the loop isn't vectorized because the trip-count guard is known to always jump to the scalar loop, i.e., VFxUF is known to exceed TC, so conceptually a smaller VFxUF could work. But the tests include unvectorizable non-loop cases where TC<=1, which should be cleaned up before calling LV, and certainly before reaching LVP::executePlan().
Contributor (Author): Agreed, we already have a TODO where we create the known-true condition.
Collaborator: We have a TODO here too; wondering if the message should specify that vectorization is dead or never executes due to an insufficient trip count.
Contributor (Author): Updated the message to mention the insufficient trip count, thanks.
||
| // TODO: The vector loop would be dead, should not even try to vectorize. | ||
| ORE->emit([&]() { | ||
| return OptimizationRemarkAnalysis(DEBUG_TYPE, "VectorizationDead", | ||
| OrigLoop->getStartLoc(), | ||
| OrigLoop->getHeader()) | ||
| << "Created vector loop never executes."; | ||
| }); | ||
| return DenseMap<const SCEV *, Value *>(); | ||
| } | ||
|
|
||
| VPlanTransforms::narrowInterleaveGroups( | ||
| BestVPlan, BestVF, | ||
| TTI.getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector)); | ||
|
|
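The early return in this hunk covers the case discussed in the review thread: the minimum-iteration guard is known at compile time to always branch to the scalar loop, so the vector loop would be dead. A minimal standalone sketch of that reasoning follows. The names `GuardOutcome` and `classifyTripCountGuard` are hypothetical and not part of LLVM, and the sketch ignores details such as tail folding or a required scalar epilogue.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical illustration only; none of these names exist in LLVM.
enum class GuardOutcome { AlwaysScalar, AlwaysVector, Unknown };

// Classify the guard "TC < VF * UF ? scalar loop : vector loop" for a trip
// count that may or may not be known at compile time.
GuardOutcome classifyTripCountGuard(std::optional<uint64_t> KnownTC,
                                    uint64_t VF, uint64_t UF) {
  assert(VF > 0 && UF > 0 && "VF and UF must be positive");
  if (!KnownTC)
    return GuardOutcome::Unknown; // A runtime check is still required.
  // Fewer iterations than one vector step: the vector loop never executes.
  return *KnownTC < VF * UF ? GuardOutcome::AlwaysScalar
                            : GuardOutcome::AlwaysVector;
}
```

With a known trip count of 2 and VF=4, UF=1, the sketch reports `AlwaysScalar`, which corresponds to the situation where the patch emits the analysis remark and returns without generating code.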
@@ -7233,7 +7248,7 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan( | |
| // middle block. The vector loop is created during VPlan execution. | ||
| State.CFG.PrevBB = ILV.createVectorizedLoopSkeleton(); | ||
| replaceVPBBWithIRVPBB(BestVPlan.getScalarPreheader(), | ||
| State.CFG.PrevBB->getSingleSuccessor()); | ||
| State.CFG.PrevBB->getSingleSuccessor(), &BestVPlan); | ||
| VPlanTransforms::removeDeadRecipes(BestVPlan); | ||
|
|
||
| assert(verifyVPlanIsValid(BestVPlan, true /*VerifyLate*/) && | ||
|
|
@@ -7264,6 +7279,13 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan( | |
| // | ||
| //===------------------------------------------------===// | ||
|
|
||
| // Retrieve loop information before executing the plan, which may remove the | ||
| // original loop, if it becomes unreachable. | ||
| MDNode *LID = OrigLoop->getLoopID(); | ||
| unsigned OrigLoopInvocationWeight = 0; | ||
| std::optional<unsigned> OrigAverageTripCount = | ||
| getLoopEstimatedTripCount(OrigLoop, &OrigLoopInvocationWeight); | ||
|
|
||
| BestVPlan.execute(&State); | ||
|
|
||
| // 2.6. Maintain Loop Hints | ||
|
|
@@ -7277,7 +7299,8 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan( | |
| updateLoopMetadataAndProfileInfo( | ||
| HeaderVPBB ? LI->getLoopFor(State.CFG.VPBB2IRBB.lookup(HeaderVPBB)) | ||
| : nullptr, | ||
| HeaderVPBB, VectorizingEpilogue, | ||
| HeaderVPBB, BestVPlan, VectorizingEpilogue, LID, OrigAverageTripCount, | ||
| OrigLoopInvocationWeight, | ||
| estimateElementCount(BestVF * BestUF, CM.getVScaleForTuning()), | ||
| DisableRuntimeUnroll); | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -969,12 +969,24 @@ void VPlan::execute(VPTransformState *State) { | |
| setName("Final VPlan"); | ||
| LLVM_DEBUG(dump()); | ||
|
|
||
| // Disconnect scalar preheader and scalar header, as the dominator tree edge | ||
| // will be updated as part of VPlan execution. This allows keeping the DTU | ||
| // logic generic during VPlan execution. | ||
| BasicBlock *ScalarPh = State->CFG.ExitBB; | ||
| State->CFG.DTU.applyUpdates( | ||
| {{DominatorTree::Delete, ScalarPh, ScalarPh->getSingleSuccessor()}}); | ||
| VPBasicBlock *ScalarPhVPBB = getScalarPreheader(); | ||
| if (ScalarPhVPBB->hasPredecessors()) { | ||
| // Disconnect scalar preheader and scalar header, as the dominator tree edge | ||
| // will be updated as part of VPlan execution. This allows keeping the DTU | ||
| // logic generic during VPlan execution. | ||
| State->CFG.DTU.applyUpdates( | ||
| {{DominatorTree::Delete, ScalarPh, ScalarPh->getSingleSuccessor()}}); | ||
| } else { | ||
| Loop *OrigLoop = | ||
| State->LI->getLoopFor(getScalarHeader()->getIRBasicBlock()); | ||
| // If the original loop is unreachable, we need to delete it. | ||
| auto Blocks = OrigLoop->getBlocksVector(); | ||
| Blocks.push_back(cast<VPIRBasicBlock>(ScalarPhVPBB)->getIRBasicBlock()); | ||
| for (auto *BB : Blocks) | ||
| State->LI->removeBlock(BB); | ||
| State->LI->erase(OrigLoop); | ||
| } | ||
|
|
||
| ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT( | ||
| Entry); | ||
|
|
@@ -1648,14 +1660,18 @@ static void addRuntimeUnrollDisableMetaData(Loop *L) { | |
| } | ||
|
|
||
| void LoopVectorizationPlanner::updateLoopMetadataAndProfileInfo( | ||
| Loop *VectorLoop, VPBasicBlock *HeaderVPBB, bool VectorizingEpilogue, | ||
| unsigned EstimatedVFxUF, bool DisableRuntimeUnroll) { | ||
| MDNode *LID = OrigLoop->getLoopID(); | ||
| Loop *VectorLoop, VPBasicBlock *HeaderVPBB, const VPlan &Plan, | ||
| bool VectorizingEpilogue, MDNode *OrigLoopID, | ||
| std::optional<unsigned> OrigAverageTripCount, | ||
| unsigned OrigLoopInvocationWeight, unsigned EstimatedVFxUF, | ||
| bool DisableRuntimeUnroll) { | ||
| // Update the metadata of the scalar loop. Skip the update when vectorizing | ||
| // the epilogue loop, to ensure it is only updated once. | ||
| if (!VectorizingEpilogue) { | ||
| std::optional<MDNode *> RemainderLoopID = makeFollowupLoopID( | ||
| LID, {LLVMLoopVectorizeFollowupAll, LLVMLoopVectorizeFollowupEpilogue}); | ||
| // the epilogue loop to ensure it is updated only once. Also skip the update | ||
| // when the scalar loop became unreachable. | ||
| if (Plan.getScalarPreheader()->hasPredecessors() && !VectorizingEpilogue) { | ||
| std::optional<MDNode *> RemainderLoopID = | ||
| makeFollowupLoopID(OrigLoopID, {LLVMLoopVectorizeFollowupAll, | ||
| LLVMLoopVectorizeFollowupEpilogue}); | ||
| if (RemainderLoopID) { | ||
| OrigLoop->setLoopID(*RemainderLoopID); | ||
| } else { | ||
|
|
@@ -1670,15 +1686,15 @@ void LoopVectorizationPlanner::updateLoopMetadataAndProfileInfo( | |
| if (!VectorLoop) | ||
| return; | ||
|
|
||
| if (std::optional<MDNode *> VectorizedLoopID = | ||
| makeFollowupLoopID(LID, {LLVMLoopVectorizeFollowupAll, | ||
| LLVMLoopVectorizeFollowupVectorized})) { | ||
| if (std::optional<MDNode *> VectorizedLoopID = makeFollowupLoopID( | ||
| OrigLoopID, {LLVMLoopVectorizeFollowupAll, | ||
| LLVMLoopVectorizeFollowupVectorized})) { | ||
| VectorLoop->setLoopID(*VectorizedLoopID); | ||
| } else { | ||
| // Keep all loop hints from the original loop on the vector loop (we'll | ||
| // replace the vectorizer-specific hints below). | ||
| if (LID) | ||
| VectorLoop->setLoopID(LID); | ||
| if (OrigLoopID) | ||
| VectorLoop->setLoopID(OrigLoopID); | ||
|
|
||
| if (!VectorizingEpilogue) { | ||
| LoopVectorizeHints Hints(VectorLoop, true, *ORE); | ||
|
|
@@ -1723,7 +1739,21 @@ void LoopVectorizationPlanner::updateLoopMetadataAndProfileInfo( | |
| // For scalable vectorization we can't know at compile time how many | ||
| // iterations of the loop are handled in one vector iteration, so instead | ||
| // use the value of vscale used for tuning. | ||
| setProfileInfoAfterUnrolling(OrigLoop, VectorLoop, OrigLoop, EstimatedVFxUF); | ||
| if (OrigAverageTripCount) { | ||
|
||
| // Calculate number of iterations in unrolled loop. | ||
| unsigned AverageVectorTripCount = *OrigAverageTripCount / EstimatedVFxUF; | ||
| // Calculate number of iterations for remainder loop. | ||
| unsigned RemainderAverageTripCount = *OrigAverageTripCount % EstimatedVFxUF; | ||
|
|
||
| if (HeaderVPBB) { | ||
| setLoopEstimatedTripCount(VectorLoop, AverageVectorTripCount, | ||
| OrigLoopInvocationWeight); | ||
| } | ||
| if (Plan.getScalarPreheader()->hasPredecessors()) { | ||
| setLoopEstimatedTripCount(OrigLoop, RemainderAverageTripCount, | ||
| OrigLoopInvocationWeight); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) | ||
|
|
||
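The last hunk above replaces the call to `setProfileInfoAfterUnrolling` with an explicit split of the original loop's estimated trip count, captured before `BestVPlan.execute()` may delete the scalar loop. A minimal sketch of just the arithmetic, assuming a hypothetical `TripCountSplit` type and `splitEstimatedTripCount` helper that are not part of the patch:

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper type for illustration; not part of LLVM.
struct TripCountSplit {
  unsigned VectorLoop; // Estimated iterations of the vector loop.
  unsigned Remainder;  // Estimated iterations left for the scalar loop.
};

// Split the original estimated trip count between the vector loop and the
// scalar remainder loop, mirroring the division and modulo in the hunk above.
std::optional<TripCountSplit>
splitEstimatedTripCount(std::optional<unsigned> OrigAverageTripCount,
                        unsigned EstimatedVFxUF) {
  assert(EstimatedVFxUF > 0 && "VF x UF must be non-zero");
  if (!OrigAverageTripCount)
    return std::nullopt; // No estimate available, so nothing to update.
  return TripCountSplit{*OrigAverageTripCount / EstimatedVFxUF,
                        *OrigAverageTripCount % EstimatedVFxUF};
}
```

For an estimated trip count of 100 and VF x UF of 8, this gives 12 vector iterations and 4 remainder iterations. In the patch, each estimate is only written when its loop still exists: the vector estimate when `HeaderVPBB` is non-null, and the remainder estimate when the scalar preheader still has predecessors.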
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2198,7 +2198,10 @@ void VPlanTransforms::removeBranchOnConst(VPlan &Plan) { | |
| for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>( | ||
| vp_depth_first_shallow(Plan.getEntry()))) { | ||
| VPValue *Cond; | ||
| if (VPBB->getNumSuccessors() != 2 || VPBB == Plan.getEntry() || | ||
| // Skip blocks that don't have 2 successors or are not terminated by | ||
| // BranchOnCond. Empty blocks with 2 successors are also skipped; their | ||
| // branch condition will be added later. | ||
| if (VPBB->getNumSuccessors() != 2 || VPBB->empty() || | ||
|
||
| !match(&VPBB->back(), m_BranchOnCond(m_VPValue(Cond)))) | ||
| continue; | ||
|
|
||
|
|
||
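The change to `removeBranchOnConst` stops skipping the plan's entry block and instead skips empty two-successor blocks, whose branch condition is added later. The sketch below illustrates the general idea of folding a branch on a constant in a toy CFG. `ToyBlock`, `connect`, and `foldConstantBranches` are hypothetical stand-ins rather than VPlan API, and the sketch leaves out the phi updates the real transform also has to perform.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy CFG block; a hypothetical stand-in for a VPBasicBlock.
struct ToyBlock {
  std::vector<ToyBlock *> Successors;   // [0] taken on true, [1] on false.
  std::vector<ToyBlock *> Predecessors;
  bool Empty = false;            // No recipes yet; terminator is added later.
  std::optional<bool> ConstCond; // Set if terminated by a branch on a constant.
};

// Add the edge From -> To, keeping both edge lists in sync.
void connect(ToyBlock &From, ToyBlock &To) {
  From.Successors.push_back(&To);
  To.Predecessors.push_back(&From);
}

// Fold every two-way branch on a constant into an unconditional branch by
// removing the edge to the successor that can never be taken. Returns the
// number of branches folded.
unsigned foldConstantBranches(const std::vector<ToyBlock *> &Blocks) {
  unsigned NumFolded = 0;
  for (ToyBlock *BB : Blocks) {
    // Skip blocks without two successors, empty blocks, and blocks that do
    // not end in a branch on a constant.
    if (BB->Successors.size() != 2 || BB->Empty || !BB->ConstCond)
      continue;
    std::size_t DeadIdx = *BB->ConstCond ? 1 : 0;
    ToyBlock *Dead = BB->Successors[DeadIdx];
    BB->Successors.erase(BB->Successors.begin() +
                         static_cast<std::ptrdiff_t>(DeadIdx));
    auto &Preds = Dead->Predecessors;
    auto It = std::find(Preds.begin(), Preds.end(), BB);
    assert(It != Preds.end() && "edge lists out of sync");
    Preds.erase(It);
    BB->ConstCond.reset(); // The branch is now unconditional.
    ++NumFolded;
  }
  return NumFolded;
}
```

Applied to an entry block that branches on `false` to a scalar preheader (true target) and a vector preheader (false target), the fold leaves a single edge to the vector preheader and a scalar preheader without predecessors, which matches the shape of the updated CHECK lines in the tests (`br label %[[VECTOR_PH:.*]]`).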
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -6,8 +6,8 @@ target triple = "arm64-apple-macosx11.0.0" | |
| define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) { | ||
| ; CHECK-LABEL: define void @fshl_operand_first_order_recurrence( | ||
| ; CHECK-SAME: ptr [[DST:%.*]], ptr noalias [[SRC:%.*]]) { | ||
| ; CHECK-NEXT: [[ENTRY:.*]]: | ||
| ; CHECK-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]] | ||
| ; CHECK-NEXT: [[ENTRY:.*:]] | ||
| ; CHECK-NEXT: br label %[[VECTOR_PH:.*]] | ||
| ; CHECK: [[VECTOR_PH]]: | ||
| ; CHECK-NEXT: br label %[[VECTOR_BODY:.*]] | ||
| ; CHECK: [[VECTOR_BODY]]: | ||
|
|
@@ -30,14 +30,12 @@ define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) { | |
| ; CHECK-NEXT: br i1 [[TMP14]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]] | ||
| ; CHECK: [[MIDDLE_BLOCK]]: | ||
| ; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <2 x i64> [[WIDE_LOAD1]], i32 1 | ||
| ; CHECK-NEXT: br label %[[SCALAR_PH]] | ||
| ; CHECK-NEXT: br label %[[SCALAR_PH:.*]] | ||
| ; CHECK: [[SCALAR_PH]]: | ||
| ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 100, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ] | ||
| ; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ [[VECTOR_RECUR_EXTRACT]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ] | ||
| ; CHECK-NEXT: br label %[[LOOP:.*]] | ||
| ; CHECK: [[LOOP]]: | ||
| ; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ] | ||
| ; CHECK-NEXT: [[RECUR:%.*]] = phi i64 [ [[SCALAR_RECUR_INIT]], %[[SCALAR_PH]] ], [ [[L:%.*]], %[[LOOP]] ] | ||
| ; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 100, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ] | ||
| ; CHECK-NEXT: [[RECUR:%.*]] = phi i64 [ [[VECTOR_RECUR_EXTRACT]], %[[SCALAR_PH]] ], [ [[L:%.*]], %[[LOOP]] ] | ||
| ; CHECK-NEXT: [[GEP_SRC:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 [[IV]] | ||
| ; CHECK-NEXT: [[L]] = load i64, ptr [[GEP_SRC]], align 8 | ||
| ; CHECK-NEXT: [[OR:%.*]] = tail call i64 @llvm.fshl.i64(i64 1, i64 [[RECUR]], i64 1) | ||
|
|
@@ -73,7 +71,7 @@ define void @powi_call(ptr %P) { | |
| ; CHECK-LABEL: define void @powi_call( | ||
| ; CHECK-SAME: ptr [[P:%.*]]) { | ||
| ; CHECK-NEXT: [[ENTRY:.*:]] | ||
| ; CHECK-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]] | ||
| ; CHECK-NEXT: br label %[[VECTOR_PH:.*]] | ||
| ; CHECK: [[VECTOR_PH]]: | ||
| ; CHECK-NEXT: br label %[[VECTOR_BODY:.*]] | ||
| ; CHECK: [[VECTOR_BODY]]: | ||
|
|
@@ -83,7 +81,7 @@ define void @powi_call(ptr %P) { | |
| ; CHECK-NEXT: br label %[[MIDDLE_BLOCK:.*]] | ||
| ; CHECK: [[MIDDLE_BLOCK]]: | ||
| ; CHECK-NEXT: br label %[[EXIT:.*]] | ||
| ; CHECK: [[SCALAR_PH]]: | ||
| ; CHECK: [[SCALAR_PH:.*]]: | ||
| ; CHECK-NEXT: br label %[[LOOP:.*]] | ||
| ; CHECK: [[LOOP]]: | ||
| ; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ] | ||
|
|
@@ -93,7 +91,7 @@ define void @powi_call(ptr %P) { | |
| ; CHECK-NEXT: store double [[POWI]], ptr [[GEP]], align 8 | ||
| ; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], 1 | ||
| ; CHECK-NEXT: [[EC:%.*]] = icmp eq i64 [[IV]], 1 | ||
| ; CHECK-NEXT: br i1 [[EC]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP4:![0-9]+]] | ||
|
Collaborator: metadata dropped, scalar loop unreachable
Contributor (Author): yep
||
| ; CHECK-NEXT: br i1 [[EC]], label %[[EXIT]], label %[[LOOP]] | ||
| ; CHECK: [[EXIT]]: | ||
| ; CHECK-NEXT: ret void | ||
| ; | ||
|
|
@@ -224,5 +222,4 @@ declare i64 @llvm.fshl.i64(i64, i64, i64) | |
| ; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1} | ||
| ; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"} | ||
| ; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]} | ||
| ; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META2]], [[META1]]} | ||
| ;. | ||
Closed ) above thanks