-
Notifications
You must be signed in to change notification settings - Fork 14.8k
[VPlan] Add initial CFG simplification, removing BranchOnCond true. #106748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 9 commits
e3db837
195677a
19f1666
2fd15a4
641c647
f36fad0
364ee91
5ad4020
b6ced87
e86e71b
e6ae199
c03e5f1
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -2484,12 +2484,13 @@ void InnerLoopVectorizer::introduceCheckBlockInVPlan(BasicBlock *CheckIRBB) { | |||||
PreVectorPH->swapSuccessors(); | ||||||
|
||||||
// We just connected a new block to the scalar preheader. Update all | ||||||
// ResumePhis by adding an incoming value for it. | ||||||
// ResumePhis by adding an incoming value for it, replacing the last value. | ||||||
for (VPRecipeBase &R : *cast<VPBasicBlock>(ScalarPH)) { | ||||||
auto *ResumePhi = dyn_cast<VPInstruction>(&R); | ||||||
if (!ResumePhi || ResumePhi->getOpcode() != VPInstruction::ResumePhi) | ||||||
continue; | ||||||
ResumePhi->addOperand(ResumePhi->getOperand(1)); | ||||||
ResumePhi->addOperand( | ||||||
ResumePhi->getOperand(ResumePhi->getNumOperands() - 1)); | ||||||
} | ||||||
} | ||||||
|
||||||
|
@@ -2658,7 +2659,10 @@ void InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) { | |||||
LoopScalarPreHeader = | ||||||
SplitBlock(LoopVectorPreHeader, LoopVectorPreHeader->getTerminator(), DT, | ||||||
LI, nullptr, Twine(Prefix) + "scalar.ph"); | ||||||
replaceVPBBWithIRVPBB(Plan.getScalarPreheader(), LoopScalarPreHeader); | ||||||
// NOTE: The Plan's scalar preheader isn't replaced with a VPIRBasicBlock | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done, thanks! |
||||||
// wrapping LoopScalarPreHeader here at the moment, because the Plan's scalar | ||||||
// preheader may be unreachable at this point. Instead it is replaced in | ||||||
// createVectorizedLoopSkeleton. | ||||||
} | ||||||
|
||||||
/// Return the expanded step for \p ID using \p ExpandedSCEVs to look up SCEV | ||||||
|
@@ -2754,6 +2758,7 @@ BasicBlock *InnerLoopVectorizer::createVectorizedLoopSkeleton() { | |||||
// faster. | ||||||
emitMemRuntimeChecks(LoopScalarPreHeader); | ||||||
|
||||||
replaceVPBBWithIRVPBB(Plan.getScalarPreheader(), LoopScalarPreHeader); | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How/Is this move dependent - replacing the scalar preheader VPBB with IRBB here instead of earlier when calling There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. At the original points, the scalar PH may be unreachable, which means at the moment we cannot use Could independently improve this, by either storing parent plan in all VPBBs (not just the entries) or passing Plan to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The original position seems (more) reasonable, being right after LoopScalarPreHeader is built. Worth leaving behind some comment why this replacement is currently done later, "for now"? VPlan is assumed to always be connected, with all its VPBB's reachable from its entry. Can the original points maintain this connectivity, w/o storing the parental plan in all VPBBs nor passing Plan to replaceVPBBWithIRVPBB()? BTW, would be good to clarify that VPlan::getPlanEntry() avoids going into an infinite loop, if invoked on flat region-less cyclic CFG, based on visiting operands in order, and relying on the operand associated with the preheader block to appear (first) before that of the latch when visiting header phis. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks, I added a note at the original place. We could store the plan in all blocks w/o a parent region, there is already a field in all blocks to do so?
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Agreed, it seems better to store the plan for orphan blocks in their existing field rather than null, at-least for unreachable blocks, although best maintain connectivity rather than have a block point to a plan which in turn cannot reach it? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, we will be able to do so once we move to model the full skeleton independently in VPlan and not rely on legacy's skeleton creation. |
||||||
return LoopVectorPreHeader; | ||||||
} | ||||||
|
||||||
|
@@ -7894,6 +7899,7 @@ BasicBlock *EpilogueVectorizerMainLoop::createEpilogueVectorizedLoopSkeleton() { | |||||
// Generate the induction variable. | ||||||
EPI.VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader); | ||||||
|
||||||
replaceVPBBWithIRVPBB(Plan.getScalarPreheader(), LoopScalarPreHeader); | ||||||
return LoopVectorPreHeader; | ||||||
} | ||||||
|
||||||
|
@@ -8042,6 +8048,7 @@ EpilogueVectorizerEpilogueLoop::createEpilogueVectorizedLoopSkeleton() { | |||||
Phi->removeIncomingValue(EPI.MemSafetyCheck); | ||||||
} | ||||||
|
||||||
replaceVPBBWithIRVPBB(Plan.getScalarPreheader(), LoopScalarPreHeader); | ||||||
return LoopVectorPreHeader; | ||||||
} | ||||||
|
||||||
|
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
|
@@ -3527,12 +3527,20 @@ class VPlan { | |||||||
|
||||||||
/// Returns the 'middle' block of the plan, that is the block that selects | ||||||||
/// whether to execute the scalar tail loop or the exit block from the loop | ||||||||
/// latch. | ||||||||
const VPBasicBlock *getMiddleBlock() const { | ||||||||
return cast<VPBasicBlock>(getScalarPreheader()->getPredecessors().front()); | ||||||||
} | ||||||||
/// latch. If the scalar tail loop or exit block are known to always execute, | ||||||||
/// the middle block may branch directly to the block. | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Also explain about the early-exit case, where middle.block is the 2nd successor of middle.split block, as depicted in https://llvm.org/docs/Vectorizers.html#early-exit-vectorization, which corresponds to the scalar preheader being absent from RegionSucc's successors? The middle block in this case conceptually has three successors: scalar preheader, latch.exit, early.exit with the first two postponed to be a successor's successors. Note that if the middle block branches unconditionally to exit block (or scalar preheader block), the two blocks may subsequently be merged, causing RegionSucc to have no successors (or be the scalar preheader itself). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Updated the comment.
Yep, for now, we don't merge VPIRBBs into other blocks. |
||||||||
VPBasicBlock *getMiddleBlock() { | ||||||||
return cast<VPBasicBlock>(getScalarPreheader()->getPredecessors().front()); | ||||||||
VPRegionBlock *LoopRegion = getVectorLoopRegion(); | ||||||||
if (!LoopRegion) | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: do the callers of getMiddleBlock() expect it to return null or assert it does not, or should the callee assert so? |
||||||||
return nullptr; | ||||||||
auto *RegionSucc = LoopRegion->getSingleSuccessor(); | ||||||||
if (RegionSucc->getSingleSuccessor() || | ||||||||
is_contained(RegionSucc->getSuccessors(), getScalarPreheader())) | ||||||||
return cast<VPBasicBlock>(RegionSucc); | ||||||||
return cast<VPBasicBlock>(RegionSucc->getSuccessors()[1]); | ||||||||
} | ||||||||
const VPBasicBlock *getMiddleBlock() const { | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done thanks |
||||||||
return const_cast<VPlan *>(this)->getMiddleBlock(); | ||||||||
} | ||||||||
|
||||||||
/// Return the VPBasicBlock for the preheader of the scalar loop. | ||||||||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -1682,6 +1682,52 @@ void VPlanTransforms::truncateToMinimalBitwidths( | |||||
"some entries in MinBWs haven't been processed"); | ||||||
} | ||||||
|
||||||
/// Remove BranchOnCond recipes with true conditions together with removing | ||||||
/// dead edges to their successors. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The successors across the removed edges are assumed to have ResumePhi recipes, which are fixed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Worth adding a comment that said ResumePhis are expected to already be fixed? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not sure, they are expected to be valid coming in and the transform will keep them valid. |
||||||
static void simplifyBranchOnCondTrue(VPlan &Plan) { | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Updated, thanks! |
||||||
using namespace llvm::VPlanPatternMatch; | ||||||
for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>( | ||||||
vp_depth_first_shallow(Plan.getEntry()))) { | ||||||
if (VPBB->getNumSuccessors() != 2 || | ||||||
!match(&VPBB->back(), m_BranchOnCond(m_True()))) | ||||||
continue; | ||||||
|
||||||
VPBasicBlock *RemovedSucc = cast<VPBasicBlock>(VPBB->getSuccessors()[1]); | ||||||
const auto &Preds = RemovedSucc->getPredecessors(); | ||||||
assert(count(Preds, VPBB) == 1 && | ||||||
"There must be a single edge between VPBB and its successor"); | ||||||
unsigned DeadIdx = std::distance(Preds.begin(), find(Preds, VPBB)); | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Assert that VPBB feeds a single value to RemovedSucc? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done thanks |
||||||
|
||||||
// Values coming from VPBB into ResumePhi recipes of RemoveSucc are removed | ||||||
// from these recipes. | ||||||
for (VPRecipeBase &R : make_early_inc_range(*RemovedSucc)) { | ||||||
assert((!isa<VPIRInstruction>(&R) || | ||||||
!isa<PHINode>(cast<VPIRInstruction>(&R)->getInstruction())) && | ||||||
!isa<VPHeaderPHIRecipe>(&R) && | ||||||
"Cannot update VPIRInstructions wrapping phis or header phis yet"); | ||||||
auto *VPI = dyn_cast<VPInstruction>(&R); | ||||||
if (!VPI || VPI->getOpcode() != VPInstruction::ResumePhi) | ||||||
break; | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Better break if !isPhi(), as we're traversing all and only phi recipes which appear first in the block, and then assert that the phi is a ResumePhi recipe? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will do separately, as isPhi needs updating to consider ReusmePhi |
||||||
VPBuilder B(VPI); | ||||||
SmallVector<VPValue *> NewOperands; | ||||||
// Create new operand list, with the dead incoming value filtered out. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would erase()'ing the dying operand from VPI->operands be easier, perhaps with some removeOperand() API, than replacing the VPInstruction with a new one? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We could, the question is how to best limit this to some recipes, as I think it only makes sense for phi-like recipes (or maybe just ResumePhi). Could do as follow-up? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, can be limited to ResumePhi, until needed elsewhere. |
||||||
for (const auto &[Idx, Op] : enumerate(VPI->operands())) { | ||||||
if (Idx == DeadIdx) | ||||||
continue; | ||||||
NewOperands.push_back(Op); | ||||||
} | ||||||
VPI->replaceAllUsesWith(B.createNaryOp(VPInstruction::ResumePhi, | ||||||
NewOperands, VPI->getDebugLoc(), | ||||||
VPI->getName())); | ||||||
VPI->eraseFromParent(); | ||||||
} | ||||||
// Disconnect blocks and remove the terminator. RemovedSucc will be deleted | ||||||
// automatically on VPlan destruction if it becomes unreachable. | ||||||
VPBlockUtils::disconnectBlocks(VPBB, RemovedSucc); | ||||||
VPBB->back().eraseFromParent(); | ||||||
} | ||||||
} | ||||||
|
||||||
void VPlanTransforms::optimize(VPlan &Plan) { | ||||||
runPass(removeRedundantCanonicalIVs, Plan); | ||||||
runPass(removeRedundantInductionCasts, Plan); | ||||||
|
@@ -1691,6 +1737,7 @@ void VPlanTransforms::optimize(VPlan &Plan) { | |||||
runPass(legalizeAndOptimizeInductions, Plan); | ||||||
runPass(removeRedundantExpandSCEVRecipes, Plan); | ||||||
runPass(simplifyRecipes, Plan, *Plan.getCanonicalIV()->getScalarType()); | ||||||
runPass(simplifyBranchOnCondTrue, Plan); | ||||||
runPass(removeDeadRecipes, Plan); | ||||||
|
||||||
runPass(createAndOptimizeReplicateRegions, Plan); | ||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,7 +15,6 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1 | |
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]] | ||
; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64() | ||
; CHECK-NEXT: [[TMP6:%.*]] = mul i64 [[TMP5]], 8 | ||
; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[DST]], i64 [[N_VEC]] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (many test changes, yet to review) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The above branch-on-false from entry to scalar preheader or vector preheader can/should also be eliminated, turning the scalar loop into unreachable dead code? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, more to clean up :) |
||
; CHECK-NEXT: [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 0, i64 8) | ||
; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 8 x i64> poison, i64 [[VAL]], i64 0 | ||
; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 8 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer | ||
|
@@ -41,10 +40,10 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1 | |
; CHECK-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 [[INDEX_NEXT]], i64 8) | ||
; CHECK-NEXT: br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This branch-on-true from vector.body to middle.block or back to itself, can/should also be eliminated - as part of optimizing a vector loop found to have a trip-count of 1? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, optimizeForVFAndUF removes the region in some cases, but not yet with active-lane-masks. |
||
; CHECK: middle.block: | ||
; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]] | ||
; CHECK-NEXT: br label [[FOR_COND_CLEANUP:%.*]] | ||
; CHECK: scalar.ph: | ||
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ] | ||
; CHECK-NEXT: [[BC_RESUME_VAL1:%.*]] = phi ptr [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[DST]], [[ENTRY]] ] | ||
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ] | ||
; CHECK-NEXT: [[BC_RESUME_VAL1:%.*]] = phi ptr [ [[DST]], [[ENTRY]] ] | ||
; CHECK-NEXT: br label [[FOR_BODY:%.*]] | ||
; CHECK: for.body: | ||
; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ] | ||
|
@@ -100,7 +99,6 @@ define void @clamped_tc_max_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range | |
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]] | ||
; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64() | ||
; CHECK-NEXT: [[TMP6:%.*]] = mul i64 [[TMP5]], 8 | ||
; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[DST]], i64 [[N_VEC]] | ||
; CHECK-NEXT: [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 0, i64 [[WIDE_TRIP_COUNT]]) | ||
; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 8 x i64> poison, i64 [[VAL]], i64 0 | ||
; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 8 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer | ||
|
@@ -126,10 +124,10 @@ define void @clamped_tc_max_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range | |
; CHECK-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 [[INDEX_NEXT]], i64 [[WIDE_TRIP_COUNT]]) | ||
; CHECK-NEXT: br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]] | ||
; CHECK: middle.block: | ||
; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]], label [[SCALAR_PH]] | ||
; CHECK-NEXT: br label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] | ||
; CHECK: scalar.ph: | ||
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_PREHEADER]] ] | ||
; CHECK-NEXT: [[BC_RESUME_VAL1:%.*]] = phi ptr [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[DST]], [[FOR_BODY_PREHEADER]] ] | ||
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[FOR_BODY_PREHEADER]] ] | ||
; CHECK-NEXT: [[BC_RESUME_VAL1:%.*]] = phi ptr [ [[DST]], [[FOR_BODY_PREHEADER]] ] | ||
; CHECK-NEXT: br label [[FOR_BODY:%.*]] | ||
; CHECK: for.body: | ||
; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks