[VPlan] Materialize vector trip count using VPInstructions. #151925
Changes from 3 commits
a613fa4
db2fc58
4a8927a
f9c873f
e190397
96b818a
3ced2d8
3a9e56a
319ff1a
0db43c0
```diff
@@ -3153,7 +3153,7 @@ void VPlanTransforms::materializeBroadcasts(VPlan &Plan) {
   }
 }
 
-void VPlanTransforms::materializeVectorTripCount(
+void VPlanTransforms::materializeConstantVectorTripCount(
     VPlan &Plan, ElementCount BestVF, unsigned BestUF,
     PredicatedScalarEvolution &PSE) {
   assert(Plan.hasVF(BestVF) && "BestVF is not available in Plan");
```
```diff
@@ -3191,6 +3191,62 @@ void VPlanTransforms::materializeBackedgeTakenCount(VPlan &Plan,
   BTC->replaceAllUsesWith(TCMO);
 }
 
+void VPlanTransforms::materializeVectorTripCount(VPlan &Plan,
+                                                 VPBasicBlock *VectorPHVPBB,
+                                                 bool TailByMasking,
+                                                 bool RequiresScalarEpilogue) {
+  VPValue &VectorTC = Plan.getVectorTripCount();
+  if (VectorTC.getNumUsers() == 0 ||
+      (VectorTC.isLiveIn() && VectorTC.getLiveInIRValue()))
+    return;
+  VPValue *TC = Plan.getTripCount();
+  Type *TCTy = VPTypeAnalysis(Plan).inferScalarType(TC);
+  VPBuilder Builder(VectorPHVPBB, VectorPHVPBB->begin());
+
+  VPValue *Step = &Plan.getVFxUF();
+
+  // If the tail is to be folded by masking, round the number of iterations N
+  // up to a multiple of Step instead of rounding down. This is done by first
+  // adding Step-1 and then rounding down. Note that it's ok if this addition
+  // overflows: the vector induction variable will eventually wrap to zero given
+  // that it starts at zero and its Step is a power of two; the loop will then
+  // exit, with the last early-exit vector comparison also producing all-true.
+  // For scalable vectors the VF is not guaranteed to be a power of 2, but this
+  // is accounted for in emitIterationCountCheck that adds an overflow check.
+  if (TailByMasking) {
+    TC = Builder.createNaryOp(
+        Instruction::Add,
+        {TC, Builder.createNaryOp(
+                 Instruction::Sub,
+                 {Step, Plan.getOrAddLiveIn(ConstantInt::get(TCTy, 1))})},
+        DebugLoc::getUnknown(), "n.rnd.up");
+  }
```
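The overflow note in the comment above can be checked with a small scalar model. This is plain C++, not VPlan API: the helper names are hypothetical, an 8-bit trip-count type stands in for `TCTy` so that the `n.rnd.up` addition is easy to wrap, and the vector latch is assumed to exit once `index + Step` equals `n.vec`, mirroring a branch-on-count latch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit trip-count type (an assumption for illustration), chosen
// so that TC + (Step - 1) overflows for realistic-looking inputs.
using TripCountTy = std::uint8_t;

// Scalar model of the tail-folded vector trip count: round TC up to a multiple
// of Step by adding Step-1 (this may wrap), then round down via URem and Sub.
TripCountTy tailFoldedVectorTC(TripCountTy TC, TripCountTy Step) {
  assert(Step != 0 && (Step & (Step - 1)) == 0 && "Step must be a power of two");
  TripCountTy RndUp = static_cast<TripCountTy>(TC + (Step - 1)); // "n.rnd.up"
  TripCountTy Rem = static_cast<TripCountTy>(RndUp % Step);      // "n.mod.vf"
  return static_cast<TripCountTy>(RndUp - Rem);                  // "n.vec"
}

// Counts iterations of a vector loop whose induction variable starts at 0,
// advances by Step (wrapping in TripCountTy), and exits once the incremented
// value equals VectorTC.
unsigned countVectorIterations(TripCountTy VectorTC, TripCountTy Step) {
  TripCountTy Index = 0;
  unsigned Iterations = 0;
  do {
    ++Iterations;
    Index = static_cast<TripCountTy>(Index + Step);
  } while (Index != VectorTC);
  return Iterations;
}
```

With `TC = 250` and `Step = 8`, `n.rnd.up` wraps from 257 to 1 and `n.vec` becomes 0, yet the induction variable wraps back to 0 after 32 iterations, which is exactly ceil(250 / 8); the masked lanes of the final iteration cover the tail.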
```diff
+
+  // Now we need to generate the expression for the part of the loop that the
+  // vectorized body will execute. This is equal to N - (N % Step) if scalar
+  // iterations are not required for correctness, or N - Step, otherwise. Step
+  // is equal to the vectorization factor (number of SIMD elements) times the
+  // unroll factor (number of SIMD instructions).
+  VPValue *R = Builder.createNaryOp(Instruction::URem, {TC, Step},
+                                    DebugLoc::getUnknown(), "n.mod.vf");
```
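In the plain case (no tail folding, no required epilogue), the `URem` recipe above and the final `Sub` compute `n.vec = N - (N % Step)`. A minimal scalar sketch, using a hypothetical `vectorTripCount` helper and assuming 64-bit unsigned counts (the real type is whatever `TCTy` is inferred to be):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the number of iterations executed by the vector body
// when no scalar iterations are required, i.e. N rounded down to a multiple
// of Step (= VF * UF).
std::uint64_t vectorTripCount(std::uint64_t N, std::uint64_t Step) {
  assert(Step != 0 && "Step (VF * UF) must be non-zero");
  std::uint64_t Rem = N % Step; // "n.mod.vf"
  return N - Rem;               // "n.vec"
}
```

For example, with VF = 4 and UF = 2 (so `Step = 8`) and `N = 17`, the vector body covers 16 iterations and the scalar remainder loop runs the last one; with `N = 16` nothing is left for the scalar loop.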
```diff
+
+  // There are cases where we *must* run at least one iteration in the remainder
+  // loop. See the cost model for when this can happen. If the step evenly
+  // divides the trip count, we set the remainder to be equal to the step. If
+  // the step does not evenly divide the trip count, no adjustment is necessary
+  // since there will already be scalar iterations. Note that the minimum
+  // iterations check ensures that N >= Step.
+  if (RequiresScalarEpilogue) {
+    auto *IsZero = Builder.createICmp(
+        CmpInst::ICMP_EQ, R, Plan.getOrAddLiveIn(ConstantInt::get(TCTy, 0)));
+    R = Builder.createSelect(IsZero, Step, R);
```
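The compare-and-select above only changes the result when `Step` divides `N` exactly. A scalar sketch of that adjustment, with a hypothetical helper name and 64-bit counts assumed; the `N >= Step` precondition is taken from the comment about the minimum iterations check:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper modelling the RequiresScalarEpilogue path: if Step
// divides N, the remainder is bumped from 0 to Step, so the scalar loop still
// runs (Step iterations) instead of running none.
std::uint64_t vectorTripCountWithEpilogue(std::uint64_t N, std::uint64_t Step) {
  // Assumed to be guaranteed by the minimum iterations check.
  assert(Step != 0 && N >= Step);
  std::uint64_t Rem = N % Step; // "n.mod.vf"
  if (Rem == 0)                 // icmp eq R, 0 ; select IsZero, Step, R
    Rem = Step;
  return N - Rem;               // "n.vec"
}
```

For `N = 16` and `Step = 8` the vector body stops at 8, leaving 8 iterations for the scalar epilogue rather than none; for `N = 17` the result is unchanged (16), because one scalar iteration already remains.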
Review comment: This isn't a problem with your patch, but isn't there some odd behaviour here when tail-folding a loop that requires a scalar epilogue? Suppose TC = (VF * UF) + 1; for the tail-folding case we add … Perhaps we don't permit tail-folding when requiring a scalar epilogue?

Reply: Yes, they are mutually exclusive at the moment. Added an assert, though, in case that changes in the future.
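To make the question above concrete, the sketch below chains every step with both flags set, a combination the reply says is not currently produced. It is a scalar model with hypothetical names and 64-bit counts (wraparound is ignored), not the PR's code. One case where the combination misbehaves: for `N = 2 * Step`, rounding up and then down gives `n.vec == N`, so the supposedly required scalar epilogue gets no iterations.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of the recipes emitted by
// materializeVectorTripCount, evaluated on concrete numbers.
std::uint64_t modelVectorTripCount(std::uint64_t N, std::uint64_t Step,
                                   bool TailByMasking,
                                   bool RequiresScalarEpilogue) {
  assert(Step != 0 && "Step (VF * UF) must be non-zero");
  std::uint64_t TC = N;
  if (TailByMasking)
    TC = TC + (Step - 1);      // "n.rnd.up" (overflow ignored in this model)
  std::uint64_t R = TC % Step; // "n.mod.vf"
  if (RequiresScalarEpilogue && R == 0)
    R = Step;                  // select(R == 0, Step, R)
  return TC - R;               // "n.vec"
}

// Original iterations left for the scalar loop after the vector body
// (0 if the vector body already covers all N iterations).
std::uint64_t scalarIterationsLeft(std::uint64_t N, std::uint64_t VectorTC) {
  return VectorTC >= N ? 0 : N - VectorTC;
}
```

With only `RequiresScalarEpilogue`, `N = 16` and `Step = 8` leave 8 scalar iterations; adding tail folding makes `n.vec` equal 16 and leaves nothing for the epilogue.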
```diff
+  }
+
+  auto Res = Builder.createNaryOp(Instruction::Sub, {TC, R},
+                                  DebugLoc::getUnknown(), "n.vec");
+  Plan.getVectorTripCount().replaceAllUsesWith(Res);
+}
+
 /// Returns true if \p V is VPWidenLoadRecipe or VPInterleaveRecipe that can be
 /// converted to a narrower recipe. \p V is used by a wide recipe that feeds a
 /// store interleave group at index \p Idx, \p WideMember0 is the recipe feeding
```
|