-
Notifications
You must be signed in to change notification settings - Fork 14.7k
[VPlan] Materialize VF and VFxUF using VPInstructions. #152879
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
59322c5
3367a1f
e26258b
929db2e
4aa2925
f03fdfb
1cc3267
a404171
dab0d23
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -276,6 +276,20 @@ class VPBuilder { | |||||
return tryInsertInstruction(new VPPhi(IncomingValues, DL, Name)); | ||||||
} | ||||||
|
||||||
VPValue *createElementCount(Type *Ty, ElementCount EC) { | ||||||
VPlan &Plan = *getInsertBlock()->getPlan(); | ||||||
VPValue *RuntimeEC = | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I left this as RuntimeEC, because we use this for VFxUF as well, where the ElementCount passed in is the original VF multiplied by UF. |
||||||
Plan.getOrAddLiveIn(ConstantInt::get(Ty, EC.getKnownMinValue())); | ||||||
if (EC.isScalable()) { | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Early exit simpler complementary case first? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. updated, thanks |
||||||
VPValue *VScale = createNaryOp(VPInstruction::VScale, {}, Ty); | ||||||
RuntimeEC = EC.getKnownMinValue() == 1 | ||||||
? VScale | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Early return VScale? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is now directly returned at the end, I left the ternary operator there for now. |
||||||
: createNaryOp(Instruction::Mul, {VScale, RuntimeEC}, | ||||||
VPIRFlags::WrapFlagsTy(true, false)); | ||||||
lukel97 marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
} | ||||||
return RuntimeEC; | ||||||
} | ||||||
|
||||||
/// Convert the input value \p Current to the corresponding value of an | ||||||
/// induction with \p Start and \p Step values, using \p Start + \p Current * | ||||||
/// \p Step. | ||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3340,6 +3340,33 @@ void VPlanTransforms::materializeVectorTripCount(VPlan &Plan, | |
VectorTC.replaceAllUsesWith(Res); | ||
} | ||
|
||
void VPlanTransforms::materializeVFAndVFxUF(VPlan &Plan, VPBasicBlock *VectorPH, | ||
ElementCount VFEC) { | ||
VPBuilder Builder(VectorPH, VectorPH->begin()); | ||
auto *TCTy = VPTypeAnalysis(Plan).inferScalarType(Plan.getTripCount()); | ||
lukel97 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
VPValue &VF = Plan.getVF(); | ||
VPValue &VFxUF = Plan.getVFxUF(); | ||
Comment on lines
+3343
to
+3344
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So Plan's getVF() and getVFxUF() become obsolete from this point? They will remain use-less, no longer retrieving the relevant values. Worth noting, or hooking them to their replacements? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, for now, there are no users of getVF and getVFxUF at after this point. It would be good to mark them as such, but I'm not sure what the best way would be. Another alternative would be manage them as actual users. They should also be region-specific. I'll prepare some follow-ups for that. |
||
if (VF.getNumUsers()) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Deal with the simpler complementary case first? Add comment why needed - distinct orders explained below. Can also guard handling VFxUF if it has users. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added comments, thanks! I've not added a use check for now, as this will easily be cleaned up by VPlan dce |
||
VPValue *RuntimeVF = Builder.createElementCount(TCTy, VFEC); | ||
if (any_of(VF.users(), [&VF](VPUser *U) { return !U->usesScalars(&VF); })) { | ||
auto *BC = Builder.createNaryOp(VPInstruction::Broadcast, {RuntimeVF}); | ||
lukel97 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
VF.replaceUsesWithIf( | ||
BC, [&VF](VPUser &U, unsigned) { return !U.usesScalars(&VF); }); | ||
} | ||
VF.replaceAllUsesWith(RuntimeVF); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How does this work, given you're undoing the VF.replaceUsesWithIf call above? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Vector uses will be replaced above with the broadcast, the remaining users get the scalar value. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh I see now, sorry! By the time we get to the second line the uses of VF that were replaced before are no longer present in the use list. |
||
|
||
VPValue *UF = Plan.getOrAddLiveIn(ConstantInt::get(TCTy, Plan.getUF())); | ||
auto *MulByUF = Plan.getUF() == 1 ? RuntimeVF | ||
lukel97 marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you not just fall through to the bottom of the function - it looks like we're just duplicating code here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure how, we if VF is used we first multiply VF * vscale and then multiply that by UF, wherease below we compute (VF * UF) * vscale There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps I'm missing something, but it looked like these two pieces of code were calculating the same thing:
and
The only difference seems to be that you're not using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hm, but they generate operations in slightly different order, the first does (VF * Vscale) * UF, whereas otherwise we do ((VF * UF) * vscale), the former will serve users of the runtime-vf. Both compute the same value, but in different order. The patch keeps the sequence the same as for IR. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, but I'm a bit surprised that the order of multiplication is crucial for correctness here. At the moment it just looks like the code is deliberately trying hard to maintain the order of multiplication whereas in reality if the order matters then something is horribly broken, right? I would expect the vectoriser to always choose VF,UF pairs such that (VF * VScale) * UF == (VF * UF) * VScale. If not, then I believe that we need to fix the vectoriser to ensure this cannot happen. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It’s not needed for correctness, as mentioned above they both compute the same value. but if there are no users of the runtime vf, the current code constant folds the UF*VF.getKnownMinValue() computation, which is not possibly if there are users for VF.getKnownMinValue()*vscale. one could argue that we could always create VFvscaleUF, and rely on recipe simplifications to fold the VFUF computations if there no users of VFvscale. This would certainly be possible once we have the computation explicit in VPlan, but the current patch just tries to port the existing computation, to avoid unrelated test changes. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK fair enough, thanks for the explanation. I do get the argument that you want to minimise test changes and I'm happy with that. I just wanted to understand the rationale behind it that's all and make sure there isn't a hidden bug or assumption here. |
||
: Builder.createNaryOp(Instruction::Mul, | ||
{RuntimeVF, UF}); | ||
lukel97 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
VFxUF.replaceAllUsesWith(MulByUF); | ||
return; | ||
} | ||
|
||
VPValue *RuntimeVFxUF = Builder.createElementCount(TCTy, VFEC * Plan.getUF()); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this guaranteed to always have users? I just wondered why we don't check that here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not in all cases, e.g. if the canonical IV and its increment can be completely be removed. If it won't get used, VPlan DCE can remove it. The reason we check for users of VF is that we compute things in slightly different order if we need to provide a value for the runtime VF There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No need for MulByUF in this case? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's done directly by multiplying (constant folding) VF * Plan.getUF(). |
||
VFxUF.replaceAllUsesWith(RuntimeVFxUF); | ||
} | ||
|
||
/// Returns true if \p V is VPWidenLoadRecipe or VPInterleaveRecipe that can be | ||
/// converted to a narrower recipe. \p V is used by a wide recipe that feeds a | ||
/// store interleave group at index \p Idx, \p WideMember0 is the recipe feeding | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds a bit odd to createElementCount given an ElementCount. There's probably a better name than "RuntimeVF", but that's already in use elsewhere, in any case should be clearly defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IRBuilder has an identical function also called
IRBuilderBase::CreateElementCount
, and SelectionDAG too withSelectionDAG::getElementCount
. I think we should be consistent here.It also doesn't necessarily need to generate a VF. An ElementCount could be e.g. VFxUF or another arbitrary quantity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left the name aligned with the IR Builder for now.