-
Notifications
You must be signed in to change notification settings - Fork 14.7k
[VPlan] Fix header masks in EVL tail folding #150202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
d76cd7b
5222d52
aa6a87c
6c79176
17c69e9
f25895f
1d4f20a
9b4d6fe
e7c2ef5
72d525f
a8b6194
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2235,6 +2235,10 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) { | |
} | ||
|
||
// Try to optimize header mask recipes away to their EVL variants. | ||
// | ||
// TODO: Split this out and move into VPlanTransforms::optimize. | ||
// transformRecipestoEVLRecipes should be run in tryToBuildVPlanWithVPRecipes | ||
// beforehand. | ||
for (VPValue *HeaderMask : collectAllHeaderMasks(Plan)) { | ||
for (VPUser *U : collectUsersRecursively(HeaderMask)) { | ||
auto *CurRecipe = cast<VPRecipeBase>(U); | ||
|
@@ -2265,6 +2269,27 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) { | |
for (VPValue *Op : PossiblyDead) | ||
recursivelyDeleteDeadRecipes(Op); | ||
} | ||
|
||
// Replace header masks with a mask equivalent to predicating by EVL: | ||
// | ||
// icmp ule widen-canonical-iv backedge-taken-count | ||
// -> | ||
// icmp ult step-vector, EVL | ||
Type *EVLType = TypeInfo.inferScalarType(&EVL); | ||
for (VPValue *HeaderMask : collectAllHeaderMasks(Plan)) { | ||
if (HeaderMask->users().empty()) | ||
continue; | ||
VPRecipeBase *EVLR = EVL.getDefiningRecipe(); | ||
VPBuilder Builder(Plan.getVectorPreheader()); | ||
VPValue *StepVector = | ||
Builder.createNaryOp(VPInstruction::StepVector, {}, EVLType); | ||
Builder.setInsertPoint(EVLR->getParent(), std::next(EVLR->getIterator())); | ||
VPValue *EVLMask = Builder.createICmp( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can we use the same insert point and leave this for LICM? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I originally left it for LICM, but then I realised that this happens after VPlanTransforms::optimize so we won't get VPlan LICM and it'll affect the cost model. But now I see that VPInstruction::StepVector doesn't actually have a cost so I guess it should be ok. |
||
CmpInst::ICMP_ULT, StepVector, | ||
Builder.createNaryOp(VPInstruction::Broadcast, {&EVL})); | ||
lukel97 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
HeaderMask->replaceAllUsesWith(EVLMask); | ||
HeaderMask->getDefiningRecipe()->eraseFromParent(); | ||
} | ||
} | ||
|
||
/// Add a VPEVLBasedIVPHIRecipe and related recipes to \p Plan and | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is needed for correctness, right? Should it not happen before the loop above?
Also, inferring type after removing recipes? Should this happen before
for (VPRecipeBase *R : reverse(ToErase)) {
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should, but we need to rework that optimizeMaskToEVL loop to detect EVL style header masks. I was hoping to do that in a separate NFC patch, but can include it in this one if you'd like.
Yes good catch, I had pulled EVLType out of the loop so it wouldn't be invalidated when calling
HeaderMask->getDefiningRecipe()->eraseFromParent()
, but forgot about the erasing above :)Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fhahn I remember we discussed this issue a long time ago: #90184 (comment)
It doesn’t seem like a correctness issue — unless I missed some cases?
Anyway, I noticed that some test case did get fewer IR, so this seems like a good direction overall, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like that comment was discussing it in the context of reductions.
For any trapping operation, e.g. a load or store, it will be incorrect if we end up with the wrong mask. Which wasn't an issue up until now because we happen to convert loads and stores to vp intrinsics. But if for some reason a load/store slips through optimizeMaskToEVL, then it will be incorrect. I.e. in #150074