[VPlan] Unroll VPReplicateRecipe by VF. #142433
@@ -7291,6 +7291,7 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
  // cost model is complete for better cost estimates.
  VPlanTransforms::runPass(VPlanTransforms::unrollByUF, BestVPlan, BestUF,
                           OrigLoop->getHeader()->getContext());
  VPlanTransforms::runPass(VPlanTransforms::unrollByVF, BestVPlan, BestVF);

  VPlanTransforms::runPass(VPlanTransforms::materializeBroadcasts, BestVPlan);
  VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE);
  VPlanTransforms::simplifyRecipes(BestVPlan, *Legal->getWidestInductionType());
@@ -261,6 +261,14 @@ Value *VPTransformState::get(const VPValue *Def, const VPLane &Lane) {
    return Data.VPV2Scalars[Def][0];
  }

  // Look through BuildVector to avoid redundant extracts.
This is done now rather than later to reduce test diff?
Yep, without the fold we would have additional insert/extracts for uses inside replicate regions, with corresponding test changes.
  // TODO: Remove once replicate regions are unrolled explicitly.
  auto *BV = dyn_cast<VPInstruction>(Def);
- auto *BV = dyn_cast<VPInstruction>(Def);
+ auto *BuildV = dyn_cast<VPInstruction>(Def);
Done thanks!
Worth matching?
Matched using a matcher w/o ops, thanks.
Independent: missing error message.
@@ -907,6 +907,12 @@ class VPInstruction : public VPRecipeWithIRFlags,
    BranchOnCount,
    BranchOnCond,
    Broadcast,
    /// Creates a vector containing all operands. The vector element count
    /// matches the number of operands.
- /// Creates a vector containing all operands. The vector element count
- /// matches the number of operands.
+ /// Creates a vector containing all operands. The number of operands
+ /// matches the vector element count.
Done thanks
Same as above.
Also added fixed-width vectors here
- /// Creates a struct of vectors containing all operands. The vector element
- /// count matches the number of operands.
+ /// Creates a struct of vectors containing all operands. The number of operands
+ /// matches the number of fields in the struct.
?
updated thanks
Lex order has BuildVector after BuildStructVector?
reordered, thanks
@@ -107,6 +107,8 @@ Type *VPTypeAnalysis::inferScalarTypeForRecipe(const VPInstruction *R) {
  case VPInstruction::CalculateTripCountMinusVF:
  case VPInstruction::CanonicalIVIncrementForPart:
  case VPInstruction::AnyOf:
  case VPInstruction::BuildVector:
  case VPInstruction::BuildStructVector:
- case VPInstruction::BuildVector:
- case VPInstruction::BuildStructVector:
+ case VPInstruction::BuildStructVector:
+ case VPInstruction::BuildVector:
updated thanks
@@ -493,6 +493,9 @@ Value *VPInstruction::generate(VPTransformState &State) {
  }
  case Instruction::ExtractElement: {
    assert(State.VF.isVector() && "Only extract elements from vectors");
    return State.get(getOperand(0),
                     VPLane(cast<ConstantInt>(getOperand(1)->getLiveInIRValue())
                                ->getZExtValue()));
- return State.get(getOperand(0),
-                  VPLane(cast<ConstantInt>(getOperand(1)->getLiveInIRValue())
-                             ->getZExtValue()));
+ auto ElementToExtract = cast<ConstantInt>(getOperand(1)->getLiveInIRValue())->getZExtValue();
+ return State.get(getOperand(0), VPLane(ElementToExtract));
Done, thanks.
If second operand of ExtractElement must be a (small, less than VF) compile-time constant, would it be better held in VPIRFlags, given that other flags are unneeded?
It would be possible, although in theory the VF could be > 255, which would exceed VPIRFlags' char?
Agreed theoretically, but this deals with explicit unrolling by VF, which may be worth bounding to prevent code bloat, which may also impact performance. I.e., VFs larger than 255 could still be supported if only vectors are produced. (OTOH, legalization could also take place later...).
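To make the size concern concrete, here is a minimal sketch in plain C++. It does not use any LLVM types; `laneIndexFitsInByte` and `truncateToByte` are hypothetical helper names introduced only for illustration, and the assumption is that the lane index would be stored in a single 8-bit field. Lane indices run from 0 to VF - 1, so the sketch checks the largest index rather than VF itself.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of LLVM): returns true if every lane index
// 0 .. VF-1 of a fixed-width VF can be stored in an 8-bit field.
constexpr bool laneIndexFitsInByte(unsigned VF) {
  return VF != 0 && VF - 1 <= std::numeric_limits<std::uint8_t>::max();
}

// Hypothetical helper: shows what silently happens when a too-large lane
// index is narrowed to 8 bits (the value wraps modulo 256).
constexpr std::uint8_t truncateToByte(unsigned LaneIdx) {
  return static_cast<std::uint8_t>(LaneIdx);
}
```

Under these assumptions, a bound on the explicitly unrolled VF would have to be checked before narrowing, since the narrowing itself gives no diagnostic.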
Why are lines 499-501 not deleted, given that we've already returned on line 496?
Yep removed, thanks
Value *Vec = State.get(getOperand(0));
Value *Idx = State.get(getOperand(1), /*IsScalar=*/true);
return Builder.CreateExtractElement(Vec, Idx, Name);
unreachable?
Removed thanks
- Value *Res = PoisonValue::get(
-     toVectorizedTy(ScalarTy, ElementCount::getFixed(getNumOperands())));
+ auto NumOfElements = ElementCount::getFixed(getNumOperands());
+ Value *Res = PoisonValue::get(toVectorizedTy(ScalarTy, NumOfElements));
?
Done thanks
- auto *STy =
+ auto *StructTy =
Done thanks
- Value *Res = PoisonValue::get(
-     toVectorizedTy(STy, ElementCount::getFixed(getNumOperands())));
+ auto NumOfElements = ElementCount::getFixed(getNumOperands());
+ Value *Res = PoisonValue::get(toVectorizedTy(StructTy, NumOfElements));
done thanks
- for (unsigned I = 0, E = STy->getNumElements(); I != E; I++) {
+ for (unsigned I = 0; I < NumOfElements; I++) {
Done thanks
- case VPInstruction::BuildVector:
- case VPInstruction::BuildStructVector:
+ case VPInstruction::BuildStructVector:
+ case VPInstruction::BuildVector:
Done thanks
- case VPInstruction::BuildVector:
-   O << "buildvector";
-   break;
- case VPInstruction::BuildStructVector:
-   O << "buildstructvector";
-   break;
+ case VPInstruction::BuildStructVector:
+   O << "buildstructvector";
+   break;
+ case VPInstruction::BuildVector:
+   O << "buildvector";
+   break;
Done thanks
Handle simpler single-scalar case first and early-return, assert one of above two cases applies:
if (!State.Lane) {
assert(IsSingleScalar && "...");
scalarizeInstruction(UI, this, VPLane(0), State);
return;
}
// Generate a single instance.
...
?
Done thanks
@@ -1140,6 +1140,22 @@ static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) {
    return;
  }

  // Look through Extract(Last|Penultimate)Element (BuildVector ....).
  if (match(&R,
            m_VPInstruction<VPInstruction::ExtractLastElement>(m_VPValue(A))) ||
      match(&R, m_VPInstruction<VPInstruction::ExtractPenultimateElement>(
                    m_VPValue(A)))) {
    unsigned Offset = cast<VPInstruction>(&R)->getOpcode() ==
                              VPInstruction::ExtractLastElement
                          ? 1
                          : 2;
    auto *BV = dyn_cast<VPInstruction>(A);
    if (BV && BV->getOpcode() == VPInstruction::BuildVector) {
      Def->replaceAllUsesWith(BV->getOperand(BV->getNumOperands() - Offset));
      return;
    }
  }
- // Look through Extract(Last|Penultimate)Element (BuildVector ....).
- if (match(&R,
-           m_VPInstruction<VPInstruction::ExtractLastElement>(m_VPValue(A))) ||
-     match(&R, m_VPInstruction<VPInstruction::ExtractPenultimateElement>(
-                   m_VPValue(A)))) {
-   unsigned Offset = cast<VPInstruction>(&R)->getOpcode() ==
-                             VPInstruction::ExtractLastElement
-                         ? 1
-                         : 2;
-   auto *BV = dyn_cast<VPInstruction>(A);
-   if (BV && BV->getOpcode() == VPInstruction::BuildVector) {
-     Def->replaceAllUsesWith(BV->getOperand(BV->getNumOperands() - Offset));
-     return;
-   }
- }
+ // Look through ExtractLastElement (BuildVector ....).
+ if (match(&R, m_VPInstruction<VPInstruction::ExtractLastElement>(
+                   m_VPInstruction<VPInstruction::BuildVector>(BuildV)))) {
+   Def->replaceAllUsesWith(BuildV->getOperand(BuildV->getNumOperands() - 1));
+   return;
+ }
+ // Look through ExtractPenultimateElement (BuildVector ....).
+ if (match(&R, m_VPInstruction<VPInstruction::ExtractPenultimateElement>(
+                   m_VPInstruction<VPInstruction::BuildVector>(BuildV)))) {
+   Def->replaceAllUsesWith(BuildV->getOperand(BuildV->getNumOperands() - 2));
+   return;
+ }
Updated, thanks. The current version doesn't capture the BuildVector VPInstruction, though.
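For readers unfamiliar with the fold under discussion, the following is a toy sketch of the idea in plain C++. It deliberately avoids the real VPlan classes and pattern matchers: `ToyValue`, `Opcode`, and `lookThroughExtract` are hypothetical names made up for this illustration. The assumption is only that an extract of the last (or penultimate) element of a build-vector can be replaced by the corresponding build-vector operand.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins, NOT the real VPlan classes.
enum class Opcode {
  BuildVector,
  ExtractLastElement,
  ExtractPenultimateElement,
  Other
};

struct ToyValue {
  Opcode Op;
  std::vector<const ToyValue *> Operands;

  ToyValue(Opcode Op = Opcode::Other,
           std::vector<const ToyValue *> Operands = {})
      : Op(Op), Operands(Operands) {}
};

// If V is Extract(Last|Penultimate)Element of a BuildVector, return the
// BuildVector operand the extract would produce; otherwise return nullptr.
inline const ToyValue *lookThroughExtract(const ToyValue &V) {
  std::size_t Offset = 0;
  if (V.Op == Opcode::ExtractLastElement)
    Offset = 1;
  else if (V.Op == Opcode::ExtractPenultimateElement)
    Offset = 2;
  else
    return nullptr;

  if (V.Operands.size() != 1 || !V.Operands[0])
    return nullptr;
  const ToyValue *BuildV = V.Operands[0];
  // Only fold when the source is a BuildVector with enough operands.
  if (BuildV->Op != Opcode::BuildVector || BuildV->Operands.size() < Offset)
    return nullptr;
  return BuildV->Operands[BuildV->Operands.size() - Offset];
}
```

A caller would replace all uses of the extract with the returned operand when it is non-null, which mirrors the `replaceAllUsesWith` call in the hunk above.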
@@ -99,6 +99,10 @@ struct VPlanTransforms {
  /// Explicitly unroll \p Plan by \p UF.
  static void unrollByUF(VPlan &Plan, unsigned UF, LLVMContext &Ctx);

  /// Explicitly unroll VPReplicateRecipes outside of replicate regions by \p
  /// VF.
  static void unrollByVF(VPlan &Plan, ElementCount VF);

  /// Optimize \p Plan based on \p BestVF and \p BestUF. This may restrict the
  /// resulting plan to \p BestVF and \p BestUF.
  static void optimizeForVFAndUF(VPlan &Plan, ElementCount BestVF,
@@ -15,6 +15,7 @@
#include "VPlan.h"
#include "VPlanAnalysis.h"
#include "VPlanCFG.h"
#include "VPlanHelpers.h"
#include "VPlanPatternMatch.h"
#include "VPlanTransforms.h"
#include "VPlanUtils.h"

@@ -430,3 +431,83 @@ void VPlanTransforms::unrollByUF(VPlan &Plan, unsigned UF, LLVMContext &Ctx) {

  VPlanTransforms::removeDeadRecipes(Plan);
}

/// Create a single-scalar clone of RepR for lane \p Lane.
- /// Create a single-scalar clone of RepR for lane \p Lane.
+ /// Create a single-scalar clone of \p RepR for lane \p Lane.
done thanks
- Ext = Builder.createNaryOp(VPInstruction::ExtractLastElement, {Op});
+ Ext = Builder.createNaryOp(VPInstruction::ExtractLastElement, {Op});
+ NewOps.push_back(Ext);
+ continue;
done thanks
Could this be a VPInstruction?
That this failed to call transferFlags here, which resulted in miscompiles. I've created #147398 to fix.
- // Stores to invariant addresses only need to store the last lane.
+ // Stores to invariant addresses need to store the last lane only.
done thanks
So a pair of replicating recipes, one feeding the other, is replaced by VF recipes feeding a BuildVector, from which VF other recipes extract; the extracts are optimized away by cloneForLane(), and the BuildVector possibly by DCE?
Yep
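The exchange above can be illustrated with a small toy model in plain C++. It is only a sketch of the described shape of the transformation, not the actual `unrollByVF` implementation: `Node`, `unrollChainByVF`, and `hasUsers` are hypothetical names, recipes are identified by strings, and the extract-from-BuildVector step is assumed to be already folded so that each per-lane clone uses the matching per-lane clone of its producer directly.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy recipe: a name plus the names of the values it uses.
struct Node {
  std::string Name;
  std::vector<std::string> Ops;
};

// Unroll a chain of replicating recipes (each feeding the next) by VF.
// Every recipe becomes VF per-lane clones plus one "buildvector" node that
// gathers them. Clone i of a recipe uses clone i of its producer directly,
// modelling the extract having been folded away.
inline std::vector<Node> unrollChainByVF(const std::vector<std::string> &Chain,
                                         unsigned VF) {
  std::vector<Node> Out;
  for (std::size_t R = 0; R < Chain.size(); ++R) {
    Node BuildV{"buildvector." + Chain[R], {}};
    for (unsigned Lane = 0; Lane < VF; ++Lane) {
      Node Clone{Chain[R] + "." + std::to_string(Lane), {}};
      if (R > 0)
        Clone.Ops.push_back(Chain[R - 1] + "." + std::to_string(Lane));
      BuildV.Ops.push_back(Clone.Name);
      Out.push_back(Clone);
    }
    Out.push_back(BuildV);
  }
  return Out;
}

// A node with no users is a candidate for dead-code elimination.
inline bool hasUsers(const std::vector<Node> &Nodes, const std::string &Name) {
  for (const Node &N : Nodes)
    for (const std::string &Op : N.Ops)
      if (Op == Name)
        return true;
  return false;
}
```

In this model, unrolling the chain `a -> b` with VF = 2 leaves `buildvector.a` without users, which corresponds to the BuildVector that dead-code elimination could later remove.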
If ResTy is void (better check for empty users instead?) suffice to clone for lanes and erase from parent, w/o populating LaneDefs, handling all stores early following those to a single address.
Done thanks
Is it reasonable to unroll by VF before unrolling by UF rather than afterwards? BestVF is conceptually chosen before BestUF.
The only reason to run it after unrolling by UF is that the current position matches the order in which VPReplicateRecipes currently generate code, so there are fewer test changes.