[VPlan] Unroll VPReplicateRecipe by VF. #142433
@@ -261,6 +261,13 @@ Value *VPTransformState::get(const VPValue *Def, const VPLane &Lane) {
    return Data.VPV2Scalars[Def][0];
  }

  // Look through BuildVector to avoid redundant extracts.
  // TODO: Remove once replicate regions are unrolled explicitly.
  if (Lane.getKind() == VPLane::Kind::First && match(Def, m_BuildVector())) {
    auto *BuildVector = cast<VPInstruction>(Def);
    return get(BuildVector->getOperand(Lane.getKnownLane()), true);
  }

  assert(hasVectorValue(Def));
Review comment: Independent: missing error message.

  auto *VecPart = Data.VPV2Vector[Def];
  if (!VecPart->getType()->isVectorTy()) {
@@ -907,6 +907,13 @@ class VPInstruction : public VPRecipeWithIRFlags,
    BranchOnCount,
    BranchOnCond,
    Broadcast,
    /// Creates a struct of fixed-width vectors containing all operands. The
    /// number of operands
    /// matches the number of fields in the struct.

Suggested change:
-    /// Creates a struct of fixed-width vectors containing all operands. The
-    /// number of operands
-    /// matches the number of fields in the struct.
+    /// Given operands of (the same) struct type, creates a struct of fixed-
+    /// width vectors each containing a struct field of all operands. The
+    /// number of operands matches the element count of every vector.
Review comment: (Strictly speaking, the vectors created contain the fields of the operands, rather than the complete operands, in contrast to BuildVector.)
Reply: Updated, thanks!
Review comment: Lex order has BuildVector after BuildStructVector?
Reply: Reordered, thanks.
@@ -221,6 +221,9 @@ struct Recipe_match {
    if ((!matchRecipeAndOpcode<RecipeTys>(R) && ...))
      return false;

    auto *VPI = dyn_cast<VPInstruction>(R);
    if (VPI && VPI->getOpcode() == VPInstruction::BuildVector)
      return true;

Suggested change:
-    auto *VPI = dyn_cast<VPInstruction>(R);
-    if (VPI && VPI->getOpcode() == VPInstruction::BuildVector)
-      return true;
+    // Finally match operands, except for BuildVector which is matched w/o checking its operands.
+    auto *VPI = dyn_cast<VPInstruction>(R);
+    if (VPI && VPI->getOpcode() == VPInstruction::BuildVector)
+      return true;
Outdated
Review comment: Better to check instead if Ops_t is empty and if so assert that R is BuildVector and early return, or set NumOperands to zero instead of R->getNumOperands() and continue?
Reply: Moved up to handle the case first, thanks.
Review comment: Independent nit: worth noting below that "Commutative" checks operands in reverse order, which works best for binary operations.
Suggested change:
-inline ZeroOpVPInstruction_match<VPInstruction::BuildVector> m_BuildVector() {
-  return ZeroOpVPInstruction_match<VPInstruction::BuildVector>();
-}
+/// BuildVector matches only its opcode, w/o matching its operands.
+inline ZeroOpVPInstruction_match<VPInstruction::BuildVector> m_BuildVector() {
+  return ZeroOpVPInstruction_match<VPInstruction::BuildVector>();
+}
Review comment: plus some explanation why - number of operands varies?
Reply: Done thanks
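A BuildVector presumably needs an opcode-only matcher because it has one operand per lane, so its operand count depends on VF and cannot be written down in a fixed-arity pattern. A minimal sketch of that idea with hypothetical names (`ToyInst`, `OpcodeOnlyMatch`, `UnaryMatch`, `matchBuildVector`, `matchExtractLast`); this is not the VPlanPatternMatch API, only a toy showing an opcode-only matcher composed inside a unary pattern, similar in spirit to `m_VPInstruction<VPInstruction::ExtractLastElement>(m_BuildVector())`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy IR, not VPlan: an opcode plus a variable-length operand list.
enum class Opcode { Add, BuildVector, ExtractLastElement };

struct ToyInst {
  Opcode Op;
  std::vector<const ToyInst *> Operands;
};

// Checks the opcode only and never inspects operands, so it accepts a
// BuildVector of any width.
template <Opcode Opc> struct OpcodeOnlyMatch {
  bool match(const ToyInst &I) const { return I.Op == Opc; }
};

// Checks Opc, exactly one operand, and that the operand satisfies Sub.
template <Opcode Opc, typename SubPattern> struct UnaryMatch {
  SubPattern Sub;
  bool match(const ToyInst &I) const {
    return I.Op == Opc && I.Operands.size() == 1 && Sub.match(*I.Operands[0]);
  }
};

inline OpcodeOnlyMatch<Opcode::BuildVector> matchBuildVector() { return {}; }

template <typename SubPattern>
UnaryMatch<Opcode::ExtractLastElement, SubPattern>
matchExtractLast(SubPattern Sub) {
  return {Sub};
}
```

With this design the same `matchBuildVector()` pattern accepts BuildVectors of width 2 or 4 alike, while the enclosing unary pattern still checks its own operand count.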
@@ -493,9 +493,9 @@ Value *VPInstruction::generate(VPTransformState &State) {
  }
  case Instruction::ExtractElement: {
    assert(State.VF.isVector() && "Only extract elements from vectors");
    Value *Vec = State.get(getOperand(0));
    Value *Idx = State.get(getOperand(1), /*IsScalar=*/true);
    return Builder.CreateExtractElement(Vec, Idx, Name);
    unsigned IdxToExtract =
        cast<ConstantInt>(getOperand(1)->getLiveInIRValue())->getZExtValue();
    return State.get(getOperand(0), VPLane(IdxToExtract));
  }
  case Instruction::Freeze: {
    Value *Op = State.get(getOperand(0), vputils::onlyFirstLaneUsed(this));

@@ -604,6 +604,35 @@ Value *VPInstruction::generate(VPTransformState &State) {
    return Builder.CreateVectorSplat(
        State.VF, State.get(getOperand(0), /*IsScalar*/ true), "broadcast");
  }
  case VPInstruction::BuildStructVector: {
    // For struct types, we need to build a new 'wide' struct type, where each
    // element is widened.
    auto *StructTy =
        cast<StructType>(State.TypeAnalysis.inferScalarType(getOperand(0)));
    // One operand per lane: the operand count is the element count of every
    // vector field, which in general differs from the number of struct fields.
    auto NumOfElements = ElementCount::getFixed(getNumOperands());
    Value *Res = PoisonValue::get(toVectorizedTy(StructTy, NumOfElements));

    for (const auto &[LaneIdx, Op] : enumerate(operands())) {
      for (unsigned FieldIdx = 0; FieldIdx != StructTy->getNumElements();
           ++FieldIdx) {
        Value *ScalarValue =
            Builder.CreateExtractValue(State.get(Op, true), FieldIdx);
        Value *VectorValue = Builder.CreateExtractValue(Res, FieldIdx);
        VectorValue =
            Builder.CreateInsertElement(VectorValue, ScalarValue, LaneIdx);
        Res = Builder.CreateInsertValue(Res, VectorValue, FieldIdx);
      }
    }
    return Res;
  }
  case VPInstruction::BuildVector: {
    auto *ScalarTy = State.TypeAnalysis.inferScalarType(getOperand(0));
    auto NumOfElements = ElementCount::getFixed(getNumOperands());
    Value *Res = PoisonValue::get(toVectorizedTy(ScalarTy, NumOfElements));
    for (const auto &[Idx, Op] : enumerate(operands()))
      Res = State.Builder.CreateInsertElement(Res, State.get(Op, true),
                                              State.Builder.getInt32(Idx));
    return Res;
  }
  case VPInstruction::ReductionStartVector: {
    if (State.VF.isScalar())
      return State.get(getOperand(0), true);
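BuildStructVector transposes VF struct-typed operands into a single struct whose fields are VF-wide vectors. A small sketch of that data movement, under assumptions: `ToyStruct`, `ToyStructOfVectors` and `buildStructVector` are hypothetical names, a struct is modeled as `std::array<int, NumFields>`, and a vector as `std::vector<int>`. The outer loop walks the operands (lanes) and the inner loop walks the struct fields, so each field vector gets one element per operand. The two loop bounds are independent, for example 2 fields with VF = 4.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy types: a struct of NumFields ints, and its widened form,
// one vector per field.
template <std::size_t NumFields> using ToyStruct = std::array<int, NumFields>;
template <std::size_t NumFields>
using ToyStructOfVectors = std::array<std::vector<int>, NumFields>;

// Packs one struct per lane into a struct of per-field vectors.
template <std::size_t NumFields>
ToyStructOfVectors<NumFields>
buildStructVector(const std::vector<ToyStruct<NumFields>> &Operands) {
  ToyStructOfVectors<NumFields> Res;
  // Every field vector has one element per operand (lane); zero stands in
  // for the poison starting value.
  for (std::vector<int> &FieldVec : Res)
    FieldVec.assign(Operands.size(), 0);
  for (std::size_t Lane = 0; Lane != Operands.size(); ++Lane)
    for (std::size_t Field = 0; Field != NumFields; ++Field)
      Res[Field][Lane] = Operands[Lane][Field];
  return Res;
}
```

With four operands {1, 10}, {2, 20}, {3, 30}, {4, 40}, field 0 becomes {1, 2, 3, 4} and field 1 becomes {10, 20, 30, 40}.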
@@ -885,10 +914,11 @@ void VPInstruction::execute(VPTransformState &State) {
  if (!hasResult())
    return;
  assert(GeneratedValue && "generate must produce a value");
  assert(
      (GeneratedValue->getType()->isVectorTy() == !GeneratesPerFirstLaneOnly ||
       State.VF.isScalar()) &&
      "scalar value but not only first lane defined");
  assert((((GeneratedValue->getType()->isVectorTy() ||
            GeneratedValue->getType()->isStructTy()) ==
           !GeneratesPerFirstLaneOnly) ||
          State.VF.isScalar()) &&
         "scalar value but not only first lane defined");
  State.set(this, GeneratedValue,
            /*IsScalar*/ GeneratesPerFirstLaneOnly);
}

@@ -902,6 +932,8 @@ bool VPInstruction::opcodeMayReadOrWriteFromMemory() const {
  case Instruction::ICmp:
  case Instruction::Select:
  case VPInstruction::AnyOf:
  case VPInstruction::BuildStructVector:
  case VPInstruction::BuildVector:
  case VPInstruction::CalculateTripCountMinusVF:
  case VPInstruction::CanonicalIVIncrementForPart:
  case VPInstruction::ExtractLastElement:

@@ -1023,6 +1055,12 @@ void VPInstruction::print(raw_ostream &O, const Twine &Indent,
  case VPInstruction::Broadcast:
    O << "broadcast";
    break;
  case VPInstruction::BuildStructVector:
    O << "buildstructvector";
    break;
  case VPInstruction::BuildVector:
    O << "buildvector";
    break;
  case VPInstruction::ExtractLastElement:
    O << "extract-last-element";
    break;
@@ -2758,44 +2796,29 @@ static void scalarizeInstruction(const Instruction *Instr,

void VPReplicateRecipe::execute(VPTransformState &State) {
  Instruction *UI = getUnderlyingInstr();
  if (State.Lane) { // Generate a single instance.
    assert((State.VF.isScalar() || !isSingleScalar()) &&
           "uniform recipe shouldn't be predicated");
    assert(!State.VF.isScalable() && "Can't scalarize a scalable vector");
    scalarizeInstruction(UI, this, *State.Lane, State);
    // Insert scalar instance packing it into a vector.
    if (State.VF.isVector() && shouldPack()) {
      // If we're constructing lane 0, initialize to start from poison.
      if (State.Lane->isFirstLane()) {
        assert(!State.VF.isScalable() && "VF is assumed to be non scalable.");
        Value *Poison =
            PoisonValue::get(VectorType::get(UI->getType(), State.VF));
        State.set(this, Poison);
      }
      State.packScalarIntoVectorizedValue(this, *State.Lane);
    }
    return;
  }

  if (IsSingleScalar) {
    // Uniform within VL means we need to generate lane 0.
  if (!State.Lane) {
    assert(IsSingleScalar &&
           "VPReplicateRecipes outside replicate regions must be unrolled");

Suggested change:
-           "VPReplicateRecipes outside replicate regions must be unrolled");
+           "VPReplicateRecipes outside replicate regions must have already been unrolled");
Reply: Updated thanks
Review comment: Handle simpler single-scalar case first and early-return, assert one of above two cases applies:
  if (!State.Lane) {
    assert(IsSingleScalar && "...");
    scalarizeInstruction(UI, this, VPLane(0), State);
    return;
  }
  // Generate a single instance.
  ...
?
Reply: Done thanks
Suggested change:
-         "uniform recipe shouldn't be predicated");
+         "uniform recipe shouldn't be predicated");
+  assert(!State.VF.isScalable() && "Can't scalarize a scalable vector");
Review comment: retain the assert here?
Reply: Done thanks
@@ -1140,6 +1140,23 @@ static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) {
    return;
  }

  // Look through ExtractLastElement (BuildVector ....).
  if (match(&R, m_VPInstruction<VPInstruction::ExtractLastElement>(
                    m_BuildVector()))) {
    auto *BuildVector = cast<VPInstruction>(R.getOperand(0));
    Def->replaceAllUsesWith(
        BuildVector->getOperand(BuildVector->getNumOperands() - 1));
    return;
  }
  // Look through ExtractPenultimateElement (BuildVector ....).
Suggested change
Reply: done thanks
  if (match(&R, m_VPInstruction<VPInstruction::ExtractPenultimateElement>(
                    m_BuildVector()))) {
    auto *BuildVector = cast<VPInstruction>(R.getOperand(0));
    Def->replaceAllUsesWith(
        BuildVector->getOperand(BuildVector->getNumOperands() - 2));
    return;
  }

  // Some simplifications can only be applied after unrolling. Perform them
  // below.
  if (!Plan->isUnrolled())
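Both folds replace an extract of the last or penultimate lane of a BuildVector with the corresponding BuildVector operand, leaving the extract without users. A sketch of that index arithmetic with hypothetical names (`ToyBuildVector`, `ExtractKind`, `foldExtractOfBuildVector`); ints stand in for the scalar VPValues a BuildVector packs, and the toy adds an explicit bounds check for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical toy: a BuildVector is the list of its lane values, in lane
// order.
struct ToyBuildVector {
  std::vector<int> LaneValues;
};

enum class ExtractKind { Last, Penultimate };

// Returns the lane value an extract of BV would produce, so users of the
// extract can be rewired to it directly; std::nullopt if there is no such
// lane.
std::optional<int> foldExtractOfBuildVector(ExtractKind K,
                                            const ToyBuildVector &BV) {
  std::size_t N = BV.LaneValues.size();
  // The last lane is N - 1 and the penultimate lane is N - 2.
  std::size_t Offset = K == ExtractKind::Last ? 1 : 2;
  if (N < Offset)
    return std::nullopt;
  return BV.LaneValues[N - Offset];
}
```

For a four-lane BuildVector {100, 101, 102, 103}, the last-lane fold yields 103 and the penultimate fold yields 102.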
@@ -99,6 +99,11 @@ struct VPlanTransforms {
  /// Explicitly unroll \p Plan by \p UF.
  static void unrollByUF(VPlan &Plan, unsigned UF, LLVMContext &Ctx);

  /// Replace replicating VPReplicateRecipes outside replicate regions in \p
  /// Plan with \p VF single-scalar recipes.

Suggested change:
-  /// Replace replicating VPReplicateRecipes outside replicate regions in \p
-  /// Plan with \p VF single-scalar recipes.
+  /// Replace each VPReplicateRecipe outside of any replicate region in \p Plan
+  /// with \p VF single-scalar recipes.
Reply: Updated, thanks
Outdated
Suggested change:
-  /// TODO: Also unroll VPReplicateRegions by VF.
+  /// TODO: Also replicate VPReplicateRecipes inside replicate regions, thereby
+  /// dissolving the latter.
Reply: updated thanks
@@ -15,6 +15,7 @@
#include "VPlan.h"
#include "VPlanAnalysis.h"
#include "VPlanCFG.h"
#include "VPlanHelpers.h"
#include "VPlanPatternMatch.h"
#include "VPlanTransforms.h"
#include "VPlanUtils.h"
@@ -445,3 +446,83 @@ void VPlanTransforms::unrollByUF(VPlan &Plan, unsigned UF, LLVMContext &Ctx) {

  VPlanTransforms::removeDeadRecipes(Plan);
}

/// Create a single-scalar clone of \p RepR for lane \p Lane.
static VPReplicateRecipe *cloneForLane(VPlan &Plan, VPBuilder &Builder,
                                       Type *IdxTy, VPReplicateRecipe *RepR,
                                       VPLane Lane) {
  // Collect the operands at Lane, creating extracts as needed.
  SmallVector<VPValue *> NewOps;
  for (VPValue *Op : RepR->operands()) {
    if (vputils::isSingleScalar(Op)) {
      NewOps.push_back(Op);
      continue;
    }
    if (Lane.getKind() == VPLane::Kind::ScalableLast) {
      NewOps.push_back(
          Builder.createNaryOp(VPInstruction::ExtractLastElement, {Op}));
      continue;
    }
    // Look through buildvector to avoid unnecessary extracts.
    if (match(Op, m_BuildVector())) {
      NewOps.push_back(
          cast<VPInstruction>(Op)->getOperand(Lane.getKnownLane()));
      continue;
    }
    VPValue *Idx =
        Plan.getOrAddLiveIn(ConstantInt::get(IdxTy, Lane.getKnownLane()));
    VPValue *Ext = Builder.createNaryOp(Instruction::ExtractElement, {Op, Idx});
    NewOps.push_back(Ext);
  }

  auto *New =
      new VPReplicateRecipe(RepR->getUnderlyingInstr(), NewOps,
                            /*IsSingleScalar=*/true, /*Mask=*/nullptr, *RepR);
Review comment: Could this be a VPInstruction?
  New->insertBefore(RepR);
Review comment: This failed to call transferFlags here, which resulted in miscompiles. I've created #147398 to fix.
  return New;
}
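The operand selection in `cloneForLane` follows a fixed priority: single-scalar operands are reused, the scalable last lane uses ExtractLastElement, BuildVector operands are looked through, and anything else gets an ExtractElement with a constant lane index. A sketch of that decision procedure with hypothetical names (`ToyOperand`, `operandForLane`, `ScalableLastLane`), returning a textual description instead of creating recipes. It ignores IR flags entirely; as one of the comments on this hunk notes, the real clone must also carry over the original recipe's flags.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model of how a per-lane clone picks its operands.
struct ToyOperand {
  enum Kind { SingleScalar, BuildVector, Vector } K;
  std::string Name;
  std::vector<std::string> LaneNames; // BuildVector only: one scalar per lane.
};

// Hypothetical sentinel standing in for VPLane::Kind::ScalableLast.
constexpr int ScalableLastLane = -1;

// Describes the value the clone for Lane would use in place of Op.
std::string operandForLane(const ToyOperand &Op, int Lane) {
  // A single scalar is the same for every lane: reuse it.
  if (Op.K == ToyOperand::SingleScalar)
    return Op.Name;
  // The last lane of a scalable vector has no compile-time index.
  if (Lane == ScalableLastLane)
    return "extract-last-element(" + Op.Name + ")";
  assert(Lane >= 0 && "expected a known lane index");
  // Look through a BuildVector: take the operand for this lane directly.
  if (Op.K == ToyOperand::BuildVector)
    return Op.LaneNames.at(static_cast<std::size_t>(Lane));
  // Otherwise extract the lane with a constant index.
  return "extractelement(" + Op.Name + ", " + std::to_string(Lane) + ")";
}
```

For a BuildVector of `%a0` and `%a1`, the lane-1 clone reads `%a1` directly, while a plain vector `%v` needs `extractelement(%v, 1)`.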

void VPlanTransforms::replicateByVF(VPlan &Plan, ElementCount VF) {
  Type *IdxTy = IntegerType::get(
      Plan.getScalarHeader()->getIRBasicBlock()->getContext(), 32);
  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
           vp_depth_first_shallow(Plan.getVectorLoopRegion()->getEntry()))) {
    for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
      auto *RepR = dyn_cast<VPReplicateRecipe>(&R);
      if (!RepR || RepR->isSingleScalar())
        continue;

      VPBuilder Builder(RepR);
      SmallVector<VPValue *> LaneDefs;
      // Stores to invariant addresses need to store the last lane only.
      if (isa<StoreInst>(RepR->getUnderlyingInstr()) &&
          vputils::isSingleScalar(RepR->getOperand(1))) {
        cloneForLane(Plan, Builder, IdxTy, RepR, VPLane::getLastLaneForVF(VF));
        RepR->eraseFromParent();
        continue;
      }

      // Create single-scalar version of RepR for all lanes.
      for (unsigned I = 0; I != VF.getKnownMinValue(); ++I)
        LaneDefs.push_back(cloneForLane(Plan, Builder, IdxTy, RepR, VPLane(I)));

      // Users that only demand the first lane can use the definition for lane
      // 0.
      RepR->replaceUsesWithIf(LaneDefs[0], [RepR](VPUser &U, unsigned) {
        return U.onlyFirstLaneUsed(RepR);
      });

      Type *ResTy = RepR->getUnderlyingInstr()->getType();
      // If needed, create a Build(Struct)Vector recipe to insert the scalar
      // lane values into a vector.
Review comment (on lines +526 to +527): So a pair of replicating recipes, one feeding the other, is replaced by VF recipes feeding a BuildVector, from which VF other recipes extract, where the extracts are optimized away by cloneForLane() and the BuildVector possibly by DCE?
Reply: Yep
      if (!ResTy->isVoidTy()) {
        VPValue *VecRes = Builder.createNaryOp(
            ResTy->isStructTy() ? VPInstruction::BuildStructVector
                                : VPInstruction::BuildVector,
            LaneDefs);
        RepR->replaceAllUsesWith(VecRes);
      }
      RepR->eraseFromParent();
    }
  }
}
Review comment: This is done now rather than later to reduce test diff?
Reply: Yep, without the fold we would have additional inserts/extracts for uses inside replicate regions, with corresponding test changes.
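The per-recipe effect of `replicateByVF` can be summarized in a toy model. Assumptions: `ToyRecipe`, `UnrollResult`, `toyReplicateByVF` and `valueForUser` are hypothetical names, clones are represented by names only, and VF is a fixed width. A store to an invariant address keeps only the last-lane clone; otherwise one clone per lane is created, first-lane-only users are rewired to the lane-0 clone, and value-producing recipes get a packed BuildVector for the remaining users (BuildStructVector for struct results, not modeled here).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy description of a replicating recipe.
struct ToyRecipe {
  std::string Name;
  bool IsStoreToInvariantAddress = false;
  bool HasResult = true;
};

struct UnrollResult {
  std::vector<std::string> Clones; // One single-scalar clone per emitted lane.
  bool NeedsBuildVector = false;   // Whether lane results are packed.
};

// Toy counterpart of unrolling one recipe by a fixed VF.
UnrollResult toyReplicateByVF(const ToyRecipe &R, unsigned VF) {
  assert(VF > 0 && "VF must be positive");
  UnrollResult Res;
  if (R.IsStoreToInvariantAddress) {
    // Only the store of the last lane is observable.
    Res.Clones.push_back(R.Name + ".lane" + std::to_string(VF - 1));
    return Res;
  }
  for (unsigned I = 0; I != VF; ++I)
    Res.Clones.push_back(R.Name + ".lane" + std::to_string(I));
  Res.NeedsBuildVector = R.HasResult;
  return Res;
}

// First-lane-only users read the lane-0 clone; other users read the packed
// BuildVector.
std::string valueForUser(const UnrollResult &U, bool OnlyFirstLaneUsed) {
  assert(U.NeedsBuildVector && "recipe without a packed result has no users");
  return OnlyFirstLaneUsed ? U.Clones.front() : "buildvector";
}
```

When one replicating recipe feeds another, the consumer's lane-i clone reads the producer's lane-i clone through the BuildVector look-through, so the BuildVector can end up without users and be removed as dead code, as discussed in the review thread.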