-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[VPlan] Add VPInstruction to unpack vector values to scalars. #155670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 9 commits
a2a1372
b49efbc
719b51d
dc2df3a
39605b4
2f90433
26d2add
f617993
b4a34f9
14f2feb
133f50a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1064,6 +1064,11 @@ class LLVM_ABI_FOR_TEST VPInstruction : public VPRecipeWithIRFlags, | |
ResumeForEpilogue, | ||
/// Returns the value for vscale. | ||
VScale, | ||
/// Extracts all lanes from its (non-scalable) vector operand. This is an | ||
/// abstract VPInstruction whose single defined VPValue represents VF | ||
/// scalars extracted from a vector, to be replaced by VF ExtractElement | ||
/// VPInstructions. | ||
Unpack, | ||
|
||
}; | ||
|
||
/// Returns true if this VPInstruction generates scalar values for all lanes. | ||
|
@@ -2718,6 +2723,15 @@ class LLVM_ABI_FOR_TEST VPReductionRecipe : public VPRecipeWithIRFlags { | |
return R && classof(R); | ||
} | ||
|
||
static inline bool classof(const VPValue *VPV) { | ||
const VPRecipeBase *R = VPV->getDefiningRecipe(); | ||
return R && classof(R); | ||
} | ||
|
||
static inline bool classof(const VPSingleDefRecipe *R) { | ||
return classof(static_cast<const VPRecipeBase *>(R)); | ||
} | ||
|
||
Comment on lines
+2725
to
+2733
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, related to the introduction of unpack? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's needed for this change, due to the new cast/dyn_cast/isa with VPSingleDefRecipe |
||
/// Generate the reduction in the loop. | ||
void execute(VPTransformState &State) override; | ||
|
||
|
Original file line number | Diff line number | Diff line change | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -1223,6 +1223,13 @@ static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) { | |||||||||||||
return; | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
uint64_t Idx; | ||||||||||||||
if (match(&R, m_ExtractElement(m_BuildVector(), m_ConstantInt(Idx)))) { | ||||||||||||||
auto *BuildVector = cast<VPInstruction>(R.getOperand(0)); | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, I think it could, but would be inconsistent with other matchers, where the inner matcher match the operands. |
||||||||||||||
Def->replaceAllUsesWith(BuildVector->getOperand(Idx)); | ||||||||||||||
return; | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
if (auto *Phi = dyn_cast<VPPhi>(Def)) { | ||||||||||||||
if (Phi->getNumOperands() == 1) | ||||||||||||||
Phi->replaceAllUsesWith(Phi->getOperand(0)); | ||||||||||||||
|
@@ -3740,7 +3747,7 @@ void VPlanTransforms::materializeBackedgeTakenCount(VPlan &Plan, | |||||||||||||
BTC->replaceAllUsesWith(TCMO); | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
void VPlanTransforms::materializeBuildVectors(VPlan &Plan) { | ||||||||||||||
void VPlanTransforms::materializePacksAndUnpacks(VPlan &Plan) { | ||||||||||||||
if (Plan.hasScalarVFOnly()) | ||||||||||||||
return; | ||||||||||||||
|
||||||||||||||
|
@@ -3789,6 +3796,48 @@ void VPlanTransforms::materializeBuildVectors(VPlan &Plan) { | |||||||||||||
}); | ||||||||||||||
} | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
// Create explicit VPInstructions to convert vectors to scalars. | ||||||||||||||
for (VPBasicBlock *VPBB : VPBBsInsideLoopRegion) { | ||||||||||||||
for (VPRecipeBase &R : make_early_inc_range(*VPBB)) { | ||||||||||||||
if (isa<VPReplicateRecipe, VPInstruction, VPScalarIVStepsRecipe, | ||||||||||||||
VPDerivedIVRecipe, VPCanonicalIVPHIRecipe>(&R)) | ||||||||||||||
continue; | ||||||||||||||
for (VPValue *Def : R.definedValues()) { | ||||||||||||||
artagnon marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||
// Skip recipes that are single-scalar or only have their first lane | ||||||||||||||
// used. | ||||||||||||||
// TODO: The Defs skipped here may or may not be vector values. | ||||||||||||||
// Introduce Unpacks, and remove them later, if they are guaranteed to | ||||||||||||||
// produce scalar values. | ||||||||||||||
Comment on lines
+3810
to
+3812
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should this be placed above - next to "Create explicit...", to also capture skipping VPInstructions?
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Moved and adjusted, thanks |
||||||||||||||
if (vputils::isSingleScalar(Def) || vputils::onlyFirstLaneUsed(Def)) | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The onlyFirstLaneUsed case may require a single extract rather than full unpack. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, I added a comment that the recipes skipped here may or may not produce a vector eventually and a TODO to introduce unpacks for them as well, and remove them once we are guaranteed to produce scalars. For that reason, I left the code as is here for now. |
||||||||||||||
continue; | ||||||||||||||
|
||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Worth explaining how scalar users inside replicate regions (continue to) extract their values w/o going through an Unpack VPInstruction. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done and added TODO, thanks! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks! Better placed before the lambda? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should be moved, thanks |
||||||||||||||
// At the moment, we only create unpacks for scalar users outside | ||||||||||||||
|
// At the moment, we only create unpacks for scalar users outside | |
// At the moment, we create unpacks only for scalar users outside |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// replicate regions. Recipes inside replicate regions still manually | |
// extract the required lanes. | |
// replicate regions. Recipes inside replicate regions still extract the | |
// required lanes implicitly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated thanks
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TODO: Remove once replicate regions are | |
// unrolled completely. | |
// TODO: Remove once replicate regions are unrolled completely. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks
artagnon marked this conversation as resolved.
Show resolved
Hide resolved
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -326,8 +326,9 @@ struct VPlanTransforms { | |
VPBasicBlock *VectorPH); | ||
|
||
/// Add explicit Build[Struct]Vector recipes that combine multiple scalar | ||
/// values into single vectors. | ||
static void materializeBuildVectors(VPlan &Plan); | ||
/// values into single vectors and Unpack recipes to extract scalars from a | ||
/// vector as needed. | ||
|
||
static void materializePacksAndUnpacks(VPlan &Plan); | ||
|
||
/// Materialize VF and VFxUF to be computed explicitly using VPInstructions. | ||
static void materializeVFAndVFxUF(VPlan &Plan, VPBasicBlock *VectorPH, | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -466,10 +466,21 @@ void VPlanTransforms::unrollByUF(VPlan &Plan, unsigned UF) { | |
/// Create a single-scalar clone of \p DefR (must be a VPReplicateRecipe or | ||
/// VPInstruction) for lane \p Lane. Use \p Def2LaneDefs to look up scalar | ||
/// definitions for operands of \DefR. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Independent: "\p DefR" rather than "\DefR" |
||
static VPRecipeWithIRFlags * | ||
static VPValue * | ||
cloneForLane(VPlan &Plan, VPBuilder &Builder, Type *IdxTy, | ||
VPRecipeWithIRFlags *DefR, VPLane Lane, | ||
const DenseMap<VPValue *, SmallVector<VPValue *>> &Def2LaneDefs) { | ||
VPValue *Op; | ||
if (match(DefR, m_VPInstruction<VPInstruction::Unpack>(m_VPValue(Op)))) { | ||
auto LaneDefs = Def2LaneDefs.find(Op); | ||
if (LaneDefs != Def2LaneDefs.end()) | ||
return LaneDefs->second[Lane.getKnownLane()]; | ||
|
||
VPValue *Idx = | ||
Plan.getOrAddLiveIn(ConstantInt::get(IdxTy, Lane.getKnownLane())); | ||
return Builder.createNaryOp(Instruction::ExtractElement, {Op, Idx}); | ||
} | ||
|
||
// Collect the operands at Lane, creating extracts as needed. | ||
SmallVector<VPValue *> NewOps; | ||
for (VPValue *Op : DefR->operands()) { | ||
|
@@ -481,6 +492,10 @@ cloneForLane(VPlan &Plan, VPBuilder &Builder, Type *IdxTy, | |
continue; | ||
} | ||
if (Lane.getKind() == VPLane::Kind::ScalableLast) { | ||
// Look through mandatory Unpack. | ||
[[maybe_unused]] bool Matched = | ||
match(Op, m_VPInstruction<VPInstruction::Unpack>(m_VPValue(Op))); | ||
assert(Matched && "original op must have been Unpack"); | ||
NewOps.push_back( | ||
Builder.createNaryOp(VPInstruction::ExtractLastElement, {Op})); | ||
continue; | ||
|
@@ -548,7 +563,8 @@ void VPlanTransforms::replicateByVF(VPlan &Plan, ElementCount VF) { | |
(isa<VPReplicateRecipe>(&R) && | ||
cast<VPReplicateRecipe>(&R)->isSingleScalar()) || | ||
(isa<VPInstruction>(&R) && | ||
!cast<VPInstruction>(&R)->doesGeneratePerAllLanes())) | ||
!cast<VPInstruction>(&R)->doesGeneratePerAllLanes() && | ||
cast<VPInstruction>(&R)->getOpcode() != VPInstruction::Unpack)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should Unpack indicate that it doesGeneratePerAllLanes? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It may not generate for all lanes at the moment, if only its first lane is used and all cases need to be handled by unrolling at the moment. |
||
continue; | ||
|
||
auto *DefR = cast<VPRecipeWithIRFlags>(&R); | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -205,25 +205,25 @@ define void @test_load_gep_widen_induction(ptr noalias %dst, ptr noalias %dst2) | |
; CHECK-NEXT: [[STEP_ADD_2:%.*]] = add <2 x i64> [[STEP_ADD]], splat (i64 2) | ||
; CHECK-NEXT: [[STEP_ADD_3:%.*]] = add <2 x i64> [[STEP_ADD_2]], splat (i64 2) | ||
; CHECK-NEXT: [[TMP0:%.*]] = getelementptr i128, ptr [[DST]], <2 x i64> [[VEC_IND]] | ||
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x ptr> [[TMP0]], i32 0 | ||
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x ptr> [[TMP0]], i32 1 | ||
; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i128, ptr [[DST]], <2 x i64> [[STEP_ADD]] | ||
; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x ptr> [[TMP1]], i32 0 | ||
; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x ptr> [[TMP1]], i32 1 | ||
; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i128, ptr [[DST]], <2 x i64> [[STEP_ADD_2]] | ||
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x ptr> [[TMP2]], i32 0 | ||
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x ptr> [[TMP2]], i32 1 | ||
; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i128, ptr [[DST]], <2 x i64> [[STEP_ADD_3]] | ||
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x ptr> [[TMP0]], i32 0 | ||
; CHECK-NEXT: store ptr null, ptr [[TMP4]], align 8 | ||
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x ptr> [[TMP0]], i32 1 | ||
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x ptr> [[TMP3]], i32 0 | ||
; CHECK-NEXT: [[TMP17:%.*]] = extractelement <2 x ptr> [[TMP3]], i32 1 | ||
; CHECK-NEXT: store ptr null, ptr [[TMP5]], align 8 | ||
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x ptr> [[TMP1]], i32 0 | ||
; CHECK-NEXT: store ptr null, ptr [[TMP6]], align 8 | ||
; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x ptr> [[TMP1]], i32 1 | ||
; CHECK-NEXT: store ptr null, ptr [[TMP7]], align 8 | ||
; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x ptr> [[TMP2]], i32 0 | ||
; CHECK-NEXT: store ptr null, ptr [[TMP8]], align 8 | ||
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x ptr> [[TMP2]], i32 1 | ||
; CHECK-NEXT: store ptr null, ptr [[TMP9]], align 8 | ||
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x ptr> [[TMP3]], i32 0 | ||
; CHECK-NEXT: store ptr null, ptr [[TMP10]], align 8 | ||
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x ptr> [[TMP3]], i32 1 | ||
; CHECK-NEXT: store ptr null, ptr [[TMP11]], align 8 | ||
; CHECK-NEXT: store ptr null, ptr [[TMP17]], align 8 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This store to TMP17 replaces the one removed above to TMP4? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, ok: this store to TMP17 replaces the one to TMP11, which is retained but TMP11 is redefined. Wonder if some simple "sink each ExtractElement to before its single/first user" pass could help confirm all test differences? |
||
; CHECK-NEXT: [[TMP12:%.*]] = getelementptr ptr, ptr [[DST2]], i64 [[OFFSET_IDX]] | ||
; CHECK-NEXT: [[TMP13:%.*]] = getelementptr ptr, ptr [[TMP12]], i32 2 | ||
; CHECK-NEXT: [[TMP14:%.*]] = getelementptr ptr, ptr [[TMP12]], i32 4 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment still relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, comment indicates that these transforms are still taking place during VPlan execution rather than planning. Perhaps worth rewording to clarify.