-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[VPlan] Add VPInstruction to unpack vector values to scalars. #155670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 7 commits
a2a1372
b49efbc
719b51d
dc2df3a
39605b4
2f90433
26d2add
f617993
b4a34f9
14f2feb
133f50a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -1064,6 +1064,11 @@ class LLVM_ABI_FOR_TEST VPInstruction : public VPRecipeWithIRFlags, | |||||
ResumeForEpilogue, | ||||||
/// Returns the value for vscale. | ||||||
VScale, | ||||||
/// Extracts all lanes from its (non-scalable) vector operand. This is an | ||||||
/// abstract VPInstruction whose single defined VPValue represents VF | ||||||
/// scalars extracted from a vector, to be replaced by VF ExtractElement | ||||||
/// VPInstructions? | ||||||
|
/// VPInstructions? | |
/// VPInstructions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed ,thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, related to the introduction of unpack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's needed for this change, due to the new cast/dyn_cast/isa with VPSingleDefRecipe
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -514,6 +514,7 @@ unsigned VPInstruction::getNumOperandsForOpcode(unsigned Opcode) { | |
case VPInstruction::ExtractPenultimateElement: | ||
case VPInstruction::FirstActiveLane: | ||
case VPInstruction::Not: | ||
case VPInstruction::Unpack: | ||
return 1; | ||
case Instruction::ICmp: | ||
case Instruction::FCmp: | ||
|
@@ -1240,6 +1241,7 @@ bool VPInstruction::opcodeMayReadOrWriteFromMemory() const { | |
case VPInstruction::StepVector: | ||
case VPInstruction::ReductionStartVector: | ||
case VPInstruction::VScale: | ||
case VPInstruction::Unpack: | ||
return false; | ||
default: | ||
return true; | ||
|
@@ -1283,8 +1285,6 @@ bool VPInstruction::onlyFirstLaneUsed(const VPValue *Op) const { | |
return getNumOperands() > 1; | ||
case VPInstruction::PtrAdd: | ||
return Op == getOperand(0) || vputils::onlyFirstLaneUsed(this); | ||
case VPInstruction::WidePtrAdd: | ||
return Op == getOperand(0); | ||
Comment on lines
1287
to
1288
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unrelated? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The current code here doesn't match the codegen for WidePtrAdd which is exposed by this patch; it only uses the first lane, if the pointer is a single scalar. With this code, we would incorrectly insert an
Comment on lines
1287
to
1288
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How/Is this related? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The patch exposed an issue here, more details in https://github.com/llvm/llvm-project/pull/155670/files#r2347298616 above There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Defaulting to false seems a bit odd, as the first operand should be a scalar base. Worth a comment? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added a comment. It supports both vector and scalar base addresses. |
||
case VPInstruction::ComputeAnyOfResult: | ||
case VPInstruction::ComputeFindIVResult: | ||
return Op == getOperand(1); | ||
|
@@ -1408,6 +1408,9 @@ void VPInstruction::print(raw_ostream &O, const Twine &Indent, | |
case VPInstruction::ResumeForEpilogue: | ||
O << "resume-for-epilogue"; | ||
break; | ||
case VPInstruction::Unpack: | ||
O << "unpack"; | ||
break; | ||
default: | ||
O << Instruction::getOpcodeName(getOpcode()); | ||
} | ||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -1223,6 +1223,13 @@ static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) { | |||||
return; | ||||||
} | ||||||
|
||||||
uint64_t Idx; | ||||||
if (match(&R, m_ExtractElement(m_BuildVector(), m_ConstantInt(Idx)))) { | ||||||
auto *BuildVector = cast<VPInstruction>(R.getOperand(0)); | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, I think it could, but would be inconsistent with other matchers, where the inner matcher match the operands. |
||||||
Def->replaceAllUsesWith(BuildVector->getOperand(Idx)); | ||||||
return; | ||||||
} | ||||||
|
||||||
if (auto *Phi = dyn_cast<VPPhi>(Def)) { | ||||||
if (Phi->getNumOperands() == 1) | ||||||
Phi->replaceAllUsesWith(Phi->getOperand(0)); | ||||||
|
@@ -3740,7 +3747,7 @@ void VPlanTransforms::materializeBackedgeTakenCount(VPlan &Plan, | |||||
BTC->replaceAllUsesWith(TCMO); | ||||||
} | ||||||
|
||||||
void VPlanTransforms::materializeBuildVectors(VPlan &Plan) { | ||||||
void VPlanTransforms::materializePacksAndUnpacks(VPlan &Plan) { | ||||||
if (Plan.hasScalarVFOnly()) | ||||||
return; | ||||||
|
||||||
|
@@ -3789,6 +3796,42 @@ void VPlanTransforms::materializeBuildVectors(VPlan &Plan) { | |||||
}); | ||||||
} | ||||||
} | ||||||
|
||||||
// Create explicit VPInstructions to convert vectors to scalars. | ||||||
for (VPBasicBlock *VPBB : VPBBsInsideLoopRegion) { | ||||||
for (VPRecipeBase &R : make_early_inc_range(*VPBB)) { | ||||||
if (isa<VPReplicateRecipe, VPInstruction, VPScalarIVStepsRecipe>(&R)) | ||||||
continue; | ||||||
for (VPValue *Def : R.definedValues()) { | ||||||
artagnon marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
if (vputils::isSingleScalar(Def) || vputils::onlyFirstLaneUsed(Def)) | ||||||
|
||||||
continue; | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Worth explaining how scalar users inside replicate regions (continue to) extract their values w/o going through an Unpack VPInstruction. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done and added TODO, thanks! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks! Better placed before the lambda? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should be moved, thanks |
||||||
auto IsInsideReplicateRegion = [LoopRegion](VPUser *U) { | ||||||
VPRegionBlock *ParentRegion = | ||||||
cast<VPRecipeBase>(U)->getParent()->getParent(); | ||||||
return ParentRegion && ParentRegion != LoopRegion; | ||||||
|
||||||
}; | ||||||
// At the moment, we only create unpacks for scalar users outside | ||||||
// replicate regions. Recipes inside replicate regions still manually | ||||||
// extract the required lanes. TODO: Remove once replicate regions are | ||||||
// unrolled explicitly. | ||||||
|
// unrolled explicitly. | |
// unrolled completely. |
They are already unrolled explicitly by UF, but have yet to also be unrolled by VF.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, updated, thanks
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return !IsInsideReplicateRegion(U) && U->usesScalars(Def); | |
return U->usesScalars(Def) && !IsInsideReplicateRegion(U); |
sounds a bit cleaner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done thanks
artagnon marked this conversation as resolved.
Show resolved
Hide resolved
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
Worth defining an IsCandidateUnpackUser()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, merged IsInsideReplicateRegion
in new IsCandidateUnpackUser
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -326,8 +326,9 @@ struct VPlanTransforms { | |||||
VPBasicBlock *VectorPH); | ||||||
|
||||||
/// Add explicit Build[Struct]Vector recipes that combine multiple scalar | ||||||
/// values into single vectors. | ||||||
static void materializeBuildVectors(VPlan &Plan); | ||||||
/// values into single vectors and UnpackVector to extract scalars from a | ||||||
|
/// values into single vectors and UnpackVector to extract scalars from a | |
/// values into single vectors and Unpack recipes to extract scalars from a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -466,10 +466,21 @@ void VPlanTransforms::unrollByUF(VPlan &Plan, unsigned UF) { | |
/// Create a single-scalar clone of \p DefR (must be a VPReplicateRecipe or | ||
/// VPInstruction) for lane \p Lane. Use \p Def2LaneDefs to look up scalar | ||
/// definitions for operands of \DefR. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Independent: "\p DefR" rather than "\DefR" |
||
static VPRecipeWithIRFlags * | ||
static VPValue * | ||
cloneForLane(VPlan &Plan, VPBuilder &Builder, Type *IdxTy, | ||
VPRecipeWithIRFlags *DefR, VPLane Lane, | ||
const DenseMap<VPValue *, SmallVector<VPValue *>> &Def2LaneDefs) { | ||
VPValue *Op; | ||
if (match(DefR, m_VPInstruction<VPInstruction::Unpack>(m_VPValue(Op)))) { | ||
auto LaneDefs = Def2LaneDefs.find(Op); | ||
if (LaneDefs != Def2LaneDefs.end()) | ||
return LaneDefs->second[Lane.getKnownLane()]; | ||
|
||
VPValue *Idx = | ||
Plan.getOrAddLiveIn(ConstantInt::get(IdxTy, Lane.getKnownLane())); | ||
return Builder.createNaryOp(Instruction::ExtractElement, {Op, Idx}); | ||
} | ||
|
||
// Collect the operands at Lane, creating extracts as needed. | ||
SmallVector<VPValue *> NewOps; | ||
for (VPValue *Op : DefR->operands()) { | ||
|
@@ -481,6 +492,10 @@ cloneForLane(VPlan &Plan, VPBuilder &Builder, Type *IdxTy, | |
continue; | ||
} | ||
if (Lane.getKind() == VPLane::Kind::ScalableLast) { | ||
// Look through mandatory Unpack. | ||
[[maybe_unused]] bool Matched = | ||
match(Op, m_VPInstruction<VPInstruction::Unpack>(m_VPValue(Op))); | ||
assert(Matched && "original op must have been Unpack"); | ||
NewOps.push_back( | ||
Builder.createNaryOp(VPInstruction::ExtractLastElement, {Op})); | ||
continue; | ||
|
@@ -548,7 +563,8 @@ void VPlanTransforms::replicateByVF(VPlan &Plan, ElementCount VF) { | |
(isa<VPReplicateRecipe>(&R) && | ||
cast<VPReplicateRecipe>(&R)->isSingleScalar()) || | ||
(isa<VPInstruction>(&R) && | ||
!cast<VPInstruction>(&R)->doesGeneratePerAllLanes())) | ||
!cast<VPInstruction>(&R)->doesGeneratePerAllLanes() && | ||
cast<VPInstruction>(&R)->getOpcode() != VPInstruction::Unpack)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should Unpack indicate that it doesGeneratePerAllLanes? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It may not generate for all lanes at the moment, if only its first lane is used and all cases need to be handled by unrolling at the moment. |
||
continue; | ||
|
||
auto *DefR = cast<VPRecipeWithIRFlags>(&R); | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -205,25 +205,25 @@ define void @test_load_gep_widen_induction(ptr noalias %dst, ptr noalias %dst2) | |
; CHECK-NEXT: [[STEP_ADD_2:%.*]] = add <2 x i64> [[STEP_ADD]], splat (i64 2) | ||
; CHECK-NEXT: [[STEP_ADD_3:%.*]] = add <2 x i64> [[STEP_ADD_2]], splat (i64 2) | ||
; CHECK-NEXT: [[TMP0:%.*]] = getelementptr i128, ptr [[DST]], <2 x i64> [[VEC_IND]] | ||
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x ptr> [[TMP0]], i32 0 | ||
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x ptr> [[TMP0]], i32 1 | ||
; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i128, ptr [[DST]], <2 x i64> [[STEP_ADD]] | ||
; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x ptr> [[TMP1]], i32 0 | ||
; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x ptr> [[TMP1]], i32 1 | ||
; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i128, ptr [[DST]], <2 x i64> [[STEP_ADD_2]] | ||
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x ptr> [[TMP2]], i32 0 | ||
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x ptr> [[TMP2]], i32 1 | ||
; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i128, ptr [[DST]], <2 x i64> [[STEP_ADD_3]] | ||
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x ptr> [[TMP0]], i32 0 | ||
; CHECK-NEXT: store ptr null, ptr [[TMP4]], align 8 | ||
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x ptr> [[TMP0]], i32 1 | ||
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x ptr> [[TMP3]], i32 0 | ||
; CHECK-NEXT: [[TMP17:%.*]] = extractelement <2 x ptr> [[TMP3]], i32 1 | ||
; CHECK-NEXT: store ptr null, ptr [[TMP5]], align 8 | ||
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x ptr> [[TMP1]], i32 0 | ||
; CHECK-NEXT: store ptr null, ptr [[TMP6]], align 8 | ||
; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x ptr> [[TMP1]], i32 1 | ||
; CHECK-NEXT: store ptr null, ptr [[TMP7]], align 8 | ||
; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x ptr> [[TMP2]], i32 0 | ||
; CHECK-NEXT: store ptr null, ptr [[TMP8]], align 8 | ||
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x ptr> [[TMP2]], i32 1 | ||
; CHECK-NEXT: store ptr null, ptr [[TMP9]], align 8 | ||
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x ptr> [[TMP3]], i32 0 | ||
; CHECK-NEXT: store ptr null, ptr [[TMP10]], align 8 | ||
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x ptr> [[TMP3]], i32 1 | ||
; CHECK-NEXT: store ptr null, ptr [[TMP11]], align 8 | ||
; CHECK-NEXT: store ptr null, ptr [[TMP17]], align 8 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This store to TMP17 replaces the one removed above to TMP4? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, ok: this store to TMP17 replaces the one to TMP11, which is retained but TMP11 is redefined. Wonder if some simple "sink each ExtractElement to before its single/first user" pass could help confirm all test differences? |
||
; CHECK-NEXT: [[TMP12:%.*]] = getelementptr ptr, ptr [[DST2]], i64 [[OFFSET_IDX]] | ||
; CHECK-NEXT: [[TMP13:%.*]] = getelementptr ptr, ptr [[TMP12]], i32 2 | ||
; CHECK-NEXT: [[TMP14:%.*]] = getelementptr ptr, ptr [[TMP12]], i32 4 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment still relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, comment indicates that these transforms are still taking place during VPlan execution rather than planning. Perhaps worth rewording to clarify.