-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[VPlan] Add VPInstruction to unpack vector values to scalars. #155670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-backend-risc-v Author: Florian Hahn (fhahn) ChangesAdd a new Unpack VPInstruction (name to be improved) to explicitly Test changes are movements of the extracts: they are no generated Depends on #155102 (included in Patch is 436.83 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155670.diff 71 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 1438dc366b55d..1ada5e413bd9e 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -333,6 +333,9 @@ Value *VPTransformState::get(const VPValue *Def, bool NeedsScalar) {
LastLane = 0;
}
+ assert(IsSingleScalar && "must be a single-scalar at this point");
+ // We need to construct the vector value for a single-scalar value by
+ // broadcasting the scalar to all lanes.
auto *LastInst = cast<Instruction>(get(Def, LastLane));
// Set the insert point after the last scalarized instruction or after the
// last PHI, if LastInst is a PHI. This ensures the insertelement sequence
@@ -343,27 +346,8 @@ Value *VPTransformState::get(const VPValue *Def, bool NeedsScalar) {
: std::next(BasicBlock::iterator(LastInst));
Builder.SetInsertPoint(&*NewIP);
- // However, if we are vectorizing, we need to construct the vector values.
- // If the value is known to be uniform after vectorization, we can just
- // broadcast the scalar value corresponding to lane zero. Otherwise, we
- // construct the vector values using insertelement instructions. Since the
- // resulting vectors are stored in State, we will only generate the
- // insertelements once.
- Value *VectorValue = nullptr;
- if (IsSingleScalar) {
- VectorValue = GetBroadcastInstrs(ScalarValue);
- set(Def, VectorValue);
- } else {
- assert(!VF.isScalable() && "VF is assumed to be non scalable.");
- assert(isa<VPInstruction>(Def) &&
- "Explicit BuildVector recipes must have"
- "handled packing for non-VPInstructions.");
- // Initialize packing with insertelements to start from poison.
- VectorValue = PoisonValue::get(toVectorizedTy(LastInst->getType(), VF));
- for (unsigned Lane = 0; Lane < VF.getFixedValue(); ++Lane)
- VectorValue = packScalarIntoVectorizedValue(Def, VectorValue, Lane);
- set(Def, VectorValue);
- }
+ Value *VectorValue = GetBroadcastInstrs(ScalarValue);
+ set(Def, VectorValue);
Builder.restoreIP(OldIP);
return VectorValue;
}
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 33bcb49b81740..b0ada25c6fe67 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -900,6 +900,8 @@ struct VPRecipeWithIRFlags : public VPSingleDefRecipe, public VPIRFlags {
return R && classof(R);
}
+ virtual VPRecipeWithIRFlags *clone() override = 0;
+
void execute(VPTransformState &State) override = 0;
/// Compute the cost for this recipe for \p VF, using \p Opcode and \p Ctx.
@@ -1042,15 +1044,9 @@ class LLVM_ABI_FOR_TEST VPInstruction : public VPRecipeWithIRFlags,
ResumeForEpilogue,
/// Returns the value for vscale.
VScale,
+ Unpack,
};
-private:
- typedef unsigned char OpcodeTy;
- OpcodeTy Opcode;
-
- /// An optional name that can be used for the generated IR instruction.
- const std::string Name;
-
/// Returns true if this VPInstruction generates scalar values for all lanes.
/// Most VPInstructions generate a single value per part, either vector or
/// scalar. VPReplicateRecipe takes care of generating multiple (scalar)
@@ -1059,6 +1055,13 @@ class LLVM_ABI_FOR_TEST VPInstruction : public VPRecipeWithIRFlags,
/// underlying ingredient.
bool doesGeneratePerAllLanes() const;
+private:
+ typedef unsigned char OpcodeTy;
+ OpcodeTy Opcode;
+
+ /// An optional name that can be used for the generated IR instruction.
+ const std::string Name;
+
/// Returns true if we can generate a scalar for the first lane only if
/// needed.
bool canGenerateScalarForFirstLane() const;
@@ -1068,11 +1071,6 @@ class LLVM_ABI_FOR_TEST VPInstruction : public VPRecipeWithIRFlags,
/// existing value is returned rather than a generated one.
Value *generate(VPTransformState &State);
- /// Utility methods serving execute(): generates a scalar single instance of
- /// the modeled instruction for a given lane. \returns the scalar generated
- /// value for lane \p Lane.
- Value *generatePerLane(VPTransformState &State, const VPLane &Lane);
-
#if !defined(NDEBUG)
/// Return the number of operands determined by the opcode of the
/// VPInstruction. Returns -1u if the number of operands cannot be determined
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 747c6623aa22a..75d23f5103503 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -109,6 +109,7 @@ Type *VPTypeAnalysis::inferScalarTypeForRecipe(const VPInstruction *R) {
case VPInstruction::AnyOf:
case VPInstruction::BuildStructVector:
case VPInstruction::BuildVector:
+ case VPInstruction::Unpack:
return SetResultTyFromOp();
case VPInstruction::ExtractLane:
return inferScalarType(R->getOperand(1));
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 86834ab1240c1..cf6e2360d76bd 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -467,6 +467,7 @@ unsigned VPInstruction::getNumOperandsForOpcode(unsigned Opcode) {
case VPInstruction::ExtractPenultimateElement:
case VPInstruction::FirstActiveLane:
case VPInstruction::Not:
+ case VPInstruction::Unpack:
return 1;
case Instruction::ICmp:
case Instruction::FCmp:
@@ -525,16 +526,6 @@ bool VPInstruction::canGenerateScalarForFirstLane() const {
}
}
-Value *VPInstruction::generatePerLane(VPTransformState &State,
- const VPLane &Lane) {
- IRBuilderBase &Builder = State.Builder;
-
- assert(getOpcode() == VPInstruction::PtrAdd &&
- "only PtrAdd opcodes are supported for now");
- return Builder.CreatePtrAdd(State.get(getOperand(0), Lane),
- State.get(getOperand(1), Lane), Name);
-}
-
/// Create a conditional branch using \p Cond branching to the successors of \p
/// VPBB. Note that the first successor is always forward (i.e. not created yet)
/// while the second successor may already have been created (if it is a header
@@ -1154,24 +1145,13 @@ void VPInstruction::execute(VPTransformState &State) {
"Set flags not supported for the provided opcode");
if (hasFastMathFlags())
State.Builder.setFastMathFlags(getFastMathFlags());
- bool GeneratesPerFirstLaneOnly = canGenerateScalarForFirstLane() &&
- (vputils::onlyFirstLaneUsed(this) ||
- isVectorToScalar() || isSingleScalar());
- bool GeneratesPerAllLanes = doesGeneratePerAllLanes();
- if (GeneratesPerAllLanes) {
- for (unsigned Lane = 0, NumLanes = State.VF.getFixedValue();
- Lane != NumLanes; ++Lane) {
- Value *GeneratedValue = generatePerLane(State, VPLane(Lane));
- assert(GeneratedValue && "generatePerLane must produce a value");
- State.set(this, GeneratedValue, VPLane(Lane));
- }
- return;
- }
-
Value *GeneratedValue = generate(State);
if (!hasResult())
return;
assert(GeneratedValue && "generate must produce a value");
+ bool GeneratesPerFirstLaneOnly = canGenerateScalarForFirstLane() &&
+ (vputils::onlyFirstLaneUsed(this) ||
+ isVectorToScalar() || isSingleScalar());
assert((((GeneratedValue->getType()->isVectorTy() ||
GeneratedValue->getType()->isStructTy()) ==
!GeneratesPerFirstLaneOnly) ||
@@ -1209,6 +1189,7 @@ bool VPInstruction::opcodeMayReadOrWriteFromMemory() const {
case VPInstruction::StepVector:
case VPInstruction::ReductionStartVector:
case VPInstruction::VScale:
+ case VPInstruction::Unpack:
return false;
default:
return true;
@@ -1244,10 +1225,11 @@ bool VPInstruction::onlyFirstLaneUsed(const VPValue *Op) const {
case VPInstruction::Broadcast:
case VPInstruction::ReductionStartVector:
return true;
+ case VPInstruction::BuildStructVector:
+ case VPInstruction::BuildVector:
+ return getNumOperands() > 1;
case VPInstruction::PtrAdd:
return Op == getOperand(0) || vputils::onlyFirstLaneUsed(this);
- case VPInstruction::WidePtrAdd:
- return Op == getOperand(0);
case VPInstruction::ComputeAnyOfResult:
case VPInstruction::ComputeFindIVResult:
return Op == getOperand(1);
@@ -1371,6 +1353,9 @@ void VPInstruction::print(raw_ostream &O, const Twine &Indent,
case VPInstruction::ResumeForEpilogue:
O << "resume-for-epilogue";
break;
+ case VPInstruction::Unpack:
+ O << "unpack-into-scalars";
+ break;
default:
O << Instruction::getOpcodeName(getOpcode());
}
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index d32d2a9ad11f7..110f127d0d741 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1189,6 +1189,15 @@ static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) {
return;
}
+ VPValue *Idx;
+ if (match(&R, m_VPInstruction<Instruction::ExtractElement>(m_BuildVector(),
+ m_VPValue(Idx)))) {
+ auto *BuildVector = cast<VPInstruction>(R.getOperand(0));
+ Def->replaceAllUsesWith(BuildVector->getOperand(
+ dyn_cast<ConstantInt>(Idx->getLiveInIRValue())->getZExtValue()));
+ return;
+ }
+
if (auto *Phi = dyn_cast<VPPhi>(Def)) {
if (Phi->getNumOperands() == 1)
Phi->replaceAllUsesWith(Phi->getOperand(0));
@@ -3370,40 +3379,89 @@ void VPlanTransforms::materializeBuildVectors(VPlan &Plan) {
vp_depth_first_shallow(Plan.getEntry()));
auto VPBBsInsideLoopRegion = VPBlockUtils::blocksOnly<VPBasicBlock>(
vp_depth_first_shallow(LoopRegion->getEntry()));
- // Materialize Build(Struct)Vector for all replicating VPReplicateRecipes,
- // excluding ones in replicate regions. Those are not materialized explicitly
- // yet. Those vector users are still handled in VPReplicateRegion::execute(),
- // via shouldPack().
+ // Materialize Build(Struct)Vector for all replicating VPReplicateRecipes and
+ // VPInstructions, excluding ones in replicate regions. Those are not
+ // materialized explicitly yet. Those vector users are still handled in
+ // VPReplicateRegion::execute(), via shouldPack().
// TODO: materialize build vectors for replicating recipes in replicating
// regions.
// TODO: materialize build vectors for VPInstructions.
for (VPBasicBlock *VPBB :
concat<VPBasicBlock *>(VPBBsOutsideLoopRegion, VPBBsInsideLoopRegion)) {
for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
- auto *RepR = dyn_cast<VPReplicateRecipe>(&R);
- auto UsesVectorOrInsideReplicateRegion = [RepR, LoopRegion](VPUser *U) {
+ auto *DefR = dyn_cast<VPRecipeWithIRFlags>(&R);
+ if (!DefR)
+ continue;
+ if (!isa<VPReplicateRecipe, VPInstruction>(DefR))
+ continue;
+ auto UsesVectorOrInsideReplicateRegion = [DefR, LoopRegion](VPUser *U) {
VPRegionBlock *ParentRegion =
cast<VPRecipeBase>(U)->getParent()->getParent();
- return !U->usesScalars(RepR) || ParentRegion != LoopRegion;
+ return !U->usesScalars(DefR) || ParentRegion != LoopRegion;
};
- if (!RepR || RepR->isSingleScalar() ||
- none_of(RepR->users(), UsesVectorOrInsideReplicateRegion))
+ if ((isa<VPReplicateRecipe>(DefR) &&
+ cast<VPReplicateRecipe>(DefR)->isSingleScalar()) ||
+ (isa<VPInstruction>(DefR) &&
+ !cast<VPInstruction>(DefR)->doesGeneratePerAllLanes()) ||
+ vputils::onlyFirstLaneUsed(DefR) ||
+ none_of(DefR->users(), UsesVectorOrInsideReplicateRegion))
continue;
- Type *ScalarTy = TypeInfo.inferScalarType(RepR);
+ Type *ScalarTy = TypeInfo.inferScalarType(DefR);
unsigned Opcode = ScalarTy->isStructTy()
? VPInstruction::BuildStructVector
: VPInstruction::BuildVector;
- auto *BuildVector = new VPInstruction(Opcode, {RepR});
- BuildVector->insertAfter(RepR);
+ auto *BuildVector = new VPInstruction(Opcode, {DefR});
+ BuildVector->insertAfter(DefR);
- RepR->replaceUsesWithIf(
+ DefR->replaceUsesWithIf(
BuildVector, [BuildVector, &UsesVectorOrInsideReplicateRegion](
VPUser &U, unsigned) {
return &U != BuildVector && UsesVectorOrInsideReplicateRegion(&U);
});
}
}
+
+ // Create explicit VPInstructions to convert vectors to scalars.
+ for (VPBasicBlock *VPBB :
+ concat<VPBasicBlock *>(VPBBsOutsideLoopRegion, VPBBsInsideLoopRegion)) {
+ for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
+ if (isa<VPReplicateRecipe, VPInstruction, VPScalarIVStepsRecipe>(&R))
+ continue;
+ for (VPValue *Def : R.definedValues()) {
+ if (vputils::isSingleScalar(Def) || vputils::onlyFirstLaneUsed(Def))
+ continue;
+
+ if (VPBB->getParent() != Plan.getVectorLoopRegion())
+ continue;
+
+ auto UsesVectorOrInsideReplicateRegion = [LoopRegion](VPUser *U) {
+ VPRegionBlock *ParentRegion =
+ cast<VPRecipeBase>(U)->getParent()->getParent();
+ return ParentRegion && ParentRegion != LoopRegion;
+ };
+
+ if (none_of(Def->users(),
+ [Def, &UsesVectorOrInsideReplicateRegion](VPUser *U) {
+ return !UsesVectorOrInsideReplicateRegion(U) &&
+ U->usesScalars(Def);
+ }))
+ continue;
+
+ auto *Unpack = new VPInstruction(VPInstruction::Unpack, {Def});
+ if (R.isPhi())
+ Unpack->insertBefore(*VPBB, VPBB->getFirstNonPhi());
+ else
+ Unpack->insertAfter(&R);
+ Def->replaceUsesWithIf(
+ Unpack,
+ [Def, &UsesVectorOrInsideReplicateRegion](VPUser &U, unsigned) {
+ return !UsesVectorOrInsideReplicateRegion(&U) &&
+ U.usesScalars(Def);
+ });
+ }
+ }
+ }
}
void VPlanTransforms::materializeVectorTripCount(VPlan &Plan,
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index 9cf62a35ae36b..bd4b94fa034c4 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -158,8 +158,8 @@ struct VPlanTransforms {
/// Explicitly unroll \p Plan by \p UF.
static void unrollByUF(VPlan &Plan, unsigned UF);
- /// Replace each VPReplicateRecipe outside on any replicate region in \p Plan
- /// with \p VF single-scalar recipes.
+ /// Replace each VPReplicateRecipe and replicating VPInstruction outside on
+ /// any replicate region in \p Plan with \p VF single-scalar recipes.
/// TODO: Also replicate VPReplicateRecipes inside replicate regions, thereby
/// dissolving the latter.
static void replicateByVF(VPlan &Plan, ElementCount VF);
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp b/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
index 7a63d20825a31..9155335f9dde6 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
@@ -463,15 +463,34 @@ void VPlanTransforms::unrollByUF(VPlan &Plan, unsigned UF) {
VPlanTransforms::removeDeadRecipes(Plan);
}
-/// Create a single-scalar clone of \p RepR for lane \p Lane. Use \p
-/// Def2LaneDefs to look up scalar definitions for operands of \RepR.
-static VPReplicateRecipe *
+/// Create a single-scalar clone of \p DefR for lane \p Lane. Use \p
+/// Def2LaneDefs to look up scalar definitions for operands of \DefR.
+static VPValue *
cloneForLane(VPlan &Plan, VPBuilder &Builder, Type *IdxTy,
- VPReplicateRecipe *RepR, VPLane Lane,
+ VPRecipeWithIRFlags *DefR, VPLane Lane,
const DenseMap<VPValue *, SmallVector<VPValue *>> &Def2LaneDefs) {
+
+ VPValue *Op;
+ if (match(DefR, m_VPInstruction<VPInstruction::Unpack>(m_VPValue(Op)))) {
+ auto LaneDefs = Def2LaneDefs.find(Op);
+ if (LaneDefs != Def2LaneDefs.end())
+ return LaneDefs->second[Lane.getKnownLane()];
+
+ VPValue *Idx =
+ Plan.getOrAddLiveIn(ConstantInt::get(IdxTy, Lane.getKnownLane()));
+ return Builder.createNaryOp(Instruction::ExtractElement, {Op, Idx});
+ }
+
// Collect the operands at Lane, creating extracts as needed.
SmallVector<VPValue *> NewOps;
- for (VPValue *Op : RepR->operands()) {
+ for (VPValue *Op : DefR->operands()) {
+ if (Lane.getKind() == VPLane::Kind::ScalableLast) {
+ match(Op, m_VPInstruction<VPInstruction::Unpack>(m_VPValue(Op)));
+ NewOps.push_back(
+ Builder.createNaryOp(VPInstruction::ExtractLastElement, {Op}));
+ continue;
+ }
+
// If Op is a definition that has been unrolled, directly use the clone for
// the corresponding lane.
auto LaneDefs = Def2LaneDefs.find(Op);
@@ -479,11 +498,6 @@ cloneForLane(VPlan &Plan, VPBuilder &Builder, Type *IdxTy,
NewOps.push_back(LaneDefs->second[Lane.getKnownLane()]);
continue;
}
- if (Lane.getKind() == VPLane::Kind::ScalableLast) {
- NewOps.push_back(
- Builder.createNaryOp(VPInstruction::ExtractLastElement, {Op}));
- continue;
- }
if (vputils::isSingleScalar(Op)) {
NewOps.push_back(Op);
continue;
@@ -497,15 +511,23 @@ cloneForLane(VPlan &Plan, VPBuilder &Builder, Type *IdxTy,
}
VPValue *Idx =
Plan.getOrAddLiveIn(ConstantInt::get(IdxTy, Lane.getKnownLane()));
- VPValue *Ext = Builder.createNaryOp(Instruction::ExtractElement, {Op, Idx});
- NewOps.push_back(Ext);
+ NewOps.push_back(
+ Builder.createNaryOp(Instruction::ExtractElement, {Op, Idx}));
}
- auto *New =
- new VPReplicateRecipe(RepR->getUnderlyingInstr(), NewOps,
- /*IsSingleScalar=*/true, /*Mask=*/nullptr, *RepR);
- New->transferFlags(*RepR);
- New->insertBefore(RepR);
+ VPRecipeWithIRFlags *New;
+ if (auto *RepR = dyn_cast<VPReplicateRecipe>(DefR)) {
+ New =
+ new VPReplicateRecipe(RepR->getUnderlyingInstr(), NewOps,
+ /*IsSingleScalar=*/true, /*Mask=*/nullptr, *RepR);
+ } else {
+ New = DefR->clone();
+ for (const auto &[Idx, Op] : enumerate(NewOps)) {
+ New->setOperand(Idx, Op);
+ }
+ }
+ New->transferFlags(*DefR);
+ New->insertBefore(DefR);
return New;
}
@@ -530,41 +552,47 @@ void VPlanTransforms::replicateByVF(VPlan &Plan, ElementCount VF) {
SmallVector<VPRecipeBase *> ToRemove;
for (VPBasicBlock *VPBB : VPBBsToUnroll) {
for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
- auto *RepR = dyn_cast<VPReplicateRecipe>(&R);
- if (!RepR || RepR->isSingleScalar())
+ auto *DefR = dyn_cast<VPRecipeWithIRFlags>(&R);
+ if (!DefR || !isa<VPInstruction, VPReplicateRecipe>(DefR))
+ continue;
+ if ((isa<VPReplicateRecipe>(DefR) &&
+ cast<VPReplicateRecipe>(DefR)->isSingleScalar()) ||
+ (isa<VPInstruction>(DefR) &&
+ !cast<VPInstruction>(DefR)->doesGeneratePerAllLanes() &&
+ cast<VPInstruction>(DefR)->getOpcode() != VPInstruction::Unpack))
continue;
- VPBuilder Builder(RepR);
- if (RepR->getNumUsers() == 0) {
- if (isa<StoreInst>(RepR->getUnderlyingInstr()) &&
- vputils::isSingleScalar(RepR->getOperand(1))) {
+ VPBuilder Builder(DefR);
+ if (DefR->getNumUsers() == 0) {
+ if (isa<StoreInst>(DefR->getUnderlyingInstr()) &&
+ vputils::isSingleScalar(DefR->getOperand(1))) {
// Stores to invariant addresses need to store the last lane only.
- cloneForLane(Plan, Builder, IdxTy, RepR, VPLane::getLastLaneForVF(VF),
+ cloneForLane(Plan, Builder, IdxTy, DefR, VPLane::getLastLaneForVF(VF),
Def2LaneDefs);
} else {
- // Create single-scalar version of RepR for all lanes.
+ // Create single-scalar version of DefR for all lanes.
for (unsigned I = 0; I != VF.getKnownMinValue(); ++I)
- ...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confused.
case VPInstruction::WidePtrAdd: | ||
return Op == getOperand(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current code here doesn't match the codegen for WidePtrAdd which is exposed by this patch; it only uses the first lane, if the pointer is a single scalar.
With this code, we would incorrectly insert an unpack
and only provide the first lane instead of the vector pointers that is needed.
Add a new Unpack VPInstruction (name to be improved) to explicitly extract scalars values from vectors. Test changes are movements of the extracts: they are no generated together and also directly after the producer. Depends on llvm#155102 (included in PR) modified: llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
b6198b4
to
b49efbc
Compare
} | ||
|
||
VPValue *Idx; | ||
if (match(&R, m_VPInstruction<Instruction::ExtractElement>(m_BuildVector(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be good to add an m_ExtractElement?
m_VPValue(Idx)))) { | ||
auto *BuildVector = cast<VPInstruction>(R.getOperand(0)); | ||
Def->replaceAllUsesWith(BuildVector->getOperand( | ||
cast<ConstantInt>(Idx->getLiveInIRValue())->getZExtValue())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be good to replace this with m_ConstantInt, once the patch I put up is merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done thanks
if (vputils::isSingleScalar(Def) || vputils::onlyFirstLaneUsed(Def)) | ||
continue; | ||
|
||
if (VPBB->getParent() != Plan.getVectorLoopRegion()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (VPBB->getParent() != Plan.getVectorLoopRegion()) | |
if (VPBB->getParent() != LoopRegion) |
Can this ever happen? I thought we were doing a shallow traversal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This skipped VPInstructions outside regions, I updated the traversal to not visit those.
if (VPBB->getParent() != Plan.getVectorLoopRegion()) | ||
continue; | ||
|
||
auto UsesVectorOrInsideReplicateRegion = [LoopRegion](VPUser *U) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto UsesVectorOrInsideReplicateRegion = [LoopRegion](VPUser *U) { | |
auto IsInsideReplicateRegion = [LoopRegion](VPUser *U) { |
Where does it check that it uses vector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that needs to be done separately in the latest version, renamed, thanks
if (none_of(Def->users(), | ||
[Def, &UsesVectorOrInsideReplicateRegion](VPUser *U) { | ||
return !UsesVectorOrInsideReplicateRegion(U) && | ||
U->usesScalars(Def); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick note: usesScalars can be a bit weak; I put up chainUsesScalarValues to fix this, but I'm not sure if it's the best idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this use case, it should be fine, we just need to know if the recipe may use scalars and provide it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find any additional issues, but I'm afraid I'm missing context for VPlanUnroll: I've never played with the ScalableLast lane, so I'll leave this up to a more qualified review (@ayalz?). Thanks.
cloneForLane(VPlan &Plan, VPBuilder &Builder, Type *IdxTy, | ||
VPRecipeWithIRFlags *DefR, VPLane Lane, | ||
const DenseMap<VPValue *, SmallVector<VPValue *>> &Def2LaneDefs) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stray newline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed, thanks
// TODO: Move to VPlan transform stage once the transition to the VPlan-based | ||
// cost model is complete for better cost estimates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment still relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, comment indicates that these transforms are still taking place during VPlan execution rather than planning. Perhaps worth rewording to clarify.
// cost model is complete for better cost estimates. | ||
VPlanTransforms::runPass(VPlanTransforms::unrollByUF, BestVPlan, BestUF); | ||
VPlanTransforms::runPass(VPlanTransforms::materializeBuildVectors, BestVPlan); | ||
VPlanTransforms::runPass(VPlanTransforms::materializeBuildAndUnpackVectors, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using Pack
to denote both BuildVector and BuildStructVector, consistent with Unpack
; and drop Vectors
suffix, consistent with the following materializeBroadcasts()
? Could fuse both into one, or split into three: pack, unpack, broadcast.
VPlanTransforms::runPass(VPlanTransforms::materializeBuildAndUnpackVectors, | |
VPlanTransforms::runPass(VPlanTransforms::materializePacksAndUnpacks, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks
// TODO: Move to VPlan transform stage once the transition to the VPlan-based | ||
// cost model is complete for better cost estimates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, comment indicates that these transforms are still taking place during VPlan execution rather than planning. Perhaps worth rewording to clarify.
case VPInstruction::WidePtrAdd: | ||
return Op == getOperand(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How/Is this related?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patch exposed an issue here, more details in https://github.com/llvm/llvm-project/pull/155670/files#r2347298616 above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defaulting to false seems a bit odd, as the first operand should be a scalar base. Worth a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment. It supports both vector and scalar base addresses.
case VPInstruction::UnpackVector: | ||
O << "unpack-into-scalars"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If clearer should the enum also be UnpackIntoScalars
? Although Unpack
should suffice for both, with documentation explaining what the term stands for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, updated to unpack
|
||
uint64_t Idx; | ||
if (match(&R, m_ExtractElement(m_BuildVector(), m_ConstantInt(Idx)))) { | ||
auto *BuildVector = cast<VPInstruction>(R.getOperand(0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could m_BuildVector(Op0)
provide the desired operand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think it could, but would be inconsistent with other matchers, where the inner matcher match the operands.
cloneForLane(VPlan &Plan, VPBuilder &Builder, Type *IdxTy, | ||
VPRecipeWithIRFlags *DefR, VPLane Lane, | ||
const DenseMap<VPValue *, SmallVector<VPValue *>> &Def2LaneDefs) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
VPValue *Ext = Builder.createNaryOp(Instruction::ExtractElement, {Op, Idx}); | ||
NewOps.push_back(Ext); | ||
NewOps.push_back( | ||
Builder.createNaryOp(Instruction::ExtractElement, {Op, Idx})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Independent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restored the original code, thanks
if (auto *Reduce = dyn_cast<VPReductionRecipe>(VPV)) | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Independent, or only tested now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stripped again for now, thanks
SmallVector<VPValue *> NewOps; | ||
for (VPValue *Op : DefR->operands()) { | ||
if (Lane.getKind() == VPLane::Kind::ScalableLast) { | ||
match(Op, m_VPInstruction<VPInstruction::UnpackVector>(m_VPValue(Op))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth clarifying that Op
is being redefined from an operand of DefR to its own operand.
Is it important to hoist from current location below?
Worth documenting Unpack's role in extracting ScalableLast?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment, thanks.
It can stay at the original position in the latest version, moved back, thanks
ResumeForEpilogue, | ||
/// Returns the value for vscale. | ||
VScale, | ||
/// Extracts all lanes from its (non-scalable) vector operand. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an abstract VPInstruction whose single defined VPValue represents VF scalars extracted from a vector, to be replaced by VF ExtractElement VPInstructions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added thanks
/// Extracts all lanes from its (non-scalable) vector operand. This is an | ||
/// abstract VPInstruction whose single defined VPValue represents VF | ||
/// scalars extracted from a vector, to be replaced by VF ExtractElement | ||
/// VPInstructions? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// VPInstructions? | |
/// VPInstructions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed ,thanks
case VPInstruction::WidePtrAdd: | ||
return Op == getOperand(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defaulting to false seems a bit odd, as the first operand should be a scalar base. Worth a comment?
auto IsInsideReplicateRegion = [LoopRegion](VPUser *U) { | ||
VPRegionBlock *ParentRegion = | ||
cast<VPRecipeBase>(U)->getParent()->getParent(); | ||
return ParentRegion && ParentRegion != LoopRegion; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better check if ParentRegion is replicating, consistent with the lambda's name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks!
for (VPValue *Def : R.definedValues()) { | ||
if (vputils::isSingleScalar(Def) || vputils::onlyFirstLaneUsed(Def)) | ||
continue; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Better placed before the lambda?
// At the moment, we only create unpacks for scalar users outside | ||
// replicate regions. Recipes inside replicate regions still manually | ||
// extract the required lanes. TODO: Remove once replicate regions are | ||
// unrolled explicitly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// unrolled explicitly. | |
// unrolled completely. |
They are already unrolled explicitly by UF, but have yet to also be unrolled by VF.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, updated, thanks
if (isa<VPReplicateRecipe, VPInstruction, VPScalarIVStepsRecipe>(&R)) | ||
continue; | ||
for (VPValue *Def : R.definedValues()) { | ||
if (vputils::isSingleScalar(Def) || vputils::onlyFirstLaneUsed(Def)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The onlyFirstLaneUsed case may require a single extract rather than full unpack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I added a comment that the recipes skipped here may or may not produce a vector eventually and a TODO to introduce unpacks for them as well, and remove them once we are guaranteed to produce scalars.
For that reason, I left the code as is here for now.
// extract the required lanes. TODO: Remove once replicate regions are | ||
// unrolled explicitly. | ||
if (none_of(Def->users(), [Def, &IsInsideReplicateRegion](VPUser *U) { | ||
return !IsInsideReplicateRegion(U) && U->usesScalars(Def); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return !IsInsideReplicateRegion(U) && U->usesScalars(Def); | |
return U->usesScalars(Def) && !IsInsideReplicateRegion(U); |
sounds a bit cleaner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done thanks
Unpack->insertAfter(&R); | ||
Def->replaceUsesWithIf( | ||
Unpack, [Def, &IsInsideReplicateRegion](VPUser &U, unsigned) { | ||
return !IsInsideReplicateRegion(&U) && U.usesScalars(Def); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
Worth defining an IsCandidateUnpackUser()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, merged IsInsideReplicateRegion
in new IsCandidateUnpackUser
/// Add explicit Build[Struct]Vector recipes that combine multiple scalar | ||
/// values into single vectors. | ||
static void materializeBuildVectors(VPlan &Plan); | ||
/// values into single vectors and UnpackVector to extract scalars from a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// values into single vectors and UnpackVector to extract scalars from a | |
/// values into single vectors and Unpack recipes to extract scalars from a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks
; CHECK-NEXT: store ptr null, ptr [[TMP10]], align 8 | ||
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x ptr> [[TMP3]], i32 1 | ||
; CHECK-NEXT: store ptr null, ptr [[TMP11]], align 8 | ||
; CHECK-NEXT: store ptr null, ptr [[TMP17]], align 8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This store to TMP17 replaces the one removed above to TMP4?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok: this store to TMP17 replaces the one to TMP11, which is retained but TMP11 is redefined.
Wonder if some simple "sink each ExtractElement to before its single/first user" pass could help confirm all test differences?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, thanks! Adding some final comments.
static inline bool classof(const VPValue *VPV) { | ||
const VPRecipeBase *R = VPV->getDefiningRecipe(); | ||
return R && classof(R); | ||
} | ||
|
||
static inline bool classof(const VPSingleDefRecipe *R) { | ||
return classof(static_cast<const VPRecipeBase *>(R)); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, related to the introduction of unpack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's needed for this change, due to the new cast/dyn_cast/isa with VPSingleDefRecipe
// TODO: The Defs skipped here may or may not be vector values. | ||
// Introduce Unpacks, and remove them later, if they are guaranteed to | ||
// produce scalar values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be placed above - next to "Create explicit...", to also capture skipping VPInstructions?
// TODO: The Defs skipped here may or may not be vector values. | |
// Introduce Unpacks, and remove them later, if they are guaranteed to | |
// produce scalar values. | |
// Current implementation is conservative - it may miss some cases that may | |
// or may not be vector values. TODO: introduce Unpacks speculatively - remove | |
// them later if they are known to operate on scalar values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved and adjusted, thanks
if (vputils::isSingleScalar(Def) || vputils::onlyFirstLaneUsed(Def)) | ||
continue; | ||
|
||
// At the moment, we only create unpacks for scalar users outside |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// At the moment, we only create unpacks for scalar users outside | |
// At the moment, we create unpacks only for scalar users outside |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks
// replicate regions. Recipes inside replicate regions still manually | ||
// extract the required lanes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// replicate regions. Recipes inside replicate regions still manually | |
// extract the required lanes. | |
// replicate regions. Recipes inside replicate regions still extract the | |
// required lanes implicitly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated thanks
// TODO: Remove once replicate regions are | ||
// unrolled completely. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TODO: Remove once replicate regions are | |
// unrolled completely. | |
// TODO: Remove once replicate regions are unrolled completely. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks
/// Add explicit Build[Struct]Vector recipes that combine multiple scalar | ||
/// values into single vectors. | ||
static void materializeBuildVectors(VPlan &Plan); | ||
/// values into single vectors and Unpack recipes to extract scalars from a | ||
/// vector as needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Add explicit Build[Struct]Vector recipes to Pack multiple scalar values
/// into vectors and Unpack recipes to extract scalars from vectors as needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks!
/// Extracts all lanes from its (non-scalable) vector operand. This is an | ||
/// abstract VPInstruction whose single defined VPValue represents VF | ||
/// scalars extracted from a vector, to be replaced by VF ExtractElement | ||
/// VPInstructions. | ||
Unpack, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to place next to BuildStructVector and BuildVector "Pack" cases - which btw are concrete rather than abstract. (An analogous concrete Unpack which defines VF distinct VPValues could be introduced only after replicating by VF.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved, thanks
(isa<VPInstruction>(&R) && | ||
!cast<VPInstruction>(&R)->doesGeneratePerAllLanes())) | ||
!cast<VPInstruction>(&R)->doesGeneratePerAllLanes() && | ||
cast<VPInstruction>(&R)->getOpcode() != VPInstruction::Unpack)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should Unpack indicate that it doesGeneratePerAllLanes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may not generate for all lanes at the moment, if only its first lane is used and all cases need to be handled by unrolling at the moment.
; CHECK-NEXT: store ptr null, ptr [[TMP10]], align 8 | ||
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x ptr> [[TMP3]], i32 1 | ||
; CHECK-NEXT: store ptr null, ptr [[TMP11]], align 8 | ||
; CHECK-NEXT: store ptr null, ptr [[TMP17]], align 8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok: this store to TMP17 replaces the one to TMP11, which is retained but TMP11 is redefined.
Wonder if some simple "sink each ExtractElement to before its single/first user" pass could help confirm all test differences?
675b4ec
to
133f50a
Compare
Add a new Unpack VPInstruction (name to be improved) to explicitly
extract scalars values from vectors.
Test changes are movements of the extracts: they are no generated
together and also directly after the producer.
Depends on #155102 (included in
PR)