Skip to content
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7294,6 +7294,7 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
// cost model is complete for better cost estimates.
VPlanTransforms::runPass(VPlanTransforms::unrollByUF, BestVPlan, BestUF,
OrigLoop->getHeader()->getContext());
VPlanTransforms::runPass(VPlanTransforms::replicateByVF, BestVPlan, BestVF);
VPlanTransforms::runPass(VPlanTransforms::materializeBroadcasts, BestVPlan);
VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE);
VPlanTransforms::simplifyRecipes(BestVPlan, *Legal->getWidestInductionType());
Expand Down
7 changes: 7 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlan.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -261,6 +261,13 @@ Value *VPTransformState::get(const VPValue *Def, const VPLane &Lane) {
return Data.VPV2Scalars[Def][0];
}

// Look through BuildVector to avoid redundant extracts.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done now rather than later to reduce test diff?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep without the fold we would have additional insert/extracts for uses inside replicate regions, with corresponding test changes.

// TODO: Remove once replicate regions are unrolled explicitly.
if (Lane.getKind() == VPLane::Kind::First && match(Def, m_BuildVector())) {
auto *BuildVector = cast<VPInstruction>(Def);
return get(BuildVector->getOperand(Lane.getKnownLane()), true);
}

assert(hasVectorValue(Def));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent: missing error message.

auto *VecPart = Data.VPV2Vector[Def];
if (!VecPart->getType()->isVectorTy()) {
Expand Down
7 changes: 7 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlan.h
Original file line number Diff line number Diff line change
Expand Up @@ -907,6 +907,13 @@ class VPInstruction : public VPRecipeWithIRFlags,
BranchOnCount,
BranchOnCond,
Broadcast,
/// Creates a struct of fixed-width vectors containing all operands. The
/// number of operands
/// matches the number of fields in the struct.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Creates a struct of fixed-width vectors containing all operands. The
/// number of operands
/// matches the number of fields in the struct.
/// Given operands of (the same) struct type, creates a struct of fixed-
/// width vectors each containing a struct field of all operands. The
/// number of operands matches the element count of every vector.

(Strictly speaking, the vectors created contain the fields of the operands, rather than the complete operands, in contrast to BuildVector.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

BuildStructVector,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lex order has BuildVector after BuildStructVector?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reordered, thanks

/// Creates a fixed-width vector containing all operands. The number of
/// operands matches the vector element count.
BuildVector,
ComputeAnyOfResult,
ComputeFindLastIVResult,
ComputeReductionResult,
Expand Down
2 changes: 2 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,8 @@ Type *VPTypeAnalysis::inferScalarTypeForRecipe(const VPInstruction *R) {
case VPInstruction::CalculateTripCountMinusVF:
case VPInstruction::CanonicalIVIncrementForPart:
case VPInstruction::AnyOf:
case VPInstruction::BuildStructVector:
case VPInstruction::BuildVector:
return SetResultTyFromOp();
case VPInstruction::FirstActiveLane:
return Type::getIntNTy(Ctx, 64);
Expand Down
14 changes: 14 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
Original file line number Diff line number Diff line change
Expand Up @@ -221,6 +221,9 @@ struct Recipe_match {
if ((!matchRecipeAndOpcode<RecipeTys>(R) && ...))
return false;

auto *VPI = dyn_cast<VPInstruction>(R);
if (VPI && VPI->getOpcode() == VPInstruction::BuildVector)
return true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto *VPI = dyn_cast<VPInstruction>(R);
if (VPI && VPI->getOpcode() == VPInstruction::BuildVector)
return true;
// Finally match operands, except for BuildVector which is matched w/o checking its operands.
auto *VPI = dyn_cast<VPInstruction>(R);
if (VPI && VPI->getOpcode() == VPInstruction::BuildVector)
return true;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to check instead if Ops_t is empty and if so assert that R is BuildVector and early return, or set NumOperands to zero instead of R->getNumOperands() and continue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved up to handle the case first, thanks

assert(R->getNumOperands() == std::tuple_size<Ops_t>::value &&
"recipe with matched opcode does not have the expected number of "
"operands");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent nit: worth noting below that "Commutative" checks operands in reverse order, which works best for binary operations.

Expand Down Expand Up @@ -260,6 +263,10 @@ struct Recipe_match {
}
};

template <unsigned Opcode, typename... RecipeTys>
using ZeroOpRecipe_match =
Recipe_match<std::tuple<>, Opcode, false, RecipeTys...>;

template <typename Op0_t, unsigned Opcode, typename... RecipeTys>
using UnaryRecipe_match =
Recipe_match<std::tuple<Op0_t>, Opcode, false, RecipeTys...>;
Expand All @@ -268,6 +275,9 @@ template <typename Op0_t, unsigned Opcode>
using UnaryVPInstruction_match =
UnaryRecipe_match<Op0_t, Opcode, VPInstruction>;

template <unsigned Opcode>
using ZeroOpVPInstruction_match = ZeroOpRecipe_match<Opcode, VPInstruction>;

template <typename Op0_t, unsigned Opcode>
using AllUnaryRecipe_match =
UnaryRecipe_match<Op0_t, Opcode, VPWidenRecipe, VPReplicateRecipe,
Expand Down Expand Up @@ -299,6 +309,10 @@ using AllBinaryRecipe_match =
BinaryRecipe_match<Op0_t, Op1_t, Opcode, Commutative, VPWidenRecipe,
VPReplicateRecipe, VPWidenCastRecipe, VPInstruction>;

inline ZeroOpVPInstruction_match<VPInstruction::BuildVector> m_BuildVector() {
return ZeroOpVPInstruction_match<VPInstruction::BuildVector>();
}

Comment on lines +321 to +324
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
inline ZeroOpVPInstruction_match<VPInstruction::BuildVector> m_BuildVector() {
return ZeroOpVPInstruction_match<VPInstruction::BuildVector>();
}
/// BuildVector is matches only its opcode, w/o matching its operands.
inline ZeroOpVPInstruction_match<VPInstruction::BuildVector> m_BuildVector() {
return ZeroOpVPInstruction_match<VPInstruction::BuildVector>();
}

plus some explanation why - number of operands varies?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

template <unsigned Opcode, typename Op0_t>
inline UnaryVPInstruction_match<Op0_t, Opcode>
m_VPInstruction(const Op0_t &Op0) {
Expand Down
101 changes: 62 additions & 39 deletions llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -493,9 +493,9 @@ Value *VPInstruction::generate(VPTransformState &State) {
}
case Instruction::ExtractElement: {
assert(State.VF.isVector() && "Only extract elements from vectors");
Value *Vec = State.get(getOperand(0));
Value *Idx = State.get(getOperand(1), /*IsScalar=*/true);
return Builder.CreateExtractElement(Vec, Idx, Name);
unsigned IdxToExtract =
cast<ConstantInt>(getOperand(1)->getLiveInIRValue())->getZExtValue();
return State.get(getOperand(0), VPLane(IdxToExtract));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that the second operand must be constant should be documented and verified (if modeled as a general operand rather than a "flag").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to update the code here; there is code that may construct ExtractElement with non-constant values (early-exit with forced interleaving, where it is the first active lane)

}
case Instruction::Freeze: {
Value *Op = State.get(getOperand(0), vputils::onlyFirstLaneUsed(this));
Expand Down Expand Up @@ -604,6 +604,35 @@ Value *VPInstruction::generate(VPTransformState &State) {
return Builder.CreateVectorSplat(
State.VF, State.get(getOperand(0), /*IsScalar*/ true), "broadcast");
}
case VPInstruction::BuildStructVector: {
// For struct types, we need to build a new 'wide' struct type, where each
// element is widened.
auto *StructTy =
cast<StructType>(State.TypeAnalysis.inferScalarType(getOperand(0)));
auto NumOfElements = ElementCount::getFixed(getNumOperands());
Value *Res = PoisonValue::get(toVectorizedTy(StructTy, NumOfElements));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth noting that this creates a struct of vectors rather than a vector of structs, perhaps by calling toVectorizedStructTy() directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to the comment above.

assert(NumOfElements.getKnownMinValue() == StructTy->getNumElements() &&
"number of operands must match number of elements in StructTy");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "number of elements in StructTy", does StructTy->getNumElements() refer to the number of fields in StructTy or the number of elements in each of these fields. Tests seem to have two fields and VF=2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This refers to the number of fields in the struct. It looks like the StructType documentation in LLVM refers to the fields as elements rather than fields.

for (const auto &[Idx, Op] : enumerate(operands())) {
for (unsigned I = 0; I != NumOfElements.getKnownMinValue(); I++) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both loops seem to always have the same trip count, namely the number of operands (i.e., width of each resulting vector), but one (I) should be the number of fields in the struct (i.e., number of resulting vectors)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep the inner loop needs to process StructTy->getNumElements, which may be different than the number of operands.

Value *ScalarValue = Builder.CreateExtractValue(State.get(Op, true), I);
Value *VectorValue = Builder.CreateExtractValue(Res, I);
VectorValue =
Builder.CreateInsertElement(VectorValue, ScalarValue, Idx);
Res = Builder.CreateInsertValue(Res, VectorValue, I);
}
}
return Res;
}
case VPInstruction::BuildVector: {
auto *ScalarTy = State.TypeAnalysis.inferScalarType(getOperand(0));
auto NumOfElements = ElementCount::getFixed(getNumOperands());
Value *Res = PoisonValue::get(toVectorizedTy(ScalarTy, NumOfElements));
for (const auto &[Idx, Op] : enumerate(operands()))
Res = State.Builder.CreateInsertElement(Res, State.get(Op, true),
State.Builder.getInt32(Idx));
return Res;
}
case VPInstruction::ReductionStartVector: {
if (State.VF.isScalar())
return State.get(getOperand(0), true);
Expand Down Expand Up @@ -885,10 +914,11 @@ void VPInstruction::execute(VPTransformState &State) {
if (!hasResult())
return;
assert(GeneratedValue && "generate must produce a value");
assert(
(GeneratedValue->getType()->isVectorTy() == !GeneratesPerFirstLaneOnly ||
State.VF.isScalar()) &&
"scalar value but not only first lane defined");
assert((((GeneratedValue->getType()->isVectorTy() ||
GeneratedValue->getType()->isStructTy()) ==
!GeneratesPerFirstLaneOnly) ||
State.VF.isScalar()) &&
"scalar value but not only first lane defined");
State.set(this, GeneratedValue,
/*IsScalar*/ GeneratesPerFirstLaneOnly);
}
Expand All @@ -902,6 +932,8 @@ bool VPInstruction::opcodeMayReadOrWriteFromMemory() const {
case Instruction::ICmp:
case Instruction::Select:
case VPInstruction::AnyOf:
case VPInstruction::BuildStructVector:
case VPInstruction::BuildVector:
case VPInstruction::CalculateTripCountMinusVF:
case VPInstruction::CanonicalIVIncrementForPart:
case VPInstruction::ExtractLastElement:
Expand Down Expand Up @@ -1023,6 +1055,12 @@ void VPInstruction::print(raw_ostream &O, const Twine &Indent,
case VPInstruction::Broadcast:
O << "broadcast";
break;
case VPInstruction::BuildStructVector:
O << "buildstructvector";
break;
case VPInstruction::BuildVector:
O << "buildvector";
break;
case VPInstruction::ExtractLastElement:
O << "extract-last-element";
break;
Expand Down Expand Up @@ -2758,44 +2796,29 @@ static void scalarizeInstruction(const Instruction *Instr,

void VPReplicateRecipe::execute(VPTransformState &State) {
Instruction *UI = getUnderlyingInstr();
if (State.Lane) { // Generate a single instance.
assert((State.VF.isScalar() || !isSingleScalar()) &&
"uniform recipe shouldn't be predicated");
assert(!State.VF.isScalable() && "Can't scalarize a scalable vector");
scalarizeInstruction(UI, this, *State.Lane, State);
// Insert scalar instance packing it into a vector.
if (State.VF.isVector() && shouldPack()) {
// If we're constructing lane 0, initialize to start from poison.
if (State.Lane->isFirstLane()) {
assert(!State.VF.isScalable() && "VF is assumed to be non scalable.");
Value *Poison =
PoisonValue::get(VectorType::get(UI->getType(), State.VF));
State.set(this, Poison);
}
State.packScalarIntoVectorizedValue(this, *State.Lane);
}
return;
}

if (IsSingleScalar) {
// Uniform within VL means we need to generate lane 0.
if (!State.Lane) {
assert(IsSingleScalar &&
"VPReplicateRecipes outside replicate regions must be unrolled");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"VPReplicateRecipes outside replicate regions must be unrolled");
"VPReplicateRecipes outside replicate regions must have already been unrolled");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated thanks

scalarizeInstruction(UI, this, VPLane(0), State);
return;
}
Comment on lines 2735 to 2737
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle simpler single-scalar case first and early-return, assert one of above two cases applies:

  if (!State.Lane) {
    assert(IsSingleScalar && "...");
    scalarizeInstruction(UI, this, VPLane(0), State);
    return;
  }

  // Generate a single instance.
  ...

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks


// A store of a loop varying value to a uniform address only needs the last
// copy of the store.
if (isa<StoreInst>(UI) && vputils::isSingleScalar(getOperand(1))) {
auto Lane = VPLane::getLastLaneForVF(State.VF);
scalarizeInstruction(UI, this, VPLane(Lane), State);
return;
}

// Generate scalar instances for all VF lanes.
assert((State.VF.isScalar() || !isSingleScalar()) &&
"uniform recipe shouldn't be predicated");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"uniform recipe shouldn't be predicated");
"uniform recipe shouldn't be predicated");
assert(!State.VF.isScalable() && "Can't scalarize a scalable vector");

retain the assert here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

assert(!State.VF.isScalable() && "Can't scalarize a scalable vector");
const unsigned EndLane = State.VF.getKnownMinValue();
for (unsigned Lane = 0; Lane < EndLane; ++Lane)
scalarizeInstruction(UI, this, VPLane(Lane), State);
scalarizeInstruction(UI, this, *State.Lane, State);
// Insert scalar instance packing it into a vector.
if (State.VF.isVector() && shouldPack()) {
// If we're constructing lane 0, initialize to start from poison.
if (State.Lane->isFirstLane()) {
assert(!State.VF.isScalable() && "VF is assumed to be non scalable.");
Value *Poison =
PoisonValue::get(VectorType::get(UI->getType(), State.VF));
State.set(this, Poison);
}
State.packScalarIntoVectorizedValue(this, *State.Lane);
}
}

bool VPReplicateRecipe::shouldPack() const {
Expand Down
17 changes: 17 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1140,6 +1140,23 @@ static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) {
return;
}

// Look through ExtractLastElement (BuildVector ....).
if (match(&R, m_VPInstruction<VPInstruction::ExtractLastElement>(
m_BuildVector()))) {
auto *BuildVector = cast<VPInstruction>(R.getOperand(0));
Def->replaceAllUsesWith(
BuildVector->getOperand(BuildVector->getNumOperands() - 1));
return;
}
// Look through ExtractPenultimateElement (BuildVector ....).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
// Look through ExtractPenultimateElement (BuildVector ....).
}
// Look through ExtractPenultimateElement (BuildVector ....).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done thanks

if (match(&R, m_VPInstruction<VPInstruction::ExtractPenultimateElement>(
m_BuildVector()))) {
auto *BuildVector = cast<VPInstruction>(R.getOperand(0));
Def->replaceAllUsesWith(
BuildVector->getOperand(BuildVector->getNumOperands() - 2));
return;
}

// Some simplifications can only be applied after unrolling. Perform them
// below.
if (!Plan->isUnrolled())
Expand Down
5 changes: 5 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.h
Original file line number Diff line number Diff line change
Expand Up @@ -99,6 +99,11 @@ struct VPlanTransforms {
/// Explicitly unroll \p Plan by \p UF.
static void unrollByUF(VPlan &Plan, unsigned UF, LLVMContext &Ctx);

/// Replace replicating VPReplicateRecipes outside replicate regions in \p
/// Plan with \p VF single-scalar recipes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Replace replicating VPReplicateRecipes outside replicate regions in \p
/// Plan with \p VF single-scalar recipes.
/// Replace each VPReplicateRecipe outside on any replicate region in \p Plan
/// with \p VF single-scalar recipes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

/// TODO: Also unroll VPReplicateRegions by VF.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// TODO: Also unroll VPReplicateRegions by VF.
/// TODO: Also replicate VPReplicateRecipes inside replicate regions, thereby
/// dissolving the latter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated thanks

static void replicateByVF(VPlan &Plan, ElementCount VF);

/// Optimize \p Plan based on \p BestVF and \p BestUF. This may restrict the
/// resulting plan to \p BestVF and \p BestUF.
static void optimizeForVFAndUF(VPlan &Plan, ElementCount BestVF,
Expand Down
81 changes: 81 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
#include "VPlan.h"
#include "VPlanAnalysis.h"
#include "VPlanCFG.h"
#include "VPlanHelpers.h"
#include "VPlanPatternMatch.h"
#include "VPlanTransforms.h"
#include "VPlanUtils.h"
Expand Down Expand Up @@ -445,3 +446,83 @@ void VPlanTransforms::unrollByUF(VPlan &Plan, unsigned UF, LLVMContext &Ctx) {

VPlanTransforms::removeDeadRecipes(Plan);
}

/// Create a single-scalar clone of \p RepR for lane \p Lane.
static VPReplicateRecipe *cloneForLane(VPlan &Plan, VPBuilder &Builder,
Type *IdxTy, VPReplicateRecipe *RepR,
VPLane Lane) {
// Collect the operands at Lane, creating extracts as needed.
SmallVector<VPValue *> NewOps;
for (VPValue *Op : RepR->operands()) {
if (vputils::isSingleScalar(Op)) {
NewOps.push_back(Op);
continue;
}
if (Lane.getKind() == VPLane::Kind::ScalableLast) {
NewOps.push_back(
Builder.createNaryOp(VPInstruction::ExtractLastElement, {Op}));
continue;
}
// Look through buildvector to avoid unnecessary extracts.
if (match(Op, m_BuildVector())) {
NewOps.push_back(
cast<VPInstruction>(Op)->getOperand(Lane.getKnownLane()));
continue;
}
VPValue *Idx =
Plan.getOrAddLiveIn(ConstantInt::get(IdxTy, Lane.getKnownLane()));
VPValue *Ext = Builder.createNaryOp(Instruction::ExtractElement, {Op, Idx});
NewOps.push_back(Ext);
}

auto *New =
new VPReplicateRecipe(RepR->getUnderlyingInstr(), NewOps,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be a VPInstruction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually yes, but more work is needed separately to use VPInstruction for single-scalar VPReplicateRecipes: #141429, #140623

/*IsSingleScalar=*/true, /*Mask=*/nullptr, *RepR);
New->insertBefore(RepR);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That this failed to call transferFlags here, which resulted in miscompiles. I've created #147398 to fix.

return New;
}

void VPlanTransforms::replicateByVF(VPlan &Plan, ElementCount VF) {
Type *IdxTy = IntegerType::get(
Plan.getScalarHeader()->getIRBasicBlock()->getContext(), 32);
for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
vp_depth_first_shallow(Plan.getVectorLoopRegion()->getEntry()))) {
for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
auto *RepR = dyn_cast<VPReplicateRecipe>(&R);
if (!RepR || RepR->isSingleScalar())
continue;

VPBuilder Builder(RepR);
SmallVector<VPValue *> LaneDefs;
// Stores to invariant addresses need to store the last lane only.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, should this be considered RepR->isSingleScalar() too and/or converted to storing last lane only independent of replicatingByVF(), possibly using ExtractLast?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, should be able to move this to convertToSingleScalar.

if (isa<StoreInst>(RepR->getUnderlyingInstr()) &&
vputils::isSingleScalar(RepR->getOperand(1))) {
cloneForLane(Plan, Builder, IdxTy, RepR, VPLane::getLastLaneForVF(VF));
RepR->eraseFromParent();
continue;
}

/// Create single-scalar version of RepR for all lanes.
for (unsigned I = 0; I != VF.getKnownMinValue(); ++I)
LaneDefs.push_back(cloneForLane(Plan, Builder, IdxTy, RepR, VPLane(I)));

/// Users that only demand the first lane can use the definition for lane
/// 0.
RepR->replaceUsesWithIf(LaneDefs[0], [RepR](VPUser &U, unsigned) {
return U.onlyFirstLaneUsed(RepR);
});

Type *ResTy = RepR->getUnderlyingInstr()->getType();
// If needed, create a Build(Struct)Vector recipe to insert the scalar
// lane values into a vector.
Comment on lines +526 to +527
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So a pair of replicating recipes one feeding the other is replaced by VF recipes feeding a buildVector which VF other recipes extract from, where the extracts are optimized away by cloneForLane(); and the buildVector possibly by dce?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

if (!ResTy->isVoidTy()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ResTy is void (better check for empty users instead?) suffice to clone for lanes and erase from parent, w/o populating LaneDefs, handling all stores early following those to a single address.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

VPValue *VecRes = Builder.createNaryOp(
ResTy->isStructTy() ? VPInstruction::BuildStructVector
: VPInstruction::BuildVector,
LaneDefs);
RepR->replaceAllUsesWith(VecRes);
}
RepR->eraseFromParent();
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -398,12 +398,6 @@ define void @test_for_tried_to_force_scalar(ptr noalias %A, ptr noalias %B, ptr
; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <12 x float> [[WIDE_VEC]], <12 x float> poison, <4 x i32> <i32 0, i32 3, i32 6, i32 9>
; CHECK-NEXT: [[TMP30:%.*]] = extractelement <4 x float> [[STRIDED_VEC]], i32 3
; CHECK-NEXT: store float [[TMP30]], ptr [[C:%.*]], align 4
; CHECK-NEXT: [[TMP31:%.*]] = extractelement <4 x ptr> [[TMP29]], i32 0
; CHECK-NEXT: [[TMP38:%.*]] = load float, ptr [[TMP31]], align 4
; CHECK-NEXT: [[TMP33:%.*]] = extractelement <4 x ptr> [[TMP29]], i32 1
; CHECK-NEXT: [[TMP32:%.*]] = load float, ptr [[TMP33]], align 4
; CHECK-NEXT: [[TMP35:%.*]] = extractelement <4 x ptr> [[TMP29]], i32 2
; CHECK-NEXT: [[TMP34:%.*]] = load float, ptr [[TMP35]], align 4
; CHECK-NEXT: [[TMP37:%.*]] = extractelement <4 x ptr> [[TMP29]], i32 3
; CHECK-NEXT: [[TMP36:%.*]] = load float, ptr [[TMP37]], align 4
; CHECK-NEXT: store float [[TMP36]], ptr [[B:%.*]], align 4
Expand Down
Loading
Loading