Merged
Changes from 13 commits
7 changes: 4 additions & 3 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7223,9 +7223,6 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
return DenseMap<const SCEV *, Value *>();
}

VPlanTransforms::narrowInterleaveGroups(
BestVPlan, BestVF,
TTI.getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector));
Comment on lines -7234 to -7236 (Collaborator):
Note that in addition to moving narrowInterleaveGroups from VPlan execution to planning, it also changes the relative transform order - it is moved from LVP::executePlan(), where it ran after optimizing for the final VF and UF, to be the last transform of buildVPlansWithVPRecipes(), skipping over several transforms in LVP::executePlan().

Perhaps worth first hoisting it to appear earlier/earliest in LVP::executePlan(), still operating on the final VPlan but before it is unrolled etc., and then moving it to the end of buildVPlansWithVPRecipes() where it operates on multiple VPlans?
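A rough sketch of the first step suggested above, reusing the call this PR removes from LVP::executePlan(); the exact placement is an assumption, not part of the PR:

  // Hoisted to run early in executePlan(), still on the final BestVPlan but
  // before unrolling and the later lowering transforms.
  VPlanTransforms::narrowInterleaveGroups(
      BestVPlan, BestVF,
      TTI.getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector));
  // ... the remaining executePlan() transforms would then follow, e.g.:
  VPlanTransforms::removeDeadRecipes(BestVPlan);
  VPlanTransforms::convertToConcreteRecipes(BestVPlan);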

VPlanTransforms::removeDeadRecipes(BestVPlan);

VPlanTransforms::convertToConcreteRecipes(BestVPlan);
@@ -8194,6 +8191,10 @@ void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
if (CM.foldTailWithEVL())
VPlanTransforms::runPass(VPlanTransforms::addExplicitVectorLength,
*Plan, CM.getMaxSafeElements());

if (auto P = VPlanTransforms::narrowInterleaveGroups(*Plan, TTI))
VPlans.push_back(std::move(P));

assert(verifyVPlanIsValid(*Plan) && "VPlan is invalid");
VPlans.push_back(std::move(Plan));
}
1 change: 1 addition & 0 deletions llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -1213,6 +1213,7 @@ VPlan *VPlan::duplicate() {
}
Old2NewVPValues[&VectorTripCount] = &NewPlan->VectorTripCount;
Old2NewVPValues[&VF] = &NewPlan->VF;
Old2NewVPValues[&UF] = &NewPlan->UF;
Old2NewVPValues[&VFxUF] = &NewPlan->VFxUF;
if (BackedgeTakenCount) {
NewPlan->BackedgeTakenCount = new VPValue();
12 changes: 12 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlan.h
@@ -4117,6 +4117,9 @@ class VPlan {
/// Represents the vectorization factor of the loop.
VPValue VF;

/// Represents the symbolic unroll factor of the loop.
Comment (Collaborator):
Suggested change:
-  /// Represents the symbolic unroll factor of the loop.
+  /// Represents the unroll factor of the loop.

VF and VFxUF are also "symbolic", when VF is fixed.
Worth documenting here that they must not be used after materializing?
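A sketch of the kind of documentation asked for above; the wording is mine, not from the PR:

  /// Represents the unroll factor of the loop as a symbolic VPValue. Like VF
  /// and VFxUF, it must not be used after materializeVFAndVFxUF() has replaced
  /// its uses with a concrete live-in value.
  VPValue UF;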

VPValue UF;

/// Represents the loop-invariant VF * UF of the vector loop region.
VPValue VFxUF;

@@ -4268,6 +4271,9 @@ class VPlan {
/// Returns the VF of the vector loop region.
VPValue &getVF() { return VF; };

/// Returns the symbolic UF of the vector loop region.
VPValue &getSymbolicUF() { return UF; };
Comment (Member): const

Reply (Contributor Author): Unfortunately this can't be made const, as it is used with replaceAllUsesWith, which cannot be const.
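For context, the materialization code later in this PR shows why a non-const reference is needed; rewriting the uses of the symbolic UF mutates its use list:

  VPValue *UF = Plan.getOrAddLiveIn(ConstantInt::get(TCTy, Plan.getUF()));
  Plan.getSymbolicUF().replaceAllUsesWith(UF); // needs a non-const VPValue &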

Comment on lines +4311 to +4312 (Collaborator):

Suggested change:
-  /// Returns the symbolic UF of the vector loop region.
-  VPValue &getSymbolicUF() { return UF; };
+  /// Returns the UF of the vector loop region.
+  VPValue &getUF() { return UF; };

to be consistent with VF and VFxUF, which may also be symbolic; or at least rename UF to be SymbolicUF.
This would require renaming the existing getUF(), which returns unsigned, to, say, getFixedUF(). (Can also provide getFixedVF(), getFixedVFxUF() to support the fixed-VF case.)
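A sketch of the naming scheme floated above (hypothetical; the PR keeps getSymbolicUF() alongside the existing unsigned getUF()):

  /// Returns the symbolic UF of the vector loop region as a VPValue.
  VPValue &getUF() { return UF; }
  /// Returns the fixed unroll factor (what the existing unsigned getUF()
  /// currently returns).
  unsigned getFixedUF() const;
  // Optionally, getFixedVF() / getFixedVFxUF() for the fixed-VF case.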


/// Returns VF * UF of the vector loop region.
VPValue &getVFxUF() { return VFxUF; }

@@ -4277,6 +4283,12 @@ class VPlan {

void addVF(ElementCount VF) { VFs.insert(VF); }

/// Remove \p VF from the plan.
void removeVF(ElementCount VF) {
assert(hasVF(VF) && "tried to remove VF not present in plan");
VFs.remove(VF);
}

Comment on lines +4323 to +4328 (Collaborator):
Better place removeVF() after rather than between addVF() and setVF()?

void setVF(ElementCount VF) {
assert(hasVF(VF) && "Cannot set VF not already in plan");
VFs.clear();
116 changes: 75 additions & 41 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -3862,6 +3862,9 @@ void VPlanTransforms::materializeVFAndVFxUF(VPlan &Plan, VPBasicBlock *VectorPH,
// used.
// TODO: Assert that they aren't used.
Comment on lines 3960 to 3961 (Collaborator):
Above comment and TODO apply to Plan.getUF as well?


VPValue *UF = Plan.getOrAddLiveIn(ConstantInt::get(TCTy, Plan.getUF()));
Plan.getSymbolicUF().replaceAllUsesWith(UF);
Comment (Collaborator):
Better rename materializeVFAndVFxUF() now to, say, materializeVFAndUF() or materializeFactors()?


// If there are no users of the runtime VF, compute VFxUF by constant folding
// the multiplication of VF and UF.
if (VF.getNumUsers() == 0) {
@@ -3881,7 +3884,6 @@
}
VF.replaceAllUsesWith(RuntimeVF);

VPValue *UF = Plan.getOrAddLiveIn(ConstantInt::get(TCTy, Plan.getUF()));
VPValue *MulByUF = Builder.createNaryOp(Instruction::Mul, {RuntimeVF, UF});
VFxUF.replaceAllUsesWith(MulByUF);
}
Expand Down Expand Up @@ -3949,14 +3951,14 @@ static bool canNarrowLoad(VPWidenRecipe *WideMember0, unsigned OpIdx,
return false;
Comment (Collaborator):
(Independent) Above talks about "a narrower recipe", would be good to clarify how narrower.

}

/// Returns true if \p IR is a full interleave group with factor and number of
Comment (Collaborator):
(Independent) \p IR should be \p InterleaveR

/// members both equal to \p VF. The interleave group must also access the full
/// vector width \p VectorRegWidth.
static bool isConsecutiveInterleaveGroup(VPInterleaveRecipe *InterleaveR,
unsigned VF, VPTypeAnalysis &TypeInfo,
unsigned VectorRegWidth) {
/// Returns VF from \p VFs if \p IR is a full interleave group with factor and
Comment (Collaborator):
Perhaps should be renamed get*() rather than is*().

/// number of members both equal to VF. The interleave group must also access
/// the full vector width.
static std::optional<ElementCount> isConsecutiveInterleaveGroup(
VPInterleaveRecipe *InterleaveR, ArrayRef<ElementCount> VFs,
Comment (Collaborator):
isConsecutiveInterleaveGroup() is extended from accepting a single, fixed unsigned VF to working with an ArrayRef of VFs - which can now also be scalable? No, scalable VFs are already supported. The confusion stemmed from naming the unsigned parameter of isConsecutiveInterleaveGroup() "VF" instead of "VFMinVal".

VPTypeAnalysis &TypeInfo, const TargetTransformInfo &TTI) {
if (!InterleaveR)
return false;
return std::nullopt;

Type *GroupElementTy = nullptr;
if (InterleaveR->getStoredValues().empty()) {
Comment (Collaborator):
(Independent) Would be good to have InterleaveR->is[Interleaved]Store(), InterleaveR->is[Interleaved]Load().
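Hypothetical helpers along the lines suggested above (names are assumptions, not part of the PR); they would wrap the getStoredValues() emptiness check this transform already relies on:

  // In VPInterleaveRecipe:
  bool isInterleavedStore() const { return !getStoredValues().empty(); }
  bool isInterleavedLoad() const { return getStoredValues().empty(); }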

@@ -3965,21 +3967,35 @@ static bool isConsecutiveInterleaveGroup(VPInterleaveRecipe *InterleaveR,
[&TypeInfo, GroupElementTy](VPValue *Op) {
return TypeInfo.inferScalarType(Op) == GroupElementTy;
Comment (Collaborator):
(Independent) Can distinct element types be grouped into an interleaved load, or can this currently be an assert?
Same for interleaved stores below.

}))
return false;
return std::nullopt;
} else {
GroupElementTy =
TypeInfo.inferScalarType(InterleaveR->getStoredValues()[0]);
if (!all_of(InterleaveR->getStoredValues(),
[&TypeInfo, GroupElementTy](VPValue *Op) {
return TypeInfo.inferScalarType(Op) == GroupElementTy;
}))
return false;
return std::nullopt;
}

unsigned GroupSize = GroupElementTy->getScalarSizeInBits() * VF;
Comment (Collaborator):
(Independent) GroupSizeInBits?

auto IG = InterleaveR->getInterleaveGroup();
return IG->getFactor() == VF && IG->getNumMembers() == VF &&
GroupSize == VectorRegWidth;
Comment on lines -4079 to -4080 (Collaborator):
(Independent) The IG->getFactor() == VF and GroupSize == VectorRegWidth restrictions correspond to narrowing interleave groups into regular VF-wide loads/stores and scaling the induction step accordingly, and could potentially be lifted if multiple VF-wide loads/store are generated? An interleaved load feeding an interleaved store amounts to a wide-load --> shuffle --> de-shuffle --> wide-store, where the two shuffles could be folded together yielding wide-load --> wide-store, regardless of how many vector registers are needed to legalize wide-load and wide-store? Splitting / materializing InterleaveGroup recipes into wide load/store and shuffle recipes may simplify optimizing away shuffle/de-shuffle pairs, and seems in general aligned with having VPlan support SLP.
Admittedly, if after splitting the shuffles remain, it would be good to pack them back into interleave groups for costing.

auto GetVectorWidthForVF = [&TTI](ElementCount VF) {
Comment (Collaborator):

Suggested change:
-  auto GetVectorWidthForVF = [&TTI](ElementCount VF) {
+  auto GetVectorBitWidthForVF = [&TTI](ElementCount VF) {

?

Comment (Collaborator):
This depends only on whether VF is scalable or fixed, and is independent of VF's actual value? I.e., it can be a function of VF.isFixed() rather than of VF (sketched after the lambda below).

TypeSize Size = TTI.getRegisterBitWidth(
VF.isFixed() ? TargetTransformInfo::RGK_FixedWidthVector
: TargetTransformInfo::RGK_ScalableVector);
assert(Size.isScalable() == VF.isScalable() &&
"if Size is scalable, VF must be too, and vice versa");
return Size.getKnownMinValue();
};
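A sketch of the simplification suggested in the comment above: query the register width by fixed/scalable kind only, independent of VF's numeric value (helper name is mine):

  auto GetVectorRegWidth = [&TTI](bool IsFixedVF) -> unsigned {
    TypeSize Size = TTI.getRegisterBitWidth(
        IsFixedVF ? TargetTransformInfo::RGK_FixedWidthVector
                  : TargetTransformInfo::RGK_ScalableVector);
    return Size.getKnownMinValue();
  };
  // Inside the per-VF loop below: GetVectorRegWidth(VF.isFixed())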

for (ElementCount VF : VFs) {
unsigned MinVal = VF.getKnownMinValue();
unsigned GroupSize = GroupElementTy->getScalarSizeInBits() * MinVal;
auto IG = InterleaveR->getInterleaveGroup();
if (IG->getFactor() == MinVal && IG->getNumMembers() == MinVal &&
Comment on lines +4091 to +4092 (Collaborator):
Can pre-check that IG's factor equals IG's number of members.

GroupSize == GetVectorWidthForVF(VF))
return {VF};
}
return std::nullopt;
}
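A sketch of the pre-check suggested above, hoisted out of the per-VF loop since it does not depend on VF; a possible restructuring, not the PR's code:

  auto IG = InterleaveR->getInterleaveGroup();
  if (IG->getFactor() != IG->getNumMembers())
    return std::nullopt;
  for (ElementCount VF : VFs) {
    unsigned MinVal = VF.getKnownMinValue();
    unsigned GroupSize = GroupElementTy->getScalarSizeInBits() * MinVal;
    if (IG->getFactor() == MinVal && GroupSize == GetVectorWidthForVF(VF))
      return {VF};
  }
  return std::nullopt;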

/// Returns true if \p VPValue is a narrow VPValue.
Comment (Collaborator):
(Independent) Would be good to define narrow here and/or elsewhere.

@@ -3990,16 +4006,18 @@ static bool isAlreadyNarrow(VPValue *VPV) {
return RepR && RepR->isSingleScalar();
}

void VPlanTransforms::narrowInterleaveGroups(VPlan &Plan, ElementCount VF,
unsigned VectorRegWidth) {
std::unique_ptr<VPlan>
VPlanTransforms::narrowInterleaveGroups(VPlan &Plan,
const TargetTransformInfo &TTI) {
using namespace llvm::VPlanPatternMatch;
VPRegionBlock *VectorLoop = Plan.getVectorLoopRegion();

if (!VectorLoop)
return;
return nullptr;

VPTypeAnalysis TypeInfo(Plan);

unsigned VFMinVal = VF.getKnownMinValue();
SmallVector<VPInterleaveRecipe *> StoreGroups;
std::optional<ElementCount> VFToOptimize;
for (auto &R : *VectorLoop->getEntryBasicBlock()) {
Comment (Collaborator):
(Independent) Checking recipes of entry BB only?

if (isa<VPCanonicalIVPHIRecipe>(&R) ||
match(&R, m_BranchOnCount(m_VPValue(), m_VPValue())))
@@ -4014,30 +4032,33 @@ void VPlanTransforms::narrowInterleaveGroups(VPlan &Plan, ElementCount VF,
// * recipes writing to memory except interleave groups
// Only support plans with a canonical induction phi.
if (R.isPhi())
return;
return nullptr;

auto *InterleaveR = dyn_cast<VPInterleaveRecipe>(&R);
if (R.mayWriteToMemory() && !InterleaveR)
return;

// Do not narrow interleave groups if there are VectorPointer recipes and
// the plan was unrolled. The recipe implicitly uses VF from
// VPTransformState.
// TODO: Remove restriction once the VF for the VectorPointer offset is
// modeled explicitly as operand.
if (isa<VPVectorPointerRecipe>(&R) && Plan.getUF() > 1)
return;
Comment on lines -4120 to -4126 (Collaborator):
This TODO taken care of? Below asserts that vector pointer recipes are absent.

return nullptr;

// All other ops are allowed, but we reject uses that cannot be converted
// when checking all allowed consumers (store interleave groups) below.
if (!InterleaveR)
continue;

// Bail out on non-consecutive interleave groups.
if (!isConsecutiveInterleaveGroup(InterleaveR, VFMinVal, TypeInfo,
VectorRegWidth))
return;

// Try to find a single VF, where all interleave groups are consecutive and
// saturate the full vector width. If we already have a candidate VF, check
// if it is applicable for the current InterleaveR, otherwise look for a
// suitable VF across the Plans VFs.
//
if (VFToOptimize) {
Comment (Collaborator):
Unify using VFs = VFToOptimize ? {*VFToOptimize} : Plan.vectorFactors()?

if (!isConsecutiveInterleaveGroup(InterleaveR, {*VFToOptimize}, TypeInfo,
TTI))
return nullptr;
} else {
if (auto VF = isConsecutiveInterleaveGroup(
InterleaveR, to_vector(Plan.vectorFactors()), TypeInfo, TTI))
VFToOptimize = *VF;
else
return nullptr;
}
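A sketch of the unification suggested in the comment above the if/else (hypothetical refactoring, variable name is mine):

  SmallVector<ElementCount> CandidateVFs;
  if (VFToOptimize)
    CandidateVFs.push_back(*VFToOptimize);
  else
    append_range(CandidateVFs, Plan.vectorFactors());
  if (auto VF =
          isConsecutiveInterleaveGroup(InterleaveR, CandidateVFs, TypeInfo, TTI))
    VFToOptimize = *VF;
  else
    return nullptr;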
// Skip read interleave groups.
if (InterleaveR->getStoredValues().empty())
Comment (Collaborator):
(Independent) May be good to rename InterleaveR into InterleavedStore, at-least from here on.

continue;
Comment (Collaborator):
(Independent) What if below Member0 is already narrow but not all stored values are the same?

Comment (Collaborator):
(Independent) Better to check indices of members in IG rather than match the order of VPValues defined by interleaved load recipe to the order of interleaved store operands? Or verify that these recipes retain these orders.

@@ -4071,24 +4092,34 @@ void VPlanTransforms::narrowInterleaveGroups(VPlan &Plan, ElementCount VF,
auto *WideMember0 = dyn_cast_or_null<VPWidenRecipe>(
InterleaveR->getStoredValues()[0]->getDefiningRecipe());
if (!WideMember0)
return;
return nullptr;
for (const auto &[I, V] : enumerate(InterleaveR->getStoredValues())) {
auto *R = dyn_cast_or_null<VPWidenRecipe>(V->getDefiningRecipe());
if (!R || R->getOpcode() != WideMember0->getOpcode() ||
R->getNumOperands() > 2)
return;
return nullptr;
if (any_of(enumerate(R->operands()),
[WideMember0, Idx = I](const auto &P) {
const auto &[OpIdx, OpV] = P;
return !canNarrowLoad(WideMember0, OpIdx, OpV, Idx);
}))
return;
return nullptr;
}
StoreGroups.push_back(InterleaveR);
}

if (StoreGroups.empty())
return;
return nullptr;

// All interleave groups in Plan can be narrowed for VFToOptimize. Split the
Comment (Collaborator):
Worth adding a "TODO: Handle cases where only some interleave groups can be narrowed."?
This transform pivots the dimension of vectorization from fully loop-based to fully SLP, affecting all recipes. The motivation stems from at least one SLP tree, and it works well when all recipes of the loop lie on SLP trees, but may still be beneficial even if some recipes remain scalar - outside of any SLP tree.

// original Plan into 2: a) a new clone which contains all VFs of Plan, except
// VFToOptimize, and b) the original Plan with VFToOptimize as single VF.
std::unique_ptr<VPlan> NewPlan;
if (size(Plan.vectorFactors()) != 1) {
NewPlan = std::unique_ptr<VPlan>(Plan.duplicate());
Plan.setVF(*VFToOptimize);
NewPlan->removeVF(*VFToOptimize);
}

// Convert InterleaveGroup \p R to a single VPWidenLoadRecipe.
SmallPtrSet<VPValue *, 4> NarrowedOps;
@@ -4159,9 +4190,8 @@ void VPlanTransforms::narrowInterleaveGroups(VPlan &Plan, ElementCount VF,
auto *Inc = cast<VPInstruction>(CanIV->getBackedgeValue());
VPBuilder PHBuilder(Plan.getVectorPreheader());

VPValue *UF = Plan.getOrAddLiveIn(
ConstantInt::get(CanIV->getScalarType(), 1 * Plan.getUF()));
if (VF.isScalable()) {
VPValue *UF = &Plan.getSymbolicUF();
if (VFToOptimize->isScalable()) {
VPValue *VScale = PHBuilder.createElementCount(
CanIV->getScalarType(), ElementCount::getScalable(1));
VPValue *VScaleUF = PHBuilder.createNaryOp(Instruction::Mul, {VScale, UF});
@@ -4173,6 +4203,10 @@ void VPlanTransforms::narrowInterleaveGroups(VPlan &Plan, ElementCount VF,
Plan.getOrAddLiveIn(ConstantInt::get(CanIV->getScalarType(), 1)));
Comment (Collaborator):
The VF of Plan is set to 1 to affect the induction recipes that use it, in order to de-vectorize the loop, but the widened load and store recipes (that replace the interleaved loads and stores) are still to generate vector instructions according to the original VF. Would be good to clarify this discrepancy.

}
removeDeadRecipes(Plan);
assert(none_of(*VectorLoop->getEntryBasicBlock(),
Comment (Collaborator):
Again attention is given to entry BB only.

IsaPred<VPVectorPointerRecipe>) &&
"All VPVectorPointerRecipes should have been removed");
Comment on lines +4303 to +4305 (Collaborator):
This corresponds to the original constraint that UF must be 1 if vector pointer recipes are present?

return NewPlan;
}

/// Add branch weight metadata, if the \p Plan's middle block is terminated by a
22 changes: 14 additions & 8 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -340,14 +340,20 @@ struct VPlanTransforms {
static DenseMap<const SCEV *, Value *> expandSCEVs(VPlan &Plan,
ScalarEvolution &SE);

/// Try to convert a plan with interleave groups with VF elements to a plan
/// with the interleave groups replaced by wide loads and stores processing VF
/// elements, if all transformed interleave groups access the full vector
/// width (checked via \o VectorRegWidth). This effectively is a very simple
/// form of loop-aware SLP, where we use interleave groups to identify
/// candidates.
static void narrowInterleaveGroups(VPlan &Plan, ElementCount VF,
unsigned VectorRegWidth);
/// Try to find a single VF among \p Plan's VFs for which all interleave
/// groups (with known minimum VF elements) can be replaced by wide loads and
/// stores processing VF elements, if all transformed interleave groups access
/// the full vector width (checked via the maximum vector register width). If
/// the transformation can be applied, the original \p Plan will be split in
/// 2:
/// 1. The original Plan with the single VF containing the optimized recipes
/// using wide loads instead of interleave groups.
/// 2. A new clone which contains all VFs of Plan except the optimized VF.
///
/// This effectively is a very simple form of loop-aware SLP, where we use
/// interleave groups to identify candidates.
static std::unique_ptr<VPlan>
narrowInterleaveGroups(VPlan &Plan, const TargetTransformInfo &TTI);
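For reference, this matches how the transform is wired up in buildVPlansWithVPRecipes() earlier in this PR:

  if (auto P = VPlanTransforms::narrowInterleaveGroups(*Plan, TTI))
    VPlans.push_back(std::move(P)); // clone carrying the remaining VFs
  assert(verifyVPlanIsValid(*Plan) && "VPlan is invalid");
  VPlans.push_back(std::move(Plan)); // original; holds only the optimized VF if the transform applied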
Comment (Collaborator):
More important than "narrowing" is the "pivoting" of the vectorization dimension from being loop-based to being SLP-based, thereby eliminating shuffle-de-shuffle redundancies. This can be achieved w/o narrowing, provided support for very-wide load/store recipes or emission of multiple wide load/store recipes instead of emitting only single ones.


/// Predicate and linearize the control-flow in the only loop region of
/// \p Plan. If \p FoldTail is true, create a mask guarding the loop
Comment (Contributor):
Can you add a scalable vector version of at least one of these tests please? I tested this file with this PR and ran opt -p loop-vectorize -mcpu=neoverse-v1 and we generate IR like this for test_add_double_same_const_args_1:

  %wide.load = load <vscale x 2 x double>, ptr %9, align 4
  %wide.load1 = load <vscale x 2 x double>, ptr %10, align 4
  %11 = fadd <vscale x 2 x double> %wide.load, splat (double 1.000000e+00)
  %12 = fadd <vscale x 2 x double> %wide.load1, splat (double 1.000000e+00)
...
  store <vscale x 2 x double> %11, ptr %13, align 4
  store <vscale x 2 x double> %12, ptr %14, align 4

Reply (Contributor Author):
I added a RUN line to the scalable test file w/o forced interleaving. I think that should add the missing coverage. Could also add additional tests there.

@@ -175,28 +175,18 @@ define void @test_add_double_same_var_args_1(ptr %res, ptr noalias %A, ptr noali
; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
; CHECK: [[VECTOR_BODY]]:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 2
; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 1
; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i64 [[INDEX]]
; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i64 [[TMP0]]
; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <4 x double>, ptr [[TMP1]], align 4
; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <4 x double> [[WIDE_VEC]], <4 x double> poison, <2 x i32> <i32 0, i32 2>
; CHECK-NEXT: [[STRIDED_VEC1:%.*]] = shufflevector <4 x double> [[WIDE_VEC]], <4 x double> poison, <2 x i32> <i32 1, i32 3>
; CHECK-NEXT: [[WIDE_VEC2:%.*]] = load <4 x double>, ptr [[TMP2]], align 4
; CHECK-NEXT: [[STRIDED_VEC3:%.*]] = shufflevector <4 x double> [[WIDE_VEC2]], <4 x double> poison, <2 x i32> <i32 0, i32 2>
; CHECK-NEXT: [[STRIDED_VEC4:%.*]] = shufflevector <4 x double> [[WIDE_VEC2]], <4 x double> poison, <2 x i32> <i32 1, i32 3>
; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[STRIDED_VEC]], [[BROADCAST_SPLAT]]
; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[STRIDED_VEC3]], [[BROADCAST_SPLAT]]
; CHECK-NEXT: [[STRIDED_VEC1:%.*]] = load <2 x double>, ptr [[TMP1]], align 4
; CHECK-NEXT: [[STRIDED_VEC4:%.*]] = load <2 x double>, ptr [[TMP2]], align 4
; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[STRIDED_VEC1]], [[BROADCAST_SPLAT]]
; CHECK-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[STRIDED_VEC4]], [[BROADCAST_SPLAT]]
; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[RES]], i64 [[INDEX]]
; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[RES]], i64 [[TMP0]]
; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <4 x double> [[TMP9]], <4 x double> poison, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
; CHECK-NEXT: store <4 x double> [[INTERLEAVED_VEC]], ptr [[TMP7]], align 4
; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x double> [[TMP4]], <2 x double> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[INTERLEAVED_VEC5:%.*]] = shufflevector <4 x double> [[TMP10]], <4 x double> poison, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
; CHECK-NEXT: store <4 x double> [[INTERLEAVED_VEC5]], ptr [[TMP8]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
; CHECK-NEXT: store <2 x double> [[TMP5]], ptr [[TMP7]], align 4
; CHECK-NEXT: store <2 x double> [[TMP6]], ptr [[TMP8]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], 100
; CHECK-NEXT: br i1 [[TMP11]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
@@ -237,28 +227,18 @@ define void @test_add_double_same_var_args_2(ptr %res, ptr noalias %A, ptr noali
; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
; CHECK: [[VECTOR_BODY]]:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 2
; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 1
; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i64 [[INDEX]]
; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[A]], i64 [[TMP0]]
; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <4 x double>, ptr [[TMP1]], align 4
; CHECK-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <4 x double> [[WIDE_VEC]], <4 x double> poison, <2 x i32> <i32 0, i32 2>
; CHECK-NEXT: [[STRIDED_VEC1:%.*]] = shufflevector <4 x double> [[WIDE_VEC]], <4 x double> poison, <2 x i32> <i32 1, i32 3>
; CHECK-NEXT: [[WIDE_VEC2:%.*]] = load <4 x double>, ptr [[TMP2]], align 4
; CHECK-NEXT: [[STRIDED_VEC3:%.*]] = shufflevector <4 x double> [[WIDE_VEC2]], <4 x double> poison, <2 x i32> <i32 0, i32 2>
; CHECK-NEXT: [[STRIDED_VEC4:%.*]] = shufflevector <4 x double> [[WIDE_VEC2]], <4 x double> poison, <2 x i32> <i32 1, i32 3>
; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[BROADCAST_SPLAT]], [[STRIDED_VEC]]
; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[BROADCAST_SPLAT]], [[STRIDED_VEC3]]
; CHECK-NEXT: [[STRIDED_VEC1:%.*]] = load <2 x double>, ptr [[TMP1]], align 4
; CHECK-NEXT: [[STRIDED_VEC4:%.*]] = load <2 x double>, ptr [[TMP2]], align 4
; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[BROADCAST_SPLAT]], [[STRIDED_VEC1]]
; CHECK-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[BROADCAST_SPLAT]], [[STRIDED_VEC4]]
; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[RES]], i64 [[INDEX]]
; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds nuw { double, double }, ptr [[RES]], i64 [[TMP0]]
; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <4 x double> [[TMP9]], <4 x double> poison, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
; CHECK-NEXT: store <4 x double> [[INTERLEAVED_VEC]], ptr [[TMP7]], align 4
; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x double> [[TMP4]], <2 x double> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[INTERLEAVED_VEC5:%.*]] = shufflevector <4 x double> [[TMP10]], <4 x double> poison, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
; CHECK-NEXT: store <4 x double> [[INTERLEAVED_VEC5]], ptr [[TMP8]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
; CHECK-NEXT: store <2 x double> [[TMP5]], ptr [[TMP7]], align 4
; CHECK-NEXT: store <2 x double> [[TMP6]], ptr [[TMP8]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], 100
; CHECK-NEXT: br i1 [[TMP11]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]: