5 changes: 3 additions & 2 deletions llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -209,9 +209,10 @@ struct TailFoldingInfo {
TargetLibraryInfo *TLI;
LoopVectorizationLegality *LVL;
InterleavedAccessInfo *IAI;
bool UseWideLaneMask;
TailFoldingInfo(TargetLibraryInfo *TLI, LoopVectorizationLegality *LVL,
InterleavedAccessInfo *IAI)
: TLI(TLI), LVL(LVL), IAI(IAI) {}
InterleavedAccessInfo *IAI, bool UseWideLaneMask = false)
Contributor

This flag is set from a loop vectoriser flag called 'EnableWideActiveLaneMask', which to me isn't the same thing as 'UseWideLaneMask'. The latter makes it sound like a decision has already been made, whereas the former sounds like a possibility that the target can take up if it wishes.

: TLI(TLI), LVL(LVL), IAI(IAI), UseWideLaneMask(UseWideLaneMask) {}
};

class TargetTransformInfo;
25 changes: 18 additions & 7 deletions llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -957,10 +957,18 @@ AArch64TTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
return TyL.first + ExtraCost;
}
case Intrinsic::get_active_lane_mask: {
auto *RetTy = dyn_cast<FixedVectorType>(ICA.getReturnType());
if (RetTy) {
EVT RetVT = getTLI()->getValueType(DL, RetTy);
EVT OpVT = getTLI()->getValueType(DL, ICA.getArgTypes()[0]);
auto RetTy = cast<VectorType>(ICA.getReturnType());
EVT RetVT = getTLI()->getValueType(DL, RetTy);
EVT OpVT = getTLI()->getValueType(DL, ICA.getArgTypes()[0]);
if (RetTy->isScalableTy()) {
if (getTLI()->shouldExpandGetActiveLaneMask(RetVT, OpVT) ||
(!ST->hasSVE2p1() && !ST->hasSME2()) ||
TLI->getTypeAction(RetTy->getContext(), RetVT) !=
TargetLowering::TypeSplitVector)
break;
auto LT = getTypeLegalizationCost(RetTy);
return LT.first / 2;
} else {
if (!getTLI()->shouldExpandGetActiveLaneMask(RetVT, OpVT) &&
!getTLI()->isTypeLegal(RetVT)) {
// We don't have enough context at this point to determine if the mask
@@ -972,7 +980,7 @@ AArch64TTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
// NOTE: getScalarizationOverhead returns a cost that's far too
// pessimistic for the actual generated codegen. In reality there are
// two instructions generated per lane.
return RetTy->getNumElements() * 2;
return cast<FixedVectorType>(RetTy)->getNumElements() * 2;
}
}
break;
@@ -6146,8 +6154,11 @@ bool AArch64TTIImpl::preferPredicateOverEpilogue(TailFoldingInfo *TFI) const {
if (Required == TailFoldingOpts::Disabled)
Required |= TailFoldingOpts::Simple;

if (!TailFoldingOptionLoc.satisfies(ST->getSVETailFoldingDefaultOpts(),
Required))
TailFoldingOpts DefaultOpts = ST->getSVETailFoldingDefaultOpts();
Contributor

I'm not sure about this behaviour, to be honest. Enabling wide lane masks doesn't by itself imply that we should tail-fold all simple loops. I see two issues here:

  1. I think if we want to go down the route of forcing tail-folding with wide lane masks then whatever flags we use should probably be named more appropriately, i.e. ForceTailFoldingWithWideLaneMasks or something like that.
  2. It gives users no option to enable tail-folding with wide lane masks only for reductions, etc. For example, -mllvm -enable-wide-lane-mask -mllvm -sve-tail-folding=reductions will never work.

If you want a way to force simple tail-folding with wide lane masks it might be better to use a target flag that lives in this file. For example, you could add a new option to -sve-tail-folding, i.e. something like -sve-tail-folding=simple+widelanemasks. This way also gives you the option of getting more fine-grained testing with just reductions, recurrences, etc.
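To make the suggestion concrete, here is a minimal, self-contained sketch of how a '+'-separated option string such as simple+widelanemasks could be parsed into a bitmask. Everything in it is an assumption for illustration: the FoldOpts enum, the WideLaneMasks bit, and parseFoldOpts are hypothetical names and are not the existing parser in this file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical stand-in for the target's tail-folding option bits.
// WideLaneMasks is the proposed new bit; it is an assumption of this sketch.
enum class FoldOpts : std::uint8_t {
  Disabled = 0x00,
  Simple = 0x01,
  Reductions = 0x02,
  Recurrences = 0x04,
  WideLaneMasks = 0x08,
};

constexpr FoldOpts operator|(FoldOpts A, FoldOpts B) {
  return static_cast<FoldOpts>(static_cast<std::uint8_t>(A) |
                               static_cast<std::uint8_t>(B));
}

constexpr bool hasOpt(FoldOpts Set, FoldOpts Bit) {
  return (static_cast<std::uint8_t>(Set) & static_cast<std::uint8_t>(Bit)) != 0;
}

// Parse a '+'-separated list such as "simple+widelanemasks".
// Returns std::nullopt if any token is not recognised.
std::optional<FoldOpts> parseFoldOpts(std::string_view Str) {
  FoldOpts Result = FoldOpts::Disabled;
  while (!Str.empty()) {
    std::size_t Plus = Str.find('+');
    std::string_view Tok = Str.substr(0, Plus);
    Str = (Plus == std::string_view::npos) ? std::string_view()
                                           : Str.substr(Plus + 1);
    if (Tok == "simple")
      Result = Result | FoldOpts::Simple;
    else if (Tok == "reductions")
      Result = Result | FoldOpts::Reductions;
    else if (Tok == "recurrences")
      Result = Result | FoldOpts::Recurrences;
    else if (Tok == "widelanemasks")
      Result = Result | FoldOpts::WideLaneMasks;
    else
      return std::nullopt;
  }
  return Result;
}
```

With this shape, reductions+widelanemasks sets the wide-lane-mask bit without also setting Simple, which is the fine-grained control the current patch cannot express.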

You could rename the loop vectoriser flag EnableWideActiveLaneMask to ForceWideLaneMasks or UseWideLaneMasks so that there is still a way to test this for other targets, but it would be off by default. If the flag is off, the target would have another chance to opt in based on its own preference. I guess you'd also have to change this interface to return an enum rather than a boolean. How about something like:

enum PreferredPredicationStyle {
  None,
  PredicatedBody,
  PredicatedBodyWideLaneMasks
};

or something like that?
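As a rough illustration of how a caller might consume such an enum, here is a self-contained sketch. The TargetPrefs struct and choosePredicationStyle function are hypothetical, and the sketch assumes the forcing flag only selects the mask style once tail-folding has already been chosen; it does not force tail-folding itself.

```cpp
#include <cassert>

enum class PreferredPredicationStyle {
  None,
  PredicatedBody,
  PredicatedBodyWideLaneMasks
};

// Hypothetical summary of what the target would answer for a given loop.
struct TargetPrefs {
  bool PrefersTailFolding;   // target wants a predicated loop body
  bool PrefersWideLaneMasks; // target opts in to wide lane masks
};

// Assumption: ForceWideLaneMasks mirrors an off-by-default testing flag.
// It upgrades the style only when tail-folding is already preferred.
PreferredPredicationStyle choosePredicationStyle(bool ForceWideLaneMasks,
                                                 const TargetPrefs &Target) {
  if (!Target.PrefersTailFolding)
    return PreferredPredicationStyle::None;
  if (ForceWideLaneMasks || Target.PrefersWideLaneMasks)
    return PreferredPredicationStyle::PredicatedBodyWideLaneMasks;
  return PreferredPredicationStyle::PredicatedBody;
}
```

The point of the three-way result is that "tail-fold" and "tail-fold with wide lane masks" become separate answers, so the flag no longer has to piggy-back on TailFoldingOpts::Simple.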

if (TFI->UseWideLaneMask)
DefaultOpts |= TailFoldingOpts::Simple;

if (!TailFoldingOptionLoc.satisfies(DefaultOpts, Required))
Contributor

I'm not sure it's a good idea to ignore the instruction threshold below because the problem that exists for normal tail-folding will also exist for wide lane masks. If the user really wants to test the special case of tail-folding for small loops they can always do it in conjunction with -sve-tail-folding-insn-threshold=0.
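In other words, something with roughly the following shape, where the threshold check applies whether or not wide lane masks are in use. This is a simplified sketch with hypothetical parameter names, not the actual body of preferPredicateOverEpilogue; the instruction count and threshold are passed in directly so the logic can be read in isolation.

```cpp
#include <cassert>

// Sketch: decide whether to tail-fold. TailFoldingAllowed stands in for the
// result of the option check above; UseWideLaneMask is deliberately ignored
// so that small loops are treated the same way in both modes.
bool shouldTailFold(unsigned NumInsns, unsigned InsnThreshold,
                    bool TailFoldingAllowed, bool UseWideLaneMask) {
  (void)UseWideLaneMask; // no special case for wide lane masks
  if (!TailFoldingAllowed)
    return false;
  // Tight loops below the threshold are left to interleaving instead.
  return NumInsns >= InsnThreshold;
}
```

A user who wants to exercise small loops with wide lane masks can still do so by lowering the threshold to 0, in which case the final comparison is always true.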

return false;

// Don't tail-fold for tight loops where we would be better off interleaving
17 changes: 15 additions & 2 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -249,6 +249,10 @@ static cl::opt<TailFoldingStyle> ForceTailFoldingStyle(
"Use predicated EVL instructions for tail folding. If EVL "
"is unsupported, fallback to data-without-lane-mask.")));

cl::opt<bool> llvm::EnableWideActiveLaneMask(
"enable-wide-lane-mask", cl::init(false), cl::Hidden,
cl::desc("Enable use of wide get active lane mask instructions"));

static cl::opt<bool> MaximizeBandwidth(
"vectorizer-maximize-bandwidth", cl::init(false), cl::Hidden,
cl::desc("Maximize bandwidth when selecting vectorization factor which "
@@ -1346,6 +1350,15 @@ class LoopVectorizationCostModel {
return getTailFoldingStyle() != TailFoldingStyle::None;
}

bool useWideActiveLaneMask() const {
if (!EnableWideActiveLaneMask)
return false;

TailFoldingStyle TF = getTailFoldingStyle();
return TF == TailFoldingStyle::DataAndControlFlow ||
TF == TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck;
}

/// Return maximum safe number of elements to be processed per vector
/// iteration, which do not prevent store-load forwarding and are safe with
/// regard to the memory dependencies. Required for EVL-based VPlans to
@@ -4518,7 +4531,7 @@ LoopVectorizationPlanner::selectInterleaveCount(VPlan &Plan, ElementCount VF,
// 3. We don't interleave if we think that we will spill registers to memory
// due to the increased register pressure.

if (!CM.isScalarEpilogueAllowed())
if (!CM.isScalarEpilogueAllowed() && !CM.useWideActiveLaneMask())
Contributor

This deserves an explanation of why useWideActiveLaneMask should be treated differently here. This check may serve as a proxy for optimizing for size. Could you add a test to make sure we do not interleave with wide active lane masks when optimizing for size?
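One possible way to make the size case explicit rather than implicit is sketched below. The LoopState struct and mustNotInterleave helper are hypothetical and only model the condition under discussion; the OptForSize field is an assumption, since the current check has no such input.

```cpp
#include <cassert>

// Hypothetical snapshot of the cost-model state the check depends on.
struct LoopState {
  bool ScalarEpilogueAllowed;
  bool UseWideActiveLaneMask;
  bool OptForSize; // assumption: made explicit instead of implied
};

// Returns true when the interleave count should be forced to 1.
bool mustNotInterleave(const LoopState &S) {
  // Never interleave when optimizing for size, wide lane masks or not.
  if (S.OptForSize)
    return true;
  // Otherwise keep the patch's behaviour: a disallowed scalar epilogue
  // blocks interleaving unless wide active lane masks are in use.
  return !S.ScalarEpilogueAllowed && !S.UseWideActiveLaneMask;
}
```

A regression test for the size case would then correspond to the first branch: wide lane masks enabled, size optimization on, interleave count expected to stay at 1.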

return 1;

if (any_of(Plan.getVectorLoopRegion()->getEntryBasicBlock()->phis(),
@@ -8995,7 +9008,7 @@ static ScalarEpilogueLowering getScalarEpilogueLowering(
};

// 4) if the TTI hook indicates this is profitable, request predication.
TailFoldingInfo TFI(TLI, &LVL, IAI);
TailFoldingInfo TFI(TLI, &LVL, IAI, EnableWideActiveLaneMask);
if (TTI->preferPredicateOverEpilogue(&TFI))
return CM_ScalarEpilogueNotNeededUsePredicate;

4 changes: 0 additions & 4 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -40,10 +40,6 @@
using namespace llvm;
using namespace VPlanPatternMatch;

static cl::opt<bool> EnableWideActiveLaneMask(
"enable-wide-lane-mask", cl::init(false), cl::Hidden,
cl::desc("Enable use of wide get active lane mask instructions"));

bool VPlanTransforms::tryToConvertVPInstructionsToVPRecipes(
VPlan &Plan,
function_ref<const InductionDescriptor *(PHINode *)>
1 change: 1 addition & 0 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -32,6 +32,7 @@ class VPRecipeBuilder;
struct VFRange;

extern cl::opt<bool> VerifyEachVPlan;
extern cl::opt<bool> EnableWideActiveLaneMask;

struct VPlanTransforms {
/// Helper to run a VPlan transform \p Transform on \p VPlan, forwarding extra