Skip to content

[LV] Vectorize FMax via OrderedFCmpSelect w/o fast-math flags. #146711

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 19 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
19 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 7 additions & 3 deletions llvm/include/llvm/Analysis/IVDescriptors.h
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,9 @@ enum class RecurKind {
FMul, ///< Product of floats.
FMin, ///< FP min implemented in terms of select(cmp()).
FMax, ///< FP max implemented in terms of select(cmp()).
OrderedFCmpSelect, ///< FP max implemented in terms of select(cmp()), but
/// without any fast-math flags. Users need to handle NaNs
/// and signed zeros when generating code.
Comment on lines +50 to +52
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continuing to look for good name(s), how about:

Suggested change
OrderedFCmpSelect, ///< FP max implemented in terms of select(cmp()), but
/// without any fast-math flags. Users need to handle NaNs
/// and signed zeros when generating code.
FMaxOGT, ///< FP max implemented in terms of select(cmp(OGT)), being
/// strict with signed zeroes and NaN's, without fast-math flags.
FMinOLT, ///< FP min implemented in terms of select(cmp(OLT)), being
/// strict with signed zeroes and NaN's, without fast-math flags.

Have distinct cases for max vs. min, as in other cases, or fuse them into one FMinMax?

Rename existing FMax/FMin to FMaxFast/FMinFast?

NaN's are handled the same by OrderedFCmpSelect and by FMax/FMin-with-OGT/OLT as their compare predicate, where the latter assume all elements are non-NaN by FMF; the distinction lies in tie breaking +0/-0?

Need to also indicate if +0/-0 ties are broken according to the sign of the first zero or of last zero, which can be solved by computing index of first max/min or index of last max/min, respectively. Cases FMaxOGE/FMinOLE can be added later to denote +0/-0 ties are broken by last index rather than first index.

("Users need to handle ..." also applies to FMaxNum/FMinNum, suffice for enum to state semantics rather than how to keep them, there and here.)

FMinNum, ///< FP min with llvm.minnum semantics including NaNs.
FMaxNum, ///< FP max with llvm.maxnum semantics including NaNs.
FMinimum, ///< FP min with llvm.minimum semantics
Expand Down Expand Up @@ -252,9 +255,10 @@ class RecurrenceDescriptor {
/// Returns true if the recurrence kind is a floating-point min/max kind.
static bool isFPMinMaxRecurrenceKind(RecurKind Kind) {
return Kind == RecurKind::FMin || Kind == RecurKind::FMax ||
Kind == RecurKind::FMinNum || Kind == RecurKind::FMaxNum ||
Kind == RecurKind::FMinimum || Kind == RecurKind::FMaximum ||
Kind == RecurKind::FMinimumNum || Kind == RecurKind::FMaximumNum;
Kind == RecurKind::OrderedFCmpSelect || Kind == RecurKind::FMinNum ||
Kind == RecurKind::FMaxNum || Kind == RecurKind::FMinimum ||
Kind == RecurKind::FMaximum || Kind == RecurKind::FMinimumNum ||
Kind == RecurKind::FMaximumNum;
}

/// Returns true if the recurrence kind is any min/max kind.
Expand Down
12 changes: 11 additions & 1 deletion llvm/lib/Analysis/IVDescriptors.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -819,7 +819,8 @@ RecurrenceDescriptor::isMinMaxPattern(Instruction *I, RecurKind Kind,
if (match(I, m_OrdOrUnordFMin(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::FMin, I);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only max is handled by OrderedFCmpSelect, not min? Can start w/ FMaxOGT only.

if (match(I, m_OrdOrUnordFMax(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::FMax, I);
return InstDesc(
Kind == RecurKind::FMax || Kind == RecurKind::OrderedFCmpSelect, I);
if (match(I, m_FMinNum(m_Value(), m_Value())))
return InstDesc(Kind == RecurKind::FMin, I);
if (match(I, m_FMaxNum(m_Value(), m_Value())))
Expand Down Expand Up @@ -962,6 +963,14 @@ RecurrenceDescriptor::InstDesc RecurrenceDescriptor::isRecurrenceInstr(
"unexpected recurrence kind for minnum");
return InstDesc(I, RecurKind::FMinNum);
}
if (Kind == RecurKind::FMax || Kind == RecurKind::OrderedFCmpSelect) {
if (isa<SelectInst>(I))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SelectInst need not check that its comparison is ordered and isMinMaxPattern() like FCmpInst below?

return InstDesc(I, RecurKind::OrderedFCmpSelect);
auto *Cmp = dyn_cast<FCmpInst>(I);
if (Cmp && FCmpInst::isOrdered(Cmp->getPredicate()) &&
isMinMaxPattern(I, Kind, Prev).isRecurrence())
return InstDesc(I, RecurKind::OrderedFCmpSelect);
}
Comment on lines +966 to +973
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deserves a comment like the one above for maxnum/minnum handling first non-NaN portion on input?

return InstDesc(false, I);
}
if (isFMulAddIntrinsic(I))
Expand Down Expand Up @@ -1227,6 +1236,7 @@ unsigned RecurrenceDescriptor::getOpcode(RecurKind Kind) {
case RecurKind::UMin:
return Instruction::ICmp;
case RecurKind::FMax:
case RecurKind::OrderedFCmpSelect:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Placed between FMax and FMin?

case RecurKind::FMin:
case RecurKind::FMaximum:
case RecurKind::FMinimum:
Expand Down
3 changes: 3 additions & 0 deletions llvm/lib/Transforms/Utils/LoopUtils.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -937,6 +937,7 @@ constexpr Intrinsic::ID llvm::getReductionIntrinsicID(RecurKind RK) {
return Intrinsic::vector_reduce_umax;
case RecurKind::UMin:
return Intrinsic::vector_reduce_umin;
case RecurKind::OrderedFCmpSelect:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for max, returning Intrinsic::vector_reduce_fmax; need a separate case for min, returning Intrinsic::vector_reduce_fmin as below?

case RecurKind::FMax:
case RecurKind::FMaxNum:
return Intrinsic::vector_reduce_fmax;
Expand Down Expand Up @@ -1088,6 +1089,7 @@ CmpInst::Predicate llvm::getMinMaxReductionPredicate(RecurKind RK) {
return CmpInst::ICMP_SGT;
case RecurKind::FMin:
return CmpInst::FCMP_OLT;
case RecurKind::OrderedFCmpSelect:
case RecurKind::FMax:
return CmpInst::FCMP_OGT;
// We do not add FMinimum/FMaximum recurrence kind here since there is no
Expand Down Expand Up @@ -1310,6 +1312,7 @@ Value *llvm::createSimpleReduction(IRBuilderBase &Builder, Value *Src,
case RecurKind::SMin:
case RecurKind::UMax:
case RecurKind::UMin:
case RecurKind::OrderedFCmpSelect:
case RecurKind::FMax:
case RecurKind::FMin:
Comment on lines +1315 to 1317
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps better placed after fast/simple FMax/FMin, right before complex FMaxNum/FMinNum.

case RecurKind::FMinNum:
Expand Down
18 changes: 10 additions & 8 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4405,13 +4405,15 @@ VectorizationFactor LoopVectorizationPlanner::selectVectorizationFactor() {

bool LoopVectorizationPlanner::isCandidateForEpilogueVectorization(
ElementCount VF) const {
// Cross iteration phis such as fixed-order recurrences and FMaxNum/FMinNum
// reductions need special handling and are currently unsupported.
// Cross iteration phis such as fixed-order recurrences and
// OrderedFCmpSelect/FMaxNum/FMinNum reductions need special handling and are
// currently unsupported.
if (any_of(OrigLoop->getHeader()->phis(), [&](PHINode &Phi) {
if (!Legal->isReductionVariable(&Phi))
return Legal->isFixedOrderRecurrence(&Phi);
RecurKind RK = Legal->getRecurrenceDescriptor(&Phi).getRecurrenceKind();
return RK == RecurKind::FMinNum || RK == RecurKind::FMaxNum;
return RK == RecurKind::OrderedFCmpSelect || RK == RecurKind::FMinNum ||
RK == RecurKind::FMaxNum;
}))
return false;

Expand Down Expand Up @@ -8847,11 +8849,11 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(

// Adjust the recipes for any inloop reductions.
adjustRecipesForReductions(Plan, RecipeBuilder, Range.Start);

// Apply mandatory transformation to handle FP maxnum/minnum reduction with
// NaNs if possible, bail out otherwise.
if (!VPlanTransforms::runPass(VPlanTransforms::handleMaxMinNumReductions,
*Plan))
// Apply mandatory transformation to handle FP maxnum/minnum/OrderedFCmpSelect
// reduction with NaNs and signed-zeros if possible, bail out otherwise.
if (!VPlanTransforms::runPass(
VPlanTransforms::handleMaxMinNumAndOrderedFCmpSelectReductions,
*Plan))
return nullptr;

// Transform recipes to abstract recipes if it is legal and beneficial and
Expand Down
3 changes: 3 additions & 0 deletions llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -23705,6 +23705,7 @@ class HorizontalReduction {
case RecurKind::FindFirstIVUMin:
case RecurKind::FindLastIVSMax:
case RecurKind::FindLastIVUMax:
case RecurKind::OrderedFCmpSelect:
case RecurKind::FMaxNum:
case RecurKind::FMinNum:
case RecurKind::FMaximumNum:
Expand Down Expand Up @@ -23844,6 +23845,7 @@ class HorizontalReduction {
case RecurKind::FindFirstIVUMin:
case RecurKind::FindLastIVSMax:
case RecurKind::FindLastIVUMax:
case RecurKind::OrderedFCmpSelect:
case RecurKind::FMaxNum:
case RecurKind::FMinNum:
case RecurKind::FMaximumNum:
Expand Down Expand Up @@ -23948,6 +23950,7 @@ class HorizontalReduction {
case RecurKind::FindFirstIVUMin:
case RecurKind::FindLastIVSMax:
case RecurKind::FindLastIVUMax:
case RecurKind::OrderedFCmpSelect:
case RecurKind::FMaxNum:
case RecurKind::FMinNum:
case RecurKind::FMaximumNum:
Expand Down
112 changes: 110 additions & 2 deletions llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -654,7 +654,105 @@ void VPlanTransforms::attachCheckBlock(VPlan &Plan, Value *Cond,
}
}

bool VPlanTransforms::handleMaxMinNumReductions(VPlan &Plan) {
static bool handleOrderedFCmpSelect(VPlan &Plan,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth documenting how MaxNum/MinNum and FMaxOGT/FMinOLT are handled, along with examples demonstrating their challenges and how the latter processes the final horizontal reduction to fix its +0/-0 discrepancy by recording first/last argmax/argmin (or sign of first/last zero), and how the former reverts to scalar loop when encountering NaN's (rather than trying a similar fix?).

VPReductionPHIRecipe *RedPhiR) {
VPRegionBlock *LoopRegion = Plan.getVectorLoopRegion();
VPWidenIntOrFpInductionRecipe *WideIV = nullptr;

// MaxOp feeding the reduction phi must be a select (either wide or a
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Names refer to max only. Would be good to clarify parts that apply to min as well, if any.

// replicate recipe), where the phi is the last operand, and the compare
// predicate is strict. This ensures NaNs won't get propagated unless the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictness implies tie-breaker of signed zeroes is first zero, non-strict implies last zero.

Ordered comparison ensures NaNs won't get propagated if initial value is not a NaN, and if it is result will be NaN.

Can the Unordered case (and non-strict until supported by LastIndex) be handled by continuing with scalar epilog when a NaN is encountered, as in FMaxNum/FMinNum. Or does the +0/-0 tie breaking prevent that solution, or requires bailing out if either NaN or zero is encountered.

// initial value is NaN
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// initial value is NaN
// initial value is NaN.

Comment on lines +662 to +665
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be easier to spell out the pattern, e.g.,

//  %RedPhiR = phi(%Start, %MaxOp)
//  ...
//  %Cmp = fcmp(OGT/OLT, %Current, %RedPhiR)
//  %MaxOp = replicate/widen select(%Cmp, %Current, %RedPhiR)

auto *MaxOp = dyn_cast<VPRecipeWithIRFlags>(
RedPhiR->getBackedgeValue()->getDefiningRecipe());
if (!MaxOp)
return false;
auto *RepR = dyn_cast<VPReplicateRecipe>(MaxOp);
if (!isa<VPWidenSelectRecipe>(MaxOp) &&
!(RepR && (isa<SelectInst>(RepR->getUnderlyingInstr()))))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
!(RepR && (isa<SelectInst>(RepR->getUnderlyingInstr()))))
!(RepR && isa<SelectInst>(RepR->getUnderlyingInstr())))

return false;

auto *Cmp = cast<VPRecipeWithIRFlags>(MaxOp->getOperand(0));
if (MaxOp->getOperand(1) == RedPhiR ||
!CmpInst::isStrictPredicate(Cmp->getPredicate()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better also make sure the operands of the strict Cmp are in the expected order - same as in the select MaxOp rather than flipped, and that Cmp isOrdered? I.e., OGT or OLT, excluding UGT and ULT.

return false;

for (auto &R : LoopRegion->getEntryBasicBlock()->phis()) {
// We need a wide canonical IV
if (auto *CurIV = dyn_cast<VPWidenIntOrFpInductionRecipe>(&R)) {
if (CurIV->isCanonical()) {
WideIV = CurIV;
break;
}
}
}

// A wide canonical IV is currently required.
// TODO: Create an induction if no suitable existing one is available.
if (!WideIV)
return false;
Comment on lines +690 to +693
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that a scalar canonical IV always exists, and is unique. But widen ones may exist (last one found is used?) or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, at this stage, all inductions will still be widened, but may not be canonical.


// Create a reduction that tracks the first indices where the latest maximum
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Create a reduction that tracks the first indices where the latest maximum
// Create a reduction that tracks the first indices where the running maximum

// value has been selected. This is later used to select the max value from
// the partial reductions in a way that correctly handles signed zeros and
// NaNs in the input.
Comment on lines +697 to +698
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// the partial reductions in a way that correctly handles signed zeros and
// NaNs in the input.
// the partial reductions in a way that correctly handles a signed zero maximum.

(NaN's nor non-zero numbers do not require the tracked indices.)

// Note that we do not need to check if the induction may hit the sentinel
// value. If the sentinel value gets hit, the final reduction value is at the
// last index or the maximum was never set and all lanes contain the start
// value. In either case, the correct value is selected.
Comment on lines +699 to +702
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to elaborate, is this clear/correct/helpful?

Suggested change
// Note that we do not need to check if the induction may hit the sentinel
// value. If the sentinel value gets hit, the final reduction value is at the
// last index or the maximum was never set and all lanes contain the start
// value. In either case, the correct value is selected.
// Note that we do not need to check if a final index holds the initial
// sentinel value (max uint). This is because of the following argument.
// The index is used to select one among several lanes holding the same
// maximum value, in order to select the correct sign in case that value is 0.
// So only indices of lanes holding the total maximum value are of interest.
// If a final index holds the sentinel value then either the maximum of the
// corresponding lane appeared first at that index, or this maximum was never
// updated and still retains the start value. The former case is clearly fine.
// In the latter case the start value is either NaN or a number greater or
// equal to all elements of that lane. If the start value is NaN all lanes
// eventually hold NaN as their maximum value, a tie to be broken arbitrarily.
// If the start value is equal to the total maximum value, then the maxima of
// all lanes are equal to the start value, including its sign, again a tie to
// be broken arbitrarily.
// If the start value differs from (is less than) the total maximum value,
// then an index holding the sentinel value corresponds to a non maximum lane
// and is thus irrelevant for tie breaking.

unsigned IVWidth =
VPTypeAnalysis(Plan).inferScalarType(WideIV)->getScalarSizeInBits();
LLVMContext &Ctx = Plan.getScalarHeader()->getIRBasicBlock()->getContext();
VPValue *UMinSentinel =
Plan.getOrAddLiveIn(ConstantInt::get(Ctx, APInt::getMaxValue(IVWidth)));
auto *IdxPhi = new VPReductionPHIRecipe(nullptr, RecurKind::FindFirstIVUMin,
*UMinSentinel, false, false, 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth commenting constant parameters.

IdxPhi->insertBefore(RedPhiR);
auto *MinIdxSel = new VPInstruction(Instruction::Select,
{MaxOp->getOperand(0), WideIV, IdxPhi});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{MaxOp->getOperand(0), WideIV, IdxPhi});
{Cmp, WideIV, IdxPhi});

MinIdxSel->insertAfter(MaxOp);
IdxPhi->addOperand(MinIdxSel);

// Find the first index holding with the maximum value. This is used to
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Find the first index holding with the maximum value. This is used to
// Find the first index holding the maximum value. This is used to

// extract the lane with the final max value and is needed to handle signed
// zeros and NaNs in the input.
Comment on lines +717 to +718
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// extract the lane with the final max value and is needed to handle signed
// zeros and NaNs in the input.
// extract the lane with the final max value needed to handle signed zeros.

auto *MaxResult = find_singleton<VPSingleDefRecipe>(
RedPhiR->users(), [](VPUser *U, bool) -> VPSingleDefRecipe * {
auto *VPI = dyn_cast<VPInstruction>(U);
if (VPI && VPI->getOpcode() == VPInstruction::ComputeReductionResult)
return VPI;
return nullptr;
});
VPBuilder Builder(MaxResult->getParent(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VPBuilder Builder(MaxResult->getParent(),
assert(MaxResult && "Missing a ComputeReductionResult user");
VPBuilder Builder(MaxResult->getParent(),

std::next(MaxResult->getIterator()));

// Create mask for lanes that have the max value and use it to mask out
// indices that don't contain maximum values.
auto *MaskFinalMaxValue = Builder.createNaryOp(
Instruction::FCmp, {MaxResult->getOperand(1), MaxResult},
VPIRFlags(CmpInst::FCMP_OEQ));
auto *IndicesWithMaxValue = Builder.createNaryOp(
Instruction::Select, {MaskFinalMaxValue, MinIdxSel, UMinSentinel});
auto *FirstMaxIdx = Builder.createNaryOp(
VPInstruction::ComputeFindIVResult,
{IdxPhi, WideIV->getStartValue(), UMinSentinel, IndicesWithMaxValue});
// Convert the index of the first max value to an index in the vector lanes of
// the partial reduction results. This ensures we select the first max value
// and acts as a tie-breaker if the partial reductions contain signed zeros.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vertical computation of each partial reduction result takes care of NaNs and signed zeroes, it is only the horizontal reduction of these vector lanes that require tie-breaking, to handle potential signed zeroes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the tie-breaking is only needed to handle signed zeroes when computing the final reduction results.

Consider a final partial reduction vector with -0.0, +0.0 and -0.0 was encountered before +0.0 (e.g. the max at iteration 2 is -0.0 and at iteration 3 it is +0.0. Doing a plain horizontal fmax reduction will produce +0.0 (-0.0 < +0.0).

We then compare the partial reduction values to the result of the horizontal reduction (-0.0 == +0.0 will also be true, selecting all lanes with zeros of any signed-ness)

Out of those, we select the one encountered first using FindFirstIV. Note that this only works for strict predicates.

auto *FirstMaxLane =
Builder.createNaryOp(Instruction::URem, {FirstMaxIdx, &Plan.getVFxUF()});

// Extract the final max value and update the users.
auto *Res = Builder.createNaryOp(VPInstruction::ExtractLane,
{FirstMaxLane, MaxResult->getOperand(1)});
MaxResult->replaceUsesWithIf(Res, [MaskFinalMaxValue](VPUser &U, unsigned) {
return &U != MaskFinalMaxValue;
});
return true;
}

bool VPlanTransforms::handleMaxMinNumAndOrderedFCmpSelectReductions(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better retain handleMaxMinNumReductions() alongside the newly introduced handleOrderedFCmpSelect() (or handleOrderedMaxMinReductions(), possibly with a common caller that performs common checks (handleComplexReductions())?

VPlan &Plan) {
auto GetMinMaxCompareValue = [](VPReductionPHIRecipe *RedPhiR) -> VPValue * {
auto *MinMaxR = dyn_cast<VPRecipeWithIRFlags>(
RedPhiR->getBackedgeValue()->getDefiningRecipe());
Expand Down Expand Up @@ -703,7 +801,8 @@ bool VPlanTransforms::handleMaxMinNumReductions(VPlan &Plan) {
if (RedPhiR)
return false;
if (Cur->getRecurrenceKind() != RecurKind::FMaxNum &&
Cur->getRecurrenceKind() != RecurKind::FMinNum) {
Cur->getRecurrenceKind() != RecurKind::FMinNum &&
Cur->getRecurrenceKind() != RecurKind::OrderedFCmpSelect) {
HasUnsupportedPhi = true;
continue;
}
Expand All @@ -713,6 +812,15 @@ bool VPlanTransforms::handleMaxMinNumReductions(VPlan &Plan) {
if (!RedPhiR)
return true;

if (HasUnsupportedPhi)
return false;

if (RedPhiR->getRecurrenceKind() == RecurKind::OrderedFCmpSelect)
return handleOrderedFCmpSelect(Plan, RedPhiR);

// Try to update the vector loop to exit early if any input is NaN and resume
// executing in the scalar loop to handle the NaNs there.

// We won't be able to resume execution in the scalar tail, if there are
// unsupported header phis or there is no scalar tail at all, due to
// tail-folding.
Expand Down
7 changes: 5 additions & 2 deletions llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -508,6 +508,7 @@ bool VPInstruction::canGenerateScalarForFirstLane() const {
return true;
switch (Opcode) {
case Instruction::Freeze:
case Instruction::FCmp:
case Instruction::ICmp:
case Instruction::PHI:
case Instruction::Select:
Expand Down Expand Up @@ -599,7 +600,8 @@ Value *VPInstruction::generate(VPTransformState &State) {
llvm_unreachable("should be handled by VPPhi::execute");
}
case Instruction::Select: {
bool OnlyFirstLaneUsed = vputils::onlyFirstLaneUsed(this);
bool OnlyFirstLaneUsed =
State.VF.isScalar() || vputils::onlyFirstLaneUsed(this);
Value *Cond = State.get(getOperand(0), OnlyFirstLaneUsed);
Value *Op1 = State.get(getOperand(1), OnlyFirstLaneUsed);
Value *Op2 = State.get(getOperand(2), OnlyFirstLaneUsed);
Expand Down Expand Up @@ -1015,7 +1017,8 @@ bool VPInstruction::isVectorToScalar() const {
getOpcode() == VPInstruction::ComputeAnyOfResult ||
getOpcode() == VPInstruction::ComputeFindIVResult ||
getOpcode() == VPInstruction::ComputeReductionResult ||
getOpcode() == VPInstruction::AnyOf;
getOpcode() == VPInstruction::AnyOf ||
getOpcode() == VPInstruction::ExtractLane;
}

bool VPInstruction::isSingleScalar() const {
Expand Down
9 changes: 4 additions & 5 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.h
Original file line number Diff line number Diff line change
Expand Up @@ -103,11 +103,10 @@ struct VPlanTransforms {
/// not valid.
static bool adjustFixedOrderRecurrences(VPlan &Plan, VPBuilder &Builder);

/// Check if \p Plan contains any FMaxNum or FMinNum reductions. If they do,
/// try to update the vector loop to exit early if any input is NaN and resume
/// executing in the scalar loop to handle the NaNs there. Return false if
/// this attempt was unsuccessful.
static bool handleMaxMinNumReductions(VPlan &Plan);
/// Check if \p Plan contains any FMaxNum, FMinNum or reductions. If they do,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Check if \p Plan contains any FMaxNum, FMinNum or reductions. If they do,
/// Check if \p Plan contains any FMaxNum, FMinNum or reductions. If they do,
Suggested change
/// Check if \p Plan contains any FMaxNum, FMinNum or reductions. If they do,
/// Check if \p Plan contains any FMaxNum, FMinNum or <how to call them> reductions. If they do,

/// try to update the vector loop to account for NaNs and signed zeros as
/// needed.
static bool handleMaxMinNumAndOrderedFCmpSelectReductions(VPlan &Plan);

/// Clear NSW/NUW flags from reduction instructions if necessary.
static void clearReductionWrapFlags(VPlan &Plan);
Expand Down
Loading