Skip to content

Commit 7a8a9b0

Browse files
committed
[LV] Stop using the legacy cost model for udiv + friends
In VPWidenRecipe::computeCost for the instructions udiv, sdiv, urem and srem we fall back on the legacy cost unnecessarily. At this point we know that the vplan must be functionally correct, i.e. if the divide/remainder is not safe to speculatively execute then we must have either: 1. Scalarised the operation, in which case we wouldn't be using a VPWidenRecipe, or 2. We've inserted a select for the second operand to ensure we don't fault through divide-by-zero. For 2) it's necessary to add the select operation to VPInstruction::computeCost so that we mirror the cost of the legacy cost model. The only problem with this is that we also generate selects in vplan for predicated loops with reductions, which *aren't* accounted for in the legacy cost model. In order to prevent asserts firing I've also added the selects to precomputeCosts to ensure the legacy costs match the vplan costs for reductions.
1 parent ffe21f1 commit 7a8a9b0

File tree

2 files changed

+16
-3
lines changed

2 files changed

+16
-3
lines changed

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4278,6 +4278,9 @@ VectorizationFactor LoopVectorizationPlanner::selectVectorizationFactor() {
42784278
if (!VPI)
42794279
continue;
42804280
switch (VPI->getOpcode()) {
4281+
// Selects are not modelled in the legacy cost model if they are
4282+
// inserted for reductions.
4283+
case Instruction::Select:
42814284
case VPInstruction::ActiveLaneMask:
42824285
case VPInstruction::ExplicitVectorLength:
42834286
C += VPI->cost(VF, CostCtx);

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

Lines changed: 13 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1029,6 +1029,15 @@ InstructionCost VPInstruction::computeCost(ElementCount VF,
10291029
}
10301030

10311031
switch (getOpcode()) {
1032+
case Instruction::Select: {
1033+
// TODO: It may be possible to improve this by analyzing where the
1034+
// condition operand comes from.
1035+
CmpInst::Predicate Pred = CmpInst::BAD_ICMP_PREDICATE;
1036+
auto *CondTy = toVectorTy(Ctx.Types.inferScalarType(getOperand(0)), VF);
1037+
auto *VecTy = toVectorTy(Ctx.Types.inferScalarType(getOperand(1)), VF);
1038+
return Ctx.TTI.getCmpSelInstrCost(Instruction::Select, VecTy, CondTy, Pred,
1039+
Ctx.CostKind);
1040+
}
10321041
case Instruction::ExtractElement:
10331042
case VPInstruction::ExtractLane: {
10341043
// Add on the cost of extracting the element.
@@ -2099,9 +2108,10 @@ InstructionCost VPWidenRecipe::computeCost(ElementCount VF,
20992108
case Instruction::SDiv:
21002109
case Instruction::SRem:
21012110
case Instruction::URem:
2102-
// More complex computation, let the legacy cost-model handle this for now.
2103-
return Ctx.getLegacyCost(cast<Instruction>(getUnderlyingValue()), VF);
2104-
case Instruction::FNeg:
2111+
// If the div/rem operation isn't safe to speculate and requires
2112+
// predication, then the only way we can even create a vplan is to insert
2113+
// a select on the second input operand to ensure we use the value of 1
2114+
// for the inactive lanes. The select will be costed separately.
21052115
case Instruction::Add:
21062116
case Instruction::FAdd:
21072117
case Instruction::Sub:

0 commit comments

Comments
 (0)