Commit 3542150

david-arm authored and tstellar committed
[LoopVectorize] Fix cost model assert when vectorising calls (#125716)
The legacy and vplan cost models did not agree because VPWidenCallRecipe::computeCost only calculates the cost of the call instruction, whereas LoopVectorizationCostModel::setVectorizedCallDecision in some cases adds on the cost of a synthesised mask argument. However, this mask is always 'splat(i1 true)' which should be hoisted out of the loop during codegen. In order to synchronise the two cost models I have two options:

1) Also add the cost of the splat to the vplan model, or
2) Remove the cost of the splat from the legacy model.

I chose 2) because I feel this more closely represents what the final code will look like. There is an argument that we should take account of such broadcast costs in the preheader when deciding if it's profitable to vectorise a loop, however there isn't currently a mechanism to do this. We currently only take account of the runtime checks when assessing profitability and what the minimum trip count should be. However, I don't believe this work needs doing as part of this PR.

(cherry picked from commit 1930524)
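As a rough illustration of the mismatch described above, the following toy model compares the two cost computations. It is only a sketch: `ToyCallCosts`, `legacyCostBefore`, `legacyCostAfter`, and `vplanCost` are hypothetical names invented for this example and are not LLVM APIs, and the integer costs are placeholders rather than real target costs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two quantities the commit message talks about.
struct ToyCallCosts {
  int64_t CallCost;      // cost of the vectorised call itself
  int64_t SplatMaskCost; // cost of broadcasting an all-true i1 mask
};

// Shape of the legacy computation before this commit: the call cost plus the
// cost of a synthesised mask when the variant takes a mask that the loop did
// not otherwise require.
int64_t legacyCostBefore(const ToyCallCosts &C, bool UsesMask,
                         bool MaskRequired) {
  const int64_t MaskCost = (UsesMask && !MaskRequired) ? C.SplatMaskCost : 0;
  return C.CallCost + MaskCost;
}

// Shape of the legacy computation after this commit (option 2): call only.
int64_t legacyCostAfter(const ToyCallCosts &C) { return C.CallCost; }

// Per the commit message, the VPlan side only counts the call instruction.
int64_t vplanCost(const ToyCallCosts &C) { return C.CallCost; }
```

With a non-zero `SplatMaskCost`, `legacyCostBefore(C, true, false)` differs from `vplanCost(C)`, which is the kind of disagreement the assert caught; `legacyCostAfter(C)` always matches.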
1 parent 7bcfaa1 commit 3542150

2 files changed: +262 -23 lines changed


llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Lines changed: 1 addition & 12 deletions
@@ -6331,19 +6331,8 @@ void LoopVectorizationCostModel::setVectorizedCallDecision(ElementCount VF) {
           break;
         }

-      // Add in the cost of synthesizing a mask if one wasn't required.
-      InstructionCost MaskCost = 0;
-      if (VecFunc && UsesMask && !MaskRequired)
-        MaskCost = TTI.getShuffleCost(
-            TargetTransformInfo::SK_Broadcast,
-            VectorType::get(IntegerType::getInt1Ty(
-                                VecFunc->getFunctionType()->getContext()),
-                            VF),
-            {}, CostKind);
-
       if (TLI && VecFunc && !CI->isNoBuiltin())
-        VectorCost =
-            TTI.getCallInstrCost(nullptr, RetTy, Tys, CostKind) + MaskCost;
+        VectorCost = TTI.getCallInstrCost(nullptr, RetTy, Tys, CostKind);

       // Find the cost of an intrinsic; some targets may have instructions that
       // perform the operation without needing an actual call.
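The removed block priced a broadcast of an all-true i1 mask as if it were paid per call. The sketch below emulates, in plain scalar C++, why that cost is loop-invariant once the splat is hoisted: the mask is built once before the loop and reused by every masked call. Everything here is an assumption made for illustration (`VF = 4`, `splatTrue`, `maskedDouble`, `runLoop`, and the `BroadcastCount` counter are hypothetical and do not correspond to LLVM code or to any real vector library).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t VF = 4; // assumed vectorisation factor
using Vec = std::array<float, VF>;
using Mask = std::array<bool, VF>;

// Counts how many times the all-true mask is materialised.
int BroadcastCount = 0;

// Emulates 'splat(i1 true)': every lane of the mask is set.
Mask splatTrue() {
  ++BroadcastCount;
  Mask M;
  M.fill(true);
  return M;
}

// Hypothetical masked vector variant: doubles the active lanes and passes
// inactive lanes through unchanged.
Vec maskedDouble(const Vec &X, const Mask &M) {
  Vec R{};
  for (std::size_t Lane = 0; Lane < VF; ++Lane)
    R[Lane] = M[Lane] ? X[Lane] * 2.0f : X[Lane];
  return R;
}

// The mask is built once in the "preheader" and reused on every iteration.
// Any tail of fewer than VF elements is left untouched in this sketch.
void runLoop(std::vector<float> &Data) {
  const Mask AllTrue = splatTrue(); // hoisted, loop-invariant
  for (std::size_t I = 0; I + VF <= Data.size(); I += VF) {
    Vec X;
    for (std::size_t Lane = 0; Lane < VF; ++Lane)
      X[Lane] = Data[I + Lane];
    const Vec R = maskedDouble(X, AllTrue);
    for (std::size_t Lane = 0; Lane < VF; ++Lane)
      Data[I + Lane] = R[Lane];
  }
}
```

Running `runLoop` on an 8-element vector performs two masked calls but only one broadcast, which matches the commit message's argument that the splat belongs to the preheader rather than to the per-iteration call cost.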
