42 changes: 35 additions & 7 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1228,6 +1228,27 @@ class LoopVectorizationCostModel {
/// Superset of instructions that return true for isScalarWithPredication.
bool isPredicatedInst(Instruction *I) const;

/// A helper function that returns how much we should divide the cost of a
/// predicated block by. Typically this is the reciprocal of the block
/// probability, i.e. if we return X we are assuming the predicated block will
/// execute once for every X iterations of the loop header, so the block should
/// only contribute 1/X of its cost to the total cost calculation. When
/// optimizing for code size it will just be 1, as code size costs don't depend
/// on execution probabilities.
///
/// TODO: We should use actual block probability here, if available.
/// Currently, we always assume predicated blocks have a 50% chance of
/// executing, apart from blocks that are only predicated due to tail folding.
inline unsigned
Contributor

I think the comment on this function needs updating.

Contributor Author

I've updated the comment in cf6b435 to mention that tail-folded predication doesn't count in this case
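
As a rough illustration of the scaling the comment describes, here is a minimal standalone sketch; the cost values are made up and none of this code is taken from the patch itself:

#include <cstdio>

// Hypothetical example of the divisor semantics: a predicated block's summed
// cost is divided by the divisor, so the default divisor of 2 (an assumed 50%
// execution probability) lets the block contribute half of its cost, while
// for code-size costs the divisor is 1 and the full cost is counted. Under
// the patched behaviour, blocks predicated only because of tail folding also
// get a divisor of 1.
int main() {
  unsigned BlockCost = 8;          // made-up summed cost of a predicated block
  unsigned ThroughputDivisor = 2;  // reciprocal of the assumed 50% probability
  unsigned CodeSizeDivisor = 1;    // code size does not depend on probability

  std::printf("throughput contribution: %u\n", BlockCost / ThroughputDivisor); // prints 4
  std::printf("code-size contribution:  %u\n", BlockCost / CodeSizeDivisor);   // prints 8
  return 0;
}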

getPredBlockCostDivisor(TargetTransformInfo::TargetCostKind CostKind,
BasicBlock *BB) const {
// If a block wasn't originally predicated but was predicated due to
// e.g. tail folding, don't divide the cost.
Contributor

Can you add a note about the cases where this may not be 100% accurate?

if (!Legal->blockNeedsPredication(BB))
return 1;
return CostKind == TTI::TCK_CodeSize ? 1 : 2;
}

/// Return the costs for our two available strategies for lowering a
/// div/rem operation which requires speculating at least one lane.
/// First result is for scalarization (will be invalid for scalable
@@ -2883,7 +2904,8 @@ LoopVectorizationCostModel::getDivRemSpeculationCost(Instruction *I,
// Scale the cost by the probability of executing the predicated blocks.
// This assumes the predicated block for each vector lane is equally
// likely.
ScalarizationCost = ScalarizationCost / getPredBlockCostDivisor(CostKind);
ScalarizationCost =
ScalarizationCost / getPredBlockCostDivisor(CostKind, I->getParent());
}

InstructionCost SafeDivisorCost = 0;
@@ -5015,7 +5037,7 @@ InstructionCost LoopVectorizationCostModel::computePredInstDiscount(
}

// Scale the total scalar cost by block probability.
ScalarCost /= getPredBlockCostDivisor(CostKind);
ScalarCost /= getPredBlockCostDivisor(CostKind, I->getParent());

// Compute the discount. A non-negative discount means the vector version
// of the instruction costs more, and scalarizing would be beneficial.
@@ -5065,10 +5087,11 @@ InstructionCost LoopVectorizationCostModel::expectedCost(ElementCount VF) {
// stores and instructions that may divide by zero) will now be
// unconditionally executed. For the scalar case, we may not always execute
// the predicated block, if it is an if-else block. Thus, scale the block's
// cost by the probability of executing it. blockNeedsPredication from
// Legal is used so as to not include all blocks in tail folded loops.
if (VF.isScalar() && Legal->blockNeedsPredication(BB))
BlockCost /= getPredBlockCostDivisor(CostKind);
// cost by the probability of executing it.
// getPredBlockCostDivisor won't apply this scaling to blocks that are only
// predicated due to tail folding.
if (VF.isScalar())
BlockCost /= getPredBlockCostDivisor(CostKind, BB);
Contributor

Here the call to getPredBlockCostDivisor is already guarded by Legal->blockNeedsPredication(BB). Instead of changing the meaning/behaviour of getPredBlockCostDivisor, is it perhaps better to simply guard all calls to the function in a similar way? Then you can add a comment to getPredBlockCostDivisor saying that it should only ever be called for blocks that are predicated in the original scalar code.

Alternatively, you could remove the Legal->blockNeedsPredication(BB) check here and update the comment above getPredBlockCostDivisor saying that it will always return 1 for blocks that are predicated due to tail-folding.
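
For concreteness, a sketch contrasting the two options; the identifiers are the ones already used in the patch, the surrounding cost logic is elided, and neither snippet is meant as a definitive implementation:

// Option 1 (sketch): keep the helper taking only CostKind and guard every
// call site, mirroring the pre-existing guard in expectedCost().
if (Legal->blockNeedsPredication(BB))
  BlockCost /= getPredBlockCostDivisor(CostKind);

// Option 2 (sketch): pass the block and let the helper return 1 for blocks
// that are only predicated because of tail folding, which is the direction
// this patch takes.
BlockCost /= getPredBlockCostDivisor(CostKind, BB);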

Contributor Author

Oh good point. And I guess that comment above confirms that the predication discount isn't meant to be applied to tail folded blocks? ab97c9b

Contributor Author

I've removed the redundant Legal->blockNeedsPredication(BB) in cf6b435 and updated the comment. I figured we're going to need to plumb through the basic block anyway in #158690 so we might as well do it in this PR


Cost += BlockCost;
}
@@ -5147,7 +5170,7 @@ LoopVectorizationCostModel::getMemInstScalarizationCost(Instruction *I,
// conditional branches, but may not be executed for each vector lane. Scale
// the cost by the probability of executing the predicated block.
if (isPredicatedInst(I)) {
Cost /= getPredBlockCostDivisor(CostKind);
Cost /= getPredBlockCostDivisor(CostKind, I->getParent());

// Add the cost of an i1 extract and a branch
auto *VecI1Ty =
@@ -6693,6 +6716,11 @@ bool VPCostContext::skipCostComputation(Instruction *UI, bool IsVector) const {
SkipCostComputation.contains(UI);
}

unsigned VPCostContext::getPredBlockCostDivisor(
TargetTransformInfo::TargetCostKind CostKind, BasicBlock *BB) const {
return CM.getPredBlockCostDivisor(CostKind, BB);
}

InstructionCost
LoopVectorizationPlanner::precomputeCosts(VPlan &Plan, ElementCount VF,
VPCostContext &CostCtx) const {
19 changes: 4 additions & 15 deletions llvm/lib/Transforms/Vectorize/VPlanHelpers.h
@@ -50,21 +50,6 @@ Value *getRuntimeVF(IRBuilderBase &B, Type *Ty, ElementCount VF);
Value *createStepForVF(IRBuilderBase &B, Type *Ty, ElementCount VF,
int64_t Step);

/// A helper function that returns how much we should divide the cost of a
/// predicated block by. Typically this is the reciprocal of the block
/// probability, i.e. if we return X we are assuming the predicated block will
/// execute once for every X iterations of the loop header so the block should
/// only contribute 1/X of its cost to the total cost calculation, but when
/// optimizing for code size it will just be 1 as code size costs don't depend
/// on execution probabilities.
///
/// TODO: We should use actual block probability here, if available. Currently,
/// we always assume predicated blocks have a 50% chance of executing.
inline unsigned
getPredBlockCostDivisor(TargetTransformInfo::TargetCostKind CostKind) {
return CostKind == TTI::TCK_CodeSize ? 1 : 2;
}

/// A range of powers-of-2 vectorization factors with fixed start and
/// adjustable end. The range includes start and excludes end, e.g.,:
/// [1, 16) = {1, 2, 4, 8}
@@ -366,6 +351,10 @@ struct VPCostContext {
/// has already been pre-computed.
bool skipCostComputation(Instruction *UI, bool IsVector) const;

/// \returns how much the cost of a predicated block should be divided by.
unsigned getPredBlockCostDivisor(TargetTransformInfo::TargetCostKind CostKind,
BasicBlock *BB) const;

/// Returns the OperandInfo for \p V, if it is a live-in.
TargetTransformInfo::OperandValueInfo getOperandInfo(VPValue *V) const;

2 changes: 1 addition & 1 deletion llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -3255,7 +3255,7 @@ InstructionCost VPReplicateRecipe::computeCost(ElementCount VF,
// Scale the cost by the probability of executing the predicated blocks.
// This assumes the predicated block for each vector lane is equally
// likely.
ScalarCost /= getPredBlockCostDivisor(Ctx.CostKind);
ScalarCost /= Ctx.getPredBlockCostDivisor(Ctx.CostKind, UI->getParent());
return ScalarCost;
}
case Instruction::Load:
@@ -612,63 +612,18 @@ define void @low_trip_count_fold_tail_scalarized_store(ptr %dst) {
;
; COMMON-LABEL: define void @low_trip_count_fold_tail_scalarized_store(
; COMMON-SAME: ptr [[DST:%.*]]) {
; COMMON-NEXT: [[ENTRY:.*:]]
; COMMON-NEXT: br label %[[VECTOR_PH:.*]]
; COMMON: [[VECTOR_PH]]:
; COMMON-NEXT: br label %[[VECTOR_BODY:.*]]
; COMMON: [[VECTOR_BODY]]:
; COMMON-NEXT: br i1 true, label %[[PRED_STORE_IF:.*]], label %[[PRED_STORE_CONTINUE:.*]]
; COMMON: [[PRED_STORE_IF]]:
; COMMON-NEXT: [[TMP0:%.*]] = getelementptr i8, ptr [[DST]], i64 0
; COMMON-NEXT: store i8 0, ptr [[TMP0]], align 1
; COMMON-NEXT: br label %[[PRED_STORE_CONTINUE]]
; COMMON: [[PRED_STORE_CONTINUE]]:
; COMMON-NEXT: br i1 true, label %[[PRED_STORE_IF1:.*]], label %[[PRED_STORE_CONTINUE2:.*]]
; COMMON: [[PRED_STORE_IF1]]:
; COMMON-NEXT: [[TMP1:%.*]] = getelementptr i8, ptr [[DST]], i64 1
; COMMON-NEXT: store i8 1, ptr [[TMP1]], align 1
; COMMON-NEXT: br label %[[PRED_STORE_CONTINUE2]]
; COMMON: [[PRED_STORE_CONTINUE2]]:
; COMMON-NEXT: br i1 true, label %[[PRED_STORE_IF3:.*]], label %[[PRED_STORE_CONTINUE4:.*]]
; COMMON: [[PRED_STORE_IF3]]:
; COMMON-NEXT: [[TMP2:%.*]] = getelementptr i8, ptr [[DST]], i64 2
; COMMON-NEXT: store i8 2, ptr [[TMP2]], align 1
; COMMON-NEXT: br label %[[PRED_STORE_CONTINUE4]]
; COMMON: [[PRED_STORE_CONTINUE4]]:
; COMMON-NEXT: br i1 true, label %[[PRED_STORE_IF5:.*]], label %[[PRED_STORE_CONTINUE6:.*]]
; COMMON: [[PRED_STORE_IF5]]:
; COMMON-NEXT: [[TMP3:%.*]] = getelementptr i8, ptr [[DST]], i64 3
; COMMON-NEXT: store i8 3, ptr [[TMP3]], align 1
; COMMON-NEXT: br label %[[PRED_STORE_CONTINUE6]]
; COMMON: [[PRED_STORE_CONTINUE6]]:
; COMMON-NEXT: br i1 true, label %[[PRED_STORE_IF7:.*]], label %[[PRED_STORE_CONTINUE8:.*]]
; COMMON: [[PRED_STORE_IF7]]:
; COMMON-NEXT: [[TMP4:%.*]] = getelementptr i8, ptr [[DST]], i64 4
; COMMON-NEXT: store i8 4, ptr [[TMP4]], align 1
; COMMON-NEXT: br label %[[PRED_STORE_CONTINUE8]]
; COMMON: [[PRED_STORE_CONTINUE8]]:
; COMMON-NEXT: br i1 true, label %[[PRED_STORE_IF9:.*]], label %[[PRED_STORE_CONTINUE10:.*]]
; COMMON: [[PRED_STORE_IF9]]:
; COMMON-NEXT: [[TMP5:%.*]] = getelementptr i8, ptr [[DST]], i64 5
; COMMON-NEXT: store i8 5, ptr [[TMP5]], align 1
; COMMON-NEXT: br label %[[PRED_STORE_CONTINUE10]]
; COMMON: [[PRED_STORE_CONTINUE10]]:
; COMMON-NEXT: br i1 true, label %[[PRED_STORE_IF11:.*]], label %[[PRED_STORE_CONTINUE12:.*]]
; COMMON: [[PRED_STORE_IF11]]:
; COMMON-NEXT: [[TMP6:%.*]] = getelementptr i8, ptr [[DST]], i64 6
; COMMON-NEXT: store i8 6, ptr [[TMP6]], align 1
; COMMON-NEXT: br label %[[PRED_STORE_CONTINUE12]]
; COMMON: [[PRED_STORE_CONTINUE12]]:
; COMMON-NEXT: br i1 false, label %[[PRED_STORE_IF13:.*]], label %[[EXIT:.*]]
; COMMON: [[PRED_STORE_IF13]]:
; COMMON-NEXT: [[TMP7:%.*]] = getelementptr i8, ptr [[DST]], i64 7
; COMMON-NEXT: store i8 7, ptr [[TMP7]], align 1
; COMMON-NEXT: br label %[[EXIT]]
; COMMON: [[EXIT]]:
; COMMON-NEXT: br label %[[SCALAR_PH:.*]]
; COMMON: [[SCALAR_PH]]:
; COMMON-NEXT: br [[EXIT1:label %.*]]
; COMMON: [[SCALAR_PH1:.*:]]
; COMMON-NEXT: [[ENTRY:.*]]:
; COMMON-NEXT: br label %[[EXIT1:.*]]
; COMMON: [[EXIT1]]:
; COMMON-NEXT: [[IV:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ [[IV_NEXT:%.*]], %[[EXIT1]] ]
; COMMON-NEXT: [[IV_TRUNC:%.*]] = trunc i64 [[IV]] to i8
; COMMON-NEXT: [[GEP:%.*]] = getelementptr i8, ptr [[DST]], i64 [[IV]]
; COMMON-NEXT: store i8 [[IV_TRUNC]], ptr [[GEP]], align 1
; COMMON-NEXT: [[IV_NEXT]] = add i64 [[IV]], 1
; COMMON-NEXT: [[EC:%.*]] = icmp eq i64 [[IV_NEXT]], 7
; COMMON-NEXT: br i1 [[EC]], label %[[SCALAR_PH1:.*]], label %[[EXIT1]]
; COMMON: [[SCALAR_PH1]]:
; COMMON-NEXT: ret void
;
entry:
br label %loop