[VPlan] Don't apply predication discount to non-originally-predicated blocks #160449
base: main
Changes from 3 commits
e39fef4
972ee3b
cf6b435
697fbd6
874fdbb
88748ae
f072e3c
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1249,6 +1249,27 @@ class LoopVectorizationCostModel { | |
/// Superset of instructions that return true for isScalarWithPredication. | ||
bool isPredicatedInst(Instruction *I) const; | ||
|
||
/// A helper function that returns how much we should divide the cost of a | ||
/// predicated block by. Typically this is the reciprocal of the block | ||
/// probability, i.e. if we return X we are assuming the predicated block will | ||
/// execute once for every X iterations of the loop header so the block should | ||
/// only contribute 1/X of its cost to the total cost calculation, but when | ||
/// optimizing for code size it will just be 1 as code size costs don't depend | ||
/// on execution probabilities. | ||
/// | ||
/// TODO: We should use actual block probability here, if available. | ||
/// Currently, we always assume predicated blocks have a 50% chance of | ||
/// executing, apart from blocks that are only predicated due to tail folding. | ||
inline unsigned | ||
getPredBlockCostDivisor(TargetTransformInfo::TargetCostKind CostKind, | ||
BasicBlock *BB) const { | ||
// If a block wasn't originally predicated but was predicated due to | ||
// e.g. tail folding, don't divide the cost. | ||
|
||
if (!Legal->blockNeedsPredication(BB)) | ||
return 1; | ||
return CostKind == TTI::TCK_CodeSize ? 1 : 2; | ||
} | ||
|
||
/// Return the costs for our two available strategies for lowering a | ||
/// div/rem operation which requires speculating at least one lane. | ||
/// First result is for scalarization (will be invalid for scalable | ||
|
@@ -2902,7 +2923,8 @@ LoopVectorizationCostModel::getDivRemSpeculationCost(Instruction *I, | |
// Scale the cost by the probability of executing the predicated blocks. | ||
// This assumes the predicated block for each vector lane is equally | ||
// likely. | ||
ScalarizationCost = ScalarizationCost / getPredBlockCostDivisor(CostKind); | ||
ScalarizationCost = | ||
ScalarizationCost / getPredBlockCostDivisor(CostKind, I->getParent()); | ||
} | ||
|
||
InstructionCost SafeDivisorCost = 0; | ||
|
@@ -5035,7 +5057,7 @@ InstructionCost LoopVectorizationCostModel::computePredInstDiscount( | |
} | ||
|
||
// Scale the total scalar cost by block probability. | ||
ScalarCost /= getPredBlockCostDivisor(CostKind); | ||
ScalarCost /= getPredBlockCostDivisor(CostKind, I->getParent()); | ||
|
||
// Compute the discount. A non-negative discount means the vector version | ||
// of the instruction costs more, and scalarizing would be beneficial. | ||
|
@@ -5085,10 +5107,11 @@ InstructionCost LoopVectorizationCostModel::expectedCost(ElementCount VF) { | |
// stores and instructions that may divide by zero) will now be | ||
// unconditionally executed. For the scalar case, we may not always execute | ||
// the predicated block, if it is an if-else block. Thus, scale the block's | ||
// cost by the probability of executing it. blockNeedsPredication from | ||
// Legal is used so as to not include all blocks in tail folded loops. | ||
if (VF.isScalar() && Legal->blockNeedsPredication(BB)) | ||
BlockCost /= getPredBlockCostDivisor(CostKind); | ||
// cost by the probability of executing it. | ||
// getPredBlockCostDivisor won't include blocks that are only predicated due | ||
// to tail folded loops | ||
if (VF.isScalar()) | ||
BlockCost /= getPredBlockCostDivisor(CostKind, BB); | ||
Here the call to Alternatively, you could remove the
Oh good point. And I guess that comment above confirms that the predication discount isn't meant to be applied to tail folded blocks? ab97c9b
||
|
||
Cost += BlockCost; | ||
} | ||
|
@@ -5167,7 +5190,7 @@ LoopVectorizationCostModel::getMemInstScalarizationCost(Instruction *I, | |
// conditional branches, but may not be executed for each vector lane. Scale | ||
// the cost by the probability of executing the predicated block. | ||
if (isPredicatedInst(I)) { | ||
Cost /= getPredBlockCostDivisor(CostKind); | ||
Cost /= getPredBlockCostDivisor(CostKind, I->getParent()); | ||
|
||
// Add the cost of an i1 extract and a branch | ||
auto *VecI1Ty = | ||
|
@@ -6727,6 +6750,11 @@ bool VPCostContext::skipCostComputation(Instruction *UI, bool IsVector) const { | |
SkipCostComputation.contains(UI); | ||
} | ||
|
||
unsigned VPCostContext::getPredBlockCostDivisor( | ||
TargetTransformInfo::TargetCostKind CostKind, BasicBlock *BB) const { | ||
return CM.getPredBlockCostDivisor(CostKind, BB); | ||
} | ||
|
||
InstructionCost | ||
LoopVectorizationPlanner::precomputeCosts(VPlan &Plan, ElementCount VF, | ||
VPCostContext &CostCtx) const { | ||
|
I think the comment on this function needs updating.
I've updated the comment in cf6b435 to mention that tail-folded predication doesn't count in this case
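For readers following the thread, the behaviour that the updated comment describes can be sketched outside of LLVM. This is only an illustration under stated assumptions: `CostKind`, `Block`, `predBlockCostDivisor`, and `scaledBlockCost` are hypothetical stand-ins (for `TargetTransformInfo::TargetCostKind`, a `BasicBlock` together with the result of `Legal->blockNeedsPredication(BB)`, and `getPredBlockCostDivisor`), and it uses plain `unsigned` costs rather than `InstructionCost`. Like the diff, it assumes a fixed 50% execution probability for originally predicated blocks.

```cpp
// Hypothetical stand-in for TargetTransformInfo::TargetCostKind.
enum class CostKind { RecipThroughput, CodeSize };

// Hypothetical stand-in for a basic block. NeedsPredication models the
// result of Legal->blockNeedsPredication(BB): true when the block was
// conditional in the original loop, false when it is only masked because
// of tail folding.
struct Block {
  bool NeedsPredication;
  unsigned Cost;
};

// Mirrors the logic in the diff: only originally predicated blocks get the
// discount, and never when the cost kind is code size.
unsigned predBlockCostDivisor(CostKind Kind, const Block &BB) {
  if (!BB.NeedsPredication)
    return 1;
  return Kind == CostKind::CodeSize ? 1 : 2;
}

// Contribution of one block to the total cost after scaling by the divisor.
unsigned scaledBlockCost(CostKind Kind, const Block &BB) {
  return BB.Cost / predBlockCostDivisor(Kind, BB);
}
```

In this sketch, a block with cost 8 that was conditional in the original loop contributes 4 under a throughput cost kind, while a block with the same cost that is predicated only by tail folding still contributes 8, as does any block when costing for code size.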