Commit 776235f

[VPlan] Get Addr computation cost with scalar type if it is uniform for gather/scatter.
This patch queries `getAddressComputationCost()` with the scalar type if the address is uniform. This makes the cost for gather/scatter more accurate. In the current LV, a non-consecutive VPWidenMemoryRecipe (gather/scatter) accounts for the cost of address computation. But in some cases the address is uniform across lanes, so it can be calculated with the scalar type and then broadcast. I have a follow-up optimization that tries to convert gather/scatter with uniform memory access to scalar load/store + broadcast. Once that optimization lands, this temporary change can be removed.
1 parent 7b8dea2 commit 776235f
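
The cost selection described in the commit message can be sketched in isolation. This is only an illustration, not the LLVM implementation: `ToyTTI`, its boolean parameter, and the cost numbers (1, 4, 10) are hypothetical stand-ins for the real `TargetTransformInfo` queries, and `gatherScatterCost` is a name made up for this sketch.

```cpp
// Hypothetical stand-in for a target cost model. The numbers are invented
// for illustration and do not come from any real target.
struct ToyTTI {
  // Assumption: computing a vector of addresses costs more than computing
  // one scalar address.
  constexpr int getAddressComputationCost(bool IsVectorTy) const {
    return IsVectorTy ? 4 : 1;
  }
  // Assumption: a fixed cost for the gather/scatter operation itself.
  constexpr int getGatherScatterOpCost() const { return 10; }
};

// Mirrors the shape of the patched non-consecutive branch: use the scalar
// address cost when the address is a single scalar (uniform across lanes),
// otherwise use the vector address cost.
constexpr int gatherScatterCost(const ToyTTI &TTI, bool AddrIsSingleScalar) {
  return TTI.getAddressComputationCost(/*IsVectorTy=*/!AddrIsSingleScalar) +
         TTI.getGatherScatterOpCost();
}

// With the toy numbers above, a uniform address lowers the total cost.
static_assert(gatherScatterCost(ToyTTI{}, true) == 11, "scalar address cost");
static_assert(gatherScatterCost(ToyTTI{}, false) == 14, "vector address cost");
```

Under these made-up numbers the uniform-address case comes out cheaper, which is the effect the patch aims for. The real difference depends entirely on what the target reports for scalar versus vector address computation.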

2 files changed: +18 -3 lines changed
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Lines changed: 6 additions & 0 deletions
@@ -7018,6 +7018,12 @@ static bool planContainsAdditionalSimplifications(VPlan &Plan,
   auto Iter = vp_depth_first_deep(Plan.getVectorLoopRegion()->getEntry());
   for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(Iter)) {
     for (VPRecipeBase &R : *VPBB) {
+      if (auto *MR = dyn_cast<VPWidenMemoryRecipe>(&R)) {
+        // The address computation cost can be queried as scalar type if the
+        // address is uniform.
+        if (!MR->isConsecutive() && vputils::isSingleScalar(MR->getAddr()))
+          return true;
+      }
       if (auto *IR = dyn_cast<VPInterleaveRecipe>(&R)) {
         auto *IG = IR->getInterleaveGroup();
         unsigned NumMembers = IG->getNumMembers();

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

Lines changed: 12 additions & 3 deletions
@@ -3099,9 +3099,18 @@ InstructionCost VPWidenMemoryRecipe::computeCost(ElementCount VF,
     const Value *Ptr = getLoadStorePointerOperand(&Ingredient);
     assert(!Reverse &&
            "Inconsecutive memory access should not have the order.");
-    return Ctx.TTI.getAddressComputationCost(Ty) +
-           Ctx.TTI.getGatherScatterOpCost(Opcode, Ty, Ptr, IsMasked, Alignment,
-                                          Ctx.CostKind, &Ingredient);
+    InstructionCost Cost = 0;
+
+    // If the address value is uniform across all lanes, then the address can be
+    // calculated with scalar type and broadcast.
+    if (vputils::isSingleScalar(getAddr()))
+      Cost += Ctx.TTI.getAddressComputationCost(Ty->getScalarType());
+    else
+      Cost += Ctx.TTI.getAddressComputationCost(Ty);
+
+    return Cost + Ctx.TTI.getGatherScatterOpCost(Opcode, Ty, Ptr, IsMasked,
+                                                 Alignment, Ctx.CostKind,
+                                                 &Ingredient);
   }
 
   InstructionCost Cost = 0;
