Commit 18a3c34

jrbyrnes authored and bcahoon committed
[SLPVectorizer] Use accurate cost for external users of resize shuffles (llvm#137419)
When implementing the vectorization, we may need to add shuffles for external users. In such cases, we may be shuffling a smaller vector into a larger vector. When this happens, `ResizeToVF` just builds a poison-padded identity vector. Then, to build the final shuffle, we just use the `SK_InsertSubvector` mask.

This is possibly clearer when looking at the included test in SLPVectorizer/AMDGPU/external-shuffle.ll. In the exit block we have a bunch of shuffles to glue the vectorized tree to match the `InsertElement` users. `TMP25` holds the result of resizing the v2i16 vectorized sequence to match the `InsertElement` size, v16i16. Then `TMP26` is the final shuffle, which replaces the `InsertElement` sequence. This is just an insertsubvector.

However, when calculating the cost for these shuffles, we aren't modelling this correctly. `ResizeToVF` will indicate to `performExtractsShuffleAction` that we cannot use the original mask due to the resize shuffle. The consequence is that the cost calculation uses a different shuffle mask than the one that is ultimately used.

Going back to the included test, consider `TMP26` again. The shuffle clearly uses a mask {0, 1, 2, 3, 16, 17, poison ..}. However, we currently calculate the cost with a mask {0, 1, 2, 3, 20, 21, ...}: we have replaced 16 and 17 with 20 and 21 (Index + Vector Size). Queries like BasicTTImpl::improveShuffleKindFromMask will not recognize this as an `SK_InsertSubvector` mask, and targets which have reduced costs for `SK_InsertSubvector` will not accurately calculate the cost.

(cherry picked from commit c9a87a5)
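To make the difference between the two masks quoted in the commit message concrete, here is a minimal standalone sketch. It is not LLVM's `ShuffleVectorInst` or TTI logic: `matchInsertOfLowSubvector`, `SubvectorInsert`, and `kPoison` are hypothetical names, and the check is a deliberately simplified model of "insert the low elements of the second source into the first source". Indices at or above the mask size are assumed to refer to the second source.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for LLVM's PoisonMaskElem.
constexpr int kPoison = -1;

// Hypothetical result type: where the subvector lands and how wide it is.
struct SubvectorInsert {
  int Index;
  int NumSubElts;
};

// Simplified model (an assumption, not LLVM's implementation): lanes taken
// from the first source must stay in place, and the lanes taken from the
// second source must form one contiguous run that reads elements
// 0, 1, 2, ... of that source.
std::optional<SubvectorInsert>
matchInsertOfLowSubvector(const std::vector<int> &Mask) {
  const int VF = static_cast<int>(Mask.size());
  int First = -1, Last = -1;
  for (int I = 0; I < VF; ++I) {
    const int M = Mask[I];
    if (M == kPoison)
      continue;
    if (M < VF) {
      if (M != I)
        return std::nullopt; // first-source lane moved out of place
      continue;
    }
    if (First < 0)
      First = I;
    Last = I;
  }
  if (First < 0)
    return std::nullopt; // nothing is taken from the second source
  for (int I = First; I <= Last; ++I)
    if (Mask[I] != VF + (I - First))
      return std::nullopt; // run does not start at element 0 of source two
  return SubvectorInsert{First, Last - First + 1};
}
```

Under this model, a 16-lane mask {0, 1, 2, 3, 16, 17, poison ..} matches as a 2-element insert at index 4, while {0, 1, 2, 3, 20, 21, poison ..} does not match, because lanes 4 and 5 read elements 4 and 5 of the second source instead of elements 0 and 1.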
1 parent a9c93da commit 18a3c34

5 files changed: +101 -120 lines changed

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

Lines changed: 38 additions & 16 deletions
@@ -12799,25 +12799,47 @@ InstructionCost BoUpSLP::getTreeCost(ArrayRef<Value *> VectorizedVals) {
   InstructionCost SpillCost = getSpillCost();
   Cost += SpillCost + ExtractCost;
   auto &&ResizeToVF = [this, &Cost](const TreeEntry *TE, ArrayRef<int> Mask,
-                                    bool) {
+                                    bool ForSingleMask) {
     InstructionCost C = 0;
     unsigned VF = Mask.size();
     unsigned VecVF = TE->getVectorFactor();
-    if (VF != VecVF &&
-        (any_of(Mask, [VF](int Idx) { return Idx >= static_cast<int>(VF); }) ||
-         !ShuffleVectorInst::isIdentityMask(Mask, VF))) {
-      SmallVector<int> OrigMask(VecVF, PoisonMaskElem);
-      std::copy(Mask.begin(), std::next(Mask.begin(), std::min(VF, VecVF)),
-                OrigMask.begin());
-      C = ::getShuffleCost(*TTI, TTI::SK_PermuteSingleSrc,
-                           getWidenedType(TE->getMainOp()->getType(), VecVF),
-                           OrigMask);
-      LLVM_DEBUG(
-          dbgs() << "SLP: Adding cost " << C
-                 << " for final shuffle of insertelement external users.\n";
-          TE->dump(); dbgs() << "SLP: Current total cost = " << Cost << "\n");
-      Cost += C;
-      return std::make_pair(TE, true);
+    bool HasLargeIndex =
+        any_of(Mask, [VF](int Idx) { return Idx >= static_cast<int>(VF); });
+    if ((VF != VecVF && HasLargeIndex) ||
+        !ShuffleVectorInst::isIdentityMask(Mask, VF)) {
+
+      if (HasLargeIndex) {
+        SmallVector<int> OrigMask(VecVF, PoisonMaskElem);
+        std::copy(Mask.begin(), std::next(Mask.begin(), std::min(VF, VecVF)),
+                  OrigMask.begin());
+        C = ::getShuffleCost(*TTI, TTI::SK_PermuteSingleSrc,
+                             getWidenedType(TE->getMainOp()->getType(), VecVF),
+                             OrigMask);
+        LLVM_DEBUG(
+            dbgs() << "SLP: Adding cost " << C
+                   << " for final shuffle of insertelement external users.\n";
+            TE->dump(); dbgs() << "SLP: Current total cost = " << Cost << "\n");
+        Cost += C;
+        return std::make_pair(TE, true);
+      }
+
+      if (!ForSingleMask) {
+        SmallVector<int> ResizeMask(VF, PoisonMaskElem);
+        for (unsigned I = 0; I < VF; ++I) {
+          if (Mask[I] != PoisonMaskElem)
+            ResizeMask[Mask[I]] = Mask[I];
+        }
+        if (!ShuffleVectorInst::isIdentityMask(ResizeMask, VF))
+          C = ::getShuffleCost(
+              *TTI, TTI::SK_PermuteSingleSrc,
+              getWidenedType(TE->getMainOp()->getType(), VecVF), ResizeMask);
+        LLVM_DEBUG(
+            dbgs() << "SLP: Adding cost " << C
+                   << " for final shuffle of insertelement external users.\n";
+            TE->dump(); dbgs() << "SLP: Current total cost = " << Cost << "\n");
+
+        Cost += C;
+      }
     }
     return std::make_pair(TE, false);
   };
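
The resize-mask construction in the new `!ForSingleMask` branch can be tried in isolation with the following standalone sketch. It uses `std::vector` instead of `SmallVector`, and `kPoison`, `isIdentityLike`, and `buildResizeMask` are hypothetical stand-ins for `PoisonMaskElem`, `ShuffleVectorInst::isIdentityMask`, and the inline loop. It assumes, as that branch does, that no mask entry is at or above the mask size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for LLVM's PoisonMaskElem.
constexpr int kPoison = -1;

// Simplified identity test (an assumption, not LLVM's implementation):
// every non-poison lane must read its own index.
bool isIdentityLike(const std::vector<int> &Mask) {
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I] != kPoison && Mask[I] != static_cast<int>(I))
      return false;
  return true;
}

// Same loop shape as in the diff: each source element referenced by Mask is
// placed at its own index, and every other lane stays poison.
// Precondition: every non-poison entry of Mask is smaller than Mask.size().
std::vector<int> buildResizeMask(const std::vector<int> &Mask) {
  std::vector<int> ResizeMask(Mask.size(), kPoison);
  for (int M : Mask)
    if (M != kPoison)
      ResizeMask[M] = M;
  return ResizeMask;
}
```

For a 16-lane mask that places elements 0 and 1 of a two-element vector at lanes 4 and 5, the mask itself is not an identity, but the resize mask built from it is {0, 1, poison ..}. In this sketch that counts as a poison-padded identity, so no extra resize cost would be added, and the lambda's `false` result leaves the original mask available for costing the final shuffle.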
