Commit d9d9d9a

[ARM][MVE] Add shuffle costs for LDn and STn instructions. (#145304)
LD2 is represented in IR as deinterleave-shuffle(load), and ST2 as store(interleave-shuffle). Whilst the shuffle would be expensive in general for MVE (it does not have zip/uzp instructions), it should be treated as cheap when part of the LD2/ST2 pattern. This borrows some code from the AArch64 backend to produce lower costs. (Some of these costs still show as higher than they should; that just shows how broken the generic shuffle costs are at the moment. They would be lower if getShuffleCost was called directly as opposed to going through getInstructionCost.)
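To make the two mask shapes concrete, here is a minimal stand-alone sketch of what a deinterleaving and an interleaving shuffle mask look like. The helper names `isDeinterleaveMask` and `isInterleaveMask` are hypothetical and are not the LLVM API; the real `ShuffleVectorInst` checks are more general (for example, they tolerate undefined lanes), so treat this only as an illustration of the pattern being matched.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not the LLVM API): true if Mask picks every
// Factor-th element starting at a lane offset below Factor, e.g.
// <0, 2, 4, 6> or <1, 3, 5, 7> for Factor == 2. This is the shape of the
// shuffle that follows the load in an LD2/LD4 pattern.
bool isDeinterleaveMask(const std::vector<int> &Mask, unsigned Factor) {
  if (Mask.empty() || Factor < 2)
    return false;
  int Start = Mask[0];
  if (Start < 0 || static_cast<unsigned>(Start) >= Factor)
    return false;
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I] != Start + static_cast<int>(I * Factor))
      return false;
  return true;
}

// Hypothetical helper (not the LLVM API): true if Mask interleaves Factor
// contiguous sub-vectors of equal length, e.g. <0, 4, 1, 5, 2, 6, 3, 7>
// for Factor == 2. This is the shape of the shuffle that feeds the store
// in an ST2/ST4 pattern.
bool isInterleaveMask(const std::vector<int> &Mask, unsigned Factor) {
  if (Factor < 2 || Mask.empty() || Mask.size() % Factor != 0)
    return false;
  std::size_t LaneLen = Mask.size() / Factor;
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    // Output lane I takes element I / Factor of sub-vector I % Factor.
    std::size_t Expected = (I % Factor) * LaneLen + I / Factor;
    if (Mask[I] != static_cast<int>(Expected))
      return false;
  }
  return true;
}
```

With these definitions, `isDeinterleaveMask({0, 2, 4, 6}, 2)` and `isInterleaveMask({0, 4, 1, 5, 2, 6, 3, 7}, 2)` both hold, while an identity mask such as `{0, 1, 2, 3}` matches neither.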
1 parent 3b6d879 commit d9d9d9a

2 files changed: +303 -155 lines changed

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp

Lines changed: 33 additions & 0 deletions
@@ -1335,6 +1335,39 @@ InstructionCost ARMTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,

     if (!Mask.empty()) {
       std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(SrcTy);
+      // Check for LD2/LD4 instructions, which are represented in llvm IR as
+      // deinterleaving-shuffle(load). The shuffle cost could potentially be
+      // free, but we model it with a cost of LT.first so that LD2/LD4 have a
+      // higher cost than just the load.
+      if (Args.size() >= 1 && isa<LoadInst>(Args[0]) &&
+          (LT.second.getScalarSizeInBits() == 8 ||
+           LT.second.getScalarSizeInBits() == 16 ||
+           LT.second.getScalarSizeInBits() == 32) &&
+          LT.second.getSizeInBits() == 128 &&
+          ((TLI->getMaxSupportedInterleaveFactor() >= 2 &&
+            ShuffleVectorInst::isDeInterleaveMaskOfFactor(Mask, 2)) ||
+           (TLI->getMaxSupportedInterleaveFactor() == 4 &&
+            ShuffleVectorInst::isDeInterleaveMaskOfFactor(Mask, 4))))
+        return ST->getMVEVectorCostFactor(CostKind) *
+               std::max<InstructionCost>(1, LT.first / 4);
+
+      // Check for ST2/ST4 instructions, which are represented in llvm IR as
+      // store(interleaving-shuffle). The shuffle cost could potentially be
+      // free, but we model it with a cost of LT.first so that ST2/ST4 have a
+      // higher cost than just the store.
+      if (CxtI && CxtI->hasOneUse() && isa<StoreInst>(*CxtI->user_begin()) &&
+          (LT.second.getScalarSizeInBits() == 8 ||
+           LT.second.getScalarSizeInBits() == 16 ||
+           LT.second.getScalarSizeInBits() == 32) &&
+          LT.second.getSizeInBits() == 128 &&
+          ((TLI->getMaxSupportedInterleaveFactor() >= 2 &&
+            ShuffleVectorInst::isInterleaveMask(
+                Mask, 2, SrcTy->getElementCount().getKnownMinValue() * 2)) ||
+           (TLI->getMaxSupportedInterleaveFactor() == 4 &&
+            ShuffleVectorInst::isInterleaveMask(
+                Mask, 4, SrcTy->getElementCount().getKnownMinValue() * 2))))
+        return ST->getMVEVectorCostFactor(CostKind) * LT.first;
+
       if (LT.second.isVector() &&
           Mask.size() <= LT.second.getVectorNumElements() &&
           (isVREVMask(Mask, LT.second, 16) || isVREVMask(Mask, LT.second, 32) ||
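
The two return expressions in this hunk can be read as simple arithmetic on the legalization count and the MVE cost factor. The sketch below restates them with plain integers; `ldnShuffleCost`, `stnShuffleCost`, `LegalizedParts` and `MVECostFactor` are hypothetical names standing in for the shuffle cost, `LT.first` and `ST->getMVEVectorCostFactor(CostKind)`, and no particular value of the cost factor is implied. Note that the load path scales by `max(1, LT.first / 4)` rather than by `LT.first` itself, even though both comments describe the cost as `LT.first`.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical restatement of the LD2/LD4 return expression:
//   getMVEVectorCostFactor(CostKind) * max(1, LT.first / 4)
// LegalizedParts stands in for LT.first (integer division, as in the diff).
std::int64_t ldnShuffleCost(std::int64_t LegalizedParts,
                            std::int64_t MVECostFactor) {
  return MVECostFactor * std::max<std::int64_t>(1, LegalizedParts / 4);
}

// Hypothetical restatement of the ST2/ST4 return expression:
//   getMVEVectorCostFactor(CostKind) * LT.first
std::int64_t stnShuffleCost(std::int64_t LegalizedParts,
                            std::int64_t MVECostFactor) {
  return MVECostFactor * LegalizedParts;
}
```

For any legalization count below 8 the load-side shuffle costs a single cost factor, whereas the store-side shuffle grows linearly with the count, so under this model the deinterleaving shuffle is never more expensive than the interleaving one for the same type.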
