Commit 3bb6e60

[AArch64] Update cost model for extracting halves from 128+ bit vectors (#155601)
Previously, only 128-bit "NEON" vectors were given sensible costs. Cores with vscale>1 can use SVE's EXT instruction to perform a fixed-length subvector extract. This is a follow-up from the codegen patches at #152554. They show that with the help of MOVPRFX, we can do subvector extracts with roughly one instruction. We now at least give sensible costs for extracting 128-bit halves from a 256-bit vector.
1 parent d4de780 commit 3bb6e60

3 files changed, +349 -4 lines changed

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

Lines changed: 7 additions & 4 deletions
@@ -5752,11 +5752,14 @@ AArch64TTIImpl::getShuffleCost(TTI::ShuffleKind Kind, VectorType *DstTy,
 
   Kind = improveShuffleKindFromMask(Kind, Mask, SrcTy, Index, SubTp);
   bool IsExtractSubvector = Kind == TTI::SK_ExtractSubvector;
-  // A subvector extract can be implemented with an ext (or trivial extract, if
-  // from lane 0). This currently only handles low or high extracts to prevent
-  // SLP vectorizer regressions.
+  // A subvector extract can be implemented with a NEON/SVE ext (or trivial
+  // extract, if from lane 0) for 128-bit NEON vectors or legal SVE vectors.
+  // This currently only handles low or high extracts to prevent SLP vectorizer
+  // regressions.
+  // Note that SVE's ext instruction is destructive, but it can be fused with
+  // a movprfx to act like a constructive instruction.
   if (IsExtractSubvector && LT.second.isFixedLengthVector()) {
-    if (LT.second.is128BitVector() &&
+    if (LT.second.getFixedSizeInBits() >= 128 &&
         cast<FixedVectorType>(SubTp)->getNumElements() ==
             LT.second.getVectorNumElements() / 2) {
       if (Index == 0)
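
The effect of the changed condition can be illustrated with a small standalone sketch. It is not the LLVM implementation: `ExtractQuery`, `ExtractKind`, and `classifyExtract` are hypothetical names, the struct fields stand in for the values the real code reads from `LT.second`, `SubTp`, and `Index`, and the check that the high half starts at lane `LegalNumElts / 2` is an assumption drawn from the "low or high extracts" comment, since the hunk is cut off after `if (Index == 0)`. The sketch only classifies extracts and does not model the cost values returned.

```cpp
#include <cstdint>

// Hypothetical stand-ins for what the real code reads from the legalized
// source type (LT.second), the subvector type (SubTp) and the extract Index.
struct ExtractQuery {
  std::uint64_t LegalBits; // fixed size of the legalized source vector in bits
  unsigned LegalNumElts;   // element count of the legalized source vector
  unsigned SubNumElts;     // element count of the extracted subvector
  int Index;               // first lane of the extract
};

enum class ExtractKind {
  TrivialLow, // extract from lane 0
  HighViaExt, // high half, implementable with a NEON/SVE ext
  NotHandled  // falls through to the rest of the cost model
};

// WidenedCheck == false models the old condition (exactly 128 bits);
// WidenedCheck == true models the new one (128 bits or more).
constexpr ExtractKind classifyExtract(ExtractQuery Q, bool WidenedCheck) {
  const bool SizeOk = WidenedCheck ? Q.LegalBits >= 128 : Q.LegalBits == 128;
  const bool IsHalf = Q.SubNumElts == Q.LegalNumElts / 2;
  if (!SizeOk || !IsHalf)
    return ExtractKind::NotHandled;
  if (Q.Index == 0)
    return ExtractKind::TrivialLow;
  // Assumption: "high extract" means the extract starts at the midpoint lane.
  if (Q.Index == static_cast<int>(Q.LegalNumElts / 2))
    return ExtractKind::HighViaExt;
  return ExtractKind::NotHandled;
}
```

Under these assumptions, extracting the upper `<4 x i32>` from a legal 256-bit `<8 x i32>` (Index 4) is `NotHandled` with the old check and `HighViaExt` with the new one, while a half extract from a 128-bit vector is classified the same either way.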

llvm/test/Analysis/CostModel/AArch64/shuffle-extract.ll

Lines changed: 3 additions & 0 deletions
@@ -1,6 +1,9 @@
 ; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
 ; RUN: opt < %s -mtriple=aarch64--linux-gnu -passes="print<cost-model>" -cost-kind=all 2>&1 -disable-output | FileCheck %s
 
+; This tests the cost of fixed-length subvector extracts for NEON.
+; For the SVE equivalent test, see sve-vls-shuffle-extract.ll
+
 target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
 
 define void @extract_half() {
