Commit 4bb250d

[VPlan] Always consider register pressure on RISC-V (#156951)
Stacked on #156923.

In https://godbolt.org/z/8svWaredK we spill heavily on RISC-V: although the largest element type is i8, we generate many pointer vectors for gathers and scatters. The chosen VF is therefore quite high, e.g. <vscale x 16 x i8>, but the pointers end up occupying a number of <vscale x 16 x i64> m8 registers.

This was briefly fixed by #132190, which computed register pressure in VPlan and used it to prune VFs that were likely to spill. The legacy cost model could not do this pruning because it had no visibility into the pointer vectors needed for the gathers and scatters.

However, VF pruning was restricted in #141736 to the case where max bandwidth is enabled, to avoid an AArch64 regression, and restricted again in #149056 to prune only the VFs that come from max bandwidth.

On RISC-V we take advantage of register grouping for performance and default to LMUL 2, which leaves 16 registers to work with, half as many as SVE, so we hit high register pressure more often. As such, we likely want to always consider pruning VFs with high register pressure, not just the VFs from max bandwidth.

This adds a TTI hook to opt into this behaviour for RISC-V, which fixes the motivating godbolt example above. When last checked, this significantly reduced the number of spills on SPEC CPU 2017, by up to 80% on 538.imagick_r.
1 parent 73cfd45 commit 4bb250d
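
To make the register arithmetic in the commit message concrete, here is a minimal sketch, not code from the patch. It assumes that a scalable type occupies one LMUL=1 vector register per 64 bits of known-minimum size, and that 32 architectural vector registers are available. The names `regsForScalableType`, `fitsInRegisterFile`, `kBitsPerBlock` and `kNumVectorRegs` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: one RVV "block" is 64 bits per unit of vscale, so a scalable
// type <vscale x N x iB> occupies ceil(N * B / 64) LMUL=1 registers.
constexpr uint64_t kBitsPerBlock = 64;
// RVV provides 32 architectural vector registers (v0-v31).
constexpr uint64_t kNumVectorRegs = 32;

// Number of LMUL=1 registers needed by <vscale x minElts x i(eltBits)>.
uint64_t regsForScalableType(uint64_t minElts, uint64_t eltBits) {
  assert(minElts > 0 && eltBits > 0 && "type must be non-empty");
  uint64_t minBits = minElts * eltBits;
  return (minBits + kBitsPerBlock - 1) / kBitsPerBlock; // round up
}

// Hypothetical helper: true if a loop body that keeps `dataVecs` i8 vectors
// and `ptrVecs` i64 pointer vectors live at the same time fits in the
// vector register file at the given VF (vscale x vfMinElts).
bool fitsInRegisterFile(uint64_t vfMinElts, uint64_t dataVecs,
                        uint64_t ptrVecs) {
  uint64_t used = dataVecs * regsForScalableType(vfMinElts, 8) +
                  ptrVecs * regsForScalableType(vfMinElts, 64);
  return used <= kNumVectorRegs;
}
```

Under these assumptions, `<vscale x 16 x i8>` needs 2 registers but each `<vscale x 16 x i64>` pointer vector needs 16, so two live pointer vectors alone fill the register file at VF = vscale x 16. At VF = vscale x 8 the same two data vectors and two pointer vectors need 18 registers and fit.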

9 files changed: +309 -48 lines changed

llvm/include/llvm/Analysis/TargetTransformInfo.h

Lines changed: 4 additions & 0 deletions
@@ -1847,6 +1847,10 @@ class TargetTransformInfo {
   /// otherwise scalar epilogue loop.
   LLVM_ABI bool preferEpilogueVectorization() const;

+  /// \returns True if the loop vectorizer should discard any VFs where the
+  /// maximum register pressure exceeds getNumberOfRegisters.
+  LLVM_ABI bool shouldConsiderVectorizationRegPressure() const;
+
   /// \returns True if the target wants to expand the given reduction intrinsic
   /// into a shuffle sequence.
   LLVM_ABI bool shouldExpandReduction(const IntrinsicInst *II) const;

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

Lines changed: 2 additions & 0 deletions
@@ -1105,6 +1105,8 @@ class TargetTransformInfoImplBase {

   virtual bool preferEpilogueVectorization() const { return true; }

+  virtual bool shouldConsiderVectorizationRegPressure() const { return false; }
+
   virtual bool shouldExpandReduction(const IntrinsicInst *II) const {
     return true;
   }

llvm/lib/Analysis/TargetTransformInfo.cpp

Lines changed: 4 additions & 0 deletions
@@ -1425,6 +1425,10 @@ bool TargetTransformInfo::preferEpilogueVectorization() const {
   return TTIImpl->preferEpilogueVectorization();
 }

+bool TargetTransformInfo::shouldConsiderVectorizationRegPressure() const {
+  return TTIImpl->shouldConsiderVectorizationRegPressure();
+}
+
 TargetTransformInfo::VPLegalization
 TargetTransformInfo::getVPLegalizationStrategy(const VPIntrinsic &VPI) const {
   return TTIImpl->getVPLegalizationStrategy(VPI);

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h

Lines changed: 2 additions & 0 deletions
@@ -141,6 +141,8 @@ class RISCVTTIImpl final : public BasicTTIImplBase<RISCVTTIImpl> {
     return false;
   }

+  bool shouldConsiderVectorizationRegPressure() const override { return true; }
+
   InstructionCost
   getMaskedMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
                         unsigned AddressSpace,
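
The four hunks above follow the usual TTI layering: a default in the implementation base class, a forwarding method on the facade, and a per-target override. The following is a simplified, self-contained model of that layering rather than the real LLVM classes. `ImplBase`, `RISCVImpl` and `TTI` are hypothetical stand-ins for `TargetTransformInfoImplBase`, `RISCVTTIImpl` and `TargetTransformInfo`, and the real classes carry far more state.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Stand-in for TargetTransformInfoImplBase: targets are opted out by default.
struct ImplBase {
  virtual ~ImplBase() = default;
  virtual bool shouldConsiderVectorizationRegPressure() const { return false; }
};

// Stand-in for RISCVTTIImpl: the only target that opts in with this commit.
struct RISCVImpl final : ImplBase {
  bool shouldConsiderVectorizationRegPressure() const override { return true; }
};

// Stand-in for the TargetTransformInfo facade: it owns an implementation and
// forwards the query to it.
class TTI {
  std::unique_ptr<ImplBase> Impl;

public:
  explicit TTI(std::unique_ptr<ImplBase> I) : Impl(std::move(I)) {
    assert(Impl && "facade requires an implementation");
  }

  bool shouldConsiderVectorizationRegPressure() const {
    return Impl->shouldConsiderVectorizationRegPressure();
  }
};
```

In this model, a facade built over `ImplBase` answers false and one built over `RISCVImpl` answers true, which mirrors why only RISC-V changes behaviour while other targets keep the existing max-bandwidth restriction.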

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Lines changed: 12 additions & 0 deletions
@@ -393,6 +393,10 @@ static cl::opt<bool> EnableEarlyExitVectorization(
     cl::desc(
         "Enable vectorization of early exit loops with uncountable exits."));

+static cl::opt<bool> ConsiderRegPressure(
+    "vectorizer-consider-reg-pressure", cl::init(false), cl::Hidden,
+    cl::desc("Discard VFs if their register pressure is too high."));
+
 // Likelyhood of bypassing the vectorized loop because there are zero trips left
 // after prolog. See `emitIterationCountCheck`.
 static constexpr uint32_t MinItersBypassWeights[] = {1, 127};
@@ -3693,6 +3697,14 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {

 bool LoopVectorizationCostModel::shouldConsiderRegPressureForVF(
     ElementCount VF) {
+  if (ConsiderRegPressure.getNumOccurrences())
+    return ConsiderRegPressure;
+
+  // TODO: We should eventually consider register pressure for all targets. The
+  // TTI hook is temporary whilst target-specific issues are being fixed.
+  if (TTI.shouldConsiderVectorizationRegPressure())
+    return true;
+
   if (!useMaxBandwidth(VF.isScalable()
                            ? TargetTransformInfo::RGK_ScalableVector
                            : TargetTransformInfo::RGK_FixedWidthVector))
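
The second hunk sets up a three-level precedence: an explicitly passed `-vectorizer-consider-reg-pressure` flag wins, then the target hook, then the existing max-bandwidth logic. The sketch below models only that ordering. `RegPressureInputs`, `shouldConsiderRegPressure` and `demoPrecedence` are hypothetical names, and the max-bandwidth result is treated as an opaque input because the rest of the function is not shown in the hunk.

```cpp
#include <cassert>
#include <optional>

// Hypothetical, flattened inputs to the decision.
struct RegPressureInputs {
  // Engaged only when the flag was given on the command line; models the
  // getNumOccurrences() check in the patch.
  std::optional<bool> flagOverride;
  // Result of TTI.shouldConsiderVectorizationRegPressure().
  bool targetOptsIn;
  // Result of the existing max-bandwidth logic (assumed, not shown above).
  bool maxBandwidthDecision;
};

bool shouldConsiderRegPressure(const RegPressureInputs &in) {
  if (in.flagOverride.has_value())
    return *in.flagOverride; // an explicit flag always wins, even if false
  if (in.targetOptsIn)
    return true; // e.g. RISC-V after this commit
  return in.maxBandwidthDecision; // fall back to the previous behaviour
}

// Usage: the three cases in priority order.
void demoPrecedence() {
  // Explicit =false disables pruning even on a target that opts in.
  assert(!shouldConsiderRegPressure({false, true, true}));
  // No flag given: the target hook decides.
  assert(shouldConsiderRegPressure({std::nullopt, true, false}));
  // No flag, target opted out: the max-bandwidth logic decides.
  assert(!shouldConsiderRegPressure({std::nullopt, false, false}));
}
```

One consequence of checking `getNumOccurrences()` rather than the flag's value is that the `cl::init(false)` default never overrides the target hook; only a flag that is actually passed does.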

llvm/test/Transforms/LoopVectorize/RISCV/reg-usage-bf16.ll

Lines changed: 4 additions & 7 deletions
@@ -1,14 +1,11 @@
 ; REQUIRES: asserts
-; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v,+zvfbfmin -prefer-predicate-over-epilogue=scalar-epilogue -debug-only=loop-vectorize,vplan --disable-output -riscv-v-register-bit-width-lmul=1 -S < %s 2>&1 | FileCheck %s
-
-; TODO: -prefer-predicate-over-epilogue=scalar-epilogue was added to allow
-; unrolling. Calculate register pressure for all VPlans, not just unrolled ones,
-; and remove.
+; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v,+zvfbfmin -debug-only=loop-vectorize,vplan --disable-output -riscv-v-register-bit-width-lmul=1 -S < %s 2>&1 | FileCheck %s

 define void @add(ptr noalias nocapture readonly %src1, ptr noalias nocapture readonly %src2, i32 signext %size, ptr noalias nocapture writeonly %result) {
 ; CHECK-LABEL: add
-; CHECK: LV(REG): Found max usage: 2 item
-; CHECK-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 3 registers
+; CHECK: LV(REG): VF = vscale x 4
+; CHECK-NEXT: LV(REG): Found max usage: 2 item
+; CHECK-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 6 registers
 ; CHECK-NEXT: LV(REG): RegisterClass: RISCV::VRRC, 4 registers
 ; CHECK-NEXT: LV(REG): Found invariant usage: 1 item
 ; CHECK-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 1 registers

llvm/test/Transforms/LoopVectorize/RISCV/reg-usage-f16.ll

Lines changed: 10 additions & 11 deletions
@@ -1,20 +1,19 @@
 ; REQUIRES: asserts
-; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v,+zvfh -prefer-predicate-over-epilogue=scalar-epilogue -debug-only=loop-vectorize,vplan --disable-output -riscv-v-register-bit-width-lmul=1 -S < %s 2>&1 | FileCheck %s --check-prefix=ZVFH
-; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v,+zvfhmin -prefer-predicate-over-epilogue=scalar-epilogue -debug-only=loop-vectorize,vplan --disable-output -riscv-v-register-bit-width-lmul=1 -S < %s 2>&1 | FileCheck %s --check-prefix=ZVFHMIN
-
-; TODO: -prefer-predicate-over-epilogue=scalar-epilogue was added to allow
-; unrolling. Calculate register pressure for all VPlans, not just unrolled ones,
-; and remove.
+; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v,+zvfh -debug-only=loop-vectorize,vplan --disable-output -riscv-v-register-bit-width-lmul=1 -S < %s 2>&1 | FileCheck %s --check-prefix=ZVFH
+; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v,+zvfhmin -debug-only=loop-vectorize,vplan --disable-output -riscv-v-register-bit-width-lmul=1 -S < %s 2>&1 | FileCheck %s --check-prefix=ZVFHMIN

 define void @add(ptr noalias nocapture readonly %src1, ptr noalias nocapture readonly %src2, i32 signext %size, ptr noalias nocapture writeonly %result) {
-; CHECK-LABEL: add
-; ZVFH: LV(REG): Found max usage: 2 item
-; ZVFH-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 3 registers
+; ZVFH-LABEL: add
+; ZVFH: LV(REG): VF = vscale x 4
+; ZVFH-NEXT: LV(REG): Found max usage: 2 item
+; ZVFH-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 6 registers
 ; ZVFH-NEXT: LV(REG): RegisterClass: RISCV::VRRC, 2 registers
 ; ZVFH-NEXT: LV(REG): Found invariant usage: 1 item
 ; ZVFH-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
-; ZVFHMIN: LV(REG): Found max usage: 2 item
-; ZVFHMIN-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 3 registers
+; ZVFHMIN-LABEL: add
+; ZVFHMIN: LV(REG): VF = vscale x 4
+; ZVFHMIN-NEXT: LV(REG): Found max usage: 2 item
+; ZVFHMIN-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 6 registers
 ; ZVFHMIN-NEXT: LV(REG): RegisterClass: RISCV::VRRC, 4 registers
 ; ZVFHMIN-NEXT: LV(REG): Found invariant usage: 1 item
 ; ZVFHMIN-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
