Commit 65ad21d

[RISCV] Handle recurrences in RISCVVLOptimizer (#151285)
After #144666 we now support vectorizing loops with induction variables with EVL tail folding. The induction updates don't use VP intrinsics, to avoid VL toggles, but instead rely on RISCVVLOptimizer. However, RISCVVLOptimizer can't reason about cycles or recurrences today, which means we are left with a VL toggle to VLMAX:

    # %bb.1:                              # %for.body.preheader
            li a2, 0
            vsetvli a3, zero, e32, m2, ta, ma
            vid.v v8
    .LBB0_2:                              # %vector.body
                                          # =>This Inner Loop Header: Depth=1
            sub a3, a1, a2
            sh2add a4, a2, a0
            vsetvli a3, a3, e32, m2, ta, ma
            vle32.v v10, (a4)
            add a2, a2, a3
            vadd.vv v10, v10, v8
            vse32.v v10, (a4)
            vsetvli a4, zero, e32, m2, ta, ma
            vadd.vx v8, v8, a3
            bne a2, a1, .LBB0_2

This patch teaches RISCVVLOptimizer to reason about recurrences so we can remove the VLMAX toggle:

    # %bb.1:                              # %for.body.preheader
            li a2, 0
            vsetvli a3, zero, e32, m2, ta, ma
            vid.v v8
    .LBB0_2:                              # %vector.body
                                          # =>This Inner Loop Header: Depth=1
            sub a3, a1, a2
            sh2add a4, a2, a0
            vsetvli a3, a3, e32, m2, ta, ma
            vle32.v v10, (a4)
            add a2, a2, a3
            vadd.vv v10, v10, v8
            vse32.v v10, (a4)
            vadd.vx v8, v8, a3
            bne a2, a1, .LBB0_2

With this we remove a significant number of VL toggles and vsetvli instructions across llvm-test-suite and SPEC CPU 2017 with tail folding enabled, since it affects every loop with an induction variable.

This builds upon the work in #124530, where we started computing what VL each instruction demanded, and generalizes it to an optimistic sparse dataflow analysis:

- We begin by optimistically assuming no VL is used by any instruction, and push instructions onto the worklist starting from the bottom.
- For each instruction on the worklist we apply the transfer function, which propagates the VL needed by that instruction upwards to the instructions it uses. If a use's demanded VL changes, it's added to the worklist.
- Eventually this converges to a fixpoint when all uses have been processed and every demanded VL has been propagated throughout the entire use-def chain.
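The optimistic worklist scheme described above can be sketched outside of LLVM as a toy model. This is only an illustration, not the pass itself: VLs are plain integers with `UINT32_MAX` standing in for VLMAX, and the names `Inst` and `computeDemanded` are invented for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Toy model of the optimistic worklist analysis (a sketch, not the real
// pass): VLs are plain integers and VLMAX is modelled as UINT32_MAX.
constexpr uint32_t VLMAX = UINT32_MAX;

struct Inst {
  std::vector<int> Uses; // indices of the defs this instruction reads
  bool IsRoot;           // e.g. a store: always demands everything
  uint32_t OwnVL;        // the instruction's own VL operand
};

// Start from the optimistic assumption that no lanes are demanded (0) and
// apply the transfer function until a fixpoint is reached. Demanded VLs
// only ever grow and are bounded by VLMAX, so this terminates even when
// the use-def graph has cycles (recurrences).
std::vector<uint32_t> computeDemanded(const std::vector<Inst> &Insts) {
  std::vector<uint32_t> Demanded(Insts.size(), 0);
  std::set<int> Worklist;
  for (int I = (int)Insts.size() - 1; I >= 0; --I)
    Worklist.insert(I); // seed with every instruction, bottom-up
  while (!Worklist.empty()) {
    int I = *Worklist.begin();
    Worklist.erase(Worklist.begin());
    if (Insts[I].IsRoot)
      Demanded[I] = VLMAX; // roots pin their own demand to VLMAX
    // Transfer: I reads at most min(its own VL, what is demanded of I)
    // lanes of each def; join (max) that with the def's current value.
    uint32_t Need = std::min(Insts[I].OwnVL, Demanded[I]);
    for (int Def : Insts[I].Uses) {
      uint32_t Joined = std::max(Demanded[Def], Need);
      if (Joined != Demanded[Def]) {
        Demanded[Def] = Joined;
        Worklist.insert(Def); // demand grew: reprocess the def
      }
    }
  }
  return Demanded;
}
```

Note how a phi-like cycle (instruction 0 using instruction 1 and vice versa) still converges: each map entry can only grow towards VLMAX, so the loop must terminate.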
Only after this converges is the DemandedVL map accurate. Some implementation details:

- The roots are stores (or other unsupported instructions not in `isSupportedInstr`) or copies to physical registers (they fail the `any_of(MI.defs(), isPhysical)` check).
- This patch untangles `getMinimumVLForUser` and `checkUsers`. `getMinimumVLForUser` now returns how many lanes of an operand are read by an instruction, whilst `checkUsers` checks that an instruction and its users have compatible EEWs/EMULs.
- The `DemandedVL` struct was added so that we have a default constructor of 0 for `DenseMap<const MachineInstr *, DemandedVL> DemandedVLs`, which means we don't need to check whether a key exists when looking things up.

There was no measurable compile-time impact on llvm-test-suite or SPEC CPU 2017. The analysis will always terminate; there are more details in this EuroLLVM talk: https://www.youtube.com/watch?v=Mfb5fRSdJAc

Fixes #149354
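The `DemandedVL::max` join in the patch is what keeps the analysis conservative: if two demanded VLs can't be ordered by `RISCV::isVLKnownLE`, it falls back to VLMAX. The following is a standalone sketch of that lattice join under a deliberate simplification (real VL operands are `MachineOperand`s and `isVLKnownLE` handles more cases; the `VL`, `isVLKnownLE`, and `join` names here are illustrative re-implementations, not the patch's code):

```cpp
#include <cassert>
#include <cstdint>

// Simplified model: a VL is either a known immediate or an opaque virtual
// register; the VLMAX sentinel is the immediate -1, as in the RISC-V backend.
constexpr int64_t VLMaxSentinel = -1;

struct VL {
  bool IsImm;
  int64_t Val; // immediate value, or register number
  bool operator==(const VL &O) const {
    return IsImm == O.IsImm && Val == O.Val;
  }
};

// Conservative "A <= B": only returns true when the ordering is provable.
bool isVLKnownLE(const VL &A, const VL &B) {
  if (A == B)
    return true;
  if (B.IsImm && B.Val == VLMaxSentinel)
    return true; // everything is <= VLMAX
  if (A.IsImm && A.Val == VLMaxSentinel)
    return false; // VLMAX is only <= VLMAX (handled above)
  if (A.IsImm && B.IsImm)
    return A.Val <= B.Val;
  return false; // register VLs are incomparable with anything else
}

// The join used by the analysis: take the larger of the two if comparable,
// otherwise give up and demand VLMAX.
VL join(const VL &A, const VL &B) {
  if (isVLKnownLE(A, B))
    return B;
  if (isVLKnownLE(B, A))
    return A;
  return {true, VLMaxSentinel};
}
```

This also shows why the default-0 `DenseMap` value is safe to use as the starting point: immediate 0 is `<=` everything, so joining against an untouched entry never loses information.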
1 parent c44e015 commit 65ad21d

File tree: 5 files changed, +271 -64 lines

llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp

Lines changed: 109 additions & 60 deletions
@@ -10,9 +10,19 @@
 // instructions are inserted.
 //
 // The purpose of this optimization is to make the VL argument, for instructions
-// that have a VL argument, as small as possible. This is implemented by
-// visiting each instruction in reverse order and checking that if it has a VL
-// argument, whether the VL can be reduced.
+// that have a VL argument, as small as possible.
+//
+// This is split into a sparse dataflow analysis where we determine what VL is
+// demanded by each instruction first, and then afterwards try to reduce the VL
+// of each instruction if it demands less than its VL operand.
+//
+// The analysis is explained in more detail in the 2025 EuroLLVM Developers'
+// Meeting talk "Accidental Dataflow Analysis: Extending the RISC-V VL
+// Optimizer", which is available on YouTube at
+// https://www.youtube.com/watch?v=Mfb5fRSdJAc
+//
+// The slides for the talk are available at
+// https://llvm.org/devmtg/2025-04/slides/technical_talk/lau_accidental_dataflow.pdf
 //
 //===---------------------------------------------------------------------===//

@@ -30,6 +40,27 @@ using namespace llvm;
 
 namespace {
 
+/// Wrapper around MachineOperand that defaults to immediate 0.
+struct DemandedVL {
+  MachineOperand VL;
+  DemandedVL() : VL(MachineOperand::CreateImm(0)) {}
+  DemandedVL(MachineOperand VL) : VL(VL) {}
+  static DemandedVL vlmax() {
+    return DemandedVL(MachineOperand::CreateImm(RISCV::VLMaxSentinel));
+  }
+  bool operator!=(const DemandedVL &Other) const {
+    return !VL.isIdenticalTo(Other.VL);
+  }
+
+  DemandedVL max(const DemandedVL &X) const {
+    if (RISCV::isVLKnownLE(VL, X.VL))
+      return X;
+    if (RISCV::isVLKnownLE(X.VL, VL))
+      return *this;
+    return DemandedVL::vlmax();
+  }
+};
+
 class RISCVVLOptimizer : public MachineFunctionPass {
   const MachineRegisterInfo *MRI;
   const MachineDominatorTree *MDT;
@@ -51,17 +82,25 @@ class RISCVVLOptimizer : public MachineFunctionPass {
   StringRef getPassName() const override { return PASS_NAME; }
 
 private:
-  std::optional<MachineOperand>
-  getMinimumVLForUser(const MachineOperand &UserOp) const;
-  /// Returns the largest common VL MachineOperand that may be used to optimize
-  /// MI. Returns std::nullopt if it failed to find a suitable VL.
-  std::optional<MachineOperand> checkUsers(const MachineInstr &MI) const;
+  DemandedVL getMinimumVLForUser(const MachineOperand &UserOp) const;
+  /// Returns true if the users of \p MI have compatible EEWs and SEWs.
+  bool checkUsers(const MachineInstr &MI) const;
   bool tryReduceVL(MachineInstr &MI) const;
   bool isCandidate(const MachineInstr &MI) const;
+  void transfer(const MachineInstr &MI);
 
   /// For a given instruction, records what elements of it are demanded by
   /// downstream users.
-  DenseMap<const MachineInstr *, std::optional<MachineOperand>> DemandedVLs;
+  DenseMap<const MachineInstr *, DemandedVL> DemandedVLs;
+  SetVector<const MachineInstr *> Worklist;
+
+  /// \returns all vector virtual registers that \p MI uses.
+  auto virtual_vec_uses(const MachineInstr &MI) const {
+    return make_filter_range(MI.uses(), [this](const MachineOperand &MO) {
+      return MO.isReg() && MO.getReg().isVirtual() &&
+             RISCVRegisterInfo::isRVVRegClass(MRI->getRegClass(MO.getReg()));
+    });
+  }
 };
 
 /// Represents the EMUL and EEW of a MachineOperand.
@@ -847,10 +886,15 @@ static std::optional<OperandInfo> getOperandInfo(const MachineOperand &MO) {
   return OperandInfo(getEMULEqualsEEWDivSEWTimesLMUL(*Log2EEW, MI), *Log2EEW);
 }
 
+static bool isTupleInsertInstr(const MachineInstr &MI);
+
 /// Return true if this optimization should consider MI for VL reduction. This
 /// white-list approach simplifies this optimization for instructions that may
 /// have more complex semantics with relation to how it uses VL.
 static bool isSupportedInstr(const MachineInstr &MI) {
+  if (MI.isPHI() || MI.isFullCopy() || isTupleInsertInstr(MI))
+    return true;
+
   const RISCVVPseudosTable::PseudoInfo *RVV =
       RISCVVPseudosTable::getPseudoInfo(MI.getOpcode());
 
@@ -1348,21 +1392,24 @@ bool RISCVVLOptimizer::isCandidate(const MachineInstr &MI) const {
   return true;
 }
 
-std::optional<MachineOperand>
+DemandedVL
 RISCVVLOptimizer::getMinimumVLForUser(const MachineOperand &UserOp) const {
   const MachineInstr &UserMI = *UserOp.getParent();
   const MCInstrDesc &Desc = UserMI.getDesc();
 
+  if (UserMI.isPHI() || UserMI.isFullCopy() || isTupleInsertInstr(UserMI))
+    return DemandedVLs.lookup(&UserMI);
+
   if (!RISCVII::hasVLOp(Desc.TSFlags) || !RISCVII::hasSEWOp(Desc.TSFlags)) {
     LLVM_DEBUG(dbgs() << "    Abort due to lack of VL, assume that"
                          " use VLMAX\n");
-    return std::nullopt;
+    return DemandedVL::vlmax();
   }
 
   if (RISCVII::readsPastVL(
           TII->get(RISCV::getRVVMCOpcode(UserMI.getOpcode())).TSFlags)) {
     LLVM_DEBUG(dbgs() << "    Abort because used by unsafe instruction\n");
-    return std::nullopt;
+    return DemandedVL::vlmax();
   }
 
   unsigned VLOpNum = RISCVII::getVLOpNum(Desc);
@@ -1376,11 +1423,10 @@ RISCVVLOptimizer::getMinimumVLForUser(const MachineOperand &UserOp) const {
   if (UserOp.isTied()) {
     assert(UserOp.getOperandNo() == UserMI.getNumExplicitDefs() &&
            RISCVII::isFirstDefTiedToFirstUse(UserMI.getDesc()));
-    auto DemandedVL = DemandedVLs.lookup(&UserMI);
-    if (!DemandedVL || !RISCV::isVLKnownLE(*DemandedVL, VLOp)) {
+    if (!RISCV::isVLKnownLE(DemandedVLs.lookup(&UserMI).VL, VLOp)) {
       LLVM_DEBUG(dbgs() << "    Abort because user is passthru in "
                            "instruction with demanded tail\n");
-      return std::nullopt;
+      return DemandedVL::vlmax();
     }
   }
 
@@ -1393,11 +1439,8 @@ RISCVVLOptimizer::getMinimumVLForUser(const MachineOperand &UserOp) const {
 
   // If we know the demanded VL of UserMI, then we can reduce the VL it
   // requires.
-  if (auto DemandedVL = DemandedVLs.lookup(&UserMI)) {
-    assert(isCandidate(UserMI));
-    if (RISCV::isVLKnownLE(*DemandedVL, VLOp))
-      return DemandedVL;
-  }
+  if (RISCV::isVLKnownLE(DemandedVLs.lookup(&UserMI).VL, VLOp))
+    return DemandedVLs.lookup(&UserMI);
 
   return VLOp;
 }
@@ -1450,22 +1493,23 @@ static bool isSegmentedStoreInstr(const MachineInstr &MI) {
   }
 }
 
-std::optional<MachineOperand>
-RISCVVLOptimizer::checkUsers(const MachineInstr &MI) const {
-  std::optional<MachineOperand> CommonVL;
-  SmallSetVector<MachineOperand *, 8> Worklist;
+bool RISCVVLOptimizer::checkUsers(const MachineInstr &MI) const {
+  if (MI.isPHI() || MI.isFullCopy() || isTupleInsertInstr(MI))
+    return true;
+
+  SmallSetVector<MachineOperand *, 8> OpWorklist;
   SmallPtrSet<const MachineInstr *, 4> PHISeen;
   for (auto &UserOp : MRI->use_operands(MI.getOperand(0).getReg()))
-    Worklist.insert(&UserOp);
+    OpWorklist.insert(&UserOp);
 
-  while (!Worklist.empty()) {
-    MachineOperand &UserOp = *Worklist.pop_back_val();
+  while (!OpWorklist.empty()) {
+    MachineOperand &UserOp = *OpWorklist.pop_back_val();
     const MachineInstr &UserMI = *UserOp.getParent();
     LLVM_DEBUG(dbgs() << "  Checking user: " << UserMI << "\n");
 
     if (UserMI.isFullCopy() && UserMI.getOperand(0).getReg().isVirtual()) {
       LLVM_DEBUG(dbgs() << "    Peeking through uses of COPY\n");
-      Worklist.insert_range(llvm::make_pointer_range(
+      OpWorklist.insert_range(llvm::make_pointer_range(
           MRI->use_operands(UserMI.getOperand(0).getReg())));
       continue;
     }
@@ -1481,8 +1525,8 @@ RISCVVLOptimizer::checkUsers(const MachineInstr &MI) const {
         // whole register group).
         if (!isTupleInsertInstr(CandidateMI) &&
             !isSegmentedStoreInstr(CandidateMI))
-          return std::nullopt;
-        Worklist.insert(&UseOp);
+          return false;
+        OpWorklist.insert(&UseOp);
       }
       continue;
     }
@@ -1492,28 +1536,14 @@ RISCVVLOptimizer::checkUsers(const MachineInstr &MI) const {
       if (!PHISeen.insert(&UserMI).second)
         continue;
       LLVM_DEBUG(dbgs() << "    Peeking through uses of PHI\n");
-      Worklist.insert_range(llvm::make_pointer_range(
+      OpWorklist.insert_range(llvm::make_pointer_range(
           MRI->use_operands(UserMI.getOperand(0).getReg())));
       continue;
     }
 
-    auto VLOp = getMinimumVLForUser(UserOp);
-    if (!VLOp)
-      return std::nullopt;
-
-    // Use the largest VL among all the users. If we cannot determine this
-    // statically, then we cannot optimize the VL.
-    if (!CommonVL || RISCV::isVLKnownLE(*CommonVL, *VLOp)) {
-      CommonVL = *VLOp;
-      LLVM_DEBUG(dbgs() << "    User VL is: " << VLOp << "\n");
-    } else if (!RISCV::isVLKnownLE(*VLOp, *CommonVL)) {
-      LLVM_DEBUG(dbgs() << "    Abort because cannot determine a common VL\n");
-      return std::nullopt;
-    }
-
     if (!RISCVII::hasSEWOp(UserMI.getDesc().TSFlags)) {
       LLVM_DEBUG(dbgs() << "    Abort due to lack of SEW operand\n");
-      return std::nullopt;
+      return false;
     }
 
     std::optional<OperandInfo> ConsumerInfo = getOperandInfo(UserOp);
@@ -1522,7 +1552,7 @@ RISCVVLOptimizer::checkUsers(const MachineInstr &MI) const {
       LLVM_DEBUG(dbgs() << "    Abort due to unknown operand information.\n");
       LLVM_DEBUG(dbgs() << "      ConsumerInfo is: " << ConsumerInfo << "\n");
       LLVM_DEBUG(dbgs() << "      ProducerInfo is: " << ProducerInfo << "\n");
-      return std::nullopt;
+      return false;
     }
 
     if (!OperandInfo::areCompatible(*ProducerInfo, *ConsumerInfo)) {
@@ -1531,11 +1561,11 @@ RISCVVLOptimizer::checkUsers(const MachineInstr &MI) const {
           << "    Abort due to incompatible information for EMUL or EEW.\n");
       LLVM_DEBUG(dbgs() << "      ConsumerInfo is: " << ConsumerInfo << "\n");
       LLVM_DEBUG(dbgs() << "      ProducerInfo is: " << ProducerInfo << "\n");
-      return std::nullopt;
+      return false;
     }
   }
 
-  return CommonVL;
+  return true;
 }
 
 bool RISCVVLOptimizer::tryReduceVL(MachineInstr &MI) const {
@@ -1551,9 +1581,7 @@ bool RISCVVLOptimizer::tryReduceVL(MachineInstr &MI) const {
     return false;
   }
 
-  auto CommonVL = DemandedVLs.lookup(&MI);
-  if (!CommonVL)
-    return false;
+  auto *CommonVL = &DemandedVLs.at(&MI).VL;
 
   assert((CommonVL->isImm() || CommonVL->getReg().isVirtual()) &&
          "Expected VL to be an Imm or virtual Reg");
@@ -1564,7 +1592,7 @@ bool RISCVVLOptimizer::tryReduceVL(MachineInstr &MI) const {
     const MachineInstr *VLMI = MRI->getVRegDef(CommonVL->getReg());
     if (RISCVInstrInfo::isFaultOnlyFirstLoad(*VLMI) &&
         !MDT->dominates(VLMI, &MI))
-      CommonVL = VLMI->getOperand(RISCVII::getVLOpNum(VLMI->getDesc()));
+      CommonVL = &VLMI->getOperand(RISCVII::getVLOpNum(VLMI->getDesc()));
   }
 
   if (!RISCV::isVLKnownLE(*CommonVL, VLOp)) {
@@ -1599,6 +1627,24 @@ bool RISCVVLOptimizer::tryReduceVL(MachineInstr &MI) const {
   return true;
 }
 
+static bool isPhysical(const MachineOperand &MO) {
+  return MO.isReg() && MO.getReg().isPhysical();
+}
+
+/// Look through \p MI's operands and propagate what it demands to its uses.
+void RISCVVLOptimizer::transfer(const MachineInstr &MI) {
+  if (!isSupportedInstr(MI) || !checkUsers(MI) || any_of(MI.defs(), isPhysical))
+    DemandedVLs[&MI] = DemandedVL::vlmax();
+
+  for (const MachineOperand &MO : virtual_vec_uses(MI)) {
+    const MachineInstr *Def = MRI->getVRegDef(MO.getReg());
+    DemandedVL Prev = DemandedVLs[Def];
+    DemandedVLs[Def] = DemandedVLs[Def].max(getMinimumVLForUser(MO));
+    if (DemandedVLs[Def] != Prev)
+      Worklist.insert(Def);
+  }
+}
+
 bool RISCVVLOptimizer::runOnMachineFunction(MachineFunction &MF) {
   if (skipFunction(MF.getFunction()))
     return false;
@@ -1614,15 +1660,18 @@ bool RISCVVLOptimizer::runOnMachineFunction(MachineFunction &MF) {
 
   assert(DemandedVLs.empty());
 
-  // For each instruction that defines a vector, compute what VL its
-  // downstream users demand.
+  // For each instruction that defines a vector, propagate the VL it
+  // uses to its inputs.
   for (MachineBasicBlock *MBB : post_order(&MF)) {
     assert(MDT->isReachableFromEntry(MBB));
-    for (MachineInstr &MI : reverse(*MBB)) {
-      if (!isCandidate(MI))
-        continue;
-      DemandedVLs.insert({&MI, checkUsers(MI)});
-    }
+    for (MachineInstr &MI : reverse(*MBB))
+      Worklist.insert(&MI);
+  }
+
+  while (!Worklist.empty()) {
+    const MachineInstr *MI = Worklist.front();
+    Worklist.remove(MI);
+    transfer(*MI);
   }
 
   // Then go through and see if we can reduce the VL of any instructions to

llvm/test/CodeGen/RISCV/rvv/reproducer-pr146855.ll

Lines changed: 2 additions & 2 deletions
@@ -6,15 +6,15 @@ target triple = "riscv64-unknown-linux-gnu"
 define i32 @_ZN4Mesh12rezone_countESt6vectorIiSaIiEERiS3_(<vscale x 4 x i32> %wide.load, <vscale x 4 x i1> %0, <vscale x 4 x i1> %1, <vscale x 4 x i1> %2, <vscale x 4 x i1> %3) #0 {
 ; CHECK-LABEL: _ZN4Mesh12rezone_countESt6vectorIiSaIiEERiS3_:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    vsetvli a0, zero, e32, m2, ta, ma
+; CHECK-NEXT:    vsetivli zero, 0, e32, m2, ta, ma
 ; CHECK-NEXT:    vmv1r.v v8, v0
 ; CHECK-NEXT:    li a0, 0
 ; CHECK-NEXT:    vmv.v.i v10, 0
 ; CHECK-NEXT:    vmv.v.i v12, 0
 ; CHECK-NEXT:    vmv.v.i v14, 0
 ; CHECK-NEXT:  .LBB0_1: # %vector.body
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    vsetvli a1, zero, e32, m2, ta, mu
+; CHECK-NEXT:    vsetivli zero, 0, e32, m2, ta, mu
 ; CHECK-NEXT:    vmv1r.v v0, v8
 ; CHECK-NEXT:    slli a0, a0, 2
 ; CHECK-NEXT:    vmv2r.v v16, v10
