[RISCV] Reduce minimum VL needed for vslidedown.vx in RISCVVLOptimizer #168392
Conversation
@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

Whenever #149042 is relanded we will soon start EVL tail folding vectorized loops that have live-outs, e.g.:

```c
int f(int *x, int n) {
  int y = 0;
  for (int i = 0; i < n; i++) {
    y = x[i] + 1;
    x[y] = y;
  }
  return y;
}
```

These are vectorized by extracting the last "active lane" in the loop's exit:

```llvm
loop:
  %vl = call i32 @llvm.experimental.get.vector.length(i64 %avl, i32 4, i1 true)
  ...
exit:
  %lastidx = sub i64 %vl, 1
  %lastelt = extractelement <vscale x 4 x i32> %y, i64 %lastidx
```

Which in RISC-V translates to a vslidedown.vx with a VL of 1:

```
bb.loop:
  %vl:gprnox0 = PseudoVSETVLI ...
  %y:vr = PseudoVADD_VI_M1 $noreg, %x, 1, AVL=-1
  ...
bb.exit:
  %lastidx:gprnox0 = ADDI %vl, -1
  %w:vr = PseudoVSLIDEDOWN_VX_M1 $noreg, %y, %lastidx, AVL=1
```

However, today we fail to reduce the VL of %y in the loop and end up with two extra VL toggles. The reason is that RISCVVLOptimizer is currently conservative with vslidedown.vx, since it can read lanes of %y past its own VL, so in getMinimumVLForUser we say that vslidedown.vx demands the entirety of %y.

One observation about the sequence above is that it only actually needs to read the first %vl lanes of %y, because the last lane of vs2 used is offset + 1. In this case, that's %lastidx + 1 = %vl - 1 + 1 = %vl.

This PR teaches RISCVVLOptimizer about this case in getMinimumVLForVSLIDEDOWN_VX, and in doing so removes the VL toggles.

The one case that I had to think about for a bit was when ADDI %vl, -1 wraps, i.e. when %vl = 0 and the resulting offset is all ones. This should always be larger than the largest VLMAX, so vs2 will be completely slid down and absent from the output, and we don't need to read anything from vs2.

This patch on its own has no observable effect on llvm-test-suite or SPEC CPU 2017 w/ rva23u64 today.

Full diff: https://github.com/llvm/llvm-project/pull/168392.diff

3 Files Affected:

- llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp (modified)
- llvm/test/CodeGen/RISCV/rvv/vl-opt-live-out.ll (added)
- llvm/test/CodeGen/RISCV/rvv/vl-opt.mir (modified)
diff --git a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
index 0a8838cbd45c7..5011b178a5770 100644
--- a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
+++ b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
@@ -62,7 +62,7 @@ struct DemandedVL {
};
class RISCVVLOptimizer : public MachineFunctionPass {
- const MachineRegisterInfo *MRI;
+ MachineRegisterInfo *MRI;
const MachineDominatorTree *MDT;
const TargetInstrInfo *TII;
@@ -1392,6 +1392,41 @@ bool RISCVVLOptimizer::isCandidate(const MachineInstr &MI) const {
return true;
}
+/// Given a vslidedown.vx like:
+///
+/// %slideamt = ADDI %x, -1
+/// %v = PseudoVSLIDEDOWN_VX %passthru, %src, %slideamt, avl=1
+///
+/// %v will only read the first %slideamt + 1 lanes of %src, which = %x.
+/// This is a common case when lowering extractelement.
+///
+/// Note that if %x is 0, %slideamt will be all ones. In this case %src will be
+/// completely slid down and none of its lanes will be read (since %slideamt is
+/// greater than the largest VLMAX of 65536) so we can demand any minimum VL.
+static std::optional<DemandedVL>
+getMinimumVLForVSLIDEDOWN_VX(const MachineOperand &UserOp,
+ const MachineRegisterInfo *MRI) {
+ const MachineInstr &MI = *UserOp.getParent();
+ if (RISCV::getRVVMCOpcode(MI.getOpcode()) != RISCV::VSLIDEDOWN_VX)
+ return std::nullopt;
+ // We're looking at what lanes are used from the src operand.
+ if (UserOp.getOperandNo() != 2)
+ return std::nullopt;
+ // For now, the AVL must be 1.
+ const MachineOperand &AVL = MI.getOperand(4);
+ if (!AVL.isImm() || AVL.getImm() != 1)
+ return std::nullopt;
+ // The slide amount must be %x - 1.
+ const MachineOperand &SlideAmt = MI.getOperand(3);
+ if (!SlideAmt.getReg().isVirtual())
+ return std::nullopt;
+ MachineInstr *SlideAmtDef = MRI->getUniqueVRegDef(SlideAmt.getReg());
+ if (SlideAmtDef->getOpcode() != RISCV::ADDI ||
+ SlideAmtDef->getOperand(2).getImm() != -AVL.getImm())
+ return std::nullopt;
+ return SlideAmtDef->getOperand(1);
+}
+
DemandedVL
RISCVVLOptimizer::getMinimumVLForUser(const MachineOperand &UserOp) const {
const MachineInstr &UserMI = *UserOp.getParent();
@@ -1406,6 +1441,9 @@ RISCVVLOptimizer::getMinimumVLForUser(const MachineOperand &UserOp) const {
return DemandedVL::vlmax();
}
+ if (auto VL = getMinimumVLForVSLIDEDOWN_VX(UserOp, MRI))
+ return *VL;
+
if (RISCVII::readsPastVL(
TII->get(RISCV::getRVVMCOpcode(UserMI.getOpcode())).TSFlags)) {
LLVM_DEBUG(dbgs() << " Abort because used by unsafe instruction\n");
@@ -1624,6 +1662,7 @@ bool RISCVVLOptimizer::tryReduceVL(MachineInstr &MI) const {
// All our checks passed. We can reduce VL.
VLOp.ChangeToRegister(CommonVL->getReg(), false);
+ MRI->constrainRegClass(CommonVL->getReg(), &RISCV::GPRNoX0RegClass);
return true;
}
diff --git a/llvm/test/CodeGen/RISCV/rvv/vl-opt-live-out.ll b/llvm/test/CodeGen/RISCV/rvv/vl-opt-live-out.ll
new file mode 100644
index 0000000000000..cf15fad5533b9
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/vl-opt-live-out.ll
@@ -0,0 +1,44 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc -mtriple=riscv64 -mattr=+v -verify-machineinstrs < %s | FileCheck %s
+
+define i32 @loop_live_out(ptr %p, i64 %n) {
+; CHECK-LABEL: loop_live_out:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: mv a2, a0
+; CHECK-NEXT: .LBB0_1: # %loop
+; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
+; CHECK-NEXT: vsetvli a3, a1, e32, m2, ta, ma
+; CHECK-NEXT: vle32.v v8, (a2)
+; CHECK-NEXT: sub a1, a1, a3
+; CHECK-NEXT: vadd.vi v8, v8, 1
+; CHECK-NEXT: vse32.v v8, (a2)
+; CHECK-NEXT: slli a2, a3, 2
+; CHECK-NEXT: add a2, a0, a2
+; CHECK-NEXT: bnez a1, .LBB0_1
+; CHECK-NEXT: # %bb.2: # %exit
+; CHECK-NEXT: addi a3, a3, -1
+; CHECK-NEXT: vsetivli zero, 1, e32, m2, ta, ma
+; CHECK-NEXT: vslidedown.vx v8, v8, a3
+; CHECK-NEXT: vmv.x.s a0, v8
+; CHECK-NEXT: ret
+entry:
+ br label %loop
+
+loop:
+ %avl = phi i64 [%n, %entry], [%avl.next, %loop]
+ %gep = phi ptr [%p, %entry], [%gep.next, %loop]
+ %vl = call i32 @llvm.experimental.get.vector.length(i64 %avl, i32 4, i1 true)
+ %x = call <vscale x 4 x i32> @llvm.vp.load(ptr %gep, <vscale x 4 x i1> splat (i1 true), i32 %vl)
+ %y = add <vscale x 4 x i32> %x, splat (i32 1)
+ call void @llvm.vp.store(<vscale x 4 x i32> %y, ptr %gep, <vscale x 4 x i1> splat (i1 true), i32 %vl)
+ %vl.zext = zext i32 %vl to i64
+ %avl.next = sub i64 %avl, %vl.zext
+ %gep.next = getelementptr i32, ptr %p, i32 %vl
+ %ec = icmp eq i64 %avl.next, 0
+ br i1 %ec, label %exit, label %loop
+
+exit:
+ %lastidx = sub i64 %vl.zext, 1
+ %lastelt = extractelement <vscale x 4 x i32> %y, i64 %lastidx
+ ret i32 %lastelt
+}
diff --git a/llvm/test/CodeGen/RISCV/rvv/vl-opt.mir b/llvm/test/CodeGen/RISCV/rvv/vl-opt.mir
index 4d6d0e122b1cf..ddd23f3d575d8 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vl-opt.mir
+++ b/llvm/test/CodeGen/RISCV/rvv/vl-opt.mir
@@ -778,3 +778,21 @@ body: |
; CHECK: DBG_VALUE %0:vr
DBG_VALUE %0:vr
...
+---
+name: vslidedown_vx
+tracksRegLiveness: true
+body: |
+ bb.0:
+ liveins: $x8
+ ; CHECK-LABEL: name: vslidedown_vx
+ ; CHECK: liveins: $x8
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: %x:gprnox0 = COPY $x8
+ ; CHECK-NEXT: %y:gprnox0 = ADDI %x, -1
+ ; CHECK-NEXT: %v:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, %x, 5 /* e32 */, 0 /* tu, mu */
+ ; CHECK-NEXT: %w:vr = PseudoVSLIDEDOWN_VX_M1 $noreg, %v, %y, 1, 5 /* e32 */, 0 /* tu, mu */
+ %x:gpr = COPY $x8
+ %y:gprnox0 = ADDI %x, -1
+ %v:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 5 /* e32 */, 0 /* tu, mu */
+ %w:vr = PseudoVSLIDEDOWN_VX_M1 $noreg, %v, %y, 1, 5 /* e32 */, 0 /* tu, mu */
+...
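
To make the slide-amount arithmetic concrete, here is a minimal standalone sketch (plain C++, not LLVM code, and not part of the patch) of the reasoning above: the number of leading source lanes a vslidedown.vx reads is the slide amount plus its AVL, and a wrapped slide amount already exceeds any VLMAX, so no source lanes are read at all.

```cpp
// Minimal standalone model of the demanded-lanes reasoning (not LLVM code).
// vslidedown.vx with slide amount SlideAmt and AVL reads source lanes
// [SlideAmt, SlideAmt + AVL), so it only needs the first SlideAmt + AVL lanes
// of its source. With AVL = 1 and SlideAmt = VL - 1 that is exactly VL.
#include <cassert>
#include <cstdint>

uint64_t demandedSrcLanes(uint64_t SlideAmt, uint64_t AVL) {
  uint64_t Needed = SlideAmt + AVL;
  // If the sum wraps, the slide amount already exceeds any possible VLMAX:
  // the source is slid out entirely and none of its lanes are read.
  if (Needed < SlideAmt)
    return 0;
  return Needed;
}

int main() {
  uint64_t VL = 8;
  assert(demandedSrcLanes(VL - 1, 1) == VL);    // extractelement lowering case
  assert(demandedSrcLanes(UINT64_MAX, 1) == 0); // VL == 0: the ADDI wrapped
  return 0;
}
```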
// All our checks passed. We can reduce VL.
VLOp.ChangeToRegister(CommonVL->getReg(), false);
MRI->constrainRegClass(CommonVL->getReg(), &RISCV::GPRNoX0RegClass);
Because we're taking the demanded VL from an ADDI's operands, the AVL may be a plain GPR virtual register now. So we need to constrain it to GPRNoX0. This doesn't seem to affect any existing AVLs that are reduced.
if (!SlideAmt.getReg().isVirtual())
  return std::nullopt;
MachineInstr *SlideAmtDef = MRI->getUniqueVRegDef(SlideAmt.getReg());
if (SlideAmtDef->getOpcode() != RISCV::ADDI ||
Is it possible for this ADDI to be an LI? In which case Operand 1 is X0 and not a virtual register.
Woops yes, fixed in 0fa31a4
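
For reference, a sketch of the kind of guard this needs inside getMinimumVLForVSLIDEDOWN_VX. The actual fix landed in 0fa31a4, so the exact form below is an assumption rather than the committed code:

```cpp
// Hypothetical guard: an LI is materialized as ADDI x0, imm, so operand 1 of
// the ADDI may be the physical register X0 rather than a virtual register.
// Bail out in that case instead of returning it as the demanded VL.
const MachineOperand &AddiSrc = SlideAmtDef->getOperand(1);
if (!AddiSrc.isReg() || !AddiSrc.getReg().isVirtual())
  return std::nullopt;
return AddiSrc;
```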
🐧 Linux x64 Test Results
topperc left a comment:
LGTM
Just curious about this. Does this just run llvm unit tests? Why do we have a CI comment now?
This is the first time I'm seeing it too. I presume it's reporting the summary of …