[LV][VPlan] Reduce register usage of VPEVLBasedIVPHIRecipe. #154482

ElvisWang123 · 2025-08-20T06:51:47Z

VPEVLBasedIVPHIRecipe will lower to VPInstruction scalar phi and
generate scalar phi. This recipe will only occupy a scalar register just
like other phi recipes.

This patch fix the register usage for VPEVLBasedIVPHIRecipe from vector
to scalar which is close to generated vector IR.

https://godbolt.org/z/6Mzd6W6ha shows that no register spills when
choosing <vscale x 16>.

Note that this test is basically copied from AArch64.

VPEVLBasedIVPHIRecipe will lower to VPInstruction scalar phi and generate scalar phi. This recipe will only use a scalar register just like other phi recipes. This patch fix the register usage for VPEVLBasedIVPHIRecipe from vector to scalar which is close to generated vector IR. https://godbolt.org/z/6Mzd6W6ha shows that no register spills when choosing <vscale x 16>.

llvmbot · 2025-08-20T06:52:20Z

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Elvis Wang (ElvisWang123)

Changes

VPEVLBasedIVPHIRecipe will lower to VPInstruction scalar phi and
generate scalar phi. This recipe will only occupy a scalar register just
like other phi recipes.

This patch fix the register usage for VPEVLBasedIVPHIRecipe from vector
to scalar which is close to generated vector IR.

https://godbolt.org/z/6Mzd6W6ha shows that no register spills when
choosing <vscale x 16>.

Note that this test is basically copied from AArch64.

Full diff: https://github.com/llvm/llvm-project/pull/154482.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+1-1)
(added) llvm/test/Transforms/LoopVectorize/RISCV/maxbandwidth-regpressure.ll (+37)

diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index b39231f106300..b46d99052a1dd 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -555,7 +555,7 @@ SmallVector<VPRegisterUsage, 8> llvm::calculateRegisterUsageForPlan(
 
         if (VFs[J].isScalar() ||
             isa<VPCanonicalIVPHIRecipe, VPReplicateRecipe, VPDerivedIVRecipe,
-                VPScalarIVStepsRecipe>(R) ||
+                VPEVLBasedIVPHIRecipe, VPScalarIVStepsRecipe>(R) ||
             (isa<VPInstruction>(R) &&
              all_of(cast<VPSingleDefRecipe>(R)->users(),
                     [&](VPUser *U) {
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/maxbandwidth-regpressure.ll b/llvm/test/Transforms/LoopVectorize/RISCV/maxbandwidth-regpressure.ll
new file mode 100644
index 0000000000000..71b26aa77ce88
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/maxbandwidth-regpressure.ll
@@ -0,0 +1,37 @@
+; REQUIRES: asserts
+; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v -vectorizer-maximize-bandwidth -debug-only=loop-vectorize,vplan -disable-output -force-vector-interleave=1 -enable-epilogue-vectorization=false -S < %s 2>&1 | FileCheck %s --check-prefixes=CHECK-REGS-VP
+; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v -vectorizer-maximize-bandwidth -debug-only=loop-vectorize -disable-output -force-target-num-vector-regs=1 -force-vector-interleave=1 -enable-epilogue-vectorization=false -S < %s 2>&1 | FileCheck %s --check-prefixes=CHECK-NOREGS-VP
+define i32 @dotp(ptr %a, ptr %b) {
+; CHECK-REGS-VP:      LV(REG): VF = vscale x 16
+; CHECK-REGS-VP-NEXT: LV(REG): Found max usage: 2 item
+; CHECK-REGS-VP-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 6 registers
+; CHECK-REGS-VP-NEXT: LV(REG): RegisterClass: RISCV::VRRC, 24 registers
+; CHECK-REGS-VP-NEXT: LV(REG): Found invariant usage: 1 item
+; CHECK-REGS-VP-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
+; CHECK-REGS-VP: LV: Selecting VF: vscale x 16.
+;
+; CHECK-NOREGS-VP: LV(REG): Not considering vector loop of width vscale x 8 because it uses too many registers
+; CHECK-NOREGS-VP: LV(REG): Not considering vector loop of width vscale x 16 because it uses too many registers
+; CHECK-NOREGS-VP: LV: Selecting VF: vscale x 4.
+entry:
+  br label %for.body
+
+for.body:                                         ; preds = %for.body, %entry
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %accum = phi i32 [ 0, %entry ], [ %add, %for.body ]
+  %gep.a = getelementptr i8, ptr %a, i64 %iv
+  %load.a = load i8, ptr %gep.a, align 1
+  %ext.a = zext i8 %load.a to i32
+  %gep.b = getelementptr i8, ptr %b, i64 %iv
+  %load.b = load i8, ptr %gep.b, align 1
+  %ext.b = zext i8 %load.b to i32
+  %mul = mul i32 %ext.b, %ext.a
+  %sub = sub i32 0, %mul
+  %add = add i32 %accum, %sub
+  %iv.next = add i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, 1024
+  br i1 %exitcond.not, label %for.exit, label %for.body
+
+for.exit:                        ; preds = %for.body
+  ret i32 %add
+}

llvmbot · 2025-08-20T06:52:20Z

@llvm/pr-subscribers-backend-risc-v

Author: Elvis Wang (ElvisWang123)

Changes

VPEVLBasedIVPHIRecipe will lower to VPInstruction scalar phi and
generate scalar phi. This recipe will only occupy a scalar register just
like other phi recipes.

This patch fix the register usage for VPEVLBasedIVPHIRecipe from vector
to scalar which is close to generated vector IR.

https://godbolt.org/z/6Mzd6W6ha shows that no register spills when
choosing <vscale x 16>.

Note that this test is basically copied from AArch64.

Full diff: https://github.com/llvm/llvm-project/pull/154482.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+1-1)
(added) llvm/test/Transforms/LoopVectorize/RISCV/maxbandwidth-regpressure.ll (+37)

diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index b39231f106300..b46d99052a1dd 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -555,7 +555,7 @@ SmallVector<VPRegisterUsage, 8> llvm::calculateRegisterUsageForPlan(
 
         if (VFs[J].isScalar() ||
             isa<VPCanonicalIVPHIRecipe, VPReplicateRecipe, VPDerivedIVRecipe,
-                VPScalarIVStepsRecipe>(R) ||
+                VPEVLBasedIVPHIRecipe, VPScalarIVStepsRecipe>(R) ||
             (isa<VPInstruction>(R) &&
              all_of(cast<VPSingleDefRecipe>(R)->users(),
                     [&](VPUser *U) {
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/maxbandwidth-regpressure.ll b/llvm/test/Transforms/LoopVectorize/RISCV/maxbandwidth-regpressure.ll
new file mode 100644
index 0000000000000..71b26aa77ce88
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/maxbandwidth-regpressure.ll
@@ -0,0 +1,37 @@
+; REQUIRES: asserts
+; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v -vectorizer-maximize-bandwidth -debug-only=loop-vectorize,vplan -disable-output -force-vector-interleave=1 -enable-epilogue-vectorization=false -S < %s 2>&1 | FileCheck %s --check-prefixes=CHECK-REGS-VP
+; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v -vectorizer-maximize-bandwidth -debug-only=loop-vectorize -disable-output -force-target-num-vector-regs=1 -force-vector-interleave=1 -enable-epilogue-vectorization=false -S < %s 2>&1 | FileCheck %s --check-prefixes=CHECK-NOREGS-VP
+define i32 @dotp(ptr %a, ptr %b) {
+; CHECK-REGS-VP:      LV(REG): VF = vscale x 16
+; CHECK-REGS-VP-NEXT: LV(REG): Found max usage: 2 item
+; CHECK-REGS-VP-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 6 registers
+; CHECK-REGS-VP-NEXT: LV(REG): RegisterClass: RISCV::VRRC, 24 registers
+; CHECK-REGS-VP-NEXT: LV(REG): Found invariant usage: 1 item
+; CHECK-REGS-VP-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
+; CHECK-REGS-VP: LV: Selecting VF: vscale x 16.
+;
+; CHECK-NOREGS-VP: LV(REG): Not considering vector loop of width vscale x 8 because it uses too many registers
+; CHECK-NOREGS-VP: LV(REG): Not considering vector loop of width vscale x 16 because it uses too many registers
+; CHECK-NOREGS-VP: LV: Selecting VF: vscale x 4.
+entry:
+  br label %for.body
+
+for.body:                                         ; preds = %for.body, %entry
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %accum = phi i32 [ 0, %entry ], [ %add, %for.body ]
+  %gep.a = getelementptr i8, ptr %a, i64 %iv
+  %load.a = load i8, ptr %gep.a, align 1
+  %ext.a = zext i8 %load.a to i32
+  %gep.b = getelementptr i8, ptr %b, i64 %iv
+  %load.b = load i8, ptr %gep.b, align 1
+  %ext.b = zext i8 %load.b to i32
+  %mul = mul i32 %ext.b, %ext.a
+  %sub = sub i32 0, %mul
+  %add = add i32 %accum, %sub
+  %iv.next = add i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, 1024
+  br i1 %exitcond.not, label %for.exit, label %for.body
+
+for.exit:                        ; preds = %for.body
+  ret i32 %add
+}

lukel97

Makes sense to me. This might be useful for #144520. Have you been enabling max-bandwidth downstream?

lukel97 · 2025-08-20T07:50:58Z

llvm/test/Transforms/LoopVectorize/RISCV/maxbandwidth-regpressure.ll

+; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v -vectorizer-maximize-bandwidth -debug-only=loop-vectorize,vplan -disable-output -force-vector-interleave=1 -enable-epilogue-vectorization=false -S < %s 2>&1 | FileCheck %s --check-prefixes=CHECK-REGS-VP
+; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v -vectorizer-maximize-bandwidth -debug-only=loop-vectorize -disable-output -force-target-num-vector-regs=1 -force-vector-interleave=1 -enable-epilogue-vectorization=false -S < %s 2>&1 | FileCheck %s --check-prefixes=CHECK-NOREGS-VP


Do we need these flags? My understanding was that these were the default on RISC-V anyway

Suggested change

; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v -vectorizer-maximize-bandwidth -debug-only=loop-vectorize,vplan -disable-output -force-vector-interleave=1 -enable-epilogue-vectorization=false -S < %s 2>&1 | FileCheck %s --check-prefixes=CHECK-REGS-VP

; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v -vectorizer-maximize-bandwidth -debug-only=loop-vectorize -disable-output -force-target-num-vector-regs=1 -force-vector-interleave=1 -enable-epilogue-vectorization=false -S < %s 2>&1 | FileCheck %s --check-prefixes=CHECK-NOREGS-VP

; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v -vectorizer-maximize-bandwidth -debug-only=loop-vectorize,vplan -disable-output -S < %s 2>&1 | FileCheck %s --check-prefixes=CHECK-REGS-VP

; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v -vectorizer-maximize-bandwidth -debug-only=loop-vectorize -disable-output -force-target-num-vector-regs=1 -S < %s 2>&1 | FileCheck %s --check-prefixes=CHECK-NOREGS-VP

Dropped, thanks!

lukel97 · 2025-08-20T07:52:27Z

llvm/test/Transforms/LoopVectorize/RISCV/maxbandwidth-regpressure.ll

Can you rename this file to reg-usage-maxbandwidth.ll to be inline with the other reg-usage* tests?

Updated, thanks!

ElvisWang123 · 2025-08-20T08:03:38Z

Makes sense to me. This might be useful for #144520. Have you been enabling max-bandwidth downstream?

Actually we don't enable maximum-bandwidth downstream.
This is founded by regressions in downstream when the vplan-based register pressure model always enabled previously (now need maximum-bandwidth to enable it).

lukel97

LGTM, thanks

lukel97 · 2025-08-20T08:07:17Z

llvm/test/Transforms/LoopVectorize/RISCV/reg-usage-maxbandwidth.ll

+; REQUIRES: asserts
+; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v -vectorizer-maximize-bandwidth -debug-only=loop-vectorize,vplan -disable-output -S < %s 2>&1 | FileCheck %s --check-prefixes=CHECK-REGS-VP
+; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v -vectorizer-maximize-bandwidth -debug-only=loop-vectorize -disable-output -force-target-num-vector-regs=1 -S < %s 2>&1 | FileCheck %s --check-prefixes=CHECK-NOREGS-VP
+define i32 @dotp(ptr %a, ptr %b) {


Nit, should probably have an extra newline

Suggested change

define i32 @dotp(ptr %a, ptr %b) {

define i32 @dotp(ptr %a, ptr %b) {

lukel97 · 2025-08-20T08:08:52Z

llvm/test/Transforms/LoopVectorize/RISCV/reg-usage-maxbandwidth.ll

@@ -0,0 +1,37 @@
+; REQUIRES: asserts
+; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v -vectorizer-maximize-bandwidth -debug-only=loop-vectorize,vplan -disable-output -S < %s 2>&1 | FileCheck %s --check-prefixes=CHECK-REGS-VP
+; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v -vectorizer-maximize-bandwidth -debug-only=loop-vectorize -disable-output -force-target-num-vector-regs=1 -S < %s 2>&1 | FileCheck %s --check-prefixes=CHECK-NOREGS-VP


I don't think this RUN line is necessary, I think the diff from the debug output on the first RUN line is good enough. But I'm not strongly opinionated about this, I'll leave this up to you :)

fhahn

LGTM, thanks!

ElvisWang123 added 2 commits August 19, 2025 23:19

Precommit test case.

3990d82

ElvisWang123 requested review from Mel-Chen, SamTebbs33, arcbbb, fhahn and lukel97 August 20, 2025 06:51

llvmbot added backend:RISC-V vectorizers llvm:transforms labels Aug 20, 2025

lukel97 reviewed Aug 20, 2025

View reviewed changes

address comments.

c6eb21b

lukel97 approved these changes Aug 20, 2025

View reviewed changes

!fixup, add newline and remove unneed runs.

0a10d12

fhahn approved these changes Aug 20, 2025

View reviewed changes

Merge branch 'main' into lv-reg-RISCV-EVL-phi-fix

918309a

ElvisWang123 merged commit d611a9c into llvm:main Aug 20, 2025
9 checks passed

ElvisWang123 deleted the lv-reg-RISCV-EVL-phi-fix branch August 20, 2025 23:39

		; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v -vectorizer-maximize-bandwidth -debug-only=loop-vectorize,vplan -disable-output -force-vector-interleave=1 -enable-epilogue-vectorization=false -S < %s 2>&1 \| FileCheck %s --check-prefixes=CHECK-REGS-VP
		; RUN: opt -passes=loop-vectorize -mtriple riscv64 -mattr=+v -vectorizer-maximize-bandwidth -debug-only=loop-vectorize -disable-output -force-target-num-vector-regs=1 -force-vector-interleave=1 -enable-epilogue-vectorization=false -S < %s 2>&1 \| FileCheck %s --check-prefixes=CHECK-NOREGS-VP

	define i32 @dotp(ptr %a, ptr %b) {

	define i32 @dotp(ptr %a, ptr %b) {

[LV][VPlan] Reduce register usage of VPEVLBasedIVPHIRecipe. #154482

[LV][VPlan] Reduce register usage of VPEVLBasedIVPHIRecipe. #154482

Uh oh!

Conversation

ElvisWang123 commented Aug 20, 2025

Uh oh!

llvmbot commented Aug 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Aug 20, 2025

Uh oh!

lukel97 left a comment

Choose a reason for hiding this comment

Uh oh!

lukel97 Aug 20, 2025

Choose a reason for hiding this comment

Uh oh!

ElvisWang123 Aug 20, 2025

Choose a reason for hiding this comment

Uh oh!

lukel97 Aug 20, 2025

Choose a reason for hiding this comment

Uh oh!

ElvisWang123 Aug 20, 2025

Choose a reason for hiding this comment

Uh oh!

ElvisWang123 commented Aug 20, 2025

Uh oh!

lukel97 left a comment

Choose a reason for hiding this comment

Uh oh!

lukel97 Aug 20, 2025

Choose a reason for hiding this comment

Uh oh!

lukel97 Aug 20, 2025

Choose a reason for hiding this comment

Uh oh!

fhahn left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

llvmbot commented Aug 20, 2025 •

edited

Loading