[RISCV] Enable rematerialization for scalar loads #166774
Conversation
@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

In some workloads we see an argument passed on the stack where it is loaded, only for it to be immediately spilled to a different slot on the stack and then reloaded from that spill slot later on. We can avoid the unnecessary spill by marking loads as rematerializable and directly loading from where the argument was originally passed on the stack. TargetTransformInfo::isReMaterializableImpl checks that any loads are MI.isDereferenceableInvariantLoad(), so we should be able to move the load down to the remat site.

This gives a 14.8% reduction in spills in 544.nab_r on rva23u64 -O3, and a few other smaller reductions on llvm-test-suite. I didn't find any benchmarks where the number of spills/reloads increased.

Related: #165761

Full diff: https://github.com/llvm/llvm-project/pull/166774.diff

6 Files Affected:
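As a rough illustration (hypothetical registers and stack offsets, modeled on the pr95865.ll diff below), the pattern being eliminated looks like this:

# Before: the incoming stack argument is loaded, immediately spilled to a
# different stack slot to survive register pressure, and reloaded later.
        ld      a1, 112(sp)        # load the stack argument
        sd      a1, 0(sp)          # 8-byte Folded Spill
        ...
        ld      a0, 0(sp)          # 8-byte Folded Reload

# After: the load is rematerialized at the use, so the extra spill slot and
# the spill/reload pair go away.
        ...
        ld      a0, 112(sp)        # re-load the original stack argument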
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.td b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
index 9cb53fb27a2d2..84b962b2a8607 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
@@ -768,7 +768,7 @@ def BGE : BranchCC_rri<0b101, "bge">;
def BLTU : BranchCC_rri<0b110, "bltu">;
def BGEU : BranchCC_rri<0b111, "bgeu">;
-let IsSignExtendingOpW = 1, canFoldAsLoad = 1 in {
+let IsSignExtendingOpW = 1, canFoldAsLoad = 1, isReMaterializable = 1 in {
def LB : Load_ri<0b000, "lb">, Sched<[WriteLDB, ReadMemBase]>;
def LH : Load_ri<0b001, "lh">, Sched<[WriteLDH, ReadMemBase]>;
def LW : Load_ri<0b010, "lw">, Sched<[WriteLDW, ReadMemBase]>;
@@ -889,7 +889,7 @@ def CSRRCI : CSR_ii<0b111, "csrrci">;
/// RV64I instructions
let Predicates = [IsRV64] in {
-let canFoldAsLoad = 1 in {
+let canFoldAsLoad = 1, isReMaterializable = 1 in {
def LWU : Load_ri<0b110, "lwu">, Sched<[WriteLDW, ReadMemBase]>;
def LD : Load_ri<0b011, "ld">, Sched<[WriteLDD, ReadMemBase]>;
}
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoD.td b/llvm/lib/Target/RISCV/RISCVInstrInfoD.td
index 4ffe3e62ac501..deacd41e6469a 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoD.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoD.td
@@ -71,7 +71,7 @@ defvar DExtsRV64 = [DExt, ZdinxExt];
//===----------------------------------------------------------------------===//
let Predicates = [HasStdExtD] in {
-let canFoldAsLoad = 1 in
+let canFoldAsLoad = 1, isReMaterializable = 1 in
def FLD : FPLoad_r<0b011, "fld", FPR64, WriteFLD64>;
// Operands for stores are in the order srcreg, base, offset rather than
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoF.td b/llvm/lib/Target/RISCV/RISCVInstrInfoF.td
index b30f8ec820c15..bd191001b75ec 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoF.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoF.td
@@ -330,7 +330,7 @@ class PseudoFROUND<DAGOperand Ty, ValueType vt, ValueType intvt = XLenVT>
//===----------------------------------------------------------------------===//
let Predicates = [HasStdExtF] in {
-let canFoldAsLoad = 1 in
+let canFoldAsLoad = 1, isReMaterializable = 1 in
def FLW : FPLoad_r<0b010, "flw", FPR32, WriteFLD32>;
// Operands for stores are in the order srcreg, base, offset rather than
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoZfh.td b/llvm/lib/Target/RISCV/RISCVInstrInfoZfh.td
index 1c6a5afcda49b..c172d1739ba61 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoZfh.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoZfh.td
@@ -90,7 +90,7 @@ defvar ZfhminDExts = [ZfhminDExt, ZhinxminZdinxExt, ZhinxminZdinx32Ext];
//===----------------------------------------------------------------------===//
let Predicates = [HasHalfFPLoadStoreMove] in {
-let canFoldAsLoad = 1 in
+let canFoldAsLoad = 1, isReMaterializable = 1 in
def FLH : FPLoad_r<0b001, "flh", FPR16, WriteFLD16>;
// Operands for stores are in the order srcreg, base, offset rather than
diff --git a/llvm/test/CodeGen/RISCV/remat.ll b/llvm/test/CodeGen/RISCV/remat.ll
index 92ae85f560cd4..88240474f30bf 100644
--- a/llvm/test/CodeGen/RISCV/remat.ll
+++ b/llvm/test/CodeGen/RISCV/remat.ll
@@ -201,3 +201,95 @@ for.end: ; preds = %for.inc, %entry
}
declare i32 @foo(i32, i32, i32, i32, i32, i32)
+
+define i32 @remat_ld(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i32 %6, i32 %7, i32 %stack0, i32 %stack1, ptr %p, ptr %q) {
+; RV32I-LABEL: remat_ld:
+; RV32I: # %bb.0: # %entry
+; RV32I-NEXT: addi sp, sp, -64
+; RV32I-NEXT: .cfi_def_cfa_offset 64
+; RV32I-NEXT: sw ra, 60(sp) # 4-byte Folded Spill
+; RV32I-NEXT: sw s0, 56(sp) # 4-byte Folded Spill
+; RV32I-NEXT: sw s1, 52(sp) # 4-byte Folded Spill
+; RV32I-NEXT: sw s2, 48(sp) # 4-byte Folded Spill
+; RV32I-NEXT: sw s3, 44(sp) # 4-byte Folded Spill
+; RV32I-NEXT: sw s4, 40(sp) # 4-byte Folded Spill
+; RV32I-NEXT: sw s5, 36(sp) # 4-byte Folded Spill
+; RV32I-NEXT: sw s6, 32(sp) # 4-byte Folded Spill
+; RV32I-NEXT: sw s7, 28(sp) # 4-byte Folded Spill
+; RV32I-NEXT: sw s8, 24(sp) # 4-byte Folded Spill
+; RV32I-NEXT: sw s9, 20(sp) # 4-byte Folded Spill
+; RV32I-NEXT: sw s10, 16(sp) # 4-byte Folded Spill
+; RV32I-NEXT: sw s11, 12(sp) # 4-byte Folded Spill
+; RV32I-NEXT: .cfi_offset ra, -4
+; RV32I-NEXT: .cfi_offset s0, -8
+; RV32I-NEXT: .cfi_offset s1, -12
+; RV32I-NEXT: .cfi_offset s2, -16
+; RV32I-NEXT: .cfi_offset s3, -20
+; RV32I-NEXT: .cfi_offset s4, -24
+; RV32I-NEXT: .cfi_offset s5, -28
+; RV32I-NEXT: .cfi_offset s6, -32
+; RV32I-NEXT: .cfi_offset s7, -36
+; RV32I-NEXT: .cfi_offset s8, -40
+; RV32I-NEXT: .cfi_offset s9, -44
+; RV32I-NEXT: .cfi_offset s10, -48
+; RV32I-NEXT: .cfi_offset s11, -52
+; RV32I-NEXT: sw a0, 8(sp) # 4-byte Folded Spill
+; RV32I-NEXT: lw a2, 68(sp)
+; RV32I-NEXT: lw a3, 64(sp)
+; RV32I-NEXT: lw a0, 72(sp)
+; RV32I-NEXT: lw a1, 76(sp)
+; RV32I-NEXT: sw a3, 0(a0)
+; RV32I-NEXT: sw a2, 0(a1)
+; RV32I-NEXT: #APP
+; RV32I-NEXT: #NO_APP
+; RV32I-NEXT: lw a0, 8(sp) # 4-byte Folded Reload
+; RV32I-NEXT: beqz a0, .LBB1_2
+; RV32I-NEXT: # %bb.1: # %falsebb
+; RV32I-NEXT: li a0, 0
+; RV32I-NEXT: j .LBB1_3
+; RV32I-NEXT: .LBB1_2: # %truebb
+; RV32I-NEXT: lw a0, 68(sp)
+; RV32I-NEXT: lw a1, 64(sp)
+; RV32I-NEXT: add a0, a1, a0
+; RV32I-NEXT: .LBB1_3: # %falsebb
+; RV32I-NEXT: lw ra, 60(sp) # 4-byte Folded Reload
+; RV32I-NEXT: lw s0, 56(sp) # 4-byte Folded Reload
+; RV32I-NEXT: lw s1, 52(sp) # 4-byte Folded Reload
+; RV32I-NEXT: lw s2, 48(sp) # 4-byte Folded Reload
+; RV32I-NEXT: lw s3, 44(sp) # 4-byte Folded Reload
+; RV32I-NEXT: lw s4, 40(sp) # 4-byte Folded Reload
+; RV32I-NEXT: lw s5, 36(sp) # 4-byte Folded Reload
+; RV32I-NEXT: lw s6, 32(sp) # 4-byte Folded Reload
+; RV32I-NEXT: lw s7, 28(sp) # 4-byte Folded Reload
+; RV32I-NEXT: lw s8, 24(sp) # 4-byte Folded Reload
+; RV32I-NEXT: lw s9, 20(sp) # 4-byte Folded Reload
+; RV32I-NEXT: lw s10, 16(sp) # 4-byte Folded Reload
+; RV32I-NEXT: lw s11, 12(sp) # 4-byte Folded Reload
+; RV32I-NEXT: .cfi_restore ra
+; RV32I-NEXT: .cfi_restore s0
+; RV32I-NEXT: .cfi_restore s1
+; RV32I-NEXT: .cfi_restore s2
+; RV32I-NEXT: .cfi_restore s3
+; RV32I-NEXT: .cfi_restore s4
+; RV32I-NEXT: .cfi_restore s5
+; RV32I-NEXT: .cfi_restore s6
+; RV32I-NEXT: .cfi_restore s7
+; RV32I-NEXT: .cfi_restore s8
+; RV32I-NEXT: .cfi_restore s9
+; RV32I-NEXT: .cfi_restore s10
+; RV32I-NEXT: .cfi_restore s11
+; RV32I-NEXT: addi sp, sp, 64
+; RV32I-NEXT: .cfi_def_cfa_offset 0
+; RV32I-NEXT: ret
+entry:
+ store i32 %stack0, ptr %p
+ store i32 %stack1, ptr %q
+ tail call void asm sideeffect "", "~{x1},~{x3},~{x4},~{x5},~{x6},~{x7},~{x8},~{x9},~{x10},~{x11},~{x12},~{x13},~{x14},~{x15},~{x16},~{x17},~{x18},~{x19},~{x20},~{x21},~{x22},~{x23},~{x24},~{x25},~{x26},~{x27},~{x28},~{x29},~{x30},~{x31}"()
+ %a = icmp eq i32 %0, 0
+ br i1 %a, label %truebb, label %falsebb
+truebb:
+ %b = add i32 %stack0, %stack1
+ ret i32 %b
+falsebb:
+ ret i32 0
+}
diff --git a/llvm/test/CodeGen/RISCV/rvv/pr95865.ll b/llvm/test/CodeGen/RISCV/rvv/pr95865.ll
index ab9849631663c..01d66b344ec2e 100644
--- a/llvm/test/CodeGen/RISCV/rvv/pr95865.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/pr95865.ll
@@ -40,8 +40,6 @@ define i32 @main(i1 %arg.1, i64 %arg.2, i1 %arg.3, i64 %arg.4, i1 %arg.5, <vscal
; CHECK-NEXT: li t0, 12
; CHECK-NEXT: li s0, 4
; CHECK-NEXT: li t1, 20
-; CHECK-NEXT: ld a1, 112(sp)
-; CHECK-NEXT: sd a1, 0(sp) # 8-byte Folded Spill
; CHECK-NEXT: vsetivli zero, 8, e32, m2, ta, ma
; CHECK-NEXT: vmv.v.i v8, 0
; CHECK-NEXT: andi t3, a4, 1
@@ -142,7 +140,7 @@ define i32 @main(i1 %arg.1, i64 %arg.2, i1 %arg.3, i64 %arg.4, i1 %arg.5, <vscal
; CHECK-NEXT: j .LBB0_11
; CHECK-NEXT: .LBB0_12: # %for.body7.us.19
; CHECK-NEXT: vsetvli a0, zero, e32, m8, ta, ma
-; CHECK-NEXT: ld a0, 0(sp) # 8-byte Folded Reload
+; CHECK-NEXT: ld a0, 112(sp)
; CHECK-NEXT: vmv.s.x v16, a0
; CHECK-NEXT: vmv.v.i v8, 0
; CHECK-NEXT: vsetivli zero, 2, e32, m1, tu, ma
llvm/test/CodeGen/RISCV/remat.ll
Outdated
; RV32I-NEXT: sw s9, 20(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s10, 16(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s11, 12(sp) # 4-byte Folded Spill
; RV32I-NEXT: .cfi_offset ra, -4
nounwind?
preames left a comment:
Summarizing some offline conversation with Craig. We happened to chat about this idea yesterday.
Cases this might kick in:
- Arguments - will be frame indices during regalloc, and rewritten afterwards, so no concerns for e.g. dynamic alloc offset calculations.
- Globals, particularly pc-relative. Remating the LD without the whole sequence might break the assumption of the AUIPC offset?
Craig, can you expand on the latter just a bit? As long as we don't move the AUIPC, isn't the low offset fixed? It's the pc-relative offset from the AUIPC to the global, not from the LD to the global, isn't it?
Luke, could you add a test case which does a constant global load before the asm block, and then a use after? We should be able to remat that, but it's the case we were a bit unclear on so having specific testing is good.
llvm/test/CodeGen/RISCV/remat.ll
Outdated
store i32 %stack1, ptr %q
tail call void asm sideeffect "", "~{x1},~{x3},~{x4},~{x5},~{x6},~{x7},~{x8},~{x9},~{x10},~{x11},~{x12},~{x13},~{x14},~{x15},~{x16},~{x17},~{x18},~{x19},~{x20},~{x21},~{x22},~{x23},~{x24},~{x25},~{x26},~{x27},~{x28},~{x29},~{x30},~{x31}"()
%a = icmp eq i32 %0, 0
br i1 %a, label %truebb, label %falsebb
I think you can simplify the test. Is the conditional block doing anything in particular here? If you just had the inline assembly followed by an unconditional use of %stack0 and %stack1, I think that still remats? Or even, just return one of %stack0 or %stack1?
Yup, was able to simplify it to just a volatile store after the asm without branching. For some reason, the float stack arguments also require a use before the asm in order to force the initial load off the stack.
You're right. Remating the load should be fine. It has a reference to a symbol on the auipc. We can remat that since we can have multiple instructions referring to the same auipc. It's auipc that we can't rematerialize since that would need a new symbol.
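For reference, a sketch of the pc-relative sequence under discussion (hypothetical symbol g and register choices):

.Lpcrel_hi0:
        auipc   a0, %pcrel_hi(g)                 # high bits; the label is attached to the auipc
        ld      a1, %pcrel_lo(.Lpcrel_hi0)(a0)   # low part refers back to the auipc's label
        ...
        ld      a2, %pcrel_lo(.Lpcrel_hi0)(a0)   # a rematerialized load can reuse the same label,
                                                 # provided a0 (the auipc result) is still live
# The auipc itself can't be rematerialized the same way: a second copy would
# need its own .Lpcrel_hi label for its %pcrel_lo users to reference.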
Any reason this wouldn't apply to vector as well? Though I don't recall ever seeing any cases in the wild that would benefit from this...
I don't think the generic infrastructure can identify an invariant vector load from stack since the frame index operand won't be part of the load. Maybe it's possible for other types of invariant loads?
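To illustrate the point (a sketch, not output from this patch): a scalar stack load carries the frame index as one of its own operands, whereas an RVV unit-stride load only sees an address register, with the frame index hidden behind a separate addi:

        lw      a0, 16(sp)      # scalar: the stack slot is an operand of the load itself
        addi    a1, sp, 32      # vector: the address is materialized separately...
        vle32.v v8, (a1)        # ...so the load only carries a plain register operand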
This is to allow us to test rematting i64 lds in #166774. Checked and we're still rematting the same lis on rv64 in this test.
Force-pushed from e7ebe51 to 2b0f29f
Apologies for the force push, I precommitted 630f43a and rebased on top of it so we can test i64 ld remat.

I've added a test for a constant global load, but it looks like we can't rematerialize global loads yet because even though the load is invariant, the upper bits of the address won't be live at the point of rematerialization. I think in order to remat this we either need to extend the live range or allow rematerializing multiple instructions.
; CHECK-NEXT: addi sp, sp, 208
; CHECK-NEXT: ret
entry:
; Force loading the stack arguments to create their live interval
Can you explain this comment? This doesn't make sense to me.