
Conversation

@topperc (Collaborator) commented Nov 25, 2025

Rely on ZilsdOptimizer for non-volatile loads/stores. This can avoid extra moves.

Also use Zilsd for non-volatile stores of 0.

@topperc (Collaborator, Author) commented Nov 25, 2025

CC @christian-herber-nxp

@llvmbot (Member) commented Nov 25, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Craig Topper (topperc)

Changes

Rely on ZilsdOptimizer for non-volatile loads/stores. This can avoid extra moves.

Also use Zilsd for non-volatile stores of 0.


Full diff: https://github.com/llvm/llvm-project/pull/169529.diff

2 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+7-2)
  • (modified) llvm/test/CodeGen/RISCV/zilsd.ll (+19-32)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 19a16197272fe..de60e8461e5d7 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -8404,7 +8404,12 @@ SDValue RISCVTargetLowering::LowerOperation(SDValue Op,
       if (Store->isTruncatingStore())
         return SDValue();
 
-      if (Store->getAlign() < Subtarget.getZilsdAlign())
+      // Expand non-volatile or misaligned stores.
+      // Keep stores of 0 since that doesn't constrain the register allocator.
+      if (!(Store->isVolatile() ||
+            (isa<ConstantSDNode>(StoredVal) &&
+             cast<ConstantSDNode>(StoredVal)->isZero())) ||
+          Store->getAlign() < Subtarget.getZilsdAlign())
         return SDValue();
 
       SDLoc DL(Op);
@@ -14803,7 +14808,7 @@ void RISCVTargetLowering::ReplaceNodeResults(SDNode *N,
       assert(Subtarget.hasStdExtZilsd() && !Subtarget.is64Bit() &&
              "Unexpected custom legalisation");
 
-      if (Ld->getAlign() < Subtarget.getZilsdAlign())
+      if (!Ld->isVolatile() || Ld->getAlign() < Subtarget.getZilsdAlign())
         return;
 
       SDLoc DL(N);
diff --git a/llvm/test/CodeGen/RISCV/zilsd.ll b/llvm/test/CodeGen/RISCV/zilsd.ll
index 27b1ff76f6f05..427b1d83b9d8e 100644
--- a/llvm/test/CodeGen/RISCV/zilsd.ll
+++ b/llvm/test/CodeGen/RISCV/zilsd.ll
@@ -9,9 +9,10 @@
 define i64 @load(ptr %a) nounwind {
 ; CHECK-LABEL: load:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    mv a2, a0
-; CHECK-NEXT:    ld a0, 80(a0)
-; CHECK-NEXT:    ld zero, 0(a2)
+; CHECK-NEXT:    lw a2, 80(a0)
+; CHECK-NEXT:    lw a1, 84(a0)
+; CHECK-NEXT:    ld zero, 0(a0)
+; CHECK-NEXT:    mv a0, a2
 ; CHECK-NEXT:    ret
   %1 = getelementptr i64, ptr %a, i32 10
   %2 = load i64, ptr %1
@@ -44,10 +45,10 @@ define i64 @load_align4(ptr %a) nounwind {
 define void @store(ptr %a, i64 %b) nounwind {
 ; CHECK-LABEL: store:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    mv a3, a2
-; CHECK-NEXT:    mv a2, a1
-; CHECK-NEXT:    sd a2, 0(a0)
-; CHECK-NEXT:    sd a2, 88(a0)
+; CHECK-NEXT:    sw a1, 0(a0)
+; CHECK-NEXT:    sw a2, 4(a0)
+; CHECK-NEXT:    sw a1, 88(a0)
+; CHECK-NEXT:    sw a2, 92(a0)
 ; CHECK-NEXT:    ret
   store i64 %b, ptr %a
   %1 = getelementptr i64, ptr %a, i32 11
@@ -56,25 +57,11 @@ define void @store(ptr %a, i64 %b) nounwind {
 }
 
 define void @store_align4(ptr %a, i64 %b) nounwind {
-; SLOW-LABEL: store_align4:
-; SLOW:       # %bb.0:
-; SLOW-NEXT:    sw a1, 88(a0)
-; SLOW-NEXT:    sw a2, 92(a0)
-; SLOW-NEXT:    ret
-;
-; FAST-LABEL: store_align4:
-; FAST:       # %bb.0:
-; FAST-NEXT:    mv a3, a2
-; FAST-NEXT:    mv a2, a1
-; FAST-NEXT:    sd a2, 88(a0)
-; FAST-NEXT:    ret
-;
-; 4BYTEALIGN-LABEL: store_align4:
-; 4BYTEALIGN:       # %bb.0:
-; 4BYTEALIGN-NEXT:    mv a3, a2
-; 4BYTEALIGN-NEXT:    mv a2, a1
-; 4BYTEALIGN-NEXT:    sd a2, 88(a0)
-; 4BYTEALIGN-NEXT:    ret
+; CHECK-LABEL: store_align4:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    sw a1, 88(a0)
+; CHECK-NEXT:    sw a2, 92(a0)
+; CHECK-NEXT:    ret
   %1 = getelementptr i64, ptr %a, i32 11
   store i64 %b, ptr %1, align 4
   ret void
@@ -158,9 +145,8 @@ define void @store_unaligned(ptr %p, i64 %v) {
 ;
 ; FAST-LABEL: store_unaligned:
 ; FAST:       # %bb.0:
-; FAST-NEXT:    mv a3, a2
-; FAST-NEXT:    mv a2, a1
-; FAST-NEXT:    sd a2, 0(a0)
+; FAST-NEXT:    sw a1, 0(a0)
+; FAST-NEXT:    sw a2, 4(a0)
 ; FAST-NEXT:    ret
 ;
 ; 4BYTEALIGN-LABEL: store_unaligned:
@@ -213,10 +199,11 @@ define void @large_offset(ptr nocapture %p, i64 %d) nounwind {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    lui a1, 4
 ; CHECK-NEXT:    add a0, a0, a1
-; CHECK-NEXT:    ld a2, -384(a0)
+; CHECK-NEXT:    lw a2, -384(a0)
+; CHECK-NEXT:    lw a1, -380(a0)
 ; CHECK-NEXT:    addi a2, a2, 1
-; CHECK-NEXT:    seqz a1, a2
-; CHECK-NEXT:    add a3, a3, a1
+; CHECK-NEXT:    seqz a3, a2
+; CHECK-NEXT:    add a3, a1, a3
 ; CHECK-NEXT:    sd a2, -384(a0)
 ; CHECK-NEXT:    ret
 entry:

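The combined condition in the first hunk is a double negative, so here is a standalone sketch of the store policy it implements (the names are illustrative stand-ins, not the actual LLVM code): keep the Zilsd sd only if the store is volatile or stores constant zero, and is sufficiently aligned; otherwise expand to two sw and let ZilsdOptimizer try to re-pair them.

#include <cstdio>

// Illustrative model of the new store-lowering condition; the real code
// queries StoreSDNode and the subtarget, these parameters are stand-ins.
static bool keepAsZilsdStore(bool IsVolatile, bool StoresConstantZero,
                             unsigned AlignBytes, unsigned ZilsdAlignBytes) {
  // Expand (split into two sw) unless the store is volatile or stores 0;
  // a store of 0 can be emitted from the x0 register, so it does not
  // constrain the register allocator.
  if (!(IsVolatile || StoresConstantZero))
    return false;
  // Misaligned stores are always expanded.
  return AlignBytes >= ZilsdAlignBytes;
}

int main() {
  // Non-volatile, non-zero: expanded, ZilsdOptimizer may re-pair it later.
  std::printf("%d\n", keepAsZilsdStore(false, false, 8, 8)); // prints 0
  // Volatile: must stay a single sd.
  std::printf("%d\n", keepAsZilsdStore(true, false, 8, 8));  // prints 1
  // Non-volatile store of 0: kept as sd.
  std::printf("%d\n", keepAsZilsdStore(false, true, 8, 8));  // prints 1
}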
@lenary (Member) commented Nov 25, 2025

Can we hold off on this? I'm not super happy with the idea of splitting up the i64 operations just to attempt to pair them later in the pipeline.

I don't think we've investigated other ways to avoid the additional moves.

I'll work on a patch today to increase the allocation priority of GPRPairs (and their equivalents).

I do see that we're always going to have a problem with i64s passed in the nth argument, where n is odd. There's likely no escaping that, unfortunately.

I still owe you numbers on how bad the additional moves are in real-world code; I don't think I'll get to that this week. I don't know whether you have numbers of your own on that.

@lenary (Member) commented Nov 25, 2025

Sorry, I just caught up with the other patch, which does have more data on the code size impact, showing that this should help with that.

I will drop my concerns for the moment, but I do hope we can revisit this if we find other ways to remove enough of the moves to actually help.

@lenary (Member) left a comment

LGTM

@topperc (Collaborator, Author) commented Nov 25, 2025

@christian-herber-nxp can you recheck the numbers now that I've kept stores of 0?

 ; CHECK-NEXT:    lui a1, 4
 ; CHECK-NEXT:    add a0, a0, a1
-; CHECK-NEXT:    ld a2, -384(a0)
+; CHECK-NEXT:    lw a2, -384(a0)
@topperc (Collaborator, Author) commented:

This is a regression: x10 is already allocated. We pick x11 for the odd register first, but it can't pair because its even partner, x10, is already used. So we pick x12 for the even register, but never revisit the odd choice.
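A toy walk-through of that ordering (purely illustrative; this is not how the greedy allocator is actually implemented):

#include <cstdio>

// Register numbers 10..13 stand for x10..x13 (a0..a3).
int main() {
  bool Used[32] = {};
  Used[10] = true; // x10 already holds the pointer argument.

  // The odd half is assigned first; x11 is the first free register.
  int Odd = 11;
  if (Used[Odd - 1]) {
    // Pairing x11 would need its even partner x10, which is taken, so the
    // even half falls back to the next free register, x12, and the earlier
    // choice of x11 is never revisited. x12/x11 is not a consecutive
    // even/odd pair, so no ld can be formed.
    int Even = 12;
    std::printf("odd half: x%d, even half: x%d -> unpaired\n", Odd, Even);
  }
}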

@lenary (Member) commented Nov 25, 2025

I had a thought at lunch, which might be helpful.

What if we codegen the non-volatile loads/stores to PseudoLD_RV32_OPT/PseudoSD_RV32_OPT? That keeps things together as long as possible, but also allows the compiler to split them later if needed. Would this make any sense?

lenary added a commit to lenary/llvm-project that referenced this pull request Nov 25, 2025
This is proposed as an alternative to llvm#169529. The idea here is during
selection, to choose between directly generating `LD`/`SD` or generating
`PseudoLD_RV32_OPT`/`PseudoSD_RV32_OPT` based on the volatility of the
access.

Volatile operations will always become `LD`/`SD`, but non-volatile
operations have a chance of becoming a pair of `LW`/`SW` depending on
the register allocation, which might save some `MV` instructions.

The advantage of this approach is that we don't need to go searching for
instructions to pair (including comparing their memory operands) in the
pre-ra pass, we already know these are paired, but they don't constrain
the register allocator, unlike `LD`/`SD`.

This PR is maybe not enough - we probably have to check that the passes
between ISel and the Pre-RA Load/Store Pairing pass cope with this
correctly.

This also fixes a verifier error with the kill flags.
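As a minimal sketch of that policy (assuming the pseudos behave as the commit message describes; this is not the code from the referenced commit):

#include <cstdio>

// Opcode names mirror the pseudos mentioned above; everything else here is
// a stand-in for the real SelectionDAG machinery.
enum class StoreOpc { SD, PseudoSD_RV32_OPT };

static StoreOpc selectZilsdStore(bool IsVolatile) {
  // A volatile access must stay a single sd. A non-volatile access becomes
  // the _OPT pseudo: the halves stay together through the pipeline, but a
  // later pass may split them back into sw/sw if the allocator could not
  // form a consecutive even/odd pair, saving mv instructions.
  return IsVolatile ? StoreOpc::SD : StoreOpc::PseudoSD_RV32_OPT;
}

int main() {
  std::printf("volatile -> %s\n", selectZilsdStore(true) == StoreOpc::SD
                                      ? "SD" : "PseudoSD_RV32_OPT");
  std::printf("non-volatile -> %s\n", selectZilsdStore(false) == StoreOpc::SD
                                          ? "SD" : "PseudoSD_RV32_OPT");
}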
Comment on lines +48 to +51
; CHECK-NEXT: sw a1, 0(a0)
; CHECK-NEXT: sw a2, 4(a0)
; CHECK-NEXT: sw a1, 88(a0)
; CHECK-NEXT: sw a2, 92(a0)

A Member commented:

Curious: is this also considered a regression in terms of latency, since it turns 2 stores into 4 stores? Or does it not matter, since the latency of sd is 2 * sw?

@christian-herber-nxp

> @christian-herber-nxp can you recheck the numbers now that I've kept stores of 0?

-Oz, -march=rv32imb_zce_zdinx_zilsd_zclsd

embench-iot 1.0:

[image: embench-iot 1.0 code size results]

eembc:

[image: EEMBC code size results]

It is mostly flat, with a big code size expansion on cubic (https://github.com/embench/embench-iot/tree/master/src/cubic).

I will also build #169580 to compare. I do not have a fast build machine, so this will take a couple of hours.

@christian-herber-nxp

embench-iot 1.0:

[image: embench-iot 1.0 comparison results]

The cubic benchmark seems to be exposing a weakness in both approaches. Might be worthwhile to dive deeper into that.

eembc:

[image: EEMBC comparison results]
