[RISCV][ISelLowering] Use Zicond for FP selects on Zfinx/Zdinx #169299
Conversation
@llvm/pr-subscribers-backend-risc-v

Author: None (fennecJ)

Changes

Summary

This patch lets RISCVTargetLowering::lowerSELECT lower some floating-point select operations through an integer Zicond select when:

In that scenario there is no extra cost for GPR <-> "FP GPR" moves, so we can implement FP selects with a CZERO-based sequence instead of a branch. For example, for

```c
float foo(int cond, float x) {
  return (cond != 0) ? x : 0.0f;
}
```

the current lowering produces:

```asm
foo:
  mv a2, a0
  li a0, 0
  beqz a2, .LBB0_2
.LBB0_1:
  mv a0, a1
.LBB0_2:
  ret
```

With this patch, when targeting rv64ima_zicond_zfinx we instead get:

```asm
foo:
  czero.nez a2, zero, a0
  czero.eqz a0, a1, a0
  or a0, a2, a0
  ret
```

The existing branch-based lowering is preserved for:

Testing

Adds llvm/test/CodeGen/RISCV/zicond-fp-select-zfinx.ll to cover:

Also adds baseline RV32F/RV64F/RV64D cases to ensure we still use branches when FP registers are separate. The tests check that:

Full diff: https://github.com/llvm/llvm-project/pull/169299.diff

2 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 19a16197272fe..e01d320e7e1ec 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -9555,6 +9555,32 @@ SDValue RISCVTargetLowering::lowerSELECT(SDValue Op, SelectionDAG &DAG) const {
if (SDValue V = lowerSelectToBinOp(Op.getNode(), DAG, Subtarget))
return V;
+ // When there is no cost for GPR <-> FPR moves (Zfinx/Zdinx), use an integer Zicond select for FP values when CondV has an integer type.
+ bool FPinGPR = Subtarget.hasStdExtZfinx() || Subtarget.hasStdExtZdinx();
+ bool UseZicondForFPSel = Subtarget.hasStdExtZicond() && FPinGPR && VT.isFloatingPoint() && CondV.getValueType().isInteger();
+ if (UseZicondForFPSel) {
+ MVT XLenIntVT = Subtarget.getXLenVT();
+
+ auto CastToInt = [&](SDValue V) -> SDValue {
+ if (VT == MVT::f32 && Subtarget.is64Bit()) {
+ return DAG.getNode(RISCVISD::FMV_X_ANYEXTW_RV64, DL, XLenIntVT, V);
+ }
+ return DAG.getBitcast(XLenIntVT, V);
+ };
+
+ SDValue TrueVInt = CastToInt(TrueV);
+ SDValue FalseVInt = CastToInt(FalseV);
+
+ // Emit integer SELECT (lowers to Zicond)
+ SDValue ResultInt = DAG.getNode(ISD::SELECT, DL, XLenIntVT, CondV, TrueVInt, FalseVInt);
+
+ // Convert back to floating VT
+ if (VT == MVT::f32 && Subtarget.is64Bit()) {
+ return DAG.getNode(RISCVISD::FMV_W_X_RV64, DL, VT, ResultInt);
+ }
+ return DAG.getBitcast(VT, ResultInt);
+ }
+
// When Zicond or XVentanaCondOps is present, emit CZERO_EQZ and CZERO_NEZ
// nodes to implement the SELECT. Performing the lowering here allows for
// greater control over when CZERO_{EQZ/NEZ} are used vs another branchless
diff --git a/llvm/test/CodeGen/RISCV/zicond-fp-select-zfinx.ll b/llvm/test/CodeGen/RISCV/zicond-fp-select-zfinx.ll
new file mode 100644
index 0000000000000..ddce3752584c4
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/zicond-fp-select-zfinx.ll
@@ -0,0 +1,115 @@
+; RUN: llc -mtriple=riscv64 -mattr=+zfinx,+zicond -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV64ZFINX_ZICOND
+; RUN: llc -mtriple=riscv64 -mattr=+zfinx -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV64ZFINX_NOZICOND
+; RUN: llc -mtriple=riscv64 -mattr=+zdinx,+zicond -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV64ZDINX_ZICOND
+; RUN: llc -mtriple=riscv64 -mattr=+zdinx -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV64ZDINX_NOZICOND
+; RUN: llc -mtriple=riscv64 -mattr=+f -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV64F
+; RUN: llc -mtriple=riscv64 -mattr=+d -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV64D
+
+; RUN: llc -mtriple=riscv32 -mattr=+zfinx,+zicond -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV32ZFINX_ZICOND
+; RUN: llc -mtriple=riscv32 -mattr=+zfinx -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV32ZFINX_NOZICOND
+; RUN: llc -mtriple=riscv32 -mattr=+f -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV32F
+
+
+; This test checks that floating-point SELECT is lowered through integer
+; SELECT (and thus to Zicond czero.* sequence) when FP values live in GPRs
+; (Zfinx/Zdinx) and Zicond is enabled. When Zicond is disabled, we expect
+; a branch-based lowering instead.
+
+; -----------------------------------------------------------------------------
+; float select with i1 condition (Zfinx)
+; -----------------------------------------------------------------------------
+
+define float @select_f32_i1(i1 %cond, float %t, float %f) nounwind {
+; RV64ZFINX_ZICOND-LABEL: select_f32_i1:
+; RV64ZFINX_ZICOND: czero
+; RV64ZFINX_ZICOND: czero
+; RV64ZFINX_ZICOND: or
+; RV64ZFINX_ZICOND-NOT: b{{(eq|ne)z?}}
+; RV64ZFINX_ZICOND: ret
+
+; RV64ZFINX_NOZICOND-LABEL: select_f32_i1:
+; RV64ZFINX_NOZICOND: b{{(eq|ne)z?}}
+; RV64ZFINX_NOZICOND-NOT: czero.eqz
+; RV64ZFINX_NOZICOND-NOT: czero.nez
+
+; RV64F-LABEL: select_f32_i1:
+; RV64F: b{{(eq|ne)z?}}
+; RV64F-NOT: czero.eqz
+; RV64F-NOT: czero.nez
+
+; RV32ZFINX_ZICOND-LABEL: select_f32_i1:
+; RV32ZFINX_ZICOND: czero
+; RV32ZFINX_ZICOND: czero
+; RV32ZFINX_ZICOND: or
+; RV32ZFINX_ZICOND-NOT: b{{(eq|ne)z?}}
+; RV32ZFINX_ZICOND: ret
+
+; RV32ZFINX_NOZICOND-LABEL: select_f32_i1:
+; RV32ZFINX_NOZICOND: b{{(eq|ne)z?}}
+; RV32ZFINX_NOZICOND-NOT: czero.eqz
+; RV32ZFINX_NOZICOND-NOT: czero.nez
+
+; RV32F-LABEL: select_f32_i1:
+; RV32F: b{{(eq|ne)z?}}
+; RV32F-NOT: czero.eqz
+; RV32F-NOT: czero.nez
+
+entry:
+ %sel = select i1 %cond, float %t, float %f
+ ret float %sel
+}
+
+; -----------------------------------------------------------------------------
+; double select with i1 condition (Zdinx)
+; -----------------------------------------------------------------------------
+
+define double @select_f64_i1(i1 %cond, double %t, double %f) nounwind {
+; RV64ZDINX_ZICOND-LABEL: select_f64_i1:
+; RV64ZDINX_ZICOND: czero
+; RV64ZDINX_ZICOND: czero
+; RV64ZDINX_ZICOND: or
+; RV64ZDINX_ZICOND-NOT: b{{(eq|ne)z?}}
+; RV64ZDINX_ZICOND: ret
+
+; RV64ZDINX_NOZICOND-LABEL: select_f64_i1:
+; RV64ZDINX_NOZICOND: b{{(eq|ne)z?}}
+; RV64ZDINX_NOZICOND-NOT: czero.eqz
+; RV64ZDINX_NOZICOND-NOT: czero.nez
+
+; RV64D-LABEL: select_f64_i1:
+; RV64D: b{{(eq|ne)z?}}
+; RV64D-NOT: czero.eqz
+; RV64D-NOT: czero.nez
+
+entry:
+ %sel = select i1 %cond, double %t, double %f
+ ret double %sel
+}
+
+; -----------------------------------------------------------------------------
+; double select with floating-point compare condition (a > b ? c : d), Zdinx
+; -----------------------------------------------------------------------------
+
+define double @select_f64_fcmp(double %a, double %b, double %c, double %d) nounwind {
+; RV64ZDINX_ZICOND-LABEL: select_f64_fcmp:
+; RV64ZDINX_ZICOND: czero
+; RV64ZDINX_ZICOND: czero
+; RV64ZDINX_ZICOND: or
+; RV64ZDINX_ZICOND-NOT: b{{(eq|ne)z?}}
+; RV64ZDINX_ZICOND: ret
+
+; RV64ZDINX_NOZICOND-LABEL: select_f64_fcmp:
+; RV64ZDINX_NOZICOND: b{{(eq|ne)z?}}
+; RV64ZDINX_NOZICOND-NOT: czero.eqz
+; RV64ZDINX_NOZICOND-NOT: czero.nez
+
+; RV64D-LABEL: select_f64_fcmp:
+; RV64D: b{{(eq|ne)z?}}
+; RV64D-NOT: czero.eqz
+; RV64D-NOT: czero.nez
+
+entry:
+ %cmp = fcmp ogt double %a, %b
+ %sel = select i1 %cmp, double %c, double %d
+ ret double %sel
+}
\ No newline at end of file
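For readers unfamiliar with Zicond, the branchless sequence the patch emits can be sanity-checked with a small C model. The helper names below are hypothetical; per the Zicond extension, `czero.eqz rd, rs1, rs2` writes 0 when rs2 == 0 and rs1 otherwise, and `czero.nez` is the complement:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the two Zicond conditional-zero instructions. */
static uint32_t czero_eqz(uint32_t rs1, uint32_t rs2) { return rs2 == 0 ? 0 : rs1; }
static uint32_t czero_nez(uint32_t rs1, uint32_t rs2) { return rs2 != 0 ? 0 : rs1; }

/* cond ? t : f, with the floats handled as raw bits in "GPRs" (Zfinx view).
 * Mirrors: czero.nez tmp, f_bits, cond ; czero.eqz t_bits, t_bits, cond ; or */
static float fp_select(uint32_t cond, float t, float f) {
    uint32_t ti, fi, ri;
    float r;
    memcpy(&ti, &t, sizeof ti); /* bitcast, not a numeric conversion */
    memcpy(&fi, &f, sizeof fi);
    ri = czero_nez(fi, cond) | czero_eqz(ti, cond);
    memcpy(&r, &ri, sizeof r);
    return r;
}
```

Exactly one of the two czero results is nonzero for a given condition, so the final or merges them without any control flow; the memcpy calls model the Zfinx view of a float as raw GPR bits.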
✅ With the latest revision this PR passed the C/C++ code formatter.
Force-pushed 865ee32 to fe46a46.
Hi, this PR implements an optimization for Zicond combined with Zfinx/Zdinx. Could you help review this? Thanks in advance! cc @topperc
> @@ -0,0 +1,159 @@
> ; RUN: llc -mtriple=riscv64 -mattr=+zfinx,+zicond -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV64ZFINX_ZICOND
Please use the update_llc_test_checks.py script. https://llvm.org/docs/TestingGuide.html#generating-assertions-in-regression-tests
I dropped some RUN variants so the autogenerated checks stay readable, while still covering the key Zfinx/Zdinx/Zfinx + Zicond cases.
Force-pushed 5f91d26 to bf53e56.
For the RV32ZFINX_ZICOND RUN of select_f64_fcmp, I noticed several extra lw/sw instructions appear, so this transform might not actually be a win there. I'm still looking into how to handle that configuration better.

On second thought, this happens because the target extensions don't provide real double support, so double compare is lowered via soft-float/libcalls. I also tried an F-only configuration and still see the same lw/sw spills, so this codegen is expected in this setup.
lenary left a comment:
Please can you add an rv32zdinx RUN line? It would be useful to see if/how this works for 64-bit values with that combination.
Thanks for the suggestion! It was actually very helpful because adding the rv32zdinx RUN line revealed a crash when handling f64 values on RV32. I've updated the implementation to handle f64 on 32-bit targets by splitting it into two 32-bit integer selects. The RUN line has been added and verifies the fix.

However, I have a concern regarding the performance cost for this specific case: while the static code size difference is negligible (7 vs 8 instructions), the dynamic instruction count is a regression.

```llvm
define double @select_f64_fcmp(double %a, double %b, double %c, double %d) nounwind {
entry:
  %cmp = fcmp ogt double %a, %b
  %sel = select i1 %cmp, double %c, double %d
  ret double %sel
}
```

Branch version: executes 5 or 7 instructions.

```asm
; RV32ZDINX_NOZICOND-LABEL: select_f64_fcmp:
; RV32ZDINX_NOZICOND: # %bb.0: # %entry
; RV32ZDINX_NOZICOND-NEXT: flt.d a0, a2, a0
; RV32ZDINX_NOZICOND-NEXT: bnez a0, .LBB2_2
; RV32ZDINX_NOZICOND-NEXT: # %bb.1: # %entry
; RV32ZDINX_NOZICOND-NEXT: mv a4, a6
; RV32ZDINX_NOZICOND-NEXT: mv a5, a7
; RV32ZDINX_NOZICOND-NEXT: .LBB2_2: # %entry
; RV32ZDINX_NOZICOND-NEXT: mv a0, a4
; RV32ZDINX_NOZICOND-NEXT: mv a1, a5
; RV32ZDINX_NOZICOND-NEXT: ret
```

Zicond version: always executes 8 instructions.

```asm
; RV32ZDINX_ZICOND-LABEL: select_f64_fcmp:
; RV32ZDINX_ZICOND: # %bb.0: # %entry
; RV32ZDINX_ZICOND-NEXT: flt.d a0, a2, a0
; RV32ZDINX_ZICOND-NEXT: czero.nez a1, a6, a0
; RV32ZDINX_ZICOND-NEXT: czero.eqz a2, a4, a0
; RV32ZDINX_ZICOND-NEXT: czero.nez a3, a7, a0
; RV32ZDINX_ZICOND-NEXT: czero.eqz a4, a5, a0
; RV32ZDINX_ZICOND-NEXT: or a0, a2, a1
; RV32ZDINX_ZICOND-NEXT: or a1, a4, a3
; RV32ZDINX_ZICOND-NEXT: ret
```

Should we still prefer the Zicond transformation here, or should I disable it for f64 on RV32?
Your concerns are right, and that's partly what I wanted to see with the testing. I'm glad you also caught the crash :)

The branch version can also make further savings in longer sequences, where the final pair of […]

I think you should probably skip this optimisation for rv32_zdinx.

Aside: One thing we're pretty useless at with rv32_zdinx is using […]
I have updated the selection logic to skip cases requiring register splitting (e.g., f64 on RV32). Consequently, the output for select_f64_fcmp

```llvm
define double @select_f64_fcmp(double %a, double %b, double %c, double %d) nounwind {
entry:
  %cmp = fcmp ogt double %a, %b
  %sel = select i1 %cmp, double %c, double %d
  ret double %sel
}
```

is now identical for both RUN lines:

```asm
; RV32ZDINX_ZICOND-LABEL: select_f64_fcmp:
; RV32ZDINX_ZICOND: # %bb.0: # %entry
; RV32ZDINX_ZICOND-NEXT: flt.d a0, a2, a0
; RV32ZDINX_ZICOND-NEXT: bnez a0, .LBB2_2
; RV32ZDINX_ZICOND-NEXT: # %bb.1: # %entry
; RV32ZDINX_ZICOND-NEXT: mv a4, a6
; RV32ZDINX_ZICOND-NEXT: mv a5, a7
; RV32ZDINX_ZICOND-NEXT: .LBB2_2: # %entry
; RV32ZDINX_ZICOND-NEXT: mv a0, a4
; RV32ZDINX_ZICOND-NEXT: mv a1, a5
; RV32ZDINX_ZICOND-NEXT: ret
;
; RV32ZDINX_NOZICOND-LABEL: select_f64_fcmp:
; RV32ZDINX_NOZICOND: # %bb.0: # %entry
; RV32ZDINX_NOZICOND-NEXT: flt.d a0, a2, a0
; RV32ZDINX_NOZICOND-NEXT: bnez a0, .LBB2_2
; RV32ZDINX_NOZICOND-NEXT: # %bb.1: # %entry
; RV32ZDINX_NOZICOND-NEXT: mv a4, a6
; RV32ZDINX_NOZICOND-NEXT: mv a5, a7
; RV32ZDINX_NOZICOND-NEXT: .LBB2_2: # %entry
; RV32ZDINX_NOZICOND-NEXT: mv a0, a4
; RV32ZDINX_NOZICOND-NEXT: mv a1, a5
; RV32ZDINX_NOZICOND-NEXT: ret
```

Note: For […]
I noticed that when lowering a floating-point SELECT to integer Zicond operations on RV64, converting an f32 +0.0 results in a RISCVISD::FMV_X_ANYEXTW_RV64 node, which obscures the underlying constant zero from the instruction selector.

For the following example:

```llvm
define dso_local noundef float @select_i1_f32_0(i1 %cond, float %t) nounwind {
entry:
  %sel = select i1 %cond, float %t, float 0.000000e+00
  ret float %sel
}
```

On RV32 (e.g., RV32ZFINX_ZICOND, no need to call FMV_X), this previously resulted in:

```asm
czero.eqz a0, a1, a0
ret
```

However, on RV64 we got:

```asm
czero.nez a2, zero, a0
czero.eqz a0, a1, a0
or a0, a2, a0
ret
```

Since the "else" value is zero, we can utilize the mechanism that czero-like instructions store zero into rd based on the condition register, and optimize this scenario into a single instruction:

```asm
czero.eqz a0, a1, a0
ret
```
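To make the observation above concrete: the bit pattern of +0.0f is all zeros, which is exactly what czero.eqz already produces when the condition is false, so the czero.nez/or pair is redundant. A minimal C sketch (czero_eqz is a hypothetical model of the instruction):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of czero.eqz: rd = (rs2 == 0) ? 0 : rs1. */
static uint32_t czero_eqz(uint32_t rs1, uint32_t rs2) { return rs2 == 0 ? 0 : rs1; }

/* cond ? t : +0.0f -- the "else" bits are all zero, so one czero.eqz suffices. */
static float select_or_zero(uint32_t cond, float t) {
    uint32_t ti, ri;
    float r;
    memcpy(&ti, &t, sizeof ti);
    ri = czero_eqz(ti, cond); /* zero bits reinterpret as +0.0f when cond is 0 */
    memcpy(&r, &ri, sizeof r);
    return r;
}
```

Note this only works for +0.0; -0.0 has the sign bit set, so its bit pattern is not zero and the general three-instruction sequence is still needed.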
Force-pushed b1cad4d to 5778116.

I think this is probably because SDAG sees this whole thing as integer, and so splits the selects anyway. If we were to address this, it's probably better in a follow-up.
> ```cpp
> if (auto *CFP = dyn_cast<ConstantFPSDNode>(V)) {
>   if (CFP->isZero() && !CFP->isNegative())
>     return DAG.getConstant(0, DL, XLenIntVT);
> }
> ```

You could just use `isNullFPConstant`:

```cpp
if (isNullFPConstant(V))
  return DAG.getConstant(0, DL, XLenIntVT);
```
Though I do think we might need to be careful that XLenIntVT is the same width as VT
Thanks for the review! I will update the code to use isNullFPConstant(V).

Regarding the caution about XLenIntVT vs VT width: one of my concerns was whether this would break the NaN-boxing semantics on RV64 with f16, where the type width < XLen.

To verify this, I added the select_i1_half_0 test case. The results confirm that the backend still generates the correct NaN-boxing sequence. (The test is appended in the newest commit.)

```llvm
define dso_local noundef half @select_i1_half_0(i1 %cond, half %val) nounwind {
entry:
  %sel = select i1 %cond, half %val, half 0xH0000
  ret half %sel
}
```

```asm
; RV64ZDINX_ZICOND:
select_i1_half_0:
# %bb.0: # %entry
  # kill: def $x11_w killed $x11_w def $x11
  andi a0, a0, 1
  czero.eqz a0, a1, a0
  lui a1, 1048560
  or a0, a0, a1
  # kill: def $x10_w killed $x10_w killed $x10
  ret
```

This is guaranteed by RISCVTargetLowering::splitValueIntoRegisterParts, which explicitly applies the boxing mask (0xFFFF0000) when the type is restored and lowered as an ABI copy:

```cpp
if (IsABIRegCopy && (ValueVT == MVT::f16 || ValueVT == MVT::bf16) &&
    PartVT == MVT::f32) {
  // Cast the [b]f16 to i16, extend to i32, pad with ones to make a float
  // nan, and cast to f32.
  Val = DAG.getNode(ISD::BITCAST, DL, MVT::i16, Val);
  Val = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i32, Val);
  Val = DAG.getNode(ISD::OR, DL, MVT::i32, Val,
                    DAG.getConstant(0xFFFF0000, DL, MVT::i32));
  Val = DAG.getNode(ISD::BITCAST, DL, PartVT, Val);
  Parts[0] = Val;
  return true;
}
```

I am currently researching if there are any other possible issues regarding the width mismatch.
Any advice is welcome : )
What happens for the following with zhinx:

```llvm
define half @select_i1_half_0_add(i1 %cond, half %val) nounwind {
entry:
  %sel = select i1 %cond, half %val, half 0xH0000
  %add = fadd half %sel, 1.0
  ret half %add
}
```

Is the operand of the fadd.h correctly NaN-boxed? (This wouldn't be dealt with in splitValueIntoRegisterParts because it's in the same basic block as the select.)

You potentially have the same problem with fadd.s on rv64, for which I think you can make/adapt a testcase based on the above.

It may be simplest to push the isNullFPConstant check down to just above the getBitcast if these two examples are not correctly NaN-boxed.
I tried your select_i1_half_0_add case, and the result is correctly NaN-boxed at the return. I'll append the test result to the newest commit.

```asm
select_i1_half_0_add:
# %bb.0: # %entry
  addi sp, sp, -16
  sd ra, 8(sp) # 8-byte Folded Spill
  # kill: def $x11_w killed $x11_w def $x11
  andi a0, a0, 1
  czero.eqz a0, a1, a0
  # kill: def $x10_w killed $x10_w killed $x10
  call __extendhfsf2
  lui a1, 260096
  fadd.s a0, a0, a1
  call __truncsfhf2
  # kill: def $x10_w killed $x10_w def $x10
  lui a1, 1048560 # a1 = 0xFFFF0000
  or a0, a0, a1 # nan-boxing !!
  # kill: def $x10_w killed $x10_w killed $x10
  ld ra, 8(sp) # 8-byte Folded Reload
  addi sp, sp, 16
  ret
```

Regarding your question about whether this optimization results in incorrect NaN-boxing: actually, I initially shared the same assumption that explicit NaN-boxing was required here. However, upon double-checking the Zfinx/Zdinx/Zhinx specification [1], I discovered that NaN-boxing is not required for operands in these extensions. Instead, they rely on sign-extension.

According to Section 1 (Processing of Narrower Values):

"Floating-point operands of width w < XLEN bits occupy bits w-1:0 of an x register. Floating-point operations on w-bit operands ignore operand bits XLEN-1:w."

"Floating-point operations that produce w < XLEN-bit results fill bits XLEN-1:w with copies of bit w-1 (the sign bit)."

Since my optimization targets +0.0 (sign bit is 0), the resulting all-zero integer effectively implements the required sign-extension. It is safe for input operands (as high bits are ignored) and valid as a result.

As for the ABI [2], the clause regarding NaN-boxing specifies:

"When a floating-point argument narrower than FLEN bits is passed in a floating-point register, it is 1-extended (NaN-boxed) to FLEN bits."

Since Zfinx has no floating-point registers (it uses x registers), and the Unpriv spec mandates sign-extension for x registers, providing a zero-extended 0 is safe. Even if specific calling conventions require boxing at boundaries, the backend's splitValueIntoRegisterParts handles that explicitly.

Therefore, I believe keeping the +0.0 optimization at the current location is safe and yields the best code generation (single czero).

[1] RISC-V unprivileged spec, Zfinx chapter
[2] riscv-elf-psabi-doc
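The two conventions contrasted above can be made concrete with a small bit-level sketch (helper names are hypothetical): NaN-boxing fills the upper bits with ones, while the Zfinx x-register convention sign-extends the w-bit value, so an all-zero word is already a valid +0.0 half operand in a GPR.

```c
#include <assert.h>
#include <stdint.h>

/* NaN-box a 16-bit half into a 32-bit FPR image: upper bits all ones. */
static uint32_t nanbox_h(uint16_t h) { return 0xFFFF0000u | h; }

/* Zfinx convention: sign-extend the w-bit value into the x register. */
static uint32_t sext_h(uint16_t h) { return (uint32_t)(int32_t)(int16_t)h; }
```

Under the Zfinx rule, sext_h(0x0000) for +0.0h is the all-zero word that a single czero.eqz produces, which is why the constant-zero optimization needs no extra boxing step.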
Ok, fantastic, I missed that note in the spec, and figured it used the regular NaN-boxing scheme for F/D/H.
Ok, cool, I'm happy with this
When Zfinx or Zdinx is enabled, FP values live in the integer register
file, so there is no GPR<->FPR move cost. In this configuration we can
profitably lower floating-point `select` nodes through an integer
`ISD::SELECT` so that the existing Zicond patterns generate branchless
czero/cmov sequences instead of control-flow branches.
This covers patterns such as:

```c
// Case 1: integer condition
float sel(int cond, float a, float b) {
  return cond ? a : b;
}

// Case 2: floating-point condition
float cmp_sel(float a, float b, float c, float d) {
  return (a > b) ? c : d;
}
```
In Case 1 we bitcast the FP operands to an XLen integer type, form an
integer `select` with the original integer condition `cond`, and then
bitcast the result back to float, allowing it to be lowered to a
Zicond-based cmov/czero sequence.
In Case 2 the floating-point compare is already lowered to an integer
0/1 value in a GPR (e.g. `flt.s` / `fle.s` / `feq.s`), so we can reuse
that integer result as the Zicond condition and apply the same scheme.
The transformation is gated on both Zicond and Zfinx/Zdinx, so classic
F/D implementations (where FP values live in a separate FP register
file) continue to use the existing branch-based lowering and do not pay
for extra GPR<->FPR moves. FP semantics are unchanged: we only
reinterpret the FP operands as integers; no numeric conversions are
introduced.
* Floating-point `select` is lowered through Zicond only when both Zicond and Zfinx/Zdinx extensions are enabled.
* When both Zicond and Zfinx/Zdinx are enabled, branch-based lowering is replaced by a Zicond-backed integer `select` (no control-flow branches are emitted for these cases).
Zdinx implies Zfinx so it's sufficient to check only Zfinx here.
* CondV should always be an integer type, so there is no need to check.
* Omit the if-statement braces when the body is a single line.
When the target machine is 32-bit, we need to split the 64-bit f64 VT into two 32-bit registers for selection.
This patch disables the Zicond optimization for floating-point selects
when the value type exceeds XLen (e.g., f64 on RV32 +zdinx).
In these cases, using Zicond requires handling split registers, which
results in a higher dynamic instruction count compared to the standard
branch-based lowering (e.g., ~8 instructions vs ~5-7 instructions for
appended sample code).
Thus, there is no benefit to using Zicond here.
```llvm
define double @select_f64_fcmp(double %a, double %b,
double %c, double %d) nounwind {
entry:
%cmp = fcmp ogt double %a, %b
%sel = select i1 %cmp, double %c, double %d
ret double %sel
}
```
Branch version: executes 5 or 7 instructions.
```asm
; RV32ZDINX_NOZICOND-LABEL: select_f64_fcmp:
; RV32ZDINX_NOZICOND: # %bb.0: # %entry
; RV32ZDINX_NOZICOND-NEXT: flt.d a0, a2, a0
; RV32ZDINX_NOZICOND-NEXT: bnez a0, .LBB2_2
; RV32ZDINX_NOZICOND-NEXT: # %bb.1: # %entry
; RV32ZDINX_NOZICOND-NEXT: mv a4, a6
; RV32ZDINX_NOZICOND-NEXT: mv a5, a7
; RV32ZDINX_NOZICOND-NEXT: .LBB2_2: # %entry
; RV32ZDINX_NOZICOND-NEXT: mv a0, a4
; RV32ZDINX_NOZICOND-NEXT: mv a1, a5
; RV32ZDINX_NOZICOND-NEXT: ret
```
Zicond version: Always executes 8 instructions.
```asm
; RV32ZDINX_ZICOND-LABEL: select_f64_fcmp:
; RV32ZDINX_ZICOND: # %bb.0: # %entry
; RV32ZDINX_ZICOND-NEXT: flt.d a0, a2, a0
; RV32ZDINX_ZICOND-NEXT: czero.nez a1, a6, a0
; RV32ZDINX_ZICOND-NEXT: czero.eqz a2, a4, a0
; RV32ZDINX_ZICOND-NEXT: czero.nez a3, a7, a0
; RV32ZDINX_ZICOND-NEXT: czero.eqz a4, a5, a0
; RV32ZDINX_ZICOND-NEXT: or a0, a2, a1
; RV32ZDINX_ZICOND-NEXT: or a1, a4, a3
; RV32ZDINX_ZICOND-NEXT: ret
```
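For reference, the rejected split lowering can be modeled in C (helper names hypothetical): each 32-bit half needs its own czero pair plus an or, a fixed cost of eight operations after the compare, regardless of which way the condition goes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of the Zicond conditional-zero instructions. */
static uint32_t czero_eqz(uint32_t rs1, uint32_t rs2) { return rs2 == 0 ? 0 : rs1; }
static uint32_t czero_nez(uint32_t rs1, uint32_t rs2) { return rs2 != 0 ? 0 : rs1; }

/* cond ? t : f for a 64-bit value held in two 32-bit "registers" (RV32). */
static uint64_t select64_split(uint32_t cond, uint64_t t, uint64_t f) {
    uint32_t tlo = (uint32_t)t, thi = (uint32_t)(t >> 32);
    uint32_t flo = (uint32_t)f, fhi = (uint32_t)(f >> 32);
    uint32_t rlo = czero_eqz(tlo, cond) | czero_nez(flo, cond); /* low pair + or */
    uint32_t rhi = czero_eqz(thi, cond) | czero_nez(fhi, cond); /* high pair + or */
    return ((uint64_t)rhi << 32) | rlo;
}
```

The branch version, by contrast, falls through in the taken case and only pays for the two mv instructions when the condition is false, which is why the patch skips this configuration.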
When lowering a floating-point SELECT to integer Zicond operations on
RV64, converting an f32 +0.0 results in a RISCVISD::FMV_X_ANYEXTW_RV64
node. This target-specific node implies a register move, which obscures
the underlying constant zero value from the instruction selector,
preventing the backend from pattern-matching a single `czero` instruction.
For the following example:
```llvm
define dso_local noundef float @select_i1_f32_0(i1 %cond, float %t) nounwind {
entry:
%sel = select i1 %cond, float %t, float 0.000000e+00
ret float %sel
}
```
On RV64 (e.g., +zicond +zdinx), this previously resulted in:
```asm
czero.nez a2, zero, a0
czero.eqz a0, a1, a0
or a0, a2, a0
ret
```
Since the "else" value is zero, we can exploit the fact that czero-style
instructions write zero into rd based on the condition register, and
optimize this scenario into a single instruction.
By explicitly detecting `+0.0` and lowering it to a constant integer 0,
this commit enables the generation of:
```asm
czero.eqz a0, a1, a0
ret
```
It seems like the CI failure is caused by some newer commits in the main branch. I will rebase and update the test checks later.
Force-pushed efca401 to ef97fca.
lenary left a comment:
LGTM
We can reuse XLenVT
tclin914 left a comment:
LGTM
Thanks everyone for the review! I really learned a lot from this process. :)