
@fennecJ
Contributor

@fennecJ fennecJ commented Nov 24, 2025

Summary

This patch lets RISCVTargetLowering::lowerSELECT lower some floating-point select operations through an integer Zicond select when:

  • Zicond is available, and
  • FP values live in GPRs (Zfinx/Zdinx), and
  • the select condition is an integer type.

In that scenario there is no extra cost for GPR <-> "FP GPR" moves, so we can implement FP selects with a CZERO-based sequence instead of a branch.

For example, for

float foo(int cond, float x) {
    return (cond != 0) ? x : 0.0f;
}

the current lowering produces:

foo:
  mv    a2, a0
  li    a0, 0
  beqz  a2, .LBB0_2
.LBB0_1:
  mv    a0, a1
.LBB0_2:
  ret

With this patch, when targeting rv64ima_zicond_zfinx we instead get:

foo:
  czero.nez  a2, zero, a0
  czero.eqz  a0, a1, a0
  or         a0, a2, a0
  ret
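The czero.* pair above implements a branchless select. As a minimal C++ sketch of the identity (the helper names are illustrative, not LLVM code):

```cpp
#include <cstdint>

// Models of the Zicond instructions (names are illustrative):
// czero.eqz rd, rs1, rs2  =>  rd = (rs2 == 0) ? 0 : rs1
inline uint64_t czero_eqz(uint64_t rs1, uint64_t rs2) { return rs2 == 0 ? 0 : rs1; }
// czero.nez rd, rs1, rs2  =>  rd = (rs2 != 0) ? 0 : rs1
inline uint64_t czero_nez(uint64_t rs1, uint64_t rs2) { return rs2 != 0 ? 0 : rs1; }

// cond ? t : f with no branch: exactly the czero/czero/or sequence above.
inline uint64_t zicond_select(uint64_t cond, uint64_t t, uint64_t f) {
  return czero_eqz(t, cond) | czero_nez(f, cond);
}
```

In the foo example, f is the bitcast of 0.0f (integer 0), which is why the czero.nez leg can read the zero register.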

The existing branch-based lowering is preserved for:

  • targets without Zicond
  • targets where FP registers are separate (+f, +d without zfinx/zdinx)

Testing

Adds llvm/test/CodeGen/RISCV/zicond-fp-select-zfinx.ll to cover:

  • RV64 Zfinx/Zicond vs Zfinx without Zicond
  • RV64 Zdinx/Zicond vs Zdinx without Zicond
  • RV32 Zfinx/Zicond vs Zfinx without Zicond

Also adds baseline RV32F/RV64F/RV64D cases to ensure we still use branches when FP registers are separate.

The tests check that:

  • With Zicond + Zfinx/Zdinx, FP select lowers to a CZERO+OR sequence with no conditional branches.
  • Without Zicond (or without Zfinx/Zdinx), we still get branch-based code and no czero.* instructions.

@llvmbot
Member

llvmbot commented Nov 24, 2025

@llvm/pr-subscribers-backend-risc-v

Author: None (fennecJ)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/169299.diff

2 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+26)
  • (added) llvm/test/CodeGen/RISCV/zicond-fp-select-zfinx.ll (+115)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 19a16197272fe..e01d320e7e1ec 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -9555,6 +9555,32 @@ SDValue RISCVTargetLowering::lowerSELECT(SDValue Op, SelectionDAG &DAG) const {
   if (SDValue V = lowerSelectToBinOp(Op.getNode(), DAG, Subtarget))
     return V;
 
+  // With Zfinx/Zdinx there is no GPR <-> FPR move cost, so we can use a Zicond select for FP values when CondV is an integer type.
+  bool FPinGPR = Subtarget.hasStdExtZfinx() || Subtarget.hasStdExtZdinx();
+  bool UseZicondForFPSel = Subtarget.hasStdExtZicond() && FPinGPR && VT.isFloatingPoint() && CondV.getValueType().isInteger();
+  if (UseZicondForFPSel) {
+    MVT XLenIntVT = Subtarget.getXLenVT();
+
+    auto CastToInt = [&](SDValue V) -> SDValue {
+      if (VT == MVT::f32 && Subtarget.is64Bit()) {
+        return DAG.getNode(RISCVISD::FMV_X_ANYEXTW_RV64, DL, XLenIntVT, V);
+      }
+      return DAG.getBitcast(XLenIntVT, V);
+    };
+
+    SDValue TrueVInt = CastToInt(TrueV);
+    SDValue FalseVInt = CastToInt(FalseV);
+
+    // Emit integer SELECT (lowers to Zicond)
+    SDValue ResultInt = DAG.getNode(ISD::SELECT, DL, XLenIntVT, CondV, TrueVInt, FalseVInt);
+
+    // Convert back to floating VT
+    if (VT == MVT::f32 && Subtarget.is64Bit()) {
+      return DAG.getNode(RISCVISD::FMV_W_X_RV64, DL, VT, ResultInt);
+    }
+    return DAG.getBitcast(VT, ResultInt);
+  }
+
   // When Zicond or XVentanaCondOps is present, emit CZERO_EQZ and CZERO_NEZ
   // nodes to implement the SELECT. Performing the lowering here allows for
   // greater control over when CZERO_{EQZ/NEZ} are used vs another branchless
diff --git a/llvm/test/CodeGen/RISCV/zicond-fp-select-zfinx.ll b/llvm/test/CodeGen/RISCV/zicond-fp-select-zfinx.ll
new file mode 100644
index 0000000000000..ddce3752584c4
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/zicond-fp-select-zfinx.ll
@@ -0,0 +1,115 @@
+; RUN: llc -mtriple=riscv64 -mattr=+zfinx,+zicond -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV64ZFINX_ZICOND
+; RUN: llc -mtriple=riscv64 -mattr=+zfinx           -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV64ZFINX_NOZICOND
+; RUN: llc -mtriple=riscv64 -mattr=+zdinx,+zicond -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV64ZDINX_ZICOND
+; RUN: llc -mtriple=riscv64 -mattr=+zdinx           -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV64ZDINX_NOZICOND
+; RUN: llc -mtriple=riscv64 -mattr=+f -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV64F
+; RUN: llc -mtriple=riscv64 -mattr=+d -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV64D
+
+; RUN: llc -mtriple=riscv32 -mattr=+zfinx,+zicond -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV32ZFINX_ZICOND
+; RUN: llc -mtriple=riscv32 -mattr=+zfinx           -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV32ZFINX_NOZICOND
+; RUN: llc -mtriple=riscv32 -mattr=+f -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV32F
+
+
+; This test checks that floating-point SELECT is lowered through integer
+; SELECT (and thus to Zicond czero.* sequence) when FP values live in GPRs
+; (Zfinx/Zdinx) and Zicond is enabled. When Zicond is disabled, we expect
+; a branch-based lowering instead.
+
+; -----------------------------------------------------------------------------
+; float select with i1 condition (Zfinx)
+; -----------------------------------------------------------------------------
+
+define float @select_f32_i1(i1 %cond, float %t, float %f) nounwind {
+; RV64ZFINX_ZICOND-LABEL: select_f32_i1:
+; RV64ZFINX_ZICOND: czero
+; RV64ZFINX_ZICOND: czero
+; RV64ZFINX_ZICOND: or
+; RV64ZFINX_ZICOND-NOT: b{{(eq|ne)z?}}
+; RV64ZFINX_ZICOND: ret
+
+; RV64ZFINX_NOZICOND-LABEL: select_f32_i1:
+; RV64ZFINX_NOZICOND: b{{(eq|ne)z?}}
+; RV64ZFINX_NOZICOND-NOT: czero.eqz
+; RV64ZFINX_NOZICOND-NOT: czero.nez
+
+; RV64F-LABEL: select_f32_i1:
+; RV64F: b{{(eq|ne)z?}}
+; RV64F-NOT: czero.eqz
+; RV64F-NOT: czero.nez
+
+; RV32ZFINX_ZICOND-LABEL: select_f32_i1:
+; RV32ZFINX_ZICOND: czero
+; RV32ZFINX_ZICOND: czero
+; RV32ZFINX_ZICOND: or
+; RV32ZFINX_ZICOND-NOT: b{{(eq|ne)z?}}
+; RV32ZFINX_ZICOND: ret
+
+; RV32ZFINX_NOZICOND-LABEL: select_f32_i1:
+; RV32ZFINX_NOZICOND: b{{(eq|ne)z?}}
+; RV32ZFINX_NOZICOND-NOT: czero.eqz
+; RV32ZFINX_NOZICOND-NOT: czero.nez
+
+; RV32F-LABEL: select_f32_i1:
+; RV32F: b{{(eq|ne)z?}}
+; RV32F-NOT: czero.eqz
+; RV32F-NOT: czero.nez
+
+entry:
+  %sel = select i1 %cond, float %t, float %f
+  ret float %sel
+}
+
+; -----------------------------------------------------------------------------
+; double select with i1 condition (Zdinx)
+; -----------------------------------------------------------------------------
+
+define double @select_f64_i1(i1 %cond, double %t, double %f) nounwind {
+; RV64ZDINX_ZICOND-LABEL: select_f64_i1:
+; RV64ZDINX_ZICOND: czero
+; RV64ZDINX_ZICOND: czero
+; RV64ZDINX_ZICOND: or
+; RV64ZDINX_ZICOND-NOT: b{{(eq|ne)z?}}
+; RV64ZDINX_ZICOND: ret
+
+; RV64ZDINX_NOZICOND-LABEL: select_f64_i1:
+; RV64ZDINX_NOZICOND: b{{(eq|ne)z?}}
+; RV64ZDINX_NOZICOND-NOT: czero.eqz
+; RV64ZDINX_NOZICOND-NOT: czero.nez
+
+; RV64D-LABEL: select_f64_i1:
+; RV64D: b{{(eq|ne)z?}}
+; RV64D-NOT: czero.eqz
+; RV64D-NOT: czero.nez
+
+entry:
+  %sel = select i1 %cond, double %t, double %f
+  ret double %sel
+}
+
+; -----------------------------------------------------------------------------
+; double select with floating-point compare condition (a > b ? c : d), Zdinx
+; -----------------------------------------------------------------------------
+
+define double @select_f64_fcmp(double %a, double %b, double %c, double %d) nounwind {
+; RV64ZDINX_ZICOND-LABEL: select_f64_fcmp:
+; RV64ZDINX_ZICOND: czero
+; RV64ZDINX_ZICOND: czero
+; RV64ZDINX_ZICOND: or
+; RV64ZDINX_ZICOND-NOT: b{{(eq|ne)z?}}
+; RV64ZDINX_ZICOND: ret
+
+; RV64ZDINX_NOZICOND-LABEL: select_f64_fcmp:
+; RV64ZDINX_NOZICOND: b{{(eq|ne)z?}}
+; RV64ZDINX_NOZICOND-NOT: czero.eqz
+; RV64ZDINX_NOZICOND-NOT: czero.nez
+
+; RV64D-LABEL: select_f64_fcmp:
+; RV64D: b{{(eq|ne)z?}}
+; RV64D-NOT: czero.eqz
+; RV64D-NOT: czero.nez
+
+entry:
+  %cmp = fcmp ogt double %a, %b
+  %sel = select i1 %cmp, double %c, double %d
+  ret double %sel
+}
\ No newline at end of file

@github-actions

github-actions bot commented Nov 24, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@fennecJ
Contributor Author

fennecJ commented Nov 24, 2025

Hi, this PR implements an optimization for Zicond combined with Zfinx/Zdinx.

Could you help review this? Thanks in advance!

cc @topperc

@fennecJ fennecJ requested a review from topperc November 24, 2025 19:09
@@ -0,0 +1,159 @@
; RUN: llc -mtriple=riscv64 -mattr=+zfinx,+zicond -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV64ZFINX_ZICOND
Collaborator

@topperc topperc Nov 24, 2025

Contributor Author

I dropped some RUN variants so the autogenerated checks stay readable, while still covering the key Zfinx/Zdinx/Zfinx + Zicond cases.

@fennecJ fennecJ requested a review from topperc November 24, 2025 20:09
@fennecJ
Contributor Author

fennecJ commented Nov 24, 2025

For the RV32ZFINX_ZICOND RUN of select_f64_fcmp, I noticed several extra lw/sw instructions appear, so this transform might not actually be a win there. I’m still looking into how to handle that configuration better.

On second thought, this happens because the target extensions don’t provide real double support, so double compare is lowered via soft-float/libcalls. I also tried an F-only configuration and still see the same lw/sw spills, so this codegen is expected in this setup.

Member

@lenary lenary left a comment

Please can you add an rv32zdinx RUN line? It would be useful to see if/how this works for 64-bit values with that combination.

@fennecJ
Contributor Author

fennecJ commented Nov 25, 2025

Please can you add an rv32zdinx RUN line? It would be useful to see if/how this works for 64-bit values with that combination.

Thanks for the suggestion! It was actually very helpful because adding the rv32zdinx RUN line revealed a crash when handling f64 values on RV32.

I've updated the implementation to handle f64 on 32-bit targets by splitting it into two 32-bit integer selects. The RUN line has been added and verifies the fix.

However, I have a concern regarding the performance cost for this specific case: While the static code size difference is negligible (7 vs 8 instructions), the dynamic instruction count is a regression.

define double @select_f64_fcmp(double %a, double %b, double %c, double %d) nounwind {
entry:
  %cmp = fcmp ogt double %a, %b
  %sel = select i1 %cmp, double %c, double %d
  ret double %sel
}

Branch version: Executes 5 or 7 instructions

; RV32ZDINX_NOZICOND-LABEL: select_f64_fcmp:
; RV32ZDINX_NOZICOND:       # %bb.0: # %entry
; RV32ZDINX_NOZICOND-NEXT:    flt.d a0, a2, a0
; RV32ZDINX_NOZICOND-NEXT:    bnez a0, .LBB2_2
; RV32ZDINX_NOZICOND-NEXT:  # %bb.1: # %entry
; RV32ZDINX_NOZICOND-NEXT:    mv a4, a6
; RV32ZDINX_NOZICOND-NEXT:    mv a5, a7
; RV32ZDINX_NOZICOND-NEXT:  .LBB2_2: # %entry
; RV32ZDINX_NOZICOND-NEXT:    mv a0, a4
; RV32ZDINX_NOZICOND-NEXT:    mv a1, a5
; RV32ZDINX_NOZICOND-NEXT:    ret

Zicond version: Always executes 8 instructions.

; RV32ZDINX_ZICOND-LABEL: select_f64_fcmp:
; RV32ZDINX_ZICOND:       # %bb.0: # %entry
; RV32ZDINX_ZICOND-NEXT:    flt.d a0, a2, a0
; RV32ZDINX_ZICOND-NEXT:    czero.nez a1, a6, a0
; RV32ZDINX_ZICOND-NEXT:    czero.eqz a2, a4, a0
; RV32ZDINX_ZICOND-NEXT:    czero.nez a3, a7, a0
; RV32ZDINX_ZICOND-NEXT:    czero.eqz a4, a5, a0
; RV32ZDINX_ZICOND-NEXT:    or a0, a2, a1
; RV32ZDINX_ZICOND-NEXT:    or a1, a4, a3
; RV32ZDINX_ZICOND-NEXT:    ret

Should we still prefer the Zicond transformation here, or should I disable it for f64 on RV32?

@fennecJ fennecJ requested a review from lenary November 25, 2025 04:01
@lenary
Member

lenary commented Nov 25, 2025

Your concerns are right, and that's partly what i wanted to see with the testing. I'm glad you also caught the crash :)

The branch version can also make further savings in longer sequences, where the final pair of mvs might not need to be emitted.

I think you should probably skip this optimisation for rv32_zdinx.

Aside: One thing we're pretty useless at with rv32_zdinx is using FSGNJ.D to move pairs of registers, as your examples show

@fennecJ
Contributor Author

fennecJ commented Nov 25, 2025

I have updated the selection logic to skip cases requiring register splitting (e.g., f64 on RV32).

Consequently, the output for select_f64_fcmp

define double @select_f64_fcmp(double %a, double %b, double %c, double %d) nounwind {
entry:
  %cmp = fcmp ogt double %a, %b
  %sel = select i1 %cmp, double %c, double %d
  ret double %sel
}

is now identical for both RV32ZDINX_ZICOND and RV32ZDINX_NOZICOND:

; RV32ZDINX_ZICOND-LABEL: select_f64_fcmp:
; RV32ZDINX_ZICOND:       # %bb.0: # %entry
; RV32ZDINX_ZICOND-NEXT:    flt.d a0, a2, a0
; RV32ZDINX_ZICOND-NEXT:    bnez a0, .LBB2_2
; RV32ZDINX_ZICOND-NEXT:  # %bb.1: # %entry
; RV32ZDINX_ZICOND-NEXT:    mv a4, a6
; RV32ZDINX_ZICOND-NEXT:    mv a5, a7
; RV32ZDINX_ZICOND-NEXT:  .LBB2_2: # %entry
; RV32ZDINX_ZICOND-NEXT:    mv a0, a4
; RV32ZDINX_ZICOND-NEXT:    mv a1, a5
; RV32ZDINX_ZICOND-NEXT:    ret
;
; RV32ZDINX_NOZICOND-LABEL: select_f64_fcmp:
; RV32ZDINX_NOZICOND:       # %bb.0: # %entry
; RV32ZDINX_NOZICOND-NEXT:    flt.d a0, a2, a0
; RV32ZDINX_NOZICOND-NEXT:    bnez a0, .LBB2_2
; RV32ZDINX_NOZICOND-NEXT:  # %bb.1: # %entry
; RV32ZDINX_NOZICOND-NEXT:    mv a4, a6
; RV32ZDINX_NOZICOND-NEXT:    mv a5, a7
; RV32ZDINX_NOZICOND-NEXT:  .LBB2_2: # %entry
; RV32ZDINX_NOZICOND-NEXT:    mv a0, a4
; RV32ZDINX_NOZICOND-NEXT:    mv a1, a5
; RV32ZDINX_NOZICOND-NEXT:    ret

Note: For RV32ZFINX_ZICOND, the czero instructions seem to be pre-existing/unrelated, as they remain even if I remove the whole FP ISel logic.

; RV32ZFINX_ZICOND-LABEL: select_f64_fcmp:
; RV32ZFINX_ZICOND:       # %bb.0: # %entry
; RV32ZFINX_ZICOND-NEXT:    addi sp, sp, -32
; RV32ZFINX_ZICOND-NEXT:    sw ra, 28(sp) # 4-byte Folded Spill
; RV32ZFINX_ZICOND-NEXT:    sw s0, 24(sp) # 4-byte Folded Spill
; RV32ZFINX_ZICOND-NEXT:    sw s1, 20(sp) # 4-byte Folded Spill
; RV32ZFINX_ZICOND-NEXT:    sw s2, 16(sp) # 4-byte Folded Spill
; RV32ZFINX_ZICOND-NEXT:    sw s3, 12(sp) # 4-byte Folded Spill
; RV32ZFINX_ZICOND-NEXT:    mv s0, a7
; RV32ZFINX_ZICOND-NEXT:    mv s1, a6
; RV32ZFINX_ZICOND-NEXT:    mv s2, a5
; RV32ZFINX_ZICOND-NEXT:    mv s3, a4
; RV32ZFINX_ZICOND-NEXT:    call __gtdf2
; RV32ZFINX_ZICOND-NEXT:    sgtz a0, a0
; RV32ZFINX_ZICOND-NEXT:    czero.nez a1, s1, a0
; RV32ZFINX_ZICOND-NEXT:    czero.eqz a2, s3, a0
; RV32ZFINX_ZICOND-NEXT:    czero.nez a3, s0, a0
; RV32ZFINX_ZICOND-NEXT:    czero.eqz a4, s2, a0
; RV32ZFINX_ZICOND-NEXT:    or a0, a2, a1
; RV32ZFINX_ZICOND-NEXT:    or a1, a4, a3
; RV32ZFINX_ZICOND-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
; RV32ZFINX_ZICOND-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
; RV32ZFINX_ZICOND-NEXT:    lw s1, 20(sp) # 4-byte Folded Reload
; RV32ZFINX_ZICOND-NEXT:    lw s2, 16(sp) # 4-byte Folded Reload
; RV32ZFINX_ZICOND-NEXT:    lw s3, 12(sp) # 4-byte Folded Reload
; RV32ZFINX_ZICOND-NEXT:    addi sp, sp, 32
; RV32ZFINX_ZICOND-NEXT:    ret

@fennecJ
Contributor Author

fennecJ commented Nov 25, 2025

I noticed that when lowering a floating-point SELECT to integer Zicond
operations on RV64, converting an f32 +0.0 results in a
RISCVISD::FMV_X_ANYEXTW_RV64 node. It seems that this target-specific
node represents a register move, which obscures the underlying const-zero
value from the instruction selector and prevents the backend from
pattern-matching it into a single czero instruction.

For the following example:

  define dso_local noundef float @select_i1_f32_0(i1 %cond, float %t) nounwind {
  entry:
    %sel = select i1 %cond, float %t, float 0.000000e+00
    ret float %sel
  }

On RV32 (e.g., RV32ZFINX_ZICOND, where no FMV_X is needed), this previously resulted in:

    czero.eqz a0, a1, a0
    ret

However, on RV64 (e.g., +zicond +zdinx), this previously resulted in:

    czero.nez  a2, zero, a0
    czero.eqz  a0, a1, a0
    or         a0, a2, a0
    ret

Since the "else" value is zero, we can utilize the mechanism that czero
like instruction will store zero into rd based on cond reg and optimized
this scenario in to a single instruction.
By explicitly detecting +0.0 and lowering it to a constant integer 0,
the newest commit enables the single czero generation for above example.

    czero.eqz  a0, a1, a0
    ret
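Only +0.0 qualifies for this folding because it is the one FP constant whose bit pattern is the all-zero integer. A quick C++ sketch of the bitcast check:

```cpp
#include <cstdint>
#include <cstring>

// Raw bit pattern of a float -- the effect of an f32 -> i32 bitcast.
inline uint32_t f32_bits(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}
```

+0.0f bitcasts to integer 0, matching the zero register, while -0.0f has the sign bit set (0x80000000) and must keep the full two-czero sequence.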

@fennecJ fennecJ force-pushed the RISCV_SELECT_OPT branch 2 times, most recently from b1cad4d to 5778116 on November 25, 2025 15:45
@lenary
Member

lenary commented Nov 25, 2025

Note: For RV32ZFINX_ZICOND, the czero instructions seem to be pre-existing/unrelated, as they remain even if I remove the whole FP ISel logic.

I think this is probably because SDAG sees this whole thing as integer, and so splits the selects anyway. If we were to address this, it's probably better in a follow-up.

Comment on lines 9575 to 9578
if (auto *CFP = dyn_cast<ConstantFPSDNode>(V)) {
if (CFP->isZero() && !CFP->isNegative())
return DAG.getConstant(0, DL, XLenIntVT);
}
Member

You could just use

Suggested change
if (auto *CFP = dyn_cast<ConstantFPSDNode>(V)) {
if (CFP->isZero() && !CFP->isNegative())
return DAG.getConstant(0, DL, XLenIntVT);
}
if (isNullFPConstant(V))
return DAG.getConstant(0, DL, XLenIntVT);

Though I do think we might need to be careful that XLenIntVT is the same width as VT

Contributor Author

@fennecJ fennecJ Nov 26, 2025

Thanks for the review! I will update the code to use isNullFPConstant(V).
Regarding the caution about XLenIntVT vs VT width: one of my concerns was whether this would break the NaN-boxing semantics on RV64 with f16, where the type width < XLen.

To verify this, I added the select_i1_half_0 test case. The results confirm that the backend still generates the correct NaN-boxing sequence. (The test is appended in the newest commit.)

define dso_local noundef half @select_i1_half_0(i1 %cond, half %val) nounwind {
entry:
  %sel = select i1 %cond, half %val, half 0xH0000
  ret half %sel
}
; RV64ZDINX_ZICOND:
select_i1_half_0:
# %bb.0: # %entry
  # kill: def $x11_w killed $x11_w def $x11
  andi a0, a0, 1
  czero.eqz a0, a1, a0
  lui a1, 1048560
  or a0, a0, a1
  # kill: def $x10_w killed $x10_w killed $x10
  ret

This is guaranteed by RISCVTargetLowering::splitValueIntoRegisterParts, which explicitly applies the boxing mask (0xFFFF0000) when the type is restored and lowered as an ABI copy:

  if (IsABIRegCopy && (ValueVT == MVT::f16 || ValueVT == MVT::bf16) &&
      PartVT == MVT::f32) {
    // Cast the [b]f16 to i16, extend to i32, pad with ones to make a float
    // nan, and cast to f32.
    Val = DAG.getNode(ISD::BITCAST, DL, MVT::i16, Val);
    Val = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i32, Val);
    Val = DAG.getNode(ISD::OR, DL, MVT::i32, Val,
                      DAG.getConstant(0xFFFF0000, DL, MVT::i32));
    Val = DAG.getNode(ISD::BITCAST, DL, PartVT, Val);
    Parts[0] = Val;
    return true;
  }

I am currently researching if there are any other possible issues regarding the width mismatch.
Any advice is welcome : )

Member

What happens for the following with zhinx:

define half @select_i1_half_0_add(i1 %cond, half %val) nounwind {
entry:
  %sel = select i1 %cond, half %val, half 0xH0000
  %add = fadd half %sel, 1.0
  ret half %add
}

Is the operand of the fadd.h correctly nan-boxed? (This wouldn't be dealt with in splitValueIntoRegisterParts because it's in the same basic block as the select).

You potentially have the same problem with fadd.s on rv64, for which I think you can make/adapt a testcase based on the above.

It may be simplest to push the isNullFPConstant check down to just above the getBitcast if these two examples are not correctly nan-boxed.

Contributor Author

I tried your select_i1_half_0_add case, and the result is correctly NaN-boxed at the return; I'll append the test result to the newest commit.

select_i1_half_0_add:
# %bb.0: # %entry
  addi sp, sp, -16
  sd ra, 8(sp) # 8-byte Folded Spill
  # kill: def $x11_w killed $x11_w def $x11
  andi a0, a0, 1
  czero.eqz a0, a1, a0
  # kill: def $x10_w killed $x10_w killed $x10
  call __extendhfsf2
  lui a1, 260096
  fadd.s a0, a0, a1
  call __truncsfhf2
  # kill: def $x10_w killed $x10_w def $x10
  lui a1, 1048560       # a1= 0xFFFF0000
  or a0, a0, a1         # nan-boxing !!
  # kill: def $x10_w killed $x10_w killed $x10
  ld ra, 8(sp) # 8-byte Folded Reload
  addi sp, sp, 16
  ret

Regarding your question about whether this optimization results in incorrect NaN-boxing:

Actually, I initially shared the same assumption that explicit NaN-boxing was required here. However, upon double-checking the Zfinx/Zdinx/Zhinx specification[1], I discovered that NaN-boxing is not required for operands in these extensions. Instead, they rely on sign extension.

According to Section 1 (Processing of Narrower Values):

"Floating-point operands of width w < XLEN bits occupy bits w-1:0 of an x register. Floating-point operations on w-bit operands ignore operand bits XLEN-1: w."
"Floating-point operations that produce w < XLEN-bit results fill bits XLEN-1: w with copies of bit w-1 (the sign bit)."

Since my optimization targets +0.0 (sign bit is 0), the resulting all-zero integer effectively implements the required Sign-Extension. It is safe for input operands (as high bits are ignored) and valid as a result.

As for the ABI[2], the clause regarding NaN-boxing specifies:

"When a floating-point argument narrower than FLEN bits is passed in a floating-point register, it is 1-extended (NaN-boxed) to FLEN bits."

Since Zfinx has no floating-point registers (it uses x registers), and the Unpriv spec mandates sign-extension for x registers, providing a zero-extended 0 is safe. Even if specific calling conventions require boxing at boundaries, the backend's splitValueIntoRegisterParts handles that explicitly.

Therefore, I believe keeping the +0.0 optimization at the current location is safe and yields the best code generation (single czero).

[1] - riscv unpriv zfinx
[2] - riscv-elf-psabi-doc
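The sign-extension argument above can be sanity-checked with a small C++ model of the spec rule (bits XLEN-1:w are copies of bit w-1). This is an editor's sketch, not LLVM code: the model shows that the sign-extended form of +0.0f is exactly the integer constant 0, while -0.0f would not be.

```cpp
#include <cstdint>

// Zfinx spec rule for narrow FP results: fill bits XLEN-1:w with
// copies of bit w-1 (the sign bit). Assumes `bits` holds a clean
// w-bit value in its low bits and w < 64.
inline uint64_t sext_to_xlen(uint32_t bits, unsigned w) {
  uint64_t sign = (bits >> (w - 1)) & 1;
  uint64_t high = sign ? (~uint64_t{0} << w) : 0;
  return high | bits;
}
```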

Member

Ok, fantastic, I missed that note in the spec, and figured it used the regular NaN-boxing scheme for F/D/H.

Ok, cool, I'm happy with this

@fennecJ fennecJ requested review from lenary and topperc November 26, 2025 07:24
@github-actions

github-actions bot commented Nov 26, 2025

🐧 Linux x64 Test Results

  • 186633 tests passed
  • 4888 tests skipped

When Zfinx or Zdinx is enabled, FP values live in the integer register
file, so there is no GPR<->FPR move cost. In this configuration we can
profitably lower floating-point `select` nodes through an integer
`ISD::SELECT` so that the existing Zicond patterns generate branchless
czero/cmov sequences instead of control-flow branches.

This covers patterns such as:

  // Case 1: integer condition
  float sel(int cond, float a, float b) {
    return cond ? a : b;
  }

  // Case 2: floating-point condition
  float cmp_sel(float a, float b, float c, float d) {
    return (a > b) ? c : d;
  }

In Case 1 we bitcast the FP operands to an XLen integer type, form an
integer `select` with the original integer condition `cond`, and then
bitcast the result back to float, allowing it to be lowered to a
Zicond-based cmov/czero sequence.

In Case 2 the floating-point compare is already lowered to an integer
0/1 value in a GPR (e.g. `flt.s` / `fle.s` / `feq.s`), so we can reuse
that integer result as the Zicond condition and apply the same scheme.

The transformation is gated on both Zicond and Zfinx/Zdinx, so classic
F/D implementations (where FP values live in a separate FP register
file) continue to use the existing branch-based lowering and do not pay
for extra GPR<->FPR moves. FP semantics are unchanged: we only
reinterpret the FP operands as integers; no numeric conversions are
introduced.
* Floating-point `select` is lowered through Zicond only when both
  Zicond and Zfinx/Zdinx extensions are enabled.
* When both Zicond and Zfinx/Zdinx are enabled, branch-based lowering
  is replaced by a Zicond-backed integer `select` (no control-flow
  branches are emitted for these cases).
Zdinx implies Zfinx so it's sufficient to check only Zfinx here.
* CondV should always be int, no need to check
* Omit the if statement bracket when there is only oneline body
When the target machine is 32-bit, we need to split the 64-bit f64 VT into
two 32-bit registers for selection.
This patch disables the Zicond optimization for floating-point selects
when the value type exceeds XLen (e.g., f64 on RV32 +zdinx).

In these cases, using Zicond requires handling split registers, which
results in a higher dynamic instruction count compared to the standard
branch-based lowering (e.g., ~8 instructions vs ~5-7 instructions for
the appended sample code).
Thus, there is no benefit to using Zicond here.

```llvm
define double @select_f64_fcmp(double %a, double %b,
  double %c, double %d) nounwind {
entry:
  %cmp = fcmp ogt double %a, %b
  %sel = select i1 %cmp, double %c, double %d
  ret double %sel
}
```

Branch version: Executes 5 or 7 instructions

```asm
; RV32ZDINX_NOZICOND-LABEL: select_f64_fcmp:
; RV32ZDINX_NOZICOND:       # %bb.0: # %entry
; RV32ZDINX_NOZICOND-NEXT:    flt.d a0, a2, a0
; RV32ZDINX_NOZICOND-NEXT:    bnez a0, .LBB2_2
; RV32ZDINX_NOZICOND-NEXT:  # %bb.1: # %entry
; RV32ZDINX_NOZICOND-NEXT:    mv a4, a6
; RV32ZDINX_NOZICOND-NEXT:    mv a5, a7
; RV32ZDINX_NOZICOND-NEXT:  .LBB2_2: # %entry
; RV32ZDINX_NOZICOND-NEXT:    mv a0, a4
; RV32ZDINX_NOZICOND-NEXT:    mv a1, a5
; RV32ZDINX_NOZICOND-NEXT:    ret
```

Zicond version: Always executes 8 instructions.

```asm
; RV32ZDINX_ZICOND-LABEL: select_f64_fcmp:
; RV32ZDINX_ZICOND:       # %bb.0: # %entry
; RV32ZDINX_ZICOND-NEXT:    flt.d a0, a2, a0
; RV32ZDINX_ZICOND-NEXT:    czero.nez a1, a6, a0
; RV32ZDINX_ZICOND-NEXT:    czero.eqz a2, a4, a0
; RV32ZDINX_ZICOND-NEXT:    czero.nez a3, a7, a0
; RV32ZDINX_ZICOND-NEXT:    czero.eqz a4, a5, a0
; RV32ZDINX_ZICOND-NEXT:    or a0, a2, a1
; RV32ZDINX_ZICOND-NEXT:    or a1, a4, a3
; RV32ZDINX_ZICOND-NEXT:    ret
```
When lowering a floating-point SELECT to integer Zicond operations on
RV64, converting an f32 +0.0 results in a RISCVISD::FMV_X_ANYEXTW_RV64
node. This target-specific node implies a register move, which obscures
the underlying constant-zero value from the instruction selector and
prevents the backend from pattern-matching a single `czero` instruction.

For the following example:
```llvm
  define dso_local noundef float @select_i1_f32_0(i1 %cond, float %t) nounwind {
  entry:
    %sel = select i1 %cond, float %t, float 0.000000e+00
    ret float %sel
  }
```

On RV64 (e.g., +zicond +zdinx), this previously resulted in:
```asm
    czero.nez  a2, zero, a0
    czero.eqz  a0, a1, a0
    or         a0, a2, a0
    ret
```

Since the "else" value is zero, we can utilize the mechanism that czero
like instruction will store zero into rd based on cond reg and optimized
this scenario in to a single instruction.
By explicitly detecting `+0.0` and lowering it to a constant integer 0,
this commit enables the generation of:
```asm
    czero.eqz  a0, a1, a0
    ret
```
@fennecJ
Contributor Author

fennecJ commented Nov 26, 2025

It seems like the CI failure is caused by some newer commits in the main branch. I will rebase and update the test checks later.

Member

@lenary lenary left a comment

LGTM

We can reuse XLenVT
@fennecJ fennecJ requested a review from tclin914 November 27, 2025 03:45
Contributor

@tclin914 tclin914 left a comment

LGTM

@fennecJ
Contributor Author

fennecJ commented Nov 27, 2025

Thanks everyone for the review! I really learned a lot from this process. :)
