[RISCV][ISelLowering] Use Zicond for FP selects on Zfinx/Zdinx #169299
Conversation
@llvm/pr-subscribers-backend-risc-v

Author: None (fennecJ)

Changes

Summary

This patch lets RISCVTargetLowering::lowerSELECT lower some floating-point select operations through an integer Zicond select when:

In that scenario there is no extra cost for GPR <-> "FP GPR" moves, so we can implement FP selects with a CZERO-based sequence instead of a branch. For example, for

```c
float foo(int cond, float x) {
  return (cond != 0) ? x : 0.0f;
}
```

the current lowering produces:

```asm
foo:
  mv a2, a0
  li a0, 0
  beqz a2, .LBB0_2
.LBB0_1:
  mv a0, a1
.LBB0_2:
  ret
```

With this patch, when targeting rv64ima_zicond_zfinx we instead get:

```asm
foo:
  czero.nez a2, zero, a0
  czero.eqz a0, a1, a0
  or a0, a2, a0
  ret
```

The existing branch-based lowering is preserved for:

Testing

Adds llvm/test/CodeGen/RISCV/zicond-fp-select-zfinx.ll to cover:

Also adds baseline RV32F/RV64F/RV64D cases to ensure we still use branches when FP registers are separate. The tests check that:

Full diff: https://github.com/llvm/llvm-project/pull/169299.diff

2 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 19a16197272fe..e01d320e7e1ec 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -9555,6 +9555,32 @@ SDValue RISCVTargetLowering::lowerSELECT(SDValue Op, SelectionDAG &DAG) const {
if (SDValue V = lowerSelectToBinOp(Op.getNode(), DAG, Subtarget))
return V;
+ // When there is no cost for GPR <-> FPR moves (Zfinx/Zdinx), use an integer Zicond select for FP values when CondV has an integer type.
+ bool FPinGPR = Subtarget.hasStdExtZfinx() || Subtarget.hasStdExtZdinx();
+ bool UseZicondForFPSel = Subtarget.hasStdExtZicond() && FPinGPR && VT.isFloatingPoint() && CondV.getValueType().isInteger();
+ if (UseZicondForFPSel) {
+ MVT XLenIntVT = Subtarget.getXLenVT();
+
+ auto CastToInt = [&](SDValue V) -> SDValue {
+ if (VT == MVT::f32 && Subtarget.is64Bit()) {
+ return DAG.getNode(RISCVISD::FMV_X_ANYEXTW_RV64, DL, XLenIntVT, V);
+ }
+ return DAG.getBitcast(XLenIntVT, V);
+ };
+
+ SDValue TrueVInt = CastToInt(TrueV);
+ SDValue FalseVInt = CastToInt(FalseV);
+
+ // Emit integer SELECT (lowers to Zicond)
+ SDValue ResultInt = DAG.getNode(ISD::SELECT, DL, XLenIntVT, CondV, TrueVInt, FalseVInt);
+
+ // Convert back to floating VT
+ if (VT == MVT::f32 && Subtarget.is64Bit()) {
+ return DAG.getNode(RISCVISD::FMV_W_X_RV64, DL, VT, ResultInt);
+ }
+ return DAG.getBitcast(VT, ResultInt);
+ }
+
// When Zicond or XVentanaCondOps is present, emit CZERO_EQZ and CZERO_NEZ
// nodes to implement the SELECT. Performing the lowering here allows for
// greater control over when CZERO_{EQZ/NEZ} are used vs another branchless
diff --git a/llvm/test/CodeGen/RISCV/zicond-fp-select-zfinx.ll b/llvm/test/CodeGen/RISCV/zicond-fp-select-zfinx.ll
new file mode 100644
index 0000000000000..ddce3752584c4
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/zicond-fp-select-zfinx.ll
@@ -0,0 +1,115 @@
+; RUN: llc -mtriple=riscv64 -mattr=+zfinx,+zicond -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV64ZFINX_ZICOND
+; RUN: llc -mtriple=riscv64 -mattr=+zfinx -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV64ZFINX_NOZICOND
+; RUN: llc -mtriple=riscv64 -mattr=+zdinx,+zicond -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV64ZDINX_ZICOND
+; RUN: llc -mtriple=riscv64 -mattr=+zdinx -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV64ZDINX_NOZICOND
+; RUN: llc -mtriple=riscv64 -mattr=+f -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV64F
+; RUN: llc -mtriple=riscv64 -mattr=+d -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV64D
+
+; RUN: llc -mtriple=riscv32 -mattr=+zfinx,+zicond -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV32ZFINX_ZICOND
+; RUN: llc -mtriple=riscv32 -mattr=+zfinx -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV32ZFINX_NOZICOND
+; RUN: llc -mtriple=riscv32 -mattr=+f -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV32F
+
+
+; This test checks that floating-point SELECT is lowered through integer
+; SELECT (and thus to Zicond czero.* sequence) when FP values live in GPRs
+; (Zfinx/Zdinx) and Zicond is enabled. When Zicond is disabled, we expect
+; a branch-based lowering instead.
+
+; -----------------------------------------------------------------------------
+; float select with i1 condition (Zfinx)
+; -----------------------------------------------------------------------------
+
+define float @select_f32_i1(i1 %cond, float %t, float %f) nounwind {
+; RV64ZFINX_ZICOND-LABEL: select_f32_i1:
+; RV64ZFINX_ZICOND: czero
+; RV64ZFINX_ZICOND: czero
+; RV64ZFINX_ZICOND: or
+; RV64ZFINX_ZICOND-NOT: b{{(eq|ne)z?}}
+; RV64ZFINX_ZICOND: ret
+
+; RV64ZFINX_NOZICOND-LABEL: select_f32_i1:
+; RV64ZFINX_NOZICOND: b{{(eq|ne)z?}}
+; RV64ZFINX_NOZICOND-NOT: czero.eqz
+; RV64ZFINX_NOZICOND-NOT: czero.nez
+
+; RV64F-LABEL: select_f32_i1:
+; RV64F: b{{(eq|ne)z?}}
+; RV64F-NOT: czero.eqz
+; RV64F-NOT: czero.nez
+
+; RV32ZFINX_ZICOND-LABEL: select_f32_i1:
+; RV32ZFINX_ZICOND: czero
+; RV32ZFINX_ZICOND: czero
+; RV32ZFINX_ZICOND: or
+; RV32ZFINX_ZICOND-NOT: b{{(eq|ne)z?}}
+; RV32ZFINX_ZICOND: ret
+
+; RV32ZFINX_NOZICOND-LABEL: select_f32_i1:
+; RV32ZFINX_NOZICOND: b{{(eq|ne)z?}}
+; RV32ZFINX_NOZICOND-NOT: czero.eqz
+; RV32ZFINX_NOZICOND-NOT: czero.nez
+
+; RV32F-LABEL: select_f32_i1:
+; RV32F: b{{(eq|ne)z?}}
+; RV32F-NOT: czero.eqz
+; RV32F-NOT: czero.nez
+
+entry:
+ %sel = select i1 %cond, float %t, float %f
+ ret float %sel
+}
+
+; -----------------------------------------------------------------------------
+; double select with i1 condition (Zdinx)
+; -----------------------------------------------------------------------------
+
+define double @select_f64_i1(i1 %cond, double %t, double %f) nounwind {
+; RV64ZDINX_ZICOND-LABEL: select_f64_i1:
+; RV64ZDINX_ZICOND: czero
+; RV64ZDINX_ZICOND: czero
+; RV64ZDINX_ZICOND: or
+; RV64ZDINX_ZICOND-NOT: b{{(eq|ne)z?}}
+; RV64ZDINX_ZICOND: ret
+
+; RV64ZDINX_NOZICOND-LABEL: select_f64_i1:
+; RV64ZDINX_NOZICOND: b{{(eq|ne)z?}}
+; RV64ZDINX_NOZICOND-NOT: czero.eqz
+; RV64ZDINX_NOZICOND-NOT: czero.nez
+
+; RV64D-LABEL: select_f64_i1:
+; RV64D: b{{(eq|ne)z?}}
+; RV64D-NOT: czero.eqz
+; RV64D-NOT: czero.nez
+
+entry:
+ %sel = select i1 %cond, double %t, double %f
+ ret double %sel
+}
+
+; -----------------------------------------------------------------------------
+; double select with floating-point compare condition (a > b ? c : d), Zdinx
+; -----------------------------------------------------------------------------
+
+define double @select_f64_fcmp(double %a, double %b, double %c, double %d) nounwind {
+; RV64ZDINX_ZICOND-LABEL: select_f64_fcmp:
+; RV64ZDINX_ZICOND: czero
+; RV64ZDINX_ZICOND: czero
+; RV64ZDINX_ZICOND: or
+; RV64ZDINX_ZICOND-NOT: b{{(eq|ne)z?}}
+; RV64ZDINX_ZICOND: ret
+
+; RV64ZDINX_NOZICOND-LABEL: select_f64_fcmp:
+; RV64ZDINX_NOZICOND: b{{(eq|ne)z?}}
+; RV64ZDINX_NOZICOND-NOT: czero.eqz
+; RV64ZDINX_NOZICOND-NOT: czero.nez
+
+; RV64D-LABEL: select_f64_fcmp:
+; RV64D: b{{(eq|ne)z?}}
+; RV64D-NOT: czero.eqz
+; RV64D-NOT: czero.nez
+
+entry:
+ %cmp = fcmp ogt double %a, %b
+ %sel = select i1 %cmp, double %c, double %d
+ ret double %sel
+}
\ No newline at end of file
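For readers unfamiliar with Zicond, the branchless sequence the patch emits can be sanity-checked with a small C model. The helper names below are hypothetical; per the Zicond extension, `czero.eqz rd, rs1, rs2` writes 0 when rs2 == 0 and rs1 otherwise, and `czero.nez` is the complement:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the two Zicond conditional-zero instructions. */
static uint32_t czero_eqz(uint32_t rs1, uint32_t rs2) { return rs2 == 0 ? 0 : rs1; }
static uint32_t czero_nez(uint32_t rs1, uint32_t rs2) { return rs2 != 0 ? 0 : rs1; }

/* cond ? t : f, with the floats handled as raw bits in "GPRs" (Zfinx view).
 * Mirrors: czero.nez tmp, f_bits, cond ; czero.eqz t_bits, t_bits, cond ; or */
static float fp_select(uint32_t cond, float t, float f) {
    uint32_t ti, fi, ri;
    float r;
    memcpy(&ti, &t, sizeof ti); /* bitcast, not a numeric conversion */
    memcpy(&fi, &f, sizeof fi);
    ri = czero_nez(fi, cond) | czero_eqz(ti, cond);
    memcpy(&r, &ri, sizeof r);
    return r;
}
```

Exactly one of the two czero results is nonzero for a given condition, so the final or merges them without any control flow; the memcpy calls model the Zfinx view of a float as raw GPR bits.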
✅ With the latest revision this PR passed the C/C++ code formatter.
Force-pushed 865ee32 to fe46a46.
Hi, this PR implements an optimization for Zicond combined with Zfinx/Zdinx. Could you help review this? Thanks in advance! cc @topperc
> @@ -0,0 +1,159 @@
> ; RUN: llc -mtriple=riscv64 -mattr=+zfinx,+zicond -verify-machineinstrs < %s | FileCheck %s --check-prefix=RV64ZFINX_ZICOND
Please use the update_llc_test_checks.py script. https://llvm.org/docs/TestingGuide.html#generating-assertions-in-regression-tests
I dropped some RUN variants so the autogenerated checks stay readable, while still covering the key Zfinx/Zdinx/Zfinx + Zicond cases.
Force-pushed 5f91d26 to bf53e56.
For the RV32ZFINX_ZICOND RUN of select_f64_fcmp, I noticed several extra lw/sw instructions appear, so this transform might not actually be a win there. I'm still looking into how to handle that configuration better.

On second thought, this happens because the target extensions don't provide real double support, so double compare is lowered via soft-float/libcalls. I also tried an F-only configuration and still see the same lw/sw spills, so this codegen is expected in this setup.
lenary left a comment:
Please can you add an rv32zdinx RUN line? It would be useful to see if/how this works for 64-bit values with that combination.
Thanks for the suggestion! It was actually very helpful because adding the rv32zdinx RUN line revealed a crash when handling f64 values on RV32. I've updated the implementation to handle f64 on 32-bit targets by splitting it into two 32-bit integer selects. The RUN line has been added and verifies the fix.

However, I have a concern regarding the performance cost for this specific case: while the static code size difference is negligible (7 vs 8 instructions), the dynamic instruction count is a regression.

```llvm
define double @select_f64_fcmp(double %a, double %b, double %c, double %d) nounwind {
entry:
  %cmp = fcmp ogt double %a, %b
  %sel = select i1 %cmp, double %c, double %d
  ret double %sel
}
```

Branch version: executes 5 or 7 instructions.

```asm
; RV32ZDINX_NOZICOND-LABEL: select_f64_fcmp:
; RV32ZDINX_NOZICOND: # %bb.0: # %entry
; RV32ZDINX_NOZICOND-NEXT: flt.d a0, a2, a0
; RV32ZDINX_NOZICOND-NEXT: bnez a0, .LBB2_2
; RV32ZDINX_NOZICOND-NEXT: # %bb.1: # %entry
; RV32ZDINX_NOZICOND-NEXT: mv a4, a6
; RV32ZDINX_NOZICOND-NEXT: mv a5, a7
; RV32ZDINX_NOZICOND-NEXT: .LBB2_2: # %entry
; RV32ZDINX_NOZICOND-NEXT: mv a0, a4
; RV32ZDINX_NOZICOND-NEXT: mv a1, a5
; RV32ZDINX_NOZICOND-NEXT: ret
```

Zicond version: always executes 8 instructions.

```asm
; RV32ZDINX_ZICOND-LABEL: select_f64_fcmp:
; RV32ZDINX_ZICOND: # %bb.0: # %entry
; RV32ZDINX_ZICOND-NEXT: flt.d a0, a2, a0
; RV32ZDINX_ZICOND-NEXT: czero.nez a1, a6, a0
; RV32ZDINX_ZICOND-NEXT: czero.eqz a2, a4, a0
; RV32ZDINX_ZICOND-NEXT: czero.nez a3, a7, a0
; RV32ZDINX_ZICOND-NEXT: czero.eqz a4, a5, a0
; RV32ZDINX_ZICOND-NEXT: or a0, a2, a1
; RV32ZDINX_ZICOND-NEXT: or a1, a4, a3
; RV32ZDINX_ZICOND-NEXT: ret
```

Should we still prefer the Zicond transformation here, or should I disable it for f64 on RV32?
Your concerns are right, and that's partly what I wanted to see with the testing. I'm glad you also caught the crash :)

The branch version can also make further savings in longer sequences, where the final pair of […]

I think you should probably skip this optimisation for rv32_zdinx.

Aside: One thing we're pretty useless at with rv32_zdinx is using […]
I have updated the selection logic to skip cases requiring register splitting (e.g., f64 on RV32). Consequently, the output for select_f64_fcmp

```llvm
define double @select_f64_fcmp(double %a, double %b, double %c, double %d) nounwind {
entry:
  %cmp = fcmp ogt double %a, %b
  %sel = select i1 %cmp, double %c, double %d
  ret double %sel
}
```

is now identical for both RUN lines:

```asm
; RV32ZDINX_ZICOND-LABEL: select_f64_fcmp:
; RV32ZDINX_ZICOND: # %bb.0: # %entry
; RV32ZDINX_ZICOND-NEXT: flt.d a0, a2, a0
; RV32ZDINX_ZICOND-NEXT: bnez a0, .LBB2_2
; RV32ZDINX_ZICOND-NEXT: # %bb.1: # %entry
; RV32ZDINX_ZICOND-NEXT: mv a4, a6
; RV32ZDINX_ZICOND-NEXT: mv a5, a7
; RV32ZDINX_ZICOND-NEXT: .LBB2_2: # %entry
; RV32ZDINX_ZICOND-NEXT: mv a0, a4
; RV32ZDINX_ZICOND-NEXT: mv a1, a5
; RV32ZDINX_ZICOND-NEXT: ret
;
; RV32ZDINX_NOZICOND-LABEL: select_f64_fcmp:
; RV32ZDINX_NOZICOND: # %bb.0: # %entry
; RV32ZDINX_NOZICOND-NEXT: flt.d a0, a2, a0
; RV32ZDINX_NOZICOND-NEXT: bnez a0, .LBB2_2
; RV32ZDINX_NOZICOND-NEXT: # %bb.1: # %entry
; RV32ZDINX_NOZICOND-NEXT: mv a4, a6
; RV32ZDINX_NOZICOND-NEXT: mv a5, a7
; RV32ZDINX_NOZICOND-NEXT: .LBB2_2: # %entry
; RV32ZDINX_NOZICOND-NEXT: mv a0, a4
; RV32ZDINX_NOZICOND-NEXT: mv a1, a5
; RV32ZDINX_NOZICOND-NEXT: ret
```

Note: For […]
I noticed that when lowering a floating-point SELECT to integer Zicond operations on RV64, converting an f32 +0.0 results in a RISCVISD::FMV_X_ANYEXTW_RV64 node, which obscures the underlying constant zero from the instruction selector.

For the following example:

```llvm
define dso_local noundef float @select_i1_f32_0(i1 %cond, float %t) nounwind {
entry:
  %sel = select i1 %cond, float %t, float 0.000000e+00
  ret float %sel
}
```

On RV32 (e.g., RV32ZFINX_ZICOND, no need to call FMV_X), this previously resulted in:

```asm
czero.eqz a0, a1, a0
ret
```

However, on RV64 we got:

```asm
czero.nez a2, zero, a0
czero.eqz a0, a1, a0
or a0, a2, a0
ret
```

Since the "else" value is zero, we can utilize the mechanism that czero-like instructions store zero into rd based on the condition register, and optimize this scenario into a single instruction:

```asm
czero.eqz a0, a1, a0
ret
```
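To make the observation above concrete: the bit pattern of +0.0f is all zeros, which is exactly what czero.eqz already produces when the condition is false, so the czero.nez/or pair is redundant. A minimal C sketch (czero_eqz is a hypothetical model of the instruction):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of czero.eqz: rd = (rs2 == 0) ? 0 : rs1. */
static uint32_t czero_eqz(uint32_t rs1, uint32_t rs2) { return rs2 == 0 ? 0 : rs1; }

/* cond ? t : +0.0f -- the "else" bits are all zero, so one czero.eqz suffices. */
static float select_or_zero(uint32_t cond, float t) {
    uint32_t ti, ri;
    float r;
    memcpy(&ti, &t, sizeof ti);
    ri = czero_eqz(ti, cond); /* zero bits reinterpret as +0.0f when cond is 0 */
    memcpy(&r, &ri, sizeof r);
    return r;
}
```

Note this only works for +0.0; -0.0 has the sign bit set, so its bit pattern is not zero and the general three-instruction sequence is still needed.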
Force-pushed b1cad4d to 5778116.

I think this is probably because SDAG sees this whole thing as integer, and so splits the selects anyway. If we were to address this, it's probably better in a follow-up.
> ```cpp
> if (auto *CFP = dyn_cast<ConstantFPSDNode>(V)) {
>   if (CFP->isZero() && !CFP->isNegative())
>     return DAG.getConstant(0, DL, XLenIntVT);
> }
> ```

You could just use `isNullFPConstant`:

```cpp
if (isNullFPConstant(V))
  return DAG.getConstant(0, DL, XLenIntVT);
```
Though I do think we might need to be careful that XLenIntVT is the same width as VT
Thanks for the review! I will update the code to use isNullFPConstant(V).

Regarding the caution about XLenIntVT vs VT width: one of my concerns was whether this would break the NaN-boxing semantics on RV64 with f16, where the type width < XLen.

To verify this, I added the select_i1_half_0 test case. The results confirm that the backend still generates the correct NaN-boxing sequence. (The test is appended in the newest commit.)

```llvm
define dso_local noundef half @select_i1_half_0(i1 %cond, half %val) nounwind {
entry:
  %sel = select i1 %cond, half %val, half 0xH0000
  ret half %sel
}
```

```asm
; RV64ZDINX_ZICOND:
select_i1_half_0:
# %bb.0: # %entry
  # kill: def $x11_w killed $x11_w def $x11
  andi a0, a0, 1
  czero.eqz a0, a1, a0
  lui a1, 1048560
  or a0, a0, a1
  # kill: def $x10_w killed $x10_w killed $x10
  ret
```

This is guaranteed by RISCVTargetLowering::splitValueIntoRegisterParts, which explicitly applies the boxing mask (0xFFFF0000) when the type is restored and lowered as an ABI copy:

```cpp
if (IsABIRegCopy && (ValueVT == MVT::f16 || ValueVT == MVT::bf16) &&
    PartVT == MVT::f32) {
  // Cast the [b]f16 to i16, extend to i32, pad with ones to make a float
  // nan, and cast to f32.
  Val = DAG.getNode(ISD::BITCAST, DL, MVT::i16, Val);
  Val = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i32, Val);
  Val = DAG.getNode(ISD::OR, DL, MVT::i32, Val,
                    DAG.getConstant(0xFFFF0000, DL, MVT::i32));
  Val = DAG.getNode(ISD::BITCAST, DL, PartVT, Val);
  Parts[0] = Val;
  return true;
}
```

I am currently researching if there are any other possible issues regarding the width mismatch.
Any advice is welcome : )
What happens for the following with zhinx:

```llvm
define half @select_i1_half_0_add(i1 %cond, half %val) nounwind {
entry:
  %sel = select i1 %cond, half %val, half 0xH0000
  %add = fadd half %sel, 1.0
  ret half %add
}
```

Is the operand of the fadd.h correctly NaN-boxed? (This wouldn't be dealt with in splitValueIntoRegisterParts because it's in the same basic block as the select.)

You potentially have the same problem with fadd.s on rv64, for which I think you can make/adapt a testcase based on the above.

It may be simplest to push the isNullFPConstant check down to just above the getBitcast if these two examples are not correctly NaN-boxed.
I tried your select_i1_half_0_add case, and the result is correctly NaN-boxed at the return. I'll append the test result to the newest commit.

```asm
select_i1_half_0_add:
# %bb.0: # %entry
  addi sp, sp, -16
  sd ra, 8(sp) # 8-byte Folded Spill
  # kill: def $x11_w killed $x11_w def $x11
  andi a0, a0, 1
  czero.eqz a0, a1, a0
  # kill: def $x10_w killed $x10_w killed $x10
  call __extendhfsf2
  lui a1, 260096
  fadd.s a0, a0, a1
  call __truncsfhf2
  # kill: def $x10_w killed $x10_w def $x10
  lui a1, 1048560 # a1 = 0xFFFF0000
  or a0, a0, a1 # nan-boxing !!
  # kill: def $x10_w killed $x10_w killed $x10
  ld ra, 8(sp) # 8-byte Folded Reload
  addi sp, sp, 16
  ret
```

Regarding your question about whether this optimization results in incorrect NaN-boxing: actually, I initially shared the same assumption that explicit NaN-boxing was required here. However, upon double-checking the Zfinx/Zdinx/Zhinx specification [1], I discovered that NaN-boxing is not required for operands in these extensions. Instead, they rely on sign-extension.

According to Section 1 (Processing of Narrower Values):

"Floating-point operands of width w < XLEN bits occupy bits w-1:0 of an x register. Floating-point operations on w-bit operands ignore operand bits XLEN-1:w."

"Floating-point operations that produce w < XLEN-bit results fill bits XLEN-1:w with copies of bit w-1 (the sign bit)."

Since my optimization targets +0.0 (sign bit is 0), the resulting all-zero integer effectively implements the required sign-extension. It is safe for input operands (as high bits are ignored) and valid as a result.

As for the ABI [2], the clause regarding NaN-boxing specifies:

"When a floating-point argument narrower than FLEN bits is passed in a floating-point register, it is 1-extended (NaN-boxed) to FLEN bits."

Since Zfinx has no floating-point registers (it uses x registers), and the Unpriv spec mandates sign-extension for x registers, providing a zero-extended 0 is safe. Even if specific calling conventions require boxing at boundaries, the backend's splitValueIntoRegisterParts handles that explicitly.

Therefore, I believe keeping the +0.0 optimization at the current location is safe and yields the best code generation (single czero).

[1] RISC-V unprivileged spec, Zfinx chapter
[2] riscv-elf-psabi-doc
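The two conventions contrasted above can be made concrete with a small bit-level sketch (helper names are hypothetical): NaN-boxing fills the upper bits with ones, while the Zfinx x-register convention sign-extends the w-bit value, so an all-zero word is already a valid +0.0 half operand in a GPR.

```c
#include <assert.h>
#include <stdint.h>

/* NaN-box a 16-bit half into a 32-bit FPR image: upper bits all ones. */
static uint32_t nanbox_h(uint16_t h) { return 0xFFFF0000u | h; }

/* Zfinx convention: sign-extend the w-bit value into the x register. */
static uint32_t sext_h(uint16_t h) { return (uint32_t)(int32_t)(int16_t)h; }
```

Under the Zfinx rule, sext_h(0x0000) for +0.0h is the all-zero word that a single czero.eqz produces, which is why the constant-zero optimization needs no extra boxing step.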
Ok, fantastic, I missed that note in the spec, and figured it used the regular NaN-boxing scheme for F/D/H.
Ok, cool, I'm happy with this
When Zfinx or Zdinx is enabled, FP values live in the integer register
file, so there is no GPR<->FPR move cost. In this configuration we can
profitably lower floating-point `select` nodes through an integer
`ISD::SELECT` so that the existing Zicond patterns generate branchless
czero/cmov sequences instead of control-flow branches.
This covers patterns such as:

```c
// Case 1: integer condition
float sel(int cond, float a, float b) {
  return cond ? a : b;
}

// Case 2: floating-point condition
float cmp_sel(float a, float b, float c, float d) {
  return (a > b) ? c : d;
}
```
In Case 1 we bitcast the FP operands to an XLen integer type, form an
integer `select` with the original integer condition `cond`, and then
bitcast the result back to float, allowing it to be lowered to a
Zicond-based cmov/czero sequence.
In Case 2 the floating-point compare is already lowered to an integer
0/1 value in a GPR (e.g. `flt.s` / `fle.s` / `feq.s`), so we can reuse
that integer result as the Zicond condition and apply the same scheme.
The transformation is gated on both Zicond and Zfinx/Zdinx, so classic
F/D implementations (where FP values live in a separate FP register
file) continue to use the existing branch-based lowering and do not pay
for extra GPR<->FPR moves. FP semantics are unchanged: we only
reinterpret the FP operands as integers; no numeric conversions are
introduced.
* Floating-point `select` is lowered through Zicond only when both Zicond and Zfinx/Zdinx extensions are enabled.
* When both Zicond and Zfinx/Zdinx are enabled, branch-based lowering is replaced by a Zicond-backed integer `select` (no control-flow branches are emitted for these cases).
Zdinx implies Zfinx so it's sufficient to check only Zfinx here.
* CondV should always be an integer type, so there is no need to check.
* Omit the if-statement braces when the body is a single line.
When the target machine is 32-bit, we need to split the 64-bit f64 VT into two 32-bit registers for selection.
This patch disables the Zicond optimization for floating-point selects
when the value type exceeds XLen (e.g., f64 on RV32 +zdinx).
In these cases, using Zicond requires handling split registers, which
results in a higher dynamic instruction count compared to the standard
branch-based lowering (e.g., ~8 instructions vs ~5-7 instructions for
appended sample code).
Thus, there is no benefit to using Zicond here.
```llvm
define double @select_f64_fcmp(double %a, double %b,
double %c, double %d) nounwind {
entry:
%cmp = fcmp ogt double %a, %b
%sel = select i1 %cmp, double %c, double %d
ret double %sel
}
```
Branch version: executes 5 or 7 instructions.
```asm
; RV32ZDINX_NOZICOND-LABEL: select_f64_fcmp:
; RV32ZDINX_NOZICOND: # %bb.0: # %entry
; RV32ZDINX_NOZICOND-NEXT: flt.d a0, a2, a0
; RV32ZDINX_NOZICOND-NEXT: bnez a0, .LBB2_2
; RV32ZDINX_NOZICOND-NEXT: # %bb.1: # %entry
; RV32ZDINX_NOZICOND-NEXT: mv a4, a6
; RV32ZDINX_NOZICOND-NEXT: mv a5, a7
; RV32ZDINX_NOZICOND-NEXT: .LBB2_2: # %entry
; RV32ZDINX_NOZICOND-NEXT: mv a0, a4
; RV32ZDINX_NOZICOND-NEXT: mv a1, a5
; RV32ZDINX_NOZICOND-NEXT: ret
```
Zicond version: Always executes 8 instructions.
```asm
; RV32ZDINX_ZICOND-LABEL: select_f64_fcmp:
; RV32ZDINX_ZICOND: # %bb.0: # %entry
; RV32ZDINX_ZICOND-NEXT: flt.d a0, a2, a0
; RV32ZDINX_ZICOND-NEXT: czero.nez a1, a6, a0
; RV32ZDINX_ZICOND-NEXT: czero.eqz a2, a4, a0
; RV32ZDINX_ZICOND-NEXT: czero.nez a3, a7, a0
; RV32ZDINX_ZICOND-NEXT: czero.eqz a4, a5, a0
; RV32ZDINX_ZICOND-NEXT: or a0, a2, a1
; RV32ZDINX_ZICOND-NEXT: or a1, a4, a3
; RV32ZDINX_ZICOND-NEXT: ret
```
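For reference, the rejected split lowering can be modeled in C (helper names hypothetical): each 32-bit half needs its own czero pair plus an or, a fixed cost of eight operations after the compare, regardless of which way the condition goes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of the Zicond conditional-zero instructions. */
static uint32_t czero_eqz(uint32_t rs1, uint32_t rs2) { return rs2 == 0 ? 0 : rs1; }
static uint32_t czero_nez(uint32_t rs1, uint32_t rs2) { return rs2 != 0 ? 0 : rs1; }

/* cond ? t : f for a 64-bit value held in two 32-bit "registers" (RV32). */
static uint64_t select64_split(uint32_t cond, uint64_t t, uint64_t f) {
    uint32_t tlo = (uint32_t)t, thi = (uint32_t)(t >> 32);
    uint32_t flo = (uint32_t)f, fhi = (uint32_t)(f >> 32);
    uint32_t rlo = czero_eqz(tlo, cond) | czero_nez(flo, cond); /* low pair + or */
    uint32_t rhi = czero_eqz(thi, cond) | czero_nez(fhi, cond); /* high pair + or */
    return ((uint64_t)rhi << 32) | rlo;
}
```

The branch version, by contrast, falls through in the taken case and only pays for the two mv instructions when the condition is false, which is why the patch skips this configuration.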
When lowering a floating-point SELECT to integer Zicond operations on
RV64, converting an f32 +0.0 results in a RISCVISD::FMV_X_ANYEXTW_RV64
node. This target-specific node implies a register move, which obscures
the underlying constant zero value from the instruction selector,
preventing the backend from pattern-matching a single `czero` instruction.
For the following example:
```llvm
define dso_local noundef float @select_i1_f32_0(i1 %cond, float %t) nounwind {
entry:
%sel = select i1 %cond, float %t, float 0.000000e+00
ret float %sel
}
```
On RV64 (e.g., +zicond +zdinx), this previously resulted in:
```asm
czero.nez a2, zero, a0
czero.eqz a0, a1, a0
or a0, a2, a0
ret
```
Since the "else" value is zero, we can exploit the fact that czero-style
instructions write zero into rd based on the condition register, and
optimize this scenario into a single instruction.
By explicitly detecting `+0.0` and lowering it to a constant integer 0,
this commit enables the generation of:
```asm
czero.eqz a0, a1, a0
ret
```
It seems like the CI failure is caused by some newer commits in the main branch. I will rebase and update the test checks later.
Force-pushed efca401 to ef97fca.
lenary left a comment:
LGTM
We can reuse XLenVT
tclin914 left a comment:
LGTM
Thanks everyone for the review! I really learned a lot from this process. :)