-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[RISCV]: Implemented softening of FCANONICALIZE
#169234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-backend-risc-v Author: None (kper) ChangesThe Closes #169216 Full diff: https://github.com/llvm/llvm-project/pull/169234.diff 3 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
index 383a025a4d916..d405295229203 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
@@ -70,6 +70,8 @@ void DAGTypeLegalizer::SoftenFloatResult(SDNode *N, unsigned ResNo) {
case ISD::EXTRACT_VECTOR_ELT:
R = SoftenFloatRes_EXTRACT_VECTOR_ELT(N, ResNo); break;
case ISD::FABS: R = SoftenFloatRes_FABS(N); break;
+ case ISD::FCANONICALIZE:
+ R = SoftenFloatRes_FCANONICALIZE(N); break;
case ISD::STRICT_FMINNUM:
case ISD::FMINNUM: R = SoftenFloatRes_FMINNUM(N); break;
case ISD::STRICT_FMAXNUM:
@@ -311,6 +313,12 @@ SDValue DAGTypeLegalizer::SoftenFloatRes_FABS(SDNode *N) {
return DAG.getNode(ISD::AND, SDLoc(N), NVT, Op, Mask);
}
+SDValue DAGTypeLegalizer::SoftenFloatRes_FCANONICALIZE(SDNode *N) {
+ return SoftenFloatRes_Unary(
+ N, GetFPLibCall(N->getValueType(0), RTLIB::FMIN_F32, RTLIB::FMIN_F64,
+ RTLIB::FMIN_F80, RTLIB::FMIN_F128, RTLIB::FMIN_PPCF128));
+}
+
SDValue DAGTypeLegalizer::SoftenFloatRes_FMINNUM(SDNode *N) {
if (SDValue SelCC = TLI.createSelectForFMINNUM_FMAXNUM(N, DAG))
return SoftenFloatRes_SELECT_CC(SelCC.getNode());
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
index ede522eff6df3..c90cb7bc88f57 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
@@ -585,6 +585,7 @@ class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
SDValue SoftenFloatRes_FASIN(SDNode *N);
SDValue SoftenFloatRes_FATAN(SDNode *N);
SDValue SoftenFloatRes_FATAN2(SDNode *N);
+ SDValue SoftenFloatRes_FCANONICALIZE(SDNode *N);
SDValue SoftenFloatRes_FMINNUM(SDNode *N);
SDValue SoftenFloatRes_FMAXNUM(SDNode *N);
SDValue SoftenFloatRes_FMINIMUMNUM(SDNode *N);
diff --git a/llvm/test/CodeGen/RISCV/fcanonicalize.ll b/llvm/test/CodeGen/RISCV/fcanonicalize.ll
new file mode 100644
index 0000000000000..334657bdf92de
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/fcanonicalize.ll
@@ -0,0 +1,64 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=riscv64 < %s | FileCheck -check-prefix=RV64I %s
+; RUN: llc -mtriple=riscv64 -mattr=+d < %s | FileCheck -check-prefix=RV64D %s
+
+define double @max(double, double) unnamed_addr #0 {
+; RV64I-LABEL: max:
+; RV64I: # %bb.0: # %start
+; RV64I-NEXT: addi sp, sp, -32
+; RV64I-NEXT: .cfi_def_cfa_offset 32
+; RV64I-NEXT: sd ra, 24(sp) # 8-byte Folded Spill
+; RV64I-NEXT: sd s0, 16(sp) # 8-byte Folded Spill
+; RV64I-NEXT: sd s1, 8(sp) # 8-byte Folded Spill
+; RV64I-NEXT: sd s2, 0(sp) # 8-byte Folded Spill
+; RV64I-NEXT: .cfi_offset ra, -8
+; RV64I-NEXT: .cfi_offset s0, -16
+; RV64I-NEXT: .cfi_offset s1, -24
+; RV64I-NEXT: .cfi_offset s2, -32
+; RV64I-NEXT: mv s0, a1
+; RV64I-NEXT: mv s1, a0
+; RV64I-NEXT: call __ltdf2
+; RV64I-NEXT: srli s2, a0, 63
+; RV64I-NEXT: mv a0, s1
+; RV64I-NEXT: mv a1, s1
+; RV64I-NEXT: call __unorddf2
+; RV64I-NEXT: snez a0, a0
+; RV64I-NEXT: or a0, a0, s2
+; RV64I-NEXT: bnez a0, .LBB0_2
+; RV64I-NEXT: # %bb.1: # %start
+; RV64I-NEXT: mv s0, s1
+; RV64I-NEXT: .LBB0_2: # %start
+; RV64I-NEXT: mv a0, s0
+; RV64I-NEXT: call fmin
+; RV64I-NEXT: ld ra, 24(sp) # 8-byte Folded Reload
+; RV64I-NEXT: ld s0, 16(sp) # 8-byte Folded Reload
+; RV64I-NEXT: ld s1, 8(sp) # 8-byte Folded Reload
+; RV64I-NEXT: ld s2, 0(sp) # 8-byte Folded Reload
+; RV64I-NEXT: .cfi_restore ra
+; RV64I-NEXT: .cfi_restore s0
+; RV64I-NEXT: .cfi_restore s1
+; RV64I-NEXT: .cfi_restore s2
+; RV64I-NEXT: addi sp, sp, 32
+; RV64I-NEXT: .cfi_def_cfa_offset 0
+; RV64I-NEXT: ret
+;
+; RV64D-LABEL: max:
+; RV64D: # %bb.0: # %start
+; RV64D-NEXT: flt.d a0, fa0, fa1
+; RV64D-NEXT: feq.d a1, fa0, fa0
+; RV64D-NEXT: xori a1, a1, 1
+; RV64D-NEXT: or a0, a1, a0
+; RV64D-NEXT: bnez a0, .LBB0_2
+; RV64D-NEXT: # %bb.1: # %start
+; RV64D-NEXT: fmv.d fa1, fa0
+; RV64D-NEXT: .LBB0_2: # %start
+; RV64D-NEXT: fmin.d fa0, fa1, fa1
+; RV64D-NEXT: ret
+start:
+ %2 = fcmp olt double %0, %1
+ %3 = fcmp uno double %0, 0.000000e+00
+ %or.cond.i.i = or i1 %3, %2
+ %4 = select i1 %or.cond.i.i, double %1, double %0
+ %5 = tail call double @llvm.canonicalize.f64(double %4) #2
+ ret double %5
+}
|
| ; RV64-SOFT-NEXT: .cfi_offset ra, -8 | ||
| ; RV64-SOFT-NEXT: slli a0, a0, 48 | ||
| ; RV64-SOFT-NEXT: srli a0, a0, 48 | ||
| ; RV64-SOFT-NEXT: call __extendhfsf2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The extension probably does the job already, the minnum should be redundant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks to me that the sign is preserved, so I suppose it's fine to drop the minnum
| // zero |
and
| if (test__extendhfsf2(fromRep16(0x8000), |
Legalizing node: t5: f16 = fcanonicalize nnan t4
Analyzing result type: f16
Soft promote half result 0: t5: f16 = fcanonicalize nnan t4
Creating new node: t11: f32 = fp16_to_fp nnan t3
Creating new node: t12: f32 = fcanonicalize nnan t11
Creating new node: t13: i16 = fp_to_fp16 nnan t12
This can be done in the DAG Combiner, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from z3 import *
f16 = FP('f16', Float16())
val1 = FPVal(1.0, Float32())
def extend(rm, var):
return fpFPToFP(rm, var, Float32())
def trunc(rm, var):
return fpFPToFP(rm, var, Float16())
def src(rm):
return trunc(rm, fpMul(rm, extend(rm, f16), val1))
def prove(f):
s = Solver()
s.add(Not(f))
if s.check() == unsat:
print("proved")
else:
print("failed to prove")
tgt = f16
prove(src(RTZ()) == tgt)
prove(src(RTN()) == tgt)
prove(src(RTP()) == tgt)
prove(src(RNE()) == tgt)
prove(src(RNA()) == tgt)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I still don't understand is that if that's true then fcanonalize always becomes no-op.
define half @fcanonicalize_f16(half %x) {
%z = call nnan half @llvm.canonicalize.f16(half %x)
ret half %z
}
|
|
||
| SDValue DAGTypeLegalizer::SoftenFloatRes_FCANONICALIZE(SDNode *N) { | ||
| SDLoc dl(N); | ||
| auto Node = DAG.getNode(ISD::FMINIMUMNUM, dl, N->getValueType(0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no auto
| ; RV64-SOFT-NEXT: srli a0, a0, 48 | ||
| ; RV64-SOFT-NEXT: call __extendhfsf2 | ||
| ; RV64-SOFT-NEXT: mv a1, a0 | ||
| ; RV64-SOFT-NEXT: call fminimum_numf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this function always exist in glibc? Or does this just become a linker error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this says it's present after glibc 2.34? I'm not sure. https://www.gnu.org/software/gnulib/manual/html_node/fminimum_005fnumf.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was added 2.35, so it exists since 2022-02-03
<math.h> functions for floating-point maximum and minimum,
corresponding to new operations in IEEE 754-2019, and corresponding
<tgmath.h> macros, are added from draft ISO C2X: fmaximum,
fmaximum_num, fmaximum_mag, fmaximum_mag_num, fminimum, fminimum_num,
fminimum_mag, fminimum_mag_num and corresponding functions for float,
long double, _FloatN and _FloatNx.
https://sourceware.org/pipermail/libc-announce/2022/000033.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to consider musl, llvm libc, newlib, etc.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Musl and newlib do not seem to support it, whereas, llvm libc does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it in the latest commit to MUL * 1.0 to avoid issues with compatibility.
| return DAG.getNode(ISD::AND, SDLoc(N), NVT, Op, Mask); | ||
| } | ||
|
|
||
| SDValue DAGTypeLegalizer::SoftenFloatRes_FCANONICALIZE(SDNode *N) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code feels like a series of hacks to reuse existing Softening functions.
Can we do this instead. It's the code from LegalizeDAG with a BitConvertToInteger at the end.
SDValue DAGTypeLegalizer::SoftenFloatRes_FCANONICALIZE(SDNode *N) {
SDLoc dl(N);
// This implements llvm.canonicalize.f* by multiplication with 1.0, as
// suggested in
// https://llvm.org/docs/LangRef.html#llvm-canonicalize-intrinsic.
// It uses strict_fp operations even outside a strict_fp context in order
// to guarantee that the canonicalization is not optimized away by later
// passes. The result chain introduced by that is intentionally ignored
// since no ordering requirement is intended here.
// Create strict multiplication by 1.0.
SDValue Operand = N->getOperand(0);
EVT VT = Operand.getValueType();
SDValue One = DAG.getConstantFP(1.0, dl, VT);
SDValue Chain = DAG.getEntryNode();
// Propagate existing flags on canonicalize, and additionally set
// NoFPExcept.
SDNodeFlags CanonicalizeFlags = N->getFlags();
CanonicalizeFlags.setNoFPExcept(true);
SDValue Mul = DAG.getNode(ISD::STRICT_FMUL, dl, {VT, MVT::Other},
{Chain, Operand, One}, CanonicalizeFlags);
return BitConvertToInteger(Mul);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, thank you that's much better
you were right, I tried to bend the API to make it work like the other methods :)
topperc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but maybe wait a couple days for @arsenm to review
|
Sure, I have no commit access so I can't merge it anyways :) |
The
ISD::FCANONICALIZEis mapped tollvm.minnum(x, x).Closes #169216