[Hexagon] Add an option to use fast FP to int convert for some HVX cases #169562
✅ With the latest revision this PR passed the C/C++ code formatter.
@llvm/pr-subscribers-backend-hexagon
Author: Fateme Hosseini (fhossein-quic)
Changes
Lowering several flavors of fptosi for HVX can be done faster, but the faster lowering violates the C/C++ convention on some arch tags. Nevertheless, customers are using direct intrinsics with the "incorrect" rounding mode and want the compiler to do the same. Default behavior is not changed.
Patch By: Fateme Hosseini
Full diff: https://github.com/llvm/llvm-project/pull/169562.diff
2 Files Affected:
diff --git a/llvm/lib/Target/Hexagon/HexagonISelLoweringHVX.cpp b/llvm/lib/Target/Hexagon/HexagonISelLoweringHVX.cpp
index 212a57bc7cde5..0b782d79237da 100644
--- a/llvm/lib/Target/Hexagon/HexagonISelLoweringHVX.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonISelLoweringHVX.cpp
@@ -31,6 +31,10 @@ static cl::opt<unsigned> HvxWidenThreshold("hexagon-hvx-widen",
cl::Hidden, cl::init(16),
cl::desc("Lower threshold (in bytes) for widening to HVX vectors"));
+static cl::opt<bool>
+ EnableFpFastConvert("hexagon-fp-fast-convert", cl::Hidden, cl::init(false),
+ cl::desc("Enable FP fast conversion routine."));
+
static const MVT LegalV64[] = { MVT::v64i8, MVT::v32i16, MVT::v16i32 };
static const MVT LegalW64[] = { MVT::v128i8, MVT::v64i16, MVT::v32i32 };
static const MVT LegalV128[] = { MVT::v128i8, MVT::v64i16, MVT::v32i32 };
@@ -2970,6 +2974,32 @@ HexagonTargetLowering::ExpandHvxFpToInt(SDValue Op, SelectionDAG &DAG) const {
MVT ResTy = ty(Op);
assert(InpTy.changeTypeToInteger() == ResTy);
+ // At this point this is an experiment under a flag.
+ // In arch before V81 the rounding mode is towards nearest value.
+ // The C/C++ standard requires rounding towards zero:
+ // C (C99 and later): ISO/IEC 9899:2018 (C18), section 6.3.1.4 — "When a
+ // finite value of real floating type is converted to an integer type, the
+ // fractional part is discarded (i.e., the value is truncated toward zero)."
+ // C++: ISO/IEC 14882:2020 (C++20), section 7.3.7 — "A prvalue of a
+ // floating-point type can be converted to a prvalue of an integer type. The
+ // conversion truncates; that is, the fractional part is discarded."
+ if (InpTy == MVT::v64f16) {
+ if (Subtarget.useHVXV81Ops()) {
+ // This is c/c++ compliant
+ SDValue ConvVec =
+ getInstr(Hexagon::V6_vconv_h_hf_rnd, dl, ResTy, {Op0}, DAG);
+ return ConvVec;
+ } else if (EnableFpFastConvert) {
+ // Vd32.h=Vu32.hf same as Q6_Vh_equals_Vhf
+ SDValue ConvVec = getInstr(Hexagon::V6_vconv_h_hf, dl, ResTy, {Op0}, DAG);
+ return ConvVec;
+ }
+ } else if (EnableFpFastConvert && InpTy == MVT::v32f32) {
+ // Vd32.w=Vu32.sf same as Q6_Vw_equals_Vsf
+ SDValue ConvVec = getInstr(Hexagon::V6_vconv_w_sf, dl, ResTy, {Op0}, DAG);
+ return ConvVec;
+ }
+
// int32_t conv_f32_to_i32(uint32_t inp) {
// // s | exp8 | frac23
//
diff --git a/llvm/test/CodeGen/Hexagon/autohvx/fp-to-int_2.ll b/llvm/test/CodeGen/Hexagon/autohvx/fp-to-int_2.ll
new file mode 100644
index 0000000000000..d4e3de1bc27b6
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/autohvx/fp-to-int_2.ll
@@ -0,0 +1,31 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=hexagon -hexagon-hvx-widen=32 -hexagon-fp-fast-convert=true -mattr=+hvxv68,+hvx-length128b,+hvx-qfloat < %s | FileCheck %s
+
+target datalayout = "e-m:e-p:32:32:32-a:0-n16:32-i64:64:64-i32:32:32-i16:16:16-i1:8:8-f32:32:32-f64:64:64-v32:32:32-v64:64:64-v512:512:512-v1024:1024:1024-v2048:2048:2048"
+target triple = "hexagon"
+
+; f16 -> s16
+; No widening
+define void @f16s16_0(ptr %a0, ptr %a1) #0 {
+; CHECK-LABEL: f16s16_0:
+; CHECK: {
+; CHECK: [[DST:v[0-9]+]].h = [[SRC:v[0-9]+]].hf
+; CHECK-NEXT: jumpr r31
+; CHECK: vmem(r1+#0) = [[DST]].new
+; CHECK-NEXT: }
+
+ %v0 = load <64 x half>, ptr %a0, align 128
+ %v1 = fptosi <64 x half> %v0 to <64 x i16>
+ store <64 x i16> %v1, ptr %a1, align 128
+ ret void
+}
+
+; Widen result #2
+define void @f32s8_2(ptr %a0, ptr %a1) {
+; CHECK-LABEL: f32s8_2:
+; CHECK: v{{.*}}.w = v{{.*}}.sf
+ %v0 = load <32 x float>, ptr %a0, align 128
+ %v1 = fptosi <32 x float> %v0 to <32 x i8>
+ store <32 x i8> %v1, ptr %a1, align 128
+ ret void
+}
Lowering several flavors of fptosi for HVX can be done faster, but the faster lowering violates the C/C++ convention on some arch tags. Nevertheless, customers are using direct intrinsics with the "incorrect" rounding mode and want the compiler to do the same.
Default behavior is not changed.
Patch By: Fateme Hosseini
Co-authored-by: Sergei Larin <[email protected]>
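For readers unfamiliar with why the rounding mode matters here, the following is a minimal scalar sketch of the difference between the truncating conversion that C/C++ requires for fptosi and a round-to-nearest conversion. It is not HVX code: the function names are hypothetical, and `std::nearbyint` (under the default rounding mode) is only an assumed stand-in for the "towards nearest value" behavior mentioned in the patch comment. The PR does not state how the hardware instruction breaks ties, so tie cases are deliberately left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// C/C++ semantics for a float -> integer conversion (what fptosi models):
// the fractional part is discarded, i.e. the value is truncated toward zero.
inline std::int32_t convert_truncating(float x) {
  return static_cast<std::int32_t>(std::trunc(x));
}

// Hypothetical stand-in for a round-to-nearest conversion. std::nearbyint
// rounds using the current floating-point rounding mode, which defaults to
// round-to-nearest. This is an assumption for illustration only, not a model
// of any specific HVX instruction.
inline std::int32_t convert_nearest(float x) {
  return static_cast<std::int32_t>(std::nearbyint(x));
}

// Small self-check showing where the two conversions agree and disagree.
inline void demo_rounding_difference() {
  // Fractional part below one half: both conversions agree.
  assert(convert_truncating(1.2f) == 1);
  assert(convert_nearest(1.2f) == 1);

  // Fractional part above one half: the results differ by one.
  assert(convert_truncating(2.7f) == 2);
  assert(convert_nearest(2.7f) == 3);

  // The same holds for negative inputs.
  assert(convert_truncating(-2.7f) == -2);
  assert(convert_nearest(-2.7f) == -3);
}
```

Inputs whose fractional part exceeds one half are the ones where the two conversions disagree, which is why the fast path is opt-in and the default lowering is left unchanged.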