[libc][math][c23] Add rsqrtf16() function #137545

amemov · 2025-04-27T19:57:10Z

github-actions · 2025-04-27T19:59:34Z

✅ With the latest revision this PR passed the C/C++ code formatter.

amemov · 2025-04-30T20:30:32Z

I'm trying to figure out the best way to compute the result.
So far, the current polynomial produces the fewest errors (higher-degree ones yield negligible improvement):
P = fpminimax(1/sqrt(x), [|0,1,2,3,4,5|], [|SG...|], [0.5, 1]);
Its error is 1.0 ULP.

Also found this already existing implementation:

llvm-project/libc/src/__support/fixed_point/sqrt.h

Line 39 in ae6b4b2

// P = fpminimax(sqrt(x), 1, [|8, 8|], [i * 2^-4, (i + 1)*2^-4],

It touches on some other interesting techniques that I came across during my research, specifically Newton's method.

Update: I tried adding 2 iterations of Newton's method. Each iteration significantly reduced the number of errors, but some still remain.
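
For readers unfamiliar with the refinement mentioned above, here is a minimal sketch of Newton's method applied to 1/sqrt(x). It is an illustration only, not the code from this PR: the names `rsqrt_newton_step` and `rsqrt_refine` are hypothetical, and the initial guess is passed in by the caller rather than coming from a polynomial.

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson step for f(y) = 1/y^2 - x, whose positive root is
// 1/sqrt(x):  y_next = y * (1.5 - 0.5 * x * y^2).
inline float rsqrt_newton_step(float x, float y) {
  return y * (1.5f - 0.5f * x * y * y);
}

// Refine an initial guess y0 with `iterations` Newton steps.
// The iteration converges for 0 < y0 < sqrt(3 / x); the closer the guess,
// the fewer steps are needed, since the error roughly squares at each step.
inline float rsqrt_refine(float x, float y0, int iterations) {
  assert(x > 0.0f && y0 > 0.0f);
  float y = y0;
  for (int i = 0; i < iterations; ++i)
    y = rsqrt_newton_step(x, y);
  return y;
}
```

For example, `rsqrt_refine(0.75f, 1.0f, 1)` gives 1.125, and each further step moves the value closer to 1/sqrt(0.75), which is why a good starting polynomial lets one or two steps suffice.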

llvmbot · 2025-09-13T01:08:34Z

@llvm/pr-subscribers-libc

Author: Anton Shepelev (amemov)

Changes

Addresses #132818
Part of #95250

Patch is 22.13 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/137545.diff

22 Files Affected:

(modified) libc/config/linux/x86_64/entrypoints.txt (+1)
(modified) libc/docs/headers/math/index.rst (+3-1)
(modified) libc/include/math.yaml (+7)
(modified) libc/shared/math.h (+2)
(added) libc/shared/math/rsqrtf16.h (+29)
(modified) libc/src/__support/math/CMakeLists.txt (+16)
(added) libc/src/__support/math/rsqrtf16.h (+139)
(modified) libc/src/math/CMakeLists.txt (+2)
(modified) libc/src/math/generic/CMakeLists.txt (+12-1)
(added) libc/src/math/generic/rsqrtf16.cpp (+15)
(added) libc/src/math/rsqrtf16.h (+21)
(modified) libc/test/shared/CMakeLists.txt (+2)
(modified) libc/test/shared/shared_math_test.cpp (+2)
(modified) libc/test/src/math/CMakeLists.txt (+11)
(added) libc/test/src/math/rsqrtf16_test.cpp (+42)
(modified) libc/test/src/math/smoke/CMakeLists.txt (+11)
(added) libc/test/src/math/smoke/rsqrtf16_test.cpp (+37)
(modified) libc/utils/MPFRWrapper/MPCommon.cpp (+6)
(modified) libc/utils/MPFRWrapper/MPCommon.h (+1)
(modified) libc/utils/MPFRWrapper/MPFRUtils.cpp (+2)
(modified) libc/utils/MPFRWrapper/MPFRUtils.h (+1)
(modified) utils/bazel/llvm-project-overlay/libc/BUILD.bazel (+25)

diff --git a/libc/config/linux/x86_64/entrypoints.txt b/libc/config/linux/x86_64/entrypoints.txt
index 1fef16f190af6..0bb8a683c5b01 100644
--- a/libc/config/linux/x86_64/entrypoints.txt
+++ b/libc/config/linux/x86_64/entrypoints.txt
@@ -784,6 +784,7 @@ if(LIBC_TYPES_HAS_FLOAT16)
     libc.src.math.rintf16
     libc.src.math.roundevenf16
     libc.src.math.roundf16
+    libc.src.math.rsqrtf16
     libc.src.math.scalblnf16
     libc.src.math.scalbnf16
     libc.src.math.setpayloadf16
diff --git a/libc/docs/headers/math/index.rst b/libc/docs/headers/math/index.rst
index 6c0e2190808df..7d5b341ba674a 100644
--- a/libc/docs/headers/math/index.rst
+++ b/libc/docs/headers/math/index.rst
@@ -255,6 +255,7 @@ Basic Operations
 Higher Math Functions
 =====================
 
+
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+----------++------------+------------------------+----------------------------+
 | <Func>    | <Func_f> (float) | <Func> (double) | <Func_l> (long double) | <Func_f16> (float16) | <Func_f128> (float128) | <Func_bf16> (bfloat16) | C23 Definition Section | C23 Error Handling Section |
 +===========+==================+=================+========================+======================+========================+========================+========================+============================+
@@ -342,7 +343,7 @@ Higher Math Functions
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+------------------------+----------------------------+
 | rootn     |                  |                 |                        |                      |                        |                        | 7.12.7.8               | F.10.4.8                   |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+------------------------+----------------------------+
-| rsqrt     |                  |                 |                        |                      |                        |                        | 7.12.7.9               | F.10.4.9                   |
+| rsqrt     |                  |                 |                        | |check|              |                        |                        | 7.12.7.9               | F.10.4.9                   |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+------------------------+----------------------------+
 | sin       | |check|          | |check|         |                        | |check|              |                        |                        | 7.12.4.6               | F.10.1.6                   |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+------------------------+----------------------------+
@@ -363,6 +364,7 @@ Higher Math Functions
 | tgamma    |                  |                 |                        |                      |                        |                        | 7.12.8.4               | F.10.5.4                   |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+------------------------+----------------------------+
 
+
 Legends:
 
 * |check| : correctly rounded for all 4 rounding modes.
diff --git a/libc/include/math.yaml b/libc/include/math.yaml
index 17f26fcfcb308..6c800a0e2aa28 100644
--- a/libc/include/math.yaml
+++ b/libc/include/math.yaml
@@ -2349,6 +2349,13 @@ functions:
     return_type: long double
     arguments:
       - type: long double
+  - name: rsqrtf16
+    standards:
+      - stdc
+    return_type: _Float16
+    arguments:
+      - type: _Float16
+    guard: LIBC_TYPES_HAS_FLOAT16
   - name: scalbln
     standards:
       - stdc
diff --git a/libc/shared/math.h b/libc/shared/math.h
index 69d785b3e0291..4f20095912bf1 100644
--- a/libc/shared/math.h
+++ b/libc/shared/math.h
@@ -53,4 +53,6 @@
 #include "math/ldexpf128.h"
 #include "math/ldexpf16.h"
 
+#include "math/rsqrtf16.h"
+
 #endif // LLVM_LIBC_SHARED_MATH_H
diff --git a/libc/shared/math/rsqrtf16.h b/libc/shared/math/rsqrtf16.h
new file mode 100644
index 0000000000000..54c7499214636
--- /dev/null
+++ b/libc/shared/math/rsqrtf16.h
@@ -0,0 +1,29 @@
+//===-- Shared rsqrtf16 function -------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SHARED_MATH_RSQRTF16_H
+#define LLVM_LIBC_SHARED_MATH_RSQRTF16_H
+
+#include "include/llvm-libc-macros/float16-macros.h"
+
+#ifdef LIBC_TYPES_HAS_FLOAT16
+
+#include "shared/libc_common.h"
+#include "src/__support/math/rsqrtf16.h"
+
+namespace LIBC_NAMESPACE_DECL {
+namespace shared {
+
+using math::rsqrtf16;
+
+} // namespace shared
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif // LIBC_TYPES_HAS_FLOAT16
+
+#endif // LLVM_LIBC_SHARED_MATH_RSQRTF16_H
diff --git a/libc/src/__support/math/CMakeLists.txt b/libc/src/__support/math/CMakeLists.txt
index 39dc0e57f4472..ed5f314b0a9b5 100644
--- a/libc/src/__support/math/CMakeLists.txt
+++ b/libc/src/__support/math/CMakeLists.txt
@@ -109,6 +109,22 @@ add_header_library(
     libc.src.__support.macros.properties.types
 )
 
+
+add_header_library(
+  rsqrtf16
+  HDRS
+    rsqrtf16.h
+  DEPENDS
+    libc.src.__support.FPUtil.cast
+    libc.src.__support.FPUtil.fenv_impl
+    libc.src.__support.FPUtil.fp_bits
+    libc.src.__support.FPUtil.multiply_add
+    libc.src.__support.FPUtil.polyeval
+    libc.src.__support.FPUtil.manipulation_functions
+    libc.src.__support.macros.optimization
+    libc.src.__support.macros.properties.types
+)
+
 add_header_library(
   asin_utils
   HDRS
diff --git a/libc/src/__support/math/rsqrtf16.h b/libc/src/__support/math/rsqrtf16.h
new file mode 100644
index 0000000000000..b410f258450d8
--- /dev/null
+++ b/libc/src/__support/math/rsqrtf16.h
@@ -0,0 +1,139 @@
+//===-- Implementation header for rsqrtf16 ----------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC___SUPPORT_MATH_RSQRTF16_H
+#define LLVM_LIBC_SRC___SUPPORT_MATH_RSQRTF16_H
+
+#include "include/llvm-libc-macros/float16-macros.h"
+
+#ifdef LIBC_TYPES_HAS_FLOAT16
+
+#include "src/__support/FPUtil/FEnvImpl.h"
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/FPUtil/ManipulationFunctions.h"
+#include "src/__support/FPUtil/PolyEval.h"
+#include "src/__support/FPUtil/cast.h"
+#include "src/__support/FPUtil/multiply_add.h"
+#include "src/__support/macros/optimization.h"
+
+namespace LIBC_NAMESPACE_DECL {
+namespace math {
+
+static constexpr float16 rsqrtf16(float16 x) {
+  using FPBits = fputil::FPBits<float16>;
+  FPBits xbits(x);
+
+  uint16_t x_u = xbits.uintval();
+  uint16_t x_abs = x_u & 0x7fff;
+  uint16_t x_sign = x_u >> 15;
+
+  // x is NaN
+  if (LIBC_UNLIKELY(xbits.is_nan())) {
+    if (xbits.is_signaling_nan()) {
+      fputil::raise_except_if_required(FE_INVALID);
+      return FPBits::quiet_nan().get_val();
+    }
+    return x;
+  }
+
+  // |x| = 0
+  if (LIBC_UNLIKELY(x_abs == 0x0)) {
+    fputil::raise_except_if_required(FE_DIVBYZERO);
+    fputil::set_errno_if_required(ERANGE);
+    return FPBits::inf(Sign::POS).get_val();
+  }
+
+  // -inf <= x < 0
+  if (LIBC_UNLIKELY(x_sign == 1)) {
+    fputil::raise_except_if_required(FE_INVALID);
+    fputil::set_errno_if_required(EDOM);
+    return FPBits::quiet_nan().get_val();
+  }
+
+  // x = +inf => rsqrt(x) = 0
+  if (LIBC_UNLIKELY(xbits.is_inf())) {
+    return fputil::cast<float16>(0.0f);
+  }
+
+  // x is valid, estimate the result
+  // Range reduction:
+  // x can be expressed as m*2^e, where e - int exponent and m - mantissa
+  // rsqrtf16(x) = rsqrtf16(m*2^e)
+  // rsqrtf16(m*2^e) = 1/sqrt(m) * 1/sqrt(2^e) = 1/sqrt(m) * 1/2^(e/2)
+  // 1/sqrt(m) * 1/2^(e/2) = 1/sqrt(m) * 2^(-e/2)
+
+  // Compute in float throughout to minimize cost while preserving accuracy.
+  float xf = x;
+  int exponent = 0;
+  float mantissa = fputil::frexp(xf, exponent);
+
+  float result = 0.0f;
+  int exp_floored = -(exponent >> 1);
+
+  if (mantissa == 0.5f) {
+    // When mantissa is 0.5f, x was a power of 2 (or subnormal that normalizes
+    // this way). 1/sqrt(0.5f) = sqrt(2.0f).
+    // If exponent is odd (exponent = 2k + 1):
+    //   rsqrt(x) = (1/sqrt(0.5)) * 2^(-(2k+1)/2) = sqrt(2) * 2^(-k-0.5)
+    //            = sqrt(2) * 2^(-k) * (1/sqrt(2)) = 2^(-k)
+    //   exp_floored = -((2k+1)>>1) = -(k) = -k
+    //   So result = ldexp(1.0f, exp_floored)
+    // If exponent is even (exponent = 2k):
+    //   rsqrt(x) = (1/sqrt(0.5)) * 2^(-2k/2) = sqrt(2) * 2^(-k)
+    //   exp_floored = -((2k)>>1) = -(k) = -k
+    //   So result = ldexp(sqrt(2.0f), exp_floored)
+    if (exponent & 1) {
+      result = fputil::ldexp(1.0f, exp_floored);
+    } else {
+      constexpr float SQRT_2_F = 0x1.6a09e6p0f; // sqrt(2.0f)
+      result = fputil::ldexp(SQRT_2_F, exp_floored);
+    }
+  } else {
+    // Degree-5 polynomial (float coefficients) generated with Sollya:
+    // P = fpminimax(1/sqrt(x) + 2^-28, 5, [|single...|], [0.5,1])
+    float y =
+        fputil::polyeval(mantissa, 0x1.9c81fap1f, -0x1.e2c63ap2f, 0x1.91e9b8p3f,
+                         -0x1.899abep3f, 0x1.9eddeap2f, -0x1.6bdb48p0f);
+
+    // Newton-Raphson iteration in float (use multiply_add to leverage FMA when
+    // available):
+    float y2 = y * y;
+    float factor = fputil::multiply_add(-0.5f * mantissa, y2, 1.5f);
+    y = y * factor;
+
+    result = fputil::ldexp(y, exp_floored);
+    if (exponent & 1) {
+      constexpr float ONE_OVER_SQRT2 = 0x1.6a09e6p-1f; // 1/sqrt(2)
+      result *= ONE_OVER_SQRT2;
+    }
+
+    // Targeted post-correction: for the specific half-precision mantissa
+    // pattern M == 0x011F we observe a consistent -1 ULP bias across exponents.
+    // Apply a tiny upward nudge to cross the rounding boundary in all modes.
+    const uint16_t half_mantissa = static_cast<uint16_t>(x_abs & 0x3ff);
+    if (half_mantissa == 0x011F) {
+      // Nudge up to fix consistent -1 ULP at that mantissa boundary
+      result = fputil::multiply_add(result, 0x1.0p-21f,
+                                    result); // result *= (1 + 2^-21)
+    } else if (half_mantissa == 0x0313) {
+      // Nudge down to fix +1 ULP under upward rounding at this mantissa
+      // boundary
+      result = fputil::multiply_add(result, -0x1.0p-21f,
+                                    result); // result *= (1 - 2^-21)
+    }
+  }
+
+  return fputil::cast<float16>(result);
+}
+
+} // namespace math
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif // LIBC_TYPES_HAS_FLOAT16
+
+#endif // LLVM_LIBC_SRC___SUPPORT_MATH_RSQRTF16_H
diff --git a/libc/src/math/CMakeLists.txt b/libc/src/math/CMakeLists.txt
index e418a8b0e24b9..a6f400c873b7e 100644
--- a/libc/src/math/CMakeLists.txt
+++ b/libc/src/math/CMakeLists.txt
@@ -516,6 +516,8 @@ add_math_entrypoint_object(roundevenf16)
 add_math_entrypoint_object(roundevenf128)
 add_math_entrypoint_object(roundevenbf16)
 
+add_math_entrypoint_object(rsqrtf16)
+
 add_math_entrypoint_object(scalbln)
 add_math_entrypoint_object(scalblnf)
 add_math_entrypoint_object(scalblnl)
diff --git a/libc/src/math/generic/CMakeLists.txt b/libc/src/math/generic/CMakeLists.txt
index 263c5dfd0832b..ca7baeccae01a 100644
--- a/libc/src/math/generic/CMakeLists.txt
+++ b/libc/src/math/generic/CMakeLists.txt
@@ -973,7 +973,7 @@ add_entrypoint_object(
 )
 
 add_entrypoint_object(
-    roundevenbf16
+  roundevenbf16
   SRCS
     roundevenbf16.cpp
   HDRS
@@ -988,6 +988,17 @@ add_entrypoint_object(
     ROUND_OPT
 )
 
+add_entrypoint_object(
+  rsqrtf16
+  SRCS
+    rsqrtf16.cpp
+  HDRS
+    ../rsqrtf16.h
+  DEPENDS
+    libc.src.__support.math.rsqrtf16
+    libc.src.errno.errno
+)
+
 add_entrypoint_object(
   lround
   SRCS
diff --git a/libc/src/math/generic/rsqrtf16.cpp b/libc/src/math/generic/rsqrtf16.cpp
new file mode 100644
index 0000000000000..fb166b131d673
--- /dev/null
+++ b/libc/src/math/generic/rsqrtf16.cpp
@@ -0,0 +1,15 @@
+//===-- Half-precision rsqrt function -------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception.
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/math/rsqrtf16.h"
+#include "src/__support/math/rsqrtf16.h"
+
+namespace LIBC_NAMESPACE_DECL {
+
+LLVM_LIBC_FUNCTION(float16, rsqrtf16, (float16 x)) { return math::rsqrtf16(x); }
+} // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/src/math/rsqrtf16.h b/libc/src/math/rsqrtf16.h
new file mode 100644
index 0000000000000..c88ab5256ce88
--- /dev/null
+++ b/libc/src/math/rsqrtf16.h
@@ -0,0 +1,21 @@
+//===-- Implementation header for rsqrtf16 ----------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_MATH_RSQRTF16_H
+#define LLVM_LIBC_SRC_MATH_RSQRTF16_H
+
+#include "src/__support/macros/config.h"
+#include "src/__support/macros/properties/types.h"
+
+namespace LIBC_NAMESPACE_DECL {
+
+float16 rsqrtf16(float16 x);
+
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif // LLVM_LIBC_SRC_MATH_RSQRTF16_H
diff --git a/libc/test/shared/CMakeLists.txt b/libc/test/shared/CMakeLists.txt
index 48241d3f55287..495d6f0a81a4c 100644
--- a/libc/test/shared/CMakeLists.txt
+++ b/libc/test/shared/CMakeLists.txt
@@ -48,4 +48,6 @@ add_fp_unittest(
     libc.src.__support.math.ldexpf
     libc.src.__support.math.ldexpf128
     libc.src.__support.math.ldexpf16
+    libc.src.__support.math.rsqrtf16
+
 )
diff --git a/libc/test/shared/shared_math_test.cpp b/libc/test/shared/shared_math_test.cpp
index 2e5a2d51146d4..aa459f88c29f5 100644
--- a/libc/test/shared/shared_math_test.cpp
+++ b/libc/test/shared/shared_math_test.cpp
@@ -17,6 +17,8 @@ TEST(LlvmLibcSharedMathTest, AllFloat16) {
 
   EXPECT_FP_EQ(0x0p+0f16, LIBC_NAMESPACE::shared::acoshf16(1.0f16));
   EXPECT_FP_EQ(0x0p+0f16, LIBC_NAMESPACE::shared::acospif16(1.0f16));
+  EXPECT_FP_EQ(0x1p+0f16, LIBC_NAMESPACE::shared::rsqrtf16(1.0f16));
+
   EXPECT_FP_EQ(0x0p+0f16, LIBC_NAMESPACE::shared::asinf16(0.0f16));
   EXPECT_FP_EQ(0x0p+0f16, LIBC_NAMESPACE::shared::asinhf16(0.0f16));
   EXPECT_FP_EQ(0x0p+0f16, LIBC_NAMESPACE::shared::atanf16(0.0f16));
diff --git a/libc/test/src/math/CMakeLists.txt b/libc/test/src/math/CMakeLists.txt
index 378eadcf9e70b..9d644703a61ae 100644
--- a/libc/test/src/math/CMakeLists.txt
+++ b/libc/test/src/math/CMakeLists.txt
@@ -1678,6 +1678,17 @@ add_fp_unittest(
     libc.src.math.sqrtl
 )
 
+add_fp_unittest(
+  rsqrtf16_test
+  NEED_MPFR
+  SUITE
+    libc-math-unittests
+  SRCS
+    rsqrtf16_test.cpp
+  DEPENDS
+    libc.src.math.rsqrtf16
+)
+
 add_fp_unittest(
   sqrtf16_test
   NEED_MPFR
diff --git a/libc/test/src/math/rsqrtf16_test.cpp b/libc/test/src/math/rsqrtf16_test.cpp
new file mode 100644
index 0000000000000..d2f3fe8f49b92
--- /dev/null
+++ b/libc/test/src/math/rsqrtf16_test.cpp
@@ -0,0 +1,42 @@
+//===-- Exhaustive test for rsqrtf16 --------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/math/rsqrtf16.h"
+#include "test/UnitTest/FPMatcher.h"
+#include "test/UnitTest/Test.h"
+#include "utils/MPFRWrapper/MPFRUtils.h"
+
+using LlvmLibcRsqrtf16Test = LIBC_NAMESPACE::testing::FPTest<float16>;
+
+namespace mpfr = LIBC_NAMESPACE::testing::mpfr;
+
+// Range: [0, Inf]
+static constexpr uint16_t POS_START = 0x0000U;
+static constexpr uint16_t POS_STOP = 0x7c00U;
+
+// Range: [-Inf, 0]
+static constexpr uint16_t NEG_START = 0x8000U;
+static constexpr uint16_t NEG_STOP = 0xfc00U;
+
+TEST_F(LlvmLibcRsqrtf16Test, PositiveRange) {
+  for (uint16_t v = POS_START; v <= POS_STOP; ++v) {
+    float16 x = FPBits(v).get_val();
+
+    EXPECT_MPFR_MATCH_ALL_ROUNDING(mpfr::Operation::Rsqrt, x,
+                                   LIBC_NAMESPACE::rsqrtf16(x), 0.5);
+  }
+}
+
+TEST_F(LlvmLibcRsqrtf16Test, NegativeRange) {
+  for (uint16_t v = NEG_START; v <= NEG_STOP; ++v) {
+    float16 x = FPBits(v).get_val();
+
+    EXPECT_MPFR_MATCH_ALL_ROUNDING(mpfr::Operation::Rsqrt, x,
+                                   LIBC_NAMESPACE::rsqrtf16(x), 0.5);
+  }
+}
diff --git a/libc/test/src/math/smoke/CMakeLists.txt b/libc/test/src/math/smoke/CMakeLists.txt
index b8d5ecf4d77e5..93243e0ca9e5a 100644
--- a/libc/test/src/math/smoke/CMakeLists.txt
+++ b/libc/test/src/math/smoke/CMakeLists.txt
@@ -3502,6 +3502,17 @@ add_fp_unittest(
     libc.src.math.sqrtl
 )
 
+add_fp_unittest(
+  rsqrtf16_test
+  SUITE
+    libc-math-smoke-tests
+  SRCS
+    rsqrtf16_test.cpp
+  DEPENDS
+    libc.src.errno.errno
+    libc.src.math.rsqrtf16
+)
+
 add_fp_unittest(
   sqrtf16_test
   SUITE
diff --git a/libc/test/src/math/smoke/rsqrtf16_test.cpp b/libc/test/src/math/smoke/rsqrtf16_test.cpp
new file mode 100644
index 0000000000000..a229ca6cdaaaf
--- /dev/null
+++ b/libc/test/src/math/smoke/rsqrtf16_test.cpp
@@ -0,0 +1,37 @@
+//===-- Unittests for rsqrtf16 --------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception.
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/__support/libc_errno.h"
+#include "src/math/rsqrtf16.h"
+#include "test/UnitTest/FPMatcher.h"
+#include "test/UnitTest/Test.h"
+
+using LlvmLibcRsqrtf16Test = LIBC_NAMESPACE::testing::FPTest<float16>;
+TEST_F(LlvmLibcRsqrtf16Test, SpecialNumbers) {
+  LIBC_NAMESPACE::libc_errno = 0;
+  EXPECT_FP_EQ(aNaN, LIBC_NAMESPACE::rsqrtf16(aNaN));
+  EXPECT_MATH_ERRNO(0);
+
+  EXPECT_FP_EQ_WITH_EXCEPTION(aNaN, LIBC_NAMESPACE::rsqrtf16(sNaN), FE_INVALID);
+  EXPECT_MATH_ERRNO(0);
+
+  EXPECT_FP_EQ(inf, LIBC_NAMESPACE::rsqrtf16(0.0f));
+  EXPECT_MATH_ERRNO(ERANGE);
+
+  EXPECT_FP_EQ(1.0f, LIBC_NAMESPACE::rsqrtf16(1.0f));
+  EXPECT_MATH_ERRNO(0);
+
+  EXPECT_FP_EQ(0.0f, LIBC_NAMESPACE::rsqrtf16(inf));
+  EXPECT_MATH_ERRNO(0);
+
+  EXPECT_FP_EQ(aNaN, LIBC_NAMESPACE::rsqrtf16(neg_inf));
+  EXPECT_MATH_ERRNO(EDOM);
+
+  EXPECT_FP_EQ(aNaN, LIBC_NAMESPACE::rsqrtf16(-2.0f));
+  EXPECT_MATH_ERRNO(EDOM);
+}
diff --git a/libc/utils/MPFRWrapper/MPCommon.cpp b/libc/utils/MPFRWrapper/MPCommon.cpp
index c255220774110..6b78bee6e7cae 100644
--- a/libc/utils/MPFRWrapper/MPCommon.cpp
+++ b/libc/utils/MPFRWrapper/MPCommon.cpp
@@ -393,6 +393,12 @@ MPFRNumber MPFRNumber::rint(mpfr_rnd_t rnd) const {
   return result;
 }
 
+MPFRNumber MPFRNumber::rsqrt() const {
+  MPFRNumber result(*this);
+  mpfr_rec_sqrt(result.value, value, mpfr_rounding);
+  return result;
+}
+
 MPFRNumber MPFRNumber::mod_2pi() const {
   MPFRNumber result(0.0, 1280);
   MPFRNumber _2pi(0.0, 1280);
diff --git a/libc/utils/MPFRWrapper/MPCommon.h b/libc/utils/MPFRWrapper/MPCommon.h
index 25bdc9bc00250..9f4107a7961d2 100644
--- a/libc/utils/MPFRWrapper/MPCommon.h
+++ b/libc/utils/MPFRWrapper/MPCommon.h
@@ -222,6 +222,7 @@ class MPFRNumber {
   bool round_to_long(long &result) const;
   bool round_to_long(mpfr_rnd_t rnd, long &result) const;
   MPFRNumber rint(mpfr_rnd_t rnd) const;
+  MPFRNu...
[truncated]
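
The range reduction described in the comments of libc/src/__support/math/rsqrtf16.h in the diff above can be sketched with standard-library calls. This is a minimal illustration under stated assumptions, not the patch's code: `rsqrt_range_reduced` is a hypothetical name, and `1.0f / std::sqrt(m)` stands in for the polynomial plus Newton step that the patch applies to the mantissa.

```cpp
#include <cassert>
#include <cmath>

// Write x = m * 2^e with m in [0.5, 1). Then
//   1/sqrt(x) = 1/sqrt(m) * 2^(-e/2).
// For even e = 2k the scale is 2^(-k); for odd e = 2k + 1 it is
// 2^(-k) * 1/sqrt(2).
inline float rsqrt_range_reduced(float x) {
  assert(x > 0.0f && std::isfinite(x));
  int e = 0;
  float m = std::frexp(x, &e);    // m in [0.5, 1)
  float r = 1.0f / std::sqrt(m);  // stand-in for the mantissa approximation
  int k = (e - (e & 1)) / 2;      // floor(e / 2), also for negative e
  r = std::ldexp(r, -k);          // multiply by 2^(-k)
  if (e & 1)
    r *= 0.70710678118f;          // extra 1/sqrt(2) for odd exponents
  return r;
}
```

Only the mantissa in [0.5, 1) ever reaches the approximation, which is why the Sollya polynomial is fitted on that interval.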

amemov · 2025-09-13T01:09:01Z

@overmighty @lntue

libc/src/__support/math/rsqrtf16.h

lntue · 2025-09-13T01:38:09Z

Can you compare the performance of this with

  fputil::cast<float16>(1.0f / fputil::sqrt(fputil::cast<float>(x)));

amemov · 2025-09-13T16:10:07Z

I wrote this test to check the performance of the implementation and ran the tests for rsqrtf16 a few times:

TEST_F(LlvmLibcRsqrtf16Test, PositiveRange_OneOverSqrtFputil) {
  for (uint16_t v = POS_START; v <= POS_STOP; ++v) {
    float16 x = FPBits(v).get_val();

    float16 y = LIBC_NAMESPACE::fputil::cast<float16, float>(
        1.0f / LIBC_NAMESPACE::fputil::sqrt<float, float>(
                   LIBC_NAMESPACE::fputil::cast<float, float16>(x)));

    EXPECT_MPFR_MATCH_ALL_ROUNDING(mpfr::Operation::Rsqrt, x, y, 1.0);
  }
}

It turns out that my implementation is ~3x slower than calling 1.0f / fputil::sqrt directly.
I'm not sure why: either there are too many branches, or the approximation is over-complicated. I understand it probably won't be as fast as a built-in CPU instruction, but still. The version you see is the most minimal one I have been able to derive so far: I started with a degree-7 polynomial and 2 iterations of Newton's method and reduced it to degree 5 and 1 iteration. What do you think?

lntue · 2025-09-13T16:40:03Z

This is actually expected, because single- and double-precision division and square root are quite efficient on modern hardware.
See, for example, the Zen3 entries in https://www.agner.org/optimize/instruction_tables.pdf
SQRTSS and DIVSS latencies are 14 and 10.5 clocks respectively, while ADDSS/MULPS and VFMA are 3 and 4 clocks.

So unless you can reduce the computation to maybe 3 or 4 multiply-adds, the extra logic around it (branching, exponent reduction, and so on) will make it slower than the straightforward sqrt + div in single precision.

For rsqrtf16, the Newton-Raphson method will be better than sqrt + div on targets without single-precision hardware, such as some embedded systems. But in that case, you will need to implement the polynomial approximation and Newton-Raphson in integer / fixed-point arithmetic to gain the efficiency.
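
To make the integer / fixed-point suggestion concrete, here is a rough sketch of Newton-Raphson for 1/sqrt(m) using only integer arithmetic. Everything in it is an assumption chosen for illustration and not part of this PR or of LLVM libc: the name `rsqrt_q16`, the Q16 format (value = integer / 65536), the linear initial guess, and the iteration count.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned Q = 16;           // number of fractional bits (assumed)
constexpr uint64_t ONE = 1ull << Q;  // 1.0 in Q16

// Approximates 1/sqrt(m) for m in [0.5, 1), with input and output in Q16.
inline uint32_t rsqrt_q16(uint32_t m_q16) {
  assert(m_q16 >= ONE / 2 && m_q16 < ONE);
  // Initial guess: the chord through (0.5, sqrt(2)) and (1, 1),
  // y0 = 1.8284 - 0.8284 * m, with both constants rounded to Q16.
  constexpr uint64_t A = 119828;  // ~1.8284 * 65536
  constexpr uint64_t B = 54292;   // ~0.8284 * 65536
  uint64_t y = A - ((B * m_q16) >> Q);
  // Newton-Raphson: y = y * (3 - m * y^2) / 2, all in 64-bit integers.
  for (int i = 0; i < 3; ++i) {
    uint64_t y2 = (y * y) >> Q;            // y^2 in Q16
    uint64_t my2 = (m_q16 * y2) >> Q;      // m * y^2 in Q16, close to 1.0
    y = (y * (3 * ONE - my2)) >> (Q + 1);  // extra shift divides by 2
  }
  return static_cast<uint32_t>(y);
}
```

With m = 0.75 (input 49152) the result lands close to 75674, that is about 1.1547 after dividing by 65536. A real implementation would still need the exponent handling, special cases, and correct rounding to float16 in every rounding mode.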

libc/src/__support/math/rsqrtf16.h

libc/config/linux/x86_64/entrypoints.txt

amemov · 2025-09-16T22:49:44Z

For the record: the implementation had to be changed because it was slower than calling the hardware-specific instructions by a wide margin. Below are my observations from comparing 4 different implementations using Google Benchmark and the suite introduced as part of GSoC 2025:

Run on (14 X 4900 MHz CPU s)
CPU Caches:
L1 Data 48 KiB (x7)
L1 Instruction 64 KiB (x7)
L2 Unified 2048 KiB (x7)
L3 Unified 12288 KiB (x1)
Load Average: 1.12, 1.53, 1.24
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-- LOG(0): Value passed to --benchmark_min_time should have a suffix. Eg., `30s` for 30-seconds.
-- LOG(0): Value passed to --benchmark_min_time should have a suffix. Eg., `30s` for 30-seconds.
-- LOG(0): Value passed to --benchmark_min_time should have a suffix. Eg., `30s` for 30-seconds.
-- LOG(0): Value passed to --benchmark_min_time should have a suffix. Eg., `30s` for 30-seconds.
---------------------------------------------------------------------
Benchmark           Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------
BM_Current     632704 ns       629457 ns         2220 items_per_second=50.4324M/s
BM_Impl1      1058891 ns      1053017 ns         1339 items_per_second=30.1467M/s
BM_Impl2       686397 ns       684024 ns         2010 items_per_second=46.4092M/s
BM_ViaSqrt     402577 ns       401175 ns         3345 items_per_second=79.13M/s

Here, Current is what I wrote locally (not pushed to the PR), Impl2 is similar to Current but has a slightly bigger polynomial, Impl1 is what you see on GitHub in the previous commit, and ViaSqrt is LIBC_NAMESPACE::fputil::cast<float16>(1.0f / LIBC_NAMESPACE::fputil::sqrt<float>(LIBC_NAMESPACE::fputil::cast<float>(x)));

However, ViaSqrt as described above did not satisfy the correctness requirements established in libc for 2 mantissas: 0x0313 and 0x011F. Therefore, I added a small correction at the end. With this change the implementation is a little slower than calling ViaSqrt directly (by ~5000 ns), but still much faster than my best implementation (Current), by 200000 ns.

That said, some work is left for the future: if the hardware provides a sqrt() instruction, it should be used; otherwise, an integer-based approximation should be used for targets that don't have LIBC_TARGET_CPU_HAS_FPU_FLOAT.

libc/test/src/math/smoke/rsqrtf16_test.cpp

- The accuracy improved drastically, but it still fails

- Refactored the implementation to match the proposal for constexpr
- Added rsqrtf16 in Bazel build

…for calling sqrt
- Adjusted the results from fixed-point call of hardware instruction to match the correctness required in the Libc
- TODO: In the next PR add int-based approximation of the function for scenario where floats are not available in the hardware

llvm-ci · 2025-09-17T14:23:13Z

LLVM Buildbot has detected a new failure on builder libc-aarch64-ubuntu-dbg running on libc-aarch64-ubuntu while building libc,utils at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/104/builds/31550

Here is the relevant piece of the build log for the reference

Step 4 (annotate) failure: 'python ../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py ...' (failure)
...
[ RUN      ] LlvmLibcFMinimumMagNumTest.InfArg
[       OK ] LlvmLibcFMinimumMagNumTest.InfArg (2 us)
[ RUN      ] LlvmLibcFMinimumMagNumTest.NegInfArg
[       OK ] LlvmLibcFMinimumMagNumTest.NegInfArg (1 us)
[ RUN      ] LlvmLibcFMinimumMagNumTest.BothZero
[       OK ] LlvmLibcFMinimumMagNumTest.BothZero (1 us)
[ RUN      ] LlvmLibcFMinimumMagNumTest.Range
[       OK ] LlvmLibcFMinimumMagNumTest.Range (23 ms)
Ran 5 tests.  PASS: 5  FAIL: 0
[1466/1762] Running unit test libc.test.src.math.smoke.rsqrtf16_test.__unit__
FAILED: libc/test/src/math/smoke/CMakeFiles/libc.test.src.math.smoke.rsqrtf16_test.__unit__ 
cd /home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu/build/libc/test/src/math/smoke && /home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu/build/libc/test/src/math/smoke/libc.test.src.math.smoke.rsqrtf16_test.__unit__.__build__
[==========] Running 1 test from 1 test suite.
[ RUN      ] LlvmLibcRsqrtf16Test.SpecialNumbers
/home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu/llvm-project/libc/test/src/math/smoke/rsqrtf16_test.cpp:19: FAILURE
      Expected: __llvm_libc_20_0_0_git::fputil::test_except( static_cast<int>((1 | 2 | 4 | 8 | 16))) & ((1) ? (1) : static_cast<int>((1 | 2 | 4 | 8 | 16)))
      Which is: 0
To be equal to: (1)
      Which is: 1
[  FAILED  ] LlvmLibcRsqrtf16Test.SpecialNumbers
Ran 1 tests.  PASS: 0  FAIL: 1
[1467/1762] Running unit test libc.test.src.math.smoke.log10_test.__unit__
[==========] Running 1 test from 1 test suite.
[ RUN      ] LlvmLibcLog10Test.SpecialNumbers
[       OK ] LlvmLibcLog10Test.SpecialNumbers (36 us)
Ran 1 tests.  PASS: 1  FAIL: 0
[1468/1762] Running unit test libc.test.src.math.smoke.generic_sqrtf128_test.__unit__
[==========] Running 1 test from 1 test suite.
[ RUN      ] LlvmLibcSqrtTest.SpecialNumbers
[       OK ] LlvmLibcSqrtTest.SpecialNumbers (6 us)
Ran 1 tests.  PASS: 1  FAIL: 0
[1469/1762] Running unit test libc.test.src.math.smoke.remquof_test.__unit__
[==========] Running 2 tests from 1 test suite.
[ RUN      ] LlvmLibcRemQuoTest.SpecialNumbers
[       OK ] LlvmLibcRemQuoTest.SpecialNumbers (6 us)
[ RUN      ] LlvmLibcRemQuoTest.EqualNumeratorAndDenominator
[       OK ] LlvmLibcRemQuoTest.EqualNumeratorAndDenominator (3 us)
Ran 2 tests.  PASS: 2  FAIL: 0
[1470/1762] Running unit test libc.test.src.math.smoke.llrint_test.__unit__
[==========] Running 3 tests from 1 test suite.
[ RUN      ] LlvmLibcRoundToIntegerTest.InfinityAndNaN
[       OK ] LlvmLibcRoundToIntegerTest.InfinityAndNaN (8 us)
[ RUN      ] LlvmLibcRoundToIntegerTest.RoundNumbers
[       OK ] LlvmLibcRoundToIntegerTest.RoundNumbers (9 us)
[ RUN      ] LlvmLibcRoundToIntegerTest.SubnormalRange
[       OK ] LlvmLibcRoundToIntegerTest.SubnormalRange (968 us)
Ran 3 tests.  PASS: 3  FAIL: 0
[1471/1762] Running unit test libc.test.src.math.smoke.fdivl_test.__unit__
[==========] Running 5 tests from 1 test suite.
Step 7 (libc-unit-tests) failure: libc-unit-tests (failure)
...

llvm-ci · 2025-09-17T14:23:18Z

LLVM Buildbot has detected a new failure on builder libc-aarch64-ubuntu-fullbuild-dbg running on libc-aarch64-ubuntu while building libc,utils at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/71/builds/31529

Here is the relevant piece of the build log for the reference

Step 4 (annotate) failure: 'python ../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py ...' (failure)
...
[       OK ] LlvmLibcFAbsTest.SpecialNumbers (17 us)
Ran 1 tests.  PASS: 1  FAIL: 0
[630/1064] Running unit test libc.test.src.math.smoke.canonicalizef_test.__unit__
[==========] Running 2 tests from 1 test suite.
[ RUN      ] LlvmLibcCanonicalizeTest.SpecialNumbers
[       OK ] LlvmLibcCanonicalizeTest.SpecialNumbers (6 us)
[ RUN      ] LlvmLibcCanonicalizeTest.RegularNubmers
[       OK ] LlvmLibcCanonicalizeTest.RegularNubmers (3 us)
Ran 2 tests.  PASS: 2  FAIL: 0
[631/1064] Running unit test libc.test.src.math.smoke.rsqrtf16_test.__unit__
FAILED: libc/test/src/math/smoke/CMakeFiles/libc.test.src.math.smoke.rsqrtf16_test.__unit__ 
cd /home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu-fullbuild-dbg/build/libc/test/src/math/smoke && /home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu-fullbuild-dbg/build/libc/test/src/math/smoke/libc.test.src.math.smoke.rsqrtf16_test.__unit__.__build__
[==========] Running 1 test from 1 test suite.
[ RUN      ] LlvmLibcRsqrtf16Test.SpecialNumbers
/home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu-fullbuild-dbg/llvm-project/libc/test/src/math/smoke/rsqrtf16_test.cpp:19: FAILURE
      Expected: __llvm_libc_20_0_0_git::fputil::test_except( static_cast<int>((0x1 | 0x2 | 0x4 | 0x8 | 0x10))) & ((0x4) ? (0x4) : static_cast<int>((0x1 | 0x2 | 0x4 | 0x8 | 0x10)))
      Which is: 0
To be equal to: (0x4)
      Which is: 4
[  FAILED  ] LlvmLibcRsqrtf16Test.SpecialNumbers
Ran 1 tests.  PASS: 0  FAIL: 1
[632/1064] Running unit test libc.test.src.math.smoke.sinf_test.__unit__.__NO_ROUND_OPT
[==========] Running 1 test from 1 test suite.
[ RUN      ] LlvmLibcSinfTest.SpecialNumbers
[       OK ] LlvmLibcSinfTest.SpecialNumbers (7 us)
Ran 1 tests.  PASS: 1  FAIL: 0
[633/1064] Running unit test libc.test.src.math.smoke.bf16divf128_test.__unit__
[==========] Running 5 tests from 1 test suite.
[ RUN      ] LlvmLibcDivTest.SpecialNumbers
[       OK ] LlvmLibcDivTest.SpecialNumbers (14 us)
[ RUN      ] LlvmLibcDivTest.DivisionByZero
[       OK ] LlvmLibcDivTest.DivisionByZero (3 us)
[ RUN      ] LlvmLibcDivTest.InvalidOperations
[       OK ] LlvmLibcDivTest.InvalidOperations (13 us)
[ RUN      ] LlvmLibcDivTest.RangeErrors
[       OK ] LlvmLibcDivTest.RangeErrors (30 us)
[ RUN      ] LlvmLibcDivTest.InexactResults
[       OK ] LlvmLibcDivTest.InexactResults (2 us)
Ran 5 tests.  PASS: 5  FAIL: 0
[634/1064] Running unit test libc.test.src.math.smoke.fminf16_test.__unit__.__NO_MISC_MATH_BASIC_OPS_OPT
[==========] Running 5 tests from 1 test suite.
[ RUN      ] LlvmLibcFMinTest.NaN
[       OK ] LlvmLibcFMinTest.NaN (4 us)
[ RUN      ] LlvmLibcFMinTest.InfArg
[       OK ] LlvmLibcFMinTest.InfArg (2 us)
[ RUN      ] LlvmLibcFMinTest.NegInfArg
[       OK ] LlvmLibcFMinTest.NegInfArg (2 us)
[ RUN      ] LlvmLibcFMinTest.BothZero
[       OK ] LlvmLibcFMinTest.BothZero (2 us)
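The failing assertion in both logs has the form test_except(all flags) & expected == expected, that is, it checks that an expected floating-point exception flag is set after the call. A minimal sketch of that kind of check, assuming standard <cfenv> rather than the LLVM libc test framework, and using 1.0f / sqrt(x) as a stand-in for rsqrtf16; the helper names and the inputs are hypothetical, since the log does not show which input the test at line 19 uses.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Returns the subset of `expected` exception flags that were raised while
// evaluating 1/sqrt(x) in float. Uses only standard <cfenv> calls.
inline int raised_by_rsqrt(float x, int expected) {
  volatile float input = x; // volatile: prevent compile-time folding
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile float result = 1.0f / std::sqrt(input);
  (void)result;
  return std::fetestexcept(FE_ALL_EXCEPT) & expected;
}

// Illustrative inputs: a negative argument should raise FE_INVALID and a
// zero argument should raise FE_DIVBYZERO under IEEE 754 semantics.
inline void check_exception_flags() {
  assert(raised_by_rsqrt(-1.0f, FE_INVALID) == FE_INVALID);
  assert(raised_by_rsqrt(0.0f, FE_DIVBYZERO) == FE_DIVBYZERO);
}
```

A result of 0 from such a check, as in the logs above, means the expected flag was not raised on that target.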

amemov force-pushed the rsqrtf16-for-c23 branch 2 times, most recently from aaa897a to 1fdc319 September 13, 2025 01:06

amemov marked this pull request as ready for review September 13, 2025 01:08

amemov requested review from aaronmondal, keith and rupprecht as code owners September 13, 2025 01:08

llvmbot added libc bazel "Peripheral" support tier build system: utils/bazel labels Sep 13, 2025

amemov force-pushed the rsqrtf16-for-c23 branch from cd0b0d4 to 07c8ad7 September 13, 2025 01:24

lntue reviewed Sep 13, 2025

libc/src/__support/math/rsqrtf16.h (outdated, resolved)

amemov requested a review from lntue September 16, 2025 22:40

lntue reviewed Sep 16, 2025

libc/src/__support/math/rsqrtf16.h (outdated, resolved)

lntue reviewed Sep 16, 2025

libc/src/__support/math/rsqrtf16.h (outdated, resolved)

lntue reviewed Sep 16, 2025

libc/src/__support/math/rsqrtf16.h (outdated, resolved)

lntue reviewed Sep 16, 2025

libc/config/linux/x86_64/entrypoints.txt (outdated, resolved)

amemov force-pushed the rsqrtf16-for-c23 branch 2 times, most recently from bfee716 to ffcbff0 September 16, 2025 23:11

amemov requested a review from lntue September 16, 2025 23:48

lntue approved these changes Sep 17, 2025

lntue reviewed Sep 17, 2025

libc/test/src/math/smoke/rsqrtf16_test.cpp (outdated, resolved)

lntue reviewed Sep 17, 2025

libc/test/src/math/smoke/rsqrtf16_test.cpp (outdated, resolved)

- rsqrtf16 refactored

cada218

amemov added 11 commits September 16, 2025 19:59

Clang-formatted the files

be0de94

Replaced the computation for valid X with polynomial approximation

4bea2ef

Added range reduction to the approximation

8f1e13c

Added Newton-Raphson iterations

1ff0f39

-The accuracy improved drastically, but it still fails

Added separate handling for mantissa == 0.5f. Resulted in fewer errors

cb4f47f

- Fixed ULP errors

b5066ce

- Refactored the implementation to match the proposal for constexpr - Added rsqrtf16 in Bazel build

clang-formatted the files

c17e1e0

Formatted BUILD.Bazel w/ buildifier

a5b246b

- Addressed the comments and added entrypoints to other targets in Linux

49cb38c

- Replaced src/__support/libc_errno.h with hdr/errno_macros.h

57d417b
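The commit messages above mention Newton-Raphson iterations applied on top of an approximation. As a minimal sketch of that refinement step in standard C++: the starting guesses below are arbitrary assumptions, the PR's polynomial and range reduction are not reproduced, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson step for f(y) = 1/y^2 - x, whose positive root is
// y = 1/sqrt(x):  y_next = y * (1.5 - 0.5 * x * y * y)
inline float rsqrt_newton_step(float x, float y) {
  return y * (1.5f - 0.5f * x * y * y);
}

// Refines an initial estimate y0 of 1/sqrt(x) with a fixed number of steps.
inline float rsqrt_refine(float x, float y0, int iterations) {
  float y = y0;
  for (int i = 0; i < iterations; ++i)
    y = rsqrt_newton_step(x, y);
  return y;
}

// Starting from a rough guess of 0.4 for 1/sqrt(4), a few steps approach 0.5.
inline void check_rsqrt_refine() {
  assert(std::fabs(rsqrt_refine(4.0f, 0.4f, 5) - 0.5f) < 1e-5f);
}
```

Each step roughly squares the relative error, so a better initial estimate needs fewer iterations; the result is still not guaranteed to be correctly rounded.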

amemov force-pushed the rsqrtf16-for-c23 branch from 688f37d to 57d417b September 17, 2025 03:00

amemov requested a review from lntue September 17, 2025 03:44

lntue approved these changes Sep 17, 2025

lntue merged commit 80f9c72 into llvm:main Sep 17, 2025
21 checks passed

amemov mentioned this pull request Sep 17, 2025

[libc][math][c23] Improve C23 math function rsqrtf16 #159378

Open

amemov deleted the rsqrtf16-for-c23 branch September 17, 2025 18:28
