libm: Fix `_status` status outputs on null-mantissa inputs by tgross35 · Pull Request #1085 · rust-lang/compiler-builtins

tgross35 · 2026-02-07T14:51:53Z

Values like 0.5 that have an exponent but not a mantissa are currently being reported as OK rather than INEXACT for ceil, floor, and trunc. Fix this and clean up the test cases.

ci: allow-regressions

quaternic · 2026-02-09T02:05:36Z

libm/src/math/support/env.rs

    /// `x / y` when `x != 0.0` and `y == 0.0`,
    #[cfg_attr(not(feature = "unstable-public-internals"), allow(dead_code))]
-    pub const DIVIDE_BY_ZERO: Self = Self(1 << 2);
+    pub const DIVIDE_BY_ZERO: Self = Self(1 << 1);


I'm guessing the reason why this skipped a bit is that they used the same bit positions as the x86 status registers: x86-64 MXCSR Register as well as the separate x87 status word. Those use 0x2 is for the non-standard Denormal Operand Exception.

The patched contiguous labeling matches Arm, but RISC-V has yet another layout (same low 5 bits but in reverse order).

Great catch. Is there any reason to be consistent with these? I can't think of much purpose to that unless we use host floats with the environment via asm and load the register back to a Status.

Yeah, and doing that would require one of

mapping to and from the architecture specific layout at that interface

making these constants architecture specific

But we don't use it like that, so there's no need. Now that the Debug impl prints only the names, and since the UNKNOWN bits should not be settable, the actual bit-values are private to this module.

Added a note for this here https://github.com/rust-lang/compiler-builtins/compare/32417939f6558916252215c25a7c48b33c41b7df..5c981cfffd98bfb5135395712409811b6d280903

quaternic · 2026-02-09T02:10:43Z

libm/src/math/generic/trunc.rs

+    // If the to-be-masked-out portion is already zero, we have an exact result
    if (xi & !mask) == IntTy::<F>::ZERO {
        return FpResult::ok(x);
    }

-    // C5: Otherwise the result is inexact and we will truncate. Raise `FE_INEXACT`, mask the
+    // Otherwise the result is inexact and we will truncate. Raise `FE_INEXACT`, mask the
    // result, and return.
-
-    let status = if xi & F::SIG_MASK == F::Int::ZERO {
-        Status::OK
-    } else {
-        Status::INEXACT
-    };
    xi &= mask;
-    FpResult::new(F::from_bits(xi), status)
+    FpResult::new(F::from_bits(xi), Status::INEXACT)


If xi & !mask == 0, then xi & mask == xi, so this can just return F::from_bits(xi & mask) and avoid the branch.

Also, it might be slightly better to use the negated mask ("bits to clear") from the start.

let cleared = xi & clear_mask; let status = if cleared == 0 { Status::OK } else { Status::INEXACT }; FpResult::new(F::from_bits(xi ^ cleared), status)

Thanks, this came with some nice small speedups

[660](https://github.com/rust-lang/compiler-builtins/actions/runs/21939477624/job/63361314470?pr=1085#step:7:2661) icount::icount_bench_trunc_group::icount_bench_trunc logspace:setup_trunc() Baselines: softfloat|softfloat (old) Instructions: 7057|7579 (-6.88745%) [-1.07397x] L1 Hits: 7674|8197 (-6.38038%) [-1.06815x] LL Hits: 2|1 (+100.000%) [+2.00000x] RAM Hits: 3|3 (No change) Total read+write: 7679|8201 (-6.36508%) [-1.06798x] Estimated Cycles: 7789|8307 (-6.23570%) [-1.06650x] icount::icount_bench_truncf128_group::icount_bench_truncf128 logspace:setup_truncf128() Baselines: softfloat|softfloat (old) Instructions: 12263|13271 (-7.59551%) [-1.08220x] L1 Hits: 14883|15889 (-6.33142%) [-1.06759x] LL Hits: 3|5 (-40.0000%) [-1.66667x] RAM Hits: 6|6 (No change) Total read+write: 14892|15900 (-6.33962%) [-1.06769x] Estimated Cycles: 15108|16124 (-6.30117%) [-1.06725x] icount::icount_bench_truncf16_group::icount_bench_truncf16 logspace:setup_truncf16() Baselines: softfloat|softfloat (old) Instructions: 7952|8726 (-8.87004%) [-1.09733x] L1 Hits: 8520|9294 (-8.32795%) [-1.09085x] LL Hits: 1|1 (No change) RAM Hits: 5|5 (No change) Total read+write: 8526|9300 (-8.32258%) [-1.09078x] Estimated Cycles: 8700|9474 (-8.16973%) [-1.08897x] icount::icount_bench_truncf_group::icount_bench_truncf logspace:setup_truncf() Baselines: softfloat|softfloat (old) Instructions: 7281|7865 (-7.42530%) [-1.08021x] L1 Hits: 7896|8479 (-6.87581%) [-1.07383x] LL Hits: 2|3 (-33.3333%) [-1.50000x] RAM Hits: 5|5 (No change) Total read+write: 7903|8487 (-6.88111%) [-1.07390x] Estimated Cycles: 8081|8669 (-6.78279%) [-1.07276x]

quaternic · 2026-02-09T02:50:13Z

libm/src/math/generic/floor.rs

-            status = Status::INEXACT;
+        // |x| < 1.0, zero or inexact with truncation
+
+        if (ix & !F::SIGN_MASK) == F::Int::ZERO {


The performance regression didn't make sense to me, so I looked at the assembly and then LLVM-IR, and it seems that LLVM is being "smart" by turning this into a floating point comparison x == 0.0_f16, and that generates the equivalent of x as f32 == 0.0_f32 with a call to __extendhfsf2.

Thanks for digging into all of this. I wrote up an issue here llvm/llvm-project#180630, wonder what else might be hurting from it.

On x86-64, these contain one or more calls to __extendhfsf2 (f16 as f32):

fmaximumf16 fminimumf16 fmaximum_numf16 fminimum_numf16 fmaxf16 fminf16 fdimf16 sqrtf16 rintf16 roundf16 fmodf16 ldexpf16

The min/max -functions are doing floating point comparisons, but might benefit from inspecting the bits instead. The others do float arithmetic and also contains calls to __truncsfhf2 (f32 as f16). So none of them are explicit bit-operations being turned into float ops.

Most other libcalls are in various f128-functions.

Given the benchmarks primarily affect f16 and f128 and that we can't really easily work around LLVM, I'm just going to accept the regressions here.

libm/src/math/generic/trunc.rs

quaternic

LGTM

rustbot · 2026-02-13T10:58:29Z

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

Values like 0.5 that have an exponent but not a mantissa are currently being reported as OK rather than INEXACT for ceil, floor, and trunc. Fix this and clean up the test cases. This tries to keep the algorithm diff as simple as possible, which introduces some small performance regressions. This is improved in a future commit.

ceil seems to optimize better, but this gets closer to the pre-fix codegen.

Suggested-by: Juho Kahala <57393910+quaternic@users.noreply.github.com>

tgross35 force-pushed the round-cleanup branch 2 times, most recently from c04e284 to 7214141 Compare February 7, 2026 15:09

quaternic reviewed Feb 9, 2026

View reviewed changes

tgross35 force-pushed the round-cleanup branch 2 times, most recently from 57a0c3b to 0505d7a Compare February 12, 2026 08:41

This comment has been minimized.

Sign in to view

tgross35 force-pushed the round-cleanup branch from 0505d7a to 332416d Compare February 12, 2026 08:42

This comment has been minimized.

Sign in to view

tgross35 requested a review from quaternic February 12, 2026 08:50

quaternic reviewed Feb 12, 2026

View reviewed changes

libm/src/math/generic/trunc.rs Outdated Show resolved Hide resolved

quaternic approved these changes Feb 12, 2026

View reviewed changes

libm: Improve debug output for Status

cc8c042

tgross35 force-pushed the round-cleanup branch from 2a4aec6 to eeaf909 Compare February 13, 2026 10:58

tgross35 added 3 commits February 13, 2026 05:04

libm: Add an optimization for floor

c48aa16

ceil seems to optimize better, but this gets closer to the pre-fix codegen.

libm: Add an optimization for trunc

d7e350f

Suggested-by: Juho Kahala <57393910+quaternic@users.noreply.github.com>

tgross35 force-pushed the round-cleanup branch from eeaf909 to d7e350f Compare February 13, 2026 11:06

tgross35 enabled auto-merge (rebase) February 13, 2026 11:07

tgross35 merged commit cdc64b3 into rust-lang:main Feb 13, 2026
40 checks passed

tgross35 deleted the round-cleanup branch February 13, 2026 11:47

Conversation

tgross35 commented Feb 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

This comment has been minimized.

This comment has been minimized.

Uh oh!

quaternic left a comment

Choose a reason for hiding this comment

Uh oh!

rustbot commented Feb 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tgross35 commented Feb 7, 2026 •

edited

Loading