Skip to content

libm: Fix _status status outputs on null-mantissa inputs#1085

Merged
tgross35 merged 4 commits intorust-lang:mainfrom
tgross35:round-cleanup
Feb 13, 2026
Merged

libm: Fix _status status outputs on null-mantissa inputs#1085
tgross35 merged 4 commits intorust-lang:mainfrom
tgross35:round-cleanup

Conversation

@tgross35
Copy link
Contributor

@tgross35 tgross35 commented Feb 7, 2026

Values like 0.5 that have an exponent but not a mantissa are currently being reported as OK rather than INEXACT for ceil, floor, and trunc. Fix this and clean up the test cases.

ci: allow-regressions

@tgross35 tgross35 force-pushed the round-cleanup branch 2 times, most recently from c04e284 to 7214141 Compare February 7, 2026 15:09
/// `x / y` when `x != 0.0` and `y == 0.0`,
#[cfg_attr(not(feature = "unstable-public-internals"), allow(dead_code))]
pub const DIVIDE_BY_ZERO: Self = Self(1 << 2);
pub const DIVIDE_BY_ZERO: Self = Self(1 << 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing the reason why this skipped a bit is that they used the same bit positions as the x86 status registers: x86-64 MXCSR Register as well as the separate x87 status word. Those use 0x2 is for the non-standard Denormal Operand Exception.

The patched contiguous labeling matches Arm, but RISC-V has yet another layout (same low 5 bits but in reverse order).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch. Is there any reason to be consistent with these? I can't think of much purpose to that unless we use host floats with the environment via asm and load the register back to a Status.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, and doing that would require one of

  • mapping to and from the architecture specific layout at that interface
  • making these constants architecture specific

But we don't use it like that, so there's no need. Now that the Debug impl prints only the names, and since the UNKNOWN bits should not be settable, the actual bit-values are private to this module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +30 to +38
// If the to-be-masked-out portion is already zero, we have an exact result
if (xi & !mask) == IntTy::<F>::ZERO {
return FpResult::ok(x);
}

// C5: Otherwise the result is inexact and we will truncate. Raise `FE_INEXACT`, mask the
// Otherwise the result is inexact and we will truncate. Raise `FE_INEXACT`, mask the
// result, and return.

let status = if xi & F::SIG_MASK == F::Int::ZERO {
Status::OK
} else {
Status::INEXACT
};
xi &= mask;
FpResult::new(F::from_bits(xi), status)
FpResult::new(F::from_bits(xi), Status::INEXACT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If xi & !mask == 0, then xi & mask == xi, so this can just return F::from_bits(xi & mask) and avoid the branch.

Also, it might be slightly better to use the negated mask ("bits to clear") from the start.

let cleared = xi & clear_mask;
let status = if cleared == 0 { Status::OK } else { Status::INEXACT };
FpResult::new(F::from_bits(xi ^ cleared), status)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this came with some nice small speedups

[660](https://github.com/rust-lang/compiler-builtins/actions/runs/21939477624/job/63361314470?pr=1085#step:7:2661)
icount::icount_bench_trunc_group::icount_bench_trunc logspace:setup_trunc()
  Baselines:                      softfloat|softfloat (old)
  Instructions:                        7057|7579                 (-6.88745%) [-1.07397x]
  L1 Hits:                             7674|8197                 (-6.38038%) [-1.06815x]
  LL Hits:                                2|1                    (+100.000%) [+2.00000x]
  RAM Hits:                               3|3                    (No change)
  Total read+write:                    7679|8201                 (-6.36508%) [-1.06798x]
  Estimated Cycles:                    7789|8307                 (-6.23570%) [-1.06650x]
icount::icount_bench_truncf128_group::icount_bench_truncf128 logspace:setup_truncf128()
  Baselines:                      softfloat|softfloat (old)
  Instructions:                       12263|13271                (-7.59551%) [-1.08220x]
  L1 Hits:                            14883|15889                (-6.33142%) [-1.06759x]
  LL Hits:                                3|5                    (-40.0000%) [-1.66667x]
  RAM Hits:                               6|6                    (No change)
  Total read+write:                   14892|15900                (-6.33962%) [-1.06769x]
  Estimated Cycles:                   15108|16124                (-6.30117%) [-1.06725x]
icount::icount_bench_truncf16_group::icount_bench_truncf16 logspace:setup_truncf16()
  Baselines:                      softfloat|softfloat (old)
  Instructions:                        7952|8726                 (-8.87004%) [-1.09733x]
  L1 Hits:                             8520|9294                 (-8.32795%) [-1.09085x]
  LL Hits:                                1|1                    (No change)
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    8526|9300                 (-8.32258%) [-1.09078x]
  Estimated Cycles:                    8700|9474                 (-8.16973%) [-1.08897x]
icount::icount_bench_truncf_group::icount_bench_truncf logspace:setup_truncf()
  Baselines:                      softfloat|softfloat (old)
  Instructions:                        7281|7865                 (-7.42530%) [-1.08021x]
  L1 Hits:                             7896|8479                 (-6.87581%) [-1.07383x]
  LL Hits:                                2|3                    (-33.3333%) [-1.50000x]
  RAM Hits:                               5|5                    (No change)
  Total read+write:                    7903|8487                 (-6.88111%) [-1.07390x]
  Estimated Cycles:                    8081|8669                 (-6.78279%) [-1.07276x]

status = Status::INEXACT;
// |x| < 1.0, zero or inexact with truncation

if (ix & !F::SIGN_MASK) == F::Int::ZERO {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The performance regression didn't make sense to me, so I looked at the assembly and then LLVM-IR, and it seems that LLVM is being "smart" by turning this into a floating point comparison x == 0.0_f16, and that generates the equivalent of x as f32 == 0.0_f32 with a call to __extendhfsf2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for digging into all of this. I wrote up an issue here llvm/llvm-project#180630, wonder what else might be hurting from it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On x86-64, these contain one or more calls to __extendhfsf2 (f16 as f32):

    fmaximumf16
    fminimumf16
    fmaximum_numf16
    fminimum_numf16
    fmaxf16
    fminf16
    fdimf16
    sqrtf16
    rintf16
    roundf16
    fmodf16
    ldexpf16

The min/max -functions are doing floating point comparisons, but might benefit from inspecting the bits instead. The others do float arithmetic and also contains calls to __truncsfhf2 (f32 as f16). So none of them are explicit bit-operations being turned into float ops.

Most other libcalls are in various f128-functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the benchmarks primarily affect f16 and f128 and that we can't really easily work around LLVM, I'm just going to accept the regressions here.

@tgross35 tgross35 force-pushed the round-cleanup branch 2 times, most recently from 57a0c3b to 0505d7a Compare February 12, 2026 08:41
@rustbot

This comment has been minimized.

@rustbot

This comment has been minimized.

@tgross35 tgross35 requested a review from quaternic February 12, 2026 08:50
Copy link
Contributor

@quaternic quaternic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rustbot
Copy link
Collaborator

rustbot commented Feb 13, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

Values like 0.5 that have an exponent but not a mantissa are currently
being reported as OK rather than INEXACT for ceil, floor, and trunc. Fix
this and clean up the test cases.

This tries to keep the algorithm diff as simple as possible, which
introduces some small performance regressions. This is improved in a
future commit.
ceil seems to optimize better, but this gets closer to the pre-fix
codegen.
Suggested-by: Juho Kahala <57393910+quaternic@users.noreply.github.com>
@tgross35 tgross35 enabled auto-merge (rebase) February 13, 2026 11:07
@tgross35 tgross35 merged commit cdc64b3 into rust-lang:main Feb 13, 2026
40 checks passed
@tgross35 tgross35 deleted the round-cleanup branch February 13, 2026 11:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants