
Improve quality of f32/f64 generation.#103

Merged
taiki-e merged 9 commits into smol-rs:master from reinerp:master on Oct 4, 2025

Conversation

@reinerp (Contributor) commented May 11, 2025

The previous int-to-float conversion had a bias of 2^-24 / 2^-53 for f32 / f64 respectively. The new conversion has a bias of 2^-64, which matches the bias of the underlying WyRand generator. See the comments in lib.rs for an explanation.
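The difference between the two strategies can be sketched as follows. This is an illustrative reconstruction, not the exact fastrand code: the shift, the constants, and the function names (`f64_old`, `f64_new_inclusive`) are assumptions based on the standard bit-shift technique and the description above.

```rust
// Illustrative sketch of the two int-to-float strategies for f64
// (assumed shapes, not the exact fastrand implementation).

// Old style: keep the top 53 bits and scale by 2^-53. Every output is
// an exact multiple of 2^-53 in [0, 1); 1.0 is never produced.
fn f64_old(bits: u64) -> f64 {
    (bits >> 11) as f64 * (1.0 / (1u64 << 53) as f64)
}

// New style: convert all 64 bits (letting the conversion round), then
// scale by 2^-64. The result lies in [0.0, 1.0] inclusive: 1.0 appears
// when the integer rounds up to 2^64. Per the PR description, the
// remaining bias is 2^-64, the same as the WyRand generator itself.
fn f64_new_inclusive(bits: u64) -> f64 {
    bits as f64 * (1.0 / 2.0f64.powi(64))
}

fn main() {
    assert_eq!(f64_old(0), 0.0);
    assert!(f64_old(u64::MAX) < 1.0); // old form can never reach 1.0
    assert_eq!(f64_new_inclusive(u64::MAX), 1.0); // new form can
    println!("ok");
}
```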

The new conversion is a slightly shorter instruction sequence on x86 and ARM, but executes as 1 more uop on x86. Seems unlikely to harm performance much, if at all. https://rust.godbolt.org/z/q3zMxEc3T.

The added tests in smoke.rs fail with the old conversion and succeed with the new conversion.

reinerp added 8 commits May 10, 2025 22:06

- The previous int-to-float conversion had a bias of probability 2^-24 / 2^-53 for types f32 / f64 respectively. The new conversion has a bias of 2^-64, which is the same bias as the underlying WyRand generator.
  The new conversion is a slightly shorter instruction sequence on x86 and ARM, but executes as one more uop on x86; it seems unlikely to harm performance much, if at all. https://rust.godbolt.org/z/q3zMxEc3T
- This is unavailable on thumbv7m-none-eabi.
- They both end up compiling to the same code after inlining and simplification, but the former requires more work from the optimizer to get there.
@taiki-e (Collaborator) left a comment:

Looks great, thanks!

@taiki-e merged commit 6ceb4f1 into smol-rs:master on Oct 4, 2025
14 checks passed
Code context:

    }

    #[test]
    fn digit() {

@taiki-e (Collaborator):

Sorry, I accidentally removed several unrelated tests when resolving conflicts. Filed #110 to re-add them.

Comment on lines +400 to +406:

        loop {
            let x = self.f32_inclusive();
            if x < 1.0 {
                return x;
            }
        }
    }
Member:
Was the perf of this measured? It seems like adding a branch and a loop to f32 might have performance implications I'm not comfortable releasing.

@reinerp (Contributor, Author):

I didn't measure at the time since there was no existing benchmark; I only looked at the assembly.

I've now added a benchmark in #112 and measured the results on AArch64 (M2 Pro) and x86-64 (Zen4). I get:

AArch64:
- old f32(): 0.449 ns/iter (2, 3, or 4 iterations unrolled)
- new f32_inclusive(): 0.344 ns/iter (4 iterations unrolled)
- new f32(): 0.587 ns/iter (2 iterations unrolled)

x86-64:
- old f32(): 0.856 ns/iter (3 iterations)
- new f32_inclusive(): 0.649 ns/iter
- new f32(): 0.772 ns/iter (2 iterations unrolled)

So the change is always a win if you switch to f32_inclusive(), and it's a win on x86-64 (but a loss on AArch64) if you stay with f32().

Note that the branch is almost always taken (probability 1 - 2^-23), so the branch predictor will predict it near-perfectly.
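The rejection step under discussion can be sketched as below. This assumes a plain `bits * 2^-32` inclusive conversion for f32, which may differ from the PR's actual code; `f32_inclusive_from` and `f32_from` are hypothetical names for illustration.

```rust
// Hypothetical helpers sketching the rejection wrapper, assuming a
// plain bits * 2^-32 inclusive conversion (fastrand's code may differ).
fn f32_inclusive_from(bits: u32) -> f32 {
    bits as f32 * (1.0 / 2.0f32.powi(32))
}

// Draw until the inclusive conversion yields something below 1.0.
// Since x == 1.0 only occurs for inputs that round up to 2^32, the
// `x < 1.0` branch is almost always taken and the loop rarely repeats.
fn f32_from(mut next: impl FnMut() -> u32) -> f32 {
    loop {
        let x = f32_inclusive_from(next());
        if x < 1.0 {
            return x;
        }
    }
}

fn main() {
    // u32::MAX rounds up to 2^32 as an f32, so the inclusive form
    // produces exactly 1.0 ...
    assert_eq!(f32_inclusive_from(u32::MAX), 1.0);
    // ... and the wrapper rejects it, returning the next draw instead.
    let mut draws = [u32::MAX, 0u32].into_iter();
    assert_eq!(f32_from(move || draws.next().unwrap()), 0.0);
    println!("ok");
}
```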

@reinerp (Contributor, Author):

(These numbers are the benchmark numbers, divided by 10_000, since the benchmark itself generates 10_000 f32s.)

@taiki-e mentioned this pull request on Jan 20, 2026.