Commit 0323e6a

Auto merge of rust-lang#86761 - Alexhuszagh:master, r=estebank
Update Rust Float-Parsing Algorithms to use the Eisel-Lemire algorithm.

# Summary

Rust, although it implements a correct float parser, has major performance issues in float parsing. Even for common floats, parsing can be 3-10x [slower](https://arxiv.org/pdf/2101.11408.pdf) than external libraries such as [lexical](https://github.com/Alexhuszagh/rust-lexical) and [fast-float-rust](https://github.com/aldanor/fast-float-rust).

Recently, Daniel Lemire, along with others, has developed major advances in float-parsing algorithms that yield a fast and correct float parser, with speeds up to 1200 MiB/s on Apple's M1 architecture for the [canada](https://github.com/lemire/simple_fastfloat_benchmark/blob/0e2b5d163d4074cc0bde2acdaae78546d6e5c5f1/data/canada.txt) dataset, 10x faster than Rust's 130 MiB/s.

In addition, [edge-cases](rust-lang#85234) in Rust's [dec2flt](https://github.com/rust-lang/rust/tree/868c702d0c9a471a28fb55f0148eb1e3e8b1dcc5/library/core/src/num/dec2flt) algorithm can lead to over a 1600x slowdown relative to efficient algorithms. This is due to the use of Clinger's correct, but slow [AlgorithmM and Bellerophon](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.4152&rep=rep1&type=pdf), which have been improved by faster big-integer algorithms and the Eisel-Lemire algorithm, respectively.

Finally, this algorithm provides substantial improvements in the number of floats the Rust core library can parse. Denormal floats with a large number of digits cannot be parsed, due to use of the `Big32x40`, which simply does not have enough digits to round a float correctly. Using a custom decimal class, with much simpler logic, we can parse all valid decimal strings of any digit count.

```rust
// Issue in Rust's dec2flt.
"2.47032822920623272088284396434110686182e-324".parse::<f64>();
// Err(ParseFloatError { kind: Invalid })
```

# Solution

This pull request implements the Eisel-Lemire algorithm, modified from [fast-float-rust](https://github.com/aldanor/fast-float-rust) (which is licensed under Apache 2.0/MIT), along with numerous modifications to make it more amenable to inclusion in the Rust core library. The following describes both features inherited from fast-float-rust and improvements made to it for inclusion in core.

**Documentation**

Extensive documentation has been added so that the code base may be maintained by others. It explains the algorithms as well as various associated constants and routines. For example, two seemingly magical constants include documentation describing how they were derived, as follows:

```rust
// Round-to-even only happens for negative values of q
// when q ≥ −4 in the 64-bit case and when q ≥ −17 in
// the 32-bit case.
//
// When q ≥ 0, we have that 5^q ≤ 2m+1. In the 64-bit case, we
// have 5^q ≤ 2m+1 ≤ 2^54 or q ≤ 23. In the 32-bit case, we have
// 5^q ≤ 2m+1 ≤ 2^25 or q ≤ 10.
//
// When q < 0, we have w ≥ (2m+1)×5^−q. We must have that w < 2^64
// so (2m+1)×5^−q < 2^64. We have that 2m+1 > 2^53 (64-bit case)
// or 2m+1 > 2^24 (32-bit case). Hence, we must have 2^53×5^−q < 2^64
// (64-bit) and 2^24×5^−q < 2^64 (32-bit). Hence we have 5^−q < 2^11
// or q ≥ −4 (64-bit case) and 5^−q < 2^40 or q ≥ −17 (32-bit case).
//
// Thus we have that we only need to round ties to even when
// we have that q ∈ [−4,23] (in the 64-bit case) or q ∈ [−17,10]
// (in the 32-bit case). In both cases, the power of five (5^|q|)
// fits in a 64-bit word.
const MIN_EXPONENT_ROUND_TO_EVEN: i32;
const MAX_EXPONENT_ROUND_TO_EVEN: i32;
```

This ensures maintainability of the code base.

**Improvements for Disguised Fast-Path Cases**

The fast path in float parsing algorithms attempts to use native, machine floats to represent both the significant digits and the exponent, which is only possible if both can be exactly represented without rounding. In practice, this means that the significant digits must fit in 53 bits or fewer and the exponent must be in the range `[-22, 22]` (for an f64). This is similar to the existing dec2flt implementation.

However, disguised fast-path cases exist, where there are few significant digits and an exponent above the valid range, such as `1.23e25`. In this case, powers of 10 may be shifted from the exponent to the significant digits, as discussed at length in rust-lang#85198.

**Digit Parsing Improvements**

Typically, integers are parsed from a string one digit at a time, requiring unnecessary multiplications which can slow down parsing. An approach to parse 8 digits at a time using only 3 multiplications is described at length [here](https://johnnylee-sde.github.io/Fast-numeric-string-to-int/). This leads to significant performance improvements, and is implemented for both big and little-endian systems.

**Unsafe Changes**

Relative to fast-float-rust, this library makes less use of unsafe functionality and clearly documents it. This includes the refactoring and documentation of numerous unsafe methods undesirably marked as safe. The original code looked something like this, deceptively marking unsafe functionality as safe:

```rust
impl AsciiStr {
    #[inline]
    pub fn step_by(&mut self, n: usize) -> &mut Self {
        unsafe { self.ptr = self.ptr.add(n) };
        self
    }
}

...

#[inline]
fn parse_scientific(s: &mut AsciiStr<'_>) -> i64 {
    // the first character is 'e'/'E' and scientific mode is enabled
    let start = *s;
    s.step();
    ...
}
```

The new code clearly documents safety concerns, and does not mark unsafe functionality as safe, leading to better safety guarantees.

```rust
impl AsciiStr {
    /// Advance the view by n, advancing it in-place to (n..).
    pub unsafe fn step_by(&mut self, n: usize) -> &mut Self {
        // SAFETY: same as step_by, safe as long n is less than the buffer length
        self.ptr = unsafe { self.ptr.add(n) };
        self
    }
}

...

/// Parse the scientific notation component of a float.
fn parse_scientific(s: &mut AsciiStr<'_>) -> i64 {
    let start = *s;
    // SAFETY: the first character is 'e'/'E' and scientific mode is enabled
    unsafe {
        s.step();
    }
    ...
}
```

This allows us to trivially demonstrate that the new implementation of dec2flt is safe.

**Inline Annotations Have Been Removed**

In the previous implementation of dec2flt, inline annotations exist practically nowhere in the entire module. Therefore, these annotations have been removed, which mostly does not impact [performance](aldanor/fast-float-rust#15 (comment)).

**Fixed Correctness Tests**

Numerous errors in `src/etc/test-float-parse` were present, due to deprecation of `time.clock()`, as well as the crate's dependency on `rand`. The tests have therefore been reworked as a [crate](https://github.com/Alexhuszagh/rust/tree/master/src/etc/test-float-parse), and any errors in `runtests.py` have been patched.

**Undefined Behavior**

An implementation of `check_len` which relied on undefined behavior (in fast-float-rust) has been refactored, to ensure that the behavior is well-defined. The original code is as follows:

```rust
#[inline]
pub fn check_len(&self, n: usize) -> bool {
    unsafe { self.ptr.add(n) <= self.end }
}
```

And the new implementation is as follows:

```rust
/// Check if the slice at least `n` length.
fn check_len(&self, n: usize) -> bool {
    n <= self.as_ref().len()
}
```

Note that this has since been fixed in [fast-float-rust](aldanor/fast-float-rust#29).

**Inferring Binary Exponents**

Rather than explicitly store binary exponents, this new implementation infers them from the decimal exponent, reducing the amount of static storage required.
This removes the requirement to store [611 i16s](https://github.com/rust-lang/rust/blob/868c702d0c9a471a28fb55f0148eb1e3e8b1dcc5/library/core/src/num/dec2flt/table.rs#L8).

# Code Size

For all optimization levels, the code size of stripped builds does not change considerably relative to before; however, it is **significantly** smaller prior to stripping the resulting binaries. These binary sizes were calculated on x86_64-unknown-linux-gnu.

**new**

Using rustc version 1.55.0-dev.

|opt-level|size|size(stripped)|
|:-:|:-:|:-:|
|0|400K|300K|
|1|396K|292K|
|2|392K|292K|
|3|392K|296K|
|s|396K|292K|
|z|396K|292K|

**old**

Using rustc version 1.53.0-nightly.

|opt-level|size|size(stripped)|
|:-:|:-:|:-:|
|0|3.2M|304K|
|1|3.2M|292K|
|2|3.1M|284K|
|3|3.1M|284K|
|s|3.1M|284K|
|z|3.1M|284K|

# Correctness

The dec2flt implementation passes all of Rust's unit tests and comprehensive float parsing tests, along with numerous other tests such as Nigel Tao's comprehensive float [tests](https://github.com/nigeltao/parse-number-fxx-test-data) and Hrvoje Abraham's [strtod_tests](https://github.com/ahrvoje/numerics/blob/master/strtod/strtod_tests.toml). Therefore, it is unlikely that this algorithm will incorrectly round parsed floats.

# Issues Addressed

This will fix and close the following issues:

- resolves rust-lang#85198
- resolves rust-lang#85214
- resolves rust-lang#85234
- fixes rust-lang#31407
- fixes rust-lang#31109
- fixes rust-lang#53015
- resolves rust-lang#68396
- closes aldanor/fast-float-rust#15
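
The disguised fast-path case described in the commit message can be illustrated with a small standalone sketch. It is not the code added by this commit: the names `fast_path`, `pow10_f64`, and the constants are hypothetical, powers of ten are computed on the fly instead of being read from a table, and the upper bound of 37 (22 plus the 15 powers of ten that still fit below 2^53) is an assumption for the f64 case.

```rust
/// Every integer up to 2^53 is exactly representable as an f64.
const MAX_MANTISSA: u64 = 1 << 53;
/// Largest k for which 10^k is exactly representable as an f64.
const MAX_EXP: i32 = 22;
/// Assumed bound for the disguised case: 22 + 15, since 10^15 < 2^53.
const MAX_DISGUISED_EXP: i32 = 37;

/// 10^k as an exact f64; only valid for 0 <= k <= 22.
fn pow10_f64(k: i32) -> f64 {
    // 10^22 fits in a u128 and converts to f64 without rounding.
    10u128.pow(k as u32) as f64
}

/// Hypothetical fast path: returns `None` when a slower algorithm is needed.
fn fast_path(mantissa: u64, exp10: i32) -> Option<f64> {
    if mantissa > MAX_MANTISSA {
        return None;
    }
    if (-MAX_EXP..=MAX_EXP).contains(&exp10) {
        // Both operands are exact, so a single multiply or divide
        // yields the correctly rounded result.
        let m = mantissa as f64;
        Some(if exp10 < 0 { m / pow10_f64(-exp10) } else { m * pow10_f64(exp10) })
    } else if exp10 > MAX_EXP && exp10 <= MAX_DISGUISED_EXP {
        // Disguised case: move the excess powers of ten into the mantissa.
        let shift = (exp10 - MAX_EXP) as u32;
        let m = mantissa.checked_mul(10u64.pow(shift))?;
        if m > MAX_MANTISSA {
            return None;
        }
        Some(m as f64 * pow10_f64(MAX_EXP))
    } else {
        None
    }
}

fn main() {
    // "1.23e25" has digits 123 and decimal exponent 23, one above the usual limit.
    assert_eq!(fast_path(123, 23), Some(1.23e25));
    // Too many digits to absorb the extra power of ten.
    assert_eq!(fast_path(9_007_199_254_740_991, 23), None);
    // Exponent beyond the assumed disguised range.
    assert_eq!(fast_path(1, 38), None);
    println!("disguised fast path ok");
}
```

Here `1.23e25` is handled as `1230 * 10^22`: both factors are exact f64 values, so the single multiplication rounds correctly.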
2 parents 735e4e7 + 6102fc3 commit 0323e6a

19 files changed: +2383 -2596 lines changed

core/src/num/dec2flt/algorithm.rs

Lines changed: 0 additions & 429 deletions
This file was deleted.

core/src/num/dec2flt/common.rs

Lines changed: 198 additions & 0 deletions
@@ -0,0 +1,198 @@
//! Common utilities, for internal use only.

use crate::ptr;

/// Helper methods to process immutable bytes.
pub(crate) trait ByteSlice: AsRef<[u8]> {
    unsafe fn first_unchecked(&self) -> u8 {
        debug_assert!(!self.is_empty());
        // SAFETY: safe as long as self is not empty
        unsafe { *self.as_ref().get_unchecked(0) }
    }

    /// Get if the slice contains no elements.
    fn is_empty(&self) -> bool {
        self.as_ref().is_empty()
    }

    /// Check if the slice at least `n` length.
    fn check_len(&self, n: usize) -> bool {
        n <= self.as_ref().len()
    }

    /// Check if the first character in the slice is equal to c.
    fn first_is(&self, c: u8) -> bool {
        self.as_ref().first() == Some(&c)
    }

    /// Check if the first character in the slice is equal to c1 or c2.
    fn first_is2(&self, c1: u8, c2: u8) -> bool {
        if let Some(&c) = self.as_ref().first() { c == c1 || c == c2 } else { false }
    }

    /// Bounds-checked test if the first character in the slice is a digit.
    fn first_isdigit(&self) -> bool {
        if let Some(&c) = self.as_ref().first() { c.is_ascii_digit() } else { false }
    }

    /// Check if self starts with u with a case-insensitive comparison.
    fn eq_ignore_case(&self, u: &[u8]) -> bool {
        debug_assert!(self.as_ref().len() >= u.len());
        let iter = self.as_ref().iter().zip(u.iter());
        let d = iter.fold(0, |i, (&x, &y)| i | (x ^ y));
        d == 0 || d == 32
    }

    /// Get the remaining slice after the first N elements.
    fn advance(&self, n: usize) -> &[u8] {
        &self.as_ref()[n..]
    }

    /// Get the slice after skipping all leading characters equal c.
    fn skip_chars(&self, c: u8) -> &[u8] {
        let mut s = self.as_ref();
        while s.first_is(c) {
            s = s.advance(1);
        }
        s
    }

    /// Get the slice after skipping all leading characters equal c1 or c2.
    fn skip_chars2(&self, c1: u8, c2: u8) -> &[u8] {
        let mut s = self.as_ref();
        while s.first_is2(c1, c2) {
            s = s.advance(1);
        }
        s
    }

    /// Read 8 bytes as a 64-bit integer in little-endian order.
    unsafe fn read_u64_unchecked(&self) -> u64 {
        debug_assert!(self.check_len(8));
        let src = self.as_ref().as_ptr() as *const u64;
        // SAFETY: safe as long as self is at least 8 bytes
        u64::from_le(unsafe { ptr::read_unaligned(src) })
    }

    /// Try to read the next 8 bytes from the slice.
    fn read_u64(&self) -> Option<u64> {
        if self.check_len(8) {
            // SAFETY: self must be at least 8 bytes.
            Some(unsafe { self.read_u64_unchecked() })
        } else {
            None
        }
    }

    /// Calculate the offset of slice from another.
    fn offset_from(&self, other: &Self) -> isize {
        other.as_ref().len() as isize - self.as_ref().len() as isize
    }
}

impl ByteSlice for [u8] {}

/// Helper methods to process mutable bytes.
pub(crate) trait ByteSliceMut: AsMut<[u8]> {
    /// Write a 64-bit integer as 8 bytes in little-endian order.
    unsafe fn write_u64_unchecked(&mut self, value: u64) {
        debug_assert!(self.as_mut().len() >= 8);
        let dst = self.as_mut().as_mut_ptr() as *mut u64;
        // NOTE: we must use `write_unaligned`, since dst is not
        // guaranteed to be properly aligned. Miri will warn us
        // if we use `write` instead of `write_unaligned`, as expected.
        // SAFETY: safe as long as self is at least 8 bytes
        unsafe {
            ptr::write_unaligned(dst, u64::to_le(value));
        }
    }
}

impl ByteSliceMut for [u8] {}

/// Bytes wrapper with specialized methods for ASCII characters.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub(crate) struct AsciiStr<'a> {
    slc: &'a [u8],
}

impl<'a> AsciiStr<'a> {
    pub fn new(slc: &'a [u8]) -> Self {
        Self { slc }
    }

    /// Advance the view by n, advancing it in-place to (n..).
    pub unsafe fn step_by(&mut self, n: usize) -> &mut Self {
        // SAFETY: safe as long n is less than the buffer length
        self.slc = unsafe { self.slc.get_unchecked(n..) };
        self
    }

    /// Advance the view by n, advancing it in-place to (1..).
    pub unsafe fn step(&mut self) -> &mut Self {
        // SAFETY: safe as long as self is not empty
        unsafe { self.step_by(1) }
    }

    /// Iteratively parse and consume digits from bytes.
    pub fn parse_digits(&mut self, mut func: impl FnMut(u8)) {
        while let Some(&c) = self.as_ref().first() {
            let c = c.wrapping_sub(b'0');
            if c < 10 {
                func(c);
                // SAFETY: self cannot be empty
                unsafe {
                    self.step();
                }
            } else {
                break;
            }
        }
    }
}

impl<'a> AsRef<[u8]> for AsciiStr<'a> {
    #[inline]
    fn as_ref(&self) -> &[u8] {
        self.slc
    }
}

impl<'a> ByteSlice for AsciiStr<'a> {}

/// Determine if 8 bytes are all decimal digits.
/// This does not care about the order in which the bytes were loaded.
pub(crate) fn is_8digits(v: u64) -> bool {
    let a = v.wrapping_add(0x4646_4646_4646_4646);
    let b = v.wrapping_sub(0x3030_3030_3030_3030);
    (a | b) & 0x8080_8080_8080_8080 == 0
}

/// Iteratively parse and consume digits from bytes.
pub(crate) fn parse_digits(s: &mut &[u8], mut f: impl FnMut(u8)) {
    while let Some(&c) = s.get(0) {
        let c = c.wrapping_sub(b'0');
        if c < 10 {
            f(c);
            *s = s.advance(1);
        } else {
            break;
        }
    }
}

/// A custom 64-bit floating point type, representing `f * 2^e`.
/// e is biased, so it be directly shifted into the exponent bits.
#[derive(Debug, Copy, Clone, PartialEq, Eq, Default)]
pub struct BiasedFp {
    /// The significant digits.
    pub f: u64,
    /// The biased, binary exponent.
    pub e: i32,
}

impl BiasedFp {
    pub const fn zero_pow2(e: i32) -> Self {
        Self { f: 0, e }
    }
}
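
To see how `is_8digits` and the little-endian 8-byte read fit into the "8 digits at a time using only 3 multiplications" approach mentioned in the commit message, here is a standalone sketch. `is_8digits` is copied from the file above; `load_le`, `parse_8digits`, and `try_parse_8digits` are hypothetical names that do not appear in this file, and the conversion step follows the general technique from the linked article rather than code shown in this diff. It uses a safe copy instead of `read_unaligned` so it runs outside of `core`.

```rust
/// Copied from `is_8digits` in `common.rs` above.
fn is_8digits(v: u64) -> bool {
    let a = v.wrapping_add(0x4646_4646_4646_4646);
    let b = v.wrapping_sub(0x3030_3030_3030_3030);
    (a | b) & 0x8080_8080_8080_8080 == 0
}

/// Safe stand-in for `read_u64`: load the first 8 bytes in little-endian order.
fn load_le(s: &[u8]) -> Option<u64> {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(s.get(..8)?);
    Some(u64::from_le_bytes(buf))
}

/// Hypothetical helper: convert 8 validated ASCII digits to an integer
/// with three multiplications.
fn parse_8digits(v: u64) -> u64 {
    const MASK: u64 = 0x0000_00FF_0000_00FF;
    const MUL1: u64 = 0x000F_4240_0000_0064; // 100 + (1_000_000 << 32)
    const MUL2: u64 = 0x0000_2710_0000_0001; // 1 + (10_000 << 32)
    // ASCII bytes -> digit values 0..=9 in every byte.
    let v = v.wrapping_sub(0x3030_3030_3030_3030);
    // Pair up neighbours: bytes 0, 2, 4, 6 now hold two-digit values 0..=99.
    let v = v.wrapping_mul(10).wrapping_add(v >> 8);
    // Weight the four pairs; the result accumulates in the upper 32 bits.
    let v1 = (v & MASK).wrapping_mul(MUL1);
    let v2 = ((v >> 16) & MASK).wrapping_mul(MUL2);
    v1.wrapping_add(v2) >> 32
}

/// Parse the first 8 bytes of `s` if they are all decimal digits.
fn try_parse_8digits(s: &[u8]) -> Option<u64> {
    let v = load_le(s)?;
    if is_8digits(v) { Some(parse_8digits(v)) } else { None }
}

fn main() {
    assert_eq!(try_parse_8digits(b"12345678"), Some(12_345_678));
    assert_eq!(try_parse_8digits(b"00000042"), Some(42));
    // A non-digit byte or a short slice falls back to the byte-at-a-time loop.
    assert_eq!(try_parse_8digits(b"1234.678"), None);
    assert_eq!(try_parse_8digits(b"1234"), None);
    println!("8-digit parsing ok");
}
```

A caller would typically try this on each 8-byte chunk and fall back to a byte-at-a-time loop such as `parse_digits` for the remainder.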
