-
-
Notifications
You must be signed in to change notification settings - Fork 69
bstr::ByteSlice::chars is significantly slower than str::from_utf8 + str::chars #221
Description
I have byte slices of which I would like to obtain the number of characters, as given by bstr::ByteSlice's chars().
I have noted that when the byte slice is a valid UTF-8 string, it is about 8x faster to convert the byte slice to UTF-8 and then call str::chars() instead:
fn len(b: &[u8]) -> usize {
use bstr::ByteSlice;
// 1. only use `bstr`
//b.chars().count()
// 2. use `bstr` only as fallback if the bytes cannot be converted to UTF-8
core::str::from_utf8(b).map_or_else(|_| b.chars().count(), |s| s.chars().count())
}
fn main() {
let mut s = String::new();
use bstr::ByteSlice;
while len(&s.as_bytes()) < 10000 {
s.push('a');
}
}(I also performed this experiment in a real-world program to exclude that the Rust compiler does some nifty optimizations here that dilute the results, but I still got roughly the same factor of performance difference.)
$ hyperfine -M 5 -L v only,fallback "target/release/bstr-{v}"
Benchmark 1: target/release/bstr-only
Time (mean ± σ): 2.505 s ± 0.016 s [User: 2.495 s, System: 0.001 s]
Range (min … max): 2.487 s … 2.522 s 5 runs
Benchmark 2: target/release/bstr-fallback
Time (mean ± σ): 313.8 ms ± 0.2 ms [User: 311.7 ms, System: 0.9 ms]
Range (min … max): 313.5 ms … 314.0 ms 5 runs
Summary
target/release/bstr-fallback ran
7.98 ± 0.05 times faster than target/release/bstr-only
This came very unexpected for me, to the point that I think it should either be pointed out in the documentation or, even better, be sped up in the implementation.
I have looked at bstr's chars(), but I have not seen obvious ways to speed it up. The question is: Why is str::from_utf8 + str::chars so fast that their combination can walk through the whole bytes twice in 1/8 of the time that bstr walks through once? Is this some dark SIMD magic?
The source of from_utf8 uses run_utf8_validation, do you think it would be worthwhile to adapt bstr's chars() to use something more akin to that? I would be motivated to try that, but first, I would like to hear your opinion on this.