`bstr::ByteSlice::chars` is significantly slower than `str::from_utf8` + `str::chars`

I have byte slices of which I would like to obtain the number of characters, as given by `bstr::ByteSlice`'s `chars()`.
I have noted that when the byte slice is a valid UTF-8 string, it is about **8x** faster to convert the byte slice to UTF-8 and then call `str::chars()` instead:

~~~ rust
fn len(b: &[u8]) -> usize {
    use bstr::ByteSlice;
    // 1. only use `bstr`
    //b.chars().count()
    // 2. use `bstr` only as fallback if the bytes cannot be converted to UTF-8
    core::str::from_utf8(b).map_or_else(|_| b.chars().count(), |s| s.chars().count()) 
}

fn main() {
    let mut s = String::new();
    use bstr::ByteSlice;
    while len(&s.as_bytes()) < 10000 {
        s.push('a');
    }
}
~~~

(I also performed this experiment in a real-world program to exclude that the Rust compiler does some nifty optimizations here that dilute the results, but I still got roughly the same factor of performance difference.)

~~~
$ hyperfine -M 5 -L v only,fallback "target/release/bstr-{v}"
Benchmark 1: target/release/bstr-only
  Time (mean ± σ):      2.505 s ±  0.016 s    [User: 2.495 s, System: 0.001 s]
  Range (min … max):    2.487 s …  2.522 s    5 runs
 
Benchmark 2: target/release/bstr-fallback
  Time (mean ± σ):     313.8 ms ±   0.2 ms    [User: 311.7 ms, System: 0.9 ms]
  Range (min … max):   313.5 ms … 314.0 ms    5 runs
 
Summary
  target/release/bstr-fallback ran
    7.98 ± 0.05 times faster than target/release/bstr-only
~~~

This came very unexpected for me, to the point that I think it should either be pointed out in the documentation or, even better, be sped up in the implementation.
I have looked at `bstr`'s `chars()`, but I have not seen obvious ways to speed it up. The question is: Why is `str::from_utf8` + `str::chars` so fast that their combination can walk through the whole bytes _twice_ in 1/8 of the time that `bstr` walks through once? Is this some dark SIMD magic?
The source of `from_utf8` uses [`run_utf8_validation`](https://doc.rust-lang.org/src/core/str/validations.rs.html#126), do you think it would be worthwhile to adapt `bstr`'s `chars()` to use something more akin to that? I would be motivated to try that, but first, I would like to hear your opinion on this.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

`bstr::ByteSlice::chars` is significantly slower than `str::from_utf8` + `str::chars` #221

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

bstr::ByteSlice::chars is significantly slower than str::from_utf8 + str::chars #221

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

`bstr::ByteSlice::chars` is significantly slower than `str::from_utf8` + `str::chars` #221