Skip to content

bstr::ByteSlice::chars is significantly slower than str::from_utf8 + str::chars #221

@01mf02

Description

@01mf02

I have byte slices of which I would like to obtain the number of characters, as given by bstr::ByteSlice's chars().
I have noted that when the byte slice is a valid UTF-8 string, it is about 8x faster to convert the byte slice to UTF-8 and then call str::chars() instead:

fn len(b: &[u8]) -> usize {
    use bstr::ByteSlice;
    // 1. only use `bstr`
    //b.chars().count()
    // 2. use `bstr` only as fallback if the bytes cannot be converted to UTF-8
    core::str::from_utf8(b).map_or_else(|_| b.chars().count(), |s| s.chars().count()) 
}

fn main() {
    let mut s = String::new();
    use bstr::ByteSlice;
    while len(&s.as_bytes()) < 10000 {
        s.push('a');
    }
}

(I also performed this experiment in a real-world program to exclude that the Rust compiler does some nifty optimizations here that dilute the results, but I still got roughly the same factor of performance difference.)

$ hyperfine -M 5 -L v only,fallback "target/release/bstr-{v}"
Benchmark 1: target/release/bstr-only
  Time (mean ± σ):      2.505 s ±  0.016 s    [User: 2.495 s, System: 0.001 s]
  Range (min … max):    2.487 s …  2.522 s    5 runs
 
Benchmark 2: target/release/bstr-fallback
  Time (mean ± σ):     313.8 ms ±   0.2 ms    [User: 311.7 ms, System: 0.9 ms]
  Range (min … max):   313.5 ms … 314.0 ms    5 runs
 
Summary
  target/release/bstr-fallback ran
    7.98 ± 0.05 times faster than target/release/bstr-only

This came very unexpected for me, to the point that I think it should either be pointed out in the documentation or, even better, be sped up in the implementation.
I have looked at bstr's chars(), but I have not seen obvious ways to speed it up. The question is: Why is str::from_utf8 + str::chars so fast that their combination can walk through the whole bytes twice in 1/8 of the time that bstr walks through once? Is this some dark SIMD magic?
The source of from_utf8 uses run_utf8_validation, do you think it would be worthwhile to adapt bstr's chars() to use something more akin to that? I would be motivated to try that, but first, I would like to hear your opinion on this.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requesthelp wantedExtra attention is needed

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions