
Commit e830683

Update docs and benchmarks for v0.1.2 release (#42)
1 parent d9c06cf commit e830683

33 files changed

+15462
-32028
lines changed

Cargo.toml

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -29,5 +29,7 @@ aarch64_neon = []
2929
hints = []
3030

3131
[package.metadata.docs.rs]
32-
features = ["public_imp"]
32+
all-features = true
3333
rustdoc-args = ["--cfg", "docsrs"]
34+
default-target = "x86_64-unknown-linux-gnu"
35+
targets = ["aarch64-unknown-linux-gnu"]

README.md

Lines changed: 59 additions & 45 deletions
Original file line numberDiff line numberDiff line change
@@ -2,43 +2,49 @@
22
[![crates.io](https://img.shields.io/crates/v/simdutf8.svg)](https://crates.io/crates/simdutf8)
33
[![docs.rs](https://docs.rs/simdutf8/badge.svg)](https://docs.rs/simdutf8)
44

5-
# simdutf8 – High-speed UTF-8 validation for Rust
5+
# simdutf8 – High-speed UTF-8 validation
66

77
Blazingly fast API-compatible UTF-8 validation for Rust using SIMD extensions, based on the implementation from
8-
[simdjson](https://github.com/simdjson/simdjson). Originally ported to Rust by the developers of [simd-json.rs](https://simd-json.rs).
8+
[simdjson](https://github.com/simdjson/simdjson). Originally ported to Rust by the developers of [simd-json.rs](https://simd-json.rs), but now heavily improved.
99

10-
## Disclaimer
11-
This software should not (yet) be used in production, though it has been tested with sample data as well as
12-
fuzzing and there are no known bugs.
10+
## Status
11+
This library has been thoroughly tested with sample data as well as fuzzing and there are no known bugs.
1312

1413
## Features
1514
* `basic` API for the fastest validation, optimized for valid UTF-8
1615
* `compat` API as a fully compatible replacement for `std::str::from_utf8()`
17-
* Up to 22 times faster than the std library on non-ASCII, up to three times faster on ASCII
18-
* As fast as or faster than the original simdjson implementation
19-
* Supports AVX 2 and SSE 4.2 implementations on x86 and x86-64. ARMv7 and ARMv8 neon support is planned
20-
* Selects the fastest implementation at runtime based on CPU support
16+
* Supports AVX 2 and SSE 4.2 implementations on x86 and x86-64
17+
* 🆕 ARM64 (Aarch64) SIMD is supported with Rust nightly (use feature `aarch64_neon`)
18+
* x86-64: Up to 23 times faster than the std library on valid non-ASCII, up to four times faster on ASCII
19+
* aarch64: Up to eleven times faster than the std library on valid non-ASCII, up to four times faster on ASCII (Apple Silicon)
20+
* Faster than the original simdjson implementation
21+
* Selects the fastest implementation at runtime based on CPU support (on x86)
22+
* Falls back to the excellent std implementation if SIMD extensions are not supported
2123
* Written in pure Rust
2224
* No dependencies
2325
* No-std support
24-
* Falls back to the excellent std implementation if SIMD extensions are not supported
2526

2627
## Quick start
2728
Add the dependency to your Cargo.toml file:
2829
```toml
2930
[dependencies]
30-
simdutf8 = { version = "0.1.1" }
31+
simdutf8 = { version = "0.1.2" }
32+
```
33+
or on ARM64 with Rust Nightly:
34+
```toml
35+
[dependencies]
36+
simdutf8 = { version = "0.1.2", features = ["aarch64_neon"] }
3137
```
3238

33-
Use `simdutf8::basic::from_utf8` as a drop-in replacement for `std::str::from_utf8()`.
39+
Use `simdutf8::basic::from_utf8()` as a drop-in replacement for `std::str::from_utf8()`.
3440

3541
```rust
3642
use simdutf8::basic::from_utf8;
3743

3844
println!("{}", from_utf8(b"I \xE2\x9D\xA4\xEF\xB8\x8F UTF-8!").unwrap());
3945
```
4046

41-
If you need detailed information on validation failures, use `simdutf8::compat::from_utf8`
47+
If you need detailed information on validation failures, use `simdutf8::compat::from_utf8()`
4248
instead.
4349

4450
```rust
@@ -57,16 +63,18 @@ for errors after processing the whole byte sequence and does not provide detaile
5763
is not valid UTF-8. `simdutf8::basic::Utf8Error` is a zero-sized error struct.
5864

5965
### Compat flavor
60-
The `compat` flavor is fully API-compatible with `std::str::from_utf8`. In particular, `simdutf8::compat::from_utf8()`
66+
The `compat` flavor is fully API-compatible with `std::str::from_utf8()`. In particular, `simdutf8::compat::from_utf8()`
6167
returns a `simdutf8::compat::Utf8Error`, which has `valid_up_to()` and `error_len()` methods. The first is useful for
6268
verification of streamed data. The second is useful e.g. for replacing invalid byte sequences with a replacement character.
6369

6470
It also fails early: errors are checked on the fly as the string is processed and once
6571
an invalid UTF-8 sequence is encountered, it returns without processing the rest of the data.
66-
This comes at a performance penalty compared to the `basic` API even if the input is valid UTF-8.
72+
This comes at a slight performance penalty compared to the `basic` API even if the input is valid UTF-8.
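As a rough illustration of how `valid_up_to()` and `error_len()` can be combined to replace invalid byte sequences, here is a minimal sketch. It uses `std::str::from_utf8()` so that it runs without any dependency; since the `compat` flavor is described above as API-compatible, swapping the import for `simdutf8::compat::from_utf8` should work the same way, but that swap is an assumption and is not tested here. The function name `decode_lossy` is hypothetical.

```rust
// Assumption: with the crate as a dependency, this import could be replaced by
// `use simdutf8::compat::from_utf8;` because the compat API mirrors std.
use std::str::from_utf8;

/// Hypothetical helper: decodes `input`, substituting U+FFFD for every
/// invalid byte sequence.
fn decode_lossy(mut input: &[u8]) -> String {
    let mut out = String::with_capacity(input.len());
    loop {
        match from_utf8(input) {
            Ok(valid) => {
                out.push_str(valid);
                return out;
            }
            Err(err) => {
                // Everything before `valid_up_to()` is known to be valid UTF-8.
                let (valid, rest) = input.split_at(err.valid_up_to());
                out.push_str(from_utf8(valid).expect("prefix was reported as valid"));
                out.push('\u{FFFD}');
                match err.error_len() {
                    // Skip the invalid sequence and continue after it.
                    Some(len) => input = &rest[len..],
                    // `None` means the input ended in the middle of a character.
                    None => return out,
                }
            }
        }
    }
}
```

For example, `decode_lossy(b"ab\xFFcd")` yields `"ab\u{FFFD}cd"`.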
6773

6874
## Implementation selection
69-
The fastest implementation is selected at runtime using the `std::is_x86_feature_detected!` macro unless the CPU
75+
76+
### X86
77+
The fastest implementation is selected at runtime using the `std::is_x86_feature_detected!` macro, unless the CPU
7078
targeted by the compiler supports the fastest available implementation.
7179
So if you compile with `RUSTFLAGS="-C target-cpu=native"` on a recent x86-64 machine, the AVX 2 implementation is selected at
7280
compile-time and runtime selection is disabled.
@@ -76,10 +84,18 @@ the targeted CPU. Use `RUSTFLAGS="-C target-feature=+avx2"` for the AVX 2 implem
7684
for the SSE 4.2 implementation.
7785

7886
If you want to be able to call a SIMD implementation directly, use the `public_imp` feature flag. The validation
79-
implementations are then accessible via `simdutf8::(basic|compat)::imp::x86::(avx2|sse42)::validate_utf8()`.
87+
implementations are then accessible via `simdutf8::{basic, compat}::imp::x86::{avx2, sse42}::validate_utf8()`.
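The runtime selection described above can be sketched roughly as follows. This is not the crate's actual dispatch code: the function name `selected_implementation` and the returned labels are hypothetical, and the compile-time shortcut (skipping detection when the target CPU already guarantees AVX 2) is left out for brevity.

```rust
/// Hypothetical helper: reports which implementation a runtime check
/// would pick on the current machine.
fn selected_implementation() -> &'static str {
    // The detection macro only exists on x86 and x86-64 targets.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx2") {
            return "avx2";
        }
        if std::is_x86_feature_detected!("sse4.2") {
            return "sse4.2";
        }
    }
    // No supported SIMD extension detected: fall back to the std validator.
    "std fallback"
}
```

Checking the widest extension first means the fastest available implementation wins; the fallback branch is also what every non-x86 target gets in this sketch.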
88+
89+
### ARM64
90+
For ARM64 support, nightly Rust is needed and the crate feature `aarch64_neon` needs to be enabled. Caveat: if this feature is
91+
not turned on, the non-SIMD std library implementation is used.
92+
93+
If you want to be able to call a SIMD implementation directly, use the `public_imp` feature flag. The validation implementations
94+
are then accessible via `simdutf8::{basic, compat}::imp::aarch64::neon::validate_utf8()`.
8095

81-
## When not to use
82-
This library uses unsafe code which has not been battle-tested and should not (yet) be used in production.
96+
## Optimisation flags
97+
Do not use [`opt-level = "z"`](https://doc.rust-lang.org/cargo/reference/profiles.html), which prevents inlining and makes
98+
the code quite slow.
8399

84100
## Minimum Supported Rust Version (MSRV)
85101
This crate's minimum supported Rust version is 1.38.0.
@@ -90,50 +106,48 @@ are created with [critcmp](https://github.com/BurntSushi/critcmp). Source code a
90106
[bench directory](https://github.com/rusticstuff/simdutf8/tree/main/bench).
91107

92108
The naming schema is id-charset/size. _0-empty_ is the empty byte slice, _x-error/66536_ is a 64KiB slice where the very
93-
first character is invalid UTF-8. All benchmarks were run on a laptop with an Intel Core i7-10750H CPU (Comet Lake) on
94-
Windows with Rust 1.51.0 if not otherwise stated. Library versions are simdutf8 v0.1.1 and simdjson v0.9.2. When comparing
109+
first character is invalid UTF-8. Library versions are simdutf8 v0.1.2 and simdjson v0.9.2. When comparing
95110
with simdjson, simdutf8 is compiled with `#[inline(never)]`.
96111

97-
### simdutf8 basic vs std library UTF-8 validation
98-
![critcmp stimdutf8 v0.1.1 basic vs std lib](https://user-images.githubusercontent.com/3736990/116121179-a8271f80-a6c0-11eb-9b2b-6233c3c824f2.png)
99-
simdutf8 performs better or as well as the std library.
112+
Configurations:
113+
* X86-64: PC with an AMD Ryzen 7 PRO 3700 CPU (Zen2) on Linux with Rust 1.52.0
114+
* Aarch64: MacBook Air with an Apple M1 CPU (Apple Silicon) on macOS with rustc 1.54.0-nightly (881c1ac40 2021-05-08).
100115

101-
### simdutf8 basic vs simdjson UTF-8 validation on Intel Comet Lake
102-
![critcmp stimdutf8 v0.1.1 basic vs simdjson WSL](https://user-images.githubusercontent.com/3736990/116121748-38656480-a6c1-11eb-8cb4-385c7516a46a.png)
103-
simdutf8 beats simdjson on almost all inputs on this CPU. This benchmark is run on
104-
[WSL](https://docs.microsoft.com/en-us/windows/wsl/install-win10)
105-
since I could not get simdjson to reach maximum performance on Windows with any C++ toolchain (see also simdjson issues
106-
[847](https://github.com/simdjson/simdjson/issues/847) and [848](https://github.com/simdjson/simdjson/issues/848)).
116+
### simdutf8 basic vs std library on x86-64 (AMD Zen2)
117+
![image](https://user-images.githubusercontent.com/3736990/117568104-1c00f900-b0bf-11eb-938f-4c253d192480.png)
118+
Simdutf8 is up to 23 times faster than the std library on valid non-ASCII, up to four times faster on pure ASCII.
107119

108-
### simdutf8 basic vs simdjson UTF-8 validation on AMD Zen 2
109-
![critcmp stimdutf8 v0.1.1 basic vs simdjson AMD Zen 2](https://user-images.githubusercontent.com/3736990/116122729-731bcc80-a6c2-11eb-82a5-6e297778a1c4.png)
120+
### simdutf8 basic vs std library on aarch64 (Apple Silicon)
121+
![image](https://user-images.githubusercontent.com/3736990/117568160-42bf2f80-b0bf-11eb-86a4-9aeee4cee87d.png)
122+
Simdutf8 is up to eleven times faster than the std library on valid non-ASCII, up to four times faster on
123+
pure ASCII.
110124

111-
On AMD Zen 2 aligning reads apparently does not matter at all. The extra step for aligning even hurts performance a bit around
112-
an input size of 4096.
125+
### simdutf8 basic vs simdjson on x86-64
126+
![image](https://user-images.githubusercontent.com/3736990/117568231-80bc5380-b0bf-11eb-8e90-1dcc6d966ebd.png)
127+
Simdutf8 is faster than simdjson on almost all inputs.
113128

114-
### simdutf8 basic vs simdutf8 compat UTF-8 validation
115-
![image](https://user-images.githubusercontent.com/3736990/116122427-0dc7db80-a6c2-11eb-8434-f9879742d90d.png)
129+
### simdutf8 basic vs simdutf8 compat UTF-8 on x86-64
130+
![image](https://user-images.githubusercontent.com/3736990/117568270-af3a2e80-b0bf-11eb-8ec4-e5a0a4ad7210.png)
116131
There is a small performance penalty to continuously checking the error status while processing data, but detecting
117132
errors early provides a huge benefit for the _x-error/66536_ benchmark.
118133

119134
## Technical details
120-
On X86 for inputs shorter than 64 bytes validation is delegated to `core::str::from_utf8()`.
135+
For inputs shorter than 64 bytes, validation is delegated to `core::str::from_utf8()`, except for the direct-access
136+
functions in `simdutf8::{basic, compat}::imp`.
121137

122-
The SIMD implementation is similar to the one in simdjson except that it aligns reads to the block size of the
123-
SIMD extension, which leads to better peak performance compared to the implementation in simdjson on some CPUs.
124-
This alignment means that an incomplete block needs to be processed before the aligned data is read, which
125-
leads to worse performance on byte sequences shorter than 2048 bytes. Thus, aligned reads are only used with
126-
2048 bytes of data or more. Incomplete reads for the first unaligned and the last incomplete block are done in
127-
two aligned 64-byte buffers.
138+
The SIMD implementation is mostly similar to the one in simdjson except that it has additional optimizations
139+
for the pure ASCII case. Also it uses prefetch with AVX 2 on x86 which leads to slightly better performance with
140+
some Intel CPUs on synthetic benchmarks.
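To make the short-input delegation and the idea of a pure-ASCII fast path concrete, here is a scalar sketch. It is an assumption-laden illustration and not the crate's SIMD code: the names `validate_sketch` and `is_ascii_block` are hypothetical, the 64-byte block size is taken from the text above, and the non-ASCII remainder is simply handed to `core::str::from_utf8()` instead of a SIMD validator.

```rust
const BLOCK: usize = 64;

/// Hypothetical helper: a block is pure ASCII if no byte has its high bit set.
fn is_ascii_block(block: &[u8]) -> bool {
    // OR-ing all bytes together preserves any high bit that was set.
    block.iter().fold(0u8, |acc, &b| acc | b) < 0x80
}

/// Hypothetical scalar stand-in for the validation entry point.
fn validate_sketch(input: &[u8]) -> bool {
    // Short inputs are delegated to the core validator.
    if input.len() < BLOCK {
        return core::str::from_utf8(input).is_ok();
    }
    // Fast path: skip leading blocks that consist of ASCII only.
    let mut pos = 0;
    while pos + BLOCK <= input.len() && is_ascii_block(&input[pos..pos + BLOCK]) {
        pos += BLOCK;
    }
    // Everything before `pos` is ASCII, so `pos` is a character boundary and
    // the remainder can be validated on its own.
    core::str::from_utf8(&input[pos..]).is_ok()
}
```

The point of the design is that an ASCII check needs only one OR and one comparison per block, so inputs that are mostly ASCII avoid the full multi-byte sequence checks.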
128141

129-
For the compat API, we need to check the error buffer on each 64-byte block instead of just aggregating it. If an
142+
For the compat API, we need to check the error status vector on each 64-byte block instead of just aggregating it. If an
130143
error is found, the last bytes of the previous block are checked for a cross-block continuation and then
131144
`std::str::from_utf8()` is run to find the exact location of the error.
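The error-location step for the compat API can be sketched as below. This is a hedged reconstruction from the description above, not the crate's code: `locate_error` and its `block_start` parameter are hypothetical, and the sketch assumes that all blocks before `block_start` raised no error apart from a character possibly left incomplete at the block's end.

```rust
/// Hypothetical helper: given the start offset of the block in which an error
/// was flagged, returns `(valid_up_to, error_len)` for the whole input, or
/// `None` if the rest of the input turns out to be valid.
fn locate_error(input: &[u8], block_start: usize) -> Option<(usize, Option<usize>)> {
    let block_start = block_start.min(input.len());
    let mut start = block_start;
    // A UTF-8 character has at most 4 bytes, so at most 3 of them can sit
    // at the end of the previous block.
    for back in 1..=block_start.min(3) {
        let byte = input[block_start - back];
        if byte < 0x80 {
            break; // ASCII byte: no character straddles the block boundary
        }
        if byte >= 0xC0 {
            start = block_start - back; // lead byte: re-validate from here
            break;
        }
        // Otherwise a continuation byte (0x80..=0xBF): keep looking backwards.
    }
    // Let the std validator find the exact position, then shift it by `start`.
    match std::str::from_utf8(&input[start..]) {
        Ok(_) => None,
        Err(err) => Some((start + err.valid_up_to(), err.error_len())),
    }
}
```

Backing up to the lead byte keeps the re-validation aligned to a character boundary, so the offsets reported match what `std::str::from_utf8()` would report for the full input under the stated assumption.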
132145

133146
Care is taken that all functions are properly inlined up to the public interface.
134147

135148
## Thanks
136-
* to the authors of simdjson for coming up with the high-performance SIMD implementation.
149+
* to the authors of simdjson for coming up with the high-performance SIMD implementation and in particular to Daniel Lemire
150+
for his feedback. It was very helpful.
137151
* to the authors of the simdjson Rust port who did most of the heavy lifting of porting the C++ code to Rust.
138152

139153

TODO.md

Lines changed: 3 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1,27 +1,19 @@
11
# TODO
22

33
# LATER
4-
* test coverage
5-
* more fuzz testing
64
* armv7 support (with neon runtime selection?)
7-
* fuzzers: extract common code into crate/module and add honggfuzz
85
* streaming API + experimental simdjson support
96
* faster/smarter error position detection
107
* try out [multiversion](https://docs.rs/multiversion/0.6.1/multiversion/)
11-
12-
# NEXT
8+
* test all available stable implementations by default as if public_imp were specified
9+
* try proptests again
1310
* clean up algorithm src.
1411
* document prev()
1512
* newtype -> use (mostly)
1613
* use imports instead of fully qualified at places
1714
* trait for SimdU8Value impl.
1815
* bikeshed: SimdU8Value -> SimdU8Vector | SimdU8xNative | ...
1916

20-
* test all available stable implementations by default as if public_imp were specified
21-
* document aarch64; docs-rs arch building
22-
* discourage -Oz
23-
* std handling: no-std + extern crate std if std
24-
* Doc: remove for Rust from README header but not from description
25-
* proptests?
17+
# NEXT
2618

2719
# OTHER
