@@ -14,8 +14,8 @@ fuzzing and there are no known bugs.
1414## Features
1515* `basic` API for the fastest validation, optimized for valid UTF-8
1616* `compat` API as a fully compatible replacement for `std::str::from_utf8()`
17- * Up to twenty times faster than the std library on non-ASCII, up to twice as fast on ASCII <-TBD!
18- * Up to 28% faster on non-ASCII input compared to the original simdjson implementation on some CPUs
17+ * Up to 22 times faster than the std library on non-ASCII, up to three times faster on ASCII
18+ * As fast as or faster than the original simdjson implementation
1919* Supports AVX 2 and SSE 4.2 implementations on x86 and x86-64. ARMv7 and ARMv8 neon support is planned
2020* Selects the fastest implementation at runtime based on CPU support
2121* Written in pure Rust
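
Since the `compat` API is described above as a fully compatible replacement for `std::str::from_utf8()`, it can help to see what error information the std function reports. The following is a minimal sketch using only the standard library; the helper name `check` is hypothetical and not part of simdutf8.

```rust
use std::str::from_utf8;

/// Hypothetical helper: returns the number of chars for valid input,
/// or `(valid_up_to, error_len)` as reported by the std library's `Utf8Error`.
fn check(input: &[u8]) -> Result<usize, (usize, Option<usize>)> {
    match from_utf8(input) {
        Ok(s) => Ok(s.chars().count()),
        // `valid_up_to()` is the length of the valid prefix;
        // `error_len()` is `None` when the input ends in the middle of a sequence.
        Err(e) => Err((e.valid_up_to(), e.error_len())),
    }
}
```

A stray `0xFF` byte after two valid bytes yields `Err((2, Some(1)))`, while a three-byte sequence truncated at the end of the input yields `Err((2, None))`.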
@@ -75,7 +75,7 @@ For no-std support (compiled with `--no-default-features`) the implementation is
7575the targeted CPU. Use `RUSTFLAGS="-C target-feature=+avx2"` for the AVX 2 implementation or `RUSTFLAGS="-C target-feature=+sse4.2"`
7676for the SSE 4.2 implementation.
7777
78- If you want to be able to call A SIMD implementation directly, use the ` public_imp ` feature flag. The validation
78+ If you want to be able to call a SIMD implementation directly, use the `public_imp` feature flag. The validation
7979implementations are then accessible via `simdutf8::(basic|compat)::imp::x86::(avx2|sse42)::validate_utf8()`.
8080
8181## When not to use
@@ -85,26 +85,34 @@ This library uses unsafe code which has not been battle-tested and should not (y
8585This crate's minimum supported Rust version is 1.38.0.
8686
8787## Benchmarks
88- TBD!
9088The benchmarks have been done with [criterion](https://bheisler.github.io/criterion.rs/book/index.html), the tables
9089are created with [critcmp](https://github.com/BurntSushi/critcmp). Source code and data are in the
9190[bench directory](https://github.com/rusticstuff/simdutf8/tree/main/bench).
9291
9392The name schema is id-charset/size. _0-empty_ is the empty byte slice, _x-error/66536_ is a 64KiB slice where the very
9493first character is invalid UTF-8. All benchmarks were run on a laptop with an Intel Core i7-10750H CPU (Comet Lake) on
95- Windows with Rust 1.51.0. Library versions are simdutf8 v0.1.0 and simdjson v0.9.2.
94+ Windows with Rust 1.51.0 if not otherwise stated. Library versions are simdutf8 v0.1.1 and simdjson v0.9.2. When comparing
95+ with simdjson, simdutf8 is compiled with `#[inline(never)]`.
9696
9797### simdutf8 basic vs std library UTF-8 validation
98- ![critcmp stimdutf8 basic vs std lib](https://raw.githubusercontent.com/rusticstuff/simdutf8/main/img/basic-vs-std.png)
99- simdutf8 performs better except for inputs ≤ 64 bytes.
98+ ![critcmp simdutf8 v0.1.1 basic vs std lib](https://user-images.githubusercontent.com/3736990/116121179-a8271f80-a6c0-11eb-9b2b-6233c3c824f2.png)
99+ simdutf8 performs as well as or better than the std library.
100100
101- ### simdutf8 basic vs simdjson UTF-8 validation
102- ![ critcmp st lib vs stimdutf8 basic] ( https://raw.githubusercontent.com/rusticstuff/simdutf8/main/img/basic-vs-simdjson.png )
103- simdutf8 is faster than simdjson except for some crazy optimization by clang for the pure ASCII
104- loop (to be investigated). simdjson is compiled using clang and gcc from MSYS.
101+ ### simdutf8 basic vs simdjson UTF-8 validation on Intel Comet Lake
102+ ![critcmp simdutf8 v0.1.1 basic vs simdjson WSL](https://user-images.githubusercontent.com/3736990/116121748-38656480-a6c1-11eb-8cb4-385c7516a46a.png)
103+ simdutf8 beats simdjson on almost all inputs on this CPU. This benchmark is run on
104+ [WSL](https://docs.microsoft.com/en-us/windows/wsl/install-win10)
105+ since I could not get simdjson to reach maximum performance on Windows with any C++ toolchain (see also simdjson issues
106+ [847](https://github.com/simdjson/simdjson/issues/847) and [848](https://github.com/simdjson/simdjson/issues/848)).
107+
108+ ### simdutf8 basic vs simdjson UTF-8 validation on AMD Zen 2
109+ ![critcmp simdutf8 v0.1.1 basic vs simdjson AMD Zen 2](https://user-images.githubusercontent.com/3736990/116122729-731bcc80-a6c2-11eb-82a5-6e297778a1c4.png)
110+
111+ On AMD Zen 2, aligning reads apparently does not matter at all. The extra alignment step even hurts performance slightly around
112+ an input size of 4096.
105113
106114### simdutf8 basic vs simdutf8 compat UTF-8 validation
107- ![critcmp st lib vs stimdutf8 basic](https://raw.githubusercontent.com/rusticstuff/simdutf8/main/img/basic-vs-compat.png)
115+ ![image](https://user-images.githubusercontent.com/3736990/116122427-0dc7db80-a6c2-11eb-8434-f9879742d90d.png)
108116There is a small performance penalty to continuously checking the error status while processing data, but detecting
109117errors early provides a huge benefit for the _x-error/66536_ benchmark.
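
The trade-off described above can be illustrated with a deliberately simplified sketch. It is not simdutf8's code: it uses an ASCII-only check as a stand-in for UTF-8 validation, and the function names and the 64-byte block size are assumptions chosen for illustration. The first function folds all bytes into one accumulator and checks it once at the end; the second checks after every block and can stop at the first bad block.

```rust
/// Deferred check (hypothetical stand-in): accumulate over the whole input
/// and inspect the error state only once at the end.
fn is_ascii_deferred(data: &[u8]) -> bool {
    let mut acc = 0u8;
    for &b in data {
        acc |= b;
    }
    acc & 0x80 == 0
}

/// Early check (hypothetical stand-in): inspect the error state after every
/// 64-byte block and return the index of the first block containing an error.
fn first_non_ascii_block(data: &[u8]) -> Option<usize> {
    for (i, block) in data.chunks(64).enumerate() {
        let mut acc = 0u8;
        for &b in block {
            acc |= b;
        }
        if acc & 0x80 != 0 {
            return Some(i); // stop here; the rest of the input is never read
        }
    }
    None
}
```

When the very first byte is invalid, the early variant returns after one block, while the deferred variant still reads the whole slice. On valid input the early variant pays for one extra branch per block.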
110118