|
1 | | -# StringWa.rs: The Empire Strikes Text |
| 1 | +# StringWa.rs: Text Processing on CPUs & GPUs, in Python & Rust 🦀 |
2 | 2 |
|
3 | 3 |  |
4 | 4 |
|
5 | | -_Not to pick a fight, but let there be String Wars!_ 😅 |
6 | | -Jokes aside, many __great__ libraries for string processing exist. |
7 | | -_Mostly, of course, written in Assembly, C, and C++, but some in Rust as well._ 😅 |
8 | | -And many of those "native" projects also ship first‑class Python bindings — so you'll see their Rust crates and Python wheels side‑by‑side in the comparisons. |
| 5 | +There are many __great__ libraries for string processing! |
| 6 | +Mostly, of course, written in Assembly, C, and C++, but some in Rust as well. 😅 |
9 | 7 |
|
10 | | -Where Rust decimates C and C++, however, is the __simplicity__ of dependency management, making it great for benchmarking "Systems Software" and lining up apples‑to‑apples across native crates and their Python bindings. |
11 | | -So, to accelerate the development of the [`StringZilla`](https://github.com/ashvardanian/StringZilla) C library _(with Rust and Python bindings)_, I've created this repository to compare it against some of my & communities most beloved Rust projects, like: |
| 8 | +Where Rust decimates C and C++ is the __simplicity__ of dependency management, making it great for benchmarking "Systems Software" and lining up apples-to-apples across native crates and their Python bindings.
| 9 | +So, to accelerate the development of the [`StringZilla`](https://github.com/ashvardanian/StringZilla) C, C++, and CUDA libraries (with Rust and Python bindings), I've created this repository to compare it against some of my and the community's most beloved Rust projects, like:
12 | 10 |
|
13 | 11 | - [`memchr`](https://github.com/BurntSushi/memchr) for substring search. |
14 | 12 | - [`rapidfuzz`](https://github.com/rapidfuzz/rapidfuzz-rs) for edit distances. |
15 | | -- [`aHash`](https://github.com/tkaitchuck/aHash) for hashing. |
16 | | -- [`aho_corasick`](https://github.com/BurntSushi/aho-corasick) for multi-pattern search. |
17 | | -- [`tantivy`](https://github.com/quickwit-oss/tantivy) for document retrieval. |
| 13 | +- [`aHash`](https://github.com/tkaitchuck/aHash) and [`crc32fast`](https://github.com/srijs/rust-crc32fast) for hashing. |
| 14 | +- [`aho_corasick`](https://github.com/BurntSushi/aho-corasick) and [`regex`](https://github.com/rust-lang/regex) for multi-search. |
| 15 | +- [`arrow`](https://github.com/apache/arrow-rs) and [`polars`](https://github.com/pola-rs/polars) for collections. |
18 | 16 |
|
19 | 17 | Of course, the functionality of the projects is different, as are the APIs and the usage patterns. |
20 | 18 | So, I focus on the workloads for which StringZilla was designed and compare the throughput of the core operations. |
21 | | -Notably, I also favor modern hardware with support for a wider range SIMD instructions, like mask-equipped AVX-512 on x86 starting from the 2015 Intel Skylake-X CPUs or more recent predicated variable-length SVE and SVE2 on Arm, that aren't supported by most of the existing libraries and Rust tooling. |
| 19 | +Notably, I also favor modern hardware with support for a wider range of SIMD instructions, like mask-equipped AVX-512 on x86 starting from the 2015 Intel Skylake-X CPUs or the more recent predicated variable-length SVE and SVE2 on Arm, which aren't often supported by existing libraries and tooling.
22 | 20 |
|
23 | 21 | > [!IMPORTANT] |
24 | 22 | > The numbers in the tables below are provided for reference only and may vary depending on the CPU, compiler, dataset, and tokenization method. |
@@ -158,24 +156,26 @@ Those operations mostly are implemented using conventional algorithms: |
158 | 156 | - Comparison-based Quicksort or Mergesort for sorting. |
159 | 157 | - Hash-based or Tree-based algorithms for intersections. |
160 | 158 |
|
161 | | -Assuming the comparisons can be accelerated with SIMD and so can be the hash functions, StringZilla could already provide a performance boost in such applications, but starting with v4 it also provides specialized algorithms for sorting and intersections. |
| 159 | +Assuming the compares can be accelerated with SIMD and so can be the hash functions, StringZilla could already provide a performance boost in such applications, but starting with v4 it also provides specialized algorithms for sorting and intersections. |
162 | 160 | Those are directly compatible with arbitrary string-comparable collection types with a support of an indexed access to the elements. |
163 | 161 |
|
164 | | -| Library | Short Words | Long Lines | |
165 | | -| ------------------------------------------- | ---------------------------: | -------------------------: | |
166 | | -| Rust 🦀 | | | |
167 | | -| `std::sort_unstable_by_key` | 54.35 M comparisons/s | 57.70 M comparisons/s | |
168 | | -| `rayon::par_sort_unstable_by_key` on 1x CPU | 47.08 M comparisons/s | 50.35 M comparisons/s | |
169 | | -| `arrow::lexsort_to_indices` | 122.20 M comparisons/s | __84.73 M comparisons/s__ | |
170 | | -| `stringzilla::argsort_permutation` | __182.88 M comparisons/s__ | 74.64 M comparisons/s | |
171 | | -| | | | |
172 | | -| Python 🐍 | | | |
173 | | -| `list.sort` on 1x CPU | 47.06 M comparisons/s | 22.36 M comparisons/s | |
174 | | -| `pandas.Series.sort_values` on 1x CPU | 9.39 M comparisons/s | 11.93 M comparisons/s | |
175 | | -| `pyarrow.compute.sort_indices` on 1x CPU | 62.17 M comparisons/s | 5.53 M comparisons/s | |
176 | | -| `polars.Series.sort` on 1x CPU | 223.38 M comparisons/s | __181.60 M comparisons/s__ | |
177 | | -| `cudf.Series.sort_values` on 1x GPU | __9'463.59 M comparisons/s__ | 66.44 M comparisons/s | |
178 | | -| `stringzilla.Strs.sorted` on 1x CPU | 171.13 M comparisons/s | 77.88 M comparisons/s | |
| 162 | +| Library | Short Words | Long Lines | |
| 163 | +| ------------------------------------------- | ------------------------: | ----------------------: | |
| 164 | +| Rust 🦀 | | | |
| 165 | +| `std::sort_unstable_by_key` | 54.35 M compares/s | 57.70 M compares/s | |
| 166 | +| `rayon::par_sort_unstable_by_key` on 1x CPU | 47.08 M compares/s | 50.35 M compares/s | |
| 167 | +| `polars::Series::sort` | 200.34 M compares/s | 65.44 M compares/s | |
| 168 | +| `polars::Series::arg_sort` | 25.01 M compares/s | 14.05 M compares/s | |
| 169 | +| `arrow::lexsort_to_indices` | 122.20 M compares/s | __84.73 M compares/s__ | |
| 170 | +| `stringzilla::argsort_permutation` | __213.73 M compares/s__ | 74.64 M compares/s | |
| 171 | +| | | | |
| 172 | +| Python 🐍 | | | |
| 173 | +| `list.sort` on 1x CPU | 47.06 M compares/s | 22.36 M compares/s | |
| 174 | +| `pandas.Series.sort_values` on 1x CPU | 9.39 M compares/s | 11.93 M compares/s | |
| 175 | +| `pyarrow.compute.sort_indices` on 1x CPU | 62.17 M compares/s | 5.53 M compares/s | |
| 176 | +| `polars.Series.sort` on 1x CPU | 223.38 M compares/s | __181.60 M compares/s__ | |
| 177 | +| `cudf.Series.sort_values` on 1x GPU | __9'463.59 M compares/s__ | 66.44 M compares/s | |
| 178 | +| `stringzilla.Strs.sorted` on 1x CPU | 171.13 M compares/s | 77.88 M compares/s | |
179 | 179 |
|
180 | 180 | ## Random Generation & Lookup Tables |
181 | 181 |