Commit 90342e6
feat(tip5): Add AVX-512 version of Tip5
On machines that support AVX-512, Tip5 can now be vectorized using those
instructions, with conditional compilation selecting the vectorized code
path at build time. This results in significant performance improvements,
detailed below. Credit goes to the team at https://kompilers.com/
== How To ==
To build the new goodies, use:
```
RUSTFLAGS="-C target-feature=+avx512ifma,+avx512f,+avx512bw,+avx512vbmi" \
  cargo build
```
If you are compiling on a machine that has all the necessary features,
you can use the following instead:
```
RUSTFLAGS="-C target-cpu=native" cargo build
```
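Both sets of flags work by setting `target_feature` cfg values at compile time. Conditional compilation can then pick a code path from those values. The following is a minimal sketch of that pattern. It assumes one AVX-512 module and one portable fallback. The names `backend`, `backend_name` and `HAS_AVX512` are hypothetical and are not taken from the Tip5 sources.

```rust
// True only if every AVX-512 extension the vectorized path relies on was
// enabled at compile time (via -C target-feature=... or -C target-cpu=native).
const HAS_AVX512: bool = cfg!(all(
    target_arch = "x86_64",
    target_feature = "avx512f",
    target_feature = "avx512bw",
    target_feature = "avx512ifma",
    target_feature = "avx512vbmi"
));

// Hypothetical vectorized backend: compiled only when all features are on.
#[cfg(all(
    target_arch = "x86_64",
    target_feature = "avx512f",
    target_feature = "avx512bw",
    target_feature = "avx512ifma",
    target_feature = "avx512vbmi"
))]
mod backend {
    pub fn backend_name() -> &'static str {
        "avx512"
    }
}

// Hypothetical portable fallback: compiled in every other configuration.
#[cfg(not(all(
    target_arch = "x86_64",
    target_feature = "avx512f",
    target_feature = "avx512bw",
    target_feature = "avx512ifma",
    target_feature = "avx512vbmi"
)))]
mod backend {
    pub fn backend_name() -> &'static str {
        "portable"
    }
}

// Callers see a single name, whichever module was compiled in.
pub use backend::backend_name;
```

A plain `cargo build` on an AVX-512 machine still reports `"portable"` here. The gotcha below explains why.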
== Performance ==
You can expect the following raw performance gains:
```
hash_10
time: [336.29 ns 336.92 ns 337.72 ns]
change: [−31.354% −31.251% −31.139%] (p = 0.00 < 0.05)
Performance has improved.
hash_varlen/len/16384
time: [526.63 µs 527.02 µs 527.51 µs]
change: [−32.321% −32.151% −32.017%] (p = 0.00 < 0.05)
Performance has improved.
hash_parallel/len/65536
time: [2.0350 ms 2.0502 ms 2.0656 ms]
change: [−10.764% −8.6165% −5.8573%] (p = 0.00 < 0.05)
Performance has improved.
```
In addition, there are now specialized functions for SIMD hashing. Since
those functions didn't exist before, they cannot be compared quite as
easily. For example, we have `hash_varlen_many`:
```
hash_varlen_many/(len, N)/(16384, 2)
time: [783.82 µs 787.46 µs 791.45 µs]
```
This takes roughly 25% less time than the already sped-up `hash_varlen`
needs for the same work (787 µs vs. 2 · 527 µs = 1054 µs). Note that
using these new functions requires changes in the calling code, whereas
the functions that existed already get faster without any changes.
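The ~25% figure compares one batched call against two sequential calls, using the Criterion medians quoted above. The following sketch reproduces that arithmetic. The helper name `batched_time_saving` is ours and exists only for illustration.

```rust
/// Fraction of time saved by one batched call over `n` inputs, compared with
/// `n` sequential single-input calls. Both times must use the same unit.
fn batched_time_saving(batched: f64, single: f64, n: u32) -> f64 {
    let sequential = single * f64::from(n);
    1.0 - batched / sequential
}
```

With the numbers above, `batched_time_saving(787.46, 527.02, 2)` comes out at about 0.253, i.e. roughly 25% less time.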
All benchmarks were performed on the machine known as megingjord.
== Gotchas ==
Note that Rust is somewhat conservative when it comes to enabling
CPU-architecture-specific features. In particular, even if you compile
this code on a CPU that has all the necessary AVX-512 features, rustc
does not enable them, in anticipation of the binary being distributed to
other machines. Thus, you must pass `RUSTFLAGS="-C target-cpu=native"`
(or the explicit `target-feature` list above) to get the vectorized
version of Tip5. In the future, we can use dynamic dispatch
(https://doc.rust-lang.org/std/arch/#dynamic-cpu-feature-detection)
instead, but the current changes don't include that.
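The following is a rough sketch of the linked dynamic-dispatch approach, which this commit does not implement. It detects the required extensions once at runtime with `is_x86_feature_detected!` and then caches a function pointer. `hash_portable` and `hash_avx512` are hypothetical placeholders: a toy fold stands in for Tip5, and it is not the real permutation. A real AVX-512 function would also carry `#[target_feature(enable = "...")]`. That attribute is omitted here because AVX-512 target features may require a recent toolchain.

```rust
use std::sync::OnceLock;

/// Signature shared by every backend (hypothetical; Tip5's real API differs).
type HashFn = fn(&[u64]) -> u64;

/// Hypothetical portable backend: a toy fold standing in for Tip5.
fn hash_portable(input: &[u64]) -> u64 {
    input
        .iter()
        .fold(0u64, |acc, &x| acc.wrapping_mul(31).wrapping_add(x))
}

/// Hypothetical AVX-512 backend. Real code would annotate this with
/// #[target_feature(enable = "avx512f,avx512bw,avx512ifma,avx512vbmi")]
/// and must return exactly what the portable backend returns.
fn hash_avx512(input: &[u64]) -> u64 {
    hash_portable(input)
}

/// Runtime check for every extension the vectorized path relies on.
#[cfg(target_arch = "x86_64")]
fn cpu_has_avx512() -> bool {
    is_x86_feature_detected!("avx512f")
        && is_x86_feature_detected!("avx512bw")
        && is_x86_feature_detected!("avx512ifma")
        && is_x86_feature_detected!("avx512vbmi")
}

/// Non-x86_64 targets never take the AVX-512 path.
#[cfg(not(target_arch = "x86_64"))]
fn cpu_has_avx512() -> bool {
    false
}

/// Picks a backend on first use and caches the choice for later calls.
fn backend() -> HashFn {
    static BACKEND: OnceLock<HashFn> = OnceLock::new();
    *BACKEND.get_or_init(|| {
        if cpu_has_avx512() {
            hash_avx512
        } else {
            hash_portable
        }
    })
}

/// Public entry point: one binary, best implementation the CPU supports.
pub fn hash(input: &[u64]) -> u64 {
    backend()(input)
}
```

Callers only use `hash`. The feature check runs once, and the same binary runs on machines with and without AVX-512 without any `RUSTFLAGS`.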