Commit 90342e6

feat(tip5): Add AVX-512 version of Tip5
On machines that support it, and using conditional compilation, Tip5 can be vectorized using AVX-512, which results in significant performance improvements. Credit goes to the team at https://kompilers.com/

== How To ==

To build the new goodies, use:

RUSTFLAGS="-C target-feature=+avx512ifma,+avx512f,+avx512bw,+avx512vbmi" cargo build

If you are compiling on a machine that has all the necessary features, you can use the following instead:

RUSTFLAGS="-C target-cpu=native" cargo build

== Performance ==

You can expect the following raw performance gains:

```
hash_10                  time:   [336.29 ns 336.92 ns 337.72 ns]
                         change: [−31.354% −31.251% −31.139%] (p = 0.00 < 0.05)
                         Performance has improved.

hash_varlen/len/16384    time:   [526.63 µs 527.02 µs 527.51 µs]
                         change: [−32.321% −32.151% −32.017%] (p = 0.00 < 0.05)
                         Performance has improved.

hash_parallel/len/65536  time:   [2.0350 ms 2.0502 ms 2.0656 ms]
                         change: [−10.764% −8.6165% −5.8573%] (p = 0.00 < 0.05)
                         Performance has improved.
```

In addition, there are now specialized functions for SIMD hashing. Since those functions didn't exist before, they cannot be compared quite as easily. For example, we have `hash_varlen_many`:

```
hash_varlen_many/(len, N)/(16384, 2)
                         time:   [783.82 µs 787.46 µs 791.45 µs]
```

This is a ~25% speedup over the already sped-up `hash_varlen` (794 µs vs 1054 µs = 2·527 µs). Note that using these new functions requires changes in the calling code, whereas no changes are needed to get the performance benefits of the functions that existed already.

All benchmarks were performed on the machine known as megingjord.

== Gotchas ==

Note that Rust is somewhat conservative when it comes to enabling CPU-architecture-specific features. In particular, even if you compile this code on a CPU that has all the necessary AVX features, Rust opts to not enable them in anticipation of the binary being distributed. Thus, you must pass `RUSTFLAGS="-C target-cpu=native"` (or the explicit `target-feature` list from the How To section) to get access to the vectorized version of Tip5.

In the future, we can use dynamic dispatch (https://doc.rust-lang.org/std/arch/#dynamic-cpu-feature-detection) instead, but these current changes don't include that.
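
For reference, the dynamic dispatch mentioned above could look roughly like the following. This is only a sketch and not part of this commit: the `Backend` enum and both function names are hypothetical, and only the standard library's `is_x86_feature_detected!` macro and the four feature names from the How To section are taken as given.

```rust
/// Which code path to run. Hypothetical labels for this sketch only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Scalar,
    Avx512,
}

/// Pure selection logic, kept separate from detection so it can be tested
/// on any machine.
fn select_backend(has_avx512: bool) -> Backend {
    if has_avx512 {
        Backend::Avx512
    } else {
        Backend::Scalar
    }
}

/// Asks the CPU at run time whether it supports the four AVX-512 features
/// named in the build instructions.
#[cfg(target_arch = "x86_64")]
fn cpu_has_required_avx512() -> bool {
    std::arch::is_x86_feature_detected!("avx512f")
        && std::arch::is_x86_feature_detected!("avx512bw")
        && std::arch::is_x86_feature_detected!("avx512ifma")
        && std::arch::is_x86_feature_detected!("avx512vbmi")
}

/// On other architectures there is nothing to detect.
#[cfg(not(target_arch = "x86_64"))]
fn cpu_has_required_avx512() -> bool {
    false
}

fn main() {
    let backend = select_backend(cpu_has_required_avx512());
    println!("selected backend: {backend:?}");
}
```

With this approach, a vectorized implementation would additionally have to be compiled with `#[target_feature(enable = "...")]` on the relevant functions, so that a single binary can carry both code paths.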
2 parents 1504405 + e750c11 commit 90342e6

6 files changed: +1050 -165 lines changed

.github/workflows/avx512.yml

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+name: Build AVX-512
+
+on:
+  push:
+    branches:
+      - master
+  pull_request:
+    branches:
+      - master
+
+jobs:
+  rust:
+    name: Build with AVX-512 features
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Install stable toolchain
+        uses: dtolnay/rust-toolchain@1.90.0
+
+      - name: Build
+        run: cargo build
+        env:
+          RUSTFLAGS: "-C target-feature=+avx512ifma,+avx512f,+avx512bw,+avx512vbmi"
+
+          # This is necessary to pass the `RUSTFLAGS` only to the compiler
+          # invocation for the final artifact, but _not_ to previous compiler
+          # invocations, such as build scripts. See also:
+          # https://doc.rust-lang.org/cargo/reference/config.html#buildrustflags
+          CARGO_BUILD_TARGET: "x86_64-unknown-linux-gnu"
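
As a rough illustration of what the `RUSTFLAGS` above change on the Rust side, the following sketch uses `cfg` on `target_feature` to pick an implementation at compile time. The function names are hypothetical and the bodies are placeholders; how this commit actually gates its AVX-512 code is not visible in this part of the diff.

```rust
/// True only if all four features from the workflow's RUSTFLAGS were
/// enabled when this code was compiled.
const fn avx512_enabled_at_compile_time() -> bool {
    cfg!(target_feature = "avx512f")
        && cfg!(target_feature = "avx512bw")
        && cfg!(target_feature = "avx512ifma")
        && cfg!(target_feature = "avx512vbmi")
}

/// Compiled only when every required feature is enabled.
#[cfg(all(
    target_feature = "avx512f",
    target_feature = "avx512bw",
    target_feature = "avx512ifma",
    target_feature = "avx512vbmi"
))]
fn backend_name() -> &'static str {
    "avx512"
}

/// Fallback, compiled when at least one required feature is missing.
#[cfg(not(all(
    target_feature = "avx512f",
    target_feature = "avx512bw",
    target_feature = "avx512ifma",
    target_feature = "avx512vbmi"
)))]
fn backend_name() -> &'static str {
    "scalar"
}

fn main() {
    println!(
        "compile-time AVX-512: {}, backend: {}",
        avx512_enabled_at_compile_time(),
        backend_name()
    );
}
```

Built with plain `cargo build`, this should select the fallback; built with the `RUSTFLAGS` from the workflow, it should select the AVX-512 variant, regardless of the CPU the compiler itself runs on.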

twenty-first/benches/tip5.rs

Lines changed: 29 additions & 0 deletions
@@ -15,9 +15,16 @@ criterion_group!(
     config = Criterion::default().measurement_time(Duration::from_secs(8));
     targets =
         hash_10,
+        hash_10_x2,
+        hash_10_many::<2>,
+        hash_10_many::<200>,
         hash_pair,
         hash_varlen::<10>,
         hash_varlen::<16_384>,
+        hash_varlen_many::<10, 2>,
+        hash_varlen_many::<10, 200>,
+        hash_varlen_many::<16_384, 2>,
+        hash_varlen_many::<16_384, 200>,
         hash_parallel::<65_536>,
 );
 
@@ -26,6 +33,19 @@ fn hash_10(c: &mut Criterion) {
     c.bench_function("hash_10", |b| b.iter(|| Tip5::hash_10(&input)));
 }
 
+fn hash_10_x2(c: &mut Criterion) {
+    let [left, right] = random();
+    c.bench_function("hash_10_x2", |b| b.iter(|| Tip5::hash_10_x2(&left, &right)));
+}
+
+fn hash_10_many<const N: usize>(c: &mut Criterion) {
+    let elements = random::<[_; N]>();
+    c.benchmark_group("hash_10_many")
+        .bench_function(BenchmarkId::new("N", N), |b| {
+            b.iter(|| Tip5::hash_10_many(&elements))
+        });
+}
+
 fn hash_pair(c: &mut Criterion) {
     let (left, right) = random();
     c.bench_function("hash_pair", |b| b.iter(|| Tip5::hash_pair(left, right)));
@@ -39,6 +59,15 @@ fn hash_varlen<const LEN: usize>(c: &mut Criterion) {
         });
 }
 
+fn hash_varlen_many<const LEN: usize, const N: usize>(c: &mut Criterion) {
+    let input = (0..N).map(|_| random_elements(LEN)).collect::<Vec<_>>();
+    let input = input.iter().map(|row| row.as_slice()).collect::<Vec<_>>();
+    c.benchmark_group("hash_varlen_many")
+        .bench_function(BenchmarkId::new("(len, N)", format!("({LEN}, {N})")), |b| {
+            b.iter(|| Tip5::hash_varlen_many(&input))
+        });
+}
+
 fn hash_parallel<const LEN: usize>(c: &mut Criterion) {
     let input = (0..LEN).map(|_| random()).collect::<Vec<_>>();
     c.benchmark_group("hash_parallel")
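
The `hash_varlen_many` benchmark above first builds owned rows and then a second vector of borrowed slices before calling the batched function. The sketch below shows that two-step pattern in isolation. `checksum_many` is a made-up stand-in that only sums each row; it is not Tip5, and the real signature of `Tip5::hash_varlen_many` is not shown in this diff, so the `&[&[u64]]` parameter type is an assumption.

```rust
/// Hypothetical stand-in for a batched API that takes a slice of slices.
/// It sums each row with wrapping addition; it is NOT a hash function.
fn checksum_many(rows: &[&[u64]]) -> Vec<u64> {
    rows.iter()
        .map(|row| row.iter().fold(0u64, |acc, &x| acc.wrapping_add(x)))
        .collect()
}

fn main() {
    // Step 1: the owned data, one `Vec` per input of possibly different length.
    let owned: Vec<Vec<u64>> = vec![vec![1, 2, 3], vec![10, 20]];

    // Step 2: borrow every row as a slice. The owned rows must outlive this
    // vector, which is why the benchmark keeps both bindings alive.
    let borrowed: Vec<&[u64]> = owned.iter().map(|row| row.as_slice()).collect();

    // A `&Vec<&[u64]>` coerces to `&[&[u64]]` at the call site.
    let sums = checksum_many(&borrowed);
    println!("{sums:?}"); // prints "[6, 30]"
}
```

The borrowed vector is cheap to build, since it holds only pointers and lengths, so the conversion should not meaningfully affect a benchmark that performs it once outside the timed closure.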
