From the README:
| | varint-simd unsafe | varint-simd safe | rustc | integer-encoding-rs | prost |
|---|---|---|---|---|---|
| u8 | 554.81 | 283.26 | 131.71 | 116.59 | 131.42 |
| u16 | 493.96 | 349.74 | 168.09 | 121.35 | 157.68 |
| u32 | 482.95 | 332.11 | 191.37 | 120.16 | 196.05 |
| u64 | 330.86 | 277.65 | 82.315 | 80.328 | 97.585 |
u64 expressed as ratios:
| | varint-simd unsafe | varint-simd safe | rustc | integer-encoding-rs | prost |
|---|---|---|---|---|---|
| u64 MOPS | 330.86 | 277.65 | 82.315 | 80.328 | 97.585 |
| u64 ratio | 1 | 0.84 | 0.25 | 0.24 | 0.29 |
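For reference, the ratio row is just each throughput divided by the varint-simd unsafe figure, rounded to two decimals. A minimal sketch of that arithmetic (the `ratios` helper is a hypothetical name used only for illustration):

```rust
/// Throughput of each implementation relative to `baseline`,
/// rounded to two decimal places. (Hypothetical helper, for illustration only.)
fn ratios(baseline: f64, mops: &[f64]) -> Vec<f64> {
    mops.iter()
        .map(|m| (m / baseline * 100.0).round() / 100.0)
        .collect()
}

fn main() {
    // u64 MOPS from the README table, in column order:
    // varint-simd unsafe, varint-simd safe, rustc, integer-encoding-rs, prost.
    let readme_u64 = [330.86, 277.65, 82.315, 80.328, 97.585];
    println!("{:?}", ratios(readme_u64[0], &readme_u64));
}
```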
This does not reproduce (AMD Zen 3, RUSTFLAGS="-Ctarget-cpu=native", although this flag has no noticeable performance impact):
| | varint-simd unsafe | varint-simd safe | rustc | integer-encoding-rs | prost | simple |
|---|---|---|---|---|---|---|
| u64 MOPS | 232 | 151 | 107 | 136 | 105 | 123 |
| u64 ratio | 1 | 0.65 | 0.46 | 0.58 | 0.45 | 0.53 |
So, at least as of rustc 1.92, the safe interface of this crate offers only a moderate speed-up (about 1.2x: 151 vs. 123 MOPS) over even naive code (see "simple" below). For reasons I'm still looking into, using varint-simd in place of the trivial decode loop actually halves throughput in application code.
Where "simple" is the trivial, totally naive way of writing this:
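
    /// Decodes one varint from the start of `data`.
    /// Returns the decoded value and the number of bytes consumed.
    fn varint_decode(data: &[u8]) -> (u64, usize) {
        let mut result: u64 = 0;
        let mut shift = 0;
        for (i, &byte) in data.iter().enumerate() {
            // The low 7 bits of each byte carry payload, least significant group first.
            result |= ((byte & 0x7f) as u64) << shift;
            // A clear high bit marks the final byte.
            if byte & 0x80 == 0 {
                return (result, i + 1);
            }
            shift += 7;
            if shift > 63 {
                panic!("varint overflow");
            }
        }
        panic!("truncated varint");
    }

Enabling the native-optimizations feature does improve performance, but only by roughly 8% (safe) to 14% (unsafe), which doesn't change the overall picture: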
$ RUSTFLAGS="-Ctarget-cpu=native" cargo bench u64/decode --features native-optimizations
varint-u64/decode/varint-simd/unsafe
time: [985.31 ns 985.54 ns 985.78 ns]
thrpt: [259.69 Melem/s 259.76 Melem/s 259.82 Melem/s]
change:
time: [-12.597% -11.919% -11.360%] (p = 0.00 < 0.05)
thrpt: [+12.816% +13.531% +14.413%]
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
7 (7.00%) low severe
5 (5.00%) low mild
2 (2.00%) high mild
varint-u64/decode/varint-simd/safe
time: [1.5620 µs 1.5625 µs 1.5631 µs]
thrpt: [163.77 Melem/s 163.84 Melem/s 163.89 Melem/s]
change:
time: [-7.7026% -7.6085% -7.5128%] (p = 0.00 < 0.05)
thrpt: [+8.1231% +8.2351% +8.3454%]
Performance has improved.
Complete run (target-cpu=native, default features):
varint-u64/decode/integer-encoding
time: [1.8694 µs 1.8705 µs 1.8717 µs]
thrpt: [136.77 Melem/s 136.86 Melem/s 136.94 Melem/s]
varint-u64/decode/rustc time: [2.3652 µs 2.3768 µs 2.3886 µs]
thrpt: [107.17 Melem/s 107.71 Melem/s 108.23 Melem/s]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
varint-u64/decode/simple
time: [2.0804 µs 2.0817 µs 2.0835 µs]
thrpt: [122.87 Melem/s 122.98 Melem/s 123.05 Melem/s]
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) low severe
5 (5.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
varint-u64/decode/prost-varint
time: [2.4307 µs 2.4321 µs 2.4334 µs]
thrpt: [105.20 Melem/s 105.26 Melem/s 105.32 Melem/s]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
varint-u64/decode/varint-simd/unsafe
time: [1.1021 µs 1.1044 µs 1.1075 µs]
thrpt: [231.14 Melem/s 231.80 Melem/s 232.29 Melem/s]
Found 8 outliers among 100 measurements (8.00%)
8 (8.00%) high mild
varint-u64/decode/varint-simd/safe
time: [1.6921 µs 1.6926 µs 1.6931 µs]
thrpt: [151.20 Melem/s 151.25 Melem/s 151.29 Melem/s]
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) low severe
5 (5.00%) low mild
1 (1.00%) high severe
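To cross-check the MOPS figures in the tables against the raw timings, throughput is elements per iteration divided by time per iteration. The sketch below assumes 256 elements per iteration; that batch size is inferred from the time/thrpt pairs above, not read from the benchmark source, and `melem_per_sec` is a hypothetical helper name.

```rust
/// Converts a per-iteration time into throughput in millions of elements per second.
/// (Hypothetical helper, for illustration only.)
fn melem_per_sec(elements_per_iter: u64, time_ns: f64) -> f64 {
    // elements / ns is Gelem/s; multiply by 1e3 to get Melem/s.
    elements_per_iter as f64 / time_ns * 1e3
}

fn main() {
    // Assumption: 256 elements per iteration, inferred from the reported numbers.
    const BATCH: u64 = 256;
    // Median times from the complete run above, converted to nanoseconds.
    let runs = [
        ("simple", 2081.7),
        ("varint-simd/unsafe", 1104.4),
        ("varint-simd/safe", 1692.6),
    ];
    for (name, time_ns) in runs {
        println!("{name}: {:.2} Melem/s", melem_per_sec(BATCH, time_ns));
    }
}
```

If the assumed batch size is right, the results should match the reported thrpt medians to within rounding.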