Performance numbers can't be reproduced #14

@enkore

From the README:

| | varint-simd unsafe | varint-simd safe | rustc | integer-encoding-rs | prost |
|---|---|---|---|---|---|
| u8 | 554.81 | 283.26 | 131.71 | 116.59 | 131.42 |
| u16 | 493.96 | 349.74 | 168.09 | 121.35 | 157.68 |
| u32 | 482.95 | 332.11 | 191.37 | 120.16 | 196.05 |
| u64 | 330.86 | 277.65 | 82.315 | 80.328 | 97.585 |

u64 expressed as ratios:

| | varint-simd unsafe | varint-simd safe | rustc | integer-encoding-rs | prost |
|---|---|---|---|---|---|
| u64 MOPS | 330.86 | 277.65 | 82.315 | 80.328 | 97.585 |
| u64 ratio | 1 | 0.84 | 0.25 | 0.24 | 0.29 |

This does not reproduce on my machine (AMD Zen 3, RUSTFLAGS="-Ctarget-cpu=native"; the flag has no noticeable performance impact):

| | varint-simd unsafe | varint-simd safe | rustc | integer-encoding-rs | prost | simple |
|---|---|---|---|---|---|---|
| u64 MOPS | 232 | 151 | 107 | 136 | 105 | 123 |
| u64 ratio | 1 | 0.65 | 0.46 | 0.59 | 0.45 | 0.53 |
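The ratio row is each implementation's throughput divided by that of varint-simd unsafe. A minimal sketch of that arithmetic, using the rounded MOPS figures from the table (the helper name `ratios_to_first` is made up for illustration):

```rust
/// Divide every throughput by the first entry's throughput.
/// Panics on an empty slice, which is acceptable for this illustration.
fn ratios_to_first(mops: &[f64]) -> Vec<f64> {
    let base = mops[0];
    mops.iter().map(|m| m / base).collect()
}

fn main() {
    // Column order and rounded u64 decode throughput (MOPS) from the table.
    let names = [
        "varint-simd unsafe",
        "varint-simd safe",
        "rustc",
        "integer-encoding-rs",
        "prost",
        "simple",
    ];
    let mops = [232.0, 151.0, 107.0, 136.0, 105.0, 123.0];

    for (name, ratio) in names.iter().zip(ratios_to_first(&mops)) {
        println!("{name}: {ratio:.2}");
    }
}
```

Because the inputs are already rounded to whole MOPS, the last digit of a ratio can differ from one computed on the raw criterion numbers.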

So at least as of rustc 1.92, the safe interface of this crate offers only a moderate speed-up (about 1.2x, 151 vs. 123 MOPS) over even naive code (see "simple" below). (For reasons I'm still looking into, using varint-simd in place of the trivial decode loop actually halves throughput in application code.)

Here, "simple" is the trivial, totally naive way of writing the decoder:

fn varint_decode(data: &[u8]) -> (u64, usize) {
    let mut result: u64 = 0;
    let mut shift = 0;
    for (i, &byte) in data.iter().enumerate() {
        result |= ((byte & 0x7f) as u64) << shift;
        if byte & 0x80 == 0 {
            return (result, i + 1);
        }
        shift += 7;
        if shift > 63 {
            panic!("varint overflow");
        }
    }
    panic!("truncated varint");
}
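For anyone who wants to sanity-check the baseline outside the benchmark harness, here is a sketch of the same loop that returns `Option` instead of panicking. The name `try_varint_decode` and the `Option` return type are my own choices for illustration, not part of the benchmark; the decoding logic is otherwise meant to match the function above.

```rust
/// Hypothetical non-panicking variant of the "simple" decoder.
/// Returns the decoded value and the number of bytes consumed, or `None`
/// if the input is truncated or runs past 10 bytes.
fn try_varint_decode(data: &[u8]) -> Option<(u64, usize)> {
    let mut result: u64 = 0;
    let mut shift: u32 = 0;
    for (i, &byte) in data.iter().enumerate() {
        if shift > 63 {
            return None; // an 11th byte cannot fit in a u64
        }
        // Low 7 bits carry payload; like the original loop, excess high
        // bits of a 10th byte are silently shifted out.
        result |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((result, i + 1)); // continuation bit clear: done
        }
        shift += 7;
    }
    None // ran out of input before the terminating byte
}

fn main() {
    // 300 encodes as 0xAC 0x02; the trailing byte is not consumed.
    assert_eq!(try_varint_decode(&[0xAC, 0x02, 0xFF]), Some((300, 2)));
    // A single byte below 0x80 is its own value.
    assert_eq!(try_varint_decode(&[0x7F]), Some((127, 1)));
    // Continuation bit set on the last available byte: truncated input.
    assert_eq!(try_varint_decode(&[0x80]), None);
    println!("ok");
}
```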

Enabling the native-optimizations feature does improve performance slightly, but only by about 10% (roughly 13% for the unsafe interface and 8% for the safe one), which doesn't change the overall picture:

$ RUSTFLAGS="-Ctarget-cpu=native" cargo bench u64/decode --features native-optimizations
varint-u64/decode/varint-simd/unsafe
                        time:   [985.31 ns 985.54 ns 985.78 ns]
                        thrpt:  [259.69 Melem/s 259.76 Melem/s 259.82 Melem/s]
                 change:
                        time:   [-12.597% -11.919% -11.360%] (p = 0.00 < 0.05)
                        thrpt:  [+12.816% +13.531% +14.413%]
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  7 (7.00%) low severe
  5 (5.00%) low mild
  2 (2.00%) high mild
varint-u64/decode/varint-simd/safe
                        time:   [1.5620 µs 1.5625 µs 1.5631 µs]
                        thrpt:  [163.77 Melem/s 163.84 Melem/s 163.89 Melem/s]
                 change:
                        time:   [-7.7026% -7.6085% -7.5128%] (p = 0.00 < 0.05)
                        thrpt:  [+8.1231% +8.2351% +8.3454%]
                        Performance has improved.

Complete run (target-cpu=native, default features):

varint-u64/decode/integer-encoding
                        time:   [1.8694 µs 1.8705 µs 1.8717 µs]
                        thrpt:  [136.77 Melem/s 136.86 Melem/s 136.94 Melem/s]
varint-u64/decode/rustc time:   [2.3652 µs 2.3768 µs 2.3886 µs]
                        thrpt:  [107.17 Melem/s 107.71 Melem/s 108.23 Melem/s]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
varint-u64/decode/simple
                        time:   [2.0804 µs 2.0817 µs 2.0835 µs]
                        thrpt:  [122.87 Melem/s 122.98 Melem/s 123.05 Melem/s]
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) low severe
  5 (5.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
varint-u64/decode/prost-varint
                        time:   [2.4307 µs 2.4321 µs 2.4334 µs]
                        thrpt:  [105.20 Melem/s 105.26 Melem/s 105.32 Melem/s]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
varint-u64/decode/varint-simd/unsafe
                        time:   [1.1021 µs 1.1044 µs 1.1075 µs]
                        thrpt:  [231.14 Melem/s 231.80 Melem/s 232.29 Melem/s]
Found 8 outliers among 100 measurements (8.00%)
  8 (8.00%) high mild
varint-u64/decode/varint-simd/safe
                        time:   [1.6921 µs 1.6926 µs 1.6931 µs]
                        thrpt:  [151.20 Melem/s 151.25 Melem/s 151.29 Melem/s]
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) low severe
  5 (5.00%) low mild
  1 (1.00%) high severe
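To relate the `time` and `thrpt` lines: multiplying the two median values gives almost exactly 256 in every run above, so each benchmark iteration appears to decode 256 elements. That batch size is inferred from the output, not checked against the benchmark source. Under that assumption, a small sketch converting criterion's per-iteration time into throughput and per-element cost (function names are illustrative):

```rust
/// Assumed number of varints decoded per benchmark iteration
/// (inferred from time x throughput in the output above).
const ELEMENTS_PER_ITER: f64 = 256.0;

/// Throughput in Melem/s given the time per iteration in nanoseconds.
fn melem_per_sec(elements_per_iter: f64, time_ns: f64) -> f64 {
    // 1 element/ns = 1e9 elements/s = 1e3 Melem/s
    elements_per_iter / time_ns * 1e3
}

/// Average cost of decoding one element, in nanoseconds.
fn ns_per_element(elements_per_iter: f64, time_ns: f64) -> f64 {
    time_ns / elements_per_iter
}

fn main() {
    // Median times from the complete run, converted to nanoseconds.
    let runs = [
        ("varint-simd/unsafe", 1104.4),
        ("varint-simd/safe", 1692.6),
        ("simple", 2081.7),
    ];
    for (name, time_ns) in runs {
        println!(
            "{name}: {:.2} Melem/s, {:.2} ns/element",
            melem_per_sec(ELEMENTS_PER_ITER, time_ns),
            ns_per_element(ELEMENTS_PER_ITER, time_ns),
        );
    }
}
```

Looking at nanoseconds per element rather than Melem/s makes the absolute gap between the safe interface and the naive loop easier to judge.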
