Summary
Add benchmarks for uniform distribution / single-sample variates for Simd types: u8x8, u8x16, u8x32, u8x64, i16x8, i16x16, i16x32.
Also, change the name of the non-SIMD benchmarks to "x1", e.g. sample_i16x1/SmallRng/distr.
Motivation
This is a pre-requisite for any type of SIMD optimisation.
Details
Sample output (5800X):
$ cargo +nightly bench --bench uniform --features simd_support -- SmallRng
    Finished `bench` profile [optimized] target(s) in 0.03s
     Running benches/uniform.rs (target/release/deps/uniform-fff5b09bd763a405)
sample_i8x1/SmallRng/single     time:   [1.5923 ns 1.5929 ns 1.5936 ns]
Found 8408 outliers among 100000 measurements (8.41%)
  278 (0.28%) low severe
  303 (0.30%) low mild
  3554 (3.55%) high mild
  4273 (4.27%) high severe
Criterion.rs ERROR: Error in Gnuplot: line 0: Can't plot with an empty x range!
Benchmarking sample_i8x1/SmallRng/distr: Warming up for 1.0000 s
Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.3s, enable flat sampling, or reduce sample count to 53020.
sample_i8x1/SmallRng/distr      time:   [1.0419 ns 1.0422 ns 1.0425 ns]
Found 12826 outliers among 100000 measurements (12.83%)
  4043 (4.04%) high mild
  8783 (8.78%) high severe
sample_i16x1/SmallRng/single    time:   [1.4809 ns 1.4812 ns 1.4817 ns]
Found 22841 outliers among 100000 measurements (22.84%)
  105 (0.10%) high mild
  22736 (22.74%) high severe
Criterion.rs ERROR: Error in Gnuplot: line 0: Can't plot with an empty x range!
Criterion.rs ERROR: Error in Gnuplot: line 0: Can't plot with an empty x range!
Criterion.rs ERROR: Error in Gnuplot: line 0: Can't plot with an empty x range!
Benchmarking sample_i16x1/SmallRng/distr: Warming up for 1.0000 s
Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.3s, enable flat sampling, or reduce sample count to 52990.
sample_i16x1/SmallRng/distr     time:   [1.0454 ns 1.0458 ns 1.0461 ns]
Found 12165 outliers among 100000 measurements (12.16%)
  3140 (3.14%) high mild
  9025 (9.03%) high severe
sample_i32x1/SmallRng/single    time:   [3.1251 ns 3.1331 ns 3.1411 ns]
Found 12 outliers among 100000 measurements (0.01%)
  5 (0.01%) high mild
  7 (0.01%) high severe
sample_i32x1/SmallRng/distr     time:   [1.9073 ns 1.9139 ns 1.9206 ns]
Found 8875 outliers among 100000 measurements (8.88%)
  5546 (5.55%) high mild
  3329 (3.33%) high severe
sample_i64x1/SmallRng/single    time:   [4.3892 ns 4.3953 ns 4.4018 ns]
Found 73 outliers among 100000 measurements (0.07%)
  64 (0.06%) high mild
  9 (0.01%) high severe
sample_i64x1/SmallRng/distr     time:   [1.7550 ns 1.7616 ns 1.7681 ns]
Found 9194 outliers among 100000 measurements (9.19%)
  5378 (5.38%) high mild
  3816 (3.82%) high severe
sample_i128x1/SmallRng/single   time:   [9.7639 ns 9.7734 ns 9.7839 ns]
Found 162 outliers among 100000 measurements (0.16%)
  135 (0.14%) high mild
  27 (0.03%) high severe
sample_i128x1/SmallRng/distr    time:   [3.8971 ns 3.9066 ns 3.9166 ns]
Found 8601 outliers among 100000 measurements (8.60%)
  6231 (6.23%) high mild
  2370 (2.37%) high severe
sample_u8x8/SmallRng/single     time:   [23.973 ns 24.000 ns 24.027 ns]
Found 1602 outliers among 100000 measurements (1.60%)
  1209 (1.21%) low mild
  300 (0.30%) high mild
  93 (0.09%) high severe
sample_u8x8/SmallRng/distr      time:   [12.358 ns 12.379 ns 12.400 ns]
Found 300 outliers among 100000 measurements (0.30%)
  264 (0.26%) high mild
  36 (0.04%) high severe
sample_u8x16/SmallRng/single    time:   [43.275 ns 43.308 ns 43.344 ns]
Found 407 outliers among 100000 measurements (0.41%)
  2 (0.00%) low mild
  342 (0.34%) high mild
  63 (0.06%) high severe
sample_u8x16/SmallRng/distr     time:   [26.532 ns 26.560 ns 26.587 ns]
Found 289 outliers among 100000 measurements (0.29%)
  6 (0.01%) low mild
  243 (0.24%) high mild
  40 (0.04%) high severe
sample_u8x32/SmallRng/single    time:   [71.023 ns 71.052 ns 71.083 ns]
Found 1010 outliers among 100000 measurements (1.01%)
  350 (0.35%) low mild
  625 (0.62%) high mild
  35 (0.04%) high severe
sample_u8x32/SmallRng/distr     time:   [35.570 ns 35.598 ns 35.625 ns]
Found 906 outliers among 100000 measurements (0.91%)
  397 (0.40%) low mild
  402 (0.40%) high mild
  107 (0.11%) high severe
sample_u8x64/SmallRng/single    time:   [121.40 ns 121.44 ns 121.49 ns]
Found 3279 outliers among 100000 measurements (3.28%)
  206 (0.21%) low mild
  1918 (1.92%) high mild
  1155 (1.16%) high severe
sample_u8x64/SmallRng/distr     time:   [54.650 ns 54.680 ns 54.711 ns]
Found 845 outliers among 100000 measurements (0.84%)
  358 (0.36%) low mild
  363 (0.36%) high mild
  124 (0.12%) high severe
sample_i16x8/SmallRng/single    time:   [31.671 ns 31.700 ns 31.730 ns]
Found 1196 outliers among 100000 measurements (1.20%)
  475 (0.47%) low mild
  637 (0.64%) high mild
  84 (0.08%) high severe
sample_i16x8/SmallRng/distr     time:   [21.576 ns 21.602 ns 21.628 ns]
Found 335 outliers among 100000 measurements (0.34%)
  31 (0.03%) low mild
  283 (0.28%) high mild
  21 (0.02%) high severe
sample_i16x16/SmallRng/single   time:   [50.998 ns 51.044 ns 51.093 ns]
Found 1362 outliers among 100000 measurements (1.36%)
  755 (0.76%) high mild
  607 (0.61%) high severe
sample_i16x16/SmallRng/distr    time:   [29.717 ns 29.745 ns 29.773 ns]
Found 228 outliers among 100000 measurements (0.23%)
  1 (0.00%) low mild
  179 (0.18%) high mild
  48 (0.05%) high severe
sample_i16x32/SmallRng/single   time:   [83.029 ns 83.066 ns 83.106 ns]
Found 2168 outliers among 100000 measurements (2.17%)
  534 (0.53%) low mild
  1000 (1.00%) high mild
  634 (0.63%) high severe
sample_i16x32/SmallRng/distr    time:   [43.454 ns 43.482 ns 43.511 ns]
Found 1070 outliers among 100000 measurements (1.07%)
  516 (0.52%) low mild
  497 (0.50%) high mild
  57 (0.06%) high severe
Further motivation
In particular, I wanted to know whether the target_feature optimisations in src/distr/utils.rs are useful. Without the sse2 and avx2 features on my CPU (which doesn't support AVX512), I get very similar results, implying they may not be useful.
I was planning to then test whether or not run-time detection of CPU features was viable, but with the above results it may not even be worth asking.
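For reference, the difference between the compile-time target_feature approach and run-time detection can be sketched with the standard library alone. This is a minimal illustration, not code from src/distr/utils.rs: the function names compile_time_features and run_time_features are hypothetical, and only sse2 and avx2 are checked because those are the features mentioned above.

```rust
// Illustrative sketch only; the helper names are hypothetical and this is
// not how src/distr/utils.rs is written.

/// Features the compiler was allowed to assume when the binary was built,
/// e.g. via `-C target-cpu=native` or `-C target-feature=+avx2`.
fn compile_time_features() -> Vec<&'static str> {
    let mut found = Vec::new();
    if cfg!(target_feature = "sse2") {
        found.push("sse2");
    }
    if cfg!(target_feature = "avx2") {
        found.push("avx2");
    }
    found
}

/// Features the CPU reports when the program actually runs (x86 only).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn run_time_features() -> Vec<&'static str> {
    let mut found = Vec::new();
    if std::is_x86_feature_detected!("sse2") {
        found.push("sse2");
    }
    if std::is_x86_feature_detected!("avx2") {
        found.push("avx2");
    }
    found
}

/// On other architectures neither feature exists, so report nothing.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn run_time_features() -> Vec<&'static str> {
    Vec::new()
}

fn main() {
    println!("compile-time: {:?}", compile_time_features());
    println!("run-time:     {:?}", run_time_features());
}
```

A feature listed under run-time but not compile-time is one that a run-time dispatch could use but a default build cannot assume. Whether such a dispatch would pay off here is exactly the open question raised above.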