
Commit 8bec499

fix: Use iter_batched() and iter_batched_ref() for the remaining benchmarks
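For readers unfamiliar with the batched pattern named in the title, here is a minimal std-only sketch of the idea: inputs are built before the clock starts, and whatever the routine returns is dropped only after the clock stops, so allocation and deallocation stay out of the measurement. This is an illustration, not Criterion's implementation and not the nalgebra benchmark code; `time_batched`, `axpy`, and `run_example` are hypothetical names made up for this sketch.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a batched benchmark loop (not Criterion's API).
/// `setup` runs outside the timed region; the values returned by `routine`
/// are kept alive until the clock has stopped.
fn time_batched<I, O>(
    batch: usize,
    mut setup: impl FnMut() -> I,
    mut routine: impl FnMut(I) -> O,
) -> Duration {
    // Build every input (including allocations) before timing starts.
    let inputs: Vec<I> = (0..batch).map(|_| setup()).collect();
    // Pre-allocate room for the outputs so pushing does not reallocate.
    let mut outputs: Vec<O> = Vec::with_capacity(batch);

    let start = Instant::now();
    for input in inputs {
        outputs.push(black_box(routine(black_box(input))));
    }
    let elapsed = start.elapsed();

    // The outputs (and anything they own) are freed here, after timing.
    drop(outputs);
    elapsed
}

/// Plain y += a * x over slices, used only as a workload for the sketch.
fn axpy(a: f64, x: &[f64], y: &mut [f64]) {
    for (yi, xi) in y.iter_mut().zip(x) {
        *yi += a * xi;
    }
}

/// Example use: both vectors are allocated in `setup` and handed back in the
/// return tuple, so neither is freed inside the timed loop.
fn run_example() -> Duration {
    time_batched(
        1_000,
        || (vec![1.0_f64; 64], vec![2.0_f64; 64]),
        |(x, mut y): (Vec<f64>, Vec<f64>)| {
            axpy(0.5, &x, &mut y);
            (x, y)
        },
    )
}
```

Returning `(x, y)` rather than only `y` is the point of the sketch: if `x` were left to go out of scope inside the closure, its deallocation would be counted as part of the routine.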
The bulk of the changes was done by Claude Sonnet 4. Additionally, I moved `DVector` allocations outside of the benchmarks and added anything allocated but not consumed to the return tuple of the benchmark closure, so that the implicit drop/free is not included in the measured time. This fixes https://github.com/dimforge/nalgebra/issues/1547 for the remaining benchmarks.

Benchmark results before vs. after all changes:

mat2_mul_m time: [1.1043 ns 1.1058 ns 1.1077 ns] change: [+49.306% +49.651% +50.045%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 4 (4.00%) low severe 2 (2.00%) high mild 6 (6.00%) high severe mat3_mul_m time: [3.1885 ns 3.1945 ns 3.2038 ns] change: [+102.62% +103.63% +104.86%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) low mild 2 (2.00%) high mild 3 (3.00%) high severe mat4_mul_m time: [6.7759 ns 6.7840 ns 6.7929 ns] change: [+130.65% +131.50% +132.59%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 4 (4.00%) low severe 3 (3.00%) high mild 4 (4.00%) high severe mat2_tr_mul_m time: [1.2882 ns 1.2901 ns 1.2926 ns] change: [+75.005% +75.472% +75.928%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) low severe 1 (1.00%) high mild 3 (3.00%) high severe mat3_tr_mul_m time: [3.1688 ns 3.1725 ns 3.1770 ns] change: [+101.61% +102.10% +102.66%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 4 (4.00%) high mild 4 (4.00%) high severe mat4_tr_mul_m time: [6.5406 ns 6.5453 ns 6.5508 ns] change: [+121.95% +122.66% +123.42%] (p = 0.00 < 0.05) Performance has regressed.
Found 15 outliers among 100 measurements (15.00%) 3 (3.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 6 (6.00%) high severe mat2_add_m time: [644.68 ps 645.88 ps 647.24 ps] change: [−13.049% −12.530% −11.972%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) low severe 1 (1.00%) low mild 1 (1.00%) high mild 2 (2.00%) high severe mat3_add_m time: [1.3543 ns 1.3572 ns 1.3607 ns] change: [−14.707% −13.705% −12.403%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 6 (6.00%) low severe 5 (5.00%) high mild 4 (4.00%) high severe mat4_add_m time: [2.3987 ns 2.4015 ns 2.4044 ns] change: [−20.676% −19.615% −18.453%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 6 (6.00%) low severe 5 (5.00%) high mild 3 (3.00%) high severe mat2_sub_m time: [637.47 ps 638.88 ps 640.62 ps] change: [−13.604% −13.020% −12.333%] (p = 0.00 < 0.05) Performance has improved. Found 13 outliers among 100 measurements (13.00%) 4 (4.00%) low severe 2 (2.00%) low mild 2 (2.00%) high mild 5 (5.00%) high severe mat3_sub_m time: [1.3531 ns 1.3546 ns 1.3562 ns] change: [−15.139% −14.610% −14.084%] (p = 0.00 < 0.05) Performance has improved. Found 16 outliers among 100 measurements (16.00%) 5 (5.00%) low severe 1 (1.00%) low mild 6 (6.00%) high mild 4 (4.00%) high severe mat4_sub_m time: [2.3972 ns 2.3996 ns 2.4021 ns] change: [−20.412% −19.249% −18.330%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 6 (6.00%) low severe 1 (1.00%) high mild 3 (3.00%) high severe mat2_mul_v time: [774.43 ps 775.48 ps 776.73 ps] change: [+144.90% +145.51% +146.12%] (p = 0.00 < 0.05) Performance has regressed. 
Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 5 (5.00%) high mild 3 (3.00%) high severe mat3_mul_v time: [1.6843 ns 1.6858 ns 1.6874 ns] change: [+284.57% +285.82% +287.43%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) low severe 1 (1.00%) high mild 3 (3.00%) high severe mat4_mul_v time: [2.6029 ns 2.6196 ns 2.6485 ns] change: [+255.34% +257.62% +261.68%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 5 (5.00%) high severe single_mat2_mul_v time: [392.29 ps 393.45 ps 394.87 ps] Found 8 outliers among 100 measurements (8.00%) 6 (6.00%) high mild 2 (2.00%) high severe single_mat3_mul_v time: [650.16 ps 651.47 ps 653.07 ps] Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) low severe 3 (3.00%) high mild 4 (4.00%) high severe single_mat4_mul_v time: [1.0665 ns 1.0690 ns 1.0722 ns] Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low mild 4 (4.00%) high mild 4 (4.00%) high severe mat2_tr_mul_v time: [719.95 ps 720.92 ps 722.16 ps] change: [+127.86% +128.34% +128.98%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 1 (1.00%) low severe 2 (2.00%) low mild 7 (7.00%) high mild 4 (4.00%) high severe mat3_tr_mul_v time: [1.6551 ns 1.6564 ns 1.6577 ns] change: [+277.57% +278.32% +279.16%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 2 (2.00%) high severe mat4_tr_mul_v time: [2.6477 ns 2.6546 ns 2.6666 ns] change: [+259.47% +260.55% +261.67%] (p = 0.00 < 0.05) Performance has regressed. 
Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) low severe 3 (3.00%) high mild 3 (3.00%) high severe single_mat2_tr_mul_v time: [353.60 ps 355.50 ps 358.48 ps] Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) low mild 4 (4.00%) high mild 3 (3.00%) high severe single_mat3_tr_mul_v time: [778.13 ps 779.43 ps 781.25 ps] Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 3 (3.00%) high mild 5 (5.00%) high severe single_mat4_tr_mul_v time: [1.1887 ns 1.1906 ns 1.1930 ns] Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) low mild 2 (2.00%) high mild 3 (3.00%) high severe mat2_mul_s time: [774.44 ps 775.33 ps 776.37 ps] change: [+6.0947% +6.3308% +6.5936%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 2 (2.00%) low severe 2 (2.00%) low mild 4 (4.00%) high mild 4 (4.00%) high severe mat3_mul_s time: [962.59 ps 964.98 ps 967.43 ps] change: [−38.097% −37.694% −37.145%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 1 (1.00%) low severe 3 (3.00%) low mild 2 (2.00%) high mild 4 (4.00%) high severe mat4_mul_s time: [1.6589 ns 1.6640 ns 1.6684 ns] change: [−43.668% −43.130% −42.518%] (p = 0.00 < 0.05) Performance has improved. Found 18 outliers among 100 measurements (18.00%) 8 (8.00%) low severe 3 (3.00%) low mild 1 (1.00%) high mild 6 (6.00%) high severe mat2_div_s time: [803.09 ps 804.70 ps 806.56 ps] change: [+10.272% +10.596% +10.960%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild 3 (3.00%) high severe mat3_div_s time: [2.4929 ns 2.4947 ns 2.4967 ns] change: [+58.793% +59.185% +59.709%] (p = 0.00 < 0.05) Performance has regressed. 
Found 12 outliers among 100 measurements (12.00%) 3 (3.00%) low severe 5 (5.00%) high mild 4 (4.00%) high severe mat4_div_s time: [5.1650 ns 5.1688 ns 5.1735 ns] change: [+76.816% +77.215% +77.629%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) low severe 1 (1.00%) low mild 4 (4.00%) high mild 2 (2.00%) high severe mat2_inv time: [1.1514 ns 1.1523 ns 1.1533 ns] change: [−41.682% −41.556% −41.439%] (p = 0.00 < 0.05) Performance has improved. Found 11 outliers among 100 measurements (11.00%) 3 (3.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 2 (2.00%) high severe mat3_inv time: [3.3641 ns 3.3707 ns 3.3826 ns] change: [−37.473% −37.358% −37.214%] (p = 0.00 < 0.05) Performance has improved. Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 5 (5.00%) high severe mat4_inv time: [25.970 ns 26.006 ns 26.062 ns] change: [−9.0865% −8.9013% −8.6986%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 3 (3.00%) low severe 2 (2.00%) low mild 3 (3.00%) high mild 6 (6.00%) high severe mat2_transpose time: [409.94 ps 410.77 ps 411.75 ps] change: [−62.889% −62.624% −62.331%] (p = 0.00 < 0.05) Performance has improved. Found 17 outliers among 100 measurements (17.00%) 4 (4.00%) low severe 2 (2.00%) low mild 4 (4.00%) high mild 7 (7.00%) high severe mat3_transpose time: [947.42 ps 953.20 ps 961.97 ps] change: [−61.273% −60.195% −58.616%] (p = 0.00 < 0.05) Performance has improved. Found 11 outliers among 100 measurements (11.00%) 1 (1.00%) low mild 7 (7.00%) high mild 3 (3.00%) high severe mat4_transpose time: [1.6510 ns 1.6551 ns 1.6612 ns] change: [−65.877% −65.592% −65.225%] (p = 0.00 < 0.05) Performance has improved. 
Found 13 outliers among 100 measurements (13.00%) 5 (5.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 5 (5.00%) high severe mat_div_scalar time: [480.25 µs 480.55 µs 480.99 µs] change: [−22.235% −22.169% −22.095%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe mat100_add_mat100 time: [3.0426 µs 3.0910 µs 3.1351 µs] change: [+81.145% +84.392% +88.112%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 2 (2.00%) low severe 3 (3.00%) low mild 7 (7.00%) high mild 1 (1.00%) high severe mat4_mul_mat4 time: [36.836 ns 36.859 ns 36.886 ns] change: [+24.966% +25.568% +26.171%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 7 (7.00%) low severe 4 (4.00%) high mild 2 (2.00%) high severe mat5_mul_mat5 time: [56.715 ns 56.876 ns 57.015 ns] change: [+10.239% +10.666% +11.091%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 1 (1.00%) low severe 1 (1.00%) low mild 6 (6.00%) high mild mat6_mul_mat6 time: [83.817 ns 83.999 ns 84.156 ns] change: [+10.675% +10.890% +11.065%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) low mild mat7_mul_mat7 time: [93.211 ns 93.386 ns 93.534 ns] change: [+10.654% +10.892% +11.129%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low severe 2 (2.00%) low mild mat8_mul_mat8 time: [88.919 ns 89.410 ns 89.884 ns] change: [+22.808% +23.376% +23.888%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) low mild 1 (1.00%) high mild mat9_mul_mat9 time: [207.12 ns 209.04 ns 211.17 ns] change: [+14.053% +14.646% +15.258%] (p = 0.00 < 0.05) Performance has regressed. 
Found 10 outliers among 100 measurements (10.00%) 9 (9.00%) low mild 1 (1.00%) high mild mat10_mul_mat10 time: [236.75 ns 237.11 ns 237.47 ns] change: [+20.055% +20.366% +20.651%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 5 (5.00%) low severe 7 (7.00%) low mild 1 (1.00%) high mild mat10_mul_mat10_static time: [116.68 ns 117.15 ns 117.62 ns] change: [+11.160% +11.617% +12.049%] (p = 0.00 < 0.05) Performance has regressed. mat100_mul_mat100 time: [40.188 µs 40.327 µs 40.459 µs] change: [+3.2490% +3.4765% +3.7130%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 7 (7.00%) high mild 8 (8.00%) high severe mat500_mul_mat500 time: [4.3909 ms 4.3944 ms 4.3978 ms] change: [+0.8556% +0.9519% +1.0448%] (p = 0.00 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 6 (6.00%) low severe 2 (2.00%) high mild 1 (1.00%) high severe iter time: [840.01 µs 840.39 µs 840.81 µs] change: [+10.527% +10.726% +10.915%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 2 (2.00%) high mild 11 (11.00%) high severe iter_rev time: [210.14 µs 211.10 µs 212.84 µs] change: [+0.2455% +0.7119% +1.7846%] (p = 0.02 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 2 (2.00%) high mild 6 (6.00%) high severe copy_from time: [199.77 µs 200.80 µs 202.55 µs] change: [+41.195% +41.962% +43.287%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 8 (8.00%) low mild 1 (1.00%) high severe axpy time: [31.301 µs 33.301 µs 34.957 µs] change: [+40.726% +52.001% +63.112%] (p = 0.00 < 0.05) Performance has regressed. tr_mul_to time: [126.46 µs 127.12 µs 128.09 µs] change: [−4.0124% −3.5145% −2.7708%] (p = 0.00 < 0.05) Performance has improved. 
Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high severe mat_mul_mat time: [39.252 µs 39.443 µs 39.626 µs] change: [−0.7084% −0.3800% −0.0130%] (p = 0.02 < 0.05) Change within noise threshold. Found 11 outliers among 100 measurements (11.00%) 1 (1.00%) low mild 8 (8.00%) high mild 2 (2.00%) high severe mat100_from_fn time: [6.8398 µs 6.8418 µs 6.8446 µs] change: [+519.35% +522.43% +524.76%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 4 (4.00%) high mild 9 (9.00%) high severe mat500_from_fn time: [172.11 µs 172.14 µs 172.18 µs] change: [+498.70% +499.32% +499.93%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 1 (1.00%) low mild 5 (5.00%) high mild 7 (7.00%) high severe vec2_add_v_f32 time: [303.98 ps 304.76 ps 305.65 ps] change: [−5.1499% −4.3536% −3.5996%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 4 (4.00%) low severe 5 (5.00%) high mild 6 (6.00%) high severe vec3_add_v_f32 time: [586.36 ps 587.93 ps 589.92 ps] change: [+34.275% +34.886% +35.631%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low mild 5 (5.00%) high mild 6 (6.00%) high severe vec4_add_v_f32 time: [603.45 ps 604.44 ps 605.59 ps] change: [−18.949% −18.215% −17.623%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 5 (5.00%) low severe 2 (2.00%) low mild 2 (2.00%) high mild 5 (5.00%) high severe vec2_add_v_f64 time: [602.08 ps 602.83 ps 603.64 ps] change: [+89.139% +90.573% +91.808%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 4 (4.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild 5 (5.00%) high severe vec3_add_v_f64 time: [910.94 ps 912.60 ps 914.56 ps] change: [+107.10% +108.18% +109.41%] (p = 0.00 < 0.05) Performance has regressed. 
Found 12 outliers among 100 measurements (12.00%) 3 (3.00%) low severe 6 (6.00%) high mild 3 (3.00%) high severe vec4_add_v_f64 time: [1.1894 ns 1.1933 ns 1.1963 ns] change: [+82.607% +85.023% +86.911%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 9 (9.00%) low severe 2 (2.00%) low mild 2 (2.00%) high severe vec2_sub_v time: [303.45 ps 304.42 ps 305.37 ps] change: [−5.3598% −4.4578% −3.6738%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 8 (8.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild 3 (3.00%) high severe vec3_sub_v time: [672.95 ps 674.82 ps 676.51 ps] change: [+51.463% +52.336% +53.346%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe vec4_sub_v time: [602.84 ps 604.65 ps 607.70 ps] change: [−19.744% −18.754% −17.881%] (p = 0.00 < 0.05) Performance has improved. Found 13 outliers among 100 measurements (13.00%) 6 (6.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 4 (4.00%) high severe vec2_mul_s time: [666.49 ps 667.29 ps 668.31 ps] change: [+111.37% +111.81% +112.32%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 4 (4.00%) low severe 6 (6.00%) high mild 6 (6.00%) high severe vec3_mul_s time: [511.42 ps 513.44 ps 515.86 ps] change: [+15.556% +16.273% +17.049%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe vec4_mul_s time: [774.13 ps 775.22 ps 776.52 ps] change: [+5.1602% +5.5545% +6.0225%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 1 (1.00%) low severe 2 (2.00%) low mild 3 (3.00%) high mild 7 (7.00%) high severe vec2_div_s time: [1.3658 ns 1.3694 ns 1.3726 ns] change: [+328.67% +329.83% +331.09%] (p = 0.00 < 0.05) Performance has regressed. 
Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high severe vec3_div_s time: [607.73 ps 608.63 ps 609.66 ps] change: [+37.642% +38.017% +38.440%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 2 (2.00%) low severe 8 (8.00%) high mild 6 (6.00%) high severe vec4_div_s time: [802.59 ps 803.62 ps 804.82 ps] change: [+8.9451% +9.3240% +9.7149%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 3 (3.00%) low severe 6 (6.00%) high mild 2 (2.00%) high severe vec2_dot_f32 time: [461.20 ps 461.73 ps 462.30 ps] change: [+117.88% +119.27% +120.79%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 2 (2.00%) low severe 2 (2.00%) low mild 3 (3.00%) high mild 9 (9.00%) high severe vec3_dot_f32 time: [688.24 ps 689.05 ps 689.95 ps] change: [+225.49% +227.19% +229.16%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 1 (1.00%) low mild 4 (4.00%) high mild 5 (5.00%) high severe vec4_dot_f32 time: [917.20 ps 921.23 ps 928.57 ps] change: [+338.59% +341.30% +344.17%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 8 (8.00%) high mild 5 (5.00%) high severe vec2_dot_f64 time: [596.11 ps 597.51 ps 598.79 ps] change: [+177.79% +179.60% +182.13%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe vec3_dot_f64 time: [749.32 ps 751.02 ps 752.81 ps] change: [+253.48% +257.12% +262.11%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) high mild 7 (7.00%) high severe vec4_dot_f64 time: [1.0145 ns 1.0185 ns 1.0230 ns] change: [+376.34% +379.47% +383.46%] (p = 0.00 < 0.05) Performance has regressed. 
Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe vec3_cross time: [971.01 ps 971.87 ps 972.73 ps] change: [+122.34% +122.74% +123.17%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild 4 (4.00%) high severe vec2_norm time: [1.0612 ns 1.0623 ns 1.0637 ns] change: [−0.0722% +0.0499% +0.1765%] (p = 0.44 > 0.05) No change in performance detected. Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) low mild 2 (2.00%) high severe vec3_norm time: [1.0649 ns 1.0665 ns 1.0694 ns] change: [−4.3787% −4.1856% −3.8679%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) high mild 2 (2.00%) high severe vec4_norm time: [1.0733 ns 1.0739 ns 1.0746 ns] change: [−4.5616% −3.9738% −2.9157%] (p = 0.00 < 0.05) Performance has improved. Found 19 outliers among 100 measurements (19.00%) 2 (2.00%) low severe 7 (7.00%) low mild 5 (5.00%) high mild 5 (5.00%) high severe vec2_normalize time: [2.5310 ns 2.5326 ns 2.5345 ns] change: [+3.5769% +3.6696% +3.7678%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe vec3_normalize time: [2.5389 ns 2.5409 ns 2.5424 ns] change: [+1.1411% +1.2860% +1.4910%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe vec4_normalize time: [1.8154 ns 1.8164 ns 1.8173 ns] change: [−1.1191% −0.9926% −0.8485%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) low severe 1 (1.00%) low mild 1 (1.00%) high mild 3 (3.00%) high severe vec10000_dot_f64 time: [2.0296 µs 2.0337 µs 2.0383 µs] change: [+71.107% +72.619% +74.228%] (p = 0.00 < 0.05) Performance has regressed. 
Found 11 outliers among 100 measurements (11.00%) 4 (4.00%) low severe 3 (3.00%) high mild 4 (4.00%) high severe vec10000_dot_f32 time: [1.1891 µs 1.1926 µs 1.1962 µs] change: [+6.3585% +7.1059% +7.9357%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low severe 1 (1.00%) low mild 4 (4.00%) high mild 6 (6.00%) high severe vec10000_axpy_f64 time: [2.0702 µs 2.0739 µs 2.0777 µs] change: [+39.373% +40.227% +41.210%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) low severe 1 (1.00%) low mild 4 (4.00%) high mild 2 (2.00%) high severe vec10000_axpy_beta_f64 time: [2.0914 µs 2.0962 µs 2.1012 µs] change: [+31.958% +32.843% +33.467%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 4 (4.00%) low severe 5 (5.00%) high mild 2 (2.00%) high severe vec10000_axpy_f64_slice time: [2.0272 µs 2.0303 µs 2.0335 µs] change: [+35.880% +36.621% +37.307%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) low severe 2 (2.00%) high mild 1 (1.00%) high severe vec10000_axpy_f64_static time: [13.917 µs 13.965 µs 14.005 µs] change: [+859.61% +869.73% +879.35%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 1 (1.00%) low severe 3 (3.00%) high mild 2 (2.00%) high severe vec10000_axpy_f32 time: [1.0402 µs 1.0421 µs 1.0437 µs] change: [+38.710% +39.603% +40.363%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 5 (5.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe vec10000_axpy_beta_f32 time: [1.0329 µs 1.0346 µs 1.0364 µs] change: [+30.705% +31.490% +32.040%] (p = 0.00 < 0.05) Performance has regressed. 
Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe quaternion_add_q time: [642.58 ps 650.39 ps 662.45 ps] change: [−11.788% −10.934% −9.9463%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 2 (2.00%) low severe 2 (2.00%) low mild 4 (4.00%) high mild 6 (6.00%) high severe quaternion_sub_q time: [641.16 ps 643.22 ps 645.88 ps] change: [−12.654% −11.822% −10.943%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 5 (5.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 4 (4.00%) high severe quaternion_mul_q time: [1.4252 ns 1.4271 ns 1.4294 ns] change: [+94.545% +95.022% +95.499%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low severe 2 (2.00%) low mild 4 (4.00%) high mild 5 (5.00%) high severe unit_quaternion_mul_v time: [1.4859 ns 1.4874 ns 1.4890 ns] change: [+242.77% +243.56% +244.31%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild single_unit_quaternion_mul_v time: [1.0422 ns 1.0457 ns 1.0504 ns] Found 9 outliers among 100 measurements (9.00%) 1 (1.00%) low severe 4 (4.00%) high mild 4 (4.00%) high severe quaternion_mul_s time: [771.17 ps 772.18 ps 773.37 ps] change: [+6.1278% +6.4276% +6.7583%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) low mild 3 (3.00%) high mild 3 (3.00%) high severe quaternion_div_s time: [798.54 ps 799.82 ps 801.43 ps] change: [+9.2123% +9.7287% +10.338%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 2 (2.00%) low severe 2 (2.00%) low mild 4 (4.00%) high mild 5 (5.00%) high severe quaternion_inv time: [1.2401 ns 1.2408 ns 1.2417 ns] change: [−43.660% −43.521% −43.317%] (p = 0.00 < 0.05) Performance has improved. 
Found 13 outliers among 100 measurements (13.00%) 2 (2.00%) low severe 5 (5.00%) high mild 6 (6.00%) high severe unit_quaternion_inv time: [596.01 ps 598.93 ps 602.66 ps] change: [−49.707% −49.184% −48.445%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 6 (6.00%) high mild 9 (9.00%) high severe quaternion_conjugate time: [604.36 ps 608.60 ps 613.48 ps] Found 12 outliers among 100 measurements (12.00%) 3 (3.00%) high mild 9 (9.00%) high severe quaternion_normalize time: [1.8268 ns 1.8274 ns 1.8281 ns] Found 18 outliers among 100 measurements (18.00%) 4 (4.00%) low severe 4 (4.00%) low mild 7 (7.00%) high mild 3 (3.00%) high severe bidiagonalize_100x100 time: [265.91 µs 266.00 µs 266.11 µs] change: [+0.7553% +0.8363% +0.9114%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe bidiagonalize_100x500 time: [2.0053 ms 2.0060 ms 2.0065 ms] change: [+4.0325% +4.2372% +4.3938%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 5 (5.00%) low severe 2 (2.00%) high mild 5 (5.00%) high severe bidiagonalize_4x4 time: [266.92 ns 267.24 ns 267.62 ns] change: [+7.1063% +7.2057% +7.3231%] (p = 0.00 < 0.05) Performance has regressed. Found 23 outliers among 100 measurements (23.00%) 1 (1.00%) low severe 5 (5.00%) low mild 13 (13.00%) high mild 4 (4.00%) high severe Benchmarking bidiagonalize_500x100: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.1s, enable flat sampling, or reduce sample count to 50. bidiagonalize_500x100 time: [1.6781 ms 1.6793 ms 1.6804 ms] change: [+1.3944% +1.5312% +1.6400%] (p = 0.00 < 0.05) Performance has regressed. bidiagonalize_unpack_100x100 time: [522.13 µs 522.36 µs 522.63 µs] change: [−0.5318% −0.4044% −0.2627%] (p = 0.00 < 0.05) Change within noise threshold. 
Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low mild 4 (4.00%) high mild 7 (7.00%) high severe bidiagonalize_unpack_100x500 time: [2.9858 ms 2.9916 ms 2.9976 ms] change: [−0.7824% −0.3995% −0.0370%] (p = 0.04 < 0.05) Change within noise threshold. bidiagonalize_unpack_500x100 time: [2.5884 ms 2.5896 ms 2.5910 ms] change: [+0.0767% +0.1539% +0.2316%] (p = 0.00 < 0.05) Change within noise threshold. cholesky_100x100 time: [31.084 µs 31.101 µs 31.122 µs] change: [−5.0365% −4.7949% −4.4205%] (p = 0.00 < 0.05) Performance has improved. Found 16 outliers among 100 measurements (16.00%) 2 (2.00%) low severe 4 (4.00%) low mild 1 (1.00%) high mild 9 (9.00%) high severe cholesky_500x500 time: [4.4799 ms 4.4849 ms 4.4903 ms] change: [−0.5985% −0.3685% −0.1374%] (p = 0.00 < 0.05) Change within noise threshold. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe cholesky_decompose_unpack_100x100 time: [31.659 µs 31.685 µs 31.727 µs] change: [−4.9712% −4.7445% −4.3325%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 4 (4.00%) low severe 4 (4.00%) low mild 2 (2.00%) high mild 5 (5.00%) high severe cholesky_decompose_unpack_500x500 time: [4.4795 ms 4.4845 ms 4.4910 ms] change: [−1.9595% −1.7121% −1.4978%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 3 (3.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild 7 (7.00%) high severe cholesky_solve_10x10 time: [170.70 ns 170.76 ns 170.82 ns] change: [+8.0936% +8.1777% +8.2764%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) low mild 5 (5.00%) high mild 2 (2.00%) high severe cholesky_solve_100x100 time: [2.9071 µs 2.9117 µs 2.9174 µs] change: [+8.4770% +8.9956% +9.6254%] (p = 0.00 < 0.05) Performance has regressed. 
Found 7 outliers among 100 measurements (7.00%) 1 (1.00%) low severe 3 (3.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe cholesky_solve_500x500 time: [54.193 µs 54.303 µs 54.417 µs] change: [+3.9332% +4.1755% +4.4477%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild cholesky_inverse_10x10 time: [1.3189 µs 1.3195 µs 1.3201 µs] change: [+2.5360% +2.6238% +2.7131%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe cholesky_inverse_100x100 time: [270.85 µs 270.88 µs 270.92 µs] change: [−0.9726% −0.8524% −0.7319%] (p = 0.00 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 1 (1.00%) low severe 4 (4.00%) low mild 2 (2.00%) high mild 2 (2.00%) high severe cholesky_inverse_500x500 time: [26.673 ms 26.694 ms 26.714 ms] change: [+1.0784% +1.1816% +1.2794%] (p = 0.00 < 0.05) Performance has regressed. Found 23 outliers among 100 measurements (23.00%) 19 (19.00%) low severe 2 (2.00%) low mild 2 (2.00%) high severe full_piv_lu_decompose_10x10 time: [582.31 ns 582.48 ns 582.67 ns] change: [+19.583% +19.702% +19.795%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 6 (6.00%) high mild 2 (2.00%) high severe full_piv_lu_decompose_100x100 time: [218.73 µs 218.78 µs 218.84 µs] change: [+5.8729% +5.9828% +6.0904%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 2 (2.00%) low severe 5 (5.00%) low mild 1 (1.00%) high severe full_piv_lu_solve_10x10 time: [124.88 ns 124.94 ns 125.02 ns] change: [+7.4724% +7.6252% +7.7787%] (p = 0.00 < 0.05) Performance has regressed. 
Found 13 outliers among 100 measurements (13.00%) 3 (3.00%) low severe 6 (6.00%) high mild 4 (4.00%) high severe
full_piv_lu_solve_100x100 time: [2.5202 µs 2.5244 µs 2.5289 µs] change: [+11.226% +11.847% +12.518%] (p = 0.00 < 0.05) Performance has regressed. Found 17 outliers among 100 measurements (17.00%) 14 (14.00%) low severe 2 (2.00%) low mild 1 (1.00%) high mild
full_piv_lu_inverse_10x10 time: [869.61 ns 870.27 ns 871.19 ns] change: [+4.7996% +4.9224% +5.0608%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) low severe 1 (1.00%) high mild 4 (4.00%) high severe
full_piv_lu_inverse_100x100 time: [212.68 µs 212.83 µs 213.05 µs] change: [−0.2835% −0.0351% +0.1310%] (p = 0.80 > 0.05) No change in performance detected. Found 13 outliers among 100 measurements (13.00%) 1 (1.00%) low severe 4 (4.00%) low mild 3 (3.00%) high mild 5 (5.00%) high severe
full_piv_lu_determinant_10x10 time: [15.320 ns 15.338 ns 15.357 ns] change: [+410.70% +421.41% +430.41%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 9 (9.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild
full_piv_lu_determinant_100x100 time: [137.44 ns 139.37 ns 141.00 ns] change: [+213.54% +227.75% +241.42%] (p = 0.00 < 0.05) Performance has regressed.
hessenberg_decompose_4x4 time: [82.510 ns 82.538 ns 82.564 ns] change: [−27.950% −27.887% −27.830%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild
hessenberg_decompose_100x100 time: [295.98 µs 296.16 µs 296.44 µs] change: [+3.3234% +3.5705% +3.7986%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 2 (2.00%) low mild 2 (2.00%) high mild 4 (4.00%) high severe
hessenberg_decompose_200x200 time: [2.2647 ms 2.2681 ms 2.2714 ms] change: [+4.8426% +4.9983% +5.1646%] (p = 0.00 < 0.05) Performance has regressed.
hessenberg_decompose_unpack_100x100 time: [435.30 µs 435.75 µs 436.12 µs] change: [+2.7479% +2.8420% +2.9424%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high severe
hessenberg_decompose_unpack_200x200 time: [3.2667 ms 3.2678 ms 3.2690 ms] change: [+3.9624% +4.0021% +4.0423%] (p = 0.00 < 0.05) Performance has regressed. Found 22 outliers among 100 measurements (22.00%) 13 (13.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild 5 (5.00%) high severe
lu_decompose_10x10 time: [353.04 ns 353.16 ns 353.31 ns] change: [−5.0408% −4.9435% −4.8487%] (p = 0.00 < 0.05) Performance has improved. Found 19 outliers among 100 measurements (19.00%) 4 (4.00%) low severe 4 (4.00%) low mild 6 (6.00%) high mild 5 (5.00%) high severe
lu_decompose_100x100 time: [71.544 µs 71.560 µs 71.579 µs] change: [−1.7176% −1.6430% −1.5721%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) low severe 2 (2.00%) low mild 2 (2.00%) high mild 3 (3.00%) high severe
lu_solve_10x10 time: [115.42 ns 115.52 ns 115.61 ns] change: [+3.9363% +4.1024% +4.2557%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 4 (4.00%) low severe 8 (8.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe
lu_solve_100x100 time: [2.5152 µs 2.5190 µs 2.5225 µs] change: [+15.120% +15.625% +16.088%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 4 (4.00%) low severe 2 (2.00%) low mild 1 (1.00%) high mild
lu_inverse_10x10 time: [902.55 ns 903.32 ns 903.97 ns] change: [+0.7407% +0.8734% +1.0263%] (p = 0.00 < 0.05) Change within noise threshold. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) low mild 1 (1.00%) high severe
lu_inverse_100x100 time: [216.21 µs 216.47 µs 216.80 µs] change: [−0.6663% −0.5584% −0.4316%] (p = 0.00 < 0.05) Change within noise threshold. Found 18 outliers among 100 measurements (18.00%) 2 (2.00%) low severe 4 (4.00%) low mild 5 (5.00%) high mild 7 (7.00%) high severe
lu_determinant_10x10 time: [13.394 ns 13.481 ns 13.665 ns] change: [+508.98% +524.96% +543.53%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 6 (6.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 2 (2.00%) high severe
lu_determinant_100x100 time: [149.12 ns 150.16 ns 151.08 ns] change: [+265.69% +281.86% +296.23%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 10 (10.00%) low severe 4 (4.00%) low mild
qr_decompose_100x100 time: [141.62 µs 141.65 µs 141.69 µs] change: [+0.6391% +0.8447% +0.9784%] (p = 0.00 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 5 (5.00%) low mild 1 (1.00%) high mild 3 (3.00%) high severe
Benchmarking qr_decompose_100x500: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.7s, enable flat sampling, or reduce sample count to 60.
qr_decompose_100x500 time: [1.0071 ms 1.0082 ms 1.0097 ms] change: [+0.9031% +1.2358% +1.6126%] (p = 0.00 < 0.05) Change within noise threshold. Found 16 outliers among 100 measurements (16.00%) 12 (12.00%) low mild 2 (2.00%) high mild 2 (2.00%) high severe
qr_decompose_4x4 time: [100.40 ns 100.43 ns 100.45 ns] change: [−19.315% −19.268% −19.224%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) low mild 1 (1.00%) high mild 4 (4.00%) high severe
Benchmarking qr_decompose_500x100: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 60.
qr_decompose_500x100 time: [847.17 µs 847.68 µs 848.21 µs] change: [+2.1441% +2.3425% +2.5069%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) high mild 3 (3.00%) high severe
qr_decompose_unpack_100x100 time: [283.22 µs 283.26 µs 283.30 µs] change: [−0.3591% −0.2383% −0.1147%] (p = 0.00 < 0.05) Change within noise threshold. Found 23 outliers among 100 measurements (23.00%) 21 (21.00%) low severe 1 (1.00%) low mild 1 (1.00%) high severe
Benchmarking qr_decompose_unpack_100x500: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.8s, enable flat sampling, or reduce sample count to 60.
qr_decompose_unpack_100x500 time: [1.1399 ms 1.1429 ms 1.1457 ms] change: [−1.9555% −1.8085% −1.6312%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild
Benchmarking qr_decompose_unpack_500x100: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.6s, enable flat sampling, or reduce sample count to 50.
qr_decompose_unpack_500x100 time: [1.6633 ms 1.6640 ms 1.6648 ms] change: [+1.4516% +1.5245% +1.5969%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 2 (2.00%) low severe 5 (5.00%) low mild 4 (4.00%) high severe
qr_solve_10x10 time: [156.51 ns 156.56 ns 156.61 ns] change: [+3.7415% +3.8709% +3.9947%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 6 (6.00%) low severe 5 (5.00%) low mild 1 (1.00%) high mild
qr_solve_100x100 time: [3.5393 µs 3.5454 µs 3.5511 µs] change: [+6.0908% +6.5747% +6.9798%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 6 (6.00%) low mild
qr_inverse_10x10 time: [806.75 ns 807.99 ns 809.61 ns] change: [+0.6973% +0.8242% +0.9558%] (p = 0.00 < 0.05) Change within noise threshold. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high severe
qr_inverse_100x100 time: [330.65 µs 330.74 µs 330.85 µs] change: [+1.2238% +1.3244% +1.4518%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 3 (3.00%) low mild 4 (4.00%) high mild 5 (5.00%) high severe
schur_decompose_4x4 time: [969.14 ns 969.71 ns 970.18 ns] change: [−12.293% −12.223% −12.149%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 4 (4.00%) high severe
schur_decompose_10x10 time: [7.3226 µs 7.3237 µs 7.3247 µs] change: [+0.3785% +0.4095% +0.4394%] (p = 0.00 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) low mild 4 (4.00%) high mild 3 (3.00%) high severe
schur_decompose_100x100 time: [2.5760 ms 2.5763 ms 2.5768 ms] change: [+0.7992% +0.8504% +0.8935%] (p = 0.00 < 0.05) Change within noise threshold. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe
schur_decompose_200x200 time: [18.285 ms 18.296 ms 18.308 ms] change: [+1.9360% +2.0941% +2.2427%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 1 (1.00%) low mild 3 (3.00%) high mild 2 (2.00%) high severe
eigenvalues_4x4 time: [937.94 ns 938.15 ns 938.38 ns] change: [+25.764% +25.898% +26.023%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) low severe 2 (2.00%) low mild 2 (2.00%) high mild
eigenvalues_10x10 time: [5.9066 µs 5.9088 µs 5.9117 µs] change: [+0.1208% +0.1938% +0.2740%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 1 (1.00%) low mild 3 (3.00%) high mild 4 (4.00%) high severe
Benchmarking eigenvalues_100x100: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.2s, enable flat sampling, or reduce sample count to 50.
eigenvalues_100x100 time: [1.5870 ms 1.5873 ms 1.5876 ms] change: [−0.8569% −0.8247% −0.7914%] (p = 0.00 < 0.05) Change within noise threshold. Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe
eigenvalues_200x200 time: [11.081 ms 11.088 ms 11.102 ms] change: [+0.0054% +0.2956% +0.4946%] (p = 0.00 < 0.05) Change within noise threshold. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) low mild 1 (1.00%) high mild 2 (2.00%) high severe
solve_l_triangular_100x100 time: [1.3250 µs 1.3651 µs 1.4012 µs] change: [+22.932% +24.999% +27.087%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 10 (10.00%) high mild 2 (2.00%) high severe
solve_l_triangular_1000x1000 time: [101.52 µs 102.04 µs 102.85 µs] change: [+1.5784% +2.0953% +2.8471%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 9 (9.00%) high mild 6 (6.00%) high severe
tr_solve_l_triangular_100x100 time: [2.0144 µs 2.0537 µs 2.0902 µs] change: [+13.600% +14.669% +15.998%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 5 (5.00%) high mild 11 (11.00%) high severe
tr_solve_l_triangular_1000x1000 time: [93.569 µs 94.056 µs 94.857 µs] change: [+1.2474% +1.7955% +2.5979%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) high mild 4 (4.00%) high severe
solve_u_triangular_100x100 time: [1.5878 µs 1.6615 µs 1.7405 µs] change: [+31.200% +34.370% +38.132%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 10 (10.00%) high mild 3 (3.00%) high severe
solve_u_triangular_1000x1000 time: [105.07 µs 105.46 µs 106.12 µs] change: [+6.6559% +7.0936% +7.8401%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high severe
tr_solve_u_triangular_100x100 time: [1.4369 µs 1.4697 µs 1.4986 µs] change: [+17.195% +18.687% +20.307%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 11 (11.00%) high mild 2 (2.00%) high severe
tr_solve_u_triangular_1000x1000 time: [88.868 µs 89.303 µs 90.014 µs] change: [+4.2489% +4.7933% +5.6045%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 4 (4.00%) high mild 7 (7.00%) high severe
svd_decompose_2x2 time: [22.913 ns 22.958 ns 23.017 ns] change: [+9.3648% +9.7443% +10.253%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe
svd_decompose_3x3 time: [359.30 ns 359.72 ns 360.20 ns] change: [+9.0123% +9.1174% +9.2394%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild
svd_decompose_4x4 time: [896.28 ns 896.55 ns 896.85 ns] change: [−7.1192% −7.0496% −6.9853%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 3 (3.00%) low mild 3 (3.00%) high mild 2 (2.00%) high severe
svd_decompose_10x10 time: [5.7680 µs 5.7708 µs 5.7739 µs] change: [+1.1933% +1.4155% +1.6347%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) high mild 2 (2.00%) high severe
Benchmarking svd_decompose_100x100: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.2s, enable flat sampling, or reduce sample count to 50.
svd_decompose_100x100 time: [1.5704 ms 1.5709 ms 1.5715 ms] change: [+1.4465% +1.4891% +1.5357%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe
svd_decompose_200x200 time: [11.845 ms 11.847 ms 11.850 ms] change: [+1.4378% +1.4794% +1.5225%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high severe
rank_4x4 time: [716.49 ns 716.62 ns 716.74 ns] change: [+4.9084% +4.9678% +5.0237%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) low mild
rank_10x10 time: [4.2304 µs 4.2341 µs 4.2377 µs] change: [+0.4993% +0.6056% +0.7271%] (p = 0.00 < 0.05) Change within noise threshold. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild
rank_100x100 time: [522.74 µs 522.85 µs 522.97 µs] change: [+0.2822% +0.3170% +0.3535%] (p = 0.00 < 0.05) Change within noise threshold. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low mild 2 (2.00%) high severe
rank_200x200 time: [3.0167 ms 3.0217 ms 3.0267 ms] change: [+0.3924% +0.5333% +0.6946%] (p = 0.00 < 0.05) Change within noise threshold.
singular_values_4x4 time: [735.97 ns 736.08 ns 736.21 ns] change: [−7.6736% −7.6163% −7.5596%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 1 (1.00%) low severe 2 (2.00%) low mild 2 (2.00%) high severe
singular_values_10x10 time: [4.2987 µs 4.2997 µs 4.3010 µs] change: [+1.6193% +1.7215% +1.8186%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe
singular_values_100x100 time: [525.20 µs 525.36 µs 525.54 µs] change: [+0.4054% +0.4526% +0.4982%] (p = 0.00 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 6 (6.00%) low mild 1 (1.00%) high mild 2 (2.00%) high severe
singular_values_200x200 time: [3.0712 ms 3.0729 ms 3.0750 ms] change: [+2.1769% +2.2358% +2.3112%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low mild 1 (1.00%) high mild 1 (1.00%) high severe
pseudo_inverse_4x4 time: [877.64 ns 878.38 ns 879.12 ns] change: [−8.2828% −8.2216% −8.1662%] (p = 0.00 < 0.05) Performance has improved. Found 13 outliers among 100 measurements (13.00%) 1 (1.00%) low severe 3 (3.00%) low mild 2 (2.00%) high mild 7 (7.00%) high severe
pseudo_inverse_10x10 time: [6.0008 µs 6.0034 µs 6.0064 µs] change: [+0.2665% +0.3678% +0.4766%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe
Benchmarking pseudo_inverse_100x100: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.4s, enable flat sampling, or reduce sample count to 50.
pseudo_inverse_100x100 time: [1.6088 ms 1.6091 ms 1.6094 ms] change: [+0.1161% +0.2007% +0.2937%] (p = 0.00 < 0.05) Change within noise threshold. Found 12 outliers among 100 measurements (12.00%) 2 (2.00%) high mild 10 (10.00%) high severe
pseudo_inverse_200x200 time: [12.038 ms 12.042 ms 12.047 ms] change: [−0.4351% −0.2531% −0.0699%] (p = 0.01 < 0.05) Change within noise threshold. Found 22 outliers among 100 measurements (22.00%) 16 (16.00%) low severe 2 (2.00%) low mild 1 (1.00%) high mild 3 (3.00%) high severe
symmetric_eigen_decompose_4x4 time: [518.00 ns 518.07 ns 518.15 ns] change: [+4.7008% +4.7492% +4.8006%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 2 (2.00%) low mild 2 (2.00%) high mild 4 (4.00%) high severe
symmetric_eigen_decompose_10x10 time: [3.6417 µs 3.6428 µs 3.6440 µs] change: [−0.1549% −0.0998% −0.0483%] (p = 0.00 < 0.05) Change within noise threshold. Found 12 outliers among 100 measurements (12.00%) 6 (6.00%) high mild 6 (6.00%) high severe
symmetric_eigen_decompose_100x100 time: [761.64 µs 762.66 µs 763.80 µs] change: [−5.8109% −5.7178% −5.6284%] (p = 0.00 < 0.05) Performance has improved. Found 19 outliers among 100 measurements (19.00%) 9 (9.00%) low severe 9 (9.00%) low mild 1 (1.00%) high severe
symmetric_eigen_decompose_200x200 time: [5.1304 ms 5.1337 ms 5.1372 ms] change: [−9.4434% −9.3646% −9.2959%] (p = 0.00 < 0.05) Performance has improved.

The total run time of the full benchmark suite on my machine (AMD 5950X) has not changed and is still around 30 minutes.
1 parent 65302fa commit 8bec499

12 files changed: +1039 -556 lines changed

benches/core/matrix.rs

Lines changed: 238 additions & 99 deletions
Large diffs are not rendered by default.

benches/core/vector.rs

Lines changed: 101 additions & 37 deletions
@@ -1,6 +1,7 @@
 use na::{DVector, SVector, Vector2, Vector3, Vector4};
 use rand::Rng;
 use rand_isaac::IsaacRng;
+
 use std::ops::{Add, Div, Mul, Sub};
 
 #[path = "../common/macros.rs"]
@@ -49,81 +50,144 @@ bench_binop_ref!(vec10000_dot_f32, SVector<f32, 10000>, SVector<f32, 10000>, dot
 
 fn vec10000_axpy_f64(bh: &mut criterion::Criterion) {
     use rand::SeedableRng;
+
     let mut rng = IsaacRng::seed_from_u64(0);
-    let mut a = DVector::new_random(10000);
-    let b = DVector::new_random(10000);
-    let n = rng.random::<f64>();
 
-    bh.bench_function("vec10000_axpy_f64", move |bh| {
-        bh.iter(|| a.axpy(n, &b, 1.0))
+    bh.bench_function("vec10000_axpy_f64", |bh| {
+        bh.iter_batched(
+            || {
+                (
+                    DVector::new_random(10000),
+                    DVector::new_random(10000),
+                    rng.random::<f64>(),
+                )
+            },
+            |(mut a, b, n)| {
+                a.axpy(n, &b, 1.0);
+                (a, b)
+            },
+            criterion::BatchSize::SmallInput,
+        )
     });
 }
 
 fn vec10000_axpy_beta_f64(bh: &mut criterion::Criterion) {
     use rand::SeedableRng;
+
     let mut rng = IsaacRng::seed_from_u64(0);
-    let mut a = DVector::new_random(10000);
-    let b = DVector::new_random(10000);
-    let n = rng.random::<f64>();
-    let beta = rng.random::<f64>();
 
-    bh.bench_function("vec10000_axpy_beta_f64", move |bh| {
-        bh.iter(|| a.axpy(n, &b, beta))
+    bh.bench_function("vec10000_axpy_beta_f64", |bh| {
+        bh.iter_batched(
+            || {
+                (
+                    DVector::new_random(10000),
+                    DVector::new_random(10000),
+                    rng.random::<f64>(),
+                    rng.random::<f64>(),
+                )
+            },
+            |(mut a, b, n, beta)| {
+                a.axpy(n, &b, beta);
+                (a, b)
+            },
+            criterion::BatchSize::SmallInput,
+        )
     });
 }
 
 fn vec10000_axpy_f64_slice(bh: &mut criterion::Criterion) {
     use rand::SeedableRng;
-    let mut rng = IsaacRng::seed_from_u64(0);
-    let mut a = DVector::new_random(10000);
-    let b = DVector::new_random(10000);
-    let n = rng.random::<f64>();
 
-    bh.bench_function("vec10000_axpy_f64_slice", move |bh| {
-        bh.iter(|| {
-            let mut a = a.fixed_rows_mut::<10000>(0);
-            let b = b.fixed_rows::<10000>(0);
+    let mut rng = IsaacRng::seed_from_u64(0);
 
-            a.axpy(n, &b, 1.0)
-        })
+    bh.bench_function("vec10000_axpy_f64_slice", |bh| {
+        bh.iter_batched(
+            || {
+                (
+                    DVector::new_random(10000),
+                    DVector::new_random(10000),
+                    rng.random::<f64>(),
+                )
+            },
+            |(mut a, b, n)| {
+                let mut a_slice = a.fixed_rows_mut::<10000>(0);
+                let b_slice = b.fixed_rows::<10000>(0);
+                a_slice.axpy(n, &b_slice, 1.0);
+                (a, b)
+            },
+            criterion::BatchSize::SmallInput,
+        )
     });
 }
 
 fn vec10000_axpy_f64_static(bh: &mut criterion::Criterion) {
     use rand::SeedableRng;
+
     let mut rng = IsaacRng::seed_from_u64(0);
-    let mut a = SVector::<f64, 10000>::new_random();
-    let b = SVector::<f64, 10000>::new_random();
-    let n = rng.random::<f64>();
 
     // NOTE: for some reasons, it is much faster if the argument are boxed (Box::new(OVector...)).
-    bh.bench_function("vec10000_axpy_f64_static", move |bh| {
-        bh.iter(|| a.axpy(n, &b, 1.0))
+    bh.bench_function("vec10000_axpy_f64_static", |bh| {
+        bh.iter_batched(
+            || {
+                (
+                    SVector::<f64, 10000>::new_random(),
+                    SVector::<f64, 10000>::new_random(),
+                    rng.random::<f64>(),
+                )
+            },
+            |(mut a, b, n)| {
+                a.axpy(n, &b, 1.0);
+                (a, b)
+            },
+            criterion::BatchSize::SmallInput,
+        )
     });
 }
 
 fn vec10000_axpy_f32(bh: &mut criterion::Criterion) {
     use rand::SeedableRng;
+
     let mut rng = IsaacRng::seed_from_u64(0);
-    let mut a = DVector::new_random(10000);
-    let b = DVector::new_random(10000);
-    let n = rng.random::<f32>();
 
-    bh.bench_function("vec10000_axpy_f32", move |bh| {
-        bh.iter(|| a.axpy(n, &b, 1.0))
+    bh.bench_function("vec10000_axpy_f32", |bh| {
+        bh.iter_batched(
+            || {
+                (
+                    DVector::new_random(10000),
+                    DVector::new_random(10000),
+                    rng.random::<f32>(),
+                )
+            },
+            |(mut a, b, n)| {
+                a.axpy(n, &b, 1.0);
+                (a, b)
+            },
+            criterion::BatchSize::SmallInput,
+        )
     });
 }
 
 fn vec10000_axpy_beta_f32(bh: &mut criterion::Criterion) {
     use rand::SeedableRng;
+
     let mut rng = IsaacRng::seed_from_u64(0);
-    let mut a = DVector::new_random(10000);
-    let b = DVector::new_random(10000);
-    let n = rng.random::<f32>();
-    let beta = rng.random::<f32>();
 
-    bh.bench_function("vec10000_axpy_beta_f32", move |bh| {
-        bh.iter(|| a.axpy(n, &b, beta))
+    bh.bench_function("vec10000_axpy_beta_f32", |bh| {
+        bh.iter_batched(
+            || {
+                (
+                    DVector::new_random(10000),
+                    DVector::new_random(10000),
+                    rng.random::<f32>(),
+                    rng.random::<f32>(),
+                )
+            },
+            |(mut a, b, n, beta)| {
+                a.axpy(n, &b, beta);
+                (a, b)
+            },
+            criterion::BatchSize::SmallInput,
+        )
     });
 }
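The pattern repeated in this file can be sketched without Criterion. The following is only an illustration under stated assumptions: `time_batched` and the slice-based `axpy` are hypothetical helpers written for this note, not Criterion's or nalgebra's code. It shows the two properties the commit message relies on: inputs are built before the clock starts, and whatever the routine returns is dropped only after the clock stops.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the idea behind `iter_batched`
/// (an assumption for illustration, not Criterion's implementation).
fn time_batched<I, O, S, R>(batch: usize, mut setup: S, mut routine: R) -> Duration
where
    S: FnMut() -> I,
    R: FnMut(I) -> O,
{
    // Build every input before the timed region begins.
    let inputs: Vec<I> = (0..batch).map(|_| setup()).collect();
    // Pre-allocate so that storing outputs does not reallocate while timing.
    let mut outputs: Vec<O> = Vec::with_capacity(batch);

    let start = Instant::now();
    for input in inputs {
        // Whatever the routine returns is kept alive, not dropped here.
        outputs.push(routine(input));
    }
    let elapsed = start.elapsed();

    // Returned values are freed only now, outside the measurement.
    drop(outputs);
    elapsed
}

/// Plain-slice analogue of `a.axpy(n, &b, beta)`: a = n * b + beta * a.
fn axpy(a: &mut [f64], n: f64, b: &[f64], beta: f64) {
    for (ai, bi) in a.iter_mut().zip(b) {
        *ai = n * *bi + beta * *ai;
    }
}
```

Returning `(a, b)` from the routine closure, as the benchmarks in this file do, plays the role of `outputs.push(...)` in the sketch: the vectors stay alive until the measurement has ended, so their deallocation is not counted.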

benches/linalg/bidiagonal.rs

Lines changed: 51 additions & 30 deletions
@@ -5,61 +5,82 @@ mod macros;
 
 // Without unpack.
 fn bidiagonalize_100x100(bh: &mut criterion::Criterion) {
-    let m = DMatrix::<f64>::new_random(100, 100);
-    bh.bench_function("bidiagonalize_100x100", move |bh| {
-        bh.iter(|| std::hint::black_box(Bidiagonal::new(m.clone())))
+    bh.bench_function("bidiagonalize_100x100", |bh| {
+        bh.iter_batched(
+            || DMatrix::<f64>::new_random(100, 100),
+            |m| Bidiagonal::new(m),
+            criterion::BatchSize::SmallInput,
+        )
     });
 }
 
 fn bidiagonalize_100x500(bh: &mut criterion::Criterion) {
-    let m = DMatrix::<f64>::new_random(100, 500);
-    bh.bench_function("bidiagonalize_100x500", move |bh| {
-        bh.iter(|| std::hint::black_box(Bidiagonal::new(m.clone())))
+    bh.bench_function("bidiagonalize_100x500", |bh| {
+        bh.iter_batched(
+            || DMatrix::<f64>::new_random(100, 500),
+            |m| Bidiagonal::new(m),
+            criterion::BatchSize::SmallInput,
+        )
     });
 }
 
 fn bidiagonalize_4x4(bh: &mut criterion::Criterion) {
-    let m = Matrix4::<f64>::new_random();
-    bh.bench_function("bidiagonalize_4x4", move |bh| {
-        bh.iter(|| std::hint::black_box(Bidiagonal::new(m.clone())))
+    bh.bench_function("bidiagonalize_4x4", |bh| {
+        bh.iter_batched(
+            || Matrix4::<f64>::new_random(),
+            |m| Bidiagonal::new(m),
+            criterion::BatchSize::SmallInput,
+        )
     });
 }
 
 fn bidiagonalize_500x100(bh: &mut criterion::Criterion) {
-    let m = DMatrix::<f64>::new_random(500, 100);
-    bh.bench_function("bidiagonalize_500x100", move |bh| {
-        bh.iter(|| std::hint::black_box(Bidiagonal::new(m.clone())))
+    bh.bench_function("bidiagonalize_500x100", |bh| {
+        bh.iter_batched(
+            || DMatrix::<f64>::new_random(500, 100),
+            |m| Bidiagonal::new(m),
+            criterion::BatchSize::SmallInput,
+        )
     });
 }
 
 // With unpack.
 fn bidiagonalize_unpack_100x100(bh: &mut criterion::Criterion) {
-    let m = DMatrix::<f64>::new_random(100, 100);
-    bh.bench_function("bidiagonalize_unpack_100x100", move |bh| {
-        bh.iter(|| {
-            let bidiag = Bidiagonal::new(m.clone());
-            let _ = bidiag.unpack();
-        })
+    bh.bench_function("bidiagonalize_unpack_100x100", |bh| {
+        bh.iter_batched(
+            || DMatrix::<f64>::new_random(100, 100),
+            |m| {
+                let bidiag = Bidiagonal::new(m);
+                bidiag.unpack()
+            },
+            criterion::BatchSize::SmallInput,
+        )
     });
 }
 
 fn bidiagonalize_unpack_100x500(bh: &mut criterion::Criterion) {
-    let m = DMatrix::<f64>::new_random(100, 500);
-    bh.bench_function("bidiagonalize_unpack_100x500", move |bh| {
-        bh.iter(|| {
-            let bidiag = Bidiagonal::new(m.clone());
-            let _ = bidiag.unpack();
-        })
+    bh.bench_function("bidiagonalize_unpack_100x500", |bh| {
+        bh.iter_batched(
+            || DMatrix::<f64>::new_random(100, 500),
+            |m| {
+                let bidiag = Bidiagonal::new(m);
+                bidiag.unpack()
+            },
+            criterion::BatchSize::SmallInput,
+        )
     });
 }
 
 fn bidiagonalize_unpack_500x100(bh: &mut criterion::Criterion) {
-    let m = DMatrix::<f64>::new_random(500, 100);
-    bh.bench_function("bidiagonalize_unpack_500x100", move |bh| {
-        bh.iter(|| {
-            let bidiag = Bidiagonal::new(m.clone());
-            let _ = bidiag.unpack();
-        })
+    bh.bench_function("bidiagonalize_unpack_500x100", |bh| {
+        bh.iter_batched(
+            || DMatrix::<f64>::new_random(500, 100),
+            |m| {
+                let bidiag = Bidiagonal::new(m);
+                bidiag.unpack()
+            },
+            criterion::BatchSize::SmallInput,
+        )
     });
 }
