@artagnon
Contributor

@artagnon artagnon commented Jul 31, 2024

TargetSchedModel::computeOperandLatency is supposed to return the exact latency between two MIs, but InstrSchedModel and InstrItineraries are unavailable in many real-world scenarios. When neither is available, the function returns an estimate that is much too conservative: the default def latency. MachineTraceMetrics is one of the callers affected quite badly by these conservative estimates. To improve the estimate, and let callers of MTM generate better code, offset the default def latency by the estimated cycles elapsed between the def MI and use MI. Since we're trying to improve codegen in the case when no scheduling information is available, it is impossible to determine the exact number of cycles elapsed between the two MIs, so we use the distance between them, accounting for issue-width, as an approximation. In practice, improving one estimate by offsetting it with another estimate leads to better codegen on average, and yields non-trivial gains on standard benchmarks.
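The offsetting idea described above can be sketched roughly as follows. This is only an illustration, not the code in the patch: the function name `estimateOperandLatency` and its parameters are hypothetical, and the choices to round the elapsed cycles down and to clamp the result at zero are assumptions.

```cpp
#include <algorithm>

// Hypothetical helper (not from the patch): refine the conservative default
// def latency when no scheduling model or itinerary is available.
//   defaultDefLatency: the fallback latency that would otherwise be returned
//   instrDistance:     number of instructions between the def MI and use MI
//   issueWidth:        instructions assumed to issue per cycle
unsigned estimateOperandLatency(unsigned defaultDefLatency,
                                unsigned instrDistance,
                                unsigned issueWidth) {
  // Guard against a zero issue width so the division below is well defined.
  unsigned width = std::max(issueWidth, 1u);
  // Approximate the cycles elapsed between def and use (assumption: floor).
  unsigned elapsedCycles = instrDistance / width;
  // Offset the default latency, clamping at zero (assumption) so the
  // unsigned subtraction cannot wrap around.
  return defaultDefLatency > elapsedCycles
             ? defaultDefLatency - elapsedCycles
             : 0;
}
```

For example, with a default def latency of 6, a def-to-use distance of 8 instructions, and an issue width of 4, the sketch estimates 2 elapsed cycles and returns a remaining latency of 4.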

@michaelmaitland
Contributor

In practice, this improvement of one crude estimate by offsetting it with another crude estimate leads to better codegen on average, and yields huge gains on standard benchmarks.

Do you have any data to support this claim?

@artagnon
Contributor Author

artagnon commented Jul 31, 2024

I've tried variations of this patch in the past, but I think there's a good idea hidden here. I did statistical benchmarking with the benchmarking tests in llvm-test-suite, and saw an improvement of 8.7% on X86-native, as measured by perf. I didn't see statistically significant differences on SPEC 2017. I messed up the benchmarking; please ignore these results.

Program                                       exec_time
                                              lhs        rhs        diff
MultiSourc.../Applications/sgefa/sgefa.test         0.15       0.45 204.3%
SingleSour...ce/Benchmarks/McGill/misr.test         0.15       0.31 110.4%
SingleSour...algebra/kernels/atax/atax.test         0.02       0.03  88.0%
MicroBench...da.test:BM_VOL3D_CALC_LAMBDA/1        40.40      74.68  84.9%
MicroBench...mbda.test:BM_INIT3_LAMBDA/5001         5.49       9.91  80.5%
SingleSour...e/Benchmarks/Misc/flops-2.test         0.37       0.65  75.3%
MultiSourc...Rodinia/backprop/backprop.test         0.63       1.10  74.6%
SingleSour...nchmarkGame/spectral-norm.test         0.29       0.50  72.7%
Bitcode/si...imd_ops_test_op_minpd_226.test         0.01       0.01  67.8%
SingleSour...ar-algebra/blas/symm/symm.test         0.00       0.00  65.7%
MicroBench...BENCHMARK_asinf_autovec_float_       118.03     192.37  63.0%
SingleSour...le_types_constant_folding.test         0.51       0.82  62.1%
MultiSourc...ow-dbl/GlobalDataFlow-dbl.test         1.93       3.12  61.5%
MicroBench...mCmp<4, GreaterThanZero, None>      1266.63    2006.76  58.4%
MultiSourc...chmarks/Olden/power/power.test         0.55       0.88  58.2%
SingleSour...r-algebra/kernels/mvt/mvt.test         0.05       0.08  55.1%
MultiSourc...itBench/uudecode/uudecode.test         0.02       0.03  55.0%
MicroBench...t:BENCHMARK_acosf_novec_float_       108.98     168.38  54.5%
MicroBench...BENCHMARK_acosf_autovec_float_       122.21     188.19  54.0%
MicroBench...Cmp<4, GreaterThanZero, First>      1277.75    1938.75  51.7%
MicroBench...emCmp<4, GreaterThanZero, Mid>      1284.56    1944.72  51.4%
MicroBench...aw.test:BM_INNER_PROD_RAW/5001        12.73      19.17  50.6%
External/S...ecrand_fr/997.specrand_fr.test         0.00       0.00  48.3%
MicroBench...runtime_checks_fail<2, double>         4.05       5.96  47.1%
MicroBench...mCmp<4, GreaterThanZero, Last>      1261.89    1850.63  46.7%
MicroBench...calsCRaw.test:BM_ADI_RAW/44217      1029.44    1504.82  46.2%
MicroBench...est:BM_MemCmp<4, EqZero, Last>      1308.17    1887.73  44.3%
MicroBench...Cmp<3, GreaterThanZero, First>      2135.02    3065.22  43.6%
MicroBench...w.test:BM_INNER_PROD_RAW/44217       113.61     162.36  42.9%
MicroBench...est:BM_MemCmp<4, EqZero, None>      1216.83    1728.42  42.0%
SingleSour...marks/CoyoteBench/lpbench.test         1.84       2.60  41.4%
MicroBench...test:BM_MemCmp<3, EqZero, Mid>      1870.92    2644.56  41.4%
SingleSour...Benchmarks/Stanford/Oscar.test         0.00       0.00  41.2%
MicroBench...test:BM_MemCmp<4, EqZero, Mid>      1166.12    1636.95  40.4%
Bitcode/si...d_ops_test_op_cmpltps_137.test         0.01       0.01  40.4%
MicroBench...ks.test:benchForIC1VW4LoopTC63         5.00       7.01  40.3%
MicroBench...aw.test:BM_BAND_LIN_EQ_RAW/171         0.06       0.08  39.2%
MicroBench...st:BM_MemCmp<3, EqZero, First>      1890.12    2620.43  38.6%
MicroBench...MemCmp<4, LessThanZero, First>      1428.67    1972.97  38.1%
MicroBench...Raw.test:BM_INNER_PROD_RAW/171         0.41       0.56  37.8%
MultiSourc...pansion-dbl/Expansion-dbl.test         2.26       3.07  35.8%
MicroBench..._MemCmp<2, LessThanZero, Last>      2621.29    3541.93  35.1%
SingleSour...out-C++/Shootout-C++-ary2.test         0.04       0.05  34.8%
MicroBench...CRaw.test:BM_HYDRO_1D_RAW/5001         2.82       3.76  33.2%
MicroBench...Lambda.test:BM_ICCG_LAMBDA/171         0.19       0.25  32.8%
MicroBench...lcalsCRaw.test:BM_ADI_RAW/5001       114.35     151.80  32.8%
Bitcode/si..._ops_test_op_packuswb_234.test         0.01       0.01  32.4%
MicroBench....test:BM_HYDRO_1D_LAMBDA/44217        27.42      36.29  32.4%
MicroBench...est:BM_MemCmp<3, EqZero, Last>      1894.35    2497.81  31.9%
SingleSour.../Benchmarks/Misc/evalloop.test         0.64       0.85  31.8%
MicroBench...runtime_checks_pass<2, double>         2.78       3.66  31.7%
MicroBench....test:BM_BAND_LIN_EQ_RAW/44217        18.30      24.05  31.4%
External/S...lancbmk_s/623.xalancbmk_s.test        46.06      60.33  31.0%
MicroBench...da.test:BM_VOL3D_CALC_LAMBDA/2         1.51       1.98  30.9%
MicroBench...est:BENCHMARK_HARRIS/2048/2048     33104.45   43122.50  30.3%
SingleSour...otout/Shootout-nestedloop.test         0.00       0.00  30.0%
MultiSourc.../Benchmarks/Ptrdist/ft/ft.test         1.01       1.31  29.7%
MicroBench...intersAllDisjointDecreasing/32     23990.85   31022.28  29.3%
MicroBench...w.test:BM_BAND_LIN_EQ_RAW/5001         1.89       2.44  29.1%
SingleSour...lgebra/blas/gemver/gemver.test         0.04       0.05  28.8%
Bitcode/si...imd_ops_test_op_mulpd_207.test         0.01       0.01  28.3%
MicroBench...st:BM_MemCmp<4, EqZero, First>      1220.83    1561.77  27.9%
MultiSourc.../Prolangs-C++/ocean/ocean.test         0.06       0.08  27.8%
MicroBench...da.test:BM_HYDRO_1D_LAMBDA/171         0.08       0.11  27.5%
MicroBench...t:BENCHMARK_acos_novec_double_        99.29     126.58  27.5%
MultiSourc.../Benchmarks/Olden/tsp/tsp.test         0.57       0.73  27.2%
Bitcode/si...imd_ops_test_op_addps_117.test         0.01       0.01  26.8%
MicroBench...Cmp<2, GreaterThanZero, First>      2806.71    3547.51  26.4%
Bitcode/si..._ops_test_op_blendvpd_254.test         0.01       0.01  26.2%
SingleSour...ks/Shootout/Shootout-ary3.test         0.58       0.72  24.4%
MicroBench...tersAllDisjointDecreasing/1000    762504.39  947181.00  24.2%
MicroBench...ambda.test:BM_INIT3_LAMBDA/171         0.08       0.10  24.1%
MicroBench...ks.test:benchAutoVecForLoopTC1         2.49       3.08  23.9%
MicroBench.../lcalsCRaw.test:BM_EOS_RAW/171         0.26       0.32  23.7%
MicroBench...ForLoopWithReductionAutoVecTC3         3.74       4.61  23.2%
MicroBench...t:BM_PRESSURE_CALC_LAMBDA/5001        12.76      15.70  23.0%
MicroBench...M_MemCmp<2, LessThanZero, Mid>      2730.98    3342.25  22.4%
MicroBench...w.test:BM_DIFF_PREDICT_RAW/171         0.81       0.98  21.7%
MicroBench....test:benchForIC2VW4BigLoopTC2         7.89       9.59  21.7%
MicroBench...w.test:BM_TRIDIAG_ELIM_RAW/171         0.30       0.36  21.6%
MicroBench...BENCHMARK_acos_autovec_double_       106.29     129.15  21.5%
MicroBench...emCmp<3, GreaterThanZero, Mid>      2539.83    3079.68  21.3%
MicroBench....test:BENCHMARK_HARRIS/256/256       500.39     605.73  21.1%
MicroBench...mCmp<2, GreaterThanZero, Last>      3333.44    4033.66  21.0%
MicroBench....test:BM_PRESSURE_CALC_RAW/171         0.38       0.45  20.6%
MicroBench...BigLoopWithReductionAutoVecTC2         6.92       8.35  20.5%
MicroBench...est:BM_MemCmp<3, EqZero, None>      1951.25    2343.81  20.1%
Bitcode/si...imd_ops_test_op_maxpd_210.test         0.01       0.01  20.1%
MicroBench...rIC1VW4BigLoopWithReductionTC2         6.98       8.38  20.0%
SingleSour...ing/covariance/covariance.test         0.00       0.00  20.0%
MicroBench...ntime_checks_needed<2, double>         4.30       5.14  19.7%
MicroBench...lsBRaw.test:BM_INIT3_RAW/44217        53.82      64.40  19.7%
MicroBench...:BENCHMARK_cos_autovec_double_       494.77     590.46  19.3%
MicroBench...test:benchAutoVecForBigLoopTC1         3.10       3.70  19.3%
MicroBench...a.test:BM_TRAP_INT_LAMBDA/5001        15.74      18.74  19.1%
MicroBench....test:BM_INT_PREDICT_RAW/44217       302.57     360.15  19.0%
MicroBench...test:BM_ENERGY_CALC_LAMBDA/171         1.33       1.59  18.9%
MicroBench...da.test:BM_TRAP_INT_LAMBDA/171         0.53       0.63  18.9%
MicroBench...rIC1VW1BigLoopWithReductionTC1         4.92       5.85  18.8%
External/S...ecrand_fs/996.specrand_fs.test         0.00       0.00  18.6%
MicroBench...da.test:BM_VOL3D_CALC_LAMBDA/0       203.20     240.98  18.6%
MicroBench....test:BM_TRIDIAG_ELIM_RAW/5001        10.69      12.64  18.3%
Bitcode/si...imd_ops_test_op_paddq_229.test         0.01       0.01  18.2%
MultiSourc...oxyApps-C++/miniFE/miniFE.test         2.74       3.24  18.1%
MicroBench...tersAllDisjointIncreasing/1000    767915.56  906603.10  18.1%
MicroBench....test:BM_TRAP_INT_LAMBDA/44217       142.42     167.51  17.6%
MicroBench....test:benchForIC4VW4BigLoopTC2         8.19       9.63  17.6%
MicroBench...emCmp<2, GreaterThanZero, Mid>      3043.36    3567.19  17.2%
Bitcode/si..._ops_test_op_blendvpd_300.test         0.01       0.01  17.2%
MultiSourc...DOE-ProxyApps-C/CoMD/CoMD.test         1.67       1.96  17.1%
Bitcode/si...simd_ops_test_op_paddb_90.test         0.01       0.01  17.1%
MicroBench...hForIC1VW4LoopWithReductionTC3         3.98       4.65  16.7%
MicroBench...t:BENCHMARK_asinf_novec_float_       166.99     194.60  16.5%
MicroBench...est:BM_MemCmp<1, EqZero, None>      4843.56    5637.63  16.4%
MicroBench...test:BM_TRIDIAG_ELIM_RAW/44217        96.25     111.90  16.3%
MicroBench...est:BM_ENERGY_CALC_LAMBDA/5001        43.79      50.79  16.0%
Bitcode/si...imd_ops_test_op_paddsb_49.test         0.01       0.01  15.9%
Bitcode/si...imd_ops_test_op_minpd_196.test         0.01       0.01  15.8%
External/S...ecrand_ir/999.specrand_ir.test         0.00       0.00  15.7%
MicroBench...bda.test:BM_PIC_2D_LAMBDA/5001        35.78      41.38  15.7%
MicroBench...a.test:BM_IF_QUAD_LAMBDA/44217       180.48     208.44  15.5%
Bitcode/si...simd_ops_test_op_maxps_75.test         0.01       0.01  15.4%
MicroBench...mCmp<2, GreaterThanZero, None>      3464.55    3997.66  15.4%
MicroBench.../lcalsCRaw.test:BM_ADI_RAW/171         3.93       4.53  15.2%
MicroBench...mbda.test:BM_PIC_2D_LAMBDA/171         1.28       1.48  15.2%
MicroBench...bda.test:BM_INIT3_LAMBDA/44217        61.21      70.48  15.1%
MicroBench...mCmp<3, GreaterThanZero, Last>      2529.52    2912.17  15.1%
MultiSourc...rks/tramp3d-v4/tramp3d-v4.test         0.17       0.20  15.1%
MultiSourc...-dbl/LinearDependence-dbl.test         2.24       2.57  14.9%
MicroBench...test:BM_FIRST_SUM_LAMBDA/44217        30.49      34.99  14.8%
Bitcode/si...imd_ops_test_op_pabsd_237.test         0.01       0.01  14.7%
MicroBench...sBRaw.test:BM_IF_QUAD_RAW/5001        20.40      23.36  14.5%
MicroBench...lcalsCRaw.test:BM_EOS_RAW/5001         8.65       9.89  14.3%
MicroBench...:BM_PRESSURE_CALC_LAMBDA/44217       136.97     156.20  14.0%
Bitcode/si...imd_ops_test_op_andps_186.test         0.01       0.01  14.0%
MicroBench...Raw.test:BM_TRAP_INT_RAW/44217       149.89     170.85  14.0%
MultiSourc...CI_Purple/SMG2000/smg2000.test         1.43       1.63  13.7%
MicroBench...lsBRaw.test:BM_IF_QUAD_RAW/171         0.69       0.79  13.7%
MicroBench...w.test:BM_DEL_DOT_VEC_2D_RAW/1        43.20      49.09  13.6%
MicroBench...bda.test:BM_IF_QUAD_LAMBDA/171         0.69       0.78  13.6%
MicroBench...hForIC4VW1LoopWithReductionTC3         4.23       4.80  13.5%
MicroBench...sARaw.test:BM_VOL3D_CALC_RAW/2         1.52       1.73  13.4%
MicroBench...st:BENCHMARK_cos_novec_double_       518.34     586.19  13.1%
MicroBench...da.test:BM_IF_QUAD_LAMBDA/5001        20.40      23.06  13.0%
MultiSourc...oops-flt/ControlLoops-flt.test         3.01       3.40  13.0%
MicroBench...st:BM_PRESSURE_CALC_LAMBDA/171         0.36       0.41  13.0%
Bitcode/si..._ops_test_op_blendvps_276.test         0.01       0.01  12.8%
MicroBench...rIC2VW4BigLoopWithReductionTC2         7.41       8.36  12.8%
MicroBench...da.test:BM_PIC_2D_LAMBDA/44217       328.44     370.38  12.8%
Bitcode/si...imd_ops_test_op_maxpd_225.test         0.01       0.01  12.8%
MicroBench...w.test:BM_DEL_DOT_VEC_2D_RAW/0       266.80     300.20  12.5%
MicroBench..._runtime_checks_needed<4, int>         2.35       2.64  12.5%
MicroBench..._MemCmp<4, LessThanZero, None>      1989.94    2234.90  12.3%
SingleSour...+/Shootout-C++-nestedloop.test         0.00       0.00  12.3%
SingleSour...chmarks/Misc/himenobmtxpa.test         0.63       0.71  12.3%
MicroBench...M_MemCmp<4, LessThanZero, Mid>      1839.21    2062.85  12.2%
MultiSourc...arks/mafft/pairlocalalign.test        14.84      16.63  12.1%
SingleSour...-C++/Shootout-C++-moments.test         0.08       0.09  11.9%
Bitcode/si...imd_ops_test_op_mulpd_192.test         0.01       0.01  11.8%
MicroBench...CLambda.test:BM_EOS_LAMBDA/171         0.28       0.32  11.7%
MicroBench...rks.test:benchForIC4VW4LoopTC3         4.76       5.31  11.5%
MicroBench...BRaw.test:BM_TRAP_INT_RAW/5001        16.88      18.82  11.4%
MicroBench...BRaw.test:BM_IF_QUAD_RAW/44217       182.75     203.54  11.4%
MultiSourc...oxyApps-C/miniAMR/miniAMR.test         0.51       0.57  11.2%
Bitcode/si...imd_ops_test_op_addpd_190.test         0.01       0.01  11.2%
MicroBench..._runtime_checks_needed<4, int>         2.26       2.51  11.0%
MicroBench...hForIC2VW1LoopWithReductionTC2         2.78       3.08  11.0%
MicroBench...BENCHMARK_cbrt_autovec_double_       447.07     495.99  10.9%
MicroBench...rks.test:benchForIC1VW4LoopTC7         2.47       2.73  10.5%
Bitcode/si...imd_ops_test_op_mulps_167.test         0.01       0.01  10.5%
MicroBench...w.test:BM_INT_PREDICT_RAW/5001        19.13      21.14  10.5%
MultiSourc...patricia/network-patricia.test         0.08       0.09  10.4%
MicroBench....test:benchForIC2VW4BigLoopTC1         4.95       5.46  10.3%
Bitcode/si...simd_ops_test_op_minps_76.test         0.01       0.01  10.3%
Bitcode/si..._ops_test_op_packusdw_296.test         0.01       0.01  10.2%
MicroBench...ks.test:benchForIC2VW4LoopTC16         1.75       1.93  10.0%
Bitcode/si...simd_ops_test_op_mulps_23.test         0.01       0.01  10.0%
Bitcode/si...simd_ops_test_op_paddd_47.test         0.01       0.01  10.0%
Bitcode/si...imd_ops_test_op_divpd_223.test         0.01       0.01  10.0%
MicroBench...Raw.test:BM_HYDRO_1D_RAW/44217        24.55      26.99  10.0%
MicroBench...sBRaw.test:BM_TRAP_INT_RAW/171         0.57       0.63   9.9%
SingleSour...-C++/stepanov_abstraction.test         3.76       4.13   9.8%
MicroBench...MemCmp<2, LessThanZero, First>      2846.03    3122.91   9.7%
Bitcode/si...imd_ops_test_op_pabsb_235.test         0.01       0.01   9.7%
MicroBench...da.test:BM_PIC_1D_LAMBDA/44217       450.59     494.13   9.7%
MicroBench...ks.test:benchForIC4VW4LoopTC31         5.32       5.82   9.4%
MicroBench...st:BENCHMARK_expf_novec_float_       116.57     127.45   9.3%
Bitcode/si...imd_ops_test_op_paddd_143.test         0.01       0.01   9.3%
MicroBench...BENCHMARK_ORDERED_DITHER/512/8      1088.80    1186.84   9.0%
External/S...017rate/557.xz_r/557.xz_r.test        57.10      62.18   8.9%
MicroBench...rks.test:benchForIC2VW4LoopTC7         3.01       3.28   8.7%
MicroBench...ForIC1VW1LoopWithReductionTC15         2.33       2.53   8.6%
MicroBench...:BENCHMARK_cosf_autovec_float_       282.84     307.27   8.6%
MultiSourc...arks/VersaBench/dbms/dbms.test         1.51       1.64   8.6%
Bitcode/si...imd_ops_test_op_maxpd_195.test         0.01       0.01   8.6%
MicroBench...ForIC4VW4LoopWithReductionTC15         3.60       3.91   8.5%
MicroBench..._MemCmp<3, LessThanZero, None>      2478.23    2689.25   8.5%
MicroBench...hForIC1VW1LoopWithReductionTC2         3.28       3.56   8.5%
MultiSourc...quoia/CrystalMk/CrystalMk.test         4.09       4.44   8.5%
MicroBench...test:BM_MULADDSUB_LAMBDA/44217        73.97      80.23   8.5%
MultiSourc...comm-CRC32/telecomm-CRC32.test         0.13       0.14   8.4%
Bitcode/si..._ops_test_op_packusdw_273.test         0.01       0.01   8.2%
MicroBench...st:BM_MemCmp<1, EqZero, First>      5057.76    5473.21   8.2%
MicroBench....test:benchForIC2VW4BigLoopTC7         9.32      10.09   8.2%
MicroBench...est:BM_MemCmp<5, EqZero, None>      1411.53    1527.32   8.2%
Bitcode/si...d_ops_test_op_cmpeqpd_227.test         0.01       0.01   8.2%
MicroBench...Lambda.test:BM_EOS_LAMBDA/5001         8.77       9.47   8.1%
MicroBench...sARaw.test:BM_VOL3D_CALC_RAW/0       203.10     219.46   8.1%
MicroBench...rIC1VW1BigLoopWithReductionTC8         7.96       8.59   8.0%
MicroBench...calsCRaw.test:BM_EOS_RAW/44217        79.35      85.57   7.8%
MicroBench...rIC4VW1BigLoopWithReductionTC4         5.20       5.61   7.8%
MicroBench...test:benchAutoVecForBigLoopTC7         2.51       2.70   7.7%
MultiSourc...nsumer-lame/consumer-lame.test         0.15       0.16   7.7%
MultiSourc...peg2/mpeg2dec/mpeg2decode.test         0.01       0.01   7.7%
MicroBench...mCmp<8, GreaterThanZero, None>       495.23     533.12   7.7%
MicroBench...rIC4VW4BigLoopWithReductionTC8         7.93       8.52   7.5%
MultiSourc...chmarks/Rodinia/srad/srad.test         0.43       0.47   7.5%
MicroBench...ntime_checks_needed<3, double>         4.10       4.41   7.4%
MicroBench...hForIC4VW1LoopWithReductionTC2         4.04       4.34   7.3%
MultiSourc...itBench/uuencode/uuencode.test         0.01       0.01   7.2%
MicroBench...test:BM_INT_PREDICT_LAMBDA/171         0.64       0.69   7.2%
Bitcode/Be...s/Halide/blur/halide_blur.test         2.08       2.22   7.0%
Bitcode/si...d_ops_test_op_cmpltps_185.test         0.01       0.01   7.0%
MicroBench...st:BENCHMARK_erff_novec_float_       203.63     217.72   6.9%
Bitcode/si...imd_ops_test_op_divpd_208.test         0.01       0.01   6.8%
Bitcode/si...md_ops_test_op_cmpeqps_40.test         0.01       0.01   6.8%
MultiSourc...ing-flt/Equivalencing-flt.test         0.42       0.45   6.7%
MicroBench...:BENCHMARK_sin_autovec_double_       542.45     578.57   6.7%
MicroBench...rks.test:benchForIC4VW4LoopTC7         3.10       3.30   6.6%
MicroBench...orIC2VW1LoopWithReductionTC127        17.32      18.45   6.5%
Bitcode/si..._ops_test_op_packsswb_218.test         0.01       0.01   6.4%
Bitcode/si...d_ops_test_op_cmpeqps_136.test         0.01       0.01   6.4%
MicroBench...128UniformDivisor<__uint128_t>        16.47      17.52   6.4%
MicroBench...orLoopWithReductionAutoVecTC15         2.24       2.38   6.3%
SingleSour...ce/Benchmarks/Misc/perlin.test         1.87       1.98   6.1%
SingleSour...-C++/Shootout-C++-objinst.test         0.00       0.00   6.1%
MultiSourc...marks/7zip/7zip-benchmark.test        10.22      10.84   6.0%
Bitcode/si...imd_ops_test_op_addps_165.test         0.01       0.01   6.0%
MicroBench...rIC4VW1BigLoopWithReductionTC2         8.18       8.67   6.0%
MicroBench...ks.test:benchAutoVecForLoopTC7         2.60       2.76   6.0%
MicroBench...rIC1VW1BigLoopWithReductionTC4         6.57       6.96   5.9%
MicroBench...rIC2VW4BigLoopWithReductionTC7         8.60       9.11   5.9%
MicroBench...aw.test:BM_INT_PREDICT_RAW/171         0.63       0.66   5.8%
MicroBench...:BENCHMARK_erff_autovec_float_       215.08     227.40   5.7%
MicroBench...t:BENCHMARK_cbrtf_novec_float_       458.39     484.06   5.6%
SingleSour...ncils/jacobi-2d/jacobi-2d.test         4.72       4.98   5.6%
MultiSourc.../Prolangs-C/bison/mybison.test         0.00       0.00   5.5%
SingleSour...hmarks/Linpack/linpack-pc.test         1.42       1.50   5.4%
Bitcode/si...simd_ops_test_op_addps_21.test         0.01       0.01   5.4%
Bitcode/si...imd_ops_test_op_pabsw_242.test         0.01       0.01   5.3%
MicroBench...test:benchAutoVecForBigLoopTC2         3.73       3.92   5.3%
MicroBench...MARK_BICUBIC_INTERPOLATION/128      5351.01    5630.24   5.2%
SingleSour...ebra/blas/gesummv/gesummv.test         0.00       0.00   5.2%
MicroBench...hForIC1VW1LoopWithReductionTC7         2.08       2.19   5.2%
MicroBench...st:BENCHMARK_exp_novec_double_       227.58     239.17   5.1%
MicroBench...Cmp<8, GreaterThanZero, First>       505.31     530.30   4.9%
MicroBench...aw.test:BM_MULADDSUB_RAW/44217        69.73      73.17   4.9%
MicroBench...HMARK_BICUBIC_INTERPOLATION/64      1262.04    1323.53   4.9%
External/S...epsjeng_s/631.deepsjeng_s.test       103.41     108.37   4.8%
MicroBench...mp<31, GreaterThanZero, First>       261.58     274.06   4.8%
SingleSour...rks/CoyoteBench/almabench.test         5.89       6.17   4.7%
MicroBench...st:BENCHMARK_cosf_novec_float_       257.28     269.36   4.7%
MicroBench...sCRaw.test:BM_HYDRO_1D_RAW/171         0.09       0.09   4.7%
Bitcode/si...imd_ops_test_op_mulps_119.test         0.01       0.01   4.6%
MicroBench...rIC2VW1BigLoopWithReductionTC2         8.25       8.63   4.6%
MicroBench..._MemCmp<2, LessThanZero, None>      3010.38    3149.40   4.6%
MicroBench...:BENCHMARK_erf_autovec_double_       230.49     241.10   4.6%
MicroBench...mCmp<3, GreaterThanZero, None>      2484.00    2597.64   4.6%
MicroBench....test:benchForIC4VW4BigLoopTC7         9.64      10.07   4.5%
SingleSour...enchmarks/Stanford/RealMM.test         0.00       0.00   4.5%
MicroBench...MemCmp<31, LessThanZero, Last>       258.77     270.19   4.4%
MicroBench...Raw.test:BM_MULADDSUB_RAW/5001         6.71       7.00   4.4%
MicroBench...hForIC1VW1LoopWithReductionTC8         1.96       2.05   4.4%
MicroBench...est:benchAutoVecForBigLoopTC15         2.69       2.81   4.3%
Bitcode/si..._ops_test_op_packssdw_217.test         0.01       0.01   4.3%
MicroBench...ForIC2VW1LoopWithReductionTC16         2.62       2.73   4.2%
SingleSour...ncils/seidel-2d/seidel-2d.test        38.66      40.30   4.2%
MicroBench...MemCmp<31, LessThanZero, None>       231.69     241.38   4.2%
SingleSour...d-warshall/floyd-warshall.test        38.49      40.10   4.2%
Bitcode/si...imd_ops_test_op_pabsd_240.test         0.01       0.01   4.2%
MicroBench...est:BM_MemCmp<8, EqZero, Last>       335.33     349.24   4.1%
MicroBench...intersAllDisjointIncreasing/32     30166.67   31410.02   4.1%
MicroBench...hForIC4VW4LoopWithReductionTC7         2.26       2.35   4.1%
MicroBench...rIC1VW4BigLoopWithReductionTC4         5.40       5.63   4.1%
MicroBench...ForIC1VW4LoopWithReductionTC15         2.28       2.37   4.1%
MicroBench...ForLoopWithReductionAutoVecTC7         2.29       2.39   4.1%
MicroBench....test:benchForIC1VW4BigLoopTC2         9.04       9.41   4.1%
MicroBench...rks.test:benchForIC4VW4LoopTC8         3.21       3.34   4.0%
Bitcode/si...d_ops_test_op_cmpeqpd_212.test         0.01       0.01   4.0%
MicroBench...ks.test:benchForIC4VW4LoopTC15         4.79       4.99   4.0%
MicroBench...est:BM_MemCmp<31, EqZero, Mid>       118.55     123.29   4.0%
MicroBench...MemCmp<15, LessThanZero, Last>       344.10     357.74   4.0%
MicroBench...c128UniformDivisor<__int128_t>        18.87      19.61   4.0%
MicroBench...BigLoopWithReductionAutoVecTC7         7.72       8.02   3.9%
MicroBench...HMARK_BICUBIC_INTERPOLATION/32       281.53     292.56   3.9%
MicroBench...s.test:benchAutoVecForLoopTC15         2.78       2.89   3.9%
MicroBench...ForIC2VW4LoopWithReductionTC15         3.29       3.42   3.8%
MicroBench...st:BM_MemCmp<31, EqZero, Last>       119.33     123.88   3.8%
MultiSourc.../Benchmarks/Bullet/bullet.test         3.62       3.76   3.8%
MicroBench...MARK_BILINEAR_INTERPOLATION/32        77.99      80.94   3.8%
MicroBench..._MemCmp<31, LessThanZero, Mid>       257.69     267.35   3.7%
Bitcode/si...simd_ops_test_op_paddd_95.test         0.01       0.01   3.7%
MicroBench..._MemCmp<8, LessThanZero, Last>       501.80     520.10   3.6%
MicroBench...st:BM_MemCmp<32, EqZero, None>        98.82     102.37   3.6%
SingleSour...C++/Shootout-C++-heapsort.test         2.82       2.92   3.6%
MicroBench...rIC1VW4BigLoopWithReductionTC7         7.56       7.83   3.5%
MicroBench...MemCmp<15, LessThanZero, None>       311.99     322.80   3.5%
MicroBench...ForIC4VW1LoopWithReductionTC15         2.47       2.55   3.5%
MicroBench...test:BM_BAND_LIN_EQ_LAMBDA/171         0.06       0.06   3.4%
MicroBench...emCmp<31, LessThanZero, First>       254.52     263.30   3.4%
MicroBench...Cmp<16, GreaterThanZero, Last>       329.09     340.24   3.4%
MicroBench....test:benchForIC1VW4BigLoopTC1         4.92       5.09   3.4%
Bitcode/si...imd_ops_test_op_addpd_205.test         0.01       0.01   3.3%
MicroBench...est:BM_MemCmp<15, EqZero, Mid>       202.66     209.34   3.3%
MicroBench...t:BM_MemCmp<32, EqZero, First>        98.10     101.33   3.3%
MicroBench...st:BENCHMARK_sin_novec_double_       571.04     589.63   3.3%
MicroBench...test:benchAutoVecForBigLoopTC8         1.54       1.59   3.2%
MicroBench...est:BM_MemCmp<32, EqZero, Mid>        97.94     101.08   3.2%
MicroBench...late.test:BENCHMARK_DILATE/512      1209.47    1247.87   3.2%
Bitcode/si..._ops_test_op_blendvpd_277.test         0.01       0.01   3.2%
MicroBench..._MemCmp<15, LessThanZero, Mid>       245.07     252.77   3.1%
MicroBench..._MemCmp<4, LessThanZero, Last>      1444.71    1489.85   3.1%
MicroBench...MARK_BILINEAR_INTERPOLATION/64       310.70     320.32   3.1%
MicroBench...HMARK_BICUBIC_INTERPOLATION/16        57.44      59.20   3.1%
MicroBench...ForLoopWithReductionAutoVecTC8         2.02       2.08   3.1%
MicroBench...est:BM_MemCmp<8, EqZero, None>       342.76     353.05   3.0%
MicroBench..._MemCmp<8, LessThanZero, None>       525.08     540.63   3.0%
MicroBench...hForIC4VW1LoopWithReductionTC7         2.56       2.63   2.9%
MicroBench...MemCmp<5, LessThanZero, First>      1587.71    1633.15   2.9%
MicroBench...st:BM_MemCmp<8, EqZero, First>       335.64     345.14   2.8%
MultiSourc...lications/sqlite3/sqlite3.test         2.49       2.56   2.8%
MultiSourc...rks/FreeBench/pifft/pifft.test         0.08       0.08   2.8%
MicroBench...t:BM_MemCmp<31, EqZero, First>       119.29     122.61   2.8%
MicroBench...BigLoopWithReductionAutoVecTC4         5.49       5.65   2.8%
MicroBench....test:benchForIC4VW4BigLoopTC8         9.60       9.86   2.8%
MicroBench...hForIC2VW1LoopWithReductionTC7         2.26       2.32   2.7%
MultiSourc...nchmarks/McCat/05-eks/eks.test         0.00       0.00   2.7%
MicroBench...hVW16From_uint8_t_To_uint16_t_       364.82     374.50   2.7%
MicroBench...MemCmp<8, LessThanZero, First>       522.44     536.22   2.6%
Bitcode/si..._ops_test_op_packsswb_233.test         0.01       0.01   2.6%
MicroBench...rIC2VW4BigLoopWithReductionTC4         6.67       6.85   2.6%
Bitcode/si...imd_ops_test_op_paddb_138.test         0.01       0.01   2.6%
MicroBench...Cmp<32, GreaterThanZero, None>       221.00     226.73   2.6%
MicroBench...emCmp<15, LessThanZero, First>       246.99     253.36   2.6%
MicroBench...st:BM_MemCmp<15, EqZero, Last>       193.97     198.94   2.6%
MicroBench...hForIC1VW4LoopWithReductionTC7         2.30       2.36   2.5%
MicroBench...st:BENCHMARK_erf_novec_double_       210.73     215.80   2.4%
MicroBench...emCmp<8, GreaterThanZero, Mid>       509.49     521.70   2.4%
MicroBench..._MemCmp<32, LessThanZero, Mid>       174.84     179.03   2.4%
MicroBench...est:BM_MemCmp<63, EqZero, Mid>       101.07     103.47   2.4%
MicroBench...MemCmp<32, LessThanZero, Last>       175.02     179.05   2.3%
MicroBench...ks.test:benchAutoVecForLoopTC3         5.11       5.22   2.3%
MicroBench...orIC4VW1LoopWithReductionTC128         7.76       7.94   2.3%
MicroBench...IC1VW1BigLoopWithReductionTC15        12.87      13.16   2.3%
MicroBench...128UniformDivisor<__uint128_t>        12.80      13.09   2.3%
MicroBench...w.test:BM_DEL_DOT_VEC_2D_RAW/2         0.49       0.50   2.2%
MicroBench...mp<32, GreaterThanZero, First>       185.67     189.81   2.2%
MicroBench...st:BM_MemCmp<32, EqZero, Last>        99.34     101.52   2.2%
MicroBench...runtime_checks_fail<3, double>         4.22       4.31   2.2%
MicroBench...mCmp<32, GreaterThanZero, Mid>       187.19     191.18   2.1%
MicroBench...st:BM_MemCmp<63, EqZero, Last>       132.75     135.55   2.1%
MicroBench...mbda.test:BM_PIC_1D_LAMBDA/171         1.36       1.38   2.0%
MicroBench...Cmp<32, GreaterThanZero, Last>       185.37     189.07   2.0%
MicroBench...ARK_BILINEAR_INTERPOLATION/256      4953.66    5049.39   1.9%
MicroBench...hForIC2VW1LoopWithReductionTC8         1.94       1.97   1.9%
MicroBench...orLoopWithReductionAutoVecTC16         1.75       1.78   1.9%
MicroBench...mCmp<8, GreaterThanZero, Last>       518.75     528.30   1.8%
MicroBench...rIC2VW1BigLoopWithReductionTC8         7.51       7.65   1.8%
MicroBench...st:BM_MemCmp<31, EqZero, None>       119.92     122.10   1.8%
MultiSourc...lFlow-flt/ControlFlow-flt.test         2.82       2.87   1.8%
MicroBench...rIC4VW1BigLoopWithReductionTC8         5.93       6.03   1.8%
MicroBench...emCmp<32, LessThanZero, First>       173.45     176.54   1.8%
MicroBench...mCmp<31, GreaterThanZero, Mid>       263.83     268.48   1.8%
MicroBench...Cmp<31, GreaterThanZero, Last>       254.40     258.85   1.7%
MicroBench...aw.test:BM_ENERGY_CALC_RAW/171         1.36       1.39   1.7%
MicroBench...MemCmp<63, LessThanZero, None>       123.25     125.39   1.7%
MicroBench...t:BM_MemCmp<15, EqZero, First>       196.57     199.94   1.7%
MicroBench...t:BENCHMARK_sinh_novec_double_       571.27     580.80   1.7%
SingleSour...enchmarks/SmallPT/smallpt.test         7.78       7.91   1.6%
MicroBench..._MemCmp<63, LessThanZero, Mid>        93.92      95.46   1.6%
MicroBench...MemCmp<32, LessThanZero, None>       213.86     217.34   1.6%
MicroBench...ForIC2VW1LoopWithReductionTC15         3.08       3.13   1.5%
MicroBench...runtime_checks_needed<16, int>         3.93       3.99   1.5%
MultiSourc...lications/SIBsim4/SIBsim4.test         2.79       2.83   1.4%
MicroBench..._MemCmp<6, LessThanZero, None>      1442.15    1462.72   1.4%
Bitcode/si...d_ops_test_op_cmpltpd_198.test         0.01       0.01   1.4%
MicroBench...hVW16From_uint8_t_To_uint16_t_       274.60     278.42   1.4%
MicroBench...ForIC2VW4LoopWithReductionTC16         1.66       1.69   1.4%
MicroBench...t:BM_MemCmp<63, EqZero, First>       101.42     102.78   1.3%
MicroBench...test:BM_MemCmp<8, EqZero, Mid>       335.48     339.85   1.3%
Bitcode/si...imd_ops_test_op_maxps_171.test         0.01       0.01   1.2%
MultiSourc...nchmarks/McCat/09-vor/vor.test         0.15       0.16   1.2%
MicroBench...st:BM_MemCmp<15, EqZero, None>       206.02     208.47   1.2%
MicroBench...Cmp<63, GreaterThanZero, None>       125.88     127.37   1.2%
MicroBench...mCmp<63, GreaterThanZero, Mid>       100.01     101.15   1.1%
MicroBench...ks.test:benchForIC2VW4LoopTC15         3.60       3.64   1.1%
MicroBench...rIC2VW1BigLoopWithReductionTC4         6.60       6.68   1.1%
MicroBench...orIC1VW4LoopWithReductionTC128         5.62       5.68   1.1%
MicroBench...IC4VW4BigLoopWithReductionTC15        12.46      12.59   1.1%
MicroBench...ks.test:benchForIC4VW4LoopTC16         1.85       1.87   1.1%
MicroBench...emCmp<63, LessThanZero, First>        94.55      95.55   1.0%
MicroBench...ForIC1VW1LoopWithReductionTC16         1.97       1.99   1.0%
MicroBench...st:BENCHMARK_sinf_novec_float_       260.04     262.54   1.0%
MicroBench...rIC4VW4BigLoopWithReductionTC4         6.68       6.74   0.9%
MicroBench...st:BM_MemCmp<63, EqZero, None>       124.90     126.02   0.9%
MicroBench...hForIC1VW4LoopWithReductionTC8         2.05       2.07   0.9%
MicroBench...est:benchAutoVecForBigLoopTC16         1.97       1.99   0.8%
MicroBench...ForIC1VW4LoopWithReductionTC16         1.76       1.78   0.8%
MicroBench...ks.test:benchForIC1VW4LoopTC15         2.85       2.87   0.8%
MicroBench...ks.test:benchForIC1VW4LoopTC32         2.65       2.68   0.8%
MicroBench...mp<64, GreaterThanZero, First>        99.25     100.04   0.8%
MicroBench...Cmp<31, GreaterThanZero, None>       259.43     261.43   0.8%
MicroBench...mbda.test:BM_ICCG_LAMBDA/44217        53.09      53.47   0.7%
MicroBench...emCmp<64, LessThanZero, First>        91.81      92.45   0.7%
MicroBench...ForIC4VW1LoopWithReductionTC64         3.18       3.20   0.6%
MicroBench...s.test:benchAutoVecForLoopTC16         1.98       1.99   0.6%
SingleSour...t-C++/Shootout-C++-lists1.test         0.20       0.20   0.6%
MicroBench...ForIC1VW4LoopWithReductionTC32         2.05       2.06   0.6%
MicroBench...s.test:benchForIC2VW4LoopTC127        10.50      10.56   0.6%
MicroBench...hForIC4VW1LoopWithReductionTC8         1.94       1.95   0.6%
MicroBench...ForIC1VW4LoopWithReductionTC63         3.59       3.61   0.6%
MicroBench...ForIC4VW1LoopWithReductionTC63         3.75       3.77   0.6%
MicroBench...IC2VW1BigLoopWithReductionTC31        18.74      18.85   0.5%
MicroBench...rLoopWithReductionAutoVecTC128         5.63       5.65   0.5%
External/S...0.omnetpp_r/520.omnetpp_r.test        82.68      83.05   0.4%
SingleSour...Benchmarks/Misc/whetstone.test         0.72       0.72   0.4%
MicroBench...orLoopWithReductionAutoVecTC63         3.61       3.62   0.4%
MicroBench...st:benchAutoVecForBigLoopTC128         8.84       8.87   0.4%
MicroBench...ForLoopWithReductionAutoVecTC2         3.37       3.39   0.4%
MicroBench...orIC1VW4LoopWithReductionTC127         6.01       6.03   0.4%
MicroBench...test:benchForIC1VW4BigLoopTC31        14.28      14.33   0.3%
MicroBench...test:benchForIC2VW4BigLoopTC15        12.39      12.43   0.3%
MicroBench..._MemCmp<3, LessThanZero, Last>      2391.59    2399.30   0.3%
MicroBench...ForIC2VW4LoopWithReductionTC31         3.16       3.17   0.3%
MicroBench...IC4VW4BigLoopWithReductionTC31        17.57      17.62   0.3%
MicroBench...orIC2VW4LoopWithReductionTC127         5.43       5.45   0.3%
MicroBench...s.test:benchAutoVecForLoopTC31         3.36       3.37   0.3%
MicroBench...IC2VW4BigLoopWithReductionTC16         7.93       7.95   0.3%
MicroBench...MemCmp<64, LessThanZero, None>       112.70     112.99   0.3%
MicroBench...IC1VW1BigLoopWithReductionTC31        25.26      25.32   0.2%
MicroBench...ForIC4VW4LoopWithReductionTC63         5.00       5.02   0.2%
MultiSourc...s/Fhourstones/fhourstones.test         0.97       0.97   0.2%
MicroBench...C4VW1BigLoopWithReductionTC128        41.04      41.10   0.2%
MicroBench..._MemCmp<64, LessThanZero, Mid>       119.56     119.73   0.1%
MicroBench...ForIC4VW4LoopWithReductionTC16         1.73       1.74   0.1%
MicroBench...thVW8From_uint8_t_To_uint16_t_       430.93     431.50   0.1%
MicroBench...IC2VW4BigLoopWithReductionTC31        13.87      13.89   0.1%
MicroBench...ForIC4VW1LoopWithReductionTC31         2.80       2.80   0.1%
MicroBench...ForIC1VW1LoopWithReductionTC32         3.10       3.10   0.0%
MicroBench...ks.test:benchAutoVecForLoopTC8         1.68       1.68   0.0%
MicroBench...IC1VW4BigLoopWithReductionTC64        21.09      21.10   0.0%
MicroBench...ForIC1VW4LoopWithReductionTC64         3.20       3.20  -0.0%
MicroBench...MemCmp<63, LessThanZero, Last>       131.31     131.30  -0.0%
MicroBench....test:benchAutoVecForLoopTC128         8.68       8.68  -0.0%
MicroBench...est:benchAutoVecForBigLoopTC31         3.31       3.31  -0.0%
MicroBench...orLoopWithReductionAutoVecTC31         2.62       2.62  -0.0%
MicroBench...est:benchAutoVecForBigLoopTC63         4.97       4.96  -0.0%
MicroBench...Cmp<64, GreaterThanZero, None>       113.61     113.55  -0.1%
MicroBench...ForIC4VW1LoopWithReductionTC16         1.76       1.76  -0.1%
MicroBench...rIC4VW4BigLoopWithReductionTC2         8.30       8.29  -0.1%
MicroBench...ForIC1VW4LoopWithReductionTC31         2.61       2.61  -0.1%
MicroBench...IC1VW1BigLoopWithReductionTC16        13.15      13.14  -0.1%
MicroBench....test:benchAutoVecForLoopTC127         9.44       9.43  -0.1%
MicroBench...ForIC2VW1LoopWithReductionTC63         8.41       8.41  -0.1%
MicroBench...test:benchForIC2VW4BigLoopTC31        16.54      16.53  -0.1%
MicroBench...a.test:BM_HYDRO_1D_LAMBDA/5001         3.02       3.02  -0.1%
MicroBench...C4VW4BigLoopWithReductionTC127        46.61      46.55  -0.1%
MicroBench....test:BM_DIFF_PREDICT_RAW/5001        34.90      34.86  -0.1%
MicroBench...orIC4VW4LoopWithReductionTC128         4.24       4.24  -0.2%
MicroBench...ForIC1VW1LoopWithReductionTC64         6.28       6.27  -0.2%
MicroBench...est:benchForIC4VW4BigLoopTC127        53.32      53.21  -0.2%
MicroBench....test:BM_MULADDSUB_LAMBDA/5001         7.65       7.63  -0.2%
MicroBench...rLoopWithReductionAutoVecTC127         6.02       6.01  -0.2%
MicroBench...test:benchForIC4VW4BigLoopTC64        23.92      23.87  -0.2%
MicroBench...orIC1VW1LoopWithReductionTC128        15.36      15.32  -0.2%
MicroBench...test:benchForIC4VW4BigLoopTC63        31.59      31.52  -0.2%
MicroBench...IC1VW4BigLoopWithReductionTC31        13.06      13.03  -0.2%
MicroBench...ForIC2VW1LoopWithReductionTC64         8.49       8.47  -0.2%
MicroBench...rIC4VW4BigLoopWithReductionTC7         8.94       8.92  -0.2%
MicroBench...est:benchForIC1VW4BigLoopTC127        48.17      48.05  -0.2%
MicroBench...ForIC4VW4LoopWithReductionTC32         1.84       1.83  -0.2%
MicroBench...ForIC2VW4LoopWithReductionTC64         2.63       2.62  -0.3%
MicroBench...IC2VW4BigLoopWithReductionTC64        20.48      20.43  -0.3%
MicroBench...orIC4VW4LoopWithReductionTC127         6.24       6.23  -0.3%
MicroBench...rIC2VW1BigLoopWithReductionTC1         5.14       5.13  -0.3%
MicroBench...C2VW1BigLoopWithReductionTC128        71.56      71.35  -0.3%
SingleSour...marks/Misc/matmul_f64_4x4.test         0.82       0.82  -0.3%
MicroBench...mCmp<64, GreaterThanZero, Mid>       123.79     123.42  -0.3%
MicroBench...C2VW4BigLoopWithReductionTC127        42.40      42.27  -0.3%
MicroBench...ForIC2VW4LoopWithReductionTC32         1.87       1.86  -0.3%
MicroBench...IC2VW1BigLoopWithReductionTC16        10.27      10.24  -0.3%
MicroBench...C1VW4BigLoopWithReductionTC127        42.35      42.21  -0.3%
MicroBench...C2VW4BigLoopWithReductionTC128        38.89      38.76  -0.3%
MicroBench...est:benchForIC1VW4BigLoopTC128        46.81      46.66  -0.3%
MicroBench...IC4VW1BigLoopWithReductionTC64        21.20      21.13  -0.3%
MicroBench...C4VW1BigLoopWithReductionTC127        42.41      42.26  -0.3%
MicroBench...ks.test:benchForIC1VW4LoopTC31         3.42       3.40  -0.4%
MicroBench...mp<63, GreaterThanZero, First>       101.54     101.17  -0.4%
MicroBench...ks.test:benchForIC2VW4LoopTC31         3.80       3.78  -0.4%
MicroBench...orLoopWithReductionAutoVecTC32         2.05       2.04  -0.4%
MicroBench...C1VW1BigLoopWithReductionTC128        93.09      92.74  -0.4%
MicroBench...test:benchForIC2VW4BigLoopTC64        23.88      23.79  -0.4%
MicroBench...C1VW4BigLoopWithReductionTC128        40.96      40.80  -0.4%
MicroBench...ForIC2VW4LoopWithReductionTC63         3.83       3.82  -0.4%
MicroBench...IC4VW1BigLoopWithReductionTC63        23.38      23.29  -0.4%
MicroBench...est:benchForIC4VW4BigLoopTC128        45.48      45.29  -0.4%
MicroBench...hVW8From_uint32_t_To_uint64_t_      1603.65    1596.88  -0.4%
MicroBench...st:BM_MemCmp<5, EqZero, First>      1538.71    1532.04  -0.4%
MicroBench...est:benchForIC2VW4BigLoopTC128        45.11      44.91  -0.4%
MicroBench...IC4VW4BigLoopWithReductionTC32        11.75      11.70  -0.4%
MicroBench...IC1VW1BigLoopWithReductionTC32        25.52      25.41  -0.4%
MultiSourc...cCat/03-testtrie/testtrie.test         0.01       0.01  -0.4%
MicroBench...test:benchForIC2VW4BigLoopTC63        28.00      27.87  -0.4%
MicroBench...rIC1VW4BigLoopWithReductionTC8         5.93       5.90  -0.5%
MicroBench...IC1VW4BigLoopWithReductionTC63        23.37      23.26  -0.5%
MicroBench...s.test:benchAutoVecForLoopTC32         2.61       2.60  -0.5%
MicroBench...ks.test:benchForIC1VW4LoopTC64         6.67       6.64  -0.5%
MicroBench...IC4VW4BigLoopWithReductionTC63        27.02      26.89  -0.5%
MicroBench...s.test:benchAutoVecForLoopTC63         4.97       4.95  -0.5%
MicroBench...gLoopWithReductionAutoVecTC127        42.32      42.11  -0.5%
MicroBench...IC1VW1BigLoopWithReductionTC64        44.82      44.59  -0.5%
MicroBench...test:benchForIC4VW4BigLoopTC32        12.46      12.40  -0.5%
MicroBench...est:benchForIC2VW4BigLoopTC127        49.44      49.19  -0.5%
MicroBench...nLoopFrom_uint8_t_To_uint16_t_       253.06     251.77  -0.5%
MicroBench...st:BM_MemCmp<64, EqZero, None>       113.31     112.73  -0.5%
MicroBench...igLoopWithReductionAutoVecTC31        13.11      13.04  -0.5%
MicroBench...IC4VW4BigLoopWithReductionTC64        20.42      20.32  -0.5%
MicroBench...Cmp<15, GreaterThanZero, Last>       382.22     380.22  -0.5%
MicroBench...igLoopWithReductionAutoVecTC32        11.91      11.85  -0.5%
MicroBench...s.test:benchForIC4VW4LoopTC127        10.61      10.56  -0.5%
MicroBench...orIC2VW4LoopWithReductionTC128         4.55       4.52  -0.5%
MicroBench...MemCmp<64, LessThanZero, Last>       119.79     119.14  -0.5%
MicroBench...rks.test:benchForIC2VW4LoopTC8         1.67       1.66  -0.6%
MicroBench...C4VW4BigLoopWithReductionTC128        38.90      38.68  -0.6%
MicroBench...ate.test:BENCHMARK_DILATE/1024      4775.37    4747.72  -0.6%
MicroBench...IC2VW1BigLoopWithReductionTC32        19.07      18.96  -0.6%
MicroBench...ForIC1VW1LoopWithReductionTC31         3.54       3.52  -0.6%
MicroBench...orLoopWithReductionAutoVecTC64         3.20       3.18  -0.6%
MicroBench...ks.test:benchForIC4VW4LoopTC63         7.11       7.07  -0.6%
MicroBench...C1VW1BigLoopWithReductionTC127        92.54      91.98  -0.6%
MicroBench...t:BM_MemCmp<64, EqZero, First>       100.93     100.31  -0.6%
MicroBench...orIC1VW1LoopWithReductionTC127        15.57      15.47  -0.6%
MicroBench...IC2VW1BigLoopWithReductionTC63        35.99      35.76  -0.6%
MicroBench...test:benchForIC1VW4BigLoopTC63        26.13      25.97  -0.6%
MicroBench...IC2VW4BigLoopWithReductionTC15        11.04      10.97  -0.6%
MicroBench...ForIC1VW1LoopWithReductionTC63         6.64       6.60  -0.6%
MicroBench...s.test:benchForIC2VW4LoopTC128         8.33       8.28  -0.7%
MicroBench...ARK_BILINEAR_INTERPOLATION/128      1268.21    1259.93  -0.7%
MicroBench...ks.test:benchForIC2VW4LoopTC63         5.14       5.10  -0.7%
MicroBench...s.test:benchForIC1VW4LoopTC127         9.60       9.53  -0.7%
MicroBench...igLoopWithReductionAutoVecTC63        23.42      23.26  -0.7%
MicroBench...test:benchForIC4VW4BigLoopTC31        21.13      20.98  -0.7%
MicroBench...igLoopWithReductionAutoVecTC64        21.26      21.11  -0.7%
MicroBench...ForIC4VW4LoopWithReductionTC31         4.54       4.50  -0.7%
MicroBench...nLoopFrom_uint8_t_To_uint32_t_       707.46     702.30  -0.7%
MicroBench...gLoopWithReductionAutoVecTC128        41.16      40.86  -0.7%
SingleSour...arks/CoyoteBench/fftbench.test         0.57       0.56  -0.7%
MicroBench...est:benchAutoVecForBigLoopTC64         4.52       4.48  -0.8%
MicroBench...st:BM_MemCmp<64, EqZero, Last>       123.89     122.91  -0.8%
MicroBench...IC1VW1BigLoopWithReductionTC63        46.45      46.08  -0.8%
MicroBench...IC4VW1BigLoopWithReductionTC32        11.93      11.84  -0.8%
MicroBench...Cmp<63, GreaterThanZero, Last>       134.13     133.06  -0.8%
MicroBench...orIC4VW1LoopWithReductionTC127         9.32       9.24  -0.8%
MicroBench...MARK_BILINEAR_INTERPOLATION/16        18.55      18.39  -0.8%
MicroBench...IC1VW4BigLoopWithReductionTC32        11.94      11.84  -0.8%
MicroBench...IC2VW1BigLoopWithReductionTC64        36.38      36.07  -0.8%
MicroBench...C2VW1BigLoopWithReductionTC127        71.23      70.61  -0.9%
MicroBench...hForIC4VW1LoopWithReductionTC1         3.20       3.17  -0.9%
MicroBench...da.test:BM_DISC_ORD_LAMBDA/171         4.69       4.65  -0.9%
MicroBench...IC2VW4BigLoopWithReductionTC63        24.27      24.04  -0.9%
MicroBench...s.test:benchForIC1VW4LoopTC128         8.90       8.82  -0.9%
MicroBench...IC2VW4BigLoopWithReductionTC32        11.72      11.61  -0.9%
MultiSourc...arks/BitBench/drop3/drop3.test         0.25       0.24  -1.0%
MicroBench...est:benchAutoVecForBigLoopTC32         2.60       2.57  -1.0%
MicroBench...est:BM_MemCmp<64, EqZero, Mid>       124.26     122.98  -1.0%
MultiSourc.../Benchmarks/Olden/mst/mst.test         0.07       0.07  -1.1%
MicroBench...Cmp<15, GreaterThanZero, None>       338.44     334.86  -1.1%
MicroBench...ForIC2VW1LoopWithReductionTC32         5.95       5.88  -1.1%
MultiSourc...cations/hexxagon/hexxagon.test         8.01       7.92  -1.1%
MicroBench...s.test:benchAutoVecForLoopTC64         4.53       4.48  -1.1%
MicroBench...IC2VW1BigLoopWithReductionTC15        10.41      10.30  -1.1%
MicroBench...Cmp<64, GreaterThanZero, Last>       123.60     122.15  -1.2%
MicroBench...test:benchForIC1VW4BigLoopTC64        24.78      24.49  -1.2%
MicroBench...st:benchAutoVecForBigLoopTC127         9.43       9.31  -1.2%
MicroBench...IC4VW1BigLoopWithReductionTC31        13.10      12.94  -1.2%
Bitcode/si..._ops_test_op_packssdw_202.test         0.01       0.01  -1.3%
Bitcode/Re...on/vector_widen/widen_bug.test         0.06       0.06  -1.3%
MicroBench...test:benchForIC2VW4BigLoopTC32        12.72      12.55  -1.3%
MultiSourc...pps-C/SimpleMOC/SimpleMOC.test         1.52       1.50  -1.4%
External/S...epsjeng_r/531.deepsjeng_r.test        92.97      91.70  -1.4%
MicroBench...ks.test:benchForIC4VW4LoopTC32         2.19       2.16  -1.4%
MultiSourc...nch/fourinarow/fourinarow.test         1.62       1.59  -1.4%
MicroBench...s.test:benchForIC4VW4LoopTC128         7.35       7.25  -1.4%
MultiSourc.../Trimaran/enc-rc4/enc-rc4.test         1.34       1.32  -1.4%
MultiSourc...decode/alacconvert-decode.test         0.02       0.02  -1.5%
Bitcode/si...simd_ops_test_op_addps_69.test         0.01       0.01  -1.5%
MicroBench...mCmp<15, GreaterThanZero, Mid>       276.12     272.02  -1.5%
MicroBench...hForIC4VW4LoopWithReductionTC8         2.43       2.39  -1.5%
MicroBench...ks.test:benchForIC2VW4LoopTC32         2.39       2.35  -1.5%
MultiSource/Benchmarks/PAQ8p/paq8p.test            34.43      33.90  -1.5%
MicroBench...rks.test:benchForIC1VW4LoopTC8         1.67       1.65  -1.6%
SingleSour...e/Benchmarks/Misc/flops-3.test         1.19       1.17  -1.6%
MicroBench...ForIC4VW4LoopWithReductionTC64         2.63       2.59  -1.6%
MicroBench...LoopFrom_uint32_t_To_uint64_t_      2192.14    2157.52  -1.6%
MicroBench...rIC1VW1BigLoopWithReductionTC7         8.23       8.10  -1.6%
MicroBench...thVW8From_uint8_t_To_uint16_t_       273.56     269.09  -1.6%
MicroBench...LoopFrom_uint32_t_To_uint64_t_      1504.81    1478.02  -1.8%
MultiSource/Benchmarks/sim/sim.test                 3.62       3.56  -1.8%
MicroBench...M_MemCmp<8, LessThanZero, Mid>       529.96     520.37  -1.8%
MicroBench...ks.test:benchForIC2VW4LoopTC64         4.18       4.10  -1.9%
MicroBench...IC1VW4BigLoopWithReductionTC16         8.11       7.95  -1.9%
Bitcode/si...imd_ops_test_op_pabsw_239.test         0.01       0.01  -1.9%
MicroBench..._MemCmp<16, LessThanZero, Mid>       319.56     313.39  -1.9%
MicroBench...test:benchForIC1VW4BigLoopTC32        13.09      12.84  -1.9%
MicroBench...rks.test:benchForIC1VW4LoopTC3         5.08       4.98  -2.0%
MicroBench...alsBRaw.test:BM_INIT3_RAW/5001         6.27       6.14  -2.0%
MicroBench...rIC2VW4BigLoopWithReductionTC8         5.76       5.64  -2.0%
MultiSource/Applications/aha/aha.test               1.47       1.44  -2.1%
MicroBench...rks.test:benchForIC2VW4LoopTC3         5.10       4.99  -2.1%
MicroBench...VW16From_uint32_t_To_uint64_t_      2127.15    2082.20  -2.1%
MicroBench....test:benchForIC4VW4BigLoopTC4         8.15       7.97  -2.1%
MicroBench...IC4VW4BigLoopWithReductionTC16         8.25       8.08  -2.1%
MicroBench...test:benchForIC4VW4BigLoopTC15        18.20      17.81  -2.1%
MicroBench...hVW16From_uint8_t_To_uint32_t_      1170.79    1145.25  -2.2%
MicroBench...nLoopFrom_uint8_t_To_uint64_t_      2892.91    2829.72  -2.2%
MicroBench...thVW8From_uint8_t_To_uint64_t_      1595.69    1560.50  -2.2%
MicroBench...rIC4VW1BigLoopWithReductionTC1         5.56       5.43  -2.2%
MicroBench...Cmp<16, GreaterThanZero, None>       318.45     311.29  -2.2%
MicroBench...mp<15, GreaterThanZero, First>       276.38     270.06  -2.3%
MultiSourc...adpcm/rawcaudio/rawcaudio.test         0.00       0.00  -2.4%
SingleSour...hootout/Shootout-heapsort.test         2.73       2.67  -2.4%
MicroBench...LoopFrom_uint16_t_To_uint64_t_      2275.83    2221.65  -2.4%
MicroBench....test:benchForIC4VW4BigLoopTC1         5.23       5.11  -2.4%
MicroBench...hForIC2VW4LoopWithReductionTC8         2.07       2.02  -2.4%
MicroBench...test:benchForIC1VW4BigLoopTC15        11.16      10.89  -2.4%
MicroBench...BigLoopWithReductionAutoVecTC8         5.98       5.84  -2.4%
MicroBench...BENCHMARK_sinhf_autovec_float_       455.09     443.85  -2.5%
MicroBench...igLoopWithReductionAutoVecTC15         9.97       9.71  -2.6%
MicroBench...mCmp<16, GreaterThanZero, Mid>       353.14     343.92  -2.6%
MicroBench...:BENCHMARK_sinf_autovec_float_       263.86     256.77  -2.7%
Bitcode/si...simd_ops_test_op_paddb_42.test         0.01       0.01  -2.7%
SingleSour.../Shootout/Shootout-strcat.test         0.17       0.16  -2.7%
MicroBench...ks.test:benchForIC4VW4LoopTC64         4.04       3.93  -2.7%
MicroBench...igLoopWithReductionAutoVecTC16         8.18       7.96  -2.8%
MicroBench...nLoopFrom_uint8_t_To_uint16_t_       424.89     412.97  -2.8%
Bitcode/si..._ops_test_op_blendvps_253.test         0.01       0.01  -2.8%
SingleSour...algebra/kernels/bicg/bicg.test         0.03       0.03  -2.9%
Bitcode/si...imd_ops_test_op_divpd_193.test         0.01       0.01  -2.9%
MicroBench...st:BM_GEN_LIN_RECUR_LAMBDA/171         1.16       1.13  -2.9%
MicroBench...VW16From_uint16_t_To_uint64_t_      2172.34    2109.66  -2.9%
MicroBench...:BM_GEN_LIN_RECUR_LAMBDA/44217       301.67     292.95  -2.9%
MicroBench...rks.test:benchForIC2VW4LoopTC2         4.27       4.14  -2.9%
MicroBench...st:BM_MemCmp<16, EqZero, None>       157.22     152.54  -3.0%
MicroBench...rIC1VW4BigLoopWithReductionTC3        10.93      10.60  -3.0%
MicroBench...VW16From_uint16_t_To_uint32_t_      1880.89    1823.44  -3.1%
MicroBench...mp<16, GreaterThanZero, First>       253.91     245.90  -3.2%
MicroBench...nLoopFrom_uint8_t_To_uint32_t_      1040.09    1006.76  -3.2%
MicroBench...BRaw.test:BM_MULADDSUB_RAW/171         0.14       0.14  -3.2%
MicroBench...est:BM_MemCmp<5, EqZero, Last>      1695.35    1640.33  -3.2%
SingleSour...Adobe-C++/functionobjects.test         4.17       4.04  -3.2%
MultiSourc...pansion-flt/Expansion-flt.test         1.73       1.68  -3.3%
MicroBench...mCmp<7, GreaterThanZero, Last>       739.91     715.76  -3.3%
MicroBench...bda.test:BM_PIC_1D_LAMBDA/5001        48.72      47.12  -3.3%
External/S...ecrand_is/998.specrand_is.test         0.00       0.00  -3.3%
MicroBench....test:BM_DISC_ORD_LAMBDA/44217      1244.39    1203.57  -3.3%
MicroBench...MemCmp<16, LessThanZero, Last>       316.09     305.69  -3.3%
SingleSour...hootout/Shootout-methcall.test         3.95       3.82  -3.3%
MicroBench...t:BM_MemCmp<16, EqZero, First>       157.79     152.57  -3.3%
MicroBench...VW16From_uint32_t_To_uint64_t_      2040.51    1972.87  -3.3%
MicroBench...t:BM_GEN_LIN_RECUR_LAMBDA/5001        34.41      33.25  -3.4%
MultiSourc...pplications/oggenc/oggenc.test         0.11       0.10  -3.4%
MicroBench...st:BM_ENERGY_CALC_LAMBDA/44217       533.58     515.58  -3.4%
MicroBench...rIC2VW4BigLoopWithReductionTC1         6.58       6.36  -3.4%
MicroBench...ks.test:benchForIC1VW4LoopTC16         2.14       2.06  -3.4%
MultiSourc...ow-flt/GlobalDataFlow-flt.test         1.39       1.34  -3.4%
MicroBench...BENCHMARK_sinh_autovec_double_       565.30     546.06  -3.4%
MicroBench...rks.test:benchForIC1VW4LoopTC2         3.94       3.80  -3.4%
MicroBench...test:benchAutoVecForBigLoopTC4         2.40       2.31  -3.5%
MicroBench...MemCmp<16, LessThanZero, None>       286.81     276.66  -3.5%
Bitcode/si...simd_ops_test_op_paddsb_2.test         0.01       0.01  -3.6%
MicroBench...est:BENCHMARK_FLOYD_DITHER/128       155.33     149.76  -3.6%
MicroBench...t:BENCHMARK_sinhf_novec_float_       450.00     433.69  -3.6%
Bitcode/si...d_ops_test_op_cmpltpd_213.test         0.01       0.01  -3.6%
MicroBench...ambda.test:BM_ICCG_LAMBDA/5001         5.95       5.73  -3.7%
Bitcode/si...simd_ops_test_op_orps_187.test         0.01       0.01  -3.7%
MicroBench...c128UniformDivisor<__int128_t>        12.59      12.12  -3.7%
MicroBench...hVW16From_uint8_t_To_uint64_t_      1583.79    1524.49  -3.7%
MultiSourc...rks/Olden/voronoi/voronoi.test         0.30       0.29  -3.8%
MicroBench...ntime_checks_needed<4, double>         4.52       4.35  -3.8%
Bitcode/si...imd_ops_test_op_pabsb_241.test         0.01       0.01  -3.9%
Bitcode/si...imd_ops_test_op_maxps_123.test         0.01       0.01  -3.9%
MicroBench...M_MemCmp<3, LessThanZero, Mid>      2669.00    2564.90  -3.9%
MicroBench...LoopFrom_uint64_t_To_uint32_t_      2519.59    2419.27  -4.0%
MicroBench...rIC1VW1BigLoopWithReductionTC2         7.88       7.56  -4.0%
MultiSourc...marks/Olden/bisort/bisort.test         0.54       0.52  -4.1%
External/S...8.imagick_s/638.imagick_s.test        43.93      42.11  -4.1%
MicroBench...hForIC1VW4LoopWithReductionTC2         3.62       3.47  -4.2%
Bitcode/si...imd_ops_test_op_pabsb_238.test         0.01       0.01  -4.2%
External/S...511.povray_r/511.povray_r.test         6.12       5.87  -4.2%
MicroBench...emCmp<16, LessThanZero, First>       222.70     213.35  -4.2%
MicroBench...hForIC2VW4LoopWithReductionTC7         3.11       2.98  -4.2%
MicroBench...meChecks4PointersDEqualsA/1000    831997.17  796870.36  -4.2%
External/S...8.imagick_r/538.imagick_r.test        44.85      42.94  -4.2%
MicroBench...IC1VW4BigLoopWithReductionTC15        10.13       9.69  -4.4%
External/S...6.blender_r/526.blender_r.test       167.06     159.67  -4.4%
MicroBench...hVW8From_uint32_t_To_uint64_t_      1817.53    1735.94  -4.5%
SingleSour...s/Shootout/Shootout-lists.test         4.31       4.11  -4.5%
MicroBench...ForIC4VW1LoopWithReductionTC32         2.54       2.42  -4.6%
MicroBench...a.test:BM_DISC_ORD_LAMBDA/5001       143.15     136.39  -4.7%
MicroBench...emCmp<7, GreaterThanZero, Mid>       569.08     541.97  -4.8%
External/S...7rate/502.gcc_r/502.gcc_r.test        62.72      59.69  -4.8%
MicroBench...est:BM_MemCmp<16, EqZero, Mid>       157.92     150.21  -4.9%
MicroBench...orIC2VW1LoopWithReductionTC128        21.28      20.24  -4.9%
MicroBench...LoopFrom_uint16_t_To_uint32_t_      1078.69    1025.86  -4.9%
Bitcode/si...simd_ops_test_op_mulps_71.test         0.01       0.01  -5.0%
Bitcode/si...imd_ops_test_op_minpd_211.test         0.01       0.01  -5.0%
MicroBench...thVW8From_uint8_t_To_uint32_t_       853.84     811.26  -5.0%
MicroBench...hVW8From_uint16_t_To_uint32_t_       854.01     811.12  -5.0%
SingleSour...sc-C++/stepanov_container.test         6.64       6.30  -5.1%
MicroBench...hVW8From_uint16_t_To_uint32_t_      1905.97    1808.38  -5.1%
MicroBench.../lcalsARaw.test:BM_FIR_RAW/171         0.44       0.42  -5.1%
MicroBench...BENCHMARK_asin_autovec_double_       149.15     141.41  -5.2%
External/S...d/641.leela_s/641.leela_s.test       115.09     109.11  -5.2%
MicroBench...st:BM_MemCmp<16, EqZero, Last>       158.21     149.96  -5.2%
MicroBench...sic128SmallDivisor<__int128_t>        12.41      11.76  -5.2%
MultiSourc...oxyApps-C/XSBench/XSBench.test         3.23       3.05  -5.3%
MultiSourc...stones-3.1/fhourstones3.1.test         1.15       1.09  -5.4%
MicroBench...ambda.test:BM_EOS_LAMBDA/44217        89.34      84.55  -5.4%
External/S...rlbench_r/500.perlbench_r.test        31.33      29.61  -5.5%
Bitcode/si...imd_ops_test_op_minps_124.test         0.01       0.01  -5.6%
Bitcode/si...md_ops_test_op_paddsb_145.test         0.01       0.01  -5.6%
MicroBench...Cmp<7, GreaterThanZero, First>       579.70     546.87  -5.7%
MicroBench...runtime_checks_fail<4, double>         4.57       4.31  -5.7%
MultiSourc...hmarks/McCat/08-main/main.test         0.06       0.06  -5.7%
MultiSourc.../Benchmarks/nbench/nbench.test         1.48       1.40  -5.7%
MicroBench...hVW8From_uint16_t_To_uint64_t_      1894.05    1784.88  -5.8%
MicroBench...nLoopFrom_uint8_t_To_uint64_t_      2970.00    2797.65  -5.8%
Bitcode/si...imd_ops_test_op_paddq_199.test         0.01       0.01  -5.8%
MicroBench...LoopFrom_uint16_t_To_uint32_t_       792.60     746.46  -5.8%
MicroBench....test:benchForIC1VW4BigLoopTC7         9.44       8.89  -5.9%
Bitcode/si...md_ops_test_op_cmpltps_89.test         0.01       0.01  -5.9%
MicroBench...a.test:BM_MAT_X_MAT_LAMBDA/171       160.12     150.64  -5.9%
Bitcode/si...imd_ops_test_op_paddq_214.test         0.01       0.01  -5.9%
MicroBench...:BENCHMARK_expf_autovec_float_       132.16     124.24  -6.0%
SingleSour...e/Benchmarks/Misc/flops-1.test         1.14       1.07  -6.0%
MicroBench...r_runtime_checks_fail<16, int>        11.79      11.08  -6.0%
SingleSour...s/gramschmidt/gramschmidt.test        17.56      16.49  -6.1%
MicroBench....test:benchForIC2VW4BigLoopTC4         8.37       7.86  -6.1%
Bitcode/si..._ops_test_op_blendvps_299.test         0.01       0.01  -6.1%
MicroBench...M_MemCmp<5, LessThanZero, Mid>      1574.41    1477.73  -6.1%
Bitcode/si..._ops_test_op_packuswb_219.test         0.01       0.01  -6.2%
SingleSour...t-C++/Shootout-C++-matrix.test         1.12       1.05  -6.2%
MicroBench...IC4VW1BigLoopWithReductionTC15        10.05       9.42  -6.2%
External/S...0.omnetpp_s/620.omnetpp_s.test        89.93      84.34  -6.2%
MicroBench...hVW16From_uint8_t_To_uint32_t_       980.35     918.65  -6.3%
MicroBench...MARK_BICUBIC_INTERPOLATION/256     22107.88   20685.31  -6.4%
Bitcode/si...simd_ops_test_op_maxps_27.test         0.01       0.01  -6.5%
MicroBench...timeChecks4PointersDBeforeA/32     37866.35   35369.48  -6.6%
MicroBench...LoopFrom_uint16_t_To_uint64_t_      1608.73    1502.34  -6.6%
MicroBench...IC4VW1BigLoopWithReductionTC16         8.04       7.51  -6.6%
MicroBench...runtime_checks_pass<3, double>         2.90       2.71  -6.7%
SingleSour...enchmarks/Dhrystone/fldry.test         0.41       0.38  -6.7%
MicroBench...thVW8From_uint8_t_To_uint64_t_      1873.97    1747.26  -6.8%
MicroBench...ForIC2VW1LoopWithReductionTC31         5.65       5.26  -6.8%
MultiSourc...nia/pathfinder/pathfinder.test         0.44       0.41  -6.8%
External/S...speed/605.mcf_s/605.mcf_s.test        75.85      70.65  -6.9%
MicroBench...VW16From_uint16_t_To_uint32_t_      1013.78     944.08  -6.9%
Bitcode/si...md_ops_test_op_cmpltps_41.test         0.01       0.01  -6.9%
MicroBench...calsBRaw.test:BM_INIT3_RAW/171         0.10       0.09  -6.9%
MicroBench...hForIC2VW1LoopWithReductionTC1         3.34       3.11  -7.1%
MicroBench...hVW16From_uint8_t_To_uint64_t_      2274.47    2112.96  -7.1%
MultiSourc.../Trimaran/enc-pc1/enc-pc1.test         0.42       0.39  -7.1%
SingleSour...marks/Stanford/Bubblesort.test         0.02       0.02  -7.2%
MultiSourc...ks/Prolangs-C++/city/city.test         0.01       0.01  -7.2%
Bitcode/si..._ops_test_op_packssdw_232.test         0.01       0.01  -7.3%
MultiSourc...s/FreeBench/neural/neural.test         0.08       0.07  -7.3%
MicroBench...t:BM_IMP_HYDRO_2D_LAMBDA/44217      1564.46    1449.29  -7.4%
MicroBench...t:BM_TRIDIAG_ELIM_LAMBDA/44217       107.33      99.42  -7.4%
MicroBench...thVW8From_uint8_t_To_uint32_t_      1159.38    1073.65  -7.4%
MicroBench...HMARK_ANISTROPIC_DIFFUSION/256     48785.99   45177.64  -7.4%
Bitcode/si...imd_ops_test_op_addpd_220.test         0.01       0.01  -7.6%
MicroBench...rIC2VW1BigLoopWithReductionTC7         7.93       7.32  -7.7%
SingleSour...C++/Shootout-C++-methcall.test         4.67       4.30  -7.9%
MultiSourc...C/Packing-flt/Packing-flt.test         3.56       3.28  -7.9%
MicroBench...HMARK_ANISTROPIC_DIFFUSION/128     11593.22   10671.70  -7.9%
MicroBench...hForIC1VW1LoopWithReductionTC1         2.76       2.54  -8.1%
MicroBench...test:benchForIC4VW4BigLoopTC16         9.77       8.97  -8.2%
SingleSour...arks/BenchmarkGame/n-body.test         0.45       0.41  -8.2%
MicroBench...test:BM_PLANCKIAN_LAMBDA/44217       626.89     575.28  -8.2%
MicroBench...ic128SmallDivisor<__uint128_t>        12.05      11.06  -8.2%
MicroBench...test:benchForIC2VW4BigLoopTC16         9.95       9.12  -8.3%
SingleSource/Benchmarks/Misc/pi.test                0.61       0.56  -8.3%
MicroBench...w.test:BM_ENERGY_CALC_RAW/5001        45.32      41.50  -8.4%
MicroBench...VW16From_uint16_t_To_uint64_t_      1612.08    1474.78  -8.5%
MicroBench...st:BENCHMARK_boxBlurKernel/128        72.07      65.89  -8.6%
MicroBench...st:BM_TRIDIAG_ELIM_LAMBDA/5001        12.14      11.09  -8.7%
MicroBench...hVW8From_uint16_t_To_uint64_t_      1631.40    1488.59  -8.8%
MicroBench...hForIC4VW1LoopWithReductionTC4         2.39       2.18  -8.8%
SingleSour...ut-C++/Shootout-C++-hash2.test         2.05       1.87  -8.8%
SingleSour...ce/Benchmarks/Misc/mandel.test         0.48       0.44  -8.8%
MicroBench...test:BM_PRESSURE_CALC_RAW/5001        14.21      12.96  -8.8%
MultiSourc...t/StatementReordering-flt.test         2.59       2.36  -8.8%
MicroBench...ks.test:benchAutoVecForLoopTC2         3.95       3.60  -8.9%
MicroBench..._MemCmp<5, LessThanZero, None>      1645.62    1499.45  -8.9%
MultiSourc.../Prolangs-C++/simul/simul.test         0.01       0.01  -8.9%
MicroBench...BigLoopWithReductionAutoVecTC3        11.32      10.32  -8.9%
MicroBench...hForIC4VW4LoopWithReductionTC2         3.76       3.43  -8.9%
SingleSour...BenchmarkGame/nsieve-bits.test         0.69       0.63  -8.9%
SingleSour...arks/Misc-C++/mandel-text.test         1.26       1.15  -9.0%
MultiSourc...enchmarks/Olden/em3d/em3d.test         2.50       2.28  -9.0%
MicroBench...VW16From_uint64_t_To_uint32_t_      1504.46    1368.35  -9.0%
MicroBench...timeChecks4PointersDEqualsA/32     32482.80   29534.67  -9.1%
Bitcode/si...imd_ops_test_op_minps_172.test         0.01       0.01  -9.1%
MicroBench...late.test:BENCHMARK_DILATE/128        74.22      67.42  -9.2%
SingleSour...out-C++/Shootout-C++-hash.test         0.42       0.38  -9.2%
MicroBench...runtime_checks_pass<4, double>         3.05       2.77  -9.2%
MicroBench...rIC4VW1BigLoopWithReductionTC7         7.99       7.26  -9.2%
MultiSourc...nch/pcompress2/pcompress2.test         0.13       0.12  -9.3%
SingleSour...t-C++/Shootout-C++-random.test         3.67       3.33  -9.3%
External/S...rlbench_s/600.perlbench_s.test        37.14      33.65  -9.4%
MultiSourc...lications/obsequi/Obsequi.test         2.04       1.85  -9.4%
MicroBench...hForIC1VW1LoopWithReductionTC3         4.49       4.06  -9.4%
SingleSour...rks/Adobe-C++/loop_unroll.test         0.77       0.70  -9.4%
MicroBench...lcalsCRaw.test:BM_ICCG_RAW/171         0.18       0.17  -9.5%
SingleSour...BenchmarkGame/partialsums.test         0.24       0.22  -9.5%
MicroBench...rks.test:benchForIC4VW4LoopTC1         3.16       2.85  -9.6%
SingleSour...Shootout/Shootout-objinst.test         0.00       0.00  -9.6%
MultiSourc...FreeBench/distray/distray.test         0.08       0.07  -9.6%
MicroBench...est:BENCHMARK_HARRIS/1024/1024     10444.83    9429.28  -9.7%
Bitcode/si...imd_ops_test_op_pabsd_243.test         0.01       0.01  -9.9%
MicroBench...BENCHMARK_cbrtf_autovec_float_       505.17     454.96  -9.9%
MicroBench...hForIC2VW4LoopWithReductionTC2         3.73       3.36  -9.9%
MultiSourc...lications/ClamAV/clamscan.test         0.16       0.14 -10.0%
MultiSourc...ks/McCat/01-qbsort/qbsort.test         0.06       0.05 -10.1%
MicroBench...aw.test:BM_FIRST_SUM_RAW/44217        30.47      27.36 -10.2%
MicroBench...t:BM_FIND_FIRST_MIN_LAMBDA/171         0.22       0.20 -10.3%
MicroBench...rks.test:benchForIC4VW4LoopTC2         4.52       4.05 -10.4%
MicroBench...Raw.test:BM_FIRST_SUM_RAW/5001         3.40       3.05 -10.4%
MicroBench...hForIC2VW4LoopWithReductionTC1         2.66       2.38 -10.4%
External/S...510.parest_r/510.parest_r.test        50.47      45.22 -10.4%
MicroBench....test:benchForIC1VW4BigLoopTC4         6.26       5.61 -10.4%
MultiSourc...encode/alacconvert-encode.test         0.03       0.03 -10.5%
MicroBench...alsCRaw.test:BM_ICCG_RAW/44217        56.91      50.89 -10.6%
SingleSour...hmarks/Misc-C++/Large/ray.test         2.91       2.60 -10.6%
MicroBench...test:benchForIC1VW4BigLoopTC16        10.49       9.37 -10.7%
SingleSour...e/Benchmarks/Misc/flops-5.test         1.33       1.19 -10.7%
MultiSourc...ProxyApps-C++/HPCCG/HPCCG.test         0.68       0.61 -10.7%
MicroBench...lcalsARaw.test:BM_FIR_RAW/5001        14.51      12.93 -10.8%
MicroBench...a.test:BM_PLANCKIAN_LAMBDA/171         2.43       2.16 -11.0%
SingleSour...ks/Shootout/Shootout-hash.test         3.50       3.12 -11.0%
SingleSour...e/Benchmarks/McGill/chomp.test         1.30       1.16 -11.2%
MicroBench...hVW8From_uint64_t_To_uint32_t_      1418.36    1259.62 -11.2%
Bitcode/si..._ops_test_op_packusdw_319.test         0.01       0.01 -11.3%
Bitcode/si...d_ops_test_op_cmpeqpd_197.test         0.01       0.01 -11.4%
MicroBench...est:BM_INT_PREDICT_LAMBDA/5001        24.23      21.46 -11.4%
MicroBench..._MemCmp<1, LessThanZero, None>      5205.90    4609.23 -11.5%
MicroBench...t:BENCHMARK_asin_novec_double_       152.36     134.89 -11.5%
SingleSour...enchmarks/Misc/fp-convert.test         2.15       1.90 -11.5%
MicroBench....test:BM_PLANCKIAN_LAMBDA/5001        72.40      64.06 -11.5%
MicroBench...ic128SmallDivisor<__uint128_t>        11.35      10.04 -11.6%
MultiSource/Applications/lua/lua.test              16.37      14.47 -11.6%
MultiSourc.../Trimaran/enc-md5/enc-md5.test         1.58       1.39 -11.9%
MultiSourc...dbl/LoopRestructuring-dbl.test         4.10       3.61 -11.9%
MultiSourc...count/automotive-bitcount.test         0.06       0.05 -11.9%
MultiSourc.../Benchmarks/Ptrdist/bc/bc.test         0.44       0.38 -12.0%
MicroBench...r_runtime_checks_pass<16, int>         9.24       8.13 -12.0%
SingleSour...olybench/stencils/adi/adi.test        11.10       9.76 -12.1%
MicroBench...BENCHMARK_atan_autovec_double_       338.75     297.70 -12.1%
Bitcode/si...simd_ops_test_op_paddsb_1.test         0.01       0.01 -12.1%
Bitcode/si...simd_ops_test_op_minps_28.test         0.01       0.01 -12.2%
SingleSour...e/Benchmarks/Misc/flops-4.test         0.65       0.57 -12.2%
External/S...speed/602.gcc_s/602.gcc_s.test        67.28      59.04 -12.2%
SingleSour...ootout/Shootout-ackermann.test         0.00       0.00 -12.3%
MultiSourc...nchmarks/NPB-serial/is/is.test         6.00       5.26 -12.3%
External/S...ate/508.namd_r/508.namd_r.test        43.59      38.19 -12.4%
External/S...7rate/544.nab_r/544.nab_r.test       138.48     121.15 -12.5%
MicroBench...sARaw.test:BM_VOL3D_CALC_RAW/1        40.72      35.62 -12.5%
MicroBench...est:BM_FIRST_DIFF_LAMBDA/44217        19.22      16.80 -12.6%
MicroBench...mCmp<7, GreaterThanZero, None>       785.02     685.07 -12.7%
External/S...speed/644.nab_s/644.nab_s.test       138.55     120.84 -12.8%
MultiSourc...s/Rodinia/hotspot/hotspot.test         0.17       0.15 -12.8%
MultiSourc...ications/JM/ldecod/ldecod.test         0.05       0.05 -12.8%
MultiSourc...plications/d/make_dparser.test         0.02       0.02 -12.9%
MultiSourc...dbl/InductionVariable-dbl.test         3.72       3.25 -12.9%
MicroBench...ForLoopWithReductionAutoVecTC1         2.67       2.32 -12.9%
SingleSour...ce/Benchmarks/Misc/fbench.test         0.97       0.85 -12.9%
MicroBench....test:benchForIC2VW4BigLoopTC8         7.60       6.61 -13.0%
MicroBench...BENCHMARK_atanf_autovec_float_       311.21     270.36 -13.1%
MultiSourc...ks/Prolangs-C/gnugo/gnugo.test         0.04       0.03 -13.2%
MultiSourc...Applications/kimwitu++/kc.test         0.14       0.12 -13.2%
MicroBench...hForIC2VW1LoopWithReductionTC4         2.24       1.94 -13.2%
MicroBench....test:benchForIC1VW4BigLoopTC3        12.39      10.75 -13.3%
MicroBench...hForIC1VW4LoopWithReductionTC1         2.65       2.30 -13.3%
SingleSour...enchmarks/Misc/revertBits.test         0.18       0.16 -13.5%
External/S...eed/625.x264_s/625.x264_s.test        34.53      29.84 -13.6%
MicroBench...sic128SmallDivisor<__int128_t>        14.68      12.68 -13.6%
MicroBench....test:benchForIC4VW4BigLoopTC3        12.91      11.14 -13.7%
MicroBench...hForIC2VW4LoopWithReductionTC3         4.45       3.84 -13.7%
MicroBench...hForIC4VW4LoopWithReductionTC1         2.66       2.30 -13.7%
SingleSour...t-C++/Shootout-C++-strcat.test         0.03       0.03 -13.8%
SingleSour...enchmarks/Misc-C++/bigfib.test         0.36       0.31 -13.9%
MicroBench...:BM_FIND_FIRST_MIN_LAMBDA/5001         6.34       5.46 -14.0%
MicroBench...est:BENCHMARK_FLOYD_DITHER/512      2646.92    2276.13 -14.0%
MicroBench....test:BM_FIRST_DIFF_LAMBDA/171         0.07       0.06 -14.0%
MicroBench...test:BM_MemCmp<1, EqZero, Mid>      5360.25    4605.91 -14.1%
MicroBench...t:BENCHMARK_cbrt_novec_double_       589.68     506.03 -14.2%
MicroBench...est:BM_MemCmp<2, EqZero, Last>      2453.26    2102.50 -14.3%
MicroBench...r_runtime_checks_fail<16, int>         9.30       7.97 -14.3%
MicroBench....test:benchForIC2VW4BigLoopTC3        12.65      10.82 -14.4%
MultiSourc...marks/SciMark2-C/scimark2.test        41.27      35.32 -14.4%
MicroBench...M_MemCmp<7, LessThanZero, Mid>       552.43     472.29 -14.5%
SingleSour...a/solvers/trisolv/trisolv.test         0.02       0.01 -14.5%
MicroBench...LoopFrom_uint64_t_To_uint32_t_      1584.17    1353.76 -14.5%
MicroBench...t:BENCHMARK_boxBlurKernel/1024      5163.86    4410.70 -14.6%
MicroBench...test:BM_MemCmp<5, EqZero, Mid>      1676.39    1431.67 -14.6%
MultiSourc...ediabench/gsm/toast/toast.test         0.02       0.01 -14.6%
SingleSour...ear-algebra/solvers/lu/lu.test        40.93      34.93 -14.7%
SingleSour...isc-C++/Large/sphereflake.test         3.84       3.27 -14.7%
SingleSour...tout-C++/Shootout-C++-ary.test         0.04       0.03 -14.8%
MicroBench...st:BENCHMARK_boxBlurKernel/256       321.93     274.19 -14.8%
Bitcode/si...imd_ops_test_op_paddsb_50.test         0.01       0.01 -14.9%
SingleSour...s/BenchmarkGame/recursive.test         0.86       0.73 -15.0%
MicroBench...test:BM_DIFF_PREDICT_RAW/44217       517.56     439.23 -15.1%
SingleSour...ar-algebra/blas/trmm/trmm.test         1.91       1.62 -15.2%
MicroBench...lcalsARaw.test:BM_COUPLE_RAW/2         1.99       1.68 -15.2%
Bitcode/si..._ops_test_op_packuswb_204.test         0.01       0.01 -15.3%
MicroBench...VW16From_uint64_t_To_uint32_t_      2618.07    2217.24 -15.3%
MicroBench...lcalsARaw.test:BM_COUPLE_RAW/0      1477.19    1250.23 -15.4%
MultiSourc...nch/beamformer/beamformer.test         0.78       0.66 -15.4%
MicroBench...test:BM_FIRST_DIFF_LAMBDA/5001         2.19       1.85 -15.4%
MicroBench..._MemCmp<7, LessThanZero, Last>       822.55     695.45 -15.5%
MultiSourc...yApps-C++/PENNANT/PENNANT.test         0.52       0.44 -15.5%
MicroBench....test:benchForIC1VW4BigLoopTC8         8.05       6.80 -15.5%
MultiSourc...ications/JM/lencod/lencod.test         5.05       4.26 -15.6%
External/S...lancbmk_r/523.xalancbmk_r.test        63.21      53.31 -15.7%
MicroBench...t:BENCHMARK_atan_novec_double_       336.02     283.29 -15.7%
MicroBench..._MemCmp<5, LessThanZero, Last>      1584.10    1334.71 -15.7%
MicroBench...calsCRaw.test:BM_ICCG_RAW/5001         6.39       5.38 -15.8%
MicroBench...or_runtime_checks_pass<4, int>         3.82       3.22 -15.8%
Bitcode/si...md_ops_test_op_cmpeqps_88.test         0.01       0.01 -15.8%
MultiSourc...oxyApps-C/RSBench/rsbench.test         0.49       0.41 -16.0%
SingleSour...nchmarks/Misc/ReedSolomon.test         4.79       4.03 -16.0%
MicroBench...runtime_checks_needed<16, int>         3.68       3.09 -16.2%
SingleSour...rks/CoyoteBench/huffbench.test        14.82      12.42 -16.2%
MicroBench...da.test:BM_HYDRO_2D_LAMBDA/171        10.65       8.90 -16.4%
MultiSourc...ks/McCat/12-IOtest/iotest.test         0.22       0.19 -16.5%
MultiSourc...-typeset/consumer-typeset.test         0.11       0.09 -16.6%
MultiSourc...lt/IndirectAddressing-flt.test         3.17       2.64 -16.6%
MultiSourc...rolangs-C++/employ/employ.test         0.01       0.01 -16.8%
MultiSourc...marks/Ptrdist/yacr2/yacr2.test         0.62       0.52 -16.8%
MicroBench....test:BM_GEN_LIN_RECUR_RAW/171         1.11       0.92 -16.9%
MultiSourc...ProxyApps-C++/CLAMR/CLAMR.test         1.37       1.14 -17.0%
SingleSour...ut-C++/Shootout-C++-sieve.test         1.52       1.26 -17.1%
External/S...7rate/519.lbm_r/519.lbm_r.test        29.55      24.49 -17.1%
MultiSourc...adpcm/rawdaudio/rawdaudio.test         0.00       0.00 -17.2%
MicroBench....test:BENCHMARK_HARRIS/512/512      3162.45    2618.15 -17.2%
External/S...e/541.leela_r/541.leela_r.test       116.19      96.17 -17.2%
MultiSourc...abench/jpeg/jpeg-6a/cjpeg.test         0.00       0.00 -17.3%
External/S...7rate/505.mcf_r/505.mcf_r.test        82.68      68.26 -17.4%
MicroBench...rks.test:benchForIC1VW4LoopTC1         2.47       2.04 -17.6%
MicroBench...a.test:BM_HYDRO_2D_LAMBDA/5001       352.91     290.44 -17.7%
MicroBench...CHMARK_ANISTROPIC_DIFFUSION/32       675.41     555.53 -17.8%
MicroBench...est:BENCHMARK_FLOYD_DITHER/256       632.84     520.38 -17.8%
SingleSour...ks/Shootout/Shootout-fib2.test         2.77       2.28 -17.8%
MultiSourc...eeBench/analyzer/analyzer.test         0.04       0.04 -17.8%
MicroBench...est:BM_IMP_HYDRO_2D_LAMBDA/171         6.66       5.47 -17.9%
MicroBench...t:BENCHMARK_GAUSSIAN_BLUR/1024     74602.43   61270.25 -17.9%
MicroBench...est:BM_TRIDIAG_ELIM_LAMBDA/171         0.38       0.31 -17.9%
Bitcode/Be...an/halide_local_laplacian.test        15.24      12.50 -18.0%
MicroBench...rIC2VW4BigLoopWithReductionTC3        12.03       9.87 -18.0%
Bitcode/si...d_ops_test_op_cmpeqps_184.test         0.01       0.01 -18.0%
External/S...17speed/657.xz_s/657.xz_s.test        69.44      56.86 -18.1%
MultiSourc.../Applications/spiff/spiff.test         2.25       1.84 -18.1%
MultiSourc...rks/Olden/treeadd/treeadd.test         0.23       0.19 -18.2%
MicroBench...BENCHMARK_ORDERED_DITHER/128/8        81.65      66.75 -18.2%
Bitcode/si...imd_ops_test_op_pabsw_236.test         0.01       0.01 -18.2%
SingleSour...Adobe-C++/stepanov_vector.test         2.13       1.74 -18.4%
External/S...ate/525.x264_r/525.x264_r.test        36.07      29.43 -18.4%
Bitcode/si...d_ops_test_op_cmpltpd_228.test         0.01       0.01 -18.4%
MultiSourc...telecomm-FFT/telecomm-fft.test         0.04       0.03 -18.6%
MicroBench...BM_FIND_FIRST_MIN_LAMBDA/44217        55.43      45.14 -18.6%
MicroBench...ENCHMARK_BILATERAL_FILTER/16/2        29.22      23.76 -18.7%
MicroBench...lcalsARaw.test:BM_COUPLE_RAW/1       261.52     212.58 -18.7%
MicroBench...CRaw.test:BM_FIRST_SUM_RAW/171         0.10       0.08 -18.7%
MicroBench...hForIC2VW1LoopWithReductionTC3         6.40       5.20 -18.7%
MultiSourc...nsumer-jpeg/consumer-jpeg.test         0.00       0.00 -18.8%
MicroBench...est:BM_GEN_LIN_RECUR_RAW/44217       291.04     235.98 -18.9%
Bitcode/si...md_ops_test_op_paddsb_146.test         0.01       0.01 -19.0%
MicroBench...t:BENCHMARK_atanf_novec_float_       313.23     253.67 -19.0%
MicroBench...calsARaw.test:BM_FIR_RAW/44217       127.86     103.54 -19.0%
MicroBench...BENCHMARK_ORDERED_DITHER/512/4      1278.98    1035.41 -19.0%
MultiSourc...math/automotive-basicmath.test         0.28       0.22 -19.1%
MicroBench...late.test:BENCHMARK_DILATE/256       300.07     242.65 -19.1%
MultiSourc...ve-susan/automotive-susan.test         0.04       0.03 -19.2%
MicroBench...test:BM_GEN_LIN_RECUR_RAW/5001        32.88      26.58 -19.2%
MicroBench...BENCHMARK_ORDERED_DITHER/128/2        66.01      53.35 -19.2%
MicroBench...:BENCHMARK_exp_autovec_double_       262.08     211.82 -19.2%
MicroBench...st:BENCHMARK_boxBlurKernel/512      1442.24    1164.86 -19.2%
MultiSourc...s/MallocBench/cfrac/cfrac.test         0.80       0.64 -19.3%
MicroBench...meChecks4PointersDBeforeA/1000   1275787.69 1026210.83 -19.6%
SingleSour.../Shootout/Shootout-matrix.test         1.52       1.22 -19.8%
MultiSourc...ijndael/security-rijndael.test         0.03       0.03 -19.8%
MicroBench...sCRaw.test:BM_DISC_ORD_RAW/171         4.61       3.70 -19.8%
SingleSour...ch/medley/deriche/deriche.test         0.63       0.51 -19.8%
SingleSour.../Benchmarks/Stanford/Perm.test         0.02       0.01 -19.9%
Bitcode/si...imd_ops_test_op_mulpd_222.test         0.01       0.01 -20.1%
MicroBench...Cmp<5, GreaterThanZero, First>      2333.87    1861.34 -20.2%
MultiSourc...ch/g721/g721encode/encode.test         0.04       0.04 -20.4%
SingleSour...enchmarks/Stanford/Towers.test         0.01       0.01 -20.5%
MicroBench...CRaw.test:BM_DISC_ORD_RAW/5001       140.83     111.83 -20.6%
MultiSourc...bl/IndirectAddressing-dbl.test         3.11       2.47 -20.6%
Bitcode/Be...rid/halide_bilateral_grid.test        17.86      14.14 -20.8%
SingleSour...BenchmarkGame/Large/fasta.test         1.15       0.91 -20.9%
MicroBench...BigLoopWithReductionAutoVecTC1         5.23       4.14 -20.9%
MicroBench...est:BM_PRESSURE_CALC_RAW/44217       130.68     103.32 -20.9%
SingleSour...e/Benchmarks/Misc/ffbench.test         0.49       0.39 -21.1%
MicroBench...a.test:BM_MULADDSUB_LAMBDA/171         0.16       0.12 -21.1%
MultiSourc...Olden/perimeter/perimeter.test         0.18       0.14 -21.3%
MicroBench...MemCmp<3, LessThanZero, First>      2562.89    2016.97 -21.3%
Bitcode/Regression/fft/fft.test                     0.02       0.02 -21.3%
MultiSourc...arching-flt/Searching-flt.test         3.74       2.93 -21.5%
MultiSourc...ing-dbl/LoopRerolling-dbl.test         3.29       2.58 -21.6%
MicroBench...Interchange.test:BENCHMARK_LI1       192.19     150.09 -21.9%
MicroBench...est:BM_DEL_DOT_VEC_2D_LAMBDA/2         0.49       0.38 -22.1%
MicroBench...ENCHMARK_BILATERAL_FILTER/64/2       582.92     452.66 -22.3%
MultiSourc...netbench-crc/netbench-crc.test         0.94       0.73 -22.7%
SingleSour.../Benchmarks/Misc/mandel-2.test         0.69       0.53 -22.8%
SingleSour...chmarks/Stanford/Treesort.test         0.09       0.07 -22.8%
MicroBench...BENCHMARK_ORDERED_DITHER/128/3        92.81      71.63 -22.8%
MicroBench...CHMARK_ANISTROPIC_DIFFUSION/64      2943.34    2271.32 -22.8%
MultiSourc...tions/lambda-0.1.3/lambda.test         3.28       2.53 -22.9%
MicroBench...BENCHMARK_ORDERED_DITHER/512/2      1059.71     814.87 -23.1%
MultiSourc...lications/viterbi/viterbi.test         1.48       1.14 -23.2%
MicroBench...Raw.test:BM_DISC_ORD_RAW/44217      1252.20     961.53 -23.2%
SingleSour...mple_types_loop_invariant.test         1.28       0.98 -23.2%
MultiSourc...ks/VersaBench/8b10b/8b10b.test         5.32       4.08 -23.3%
MicroBench...BENCHMARK_ORDERED_DITHER/128/4        83.16      63.79 -23.3%
MicroBench....test:BM_IMP_HYDRO_2D_RAW/5001       172.47     132.23 -23.3%
MultiSourc...dijkstra/network-dijkstra.test         0.04       0.03 -23.4%
SingleSour...Benchmarks/Misc/lowercase.test         0.00       0.00 -23.5%
MicroBench...Raw.test:BM_MAT_X_MAT_RAW/5001     12901.40    9860.91 -23.6%
MicroBench...st:BM_IMP_HYDRO_2D_LAMBDA/5001       215.20     164.42 -23.6%
MultiSourc...s/Ptrdist/anagram/anagram.test         0.88       0.68 -23.7%
MultiSourc...oxyApps-C/miniGMG/miniGMG.test         0.84       0.64 -23.7%
MultiSourc.../VersaBench/ecbdes/ecbdes.test         2.06       1.57 -23.7%
MicroBench...nLoopFrom_uint32_t_To_uint8_t_      2004.13    1527.09 -23.8%
MicroBench...test:BM_IMP_HYDRO_2D_RAW/44217      1510.45    1150.64 -23.8%
MicroBench...ENCHMARK_BILATERAL_FILTER/32/2       136.32     103.66 -24.0%
MicroBench...test:BM_MAT_X_MAT_LAMBDA/44217    291907.49  221928.03 -24.0%
SingleSour.../medley/nussinov/nussinov.test        14.13      10.73 -24.1%
MultiSourc...netbench-url/netbench-url.test         3.82       2.90 -24.1%
SingleSour...ar-algebra/blas/syrk/syrk.test         2.52       1.91 -24.2%
SingleSour...bra/solvers/durbin/durbin.test         0.01       0.01 -24.5%
MultiSourc...-flt/LinearDependence-flt.test         1.41       1.06 -24.6%
MicroBench...emCmp<5, GreaterThanZero, Mid>      2176.94    1636.02 -24.8%
SingleSource/Benchmarks/Misc/dt.test                0.20       0.15 -24.9%
MultiSourc...ks/Prolangs-C/agrep/agrep.test         0.00       0.00 -25.2%
MicroBench...or_runtime_checks_fail<4, int>         3.95       2.95 -25.2%
SingleSour...e/Benchmarks/Misc/flops-7.test         0.53       0.39 -25.2%
MicroBench...r_runtime_checks_pass<16, int>        15.13      11.30 -25.3%
MultiSourc...ocBench/espresso/espresso.test         0.43       0.32 -25.3%
SingleSour...hmarks/Stanford/Quicksort.test         0.04       0.03 -25.6%
MicroBench...nLoopFrom_uint64_t_To_uint8_t_      6103.04    4538.32 -25.6%
MultiSourc...ences-dbl/Recurrences-dbl.test         3.76       2.79 -25.7%
MicroBench...BENCHMARK_ORDERED_DITHER/256/2       275.80     204.77 -25.8%
MicroBench...CLambda.test:BM_ADI_LAMBDA/171         5.37       3.98 -25.8%
MicroBench...BENCHMARK_ORDERED_DITHER/256/3       378.23     279.81 -26.0%
SingleSour...hmarks/Misc-C++-EH/spirit.test         8.48       6.26 -26.1%
MultiSourc...ks/BitBench/five11/five11.test         1.65       1.21 -26.3%
MicroBench...ENCHMARK_BILATERAL_FILTER/64/4      2332.16    1715.52 -26.4%
MultiSourc...s/ASC_Sequoia/IRSmk/IRSmk.test         2.34       1.72 -26.5%
MultiSourc...security-sha/security-sha.test         0.02       0.01 -26.8%
MicroBench...est:BM_DEL_DOT_VEC_2D_LAMBDA/0       263.58     192.96 -26.8%
MicroBench...ENCHMARK_BILATERAL_FILTER/32/4       492.43     360.44 -26.8%
MultiSourc...C/Packing-dbl/Packing-dbl.test         4.00       2.93 -26.8%
MultiSourc...flt/InductionVariable-flt.test         2.44       1.78 -26.9%
SingleSour.../Benchmarks/Dhrystone/dry.test         0.32       0.23 -27.0%
MicroBench...aw.test:BM_MAT_X_MAT_RAW/44217    231281.95  168763.48 -27.0%
MicroBench...Lambda.test:BM_COUPLE_LAMBDA/0      1230.64     897.03 -27.1%
MultiSource/Applications/siod/siod.test             1.73       1.26 -27.1%
MicroBench...rIC2VW1BigLoopWithReductionTC3        11.09       8.06 -27.3%
MultiSourc...ctions-flt/Reductions-flt.test         6.83       4.95 -27.5%
SingleSour...ks/BenchmarkGame/fannkuch.test         3.03       2.19 -27.7%
MicroBench...hVW8From_uint64_t_To_uint32_t_      1467.85    1060.06 -27.8%
MicroBench...est:BM_MemCmp<2, EqZero, None>      2558.54    1847.32 -27.8%
SingleSour...enchmarks/Stanford/Queens.test         0.01       0.01 -27.8%
MicroBench....test:BM_ENERGY_CALC_RAW/44217       557.34     401.27 -28.0%
SingleSour...++/EH/Shootout-C++-except.test         0.17       0.12 -28.0%
MicroBench...est:BM_MemCmp<6, EqZero, None>      1230.90     885.94 -28.0%
SingleSour...nchmarks/Stanford/FloatMM.test         0.09       0.06 -28.0%
MicroBench...test:BM_MemCmp<6, EqZero, Mid>      1292.94     930.38 -28.0%
MicroBench...alsCRaw.test:BM_PIC_2D_RAW/171         1.22       0.87 -28.1%
MultiSourc...lications/minisat/minisat.test         7.70       5.52 -28.3%
MicroBench...ENCHMARK_BILATERAL_FILTER/16/4        93.58      66.91 -28.5%
MultiSourc...ing-flt/LoopRerolling-flt.test         2.80       2.00 -28.5%
SingleSour.../Benchmarks/Misc/oourafft.test         3.77       2.69 -28.6%
MicroBench...ALambda.test:BM_FIR_LAMBDA/171         0.35       0.25 -28.7%
MicroBench...Lambda.test:BM_COUPLE_LAMBDA/2         1.75       1.25 -28.7%
SingleSour...arks/BenchmarkGame/puzzle.test         0.15       0.11 -28.9%
SingleSour...s/Misc/richards_benchmark.test         0.76       0.54 -28.9%
MultiSource/Benchmarks/Olden/bh/bh.test             1.89       1.34 -29.1%
SingleSour...bra/solvers/ludcmp/ludcmp.test        42.21      29.89 -29.2%
MicroBench...w.test:BM_IMP_HYDRO_2D_RAW/171         5.80       4.10 -29.3%
MicroBench...Lambda.test:BM_FIR_LAMBDA/5001        11.30       7.99 -29.3%
MultiSourc...ks/McCat/04-bisect/bisect.test         0.08       0.06 -29.4%
MultiSourc.../Applications/SPASS/SPASS.test        10.35       7.30 -29.5%
MicroBench...est:BM_MemCmp<6, EqZero, Last>      1297.30     914.19 -29.5%
MicroBench...rIC4VW4BigLoopWithReductionTC1         5.81       4.10 -29.6%
MultiSourc.../Benchmarks/Ptrdist/ks/ks.test         0.87       0.61 -29.6%
MicroBench...BENCHMARK_ORDERED_DITHER/256/4       328.63     231.02 -29.7%
SingleSour.../stencils/fdtd-2d/fdtd-2d.test         6.58       4.62 -29.7%
MicroBench...est:BM_DEL_DOT_VEC_2D_LAMBDA/1        44.37      31.15 -29.8%
SingleSour...-algebra/blas/syr2k/syr2k.test         5.03       3.53 -29.8%
MicroBench...rks.test:benchForIC2VW4LoopTC1         3.13       2.19 -29.8%
MultiSourc...rimaran/enc-3des/enc-3des.test         1.88       1.32 -29.9%
MicroBench...or_runtime_checks_pass<4, int>         4.19       2.93 -29.9%
MultiSource/Applications/hbd/hbd.test               0.00       0.00 -30.0%
MicroBench...w.test:BM_FIRST_DIFF_RAW/44217        17.03      11.91 -30.1%
MicroBench...hForIC4VW4LoopWithReductionTC3         4.30       3.01 -30.1%
MicroBench...test:benchAutoVecForBigLoopTC3         5.74       4.01 -30.1%
SingleSour...enchmarks/Stanford/Puzzle.test         0.08       0.05 -30.2%
MicroBench...BENCHMARK_ORDERED_DITHER/256/8       327.33     227.86 -30.4%
MultiSourc...s-C/Pathfinder/PathFinder.test         2.60       1.81 -30.4%
MultiSourc...chmarks/MallocBench/gs/gs.test         0.03       0.02 -30.4%
MultiSourc...nchmarks/McCat/18-imp/imp.test         0.05       0.04 -30.4%
MicroBench...M_MemCmp<1, LessThanZero, Mid>      5723.69    3981.28 -30.4%
MultiSourc...s/ASC_Sequoia/AMGmk/AMGmk.test         8.49       5.90 -30.5%
SingleSour...s/Shootout/Shootout-sieve.test         4.04       2.80 -30.8%
Bitcode/si..._ops_test_op_packsswb_203.test         0.01       0.01 -30.9%
MicroBench...aw.test:BM_FIRST_DIFF_RAW/5001         1.93       1.33 -31.0%
MicroBench...ambda.test:BM_FIR_LAMBDA/44217       101.05      69.72 -31.0%
MicroBench...LoopFrom_uint32_t_To_uint16_t_      1933.70    1332.69 -31.1%
MultiSourc...arks/McCat/17-bintr/bintr.test         0.11       0.07 -31.3%
MicroBench...st:BM_BAND_LIN_EQ_LAMBDA/44217        29.96      20.59 -31.3%
MicroBench...Lambda.test:BM_COUPLE_LAMBDA/1       222.86     153.01 -31.3%
MicroBench...sCRaw.test:BM_PIC_1D_RAW/44217       456.01     312.98 -31.4%
MultiSourc...rks/FreeBench/mason/mason.test         0.17       0.12 -31.4%
MultiSourc...ctions-dbl/Reductions-dbl.test         3.66       2.51 -31.6%
MicroBench...BENCHMARK_ORDERED_DITHER/512/3      1500.73    1021.82 -31.9%
MicroBench...rIC1VW4BigLoopWithReductionTC1         5.78       3.93 -32.0%
MultiSourc...arching-dbl/Searching-dbl.test         3.76       2.55 -32.2%
MicroBench...MemCmp<6, LessThanZero, First>      1367.18     926.42 -32.2%
MultiSourc...mbolics-dbl/Symbolics-dbl.test         3.20       2.17 -32.4%
SingleSour...e/Benchmarks/Misc/salsa20.test         6.75       4.57 -32.4%
SingleSource/Benchmarks/Misc/flops.test             6.34       4.26 -32.7%
MicroBench...or_runtime_checks_fail<4, int>         4.80       3.23 -32.8%
MicroBench...st:BM_MemCmp<6, EqZero, First>      1340.79     899.15 -32.9%
SingleSour...out-C++/Shootout-C++-ary3.test         0.87       0.58 -33.0%
MicroBench...est:BM_MemCmp<1, EqZero, Last>      5698.19    3785.43 -33.6%
MicroBench...alsCRaw.test:BM_PIC_1D_RAW/171         1.35       0.90 -33.6%
MicroBench...sCRaw.test:BM_PIC_2D_RAW/44217       321.67     213.15 -33.7%
MicroBench...lsCRaw.test:BM_PIC_1D_RAW/5001        47.63      31.53 -33.8%
MicroBench...est:BM_BAND_LIN_EQ_LAMBDA/5001         3.29       2.17 -33.9%
MicroBench...mCmp<5, GreaterThanZero, Last>      1736.83    1147.00 -34.0%
MicroBench...test:BM_MemCmp<2, EqZero, Mid>      3017.86    1983.75 -34.3%
MultiSourc...hmarks/VersaBench/bmm/bmm.test         1.86       1.21 -34.7%
MicroBench...rIC1VW1BigLoopWithReductionTC3        10.16       6.62 -34.8%
MultiSourc...marks/Olden/health/health.test         0.28       0.18 -35.1%
MultiSourc...flt/LoopRestructuring-flt.test         3.55       2.30 -35.4%
External/S...speed/619.lbm_s/619.lbm_s.test       231.58     149.15 -35.6%
SingleSour.../Benchmarks/McGill/queens.test         2.74       1.76 -35.7%
SingleSour...e/Benchmarks/Misc/flops-6.test         1.38       0.88 -36.1%
MicroBench...est:BM_INNER_PROD_LAMBDA/44217       190.38     121.57 -36.1%
SingleSour...ncils/jacobi-1d/jacobi-1d.test         0.00       0.00 -36.2%
MicroBench...rIC4VW4BigLoopWithReductionTC3        11.07       7.06 -36.2%
MicroBench...ks.test:benchAutoVecForLoopTC4         3.99       2.54 -36.3%
MicroBench...rIC4VW1BigLoopWithReductionTC3        10.80       6.86 -36.5%
MicroBench...st:BM_DIFF_PREDICT_LAMBDA/5001        53.09      33.54 -36.8%
MicroBench....test:BM_HYDRO_2D_LAMBDA/44217      3997.18    2515.04 -37.1%
MicroBench...rks.test:benchForIC1VW4LoopTC4         3.94       2.47 -37.3%
MicroBench...emCmp<1, GreaterThanZero, Mid>      5419.27    3394.65 -37.4%
MultiSourc...+/HACCKernels/HACCKernels.test         1.97       1.23 -37.7%
MicroBench...mCmp<1, GreaterThanZero, None>      5387.86    3344.18 -37.9%
MultiSourc...ences-flt/Recurrences-flt.test         4.16       2.58 -38.0%
MultiSourc...telecomm-gsm/telecomm-gsm.test         0.14       0.09 -38.4%
MicroBench...mCmp<5, GreaterThanZero, None>      2269.31    1395.33 -38.5%
MicroBench....test:BM_INNER_PROD_LAMBDA/171         0.66       0.41 -38.6%
SingleSour...e/Benchmarks/Misc/flops-8.test         1.50       0.92 -38.7%
MicroBench...test:BM_INNER_PROD_LAMBDA/5001        21.50      13.18 -38.7%
MicroBench..._MemCmp<1, LessThanZero, Last>      6006.28    3661.63 -39.0%
MicroBench...Lambda.test:BM_ADI_LAMBDA/5001       201.79     122.77 -39.2%
SingleSour...out-C++/Shootout-C++-fibo.test         2.47       1.50 -39.4%
MicroBench...Raw.test:BM_HYDRO_2D_RAW/44217      2389.12    1445.45 -39.5%
MultiSourc...lFlow-dbl/ControlFlow-dbl.test         4.86       2.92 -39.8%
MicroBench...hForIC1VW4LoopWithReductionTC4         3.60       2.16 -40.0%
MicroBench...est:BM_DIFF_PREDICT_LAMBDA/171         1.22       0.73 -40.4%
MicroBench...nLoopFrom_uint64_t_To_uint8_t_     10102.82    6007.08 -40.5%
MicroBench...nLoopFrom_uint32_t_To_uint8_t_      2670.62    1584.32 -40.7%
MicroBench...MemCmp<1, LessThanZero, First>      5775.26    3411.98 -40.9%
SingleSour.../Shootout/Shootout-random.test         2.66       1.56 -41.3%
SingleSour.../stencils/heat-3d/heat-3d.test         5.01       2.90 -42.1%
SingleSour...ut-C++/Shootout-C++-lists.test         4.98       2.87 -42.3%
MultiSourc...l/StatementReordering-dbl.test         4.15       2.39 -42.3%
MicroBench...mCmp<1, GreaterThanZero, Last>      6087.05    3502.32 -42.5%
MicroBench...lsCRaw.test:BM_PIC_2D_RAW/5001        35.24      20.18 -42.7%
SingleSour...arks/Misc-C++/oopack_v1p8.test         0.15       0.08 -42.9%
MultiSourc...ing-flt/NodeSplitting-flt.test         3.57       2.03 -43.1%
MultiSourc...ing-dbl/NodeSplitting-dbl.test         4.32       2.45 -43.2%
MicroBench...st:BENCHMARK_GAUSSIAN_BLUR/512     18007.76   10109.04 -43.9%
MultiSourc...nchmarks/llubenchmark/llu.test         5.11       2.86 -44.0%
MicroBench...st:BM_MemCmp<2, EqZero, First>      2956.04    1651.12 -44.1%
MultiSourc...ing-dbl/Equivalencing-dbl.test         1.21       0.67 -44.5%
MicroBench...LoopFrom_uint64_t_To_uint16_t_      3153.34    1746.99 -44.6%
MultiSourc...bl/CrossingThresholds-dbl.test         3.55       1.97 -44.7%
MicroBench...Cmp<1, GreaterThanZero, First>      6149.79    3399.81 -44.7%
MicroBench...thVW8From_uint64_t_To_uint8_t_      2518.72    1392.43 -44.7%
MicroBench...st:BENCHMARK_GAUSSIAN_BLUR/256      4289.63    2371.32 -44.7%
MicroBench...MemCmp<7, LessThanZero, First>       854.19     471.81 -44.8%
MicroBench...hVW16From_uint32_t_To_uint8_t_      1160.91     640.47 -44.8%
MicroBench....test:BM_FIRST_SUM_LAMBDA/5001         6.49       3.58 -44.9%
MicroBench...CRaw.test:BM_HYDRO_2D_RAW/5001       257.58     141.98 -44.9%
MicroBench..._MemCmp<7, LessThanZero, None>      1275.68     698.27 -45.3%
MicroBench...st:BENCHMARK_GAUSSIAN_BLUR/128       999.07     542.55 -45.7%
MicroBench...hForIC1VW1LoopWithReductionTC4         3.52       1.90 -46.0%
MicroBench...LoopFrom_uint64_t_To_uint16_t_      4288.70    2315.80 -46.0%
MicroBench...imeChecks4PointersDAfterA/1000    821368.48  442244.34 -46.2%
SingleSour...solvers/cholesky/cholesky.test        26.06      13.94 -46.5%
MicroBench..._MemCmp<6, LessThanZero, Last>      1722.53     914.91 -46.9%
MicroBench...rks.test:benchForIC2VW4LoopTC4         5.36       2.81 -47.5%
MicroBench...VW16From_uint32_t_To_uint16_t_      1515.47     794.52 -47.6%
MicroBench...a.test:BM_FIRST_SUM_LAMBDA/171         0.20       0.11 -47.7%
MultiSourc...ks/Prolangs-C++/life/life.test         1.62       0.84 -47.8%
MicroBench...hVW16From_uint64_t_To_uint8_t_      2662.67    1388.36 -47.9%
MicroBench...nLoopFrom_uint16_t_To_uint8_t_       604.44     313.98 -48.1%
SingleSour...++/Shootout-C++-ackermann.test         2.01       1.04 -48.1%
MicroBench...hForIC2VW4LoopWithReductionTC4         4.62       2.38 -48.4%
MicroBench...hVW16From_uint16_t_To_uint8_t_       721.11     371.94 -48.4%
MultiSourc...mbolics-flt/Symbolics-flt.test         1.68       0.86 -48.8%
MicroBench...ntimeChecks4PointersDAfterA/32     28293.26   14463.40 -48.9%
MicroBench...thVW8From_uint64_t_To_uint8_t_      3059.55    1561.25 -49.0%
MicroBench...hVW16From_uint64_t_To_uint8_t_      2690.34    1372.36 -49.0%
MicroBench...thVW8From_uint32_t_To_uint8_t_      1242.54     632.76 -49.1%
MicroBench...t:BM_DIFF_PREDICT_LAMBDA/44217      1616.25     822.41 -49.1%
MicroBench...hVW16From_uint32_t_To_uint8_t_      1380.72     701.69 -49.2%
SingleSour...ks/Misc-C++/stepanov_v1p2.test         8.23       4.17 -49.3%
MicroBench...hVW16From_uint16_t_To_uint8_t_       577.28     291.52 -49.5%
MicroBench...ForLoopWithReductionAutoVecTC4         4.56       2.29 -49.7%
MicroBench...hForIC4VW4LoopWithReductionTC4         4.56       2.29 -49.8%
MicroBench...VW16From_uint64_t_To_uint16_t_      2422.60    1206.03 -50.2%
MultiSourc...oops-dbl/ControlLoops-dbl.test         3.18       1.58 -50.4%
SingleSour...a/kernels/doitgen/doitgen.test         1.74       0.86 -50.7%
MicroBench...VW16From_uint64_t_To_uint16_t_      2729.38    1344.61 -50.7%
MicroBench...nLoopFrom_uint16_t_To_uint8_t_      1098.01     536.87 -51.1%
MicroBench...hVW8From_uint32_t_To_uint16_t_      1853.88     898.93 -51.5%
MicroBench...thVW8From_uint32_t_To_uint8_t_      1730.42     835.26 -51.7%
MicroBench...sCRaw.test:BM_HYDRO_2D_RAW/171         9.04       4.33 -52.1%
MicroBench...hVW8From_uint32_t_To_uint16_t_      1578.26     753.25 -52.3%
MicroBench...thVW8From_uint16_t_To_uint8_t_      1179.84     562.59 -52.3%
MicroBench...ambda.test:BM_ADI_LAMBDA/44217      2233.48    1061.04 -52.5%
MicroBench...st:BM_FIND_FIRST_MIN_RAW/44217        41.87      19.87 -52.5%
MicroBench...hVW8From_uint64_t_To_uint16_t_      2801.24    1326.70 -52.6%
MicroBench...rks.test:benchForIC4VW4LoopTC4         5.71       2.69 -53.0%
MicroBench...hVW8From_uint64_t_To_uint16_t_      2551.07    1183.66 -53.6%
MicroBench...thVW8From_uint16_t_To_uint8_t_       701.71     325.58 -53.6%
MicroBench...test:BM_FIND_FIRST_MIN_RAW/171         0.18       0.08 -53.8%
MicroBench...VW16From_uint32_t_To_uint16_t_      2029.97     926.66 -54.4%
MicroBench...est:BM_FIND_FIRST_MIN_RAW/5001         4.93       2.24 -54.5%
MicroBench...CRaw.test:BM_MAT_X_MAT_RAW/171       153.31      69.61 -54.6%
MicroBench...M_MemCmp<6, LessThanZero, Mid>      1333.29     602.34 -54.8%
MultiSourc...lt/CrossingThresholds-flt.test         3.00       1.35 -55.0%
MicroBench...est:BM_MemCmp<7, EqZero, Last>       927.99     414.67 -55.3%
MicroBench...test:BM_MemCmp<7, EqZero, Mid>      1001.66     447.10 -55.4%
MicroBench...Raw.test:BM_FIRST_DIFF_RAW/171         0.07       0.03 -55.4%
MicroBench...st:BM_MemCmp<7, EqZero, First>       939.58     415.92 -55.7%
MicroBench...Cmp<6, GreaterThanZero, First>      1521.75     662.99 -56.4%
MicroBench...LoopFrom_uint32_t_To_uint16_t_      3028.78    1228.12 -59.5%
MicroBench...est:BM_MemCmp<7, EqZero, None>      1078.86     426.54 -60.5%
MicroBench...st:BM_INT_PREDICT_LAMBDA/44217       727.39     287.08 -60.5%
MicroBench...aw.test:BM_PLANCKIAN_RAW/44217       653.45     255.89 -60.8%
MicroBench...CRaw.test:BM_PLANCKIAN_RAW/171         2.41       0.94 -60.9%
MicroBench...mCmp<6, GreaterThanZero, None>      2133.19     831.83 -61.0%
MicroBench...emCmp<6, GreaterThanZero, Mid>      1669.55     647.73 -61.2%
MicroBench....test:BM_MAT_X_MAT_LAMBDA/5001     23294.67    8864.86 -61.9%
MicroBench...mCmp<6, GreaterThanZero, Last>      2606.73     925.93 -64.5%
MicroBench...Raw.test:BM_PLANCKIAN_RAW/5001        83.85      28.09 -66.5%
MultiSourc...rolangs-C++/primes/primes.test         0.52       0.16 -68.2%
SingleSour...g/correlation/correlation.test         9.82       3.11 -68.4%
                           Geomean difference                        -8.7%
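
For anyone wanting to sanity-check a summary line like this one, here is a minimal sketch of one common way to aggregate the `lhs`/`rhs` columns: take the geometric mean of the per-benchmark `rhs / lhs` ratios and subtract 1. The helper name `geomean_diff` is hypothetical, and this is an assumption about the aggregation, not necessarily the exact formula the comparison script used. The three sample rows are copied from the table above.

```python
import math


def geomean_diff(pairs):
    """Relative change of rhs versus lhs, aggregated as a geometric mean.

    pairs: iterable of (lhs, rhs) execution times.
    Rows with a non-positive time are skipped because their ratio is undefined.
    (Hypothetical helper; the aggregation itself is an assumption.)
    """
    ratios = [rhs / lhs for lhs, rhs in pairs if lhs > 0 and rhs > 0]
    if not ratios:
        raise ValueError("no usable rows")
    # Average in log space, then convert back to a ratio.
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return math.exp(log_mean) - 1.0


# Three (lhs, rhs) rows taken from the table above.
rows = [(9.82, 3.11), (0.52, 0.16), (83.85, 28.09)]
print(f"{geomean_diff(rows):+.1%}")
```

A geometric mean is used rather than an arithmetic one so that a 2x slowdown and a 2x speedup cancel out.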

@artagnon
Contributor Author

> I didn't see statistically-significant differences on SPEC 2017.

I should add that this is probably because SPEC 2017 is polluted with short-running tests.
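
To illustrate the concern about short-running tests, here is a rough sketch of filtering them out before aggregating. The rows are made-up illustrative values, not measurements, and the helper names and the `min_time` threshold are hypothetical. Runs close to the timer resolution can swing by a large percentage from noise alone, which can mask a small real change in the aggregate.

```python
import math


def geomean_ratio(rows):
    """Geometric mean of rhs / lhs over (name, lhs, rhs) rows."""
    logs = [math.log(rhs / lhs) for _, lhs, rhs in rows]
    return math.exp(sum(logs) / len(logs))


def drop_short_running(rows, min_time):
    """Keep only rows where both runs took at least min_time (table units)."""
    return [row for row in rows if row[1] >= min_time and row[2] >= min_time]


# Illustrative values only: two noisy short runs and two long, stable runs.
rows = [
    ("short_a", 0.01, 0.02),
    ("short_b", 0.02, 0.01),
    ("long_a", 20.0, 20.4),
    ("long_b", 10.0, 10.1),
]

kept = drop_short_running(rows, min_time=0.5)  # threshold is an arbitrary choice
print([name for name, _, _ in kept])
print(f"all rows: {geomean_ratio(rows) - 1:+.1%}")
print(f"filtered: {geomean_ratio(kept) - 1:+.1%}")
```

The right threshold depends on the timer and the machine, so `0.5` here is only a placeholder.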

@artagnon
Contributor Author

artagnon commented Aug 1, 2024

I knew that I'd messed something up, because such a crazy speedup was unexpected. The actual speedup with the revised version of the patch (and better benchmarking) is on the order of 1%, and this is across all targets. Full results are below.

Program                                       exec_time
                                              lhs       rhs       diff
SingleSour...g/correlation/correlation.test        1.01      2.55 151.5%
MultiSourc...ch/g721/g721encode/encode.test        0.02      0.05 148.8%
Bitcode/si...d_ops_test_op_cmpeqpd_197.test        0.00      0.01 130.3%
Bitcode/si...simd_ops_test_op_mulps_71.test        0.00      0.01 112.5%
SingleSour...a/solvers/trisolv/trisolv.test        0.01      0.02 107.3%
MultiSourc...plications/d/make_dparser.test        0.01      0.02  96.1%
SingleSour...enchmarks/Stanford/Puzzle.test        0.03      0.06  96.0%
MultiSourc...pplications/oggenc/oggenc.test        0.05      0.09  89.5%
SingleSour...tout-C++/Shootout-C++-ary.test        0.03      0.05  73.8%
Bitcode/si...imd_ops_test_op_paddq_199.test        0.00      0.01  70.4%
SingleSour...enchmarks/Stanford/Towers.test        0.01      0.02  66.6%
Bitcode/si...imd_ops_test_op_paddb_138.test        0.00      0.01  63.7%
MultiSourc...arks/McCat/17-bintr/bintr.test        0.05      0.08  63.1%
SingleSour...-C++/Shootout-C++-moments.test        0.06      0.09  59.1%
Bitcode/si...imd_ops_test_op_addpd_220.test        0.00      0.00  57.6%
Bitcode/si...d_ops_test_op_cmpltpd_228.test        0.00      0.00  57.1%
MultiSourc.../Prolangs-C++/ocean/ocean.test        0.04      0.06  53.5%
Bitcode/si...simd_ops_test_op_maxps_27.test        0.00      0.01  53.5%
Bitcode/si...imd_ops_test_op_pabsw_242.test        0.00      0.01  49.8%
Bitcode/si...simd_ops_test_op_minps_76.test        0.00      0.01  48.3%
MultiSourc...telecomm-gsm/telecomm-gsm.test        0.05      0.07  46.1%
Bitcode/si...imd_ops_test_op_mulpd_207.test        0.00      0.01  44.5%
Bitcode/si...simd_ops_test_op_addps_21.test        0.00      0.01  43.3%
Bitcode/si..._ops_test_op_packuswb_219.test        0.00      0.01  42.8%
SingleSour...ootout/Shootout-ackermann.test        0.00      0.01  42.5%
Bitcode/si...imd_ops_test_op_paddsb_50.test        0.00      0.01  42.2%
Bitcode/si...d_ops_test_op_cmpltps_185.test        0.00      0.01  38.8%
MultiSourc...ve-susan/automotive-susan.test        0.02      0.03  37.9%
SingleSour...-algebra/blas/syr2k/syr2k.test        2.57      3.53  37.3%
MultiSourc...rolangs-C++/employ/employ.test        0.00      0.01  34.9%
SingleSour...s/gramschmidt/gramschmidt.test        3.59      4.84  34.8%
Bitcode/si...simd_ops_test_op_orps_187.test        0.00      0.01  33.6%
Bitcode/si...simd_ops_test_op_addps_69.test        0.00      0.01  30.5%
MultiSourc...encode/alacconvert-encode.test        0.02      0.03  29.7%
SingleSour...nchmarks/Stanford/FloatMM.test        0.06      0.08  29.3%
SingleSource/Benchmarks/Misc/dt.test               0.12      0.15  24.9%
Bitcode/si...imd_ops_test_op_pabsw_236.test        0.01      0.01  21.8%
SingleSour...bra/solvers/ludcmp/ludcmp.test       20.77     25.21  21.4%
SingleSour.../Benchmarks/Stanford/Perm.test        0.01      0.01  20.9%
MicroBench...ks.test:benchForIC1VW4LoopTC63        5.59      6.70  19.8%
Bitcode/si...md_ops_test_op_cmpltps_41.test        0.01      0.01  19.6%
Bitcode/si...md_ops_test_op_cmpltps_89.test        0.01      0.01  19.3%
MultiSourc...rks/Olden/voronoi/voronoi.test        0.14      0.17  19.0%
Bitcode/si...imd_ops_test_op_mulps_119.test        0.01      0.01  19.0%
MultiSourc...math/automotive-basicmath.test        0.11      0.13  16.6%
SingleSour...hmarks/Stanford/Quicksort.test        0.04      0.05  16.6%
SingleSour...solvers/cholesky/cholesky.test       10.06     11.64  15.8%
SingleSour...bra/solvers/durbin/durbin.test        0.01      0.01  15.7%
Bitcode/si...imd_ops_test_op_paddq_229.test        0.00      0.00  15.2%
Bitcode/si...d_ops_test_op_cmpeqpd_227.test        0.00      0.00  14.8%
SingleSour...ear-algebra/solvers/lu/lu.test       21.93     25.18  14.8%
MicroBench...rks.test:benchForIC2VW4LoopTC1        1.25      1.43  14.4%
MultiSourc...eeBench/analyzer/analyzer.test        0.04      0.04  14.2%
MultiSourc...lications/ClamAV/clamscan.test        0.07      0.08  12.6%
MultiSourc...chmarks/MallocBench/gs/gs.test        0.03      0.03  12.6%
SingleSour...out-C++/Shootout-C++-ary2.test        0.05      0.06  12.1%
MultiSourc...nch/beamformer/beamformer.test        0.30      0.34  11.0%
MultiSourc...yApps-C++/PENNANT/PENNANT.test        0.20      0.22  10.9%
MicroBench...orIC4VW1LoopWithReductionTC128        6.86      7.61  10.9%
Bitcode/si..._ops_test_op_blendvps_253.test        0.01      0.01  10.8%
SingleSour...t-C++/Shootout-C++-lists1.test        0.09      0.10  10.7%
MultiSourc.../Prolangs-C++/simul/simul.test        0.00      0.01  10.5%
MultiSourc...nchmarks/McCat/18-imp/imp.test        0.04      0.04  10.3%
Bitcode/si...imd_ops_test_op_pabsb_241.test        0.01      0.01  10.2%
SingleSour...ar-algebra/blas/symm/symm.test        0.00      0.00   9.9%
Bitcode/Re...on/vector_widen/widen_bug.test        0.05      0.05   9.9%
MicroBench...orIC2VW1LoopWithReductionTC128       19.56     21.40   9.4%
SingleSour...t-C++/Shootout-C++-matrix.test        0.48      0.53   9.2%
MultiSourc...s/Rodinia/hotspot/hotspot.test        0.09      0.10   9.1%
MultiSourc...ing-flt/Equivalencing-flt.test        0.23      0.25   8.8%
Bitcode/si...imd_ops_test_op_paddsb_49.test        0.01      0.01   8.6%
MicroBench...rIC4VW1BigLoopWithReductionTC1        2.48      2.69   8.6%
MicroBench...ks.test:benchAutoVecForLoopTC1        1.31      1.42   8.5%
MicroBench...rIC2VW4BigLoopWithReductionTC1        2.45      2.65   8.3%
MicroBench...rIC4VW4BigLoopWithReductionTC1        2.47      2.67   8.2%
MultiSourc...marks/Olden/bisort/bisort.test        0.20      0.22   7.9%
MultiSourc...arks/BitBench/drop3/drop3.test        0.11      0.12   7.8%
MicroBench...BigLoopWithReductionAutoVecTC1        2.45      2.64   7.8%
MicroBench...rIC1VW4BigLoopWithReductionTC1        2.46      2.65   7.7%
MicroBench...rks.test:benchForIC1VW4LoopTC1        1.32      1.42   7.7%
MicroBench...rks.test:benchForIC4VW4LoopTC1        1.33      1.43   7.5%
MicroBench...rIC1VW1BigLoopWithReductionTC1        2.39      2.57   7.4%
Bitcode/Be...rid/halide_bilateral_grid.test       14.84     15.94   7.4%
MicroBench...ks.test:benchForIC4VW4LoopTC31        5.10      5.46   7.1%
SingleSour...e/Benchmarks/Misc/ffbench.test        0.25      0.27   7.1%
MultiSourc...-typeset/consumer-typeset.test        0.07      0.08   7.0%
MultiSource/Applications/hbd/hbd.test              0.00      0.00   7.0%
MicroBench...test:BM_MemCmp<5, EqZero, Mid>      427.05    456.42   6.9%
MicroBench...rks.test:benchForIC4VW4LoopTC2        1.27      1.35   6.8%
Bitcode/si...imd_ops_test_op_minps_172.test        0.01      0.01   6.8%
MicroBench...ks.test:benchAutoVecForLoopTC2        1.26      1.35   6.6%
Bitcode/si..._ops_test_op_packssdw_202.test        0.01      0.01   6.3%
MicroBench...rks.test:benchForIC2VW4LoopTC2        1.27      1.35   6.2%
MicroBench...test:benchAutoVecForBigLoopTC1        1.35      1.44   6.1%
MultiSourc.../Benchmarks/Ptrdist/ft/ft.test        0.34      0.37   6.0%
MultiSourc...ocBench/espresso/espresso.test        0.20      0.21   5.9%
MultiSourc...dbl/LoopRestructuring-dbl.test        1.63      1.73   5.9%
MultiSourc...oxyApps-C/RSBench/rsbench.test        0.17      0.18   5.8%
MicroBench...rks.test:benchForIC1VW4LoopTC2        1.30      1.37   5.6%
MultiSourc.../Trimaran/enc-pc1/enc-pc1.test        0.19      0.20   5.5%
MultiSourc...s/MallocBench/cfrac/cfrac.test        0.29      0.31   5.5%
SingleSour...C++/Shootout-C++-heapsort.test        1.12      1.18   5.4%
MicroBench...test:BM_MULADDSUB_LAMBDA/44217       36.17     38.13   5.4%
Bitcode/si...imd_ops_test_op_addpd_190.test        0.01      0.01   5.3%
MicroBench...test:benchAutoVecForBigLoopTC2        1.28      1.35   5.3%
MultiSourc...s/Fhourstones/fhourstones.test        0.37      0.39   5.1%
SingleSour...e/Benchmarks/Misc/flops-7.test        0.28      0.30   4.9%
SingleSour...e/Benchmarks/Misc/flops-6.test        0.45      0.47   4.9%
SingleSour...e/Benchmarks/McGill/chomp.test        0.46      0.48   4.8%
MultiSourc.../Benchmarks/nbench/nbench.test        0.62      0.65   4.8%
Bitcode/si...simd_ops_test_op_paddb_90.test        0.01      0.01   4.7%
MicroBench....test:BENCHMARK_HARRIS/512/512      860.74    899.70   4.5%
SingleSour...rks/Adobe-C++/loop_unroll.test        0.27      0.29   4.5%
MicroBench...mbda.test:BM_PIC_2D_LAMBDA/171        0.59      0.61   4.4%
SingleSour...arks/Misc-C++/mandel-text.test        0.62      0.64   4.4%
MicroBench...mbda.test:BM_INIT3_LAMBDA/5001        2.57      2.68   4.4%
SingleSour...ce/Benchmarks/McGill/misr.test        0.07      0.07   4.3%
MultiSourc...flt/LoopRestructuring-flt.test        1.38      1.44   4.2%
SingleSour...-C++/Shootout-C++-objinst.test        0.00      0.00   4.1%
SingleSour...marks/Misc/matmul_f64_4x4.test        0.32      0.34   4.1%
MultiSourc...rks/tramp3d-v4/tramp3d-v4.test        0.09      0.09   4.1%
MicroBench....test:BM_MULADDSUB_LAMBDA/5001        3.47      3.60   3.8%
Bitcode/si...imd_ops_test_op_maxps_123.test        0.00      0.00   3.7%
Bitcode/si...imd_ops_test_op_pabsd_243.test        0.01      0.01   3.6%
MicroBench....test:BENCHMARK_HARRIS/256/256      203.70    210.94   3.6%
MultiSourc...ing-dbl/LoopRerolling-dbl.test        1.26      1.30   3.4%
SingleSour.../Benchmarks/Misc/mandel-2.test        0.32      0.33   3.3%
SingleSour...a/kernels/doitgen/doitgen.test        0.49      0.50   3.3%
MultiSourc...lFlow-flt/ControlFlow-flt.test        1.00      1.04   3.3%
MicroBench...Raw.test:BM_MAT_X_MAT_RAW/5001     6202.16   6402.34   3.2%
Bitcode/si...imd_ops_test_op_andps_186.test        0.01      0.01   3.2%
MultiSourc...stones-3.1/fhourstones3.1.test        0.52      0.54   3.2%
MultiSourc...dbl/InductionVariable-dbl.test        1.37      1.42   3.2%
MultiSourc...nchmarks/llubenchmark/llu.test        1.13      1.17   3.1%
SingleSour...BenchmarkGame/nsieve-bits.test        0.28      0.29   3.1%
Bitcode/si...imd_ops_test_op_paddq_214.test        0.01      0.01   3.0%
MicroBench...est:BM_PRESSURE_CALC_RAW/44217       48.12     49.55   3.0%
SingleSour.../medley/nussinov/nussinov.test        3.24      3.33   2.9%
MicroBench...t:BENCHMARK_atanf_novec_float_      113.21    116.38   2.8%
MicroBench...w.test:BM_INT_PREDICT_RAW/5001        9.83     10.10   2.8%
SingleSour.../Benchmarks/Misc/evalloop.test        0.36      0.37   2.7%
SingleSour.../Benchmarks/McGill/queens.test        1.17      1.20   2.7%
MicroBench...orIC2VW1LoopWithReductionTC127       16.46     16.90   2.6%
MultiSourc...arks/VersaBench/dbms/dbms.test        0.47      0.48   2.5%
MicroBench...ic128SmallDivisor<__uint128_t>        4.74      4.85   2.4%
Bitcode/si...imd_ops_test_op_maxpd_225.test        0.01      0.01   2.4%
MultiSourc...ences-flt/Recurrences-flt.test        1.72      1.76   2.3%
MicroBench...rks.test:benchForIC4VW4LoopTC3        1.54      1.57   2.3%
MultiSourc...oxyApps-C/XSBench/XSBench.test        1.08      1.11   2.3%
MicroBench...ForIC2VW1LoopWithReductionTC32        5.85      5.98   2.3%
MicroBench...rks.test:benchForIC2VW4LoopTC3        1.53      1.57   2.2%
MultiSourc...quoia/CrystalMk/CrystalMk.test        1.47      1.50   2.2%
MultiSourc...ing-flt/LoopRerolling-flt.test        1.21      1.24   2.2%
MicroBench...aw.test:BM_ENERGY_CALC_RAW/171        0.57      0.58   2.2%
MultiSourc...chmarks/Olden/power/power.test        0.27      0.28   2.1%
MicroBench...test:BM_PRESSURE_CALC_RAW/5001        5.00      5.11   2.1%
MicroBench...da.test:BM_VOL3D_CALC_LAMBDA/2        0.86      0.88   2.0%
MicroBench...bda.test:BM_IF_QUAD_LAMBDA/171        0.44      0.45   2.0%
MultiSourc...ProxyApps-C++/HPCCG/HPCCG.test        0.28      0.29   1.9%
MultiSourc.../Benchmarks/Olden/tsp/tsp.test        0.25      0.25   1.9%
MicroBench...sARaw.test:BM_VOL3D_CALC_RAW/2        0.85      0.87   1.9%
MicroBench...w.test:BM_DEL_DOT_VEC_2D_RAW/0      121.91    124.22   1.9%
MultiSourc...rks/Olden/treeadd/treeadd.test        0.11      0.11   1.9%
MultiSourc...flt/InductionVariable-flt.test        1.07      1.09   1.9%
MicroBench...lcalsARaw.test:BM_COUPLE_RAW/1       96.39     98.19   1.9%
MicroBench...test:benchAutoVecForBigLoopTC3        1.54      1.57   1.9%
MicroBench...rks.test:benchForIC2VW4LoopTC4        1.71      1.74   1.8%
MultiSourc...oops-dbl/ControlLoops-dbl.test        1.03      1.04   1.8%
MicroBench...Lambda.test:BM_COUPLE_LAMBDA/1       98.39    100.13   1.8%
SingleSour...C++/Shootout-C++-methcall.test        2.15      2.19   1.7%
MultiSourc...lications/SIBsim4/SIBsim4.test        1.15      1.17   1.7%
MicroBench....test:BM_PRESSURE_CALC_RAW/171        0.16      0.16   1.7%
SingleSour...e/Benchmarks/Misc/flops-3.test        0.49      0.50   1.7%
MicroBench...hForIC4VW1LoopWithReductionTC7        1.97      2.01   1.7%
MicroBench...lcalsARaw.test:BM_FIR_RAW/5001        5.23      5.32   1.7%
MicroBench...ambda.test:BM_INIT3_LAMBDA/171        0.04      0.04   1.7%
MicroBench...da.test:BM_VOL3D_CALC_LAMBDA/1       22.90     23.29   1.7%
MicroBench...ic128SmallDivisor<__uint128_t>        4.65      4.73   1.7%
MicroBench...lcalsARaw.test:BM_COUPLE_RAW/2        0.78      0.79   1.7%
MicroBench...bda.test:BM_INIT3_LAMBDA/44217       29.44     29.93   1.7%
MicroBench...c128UniformDivisor<__int128_t>        4.11      4.17   1.6%
MicroBench...BENCHMARK_acos_autovec_double_       47.20     47.97   1.6%
MicroBench...t:BENCHMARK_acosf_novec_float_       83.16     84.52   1.6%
MicroBench...sARaw.test:BM_VOL3D_CALC_RAW/0      103.69    105.37   1.6%
MicroBench...a.test:BM_MULADDSUB_LAMBDA/171        0.06      0.06   1.6%
MultiSourc...lications/minisat/minisat.test        2.38      2.41   1.6%
MicroBench...w.test:BM_ENERGY_CALC_RAW/5001       17.78     18.06   1.6%
MicroBench...test:BM_ENERGY_CALC_LAMBDA/171        0.58      0.59   1.5%
MicroBench...128UniformDivisor<__uint128_t>        8.33      8.45   1.5%
MicroBench...Lambda.test:BM_COUPLE_LAMBDA/0      573.85    582.37   1.5%
SingleSour...ks/Misc-C++/stepanov_v1p2.test        2.45      2.48   1.4%
MicroBench...BENCHMARK_atanf_autovec_float_      112.63    114.24   1.4%
MicroBench...Lambda.test:BM_COUPLE_LAMBDA/2        0.78      0.79   1.4%
MicroBench...late.test:BENCHMARK_DILATE/128       23.01     23.34   1.4%
SingleSource/Benchmarks/Misc/flops.test            2.15      2.18   1.4%
MicroBench...da.test:BM_VOL3D_CALC_LAMBDA/0      105.65    107.14   1.4%
MicroBench...est:BM_DEL_DOT_VEC_2D_LAMBDA/0      122.66    124.38   1.4%
MicroBench...est:BM_DEL_DOT_VEC_2D_LAMBDA/1       20.24     20.53   1.4%
MicroBench...BENCHMARK_acosf_autovec_float_       83.19     84.35   1.4%
MultiSourc...ks/BitBench/five11/five11.test        0.54      0.55   1.4%
SingleSour...++/Shootout-C++-ackermann.test        0.57      0.58   1.4%
SingleSour.../Shootout/Shootout-matrix.test        0.70      0.71   1.4%
MicroBench...t:BENCHMARK_acos_novec_double_       47.30     47.93   1.3%
MultiSourc...rolangs-C++/primes/primes.test        0.13      0.13   1.3%
MicroBench...sARaw.test:BM_VOL3D_CALC_RAW/1       22.55     22.85   1.3%
SingleSour...Benchmarks/Misc/whetstone.test        0.37      0.37   1.3%
MultiSource/Applications/aha/aha.test              0.53      0.54   1.3%
MicroBench...ate.test:BENCHMARK_DILATE/1024     1379.31   1397.26   1.3%
SingleSour...olybench/stencils/adi/adi.test        5.80      5.87   1.3%
MicroBench...rks.test:benchForIC1VW4LoopTC3        1.54      1.56   1.3%
MicroBench...sic128SmallDivisor<__int128_t>        5.65      5.71   1.2%
MultiSourc...lt/IndirectAddressing-flt.test        1.07      1.08   1.2%
SingleSour...hmarks/Linpack/linpack-pc.test        0.59      0.60   1.2%
SingleSour...algebra/kernels/atax/atax.test        0.02      0.02   1.2%
MicroBench...:BM_PRESSURE_CALC_LAMBDA/44217       48.08     48.64   1.2%
MicroBench...lcalsARaw.test:BM_COUPLE_RAW/0      564.64    571.20   1.2%
MicroBench...st:BM_ENERGY_CALC_LAMBDA/44217      176.51    178.56   1.2%
SingleSour...ce/Benchmarks/Misc/perlin.test        0.83      0.83   1.2%
MicroBench...late.test:BENCHMARK_DILATE/512      348.01    352.00   1.1%
MicroBench...est:BM_DEL_DOT_VEC_2D_LAMBDA/2        0.25      0.25   1.1%
MultiSourc...s/ASC_Sequoia/IRSmk/IRSmk.test        0.72      0.72   1.1%
MultiSourc...mbolics-flt/Symbolics-flt.test        0.48      0.48   1.1%
MicroBench...w.test:BM_DEL_DOT_VEC_2D_RAW/2        0.25      0.25   1.1%
MicroBench...da.test:BM_IF_QUAD_LAMBDA/5001       12.87     13.02   1.1%
MicroBench...a.test:BM_IF_QUAD_LAMBDA/44217      114.65    115.91   1.1%
MicroBench...late.test:BENCHMARK_DILATE/256       88.93     89.91   1.1%
MultiSourc...nchmarks/NPB-serial/is/is.test        2.94      2.97   1.1%
MicroBench...c128UniformDivisor<__int128_t>        9.40      9.50   1.1%
MicroBench...ks.test:benchAutoVecForLoopTC3        1.54      1.56   1.1%
SingleSour...ce/Benchmarks/Misc/mandel.test        0.15      0.15   1.1%
MultiSourc...ProxyApps-C++/CLAMR/CLAMR.test        0.60      0.61   1.0%
MicroBench...st:BENCHMARK_exp_novec_double_       98.13     99.14   1.0%
MicroBench...128UniformDivisor<__uint128_t>        5.07      5.11   1.0%
Bitcode/si...md_ops_test_op_cmpeqps_40.test        0.01      0.01   1.0%
MicroBench...rIC2VW1BigLoopWithReductionTC2        3.45      3.48   0.9%
MicroBench...t:BM_PRESSURE_CALC_LAMBDA/5001        5.06      5.11   0.9%
SingleSour...ks/Shootout/Shootout-fib2.test        1.06      1.07   0.9%
MicroBench...st:BM_PRESSURE_CALC_LAMBDA/171        0.16      0.16   0.9%
MicroBench...ALambda.test:BM_FIR_LAMBDA/171        0.16      0.17   0.9%
MicroBench.../lcalsARaw.test:BM_FIR_RAW/171        0.16      0.17   0.9%
MicroBench...Lambda.test:BM_FIR_LAMBDA/5001        5.28      5.32   0.8%
MultiSourc.../Benchmarks/Bullet/bullet.test        1.55      1.56   0.8%
MicroBench...ARK_BILINEAR_INTERPOLATION/256     2011.02   2027.81   0.8%
MicroBench...st:BENCHMARK_expf_novec_float_       52.63     53.07   0.8%
MicroBench...BENCHMARK_atan_autovec_double_      118.24    119.19   0.8%
MultiSourc.../Applications/spiff/spiff.test        0.79      0.80   0.8%
MicroBench...:BENCHMARK_erff_autovec_float_       91.54     92.27   0.8%
MultiSourc...ctions-dbl/Reductions-dbl.test        1.73      1.74   0.8%
MicroBench...ForIC4VW4LoopWithReductionTC64        2.60      2.62   0.8%
MicroBench...rks.test:benchForIC2VW4LoopTC8        1.35      1.36   0.8%
MultiSource/Applications/siod/siod.test            0.70      0.70   0.7%
MicroBench...w.test:BM_DEL_DOT_VEC_2D_RAW/1       20.30     20.45   0.7%
MicroBench...est:BM_ENERGY_CALC_LAMBDA/5001       17.65     17.78   0.7%
SingleSour.../stencils/fdtd-2d/fdtd-2d.test        2.02      2.03   0.7%
MicroBench...a.test:BM_DISC_ORD_LAMBDA/5001       73.48     74.00   0.7%
MicroBench...est:benchAutoVecForBigLoopTC31        3.13      3.15   0.7%
MicroBench...calsARaw.test:BM_FIR_RAW/44217       46.82     47.15   0.7%
MicroBench...sic128SmallDivisor<__int128_t>        5.68      5.72   0.6%
SingleSour...s/Shootout/Shootout-sieve.test        1.49      1.50   0.6%
MicroBench...hForIC1VW4LoopWithReductionTC1        1.17      1.18   0.6%
MicroBench....test:BM_TRAP_INT_LAMBDA/44217       87.13     87.65   0.6%
MicroBench...st:BENCHMARK_erf_novec_double_      100.88    101.49   0.6%
MultiSourc...pansion-flt/Expansion-flt.test        0.63      0.64   0.6%
SingleSour...nchmarks/Misc/ReedSolomon.test        1.64      1.65   0.6%
MicroBench...Raw.test:BM_DISC_ORD_RAW/44217      653.08    656.91   0.6%
MicroBench...:BENCHMARK_sin_autovec_double_      282.63    284.28   0.6%
MicroBench...hForIC4VW1LoopWithReductionTC8        1.62      1.63   0.6%
MicroBench...hForIC1VW1LoopWithReductionTC4        1.33      1.34   0.6%
SingleSour...rks/CoyoteBench/huffbench.test        5.88      5.91   0.5%
MicroBench...a.test:BM_TRAP_INT_LAMBDA/5001        9.85      9.90   0.5%
SingleSour...ut-C++/Shootout-C++-hash2.test        1.02      1.02   0.5%
MicroBench...ForIC4VW4LoopWithReductionTC31        4.29      4.31   0.5%
SingleSour...t-C++/Shootout-C++-random.test        1.19      1.19   0.5%
MicroBench...hForIC1VW4LoopWithReductionTC2        1.53      1.54   0.5%
MultiSourc...ks/VersaBench/8b10b/8b10b.test        2.45      2.46   0.5%
MultiSourc...CI_Purple/SMG2000/smg2000.test        0.73      0.73   0.5%
MicroBench...MARK_BILINEAR_INTERPOLATION/64      125.72    126.27   0.4%
MicroBench...ForIC4VW1LoopWithReductionTC16        1.56      1.57   0.4%
MultiSourc...ks/McCat/12-IOtest/iotest.test        0.11      0.11   0.4%
MicroBench...:BENCHMARK_erf_autovec_double_      101.14    101.54   0.4%
MicroBench...BENCHMARK_asin_autovec_double_       48.92     49.11   0.4%
MicroBench...Raw.test:BM_TRAP_INT_RAW/44217       86.94     87.27   0.4%
MultiSourc...marks/7zip/7zip-benchmark.test        4.37      4.38   0.4%
MicroBench...t:BENCHMARK_cbrt_novec_double_      196.09    196.82   0.4%
MicroBench...BRaw.test:BM_TRAP_INT_RAW/5001        9.80      9.84   0.4%
MultiSourc...nch/fourinarow/fourinarow.test        0.78      0.78   0.4%
SingleSour...rks/CoyoteBench/almabench.test        2.27      2.28   0.4%
MicroBench...ForLoopWithReductionAutoVecTC2        1.53      1.53   0.3%
MicroBench...st:BENCHMARK_sin_novec_double_      283.31    284.27   0.3%
MultiSourc.../Trimaran/enc-rc4/enc-rc4.test        0.56      0.56   0.3%
MicroBench...hForIC2VW4LoopWithReductionTC2        1.53      1.54   0.3%
MicroBench...intersAllDisjointIncreasing/32     9215.63   9243.57   0.3%
MicroBench...C4VW4BigLoopWithReductionTC128       38.25     38.37   0.3%
MicroBench...da.test:BM_TRAP_INT_LAMBDA/171        0.34      0.34   0.3%
MicroBench...t:BENCHMARK_cbrtf_novec_float_      189.82    190.39   0.3%
MicroBench...st:BENCHMARK_sinf_novec_float_      127.09    127.46   0.3%
MicroBench...t:BENCHMARK_asin_novec_double_       49.02     49.16   0.3%
MultiSource/Benchmarks/PAQ8p/paq8p.test           14.21     14.25   0.3%
MicroBench...ambda.test:BM_FIR_LAMBDA/44217       47.24     47.37   0.3%
MultiSourc...lt/CrossingThresholds-flt.test        0.83      0.83   0.3%
MicroBench...rIC2VW4BigLoopWithReductionTC8        4.61      4.62   0.3%
MicroBench...rLoopWithReductionAutoVecTC128        5.54      5.55   0.3%
MicroBench...st:BM_TRIDIAG_ELIM_LAMBDA/5001        6.72      6.73   0.3%
MicroBench...LoopFrom_uint64_t_To_uint16_t_     1808.06   1812.48   0.2%
MicroBench...hForIC1VW1LoopWithReductionTC8        1.63      1.63   0.2%
MicroBench...ForIC4VW4LoopWithReductionTC16        1.55      1.55   0.2%
MultiSourc...ctions-flt/Reductions-flt.test        3.16      3.17   0.2%
MicroBench...ks.test:benchForIC1VW4LoopTC16        1.84      1.85   0.2%
MicroBench...orLoopWithReductionAutoVecTC64        3.12      3.12   0.2%
MicroBench...rIC2VW1BigLoopWithReductionTC3        5.10      5.11   0.2%
MicroBench...ks.test:benchForIC1VW4LoopTC15        2.44      2.44   0.2%
MicroBench...hForIC1VW4LoopWithReductionTC4        1.47      1.47   0.2%
MicroBench..._runtime_checks_needed<4, int>        1.38      1.39   0.2%
MicroBench...s.test:benchForIC2VW4LoopTC128        8.20      8.22   0.2%
MicroBench...test:benchAutoVecForBigLoopTC8        1.34      1.34   0.2%
MicroBench...VW16From_uint64_t_To_uint16_t_      992.38    994.15   0.2%
MicroBench...IC4VW4BigLoopWithReductionTC16        7.19      7.20   0.2%
MicroBench...rIC2VW1BigLoopWithReductionTC1        2.31      2.31   0.2%
MicroBench...ForIC4VW1LoopWithReductionTC31        2.67      2.68   0.2%
MicroBench...IC1VW1BigLoopWithReductionTC63       45.42     45.49   0.2%
MicroBench...BENCHMARK_cbrtf_autovec_float_      189.59    189.90   0.2%
MicroBench...hForIC4VW4LoopWithReductionTC2        1.54      1.54   0.2%
MicroBench...hVW8From_uint32_t_To_uint16_t_      653.82    654.86   0.2%
MicroBench...nLoopFrom_uint32_t_To_uint8_t_     1104.40   1106.06   0.2%
MicroBench...BigLoopWithReductionAutoVecTC8        4.85      4.86   0.1%
MicroBench...t:BM_FIND_FIRST_MIN_LAMBDA/171        0.07      0.07   0.1%
MicroBench...CRaw.test:BM_DISC_ORD_RAW/5001       73.83     73.94   0.1%
MicroBench...ForIC4VW1LoopWithReductionTC15        2.21      2.21   0.1%
MultiSourc...ing-flt/NodeSplitting-flt.test        1.10      1.10   0.1%
MicroBench...BENCHMARK_sinh_autovec_double_      216.16    216.46   0.1%
MicroBench...orLoopWithReductionAutoVecTC31        2.48      2.49   0.1%
MicroBench...ForIC1VW4LoopWithReductionTC31        2.48      2.49   0.1%
MultiSourc...rimaran/enc-3des/enc-3des.test        0.87      0.87   0.1%
MicroBench...hVW8From_uint64_t_To_uint32_t_      826.11    827.12   0.1%
MicroBench...sCRaw.test:BM_PIC_2D_RAW/44217      143.75    143.93   0.1%
MicroBench...st:BM_GEN_LIN_RECUR_LAMBDA/171        0.60      0.60   0.1%
MultiSourc...ks/McCat/01-qbsort/qbsort.test        0.05      0.05   0.1%
MicroBench...ARK_BILINEAR_INTERPOLATION/128      503.63    504.20   0.1%
MicroBench...hForIC4VW4LoopWithReductionTC8        1.98      1.98   0.1%
MicroBench...hForIC2VW4LoopWithReductionTC8        1.63      1.63   0.1%
MicroBench...rIC4VW1BigLoopWithReductionTC8        4.85      4.86   0.1%
MicroBench...rks.test:benchForIC1VW4LoopTC4        1.56      1.56   0.1%
MicroBench...orIC2VW4LoopWithReductionTC128        4.48      4.49   0.1%
MicroBench...ForIC4VW4LoopWithReductionTC63        4.88      4.88   0.1%
MicroBench...rks.test:benchForIC1VW4LoopTC8        1.34      1.34   0.1%
MicroBench...hForIC1VW1LoopWithReductionTC1        1.17      1.17   0.1%
MicroBench...Interchange.test:BENCHMARK_LI1       90.34     90.42   0.1%
MicroBench...t:BENCHMARK_sinh_novec_double_      216.50    216.69   0.1%
MicroBench...IC2VW4BigLoopWithReductionTC16        6.92      6.93   0.1%
MicroBench...thVW8From_uint16_t_To_uint8_t_      390.03    390.35   0.1%
MicroBench....test:BM_DISC_ORD_LAMBDA/44217      653.32    653.83   0.1%
MicroBench...est:benchAutoVecForBigLoopTC15        2.43      2.43   0.1%
MicroBench...est:benchForIC1VW4BigLoopTC127       47.83     47.86   0.1%
MicroBench...st:BENCHMARK_erff_novec_float_       92.21     92.27   0.1%
MicroBench...gLoopWithReductionAutoVecTC127       41.98     42.01   0.1%
MicroBench...:BENCHMARK_cosf_autovec_float_      130.14    130.23   0.1%
MicroBench...meChecks4PointersDBeforeA/1000   579910.87 580187.23   0.0%
MicroBench...st:BENCHMARK_cos_novec_double_      278.54    278.67   0.0%
MicroBench...sCRaw.test:BM_DISC_ORD_RAW/171        2.43      2.43   0.0%
MicroBench...hForIC1VW1LoopWithReductionTC3        1.59      1.59   0.0%
MicroBench...thVW8From_uint8_t_To_uint16_t_      335.19    335.29   0.0%
MicroBench...ks.test:benchForIC2VW4LoopTC15        3.06      3.06   0.0%
MicroBench...t:BENCHMARK_GAUSSIAN_BLUR/1024    21028.34  21034.62   0.0%
MicroBench...orIC1VW4LoopWithReductionTC128        5.54      5.54   0.0%
MicroBench...thVW8From_uint64_t_To_uint8_t_      942.53    942.75   0.0%
MicroBench...s.test:benchAutoVecForLoopTC64        4.39      4.39   0.0%
MicroBench...MARK_BILINEAR_INTERPOLATION/32       30.93     30.93   0.0%
SingleSour...ncils/jacobi-2d/jacobi-2d.test        1.70      1.70   0.0%
MicroBench...IC4VW1BigLoopWithReductionTC16        7.03      7.03   0.0%
SingleSour...ut-C++/Shootout-C++-lists.test        1.51      1.51   0.0%
Bitcode/si..._ops_test_op_blendvpd_300.test        0.01      0.01   0.0%
MultiSourc...enchmarks/Olden/em3d/em3d.test        0.92      0.92  -0.0%
MicroBench...s.test:benchAutoVecForLoopTC31        3.15      3.15  -0.0%
MicroBench...thVW8From_uint8_t_To_uint16_t_      208.73    208.70  -0.0%
MicroBench...IC4VW1BigLoopWithReductionTC63       23.06     23.06  -0.0%
MicroBench...IC4VW4BigLoopWithReductionTC64       20.11     20.10  -0.0%
MicroBench...HMARK_BICUBIC_INTERPOLATION/64      472.87    472.77  -0.0%
MicroBench...s.test:benchAutoVecForLoopTC15        2.44      2.44  -0.0%
MicroBench...timeChecks4PointersDBeforeA/32    16998.63  16994.67  -0.0%
MicroBench...igLoopWithReductionAutoVecTC63       23.05     23.04  -0.0%
MicroBench...rIC1VW4BigLoopWithReductionTC7        5.92      5.91  -0.0%
MicroBench...hVW8From_uint32_t_To_uint16_t_      561.04    560.85  -0.0%
MicroBench...IC1VW4BigLoopWithReductionTC16        7.01      7.01  -0.0%
MicroBench...est:BM_GEN_LIN_RECUR_RAW/44217      154.33    154.27  -0.0%
MicroBench...orIC2VW4LoopWithReductionTC127        5.35      5.35  -0.0%
MicroBench...orIC1VW1LoopWithReductionTC127       15.28     15.27  -0.0%
MicroBench....test:benchForIC4VW4BigLoopTC2        4.35      4.34  -0.0%
MicroBench...C2VW1BigLoopWithReductionTC128       70.52     70.49  -0.0%
MicroBench...ForIC1VW4LoopWithReductionTC16        1.57      1.56  -0.0%
MicroBench...est:benchAutoVecForBigLoopTC63        4.85      4.84  -0.0%
MicroBench...tersAllDisjointIncreasing/1000   272426.58 272284.35  -0.1%
MicroBench...LoopFrom_uint32_t_To_uint16_t_      943.00    942.46  -0.1%
MicroBench...st:BENCHMARK_cosf_novec_float_      129.90    129.83  -0.1%
MicroBench...est:BENCHMARK_HARRIS/1024/1024     5239.66   5236.28  -0.1%
MicroBench...ForLoopWithReductionAutoVecTC1        1.18      1.18  -0.1%
MicroBench...:BENCHMARK_cos_autovec_double_      277.58    277.40  -0.1%
MicroBench...r_runtime_checks_fail<16, int>        3.75      3.75  -0.1%
MicroBench...rks.test:benchForIC4VW4LoopTC7        2.36      2.36  -0.1%
MicroBench...hForIC1VW1LoopWithReductionTC2        1.55      1.55  -0.1%
MicroBench....test:BM_GEN_LIN_RECUR_RAW/171        0.60      0.60  -0.1%
MicroBench....test:BM_INNER_PROD_LAMBDA/171        0.25      0.25  -0.1%
MicroBench...st:BENCHMARK_boxBlurKernel/256      131.53    131.42  -0.1%
MicroBench...C4VW1BigLoopWithReductionTC127       42.22     42.18  -0.1%
MicroBench...ntime_checks_needed<3, double>        2.51      2.51  -0.1%
MicroBench...ForLoopWithReductionAutoVecTC3        1.60      1.60  -0.1%
MicroBench...ks.test:benchForIC2VW4LoopTC31        3.57      3.56  -0.1%
MicroBench...igLoopWithReductionAutoVecTC32       11.64     11.63  -0.1%
MicroBench...s.test:benchAutoVecForLoopTC16        1.79      1.79  -0.1%
MicroBench...ks.test:benchForIC2VW4LoopTC64        4.01      4.00  -0.1%
MicroBench...da.test:BM_DISC_ORD_LAMBDA/171        2.42      2.42  -0.1%
MicroBench...thVW8From_uint64_t_To_uint8_t_     1122.26   1121.04  -0.1%
SingleSour...mple_types_loop_invariant.test        0.65      0.65  -0.1%
SingleSour.../Benchmarks/Misc/oourafft.test        1.12      1.12  -0.1%
MicroBench...t:BENCHMARK_sinhf_novec_float_      185.59    185.39  -0.1%
MicroBench...:BM_FIND_FIRST_MIN_LAMBDA/5001        1.73      1.72  -0.1%
MicroBench...imeChecks4PointersDAfterA/1000   273484.24 273126.55  -0.1%
Bitcode/Be...s/Halide/blur/halide_blur.test        1.18      1.18  -0.1%
MicroBench....test:BM_ENERGY_CALC_RAW/44217      178.47    178.23  -0.1%
MicroBench...est:benchAutoVecForBigLoopTC16        1.80      1.79  -0.1%
MicroBench...test:benchForIC4VW4BigLoopTC16        7.89      7.88  -0.1%
MicroBench...ntime_checks_needed<2, double>        2.52      2.52  -0.1%
MicroBench...BENCHMARK_sinhf_autovec_float_      186.11    185.85  -0.1%
MicroBench...t:BM_GEN_LIN_RECUR_LAMBDA/5001       17.43     17.40  -0.1%
MicroBench...runtime_checks_pass<2, double>        1.68      1.67  -0.1%
MicroBench...st:BENCHMARK_boxBlurKernel/512      542.69    541.87  -0.2%
MicroBench...hVW8From_uint64_t_To_uint32_t_      977.63    976.13  -0.2%
MicroBench...test:benchForIC4VW4BigLoopTC64       23.66     23.62  -0.2%
MicroBench...rIC2VW4BigLoopWithReductionTC3        4.33      4.33  -0.2%
MultiSourc...bl/CrossingThresholds-dbl.test        1.14      1.13  -0.2%
MicroBench...est:benchAutoVecForBigLoopTC32        2.50      2.50  -0.2%
MicroBench...hVW16From_uint64_t_To_uint8_t_      965.96    964.37  -0.2%
MultiSourc...C/Packing-flt/Packing-flt.test        1.27      1.27  -0.2%
MicroBench...runtime_checks_fail<4, double>        2.61      2.61  -0.2%
MicroBench...t:BENCHMARK_boxBlurKernel/1024     2213.67   2209.90  -0.2%
MicroBench...ForIC1VW4LoopWithReductionTC15        2.00      2.00  -0.2%
MicroBench...MARK_BICUBIC_INTERPOLATION/256     8166.95   8152.81  -0.2%
MicroBench...LoopFrom_uint32_t_To_uint16_t_      902.33    900.76  -0.2%
MicroBench...VW16From_uint16_t_To_uint32_t_     1364.86   1362.48  -0.2%
MicroBench...hForIC2VW1LoopWithReductionTC8        1.64      1.63  -0.2%
MicroBench...r_runtime_checks_pass<16, int>        2.90      2.89  -0.2%
MicroBench...HMARK_ANISTROPIC_DIFFUSION/128     4846.62   4837.88  -0.2%
MultiSourc...arks/mafft/pairlocalalign.test        8.26      8.25  -0.2%
MicroBench...thVW8From_uint16_t_To_uint8_t_      211.02    210.61  -0.2%
MultiSourc...s/ASC_Sequoia/AMGmk/AMGmk.test        2.58      2.58  -0.2%
MicroBench...est:benchForIC1VW4BigLoopTC128       46.81     46.71  -0.2%
SingleSour...ncils/seidel-2d/seidel-2d.test       20.94     20.89  -0.2%
MicroBench...ks.test:benchForIC4VW4LoopTC64        3.86      3.85  -0.2%
MultiSourc...C/Packing-dbl/Packing-dbl.test        1.34      1.34  -0.2%
MicroBench...rks.test:benchForIC1VW4LoopTC7        1.98      1.97  -0.2%
MultiSourc...cations/hexxagon/hexxagon.test        3.41      3.40  -0.2%
MicroBench...LoopFrom_uint64_t_To_uint32_t_     1022.97   1020.77  -0.2%
MicroBench...IC2VW4BigLoopWithReductionTC64       20.23     20.19  -0.2%
MultiSourc...ences-dbl/Recurrences-dbl.test        1.74      1.73  -0.2%
MicroBench...ntimeChecks4PointersDAfterA/32     9283.91   9263.51  -0.2%
MicroBench...CLambda.test:BM_ADI_LAMBDA/171        1.93      1.92  -0.2%
MicroBench...test:BM_GEN_LIN_RECUR_RAW/5001       17.42     17.39  -0.2%
MicroBench...:BENCHMARK_expf_autovec_float_       53.21     53.09  -0.2%
MicroBench...lsCRaw.test:BM_PIC_1D_RAW/5001       21.66     21.61  -0.2%
MicroBench...BM_FIND_FIRST_MIN_LAMBDA/44217       15.11     15.07  -0.2%
MicroBench...ForIC4VW4LoopWithReductionTC15        3.24      3.23  -0.2%
MicroBench...rIC1VW4BigLoopWithReductionTC8        4.85      4.84  -0.2%
MicroBench...est:BENCHMARK_HARRIS/2048/2048    16876.35  16836.49  -0.2%
MicroBench...nLoopFrom_uint8_t_To_uint16_t_      197.13    196.65  -0.2%
MicroBench...igLoopWithReductionAutoVecTC31       12.63     12.60  -0.2%
MicroBench....test:benchAutoVecForLoopTC127        9.31      9.29  -0.2%
MicroBench...MARK_BILINEAR_INTERPOLATION/16        7.61      7.59  -0.2%
MicroBench...lsBRaw.test:BM_IF_QUAD_RAW/171        0.45      0.44  -0.2%
MicroBench...est:BM_IMP_HYDRO_2D_LAMBDA/171        2.73      2.72  -0.3%
MicroBench...LoopFrom_uint64_t_To_uint32_t_     1793.98   1789.34  -0.3%
MicroBench...hForIC2VW4LoopWithReductionTC3        1.61      1.61  -0.3%
MicroBench.../lcalsCRaw.test:BM_ADI_RAW/171        1.96      1.95  -0.3%
MicroBench...orLoopWithReductionAutoVecTC15        2.00      2.00  -0.3%
SingleSour...hootout/Shootout-heapsort.test        1.21      1.21  -0.3%
MicroBench...s.test:benchForIC1VW4LoopTC127        9.41      9.38  -0.3%
MicroBench...ForIC1VW1LoopWithReductionTC31        3.36      3.35  -0.3%
MultiSource/Applications/lua/lua.test              5.90      5.89  -0.3%
MicroBench...:BENCHMARK_sinf_autovec_float_      128.65    128.31  -0.3%
MicroBench...hVW16From_uint16_t_To_uint8_t_      195.38    194.84  -0.3%
MicroBench...t:BM_DIFF_PREDICT_LAMBDA/44217      262.04    261.31  -0.3%
MicroBench...C1VW4BigLoopWithReductionTC127       42.05     41.93  -0.3%
MicroBench...ks.test:benchAutoVecForLoopTC4        1.56      1.56  -0.3%
MicroBench...VW16From_uint64_t_To_uint32_t_     1057.26   1054.32  -0.3%
MicroBench...IC2VW1BigLoopWithReductionTC15        9.31      9.28  -0.3%
MicroBench...MemCmp<2, LessThanZero, First>     1045.04   1042.07  -0.3%
MicroBench...hForIC4VW1LoopWithReductionTC1        1.19      1.19  -0.3%
MicroBench...st:BENCHMARK_GAUSSIAN_BLUR/256     1254.58   1250.88  -0.3%
MicroBench...orLoopWithReductionAutoVecTC16        1.57      1.56  -0.3%
MicroBench...r_runtime_checks_pass<16, int>        3.81      3.80  -0.3%
MicroBench...a.test:BM_HYDRO_1D_LAMBDA/5001        1.34      1.34  -0.3%
MicroBench...s.test:benchAutoVecForLoopTC63        4.84      4.83  -0.3%
MicroBench...ks.test:benchForIC4VW4LoopTC63        6.91      6.89  -0.3%
MicroBench...mCmp<64, GreaterThanZero, Mid>      114.15    113.79  -0.3%
MicroBench...IC2VW4BigLoopWithReductionTC31       13.50     13.46  -0.3%
MicroBench...ForIC4VW1LoopWithReductionTC64        3.13      3.12  -0.3%
MicroBench...ForIC1VW4LoopWithReductionTC64        3.13      3.12  -0.3%
MicroBench...ntime_checks_needed<4, double>        2.56      2.55  -0.3%
MicroBench...IC4VW4BigLoopWithReductionTC32       11.51     11.48  -0.3%
MicroBench...st:BENCHMARK_GAUSSIAN_BLUR/128      295.89    294.93  -0.3%
MicroBench...ForIC4VW1LoopWithReductionTC63        3.67      3.66  -0.3%
MicroBench...ForIC2VW1LoopWithReductionTC31        5.06      5.05  -0.3%
MicroBench...hForIC2VW1LoopWithReductionTC3        1.64      1.64  -0.3%
MicroBench....test:benchForIC2VW4BigLoopTC2        4.36      4.35  -0.3%
MicroBench...tersAllDisjointDecreasing/1000   272675.61 271760.73  -0.3%
MicroBench...IC1VW4BigLoopWithReductionTC63       23.01     22.93  -0.3%
MicroBench...IC4VW1BigLoopWithReductionTC64       21.02     20.95  -0.3%
MicroBench...hForIC1VW4LoopWithReductionTC3        1.61      1.60  -0.3%
MicroBench...est:benchForIC4VW4BigLoopTC127       52.93     52.75  -0.3%
MicroBench...HMARK_ANISTROPIC_DIFFUSION/256    20215.46  20144.85  -0.3%
MicroBench...meChecks4PointersDEqualsA/1000   272955.96 271993.66  -0.4%
MicroBench...hVW16From_uint8_t_To_uint16_t_      204.56    203.83  -0.4%
MicroBench...rks.test:benchForIC2VW4LoopTC7        2.36      2.35  -0.4%
MicroBench...ks.test:benchForIC2VW4LoopTC16        1.72      1.72  -0.4%
MicroBench...hVW16From_uint64_t_To_uint8_t_      946.64    943.24  -0.4%
MicroBench...IC4VW1BigLoopWithReductionTC15        8.60      8.57  -0.4%
MicroBench...da.test:BM_PIC_1D_LAMBDA/44217      208.60    207.85  -0.4%
MicroBench...C2VW4BigLoopWithReductionTC127       42.24     42.09  -0.4%
MicroBench...st:BENCHMARK_boxBlurKernel/128       30.85     30.74  -0.4%
MicroBench...M_MemCmp<4, LessThanZero, Mid>      667.60    665.13  -0.4%
MicroBench...C4VW4BigLoopWithReductionTC127       46.44     46.26  -0.4%
MicroBench...BENCHMARK_cbrt_autovec_double_      197.26    196.52  -0.4%
MicroBench...hVW16From_uint32_t_To_uint8_t_      442.40    440.74  -0.4%
MicroBench...IC1VW4BigLoopWithReductionTC64       21.00     20.92  -0.4%
MicroBench...IC1VW4BigLoopWithReductionTC31       12.63     12.59  -0.4%
MicroBench...ForIC2VW4LoopWithReductionTC63        3.74      3.73  -0.4%
MicroBench...BigLoopWithReductionAutoVecTC7        6.07      6.05  -0.4%
MicroBench...rIC4VW4BigLoopWithReductionTC4        4.98      4.97  -0.4%
MicroBench...test:BM_INNER_PROD_LAMBDA/5001        7.82      7.79  -0.4%
MicroBench...C2VW1BigLoopWithReductionTC127       70.27     70.00  -0.4%
MicroBench...ks.test:benchForIC4VW4LoopTC15        4.05      4.03  -0.4%
MicroBench...or_runtime_checks_fail<4, int>        1.68      1.67  -0.4%
MicroBench...rIC2VW1BigLoopWithReductionTC8        6.37      6.34  -0.4%
MicroBench...hForIC2VW4LoopWithReductionTC4        1.65      1.64  -0.4%
MicroBench...C1VW1BigLoopWithReductionTC128       91.60     91.24  -0.4%
MicroBench...gLoopWithReductionAutoVecTC128       40.71     40.54  -0.4%
MicroBench...VW16From_uint64_t_To_uint16_t_      889.04    885.48  -0.4%
MicroBench...ForLoopWithReductionAutoVecTC7        1.77      1.76  -0.4%
MicroBench...s.test:benchAutoVecForLoopTC32        2.51      2.50  -0.4%
MicroBench...ForIC2VW4LoopWithReductionTC64        2.59      2.58  -0.4%
MicroBench...Cmp<64, GreaterThanZero, Last>      111.39    110.93  -0.4%
MicroBench...BigLoopWithReductionAutoVecTC4        3.89      3.88  -0.4%
MicroBench...IC4VW4BigLoopWithReductionTC15       11.38     11.33  -0.4%
MicroBench...IC2VW1BigLoopWithReductionTC16        9.67      9.63  -0.4%
MicroBench...ForIC1VW1LoopWithReductionTC16        1.76      1.75  -0.4%
MicroBench..._runtime_checks_needed<4, int>        1.35      1.35  -0.4%
MicroBench...thVW8From_uint32_t_To_uint8_t_      449.86    447.92  -0.4%
MicroBench...da.test:BM_HYDRO_1D_LAMBDA/171        0.05      0.05  -0.4%
MicroBench...ForIC1VW1LoopWithReductionTC64        6.17      6.14  -0.4%
MicroBench...test:benchAutoVecForBigLoopTC4        1.57      1.56  -0.4%
MicroBench...hVW16From_uint8_t_To_uint16_t_      286.26    284.97  -0.5%
MicroBench....test:benchForIC4VW4BigLoopTC7        7.57      7.54  -0.5%
MicroBench...hForIC2VW1LoopWithReductionTC4        1.35      1.35  -0.5%
MicroBench...hForIC4VW4LoopWithReductionTC3        1.60      1.60  -0.5%
SingleSour...hootout/Shootout-methcall.test        1.78      1.77  -0.5%
Bitcode/si...imd_ops_test_op_divpd_193.test        0.01      0.01  -0.5%
MicroBench...ForIC2VW4LoopWithReductionTC16        1.52      1.51  -0.5%
MicroBench...test:benchForIC1VW4BigLoopTC63       25.77     25.65  -0.5%
MicroBench...hForIC1VW4LoopWithReductionTC7        1.77      1.76  -0.5%
MicroBench...igLoopWithReductionAutoVecTC64       21.03     20.93  -0.5%
MicroBench...thVW8From_uint32_t_To_uint8_t_      591.20    588.43  -0.5%
MicroBench...intersAllDisjointDecreasing/32     9268.36   9224.84  -0.5%
MicroBench...st:BENCHMARK_GAUSSIAN_BLUR/512     5180.87   5156.52  -0.5%
Bitcode/si...imd_ops_test_op_minpd_211.test        0.01      0.01  -0.5%
MicroBench...timeChecks4PointersDEqualsA/32     9289.42   9245.56  -0.5%
MultiSource/Benchmarks/Olden/bh/bh.test            0.66      0.66  -0.5%
MicroBench...IC4VW1BigLoopWithReductionTC32       11.70     11.64  -0.5%
MicroBench....test:BM_PLANCKIAN_LAMBDA/5001       22.54     22.43  -0.5%
MicroBench...IC2VW1BigLoopWithReductionTC64       35.91     35.73  -0.5%
MicroBench...nLoopFrom_uint32_t_To_uint8_t_     1051.49   1046.38  -0.5%
MicroBench...nLoopFrom_uint16_t_To_uint8_t_      205.79    204.79  -0.5%
MicroBench...C2VW4BigLoopWithReductionTC128       38.58     38.39  -0.5%
MicroBench...ForIC2VW1LoopWithReductionTC64        8.37      8.33  -0.5%
Bitcode/si...imd_ops_test_op_addps_165.test        0.01      0.01  -0.5%
MicroBench...C4VW1BigLoopWithReductionTC128       40.78     40.58  -0.5%
MicroBench...hForIC2VW4LoopWithReductionTC1        1.18      1.17  -0.5%
MicroBench...runtime_checks_fail<2, double>        2.44      2.43  -0.5%
MicroBench...lsCRaw.test:BM_PIC_2D_RAW/5001       15.08     15.01  -0.5%
MicroBench...t:BM_TRIDIAG_ELIM_LAMBDA/44217       60.28     59.97  -0.5%
MicroBench...ks.test:benchAutoVecForLoopTC8        1.35      1.34  -0.5%
MicroBench...hForIC4VW4LoopWithReductionTC1        1.18      1.17  -0.5%
MicroBench...igLoopWithReductionAutoVecTC16        7.05      7.02  -0.5%
MicroBench...ks.test:benchAutoVecForLoopTC7        1.98      1.97  -0.5%
MicroBench...LoopFrom_uint64_t_To_uint16_t_     1304.68   1298.00  -0.5%
MicroBench...t:BM_MemCmp<15, EqZero, First>      154.09    153.30  -0.5%
MicroBench..._MemCmp<5, LessThanZero, Last>      452.53    450.20  -0.5%
MultiSourc...netbench-url/netbench-url.test        1.92      1.91  -0.5%
SingleSour...enchmarks/SmallPT/smallpt.test        2.68      2.66  -0.5%
MicroBench...hForIC2VW1LoopWithReductionTC2        1.51      1.50  -0.5%
MicroBench....test:benchForIC1VW4BigLoopTC4        3.83      3.81  -0.5%
MicroBench...test:benchForIC4VW4BigLoopTC32       12.22     12.15  -0.5%
MicroBench....test:benchForIC4VW4BigLoopTC8        8.32      8.28  -0.5%
MicroBench...est:benchForIC2VW4BigLoopTC127       48.95     48.68  -0.6%
MicroBench...s.test:benchForIC4VW4LoopTC127       10.43     10.37  -0.6%
MicroBench...MARK_BICUBIC_INTERPOLATION/128     1985.86   1974.79  -0.6%
MicroBench...sBRaw.test:BM_TRAP_INT_RAW/171        0.34      0.33  -0.6%
MicroBench...thVW8From_uint8_t_To_uint32_t_      665.28    661.56  -0.6%
MultiSourc...itBench/uuencode/uuencode.test        0.01      0.01  -0.6%
MicroBench...hForIC4VW1LoopWithReductionTC3        1.63      1.63  -0.6%
MicroBench...rIC1VW1BigLoopWithReductionTC4        4.79      4.76  -0.6%
MicroBench...st:BM_IMP_HYDRO_2D_LAMBDA/5001       87.60     87.11  -0.6%
MicroBench...orIC4VW1LoopWithReductionTC127        9.15      9.09  -0.6%
MicroBench...s.test:benchForIC4VW4LoopTC128        7.16      7.12  -0.6%
MicroBench...VW16From_uint64_t_To_uint32_t_     1547.56   1538.80  -0.6%
MicroBench...st:BM_DIFF_PREDICT_LAMBDA/5001       20.08     19.97  -0.6%
MicroBench...rIC4VW4BigLoopWithReductionTC8        7.11      7.07  -0.6%
MicroBench...test:BM_PLANCKIAN_LAMBDA/44217      199.66    198.52  -0.6%
MicroBench...da.test:BM_PIC_2D_LAMBDA/44217      143.40    142.57  -0.6%
MicroBench...hVW8From_uint64_t_To_uint16_t_      966.04    960.44  -0.6%
MicroBench...est:BM_INNER_PROD_LAMBDA/44217       69.44     69.03  -0.6%
MicroBench...rks.test:benchForIC4VW4LoopTC8        2.57      2.55  -0.6%
MicroBench...orIC1VW1LoopWithReductionTC128       15.18     15.09  -0.6%
MicroBench....test:benchAutoVecForLoopTC128        8.61      8.56  -0.6%
MicroBench...IC2VW1BigLoopWithReductionTC32       18.60     18.49  -0.6%
MicroBench....test:BM_TRIDIAG_ELIM_RAW/5001        6.74      6.70  -0.6%
MicroBench...hForIC4VW1LoopWithReductionTC2        1.58      1.57  -0.6%
MicroBench...rIC4VW4BigLoopWithReductionTC7        6.88      6.84  -0.6%
MicroBench...ForIC1VW4LoopWithReductionTC32        2.03      2.02  -0.6%
SingleSour...hmarks/Misc-C++/Large/ray.test        1.49      1.48  -0.6%
SingleSour...Adobe-C++/functionobjects.test        1.80      1.79  -0.6%
MicroBench...est:benchForIC2VW4BigLoopTC128       45.02     44.74  -0.6%
MicroBench...HMARK_BICUBIC_INTERPOLATION/32      107.51    106.84  -0.6%
MicroBench...MemCmp<64, LessThanZero, Last>      108.72    108.05  -0.6%
MicroBench...rIC1VW1BigLoopWithReductionTC8        7.10      7.06  -0.6%
MicroBench....test:benchForIC2VW4BigLoopTC7        7.56      7.51  -0.6%
MicroBench...est:benchForIC4VW4BigLoopTC128       45.22     44.93  -0.6%
MicroBench...IC2VW1BigLoopWithReductionTC63       35.47     35.25  -0.6%
MicroBench...rIC2VW4BigLoopWithReductionTC2        3.48      3.46  -0.6%
MultiSourc...t/StatementReordering-flt.test        1.14      1.14  -0.7%
MicroBench...IC4VW4BigLoopWithReductionTC63       26.75     26.58  -0.7%
MicroBench...nLoopFrom_uint16_t_To_uint8_t_      371.52    369.07  -0.7%
MultiSourc...lications/sqlite3/sqlite3.test        0.98      0.97  -0.7%
MicroBench...or_runtime_checks_fail<4, int>        1.56      1.55  -0.7%
MicroBench...hForIC2VW1LoopWithReductionTC1        1.21      1.20  -0.7%
MicroBench...or_runtime_checks_pass<4, int>        1.60      1.59  -0.7%
MicroBench...test:benchForIC2VW4BigLoopTC16        7.98      7.93  -0.7%
MicroBench...orIC4VW4LoopWithReductionTC128        4.20      4.17  -0.7%
MicroBench...BigLoopWithReductionAutoVecTC2        3.41      3.38  -0.7%
MicroBench...test:benchForIC2VW4BigLoopTC64       23.68     23.52  -0.7%
MicroBench...ks.test:benchForIC1VW4LoopTC31        3.16      3.14  -0.7%
MicroBench...test:benchForIC4VW4BigLoopTC31       20.41     20.27  -0.7%
MicroBench...hVW8From_uint64_t_To_uint16_t_      863.88    857.84  -0.7%
MicroBench...igLoopWithReductionAutoVecTC15        8.61      8.55  -0.7%
MicroBench...nLoopFrom_uint64_t_To_uint8_t_     3130.04   3107.98  -0.7%
MicroBench...rIC2VW1BigLoopWithReductionTC4        4.73      4.70  -0.7%
MultiSourc...oxyApps-C++/miniFE/miniFE.test        1.33      1.32  -0.7%
MicroBench...st:benchAutoVecForBigLoopTC127        9.27      9.21  -0.7%
MicroBench...ForIC2VW1LoopWithReductionTC63        8.23      8.17  -0.7%
MicroBench...ForIC1VW1LoopWithReductionTC63        6.48      6.44  -0.7%
MicroBench...nLoopFrom_uint64_t_To_uint8_t_     4199.44   4169.26  -0.7%
MicroBench...:BM_GEN_LIN_RECUR_LAMBDA/44217      155.08    153.96  -0.7%
MicroBench...hVW16From_uint32_t_To_uint8_t_      492.34    488.79  -0.7%
MicroBench...hForIC4VW4LoopWithReductionTC7        1.78      1.76  -0.7%
MicroBench...hForIC1VW4LoopWithReductionTC8        1.63      1.62  -0.7%
MicroBench...rIC2VW4BigLoopWithReductionTC4        5.01      4.97  -0.7%
MicroBench...runtime_checks_pass<4, double>        1.78      1.77  -0.7%
MicroBench...s.test:benchForIC1VW4LoopTC128        8.73      8.66  -0.7%
MicroBench...rIC1VW1BigLoopWithReductionTC2        3.36      3.34  -0.7%
MicroBench...IC2VW4BigLoopWithReductionTC15        9.90      9.82  -0.7%
MicroBench...Cmp<64, GreaterThanZero, None>      103.75    102.96  -0.8%
MicroBench...rIC1VW1BigLoopWithReductionTC3        4.22      4.19  -0.8%
MicroBench...sBRaw.test:BM_IF_QUAD_RAW/5001       12.99     12.90  -0.8%
SingleSour...d-warshall/floyd-warshall.test       15.26     15.14  -0.8%
MicroBench...IC4VW1BigLoopWithReductionTC31       12.69     12.60  -0.8%
MicroBench...a.test:BM_PLANCKIAN_LAMBDA/171        0.77      0.76  -0.8%
MicroBench...or_runtime_checks_pass<4, int>        1.69      1.68  -0.8%
MicroBench...ForLoopWithReductionAutoVecTC8        1.63      1.62  -0.8%
MicroBench...hVW16From_uint8_t_To_uint64_t_     1779.41   1765.69  -0.8%
MicroBench...IC1VW4BigLoopWithReductionTC32       11.66     11.57  -0.8%
MicroBench...ForIC1VW4LoopWithReductionTC63        3.54      3.51  -0.8%
MicroBench....test:benchForIC1VW4BigLoopTC2        4.35      4.32  -0.8%
MicroBench...hForIC2VW1LoopWithReductionTC7        1.75      1.74  -0.8%
MicroBench...IC2VW1BigLoopWithReductionTC31       18.28     18.14  -0.8%
MicroBench...st:benchAutoVecForBigLoopTC128        8.76      8.69  -0.8%
MicroBench...test:benchForIC4VW4BigLoopTC15       15.51     15.39  -0.8%
MicroBench...IC2VW4BigLoopWithReductionTC63       23.94     23.75  -0.8%
MicroBench...BigLoopWithReductionAutoVecTC3        4.35      4.32  -0.8%
MicroBench...test:benchForIC1VW4BigLoopTC32       12.66     12.56  -0.8%
MicroBench...test:benchAutoVecForBigLoopTC7        1.98      1.96  -0.8%
MicroBench...t:BENCHMARK_atan_novec_double_      120.09    119.12  -0.8%
MicroBench...emCmp<8, GreaterThanZero, Mid>      396.59    393.39  -0.8%
MicroBench...test:benchForIC1VW4BigLoopTC64       24.46     24.26  -0.8%
MicroBench...hForIC1VW1LoopWithReductionTC7        1.66      1.65  -0.8%
MicroBench...HMARK_BICUBIC_INTERPOLATION/16       21.64     21.46  -0.8%
MicroBench...thVW8From_uint8_t_To_uint64_t_     1173.78   1164.29  -0.8%
MicroBench...st:BM_FIND_FIRST_MIN_RAW/44217       15.01     14.89  -0.8%
MicroBench....test:BM_DIFF_PREDICT_RAW/5001       20.18     20.01  -0.8%
MicroBench...CHMARK_ANISTROPIC_DIFFUSION/64     1117.83   1108.63  -0.8%
MicroBench...test:benchForIC1VW4BigLoopTC16        8.34      8.27  -0.8%
MicroBench...orLoopWithReductionAutoVecTC63        3.54      3.51  -0.8%
MicroBench...rIC4VW1BigLoopWithReductionTC4        3.91      3.88  -0.8%
Bitcode/si...d_ops_test_op_cmpltps_137.test        0.01      0.01  -0.8%
MicroBench...IC1VW1BigLoopWithReductionTC15       11.65     11.55  -0.8%
MicroBench....test:benchForIC1VW4BigLoopTC7        6.71      6.65  -0.9%
MicroBench...bda.test:BM_PIC_2D_LAMBDA/5001       15.21     15.08  -0.9%
MultiSourc...tions/lambda-0.1.3/lambda.test        1.41      1.40  -0.9%
MicroBench...IC1VW1BigLoopWithReductionTC16       12.38     12.27  -0.9%
MicroBench...IC1VW1BigLoopWithReductionTC32       25.15     24.92  -0.9%
MicroBench...ForIC2VW4LoopWithReductionTC15        2.75      2.72  -0.9%
MicroBench...CRaw.test:BM_PLANCKIAN_RAW/171        0.77      0.76  -0.9%
MicroBench..._MemCmp<64, LessThanZero, Mid>      109.49    108.52  -0.9%
MicroBench...test:BM_TRIDIAG_ELIM_RAW/44217       60.37     59.83  -0.9%
MicroBench...IC2VW4BigLoopWithReductionTC32       11.44     11.33  -0.9%
MicroBench....test:benchForIC1VW4BigLoopTC8        5.61      5.55  -0.9%
MicroBench...ForIC4VW4LoopWithReductionTC32        1.79      1.78  -0.9%
MicroBench...C1VW4BigLoopWithReductionTC128       40.70     40.33  -0.9%
MultiSourc.../Applications/SPASS/SPASS.test        3.27      3.24  -0.9%
MicroBench...ForIC2VW1LoopWithReductionTC16        2.68      2.65  -0.9%
MicroBench...IC1VW1BigLoopWithReductionTC31       24.53     24.31  -0.9%
MicroBench...rIC2VW1BigLoopWithReductionTC7        6.12      6.06  -0.9%
MicroBench...orIC4VW4LoopWithReductionTC127        6.22      6.16  -0.9%
MicroBench...runtime_checks_needed<16, int>        1.60      1.59  -0.9%
MicroBench...MemCmp<32, LessThanZero, None>      191.63    189.81  -0.9%
MicroBench...MemCmp<16, LessThanZero, None>      231.21    228.99  -1.0%
Bitcode/si..._ops_test_op_packuswb_204.test        0.01      0.01  -1.0%
MicroBench...hVW8From_uint16_t_To_uint32_t_     1378.11   1364.81  -1.0%
MultiSourc...arching-dbl/Searching-dbl.test        1.57      1.56  -1.0%
MicroBench...rIC1VW4BigLoopWithReductionTC4        3.90      3.86  -1.0%
MicroBench...est:BM_FIND_FIRST_MIN_RAW/5001        1.70      1.69  -1.0%
MicroBench...r_runtime_checks_fail<16, int>        2.93      2.90  -1.0%
MicroBench.../lcalsCRaw.test:BM_EOS_RAW/171        0.13      0.13  -1.0%
MicroBench...est:benchAutoVecForBigLoopTC64        4.42      4.37  -1.0%
MicroBench...test:benchForIC2VW4BigLoopTC15       11.07     10.96  -1.0%
MicroBench...est:BM_TRIDIAG_ELIM_LAMBDA/171        0.17      0.17  -1.0%
MicroBench...MemCmp<15, LessThanZero, None>      247.99    245.52  -1.0%
MicroBench...runtime_checks_fail<3, double>        2.53      2.50  -1.0%
MicroBench...rIC4VW4BigLoopWithReductionTC3        4.36      4.32  -1.0%
MicroBench...ForLoopWithReductionAutoVecTC4        1.47      1.46  -1.0%
MicroBench...rIC1VW1BigLoopWithReductionTC7        6.25      6.19  -1.0%
MicroBench...Cmp<32, GreaterThanZero, Last>      168.71    167.01  -1.0%
MicroBench...IC1VW4BigLoopWithReductionTC15        8.61      8.53  -1.0%
MicroBench....test:benchForIC4VW4BigLoopTC4        5.67      5.61  -1.0%
MicroBench...alsCRaw.test:BM_PIC_1D_RAW/171        0.62      0.62  -1.0%
MicroBench...test:BM_FIND_FIRST_MIN_RAW/171        0.07      0.06  -1.0%
MultiSourc...ing-dbl/Equivalencing-dbl.test        0.43      0.42  -1.0%
MicroBench...IC1VW1BigLoopWithReductionTC64       44.64     44.18  -1.0%
MicroBench...est:BM_MemCmp<4, EqZero, Last>      476.21    471.24  -1.0%
MicroBench...rIC4VW1BigLoopWithReductionTC3        4.36      4.31  -1.0%
MicroBench...mCmp<2, GreaterThanZero, None>     1335.92   1321.93  -1.0%
SingleSour.../stencils/heat-3d/heat-3d.test        1.38      1.37  -1.1%
MicroBench...est:BM_MemCmp<64, EqZero, Mid>      112.07    110.87  -1.1%
MicroBench...st:BM_MemCmp<63, EqZero, None>      112.74    111.54  -1.1%
MicroBench...Cmp<32, GreaterThanZero, None>      200.62    198.47  -1.1%
MicroBench...hVW16From_uint8_t_To_uint64_t_     1167.15   1154.62  -1.1%
MicroBench..._MemCmp<31, LessThanZero, Mid>      226.92    224.47  -1.1%
MicroBench...test:benchForIC2VW4BigLoopTC63       27.65     27.35  -1.1%
MicroBench...emCmp<16, LessThanZero, First>      179.83    177.88  -1.1%
MicroBench...mbda.test:BM_PIC_1D_LAMBDA/171        0.62      0.62  -1.1%
MultiSourc...arching-flt/Searching-flt.test        1.53      1.52  -1.1%
MicroBench...rLoopWithReductionAutoVecTC127        5.94      5.88  -1.1%
MicroBench...t:BM_MemCmp<16, EqZero, First>      127.45    126.06  -1.1%
MultiSourc...oxyApps-C/miniGMG/miniGMG.test        0.32      0.31  -1.1%
MicroBench...st:BM_MemCmp<4, EqZero, First>      477.01    471.78  -1.1%
MicroBench..._MemCmp<6, LessThanZero, Last>      608.91    602.22  -1.1%
MicroBench...orIC1VW4LoopWithReductionTC127        5.99      5.92  -1.1%
MicroBench...rIC2VW4BigLoopWithReductionTC7        6.90      6.83  -1.1%
MicroBench...hVW16From_uint8_t_To_uint32_t_      749.21    740.92  -1.1%
MicroBench...st:BM_MemCmp<31, EqZero, Last>      104.67    103.50  -1.1%
MicroBench....test:benchForIC1VW4BigLoopTC1        3.07      3.04  -1.1%
MicroBench...mp<31, GreaterThanZero, First>      232.88    230.28  -1.1%
MicroBench...test:benchForIC4VW4BigLoopTC63       31.24     30.90  -1.1%
Bitcode/si...imd_ops_test_op_paddd_143.test        0.01      0.01  -1.1%
MicroBench...st:BM_MemCmp<16, EqZero, Last>      126.13    124.71  -1.1%
MicroBench...hForIC4VW4LoopWithReductionTC4        1.61      1.59  -1.1%
MicroBench...nLoopFrom_uint8_t_To_uint32_t_      826.76    817.46  -1.1%
MicroBench...w.test:BM_TRIDIAG_ELIM_RAW/171        0.17      0.17  -1.1%
MicroBench...CRaw.test:BM_FIRST_SUM_RAW/171        0.06      0.05  -1.1%
MicroBench...Raw.test:BM_PLANCKIAN_RAW/5001       22.61     22.35  -1.1%
MicroBench....test:benchForIC4VW4BigLoopTC1        3.09      3.05  -1.1%
MicroBench...MemCmp<15, LessThanZero, Last>      280.74    277.53  -1.1%
MicroBench...MemCmp<63, LessThanZero, Last>      118.56    117.19  -1.1%
MicroBench...t:BM_MemCmp<31, EqZero, First>      104.40    103.19  -1.2%
MicroBench...rIC1VW4BigLoopWithReductionTC3        4.35      4.30  -1.2%
MicroBench...Cmp<16, GreaterThanZero, None>      259.62    256.62  -1.2%
MicroBench...st:BM_MemCmp<63, EqZero, Last>      121.61    120.19  -1.2%
MicroBench....test:benchForIC2VW4BigLoopTC1        3.04      3.01  -1.2%
SingleSour...out-C++/Shootout-C++-fibo.test        0.84      0.83  -1.2%
SingleSour...e/Benchmarks/Misc/salsa20.test        2.61      2.58  -1.2%
MicroBench...M_MemCmp<5, LessThanZero, Mid>      477.26    471.67  -1.2%
MicroBench...bda.test:BM_PIC_1D_LAMBDA/5001       21.81     21.55  -1.2%
MicroBench...MemCmp<32, LessThanZero, Last>      157.26    155.42  -1.2%
MicroBench...ForIC2VW4LoopWithReductionTC32        1.82      1.80  -1.2%
MicroBench...rks.test:benchForIC4VW4LoopTC4        1.72      1.69  -1.2%
MicroBench...a.test:BM_HYDRO_2D_LAMBDA/5001      101.77    100.57  -1.2%
MicroBench...mCmp<16, GreaterThanZero, Mid>      288.67    285.25  -1.2%
MicroBench...nLoopFrom_uint8_t_To_uint64_t_     2264.96   2238.02  -1.2%
MicroBench..._MemCmp<6, LessThanZero, None>      513.71    507.60  -1.2%
SingleSour...-C++/stepanov_abstraction.test        1.66      1.64  -1.2%
MicroBench...aw.test:BM_INNER_PROD_RAW/5001        7.90      7.81  -1.2%
MicroBench...aw.test:BM_MAT_X_MAT_RAW/44217   107670.77 106382.50  -1.2%
MicroBench...Cmp<4, GreaterThanZero, First>      672.38    664.34  -1.2%
MicroBench..._MemCmp<5, LessThanZero, None>      445.69    440.35  -1.2%
MicroBench...nLoopFrom_uint8_t_To_uint16_t_      336.77    332.71  -1.2%
MicroBench...hVW16From_uint16_t_To_uint8_t_      256.00    252.90  -1.2%
MicroBench...:BENCHMARK_exp_autovec_double_       97.84     96.65  -1.2%
MultiSourc...oops-flt/ControlLoops-flt.test        1.00      0.99  -1.2%
MicroBench...mp<64, GreaterThanZero, First>       91.19     90.08  -1.2%
MicroBench...nLoopFrom_uint8_t_To_uint64_t_     2383.40   2354.25  -1.2%
MicroBench...runtime_checks_needed<16, int>        1.49      1.47  -1.2%
MicroBench...VW16From_uint32_t_To_uint16_t_      683.87    675.48  -1.2%
MultiSource/Benchmarks/sim/sim.test                1.56      1.54  -1.2%
MicroBench...est:BM_MemCmp<7, EqZero, Last>      296.93    293.28  -1.2%
MicroBench...Cmp<31, GreaterThanZero, Last>      223.09    220.34  -1.2%
MicroBench...st:BM_MemCmp<64, EqZero, Last>      112.02    110.64  -1.2%
MicroBench...st:BM_MemCmp<32, EqZero, Last>       88.79     87.70  -1.2%
MicroBench...st:BM_MemCmp<64, EqZero, None>      102.91    101.64  -1.2%
MicroBench...BRaw.test:BM_MULADDSUB_RAW/171        0.06      0.06  -1.2%
MicroBench...hVW8From_uint16_t_To_uint64_t_     1197.09   1182.17  -1.2%
MicroBench..._MemCmp<8, LessThanZero, Last>      398.46    393.45  -1.3%
MicroBench...st:BM_MemCmp<32, EqZero, None>       87.72     86.61  -1.3%
MicroBench...st:BM_MemCmp<6, EqZero, First>      350.22    345.78  -1.3%
MicroBench...nLoopFrom_uint8_t_To_uint32_t_      577.01    569.70  -1.3%
MicroBench...test:benchForIC2VW4BigLoopTC31       16.10     15.89  -1.3%
MicroBench...emCmp<3, GreaterThanZero, Mid>     1061.71   1048.19  -1.3%
MicroBench....test:BM_IMP_HYDRO_2D_RAW/5001       88.67     87.54  -1.3%
MicroBench...C1VW1BigLoopWithReductionTC127       91.96     90.79  -1.3%
MicroBench...Cmp<31, GreaterThanZero, None>      222.14    219.31  -1.3%
MicroBench...mCmp<32, GreaterThanZero, Mid>      169.64    167.48  -1.3%
MicroBench..._MemCmp<7, LessThanZero, Last>      514.79    508.17  -1.3%
MicroBench...hForIC4VW1LoopWithReductionTC4        1.48      1.46  -1.3%
MicroBench....test:benchForIC2VW4BigLoopTC8        5.43      5.36  -1.3%
MicroBench..._MemCmp<63, LessThanZero, Mid>       85.56     84.44  -1.3%
MicroBench...MemCmp<63, LessThanZero, None>      112.75    111.28  -1.3%
MicroBench...orLoopWithReductionAutoVecTC32        2.00      1.98  -1.3%
MicroBench...ambda.test:BM_EOS_LAMBDA/44217       35.32     34.86  -1.3%
MicroBench...test:BM_IMP_HYDRO_2D_RAW/44217      791.82    781.43  -1.3%
MicroBench...mp<32, GreaterThanZero, First>      168.74    166.53  -1.3%
MicroBench...test:BM_INT_PREDICT_LAMBDA/171        0.28      0.27  -1.3%
MicroBench...BENCHMARK_ORDERED_DITHER/512/4      556.46    549.13  -1.3%
MicroBench...Cmp<7, GreaterThanZero, First>      400.42    395.14  -1.3%
MultiSourc...pansion-dbl/Expansion-dbl.test        1.03      1.02  -1.3%
MicroBench...Lambda.test:BM_ICCG_LAMBDA/171        0.10      0.10  -1.3%
MicroBench...runtime_checks_pass<3, double>        1.74      1.72  -1.3%
MicroBench...est:BENCHMARK_FLOYD_DITHER/512     1368.50   1350.29  -1.3%
MicroBench...mCmp<4, GreaterThanZero, Last>      582.96    575.20  -1.3%
MicroBench...est:BM_MemCmp<63, EqZero, Mid>       92.21     90.99  -1.3%
MicroBench...est:BM_MemCmp<31, EqZero, Mid>      103.69    102.30  -1.3%
MicroBench...st:BM_MemCmp<7, EqZero, First>      297.49    293.51  -1.3%
MicroBench...rIC4VW1BigLoopWithReductionTC7        6.10      6.02  -1.3%
MicroBench...t:BM_MemCmp<32, EqZero, First>       88.07     86.89  -1.3%
MicroBench...aw.test:BM_INT_PREDICT_RAW/171        0.26      0.26  -1.3%
MicroBench...emCmp<5, GreaterThanZero, Mid>      555.26    547.70  -1.4%
MicroBench...est:BENCHMARK_FLOYD_DITHER/128       83.13     82.00  -1.4%
MicroBench...VW16From_uint32_t_To_uint16_t_      575.98    568.12  -1.4%
MicroBench...test:benchForIC1VW4BigLoopTC15        9.49      9.36  -1.4%
SingleSour...ks/BenchmarkGame/fannkuch.test        1.36      1.34  -1.4%
MicroBench...mp<16, GreaterThanZero, First>      206.48    203.65  -1.4%
MicroBench...aw.test:BM_PLANCKIAN_RAW/44217      200.40    197.65  -1.4%
MicroBench...Cmp<63, GreaterThanZero, None>      114.53    112.96  -1.4%
MicroBench...w.test:BM_IMP_HYDRO_2D_RAW/171        2.75      2.71  -1.4%
MicroBench...Cmp<16, GreaterThanZero, Last>      287.86    283.90  -1.4%
MicroBench...emCmp<6, GreaterThanZero, Mid>      463.61    457.22  -1.4%
MicroBench..._MemCmp<16, LessThanZero, Mid>      263.40    259.77  -1.4%
MicroBench...mCmp<15, GreaterThanZero, Mid>      220.59    217.54  -1.4%
MultiSourc...hmarks/VersaBench/bmm/bmm.test        0.61      0.61  -1.4%
MicroBench...MemCmp<31, LessThanZero, None>      201.80    199.01  -1.4%
MicroBench...ks.test:benchForIC2VW4LoopTC63        5.05      4.98  -1.4%
MicroBench...emCmp<63, LessThanZero, First>       85.80     84.61  -1.4%
MicroBench...calsCRaw.test:BM_EOS_RAW/44217       35.34     34.85  -1.4%
MicroBench...MemCmp<6, LessThanZero, First>      402.40    396.81  -1.4%
MicroBench...est:BENCHMARK_FLOYD_DITHER/256      336.59    331.91  -1.4%
MultiSourc...nch/pcompress2/pcompress2.test        0.09      0.09  -1.4%
MultiSourc...ications/JM/lencod/lencod.test        1.74      1.72  -1.4%
MicroBench...Lambda.test:BM_EOS_LAMBDA/5001        3.96      3.90  -1.4%
MicroBench...M_MemCmp<6, LessThanZero, Mid>      402.65    397.01  -1.4%
MicroBench...ForIC1VW1LoopWithReductionTC32        3.03      2.99  -1.4%
MicroBench...a.test:BM_MAT_X_MAT_LAMBDA/171       55.66     54.88  -1.4%
MicroBench...emCmp<31, LessThanZero, First>      222.05    218.91  -1.4%
MicroBench...BENCHMARK_ORDERED_DITHER/256/3      150.91    148.77  -1.4%
MicroBench...test:benchForIC1VW4BigLoopTC31       13.83     13.63  -1.4%
MicroBench...mp<15, GreaterThanZero, First>      220.90    217.75  -1.4%
MicroBench...t:BM_MemCmp<64, EqZero, First>       91.15     89.85  -1.4%
MicroBench..._MemCmp<15, LessThanZero, Mid>      194.23    191.45  -1.4%
MicroBench...Cmp<5, GreaterThanZero, First>      554.38    546.43  -1.4%
MicroBench...mCmp<7, GreaterThanZero, None>      509.99    502.67  -1.4%
MicroBench...st:BM_INT_PREDICT_LAMBDA/44217      145.97    143.87  -1.4%
MicroBench...est:BM_MemCmp<16, EqZero, Mid>      126.75    124.92  -1.4%
MicroBench...Cmp<6, GreaterThanZero, First>      464.85    458.15  -1.4%
MicroBench...BENCHMARK_ORDERED_DITHER/512/8      556.91    548.87  -1.4%
MicroBench...st:BM_MemCmp<16, EqZero, None>      127.27    125.43  -1.4%
MicroBench...mCmp<7, GreaterThanZero, Last>      531.04    523.36  -1.4%
MicroBench...LoopFrom_uint16_t_To_uint64_t_     1271.08   1252.70  -1.4%
MicroBench...LoopFrom_uint16_t_To_uint64_t_     1862.89   1835.79  -1.5%
MultiSourc...lFlow-dbl/ControlFlow-dbl.test        1.35      1.33  -1.5%
MicroBench....test:benchForIC2VW4BigLoopTC4        5.68      5.60  -1.5%
MultiSourc...ing-dbl/NodeSplitting-dbl.test        1.37      1.35  -1.5%
MicroBench...Cmp<3, GreaterThanZero, First>     1054.14   1038.74  -1.5%
MultiSourc...ks/Prolangs-C++/life/life.test        0.43      0.43  -1.5%
MicroBench...test:BM_DIFF_PREDICT_RAW/44217      260.82    256.99  -1.5%
MicroBench...a.test:BM_FIRST_SUM_LAMBDA/171        0.06      0.05  -1.5%
MicroBench...BENCHMARK_ORDERED_DITHER/128/4       34.51     34.00  -1.5%
MicroBench...emCmp<64, LessThanZero, First>       83.97     82.73  -1.5%
MicroBench...test:benchForIC2VW4BigLoopTC32       12.41     12.23  -1.5%
MicroBench...thVW8From_uint8_t_To_uint64_t_     1479.55   1457.69  -1.5%
SingleSour...chmarks/Misc/himenobmtxpa.test        0.24      0.24  -1.5%
MicroBench...CRaw.test:BM_HYDRO_1D_RAW/5001        1.35      1.33  -1.5%
MicroBench...test:BM_BAND_LIN_EQ_LAMBDA/171        0.03      0.03  -1.5%
MicroBench...emCmp<32, LessThanZero, First>      156.39    154.07  -1.5%
MicroBench...est:BM_MemCmp<32, EqZero, Mid>       88.01     86.70  -1.5%
MicroBench..._MemCmp<4, LessThanZero, None>      656.23    646.47  -1.5%
MicroBench....test:BM_MAT_X_MAT_LAMBDA/5001     6617.80   6519.26  -1.5%
MicroBench...IC4VW4BigLoopWithReductionTC31       17.15     16.90  -1.5%
MicroBench...mCmp<5, GreaterThanZero, Last>      520.36    512.59  -1.5%
MicroBench...mCmp<6, GreaterThanZero, None>      562.98    554.54  -1.5%
MicroBench...st:BM_MemCmp<15, EqZero, None>      161.09    158.68  -1.5%
MicroBench...BENCHMARK_ORDERED_DITHER/512/3      610.88    601.72  -1.5%
MicroBench...MemCmp<4, LessThanZero, First>      669.05    658.99  -1.5%
MicroBench..._MemCmp<7, LessThanZero, None>      513.46    505.74  -1.5%
MicroBench...mCmp<63, GreaterThanZero, Mid>       91.09     89.72  -1.5%
SingleSource/Benchmarks/Misc/pi.test               0.31      0.31  -1.5%
MicroBench...lcalsCRaw.test:BM_ADI_RAW/5001       58.21     57.32  -1.5%
MultiSourc...pps-C/SimpleMOC/SimpleMOC.test        0.59      0.58  -1.5%
MicroBench...BENCHMARK_ORDERED_DITHER/128/3       37.68     37.11  -1.5%
SingleSour...BenchmarkGame/Large/fasta.test        0.46      0.45  -1.5%
MicroBench...ambda.test:BM_ADI_LAMBDA/44217      512.50    504.64  -1.5%
MicroBench...t:BM_MemCmp<63, EqZero, First>       92.40     90.99  -1.5%
MicroBench...mCmp<8, GreaterThanZero, Last>      402.37    396.20  -1.5%
MicroBench...st:BM_BAND_LIN_EQ_LAMBDA/44217       11.37     11.20  -1.5%
MicroBench..._MemCmp<8, LessThanZero, None>      399.33    393.17  -1.5%
MicroBench...sCRaw.test:BM_HYDRO_1D_RAW/171        0.05      0.05  -1.5%
SingleSour...Adobe-C++/stepanov_vector.test        0.94      0.93  -1.5%
MicroBench...st:BM_MemCmp<8, EqZero, First>      253.94    249.99  -1.6%
MicroBench..._MemCmp<4, LessThanZero, Last>      603.20    593.73  -1.6%
MicroBench...mbda.test:BM_ICCG_LAMBDA/44217       34.43     33.88  -1.6%
MicroBench...mCmp<5, GreaterThanZero, None>      519.30    511.14  -1.6%
MicroBench...Cmp<63, GreaterThanZero, Last>      121.24    119.34  -1.6%
SingleSour...Benchmarks/Stanford/Oscar.test        0.00      0.00  -1.6%
MultiSourc...decode/alacconvert-decode.test        0.02      0.02  -1.6%
MicroBench...mCmp<8, GreaterThanZero, None>      396.33    390.05  -1.6%
MicroBench...emCmp<7, GreaterThanZero, Mid>      399.47    393.13  -1.6%
MicroBench...da.test:BM_HYDRO_2D_LAMBDA/171        3.19      3.14  -1.6%
MicroBench....test:BM_BAND_LIN_EQ_RAW/44217       11.51     11.33  -1.6%
MicroBench..._MemCmp<32, LessThanZero, Mid>      158.18    155.66  -1.6%
MicroBench...aw.test:BM_BAND_LIN_EQ_RAW/171        0.03      0.03  -1.6%
MicroBench...emCmp<4, GreaterThanZero, Mid>      581.02    571.76  -1.6%
MicroBench...emCmp<15, LessThanZero, First>      194.29    191.18  -1.6%
MicroBench...Lambda.test:BM_ADI_LAMBDA/5001       56.73     55.82  -1.6%
MicroBench...MemCmp<64, LessThanZero, None>      102.55    100.90  -1.6%
MicroBench...VW16From_uint16_t_To_uint64_t_     1785.10   1756.31  -1.6%
MicroBench...BENCHMARK_ORDERED_DITHER/512/2      494.87    486.86  -1.6%
MicroBench...lcalsCRaw.test:BM_ICCG_RAW/171        0.10      0.10  -1.6%
MicroBench...est:BM_BAND_LIN_EQ_LAMBDA/5001        1.21      1.19  -1.6%
MicroBench...est:BM_MemCmp<6, EqZero, Last>      350.08    344.36  -1.6%
MicroBench...emCmp<2, GreaterThanZero, Mid>     1197.37   1177.81  -1.6%
MicroBench...st:BM_MemCmp<31, EqZero, None>      104.87    103.15  -1.6%
MicroBench...M_MemCmp<3, LessThanZero, Mid>      909.39    894.49  -1.6%
MicroBench...lcalsCRaw.test:BM_EOS_RAW/5001        3.97      3.91  -1.6%
MicroBench...t:BM_IMP_HYDRO_2D_LAMBDA/44217      788.03    775.10  -1.6%
MicroBench...est:BM_MemCmp<8, EqZero, Last>      254.43    250.26  -1.6%
MicroBench...st:BM_MemCmp<5, EqZero, First>      420.75    413.85  -1.6%
MicroBench...est:BM_MemCmp<5, EqZero, Last>      419.74    412.85  -1.6%
MicroBench...test:BM_MemCmp<3, EqZero, Mid>      699.18    687.69  -1.6%
MicroBench...MemCmp<31, LessThanZero, Last>      232.18    228.35  -1.6%
MicroBench...est:BM_MemCmp<4, EqZero, None>      478.90    470.97  -1.7%
MicroBench...ambda.test:BM_ICCG_LAMBDA/5001        3.76      3.70  -1.7%
MicroBench...est:BM_DIFF_PREDICT_LAMBDA/171        0.32      0.32  -1.7%
MicroBench...est:BM_MemCmp<8, EqZero, None>      253.98    249.75  -1.7%
SingleSour...s/BenchmarkGame/recursive.test        0.38      0.37  -1.7%
MicroBench...M_MemCmp<8, LessThanZero, Mid>      404.93    398.16  -1.7%
MicroBench...ForIC1VW1LoopWithReductionTC15        2.11      2.07  -1.7%
MicroBench...MemCmp<7, LessThanZero, First>      346.20    340.36  -1.7%
MicroBench...MemCmp<3, LessThanZero, First>      918.33    902.79  -1.7%
MicroBench..._MemCmp<2, LessThanZero, Last>     1027.02   1009.61  -1.7%
MicroBench...CLambda.test:BM_EOS_LAMBDA/171        0.13      0.13  -1.7%
SingleSour...hmarks/Misc-C++-EH/spirit.test        3.36      3.31  -1.7%
MicroBench...mp<63, GreaterThanZero, First>       91.10     89.55  -1.7%
MicroBench...rIC4VW1BigLoopWithReductionTC2        3.52      3.46  -1.7%
SingleSour...enchmarks/Misc/fp-convert.test        0.78      0.76  -1.7%
MicroBench...Cmp<8, GreaterThanZero, First>      397.52    390.68  -1.7%
MicroBench..._MemCmp<2, LessThanZero, None>     1020.52   1002.86  -1.7%
MicroBench...rIC1VW4BigLoopWithReductionTC2        3.51      3.45  -1.7%
MultiSourc...lications/obsequi/Obsequi.test        0.80      0.79  -1.7%
MicroBench...VW16From_uint16_t_To_uint64_t_     1178.84   1158.31  -1.7%
MicroBench...CHMARK_ANISTROPIC_DIFFUSION/32      239.92    235.72  -1.7%
MicroBench...ks.test:benchForIC1VW4LoopTC32        2.58      2.54  -1.8%
MicroBench...mCmp<6, GreaterThanZero, Last>      670.61    658.81  -1.8%
MicroBench...Cmp<15, GreaterThanZero, Last>      308.61    303.17  -1.8%
MicroBench...hVW8From_uint16_t_To_uint64_t_     1509.51   1482.70  -1.8%
MicroBench...mCmp<2, GreaterThanZero, Last>     1302.75   1279.61  -1.8%
MicroBench...Cmp<15, GreaterThanZero, None>      279.16    274.19  -1.8%
MicroBench...mCmp<31, GreaterThanZero, Mid>      228.69    224.61  -1.8%
MicroBench...test:BM_MemCmp<8, EqZero, Mid>      254.44    249.88  -1.8%
MicroBench...est:BM_MemCmp<3, EqZero, None>      769.40    755.60  -1.8%
MicroBench...rIC4VW4BigLoopWithReductionTC2        3.52      3.46  -1.8%
MultiSourc...-dbl/LinearDependence-dbl.test        0.92      0.90  -1.8%
MicroBench...alsCRaw.test:BM_ICCG_RAW/44217       34.51     33.87  -1.8%
MultiSourc...ow-dbl/GlobalDataFlow-dbl.test        0.92      0.90  -1.8%
Bitcode/si...imd_ops_test_op_pabsb_235.test        0.01      0.01  -1.8%
MicroBench...MemCmp<8, LessThanZero, First>      400.28    392.86  -1.9%
MicroBench...BENCHMARK_ORDERED_DITHER/256/8      138.35    135.77  -1.9%
MicroBench...M_MemCmp<7, LessThanZero, Mid>      346.65    340.15  -1.9%
MicroBench...calsBRaw.test:BM_INIT3_RAW/171        0.04      0.03  -1.9%
MicroBench...w.test:BM_BAND_LIN_EQ_RAW/5001        1.23      1.21  -1.9%
MicroBench...st:BM_MemCmp<15, EqZero, Last>      155.83    152.87  -1.9%
MicroBench...MemCmp<16, LessThanZero, Last>      263.57    258.51  -1.9%
SingleSour...e/Benchmarks/Misc/flops-5.test        0.48      0.47  -1.9%
MicroBench...M_MemCmp<2, LessThanZero, Mid>     1021.55   1001.83  -1.9%
MicroBench...test:BM_MemCmp<4, EqZero, Mid>      478.33    469.09  -1.9%
MicroBench..._MemCmp<3, LessThanZero, Last>      774.55    759.57  -1.9%
MicroBench...st:BM_MemCmp<3, EqZero, First>      699.70    686.07  -1.9%
MicroBench...mCmp<1, GreaterThanZero, None>     2293.58   2248.65  -2.0%
MultiSourc.../Trimaran/enc-md5/enc-md5.test        0.85      0.83  -2.0%
MicroBench...est:BM_MemCmp<5, EqZero, None>      454.12    445.20  -2.0%
MicroBench....test:benchForIC1VW4BigLoopTC3        5.21      5.11  -2.0%
MicroBench...ENCHMARK_BILATERAL_FILTER/32/4      153.15    150.13  -2.0%
MicroBench...test:BM_MemCmp<2, EqZero, Mid>     1004.95    985.09  -2.0%
MicroBench...est:BM_MemCmp<3, EqZero, Last>      774.29    758.96  -2.0%
SingleSour...marks/CoyoteBench/lpbench.test        0.91      0.89  -2.0%
MicroBench...ForIC2VW1LoopWithReductionTC15        2.76      2.71  -2.0%
SingleSour...s/Shootout/Shootout-lists.test        2.03      1.99  -2.0%
MicroBench...calsCRaw.test:BM_ICCG_RAW/5001        3.76      3.69  -2.0%
MicroBench...Cmp<2, GreaterThanZero, First>     1199.91   1175.72  -2.0%
MicroBench..._MemCmp<1, LessThanZero, None>     1897.81   1859.48  -2.0%
MicroBench...BENCHMARK_ORDERED_DITHER/256/2      122.98    120.47  -2.0%
MicroBench...test:BM_MemCmp<7, EqZero, Mid>      323.78    317.12  -2.1%
MicroBench...w.test:BM_FIRST_DIFF_RAW/44217        8.51      8.33  -2.1%
MicroBench...est:BM_MemCmp<2, EqZero, Last>     1004.19    983.22  -2.1%
SingleSour...isc-C++/Large/sphereflake.test        1.59      1.56  -2.1%
Bitcode/si...simd_ops_test_op_mulps_23.test        0.01      0.01  -2.1%
MicroBench...emCmp<1, GreaterThanZero, Mid>     2298.85   2250.61  -2.1%
MicroBench...Cmp<1, GreaterThanZero, First>     2279.43   2231.40  -2.1%
MicroBench...ENCHMARK_BILATERAL_FILTER/32/2       44.49     43.55  -2.1%
MicroBench...BENCHMARK_asinf_autovec_float_       91.46     89.53  -2.1%
MicroBench...w.test:BM_INNER_PROD_RAW/44217       71.15     69.64  -2.1%
MicroBench...est:BM_MemCmp<2, EqZero, None>      958.99    938.62  -2.1%
MicroBench...BENCHMARK_ORDERED_DITHER/128/8       34.60     33.86  -2.1%
MultiSourc...Rodinia/backprop/backprop.test        0.35      0.34  -2.1%
MicroBench...mCmp<3, GreaterThanZero, None>      936.41    916.28  -2.2%
MicroBench..._MemCmp<3, LessThanZero, None>      777.22    760.50  -2.2%
MicroBench....test:benchForIC4VW4BigLoopTC3        5.24      5.12  -2.2%
MicroBench...ForIC4VW1LoopWithReductionTC32        2.37      2.32  -2.2%
MicroBench...mCmp<3, GreaterThanZero, Last>      927.91    907.81  -2.2%
MicroBench...mCmp<1, GreaterThanZero, Last>     2278.99   2229.63  -2.2%
MicroBench...BENCHMARK_ORDERED_DITHER/256/4      138.69    135.69  -2.2%
MultiSourc...+/HACCKernels/HACCKernels.test        0.78      0.77  -2.2%
MultiSourc...marks/SciMark2-C/scimark2.test       14.78     14.45  -2.2%
SingleSour...sc-C++/stepanov_container.test        1.66      1.62  -2.2%
MicroBench....test:benchForIC2VW4BigLoopTC3        5.24      5.12  -2.2%
MicroBench...ForIC2VW4LoopWithReductionTC31        3.08      3.01  -2.2%
MicroBench...Raw.test:BM_INNER_PROD_RAW/171        0.23      0.23  -2.2%
MicroBench...ks.test:benchForIC4VW4LoopTC32        2.12      2.07  -2.3%
MultiSourc.../VersaBench/ecbdes/ecbdes.test        0.99      0.97  -2.3%
MicroBench...MemCmp<1, LessThanZero, First>     2008.40   1962.07  -2.3%
MicroBench...ENCHMARK_BILATERAL_FILTER/64/4      704.85    688.44  -2.3%
MicroBench...test:BM_MemCmp<1, EqZero, Mid>     2013.98   1967.02  -2.3%
MicroBench...M_MemCmp<1, LessThanZero, Mid>     2013.64   1966.51  -2.3%
MicroBench...est:BM_MemCmp<15, EqZero, Mid>      164.18    160.33  -2.3%
MicroBench...LoopFrom_uint32_t_To_uint64_t_     1843.12   1799.76  -2.4%
MicroBench...ks.test:benchForIC2VW4LoopTC32        2.33      2.27  -2.4%
MicroBench...calsCRaw.test:BM_ADI_RAW/44217      524.85    512.46  -2.4%
MicroBench...hVW8From_uint32_t_To_uint64_t_     1275.68   1245.25  -2.4%
MicroBench...w.test:BM_DIFF_PREDICT_RAW/171        0.32      0.32  -2.4%
MicroBench..._MemCmp<1, LessThanZero, Last>     2007.86   1959.71  -2.4%
SingleSour...e/Benchmarks/Misc/flops-1.test        0.40      0.39  -2.4%
MicroBench...test:BM_MemCmp<6, EqZero, Mid>      380.82    371.64  -2.4%
MicroBench...sCRaw.test:BM_PIC_1D_RAW/44217      212.01    206.80  -2.5%
MultiSourc...mbolics-dbl/Symbolics-dbl.test        0.77      0.75  -2.5%
MicroBench...ENCHMARK_BILATERAL_FILTER/16/4       28.09     27.39  -2.5%
MicroBench...st:BM_MemCmp<1, EqZero, First>     2013.40   1963.49  -2.5%
MicroBench...LoopFrom_uint16_t_To_uint32_t_      881.90    860.02  -2.5%
MicroBench...ENCHMARK_BILATERAL_FILTER/64/2      189.98    185.21  -2.5%
MultiSourc...marks/Ptrdist/yacr2/yacr2.test        0.28      0.28  -2.5%
SingleSour...e/Benchmarks/Misc/flops-8.test        0.50      0.49  -2.5%
MicroBench...est:BM_MemCmp<1, EqZero, Last>     2016.10   1964.82  -2.5%
MicroBench...est:BM_FIRST_DIFF_LAMBDA/44217        8.46      8.25  -2.5%
Bitcode/si...imd_ops_test_op_pabsb_238.test        0.01      0.01  -2.6%
MultiSourc...itBench/uudecode/uudecode.test        0.02      0.02  -2.6%
MicroBench...thVW8From_uint8_t_To_uint32_t_      856.49    833.93  -2.6%
MultiSourc...bl/IndirectAddressing-dbl.test        1.28      1.24  -2.7%
MicroBench...est:BM_MemCmp<6, EqZero, None>      384.88    374.58  -2.7%
MultiSourc.../Benchmarks/Ptrdist/bc/bc.test        0.19      0.18  -2.7%
MultiSourc...s-C/Pathfinder/PathFinder.test        1.03      1.00  -2.7%
MicroBench...hVW8From_uint16_t_To_uint32_t_      698.26    679.52  -2.7%
MicroBench...MemCmp<5, LessThanZero, First>      483.53    470.50  -2.7%
MultiSourc...ow-flt/GlobalDataFlow-flt.test        0.53      0.52  -2.7%
Bitcode/si...imd_ops_test_op_pabsd_237.test        0.01      0.01  -2.7%
MultiSourc...marks/Olden/health/health.test        0.12      0.11  -2.7%
MicroBench...Raw.test:BM_HYDRO_1D_RAW/44217       12.45     12.11  -2.7%
MicroBench...st:BM_MemCmp<2, EqZero, First>     1008.59    981.05  -2.7%
MicroBench...VW16From_uint32_t_To_uint64_t_     1779.54   1730.88  -2.7%
MicroBench...sCRaw.test:BM_HYDRO_2D_RAW/171        3.22      3.13  -2.7%
MicroBench...t:BENCHMARK_asinf_novec_float_       90.84     88.33  -2.8%
MicroBench...hVW16From_uint8_t_To_uint32_t_      860.98    837.18  -2.8%
MicroBench...test:BM_MAT_X_MAT_LAMBDA/44217   108434.22 105420.91  -2.8%
MultiSourc...netbench-crc/netbench-crc.test        0.52      0.51  -2.8%
MicroBench...ENCHMARK_BILATERAL_FILTER/16/2        9.73      9.46  -2.8%
Bitcode/si...simd_ops_test_op_paddsb_1.test        0.01      0.01  -2.8%
MicroBench...est:BM_MemCmp<1, EqZero, None>     2023.10   1964.88  -2.9%
MicroBench...aw.test:BM_FIRST_SUM_RAW/44217       18.32     17.78  -3.0%
MicroBench...test:BM_FIRST_DIFF_LAMBDA/5001        0.98      0.95  -3.0%
Bitcode/si..._ops_test_op_blendvps_299.test        0.01      0.01  -3.0%
MicroBench...BENCHMARK_ORDERED_DITHER/128/2       31.20     30.25  -3.0%
MicroBench...test:BM_FIRST_SUM_LAMBDA/44217       18.26     17.70  -3.1%
MicroBench...VW16From_uint16_t_To_uint32_t_      824.38    799.04  -3.1%
MultiSourc...nia/pathfinder/pathfinder.test        0.22      0.21  -3.1%
Bitcode/si...imd_ops_test_op_minpd_196.test        0.01      0.01  -3.1%
SingleSour...ks/Shootout/Shootout-hash.test        1.36      1.32  -3.2%
MicroBench...aw.test:BM_FIRST_DIFF_RAW/5001        0.95      0.92  -3.2%
MicroBench...Raw.test:BM_FIRST_SUM_RAW/5001        2.06      1.99  -3.2%
Bitcode/Be...an/halide_local_laplacian.test       17.55     16.99  -3.2%
MicroBench...mCmp<4, GreaterThanZero, None>      625.79    605.77  -3.2%
MicroBench....test:BM_INT_PREDICT_RAW/44217      146.96    142.25  -3.2%
MicroBench...ks.test:benchForIC1VW4LoopTC64        6.41      6.20  -3.3%
MultiSourc...lications/viterbi/viterbi.test        0.54      0.52  -3.3%
MultiSourc...DOE-ProxyApps-C/CoMD/CoMD.test        0.70      0.67  -3.3%
SingleSour...ce/Benchmarks/Misc/fbench.test        0.46      0.44  -3.3%
MicroBench...s.test:benchForIC2VW4LoopTC127       10.20      9.86  -3.4%
MicroBench....test:BM_HYDRO_2D_LAMBDA/44217      997.86    963.95  -3.4%
SingleSour...ut-C++/Shootout-C++-sieve.test        0.71      0.69  -3.4%
MicroBench...ks.test:benchForIC4VW4LoopTC16        1.75      1.68  -3.5%
Bitcode/si...imd_ops_test_op_pabsw_239.test        0.01      0.01  -3.5%
MicroBench...VW16From_uint32_t_To_uint64_t_     1573.23   1518.32  -3.5%
MicroBench...LoopFrom_uint16_t_To_uint32_t_      636.48    614.22  -3.5%
MicroBench...BRaw.test:BM_IF_QUAD_RAW/44217      119.64    115.43  -3.5%
MultiSourc...-flt/LinearDependence-flt.test        0.62      0.60  -3.5%
MicroBench...alsBRaw.test:BM_INIT3_RAW/5001        2.68      2.59  -3.6%
MicroBench...LoopFrom_uint32_t_To_uint64_t_     1263.26   1217.70  -3.6%
MicroBench...Raw.test:BM_HYDRO_2D_RAW/44217      994.73    957.39  -3.8%
MicroBench....test:BM_FIRST_SUM_LAMBDA/5001        2.06      1.98  -3.8%
SingleSour...e/Benchmarks/Misc/flops-2.test        0.21      0.20  -3.8%
Bitcode/si...simd_ops_test_op_paddsb_2.test        0.01      0.01  -3.8%
MultiSourc...l/StatementReordering-dbl.test        1.44      1.39  -3.8%
MicroBench...hVW8From_uint32_t_To_uint64_t_     1518.55   1460.02  -3.9%
MicroBench...est:BM_INT_PREDICT_LAMBDA/5001       10.65     10.23  -4.0%
MicroBench...est:BM_MemCmp<7, EqZero, None>      326.19    313.27  -4.0%
SingleSour...ar-algebra/blas/syrk/syrk.test        0.80      0.77  -4.0%
MultiSourc...s/Ptrdist/anagram/anagram.test        0.36      0.35  -4.0%
SingleSour.../Shootout/Shootout-random.test        1.01      0.97  -4.0%
MicroBench....test:BM_HYDRO_1D_LAMBDA/44217       12.36     11.86  -4.1%
MicroBench...CRaw.test:BM_MAT_X_MAT_RAW/171       55.96     53.51  -4.4%
SingleSour...enchmarks/Stanford/Queens.test        0.01      0.01  -4.5%
SingleSour...out-C++/Shootout-C++-hash.test        0.21      0.20  -4.5%
MicroBench...Raw.test:BM_MULADDSUB_RAW/5001        3.64      3.48  -4.5%
Bitcode/si...imd_ops_test_op_addps_117.test        0.01      0.01  -4.6%
SingleSour...marks/Stanford/Bubblesort.test        0.03      0.02  -4.7%
MicroBench...CRaw.test:BM_HYDRO_2D_RAW/5001      103.12     98.09  -4.9%
Bitcode/si..._ops_test_op_packusdw_273.test        0.01      0.01  -5.0%
Bitcode/si...imd_ops_test_op_pabsd_240.test        0.01      0.01  -5.0%
Bitcode/si...md_ops_test_op_cmpeqps_88.test        0.01      0.01  -5.1%
Bitcode/si...imd_ops_test_op_mulpd_222.test        0.01      0.01  -5.2%
MultiSourc.../Benchmarks/Ptrdist/ks/ks.test        0.31      0.29  -5.2%
Bitcode/si...d_ops_test_op_cmpltpd_213.test        0.01      0.01  -5.2%
MultiSourc...nsumer-lame/consumer-lame.test        0.07      0.06  -5.3%
SingleSour...le_types_constant_folding.test        0.29      0.27  -5.3%
MicroBench...lsBRaw.test:BM_INIT3_RAW/44217       30.77     29.04  -5.6%
Bitcode/si...imd_ops_test_op_maxps_171.test        0.01      0.01  -5.9%
SingleSour...+/Shootout-C++-nestedloop.test        0.00      0.00  -6.0%
MicroBench...hForIC2VW4LoopWithReductionTC7        2.02      1.90  -6.1%
MultiSourc...oxyApps-C/miniAMR/miniAMR.test        0.23      0.22  -6.2%
MultiSourc...chmarks/Rodinia/srad/srad.test        0.24      0.22  -6.2%
MicroBench...aw.test:BM_MULADDSUB_RAW/44217       38.81     36.36  -6.3%
SingleSour...s/Misc/richards_benchmark.test        0.38      0.35  -6.3%
SingleSour...ks/Shootout/Shootout-ary3.test        0.25      0.24  -6.9%
MultiSourc...peg2/mpeg2dec/mpeg2decode.test        0.01      0.01  -7.0%
SingleSour...ch/medley/deriche/deriche.test        0.39      0.36  -7.1%
SingleSour...nchmarkGame/spectral-norm.test        0.17      0.16  -7.3%
SingleSour...e/Benchmarks/Misc/flops-4.test        0.28      0.26  -7.5%
SingleSour...++/EH/Shootout-C++-except.test        0.07      0.06  -7.8%
SingleSour...ebra/blas/gesummv/gesummv.test        0.00      0.00  -8.1%
Bitcode/si..._ops_test_op_packusdw_296.test        0.01      0.01  -8.2%
MicroBench...Raw.test:BM_FIRST_DIFF_RAW/171        0.03      0.02  -8.2%
SingleSour...enchmarks/Dhrystone/fldry.test        0.19      0.18  -8.7%
MultiSourc...rks/FreeBench/pifft/pifft.test        0.06      0.06  -8.9%
Bitcode/si...imd_ops_test_op_maxpd_195.test        0.01      0.01  -8.9%
MultiSourc...rks/FreeBench/mason/mason.test        0.11      0.10  -8.9%
SingleSour...arks/Misc-C++/oopack_v1p8.test        0.09      0.09  -9.1%
SingleSour...ar-algebra/blas/trmm/trmm.test        0.90      0.82  -9.1%
SingleSour...arks/CoyoteBench/fftbench.test        0.27      0.24  -9.5%
SingleSour...out-C++/Shootout-C++-ary3.test        0.26      0.23  -9.6%
SingleSour...t-C++/Shootout-C++-strcat.test        0.05      0.04  -9.9%
Bitcode/si...md_ops_test_op_paddsb_145.test        0.01      0.01 -10.2%
MultiSourc...comm-CRC32/telecomm-CRC32.test        0.08      0.07 -10.4%
MultiSourc...telecomm-FFT/telecomm-fft.test        0.02      0.02 -10.6%
MultiSourc...nsumer-jpeg/consumer-jpeg.test        0.00      0.00 -10.7%
Bitcode/si...d_ops_test_op_cmpeqps_136.test        0.01      0.01 -10.7%
MultiSourc...dijkstra/network-dijkstra.test        0.04      0.03 -10.8%
Bitcode/si...d_ops_test_op_cmpltpd_198.test        0.01      0.01 -10.8%
Bitcode/si...imd_ops_test_op_mulps_167.test        0.01      0.01 -11.3%
Bitcode/si..._ops_test_op_packssdw_232.test        0.01      0.01 -11.5%
SingleSour.../Benchmarks/Dhrystone/dry.test        0.17      0.15 -11.9%
Bitcode/si..._ops_test_op_packsswb_203.test        0.01      0.01 -11.9%
SingleSour...enchmarks/Misc-C++/bigfib.test        0.18      0.16 -12.3%
MicroBench....test:BM_FIRST_DIFF_LAMBDA/171        0.03      0.03 -12.4%
MultiSourc...adpcm/rawcaudio/rawcaudio.test        0.00      0.00 -12.4%
Bitcode/si..._ops_test_op_packsswb_218.test        0.01      0.01 -12.5%
Bitcode/si...imd_ops_test_op_minps_124.test        0.01      0.01 -12.6%
MultiSourc.../Applications/sgefa/sgefa.test        0.09      0.08 -12.7%
SingleSour...arks/BenchmarkGame/n-body.test        0.22      0.19 -12.7%
MultiSourc...cCat/03-testtrie/testtrie.test        0.01      0.01 -12.8%
Bitcode/si...simd_ops_test_op_minps_28.test        0.01      0.01 -13.0%
Bitcode/si...d_ops_test_op_cmpeqps_184.test        0.01      0.01 -13.2%
SingleSour...arks/BenchmarkGame/puzzle.test        0.09      0.07 -13.8%
Bitcode/si...imd_ops_test_op_addpd_205.test        0.01      0.01 -14.0%
Bitcode/si...d_ops_test_op_cmpeqpd_212.test        0.01      0.01 -14.1%
Bitcode/si..._ops_test_op_packuswb_234.test        0.01      0.01 -14.2%
MultiSourc.../Prolangs-C/bison/mybison.test        0.00      0.00 -15.5%
SingleSour...ncils/jacobi-1d/jacobi-1d.test        0.00      0.00 -15.7%
SingleSour...otout/Shootout-nestedloop.test        0.00      0.00 -15.9%
Bitcode/si..._ops_test_op_packsswb_233.test        0.01      0.01 -16.1%
MultiSourc...ks/Prolangs-C++/city/city.test        0.01      0.01 -16.7%
SingleSour...algebra/kernels/bicg/bicg.test        0.03      0.02 -17.1%
MultiSourc...patricia/network-patricia.test        0.06      0.05 -17.4%
MultiSourc...hmarks/McCat/08-main/main.test        0.07      0.05 -17.5%
SingleSour...Shootout/Shootout-objinst.test        0.00      0.00 -17.7%
SingleSour...BenchmarkGame/partialsums.test        0.10      0.08 -18.8%
Bitcode/si...simd_ops_test_op_paddb_42.test        0.01      0.01 -18.9%
MicroBench...alsCRaw.test:BM_PIC_2D_RAW/171        0.62      0.51 -19.0%
MultiSourc...ks/Prolangs-C/gnugo/gnugo.test        0.03      0.02 -19.0%
SingleSour...enchmarks/Misc/revertBits.test        0.10      0.08 -19.2%
MultiSourc...abench/jpeg/jpeg-6a/cjpeg.test        0.00      0.00 -21.1%
MultiSourc...s/FreeBench/neural/neural.test        0.07      0.05 -21.2%
SingleSour...enchmarks/Stanford/RealMM.test        0.00      0.00 -21.6%
SingleSour.../Shootout/Shootout-strcat.test        0.14      0.11 -21.7%
MultiSourc...Olden/perimeter/perimeter.test        0.11      0.09 -21.9%
MultiSourc...ications/JM/ldecod/ldecod.test        0.05      0.04 -23.4%
Bitcode/si...imd_ops_test_op_divpd_223.test        0.01      0.00 -24.7%
MultiSourc...ks/McCat/04-bisect/bisect.test        0.09      0.07 -25.6%
Bitcode/si..._ops_test_op_packusdw_319.test        0.01      0.00 -25.8%
Bitcode/si...md_ops_test_op_paddsb_146.test        0.00      0.00 -26.3%
Bitcode/si..._ops_test_op_blendvps_276.test        0.01      0.00 -26.3%
SingleSour...lgebra/blas/gemver/gemver.test        0.05      0.03 -27.7%
Bitcode/si..._ops_test_op_blendvpd_277.test        0.01      0.00 -28.9%
Bitcode/si...simd_ops_test_op_paddd_47.test        0.01      0.00 -29.1%
MultiSourc...FreeBench/distray/distray.test        0.07      0.05 -35.6%
Bitcode/si..._ops_test_op_packssdw_217.test        0.01      0.00 -36.7%
MultiSourc...ijndael/security-rijndael.test        0.03      0.02 -37.0%
MultiSourc...count/automotive-bitcount.test        0.06      0.04 -37.1%
MultiSourc...Applications/kimwitu++/kc.test        0.07      0.05 -38.7%
MultiSourc.../Benchmarks/Olden/mst/mst.test        0.05      0.03 -39.1%
SingleSour...r-algebra/kernels/mvt/mvt.test        0.06      0.04 -40.7%
Bitcode/si...simd_ops_test_op_paddd_95.test        0.01      0.00 -42.1%
MultiSourc...nchmarks/McCat/09-vor/vor.test        0.10      0.05 -43.1%
SingleSour...Benchmarks/Misc/lowercase.test        0.00      0.00 -43.3%
Bitcode/Regression/fft/fft.test                    0.08      0.04 -43.5%
SingleSour...chmarks/Stanford/Treesort.test        0.07      0.04 -47.1%
Bitcode/si...imd_ops_test_op_divpd_208.test        0.01      0.00 -49.7%
MultiSourc...adpcm/rawdaudio/rawdaudio.test        0.00      0.00 -50.0%
Bitcode/si...imd_ops_test_op_maxpd_210.test        0.01      0.00 -50.4%
Bitcode/si...imd_ops_test_op_minpd_226.test        0.01      0.00 -51.0%
MultiSourc...nchmarks/McCat/05-eks/eks.test        0.00      0.00 -53.8%
SingleSour...ing/covariance/covariance.test        0.00      0.00 -54.6%
Bitcode/si...imd_ops_test_op_mulpd_192.test        0.01      0.00 -56.0%
MultiSourc...security-sha/security-sha.test        0.02      0.01 -58.1%
MultiSourc...ks/Prolangs-C/agrep/agrep.test        0.00      0.00 -58.9%
MultiSourc...ediabench/gsm/toast/toast.test        0.02      0.01 -59.0%
Bitcode/si..._ops_test_op_blendvpd_254.test        0.01      0.00 -59.3%
Bitcode/si...simd_ops_test_op_maxps_75.test        0.01      0.00 -65.5%
MicroBench.../Builtins/Int128/Builtins.test        0.00      0.00
MicroBench...sion/AnisotropicDiffusion.test        0.00      0.00
MicroBench...Filtering/BilateralFilter.test        0.00      0.00
MicroBench...ImageProcessing/Blur/blur.test        0.00      0.00
MicroBench...eProcessing/Dilate/Dilate.test        0.00      0.00
MicroBench...eProcessing/Dither/Dither.test        0.00      0.00
MicroBench...terpolation/Interpolation.test        0.00      0.00
MicroBench...ALambdaLoops/lcalsALambda.test        0.00      0.00
MicroBench...SubsetARawLoops/lcalsARaw.test        0.00      0.00
MicroBench...BLambdaLoops/lcalsBLambda.test        0.00      0.00
MicroBench...SubsetBRawLoops/lcalsBRaw.test        0.00      0.00
MicroBench...CLambdaLoops/lcalsCLambda.test        0.00      0.00
MicroBench...SubsetCRawLoops/lcalsCRaw.test        0.00      0.00
MicroBench...terchange/LoopInterchange.test        0.00      0.00
MicroBench...oopInterleavingBenchmarks.test        0.00      0.00
MicroBench...opVectorizationBenchmarks.test        0.00      0.00
MicroBench...MemFunctions/MemFunctions.test        0.00      0.00
MicroBench...LPVectorizationBenchmarks.test        0.00      0.00
MicroBenchmarks/harris/harris.test                 0.00      0.00
                           Geomean difference                      -0.9%
           exec_time
l/r              lhs            rhs         diff
count  1264.000000    1264.000000    1245.000000
mean   1772.119900    1764.943251    0.000597
std    22786.398852   22753.813335   0.146146
min    0.000000       0.000000      -0.655440
25%    0.568802       0.561713      -0.015990
50%    3.846095       3.857900      -0.006242
75%    109.962397     109.045701     0.001942
max    579910.865178  580187.226446  1.515248

@artagnon artagnon requested review from asb and topperc August 2, 2024 14:12
@artagnon
Contributor Author

artagnon commented Aug 2, 2024

I've updated the CodeGen tests for the RISC-V target as an example to help with the review, and unless I'm mistaken, I do see improvements mixed with changes that have no impact.

@artagnon artagnon requested a review from preames August 2, 2024 14:23
@artagnon
Contributor Author

artagnon commented Aug 6, 2024

Gentle ping. I think the test changes can be summarized as:

  • ALU operations like add, sub, and shift have a latency of 1: therefore, it is fine to schedule a use of the result immediately after.
  • Load operations have non-trivial latency: therefore, instructions have been re-ordered to avoid stalling after a load.
  • Vector operations have non-trivial latency: again, instructions have been re-ordered to avoid waiting for the result.
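
To make the load case above concrete, here is a toy sketch of a single-issue, in-order pipeline. It is not LLVM code: `ToyInstr` and `totalCycles` are hypothetical names invented for illustration, and the latencies in the usage note are made up.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical toy model, not an LLVM data structure: each instruction has a
// result latency and optionally consumes the result of one earlier instruction.
struct ToyInstr {
  unsigned Latency; // cycles from issue until the result is available
  int DependsOn;    // index of the producer in the sequence, or -1 for none
};

// Assumes a single-issue, in-order machine: an instruction issues no earlier
// than one cycle after its predecessor, and no earlier than the cycle in which
// its producer's result becomes available.
unsigned totalCycles(const std::vector<ToyInstr> &Seq) {
  std::vector<unsigned> IssueCycle(Seq.size(), 0);
  unsigned Finish = 0;
  for (std::size_t I = 0; I < Seq.size(); ++I) {
    unsigned Earliest = (I == 0) ? 0 : IssueCycle[I - 1] + 1;
    if (Seq[I].DependsOn >= 0) {
      std::size_t P = static_cast<std::size_t>(Seq[I].DependsOn);
      Earliest = std::max(Earliest, IssueCycle[P] + Seq[P].Latency);
    }
    IssueCycle[I] = Earliest;
    Finish = std::max(Finish, Earliest + Seq[I].Latency);
  }
  return Finish;
}
```

With an assumed 3-cycle load, the order load, use, independent add (`{{3, -1}, {1, 0}, {1, -1}}`) takes 5 cycles in this model, while load, independent add, use (`{{3, -1}, {1, -1}, {1, 0}}`) takes 4, because the independent add fills part of the load's latency. A chain of two dependent 1-cycle ALU operations takes 2 cycles with no stall.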

@artagnon artagnon requested a review from RKSimon August 7, 2024 10:57
@artagnon
Contributor Author

artagnon commented Aug 7, 2024

I've updated the X86 tests now, and I think the patch is ready for review.

@artagnon artagnon marked this pull request as ready for review August 7, 2024 10:57
@artagnon artagnon removed request for asb and preames August 7, 2024 10:58
@llvmbot
Member

llvmbot commented Aug 7, 2024

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-globalisel

Author: Ramkumar Ramachandra (artagnon)

Changes

TargetSchedModel::computeOperandLatency is supposed to return the exact latency between two MIs, although it is observed that InstrSchedModel and InstrItineraries are often unavailable in many real-world scenarios. When these two pieces of information are not available, the function returns an estimate that is much too conservative: the default def latency. MachineTraceMetrics is one of the callers affected quite badly by these conservative estimates. To improve the estimate, and let callers of MTM generate better code, offset the default def latency by the estimated cycles elapsed between the def MI and use MI. Since we're trying to improve codegen in the case when no scheduling information is available, it is impossible to determine the number of cycles elapsed between the two MIs, and we use the distance between them accounting for issue-width as a crude approximation. In practice, this improvement of one crude estimate by offsetting it with another crude estimate leads to better codegen on average, and yields non-trivial gains on standard benchmarks.


Patch is 281.95 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/101389.diff

77 Files Affected:

  • (modified) llvm/lib/CodeGen/MachineTraceMetrics.cpp (+67-9)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/bitmanip.ll (+22-22)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/vararg.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/addcarry.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/atomicrmw-uinc-udec-wrap.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-ilp32-ilp32f-common.ll (+10-10)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-ilp32-ilp32f-ilp32d-common.ll (+14-14)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-ilp32e.ll (+28-28)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-lp64-lp64f-lp64d-common.ll (+7-7)
  • (modified) llvm/test/CodeGen/RISCV/compress.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/copysign-casts.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/div-pow2.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/float-intrinsics.ll (+10-10)
  • (modified) llvm/test/CodeGen/RISCV/iabs.ll (+20-20)
  • (modified) llvm/test/CodeGen/RISCV/machine-combiner.mir (+5-4)
  • (modified) llvm/test/CodeGen/RISCV/misched-load-clustering.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/mul.ll (+25-25)
  • (modified) llvm/test/CodeGen/RISCV/neg-abs.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/reduction-formation.ll (+42-42)
  • (modified) llvm/test/CodeGen/RISCV/rv32e.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rv32zba.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rv32zbb.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rv64e.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rv64zba.ll (+13-11)
  • (modified) llvm/test/CodeGen/RISCV/rvv/compressstore.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp2i.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-explodevector.ll (+117-117)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-formation.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-store.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-combine.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll (+13-13)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-reassociations.ll (+7-7)
  • (modified) llvm/test/CodeGen/RISCV/split-udiv-by-constant.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/srem-lkk.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/srem-seteq-illegal-types.ll (+21-21)
  • (modified) llvm/test/CodeGen/RISCV/srem-vector-lkk.ll (+33-33)
  • (modified) llvm/test/CodeGen/RISCV/urem-lkk.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/urem-seteq-illegal-types.ll (+14-14)
  • (modified) llvm/test/CodeGen/RISCV/urem-vector-lkk.ll (+19-19)
  • (modified) llvm/test/CodeGen/RISCV/xaluo.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/xtheadmac.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/xtheadmemidx.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/avx512bw-intrinsics-upgrade.ll (+6-6)
  • (modified) llvm/test/CodeGen/X86/bitcast-and-setcc-256.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/div-rem-pair-recomposition-signed.ll (+6-6)
  • (modified) llvm/test/CodeGen/X86/div-rem-pair-recomposition-unsigned.ll (+194-197)
  • (modified) llvm/test/CodeGen/X86/early-ifcvt-remarks.ll (+25-25)
  • (modified) llvm/test/CodeGen/X86/fold-tied-op.ll (+33-33)
  • (modified) llvm/test/CodeGen/X86/horizontal-sum.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/is_fpclass.ll (+31-59)
  • (modified) llvm/test/CodeGen/X86/lea-opt-cse4.ll (+10-8)
  • (modified) llvm/test/CodeGen/X86/machine-cp.ll (+35-35)
  • (modified) llvm/test/CodeGen/X86/madd.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/masked_gather_scatter.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/memcmp-more-load-pairs-x32.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/memcmp-more-load-pairs.ll (+16-16)
  • (modified) llvm/test/CodeGen/X86/midpoint-int-vec-256.ll (+82-82)
  • (modified) llvm/test/CodeGen/X86/mul-constant-result.ll (+75-80)
  • (modified) llvm/test/CodeGen/X86/mul-i512.ll (+19-19)
  • (modified) llvm/test/CodeGen/X86/mul64.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/pr62653.ll (+43-43)
  • (modified) llvm/test/CodeGen/X86/rotate-multi.ll (+12-12)
  • (modified) llvm/test/CodeGen/X86/sad.ll (+27-26)
  • (modified) llvm/test/CodeGen/X86/sext-vsetcc.ll (+35-34)
  • (modified) llvm/test/CodeGen/X86/smul_fix.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/statepoint-live-in.ll (+27-27)
  • (modified) llvm/test/CodeGen/X86/statepoint-regs.ll (+27-27)
  • (modified) llvm/test/CodeGen/X86/ucmp.ll (+148-148)
  • (modified) llvm/test/CodeGen/X86/umul-with-overflow.ll (+11-12)
  • (modified) llvm/test/CodeGen/X86/umul_fix.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/umulo-128-legalisation-lowering.ll (+25-28)
  • (modified) llvm/test/CodeGen/X86/v8i1-masks.ll (+72-72)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-5.ll (+3-3)
  • (modified) llvm/test/CodeGen/X86/vector-reduce-or-cmp.ll (+4-4)
  • (modified) llvm/test/CodeGen/X86/win-smallparams.ll (+16-16)
  • (modified) llvm/test/CodeGen/X86/x86-interleaved-access.ll (+12-12)
  • (modified) llvm/test/CodeGen/X86/xmulo.ll (+19-19)
diff --git a/llvm/lib/CodeGen/MachineTraceMetrics.cpp b/llvm/lib/CodeGen/MachineTraceMetrics.cpp
index bf3add010574b8..c3afba23628130 100644
--- a/llvm/lib/CodeGen/MachineTraceMetrics.cpp
+++ b/llvm/lib/CodeGen/MachineTraceMetrics.cpp
@@ -20,6 +20,7 @@
 #include "llvm/CodeGen/MachineLoopInfo.h"
 #include "llvm/CodeGen/MachineOperand.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/TargetInstrInfo.h"
 #include "llvm/CodeGen/TargetRegisterInfo.h"
 #include "llvm/CodeGen/TargetSchedule.h"
 #include "llvm/CodeGen/TargetSubtargetInfo.h"
@@ -761,6 +762,64 @@ static void updatePhysDepsDownwards(const MachineInstr *UseMI,
   }
 }
 
+/// Estimates the number of cycles elapsed between DefMI and UseMI if they're
+/// non-null and in the same BasicBlock. Returns std::nullopt when UseMI is in a
+/// different MBB than DefMI, or when it is a dangling MI.
+static std::optional<unsigned>
+estimateDefUseCycles(const TargetSchedModel &Sched, const MachineInstr *DefMI,
+                     const MachineInstr *UseMI) {
+  if (!DefMI || !UseMI || DefMI == UseMI)
+    return 0;
+  const MachineBasicBlock *ParentBB = DefMI->getParent();
+  if (ParentBB != UseMI->getParent())
+    return std::nullopt;
+
+  const auto DefIt =
+      llvm::find_if(ParentBB->instrs(),
+                    [DefMI](const MachineInstr &MI) { return DefMI == &MI; });
+  const auto UseIt =
+      llvm::find_if(ParentBB->instrs(),
+                    [UseMI](const MachineInstr &MI) { return UseMI == &MI; });
+  assert(std::distance(DefIt, UseIt) > 0 &&
+         "Def expected to appear before use");
+  unsigned NumMicroOps = 0;
+  for (auto It = DefIt; It != UseIt; ++It) {
+    // In some cases, UseMI is a dangling MI beyond the end of the MBB.
+    if (It.isEnd())
+      return std::nullopt;
+
+    NumMicroOps += Sched.getNumMicroOps(&*It);
+  }
+  return NumMicroOps / Sched.getIssueWidth();
+}
+
+/// Wraps Sched.computeOperandLatency, accounting for the case when
+/// InstrSchedModel and InstrItineraries are not available: in this case,
+/// Sched.computeOperandLatency returns DefaultDefLatency, which is a very rough
+/// approximate; to improve this approximate, offset it by the approximate
+/// cycles elapsed from DefMI to UseMI (since the MIs could be re-ordered by the
+/// scheduler, and we don't have this information, this cannot be known
+/// exactly). When scheduling information is available,
+/// Sched.computeOperandLatency returns a much better estimate (especially if
+/// UseMI is non-null), so we just return that.
+static unsigned computeOperandLatency(const TargetSchedModel &Sched,
+                                      const MachineInstr *DefMI,
+                                      unsigned DefOperIdx,
+                                      const MachineInstr *UseMI,
+                                      unsigned UseOperIdx) {
+  assert(DefMI && "Non-null DefMI expected");
+  if (!Sched.hasInstrSchedModel() && !Sched.hasInstrItineraries()) {
+    unsigned DefaultDefLatency = Sched.getInstrInfo()->defaultDefLatency(
+        *Sched.getMCSchedModel(), *DefMI);
+    std::optional<unsigned> DefUseCycles =
+        estimateDefUseCycles(Sched, DefMI, UseMI);
+    if (!DefUseCycles || DefaultDefLatency <= DefUseCycles)
+      return 0;
+    return DefaultDefLatency - *DefUseCycles;
+  }
+  return Sched.computeOperandLatency(DefMI, DefOperIdx, UseMI, UseOperIdx);
+}
+
 /// The length of the critical path through a trace is the maximum of two path
 /// lengths:
 ///
@@ -813,8 +872,8 @@ updateDepth(MachineTraceMetrics::TraceBlockInfo &TBI, const MachineInstr &UseMI,
     unsigned DepCycle = Cycles.lookup(Dep.DefMI).Depth;
     // Add latency if DefMI is a real instruction. Transients get latency 0.
     if (!Dep.DefMI->isTransient())
-      DepCycle += MTM.SchedModel
-        .computeOperandLatency(Dep.DefMI, Dep.DefOp, &UseMI, Dep.UseOp);
+      DepCycle += computeOperandLatency(MTM.SchedModel, Dep.DefMI, Dep.DefOp,
+                                        &UseMI, Dep.UseOp);
     Cycle = std::max(Cycle, DepCycle);
   }
   // Remember the instruction depth.
@@ -929,8 +988,8 @@ static unsigned updatePhysDepsUpwards(const MachineInstr &MI, unsigned Height,
       if (!MI.isTransient()) {
         // We may not know the UseMI of this dependency, if it came from the
         // live-in list. SchedModel can handle a NULL UseMI.
-        DepHeight += SchedModel.computeOperandLatency(&MI, MO.getOperandNo(),
-                                                      I->MI, I->Op);
+        DepHeight += computeOperandLatency(SchedModel, &MI, MO.getOperandNo(),
+                                           I->MI, I->Op);
       }
       Height = std::max(Height, DepHeight);
       // This regunit is dead above MI.
@@ -963,10 +1022,9 @@ static bool pushDepHeight(const DataDep &Dep, const MachineInstr &UseMI,
                           unsigned UseHeight, MIHeightMap &Heights,
                           const TargetSchedModel &SchedModel,
                           const TargetInstrInfo *TII) {
-  // Adjust height by Dep.DefMI latency.
   if (!Dep.DefMI->isTransient())
-    UseHeight += SchedModel.computeOperandLatency(Dep.DefMI, Dep.DefOp, &UseMI,
-                                                  Dep.UseOp);
+    UseHeight += computeOperandLatency(SchedModel, Dep.DefMI, Dep.DefOp, &UseMI,
+                                       Dep.UseOp);
 
   // Update Heights[DefMI] to be the maximum height seen.
   MIHeightMap::iterator I;
@@ -1192,8 +1250,8 @@ MachineTraceMetrics::Trace::getPHIDepth(const MachineInstr &PHI) const {
   unsigned DepCycle = getInstrCycles(*Dep.DefMI).Depth;
   // Add latency if DefMI is a real instruction. Transients get latency 0.
   if (!Dep.DefMI->isTransient())
-    DepCycle += TE.MTM.SchedModel.computeOperandLatency(Dep.DefMI, Dep.DefOp,
-                                                        &PHI, Dep.UseOp);
+    DepCycle += computeOperandLatency(TE.MTM.SchedModel, Dep.DefMI, Dep.DefOp,
+                                      &PHI, Dep.UseOp);
   return DepCycle;
 }
 
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/bitmanip.ll b/llvm/test/CodeGen/RISCV/GlobalISel/bitmanip.ll
index 5c42fefb95b39f..69261126cd8b0e 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/bitmanip.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/bitmanip.ll
@@ -94,15 +94,15 @@ define i7 @bitreverse_i7(i7 %x) {
 ; RV32-NEXT:    or a1, a1, a2
 ; RV32-NEXT:    slli a2, a0, 2
 ; RV32-NEXT:    andi a2, a2, 16
+; RV32-NEXT:    or a1, a1, a2
 ; RV32-NEXT:    andi a0, a0, 127
-; RV32-NEXT:    andi a3, a0, 8
-; RV32-NEXT:    or a2, a2, a3
+; RV32-NEXT:    andi a2, a0, 8
 ; RV32-NEXT:    or a1, a1, a2
 ; RV32-NEXT:    srli a2, a0, 2
 ; RV32-NEXT:    andi a2, a2, 4
-; RV32-NEXT:    srli a3, a0, 4
-; RV32-NEXT:    andi a3, a3, 2
-; RV32-NEXT:    or a2, a2, a3
+; RV32-NEXT:    or a1, a1, a2
+; RV32-NEXT:    srli a2, a0, 4
+; RV32-NEXT:    andi a2, a2, 2
 ; RV32-NEXT:    or a1, a1, a2
 ; RV32-NEXT:    srli a0, a0, 6
 ; RV32-NEXT:    or a0, a1, a0
@@ -117,15 +117,15 @@ define i7 @bitreverse_i7(i7 %x) {
 ; RV64-NEXT:    or a1, a1, a2
 ; RV64-NEXT:    slli a2, a0, 2
 ; RV64-NEXT:    andi a2, a2, 16
+; RV64-NEXT:    or a1, a1, a2
 ; RV64-NEXT:    andi a0, a0, 127
-; RV64-NEXT:    andi a3, a0, 8
-; RV64-NEXT:    or a2, a2, a3
+; RV64-NEXT:    andi a2, a0, 8
 ; RV64-NEXT:    or a1, a1, a2
 ; RV64-NEXT:    srliw a2, a0, 2
 ; RV64-NEXT:    andi a2, a2, 4
-; RV64-NEXT:    srliw a3, a0, 4
-; RV64-NEXT:    andi a3, a3, 2
-; RV64-NEXT:    or a2, a2, a3
+; RV64-NEXT:    or a1, a1, a2
+; RV64-NEXT:    srliw a2, a0, 4
+; RV64-NEXT:    andi a2, a2, 2
 ; RV64-NEXT:    or a1, a1, a2
 ; RV64-NEXT:    srliw a0, a0, 6
 ; RV64-NEXT:    or a0, a1, a0
@@ -145,24 +145,24 @@ define i24 @bitreverse_i24(i24 %x) {
 ; RV32-NEXT:    or a0, a0, a1
 ; RV32-NEXT:    lui a1, 1048335
 ; RV32-NEXT:    addi a1, a1, 240
-; RV32-NEXT:    and a3, a1, a2
-; RV32-NEXT:    and a3, a0, a3
+; RV32-NEXT:    and a3, a0, a1
+; RV32-NEXT:    and a3, a3, a2
 ; RV32-NEXT:    srli a3, a3, 4
 ; RV32-NEXT:    slli a0, a0, 4
 ; RV32-NEXT:    and a0, a0, a1
 ; RV32-NEXT:    or a0, a3, a0
 ; RV32-NEXT:    lui a1, 1047757
 ; RV32-NEXT:    addi a1, a1, -820
-; RV32-NEXT:    and a3, a1, a2
-; RV32-NEXT:    and a3, a0, a3
+; RV32-NEXT:    and a3, a0, a1
+; RV32-NEXT:    and a3, a3, a2
 ; RV32-NEXT:    srli a3, a3, 2
 ; RV32-NEXT:    slli a0, a0, 2
 ; RV32-NEXT:    and a0, a0, a1
 ; RV32-NEXT:    or a0, a3, a0
 ; RV32-NEXT:    lui a1, 1047211
 ; RV32-NEXT:    addi a1, a1, -1366
-; RV32-NEXT:    and a2, a1, a2
-; RV32-NEXT:    and a2, a0, a2
+; RV32-NEXT:    and a3, a0, a1
+; RV32-NEXT:    and a2, a3, a2
 ; RV32-NEXT:    srli a2, a2, 1
 ; RV32-NEXT:    slli a0, a0, 1
 ; RV32-NEXT:    and a0, a0, a1
@@ -179,24 +179,24 @@ define i24 @bitreverse_i24(i24 %x) {
 ; RV64-NEXT:    or a0, a0, a1
 ; RV64-NEXT:    lui a1, 1048335
 ; RV64-NEXT:    addi a1, a1, 240
-; RV64-NEXT:    and a3, a1, a2
-; RV64-NEXT:    and a3, a0, a3
+; RV64-NEXT:    and a3, a0, a1
+; RV64-NEXT:    and a3, a3, a2
 ; RV64-NEXT:    srliw a3, a3, 4
 ; RV64-NEXT:    slli a0, a0, 4
 ; RV64-NEXT:    and a0, a0, a1
 ; RV64-NEXT:    or a0, a3, a0
 ; RV64-NEXT:    lui a1, 1047757
 ; RV64-NEXT:    addi a1, a1, -820
-; RV64-NEXT:    and a3, a1, a2
-; RV64-NEXT:    and a3, a0, a3
+; RV64-NEXT:    and a3, a0, a1
+; RV64-NEXT:    and a3, a3, a2
 ; RV64-NEXT:    srliw a3, a3, 2
 ; RV64-NEXT:    slli a0, a0, 2
 ; RV64-NEXT:    and a0, a0, a1
 ; RV64-NEXT:    or a0, a3, a0
 ; RV64-NEXT:    lui a1, 1047211
 ; RV64-NEXT:    addiw a1, a1, -1366
-; RV64-NEXT:    and a2, a1, a2
-; RV64-NEXT:    and a2, a0, a2
+; RV64-NEXT:    and a3, a0, a1
+; RV64-NEXT:    and a2, a3, a2
 ; RV64-NEXT:    srliw a2, a2, 1
 ; RV64-NEXT:    slliw a0, a0, 1
 ; RV64-NEXT:    and a0, a0, a1
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/vararg.ll b/llvm/test/CodeGen/RISCV/GlobalISel/vararg.ll
index d55adf371119b5..5723c4b9197a6a 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/vararg.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/vararg.ll
@@ -1266,8 +1266,8 @@ define i32 @va4_va_copy(i32 %argno, ...) nounwind {
 ; RV32-NEXT:    sw a3, 4(sp)
 ; RV32-NEXT:    lw a2, 0(a2)
 ; RV32-NEXT:    add a0, a0, s0
-; RV32-NEXT:    add a1, a1, a2
 ; RV32-NEXT:    add a0, a0, a1
+; RV32-NEXT:    add a0, a0, a2
 ; RV32-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
 ; RV32-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
 ; RV32-NEXT:    addi sp, sp, 48
@@ -1319,8 +1319,8 @@ define i32 @va4_va_copy(i32 %argno, ...) nounwind {
 ; RV64-NEXT:    sd a3, 8(sp)
 ; RV64-NEXT:    lw a2, 0(a2)
 ; RV64-NEXT:    add a0, a0, s0
-; RV64-NEXT:    add a1, a1, a2
-; RV64-NEXT:    addw a0, a0, a1
+; RV64-NEXT:    add a0, a0, a1
+; RV64-NEXT:    addw a0, a0, a2
 ; RV64-NEXT:    ld ra, 24(sp) # 8-byte Folded Reload
 ; RV64-NEXT:    ld s0, 16(sp) # 8-byte Folded Reload
 ; RV64-NEXT:    addi sp, sp, 96
@@ -1371,8 +1371,8 @@ define i32 @va4_va_copy(i32 %argno, ...) nounwind {
 ; RV32-WITHFP-NEXT:    sw a3, -16(s0)
 ; RV32-WITHFP-NEXT:    lw a2, 0(a2)
 ; RV32-WITHFP-NEXT:    add a0, a0, s1
-; RV32-WITHFP-NEXT:    add a1, a1, a2
 ; RV32-WITHFP-NEXT:    add a0, a0, a1
+; RV32-WITHFP-NEXT:    add a0, a0, a2
 ; RV32-WITHFP-NEXT:    lw ra, 28(sp) # 4-byte Folded Reload
 ; RV32-WITHFP-NEXT:    lw s0, 24(sp) # 4-byte Folded Reload
 ; RV32-WITHFP-NEXT:    lw s1, 20(sp) # 4-byte Folded Reload
@@ -1427,8 +1427,8 @@ define i32 @va4_va_copy(i32 %argno, ...) nounwind {
 ; RV64-WITHFP-NEXT:    sd a3, -32(s0)
 ; RV64-WITHFP-NEXT:    lw a2, 0(a2)
 ; RV64-WITHFP-NEXT:    add a0, a0, s1
-; RV64-WITHFP-NEXT:    add a1, a1, a2
-; RV64-WITHFP-NEXT:    addw a0, a0, a1
+; RV64-WITHFP-NEXT:    add a0, a0, a1
+; RV64-WITHFP-NEXT:    addw a0, a0, a2
 ; RV64-WITHFP-NEXT:    ld ra, 40(sp) # 8-byte Folded Reload
 ; RV64-WITHFP-NEXT:    ld s0, 32(sp) # 8-byte Folded Reload
 ; RV64-WITHFP-NEXT:    ld s1, 24(sp) # 8-byte Folded Reload
diff --git a/llvm/test/CodeGen/RISCV/addcarry.ll b/llvm/test/CodeGen/RISCV/addcarry.ll
index 3a4163a8bb50f9..053b98755417b2 100644
--- a/llvm/test/CodeGen/RISCV/addcarry.ll
+++ b/llvm/test/CodeGen/RISCV/addcarry.ll
@@ -18,9 +18,9 @@ define i64 @addcarry(i64 %x, i64 %y) nounwind {
 ; RISCV32-NEXT:    sltu a7, a4, a6
 ; RISCV32-NEXT:    sltu a5, a6, a5
 ; RISCV32-NEXT:    mulhu a6, a0, a3
-; RISCV32-NEXT:    mulhu t0, a1, a2
-; RISCV32-NEXT:    add a6, a6, t0
 ; RISCV32-NEXT:    add a5, a6, a5
+; RISCV32-NEXT:    mulhu a6, a1, a2
+; RISCV32-NEXT:    add a5, a5, a6
 ; RISCV32-NEXT:    add a5, a5, a7
 ; RISCV32-NEXT:    mul a6, a1, a3
 ; RISCV32-NEXT:    add a5, a5, a6
diff --git a/llvm/test/CodeGen/RISCV/atomicrmw-uinc-udec-wrap.ll b/llvm/test/CodeGen/RISCV/atomicrmw-uinc-udec-wrap.ll
index 634ed45044ee21..672625c182d0b5 100644
--- a/llvm/test/CodeGen/RISCV/atomicrmw-uinc-udec-wrap.ll
+++ b/llvm/test/CodeGen/RISCV/atomicrmw-uinc-udec-wrap.ll
@@ -227,8 +227,8 @@ define i16 @atomicrmw_uinc_wrap_i16(ptr %ptr, i16 %val) {
 ; RV32IA-NEXT:    addi a5, a5, 1
 ; RV32IA-NEXT:    sltu a7, a7, a1
 ; RV32IA-NEXT:    neg a7, a7
-; RV32IA-NEXT:    and a5, a5, a3
 ; RV32IA-NEXT:    and a5, a7, a5
+; RV32IA-NEXT:    and a5, a5, a3
 ; RV32IA-NEXT:    sll a5, a5, a0
 ; RV32IA-NEXT:    and a7, a6, a4
 ; RV32IA-NEXT:    or a7, a7, a5
@@ -307,8 +307,8 @@ define i16 @atomicrmw_uinc_wrap_i16(ptr %ptr, i16 %val) {
 ; RV64IA-NEXT:    addi a6, a6, 1
 ; RV64IA-NEXT:    sltu t0, t0, a1
 ; RV64IA-NEXT:    negw t0, t0
-; RV64IA-NEXT:    and a6, a6, a3
 ; RV64IA-NEXT:    and a6, t0, a6
+; RV64IA-NEXT:    and a6, a6, a3
 ; RV64IA-NEXT:    sllw a6, a6, a0
 ; RV64IA-NEXT:    and a4, a4, a5
 ; RV64IA-NEXT:    or a6, a4, a6
diff --git a/llvm/test/CodeGen/RISCV/calling-conv-ilp32-ilp32f-common.ll b/llvm/test/CodeGen/RISCV/calling-conv-ilp32-ilp32f-common.ll
index 278187f62cd75e..8bcdb059a95fbc 100644
--- a/llvm/test/CodeGen/RISCV/calling-conv-ilp32-ilp32f-common.ll
+++ b/llvm/test/CodeGen/RISCV/calling-conv-ilp32-ilp32f-common.ll
@@ -94,15 +94,15 @@ define i32 @callee_aligned_stack(i32 %a, i32 %b, fp128 %c, i32 %d, i32 %e, i64 %
 ; RV32I-FPELIM-LABEL: callee_aligned_stack:
 ; RV32I-FPELIM:       # %bb.0:
 ; RV32I-FPELIM-NEXT:    lw a0, 0(a2)
-; RV32I-FPELIM-NEXT:    lw a1, 8(sp)
+; RV32I-FPELIM-NEXT:    lw a1, 20(sp)
 ; RV32I-FPELIM-NEXT:    lw a2, 0(sp)
-; RV32I-FPELIM-NEXT:    lw a3, 20(sp)
+; RV32I-FPELIM-NEXT:    lw a3, 8(sp)
 ; RV32I-FPELIM-NEXT:    lw a4, 16(sp)
 ; RV32I-FPELIM-NEXT:    add a0, a0, a7
-; RV32I-FPELIM-NEXT:    add a1, a2, a1
-; RV32I-FPELIM-NEXT:    add a0, a0, a1
-; RV32I-FPELIM-NEXT:    add a3, a4, a3
+; RV32I-FPELIM-NEXT:    add a0, a0, a2
 ; RV32I-FPELIM-NEXT:    add a0, a0, a3
+; RV32I-FPELIM-NEXT:    add a0, a0, a4
+; RV32I-FPELIM-NEXT:    add a0, a0, a1
 ; RV32I-FPELIM-NEXT:    ret
 ;
 ; RV32I-WITHFP-LABEL: callee_aligned_stack:
@@ -112,15 +112,15 @@ define i32 @callee_aligned_stack(i32 %a, i32 %b, fp128 %c, i32 %d, i32 %e, i64 %
 ; RV32I-WITHFP-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
 ; RV32I-WITHFP-NEXT:    addi s0, sp, 16
 ; RV32I-WITHFP-NEXT:    lw a0, 0(a2)
-; RV32I-WITHFP-NEXT:    lw a1, 8(s0)
+; RV32I-WITHFP-NEXT:    lw a1, 20(s0)
 ; RV32I-WITHFP-NEXT:    lw a2, 0(s0)
-; RV32I-WITHFP-NEXT:    lw a3, 20(s0)
+; RV32I-WITHFP-NEXT:    lw a3, 8(s0)
 ; RV32I-WITHFP-NEXT:    lw a4, 16(s0)
 ; RV32I-WITHFP-NEXT:    add a0, a0, a7
-; RV32I-WITHFP-NEXT:    add a1, a2, a1
-; RV32I-WITHFP-NEXT:    add a0, a0, a1
-; RV32I-WITHFP-NEXT:    add a3, a4, a3
+; RV32I-WITHFP-NEXT:    add a0, a0, a2
 ; RV32I-WITHFP-NEXT:    add a0, a0, a3
+; RV32I-WITHFP-NEXT:    add a0, a0, a4
+; RV32I-WITHFP-NEXT:    add a0, a0, a1
 ; RV32I-WITHFP-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
 ; RV32I-WITHFP-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
 ; RV32I-WITHFP-NEXT:    addi sp, sp, 16
diff --git a/llvm/test/CodeGen/RISCV/calling-conv-ilp32-ilp32f-ilp32d-common.ll b/llvm/test/CodeGen/RISCV/calling-conv-ilp32-ilp32f-ilp32d-common.ll
index 231ed159ab2061..4906cc8eb73a53 100644
--- a/llvm/test/CodeGen/RISCV/calling-conv-ilp32-ilp32f-ilp32d-common.ll
+++ b/llvm/test/CodeGen/RISCV/calling-conv-ilp32-ilp32f-ilp32d-common.ll
@@ -87,16 +87,16 @@ define i32 @callee_many_scalars(i8 %a, i16 %b, i32 %c, i64 %d, i32 %e, i32 %f, i
 ; RV32I-FPELIM-NEXT:    andi a0, a0, 255
 ; RV32I-FPELIM-NEXT:    slli a1, a1, 16
 ; RV32I-FPELIM-NEXT:    srli a1, a1, 16
-; RV32I-FPELIM-NEXT:    add a0, a0, a2
 ; RV32I-FPELIM-NEXT:    add a0, a0, a1
+; RV32I-FPELIM-NEXT:    add a0, a0, a2
 ; RV32I-FPELIM-NEXT:    xor a1, a4, t1
 ; RV32I-FPELIM-NEXT:    xor a2, a3, a7
 ; RV32I-FPELIM-NEXT:    or a1, a2, a1
 ; RV32I-FPELIM-NEXT:    seqz a1, a1
+; RV32I-FPELIM-NEXT:    add a0, a1, a0
 ; RV32I-FPELIM-NEXT:    add a0, a0, a5
 ; RV32I-FPELIM-NEXT:    add a0, a0, a6
 ; RV32I-FPELIM-NEXT:    add a0, a0, t0
-; RV32I-FPELIM-NEXT:    add a0, a1, a0
 ; RV32I-FPELIM-NEXT:    ret
 ;
 ; RV32I-WITHFP-LABEL: callee_many_scalars:
@@ -110,16 +110,16 @@ define i32 @callee_many_scalars(i8 %a, i16 %b, i32 %c, i64 %d, i32 %e, i32 %f, i
 ; RV32I-WITHFP-NEXT:    andi a0, a0, 255
 ; RV32I-WITHFP-NEXT:    slli a1, a1, 16
 ; RV32I-WITHFP-NEXT:    srli a1, a1, 16
-; RV32I-WITHFP-NEXT:    add a0, a0, a2
 ; RV32I-WITHFP-NEXT:    add a0, a0, a1
+; RV32I-WITHFP-NEXT:    add a0, a0, a2
 ; RV32I-WITHFP-NEXT:    xor a1, a4, t1
 ; RV32I-WITHFP-NEXT:    xor a2, a3, a7
 ; RV32I-WITHFP-NEXT:    or a1, a2, a1
 ; RV32I-WITHFP-NEXT:    seqz a1, a1
+; RV32I-WITHFP-NEXT:    add a0, a1, a0
 ; RV32I-WITHFP-NEXT:    add a0, a0, a5
 ; RV32I-WITHFP-NEXT:    add a0, a0, a6
 ; RV32I-WITHFP-NEXT:    add a0, a0, t0
-; RV32I-WITHFP-NEXT:    add a0, a1, a0
 ; RV32I-WITHFP-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
 ; RV32I-WITHFP-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
 ; RV32I-WITHFP-NEXT:    addi sp, sp, 16
@@ -614,15 +614,15 @@ define i32 @callee_aligned_stack(i32 %a, i32 %b, fp128 %c, i32 %d, i32 %e, i64 %
 ; RV32I-FPELIM-LABEL: callee_aligned_stack:
 ; RV32I-FPELIM:       # %bb.0:
 ; RV32I-FPELIM-NEXT:    lw a0, 0(a2)
-; RV32I-FPELIM-NEXT:    lw a1, 8(sp)
+; RV32I-FPELIM-NEXT:    lw a1, 20(sp)
 ; RV32I-FPELIM-NEXT:    lw a2, 0(sp)
-; RV32I-FPELIM-NEXT:    lw a3, 20(sp)
+; RV32I-FPELIM-NEXT:    lw a3, 8(sp)
 ; RV32I-FPELIM-NEXT:    lw a4, 16(sp)
 ; RV32I-FPELIM-NEXT:    add a0, a0, a7
-; RV32I-FPELIM-NEXT:    add a1, a2, a1
-; RV32I-FPELIM-NEXT:    add a0, a0, a1
-; RV32I-FPELIM-NEXT:    add a3, a4, a3
+; RV32I-FPELIM-NEXT:    add a0, a0, a2
 ; RV32I-FPELIM-NEXT:    add a0, a0, a3
+; RV32I-FPELIM-NEXT:    add a0, a0, a4
+; RV32I-FPELIM-NEXT:    add a0, a0, a1
 ; RV32I-FPELIM-NEXT:    ret
 ;
 ; RV32I-WITHFP-LABEL: callee_aligned_stack:
@@ -632,15 +632,15 @@ define i32 @callee_aligned_stack(i32 %a, i32 %b, fp128 %c, i32 %d, i32 %e, i64 %
 ; RV32I-WITHFP-NEXT:    sw s0, 8(sp) # 4-byte Folded Spill
 ; RV32I-WITHFP-NEXT:    addi s0, sp, 16
 ; RV32I-WITHFP-NEXT:    lw a0, 0(a2)
-; RV32I-WITHFP-NEXT:    lw a1, 8(s0)
+; RV32I-WITHFP-NEXT:    lw a1, 20(s0)
 ; RV32I-WITHFP-NEXT:    lw a2, 0(s0)
-; RV32I-WITHFP-NEXT:    lw a3, 20(s0)
+; RV32I-WITHFP-NEXT:    lw a3, 8(s0)
 ; RV32I-WITHFP-NEXT:    lw a4, 16(s0)
 ; RV32I-WITHFP-NEXT:    add a0, a0, a7
-; RV32I-WITHFP-NEXT:    add a1, a2, a1
-; RV32I-WITHFP-NEXT:    add a0, a0, a1
-; RV32I-WITHFP-NEXT:    add a3, a4, a3
+; RV32I-WITHFP-NEXT:    add a0, a0, a2
 ; RV32I-WITHFP-NEXT:    add a0, a0, a3
+; RV32I-WITHFP-NEXT:    add a0, a0, a4
+; RV32I-WITHFP-NEXT:    add a0, a0, a1
 ; RV32I-WITHFP-NEXT:    lw ra, 12(sp) # 4-byte Folded Reload
 ; RV32I-WITHFP-NEXT:    lw s0, 8(sp) # 4-byte Folded Reload
 ; RV32I-WITHFP-NEXT:    addi sp, sp, 16
diff --git a/llvm/test/CodeGen/RISCV/calling-conv-ilp32e.ll b/llvm/test/CodeGen/RISCV/calling-conv-ilp32e.ll
index d08cf577b1bdd3..69691869997666 100644
--- a/llvm/test/CodeGen/RISCV/calling-conv-ilp32e.ll
+++ b/llvm/test/CodeGen/RISCV/calling-conv-ilp32e.ll
@@ -529,16 +529,16 @@ define i32 @callee_aligned_stack(i32 %a, i32 %b, fp128 %c, i32 %d, i32 %e, i64 %
 ; ILP32E-FPELIM-LABEL: callee_aligned_stack:
 ; ILP32E-FPELIM:       # %bb.0:
 ; ILP32E-FPELIM-NEXT:    lw a0, 0(a2)
-; ILP32E-FPELIM-NEXT:    lw a1, 12(sp)
+; ILP32E-FPELIM-NEXT:    lw a1, 24(sp)
 ; ILP32E-FPELIM-NEXT:    lw a2, 4(sp)
 ; ILP32E-FPELIM-NEXT:    lw a3, 8(sp)
-; ILP32E-FPELIM-NEXT:    lw a4, 24(sp)
+; ILP32E-FPELIM-NEXT:    lw a4, 12(sp)
 ; ILP32E-FPELIM-NEXT:    lw a5, 20(sp)
 ; ILP32E-FPELIM-NEXT:    add a0, a0, a2
-; ILP32E-FPELIM-NEXT:    add a1, a3, a1
-; ILP32E-FPELIM-NEXT:    add a0, a0, a1
-; ILP32E-FPELIM-NEXT:    add a4, a5, a4
+; ILP32E-FPELIM-NEXT:    add a0, a0, a3
 ; ILP32E-FP...
[truncated]

@artagnon artagnon requested a review from arsenm August 7, 2024 11:33
Contributor

I wonder if it would be useful to use defaultDefLatency for the instructions we are iterating over here, in case some of them depend on each other. Imagine a scenario:

defmi = ...
a = ...
b = use a
usemi = ...

In this case, what if we checked defaultDefLatency of a and used it to understand the number of cycles elapsed between a and b? For example, if a has a default latency of 10, then b can't really start in the next NumMicroOps / IssueWidth cycles, since it has to wait 10 additional cycles.

Contributor Author

Sorry, I'm having trouble understanding your example. Here's what I think I understand:

c = ...
a = ...
b = use a
d = use c

Here, if c has a default def latency of N cycles, a has that of M cycles, b can start after M - 1 cycles, and d can start after N - 3 cycles, assuming issue-width = num-micro-ops = 1. What do you mean by "instructions that depend on each other"? How can one instruction depend on another, which in turn depends on the first? Wouldn't this break basic dominance criteria? Also, if I understand correctly, I think we're in SSA form at this point, so we don't have to worry about re-definitions.
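
The arithmetic in this example can be checked with a small standalone model. This is only a sketch: `cyclesBetween` and `remainingLatency` are hypothetical helpers, not part of the patch, and they assume one micro-op per instruction and instructions identified by their position in the block (c = 0, a = 1, b = 2, d = 3).

```cpp
#include <cstddef>

// Hypothetical model: every instruction is exactly one micro-op, and the
// estimate counts the micro-ops issued in [DefPos, UsePos).
unsigned cyclesBetween(std::size_t DefPos, std::size_t UsePos,
                       unsigned IssueWidth) {
  unsigned NumMicroOps = static_cast<unsigned>(UsePos - DefPos);
  return NumMicroOps / IssueWidth;
}

// Latency still outstanding when the use is reached, clamped at zero.
unsigned remainingLatency(unsigned DefLatency, std::size_t DefPos,
                          std::size_t UsePos, unsigned IssueWidth) {
  unsigned Elapsed = cyclesBetween(DefPos, UsePos, IssueWidth);
  return DefLatency > Elapsed ? DefLatency - Elapsed : 0;
}
```

Taking N = 10 for c and M = 4 for a as illustrative values, `remainingLatency(10, 0, 3, 1)` gives 7 (N - 3) for the c-d pair and `remainingLatency(4, 1, 2, 1)` gives 3 (M - 1) for the a-b pair.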

Contributor

@michaelmaitland michaelmaitland Aug 8, 2024

We have a loop that iterates over instructions between DefMI and UseMI:

for (auto It = DefIt; It != UseIt; ++It) {
  // In cases where the UseMI is a PHI at the beginning of the MBB, compute
  // MicroOps until the end of the MBB.
  if (It.isEnd())
    break;

  NumMicroOps += Sched.getNumMicroOps(&*It);
}

I suggest the following scenario:

defmi = ...
a = ...
b = use a
usemi = ...

In this case, we will be looping over instructions a and b and adding their number of micro ops to calculate the number of cycles elapsed between defmi and usemi.

Let's assume that a has a default def latency of N cycles and b has a default def latency of M cycles.

What do you mean by "instructions that depend on each other"?

In my scenario, b uses the result of a. It cannot start until that result is ready (an extra N cycles). If we want to make it concrete, we could imagine that it looks like this:

a = add 3, 2
b = sub a, 2

We cannot start the subtraction until a is finished calculating. This is what I am calling a dependency. We can assume for the sake of simplicity here that a and b are independent of defmi and usemi.

In this scenario, I am suggesting that if the default latency of a is larger than the number of micro-ops, then we must wait at least the default latency of a before starting b. I suggest that we can incorporate this into the estimation.
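
One possible reading of this suggestion, as a rough standalone sketch: `Instr` and `inOrderIssueCycles` are hypothetical names, and the model assumes an in-order, single-issue machine, which is a simplification chosen for illustration rather than anything the patch or the thread specifies. Each instruction issues one slot after its predecessor, but never before the operand it depends on is ready.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical instruction record for this sketch only.
struct Instr {
  unsigned Latency;               // stand-in for defaultDefLatency
  std::optional<std::size_t> Dep; // index of an earlier instruction it uses
};

// Issue cycle of every instruction in the block, assuming in-order,
// single-issue execution. Precondition: each Dep index is smaller than the
// index of the instruction that names it.
std::vector<unsigned> inOrderIssueCycles(const std::vector<Instr> &Block) {
  std::vector<unsigned> Cycle(Block.size(), 0);
  for (std::size_t I = 1; I < Block.size(); ++I) {
    // Earliest slot: one cycle after the previous instruction issued.
    Cycle[I] = Cycle[I - 1] + 1;
    if (Block[I].Dep) {
      // Stall until the producer's result is available.
      std::size_t D = *Block[I].Dep;
      Cycle[I] = std::max(Cycle[I], Cycle[D] + Block[D].Latency);
    }
  }
  return Cycle;
}
```

For a block of defmi (latency 3), a (latency 10), b (uses a) and usemi (uses defmi), this model issues b at cycle 11 rather than cycle 2, and usemi at cycle 12, so far more cycles elapse between defmi and usemi than counting micro-ops alone would suggest.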

Contributor Author

Ah, thanks for the explanation! However, I think we're missing the fact that CPUs are usually pipelined, and the defmi-usemi dependency will be in one pipeline, while the a-b dependency will be in another pipeline. Hence, I think the defmi-usemi dependency should be independent of the a-b dependency. When the code is called with DefMI = a, and UseMI = b, it will return the correct answer for that dependency.

Now, we haven't actually modeled any pipelines, but do you think this is feasible?

Contributor

@michaelmaitland michaelmaitland Aug 8, 2024

It is a good point that if there are multiple pipelines and the defmi-usemi instructions go down one and the a-b instructions go down another, then what you have is correct. There definitely is also a scenario where these dependency pairs go down the same pipeline and in that case what I am suggesting is probably the better model.

Unfortunately we don't have any pipeline information because we don't have the scheduler model, so we don't actually know what we should do.

One argument is to keep what we have now because it's simple and less expensive to compute.
Another argument is to pick the "more common" approach. I'd prefer not to make a blanket statement and say that "it is more likely for independent instructions to go down different pipelines", although I wouldn't be surprised if this was the case.

For these reasons, I am content with the approach you are proposing. Happy to see if anyone else has thoughts on this.

Collaborator

I may have missed something but the use of microops is worrying me - many arch don't guarantee that uop and latency are a close match (e.g. alderlake divpd ymm uops=1 latency=15). And then dividing by issuewidth makes it feel more like a throughput estimate than latency?

Contributor Author

I see. Will alderlake divpd not dispatch in one cycle though? We're reducing DefaultDefLatency by the number of cycles elapsed between the DefMI and UseMI. If there's an alderlake divpd between the DefMI and UseMI, should we be subtracting 15 cycles for it?

Contributor

@RKSimon do you still have concerns about use of micro-ops and issue width here?

I think it makes sense to use issue width in determining "the number of cycles between two instructions" in the calculation here. For example:

a = def
b = ...
c = use

If the issue width is 1, then a is issued in one cycle, and b in the next. But if the issue width is 2, then a and b are issued in the same cycle. In the former case we should estimate 2 cycles between [a, c), and in the latter 1 cycle.
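
The arithmetic in the example above can be sketched as a small standalone function. This is only an illustration, not code from the patch: `estimateCyclesBetween` is a hypothetical name, and it assumes that every instruction is a single micro-op, that any instruction can issue in any slot, and that a partially filled issue group still costs a full cycle.

```cpp
#include <cassert>

// Hypothetical helper (not an LLVM API): estimate how many cycles it takes to
// issue the instructions in the half-open range [def, use).
// Assumptions: one micro-op per instruction, and any instruction can be
// issued in any of the IssueWidth slots.
unsigned estimateCyclesBetween(unsigned NumInstrsInRange, unsigned IssueWidth) {
  assert(IssueWidth > 0 && "issue width must be positive");
  // Round up: a partially filled issue group still takes a whole cycle.
  return (NumInstrsInRange + IssueWidth - 1) / IssueWidth;
}
```

For the a, b, c sequence, the range [a, c) holds two instructions, so the sketch gives 2 cycles at issue width 1 and 1 cycle at issue width 2.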

Collaborator

Doesn't this make the assumption that the ops can be issued on any pipe? I'm still not convinced your approach makes sense tbh.

Contributor Author

Since we don't have the scheduling model, we could either assume that all ops are issued on the same pipeline, or that independent ops are always issued on different pipelines. I think the latter case is more common. The patch is a first-order improvement over DefaultDefLatency: it is by no means an accurate representation of how the machine functions, but wouldn't you agree that the patch is an improvement over DefaultDefLatency?

Of course, if you have a better idea for improving the latency computation in the fallback case, please do suggest it.

@artagnon
Contributor Author

Rebase and ping.

@RKSimon
Collaborator

RKSimon commented Aug 16, 2024

@artagnon please can you rebase to fix the merge conflict?

@artagnon
Contributor Author

artagnon commented Aug 19, 2024

Sorry for the delay; rebased now.

@artagnon
Contributor Author

Gentle ping. Do any of the other reviewers have any comments?

@artagnon
Contributor Author

artagnon commented Sep 2, 2024

Gentle ping.

@artagnon
Contributor Author

Rebase (yet again), with a change: use DefMI->getIterator(), UseMI->getIterator() instead of iterating over MBB.

@artagnon
Contributor Author

Gentle ping.

Contributor

@arsenm arsenm left a comment

Is this only useful for incomplete targets?

@artagnon
Contributor Author

Is this only useful for incomplete targets?

It is useful when the CPU doesn't have a scheduler descriptor checked into the tree. This happens in the real-world on my X86 box, for instance.

@RKSimon
Collaborator

RKSimon commented Sep 23, 2024

Is this only useful for incomplete targets?

It is useful when the CPU doesn't have a scheduler descriptor checked into the tree. This happens in the real-world on my X86 box, for instance.

what x86 are you missing scheduler support for?

@artagnon
Contributor Author

Is this only useful for incomplete targets?

It is useful when the CPU doesn't have a scheduler descriptor checked into the tree. This happens in the real-world on my X86 box, for instance.

what x86 are you missing scheduler support for?

I should have been clearer: I think there are scheduler descriptor files in the tree for most (if not all) CPUs in the X86 world, but since my distro's Clang can't know about my hardware, an -mtune argument isn't baked in. In the RISC-V world, we only have scheduler descriptors in the tree for a few CPUs, including SiFive's.

TargetSchedModel::computeOperandLatency is supposed to return the exact
latency between two MIs, although it is observed that InstrSchedModel
and InstrItineraries are often unavailable in many real-world scenarios.
When these two pieces of information are not available, the function
returns an estimate that is much too conservative: the default def
latency. MachineTraceMetrics is one of the callers affected quite badly
by these conservative estimates. To improve the estimate, and let
callers of MTM generate better code, offset the default def latency by
the estimated cycles elapsed between the def MI and use MI. Since we're
trying to improve codegen in the case when no scheduling information is
available, it is impossible to determine the number of cycles elapsed
between the two MIs, and we use the distance between them as a crude
approximation. In practice, this improvement of one crude estimate by
offsetting it with another crude estimate leads to better codegen on
average, and yields huge gains on standard benchmarks.
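
The fallback described in this commit message can be sketched as a standalone function, under stated assumptions. This is not the patch itself: `estimateFallbackLatency` and its parameters are hypothetical names, the micro-op counts are passed in rather than queried from LLVM, and both the round-up of elapsed cycles and the clamp at zero are assumptions made for the illustration.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the no-scheduling-information path; none of these
// names are LLVM APIs. MicroOpsBetween holds the micro-op count of each
// instruction in the half-open range [DefMI, UseMI).
unsigned estimateFallbackLatency(unsigned DefaultDefLatency,
                                 const std::vector<unsigned> &MicroOpsBetween,
                                 unsigned IssueWidth) {
  assert(IssueWidth > 0 && "issue width must be positive");
  unsigned TotalMicroOps =
      std::accumulate(MicroOpsBetween.begin(), MicroOpsBetween.end(), 0u);
  // Cycles needed just to issue the intervening micro-ops. Rounding up is an
  // assumption of this sketch.
  unsigned CyclesElapsed = (TotalMicroOps + IssueWidth - 1) / IssueWidth;
  // Offset the default def latency by the elapsed cycles, clamping at zero so
  // the unsigned subtraction cannot wrap (the clamp is also an assumption).
  return DefaultDefLatency > CyclesElapsed ? DefaultDefLatency - CyclesElapsed
                                           : 0;
}
```

With a default def latency of 3 and two single-micro-op instructions in the range, the sketch returns 1 at issue width 1 and 2 at issue width 2. An intervening long-latency instruction contributes only its micro-op count here, which is the point the review raises about micro-ops versus latency.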
@artagnon
Contributor Author

artagnon commented Oct 2, 2024

Rebase (yet again). Could we kindly converge on whether or not this patch is useful?

const MachineInstr *UseMI,
unsigned UseOperIdx) {
assert(DefMI && "Non-null DefMI expected");
if (!Sched.hasInstrSchedModel() && !Sched.hasInstrItineraries()) {
Contributor

Instead of placing this in this wrapper, could the default implementation of computeOperandLatency handle this?

Contributor Author

I initially tried this in #74088, but @jayfoad said that I should consider doing it in callers. Perhaps it's time to revisit that?

; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
; X86-NEXT: movl {{[0-9]+}}(%esp), %ebp
; X86-NEXT: pushl %ebp
; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
Contributor

Should we fix the missing models that cause any of the test changes?

Contributor Author

After #111865, there should be no changes in the X86 tests except in early-ifcvt-remarks.ll, which already specifies an mcpu that doesn't have scheduling information for all the instructions used. I'm not sure what we can do about the RISCV tests though: the only models available in the tree are SiFive's and Syntacore's; in practice, RISCV is very diverse, and there are lots of scheduling models downstream, so I think we'll have to live with those test changes.

@artagnon
Contributor Author

I think we weren't able to conclude that this is a good idea. Not pursuing.

@artagnon artagnon closed this Oct 22, 2024
@artagnon artagnon deleted the mtm-def-use-dist branch October 22, 2024 09:35