@manastasova
Contributor

This commit improves the performance of the AVX2 Keccak-F1600x4 implementation by:

  • Replacing gather-based loads (vpgatherqq) with explicit loads followed by a transpose built from unpack and permute instructions.
  • Replacing the individual scatter stores with a batched 4-lane transpose-and-store (MLK_SCATTER_STORE256_4X).
  • Merging prepareTheta into thetaRhoPiChiIota, so that it runs at the start of each round.
  • Eliminating the separate E state variables: each round writes into temporaries (Tba, etc.) that are copied back to A at the end of the round, which removes the A/E alternation.
  • Replacing the 24 unrolled macro calls in ROUNDS24 with a loop over MLK_ROUNDS_x2, which processes 2 rounds per iteration (12 iterations in total).
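A minimal C sketch of the load-side transpose from the first bullet, under stated assumptions: the four input states are taken to be stored back to back (state s at in[25*s ... 25*s + 24]), and the vectorized layout is assumed to hold lane j of all four states in one 256-bit vector. The names deinterleave_ref, deinterleave_avx2 and deinterleave are hypothetical and do not come from this PR. Each group of four lanes is fetched with four contiguous loads and transposed with vpunpcklqdq/vpunpckhqdq (_mm256_unpacklo_epi64/_mm256_unpackhi_epi64) and vperm2i128 (_mm256_permute2x128_si256); the 25th lane is copied element by element, and a scalar path is used when AVX2 is unavailable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only: four Keccak states stored back
 * to back, state s in in[25*s .. 25*s + 24]. The 4-way layout wants
 * out[4*j + s] = lane j of state s (one 256-bit vector per lane index j). */
#define LANES 25

/* Scalar reference: the same result a stride-25 gather of lane j produces. */
static void deinterleave_ref(uint64_t out[4 * LANES], const uint64_t in[4 * LANES])
{
  for (int j = 0; j < LANES; j++)
  {
    for (int s = 0; s < 4; s++)
    {
      out[4 * j + s] = in[LANES * s + j];
    }
  }
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_AVX2_SKETCH 1

/* target("avx2") lets this compile without -mavx2; call it only after a
 * runtime CPU check. */
static __attribute__((target("avx2"))) void deinterleave_avx2(
    uint64_t out[4 * LANES], const uint64_t in[4 * LANES])
{
  int j;
  for (j = 0; j + 4 <= LANES; j += 4)
  {
    /* Four contiguous loads: r_s holds lanes j..j+3 of state s. */
    __m256i r0 = _mm256_loadu_si256((const __m256i *)&in[0 * LANES + j]);
    __m256i r1 = _mm256_loadu_si256((const __m256i *)&in[1 * LANES + j]);
    __m256i r2 = _mm256_loadu_si256((const __m256i *)&in[2 * LANES + j]);
    __m256i r3 = _mm256_loadu_si256((const __m256i *)&in[3 * LANES + j]);
    /* Unpack interleaves 64-bit elements within each 128-bit half. */
    __m256i t0 = _mm256_unpacklo_epi64(r0, r1); /* r0[0] r1[0] | r0[2] r1[2] */
    __m256i t1 = _mm256_unpackhi_epi64(r0, r1); /* r0[1] r1[1] | r0[3] r1[3] */
    __m256i t2 = _mm256_unpacklo_epi64(r2, r3); /* r2[0] r3[0] | r2[2] r3[2] */
    __m256i t3 = _mm256_unpackhi_epi64(r2, r3); /* r2[1] r3[1] | r2[3] r3[3] */
    /* Cross-half permute: 0x20 joins low halves, 0x31 joins high halves. */
    _mm256_storeu_si256((__m256i *)&out[4 * (j + 0)],
                        _mm256_permute2x128_si256(t0, t2, 0x20));
    _mm256_storeu_si256((__m256i *)&out[4 * (j + 1)],
                        _mm256_permute2x128_si256(t1, t3, 0x20));
    _mm256_storeu_si256((__m256i *)&out[4 * (j + 2)],
                        _mm256_permute2x128_si256(t0, t2, 0x31));
    _mm256_storeu_si256((__m256i *)&out[4 * (j + 3)],
                        _mm256_permute2x128_si256(t1, t3, 0x31));
  }
  /* 25 = 6*4 + 1: the last lane (j == 24) is copied element by element. */
  for (int s = 0; s < 4; s++)
  {
    out[4 * j + s] = in[LANES * s + j];
  }
}
#endif

/* Returns 1 if the AVX2 path ran, 0 if the scalar reference ran. */
static int deinterleave(uint64_t out[4 * LANES], const uint64_t in[4 * LANES])
{
  assert(out != in); /* out-of-place only */
#ifdef HAVE_AVX2_SKETCH
  if (__builtin_cpu_supports("avx2"))
  {
    deinterleave_avx2(out, in);
    return 1;
  }
#endif
  deinterleave_ref(out, in);
  return 0;
}
```

Because a 4x4 transpose is its own inverse, the same unpack/permute sequence applied to four lane vectors yields four per-state rows again, which is the idea behind the batched store in the second bullet. This illustrates the concept only; it is not the code of MLK_SCATTER_STORE256_4X.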

Signed-off-by: manastasova <manastasova2017@fau.edu>
@manastasova manastasova added the benchmark (this PR should be benchmarked in CI) and x86_64 labels Jan 22, 2026
@oqs-bot oqs-bot left a comment

Mac Mini (M1, 2020) benchmarks

Benchmark suite Current: 1647e99 Previous: ebed0f3 Ratio
ML-KEM-512 keypair 12327 cycles 12327 cycles 1
ML-KEM-512 encaps 15031 cycles 15030 cycles 1.00
ML-KEM-512 decaps 19607 cycles 19609 cycles 1.00
ML-KEM-768 keypair 21092 cycles 21092 cycles 1
ML-KEM-768 encaps 23863 cycles 23861 cycles 1.00
ML-KEM-768 decaps 30443 cycles 30442 cycles 1.00
ML-KEM-1024 keypair 30376 cycles 30376 cycles 1
ML-KEM-1024 encaps 34642 cycles 34643 cycles 1.00
ML-KEM-1024 decaps 44279 cycles 44268 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

ppc64le (POWER10) benchmarks

Benchmark suite Current: 1647e99 Previous: ebed0f3 Ratio
ML-KEM-512 keypair 59191 cycles 59226 cycles 1.00
ML-KEM-512 encaps 71851 cycles 71908 cycles 1.00
ML-KEM-512 decaps 91623 cycles 91531 cycles 1.00
ML-KEM-768 keypair 98551 cycles 98104 cycles 1.00
ML-KEM-768 encaps 114881 cycles 114532 cycles 1.00
ML-KEM-768 decaps 140316 cycles 140016 cycles 1.00
ML-KEM-1024 keypair 148524 cycles 148872 cycles 1.00
ML-KEM-1024 encaps 167437 cycles 167765 cycles 1.00
ML-KEM-1024 decaps 198455 cycles 199095 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

Benchmark suite Current: 1647e99 Previous: ebed0f3 Ratio
ML-KEM-512 keypair 9367 cycles 9650 cycles 0.97
ML-KEM-512 encaps 11028 cycles 11457 cycles 0.96
ML-KEM-512 decaps 15284 cycles 15335 cycles 1.00
ML-KEM-768 keypair 16012 cycles 16453 cycles 0.97
ML-KEM-768 encaps 17642 cycles 17930 cycles 0.98
ML-KEM-768 decaps 23218 cycles 23627 cycles 0.98
ML-KEM-1024 keypair 22181 cycles 22362 cycles 0.99
ML-KEM-1024 encaps 24116 cycles 24602 cycles 0.98
ML-KEM-1024 decaps 31703 cycles 32362 cycles 0.98

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

Benchmark suite Current: 1647e99 Previous: ebed0f3 Ratio
ML-KEM-512 keypair 13858 cycles 16681 cycles 0.83
ML-KEM-512 encaps 15565 cycles 18380 cycles 0.85
ML-KEM-512 decaps 21034 cycles 23712 cycles 0.89
ML-KEM-768 keypair 23415 cycles 28448 cycles 0.82
ML-KEM-768 encaps 24751 cycles 29801 cycles 0.83
ML-KEM-768 decaps 32627 cycles 37656 cycles 0.87
ML-KEM-1024 keypair 32841 cycles 41276 cycles 0.80
ML-KEM-1024 encaps 35167 cycles 43491 cycles 0.81
ML-KEM-1024 decaps 45730 cycles 53885 cycles 0.85

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

Benchmark suite Current: 1647e99 Previous: ebed0f3 Ratio
ML-KEM-512 keypair 28399 cycles 28434 cycles 1.00
ML-KEM-512 encaps 35834 cycles 35766 cycles 1.00
ML-KEM-512 decaps 45554 cycles 45475 cycles 1.00
ML-KEM-768 keypair 45863 cycles 45954 cycles 1.00
ML-KEM-768 encaps 56265 cycles 56116 cycles 1.00
ML-KEM-768 decaps 69388 cycles 69482 cycles 1.00
ML-KEM-1024 keypair 71853 cycles 71559 cycles 1.00
ML-KEM-1024 encaps 84528 cycles 84605 cycles 1.00
ML-KEM-1024 decaps 101544 cycles 101115 cycles 1.00

@oqs-bot oqs-bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks

Benchmark suite Current: 1647e99 Previous: ebed0f3 Ratio
ML-KEM-512 keypair 28311 cycles 28387 cycles 1.00
ML-KEM-512 encaps 34295 cycles 34239 cycles 1.00
ML-KEM-512 decaps 44520 cycles 44594 cycles 1.00
ML-KEM-768 keypair 47859 cycles 47826 cycles 1.00
ML-KEM-768 encaps 54136 cycles 54274 cycles 1.00
ML-KEM-768 decaps 68640 cycles 68610 cycles 1.00
ML-KEM-1024 keypair 70519 cycles 70609 cycles 1.00
ML-KEM-1024 encaps 79075 cycles 79023 cycles 1.00
ML-KEM-1024 decaps 98785 cycles 98791 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

Benchmark suite Current: 1647e99 Previous: ebed0f3 Ratio
ML-KEM-512 keypair 14391 cycles 16399 cycles 0.88
ML-KEM-512 encaps 16918 cycles 18695 cycles 0.90
ML-KEM-512 decaps 23408 cycles 25297 cycles 0.93
ML-KEM-768 keypair 25118 cycles 27938 cycles 0.90
ML-KEM-768 encaps 27078 cycles 29785 cycles 0.91
ML-KEM-768 decaps 36486 cycles 41180 cycles 0.89
ML-KEM-1024 keypair 33822 cycles 37708 cycles 0.90
ML-KEM-1024 encaps 36159 cycles 40685 cycles 0.89
ML-KEM-1024 decaps 49453 cycles 54424 cycles 0.91

@oqs-bot oqs-bot left a comment

Graviton4

Benchmark suite Current: 1647e99 Previous: ebed0f3 Ratio
ML-KEM-512 keypair 17688 cycles 17702 cycles 1.00
ML-KEM-512 encaps 20670 cycles 20702 cycles 1.00
ML-KEM-512 decaps 27133 cycles 27133 cycles 1
ML-KEM-768 keypair 29987 cycles 30013 cycles 1.00
ML-KEM-768 encaps 32854 cycles 32811 cycles 1.00
ML-KEM-768 decaps 42013 cycles 42061 cycles 1.00
ML-KEM-1024 keypair 43899 cycles 43914 cycles 1.00
ML-KEM-1024 encaps 48921 cycles 48930 cycles 1.00
ML-KEM-1024 decaps 61602 cycles 61496 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

Benchmark suite Current: 1647e99 Previous: ebed0f3 Ratio
ML-KEM-512 keypair 10521 cycles 12005 cycles 0.88
ML-KEM-512 encaps 12061 cycles 13291 cycles 0.91
ML-KEM-512 decaps 17019 cycles 18051 cycles 0.94
ML-KEM-768 keypair 18414 cycles 20559 cycles 0.90
ML-KEM-768 encaps 19515 cycles 21546 cycles 0.91
ML-KEM-768 decaps 26626 cycles 28661 cycles 0.93
ML-KEM-1024 keypair 24623 cycles 27867 cycles 0.88
ML-KEM-1024 encaps 26733 cycles 29966 cycles 0.89
ML-KEM-1024 decaps 36391 cycles 39500 cycles 0.92

@oqs-bot oqs-bot left a comment

Graviton3

Benchmark suite Current: 1647e99 Previous: ebed0f3 Ratio
ML-KEM-512 keypair 18737 cycles 18762 cycles 1.00
ML-KEM-512 encaps 22002 cycles 22034 cycles 1.00
ML-KEM-512 decaps 29037 cycles 29048 cycles 1.00
ML-KEM-768 keypair 31782 cycles 31800 cycles 1.00
ML-KEM-768 encaps 35006 cycles 34950 cycles 1.00
ML-KEM-768 decaps 45010 cycles 45042 cycles 1.00
ML-KEM-1024 keypair 46334 cycles 46355 cycles 1.00
ML-KEM-1024 encaps 51700 cycles 51750 cycles 1.00
ML-KEM-1024 decaps 65265 cycles 65261 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

Benchmark suite Current: 1647e99 Previous: ebed0f3 Ratio
ML-KEM-512 keypair 40116 cycles 40315 cycles 1.00
ML-KEM-512 encaps 48288 cycles 48372 cycles 1.00
ML-KEM-512 decaps 62456 cycles 62465 cycles 1.00
ML-KEM-768 keypair 63657 cycles 63637 cycles 1.00
ML-KEM-768 encaps 74664 cycles 74830 cycles 1.00
ML-KEM-768 decaps 93254 cycles 93236 cycles 1.00
ML-KEM-1024 keypair 95000 cycles 95054 cycles 1.00
ML-KEM-1024 encaps 108988 cycles 109053 cycles 1.00
ML-KEM-1024 decaps 131759 cycles 131949 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

Benchmark suite Current: 1647e99 Previous: ebed0f3 Ratio
ML-KEM-512 keypair 45781 cycles 45812 cycles 1.00
ML-KEM-512 encaps 54744 cycles 54778 cycles 1.00
ML-KEM-512 decaps 70275 cycles 70387 cycles 1.00
ML-KEM-768 keypair 73830 cycles 73966 cycles 1.00
ML-KEM-768 encaps 85352 cycles 85383 cycles 1.00
ML-KEM-768 decaps 106339 cycles 106459 cycles 1.00
ML-KEM-1024 keypair 111726 cycles 111805 cycles 1.00
ML-KEM-1024 encaps 125852 cycles 125952 cycles 1.00
ML-KEM-1024 decaps 151675 cycles 151837 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

Benchmark suite Current: 1647e99 Previous: ebed0f3 Ratio
ML-KEM-512 keypair 35451 cycles 35858 cycles 0.99
ML-KEM-512 encaps 40205 cycles 40235 cycles 1.00
ML-KEM-512 decaps 51249 cycles 51231 cycles 1.00
ML-KEM-768 keypair 56829 cycles 56807 cycles 1.00
ML-KEM-768 encaps 64602 cycles 65391 cycles 0.99
ML-KEM-768 decaps 78906 cycles 79291 cycles 1.00
ML-KEM-1024 keypair 88081 cycles 88036 cycles 1.00
ML-KEM-1024 encaps 97248 cycles 97186 cycles 1.00
ML-KEM-1024 decaps 116231 cycles 116104 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

Benchmark suite Current: 1647e99 Previous: ebed0f3 Ratio
ML-KEM-512 keypair 36534 cycles 36687 cycles 1.00
ML-KEM-512 encaps 43003 cycles 43011 cycles 1.00
ML-KEM-512 decaps 55675 cycles 55666 cycles 1.00
ML-KEM-768 keypair 58415 cycles 58457 cycles 1.00
ML-KEM-768 encaps 67402 cycles 67409 cycles 1.00
ML-KEM-768 decaps 84350 cycles 84377 cycles 1.00
ML-KEM-1024 keypair 88631 cycles 88658 cycles 1.00
ML-KEM-1024 encaps 98864 cycles 98909 cycles 1.00
ML-KEM-1024 decaps 120369 cycles 120440 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

Benchmark suite Current: 1647e99 Previous: ebed0f3 Ratio
ML-KEM-512 keypair 39000 cycles 39834 cycles 0.98
ML-KEM-512 encaps 44609 cycles 44640 cycles 1.00
ML-KEM-512 decaps 56711 cycles 56703 cycles 1.00
ML-KEM-768 keypair 62433 cycles 62431 cycles 1.00
ML-KEM-768 encaps 70885 cycles 71780 cycles 0.99
ML-KEM-768 decaps 86781 cycles 87166 cycles 1.00
ML-KEM-1024 keypair 96335 cycles 96262 cycles 1.00
ML-KEM-1024 encaps 106377 cycles 106330 cycles 1.00
ML-KEM-1024 decaps 126937 cycles 126801 cycles 1.00

@oqs-bot oqs-bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks

Benchmark suite Current: 1647e99 Previous: ebed0f3 Ratio
ML-KEM-512 keypair 59531 cycles 59477 cycles 1.00
ML-KEM-512 encaps 67162 cycles 67251 cycles 1.00
ML-KEM-512 decaps 85830 cycles 85738 cycles 1.00
ML-KEM-768 keypair 96983 cycles 97018 cycles 1.00
ML-KEM-768 encaps 110439 cycles 110424 cycles 1.00
ML-KEM-768 decaps 137219 cycles 137136 cycles 1.00
ML-KEM-1024 keypair 154182 cycles 154167 cycles 1.00
ML-KEM-1024 encaps 170792 cycles 170457 cycles 1.00
ML-KEM-1024 decaps 206619 cycles 206792 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton2

Benchmark suite Current: 1647e99 Previous: ebed0f3 Ratio
ML-KEM-512 keypair 28394 cycles 28314 cycles 1.00
ML-KEM-512 encaps 34245 cycles 34303 cycles 1.00
ML-KEM-512 decaps 44588 cycles 44520 cycles 1.00
ML-KEM-768 keypair 47894 cycles 47846 cycles 1.00
ML-KEM-768 encaps 54377 cycles 54137 cycles 1.00
ML-KEM-768 decaps 68808 cycles 68665 cycles 1.00
ML-KEM-1024 keypair 70500 cycles 70549 cycles 1.00
ML-KEM-1024 encaps 79000 cycles 79141 cycles 1.00
ML-KEM-1024 decaps 98806 cycles 98835 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

Benchmark suite Current: 1647e99 Previous: ebed0f3 Ratio
ML-KEM-512 keypair 59023 cycles 59570 cycles 0.99
ML-KEM-512 encaps 68543 cycles 68575 cycles 1.00
ML-KEM-512 decaps 87352 cycles 87326 cycles 1.00
ML-KEM-768 keypair 95685 cycles 95742 cycles 1.00
ML-KEM-768 encaps 109493 cycles 109660 cycles 1.00
ML-KEM-768 decaps 134366 cycles 134532 cycles 1.00
ML-KEM-1024 keypair 146719 cycles 148351 cycles 0.99
ML-KEM-1024 encaps 162524 cycles 164301 cycles 0.99
ML-KEM-1024 decaps 194686 cycles 195562 cycles 1.00

@oqs-bot oqs-bot left a comment

SpacemiT K1 8 (Banana Pi F3) benchmarks

Benchmark suite Current: 1647e99 Previous: ebed0f3 Ratio
ML-KEM-512 keypair 155156 cycles 155116 cycles 1.00
ML-KEM-512 encaps 163334 cycles 163295 cycles 1.00
ML-KEM-512 decaps 206536 cycles 206550 cycles 1.00
ML-KEM-768 keypair 249502 cycles 249522 cycles 1.00
ML-KEM-768 encaps 270309 cycles 270296 cycles 1.00
ML-KEM-768 decaps 332165 cycles 332114 cycles 1.00
ML-KEM-1024 keypair 395238 cycles 395146 cycles 1.00
ML-KEM-1024 encaps 423919 cycles 423791 cycles 1.00
ML-KEM-1024 decaps 505639 cycles 505524 cycles 1.00

@oqs-bot oqs-bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks

Benchmark suite Current: 1647e99 Previous: ebed0f3 Ratio
ML-KEM-512 keypair 50737 cycles 50672 cycles 1.00
ML-KEM-512 encaps 58468 cycles 58540 cycles 1.00
ML-KEM-512 decaps 74124 cycles 74115 cycles 1.00
ML-KEM-768 keypair 86957 cycles 86481 cycles 1.01
ML-KEM-768 encaps 95630 cycles 94380 cycles 1.01
ML-KEM-768 decaps 118875 cycles 117403 cycles 1.01
ML-KEM-1024 keypair 131396 cycles 129771 cycles 1.01
ML-KEM-1024 encaps 143409 cycles 142171 cycles 1.01
ML-KEM-1024 decaps 174419 cycles 173614 cycles 1.00

@oqs-bot

oqs-bot commented Jan 22, 2026

CBMC Results (ML-KEM-512)

Full Results (139 proofs)
Proof Status Current Previous Change
**TOTAL** 1428s 1185s +20.5%
mlk_indcpa_keypair_derand 236s 205s +15%
mlk_indcpa_enc 224s 181s +24%
mlk_keccak_squeezeblocks_x4 170s 133s +28%
mlk_rej_uniform_c 108s 73s +48%
mlk_polyvec_basemul_acc_montgomery_cached_c 60s 43s +40%
poly_ntt_native 43s 34s +26%
mlk_poly_rej_uniform 39s 32s +22%
mlk_ntt_layer 29s 23s +26%
polyvec_basemul_acc_montgomery_cached_native 25s 21s +19%
keccakf1600x4_permute_native_x4 18s 18s +0%
mlk_poly_reduce_native 18s 14s +29%
mlk_poly_sub 13s 9s +44%
mlk_keccak_absorb_once_x4 12s 9s +33%
mlk_ntt_butterfly_block 12s 10s +20%
mlk_poly_frommsg 12s 9s +33%
mlk_poly_frombytes_native 11s 11s +0%
mlk_polyvec_add 11s 10s +10%
mlk_poly_rej_uniform_x4 10s 6s +67%
mlk_fqmul 8s 6s +33%
mlk_indcpa_dec 8s 10s -20%
mlk_keccak_squeeze_once 8s 8s +0%
mlk_keccak_squeezeblocks 8s 7s +14%
mlk_invntt_layer 7s 6s +17%
mlk_poly_tomsg 7s 1s +600%
mlk_polymat_permute_bitrev_to_custom 7s 2s +250%
kem_dec 6s 5s +20%
mlk_keccakf1600_permute 6s 4s +50%
mlk_poly_tomont 6s 3s +100%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 5s 3s +67%
keccakf1600_permute_native 5s 5s +0%
mlk_gen_matrix_serial 5s 3s +67%
mlk_keccak_absorb_once 5s 5s +0%
mlk_poly_cbd_eta2 5s 4s +25%
mlk_poly_compress_du 5s 4s +25%
mlk_poly_getnoise_eta1122_4x 5s 5s +0%
mlk_poly_getnoise_eta1_4x 5s 3s +67%
mlk_poly_getnoise_eta1_4x_native 5s 2s +150%
mlk_poly_mulcache_compute_c 5s 2s +150%
mlk_scalar_decompress_d10 5s 4s +25%
mlk_shake256x4 5s 6s -17%
keccak_f1600_x4_native_aarch64_v84a 4s 2s +100%
mlk_keccakf1600_extract_bytes (big endian) 4s 1s +300%
mlk_keccakf1600_xor_bytes (big endian) 4s 3s +33%
mlk_poly_add 4s 2s +100%
mlk_poly_cbd_eta1 4s 2s +100%
mlk_poly_decompress_du 4s 2s +100%
mlk_poly_getnoise_eta2 4s 2s +100%
mlk_poly_mulcache_compute 4s 2s +100%
mlk_poly_reduce_c 4s 2s +100%
mlk_polyvec_permute_bitrev_to_custom_native 4s 4s +0%
mlk_polyvec_reduce 4s 2s +100%
mlk_polyvec_tomont 4s 2s +100%
mlk_scalar_compress_d5 4s 2s +100%
poly_getnoise_eta1122_4x_native 4s 1s +300%
poly_tomont_native_aarch64 4s 3s +33%
rej_uniform_native 4s 3s +33%
rej_uniform_native_aarch64 4s 3s +33%
keccak_f1600_x1_native_aarch64_v84a 3s 3s +0%
kem_check_pk 3s 3s +0%
kem_check_sk 3s 1s +200%
kem_enc_derand 3s 4s -25%
kem_keypair 3s 4s -25%
mlk_ct_cmask_neg_i16 3s 2s +50%
mlk_ct_cmask_nonzero_u16 3s 3s +0%
mlk_ct_cmask_nonzero_u8 3s 2s +50%
mlk_ct_memcmp 3s 2s +50%
mlk_gen_matrix 3s 4s -25%
mlk_poly_frombytes 3s 3s +0%
mlk_poly_invntt_tomont_c 3s 4s -25%
mlk_poly_ntt 3s 2s +50%
mlk_poly_tobytes 3s 2s +50%
mlk_poly_tomont_native 3s 2s +50%
mlk_polyvec_basemul_acc_montgomery_cached 3s 2s +50%
mlk_polyvec_compress_du 3s 3s +0%
mlk_polyvec_decompress_du 3s 5s -40%
mlk_polyvec_frombytes 3s 2s +50%
mlk_scalar_compress_d11 3s 2s +50%
mlk_scalar_decompress_d11 3s 3s +0%
mlk_scalar_decompress_d4 3s 3s +0%
mlk_scalar_decompress_d5 3s 1s +200%
poly_invntt_tomont_native 3s 5s -40%
poly_mulcache_compute_native_aarch64 3s 3s +0%
poly_reduce_native_aarch64 3s 2s +50%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 3s 3s +0%
sys_check_capability 3s 3s +0%
intt_native_aarch64 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 3s -33%
kem_keypair_derand 2s 5s -60%
mlk_barrett_reduce 2s 2s +0%
mlk_check_pct 2s 1s +100%
mlk_ct_cmov_zero 2s 6s -67%
mlk_ct_get_optblocker_i32 2s 2s +0%
mlk_ct_get_optblocker_u32 2s 1s +100%
mlk_ct_sel_int16 2s 1s +100%
mlk_ct_sel_uint8 2s 2s +0%
mlk_keccakf1600_extract_bytes 2s 2s +0%
mlk_keccakf1600_xor_bytes 2s 2s +0%
mlk_keccakf1600x4_permute 2s 3s -33%
mlk_montgomery_reduce 2s 2s +0%
mlk_poly_compress_dv 2s 3s -33%
mlk_poly_decompress_dv 2s 2s +0%
mlk_poly_frombytes_c 2s 1s +100%
mlk_poly_mulcache_compute_native 2s 3s -33%
mlk_poly_ntt_c 2s 1s +100%
mlk_poly_reduce 2s 2s +0%
mlk_poly_tobytes_c 2s 2s +0%
mlk_poly_tobytes_native 2s 3s -33%
mlk_poly_tomont_c 2s 4s -50%
mlk_polyvec_mulcache_compute 2s 2s +0%
mlk_polyvec_ntt 2s 2s +0%
mlk_polyvec_tobytes 2s 2s +0%
mlk_rej_uniform 2s 4s -50%
mlk_scalar_compress_d1 2s 2s +0%
mlk_scalar_compress_d10 2s 4s -50%
mlk_scalar_compress_d4 2s 3s -33%
mlk_scalar_signed_to_unsigned_q 2s 1s +100%
mlk_sha3_512 2s 3s -33%
mlk_shake128_absorb_once 2s 3s -33%
mlk_shake128_squeezeblocks 2s 1s +100%
mlk_shake128x4_absorb_once 2s 4s -50%
mlk_shake256 2s 1s +100%
mlk_value_barrier_i32 2s 2s +0%
mlk_value_barrier_u32 2s 4s -50%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 2s 2s +0%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 2s 3s -33%
keccak_f1600_x1_native_aarch64 1s 3s -67%
kem_enc 1s 2s -50%
mlk_ct_get_optblocker_u8 1s 2s -50%
mlk_keccakf1600x4_extract_bytes 1s 2s -50%
mlk_keccakf1600x4_xor_bytes 1s 2s -50%
mlk_matvec_mul 1s 2s -50%
mlk_poly_invntt_tomont 1s 4s -75%
mlk_polyvec_invntt_tomont 1s 3s -67%
mlk_polyvec_permute_bitrev_to_custom 1s 1s +0%
mlk_sha3_256 1s 2s -50%
mlk_shake128x4_squeezeblocks 1s 1s +0%
mlk_value_barrier_u8 1s 2s -50%
ntt_native_aarch64 1s 3s -67%
poly_tobytes_native_aarch64 1s 2s -50%

@oqs-bot

oqs-bot commented Jan 22, 2026

CBMC Results (ML-KEM-1024)

⚠️ Attention Required

Proof Status Current Previous Change
mlk_ntt_layer ⚠️ 47s 26s +81%
Full Results (139 proofs)
Proof Status Current Previous Change
**TOTAL** 1571s 1431s +9.8%
mlk_indcpa_enc 245s 254s -4%
mlk_indcpa_keypair_derand 176s 163s +8%
mlk_keccak_squeezeblocks_x4 144s 122s +18%
polyvec_basemul_acc_montgomery_cached_native 137s 107s +28%
mlk_polyvec_add 131s 120s +9%
mlk_rej_uniform_c 74s 68s +9%
mlk_polyvec_basemul_acc_montgomery_cached_c 48s 44s +9%
mlk_ntt_layer ⚠️ 47s 26s +81%
mlk_poly_rej_uniform 47s 35s +34%
mlk_poly_decompress_dv 27s 25s +8%
poly_ntt_native 23s 22s +5%
keccakf1600x4_permute_native_x4 19s 19s +0%
mlk_poly_reduce_native 17s 13s +31%
mlk_polyvec_ntt 17s 12s +42%
mlk_indcpa_dec 16s 16s +0%
mlk_poly_compress_du 12s 9s +33%
mlk_poly_frommsg 11s 9s +22%
mlk_ntt_butterfly_block 10s 10s +0%
mlk_poly_frombytes_native 10s 7s +43%
mlk_poly_sub 10s 9s +11%
kem_dec 9s 8s +12%
mlk_fqmul 9s 6s +50%
mlk_keccak_absorb_once_x4 9s 8s +12%
mlk_poly_rej_uniform_x4 9s 9s +0%
mlk_gen_matrix 7s 6s +17%
mlk_gen_matrix_serial 7s 6s +17%
mlk_keccak_squeeze_once 7s 8s -12%
mlk_keccak_squeezeblocks 7s 8s -12%
mlk_polyvec_permute_bitrev_to_custom_native 6s 5s +20%
keccakf1600_permute_native 5s 8s -38%
mlk_invntt_layer 5s 4s +25%
mlk_keccak_absorb_once 5s 3s +67%
mlk_poly_tomsg 5s 3s +67%
mlk_polymat_permute_bitrev_to_custom 5s 5s +0%
mlk_polyvec_tobytes 5s 3s +67%
keccak_f1600_x1_native_aarch64 4s 3s +33%
mlk_check_pct 4s 5s -20%
mlk_ct_memcmp 4s 4s +0%
mlk_keccakf1600_permute 4s 3s +33%
mlk_keccakf1600x4_extract_bytes 4s 3s +33%
mlk_poly_frombytes 4s 4s +0%
mlk_poly_getnoise_eta1_4x 4s 3s +33%
mlk_poly_getnoise_eta1_4x_native 4s 2s +100%
mlk_poly_tobytes_native 4s 2s +100%
mlk_poly_tomont 4s 1s +300%
mlk_scalar_compress_d5 4s 3s +33%
mlk_scalar_decompress_d11 4s 2s +100%
mlk_shake256x4 4s 6s -33%
poly_tobytes_native_aarch64 4s 3s +33%
intt_native_aarch64 3s 4s -25%
kem_keypair 3s 2s +50%
mlk_ct_cmask_nonzero_u8 3s 3s +0%
mlk_ct_get_optblocker_i32 3s 2s +50%
mlk_keccakf1600_xor_bytes (big endian) 3s 2s +50%
mlk_matvec_mul 3s 4s -25%
mlk_montgomery_reduce 3s 2s +50%
mlk_poly_cbd_eta1 3s 2s +50%
mlk_poly_cbd_eta2 3s 2s +50%
mlk_poly_compress_dv 3s 1s +200%
mlk_poly_getnoise_eta1122_4x 3s 4s -25%
mlk_poly_mulcache_compute_c 3s 2s +50%
mlk_poly_ntt_c 3s 3s +0%
mlk_poly_reduce 3s 2s +50%
mlk_poly_tobytes 3s 1s +200%
mlk_poly_tomont_c 3s 1s +200%
mlk_poly_tomont_native 3s 3s +0%
mlk_polyvec_compress_du 3s 3s +0%
mlk_polyvec_invntt_tomont 3s 2s +50%
mlk_polyvec_tomont 3s 1s +200%
mlk_scalar_decompress_d10 3s 1s +200%
mlk_scalar_decompress_d4 3s 2s +50%
mlk_scalar_decompress_d5 3s 3s +0%
mlk_shake128_absorb_once 3s 3s +0%
mlk_shake128x4_absorb_once 3s 2s +50%
mlk_shake256 3s 2s +50%
mlk_value_barrier_i32 3s 2s +50%
ntt_native_aarch64 3s 2s +50%
poly_invntt_tomont_native 3s 3s +0%
poly_tomont_native_aarch64 3s 2s +50%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 3s 2s +50%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 3s 3s +0%
rej_uniform_native_aarch64 3s 4s -25%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 1s +100%
kem_check_pk 2s 2s +0%
kem_enc 2s 2s +0%
kem_enc_derand 2s 2s +0%
kem_keypair_derand 2s 3s -33%
mlk_barrett_reduce 2s 2s +0%
mlk_ct_cmask_nonzero_u16 2s 2s +0%
mlk_ct_get_optblocker_u32 2s 2s +0%
mlk_ct_sel_int16 2s 2s +0%
mlk_keccakf1600_extract_bytes (big endian) 2s 1s +100%
mlk_poly_add 2s 3s -33%
mlk_poly_getnoise_eta2 2s 5s -60%
mlk_poly_invntt_tomont 2s 1s +100%
mlk_poly_invntt_tomont_c 2s 2s +0%
mlk_poly_mulcache_compute 2s 3s -33%
mlk_poly_mulcache_compute_native 2s 4s -50%
mlk_poly_ntt 2s 3s -33%
mlk_poly_reduce_c 2s 3s -33%
mlk_polyvec_decompress_du 2s 1s +100%
mlk_polyvec_mulcache_compute 2s 3s -33%
mlk_polyvec_permute_bitrev_to_custom 2s 1s +100%
mlk_rej_uniform 2s 1s +100%
mlk_scalar_compress_d1 2s 2s +0%
mlk_scalar_compress_d10 2s 2s +0%
mlk_scalar_compress_d11 2s 2s +0%
mlk_scalar_compress_d4 2s 2s +0%
mlk_scalar_signed_to_unsigned_q 2s 3s -33%
mlk_sha3_256 2s 3s -33%
mlk_sha3_512 2s 1s +100%
mlk_shake128_squeezeblocks 2s 3s -33%
mlk_shake128x4_squeezeblocks 2s 2s +0%
mlk_value_barrier_u32 2s 2s +0%
mlk_value_barrier_u8 2s 3s -33%
poly_getnoise_eta1122_4x_native 2s 1s +100%
poly_mulcache_compute_native_aarch64 2s 2s +0%
poly_reduce_native_aarch64 2s 4s -50%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 2s 2s +0%
rej_uniform_native 2s 3s -33%
keccak_f1600_x1_native_aarch64_v84a 1s 1s +0%
keccak_f1600_x4_native_aarch64_v84a 1s 2s -50%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 1s 2s -50%
kem_check_sk 1s 2s -50%
mlk_ct_cmask_neg_i16 1s 1s +0%
mlk_ct_cmov_zero 1s 2s -50%
mlk_ct_get_optblocker_u8 1s 3s -67%
mlk_ct_sel_uint8 1s 3s -67%
mlk_keccakf1600_extract_bytes 1s 5s -80%
mlk_keccakf1600_xor_bytes 1s 1s +0%
mlk_keccakf1600x4_permute 1s 3s -67%
mlk_keccakf1600x4_xor_bytes 1s 2s -50%
mlk_poly_decompress_du 1s 1s +0%
mlk_poly_frombytes_c 1s 1s +0%
mlk_poly_tobytes_c 1s 2s -50%
mlk_polyvec_basemul_acc_montgomery_cached 1s 3s -67%
mlk_polyvec_frombytes 1s 3s -67%
mlk_polyvec_reduce 1s 3s -67%
sys_check_capability 1s 4s -75%

@oqs-bot

oqs-bot commented Jan 22, 2026

CBMC Results (ML-KEM-768)

Full Results (139 proofs)
Proof Status Current Previous Change
**TOTAL** 1580s 1681s -6.0%
mlk_indcpa_keypair_derand 474s 498s -5%
mlk_indcpa_enc 207s 232s -11%
mlk_keccak_squeezeblocks_x4 138s 141s -2%
mlk_rej_uniform_c 85s 101s -16%
polyvec_basemul_acc_montgomery_cached_native 63s 65s -3%
mlk_polyvec_basemul_acc_montgomery_cached_c 51s 64s -20%
mlk_poly_rej_uniform 42s 45s -7%
mlk_ntt_layer 35s 45s -22%
poly_ntt_native 30s 33s -9%
keccakf1600x4_permute_native_x4 20s 18s +11%
mlk_indcpa_dec 15s 13s +15%
mlk_poly_reduce_native 15s 17s -12%
mlk_polyvec_add 13s 13s +0%
mlk_keccak_absorb_once_x4 11s 9s +22%
mlk_keccak_squeeze_once 10s 9s +11%
mlk_poly_frombytes_native 9s 10s -10%
mlk_poly_frommsg 9s 10s -10%
keccakf1600_permute_native 8s 5s +60%
kem_dec 8s 6s +33%
mlk_ntt_butterfly_block 8s 9s -11%
mlk_poly_rej_uniform_x4 8s 7s +14%
mlk_poly_sub 8s 9s -11%
mlk_polymat_permute_bitrev_to_custom 8s 6s +33%
mlk_invntt_layer 7s 6s +17%
mlk_keccak_squeezeblocks 7s 9s -22%
mlk_fqmul 6s 5s +20%
poly_mulcache_compute_native_aarch64 6s 4s +50%
mlk_poly_cbd_eta2 5s 2s +150%
poly_invntt_tomont_native 5s 3s +67%
mlk_check_pct 4s 3s +33%
mlk_ct_sel_uint8 4s 2s +100%
mlk_gen_matrix 4s 4s +0%
mlk_keccak_absorb_once 4s 5s -20%
mlk_keccakf1600_permute 4s 4s +0%
mlk_poly_compress_du 4s 4s +0%
mlk_poly_getnoise_eta1_4x 4s 4s +0%
mlk_poly_getnoise_eta1_4x_native 4s 3s +33%
mlk_poly_reduce_c 4s 2s +100%
mlk_poly_tobytes 4s 1s +300%
mlk_polyvec_basemul_acc_montgomery_cached 4s 5s -20%
mlk_polyvec_invntt_tomont 4s 2s +100%
mlk_shake256x4 4s 5s -20%
mlk_value_barrier_i32 4s 3s +33%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 4s 3s +33%
intt_native_aarch64 3s 1s +200%
keccak_f1600_x1_native_aarch64_v84a 3s 1s +200%
kem_check_pk 3s 3s +0%
kem_check_sk 3s 1s +200%
kem_enc 3s 1s +200%
mlk_ct_cmov_zero 3s 2s +50%
mlk_ct_memcmp 3s 1s +200%
mlk_gen_matrix_serial 3s 4s -25%
mlk_keccakf1600_xor_bytes 3s 2s +50%
mlk_keccakf1600_xor_bytes (big endian) 3s 1s +200%
mlk_keccakf1600x4_extract_bytes 3s 2s +50%
mlk_matvec_mul 3s 2s +50%
mlk_poly_add 3s 4s -25%
mlk_poly_decompress_du 3s 3s +0%
mlk_poly_decompress_dv 3s 3s +0%
mlk_poly_frombytes 3s 2s +50%
mlk_poly_getnoise_eta2 3s 2s +50%
mlk_poly_invntt_tomont_c 3s 3s +0%
mlk_poly_mulcache_compute 3s 2s +50%
mlk_poly_tomont 3s 2s +50%
mlk_poly_tomsg 3s 3s +0%
mlk_polyvec_compress_du 3s 3s +0%
mlk_polyvec_permute_bitrev_to_custom_native 3s 2s +50%
mlk_scalar_compress_d1 3s 3s +0%
mlk_scalar_compress_d10 3s 2s +50%
mlk_scalar_decompress_d11 3s 3s +0%
mlk_shake128_absorb_once 3s 5s -40%
mlk_shake128_squeezeblocks 3s 2s +50%
mlk_shake128x4_absorb_once 3s 2s +50%
mlk_value_barrier_u8 3s 2s +50%
poly_reduce_native_aarch64 3s 3s +0%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 3s 2s +50%
rej_uniform_native 3s 5s -40%
keccak_f1600_x4_native_aarch64_v84a 2s 1s +100%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 1s +100%
kem_enc_derand 2s 5s -60%
kem_keypair_derand 2s 3s -33%
mlk_ct_cmask_neg_i16 2s 2s +0%
mlk_ct_cmask_nonzero_u16 2s 2s +0%
mlk_ct_cmask_nonzero_u8 2s 2s +0%
mlk_ct_get_optblocker_i32 2s 2s +0%
mlk_ct_get_optblocker_u32 2s 3s -33%
mlk_ct_get_optblocker_u8 2s 4s -50%
mlk_ct_sel_int16 2s 4s -50%
mlk_keccakf1600_extract_bytes 2s 3s -33%
mlk_keccakf1600_extract_bytes (big endian) 2s 2s +0%
mlk_montgomery_reduce 2s 3s -33%
mlk_poly_cbd_eta1 2s 2s +0%
mlk_poly_frombytes_c 2s 2s +0%
mlk_poly_getnoise_eta1122_4x 2s 2s +0%
mlk_poly_invntt_tomont 2s 2s +0%
mlk_poly_mulcache_compute_c 2s 3s -33%
mlk_poly_mulcache_compute_native 2s 3s -33%
mlk_poly_ntt 2s 3s -33%
mlk_poly_tobytes_native 2s 2s +0%
mlk_poly_tomont_c 2s 1s +100%
mlk_polyvec_decompress_du 2s 3s -33%
mlk_polyvec_frombytes 2s 2s +0%
mlk_polyvec_mulcache_compute 2s 2s +0%
mlk_polyvec_ntt 2s 2s +0%
mlk_polyvec_permute_bitrev_to_custom 2s 4s -50%
mlk_polyvec_reduce 2s 2s +0%
mlk_polyvec_tobytes 2s 5s -60%
mlk_polyvec_tomont 2s 2s +0%
mlk_rej_uniform 2s 3s -33%
mlk_scalar_compress_d11 2s 2s +0%
mlk_scalar_compress_d4 2s 3s -33%
mlk_scalar_compress_d5 2s 3s -33%
mlk_scalar_decompress_d10 2s 4s -50%
mlk_scalar_decompress_d5 2s 1s +100%
mlk_scalar_signed_to_unsigned_q 2s 4s -50%
mlk_sha3_256 2s 1s +100%
mlk_sha3_512 2s 2s +0%
mlk_shake256 2s 1s +100%
mlk_value_barrier_u32 2s 2s +0%
ntt_native_aarch64 2s 3s -33%
poly_getnoise_eta1122_4x_native 2s 2s +0%
poly_tobytes_native_aarch64 2s 2s +0%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 2s 4s -50%
rej_uniform_native_aarch64 2s 5s -60%
sys_check_capability 2s 2s +0%
keccak_f1600_x1_native_aarch64 1s 2s -50%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 1s 4s -75%
kem_keypair 1s 4s -75%
mlk_barrett_reduce 1s 3s -67%
mlk_keccakf1600x4_permute 1s 1s +0%
mlk_keccakf1600x4_xor_bytes 1s 1s +0%
mlk_poly_compress_dv 1s 2s -50%
mlk_poly_ntt_c 1s 3s -67%
mlk_poly_reduce 1s 2s -50%
mlk_poly_tobytes_c 1s 2s -50%
mlk_poly_tomont_native 1s 3s -67%
mlk_scalar_decompress_d4 1s 3s -67%
mlk_shake128x4_squeezeblocks 1s 2s -50%
poly_tomont_native_aarch64 1s 3s -67%

@manastasova manastasova marked this pull request as ready for review January 22, 2026 23:07
@manastasova manastasova requested a review from a team as a code owner January 22, 2026 23:07