@willieyz
Contributor

Ported from pq-code-package/mlkem-native#1514, adding PMU-based cycle counting support for Armv8.1-M Cortex-M processors and aligning the RISC-V PMU configuration in hal.c with mlkem-native.

@willieyz force-pushed the port-add-PMU-armv81 branch from a04e63f to 28d55e1 on January 22, 2026 10:33
@oqs-bot
Contributor

oqs-bot commented Jan 22, 2026

CBMC Results (ML-DSA-87)

Full Results (173 proofs)
Proof Status Current Previous Change
**TOTAL** 2170s 2117s +2.5%
mld_attempt_signature_generation 199s 191s +4%
polyvec_matrix_expand 186s 190s -2%
polyvecl_pointwise_acc_montgomery_c 162s 156s +4%
rej_uniform_native 130s 125s +4%
poly_pointwise_montgomery_c 129s 128s +1%
sign_verify_internal 117s 116s +1%
polyvec_matrix_expand_serial 105s 103s +2%
mld_ct_memcmp 75s 76s -1%
mld_invntt_layer 57s 59s -3%
sign_signature_internal 55s 58s -5%
mld_ntt_layer 48s 42s +14%
keccak_squeezeblocks_x4 43s 44s -2%
mld_compute_t0_t1_tr_from_sk_components 26s 26s +0%
polymat_permute_bitrev_to_custom 25s 23s +9%
fqmul 21s 18s +17%
rej_uniform 20s 20s +0%
poly_chknorm_c 18s 15s +20%
rej_uniform_c 18s 17s +6%
poly_uniform_eta_4x 17s 17s +0%
polyvec_matrix_pointwise_montgomery 15s 13s +15%
poly_uniform_4x 13s 12s +8%
polyt0_unpack 13s 14s -7%
polyveck_add 13s 13s +0%
polyveck_power2round 13s 12s +8%
polyveck_reduce 13s 11s +18%
keccak_absorb_once_x4 12s 13s -8%
keccakf1600x4_permute_native 12s 14s -14%
mld_ntt_butterfly_block 12s 12s +0%
polyeta_unpack 12s 13s -8%
keccakf1600_permute_native 10s 6s +67%
mld_polyvecl_permute_bitrev_to_custom_native 10s 12s -17%
polyveck_chknorm 10s 7s +43%
mld_check_pct 9s 5s +80%
polyveck_use_hint 9s 8s +12%
polyvecl_ntt 9s 10s -10%
keccakf1600_permute 8s 9s -11%
mld_sample_s1_s2_serial 8s 6s +33%
poly_decompose_c 8s 8s +0%
polyveck_caddq 8s 9s -11%
polyveck_pointwise_poly_montgomery 8s 7s +14%
polyveck_shiftl 8s 6s +33%
sign 8s 7s +14%
poly_invntt_tomont_c 7s 8s -12%
polyveck_decompose 7s 6s +17%
polyveck_invntt_tomont 7s 9s -22%
polyveck_ntt 7s 6s +17%
sign_keypair_internal 7s 6s +17%
sign_pk_from_sk 7s 9s -22%
sign_signature_pre_hash_internal 7s 5s +40%
sign_verify 7s 5s +40%
keccakf1600x4_permute 6s 3s +100%
mld_compute_pack_z 6s 5s +20%
mld_h 6s 6s +0%
poly_use_hint_c 6s 3s +100%
polyveck_sub 6s 7s -14%
polyvecl_pack_eta 6s 2s +200%
sign_keypair 6s 3s +100%
sign_open 6s 4s +50%
sign_signature_pre_hash_shake256 6s 4s +50%
unpack_hints 6s 6s +0%
unpack_sk 6s 4s +50%
keccak_absorb 5s 8s -38%
mld_sample_s1_s2 5s 6s -17%
poly_caddq_native 5s 4s +25%
poly_challenge 5s 5s +0%
polyveck_make_hint 5s 6s -17%
polyveck_pack_w1 5s 2s +150%
polyvecl_chknorm 5s 5s +0%
power2round 5s 3s +67%
sign_signature_extmu 5s 3s +67%
sign_verify_pre_hash_shake256 5s 3s +67%
use_hint 5s 3s +67%
caddq 4s 3s +33%
keccak_finalize 4s 6s -33%
keccakf1600_extract_bytes (big endian) 4s 1s +300%
keccakf1600x4_xor_bytes 4s 2s +100%
pack_sig_c_h 4s 2s +100%
pack_sk 4s 3s +33%
poly_add 4s 3s +33%
poly_caddq 4s 3s +33%
poly_chknorm_native 4s 4s +0%
poly_invntt_tomont 4s 3s +33%
poly_invntt_tomont_native 4s 2s +100%
poly_pointwise_montgomery_native 4s 2s +100%
poly_sub 4s 2s +100%
poly_uniform 4s 2s +100%
poly_uniform_eta 4s 2s +100%
poly_uniform_gamma1 4s 2s +100%
poly_uniform_gamma1_4x 4s 4s +0%
poly_use_hint 4s 4s +0%
polyt0_pack 4s 4s +0%
polyvecl_permute_bitrev_to_custom 4s 2s +100%
polyvecl_uniform_gamma1 4s 4s +0%
polyvecl_uniform_gamma1_serial 4s 5s -20%
polyvecl_unpack_eta 4s 2s +100%
polyvecl_unpack_z 4s 6s -33%
polyz_unpack_c 4s 5s -20%
rej_eta_c 4s 5s -20%
shake128_release 4s 2s +100%
shake256x4_absorb_once 4s 2s +100%
sign_verify_pre_hash_internal 4s 2s +100%
unpack_pk 4s 4s +0%
decompose 3s 4s -25%
fqscale 3s 2s +50%
keccak_squeeze 3s 4s -25%
keccakf1600_xor_bytes 3s 3s +0%
make_hint 3s 5s -40%
mld_ct_cmask_nonzero_u32 3s 3s +0%
mld_ct_cmask_nonzero_u8 3s 3s +0%
mld_ct_get_optblocker_u8 3s 3s +0%
mld_prepare_domain_separation_prefix 3s 5s -40%
ntt_native_x86_64 3s 4s -25%
pack_pk 3s 4s -25%
poly_caddq_c 3s 1s +200%
poly_decompose 3s 3s +0%
poly_decompose_native 3s 3s +0%
poly_make_hint 3s 4s -25%
poly_ntt 3s 3s +0%
poly_ntt_native 3s 1s +200%
poly_pointwise_montgomery 3s 5s -40%
poly_power2round 3s 2s +50%
poly_use_hint_native 3s 4s -25%
polyeta_pack 3s 3s +0%
polyt1_pack 3s 4s -25%
polyt1_unpack 3s 2s +50%
polyveck_pack_eta 3s 4s -25%
polyveck_pack_t0 3s 4s -25%
polyveck_unpack_eta 3s 5s -40%
polyveck_unpack_t0 3s 4s -25%
polyvecl_pointwise_acc_montgomery 3s 4s -25%
polyvecl_pointwise_acc_montgomery_native 3s 3s +0%
polyw1_pack 3s 4s -25%
polyz_unpack 3s 4s -25%
rej_eta 3s 1s +200%
rej_eta_native 3s 5s -40%
shake128_absorb 3s 2s +50%
shake128_finalize 3s 1s +200%
shake128_squeeze 3s 3s +0%
shake256_absorb 3s 4s -25%
shake256_init 3s 3s +0%
shake256_release 3s 4s -25%
shake256_squeeze 3s 2s +50%
shake256x4_squeezeblocks 3s 4s -25%
sign_signature 3s 2s +50%
sign_verify_extmu 3s 3s +0%
sys_check_capability 3s 3s +0%
keccak_init 2s 2s +0%
mld_ct_cmask_neg_i32 2s 2s +0%
mld_ct_get_optblocker_u32 2s 4s -50%
mld_ct_sel_int32 2s 3s -33%
mld_keccakf1600_extract_bytes 2s 3s -33%
mld_value_barrier_i64 2s 5s -60%
mld_value_barrier_u32 2s 2s +0%
mld_value_barrier_u8 2s 2s +0%
pack_sig_z 2s 3s -33%
poly_chknorm 2s 5s -60%
poly_ntt_c 2s 4s -50%
poly_reduce 2s 1s +100%
poly_shiftl 2s 2s +0%
polyz_pack 2s 3s -33%
polyz_unpack_native 2s 3s -33%
reduce32 2s 3s -33%
shake128_init 2s 3s -33%
shake128x4_absorb_once 2s 5s -60%
shake128x4_squeezeblocks 2s 3s -33%
shake256 2s 3s -33%
shake256_finalize 2s 3s -33%
unpack_sig 2s 4s -50%
keccakf1600_xor_bytes (big endian) 1s 2s -50%
keccakf1600x4_extract_bytes 1s 3s -67%
mld_ct_abs_i32 1s 2s -50%
mld_ct_get_optblocker_i64 1s 3s -67%
montgomery_reduce 1s 2s -50%

@oqs-bot
Contributor

oqs-bot commented Jan 22, 2026

CBMC Results (ML-DSA-44)

Full Results (173 proofs)
Proof Status Current Previous Change
**TOTAL** 1999s 1808s +10.6%
mld_attempt_signature_generation 242s 213s +14%
polyvecl_pointwise_acc_montgomery_c 227s 186s +22%
poly_pointwise_montgomery_c 147s 130s +13%
rej_uniform_native 140s 124s +13%
sign_verify_internal 128s 117s +9%
mld_ct_memcmp 86s 74s +16%
mld_invntt_layer 75s 70s +7%
mld_ntt_layer 47s 43s +9%
keccak_squeezeblocks_x4 46s 47s -2%
sign_signature_internal 34s 34s +0%
rej_uniform 22s 21s +5%
fqmul 19s 21s -10%
rej_uniform_c 19s 17s +12%
poly_chknorm_c 18s 15s +20%
mld_compute_t0_t1_tr_from_sk_components 17s 12s +42%
polyvec_matrix_expand 17s 15s +13%
keccakf1600x4_permute_native 16s 13s +23%
poly_uniform_eta_4x 16s 16s +0%
polymat_permute_bitrev_to_custom 16s 15s +7%
polyt0_unpack 16s 13s +23%
poly_uniform_4x 14s 13s +8%
polyeta_unpack 14s 13s +8%
keccak_absorb_once_x4 13s 13s +0%
mld_ntt_butterfly_block 12s 11s +9%
polyz_unpack_c 11s 11s +0%
poly_invntt_tomont_c 10s 11s -9%
polyvec_matrix_pointwise_montgomery 10s 7s +43%
keccakf1600_permute_native 9s 8s +12%
keccakf1600_permute 8s 6s +33%
mld_h 8s 7s +14%
mld_polyvecl_permute_bitrev_to_custom_native 8s 7s +14%
sign 8s 7s +14%
mld_check_pct 7s 7s +0%
polyveck_add 7s 6s +17%
polyveck_pointwise_poly_montgomery 7s 5s +40%
sign_pk_from_sk 7s 6s +17%
decompose 6s 6s +0%
keccak_absorb 6s 7s -14%
poly_add 6s 2s +200%
poly_caddq 6s 3s +100%
poly_uniform 6s 4s +50%
polyvec_matrix_expand_serial 6s 6s +0%
polyveck_caddq 6s 5s +20%
polyveck_decompose 6s 6s +0%
polyveck_invntt_tomont 6s 5s +20%
polyveck_ntt 6s 5s +20%
polyveck_sub 6s 5s +20%
polyvecl_chknorm 6s 6s +0%
polyvecl_unpack_eta 6s 2s +200%
unpack_hints 6s 6s +0%
mld_compute_pack_z 5s 6s -17%
mld_prepare_domain_separation_prefix 5s 2s +150%
mld_sample_s1_s2 5s 4s +25%
mld_sample_s1_s2_serial 5s 5s +0%
poly_decompose 5s 3s +67%
poly_decompose_native 5s 5s +0%
poly_invntt_tomont 5s 3s +67%
poly_use_hint_c 5s 3s +67%
polyveck_chknorm 5s 3s +67%
polyveck_pack_t0 5s 4s +25%
polyveck_power2round 5s 4s +25%
polyveck_reduce 5s 3s +67%
polyveck_use_hint 5s 3s +67%
polyvecl_ntt 5s 6s -17%
polyz_unpack 5s 4s +25%
rej_eta_c 5s 5s +0%
rej_eta_native 5s 5s +0%
shake128_finalize 5s 2s +150%
sign_keypair 5s 5s +0%
sign_keypair_internal 5s 5s +0%
sign_verify 5s 6s -17%
sign_verify_extmu 5s 7s -29%
sys_check_capability 5s 4s +25%
unpack_sk 5s 3s +67%
keccakf1600_xor_bytes (big endian) 4s 3s +33%
keccakf1600x4_permute 4s 1s +300%
make_hint 4s 2s +100%
pack_sk 4s 3s +33%
poly_caddq_native 4s 3s +33%
poly_challenge 4s 4s +0%
poly_chknorm 4s 4s +0%
poly_chknorm_native 4s 2s +100%
poly_ntt_native 4s 4s +0%
poly_pointwise_montgomery_native 4s 3s +33%
poly_reduce 4s 3s +33%
poly_uniform_eta 4s 4s +0%
poly_uniform_gamma1 4s 3s +33%
poly_uniform_gamma1_4x 4s 5s -20%
poly_use_hint_native 4s 3s +33%
polyeta_pack 4s 5s -20%
polyt0_pack 4s 4s +0%
polyveck_unpack_eta 4s 3s +33%
polyvecl_pack_eta 4s 3s +33%
polyvecl_pointwise_acc_montgomery 4s 4s +0%
polyvecl_uniform_gamma1 4s 4s +0%
polyw1_pack 4s 5s -20%
polyz_pack 4s 2s +100%
shake128_absorb 4s 2s +100%
shake256 4s 3s +33%
shake256_init 4s 4s +0%
shake256x4_squeezeblocks 4s 3s +33%
sign_signature 4s 3s +33%
sign_signature_extmu 4s 4s +0%
sign_signature_pre_hash_internal 4s 4s +0%
sign_verify_pre_hash_internal 4s 3s +33%
sign_verify_pre_hash_shake256 4s 6s -33%
unpack_sig 4s 3s +33%
use_hint 4s 3s +33%
keccak_squeeze 3s 3s +0%
keccakf1600_extract_bytes (big endian) 3s 2s +50%
mld_ct_get_optblocker_i64 3s 2s +50%
mld_ct_sel_int32 3s 2s +50%
mld_keccakf1600_extract_bytes 3s 3s +0%
mld_value_barrier_i64 3s 4s -25%
montgomery_reduce 3s 4s -25%
ntt_native_x86_64 3s 4s -25%
pack_pk 3s 3s +0%
poly_caddq_c 3s 2s +50%
poly_decompose_c 3s 3s +0%
poly_ntt 3s 5s -40%
poly_ntt_c 3s 2s +50%
poly_sub 3s 2s +50%
polyt1_pack 3s 2s +50%
polyveck_make_hint 3s 5s -40%
polyveck_pack_w1 3s 6s -50%
polyveck_shiftl 3s 3s +0%
polyveck_unpack_t0 3s 2s +50%
polyvecl_uniform_gamma1_serial 3s 2s +50%
polyvecl_unpack_z 3s 4s -25%
power2round 3s 3s +0%
shake128_release 3s 3s +0%
shake128_squeeze 3s 2s +50%
shake128x4_squeezeblocks 3s 3s +0%
shake256_finalize 3s 2s +50%
shake256_release 3s 3s +0%
shake256_squeeze 3s 2s +50%
sign_open 3s 7s -57%
sign_signature_pre_hash_shake256 3s 4s -25%
unpack_pk 3s 2s +50%
caddq 2s 3s -33%
fqscale 2s 3s -33%
keccak_init 2s 2s +0%
keccakf1600_xor_bytes 2s 1s +100%
keccakf1600x4_extract_bytes 2s 2s +0%
keccakf1600x4_xor_bytes 2s 4s -50%
mld_ct_abs_i32 2s 2s +0%
mld_ct_cmask_neg_i32 2s 2s +0%
mld_ct_cmask_nonzero_u32 2s 2s +0%
mld_ct_cmask_nonzero_u8 2s 2s +0%
mld_ct_get_optblocker_u32 2s 1s +100%
mld_ct_get_optblocker_u8 2s 1s +100%
mld_value_barrier_u32 2s 2s +0%
mld_value_barrier_u8 2s 3s -33%
pack_sig_c_h 2s 4s -50%
pack_sig_z 2s 4s -50%
poly_invntt_tomont_native 2s 2s +0%
poly_make_hint 2s 3s -33%
poly_pointwise_montgomery 2s 5s -60%
poly_power2round 2s 3s -33%
poly_shiftl 2s 2s +0%
poly_use_hint 2s 2s +0%
polyt1_unpack 2s 4s -50%
polyveck_pack_eta 2s 3s -33%
polyvecl_permute_bitrev_to_custom 2s 3s -33%
polyvecl_pointwise_acc_montgomery_native 2s 5s -60%
polyz_unpack_native 2s 2s +0%
reduce32 2s 4s -50%
shake128x4_absorb_once 2s 2s +0%
shake256_absorb 2s 2s +0%
shake256x4_absorb_once 2s 3s -33%
keccak_finalize 1s 4s -75%
rej_eta 1s 5s -80%
shake128_init 1s 3s -67%

@oqs-bot
Contributor

oqs-bot commented Jan 22, 2026

CBMC Results (ML-DSA-65)

Full Results (173 proofs)
Proof Status Current Previous Change
**TOTAL** 2430s 2573s -5.6%
mld_attempt_signature_generation 412s 439s -6%
polyvecl_pointwise_acc_montgomery_c 241s 258s -7%
sign_verify_internal 176s 184s -4%
poly_pointwise_montgomery_c 136s 160s -15%
polyvec_matrix_expand 134s 144s -7%
rej_uniform_native 132s 136s -3%
mld_ct_memcmp 90s 94s -4%
polyvec_matrix_expand_serial 66s 71s -7%
mld_invntt_layer 65s 67s -3%
mld_ntt_layer 46s 50s -8%
sign_signature_internal 46s 50s -8%
keccak_squeezeblocks_x4 44s 46s -4%
mld_compute_t0_t1_tr_from_sk_components 25s 27s -7%
rej_uniform 21s 23s -9%
fqmul 19s 22s -14%
polymat_permute_bitrev_to_custom 19s 20s -5%
poly_uniform_eta_4x 17s 19s -11%
polyt0_unpack 17s 20s -15%
rej_uniform_c 17s 24s -29%
polyvec_matrix_pointwise_montgomery 16s 16s +0%
polyveck_decompose 16s 20s -20%
mld_ntt_butterfly_block 14s 12s +17%
poly_uniform_4x 14s 15s -7%
poly_chknorm_c 13s 16s -19%
keccak_absorb_once_x4 12s 15s -20%
keccakf1600x4_permute_native 12s 14s -14%
mld_check_pct 12s 12s +0%
polyveck_use_hint 12s 15s -20%
mld_polyvecl_permute_bitrev_to_custom_native 10s 8s +25%
polyveck_add 10s 8s +25%
sign 10s 12s -17%
poly_decompose_c 9s 8s +12%
poly_invntt_tomont_c 9s 11s -18%
polyveck_ntt 9s 8s +12%
polyveck_power2round 9s 8s +12%
polyveck_reduce 9s 6s +50%
polyvecl_ntt 9s 7s +29%
sign_pk_from_sk 9s 9s +0%
keccakf1600_permute 8s 10s -20%
keccakf1600_permute_native 8s 9s -11%
polyeta_unpack 8s 7s +14%
polyveck_shiftl 8s 6s +33%
keccak_absorb 7s 6s +17%
mld_compute_pack_z 7s 4s +75%
mld_sample_s1_s2_serial 7s 4s +75%
polyveck_caddq 7s 7s +0%
polyveck_invntt_tomont 7s 11s -36%
polyveck_sub 7s 7s +0%
poly_power2round 6s 2s +200%
mld_h 5s 4s +25%
mld_sample_s1_s2 5s 5s +0%
ntt_native_x86_64 5s 4s +25%
poly_add 5s 5s +0%
poly_decompose_native 5s 4s +25%
poly_use_hint_c 5s 5s +0%
polyveck_chknorm 5s 7s -29%
polyveck_make_hint 5s 7s -29%
polyveck_pointwise_poly_montgomery 5s 5s +0%
polyvecl_pointwise_acc_montgomery_native 5s 4s +25%
shake128x4_squeezeblocks 5s 3s +67%
sign_keypair_internal 5s 4s +25%
sign_open 5s 4s +25%
sign_signature_extmu 5s 3s +67%
sign_signature_pre_hash_shake256 5s 4s +25%
sign_verify_extmu 5s 2s +150%
sign_verify_pre_hash_shake256 5s 4s +25%
unpack_hints 5s 5s +0%
unpack_pk 5s 4s +25%
caddq 4s 3s +33%
keccakf1600x4_xor_bytes 4s 2s +100%
mld_ct_sel_int32 4s 1s +300%
mld_prepare_domain_separation_prefix 4s 5s -20%
montgomery_reduce 4s 3s +33%
poly_caddq_c 4s 4s +0%
poly_caddq_native 4s 4s +0%
poly_challenge 4s 5s -20%
poly_chknorm 4s 4s +0%
poly_chknorm_native 4s 5s -20%
poly_invntt_tomont 4s 2s +100%
poly_ntt 4s 1s +300%
poly_ntt_native 4s 6s -33%
poly_pointwise_montgomery 4s 2s +100%
poly_shiftl 4s 4s +0%
poly_uniform 4s 3s +33%
polyt0_pack 4s 6s -33%
polyt1_pack 4s 4s +0%
polyveck_pack_eta 4s 4s +0%
polyveck_unpack_t0 4s 4s +0%
polyvecl_pack_eta 4s 3s +33%
polyvecl_uniform_gamma1 4s 5s -20%
polyvecl_uniform_gamma1_serial 4s 4s +0%
polyvecl_unpack_eta 4s 4s +0%
polyvecl_unpack_z 4s 2s +100%
polyz_unpack_c 4s 4s +0%
polyz_unpack_native 4s 4s +0%
rej_eta_native 4s 6s -33%
shake128_absorb 4s 2s +100%
shake128_init 4s 3s +33%
shake256x4_absorb_once 4s 4s +0%
shake256x4_squeezeblocks 4s 3s +33%
sign_keypair 4s 5s -20%
sign_verify 4s 3s +33%
sys_check_capability 4s 4s +0%
keccak_finalize 3s 2s +50%
keccak_init 3s 2s +50%
keccak_squeeze 3s 5s -40%
keccakf1600_xor_bytes 3s 4s -25%
keccakf1600_xor_bytes (big endian) 3s 2s +50%
keccakf1600x4_extract_bytes 3s 2s +50%
mld_ct_cmask_nonzero_u32 3s 2s +50%
mld_ct_cmask_nonzero_u8 3s 2s +50%
mld_ct_get_optblocker_u8 3s 2s +50%
mld_value_barrier_i64 3s 1s +200%
mld_value_barrier_u32 3s 2s +50%
pack_pk 3s 4s -25%
pack_sig_c_h 3s 4s -25%
pack_sig_z 3s 3s +0%
pack_sk 3s 4s -25%
poly_decompose 3s 2s +50%
poly_reduce 3s 4s -25%
poly_uniform_eta 3s 5s -40%
poly_uniform_gamma1 3s 5s -40%
poly_uniform_gamma1_4x 3s 3s +0%
poly_use_hint 3s 3s +0%
poly_use_hint_native 3s 3s +0%
polyt1_unpack 3s 2s +50%
polyveck_pack_t0 3s 2s +50%
polyveck_unpack_eta 3s 2s +50%
polyvecl_pointwise_acc_montgomery 3s 4s -25%
polyw1_pack 3s 4s -25%
polyz_pack 3s 4s -25%
polyz_unpack 3s 3s +0%
power2round 3s 1s +200%
rej_eta 3s 5s -40%
rej_eta_c 3s 3s +0%
shake128_finalize 3s 2s +50%
shake128_release 3s 3s +0%
shake128_squeeze 3s 2s +50%
shake128x4_absorb_once 3s 3s +0%
shake256_finalize 3s 3s +0%
shake256_init 3s 2s +50%
sign_signature_pre_hash_internal 3s 6s -50%
sign_verify_pre_hash_internal 3s 4s -25%
unpack_sig 3s 5s -40%
unpack_sk 3s 5s -40%
use_hint 3s 3s +0%
decompose 2s 4s -50%
fqscale 2s 2s +0%
keccakf1600x4_permute 2s 2s +0%
make_hint 2s 3s -33%
mld_ct_abs_i32 2s 2s +0%
mld_ct_cmask_neg_i32 2s 1s +100%
mld_ct_get_optblocker_i64 2s 3s -33%
mld_keccakf1600_extract_bytes 2s 2s +0%
mld_value_barrier_u8 2s 2s +0%
poly_caddq 2s 4s -50%
poly_make_hint 2s 4s -50%
poly_pointwise_montgomery_native 2s 4s -50%
polyeta_pack 2s 2s +0%
polyveck_pack_w1 2s 2s +0%
polyvecl_chknorm 2s 6s -67%
polyvecl_permute_bitrev_to_custom 2s 5s -60%
reduce32 2s 4s -50%
shake256 2s 3s -33%
shake256_absorb 2s 3s -33%
shake256_release 2s 2s +0%
shake256_squeeze 2s 2s +0%
sign_signature 2s 4s -50%
keccakf1600_extract_bytes (big endian) 1s 2s -50%
mld_ct_get_optblocker_u32 1s 3s -67%
poly_invntt_tomont_native 1s 4s -75%
poly_ntt_c 1s 4s -75%
poly_sub 1s 2s -50%

@willieyz force-pushed the port-add-PMU-armv81 branch from 28d55e1 to 7b303ef on January 23, 2026 01:21
@willieyz marked this pull request as ready for review on January 23, 2026 02:26
@willieyz requested a review from a team as a code owner on January 23, 2026 02:26
@willieyz marked this pull request as draft on January 23, 2026 02:52
@willieyz force-pushed the port-add-PMU-armv81 branch from 7b303ef to 99ee701 on January 23, 2026 09:43
@willieyz marked this pull request as ready for review on January 23, 2026 10:05
Contributor

@mkannwischer left a comment

Thanks @willieyz. You unintentionally changed the default benchmarking parameters, probably because you copied them from mlkem-native; we cannot use the same parameters as mlkem-native. Besides that, there are two other nits.

I will fix both myself shortly.

Contributor

@mkannwischer left a comment

Thanks @willieyz. I fixed it myself.

Note: The benchmarking parameters for the mps3-an547 are very small. Because of rejection sampling, they are too small to produce signing cycle counts that are reasonably close to the expected value; with larger values, however, the benchmarks take forever. For key generation and verification, we have verified that the cycle counts are reasonably stable and close to what we get with larger iteration counts, and that benchmarking completes in less than 30 seconds on the MPS3 FPGA. Proper signing benchmarks will need much larger values.

Make the benchmark parameters (NWARMUP, NITERATIONS, NTESTS)
configurable via CFLAGS by wrapping them in #ifndef guards and
renaming to MLD_BENCHMARK_NWARMUP, MLD_BENCHMARK_NITERATIONS,
and MLD_BENCHMARK_NTESTS.

Signed-off-by: willieyz <[email protected]>

Add PMU-based cycle counting support for Armv8.1-M Cortex-M processors.
This uses the CMSIS PMU APIs for portable cycle counter access.

Signed-off-by: willieyz <[email protected]>

The cycle counts will be zero, but this still tests that the PMU code builds.

Signed-off-by: willieyz <[email protected]>
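
The `#ifndef` guard pattern described in the first commit message above could look roughly like the sketch below. The macro names are taken from that commit message; the default values and the `bench_config_valid` helper are placeholders invented for illustration and are not the project's actual defaults or code.

```c
#include <stdio.h>

/*
 * Each default applies only if the build has not already defined the macro,
 * for example through a -D option passed in CFLAGS.
 * NOTE: the numeric defaults below are illustrative placeholders only.
 */
#ifndef MLD_BENCHMARK_NWARMUP
#define MLD_BENCHMARK_NWARMUP 5
#endif

#ifndef MLD_BENCHMARK_NITERATIONS
#define MLD_BENCHMARK_NITERATIONS 10
#endif

#ifndef MLD_BENCHMARK_NTESTS
#define MLD_BENCHMARK_NTESTS 50
#endif

/*
 * Hypothetical helper (not part of the PR): prints the effective
 * configuration and returns 1 if every parameter is positive, else 0.
 */
int bench_config_valid(void)
{
  printf("NWARMUP=%d NITERATIONS=%d NTESTS=%d\n", (int)MLD_BENCHMARK_NWARMUP,
         (int)MLD_BENCHMARK_NITERATIONS, (int)MLD_BENCHMARK_NTESTS);
  return (MLD_BENCHMARK_NWARMUP > 0) && (MLD_BENCHMARK_NITERATIONS > 0) &&
         (MLD_BENCHMARK_NTESTS > 0);
}
```

With guards like these, compiling with an option such as `-DMLD_BENCHMARK_NTESTS=100` in `CFLAGS` would override that one parameter while the other two keep their defaults.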
@mkannwischer merged commit 87f7a12 into main on Jan 24, 2026
337 checks passed
@mkannwischer deleted the port-add-PMU-armv81 branch on January 24, 2026 12:16

Development

Successfully merging this pull request may close these issues.

Port: Add PMU cycle counting for Armv8.1-M

4 participants