@mkannwischer
Contributor

@mkannwischer mkannwischer commented Oct 26, 2025

This PR ports the following two PRs from mlkem-native:

Unfortunately, the changes to mldsa-native are more invasive, as we have to touch three functions: mld_polyvecl_uniform_gamma1, mld_sample_s1_s2, and mld_polyvec_matrix_expand.
See the individual commit messages for details.

@mkannwischer mkannwischer force-pushed the serial-fips202 branch 2 times, most recently from e86480b to 55c610c Compare October 26, 2025 03:10
@mkannwischer mkannwischer changed the title WIP: Add Add MLK_CONFIG_SERIAL_FIPS202_ONLY option Add Add MLK_CONFIG_SERIAL_FIPS202_ONLY option Oct 26, 2025
@mkannwischer mkannwischer changed the title Add Add MLK_CONFIG_SERIAL_FIPS202_ONLY option Add MLK_CONFIG_SERIAL_FIPS202_ONLY option Oct 26, 2025

@github-actions github-actions bot left a comment

Mac Mini (M1, 2020) benchmarks (opt)

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 46214 cycles 46211 cycles 1.00
ML-DSA-44 sign 132529 cycles 132524 cycles 1.00
ML-DSA-44 verify 47864 cycles 47860 cycles 1.00
ML-DSA-65 keypair 81119 cycles 81129 cycles 1.00
ML-DSA-65 sign 219041 cycles 219038 cycles 1.00
ML-DSA-65 verify 80133 cycles 80141 cycles 1.00
ML-DSA-87 keypair 132280 cycles 132297 cycles 1.00
ML-DSA-87 sign 280857 cycles 280826 cycles 1.00
ML-DSA-87 verify 130341 cycles 130358 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Mac Mini (M1, 2020) benchmarks (no-opt)

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 115062 cycles 115066 cycles 1.00
ML-DSA-44 sign 431665 cycles 431660 cycles 1.00
ML-DSA-44 verify 122202 cycles 122204 cycles 1.00
ML-DSA-65 keypair 197079 cycles 197093 cycles 1.00
ML-DSA-65 sign 701053 cycles 700981 cycles 1.00
ML-DSA-65 verify 197706 cycles 197690 cycles 1.00
ML-DSA-87 keypair 325266 cycles 325240 cycles 1.00
ML-DSA-87 sign 884938 cycles 884688 cycles 1.00
ML-DSA-87 verify 328870 cycles 328846 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 288207 cycles 286659 cycles 1.01
ML-DSA-44 sign 916985 cycles 926323 cycles 0.99
ML-DSA-44 verify 294338 cycles 290589 cycles 1.01
ML-DSA-65 keypair 485897 cycles 485454 cycles 1.00
ML-DSA-65 sign 1530923 cycles 1505432 cycles 1.02
ML-DSA-65 verify 475031 cycles 477675 cycles 0.99
ML-DSA-87 keypair 833621 cycles 834086 cycles 1.00
ML-DSA-87 sign 2041111 cycles 2070270 cycles 0.99
ML-DSA-87 verify 816648 cycles 821897 cycles 0.99

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 34938 cycles 34926 cycles 1.00
ML-DSA-44 sign 121031 cycles 121149 cycles 1.00
ML-DSA-44 verify 38332 cycles 38377 cycles 1.00
ML-DSA-65 keypair 62199 cycles 61464 cycles 1.01
ML-DSA-65 sign 200264 cycles 198984 cycles 1.01
ML-DSA-65 verify 62336 cycles 62416 cycles 1.00
ML-DSA-87 keypair 94930 cycles 94941 cycles 1.00
ML-DSA-87 sign 235477 cycles 234933 cycles 1.00
ML-DSA-87 verify 93912 cycles 94952 cycles 0.99

@github-actions github-actions bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 465017 cycles 460856 cycles 1.01
ML-DSA-44 sign 2222620 cycles 2207059 cycles 1.01
ML-DSA-44 verify 546684 cycles 544844 cycles 1.00
ML-DSA-65 keypair 774578 cycles 772368 cycles 1.00
ML-DSA-65 sign 3637110 cycles 3607517 cycles 1.01
ML-DSA-65 verify 847303 cycles 845335 cycles 1.00
ML-DSA-87 keypair 1246015 cycles 1247551 cycles 1.00
ML-DSA-87 sign 4448167 cycles 4456731 cycles 1.00
ML-DSA-87 verify 1358517 cycles 1357092 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 94975 cycles 94997 cycles 1.00
ML-DSA-44 sign 348626 cycles 348434 cycles 1.00
ML-DSA-44 verify 100700 cycles 100627 cycles 1.00
ML-DSA-65 keypair 164458 cycles 164944 cycles 1.00
ML-DSA-65 sign 566833 cycles 567932 cycles 1.00
ML-DSA-65 verify 165263 cycles 165264 cycles 1.00
ML-DSA-87 keypair 268148 cycles 266961 cycles 1.00
ML-DSA-87 sign 722541 cycles 722134 cycles 1.00
ML-DSA-87 verify 271532 cycles 271575 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 43572 cycles 42707 cycles 1.02
ML-DSA-44 sign 130680 cycles 130874 cycles 1.00
ML-DSA-44 verify 44442 cycles 44417 cycles 1.00
ML-DSA-65 keypair 72560 cycles 72601 cycles 1.00
ML-DSA-65 sign 211910 cycles 211120 cycles 1.00
ML-DSA-65 verify 73266 cycles 73063 cycles 1.00
ML-DSA-87 keypair 110030 cycles 109898 cycles 1.00
ML-DSA-87 sign 249841 cycles 248796 cycles 1.00
ML-DSA-87 verify 109880 cycles 110981 cycles 0.99

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 57291 cycles 56559 cycles 1.01
ML-DSA-44 sign 180306 cycles 180469 cycles 1.00
ML-DSA-44 verify 61196 cycles 61313 cycles 1.00
ML-DSA-65 keypair 99749 cycles 99415 cycles 1.00
ML-DSA-65 sign 297023 cycles 296957 cycles 1.00
ML-DSA-65 verify 100282 cycles 100080 cycles 1.00
ML-DSA-87 keypair 153417 cycles 153553 cycles 1.00
ML-DSA-87 sign 353122 cycles 352902 cycles 1.00
ML-DSA-87 verify 152910 cycles 153211 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 69289 cycles 69060 cycles 1.00
ML-DSA-44 sign 185177 cycles 185506 cycles 1.00
ML-DSA-44 verify 69143 cycles 69272 cycles 1.00
ML-DSA-65 keypair 119162 cycles 119253 cycles 1.00
ML-DSA-65 sign 295818 cycles 295980 cycles 1.00
ML-DSA-65 verify 115262 cycles 115145 cycles 1.00
ML-DSA-87 keypair 201279 cycles 201391 cycles 1.00
ML-DSA-87 sign 385370 cycles 385955 cycles 1.00
ML-DSA-87 verify 193648 cycles 193466 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 120088 cycles 121037 cycles 0.99
ML-DSA-44 sign 454835 cycles 455279 cycles 1.00
ML-DSA-44 verify 129904 cycles 130382 cycles 1.00
ML-DSA-65 keypair 206023 cycles 204940 cycles 1.01
ML-DSA-65 sign 736580 cycles 735259 cycles 1.00
ML-DSA-65 verify 210302 cycles 210201 cycles 1.00
ML-DSA-87 keypair 338195 cycles 337995 cycles 1.00
ML-DSA-87 sign 929159 cycles 927182 cycles 1.00
ML-DSA-87 verify 345084 cycles 345666 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton2

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 116180 cycles 116209 cycles 1.00
ML-DSA-44 sign 379866 cycles 380310 cycles 1.00
ML-DSA-44 verify 121394 cycles 121412 cycles 1.00
ML-DSA-65 keypair 199587 cycles 199388 cycles 1.00
ML-DSA-65 sign 624397 cycles 624088 cycles 1.00
ML-DSA-65 verify 198821 cycles 198617 cycles 1.00
ML-DSA-87 keypair 326496 cycles 326740 cycles 1.00
ML-DSA-87 sign 793167 cycles 791251 cycles 1.00
ML-DSA-87 verify 325285 cycles 325426 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 157768 cycles 157868 cycles 1.00
ML-DSA-44 sign 565176 cycles 565286 cycles 1.00
ML-DSA-44 verify 169663 cycles 169651 cycles 1.00
ML-DSA-65 keypair 270705 cycles 269933 cycles 1.00
ML-DSA-65 sign 926992 cycles 926417 cycles 1.00
ML-DSA-65 verify 275943 cycles 275707 cycles 1.00
ML-DSA-87 keypair 451762 cycles 453450 cycles 1.00
ML-DSA-87 sign 1182855 cycles 1184692 cycles 1.00
ML-DSA-87 verify 461039 cycles 461666 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton3

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 73601 cycles 73597 cycles 1.00
ML-DSA-44 sign 227202 cycles 227363 cycles 1.00
ML-DSA-44 verify 77868 cycles 77849 cycles 1.00
ML-DSA-65 keypair 129546 cycles 129620 cycles 1.00
ML-DSA-65 sign 376762 cycles 376582 cycles 1.00
ML-DSA-65 verify 128837 cycles 128919 cycles 1.00
ML-DSA-87 keypair 208069 cycles 210350 cycles 0.99
ML-DSA-87 sign 473163 cycles 478003 cycles 0.99
ML-DSA-87 verify 208251 cycles 209857 cycles 0.99

@github-actions github-actions bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 115246 cycles 115084 cycles 1.00
ML-DSA-44 sign 377347 cycles 377665 cycles 1.00
ML-DSA-44 verify 120379 cycles 120228 cycles 1.00
ML-DSA-65 keypair 199258 cycles 199006 cycles 1.00
ML-DSA-65 sign 623777 cycles 623291 cycles 1.00
ML-DSA-65 verify 198446 cycles 198172 cycles 1.00
ML-DSA-87 keypair 325682 cycles 325932 cycles 1.00
ML-DSA-87 sign 791474 cycles 790353 cycles 1.00
ML-DSA-87 verify 324720 cycles 324766 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 135236 cycles 134995 cycles 1.00
ML-DSA-44 sign 539740 cycles 539506 cycles 1.00
ML-DSA-44 verify 148168 cycles 148059 cycles 1.00
ML-DSA-65 keypair 228423 cycles 228471 cycles 1.00
ML-DSA-65 sign 890762 cycles 889688 cycles 1.00
ML-DSA-65 verify 237701 cycles 237667 cycles 1.00
ML-DSA-87 keypair 373382 cycles 373582 cycles 1.00
ML-DSA-87 sign 1105223 cycles 1107392 cycles 1.00
ML-DSA-87 verify 386880 cycles 388949 cycles 0.99

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 213946 cycles 213730 cycles 1.00
ML-DSA-44 sign 782669 cycles 794994 cycles 0.98
ML-DSA-44 verify 230553 cycles 230234 cycles 1.00
ML-DSA-65 keypair 384562 cycles 385231 cycles 1.00
ML-DSA-65 sign 1310747 cycles 1308454 cycles 1.00
ML-DSA-65 verify 375796 cycles 376707 cycles 1.00
ML-DSA-87 keypair 606042 cycles 605704 cycles 1.00
ML-DSA-87 sign 1624599 cycles 1626066 cycles 1.00
ML-DSA-87 verify 617550 cycles 617430 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 138393 cycles 138394 cycles 1.00
ML-DSA-44 sign 492826 cycles 493551 cycles 1.00
ML-DSA-44 verify 148361 cycles 148435 cycles 1.00
ML-DSA-65 keypair 241673 cycles 241410 cycles 1.00
ML-DSA-65 sign 810010 cycles 809659 cycles 1.00
ML-DSA-65 verify 240696 cycles 240564 cycles 1.00
ML-DSA-87 keypair 395958 cycles 395737 cycles 1.00
ML-DSA-87 sign 1027981 cycles 1027183 cycles 1.00
ML-DSA-87 verify 401661 cycles 401356 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton4

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 69310 cycles 69375 cycles 1.00
ML-DSA-44 sign 214142 cycles 214397 cycles 1.00
ML-DSA-44 verify 72525 cycles 72504 cycles 1.00
ML-DSA-65 keypair 122865 cycles 122679 cycles 1.00
ML-DSA-65 sign 352133 cycles 351717 cycles 1.00
ML-DSA-65 verify 120466 cycles 120383 cycles 1.00
ML-DSA-87 keypair 200304 cycles 200151 cycles 1.00
ML-DSA-87 sign 450001 cycles 450026 cycles 1.00
ML-DSA-87 verify 198317 cycles 198035 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 213327 cycles 213074 cycles 1.00
ML-DSA-44 sign 781964 cycles 782035 cycles 1.00
ML-DSA-44 verify 230096 cycles 230296 cycles 1.00
ML-DSA-65 keypair 384110 cycles 384025 cycles 1.00
ML-DSA-65 sign 1327668 cycles 1313438 cycles 1.01
ML-DSA-65 verify 375414 cycles 375546 cycles 1.00
ML-DSA-87 keypair 605582 cycles 604762 cycles 1.00
ML-DSA-87 sign 1621817 cycles 1622493 cycles 1.00
ML-DSA-87 verify 617531 cycles 616911 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 132670 cycles 132783 cycles 1.00
ML-DSA-44 sign 498269 cycles 498211 cycles 1.00
ML-DSA-44 verify 144844 cycles 144888 cycles 1.00
ML-DSA-65 keypair 226562 cycles 226229 cycles 1.00
ML-DSA-65 sign 813000 cycles 813157 cycles 1.00
ML-DSA-65 verify 231474 cycles 231572 cycles 1.00
ML-DSA-87 keypair 374255 cycles 374486 cycles 1.00
ML-DSA-87 sign 1020753 cycles 1020871 cycles 1.00
ML-DSA-87 verify 383631 cycles 383482 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 227420 cycles 237862 cycles 0.96
ML-DSA-44 sign 667992 cycles 679813 cycles 0.98
ML-DSA-44 verify 229294 cycles 229937 cycles 1.00
ML-DSA-65 keypair 398857 cycles 406732 cycles 0.98
ML-DSA-65 sign 1111293 cycles 1133783 cycles 0.98
ML-DSA-65 verify 389185 cycles 391169 cycles 0.99
ML-DSA-87 keypair 674768 cycles 674039 cycles 1.00
ML-DSA-87 sign 1446254 cycles 1498997 cycles 0.96
ML-DSA-87 verify 647923 cycles 652978 cycles 0.99

@github-actions github-actions bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 328782 cycles 317134 cycles 1.04
ML-DSA-44 sign 1234287 cycles 1208558 cycles 1.02
ML-DSA-44 verify 345765 cycles 337331 cycles 1.03
ML-DSA-65 keypair 590312 cycles 552792 cycles 1.07
ML-DSA-65 sign 2069113 cycles 1955537 cycles 1.06
ML-DSA-65 verify 560071 cycles 530317 cycles 1.06
ML-DSA-87 keypair 884785 cycles 856997 cycles 1.03
ML-DSA-87 sign 2553004 cycles 2434994 cycles 1.05
ML-DSA-87 verify 894934 cycles 873765 cycles 1.02

@github-actions github-actions bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 328782 cycles 317134 cycles 1.04
ML-DSA-65 keypair 590312 cycles 552792 cycles 1.07
ML-DSA-65 sign 2069113 cycles 1955537 cycles 1.06
ML-DSA-65 verify 560071 cycles 530317 cycles 1.06
ML-DSA-87 keypair 884785 cycles 856997 cycles 1.03
ML-DSA-87 sign 2553004 cycles 2434994 cycles 1.05

This comment was automatically generated by workflow using github-action-benchmark.
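
For reference, the Ratio column is current cycles divided by previous cycles. The sketch below assumes the alert compares the unrounded ratio against the 1.03 threshold, which would be consistent with ML-DSA-44 verify (displayed as 1.03) not being listed above while ML-DSA-87 keypair (also displayed as 1.03) is. The helper names are hypothetical and are not part of github-action-benchmark.

```c
#include <assert.h>
#include <stdint.h>

/* Ratio as shown in the tables: current cycles / previous cycles. */
static double bench_ratio(uint64_t current, uint64_t previous)
{
  assert(previous != 0);
  return (double)current / (double)previous;
}

/* Assumption: an alert fires when the unrounded ratio exceeds the threshold. */
static int bench_is_regression(uint64_t current, uint64_t previous,
                               double threshold)
{
  return bench_ratio(current, previous) > threshold;
}
```

With the Raspberry Pi 4 (no-opt) numbers, `bench_is_regression(884785, 856997, 1.03)` is true (ratio about 1.032), while `bench_is_regression(345765, 337331, 1.03)` is false (ratio about 1.025).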

@github-actions github-actions bot left a comment

SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)

Benchmark suite Current: 5ed8c3d Previous: 94311a7 Ratio
ML-DSA-44 keypair 821633 cycles 822306 cycles 1.00
ML-DSA-44 sign 3331800 cycles 3331144 cycles 1.00
ML-DSA-44 verify 918861 cycles 920414 cycles 1.00
ML-DSA-65 keypair 1397769 cycles 1399344 cycles 1.00
ML-DSA-65 sign 5458191 cycles 5449582 cycles 1.00
ML-DSA-65 verify 1464992 cycles 1467066 cycles 1.00
ML-DSA-87 keypair 2300306 cycles 2298802 cycles 1.00
ML-DSA-87 sign 6805595 cycles 6800261 cycles 1.00
ML-DSA-87 verify 2400437 cycles 2400074 cycles 1.00

@mkannwischer mkannwischer force-pushed the serial-fips202 branch 3 times, most recently from d376e43 to dfa43d2 Compare October 27, 2025 05:24
@rod-chapman rod-chapman force-pushed the serial-fips202 branch 2 times, most recently from 29cd147 to e264496 Compare October 30, 2025 11:48
@rod-chapman
Contributor

Ready to review now.

@jakemas
Contributor

jakemas commented Nov 4, 2025

will take a look at CBMC proofs once past CI!

mkannwischer and others added 12 commits November 5, 2025 10:35
Currently, in three places in mldsa-native (mld_poly_uniform_4x,
mld_poly_uniform_eta_4x, mld_poly_uniform_gamma1_4x), we make use of 4-way
batched Keccak.
That approach requires keeping four Keccak states in memory.
It is incompatible with a Keccak accelerator that has only a single state,
which lives inside the accelerator.

This commit adds an option MLD_CONFIG_SERIAL_FIPS202_ONLY that (once fully
implemented) switches to serial processing for the three functions above.
The functions themselves are not modified in this commit; that happens in
subsequent commits.

Signed-off-by: Matthias J. Kannwischer <[email protected]>
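
To illustrate the difference between the two shapes, here is a minimal sketch. It assumes a toy XOF in place of Keccak, and every `toy_` name is hypothetical; only the macro name MLD_CONFIG_SERIAL_FIPS202_ONLY comes from this PR. The batched variant keeps four states alive at once, while the serial variant reuses a single state and should produce the same output.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Define this to model a build for a single-state Keccak accelerator. */
/* #define MLD_CONFIG_SERIAL_FIPS202_ONLY */

#define TOY_LANES 4  /* number of streams sampled together */
#define TOY_OUTLEN 8 /* bytes squeezed per stream */

/* Toy stand-in for an XOF state. NOT Keccak; for illustration only. */
typedef struct
{
  uint64_t s;
} toy_xof;

static void toy_absorb(toy_xof *x, uint8_t nonce)
{
  x->s = UINT64_C(0x9E3779B97F4A7C15) ^ nonce;
}

static uint8_t toy_squeeze(toy_xof *x)
{
  x->s = x->s * UINT64_C(6364136223846793005) + UINT64_C(1442695040888963407);
  return (uint8_t)(x->s >> 56);
}

/* Batched shape: all four states are alive at the same time. */
static void toy_sample_batched(uint8_t out[TOY_LANES][TOY_OUTLEN],
                               const uint8_t nonce[TOY_LANES])
{
  toy_xof st[TOY_LANES];
  int i, j;
  for (i = 0; i < TOY_LANES; i++) toy_absorb(&st[i], nonce[i]);
  for (j = 0; j < TOY_OUTLEN; j++)
    for (i = 0; i < TOY_LANES; i++) out[i][j] = toy_squeeze(&st[i]);
}

/* Serial shape: a single state, reused for one stream after another. */
static void toy_sample_serial(uint8_t out[TOY_LANES][TOY_OUTLEN],
                              const uint8_t nonce[TOY_LANES])
{
  toy_xof st;
  int i, j;
  for (i = 0; i < TOY_LANES; i++)
  {
    toy_absorb(&st, nonce[i]);
    for (j = 0; j < TOY_OUTLEN; j++) out[i][j] = toy_squeeze(&st);
  }
}

/* Compile-time selection, in the spirit of the new option. */
static void toy_sample(uint8_t out[TOY_LANES][TOY_OUTLEN],
                       const uint8_t nonce[TOY_LANES])
{
#if defined(MLD_CONFIG_SERIAL_FIPS202_ONLY)
  toy_sample_serial(out, nonce);
#else
  toy_sample_batched(out, nonce);
#endif
}

/* Returns 1 if both shapes produce identical output for these nonces. */
static int toy_shapes_agree(const uint8_t nonce[TOY_LANES])
{
  uint8_t a[TOY_LANES][TOY_OUTLEN], b[TOY_LANES][TOY_OUTLEN];
  toy_sample_batched(a, nonce);
  toy_sample_serial(b, nonce);
  return memcmp(a, b, sizeof a) == 0;
}
```

The point of the sketch is only the memory shape: `toy_sample_batched` needs `TOY_LANES` states on the stack, whereas `toy_sample_serial` needs one.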
This commit changes mld_polyvecl_uniform_gamma1 to use only
mld_poly_uniform_gamma1 when MLD_CONFIG_SERIAL_FIPS202_ONLY is set.

An additional CBMC proof is added that covers the function with the
option enabled.

Signed-off-by: Matthias J. Kannwischer <[email protected]>
This commit changes mld_sample_s1_s2 to use only poly_uniform_eta when
MLD_CONFIG_SERIAL_FIPS202_ONLY is set.

poly_uniform_eta was previously removed because it is not needed when
batching is enabled. It is re-added here.

A new CBMC proof for poly_uniform_eta is added.
An additional CBMC proof for sample_s1_s2 with MLD_CONFIG_SERIAL_FIPS202_ONLY
set is added.

Signed-off-by: Matthias J. Kannwischer <[email protected]>
This commit changes mld_polyvec_matrix_expand to use only poly_uniform
in case MLD_CONFIG_SERIAL_FIPS202_ONLY is set.

A CBMC proof is added.
This change closely follows
pq-code-package/mlkem-native@c670e1d

Signed-off-by: Matthias J. Kannwischer <[email protected]>
If MLD_CONFIG_SERIAL_FIPS202_ONLY is set, we don't make use of any fips202x4
functions.
This commit makes the include conditional, such that consumers setting
MLD_CONFIG_SERIAL_FIPS202_ONLY don't have to provide an (empty) fips202x4.h.

Signed-off-by: Matthias J. Kannwischer <[email protected]>
This commit adds a minimal example of how to use mldsa-native
with external FIPS202 HW/SW implementations that use a single
global state (for example, some hardware accelerators).
Specifically, the example demonstrates the use of the serial-only
FIPS202 configuration `MLD_CONFIG_SERIAL_FIPS202_ONLY`.

Port of
pq-code-package/mlkem-native#1237

Signed-off-by: Matthias J. Kannwischer <[email protected]>
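
As a rough picture of what "a single global state" means for a caller, here is a toy model. It is not the example added by this commit and not a real accelerator API: all `toy_hw_` names are hypothetical and the mixing function is not SHAKE. It only shows that a second computation cannot start until the first one releases the state, which is why the 4-way batched code paths cannot be used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of an accelerator: exactly one sponge state, held globally.
 * The mixing below is NOT SHAKE; it only makes the data flow visible. */
static uint64_t toy_hw_state;
static int toy_hw_busy;

/* Returns 0 on success, -1 if another computation is still in flight. */
static int toy_hw_init(void)
{
  if (toy_hw_busy) return -1;
  toy_hw_busy = 1;
  toy_hw_state = UINT64_C(0x6A09E667F3BCC908);
  return 0;
}

/* Mixes input bytes into the single global state. */
static void toy_hw_absorb(const uint8_t *in, size_t len)
{
  size_t i;
  assert(toy_hw_busy);
  for (i = 0; i < len; i++)
    toy_hw_state = (toy_hw_state ^ in[i]) * UINT64_C(0x100000001B3);
}

/* Produces output bytes from the single global state. */
static void toy_hw_squeeze(uint8_t *out, size_t len)
{
  size_t i;
  assert(toy_hw_busy);
  for (i = 0; i < len; i++)
  {
    toy_hw_state = toy_hw_state * UINT64_C(6364136223846793005) + 1;
    out[i] = (uint8_t)(toy_hw_state >> 56);
  }
}

/* Clears the state and makes the accelerator available again. */
static void toy_hw_release(void)
{
  toy_hw_state = 0;
  toy_hw_busy = 0;
}
```

In this model, a caller runs init, absorb, squeeze, and release strictly in sequence for each stream; a nested `toy_hw_init()` fails with -1.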
1. Weaken post-condition and loop invariant in polyvecl_add(). The stronger
   post-condition was unnecessary.

2. Simplify polyvec_matrix_expand(). Small performance loss here since
   batched_seeds[] is (re-)initialized every time. This is a bit slower
   but removes a loop statement entirely.

3. Refactor polyvec_pointwise_acc_montgomery() by splitting core
   "sum of products" calculation into a distinct local function
   mld_pointwise_sum_of_products(). Add proof of the latter.

Proof time for parameter set 87 is now 4 minutes (real time) and
40 minutes (user time) with 64 cores on an r7g instance.

Signed-off-by: Rod Chapman <[email protected]>
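
The refactoring in item 3 can be pictured roughly as follows. This is a sketch under assumptions, not the code from the commit: the `toy_` names and signatures are hypothetical, and it assumes the products are accumulated in 64 bits and Montgomery-reduced once per coefficient. The constants q = 8380417 and q^(-1) mod 2^32 = 58728449 are the usual ML-DSA values.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_N 256
#define TOY_Q 8380417     /* ML-DSA modulus q = 2^23 - 2^13 + 1 */
#define TOY_QINV 58728449 /* q^(-1) mod 2^32 */

typedef struct
{
  int32_t coeffs[TOY_N];
} toy_poly;

/* Returns r with r * 2^32 == a (mod q), for |a| < 2^31 * q.
 * Relies on two's-complement narrowing and arithmetic right shift,
 * which common C implementations provide. */
static int32_t toy_montgomery_reduce(int64_t a)
{
  int32_t t = (int32_t)((uint32_t)a * (uint32_t)TOY_QINV);
  return (int32_t)((a - (int64_t)t * TOY_Q) >> 32);
}

/* Core "sum of products" for one coefficient index across the vector.
 * Keeping this separate gives the prover a small function to reason about. */
static int64_t toy_sum_of_products(const toy_poly *u, const toy_poly *v,
                                   unsigned len, unsigned idx)
{
  int64_t acc = 0;
  unsigned j;
  assert(idx < TOY_N);
  for (j = 0; j < len; j++)
    acc += (int64_t)u[j].coeffs[idx] * v[j].coeffs[idx];
  return acc;
}

/* Outer loop: one reduction per coefficient of the result. */
static void toy_pointwise_acc(toy_poly *w, const toy_poly *u,
                              const toy_poly *v, unsigned len)
{
  unsigned i;
  for (i = 0; i < TOY_N; i++)
    w->coeffs[i] = toy_montgomery_reduce(toy_sum_of_products(u, v, len, i));
}
```

Splitting the inner accumulation out means the outer function's loop invariant only has to talk about one call per coefficient.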
1. Weaken the contract of mld_polyveck_add(). This is sufficient
   to prove the caller in mld_attempt_signature_generation().

2. Introduce a new mld_polyveck_add_error() with stronger contracts
   where the second argument is an error term where all coefficients
   are bounded in absolute value by MLDSA_ETA.  The post-condition
   of this function is just right to prove the calling code
   in crypto_sign_keypair_internal() which relies on these
   stronger bounds.

3. Add proof artefacts as appropriate.

Signed-off-by: Rod Chapman <[email protected]>
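
The idea behind the stronger contract in item 2 can be sketched with plain assertions standing in for the CBMC contracts. Everything here is hypothetical (the `toy_` names, the single-polynomial shape, and the caller-supplied bound on the first argument); the only facts taken as given are that the error term is bounded by eta in absolute value, and that eta is 2 for ML-DSA-44 and ML-DSA-87 and 4 for ML-DSA-65.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_N 256
#define TOY_ETA 2 /* eta for ML-DSA-44 and ML-DSA-87; ML-DSA-65 uses 4 */

typedef struct
{
  int32_t coeffs[TOY_N];
} toy_poly;

/* Returns 1 if every coefficient c satisfies -bound <= c <= bound. */
static int toy_abs_bounded(const toy_poly *p, int32_t bound)
{
  int i;
  for (i = 0; i < TOY_N; i++)
    if (p->coeffs[i] < -bound || p->coeffs[i] > bound) return 0;
  return 1;
}

/* u += e, where e is an error term. The assertions model the contract:
 * a tight bound on e yields a tight bound on the sum. */
static void toy_poly_add_error(toy_poly *u, const toy_poly *e, int32_t u_bound)
{
  int i;
  assert(toy_abs_bounded(u, u_bound)); /* precondition on u */
  assert(toy_abs_bounded(e, TOY_ETA)); /* precondition: e is an error term */
  for (i = 0; i < TOY_N; i++) u->coeffs[i] += e->coeffs[i];
  assert(toy_abs_bounded(u, u_bound + TOY_ETA)); /* postcondition */
}
```

A general-purpose add cannot promise the `u_bound + TOY_ETA` post-condition without knowing that its second argument is small, which is the motivation for a separate function.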
@mkannwischer
Contributor Author

will take a look at CBMC proofs once past CI!

Let's first get the CBMC fixes merged in #611.

Development

Successfully merging this pull request may close these issues.

Port MLK_CONFIG_SERIAL_FIPS202_ONLY configuration

5 participants