@jammychiou1
Contributor

@jammychiou1 jammychiou1 changed the title "Add bounds reasoning comments to AVX2 ntt/intt" to "Add bounds reasoning comments to AVX2 backend" Oct 27, 2025
@jammychiou1 jammychiou1 force-pushed the avx2-bound-comments branch 2 times, most recently from ca27600 to da40590 on November 2, 2025 10:51
@github-actions github-actions bot left a comment

Mac Mini (M1, 2020) benchmarks (opt)

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 46245 cycles 46243 cycles 1.00
ML-DSA-44 sign 132702 cycles 132735 cycles 1.00
ML-DSA-44 verify 47876 cycles 47881 cycles 1.00
ML-DSA-65 keypair 81159 cycles 81166 cycles 1.00
ML-DSA-65 sign 219247 cycles 219290 cycles 1.00
ML-DSA-65 verify 80130 cycles 80129 cycles 1.00
ML-DSA-87 keypair 132357 cycles 132350 cycles 1.00
ML-DSA-87 sign 280984 cycles 280937 cycles 1.00
ML-DSA-87 verify 130424 cycles 130406 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Mac Mini (M1, 2020) benchmarks (no-opt)

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 115064 cycles 115058 cycles 1.00
ML-DSA-44 sign 431635 cycles 431665 cycles 1.00
ML-DSA-44 verify 122206 cycles 122172 cycles 1.00
ML-DSA-65 keypair 197112 cycles 197083 cycles 1.00
ML-DSA-65 sign 701011 cycles 701034 cycles 1.00
ML-DSA-65 verify 197688 cycles 197688 cycles 1.00
ML-DSA-87 keypair 325227 cycles 325219 cycles 1.00
ML-DSA-87 sign 884685 cycles 884692 cycles 1.00
ML-DSA-87 verify 328848 cycles 328850 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 115004 cycles 114982 cycles 1.00
ML-DSA-44 sign 377271 cycles 377314 cycles 1.00
ML-DSA-44 verify 120313 cycles 120175 cycles 1.00
ML-DSA-65 keypair 199250 cycles 199171 cycles 1.00
ML-DSA-65 sign 622635 cycles 622821 cycles 1.00
ML-DSA-65 verify 198187 cycles 198196 cycles 1.00
ML-DSA-87 keypair 326349 cycles 325598 cycles 1.00
ML-DSA-87 sign 790980 cycles 790006 cycles 1.00
ML-DSA-87 verify 325253 cycles 324398 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 34962 cycles 35293 cycles 0.99
ML-DSA-44 sign 121049 cycles 120017 cycles 1.01
ML-DSA-44 verify 38221 cycles 38192 cycles 1.00
ML-DSA-65 keypair 61665 cycles 61551 cycles 1.00
ML-DSA-65 sign 199507 cycles 199134 cycles 1.00
ML-DSA-65 verify 62246 cycles 62382 cycles 1.00
ML-DSA-87 keypair 95126 cycles 93721 cycles 1.01
ML-DSA-87 sign 235196 cycles 229923 cycles 1.02
ML-DSA-87 verify 93916 cycles 94024 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 95254 cycles 94967 cycles 1.00
ML-DSA-44 sign 349037 cycles 348918 cycles 1.00
ML-DSA-44 verify 100789 cycles 100735 cycles 1.00
ML-DSA-65 keypair 164263 cycles 164543 cycles 1.00
ML-DSA-65 sign 567230 cycles 567551 cycles 1.00
ML-DSA-65 verify 165474 cycles 165398 cycles 1.00
ML-DSA-87 keypair 266932 cycles 267562 cycles 1.00
ML-DSA-87 sign 722097 cycles 722682 cycles 1.00
ML-DSA-87 verify 272046 cycles 271670 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 213333 cycles 213108 cycles 1.00
ML-DSA-44 sign 782029 cycles 782015 cycles 1.00
ML-DSA-44 verify 230092 cycles 230320 cycles 1.00
ML-DSA-65 keypair 384054 cycles 383982 cycles 1.00
ML-DSA-65 sign 1326768 cycles 1313471 cycles 1.01
ML-DSA-65 verify 375377 cycles 375490 cycles 1.00
ML-DSA-87 keypair 605449 cycles 605206 cycles 1.00
ML-DSA-87 sign 1621496 cycles 1622880 cycles 1.00
ML-DSA-87 verify 617407 cycles 617415 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 229297 cycles 225291 cycles 1.02
ML-DSA-44 sign 680601 cycles 674064 cycles 1.01
ML-DSA-44 verify 230038 cycles 228253 cycles 1.01
ML-DSA-65 keypair 392762 cycles 399283 cycles 0.98
ML-DSA-65 sign 1120149 cycles 1102258 cycles 1.02
ML-DSA-65 verify 383658 cycles 383981 cycles 1.00
ML-DSA-87 keypair 663306 cycles 645046 cycles 1.03
ML-DSA-87 sign 1465499 cycles 1407349 cycles 1.04
ML-DSA-87 verify 649376 cycles 625913 cycles 1.04

@oqs-bot oqs-bot left a comment

Graviton4

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 69406 cycles 69297 cycles 1.00
ML-DSA-44 sign 215066 cycles 215394 cycles 1.00
ML-DSA-44 verify 72872 cycles 72803 cycles 1.00
ML-DSA-65 keypair 123156 cycles 123048 cycles 1.00
ML-DSA-65 sign 353712 cycles 354049 cycles 1.00
ML-DSA-65 verify 120786 cycles 120878 cycles 1.00
ML-DSA-87 keypair 201134 cycles 200545 cycles 1.00
ML-DSA-87 sign 451336 cycles 451914 cycles 1.00
ML-DSA-87 verify 198201 cycles 198487 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 69683 cycles 69592 cycles 1.00
ML-DSA-44 sign 184860 cycles 185506 cycles 1.00
ML-DSA-44 verify 69154 cycles 69226 cycles 1.00
ML-DSA-65 keypair 119441 cycles 120798 cycles 0.99
ML-DSA-65 sign 295459 cycles 298349 cycles 0.99
ML-DSA-65 verify 115470 cycles 116590 cycles 0.99
ML-DSA-87 keypair 201494 cycles 201311 cycles 1.00
ML-DSA-87 sign 385746 cycles 386042 cycles 1.00
ML-DSA-87 verify 193805 cycles 193658 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 57175 cycles 56967 cycles 1.00
ML-DSA-44 sign 180395 cycles 180312 cycles 1.00
ML-DSA-44 verify 61138 cycles 61223 cycles 1.00
ML-DSA-65 keypair 99517 cycles 99149 cycles 1.00
ML-DSA-65 sign 295948 cycles 296299 cycles 1.00
ML-DSA-65 verify 100305 cycles 100113 cycles 1.00
ML-DSA-87 keypair 153336 cycles 153114 cycles 1.00
ML-DSA-87 sign 352913 cycles 353081 cycles 1.00
ML-DSA-87 verify 152140 cycles 151972 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton2

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 115166 cycles 115181 cycles 1.00
ML-DSA-44 sign 377510 cycles 377683 cycles 1.00
ML-DSA-44 verify 120392 cycles 120343 cycles 1.00
ML-DSA-65 keypair 199370 cycles 199283 cycles 1.00
ML-DSA-65 sign 623245 cycles 623012 cycles 1.00
ML-DSA-65 verify 198354 cycles 198353 cycles 1.00
ML-DSA-87 keypair 326701 cycles 326259 cycles 1.00
ML-DSA-87 sign 791828 cycles 790809 cycles 1.00
ML-DSA-87 verify 325412 cycles 324916 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton3

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 73849 cycles 73827 cycles 1.00
ML-DSA-44 sign 228459 cycles 228653 cycles 1.00
ML-DSA-44 verify 78255 cycles 78142 cycles 1.00
ML-DSA-65 keypair 129816 cycles 129734 cycles 1.00
ML-DSA-65 sign 378404 cycles 378349 cycles 1.00
ML-DSA-65 verify 129311 cycles 129160 cycles 1.00
ML-DSA-87 keypair 208622 cycles 210617 cycles 0.99
ML-DSA-87 sign 479146 cycles 479575 cycles 1.00
ML-DSA-87 verify 208503 cycles 210205 cycles 0.99

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 132648 cycles 132784 cycles 1.00
ML-DSA-44 sign 498274 cycles 498151 cycles 1.00
ML-DSA-44 verify 144830 cycles 144894 cycles 1.00
ML-DSA-65 keypair 226547 cycles 226211 cycles 1.00
ML-DSA-65 sign 813539 cycles 812397 cycles 1.00
ML-DSA-65 verify 231077 cycles 231596 cycles 1.00
ML-DSA-87 keypair 374183 cycles 374501 cycles 1.00
ML-DSA-87 sign 1020719 cycles 1020931 cycles 1.00
ML-DSA-87 verify 383566 cycles 383519 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 135440 cycles 134965 cycles 1.00
ML-DSA-44 sign 539993 cycles 540181 cycles 1.00
ML-DSA-44 verify 148184 cycles 148325 cycles 1.00
ML-DSA-65 keypair 228063 cycles 227974 cycles 1.00
ML-DSA-65 sign 889220 cycles 893643 cycles 1.00
ML-DSA-65 verify 237929 cycles 237974 cycles 1.00
ML-DSA-87 keypair 372971 cycles 372801 cycles 1.00
ML-DSA-87 sign 1106723 cycles 1105358 cycles 1.00
ML-DSA-87 verify 386855 cycles 387678 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 157988 cycles 157813 cycles 1.00
ML-DSA-44 sign 565277 cycles 564791 cycles 1.00
ML-DSA-44 verify 169766 cycles 169751 cycles 1.00
ML-DSA-65 keypair 270290 cycles 270498 cycles 1.00
ML-DSA-65 sign 925971 cycles 926242 cycles 1.00
ML-DSA-65 verify 276263 cycles 275901 cycles 1.00
ML-DSA-87 keypair 452768 cycles 453347 cycles 1.00
ML-DSA-87 sign 1188324 cycles 1186402 cycles 1.00
ML-DSA-87 verify 461505 cycles 461198 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 42686 cycles 42979 cycles 0.99
ML-DSA-44 sign 130695 cycles 130913 cycles 1.00
ML-DSA-44 verify 44216 cycles 44413 cycles 1.00
ML-DSA-65 keypair 72672 cycles 72353 cycles 1.00
ML-DSA-65 sign 210616 cycles 212806 cycles 0.99
ML-DSA-65 verify 73367 cycles 72915 cycles 1.01
ML-DSA-87 keypair 109463 cycles 109750 cycles 1.00
ML-DSA-87 sign 249730 cycles 248355 cycles 1.01
ML-DSA-87 verify 110219 cycles 111517 cycles 0.99

@github-actions github-actions bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 288553 cycles 288273 cycles 1.00
ML-DSA-44 sign 927585 cycles 936164 cycles 0.99
ML-DSA-44 verify 295250 cycles 292179 cycles 1.01
ML-DSA-65 keypair 487983 cycles 488081 cycles 1.00
ML-DSA-65 sign 1530864 cycles 1529760 cycles 1.00
ML-DSA-65 verify 482983 cycles 475632 cycles 1.02
ML-DSA-87 keypair 831824 cycles 841156 cycles 0.99
ML-DSA-87 sign 2087730 cycles 2121320 cycles 0.98
ML-DSA-87 verify 815092 cycles 826714 cycles 0.99

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 138369 cycles 138318 cycles 1.00
ML-DSA-44 sign 492999 cycles 493648 cycles 1.00
ML-DSA-44 verify 148347 cycles 148347 cycles 1.00
ML-DSA-65 keypair 241732 cycles 241461 cycles 1.00
ML-DSA-65 sign 809768 cycles 809767 cycles 1.00
ML-DSA-65 verify 240679 cycles 240584 cycles 1.00
ML-DSA-87 keypair 395821 cycles 395729 cycles 1.00
ML-DSA-87 sign 1027366 cycles 1027294 cycles 1.00
ML-DSA-87 verify 401516 cycles 401299 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 213829 cycles 213522 cycles 1.00
ML-DSA-44 sign 782940 cycles 794930 cycles 0.98
ML-DSA-44 verify 230437 cycles 230022 cycles 1.00
ML-DSA-65 keypair 384543 cycles 384988 cycles 1.00
ML-DSA-65 sign 1310640 cycles 1307299 cycles 1.00
ML-DSA-65 verify 375817 cycles 376399 cycles 1.00
ML-DSA-87 keypair 606688 cycles 605922 cycles 1.00
ML-DSA-87 sign 1626018 cycles 1626375 cycles 1.00
ML-DSA-87 verify 618278 cycles 617623 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 120170 cycles 120288 cycles 1.00
ML-DSA-44 sign 454518 cycles 453859 cycles 1.00
ML-DSA-44 verify 130051 cycles 130271 cycles 1.00
ML-DSA-65 keypair 204928 cycles 205208 cycles 1.00
ML-DSA-65 sign 736614 cycles 736112 cycles 1.00
ML-DSA-65 verify 209739 cycles 209715 cycles 1.00
ML-DSA-87 keypair 337171 cycles 337055 cycles 1.00
ML-DSA-87 sign 928686 cycles 926156 cycles 1.00
ML-DSA-87 verify 345798 cycles 345516 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 317365 cycles 318223 cycles 1.00
ML-DSA-44 sign 1224779 cycles 1227916 cycles 1.00
ML-DSA-44 verify 344907 cycles 339753 cycles 1.02
ML-DSA-65 keypair 552815 cycles 559842 cycles 0.99
ML-DSA-65 sign 1909394 cycles 1942017 cycles 0.98
ML-DSA-65 verify 519791 cycles 529374 cycles 0.98
ML-DSA-87 keypair 872331 cycles 863765 cycles 1.01
ML-DSA-87 sign 2478204 cycles 2444597 cycles 1.01
ML-DSA-87 verify 883699 cycles 863122 cycles 1.02

@github-actions github-actions bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 466124 cycles 464693 cycles 1.00
ML-DSA-44 sign 2213126 cycles 2221355 cycles 1.00
ML-DSA-44 verify 550344 cycles 545927 cycles 1.01
ML-DSA-65 keypair 777048 cycles 776812 cycles 1.00
ML-DSA-65 sign 3645043 cycles 3636149 cycles 1.00
ML-DSA-65 verify 849253 cycles 849524 cycles 1.00
ML-DSA-87 keypair 1254730 cycles 1270549 cycles 0.99
ML-DSA-87 sign 4489453 cycles 4519665 cycles 0.99
ML-DSA-87 verify 1370350 cycles 1380875 cycles 0.99

@github-actions github-actions bot left a comment

SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-44 keypair 822519 cycles 821707 cycles 1.00
ML-DSA-44 sign 3332513 cycles 3334149 cycles 1.00
ML-DSA-44 verify 919075 cycles 918864 cycles 1.00
ML-DSA-65 keypair 1397964 cycles 1396572 cycles 1.00
ML-DSA-65 sign 5447376 cycles 5443674 cycles 1.00
ML-DSA-65 verify 1465026 cycles 1464453 cycles 1.00
ML-DSA-87 keypair 2301172 cycles 2300947 cycles 1.00
ML-DSA-87 sign 6820350 cycles 6813578 cycles 1.00
ML-DSA-87 verify 2402933 cycles 2397029 cycles 1.00

@github-actions github-actions bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
The benchmark result of this commit is worse than the previous benchmark result, exceeding the threshold of 1.03.

Benchmark suite Current: 6ca67e1 Previous: ee7e1bf Ratio
ML-DSA-87 sign 1465499 cycles 1407349 cycles 1.04
ML-DSA-87 verify 649376 cycles 625913 cycles 1.04
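For reference, a minimal sketch of the comparison behind this alert. It assumes (the bot output does not state this) that github-action-benchmark computes the ratio as current / previous for a smaller-is-better metric such as cycles, and flags a row when the unrounded ratio is strictly greater than the 1.03 threshold. The names `ratio` and `exceeds_alert` are hypothetical and are not part of the action's API.

```python
# Hypothetical helpers modelling the alert rule assumed above.

def ratio(current: int, previous: int) -> float:
    """Cycles are smaller-is-better, so a ratio above 1 means a slowdown."""
    return current / previous


def exceeds_alert(current: int, previous: int, threshold: float = 1.03) -> bool:
    """Assumed rule: alert only when the unrounded ratio exceeds the threshold."""
    return ratio(current, previous) > threshold


# Rows taken from the Arm Cortex-A72 (Raspberry Pi 4) (opt) results.
rows = {
    "ML-DSA-87 keypair": (663306, 645046),
    "ML-DSA-87 sign": (1465499, 1407349),
    "ML-DSA-87 verify": (649376, 625913),
}

for name, (cur, prev) in rows.items():
    print(f"{name}: ratio={ratio(cur, prev):.4f} alert={exceeds_alert(cur, prev)}")
```

Under these assumptions, ML-DSA-87 keypair (displayed as 1.03 in the full Cortex-A72 table) is about 1.028 before rounding and is not flagged, while sign (about 1.041) and verify (about 1.037) are.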

@jammychiou1 jammychiou1 force-pushed the avx2-bound-comments branch 2 times, most recently from 06248ba to 8bf430b on November 3, 2025 01:42
@jammychiou1 jammychiou1 marked this pull request as ready for review November 3, 2025 01:45
@jammychiou1 jammychiou1 requested a review from a team as a code owner November 3, 2025 01:45
The new approach is adapted from our Neon implementation. See #411 (comment) for more information on the idea.

Bounds reasoning comments are also added.

Signed-off-by: jammychiou1 <[email protected]>
Edit some comments while we're at it.

Signed-off-by: jammychiou1 <[email protected]>
Contributor

@mkannwischer mkannwischer left a comment

Thanks @jammychiou1. I checked the bounds tracking for the NTT, iNTT, and basemul, and everything makes sense to me. Please include some reasoning for the 3q/4 bounds.
It would also be great if you can extend https://github.com/pq-code-package/mlkem-native/blob/main/test/test_bounds.py to demonstrate the 3q/4 bound.

Let's move the decompose changes to a separate follow-up PR, please.
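A minimal sketch of the kind of check requested above, written as a standalone Python script rather than against the actual contents of test_bounds.py, which are not shown here. It assumes the ML-DSA parameters q = 8380417 and R = 2^32, models signed Montgomery multiplication as (a*b - t*q) / 2^32 with t the signed low 32 bits of a*b*q^-1, and assumes the constant operand (a twiddle factor) is stored with |b| <= (q-1)/2. The helper names are hypothetical.

```python
import random

Q = 8380417                      # ML-DSA modulus (assumed)
QINV = pow(Q, -1, 1 << 32)       # q^{-1} mod 2^32


def signed_low32(x: int) -> int:
    """Return x mod 2^32 as a signed value in [-2^31, 2^31)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def montmul(a: int, b: int) -> int:
    """Signed Montgomery multiplication with R = 2^32.

    t = a*b*q^{-1} (mod 2^32) makes a*b - t*q divisible by 2^32,
    so the result is congruent to a*b*2^{-32} modulo q.
    """
    t = signed_low32(a * b * QINV)
    num = a * b - t * Q
    assert num % (1 << 32) == 0
    return num >> 32             # exact division, since num is a multiple of 2^32


def check_montmul_by_constant(trials: int = 100_000, seed: int = 0) -> int:
    """Check |montmul(a, b)| < 3q/4 for int32 a and |b| <= (q-1)/2.

    Returns the largest |result| seen, so the margin can be inspected.
    """
    rng = random.Random(seed)
    half = (Q - 1) // 2
    # Extremal operands first, then random ones.
    edges_a = [-(1 << 31), (1 << 31) - 1, -1, 0, 1]
    edges_b = [-half, half, -1, 0, 1]
    pairs = [(a, b) for a in edges_a for b in edges_b]
    pairs += [(rng.randrange(-(1 << 31), 1 << 31), rng.randint(-half, half))
              for _ in range(trials)]
    worst = 0
    for a, b in pairs:
        r = montmul(a, b)
        assert 4 * abs(r) < 3 * Q, (a, b, r)       # |r| < 3q/4
        assert (r * (1 << 32) - a * b) % Q == 0    # r == a*b*2^-32 (mod q)
        worst = max(worst, abs(r))
    return worst


if __name__ == "__main__":
    print("max |montmul(a, b)| observed:", check_montmul_by_constant())
    print("3q/4 =", 3 * Q / 4)
```

The restriction to a bounded constant is what makes the bound hold: with two arbitrary 32-bit operands, `montmul(-(1 << 31), -(1 << 31))` in this model returns 2^30, far above 3q/4.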

Comment on lines +46 to +50
/*
* Compute l + h, montmul(h - l, zh) then store the results back to l, h
* respectively.
*
* The general abs bound of Montgomery multiplication is 3q/4.
Contributor

As discussed offline: please include the reasoning for the 3q/4 bound. You actually mean Montgomery multiplication by a constant, not general Montgomery multiplication.
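A sketch of the reasoning requested here, under assumptions the PR excerpt does not state: q = 8380417, R = 2^32, signed Montgomery multiplication r = (ab - tq)/2^32 where t is the signed representative of ab·q^{-1} mod 2^32, an arbitrary 32-bit operand a, and a constant b (for example the twiddle zh) stored with |b| <= (q-1)/2.

```math
t \equiv a\,b\,q^{-1} \pmod{2^{32}}, \quad t \in [-2^{31}, 2^{31}), \qquad r = \frac{ab - tq}{2^{32}}
```

```math
|r| \le \frac{|a|\,|b|}{2^{32}} + \frac{|t|\,q}{2^{32}}
    \le \frac{2^{31}\cdot\frac{q-1}{2}}{2^{32}} + \frac{2^{31}\,q}{2^{32}}
    = \frac{q-1}{4} + \frac{q}{2} < \frac{3q}{4}
```

The first term is small only because |b| is bounded by q/2. For general Montgomery multiplication with two arbitrary 32-bit operands it can reach 2^{62}/2^{32} = 2^{30}, so no 3q/4 bound holds there.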

Comment on lines +51 to +52
* Although the general abs bound of Montgomery multiplication is 3q/4, we use
* the more convenient bound q here.
Contributor

Please rephrase this, and reference intt.S for the 3q/4 bound.
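To illustrate the two bounds discussed in this thread, a minimal Python model of the quoted step (l, h) <- (l + h, montmul(h - l, zh)), which looks like a Gentleman-Sande (inverse-NTT) butterfly. It assumes signed Montgomery multiplication with q = 8380417 and R = 2^32, a twiddle zh with |zh| <= (q-1)/2, and inputs with |l|, |h| < B for some B <= 2^30 so that h - l fits in 32 bits. `gs_butterfly` and `check_butterfly_bounds` are hypothetical names, not functions from the AVX2 backend.

```python
import random
from typing import Tuple

Q = 8380417                      # ML-DSA modulus (assumed)
QINV = pow(Q, -1, 1 << 32)       # q^{-1} mod 2^32


def montmul(a: int, b: int) -> int:
    """Signed Montgomery multiplication with R = 2^32: returns (a*b - t*q) / 2^32."""
    t = (a * b * QINV) & 0xFFFFFFFF
    if t >= 1 << 31:             # take the signed representative of t
        t -= 1 << 32
    return (a * b - t * Q) >> 32  # exact: a*b - t*q is a multiple of 2^32


def gs_butterfly(l: int, h: int, zh: int) -> Tuple[int, int]:
    """Model of the quoted step: l <- l + h, h <- montmul(h - l, zh)."""
    return l + h, montmul(h - l, zh)


def check_butterfly_bounds(B: int, zh: int, trials: int = 10_000, seed: int = 1) -> bool:
    """For |l|, |h| < B, check |l'| < 2B, |h'| < 3q/4 and the looser |h'| < q."""
    assert 2 * B <= 1 << 31 and 2 * abs(zh) <= Q - 1
    rng = random.Random(seed)
    for _ in range(trials):
        l = rng.randint(-(B - 1), B - 1)
        h = rng.randint(-(B - 1), B - 1)
        new_l, new_h = gs_butterfly(l, h, zh)
        assert abs(new_l) < 2 * B        # the bound on the sum doubles
        assert 4 * abs(new_h) < 3 * Q    # montmul by a constant: below 3q/4
        assert abs(new_h) < Q            # the looser "convenient" bound q
    return True
```

In this model the difference h - l is not reduced before the multiplication, yet the montmul output bound does not depend on B, which is why only the l + h half of the butterfly grows.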

@jammychiou1 jammychiou1 marked this pull request as draft November 5, 2025 08:28

Development

Successfully merging this pull request may close these issues.

Add bounds reasoning comments to AVX2 backend

4 participants