Optimized AVX2 batched Keccak_x4 #1521
Signed-off-by: manastasova <manastasova2017@fau.edu>
oqs-bot
left a comment
Mac Mini (M1, 2020) benchmarks
| Benchmark suite | Current: 1647e99 | Previous: ebed0f3 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 12327 cycles | 12327 cycles | 1 |
| ML-KEM-512 encaps | 15031 cycles | 15030 cycles | 1.00 |
| ML-KEM-512 decaps | 19607 cycles | 19609 cycles | 1.00 |
| ML-KEM-768 keypair | 21092 cycles | 21092 cycles | 1 |
| ML-KEM-768 encaps | 23863 cycles | 23861 cycles | 1.00 |
| ML-KEM-768 decaps | 30443 cycles | 30442 cycles | 1.00 |
| ML-KEM-1024 keypair | 30376 cycles | 30376 cycles | 1 |
| ML-KEM-1024 encaps | 34642 cycles | 34643 cycles | 1.00 |
| ML-KEM-1024 decaps | 44279 cycles | 44268 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
ppc64le (POWER10) benchmarks
| Benchmark suite | Current: 1647e99 | Previous: ebed0f3 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 59191 cycles | 59226 cycles | 1.00 |
| ML-KEM-512 encaps | 71851 cycles | 71908 cycles | 1.00 |
| ML-KEM-512 decaps | 91623 cycles | 91531 cycles | 1.00 |
| ML-KEM-768 keypair | 98551 cycles | 98104 cycles | 1.00 |
| ML-KEM-768 encaps | 114881 cycles | 114532 cycles | 1.00 |
| ML-KEM-768 decaps | 140316 cycles | 140016 cycles | 1.00 |
| ML-KEM-1024 keypair | 148524 cycles | 148872 cycles | 1.00 |
| ML-KEM-1024 encaps | 167437 cycles | 167765 cycles | 1.00 |
| ML-KEM-1024 decaps | 198455 cycles | 199095 cycles | 1.00 |
oqs-bot
left a comment
Intel Xeon 4th gen (c7i)
| Benchmark suite | Current: 1647e99 | Previous: ebed0f3 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 9367 cycles | 9650 cycles | 0.97 |
| ML-KEM-512 encaps | 11028 cycles | 11457 cycles | 0.96 |
| ML-KEM-512 decaps | 15284 cycles | 15335 cycles | 1.00 |
| ML-KEM-768 keypair | 16012 cycles | 16453 cycles | 0.97 |
| ML-KEM-768 encaps | 17642 cycles | 17930 cycles | 0.98 |
| ML-KEM-768 decaps | 23218 cycles | 23627 cycles | 0.98 |
| ML-KEM-1024 keypair | 22181 cycles | 22362 cycles | 0.99 |
| ML-KEM-1024 encaps | 24116 cycles | 24602 cycles | 0.98 |
| ML-KEM-1024 decaps | 31703 cycles | 32362 cycles | 0.98 |
oqs-bot
left a comment
AMD EPYC 3rd gen (c6a)
| Benchmark suite | Current: 1647e99 | Previous: ebed0f3 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 13858 cycles | 16681 cycles | 0.83 |
| ML-KEM-512 encaps | 15565 cycles | 18380 cycles | 0.85 |
| ML-KEM-512 decaps | 21034 cycles | 23712 cycles | 0.89 |
| ML-KEM-768 keypair | 23415 cycles | 28448 cycles | 0.82 |
| ML-KEM-768 encaps | 24751 cycles | 29801 cycles | 0.83 |
| ML-KEM-768 decaps | 32627 cycles | 37656 cycles | 0.87 |
| ML-KEM-1024 keypair | 32841 cycles | 41276 cycles | 0.80 |
| ML-KEM-1024 encaps | 35167 cycles | 43491 cycles | 0.81 |
| ML-KEM-1024 decaps | 45730 cycles | 53885 cycles | 0.85 |
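
For readers checking the numbers, here is a minimal sketch of how the Ratio column appears to be derived, assuming it is the current cycle count divided by the previous one, rounded to two decimals (so values below 1 mean the new commit is faster). The helper names `ratio` and `cycles_saved_percent` are hypothetical and are not part of the benchmark tooling; the cycle counts are copied from the AMD EPYC 3rd gen (c6a) table.

```python
def ratio(current_cycles: int, previous_cycles: int) -> float:
    """Current / Previous, rounded to two decimals (assumed definition of the Ratio column)."""
    return round(current_cycles / previous_cycles, 2)


def cycles_saved_percent(current_cycles: int, previous_cycles: int) -> float:
    """Percentage of cycles saved relative to the previous commit, to one decimal."""
    return round(100.0 * (1.0 - current_cycles / previous_cycles), 1)


# (current, previous) cycle counts taken from the AMD EPYC 3rd gen (c6a) table.
amd_c6a = {
    "ML-KEM-512 keypair": (13858, 16681),
    "ML-KEM-1024 keypair": (32841, 41276),
    "ML-KEM-1024 decaps": (45730, 53885),
}

for name, (cur, prev) in amd_c6a.items():
    print(f"{name}: ratio {ratio(cur, prev):.2f}, "
          f"{cycles_saved_percent(cur, prev):.1f}% fewer cycles")
```

Under this reading, the rows above reproduce the reported ratios of 0.83, 0.80 and 0.85.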
oqs-bot
left a comment
Intel Xeon 4th gen (c7i) (no-opt)
| Benchmark suite | Current: 1647e99 | Previous: ebed0f3 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 28399 cycles | 28434 cycles | 1.00 |
| ML-KEM-512 encaps | 35834 cycles | 35766 cycles | 1.00 |
| ML-KEM-512 decaps | 45554 cycles | 45475 cycles | 1.00 |
| ML-KEM-768 keypair | 45863 cycles | 45954 cycles | 1.00 |
| ML-KEM-768 encaps | 56265 cycles | 56116 cycles | 1.00 |
| ML-KEM-768 decaps | 69388 cycles | 69482 cycles | 1.00 |
| ML-KEM-1024 keypair | 71853 cycles | 71559 cycles | 1.00 |
| ML-KEM-1024 encaps | 84528 cycles | 84605 cycles | 1.00 |
| ML-KEM-1024 decaps | 101544 cycles | 101115 cycles | 1.00 |
oqs-bot
left a comment
Arm Cortex-A76 (Raspberry Pi 5) benchmarks
| Benchmark suite | Current: 1647e99 | Previous: ebed0f3 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 28311 cycles | 28387 cycles | 1.00 |
| ML-KEM-512 encaps | 34295 cycles | 34239 cycles | 1.00 |
| ML-KEM-512 decaps | 44520 cycles | 44594 cycles | 1.00 |
| ML-KEM-768 keypair | 47859 cycles | 47826 cycles | 1.00 |
| ML-KEM-768 encaps | 54136 cycles | 54274 cycles | 1.00 |
| ML-KEM-768 decaps | 68640 cycles | 68610 cycles | 1.00 |
| ML-KEM-1024 keypair | 70519 cycles | 70609 cycles | 1.00 |
| ML-KEM-1024 encaps | 79075 cycles | 79023 cycles | 1.00 |
| ML-KEM-1024 decaps | 98785 cycles | 98791 cycles | 1.00 |
oqs-bot
left a comment
Intel Xeon 3rd gen (c6i)
| Benchmark suite | Current: 1647e99 | Previous: ebed0f3 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 14391 cycles | 16399 cycles | 0.88 |
| ML-KEM-512 encaps | 16918 cycles | 18695 cycles | 0.90 |
| ML-KEM-512 decaps | 23408 cycles | 25297 cycles | 0.93 |
| ML-KEM-768 keypair | 25118 cycles | 27938 cycles | 0.90 |
| ML-KEM-768 encaps | 27078 cycles | 29785 cycles | 0.91 |
| ML-KEM-768 decaps | 36486 cycles | 41180 cycles | 0.89 |
| ML-KEM-1024 keypair | 33822 cycles | 37708 cycles | 0.90 |
| ML-KEM-1024 encaps | 36159 cycles | 40685 cycles | 0.89 |
| ML-KEM-1024 decaps | 49453 cycles | 54424 cycles | 0.91 |
oqs-bot
left a comment
Graviton4
| Benchmark suite | Current: 1647e99 | Previous: ebed0f3 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 17688 cycles | 17702 cycles | 1.00 |
| ML-KEM-512 encaps | 20670 cycles | 20702 cycles | 1.00 |
| ML-KEM-512 decaps | 27133 cycles | 27133 cycles | 1 |
| ML-KEM-768 keypair | 29987 cycles | 30013 cycles | 1.00 |
| ML-KEM-768 encaps | 32854 cycles | 32811 cycles | 1.00 |
| ML-KEM-768 decaps | 42013 cycles | 42061 cycles | 1.00 |
| ML-KEM-1024 keypair | 43899 cycles | 43914 cycles | 1.00 |
| ML-KEM-1024 encaps | 48921 cycles | 48930 cycles | 1.00 |
| ML-KEM-1024 decaps | 61602 cycles | 61496 cycles | 1.00 |
oqs-bot
left a comment
AMD EPYC 4th gen (c7a)
| Benchmark suite | Current: 1647e99 | Previous: ebed0f3 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 10521 cycles | 12005 cycles | 0.88 |
| ML-KEM-512 encaps | 12061 cycles | 13291 cycles | 0.91 |
| ML-KEM-512 decaps | 17019 cycles | 18051 cycles | 0.94 |
| ML-KEM-768 keypair | 18414 cycles | 20559 cycles | 0.90 |
| ML-KEM-768 encaps | 19515 cycles | 21546 cycles | 0.91 |
| ML-KEM-768 decaps | 26626 cycles | 28661 cycles | 0.93 |
| ML-KEM-1024 keypair | 24623 cycles | 27867 cycles | 0.88 |
| ML-KEM-1024 encaps | 26733 cycles | 29966 cycles | 0.89 |
| ML-KEM-1024 decaps | 36391 cycles | 39500 cycles | 0.92 |
oqs-bot
left a comment
Graviton3
| Benchmark suite | Current: 1647e99 | Previous: ebed0f3 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 18737 cycles | 18762 cycles | 1.00 |
| ML-KEM-512 encaps | 22002 cycles | 22034 cycles | 1.00 |
| ML-KEM-512 decaps | 29037 cycles | 29048 cycles | 1.00 |
| ML-KEM-768 keypair | 31782 cycles | 31800 cycles | 1.00 |
| ML-KEM-768 encaps | 35006 cycles | 34950 cycles | 1.00 |
| ML-KEM-768 decaps | 45010 cycles | 45042 cycles | 1.00 |
| ML-KEM-1024 keypair | 46334 cycles | 46355 cycles | 1.00 |
| ML-KEM-1024 encaps | 51700 cycles | 51750 cycles | 1.00 |
| ML-KEM-1024 decaps | 65265 cycles | 65261 cycles | 1.00 |
oqs-bot
left a comment
AMD EPYC 3rd gen (c6a) (no-opt)
| Benchmark suite | Current: 1647e99 | Previous: ebed0f3 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 40116 cycles | 40315 cycles | 1.00 |
| ML-KEM-512 encaps | 48288 cycles | 48372 cycles | 1.00 |
| ML-KEM-512 decaps | 62456 cycles | 62465 cycles | 1.00 |
| ML-KEM-768 keypair | 63657 cycles | 63637 cycles | 1.00 |
| ML-KEM-768 encaps | 74664 cycles | 74830 cycles | 1.00 |
| ML-KEM-768 decaps | 93254 cycles | 93236 cycles | 1.00 |
| ML-KEM-1024 keypair | 95000 cycles | 95054 cycles | 1.00 |
| ML-KEM-1024 encaps | 108988 cycles | 109053 cycles | 1.00 |
| ML-KEM-1024 decaps | 131759 cycles | 131949 cycles | 1.00 |
oqs-bot
left a comment
Intel Xeon 3rd gen (c6i) (no-opt)
| Benchmark suite | Current: 1647e99 | Previous: ebed0f3 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 45781 cycles | 45812 cycles | 1.00 |
| ML-KEM-512 encaps | 54744 cycles | 54778 cycles | 1.00 |
| ML-KEM-512 decaps | 70275 cycles | 70387 cycles | 1.00 |
| ML-KEM-768 keypair | 73830 cycles | 73966 cycles | 1.00 |
| ML-KEM-768 encaps | 85352 cycles | 85383 cycles | 1.00 |
| ML-KEM-768 decaps | 106339 cycles | 106459 cycles | 1.00 |
| ML-KEM-1024 keypair | 111726 cycles | 111805 cycles | 1.00 |
| ML-KEM-1024 encaps | 125852 cycles | 125952 cycles | 1.00 |
| ML-KEM-1024 decaps | 151675 cycles | 151837 cycles | 1.00 |
oqs-bot
left a comment
Graviton4 (no-opt)
| Benchmark suite | Current: 1647e99 | Previous: ebed0f3 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 35451 cycles | 35858 cycles | 0.99 |
| ML-KEM-512 encaps | 40205 cycles | 40235 cycles | 1.00 |
| ML-KEM-512 decaps | 51249 cycles | 51231 cycles | 1.00 |
| ML-KEM-768 keypair | 56829 cycles | 56807 cycles | 1.00 |
| ML-KEM-768 encaps | 64602 cycles | 65391 cycles | 0.99 |
| ML-KEM-768 decaps | 78906 cycles | 79291 cycles | 1.00 |
| ML-KEM-1024 keypair | 88081 cycles | 88036 cycles | 1.00 |
| ML-KEM-1024 encaps | 97248 cycles | 97186 cycles | 1.00 |
| ML-KEM-1024 decaps | 116231 cycles | 116104 cycles | 1.00 |
oqs-bot
left a comment
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: 1647e99 | Previous: ebed0f3 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 36534 cycles | 36687 cycles | 1.00 |
| ML-KEM-512 encaps | 43003 cycles | 43011 cycles | 1.00 |
| ML-KEM-512 decaps | 55675 cycles | 55666 cycles | 1.00 |
| ML-KEM-768 keypair | 58415 cycles | 58457 cycles | 1.00 |
| ML-KEM-768 encaps | 67402 cycles | 67409 cycles | 1.00 |
| ML-KEM-768 decaps | 84350 cycles | 84377 cycles | 1.00 |
| ML-KEM-1024 keypair | 88631 cycles | 88658 cycles | 1.00 |
| ML-KEM-1024 encaps | 98864 cycles | 98909 cycles | 1.00 |
| ML-KEM-1024 decaps | 120369 cycles | 120440 cycles | 1.00 |
oqs-bot
left a comment
Graviton3 (no-opt)
| Benchmark suite | Current: 1647e99 | Previous: ebed0f3 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 39000 cycles | 39834 cycles | 0.98 |
| ML-KEM-512 encaps | 44609 cycles | 44640 cycles | 1.00 |
| ML-KEM-512 decaps | 56711 cycles | 56703 cycles | 1.00 |
| ML-KEM-768 keypair | 62433 cycles | 62431 cycles | 1.00 |
| ML-KEM-768 encaps | 70885 cycles | 71780 cycles | 0.99 |
| ML-KEM-768 decaps | 86781 cycles | 87166 cycles | 1.00 |
| ML-KEM-1024 keypair | 96335 cycles | 96262 cycles | 1.00 |
| ML-KEM-1024 encaps | 106377 cycles | 106330 cycles | 1.00 |
| ML-KEM-1024 decaps | 126937 cycles | 126801 cycles | 1.00 |
oqs-bot
left a comment
Arm Cortex-A55 (Snapdragon 888) benchmarks
| Benchmark suite | Current: 1647e99 | Previous: ebed0f3 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 59531 cycles | 59477 cycles | 1.00 |
| ML-KEM-512 encaps | 67162 cycles | 67251 cycles | 1.00 |
| ML-KEM-512 decaps | 85830 cycles | 85738 cycles | 1.00 |
| ML-KEM-768 keypair | 96983 cycles | 97018 cycles | 1.00 |
| ML-KEM-768 encaps | 110439 cycles | 110424 cycles | 1.00 |
| ML-KEM-768 decaps | 137219 cycles | 137136 cycles | 1.00 |
| ML-KEM-1024 keypair | 154182 cycles | 154167 cycles | 1.00 |
| ML-KEM-1024 encaps | 170792 cycles | 170457 cycles | 1.00 |
| ML-KEM-1024 decaps | 206619 cycles | 206792 cycles | 1.00 |
oqs-bot
left a comment
Graviton2
| Benchmark suite | Current: 1647e99 | Previous: ebed0f3 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 28394 cycles | 28314 cycles | 1.00 |
| ML-KEM-512 encaps | 34245 cycles | 34303 cycles | 1.00 |
| ML-KEM-512 decaps | 44588 cycles | 44520 cycles | 1.00 |
| ML-KEM-768 keypair | 47894 cycles | 47846 cycles | 1.00 |
| ML-KEM-768 encaps | 54377 cycles | 54137 cycles | 1.00 |
| ML-KEM-768 decaps | 68808 cycles | 68665 cycles | 1.00 |
| ML-KEM-1024 keypair | 70500 cycles | 70549 cycles | 1.00 |
| ML-KEM-1024 encaps | 79000 cycles | 79141 cycles | 1.00 |
| ML-KEM-1024 decaps | 98806 cycles | 98835 cycles | 1.00 |
oqs-bot
left a comment
Graviton2 (no-opt)
| Benchmark suite | Current: 1647e99 | Previous: ebed0f3 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 59023 cycles | 59570 cycles | 0.99 |
| ML-KEM-512 encaps | 68543 cycles | 68575 cycles | 1.00 |
| ML-KEM-512 decaps | 87352 cycles | 87326 cycles | 1.00 |
| ML-KEM-768 keypair | 95685 cycles | 95742 cycles | 1.00 |
| ML-KEM-768 encaps | 109493 cycles | 109660 cycles | 1.00 |
| ML-KEM-768 decaps | 134366 cycles | 134532 cycles | 1.00 |
| ML-KEM-1024 keypair | 146719 cycles | 148351 cycles | 0.99 |
| ML-KEM-1024 encaps | 162524 cycles | 164301 cycles | 0.99 |
| ML-KEM-1024 decaps | 194686 cycles | 195562 cycles | 1.00 |
oqs-bot
left a comment
SpacemiT K1 8 (Banana Pi F3) benchmarks
| Benchmark suite | Current: 1647e99 | Previous: ebed0f3 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 155156 cycles | 155116 cycles | 1.00 |
| ML-KEM-512 encaps | 163334 cycles | 163295 cycles | 1.00 |
| ML-KEM-512 decaps | 206536 cycles | 206550 cycles | 1.00 |
| ML-KEM-768 keypair | 249502 cycles | 249522 cycles | 1.00 |
| ML-KEM-768 encaps | 270309 cycles | 270296 cycles | 1.00 |
| ML-KEM-768 decaps | 332165 cycles | 332114 cycles | 1.00 |
| ML-KEM-1024 keypair | 395238 cycles | 395146 cycles | 1.00 |
| ML-KEM-1024 encaps | 423919 cycles | 423791 cycles | 1.00 |
| ML-KEM-1024 decaps | 505639 cycles | 505524 cycles | 1.00 |
oqs-bot
left a comment
Arm Cortex-A72 (Raspberry Pi 4) benchmarks
| Benchmark suite | Current: 1647e99 | Previous: ebed0f3 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 50737 cycles | 50672 cycles | 1.00 |
| ML-KEM-512 encaps | 58468 cycles | 58540 cycles | 1.00 |
| ML-KEM-512 decaps | 74124 cycles | 74115 cycles | 1.00 |
| ML-KEM-768 keypair | 86957 cycles | 86481 cycles | 1.01 |
| ML-KEM-768 encaps | 95630 cycles | 94380 cycles | 1.01 |
| ML-KEM-768 decaps | 118875 cycles | 117403 cycles | 1.01 |
| ML-KEM-1024 keypair | 131396 cycles | 129771 cycles | 1.01 |
| ML-KEM-1024 encaps | 143409 cycles | 142171 cycles | 1.01 |
| ML-KEM-1024 decaps | 174419 cycles | 173614 cycles | 1.00 |
CBMC Results (ML-KEM-512): Full Results (139 proofs)
CBMC Results (ML-KEM-1024): Full Results (139 proofs)
CBMC Results (ML-KEM-768): Full Results (139 proofs)
This commit improves the performance of the AVX2 Keccak-F1600x4 implementation by: