You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As Christophe pointed out, tuning the chacha implementation by
scheduling the instructions like what GCC does can improve the
performance.
The tuning does not introduce too much complexity (basically it's just
reordering some instructions). And the tuning does not hurt readibility
too much: actually the tuned code looks even more similar to a
textbook-style implementation based on 128-bit vectors. So overall it's
a good deal to me.
Tested with vdso_test_getchacha and benched with vdso_test_getrandom.
On a LA664 the speedup is 5%, and I expect a larger speedup on LA[2-4]64
with a lower issue rate.
Suggested-by: Christophe Leroy <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Xi Ruoyao <[email protected]>
Reviewed-by: Huacai Chen <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
0 commit comments