Commit be2e1a1

Nicoshev authored and facebook-github-bot committed
Improve kleidi-ai matmul register usage (#5165)
Summary:
X-link: facebookresearch/FBGEMM#2164

Refactor the kleidi-ai matrix multiplication routines to rely only on temporary registers. This removes from each call the need to save and restore registers [x19, x30], and also SIMD registers [d9, d15] for the smaller routines. These loads are likely to miss the cache, since the matrix processing should fill it. Around 10 memory load/store instructions are removed from each subroutine. Because larger matrices are broken down into small pieces, these savings apply once per piece.

Reducing the code size of these routines also makes them more likely to be in cache when they need to execute.

Benchmarks seem to show a small improvement. Some of the better runs now show almost the same throughput as BGM: P2050470491, P2050484212

Reviewed By: mcfi

Differential Revision: D87656468
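As a rough illustration of why a per-call saving adds up, the sketch below estimates how many load/store instructions are avoided for one output matrix. It assumes the routine is invoked once per output tile; the function name, the tile sizes, and the matrix sizes are hypothetical, and the default of 10 is only the approximate per-subroutine figure quoted in the summary, not a measured value.

```python
import math


def saved_memory_instructions(m, n, tile_m, tile_n, saved_per_call=10):
    """Estimate load/store instructions avoided for one m x n output matrix.

    Assumption: one routine call per output tile of size tile_m x tile_n.
    saved_per_call is the approximate figure from the commit summary.
    """
    calls = math.ceil(m / tile_m) * math.ceil(n / tile_n)
    return calls * saved_per_call


if __name__ == "__main__":
    # Hypothetical sizes: a 512 x 512 output processed in 8 x 16 tiles.
    print(saved_memory_instructions(512, 512, tile_m=8, tile_n=16))
```

The estimate scales with the number of pieces, not with the matrix dimensions directly, so the smaller the tiles, the more often the removed save/restore sequence would have run.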
1 parent eb1ae89 commit be2e1a1

File tree

1 file changed: +965 -965 lines changed

0 commit comments
