Commit edd0424

Nicoshev authored and facebook-github-bot committed
Improve kleidi-ai matmul register usage
Summary:
Refactor the kleidi-ai matrix multiplication routines to rely only on temporary registers. This removes the need to save and restore registers [x19, x30] on each call. The loads that restore them are likely to miss in cache, since the matrix data being processed should fill it.

10 memory load/store instructions are removed in the 8x1 and 7x1 cases, and 8 or 9 per case in the remaining ones. Because larger matrices are broken down into small pieces, these savings apply once per piece.

Reducing the code size of these routines also makes them more likely to be in cache when they need to execute.

Benchmarks seem to show a small improvement. Some of the better runs now show almost the same throughput as BGM: P2050470491, P2050484212

Differential Revision: D87656468
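The "once per piece" point can be made concrete with a rough estimate. The sketch below is illustrative only and is not taken from the change itself: it assumes a hypothetical decomposition in which a matrix is split into 8-row pieces plus at most one remainder piece, and it uses the lower figure of 8 for every remainder case other than 7x1. The function name and constants are made up for this example.

```python
# Rough estimate of load/store instructions avoided per matmul call.
# ASSUMPTION: rows are split into 8-row pieces plus one remainder piece;
# the real kleidi-ai tiling may differ.

FULL_PIECE_ROWS = 8       # hypothetical height of a full piece (the 8x1 case)
SAVED_8X1_OR_7X1 = 10     # from the summary: 10 instructions removed
SAVED_OTHER_MIN = 8       # from the summary: 8 or 9 removed; use the lower bound


def estimate_saved_ops(rows: int) -> int:
    """Lower-bound count of removed load/store instructions for `rows` rows."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    full_pieces, remainder = divmod(rows, FULL_PIECE_ROWS)
    saved = full_pieces * SAVED_8X1_OR_7X1
    if remainder == 7:
        saved += SAVED_8X1_OR_7X1   # the 7x1 case saves as much as 8x1
    elif remainder > 0:
        saved += SAVED_OTHER_MIN    # remaining cases save 8 or 9
    return saved


if __name__ == "__main__":
    # 1027 rows -> 128 full pieces and a 3-row remainder
    print(estimate_saved_ops(1027))
```

Under these assumptions the saving grows linearly with the number of pieces, which is why a small per-call reduction can still add up for large matrices.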
1 parent f9c6156 commit edd0424

1 file changed: +768 -768 lines changed
