Improve kleidi-ai matmul register usage #5165

Nicoshev · 2025-11-21T19:01:56Z

Summary:
Refactor the kleidi-ai matrix multiplication routines to rely only on temporary (caller-saved) registers.
This removes the need for each call to save and restore the general-purpose registers x19 through x30 and, for the smaller routines, the SIMD registers d9 through d15.
Around 10 memory load/store instructions are removed from each subroutine.
Because larger matrices are broken down into small pieces, these savings apply once per piece.
Reducing the code size of these routines also makes them more likely to be in cache when they need to execute.
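As a rough illustration of where the per-call savings come from, the sketch below counts the prologue/epilogue save and restore instructions an AArch64 routine needs under AAPCS64, given the registers it writes. It assumes saves are emitted as STP/LDP pairs (with a single STR/LDR for an odd leftover register) and does not count separate stack-pointer adjustments. The function names and register lists are hypothetical, not taken from the kleidi-ai kernels.

```python
# Sketch: prologue/epilogue cost of touching callee-saved registers on
# AArch64 (AAPCS64). Assumption: saves use STP/LDP pairs, with a single
# STR/LDR for an odd leftover; stack-pointer adjustments are not counted.

A64_INSN_BYTES = 4  # every A64 instruction is 4 bytes


def classify(reg):
    """Return ("gpr", n) or ("fpr", n) if writing `reg` forces a save, else None."""
    kind, num = reg[0].lower(), reg[1:]
    if not num.isdigit():
        return None  # e.g. sp, xzr, wzr
    n = int(num)
    # x19-x28 are callee-saved, x29 is the frame pointer, and x30 (LR) holds
    # the return address; writing any of them (via x or w) forces a save.
    if kind in "xw" and 19 <= n <= 30:
        return ("gpr", n)
    # Only the low 64 bits of v8-v15 (d8-d15) are callee-saved, but any
    # write to v/q/d/s/h/b 8-15 clobbers those bits.
    if kind in "vqdshb" and 8 <= n <= 15:
        return ("fpr", n)
    return None


def pairs(count):
    """STP (or LDP) instructions needed to move `count` registers of one class."""
    return (count + 1) // 2


def save_restore_cost(used_regs):
    """Return (instructions, code_bytes) spent saving and restoring the
    callee-saved registers that a routine writes."""
    saved = {"gpr": set(), "fpr": set()}
    for reg in used_regs:
        hit = classify(reg)
        if hit is not None:
            saved[hit[0]].add(hit[1])
    # GPRs and SIMD registers cannot share one STP, so pair each class
    # separately; x2 for one store in the prologue plus one load in the epilogue.
    insns = 2 * sum(pairs(len(regs)) for regs in saved.values())
    return insns, insns * A64_INSN_BYTES


if __name__ == "__main__":
    # Routine using only caller-saved registers: nothing to save.
    print(save_restore_cost([f"x{i}" for i in range(16)] + [f"v{i}" for i in range(8)]))  # -> (0, 0)
    # Hypothetical routine that also clobbers x19-x24 and v8-v11.
    print(save_restore_cost([f"x{i}" for i in range(19, 25)] + [f"v{i}" for i in range(8, 12)]))  # -> (10, 40)
```

Under these assumptions, a routine that writes no callee-saved register needs no save/restore sequence at all, which is the effect the refactor targets; the actual number of instructions removed per kernel depends on how many such registers each original routine used.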

Differential Revision: D87656468

meta-codesync · 2025-11-21T19:02:02Z

@Nicoshev has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87656468.

Summary:
X-link: facebookresearch/FBGEMM#2164
Refactor the kleidi-ai matrix multiplication routines to rely only on temporary registers.
This removes the need for each call to save and restore the general-purpose registers x19 through x30 and, for the smaller routines, the SIMD registers d9 through d15.
Cache misses are likely for these loads, since the matrix processing should fill the cache.
Around 10 memory load/store instructions are removed from each subroutine.
Because larger matrices are broken down into small pieces, these savings apply once per piece.
Reducing the code size of these routines also makes them more likely to be in cache when they need to execute.
Benchmarks seem to show a small improvement. Some favorable runs now show almost the same throughput as BGM: P2050470491, P2050484212
Reviewed By: mcfi
Differential Revision: D87656468
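Because the savings are paid once per piece, their total scales with the number of pieces a large matrix is split into. A minimal sketch, assuming one kernel call per mr x nr output tile that covers the full reduction dimension; the tile sizes and the per-call saving used below are hypothetical placeholders, not values from kleidi-ai or this PR.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive ints."""
    return -(-a // b)


def tile_count(m, n, mr, nr):
    """Number of mr x nr tiles needed to cover an m x n output matrix."""
    return ceil_div(m, mr) * ceil_div(n, nr)


def instructions_saved(m, n, mr, nr, saved_per_call):
    """Total load/store instructions avoided, assuming one kernel call per tile."""
    return tile_count(m, n, mr, nr) * saved_per_call


if __name__ == "__main__":
    # Hypothetical 8x8 tiles and ~10 instructions saved per call.
    print(tile_count(1024, 1024, 8, 8))              # -> 16384
    print(instructions_saved(1024, 1024, 8, 8, 10))  # -> 163840
```

Edge tiles count as full calls here, so a matrix whose dimensions are not multiples of the tile size also pays (and now avoids) the per-call overhead for its partial tiles.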

meta-cla bot added the cla signed label Nov 21, 2025

meta-codesync bot added fb-exported meta-exported labels Nov 21, 2025

Nicoshev force-pushed the export-D87656468 branch from edd0424 to c995700 Compare November 22, 2025 15:29

Nicoshev force-pushed the export-D87656468 branch from c995700 to be2e1a1 Compare November 23, 2025 15:04

Nicoshev force-pushed the export-D87656468 branch from be2e1a1 to 94470f7 Compare November 24, 2025 14:08

Nicoshev force-pushed the export-D87656468 branch from 94470f7 to 249fdfc Compare November 24, 2025 15:17

Nicoshev force-pushed the export-D87656468 branch from 249fdfc to 19eaafd Compare November 25, 2025 18:21
