Previously, class tinyBLAS_PPC packed the input matrices A and B
on the fly inside each GEMM microkernel.
This patch decouples packing from the kernels
by adding a preprocessing step that packs the matrices once, before any kernel is invoked.
Benefits:
- Enables better memory locality and data reuse
- Simplifies the kernel logic by focusing purely on computation
- Improves overall GEMM performance, especially for large matrix sizes
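
The idea of packing once and then running compute-only kernels can be illustrated with a minimal scalar sketch. This is not the code from the patch: the names `pack_a`, `pack_b`, `kernel`, and `gemm_prepacked`, the tile sizes `MR` and `NR`, the row-major layout, and the float type are all assumptions chosen for illustration, and no PowerPC intrinsics are used.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tile sizes, chosen only for this sketch.
constexpr std::size_t MR = 4;
constexpr std::size_t NR = 4;

// Pack row-major A (m x k) into panels of MR rows. Inside a panel, the MR
// values for each k index are contiguous. Rows past m stay zero.
std::vector<float> pack_a(const float* A, std::size_t m, std::size_t k) {
    std::size_t panels = (m + MR - 1) / MR;
    std::vector<float> out(panels * MR * k, 0.0f);
    for (std::size_t p = 0; p < panels; ++p)
        for (std::size_t l = 0; l < k; ++l)
            for (std::size_t i = 0; i < MR; ++i) {
                std::size_t row = p * MR + i;
                if (row < m) out[(p * k + l) * MR + i] = A[row * k + l];
            }
    return out;
}

// Pack row-major B (k x n) into panels of NR columns. Inside a panel, the NR
// values for each k index are contiguous. Columns past n stay zero.
std::vector<float> pack_b(const float* B, std::size_t k, std::size_t n) {
    std::size_t panels = (n + NR - 1) / NR;
    std::vector<float> out(panels * NR * k, 0.0f);
    for (std::size_t p = 0; p < panels; ++p)
        for (std::size_t l = 0; l < k; ++l)
            for (std::size_t j = 0; j < NR; ++j) {
                std::size_t col = p * NR + j;
                if (col < n) out[(p * k + l) * NR + j] = B[l * n + col];
            }
    return out;
}

// Compute-only microkernel: reads two packed panels and accumulates one
// MR x NR tile. It does no packing of its own.
void kernel(const float* a_panel, const float* b_panel, std::size_t k,
            float acc[MR][NR]) {
    for (std::size_t l = 0; l < k; ++l)
        for (std::size_t i = 0; i < MR; ++i)
            for (std::size_t j = 0; j < NR; ++j)
                acc[i][j] += a_panel[l * MR + i] * b_panel[l * NR + j];
}

// C (m x n, row-major) = A * B. Both inputs are packed once up front, and
// every kernel call then reuses the packed panels.
void gemm_prepacked(const float* A, const float* B, float* C,
                    std::size_t m, std::size_t n, std::size_t k) {
    std::vector<float> pa = pack_a(A, m, k);
    std::vector<float> pb = pack_b(B, k, n);
    for (std::size_t ip = 0; ip * MR < m; ++ip)
        for (std::size_t jp = 0; jp * NR < n; ++jp) {
            float acc[MR][NR] = {};
            kernel(pa.data() + ip * k * MR, pb.data() + jp * k * NR, k, acc);
            // Write back only the part of the tile that lies inside C.
            for (std::size_t i = 0; i < MR; ++i)
                for (std::size_t j = 0; j < NR; ++j) {
                    std::size_t row = ip * MR + i, col = jp * NR + j;
                    if (row < m && col < n) C[row * n + col] = acc[i][j];
                }
        }
}
```

In this sketch each panel of A is packed once and reused for every column panel of B (and vice versa), whereas packing inside the kernel would repeat that work for every tile.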
Signed-off-by: Shalini Salomi Bodapati <[email protected]>