Commit 505d77f

Enable MMA for BF16 data type on ppc64le
This patch upstreams llamafile's CPU matrix multiplication kernels for ppc64le, using MMA builtins for the BF16 data type. The change yields 9x to 40x gains in total speed S t/s (i.e. all tokens / total time) across the batch sizes tested with the llama-batched-bench benchmark. The patch was tested with Meta-Llama-3-8B and Mistral-7B models (BF16 models generated with llama-quantize from the corresponding FP32 models) on an IBM POWER10 machine.

Signed-off-by: Shalini Salomi Bodapati <[email protected]>
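For readers unfamiliar with the metric, the "total speed" figure quoted above can be sketched as follows. This is only an illustration of the definition given in the commit message (all tokens divided by total time). The function names `total_speed_tps` and `speedup` are hypothetical and are not taken from llama.cpp or llama-batched-bench.

```cpp
#include <cstddef>

// Hypothetical helper (not part of llama.cpp): total speed in tokens per
// second, i.e. all tokens processed divided by the total elapsed time.
double total_speed_tps(std::size_t total_tokens, double total_seconds) {
    if (total_seconds <= 0.0) {
        return 0.0;  // avoid division by zero on an empty or invalid run
    }
    return static_cast<double>(total_tokens) / total_seconds;
}

// Hypothetical helper: ratio of the patched speed to the baseline speed.
// A result of 9.0 would correspond to a "9x" gain.
double speedup(double baseline_tps, double patched_tps) {
    if (baseline_tps <= 0.0) {
        return 0.0;  // no meaningful ratio without a positive baseline
    }
    return patched_tps / baseline_tps;
}
```

Comparing `total_speed_tps` for the same batch size before and after the patch, and passing both values to `speedup`, gives the kind of ratio reported in the message.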
1 parent 74d4f5b commit 505d77f

1 file changed: +627 -2 lines changed
