Commit 505d77f

Enable MMA for BF16 data type on ppc64le

This patch upstreams llamafile's CPU matrix multiplication kernels for ppc64le, using MMA builtins for the BF16 data type. This change yields 9x to 40x gains in total speed S t/s (i.e., all tokens / total time) across the various batch sizes tested with the llama-batched-bench benchmark. The patch was tested with the Meta-Llama-3-8B and Mistral-7B models (BF16 models generated with llama-quantize from the corresponding FP32 models) on an IBM POWER10 machine.

Signed-off-by: Shalini Salomi Bodapati <[email protected]>
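The BF16 test models above were produced from FP32 checkpoints with llama-quantize. As a rough illustration of what that narrowing involves, the sketch below converts FP32 to BF16 by keeping the upper 16 bits with round-to-nearest-even and quieting NaNs. The names `fp32_to_bf16` and `bf16_to_fp32` are hypothetical; this is a minimal sketch under those assumptions, not the actual conversion code used by ggml or llama-quantize.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The bit-casts below assume float is a 32-bit IEEE-754 binary32 value. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: narrow FP32 to BF16 (the upper 16 bits of the FP32
   encoding) using round-to-nearest-even; NaNs are quieted, not rounded. */
static uint16_t fp32_to_bf16(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    if ((u & 0x7fffffffu) > 0x7f800000u) {
        /* NaN: keep sign and top mantissa bits, force the quiet bit */
        return (uint16_t)((u >> 16) | 0x0040u);
    }
    /* Add 0x7fff plus the LSB of the kept half so exact ties round to even */
    u += 0x7fffu + ((u >> 16) & 1u);
    return (uint16_t)(u >> 16);
}

/* Hypothetical helper: widen BF16 back to FP32 (exact, no rounding needed). */
static float bf16_to_fp32(uint16_t h) {
    uint32_t u = (uint32_t)h << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

Rounding rather than truncating keeps the conversion error unbiased. The separate NaN branch is there because applying the rounding step to a NaN could turn it into an infinity or carry into the sign bit.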

1 parent 74d4f5b, commit 505d77f

1 file changed, 627 insertions(+), 2 deletions(-)

ggml/src/ggml-cpu/llamafile/sgemm.cpp
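On POWER10, the MMA facility exposes BF16 rank-2 outer-product updates through GCC builtins such as `__builtin_mma_xvbf16ger2pp`, which accumulate into a 4x4 FP32 accumulator. The portable C sketch below models that arithmetic in scalar code so it runs on any machine: each step adds `x[i][0]*y[j][0] + x[i][1]*y[j][1]` to every accumulator cell, and a 4x4 tile walks K two elements at a time. It is a hypothetical illustration, not the kernel added to sgemm.cpp: the helper names, the row-by-row operand layout, and the even-K requirement are assumptions, and the instruction's exact intermediate rounding is not reproduced.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen one BF16 value to FP32 (BF16 is the top half of an FP32 word). */
static float bf16_to_f32(uint16_t h) {
    uint32_t u = (uint32_t)h << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical scalar model of a BF16 rank-2 "ger...pp" update:
   acc[i][j] += x[i][0]*y[j][0] + x[i][1]*y[j][1].
   x holds 2 BF16 values for each of 4 rows (8 values, one 128-bit vector);
   y does the same for 4 columns. Intermediate rounding is NOT modeled. */
static void bf16_ger2pp_model(float acc[4][4],
                              const uint16_t x[4][2],
                              const uint16_t y[4][2]) {
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            acc[i][j] += bf16_to_f32(x[i][0]) * bf16_to_f32(y[j][0])
                       + bf16_to_f32(x[i][1]) * bf16_to_f32(y[j][1]);
}

/* Hypothetical 4x4 tile: c[i][j] = sum_p a[i*lda + p] * b[j*ldb + p],
   i.e. dot products of rows of A with rows of B, both stored as BF16.
   k must be even because each update consumes two K-elements. */
static void gemm_tile_4x4_bf16(int k, const uint16_t *a, int lda,
                               const uint16_t *b, int ldb, float c[4][4]) {
    assert(k % 2 == 0);
    float acc[4][4] = {{0}};  /* role of __builtin_mma_xxsetaccz */
    for (int p = 0; p < k; p += 2) {
        uint16_t x[4][2], y[4][2];
        for (int r = 0; r < 4; r++) {  /* gather 2 K-elements per row */
            x[r][0] = a[r * lda + p];
            x[r][1] = a[r * lda + p + 1];
            y[r][0] = b[r * ldb + p];
            y[r][1] = b[r * ldb + p + 1];
        }
        bf16_ger2pp_model(acc, x, y);
    }
    memcpy(c, acc, sizeof acc);  /* role of __builtin_mma_disassemble_acc */
}
```

In a vectorized kernel, each `x`/`y` pair would instead be a single 128-bit vector holding 8 BF16 values; the scalar loops here only show the arithmetic being accumulated.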
