Commit c7b43ab
llamafile : ppc64le MMA implementation for Q4_0. (ggml-org#12489)
This change upstreams llamafile's CPU matrix
multiplication kernels for the ppc64le ISA using MMA
builtins. This patch handles matrix multiplication
between the quantised data types block_q4_0 and
block_q8_0.
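For readers unfamiliar with the two block formats, the following is a minimal scalar sketch of the arithmetic that a block_q4_0 x block_q8_0 kernel has to reproduce. It is not the MMA kernel from this patch. It assumes the usual ggml layout (32 elements per block, 4-bit values stored with an offset of 8, low nibbles holding the first half of the block and high nibbles the second half). The `sketch_` type and function names are hypothetical, and the per-block scale `d` is stored as a `float` here rather than ggml's fp16 to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>

#define QK 32 /* elements per block (assumed to match ggml's QK4_0 / QK8_0) */

/* Hypothetical simplified stand-ins for ggml's block types.
   ggml stores the scale d as fp16; a float is used here for simplicity. */
typedef struct { float d; uint8_t qs[QK / 2]; } sketch_block_q4_0; /* two 4-bit values per byte */
typedef struct { float d; int8_t  qs[QK];     } sketch_block_q8_0; /* one 8-bit value per element */

/* Scalar dot product of n_blocks Q4_0 blocks with n_blocks Q8_0 blocks. */
static float sketch_dot_q4_0_q8_0(const sketch_block_q4_0 *x,
                                  const sketch_block_q8_0 *y,
                                  int n_blocks) {
    assert(n_blocks >= 0);
    float sum = 0.0f;
    for (int b = 0; b < n_blocks; ++b) {
        int32_t isum = 0; /* integer accumulator for one block */
        for (int j = 0; j < QK / 2; ++j) {
            /* Low nibble is element j, high nibble is element j + QK/2;
               subtracting 8 maps the stored range 0..15 to -8..7. */
            int lo = (x[b].qs[j] & 0x0F) - 8;
            int hi = (x[b].qs[j] >> 4) - 8;
            isum += lo * y[b].qs[j] + hi * y[b].qs[j + QK / 2];
        }
        /* Apply both per-block scales once per block. */
        sum += x[b].d * y[b].d * (float) isum;
    }
    return sum;
}
```

An optimised kernel computes the same integer sums for many row/column pairs at once and applies the scales afterwards; the sketch only pins down the expected result for a single pair of rows.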
This change results in a 5% to 50% improvement
in total speed (i.e., all tokens / total time) across
various batch sizes.
The patch was tested with the Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.
Signed-off-by: Amrita H S <[email protected]>
1 parent 24feaec commit c7b43ab
1 file changed, +517 -86 lines changed