
Conversation

@amritahs-ibm
Contributor

This change upstreams llamafile's CPU matrix
multiplication kernels for the ppc64le ISA using
MMA builtins. This patch handles matrix
multiplication between the quantised datatypes
block_q4_0 and block_q8_0.
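
For context, these are the two ggml block layouts the kernel operates on (as defined in ggml's quantization headers), plus a scalar reference of the per-block dot product that the MMA kernels vectorize. This is an illustrative sketch, not the exact code added by the patch; the fp16-to-float decoding of the scales is left to the caller to keep it short.

```c
#include <stdint.h>

#define QK4_0 32
#define QK8_0 32

// ggml block layouts (the fp16 scale is kept as a raw uint16_t here)
typedef struct { uint16_t d; uint8_t qs[QK4_0 / 2]; } block_q4_0; // 32 4-bit weights
typedef struct { uint16_t d; int8_t  qs[QK8_0];     } block_q8_0; // 32 8-bit activations

// Scalar reference: dot product of one q4_0 block with one q8_0 block.
// dx/dy are the two blocks' scales, already decoded from fp16 to float.
static float dot_q4_0_q8_0(const block_q4_0 *x, const block_q8_0 *y,
                           float dx, float dy) {
    int32_t sumi = 0;
    for (int j = 0; j < QK4_0 / 2; ++j) {
        const int v0 = (x->qs[j] & 0x0F) - 8;  // low nibble  -> element j
        const int v1 = (x->qs[j] >>   4) - 8;  // high nibble -> element j + 16
        sumi += v0 * y->qs[j] + v1 * y->qs[j + QK4_0 / 2];
    }
    return sumi * dx * dy;
}
```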

This change results in a 5% - 50% improvement
in total speed (i.e. all tokens / total time)
across various batch sizes.

The patch was tested with the Meta-Llama-3-8B,
Mistral-7B and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.
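
For readers unfamiliar with MMA, here is a minimal sketch of the POWER10 builtins these kernels build on (compile with GCC or Clang and `-mcpu=power10`): a `__vector_quad` holds a 4x4 int32 accumulator tile, and each `xvi8ger4pp` rank-4 update adds a 4-term byte dot product to every entry. The tile shape and byte packing below are illustrative, not the exact layout used by the patch.

```c
#include <altivec.h>
#include <stdint.h>

// C(4x4, int32) += A(4x8 bytes) * B(8x4 bytes), done as two rank-4 updates.
// a[0]/b[0] hold bytes 0..3 of each row/column, a[1]/b[1] hold bytes 4..7.
// Note: xvi8ger4 treats one operand as unsigned bytes and the other as signed.
void mma_i8_tile(const vector unsigned char a[2],
                 const vector unsigned char b[2],
                 int32_t c[4][4]) {
    __vector_quad acc;
    __builtin_mma_xxsetaccz(&acc);              // zero the 512-bit accumulator
    __builtin_mma_xvi8ger4pp(&acc, a[0], b[0]); // accumulate bytes 0..3
    __builtin_mma_xvi8ger4pp(&acc, a[1], b[1]); // accumulate bytes 4..7
    __builtin_mma_disassemble_acc(c, &acc);     // copy the accumulator to c
}
```

The quantized kernels then scale the resulting integer tile by the per-block fp16 deltas, as in the scalar reference above.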


github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Mar 21, 2025
@amritahs-ibm
Contributor Author

@ggerganov Can you please review these changes?

@amritahs-ibm
Contributor Author

@ggerganov Can you please review this patch?

Member

@ggerganov ggerganov left a comment


It would be nice to add some sort of CI for this arch in the future. If you have any ideas, let me know.

@ggerganov ggerganov merged commit c7b43ab into ggml-org:master Mar 27, 2025
48 checks passed
