Conversation

@shalinib-ibm
Contributor

This patch upstreams llamafile's CPU matrix multiplication kernels for ppc64le, using MMA builtins for the BF16 data type.

This change results in 9x - 40x gains in total speed S t/s (i.e., all tokens / total time) across the various batch sizes tested with the llama-batched-bench benchmark.

The patch was tested with the Meta-Llama-3-8B and Mistral-7B models (BF16 models generated with llama-quantize from the corresponding FP32 models) on an IBM POWER10 machine.
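For readers unfamiliar with POWER10 MMA, the sketch below shows the general shape of such a kernel: a minimal, illustrative 4x4 BF16 micro-kernel built on the GCC/Clang MMA builtins (`__builtin_mma_xxsetaccz`, `__builtin_mma_xvbf16ger2pp`, `__builtin_mma_disassemble_acc`). This is not the PR's actual code; the function name, operand packing, and loop structure here are simplified assumptions, and it assumes compilation with `-mcpu=power10`.

```c
// Minimal sketch of a BF16 MMA micro-kernel (illustrative, not the PR's kernel).
// Assumes GCC or Clang targeting POWER10: -O2 -mcpu=power10
#include <altivec.h>

typedef vector unsigned char vec_t;

// Accumulate a 4x4 FP32 tile of C from packed BF16 slices of A and B.
// Each vec_t operand carries 8 BF16 values: a 4x2 slice of A and a 2x4 slice
// of B, matching the rank-2 update performed by xvbf16ger2pp.
static void mma_bf16_4x4_tile(const vec_t *A, const vec_t *B,
                              float *C, int ldc, int k_pairs) {
    __vector_quad acc;                      // 4x4 FP32 accumulator register
    __builtin_mma_xxsetaccz(&acc);          // zero the accumulator

    for (int k = 0; k < k_pairs; ++k) {
        // acc[i][j] += sum over the 2 packed K elements of A[i][*] * B[*][j]
        __builtin_mma_xvbf16ger2pp(&acc, A[k], B[k]);
    }

    // Spill the accumulator into four FP32 row vectors and store the tile.
    vector float rows[4];
    __builtin_mma_disassemble_acc(rows, &acc);
    for (int i = 0; i < 4; ++i) {
        vec_xst(rows[i], 0, C + i * ldc);
    }
}
```

Accumulating across the whole K dimension before disassembling the accumulator keeps the setup/spill cost out of the inner loop; a production kernel (as in the llamafile sgemm code) would additionally tile multiple accumulators and handle edge cases, which this sketch omits.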


@github-actions github-actions bot added the `ggml` label (changes relating to the ggml tensor library for machine learning) on Apr 28, 2025
@shalinib-ibm
Contributor Author

@ggerganov Can you please review this PR and provide your comments?

Member

@ggerganov ggerganov left a comment

I don't have a machine to test this, but at least fix the indentation of the code and we can merge it.

@shalinib-ibm shalinib-ibm force-pushed the main_bf16_sgemm branch 3 times, most recently from c6c14fa to b9c6af2 on May 2, 2025 06:20

Signed-off-by: Shalini Salomi Bodapati <[email protected]>
@shalinib-ibm
Contributor Author

> I don't have a machine to test this, but at least fix the indentation of the code and we can merge it.

Thank you @ggerganov. I have fixed the code indentation. Can you please review?

@shalinib-ibm shalinib-ibm requested a review from ggerganov May 2, 2025 16:39
@ggerganov ggerganov merged commit 3f3769b into ggml-org:master May 2, 2025
51 checks passed