
Conversation

@shalinib-ibm
Contributor

Final Version

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Mar 25, 2025
Member

@ggerganov ggerganov left a comment


There are several stray printfs and leftover debugging code, and the indentation is inconsistent. This looks like a work in progress; it should be cleaned up before we consider merging.

@shalinib-ibm shalinib-ibm marked this pull request as draft March 27, 2025 09:15
@shalinib-ibm
Contributor Author

> There are several stray printfs and leftover debugging code, and the indentation is inconsistent. This looks like a work in progress; it should be cleaned up before we consider merging.

Thank you @ggerganov. Moved it to draft. Will publish the final version soon.

This patch upstreams llamafile's CPU matrix multiplication
kernels for ppc64le, using MMA builtins for the BF16 data type.

This change yields 9x to 40x gains in total speed S t/s
(i.e., all tokens / total time) across the various batch
sizes tested with the llama-batched-bench benchmark.
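For reference, the S t/s figure reported by llama-batched-bench is simply all tokens (prompt plus generated) divided by total wall-clock time. A minimal sketch of the metric (the sample numbers below are hypothetical, not taken from this PR):

```python
def total_speed_tps(n_prompt_tokens: int, n_gen_tokens: int, total_time_s: float) -> float:
    """S t/s: all tokens (prompt + generated) divided by total elapsed time."""
    return (n_prompt_tokens + n_gen_tokens) / total_time_s

# Hypothetical run: 512 prompt tokens + 128 generated tokens in 4.0 s
print(total_speed_tps(512, 128, 4.0))  # 160.0
```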

The patch was tested with the Meta-Llama-3-8B
and Mistral-7B models (BF16 models generated with
llama-quantize from the corresponding FP32 models) on an
IBM POWER10 machine.

Signed-off-by: Shalini Salomi Bodapati <[email protected]>
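As a rough sketch of the test setup described above, the model preparation and benchmark steps might look like the following (file names and flag values are illustrative assumptions, not taken from the PR):

```shell
# Convert an FP32 GGUF model to BF16 (BF16 is a supported llama-quantize target type)
./llama-quantize model-f32.gguf model-bf16.gguf BF16

# Measure total speed S t/s across several batch sizes:
# -npp prompt tokens, -ntg generated tokens, -npl parallel sequence counts
./llama-batched-bench -m model-bf16.gguf -npp 128 -ntg 128 -npl 1,2,4,8
```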
