
Conversation

@shalinib-ibm

This patch implements tiled GEMM for large blocks: we pack 64x64 tiles and perform the matmul on the packed data.

30 ~ 50% improvement in llama-bench and llama-batched-bench with Meta-Llama3-8B quantized models (Q4_0 and Q8_0).
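For reference, here is a minimal sketch of the tiled/packed GEMM idea in plain C, using float inputs for clarity. The actual patch works inside ggml on quantized block formats (Q4_0 / Q8_0) with SIMD kernels; only the 64x64 tile size comes from the PR description, and the function and variable names below (`gemm_tiled`, `a_pack`, `b_pack`, row-major layout) are hypothetical illustrations, not the code in this PR.

```c
/* Sketch of tiled GEMM with 64x64 packing (illustrative, not the actual patch).
 * C[M x N] += A[M x K] * B[K x N], row-major; the caller zero-initializes C. */
#include <stddef.h>
#include <string.h>

#define TILE 64

static void gemm_tiled(int M, int N, int K,
                       const float *A, const float *B, float *C) {
    float a_pack[TILE * TILE];
    float b_pack[TILE * TILE];

    for (int i0 = 0; i0 < M; i0 += TILE) {
        const int ib = (M - i0 < TILE) ? (M - i0) : TILE;
        for (int j0 = 0; j0 < N; j0 += TILE) {
            const int jb = (N - j0 < TILE) ? (N - j0) : TILE;
            for (int k0 = 0; k0 < K; k0 += TILE) {
                const int kb = (K - k0 < TILE) ? (K - k0) : TILE;

                /* pack the A tile row-major and the B tile transposed, so the
                 * innermost loop reads both packs contiguously */
                for (int i = 0; i < ib; i++)
                    memcpy(&a_pack[i * kb], &A[(i0 + i) * K + k0], kb * sizeof(float));
                for (int j = 0; j < jb; j++)
                    for (int k = 0; k < kb; k++)
                        b_pack[j * kb + k] = B[(k0 + k) * N + j0 + j];

                /* multiply the packed tiles and accumulate into C */
                for (int i = 0; i < ib; i++) {
                    for (int j = 0; j < jb; j++) {
                        float sum = 0.0f;
                        for (int k = 0; k < kb; k++)
                            sum += a_pack[i * kb + k] * b_pack[j * kb + k];
                        C[(i0 + i) * N + j0 + j] += sum;
                    }
                }
            }
        }
    }
}
```

Packing keeps each 64x64 tile resident in cache while it is reused across the inner loops, which is where the reported llama-bench gains would come from on large matmuls.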


@github-actions github-actions bot added the ggml label Oct 26, 2025
This patch implements tiled GEMM for large blocks
where we pack blocks of 64x64 and perform matmul.

30 ~ 50% improvement in llama-bench and llama-batched-bench
with Meta-Llama3-8B quantized models (Q4_0 and Q8_0).

Signed-off-by: root <[email protected]>
