Q4/Q8 Tiled Gemm Optimization. #22

shalinib-ibm · 2025-10-26T16:17:06Z

This patch implements tiled GEMM for large blocks, where we pack blocks of 64x64 and perform matmul.

30-50% improvement in llama-bench and llama-batched-bench with Meta-Llama3-8B quantized models (Q4_0 and Q8_0).
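For reviewers unfamiliar with the approach, here is a minimal sketch of the pack-then-multiply idea described above. It is not the kernel from this patch: it uses plain `float` instead of Q4_0/Q8_0 blocks, has no SIMD, and the names `TILE`, `pack_tile` and `tiled_gemm` are hypothetical, chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile edge, matching the 64x64 blocks mentioned in the description.
constexpr std::size_t TILE = 64;

// Copy a (rows x cols) sub-block of a row-major matrix with leading dimension
// `ld` into a contiguous TILE x TILE buffer, zero-padding the unused part.
static void pack_tile(const float* src, std::size_t ld,
                      std::size_t rows, std::size_t cols, float* dst) {
    for (std::size_t r = 0; r < TILE; ++r)
        for (std::size_t c = 0; c < TILE; ++c)
            dst[r * TILE + c] = (r < rows && c < cols) ? src[r * ld + c] : 0.0f;
}

// C (m x n) = A (m x k) * B (k x n), all row-major.
// Assumption: float data; the real patch operates on quantized blocks.
std::vector<float> tiled_gemm(const std::vector<float>& A,
                              const std::vector<float>& B,
                              std::size_t m, std::size_t n, std::size_t k) {
    assert(A.size() == m * k && B.size() == k * n);
    std::vector<float> C(m * n, 0.0f);
    std::vector<float> a_tile(TILE * TILE), b_tile(TILE * TILE);

    for (std::size_t i0 = 0; i0 < m; i0 += TILE) {
        const std::size_t mi = std::min(TILE, m - i0);
        for (std::size_t p0 = 0; p0 < k; p0 += TILE) {
            const std::size_t kp = std::min(TILE, k - p0);
            // Pack the A tile once and reuse it across all column tiles of B.
            pack_tile(&A[i0 * k + p0], k, mi, kp, a_tile.data());
            for (std::size_t j0 = 0; j0 < n; j0 += TILE) {
                const std::size_t nj = std::min(TILE, n - j0);
                pack_tile(&B[p0 * n + j0], n, kp, nj, b_tile.data());
                // Multiply the two packed tiles and accumulate into C.
                for (std::size_t i = 0; i < mi; ++i)
                    for (std::size_t p = 0; p < kp; ++p) {
                        const float a = a_tile[i * TILE + p];
                        for (std::size_t j = 0; j < nj; ++j)
                            C[(i0 + i) * n + (j0 + j)] += a * b_tile[p * TILE + j];
                    }
            }
        }
    }
    return C;
}
```

Packing makes each tile contiguous in memory, so the inner loops walk unit-stride data that fits in cache. That locality is the usual motivation for tiling; the measured gains reported above come from the actual patch, not from this sketch.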


This patch implements tiled GEMM for large blocks, where we pack blocks of 64x64 and perform matmul. 30-50% improvement in llama-bench and llama-batched-bench with Meta-Llama3-8B quantized models (Q4_0 and Q8_0). Signed-off-by: root <[email protected]>

github-actions bot added the ggml label Oct 26, 2025

Q4/Q8 Tiled Gemm Optimization.

e16fee0


shalinib-ibm force-pushed the q4_q8_llama_opt branch from c726070 to e16fee0 on October 28, 2025 13:32
