@shalinib-ibm (Owner)

./patch_build/bin/llama-bench -m /home/shalini/Models/Meta-Llama-3-8B/ggml-model-q4.gguf -p 64,128,256,512 -n 1
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| llama 8B Q4_0                  |   4.33 GiB |     8.03 B | CPU        |      10 |            pp64 |         55.26 ± 0.02 |
| llama 8B Q4_0                  |   4.33 GiB |     8.03 B | CPU        |      10 |           pp128 |         56.51 ± 0.01 |
| llama 8B Q4_0                  |   4.33 GiB |     8.03 B | CPU        |      10 |           pp256 |         55.00 ± 0.01 |
| llama 8B Q4_0                  |   4.33 GiB |     8.03 B | CPU        |      10 |           pp512 |         51.52 ± 0.01 |
| llama 8B Q4_0                  |   4.33 GiB |     8.03 B | CPU        |      10 |             tg1 |         13.57 ± 0.03 |

build: 344dc63c (6761)
./base_build/bin/llama-bench -m /home/shalini/Models/Meta-Llama-3-8B/ggml-model-q4.gguf -p 64,128,256,512 -n 1
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| llama 8B Q4_0                  |   4.33 GiB |     8.03 B | CPU        |      10 |            pp64 |         39.01 ± 0.19 |
| llama 8B Q4_0                  |   4.33 GiB |     8.03 B | CPU        |      10 |           pp128 |         38.80 ± 0.01 |
| llama 8B Q4_0                  |   4.33 GiB |     8.03 B | CPU        |      10 |           pp256 |         37.84 ± 0.00 |
| llama 8B Q4_0                  |   4.33 GiB |     8.03 B | CPU        |      10 |           pp512 |         35.70 ± 0.22 |
| llama 8B Q4_0                  |   4.33 GiB |     8.03 B | CPU        |      10 |             tg1 |         13.58 ± 0.02 |
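For reference, the relative speedup implied by the two runs can be computed directly from the t/s values in the tables above (a quick sketch for illustration, not part of the PR itself; the numbers are copied verbatim from the benchmark output):

```python
# Throughput (t/s) from the two llama-bench runs above:
# patch_build vs. base_build, Meta-Llama-3-8B Q4_0 on CPU, 10 threads.
patch = {"pp64": 55.26, "pp128": 56.51, "pp256": 55.00, "pp512": 51.52, "tg1": 13.57}
base  = {"pp64": 39.01, "pp128": 38.80, "pp256": 37.84, "pp512": 35.70, "tg1": 13.58}

# Speedup = patched throughput / baseline throughput for each test.
speedup = {test: patch[test] / base[test] for test in patch}
for test, s in speedup.items():
    print(f"{test}: {s:.2f}x")
# pp64: 1.42x, pp128: 1.46x, pp256: 1.45x, pp512: 1.44x, tg1: 1.00x
```

So prompt processing improves by roughly 1.4x across the tested prompt lengths, while single-token generation (tg1) is unchanged within the reported error bars.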

Signed-off-by: Shalini Salomi Bodapati <[email protected]>
github-actions bot added the ggml label on Oct 23, 2025