Commit 875e5cb

Q4 Tiled Gemm Optimization
./patch_build/bin/llama-bench -m /home/shalini/Models/Meta-Llama-3-8B/ggml-model-q4.gguf -p 64,128,256,512 -n 1

| model         |     size | params | backend | threads |  test |          t/s |
| ------------- | -------: | -----: | ------- | ------: | ----: | -----------: |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU     |      10 |  pp64 | 55.26 ± 0.02 |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU     |      10 | pp128 | 56.51 ± 0.01 |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU     |      10 | pp256 | 55.00 ± 0.01 |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU     |      10 | pp512 | 51.52 ± 0.01 |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU     |      10 |   tg1 | 13.57 ± 0.03 |

build: 344dc63c (6761)

./base_build/bin/llama-bench -m /home/shalini/Models/Meta-Llama-3-8B/ggml-model-q4.gguf -p 64,128,256,512 -n 1

| model         |     size | params | backend | threads |  test |          t/s |
| ------------- | -------: | -----: | ------- | ------: | ----: | -----------: |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU     |      10 |  pp64 | 39.01 ± 0.19 |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU     |      10 | pp128 | 38.80 ± 0.01 |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU     |      10 | pp256 | 37.84 ± 0.00 |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU     |      10 | pp512 | 35.70 ± 0.22 |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU     |      10 |   tg1 | 13.58 ± 0.02 |

Signed-off-by: Shalini Salomi Bodapati <[email protected]>
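
The two llama-bench runs above can be compared test by test. The following is a minimal sketch, not part of the commit, that divides the patched t/s by the base t/s for each test; the dictionary names and the `speedups` helper are hypothetical, and the numbers are the mean t/s values copied from the tables (the ± spreads are ignored).

```python
# Mean throughput in tokens/s, copied from the two llama-bench tables
# in the commit message (patch_build vs. base_build).
patched = {"pp64": 55.26, "pp128": 56.51, "pp256": 55.00, "pp512": 51.52, "tg1": 13.57}
base = {"pp64": 39.01, "pp128": 38.80, "pp256": 37.84, "pp512": 35.70, "tg1": 13.58}


def speedups(patched, base):
    """Return patched/base throughput ratio for every test present in base."""
    return {test: patched[test] / base[test] for test in base}


ratios = speedups(patched, base)
for test, ratio in ratios.items():
    print(f"{test:>6}: {ratio:.2f}x")
```

On these figures the prompt-processing tests (pp64 through pp512) come out at roughly 1.42x to 1.46x, while tg1 is unchanged within the reported spread.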
1 parent 061f0ef commit 875e5cb

1 file changed: +279 -20 lines changed

0 commit comments