@shalinib-ibm (Owner)

./patch_build/bin/llama-bench -m /home/shalini/Models/Meta-Llama-3-8B/ggml-model-q4.gguf -p 64,128,256,512 -n 1
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| llama 8B Q4_0                  |   4.33 GiB |     8.03 B | CPU        |      10 |            pp64 |         55.26 ± 0.02 |
| llama 8B Q4_0                  |   4.33 GiB |     8.03 B | CPU        |      10 |           pp128 |         56.51 ± 0.01 |
| llama 8B Q4_0                  |   4.33 GiB |     8.03 B | CPU        |      10 |           pp256 |         55.00 ± 0.01 |
| llama 8B Q4_0                  |   4.33 GiB |     8.03 B | CPU        |      10 |           pp512 |         51.52 ± 0.01 |
| llama 8B Q4_0                  |   4.33 GiB |     8.03 B | CPU        |      10 |             tg1 |         13.57 ± 0.03 |

build: 344dc63c (6761)
./base_build/bin/llama-bench -m /home/shalini/Models/Meta-Llama-3-8B/ggml-model-q4.gguf -p 64,128,256,512 -n 1
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| llama 8B Q4_0                  |   4.33 GiB |     8.03 B | CPU        |      10 |            pp64 |         39.01 ± 0.19 |
| llama 8B Q4_0                  |   4.33 GiB |     8.03 B | CPU        |      10 |           pp128 |         38.80 ± 0.01 |
| llama 8B Q4_0                  |   4.33 GiB |     8.03 B | CPU        |      10 |           pp256 |         37.84 ± 0.00 |
| llama 8B Q4_0                  |   4.33 GiB |     8.03 B | CPU        |      10 |           pp512 |         35.70 ± 0.22 |
| llama 8B Q4_0                  |   4.33 GiB |     8.03 B | CPU        |      10 |             tg1 |         13.58 ± 0.02 |
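For reference, the relative speedup implied by the two runs can be computed directly from the t/s values in the tables above (a quick sketch for illustration, not part of the PR itself; the numbers are copied verbatim from the benchmark output):

```python
# Throughput (t/s) from the two llama-bench runs above:
# patch_build vs. base_build, Meta-Llama-3-8B Q4_0 on CPU, 10 threads.
patch = {"pp64": 55.26, "pp128": 56.51, "pp256": 55.00, "pp512": 51.52, "tg1": 13.57}
base  = {"pp64": 39.01, "pp128": 38.80, "pp256": 37.84, "pp512": 35.70, "tg1": 13.58}

# Speedup = patched throughput / baseline throughput for each test.
speedup = {test: patch[test] / base[test] for test in patch}
for test, s in speedup.items():
    print(f"{test}: {s:.2f}x")
# pp64: 1.42x, pp128: 1.46x, pp256: 1.45x, pp512: 1.44x, tg1: 1.00x
```

So prompt processing improves by roughly 1.4x across the tested prompt lengths, while single-token generation (tg1) is unchanged within the reported error bars.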

Signed-off-by: Shalini Salomi Bodapati <[email protected]>
github-actions bot added the ggml label on Oct 23, 2025