forked from ggml-org/llama.cpp
Commit 875e5cb
Q4 Tiled Gemm Optimization
./patch_build/bin/llama-bench -m /home/shalini/Models/Meta-Llama-3-8B/ggml-model-q4.gguf -p 64,128,256,512 -n 1
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU | 10 | pp64 | 55.26 ± 0.02 |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU | 10 | pp128 | 56.51 ± 0.01 |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU | 10 | pp256 | 55.00 ± 0.01 |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU | 10 | pp512 | 51.52 ± 0.01 |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU | 10 | tg1 | 13.57 ± 0.03 |
build: 344dc63c (6761)
./base_build/bin/llama-bench -m /home/shalini/Models/Meta-Llama-3-8B/ggml-model-q4.gguf -p 64,128,256,512 -n 1
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU | 10 | pp64 | 39.01 ± 0.19 |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU | 10 | pp128 | 38.80 ± 0.01 |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU | 10 | pp256 | 37.84 ± 0.00 |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU | 10 | pp512 | 35.70 ± 0.22 |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | CPU | 10 | tg1 | 13.58 ± 0.02 |
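The two tables can be compared directly by dividing patched throughput by base throughput for each test. The snippet below is a minimal sketch of that arithmetic; the numbers are copied from the tables above (error bars omitted), and the helper name `speedups` is made up for illustration and is not part of llama.cpp or llama-bench.

```python
# Mean throughput (t/s) copied from the llama-bench tables above.
# The "±" error bars are ignored here for simplicity.
patched = {"pp64": 55.26, "pp128": 56.51, "pp256": 55.00, "pp512": 51.52, "tg1": 13.57}
base = {"pp64": 39.01, "pp128": 38.80, "pp256": 37.84, "pp512": 35.70, "tg1": 13.58}


def speedups(patched, base):
    """Return patched/base throughput ratio per test (hypothetical helper)."""
    return {test: patched[test] / base[test] for test in base}


ratios = speedups(patched, base)
for test, ratio in ratios.items():
    print(f"{test:>6}: {ratio:.2f}x")
```

On these figures, prompt processing (pp64 through pp512) comes out roughly 1.42x to 1.46x faster with the patch, while single-token generation (tg1) is effectively unchanged.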
Signed-off-by: Shalini Salomi Bodapati <[email protected]>
1 parent 061f0ef commit 875e5cb
1 file changed, +279 -20 lines changed
- ggml/src/ggml-cpu/llamafile