Commit 1583908

Update README.md
1 parent 0a28796 commit 1583908

1 file changed: +7 -7 lines changed

README.md

Lines changed: 7 additions & 7 deletions
@@ -6,7 +6,7 @@ The default behavior for CPU only operations is unchanged. When a GPU is present

 ## Intial testing results (Xeon 8592+):

-## llama-bench
+## llama-bench:
 ### No AMX
 ```
 numactl -N 2 -m 2 llama-bench -m /Qwen3-30B-A3B-Thinking-2507-Q4_0.gguf -t 32 --numa numactl -ngl 10 -nopo 1 -b 512 -ub 512 -pg 512,512 --repetitions 3
@@ -36,9 +36,9 @@ ggml_cuda_init: found 1 CUDA devices:
 | qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | CUDA | 10 | 32 | 512 | 1 | 1 | tg128 | 55.55 ± 0.26 |
 | qwen3moe 30B.A3B Q4_0 | 16.18 GiB | 30.53 B | CUDA | 10 | 32 | 512 | 1 | 1 | pp512+tg512 | 77.62 ± 0.26 |
 ```
-**PP512 + 69.62 t/s (+32.47%)**
-**TG128 + 9.88 t/s (+21.63%)**
-**PP512+TG512 + 12.35 t/s (+18.92%)**
+- **PP512 + 69.62 t/s (+32.47%)**
+- **TG128 + 9.88 t/s (+21.63%)**
+- **PP512+TG512 + 12.35 t/s (+18.92%)**


 ## CLI performance:
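
Review note on the hunk above: the llama-bench deltas can be sanity-checked against the AMX table rows visible in the diff context. This is a minimal sketch, assuming each delta is the AMX result minus the no-AMX result and that the percentage is relative to the no-AMX baseline. The baseline values are not visible in this diff and are inferred from that assumption, and the helper name `relative_gain` is illustrative. The PP512 row falls outside the diff context, so it is not checked here.

```python
def relative_gain(amx_tps: float, delta_tps: float) -> float:
    """Percent gain over the inferred no-AMX baseline (amx_tps - delta_tps)."""
    baseline = amx_tps - delta_tps  # assumption: delta = AMX - no-AMX
    return 100.0 * delta_tps / baseline


# (AMX t/s from the table rows in the diff, reported delta t/s, reported percent)
reported = {
    "tg128": (55.55, 9.88, 21.63),
    "pp512+tg512": (77.62, 12.35, 18.92),
}

for test, (amx, delta, pct) in reported.items():
    computed = relative_gain(amx, delta)
    print(f"{test}: computed {computed:.2f}%, reported {pct:.2f}%")
```

Under these assumptions the computed percentages agree with the reported ones to two decimal places.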
@@ -66,9 +66,9 @@ llama_perf_context_print: eval time = 10416.81 ms / 511 runs ( 20
 llama_perf_context_print: total time = 10670.73 ms / 516 tokens
 llama_perf_context_print: graphs reused = 508
 ```
-**Decode (generation): +8.74 t/s (+21.68%)**
-**Prompt (prefill): +11.07 t/s (+12.88%)**
-**Overall throughput: + 8.77 t/s (+21.64%)**
+- **Decode (generation): +8.74 t/s (+21.68%)**
+- **Prompt (prefill): +11.07 t/s (+12.88%)**
+- **Overall throughput: + 8.77 t/s (+21.64%)**


 ## Instructions:
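
Review note on the CLI hunk above: the decode figure can be cross-checked from the `eval time = 10416.81 ms / 511 runs` value in the hunk header. This is a rough sketch, assuming that timing line belongs to the AMX run and that the reported +8.74 t/s is measured against a no-AMX run that is not shown in this diff. The function name `tokens_per_second` is illustrative.

```python
def tokens_per_second(elapsed_ms: float, runs: int) -> float:
    """Convert a llama_perf-style 'N ms / M runs' pair into tokens per second."""
    return runs / (elapsed_ms / 1000.0)


# Values taken from the eval-time line quoted in the hunk header.
decode_tps = tokens_per_second(10416.81, 511)

# Assumption: reported delta = AMX decode rate - no-AMX decode rate.
reported_delta = 8.74
baseline_tps = decode_tps - reported_delta
gain_pct = 100.0 * reported_delta / baseline_tps

print(f"decode: {decode_tps:.2f} t/s, inferred baseline: {baseline_tps:.2f} t/s")
print(f"gain: {gain_pct:.2f}% (reported: 21.68%)")
```

If the assumptions hold, the derived gain matches the reported decode percentage to two decimal places.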
