@@ -20,10 +20,20 @@ Performance testing tool for llama.cpp.
 ## Syntax
 
 ```
-usage: ./llama-bench [options]
+usage: llama-bench [options]
 
 options:
   -h, --help
+  --numa <distribute|isolate|numactl>       numa mode (default: disabled)
+  -r, --repetitions <n>                     number of times to repeat each test (default: 5)
+  --prio <0|1|2|3>                          process/thread priority (default: 0)
+  --delay <0...N> (seconds)                 delay between each test (default: 0)
+  -o, --output <csv|json|jsonl|md|sql>      output format printed to stdout (default: md)
+  -oe, --output-err <csv|json|jsonl|md|sql> output format printed to stderr (default: none)
+  -v, --verbose                             verbose output
+  --progress                                print test progress indicators
+
+test parameters:
   -m, --model <filename>                    (default: models/7B/ggml-model-q4_0.gguf)
   -p, --n-prompt <n>                        (default: 512)
   -n, --n-gen <n>                           (default: 128)
@@ -33,7 +43,7 @@ options:
   -ub, --ubatch-size <n>                    (default: 512)
   -ctk, --cache-type-k <t>                  (default: f16)
   -ctv, --cache-type-v <t>                  (default: f16)
-  -t, --threads <n>                         (default: 8)
+  -t, --threads <n>                         (default: 16)
   -C, --cpu-mask <hex,hex>                  (default: 0x0)
   --cpu-strict <0|1>                        (default: 0)
   --poll <0...100>                          (default: 50)
@@ -44,17 +54,15 @@ options:
   -nkvo, --no-kv-offload <0|1>              (default: 0)
   -fa, --flash-attn <0|1>                   (default: 0)
   -mmp, --mmap <0|1>                        (default: 1)
-  --numa <distribute|isolate|numactl>       (default: disabled)
   -embd, --embeddings <0|1>                 (default: 0)
   -ts, --tensor-split <ts0/ts1/..>          (default: 0)
-  -r, --repetitions <n>                     (default: 5)
-  --prio <0|1|2|3>                          (default: 0)
-  --delay <0...N> (seconds)                 (default: 0)
-  -o, --output <csv|json|jsonl|md|sql>      (default: md)
-  -oe, --output-err <csv|json|jsonl|md|sql> (default: none)
-  -v, --verbose                             (default: 0)
-
-Multiple values can be given for each parameter by separating them with ',' or by specifying the parameter multiple times.
+  -ot --override-tensors <tensor name pattern>=<buffer type>;...
+                                            (default: disabled)
+  -nopo, --no-op-offload <0|1>              (default: 0)
+
+Multiple values can be given for each parameter by separating them with ','
+or by specifying the parameter multiple times. Ranges can be given as
+'start-end' or 'start-end+step' or 'start-end*mult'.
 ```
 
 llama-bench can perform three types of tests:
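
The new help text introduces a range syntax ('start-end', 'start-end+step', 'start-end*mult') next to comma-separated values. The Python sketch below shows one way such a value string could expand into a list of integers. It is not llama-bench's own parser: the helper name `expand_values`, the inclusive end point, and the default step of 1 are assumptions made for illustration only.

```python
import re

# Hypothetical helper for illustration; not taken from llama-bench.
# Matches 'start-end', 'start-end+step' and 'start-end*mult'.
_RANGE = re.compile(r"^(\d+)-(\d+)(?:([+*])(\d+))?$")


def expand_values(spec: str) -> list[int]:
    """Expand a comma-separated value string, including ranges, into integers.

    Assumptions: end points are inclusive, and a bare 'start-end'
    range advances by 1.
    """
    values: list[int] = []
    for item in spec.split(","):
        item = item.strip()
        match = _RANGE.match(item)
        if match is None:
            # Plain single value; int() raises ValueError on bad input.
            values.append(int(item))
            continue

        start, end = int(match.group(1)), int(match.group(2))
        op = match.group(3) or "+"
        step = int(match.group(4)) if match.group(4) else 1

        # Reject steps that would never reach the end of the range.
        if op == "+" and step < 1:
            raise ValueError(f"additive step must be >= 1: {item!r}")
        if op == "*" and (step < 2 or start < 1):
            raise ValueError(f"multiplicative range needs start >= 1 and mult >= 2: {item!r}")

        current = start
        while current <= end:
            values.append(current)
            current = current + step if op == "+" else current * step
    return values


if __name__ == "__main__":
    print(expand_values("128-512*2"))
    print(expand_values("0-16+8,64"))
```

Under these assumptions, a value such as `128-512*2` would stand for the three values 128, 256 and 512. Whether llama-bench treats the end point as inclusive should be checked against its actual behavior.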