@@ -20,10 +20,20 @@ Performance testing tool for llama.cpp.
 ## Syntax

 ```
-usage: ./llama-bench [options]
+usage: llama-bench [options]

 options:
   -h, --help
+  --numa <distribute|isolate|numactl>         numa mode (default: disabled)
+  -r, --repetitions <n>                       number of times to repeat each test (default: 5)
+  --prio <0|1|2|3>                            process/thread priority (default: 0)
+  --delay <0...N> (seconds)                   delay between each test (default: 0)
+  -o, --output <csv|json|jsonl|md|sql>        output format printed to stdout (default: md)
+  -oe, --output-err <csv|json|jsonl|md|sql>   output format printed to stderr (default: none)
+  -v, --verbose                               verbose output
+  --progress                                  print test progress indicators
+
+test parameters:
   -m, --model <filename>                      (default: models/7B/ggml-model-q4_0.gguf)
   -p, --n-prompt <n>                          (default: 512)
   -n, --n-gen <n>                             (default: 128)
@@ -33,7 +43,7 @@ options:
   -ub, --ubatch-size <n>                      (default: 512)
   -ctk, --cache-type-k <t>                    (default: f16)
   -ctv, --cache-type-v <t>                    (default: f16)
-  -t, --threads <n>                           (default: 8)
+  -t, --threads <n>                           (default: 16)
   -C, --cpu-mask <hex,hex>                    (default: 0x0)
   --cpu-strict <0|1>                          (default: 0)
   --poll <0...100>                            (default: 50)
@@ -44,17 +54,15 @@ options:
   -nkvo, --no-kv-offload <0|1>                (default: 0)
   -fa, --flash-attn <0|1>                     (default: 0)
   -mmp, --mmap <0|1>                          (default: 1)
-  --numa <distribute|isolate|numactl>         (default: disabled)
   -embd, --embeddings <0|1>                   (default: 0)
   -ts, --tensor-split <ts0/ts1/..>            (default: 0)
-  -r, --repetitions <n>                       (default: 5)
-  --prio <0|1|2|3>                            (default: 0)
-  --delay <0...N> (seconds)                   (default: 0)
-  -o, --output <csv|json|jsonl|md|sql>        (default: md)
-  -oe, --output-err <csv|json|jsonl|md|sql>   (default: none)
-  -v, --verbose                               (default: 0)
-
-Multiple values can be given for each parameter by separating them with ',' or by specifying the parameter multiple times.
+  -ot --override-tensors <tensor name pattern>=<buffer type>;...
+                                              (default: disabled)
+  -nopo, --no-op-offload <0|1>                (default: 0)
+
+Multiple values can be given for each parameter by separating them with ','
+or by specifying the parameter multiple times. Ranges can be given as
+'start-end' or 'start-end+step' or 'start-end*mult'.
 ```

 llama-bench can perform three types of tests:
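Note on the range syntax added to the help text ('start-end', 'start-end+step', 'start-end*mult'): the Python sketch below shows one way such a value could expand into a list of test parameters. It is an illustration only, not llama-bench's parser. The names `expand_range` and `expand_values` are hypothetical, and the inclusive end bound, the default step of 1 for a bare 'start-end', and the rejection of non-advancing ranges are assumptions that the help text does not confirm.

```python
from __future__ import annotations


def expand_range(spec: str) -> list[int]:
    """Expand 'start-end', 'start-end+step' or 'start-end*mult' into integers.

    Assumptions (not stated in the help text): the end bound is inclusive,
    a bare 'start-end' advances by 1, and a plain integer expands to itself.
    """
    if "-" not in spec:
        return [int(spec)]

    start_text, rest = spec.split("-", 1)
    op, step = "+", 1  # assumed default for a bare 'start-end'
    for candidate in ("+", "*"):
        if candidate in rest:
            rest, step_text = rest.split(candidate, 1)
            op, step = candidate, int(step_text)
            break

    start, end = int(start_text), int(rest)
    values = []
    current = start
    while current <= end:
        values.append(current)
        following = current + step if op == "+" else current * step
        if following <= current:
            # Guard against ranges that never advance, e.g. '0-8*2' or '1-8+0'.
            raise ValueError(f"range does not advance: {spec!r}")
        current = following
    return values


def expand_values(arg: str) -> list[int]:
    """Expand a comma-separated argument whose items may be ranges."""
    values = []
    for item in arg.split(","):
        values.extend(expand_range(item.strip()))
    return values


if __name__ == "__main__":
    for example in ("1-4", "0-8+4", "128-512*2", "512,1024-4096*2"):
        print(example, "->", expand_values(example))
```

Under these assumptions, a value such as `128-512*2` would stand for the same list as `128,256,512`, which is why the range forms fit alongside the existing comma-separated syntax.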