Conversation

@thad0ctor thad0ctor commented Jun 15, 2025

Minor improvements to llama-bench

New Features

  1. Separate prompt/generation timing: reports prompt processing and token generation performance as separate metrics (a sketch of the proposed output follows this list).
  2. n_threads_batch: adds an n_threads_batch parameter to the available options.
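
A rough sketch of what the split output could look like (illustrative only: the separate pp t/s and tg t/s columns are this PR's proposal, the values are placeholders, and upstream llama-bench prints a single t/s column per test row):

| model         | backend | ngl | test        | pp t/s | tg t/s |
| ------------- | ------- | --- | ----------- | ------ | ------ |
| llama 7B Q4_0 | CUDA    |  99 | pp512+tg128 |    ... |    ... |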


thad0ctor commented Jun 15, 2025

Added options below (shown in bold in the original comment; the new flag is --n-threads-batch):

./llama-bench --help
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 3 CUDA devices:
Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
Device 1: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
Device 2: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
usage: ./llama-bench [options]

options:
-h, --help
--numa <distribute|isolate|numactl> numa mode (default: disabled)
-r, --repetitions <n> number of times to repeat each test (default: 5)
--prio <-1|0|1|2|3> process/thread priority (default: 0)
--delay <0...N> (seconds) delay between each test (default: 0)
-o, --output <csv|json|jsonl|md|sql> output format printed to stdout (default: md)
-oe, --output-err <csv|json|jsonl|md|sql> output format printed to stderr (default: none)
-v, --verbose verbose output
--progress print test progress indicators

test parameters:
-m, --model <filename> (default: models/7B/ggml-model-q4_0.gguf)
-p, --n-prompt <n> (default: 512)
-n, --n-gen <n> (default: 128)
-pg <pp,tg> (default: )
-d, --n-depth <n> (default: 0)
-b, --batch-size <n> (default: 2048)
-ub, --ubatch-size <n> (default: 512)
-ctk, --cache-type-k <t> (default: f16)
-ctv, --cache-type-v <t> (default: f16)
-dt, --defrag-thold <f> (default: -1)
-t, --threads <n> (default: 24)
--n-threads-batch <n> (default: 24)
-C, --cpu-mask <hex,hex> (default: 0x0)
--cpu-strict <0|1> (default: 0)
--poll <0...100> (default: 50)
-ngl, --n-gpu-layers <n> (default: 99)
-sm, --split-mode <none|layer|row> (default: layer)
-mg, --main-gpu <i> (default: 0)
-nkvo, --no-kv-offload <0|1> (default: 0)
-fa, --flash-attn <0|1> (default: 0)
-mmp, --mmap <0|1> (default: 1)
-embd, --embeddings <0|1> (default: 0)
-ts, --tensor-split <ts0/ts1/..> (default: 0)
-ot --override-tensors <tensor name pattern>=<buffer type>;...
(default: disabled)
-nopo, --no-op-offload <0|1> (default: 0)
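
A usage sketch for the new flag (the flag itself is this PR's addition; the thread counts and model path below are placeholders):

./llama-bench -m models/7B/ggml-model-q4_0.gguf -t 8 --n-threads-batch 16
# -t sets the thread count for token generation; --n-threads-batch sets the
# thread count for prompt/batch processing, mirroring the n_threads_batch
# context parameter in llama.cpp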


thad0ctor commented Jun 15, 2025

OBE (overcome by events).


slaren commented Jun 16, 2025

llama-bench accepts ranges for the numeric parameters, e.g. to test pp from 128 to 256 in increments of 64, you can use llama-bench -p 128-256+64. How does this functionality differ?
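
For example (a minimal sketch; the model path is a placeholder):

./llama-bench -m models/7B/ggml-model-q4_0.gguf -p 128-256+64 -n 0
# runs prompt-processing tests at n_prompt = 128, 192, and 256,
# with generation tests disabled (-n 0)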


thad0ctor commented Jun 16, 2025

> llama-bench accepts ranges for the numeric parameters, e.g. to test pp from 128 to 256 in increments of 64, you can use llama-bench -p 128-256+64. How does this functionality differ?

Good catch.

I missed this detail in the documentation, so I can remove the added args. The addition of the pp and gen t/s columns in the console output is likely worth keeping, as is the n-threads-batch parameter, but the rest can go.

@thad0ctor thad0ctor marked this pull request as draft June 16, 2025 01:34
@thad0ctor thad0ctor closed this Jun 16, 2025