Conversation

@taronaeo
Collaborator

Introduces the Time to First Token (TTFT), End-to-End Latency (E2E), and Inter-token Latency (ITL) metrics. Updates the README.md to explain the calculation as well.

@taronaeo
Collaborator Author

taronaeo commented Sep 2, 2025

Hi @ggerganov @slaren, any interest in having these metrics in llama-bench? :)

Member

@slaren left a comment


I am not convinced that this is necessary. It doesn't really fit all that well into the llama-bench model and will only produce meaningful results with some types of tests.

OTOH, you can already calculate all of these values if you formulate the tests properly, e.g. TTFT can be estimated with -pg <n_prompt>,1, E2E with any -pg test, and ITL with -n.
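As a rough sketch of that approach, the metrics can be derived from llama-bench's reported throughput figures. The function below is illustrative only: the names `pp_tps`/`tg_tps` and the mapping from the prompt-processing and token-generation speeds to these metrics are my assumptions, not part of llama-bench.

```python
def metrics_from_llama_bench(pp_tps, tg_tps, n_prompt, n_gen):
    """Estimate latency metrics from llama-bench throughput figures.

    pp_tps: prompt-processing speed in tokens/s
    tg_tps: token-generation speed in tokens/s
    n_prompt, n_gen: the -pg <n_prompt>,<n_gen> test parameters
    """
    # TTFT ~ time to process the prompt plus one generated token,
    # i.e. what a -pg <n_prompt>,1 run measures end to end.
    ttft_ms = (n_prompt / pp_tps + 1.0 / tg_tps) * 1000.0
    # ITL ~ mean gap between generated tokens, the inverse of tg speed.
    itl_ms = 1000.0 / tg_tps
    # E2E ~ total wall time of the full -pg run.
    e2e_ms = (n_prompt / pp_tps + n_gen / tg_tps) * 1000.0
    return ttft_ms, itl_ms, e2e_ms
```

For example, with 1000 tokens/s prompt processing and 100 tokens/s generation on a `-pg 512,128` test, this gives a TTFT of 522 ms, an ITL of 10 ms, and an E2E of 1792 ms.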

@slaren
Member

slaren commented Sep 3, 2025

It may be more appropriate to add a python or shell script that runs llama-bench with the right tests, and calculates these values.

@taronaeo
Collaborator Author

taronaeo commented Sep 4, 2025

It may be more appropriate to add a python or shell script that runs llama-bench with the right tests, and calculates these values.

Hmm, yeah, that's more reasonable. I can move the metrics into a Python script, but I would prefer that we at least still log the TTFT within llama-bench so that we don't have to do 2 benchmark runs (i.e., -pg 512,1 and -pg 512,128) to generate the metrics.

This is because, at least on IBM Z & LinuxONE, most of our users are running Type-2 virtualisation, where results can vary widely between benchmark runs. By the time the second run completes, the TTFT measured in the first run may no longer be accurate.

@taronaeo
Collaborator Author

taronaeo commented Sep 4, 2025

Let me know if it's okay to keep the samples_ttft_ns/samples_ttft_ms within llama-bench to avoid the data inaccuracy I mentioned above.

@slaren
Member

slaren commented Sep 4, 2025

Let me know if it's okay to keep the samples_ttft_ns/samples_ttft_ms within llama-bench to avoid the data inaccuracy I mentioned above.

This still seems too specific. However, having an option to export the timing of every generated token in the JSON format could be useful. It would also enable other use cases, such as generating very detailed graphs of performance vs context depth.
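If such an export existed, deriving the latency metrics from it would be simple post-processing. A minimal sketch, assuming a hypothetical export of nanosecond timestamps for each generated token, measured relative to the start of the request (the format is my assumption, not an existing llama-bench feature):

```python
def metrics_from_token_times(token_times_ns):
    """Derive latency metrics from per-token timestamps.

    token_times_ns: nanosecond timestamp of each generated token,
    relative to request start (hypothetical export format).
    """
    # Time to first token is simply the first timestamp.
    ttft_ms = token_times_ns[0] / 1e6
    # Inter-token latency is the mean gap between consecutive tokens.
    gaps = [b - a for a, b in zip(token_times_ns, token_times_ns[1:])]
    itl_ms = sum(gaps) / len(gaps) / 1e6
    # End-to-end latency is the timestamp of the last token.
    e2e_ms = token_times_ns[-1] / 1e6
    return ttft_ms, itl_ms, e2e_ms
```

The same per-token data could also be bucketed by position to plot performance against context depth, as suggested above.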
