tools: update llama-bench to include TTFT, E2E, ITL metrics #15643
Signed-off-by: Aaron Teo <[email protected]>
This reverts commit b5b2626. Signed-off-by: Aaron Teo <[email protected]>
Hi @ggerganov @slaren, any interest in having these metrics in
I am not convinced that this is necessary. It doesn't really fit all that well into the llama-bench model and will only produce meaningful results with some types of tests.
OTOH, you can already calculate all of these values if you formulate the tests properly, e.g. TTFT can be estimated with -pg <n_prompt>,1, E2E with any -pg test, and ITL with -n.
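As a rough sketch of that arithmetic, assuming the total wall-clock time of each of the three tests has already been measured. The function name, parameter names, and result keys are hypothetical and not part of llama-bench, and the ITL here is only the average time per generated token:

```python
def estimate_metrics(pg1_seconds: float, pg_seconds: float,
                     tg_seconds: float, n_gen: int) -> dict:
    """Estimate latency metrics from three separately timed tests.

    pg1_seconds: duration of a prompt + 1 generated token test (-pg <n_prompt>,1)
    pg_seconds:  duration of a prompt + generation test (any -pg test)
    tg_seconds:  duration of a generation-only test of n_gen tokens (-n)
    n_gen:       number of tokens generated in the generation-only test
    """
    if n_gen <= 0:
        raise ValueError("n_gen must be positive")
    return {
        # Prompt processing plus one generated token approximates TTFT.
        "ttft_s": pg1_seconds,
        # A full prompt + generation run approximates end-to-end latency.
        "e2e_s": pg_seconds,
        # Average time per generated token approximates inter-token latency.
        "itl_s": tg_seconds / n_gen,
    }
```

Because the three values come from separate runs, they are only comparable when run-to-run variance is low.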
It may be more appropriate to add a Python or shell script that runs
Hmm yeah, that's more reasonable. I can move the metrics into a Python script, but I would prefer that we at least still log the TTFT within This is because, at least on IBM Z & LinuxONE, most of our users are running Type-2 virtualisation, and results can vary widely from one benchmark run to the next, so a TTFT taken from an earlier, separate run would no longer be accurate.
Let me know if it's okay to keep the
This still seems too specific. However, an option to export the timing of every generated token in the JSON output formats could be useful. It would also allow other use cases, such as generating very detailed graphs of performance vs context depth.
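To illustrate how per-token timings could cover all three metrics from a single run, here is a minimal sketch. It assumes a hypothetical JSON record with a request start time and one completion timestamp per generated token. The field names `t_start_s` and `t_token_s` are invented for this example and are not an existing llama-bench output format:

```python
import json


def latency_metrics(start_s: float, token_times_s: list) -> dict:
    """Derive TTFT, E2E, and mean ITL from per-token completion timestamps."""
    if not token_times_s:
        raise ValueError("at least one token timestamp is required")
    # Gaps between consecutive tokens; empty when only one token was generated.
    gaps = [b - a for a, b in zip(token_times_s, token_times_s[1:])]
    return {
        "ttft_s": token_times_s[0] - start_s,   # time to first token
        "e2e_s": token_times_s[-1] - start_s,   # end-to-end latency
        "itl_s": sum(gaps) / len(gaps) if gaps else 0.0,  # mean inter-token latency
        "gaps_s": gaps,  # per-token latencies, e.g. for plotting against context depth
    }


# Hypothetical exported record (field names are assumptions for illustration).
record = json.loads('{"t_start_s": 0.0, "t_token_s": [0.5, 0.75, 1.0, 1.25]}')
metrics = latency_metrics(record["t_start_s"], record["t_token_s"])
```

Since every value comes from the same run, this avoids mixing measurements from runs that may differ widely, which was the concern raised for virtualised environments.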
Introduces the Time to First Token (TTFT), End-to-End Latency (E2E), and Inter-token Latency (ITL) metrics. Updates the README.md to explain the calculation as well.