@Amirresm Amirresm commented Nov 8, 2025

This PR introduces two fixes to the inter-token latency (ITL) metric:

  • Uses a consistent output token count to calculate ITL
    • fixes an issue where streamed chunks return more than one token (e.g. with speculative decoding)
  • Excludes the time taken to generate the first token (TTFT) from the ITL calculation

This fixes the metric in niche use cases (e.g. with vLLM's draft models or tool responses) and aligns the implementation with other analyzers such as NVIDIA GenAI-Perf.
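The two fixes above can be sketched as follows. This is a minimal illustration, not the PR's actual implementation; the names (`request_start`, `chunk_times`, `total_output_tokens`) are hypothetical, and it assumes the total output token count comes from the response's usage stats rather than from counting streamed chunks:

```python
def inter_token_latency(request_start, chunk_times, total_output_tokens):
    """Sketch of ITL = (total latency - TTFT) / (output tokens - 1).

    chunk_times: arrival timestamps of streamed chunks, in order.
    total_output_tokens: authoritative token count (not the chunk count,
    since a chunk may carry several tokens, e.g. with speculative decoding).
    """
    if total_output_tokens <= 1 or not chunk_times:
        return 0.0  # no inter-token gaps to measure
    ttft = chunk_times[0] - request_start          # time to first token
    total_latency = chunk_times[-1] - request_start
    # Exclude TTFT and average over the (n - 1) gaps between tokens.
    return (total_latency - ttft) / (total_output_tokens - 1)
```

For example, with a request starting at t=0, chunks arriving at t=1, 2, and 3 seconds, and 5 total output tokens, the 2 seconds after the first token are spread over 4 inter-token gaps, giving 0.5 s per token.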
