
Conversation

@mayabar (Collaborator) commented Nov 3, 2025

Currently vllm:max_num_generation_tokens reports the same values as vllm:request_generation_tokens, since a response always contains only one choice.

Fixes #243

… we never return responses with more than one choice, so the implementation is basic. Once the 'n' request property is supported, this needs to be changed to compute the real maximum. Added support in fake metrics; tests added too.
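The idea in the description can be sketched as follows. This is a minimal, hypothetical illustration (the function and type names are not the simulator's actual identifiers): with a single choice per response, the maximum generation token count across choices trivially equals the request's generation token count, which is why the two metrics currently coincide; once `n > 1` is supported, the metric would record the largest per-choice count.

```go
package main

import "fmt"

// maxGenerationTokens returns the maximum number of generated tokens
// across all choices of one response. With exactly one choice (the
// current simulator behavior), this equals the request's generation
// token count, so vllm:max_num_generation_tokens and
// vllm:request_generation_tokens report the same values.
// The slice-of-ints representation is an assumption for illustration.
func maxGenerationTokens(choiceTokens []int) int {
	max := 0
	for _, t := range choiceTokens {
		if t > max {
			max = t
		}
	}
	return max
}

func main() {
	// Single choice: the maximum is the only value.
	fmt.Println(maxGenerationTokens([]int{42}))

	// Hypothetical request with n=3 choices: the metric would
	// observe the largest of the per-choice counts.
	fmt.Println(maxGenerationTokens([]int{17, 42, 23}))
}
```

In a real implementation, the returned value would be observed into the `vllm:max_num_generation_tokens` histogram once per finished request, rather than once per choice.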

Signed-off-by: Maya Barnea <[email protected]>
@mayabar mayabar requested a review from irar2 November 3, 2025 07:17
…sage and fix, fix arguments in invalid configuration tests.

Fix validation of ttft and tpot fake definitions.

Signed-off-by: Maya Barnea <[email protected]>
…x invalid lora test in config, add missing comments

Signed-off-by: Maya Barnea <[email protected]>
@mayabar mayabar requested a review from irar2 November 4, 2025 09:11
Signed-off-by: Maya Barnea <[email protected]>
@irar2 (Collaborator) commented Nov 4, 2025

/lgtm
/approve

@github-actions github-actions bot added the lgtm label Nov 4, 2025
@github-actions github-actions bot merged commit e1e27ea into llm-d:main Nov 4, 2025
4 checks passed

Linked issue: Add vllm:request_max_num_generation_tokens metric
