We (the WVA team from llm-d-incubation/workload-variant-autoscaler) plan to integrate llm-d-inference-sim into our e2e testing infrastructure. The WVA relies on the following metrics to function correctly:
- `vllm:request_success_total`
- `vllm:request_prompt_tokens`
- `vllm:request_generation_tokens`
- `vllm:time_to_first_token_seconds`
- `vllm:time_per_output_token_seconds`
While PR #202 introduces the first three of these metrics, support for the crucial TTFT and TPOT metrics (`vllm:time_to_first_token_seconds` and `vllm:time_per_output_token_seconds`) is still missing.
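For context, both of these are exposed by vLLM as Prometheus histograms. A hedged sketch of what the simulator's exposition for them could look like (the `model_name` label value, bucket boundaries, and sample values below are illustrative, not vLLM's actual defaults):

```
# HELP vllm:time_to_first_token_seconds Histogram of time to first token in seconds.
# TYPE vllm:time_to_first_token_seconds histogram
vllm:time_to_first_token_seconds_bucket{model_name="example-model",le="0.1"} 42
vllm:time_to_first_token_seconds_bucket{model_name="example-model",le="+Inf"} 57
vllm:time_to_first_token_seconds_sum{model_name="example-model"} 6.8
vllm:time_to_first_token_seconds_count{model_name="example-model"} 57

# HELP vllm:time_per_output_token_seconds Histogram of time per output token in seconds.
# TYPE vllm:time_per_output_token_seconds histogram
vllm:time_per_output_token_seconds_bucket{model_name="example-model",le="0.05"} 1200
vllm:time_per_output_token_seconds_bucket{model_name="example-model",le="+Inf"} 1500
vllm:time_per_output_token_seconds_sum{model_name="example-model"} 61.3
vllm:time_per_output_token_seconds_count{model_name="example-model"} 1500
```

Matching vLLM's histogram shape (per-bucket counts plus `_sum`/`_count`) would let the WVA compute mean TTFT and TPOT from the simulator exactly as it does against real vLLM deployments.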
We ask the llm-d-inference-sim team to consider adding these two metrics as the next immediate step.
cc @WheelyMcBones, @mamy-CS