We (the WVA team from llm-d-incubation/workload-variant-autoscaler) plan to integrate llm-d-inference-sim into our e2e testing infrastructure. The WVA relies on the following metrics to function correctly:
- `vllm:request_success_total`
- `vllm:request_prompt_tokens`
- `vllm:request_generation_tokens`
- `vllm:time_to_first_token_seconds`
- `vllm:time_per_output_token_seconds`
While PR #202 introduces the first three of these metrics, support for the crucial TTFT and TPOT metrics (`vllm:time_to_first_token_seconds` and `vllm:time_per_output_token_seconds`) is still missing.
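For context, both of these are exposed by vLLM as Prometheus histograms. A hedged sketch of what the simulator's exposition for them could look like (the `model_name` label value, bucket boundaries, and sample values below are illustrative, not vLLM's actual defaults):

```
# HELP vllm:time_to_first_token_seconds Histogram of time to first token in seconds.
# TYPE vllm:time_to_first_token_seconds histogram
vllm:time_to_first_token_seconds_bucket{model_name="example-model",le="0.1"} 42
vllm:time_to_first_token_seconds_bucket{model_name="example-model",le="+Inf"} 57
vllm:time_to_first_token_seconds_sum{model_name="example-model"} 6.8
vllm:time_to_first_token_seconds_count{model_name="example-model"} 57

# HELP vllm:time_per_output_token_seconds Histogram of time per output token in seconds.
# TYPE vllm:time_per_output_token_seconds histogram
vllm:time_per_output_token_seconds_bucket{model_name="example-model",le="0.05"} 1200
vllm:time_per_output_token_seconds_bucket{model_name="example-model",le="+Inf"} 1500
vllm:time_per_output_token_seconds_sum{model_name="example-model"} 61.3
vllm:time_per_output_token_seconds_count{model_name="example-model"} 1500
```

Matching vLLM's histogram shape (per-bucket counts plus `_sum`/`_count`) would let the WVA compute mean TTFT and TPOT from the simulator exactly as it does against real vLLM deployments.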
We ask the llm-d-inference-sim team to consider adding these two metrics as the next immediate step.
cc @WheelyMcBones, @mamy-CS