Required Metrics for Workload Variant Autoscaler (WVA) Integration with llm-d-inference-sim #211

@vishakha-ramani

Description

We (the WVA team from llm-d-incubation/workload-variant-autoscaler) plan to integrate llm-d-inference-sim into our e2e testing infrastructure. The WVA relies on the following metrics to function correctly:

  1. vllm:request_success_total
  2. vllm:request_prompt_tokens
  3. vllm:request_generation_tokens
  4. vllm:time_to_first_token_seconds
  5. vllm:time_per_output_token_seconds

While PR #202 introduces the first three of these metrics, support for the crucial TTFT and TPOT metrics (vllm:time_to_first_token_seconds and vllm:time_per_output_token_seconds) is still missing.
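For context, both missing metrics are per-request latency histograms: TTFT is the delay from request arrival to the first generated token, and TPOT covers the inter-token latencies of subsequent tokens. A minimal sketch of how the simulator could derive the values to observe into these histograms (hypothetical helper, not existing llm-d-inference-sim code):

```python
def ttft_and_tpot(arrival: float, token_times: list[float]) -> tuple[float, list[float]]:
    """Derive TTFT and inter-token latencies from token emission timestamps.

    arrival:     wall-clock time the request arrived (seconds).
    token_times: wall-clock times at which each output token was emitted.
    (Hypothetical helper for illustration only.)
    """
    # Time to first token: observed into vllm:time_to_first_token_seconds
    ttft = token_times[0] - arrival
    # Inter-token gaps: each observed into vllm:time_per_output_token_seconds
    tpot = [b - a for a, b in zip(token_times, token_times[1:])]
    return ttft, tpot
```

For example, a request arriving at t=0.0 whose tokens are emitted at t=0.5, 0.6, and 0.8 would yield a TTFT of 0.5 s and two TPOT observations of roughly 0.1 s and 0.2 s.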

We request that the llm-d-inference-sim team consider adding these two metrics as the next immediate step.

cc @WheelyMcBones, @mamy-CS
