
Support more Prometheus metrics #191

@irar2

Description
Support the following vLLM Prometheus metrics:

vllm:e2e_request_latency_seconds
vllm:request_queue_time_seconds
vllm:request_inference_time_seconds
vllm:request_prefill_time_seconds
vllm:request_decode_time_seconds
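
A minimal registration sketch for these histograms, assuming a Go implementation on prometheus/client_golang; the package layout, helper names, and bucket boundaries below are illustrative, not taken from vLLM:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// Illustrative bucket layout; the real vLLM histograms define their own
// per-metric bucket boundaries.
var latencyBuckets = []float64{0.1, 0.5, 1, 2.5, 5, 10, 30, 60}

func newLatencyHistogram(name, help string) prometheus.Histogram {
	return prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    name,
		Help:    help,
		Buckets: latencyBuckets,
	})
}

var (
	e2eLatency    = newLatencyHistogram("vllm:e2e_request_latency_seconds", "End-to-end request latency in seconds.")
	queueTime     = newLatencyHistogram("vllm:request_queue_time_seconds", "Time the request spent waiting in the queue.")
	inferenceTime = newLatencyHistogram("vllm:request_inference_time_seconds", "Total inference time for the request.")
	prefillTime   = newLatencyHistogram("vllm:request_prefill_time_seconds", "Time spent in the prefill phase.")
	decodeTime    = newLatencyHistogram("vllm:request_decode_time_seconds", "Time spent in the decode phase.")
)

// RegisterLatencyMetrics attaches the histograms to a registry; callers
// then record observations, e.g. e2eLatency.Observe(elapsed.Seconds()).
func RegisterLatencyMetrics(reg prometheus.Registerer) {
	reg.MustRegister(e2eLatency, queueTime, inferenceTime, prefillTime, decodeTime)
}
```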

vllm:request_params_max_tokens - What should be reported if max_tokens is not set? What about max_completion_tokens?
vllm:request_max_num_generation_tokens - This is the minimum of (max-model-len - prompt length) and max_tokens, if the latter is set; see the sketch below.
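
Spelled out as code, with hypothetical parameter names (maxTokens is a pointer only to model its optionality):

```go
package metrics

// maxNumGenerationTokens computes the value to observe for
// vllm:request_max_num_generation_tokens: the space left in the context
// window (maxModelLen - promptLen), capped by max_tokens when the
// request sets it. All names here are illustrative.
func maxNumGenerationTokens(maxModelLen, promptLen int, maxTokens *int) int {
	n := maxModelLen - promptLen
	if maxTokens != nil && *maxTokens < n {
		n = *maxTokens
	}
	return n
}
```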

vllm:request_success_total - Should be labeled with the finish reason (see the counter sketch below).
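
A sketch of the labeled counter under the same client_golang assumption; the finished_reason label name and its values ("stop", "length", ...) are assumed to mirror vLLM's convention:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// requestSuccessTotal counts finished requests, partitioned by the
// finish reason reported for each request.
var requestSuccessTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "vllm:request_success_total",
		Help: "Count of successfully processed requests.",
	},
	[]string{"finished_reason"},
)

// Example: record a request that hit a natural stop sequence.
// requestSuccessTotal.WithLabelValues("stop").Inc()
```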

It would be better to add the following metrics after full tokenization support is added:
vllm:request_prompt_tokens
vllm:request_generation_tokens

vllm:request_params_n

Metadata

Labels: enhancement (New feature or request)
