Description
Support the following vLLM Prometheus metrics (a registration sketch follows the list):
vllm:e2e_request_latency_seconds
vllm:request_queue_time_seconds
vllm:request_inference_time_seconds
vllm:request_prefill_time_seconds
vllm:request_decode_time_seconds
vllm:request_params_max_tokens - What is reported if max_tokens is not set? What about max_completion_tokens?
vllm:request_max_num_generation_tokens - This is the minimum of (max-model-len - prompt length) and max_tokens, when max_tokens is defined.
vllm:request_success_total - Should be labeled with the finish reason.
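
A minimal sketch of how these could be registered with prometheus_client, assuming Histogram metrics for the timing values and a Counter labeled by finish reason. The metric names come from this issue; the bucket boundaries, the `finished_reason` label name, and the `max_num_generation_tokens` helper are illustrative assumptions, not confirmed vLLM choices:

```python
# Sketch: registering the requested metrics with prometheus_client.
# Names are taken from this issue; buckets and label values are assumptions.
from prometheus_client import Counter, Histogram

_TIME_BUCKETS = (0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0, 60.0)  # assumed

e2e_request_latency = Histogram(
    "vllm:e2e_request_latency_seconds",
    "End-to-end request latency in seconds.",
    buckets=_TIME_BUCKETS,
)
request_queue_time = Histogram(
    "vllm:request_queue_time_seconds",
    "Time the request spent waiting in the queue, in seconds.",
    buckets=_TIME_BUCKETS,
)
request_inference_time = Histogram(
    "vllm:request_inference_time_seconds",
    "Time spent in inference (prefill plus decode), in seconds.",
    buckets=_TIME_BUCKETS,
)
request_prefill_time = Histogram(
    "vllm:request_prefill_time_seconds",
    "Time spent in the prefill phase, in seconds.",
    buckets=_TIME_BUCKETS,
)
request_decode_time = Histogram(
    "vllm:request_decode_time_seconds",
    "Time spent in the decode phase, in seconds.",
    buckets=_TIME_BUCKETS,
)
request_success = Counter(
    "vllm:request_success_total",
    "Count of successfully finished requests.",
    labelnames=["finished_reason"],  # e.g. "stop", "length" (assumed values)
)


def max_num_generation_tokens(max_model_len: int, prompt_len: int,
                              max_tokens: int | None = None) -> int:
    """min(max-model-len - prompt length, max_tokens) when max_tokens is set.

    Falling back to the raw budget when max_tokens is unset is an assumption.
    """
    budget = max_model_len - prompt_len
    return budget if max_tokens is None else min(budget, max_tokens)
```

For vllm:request_params_max_tokens, one option would be to observe max_completion_tokens when max_tokens is absent and skip the observation when neither is set; whether vLLM does this is exactly the open question above.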
It would be better to add the following metrics after we add full tokenization (a sketch follows the list):
vllm:request_prompt_tokens
vllm:request_generation_tokens
vllm:request_params_n
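
A sketch of these token-count metrics under the assumption that, once full tokenization lands, prompt and completion token counts are available per finished request as plain integers; the bucket boundaries and the `observe_token_counts` helper are assumptions:

```python
# Sketch: token-count metrics, observable once full tokenization exists.
# Bucket boundaries are assumptions; vLLM's actual choices may differ.
from prometheus_client import Histogram

_TOKEN_BUCKETS = (1, 8, 32, 128, 512, 2048, 8192)

request_prompt_tokens = Histogram(
    "vllm:request_prompt_tokens",
    "Number of prompt tokens per request.",
    buckets=_TOKEN_BUCKETS,
)
request_generation_tokens = Histogram(
    "vllm:request_generation_tokens",
    "Number of generated tokens per request.",
    buckets=_TOKEN_BUCKETS,
)
request_params_n = Histogram(
    "vllm:request_params_n",
    "Value of the n sampling parameter per request.",
    buckets=(1, 2, 5, 10, 20),
)


def observe_token_counts(prompt_len: int, completion_len: int, n: int = 1) -> None:
    # Call once per finished request, after the tokenizer has produced counts.
    request_prompt_tokens.observe(prompt_len)
    request_generation_tokens.observe(completion_len)
    request_params_n.observe(n)
```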