-
Notifications
You must be signed in to change notification settings - Fork 372
Open
Labels
feature requestNew feature or requestNew feature or request
Description
Describe the feature
Summary
When static_backend_health_checks=True, health checks for vLLM instances that require authentication fail because the current implementation (is_model_healthy) does not handle API key headers and relies on model-specific payloads.
Current Behavior
is_model_healthyalways sends a POST request with a JSON payload andContent-Typeheader only.- If the vLLM instance requires authentication (API key), the health check request fails.
- In production, this causes routers to mark healthy backends as unhealthy.
Desired Behavior
- Add support for including API key headers when performing health checks.
- Optionally allow a generic "ping" health check that does not rely on model-specific payloads when
static_backend_health_checks=True.
Proposed Changes
- Update
is_model_healthyto:- Include API key header if provided (e.g.,
Authorization: Bearer <API_KEY>or configured header). - Skip sending model-specific payloads when
static_backend_health_checks=True, allowing a generic health endpoint.
- Include API key header if provided (e.g.,
- Make API key configurable via environment variable or router config.
Motivation
Ensures reliable health checking for authenticated vLLM instances in production deployments.
Additional Context
Current code snippet:
def is_model_healthy(url: str, model: str, model_type: str) -> bool:
model_details = ModelType[model_type]
try:
response = requests.post(
f"{url}{model_details.value}",
headers={"Content-Type": "application/json"},
json={"model": model} | model_details.get_test_payload(model_type),
timeout=30,
)
except Exception as e:
logger.error(e)
return False
return response.status_code == 200Why do you need this feature?
No response
Additional context
No response
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
feature requestNew feature or requestNew feature or request