I configured SR with two models and ran a benchmark.
Some prompts (from MMLU-Pro) work fine: I can send a request and get an answer. For other prompts, however, the request fails with the error "no healthy upstream", and the envoy log shows response code 503.
models:
- vllm serve microsoft/Phi-3-mini-4k-instruct --port=8002 --host 127.0.0.1 --max-model-len 8192 --enforce-eager --served-model-name phi3
- vllm serve openai/gpt-oss-20b --port=8003 --host 0.0.0.0 --served-model-name gpt-oss --max-model-len 8192 --enforce-eager
Example of a failing request:
curl -X POST http://localhost:8801/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "MoM",
"messages": [
{"role": "user", "content": "who are you"}
]
}'
Example of a working request:
The same request succeeds if I add a question mark to the prompt above ("who are you?").
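To help isolate whether the 503 comes from envoy/the router or from one of the backends, one option is to replay the failing request directly against each vllm server, bypassing the router. This is only a sketch: the ports and served model names are taken from the serve commands above and may need adjusting for your setup.

```python
import json
import urllib.request

# Ports and model names assumed from the vllm serve commands above.
BACKENDS = {
    "phi3": "http://127.0.0.1:8002",
    "gpt-oss": "http://127.0.0.1:8003",
}

def build_payload(model: str, prompt: str) -> dict:
    """Build the same chat-completion body that the failing curl request sends."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def probe(base_url: str, model: str, prompt: str) -> int:
    """Send the request straight to one vllm backend, skipping envoy/the router.

    A 200 here while the router returns 503 points at the routing layer
    rather than at the model server itself.
    """
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.status

if __name__ == "__main__":
    for model, url in BACKENDS.items():
        try:
            print(model, probe(url, model, "who are you"))
        except Exception as exc:  # connection refused, timeout, HTTP error, ...
            print(model, "failed:", exc)
```

If both backends answer the problematic prompt directly, the next place to look is the router's model-selection/health-check logic rather than the vllm servers.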
config.yaml
fail_envoy_log.txt
fail_router_log.txt
How can I debug this? The config and logs are attached above.
Thank you