Summary
The request processing pipeline becomes completely unresponsive at high concurrency levels (>=1000 concurrent requests). This is a regression from version 0.6.0, which handled the same workload without issues.
Symptoms
- Pipeline processes ~7000 requests successfully, then stalls completely
- All new requests time out with errors
- Basic endpoints like `/v1/models` become unresponsive
- Python async tasks keep executing (the memory reporter still prints), so this is not a GIL deadlock; a liveness sketch follows this list
- Suspected to be a tokio runtime deadlock or resource starvation
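For reference, a minimal sketch of the kind of in-process liveness check described above. The project's actual memory reporter is not shown in this issue; this is an illustrative standard-library stand-in:
```python
# Illustrative stand-in for the memory reporter mentioned above: if these
# heartbeats keep printing while HTTP requests hang, the Python event loop
# is alive and the stall sits below it (e.g., in the Rust/tokio layer),
# rather than being a GIL deadlock.
import asyncio
import resource


async def liveness_reporter(interval_s: float = 5.0) -> None:
    while True:
        # ru_maxrss is reported in kilobytes on Linux (bytes on macOS).
        max_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print(f"[reporter] event loop alive, max_rss={max_rss}")
        await asyncio.sleep(interval_s)


# Run alongside the frontend's server task, e.g.:
#   asyncio.get_running_loop().create_task(liveness_reporter())
```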
Reproduction Steps
- Set up a minimal pipeline: frontend -> backend-proxy -> backend
- Run the load test at high concurrency (a rough stand-in for the load generator is sketched after the commands below):
```
# Works fine at concurrency 400
./load_test.py --payload-size 2 --concurrency 400
# Result: 10000 requests, 0 errors, 1639.8 req/s

# Deadlocks at concurrency 1000
./load_test.py --payload-size 2 --concurrency 1000
```
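load_test.py itself is not reproduced in this issue (the reproduction code lives in the PR linked under Related). As a rough stand-in, a minimal asyncio/aiohttp load generator along the following lines exercises the same shape of workload; the endpoint path, payload contents, and the aiohttp dependency are assumptions, not taken from the real script:
```python
#!/usr/bin/env python3
"""Rough stand-in for load_test.py: fire N requests with bounded concurrency.

Assumptions (not from the real script): aiohttp is installed, the frontend
listens on localhost:8000, and /v1/chat/completions accepts this payload.
"""
import argparse
import asyncio

import aiohttp

URL = "http://localhost:8000/v1/chat/completions"  # assumed endpoint


async def one_request(session: aiohttp.ClientSession, payload: dict) -> bool:
    try:
        timeout = aiohttp.ClientTimeout(total=30)
        async with session.post(URL, json=payload, timeout=timeout) as resp:
            await resp.read()
            return resp.status == 200
    except Exception:
        return False


async def run(total: int, concurrency: int, payload_size: int) -> None:
    payload = {"prompt": "x" * payload_size}  # placeholder; real payload semantics unknown
    sem = asyncio.Semaphore(concurrency)
    errors = 0

    async def bounded(session: aiohttp.ClientSession) -> None:
        nonlocal errors
        async with sem:
            if not await one_request(session, payload):
                errors += 1

    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(bounded(session) for _ in range(total)))
    print(f"{total} requests, {errors} errors")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--total", type=int, default=10000)
    parser.add_argument("--concurrency", type=int, default=1000)
    parser.add_argument("--payload-size", type=int, default=2)
    args = parser.parse_args()
    asyncio.run(run(args.total, args.concurrency, args.payload_size))
```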
Observed Behavior
```
Request 7000 completed in 0.110s
# Stalls here for hundreds of seconds
Total error count: 100
Total error count: 200
...
Total error count: 1800
```
After the stall:
```
$ curl http://localhost:8000/v1/models
# Hangs indefinitely - no response
```
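To distinguish "slow" from "stalled" without leaving curl hanging, a bounded probe works; a standard-library sketch (the 5-second timeout is an arbitrary choice):
```python
# Probe /v1/models with a hard timeout instead of an open-ended curl.
import socket
import urllib.error
import urllib.request

URL = "http://localhost:8000/v1/models"

try:
    with urllib.request.urlopen(URL, timeout=5) as resp:  # 5 s is arbitrary
        print(f"responsive: HTTP {resp.status}")
except socket.timeout:
    print("stalled: no response within 5 s")
except urllib.error.URLError as exc:
    print(f"error: {exc.reason}")  # a wrapped timeout also lands here
```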
Comparison
| Version | Concurrency | Result |
|---|---|---|
| 0.6.0 | 1000 | Works correctly |
| main | 400 | Works correctly |
| main | 1000 | Stalls after ~7k requests |
Likely Areas to Investigate
- Changes between 0.6.0 and main affecting concurrency handling (a concurrency-sweep sketch after this list can help narrow the threshold while bisecting)
- Tokio runtime configuration (thread pool exhaustion, task starvation)
- Channel/queue backpressure mechanisms
- Connection handling limits in Rust HTTP layer
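For the first point, a small driver that sweeps concurrency levels can help pin down where the stall starts while bisecting commits between 0.6.0 and main. A sketch, assuming the load_test.py invocation from the reproduction steps; the sweep levels and the 300-second stall timeout are arbitrary choices:
```python
#!/usr/bin/env python3
# Sweep concurrency levels with ./load_test.py and report the first one that
# stalls (i.e., the run exceeds TIMEOUT_S and has to be killed).
import subprocess

LEVELS = [400, 600, 800, 1000, 1200]  # arbitrary sweep points
TIMEOUT_S = 300                       # treat anything longer as a stall

for concurrency in LEVELS:
    cmd = ["./load_test.py", "--payload-size", "2", "--concurrency", str(concurrency)]
    try:
        subprocess.run(cmd, check=True, timeout=TIMEOUT_S)
        print(f"concurrency {concurrency}: completed")
    except subprocess.TimeoutExpired:
        print(f"concurrency {concurrency}: stalled (killed after {TIMEOUT_S}s)")
        break  # first stalling level found
    except subprocess.CalledProcessError as exc:
        print(f"concurrency {concurrency}: finished with errors (rc={exc.returncode})")
```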
Related
- Reproduction code in PR #5269 ("fix: repro memory leak and deadlock under high payloads and high concurrency")
- Works correctly on the 0.6.0 branch