[BUG] Pipeline stalls/deadlocks at high concurrency (>=1000) - regression from 0.6.0 #5276

@nnshah1

Description

Summary

The request processing pipeline becomes completely unresponsive at high concurrency levels (>=1000 concurrent requests). This is a regression from version 0.6.0, which handled the same workload without issues.

Symptoms

  • Pipeline processes ~7000 requests successfully, then stalls completely
  • All new requests time out with errors
  • Basic endpoints like /v1/models become unresponsive
  • Python async tasks keep executing (the memory reporter's periodic prints still appear), so this is not a GIL deadlock
  • Suspected cause: a tokio runtime deadlock or resource starvation

Reproduction Steps

  1. Set up a minimal pipeline: frontend -> backend-proxy -> backend
  2. Run a load test at high concurrency (see the stand-in sketch after these steps):
    # Works fine at concurrency 400
    ./load_test.py --payload-size 2 --concurrency 400
    # Result: 10000 requests, 0 errors, 1639.8 req/s
    
    # Deadlocks at concurrency 1000
    ./load_test.py --payload-size 2 --concurrency 1000
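
For reference, here is a minimal stand-in for this load pattern, sketched in Rust with tokio and reqwest. This is not the actual load_test.py: the endpoint, request total, and concurrency cap are placeholders chosen to mirror the commands above.

    // Cargo deps (assumed): tokio = { version = "1", features = ["full"] }, reqwest = "0.11"
    use std::sync::Arc;
    use tokio::sync::Semaphore;

    #[tokio::main]
    async fn main() -> Result<(), Box<dyn std::error::Error>> {
        let concurrency = 1000; // the level that triggers the stall
        let total = 10_000;
        let client = reqwest::Client::new();
        let sem = Arc::new(Semaphore::new(concurrency));
        let mut handles = Vec::with_capacity(total);

        for _ in 0..total {
            // Each in-flight request holds one permit, capping concurrency.
            let permit = sem.clone().acquire_owned().await?;
            let client = client.clone();
            handles.push(tokio::spawn(async move {
                // Placeholder endpoint; the real script sends a sized payload.
                let res = client.get("http://localhost:8000/v1/models").send().await;
                drop(permit); // release the slot as soon as the request finishes
                res.is_ok()
            }));
        }

        let mut errors = 0usize;
        for h in handles {
            if !h.await.unwrap_or(false) {
                errors += 1;
            }
        }
        println!("completed {total} requests, {errors} errors");
        Ok(())
    }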

Observed Behavior

Request 7000 completed in 0.110s
# stalls here for hundreds of seconds
Total error count: 100
Total error count: 200
...
Total error count: 1800

After the stall:

$ curl http://localhost:8000/v1/models
# Hangs indefinitely - no response

Comparison

Version   Concurrency   Result
0.6.0     1000          Works correctly
main      400           Works correctly
main      1000          Stalls after ~7k requests

Likely Areas to Investigate

  • Changes between 0.6.0 and main affecting concurrency handling
  • Tokio runtime configuration: thread pool exhaustion or task starvation (see the sketch after this list)
  • Channel/queue backpressure mechanisms
  • Connection handling limits in Rust HTTP layer
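
One classic pattern that produces exactly these symptoms is sketched below, as an illustration rather than a claim about where the actual bug lives: if enough tasks make a blocking call directly on tokio worker threads, every worker gets pinned, and no other task (including the one that would unblock them, or an HTTP accept loop) is ever polled again. The worker count and all names here are made up for the demo.

    // Cargo dep (assumed): tokio = { version = "1", features = ["full"] }
    use std::sync::{mpsc, Arc, Mutex};
    use std::time::Duration;

    fn main() {
        // Small pool so the effect is easy to reproduce; the same logic
        // applies at any worker count once enough tasks block at once.
        let rt = tokio::runtime::Builder::new_multi_thread()
            .worker_threads(4)
            .enable_all()
            .build()
            .unwrap();

        let (tx, rx) = mpsc::channel::<()>();
        let rx = Arc::new(Mutex::new(rx));

        // Spawn as many blocking tasks as there are workers. Each parks
        // its OS thread in a synchronous recv(), which the async
        // scheduler cannot preempt.
        for i in 0..4 {
            let rx = Arc::clone(&rx);
            rt.spawn(async move {
                println!("task {i} blocking a worker thread");
                let _ = rx.lock().unwrap().recv(); // BUG: blocking in async
            });
        }

        // Give the blockers time to occupy every worker thread.
        std::thread::sleep(Duration::from_millis(200));

        // This task would wake the blockers, but with every worker pinned
        // it is never polled, so the runtime stays wedged. New work (an
        // accept loop, a /v1/models handler) would hang just like above.
        rt.spawn(async move {
            println!("unblocker running (never printed once saturated)");
            let _ = tx.send(());
        });

        std::thread::sleep(Duration::from_secs(2));
        println!("no task made progress: the runtime is wedged");
        rt.shutdown_background(); // avoid hanging on drop of the wedged runtime
    }

If something like this is in play, diffing the runtime configuration (worker_threads, max_blocking_threads) and any block_on/spawn_blocking call sites between 0.6.0 and main would be a cheap first check; tokio-console can also show which tasks stopped being polled.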

Labels

bug (Something isn't working)
