WebSocket Handshake Timeout During Peak Traffic - H13 Connection Closed Without Response
Summary
We are seeing large numbers of WebSocket handshake timeouts during peak server usage, resulting in hundreds to thousands of H13 errors ("Connection closed without response") within minutes. The issue affects both long-lived and short-lived connections and is temporarily resolved by scaling dynos.
Environment
- Channels Version: 4.1.0
- Django Version: 4.2.11
- Python Version: 3.9+ (from runtime.txt)
- Deployment: Heroku with Daphne
- Channel Layer: InMemoryChannelLayer
- Load Pattern: Peak traffic causes 200-1000+ timeout errors in minutes
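A side note on the environment above: the Channels documentation describes `InMemoryChannelLayer` as intended for testing rather than production, and on Heroku each dyno runs its own isolated in-memory layer, so groups do not span processes. For comparison, a Redis-backed layer via the `channels_redis` package would look roughly like the sketch below (`REDIS_URL` is Heroku's conventional env var for a Redis add-on; the localhost fallback is an assumption for local development):

```python
import os

# Sketch only: requires the channels_redis package and a Redis instance.
# REDIS_URL is assumed to be set by a Heroku Redis add-on.
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {
            "hosts": [os.environ.get("REDIS_URL", "redis://localhost:6379")],
        },
    },
}
```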
Error Details
Heroku Logs
```
2025-06-14 17:08:48.001 UTC
proc_id:web.1 2025-06-14 18:08:48,001 WARNING dropping connection to peer tcp4:10.1.35.137:30895 with abort=True: WebSocket opening handshake timeout (peer did not finish the opening handshake in time)
2025-06-14 17:08:48.005 UTC
proc_id:router heroku.dyno:web.1 heroku.status:503 heroku.method:GET heroku.host:top-up-a2061d35c25e.herokuapp.com heroku.path:/ws/MyConsumer/?operator=djezzy heroku.client:197.207.228.115 at=error code=H13 desc="Connection closed without response" method=GET path="/ws/MyConsumer/?operator=djezzy" host=top-up-a2061d35c25e.herokuapp.com request_id=fc265640-9fe6-426e-bcc7-f85558f6bfc8 fwd="197.207.228.115" dyno=web.1 connect=0ms service=4865ms status=503 bytes=0 protocol=http
```
Current Configuration
Procfile:

```
web: daphne topup.asgi:application --port $PORT --bind 0.0.0.0 -v2 --websocket_timeout 120
```
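One thing worth checking alongside `--websocket_timeout`: Daphne exposes a separate option for the opening handshake itself, `--websocket_connect_timeout` (default 5 seconds, if I am reading the options correctly), which is the limit the "peer did not finish the opening handshake in time" warning above refers to. Raising it only buys headroom rather than fixing the underlying blocking, but as a sketch:

```
web: daphne topup.asgi:application --port $PORT --bind 0.0.0.0 -v2 --websocket_timeout 120 --websocket_connect_timeout 30
```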
Channel Layers (settings.py):
```python
CHANNEL_LAYERS = {
    'default': {
        'BACKEND': 'channels.layers.InMemoryChannelLayer',
    },
}
```

Problem Analysis
Consumer Implementation Issues
```python
class MyConsumer(WebsocketConsumer):
    def connect(self):
        headers = dict(self.scope["headers"])
        api_key = headers.get(b'apikey')
        if True:
            if not api_key:
                self.user = self.scope["user"]
                if self.user.is_anonymous:
                    self.accept()
                    self.send({'status': 'error', 'message': 'API key is missing'})
                    self.close()
                    return
                else:
                    self.accept()
            else:
                api_key = api_key.decode('utf-8')
                try:
                    api_key_obj = ApiKeys.objects.get(apikey=api_key)
                    self.user = api_key_obj.user
                except ApiKeys.DoesNotExist:
                    self.accept()
                    self.send({'status': 'error', 'message': 'Invalid API key'})
                    self.close()
                    return
        if not self.user:
            self.accept()
            self.send({'status': 'error', 'message': 'You are not authenticated'})
            self.close()
        else:
            self.accept()
            user = Userinfo.objects.get(username=self.user.username)
            query_string = self.scope['query_string'].decode()
            query_params = dict(qc.split('=') for qc in query_string.split('&'))
            operator = query_params.get('operator')
            self.sent_operations = set()
            self.sent_offers = set()
            if user.user_type != 'super_user':
                self.send_json({'status': 'error', 'message': 'You do not have permission to access this endpoint'})
            async_to_sync(self.channel_layer.group_add)(
                user.username + operator,
                self.channel_name
            )
            self.send(text_data=json.dumps({
                'message': 'You are connected',
                'group': user.username + operator
            }))
```
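One likely mechanism behind the timeouts: sync consumers like `WebsocketConsumer` run their handlers through asgiref's `sync_to_async` with `thread_sensitive=True`, so (by asgiref's defaults, as I understand them) blocking ORM calls from many connections serialize onto a shared thread, and during peaks each `connect()` queues behind the others until handshakes start timing out. Moving to `AsyncWebsocketConsumer` and offloading the queries (e.g. via `channels.db.database_sync_to_async` or a thread pool) keeps the handshake path responsive. The effect can be demonstrated with plain asyncio, no Channels dependency; `blocking_query` is a stand-in for the ORM lookup:

```python
import asyncio
import time

def blocking_query():
    """Stand-in for a synchronous ORM call such as ApiKeys.objects.get()."""
    time.sleep(0.2)
    return "api-key-row"

async def handshake():
    """Stand-in for another client's opening handshake being serviced."""
    return "accepted"

async def main():
    loop = asyncio.get_running_loop()
    # Offload the blocking call to the default thread pool so the event
    # loop stays free to service other connections' handshakes.
    query_future = loop.run_in_executor(None, blocking_query)

    start = time.monotonic()
    status = await handshake()   # completes immediately; the loop is not blocked
    handshake_delay = time.monotonic() - start

    row = await query_future     # the "ORM" result arrives later
    return status, row, handshake_delay

status, row, delay = asyncio.run(main())
print(status, row, round(delay, 3))
```

Had `blocking_query()` been called inline instead, the simulated handshake could not have been serviced until the 200 ms sleep finished, which is the per-connection stall that multiplies under peak load.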
Scaling Behavior
- Peak Traffic: 200-1000+ timeouts in minutes
- Temporary Fix: Scaling dynos resolves the issue
- Pattern: Affects both long-lived and short-lived connections
- Suspected Root Cause: event loop/worker thread blocked by synchronous work during the WebSocket handshake
Expected Behavior
WebSocket connections should complete handshake within reasonable time even during peak traffic.
Actual Behavior
During peak usage:
- WebSocket handshake timeouts occur en masse
- Heroku returns H13 errors (Connection closed without response)
- Clients cannot establish WebSocket connections
- Issue resolves temporarily after scaling dynos
Questions for Maintainers
1. Is there a recommended pattern for handling heavy database operations during WebSocket connect without blocking the handshake?
2. Should Channels provide better guidance on async vs. sync consumers for production use?
3. Are there built-in mechanisms to defer operations until after handshake completion?
4. What are the recommended timeout values for production deployments with high concurrent connections?
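On question 3, for what it's worth: Channels has no built-in deferral hook as far as I know, but in an async consumer the pattern is straightforward to hand-roll: call `accept()` first so the handshake completes within the proxy's timeout, then run the heavy setup as a background task. A minimal stdlib sketch of the sequencing (`heavy_setup` and the `events` log are illustrative stand-ins, not Channels APIs):

```python
import asyncio

events = []

async def heavy_setup():
    """Stand-in for the post-connect work: ORM lookups, group_add, etc."""
    await asyncio.sleep(0.05)
    events.append("setup-done")

async def connect():
    # Accept immediately so the opening handshake completes fast...
    events.append("accepted")
    # ...then schedule the heavy setup as a background task.
    return asyncio.create_task(heavy_setup())

async def main():
    task = await connect()
    events.append("handshake-returned")
    await task  # in a real consumer, keep the handle so disconnect() can cancel it

asyncio.run(main())
print(events)
```

The ordering shows the point: the handshake returns before the setup finishes, so the proxy never sees a stalled upgrade.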
Additional Context
- Using `django-db-connection-pool[postgresql]` for database connection pooling
- Application handles real-time operation updates and offers
- Peak traffic involves hundreds of concurrent WebSocket connections
- Current workaround: Manual dyno scaling during peak hours
Reproducible Test Case
The issue can be reproduced by:
- Creating multiple concurrent WebSocket connections (200+)
- Having each connection trigger heavy database operations in `connect()`
- Monitoring for handshake timeout warnings and H13 errors
Would appreciate guidance on best practices for handling this scenario and whether this indicates a bug in channels or a configuration/implementation issue.