WebSocket Handshake Timeout During Peak Traffic - H13 Connection Closed Without Response #2165

@sofiz

Summary

We are seeing large numbers of WebSocket handshake timeouts during peak traffic: hundreds to thousands of H13 errors (Connection closed without response) within minutes. The issue affects both long-lived and short-lived connections, and scaling up dynos resolves it only temporarily.

Environment

  • Channels Version: 4.1.0
  • Django Version: 4.2.11
  • Python Version: 3.9+ (from runtime.txt)
  • Deployment: Heroku with Daphne
  • Channel Layer: InMemoryChannelLayer
  • Load Pattern: Peak traffic causes 200-1000+ timeout errors in minutes

Error Details

Heroku Logs

2025-06-14 17:08:48.001 UTC
proc_id:web.1 2025-06-14 18:08:48,001 WARNING dropping connection to peer tcp4:10.1.35.137:30895 with abort=True: WebSocket opening handshake timeout (peer did not finish the opening handshake in time)

2025-06-14 17:08:48.005 UTC
proc_id:router heroku.dyno:web.1 heroku.status:503 heroku.method:GET heroku.host:top-up-a2061d35c25e.herokuapp.com heroku.path:/ws/MyConsumer/?operator=djezzy heroku.client:197.207.228.115 at=error code=H13 desc="Connection closed without response" method=GET path="/ws/MyConsumer/?operator=djezzy" host=top-up-a2061d35c25e.herokuapp.com request_id=fc265640-9fe6-426e-bcc7-f85558f6bfc8 fwd="197.207.228.115" dyno=web.1 connect=0ms service=4865ms status=503 bytes=0 protocol=http

Current Configuration

Procfile:

web: daphne topup.asgi:application --port $PORT --bind 0.0.0.0 -v2 --websocket_timeout 120

Channel Layers (settings.py):

CHANNEL_LAYERS = {
    'default': {
        'BACKEND': 'channels.layers.InMemoryChannelLayer',
    },
}
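
A note on the channel layer above: InMemoryChannelLayer keeps its groups inside a single process, so groups are not shared between dynos once the app is scaled out, and the Channels documentation advises against using it in production. Below is a minimal sketch of switching to the Redis layer from the separate channels_redis package. The helper name build_channel_layers and the REDIS_URL environment variable are assumptions for illustration, not part of the reported project.

```python
import os


def build_channel_layers(redis_url=None):
    """Return a CHANNEL_LAYERS dict (hypothetical helper).

    With a Redis URL, use the channels_redis backend so groups are shared
    across processes; without one, fall back to the in-memory layer.
    """
    if redis_url:
        return {
            "default": {
                "BACKEND": "channels_redis.core.RedisChannelLayer",
                "CONFIG": {"hosts": [redis_url]},
            },
        }
    return {
        "default": {
            "BACKEND": "channels.layers.InMemoryChannelLayer",
        },
    }


# REDIS_URL is an assumed variable name; adjust it to the add-on in use.
CHANNEL_LAYERS = build_channel_layers(os.environ.get("REDIS_URL"))
```

This does not by itself address the handshake timeouts; it only makes group messaging behave consistently once more than one dyno is running.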

Problem Analysis

Consumer Implementation Issues

import json
from urllib.parse import parse_qs

from asgiref.sync import async_to_sync
from channels.generic.websocket import WebsocketConsumer

# ApiKeys and Userinfo are the application's own models; their import path is not shown in this report.


class MyConsumer(WebsocketConsumer):
    def _reject_with(self, message):
        # Accept first so the client can read the error, then close the socket.
        self.accept()
        self.send(text_data=json.dumps({'status': 'error', 'message': message}))
        self.close()

    def connect(self):
        headers = dict(self.scope["headers"])
        api_key = headers.get(b'apikey')

        if not api_key:
            # No API key header: fall back to the session user.
            self.user = self.scope["user"]
            if self.user.is_anonymous:
                self._reject_with('API key is missing')
                return
        else:
            try:
                # Synchronous database query before the handshake is accepted.
                api_key_obj = ApiKeys.objects.get(apikey=api_key.decode('utf-8'))
            except ApiKeys.DoesNotExist:
                self._reject_with('Invalid API key')
                return
            self.user = api_key_obj.user
            if not self.user:
                self._reject_with('You are not authenticated')
                return  # stop here; do not continue to the group join

        self.accept()

        # Second synchronous database query, run for every new connection.
        user = Userinfo.objects.get(username=self.user.username)
        # parse_qs tolerates an empty query string and values containing '='.
        query_params = parse_qs(self.scope['query_string'].decode())
        operator = query_params.get('operator', [''])[0]
        self.sent_operations = set()
        self.sent_offers = set()
        if user.user_type != 'super_user':
            # WebsocketConsumer has no send_json(); serialize explicitly.
            self.send(text_data=json.dumps({'status': 'error', 'message': 'You do not have permission to access this endpoint'}))
            # Close and return so unauthorized users never join the group.
            self.close()
            return

        group = user.username + operator
        async_to_sync(self.channel_layer.group_add)(group, self.channel_name)
        self.send(text_data=json.dumps({
            'message': 'You are connected',
            'group': group,
        }))

Scaling Behavior

  • Peak Traffic: 200-1000+ timeouts in minutes
  • Temporary Fix: Scaling dynos resolves the issue
  • Pattern: Affects both long-lived and short-lived connections
  • Suspected Root Cause: Blocking work during the WebSocket handshake (synchronous database queries in connect())
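
To make the suspected cause concrete, here is a rough back-of-the-envelope model, not a measurement. It assumes all connections arrive at once, connect() calls are served by a fixed number of workers, and each call takes the same time; the worker count and per-connect duration are made-up numbers. The 5-second limit is Daphne's default for `--websocket_connect_timeout` (the handshake timeout), which is a separate option from the `--websocket_timeout 120` in the Procfile and is consistent with the `service=4865ms` in the router log.

```python
import math


def handshake_finish_time(position, workers, seconds_per_connect):
    """Seconds until the connection at 1-based queue `position` finishes connect().

    Simplified model: `workers` connections are served in parallel, in batches.
    """
    batch = math.ceil(position / workers)
    return batch * seconds_per_connect


def count_timeouts(connections, workers, seconds_per_connect, handshake_timeout):
    """Count connections whose connect() would finish after the handshake timeout."""
    return sum(
        1
        for position in range(1, connections + 1)
        if handshake_finish_time(position, workers, seconds_per_connect) > handshake_timeout
    )


# Hypothetical numbers: 200 simultaneous connections, 0.5 s of database work each.
print(count_timeouts(200, workers=10, seconds_per_connect=0.5, handshake_timeout=5))  # 100
print(count_timeouts(200, workers=20, seconds_per_connect=0.5, handshake_timeout=5))  # 0
```

Under these assumptions, doubling the number of workers (roughly what adding dynos does) removes the timeouts, which would match the observation that scaling helps temporarily.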

Expected Behavior

WebSocket connections should complete the handshake within a reasonable time, even during peak traffic.

Actual Behavior

During peak usage:

  1. WebSocket handshake timeouts occur en masse
  2. Heroku returns H13 errors (Connection closed without response)
  3. Clients cannot establish WebSocket connections
  4. Issue resolves temporarily after scaling dynos

Questions for Maintainers

  1. Is there a recommended pattern for handling heavy database operations during WebSocket connect without blocking the handshake?

  2. Should Channels provide better guidance on async vs sync consumers for production use?

  3. Are there built-in mechanisms to defer operations until after handshake completion?

  4. What are the recommended timeout values for production deployments with high concurrent connections?
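
For questions 1 and 3, one pattern that may apply is to accept the handshake first and run the blocking lookups off the event loop afterwards. The sketch below illustrates the idea with plain asyncio only, so it runs without Django. SketchConsumer and load_profile_blocking are hypothetical stand-ins, not Channels APIs. In a real consumer the analogous pieces would likely be AsyncWebsocketConsumer and channels.db.database_sync_to_async.

```python
import asyncio
import time


def load_profile_blocking(username):
    """Hypothetical stand-in for a synchronous ORM query."""
    time.sleep(0.05)  # simulate database latency
    return {"username": username, "user_type": "super_user"}


class SketchConsumer:
    """Toy consumer that records events instead of talking to a socket."""

    def __init__(self):
        self.events = []

    async def accept(self):
        self.events.append("accepted")

    async def send_json(self, payload):
        self.events.append(("sent", payload))

    async def connect(self, username):
        # Finish the handshake before doing any slow work.
        await self.accept()
        # Run the blocking call in a worker thread (asyncio.to_thread, Python 3.9+)
        # so the event loop stays free to accept other handshakes.
        profile = await asyncio.to_thread(load_profile_blocking, username)
        await self.send_json({
            "message": "You are connected",
            "user_type": profile["user_type"],
        })


consumer = SketchConsumer()
asyncio.run(consumer.connect("alice"))
print(consumer.events[0])  # accepted
```

The trade-off is that the socket is open before the user has been checked, so authorization still has to be enforced before joining groups or sending data, and the connection closed if the check fails.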

Additional Context

  • Using django-db-connection-pool[postgresql] for database connection pooling
  • Application handles real-time operation updates and offers
  • Peak traffic involves hundreds of concurrent WebSocket connections
  • Current workaround: Manual dyno scaling during peak hours

Reproducible Test Case

The issue can be reproduced by:

  1. Opening many concurrent WebSocket connections (200+)
  2. Having each connection run heavy database operations in connect()
  3. Monitoring for handshake timeout warnings and H13 errors
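
The steps above could be driven by a small harness like the following sketch. The function run_load_test and the fake_connect stand-in are hypothetical; in a real run, `connect` would wrap a WebSocket client call against the deployed endpoint. Here a fake is used so the sketch runs on its own.

```python
import asyncio


async def run_load_test(connect, connections, handshake_timeout):
    """Open `connections` handshakes concurrently and count successes and timeouts.

    `connect` is any coroutine function taking the connection index.
    Errors other than timeouts are not handled and will propagate.
    """

    async def attempt(index):
        try:
            await asyncio.wait_for(connect(index), timeout=handshake_timeout)
            return True
        except asyncio.TimeoutError:
            return False

    results = await asyncio.gather(*(attempt(i) for i in range(connections)))
    succeeded = sum(results)
    return {"ok": succeeded, "timed_out": len(results) - succeeded}


async def fake_connect(index):
    """Hypothetical stand-in for a client handshake: every 4th one hangs."""
    await asyncio.sleep(1.0 if index % 4 == 0 else 0.0)


summary = asyncio.run(run_load_test(fake_connect, connections=20, handshake_timeout=0.2))
print(summary)  # {'ok': 15, 'timed_out': 5}
```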

I would appreciate guidance on best practices for handling this scenario, and on whether it indicates a bug in Channels or a configuration/implementation issue on our side.
