Possible bug in KCP network code #70

@eddiesubedi

KCP Network: Disconnect Events Never Fire When Client Force-Closes

Summary

When using the KCP network transport, if a client force-closes (process killed, network disconnected, crash, etc.), the server never fires a disconnect event. The connection remains in an open state indefinitely, and disconnectHandler is never called.

Environment

  • due version: v2.4.2
  • Go version: 1.23+
  • Network transport: KCP (github.com/dobyte/due/network/kcp/v2)

When This Occurs

My multiplayer game architecture:

  1. Gate - KCP server with heartbeat enabled (10s interval), uses Redis locator and Consul registry
  2. Lobby Node - Handles player join, calls BindGate() and BindNode() to lobby
  3. Match-Maker Node - Manages matchmaking queue (Redis-based), finds opponents, delivers to game node
  4. Game Node - Handles active game sessions, game loop broadcasts state at 50ms (20 Hz)

Flow:

  1. Client connects to gate via KCP
  2. Client sends ROUTE_LOBBY_JOIN to lobby node
  3. Lobby node calls ctx.BindGate(uid) to bind user to gate
  4. Lobby node calls proxy.BindNode(ctx, uid, "lobby", nodeID) to bind user to lobby
  5. Client sends ROUTE_MATCH_QUEUE_REQUESTED to join matchmaking
  6. Lobby forwards request to match-maker node via proxy.Deliver()
  7. Match-maker adds player to Redis queue, waits for opponent
  8. When two players matched, match-maker calls proxy.Deliver() to game node with both player IDs
  9. Game node creates game instance, pushes ROUTE_MATCH_JOINED to both players
  10. Each client sends ROUTE_MATCH_READY, game node calls ctx.BindNode() to bind each player to game node
  11. When both ready, game loop starts - broadcasts state via proxy.Push() every 50ms to both players
  12. One client force-closes (kill process, Alt+F4, crash, network loss)
  13. Disconnect event never fires - game keeps broadcasting to dead client forever
  14. The Game Node and the Gate both log this error: conn.go:195 connection heartbeat timeout

Key conditions:

  • User is bound to game node via ctx.BindNode() in the ready handler
  • Game loop continuously calls proxy.Push() to send state updates (50ms tick rate)

If the server is idle (not sending data), the heartbeat timeout in write() may eventually trigger. But in real-time games where the server constantly pushes updates, the issue is guaranteed to occur.

Expected Behavior

The server should detect that the client is no longer responding within 2 * heartbeatInterval (20 seconds with default settings) and:

  1. Close the connection
  2. Fire the disconnect event via disconnectHandler

Actual Behavior

The server blocks indefinitely waiting for data from the dead client. The disconnect event is never fired.

Root Cause Analysis

The issue is in network/kcp/server_conn.go in the read() function:

func (c *serverConn) read() {
    conn := c.conn

    for {
        select {
        case <-c.close:
            return
        default:
            msg, err := packet.ReadMessage(conn)  // BLOCKS FOREVER
            if err != nil {
                _ = c.forceClose(true)
                return
            }
            // ...
        }
    }
}

The packet.ReadMessage(conn) call eventually calls KCP's Read() method, which blocks indefinitely when no read deadline is set. Looking at github.com/xtaci/kcp-go/v5/sess.go:

func (s *UDPSession) Read(b []byte) (n int, err error) {
    var timeout *time.Timer
    var c <-chan time.Time
    if !s.rd.IsZero() {  // Read deadline
        delay := time.Until(s.rd)
        timeout = time.NewTimer(delay)
        c = timeout.C
    }

    for {
        // ... try to read data ...

        select {
        case <-s.chReadEvent:   // Data available
        case <-c:               // Timeout (only if deadline set!)
            return 0, errTimeout
        case <-s.chSocketReadError:
        case <-s.die:
        }
    }
}

If no read deadline is set (s.rd.IsZero() is true), the timeout channel c is nil, and the select will never return a timeout error.

Why the heartbeat timeout in write() does not work

The write() goroutine has heartbeat timeout logic in the ticker.C case:

case <-ticker.C:
    if lastHB < deadline {
        _ = c.forceClose(true)
        return
    }

However, this case can never fire when the server is continuously sending data:

  1. A node bound via BindNode() sends frequent updates (e.g., game state at 50ms intervals)
  2. The chWrite channel therefore always has data, so the case r, ok := <-c.chWrite branch always executes
  3. conn.Write() eventually blocks because KCP's send buffer fills up (no ACKs from dead client)
  4. The write goroutine is now blocked on conn.Write(), not on the select statement
  5. The ticker.C case never gets a chance to execute

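This leaves both connection goroutines blocked indefinitely (strictly a stall rather than a deadlock, since neither goroutine waits on the other):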

  • read() blocks on Read() with no deadline
  • write() blocks on Write() when the send buffer is full

Note: If the server is idle (not sending data), the ticker.C case may eventually fire and detect the timeout. This is why the bug may not be immediately apparent in simple test cases without continuous data flow.
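The starvation described above can be reproduced without KCP. In this sketch net.Pipe stands in for a connection whose peer has stopped reading, so Write blocks much like a full KCP send buffer would. writeLoop and ticksObserved are hypothetical names, and the loop only mirrors the general shape of a write goroutine, not the actual due code.

```go
package main

import (
	"fmt"
	"net"
	"sync/atomic"
	"time"
)

// writeLoop services a send queue and a ticker from one select, so a Write
// that blocks also stops the ticker case from ever running.
func writeLoop(conn net.Conn, queue <-chan []byte, tick time.Duration, ticks *atomic.Int64) {
	ticker := time.NewTicker(tick)
	defer ticker.Stop()
	for {
		select {
		case buf, ok := <-queue:
			if !ok {
				return
			}
			if _, err := conn.Write(buf); err != nil {
				return
			}
		case <-ticker.C:
			ticks.Add(1) // the heartbeat check would live here
		}
	}
}

// ticksObserved runs writeLoop for 200ms and returns how often the ticker
// case ran. With stall set, one message is queued for a peer that never reads.
func ticksObserved(stall bool) int64 {
	server, client := net.Pipe()
	queue := make(chan []byte, 1)
	defer close(queue)   // lets an idle loop exit
	defer client.Close() // unblocks a stalled Write
	defer server.Close()

	var ticks atomic.Int64
	if stall {
		queue <- []byte("state")
	}
	go writeLoop(server, queue, 20*time.Millisecond, &ticks)
	time.Sleep(200 * time.Millisecond)
	return ticks.Load()
}

func main() {
	fmt.Println("ticks while idle:", ticksObserved(false))
	fmt.Println("ticks while Write is stalled:", ticksObserved(true))
}
```

The idle run counts several ticks, while the stalled run stays at zero because the goroutine never returns to the select.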

Proposed Fix

Add a read deadline to the KCP connection based on the heartbeat interval:

func (c *serverConn) read() {
    conn := c.conn

    // Set read deadline based on heartbeat timeout (2x heartbeat interval)
    readTimeout := 2 * c.connMgr.server.opts.heartbeatInterval

    for {
        select {
        case <-c.close:
            return
        default:
            // Set read deadline before each read
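            if readTimeout > 0 {
                // Treat a failure to arm the deadline like a read error
                if err := conn.SetReadDeadline(time.Now().Add(readTimeout)); err != nil {
                    _ = c.forceClose(true)
                    return
                }
            }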

            msg, err := packet.ReadMessage(conn)
            if err != nil {
                _ = c.forceClose(true)
                return
            }

            if c.connMgr.server.opts.heartbeatInterval > 0 {
                c.lastHeartbeatTime.Store(xtime.Now().UnixNano())
            }
            // ... rest of the function
        }
    }
}

This ensures that if no data is received within the timeout period, Read() returns a timeout error, which triggers forceClose() and properly fires the disconnect event.

Impact

This bug affects any application using KCP transport where:

  • Clients may disconnect unexpectedly (crashes, network issues, force-quit)
  • The server needs to detect and handle disconnections
  • Resources are tied to connection lifecycle (e.g., player sessions in games)

Without this fix, connections from dead clients remain open indefinitely, causing:

  • Resource leaks (goroutines, memory for connection state)
  • Incorrect online user counts
  • Game state inconsistencies (players appear online but are actually gone)
  • Failure to trigger cleanup logic in disconnect handlers

Thought I had

The TCP transport (network/tcp/server_conn.go) has similar code without a read deadline, but TCP behaves differently at the OS level: when a client process is killed or crashes, its OS closes the socket and sends FIN/RST, which causes the server's Read() to return an error. (Silent network loss is the exception; there TCP depends on keepalive or retransmission timeouts.) KCP, being UDP-based, has no such mechanism and relies entirely on application-level timeout handling.
