Possible bug in KCP network code

# KCP Network: Disconnect Events Never Fire When Client Force-Closes

## Summary

When using the KCP network transport, if a client force-closes (process killed, network disconnected, crash, etc.), the server never fires a disconnect event. The connection remains in an open state indefinitely, and `disconnectHandler` is never called.

## Environment

- due version: v2.4.2
- Go version: 1.23+
- Network transport: KCP (`github.com/dobyte/due/network/kcp/v2`)

## When This Occurs

My multiplayer game architecture:

1. **Gate** - KCP server with heartbeat enabled (10s interval), uses Redis locator and Consul registry
2. **Lobby Node** - Handles player join, calls `BindGate()` and `BindNode()` to lobby
3. **Match-Maker Node** - Manages matchmaking queue (Redis-based), finds opponents, delivers to game node
4. **Game Node** - Handles active game sessions, game loop broadcasts state at 50ms (20 Hz)

**Flow**:
1. Client connects to gate via KCP
2. Client sends `ROUTE_LOBBY_JOIN` to lobby node
3. Lobby node calls `ctx.BindGate(uid)` to bind user to gate
4. Lobby node calls `proxy.BindNode(ctx, uid, "lobby", nodeID)` to bind user to lobby
5. Client sends `ROUTE_MATCH_QUEUE_REQUESTED` to join matchmaking
6. Lobby forwards request to match-maker node via `proxy.Deliver()`
7. Match-maker adds player to Redis queue, waits for opponent
8. When two players matched, match-maker calls `proxy.Deliver()` to game node with both player IDs
9. Game node creates game instance, pushes `ROUTE_MATCH_JOINED` to both players
10. Each client sends `ROUTE_MATCH_READY`, game node calls `ctx.BindNode()` to bind each player to game node
11. When both ready, game loop starts - broadcasts state via `proxy.Push()` every 50ms to both players
12. One client force-closes (kill process, Alt+F4, crash, network loss)
13. **Disconnect event never fires** - game keeps broadcasting to dead client forever
14. The following error appears in the logs on both the game node and the gate: `conn.go:195 connection heartbeat timeout`

**Key conditions**:
- User is bound to game node via `ctx.BindNode()` in the ready handler
- Game loop continuously calls `proxy.Push()` to send state updates (50ms tick rate)

If the server is idle (not sending data), the heartbeat timeout in `write()` may eventually trigger. But in real-time games where the server constantly pushes updates, the issue is guaranteed to occur.

## Expected Behavior

The server should detect that the client is no longer responding within `2 * heartbeatInterval` (20 seconds with default settings) and:
1. Close the connection
2. Fire the disconnect event via `disconnectHandler`

## Actual Behavior

The server blocks indefinitely waiting for data from the dead client. The disconnect event is never fired.

## Root Cause Analysis

The issue is in `network/kcp/server_conn.go` in the `read()` function:

```go
func (c *serverConn) read() {
    conn := c.conn

    for {
        select {
        case <-c.close:
            return
        default:
            msg, err := packet.ReadMessage(conn)  // blocks forever if no read deadline is set
            if err != nil {
                _ = c.forceClose(true)
                return
            }
            // ...
        }
    }
}
```

The `packet.ReadMessage(conn)` call eventually calls KCP's `Read()` method, which blocks indefinitely when no read deadline is set. Looking at `github.com/xtaci/kcp-go/v5/sess.go`:

```go
func (s *UDPSession) Read(b []byte) (n int, err error) {
    var timeout *time.Timer
    var c <-chan time.Time
    if !s.rd.IsZero() {  // Read deadline
        delay := time.Until(s.rd)
        timeout = time.NewTimer(delay)
        c = timeout.C
    }

    for {
        // ... try to read data ...

        select {
        case <-s.chReadEvent:   // Data available
        case <-c:               // Timeout (only if deadline set!)
            return 0, errTimeout
        case <-s.chSocketReadError:
        case <-s.die:
        }
    }
}
```

If no read deadline is set (`s.rd.IsZero()` is true), the timeout channel `c` stays nil. A receive from a nil channel blocks forever, so that case is never selected and `Read()` never returns a timeout error.

### Why the heartbeat timeout in write() does not work

The `write()` goroutine has heartbeat timeout logic in the `ticker.C` case:

```go
case <-ticker.C:
    if lastHB < deadline {
        _ = c.forceClose(true)
        return
    }
```

However, this case can never fire when the server is continuously sending data:

1. A node bound via `BindNode()` sends frequent updates (e.g., game state at 50ms intervals)
2. The `chWrite` channel therefore always has data, so the `case r, ok := <-c.chWrite` branch keeps executing
3. `conn.Write()` eventually blocks because KCP's send buffer fills up (no ACKs from dead client)
4. The write goroutine is now blocked on `conn.Write()`, not on the select statement
5. The `ticker.C` case never gets a chance to execute

The result is that both goroutines are stuck at the same time (not a classic deadlock, but with the same effect):
- `read()` blocks on `Read()` with no deadline
- `write()` blocks on `Write()` when the send buffer is full

**Note**: If the server is idle (not sending data), the `ticker.C` case may eventually fire and detect the timeout. This is why the bug may not be immediately apparent in simple test cases without continuous data flow.
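
The starvation of the ticker case can be shown with a small stand-alone sketch. It is not the due implementation: `heartbeatChecks` is a hypothetical helper, and waiting on the `stop` channel stands in for `conn.Write()` blocking on a full KCP send buffer.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// heartbeatChecks runs a write-loop-shaped goroutine for 100ms and returns
// how many times its ticker case ran during that window.
func heartbeatChecks(writeBlocks bool) int64 {
	var ticks atomic.Int64
	chWrite := make(chan []byte, 1)
	stop := make(chan struct{})
	done := make(chan struct{})
	chWrite <- []byte("state update") // one queued game-state push

	go func() {
		defer close(done)
		ticker := time.NewTicker(10 * time.Millisecond)
		defer ticker.Stop()
		for {
			select {
			case <-stop:
				return
			case <-chWrite:
				if writeBlocks {
					// Stand-in for conn.Write() blocking: the goroutine is
					// parked here and never re-enters the select.
					<-stop
					return
				}
			case <-ticker.C:
				ticks.Add(1) // where the heartbeat timeout check would run
			}
		}
	}()

	time.Sleep(100 * time.Millisecond)
	n := ticks.Load()
	close(stop)
	<-done
	return n
}

func main() {
	fmt.Println("ticker runs, write returns:", heartbeatChecks(false))
	fmt.Println("ticker runs, write blocks: ", heartbeatChecks(true))
}
```

When the write returns promptly the ticker case runs several times; when the write blocks it does not run at all, so a timeout check placed there cannot fire.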

## Proposed Fix

Add a read deadline to the KCP connection based on the heartbeat interval:

```go
func (c *serverConn) read() {
    conn := c.conn

    // Set read deadline based on heartbeat timeout (2x heartbeat interval)
    readTimeout := 2 * c.connMgr.server.opts.heartbeatInterval

    for {
        select {
        case <-c.close:
            return
        default:
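            // Refresh the read deadline before each read; close the
            // connection if the deadline cannot be set
            if readTimeout > 0 {
                if err := conn.SetReadDeadline(time.Now().Add(readTimeout)); err != nil {
                    _ = c.forceClose(true)
                    return
                }
            }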

            msg, err := packet.ReadMessage(conn)
            if err != nil {
                _ = c.forceClose(true)
                return
            }

            if c.connMgr.server.opts.heartbeatInterval > 0 {
                c.lastHeartbeatTime.Store(xtime.Now().UnixNano())
            }
            // ... rest of the function
        }
    }
}
```

This ensures that if no data is received within the timeout period, `Read()` returns a timeout error, which triggers `forceClose()` and properly fires the disconnect event.

## Impact

This bug affects any application using KCP transport where:
- Clients may disconnect unexpectedly (crashes, network issues, force-quit)
- The server needs to detect and handle disconnections
- Resources are tied to connection lifecycle (e.g., player sessions in games)

Without this fix, connections from dead clients remain open indefinitely, causing:
- Resource leaks (goroutines, memory for connection state)
- Incorrect online user counts
- Game state inconsistencies (players appear online but are actually gone)
- Failure to trigger cleanup logic in disconnect handlers

## Thought I had

The TCP transport (`network/tcp/server_conn.go`) has similar code without a read deadline, but TCP handles this differently at the OS level: when a client process exits or is killed, its OS closes the socket and sends FIN/RST, which causes the server's `Read()` to return an error. (A silent network loss can still go unnoticed on TCP until keepalive or a failed write surfaces it.) KCP, being UDP-based, has no such mechanism and relies entirely on application-level timeout handling.

