KCP Network: Disconnect Events Never Fire When Client Force-Closes
Summary
When using the KCP network transport, if a client force-closes (process killed, network disconnected, crash, etc.), the server never fires a disconnect event. The connection remains in an open state indefinitely, and disconnectHandler is never called.
Environment
- due version: v2.4.2
- Go version: 1.23+
- Network transport: KCP (github.com/dobyte/due/network/kcp/v2)
When This Occurs
My multiplayer game architecture:
- Gate - KCP server with heartbeat enabled (10s interval), uses Redis locator and Consul registry
- Lobby Node - Handles player join, calls BindGate() and BindNode() to lobby
- Match-Maker Node - Manages matchmaking queue (Redis-based), finds opponents, delivers to game node
- Game Node - Handles active game sessions, game loop broadcasts state at 50ms (20 Hz)
Flow:
- Client connects to gate via KCP
- Client sends ROUTE_LOBBY_JOIN to lobby node
- Lobby node calls ctx.BindGate(uid) to bind user to gate
- Lobby node calls proxy.BindNode(ctx, uid, "lobby", nodeID) to bind user to lobby
- Client sends ROUTE_MATCH_QUEUE_REQUESTED to join matchmaking
- Lobby forwards request to match-maker node via proxy.Deliver()
- Match-maker adds player to Redis queue, waits for opponent
- When two players matched, match-maker calls proxy.Deliver() to game node with both player IDs
- Game node creates game instance, pushes ROUTE_MATCH_JOINED to both players
- Each client sends ROUTE_MATCH_READY, game node calls ctx.BindNode() to bind each player to game node
- When both ready, game loop starts - broadcasts state via proxy.Push() every 50ms to both players
- One client force-closes (kill process, Alt+F4, crash, network loss)
- Disconnect event never fires - game keeps broadcasting to dead client forever
- See this error: conn.go:195 connection heartbeat timeout on the Game Node and Gate
Key conditions:
- User is bound to game node via ctx.BindNode() in the ready handler
- Game loop continuously calls proxy.Push() to send state updates (50ms tick rate)
If the server is idle (not sending data), the heartbeat timeout in write() may eventually trigger. But in real-time games where the server constantly pushes updates, the issue is guaranteed to occur.
Expected Behavior
The server should detect that the client is no longer responding within 2 * heartbeatInterval (20 seconds with default settings) and:
- Close the connection
- Fire the disconnect event via disconnectHandler
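To make the expected timing concrete, here is a minimal sketch of the check, assuming the last heartbeat is tracked as a Unix-nanosecond timestamp. heartbeatExpired is a hypothetical helper written for illustration, not a function from due, and the 10s interval is the one from my gate configuration.

```go
package main

import (
	"fmt"
	"time"
)

// heartbeatExpired is a hypothetical helper (not part of due). It reports
// whether more than 2 * interval has passed since the last heartbeat.
// Both timestamps are Unix nanoseconds; interval <= 0 disables the check.
func heartbeatExpired(lastNano, nowNano int64, interval time.Duration) bool {
	if interval <= 0 {
		return false
	}
	return nowNano-lastNano > int64(2*interval)
}

func main() {
	interval := 10 * time.Second
	last := time.Now()

	// 19s of silence is still inside the 20s window, 21s is past it.
	fmt.Println(heartbeatExpired(last.UnixNano(), last.Add(19*time.Second).UnixNano(), interval))
	fmt.Println(heartbeatExpired(last.UnixNano(), last.Add(21*time.Second).UnixNano(), interval))
}
```

With a 10s interval, the connection should be considered dead once the client has been silent for more than 20s.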
Actual Behavior
The server blocks indefinitely waiting for data from the dead client. The disconnect event is never fired.
Root Cause Analysis
The issue is in network/kcp/server_conn.go in the read() function:

    func (c *serverConn) read() {
        conn := c.conn
        for {
            select {
            case <-c.close:
                return
            default:
                msg, err := packet.ReadMessage(conn) // BLOCKS FOREVER
                if err != nil {
                    _ = c.forceClose(true)
                    return
                }
                // ...
            }
        }
    }

The packet.ReadMessage(conn) call eventually calls KCP's Read() method, which blocks indefinitely when no read deadline is set. Looking at github.com/xtaci/kcp-go/v5/sess.go:

    func (s *UDPSession) Read(b []byte) (n int, err error) {
        var timeout *time.Timer
        var c <-chan time.Time
        if !s.rd.IsZero() { // Read deadline
            delay := time.Until(s.rd)
            timeout = time.NewTimer(delay)
            c = timeout.C
        }
        for {
            // ... try to read data ...
            select {
            case <-s.chReadEvent: // Data available
            case <-c: // Timeout (only if deadline set!)
                return 0, errTimeout
            case <-s.chSocketReadError:
            case <-s.die:
            }
        }
    }

If no read deadline is set (s.rd.IsZero() is true), the timeout channel c is nil, and the select will never return a timeout error.
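The nil-channel behaviour can be reproduced without KCP. The following is a simplified stand-in for that select, not kcp-go's code: waitForRead and outcome are hypothetical names, readEvent and die play the roles of chReadEvent and die, and the 50ms/200ms timings are arbitrary.

```go
package main

import (
	"fmt"
	"time"
)

// waitForRead mimics the shape of the select in Read(). A zero deadline
// leaves the timeout channel nil, so that case can never be chosen.
func waitForRead(deadline time.Time, readEvent, die <-chan struct{}) string {
	var timeout <-chan time.Time // nil unless a deadline is set
	if !deadline.IsZero() {
		timer := time.NewTimer(time.Until(deadline))
		defer timer.Stop()
		timeout = timer.C
	}
	select {
	case <-readEvent:
		return "data"
	case <-timeout: // a receive from a nil channel never proceeds
		return "timeout"
	case <-die:
		return "closed"
	}
}

// outcome runs waitForRead against a peer that never sends anything.
// The session is closed externally after 200ms so the call always returns.
func outcome(withDeadline bool) string {
	readEvent := make(chan struct{}) // never signalled: the peer is gone
	die := make(chan struct{})
	closer := time.AfterFunc(200*time.Millisecond, func() { close(die) })
	defer closer.Stop()

	var deadline time.Time // zero value: no deadline
	if withDeadline {
		deadline = time.Now().Add(50 * time.Millisecond)
	}
	return waitForRead(deadline, readEvent, die)
}

func main() {
	fmt.Println("with deadline:   ", outcome(true))
	fmt.Println("without deadline:", outcome(false))
}
```

With a deadline the wait ends by itself with "timeout". Without one it only ends when something else closes the session, which in the server is exactly what never happens.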
Why the heartbeat timeout in write() does not work
The write() goroutine has heartbeat timeout logic in the ticker.C case:

    case <-ticker.C:
        if lastHB < deadline {
            _ = c.forceClose(true)
            return
        }

However, this case can never fire when the server is continuously sending data:
- When a node is bound via BindNode() and sends frequent updates (e.g., game state at 50ms intervals)
- The chWrite channel always has data, so the case r, ok := <-c.chWrite always executes
- conn.Write() eventually blocks because KCP's send buffer fills up (no ACKs from dead client)
- The write goroutine is now blocked on conn.Write(), not on the select statement
- The ticker.C case never gets a chance to execute
This leaves both connection goroutines blocked with nothing left to wake them:
- read() blocks on Read() with no deadline
- write() blocks on Write() when the send buffer is full
Note: If the server is idle (not sending data), the ticker.C case may eventually fire and detect the timeout. This is why the bug may not be immediately apparent in simple test cases without continuous data flow.
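The starvation of the ticker case can be shown in isolation. This is a rough simulation under stated assumptions, not due's write loop: heartbeatChecks is a hypothetical function, a receive on the release channel stands in for conn.Write() stuck on a full send buffer, and the 20ms/200ms timings are arbitrary.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// heartbeatChecks runs a write loop shaped like the one described above for
// 200ms and returns how many times the ticker case ran in that window.
// When writeBlocks is true, handling the queued message never returns
// until the observer releases it.
func heartbeatChecks(writeBlocks bool) int64 {
	var checks atomic.Int64

	chWrite := make(chan struct{}, 1)
	chWrite <- struct{}{} // one queued message, as if the game loop had pushed state
	release := make(chan struct{})
	done := make(chan struct{})

	go func() {
		defer close(done)
		ticker := time.NewTicker(20 * time.Millisecond)
		defer ticker.Stop()
		for {
			select {
			case <-release:
				return
			case <-chWrite:
				if writeBlocks {
					<-release // stand-in for a Write() that cannot make progress
					return
				}
			case <-ticker.C:
				checks.Add(1) // the heartbeat timeout check would run here
			}
		}
	}()

	time.Sleep(200 * time.Millisecond)
	n := checks.Load()
	close(release)
	<-done
	return n
}

func main() {
	fmt.Println("checks while Write is blocked:", heartbeatChecks(true))
	fmt.Println("checks while Write returns:   ", heartbeatChecks(false))
}
```

While the goroutine sits inside the blocked write it never returns to the select, so the heartbeat check count stays at zero no matter how many ticks elapse.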
Proposed Fix
Add a read deadline to the KCP connection based on the heartbeat interval:

    func (c *serverConn) read() {
        conn := c.conn
        // Set read deadline based on heartbeat timeout (2x heartbeat interval)
        readTimeout := 2 * c.connMgr.server.opts.heartbeatInterval
        for {
            select {
            case <-c.close:
                return
            default:
                // Set read deadline before each read
                if readTimeout > 0 {
                    _ = conn.SetReadDeadline(time.Now().Add(readTimeout))
                }
                msg, err := packet.ReadMessage(conn)
                if err != nil {
                    _ = c.forceClose(true)
                    return
                }
                if c.connMgr.server.opts.heartbeatInterval > 0 {
                    c.lastHeartbeatTime.Store(xtime.Now().UnixNano())
                }
                // ... rest of the function
            }
        }
    }

This ensures that if no data is received within the timeout period, Read() returns a timeout error, which triggers forceClose() and properly fires the disconnect event.
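The effect of setting a deadline before each read can be checked with the standard library alone. This sketch uses net.Pipe as a stand-in for the KCP session, so the error value is os.ErrDeadlineExceeded rather than kcp-go's own timeout error; readOnce, silentPeerTimesOut and livePeerReads are hypothetical helpers, and the timeouts are arbitrary.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// readOnce sets a read deadline (when timeout > 0) and performs one read,
// mirroring the order of operations in the proposed fix.
func readOnce(conn net.Conn, timeout time.Duration) error {
	if timeout > 0 {
		if err := conn.SetReadDeadline(time.Now().Add(timeout)); err != nil {
			return err
		}
	}
	buf := make([]byte, 64)
	_, err := conn.Read(buf)
	return err
}

// silentPeerTimesOut reports whether reading from a peer that never writes
// ends with a deadline error instead of blocking forever.
func silentPeerTimesOut() bool {
	server, client := net.Pipe()
	defer server.Close()
	defer client.Close()
	err := readOnce(server, 50*time.Millisecond)
	return errors.Is(err, os.ErrDeadlineExceeded)
}

// livePeerReads reports whether a peer that does send data is read normally.
func livePeerReads() bool {
	server, client := net.Pipe()
	defer server.Close()
	defer client.Close()
	go func() { _, _ = client.Write([]byte("ping")) }()
	return readOnce(server, time.Second) == nil
}

func main() {
	fmt.Println("silent peer times out:", silentPeerTimesOut())
	fmt.Println("live peer reads fine: ", livePeerReads())
}
```

Because the deadline is re-armed before every read, a client that keeps sending heartbeats is never cut off, while a dead one surfaces as a read error after one timeout period.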
Impact
This bug affects any application using KCP transport where:
- Clients may disconnect unexpectedly (crashes, network issues, force-quit)
- The server needs to detect and handle disconnections
- Resources are tied to connection lifecycle (e.g., player sessions in games)
Without this fix, connections from dead clients remain open indefinitely, causing:
- Resource leaks (goroutines, memory for connection state)
- Incorrect online user counts
- Game state inconsistencies (players appear online but are actually gone)
- Failure to trigger cleanup logic in disconnect handlers
Thought I had
The TCP transport (network/tcp/server_conn.go) has similar code without a read deadline, but TCP handles most of these cases at the OS level - when the client process is killed or crashes, the client's OS closes the socket and sends FIN/RST packets, which cause Read() to return an error on the server. KCP, being UDP-based, has no such mechanism and relies entirely on application-level timeout handling.