# WebSocket Client Reconnect

Purpose: describe the desired reconnect behavior for the browser/node client in a concise, implementation‑agnostic way.

## Goals
- Stay connected across transient network issues without user code handling retries.
- Avoid tight retry loops when offline or after fatal server closes.
- Provide predictable hooks so apps can show status and react to failures.

## Connection Model
- States: `connecting`, `connected`, `reconnecting`, `disconnected`, `error`.
- The client starts connecting immediately. While retries are still permitted, any disconnection moves the client to `reconnecting`; fatal conditions move it to `disconnected`.
- A single promise (`waitConnected`) always resolves on the next successful transition to `connected`; it is renewed on each reconnect attempt (sketched below).
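
A minimal sketch of how that renewable promise might be modeled; apart from `waitConnected`, the names here are illustrative, not a confirmed API:

```ts
class ConnectionGate {
  private current!: Promise<void>;
  private resolveCurrent!: () => void;

  constructor() {
    this.renew();
  }

  // Called when a (re)connect attempt begins: hand out a fresh pending promise.
  renew(): void {
    this.current = new Promise<void>((resolve) => {
      this.resolveCurrent = resolve;
    });
  }

  // Called on the transition to `connected`: release all current waiters.
  markConnected(): void {
    this.resolveCurrent();
  }

  // Always resolves on the next successful transition to `connected`.
  waitConnected(): Promise<void> {
    return this.current;
  }
}
```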

## Retry Policy
- Enabled by default; exponential backoff starting at ~0.5s, capped around 15s, with jitter (~25%) to prevent herding (see the sketch after this list).
- Retries continue indefinitely unless a maximum attempt count is configured.
- Fatal stop conditions halt retries (e.g., permission/auth failures, explicit fatal close codes or reasons). After a fatal stop, the client remains `disconnected` until manually retried.
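
A delay function matching that policy could look roughly like this; the exact constants and jitter formula are assumptions consistent with the numbers above:

```ts
const BASE_DELAY_MS = 500;   // ~0.5s starting point
const MAX_DELAY_MS = 15_000; // ~15s cap
const JITTER_RATIO = 0.25;   // ~25% jitter

function nextDelayMs(attempt: number): number {
  // attempt 0 -> 0.5s, 1 -> 1s, 2 -> 2s, ... clamped at the cap.
  const capped = Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
  // Spread clients by up to ±25% so a fleet doesn't reconnect in lockstep.
  const jitter = capped * JITTER_RATIO * (2 * Math.random() - 1);
  return Math.max(0, Math.round(capped + jitter));
}
```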

## Liveness & Half‑Open Detection
- Periodic application‑level pings are sent while connected.
- Missing pongs trigger a controlled close with a liveness reason, which then enters the normal backoff flow. This prevents silent half‑open sockets (a sketch follows).
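
One way the watchdog could look; the ping interval and miss budget are assumed values, and `sendPing`/`closeWithReason` stand in for client internals:

```ts
const PING_INTERVAL_MS = 10_000; // assumed cadence
const MAX_MISSED_PONGS = 2;      // assumed miss budget

function startLivenessTimer(
  sendPing: () => void,
  closeWithReason: (reason: string) => void,
): { onPong: () => void; stop: () => void } {
  let missed = 0;
  const timer = setInterval(() => {
    if (missed >= MAX_MISSED_PONGS) {
      clearInterval(timer);
      // Controlled close with a liveness reason; normal backoff takes over.
      closeWithReason("liveness: missed pongs");
      return;
    }
    missed += 1; // Reset by onPong when a pong arrives in time.
    sendPing();
  }, PING_INTERVAL_MS);

  return {
    onPong: () => { missed = 0; },    // wire into the pong message handler
    stop: () => clearInterval(timer), // call on clean shutdown
  };
}
```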

## Offline Behavior
- When the environment reports offline, active retries are paused and the socket is closed cleanly.
- When coming back online, a reconnect is scheduled immediately (backoff resets unless disabled); see the browser sketch below.
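
In a browser this could hang off the standard `online`/`offline` events, roughly as follows; the client hooks named here are hypothetical:

```ts
function watchConnectivity(client: {
  pauseRetries: () => void;
  closeCleanly: () => void;
  resumeWithFreshBackoff: () => void;
}): void {
  window.addEventListener("offline", () => {
    // Known-down network: stop burning retry attempts and close the socket.
    client.pauseRetries();
    client.closeCleanly();
  });
  window.addEventListener("online", () => {
    // Back online: reconnect immediately, with backoff reset unless disabled.
    client.resumeWithFreshBackoff();
  });
}
```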

## Join Handling
- `join` calls issued while the socket is not yet open are enqueued and flushed after connect (sketched after this list).
- The queue is unbounded by design; applications concerned about backpressure should gate their own join volume.
- Each join exposes optional per‑room status callbacks: `connecting`, `joined`, `reconnecting`, `disconnected`, `error`.
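
A sketch of the enqueue-then-flush behavior; the class shape and `sendJoin` callback are illustrative:

```ts
class JoinQueue {
  private pending: Array<() => void> = [];
  private open = false;

  constructor(private sendJoin: (roomId: string) => void) {}

  join(roomId: string): void {
    if (this.open) {
      this.sendJoin(roomId); // Socket open: send right away.
    } else {
      // Unbounded by design; callers gate their own join volume.
      this.pending.push(() => this.sendJoin(roomId));
    }
  }

  // On the transition to `connected`: flush queued joins in arrival order.
  onConnected(): void {
    this.open = true;
    for (const send of this.pending.splice(0)) send();
  }

  onDisconnected(): void {
    this.open = false;
  }
}
```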

## Room Rejoin
- Successfully joined rooms are tracked (room id + CRDT type + auth bytes).
- After reconnect, the client automatically resends a JoinRequest for each tracked room (sketched below).
- If a rejoin fails fatally, the room moves to `error` and is removed from the tracked set so callers can decide next steps.
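
Roughly, the rejoin bookkeeping could look like this; the tracked fields mirror the list above, while `sendJoinRequest` and `markRoomError` are hypothetical hooks:

```ts
interface TrackedRoom {
  roomId: string;
  crdtType: string;
  authBytes: Uint8Array;
}

const trackedRooms = new Map<string, TrackedRoom>();

// After reconnect: resend a JoinRequest for every previously joined room.
function rejoinAll(sendJoinRequest: (room: TrackedRoom) => void): void {
  for (const room of trackedRooms.values()) {
    sendJoinRequest(room);
  }
}

// On a fatal rejoin failure: surface `error` and stop tracking the room,
// leaving the next step to the caller.
function onFatalRejoin(
  roomId: string,
  markRoomError: (id: string) => void,
): void {
  markRoomError(roomId);
  trackedRooms.delete(roomId);
}
```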

## Manual Controls
- `connect({ resetBackoff?: boolean })` or `retryNow()` starts/forces a reconnect and optionally resets backoff.
- `close()` stops auto‑reconnect and transitions to `disconnected`; callers must explicitly reconnect afterwards (usage sketched below).
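
Hypothetical usage, assuming a client object that exposes exactly these methods:

```ts
declare const client: {
  connect(opts?: { resetBackoff?: boolean }): void;
  retryNow(): void;
  close(): void;
};

// After a fatal stop (state `disconnected`), force a fresh attempt:
client.connect({ resetBackoff: true });

// Or force an attempt now instead of waiting out the current delay:
client.retryNow();

// Tear down: no auto-reconnect until connect() is called again.
client.close();
```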

## Observability Hooks
- Client status listener: notifies transitions among the top‑level states.
- Per‑room status listener: notifies the per‑room states listed above.
- Optional latency callback fed by ping RTT measurements. One plausible shape for all three follows.
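
A possible hook surface, layered on the states this document defines; the listener names and signatures are assumptions:

```ts
type ClientStatus =
  | "connecting" | "connected" | "reconnecting" | "disconnected" | "error";
type RoomStatus =
  | "connecting" | "joined" | "reconnecting" | "disconnected" | "error";

interface ReconnectHooks {
  onStatus?: (status: ClientStatus) => void;                   // top-level transitions
  onRoomStatus?: (roomId: string, status: RoomStatus) => void; // per-room transitions
  onLatency?: (rttMs: number) => void;                         // ping RTT samples
}

// Example: drive a status badge and a latency readout.
const hooks: ReconnectHooks = {
  onStatus: (s) => console.log(`client: ${s}`),
  onRoomStatus: (room, s) => console.log(`room ${room}: ${s}`),
  onLatency: (rtt) => console.log(`rtt ${rtt} ms`),
};
```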

## Success Criteria
- Retries pause while offline and resume promptly when online.
- Missing pongs or half‑open links recover via reconnect.
- Fatal closes stop retries; manual retry is still possible.
- Queued joins do not throw and complete once connected; failed rejoins surface as `error` so apps can respond.