Commit d130f42
fix: refine websocket client reconnect (#31)
* docs: websocket reconnect redesign
* fix: refine LoroWebsocketClient reconnect
* fix: room auto rejoin and status listener
* chore: fix warnings
* fix: enhance error handling in LoroWebsocketClient callbacks and ensure proper logging
* fix: rm maxQueue limit to improve DX
* fix: connected promise maintain
* docs: simplify reconnect prd
* fix: prevent unhandled rejection if nobody awaits the connected promise
1 parent faab7df · commit d130f42

File tree: 3 files changed, +668 −80 lines
Lines changed: 51 additions & 0 deletions
# WebSocket Client Reconnect

Purpose: describe the desired reconnect behavior for the browser/node client in a concise, implementation‑agnostic way.
## Goals

- Stay connected across transient network issues without user code handling retries.
- Avoid tight retry loops when offline or after fatal server closes.
- Provide predictable hooks so apps can show status and react to failures.
## Connection Model

- States: `connecting`, `connected`, `reconnecting`, `disconnected`, `error`.
- The client starts connecting immediately. While retries are allowed, any disconnection moves to `reconnecting`; fatal conditions move to `disconnected`.
- A single promise (`waitConnected`) always resolves on the next successful transition to `connected`; it is renewed on each reconnect attempt (see the sketch after this list).
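A minimal sketch of the state union and a renewable `waitConnected` gate, assuming the commit's note that the promise must not surface an unhandled rejection when nobody awaits it; the class shape and every name other than `waitConnected` are illustrative, not the real client API.

```ts
// Sketch only: the ClientStatus values come from this doc; the gate class
// and its method names (other than waitConnected) are hypothetical.
type ClientStatus =
  | "connecting"
  | "connected"
  | "reconnecting"
  | "disconnected"
  | "error";

class ConnectedGate {
  private promise!: Promise<void>;
  private resolveFn!: () => void;
  private rejectFn!: (err: Error) => void;

  constructor() {
    this.renew();
  }

  // Renewed on each reconnect attempt so late callers always wait for
  // the *next* successful transition to `connected`.
  renew(): void {
    this.promise = new Promise<void>((resolve, reject) => {
      this.resolveFn = resolve;
      this.rejectFn = reject;
    });
    // Attach a no-op catch so a fatal rejection does not become an
    // unhandled rejection when nobody awaits the promise.
    this.promise.catch(() => {});
  }

  waitConnected(): Promise<void> {
    return this.promise;
  }

  markConnected(): void {
    this.resolveFn(); // next transition to `connected`
  }

  markFatal(err: Error): void {
    this.rejectFn(err); // fatal stop: awaiting callers see the error
  }
}
```

Callers who already hold the promise still observe its settlement; the internal `catch` only marks it handled.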
## Retry Policy

- Enabled by default; exponential backoff starting at ~0.5s, capped around 15s, with jitter (~25%) to prevent herding (see the sketch after this list).
- Retries continue indefinitely unless a maximum attempt count is configured.
- Fatal stop conditions halt retries (e.g., permission/auth failures, explicit fatal close codes or reasons). After a fatal stop, the client remains `disconnected` until manually retried.
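A minimal schedule matching the stated numbers (~0.5s base, ~15s cap, ~25% jitter); the constants and the function name are assumptions drawn from the bullets above, not the actual implementation.

```ts
// Assumed constants taken from the policy above.
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 15_000;
const JITTER_RATIO = 0.25;

// Delay before retry `attempt` (0-based): exponential, capped, ±25% jitter.
function backoffDelay(attempt: number): number {
  const capped = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
  // Random factor in [-JITTER_RATIO, +JITTER_RATIO] prevents reconnect herding.
  const jitter = capped * JITTER_RATIO * (Math.random() * 2 - 1);
  return Math.max(0, Math.round(capped + jitter));
}

// attempt 0 → ~500 ms, attempt 3 → ~4 s, attempt 5 and up → ~15 s (all ±25%).
```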
## Liveness & Half‑Open Detection

- Periodic application‑level pings are sent while connected.
- Missing pongs trigger a controlled close with a liveness reason, which then enters the normal backoff flow. This prevents silent half‑open sockets (see the sketch after this list).
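A sketch of the ping loop under assumed timings (10s ping interval, 5s pong deadline, application close code 4000); `sendPing` and the RTT callback stand in for the client's real internals.

```ts
// Assumed timing constants; the doc specifies the mechanism, not the values.
const PING_INTERVAL_MS = 10_000;
const PONG_DEADLINE_MS = 5_000;

function startLiveness(
  ws: WebSocket,
  sendPing: () => void,         // hypothetical app-level ping sender
  onRtt?: (ms: number) => void, // optional latency hook (see Observability)
) {
  let pongTimer: ReturnType<typeof setTimeout> | undefined;
  let sentAt = 0;

  const pingTimer = setInterval(() => {
    sentAt = Date.now();
    sendPing();
    pongTimer = setTimeout(() => {
      // No pong before the deadline: treat the link as half-open and
      // close with a liveness reason so the normal backoff flow runs.
      ws.close(4000, "liveness: pong timeout"); // 4000 is an assumed app code
    }, PONG_DEADLINE_MS);
  }, PING_INTERVAL_MS);

  return {
    // Call when a pong arrives: disarm the deadline, report the RTT.
    onPong(): void {
      clearTimeout(pongTimer);
      onRtt?.(Date.now() - sentAt);
    },
    // Call on close/teardown to stop both timers.
    stop(): void {
      clearInterval(pingTimer);
      clearTimeout(pongTimer);
    },
  };
}
```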
## Offline Behavior

- When the environment reports offline, active retries are paused and the socket is closed cleanly.
- When coming back online, a reconnect is scheduled immediately (backoff resets unless disabled); see the sketch after this list.
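A browser-only wiring sketch using the standard `online`/`offline` window events; `pauseRetries`, `closeCleanly`, and `scheduleReconnect` are hypothetical client internals standing in for whatever the real client exposes.

```ts
// Hypothetical internals the sketch assumes the client provides.
interface ReconnectingClient {
  pauseRetries(): void;
  closeCleanly(): void;
  scheduleReconnect(opts: { resetBackoff: boolean }): void;
}

function wireOfflineHandling(client: ReconnectingClient): void {
  // Node has no window online/offline events; skip there.
  if (typeof window === "undefined") return;

  window.addEventListener("offline", () => {
    client.pauseRetries(); // stop the backoff timer instead of spinning
    client.closeCleanly(); // close the socket rather than letting it time out
  });

  window.addEventListener("online", () => {
    // Reconnect immediately; backoff resets by default per the doc.
    client.scheduleReconnect({ resetBackoff: true });
  });
}
```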
## Join Handling

- `join` calls issued while the socket is not yet open are enqueued and flushed after connect (see the sketch after this list).
- The queue is unbounded by design; applications concerned about backpressure should gate their own join volume.
- Each join exposes optional per‑room status callbacks: `connecting`, `joined`, `reconnecting`, `disconnected`, `error`.
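A sketch of the unbounded join queue; `RoomJoin` mirrors the per-room callbacks named above, but the field names and the `send` signature are assumptions.

```ts
type RoomStatus =
  | "connecting"
  | "joined"
  | "reconnecting"
  | "disconnected"
  | "error";

// Hypothetical join descriptor; fields mirror the doc's tracked data.
interface RoomJoin {
  roomId: string;
  crdtType: string;
  auth?: Uint8Array;
  onStatus?: (status: RoomStatus) => void;
}

const pendingJoins: RoomJoin[] = []; // unbounded by design

function join(socketOpen: boolean, send: (j: RoomJoin) => void, j: RoomJoin): void {
  if (socketOpen) {
    send(j);
    return;
  }
  // Socket not open yet: report `connecting` and queue for the flush.
  j.onStatus?.("connecting");
  pendingJoins.push(j);
}

// Called on every transition to `connected`.
function flushJoins(send: (j: RoomJoin) => void): void {
  while (pendingJoins.length > 0) {
    send(pendingJoins.shift()!);
  }
}
```

Queued joins never throw; they simply wait for the next flush, matching the success criteria below.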
## Room Rejoin

- Successfully joined rooms are tracked (room id + CRDT type + auth bytes).
- After reconnect, the client automatically resends a JoinRequest for each tracked room.
- If a rejoin fails fatally, the room moves to `error` and is removed from the tracked set so callers can decide next steps (see the sketch after this list).
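A bookkeeping sketch for auto rejoin; the `TrackedRoom` shape follows the first bullet (room id + CRDT type + auth bytes), while the function names and the `sendJoinRequest` signature are illustrative.

```ts
interface TrackedRoom {
  roomId: string;
  crdtType: string;
  auth?: Uint8Array;
}

const trackedRooms = new Map<string, TrackedRoom>();

// Record a room once its join succeeds so reconnects can replay it.
function trackJoined(room: TrackedRoom): void {
  trackedRooms.set(room.roomId, room);
}

// After each reconnect, resend a JoinRequest for every tracked room.
async function rejoinAll(
  sendJoinRequest: (r: TrackedRoom) => Promise<void>, // hypothetical sender
  onRoomError: (roomId: string, err: unknown) => void,
): Promise<void> {
  for (const room of trackedRooms.values()) {
    try {
      await sendJoinRequest(room);
    } catch (err) {
      // Fatal rejoin: stop tracking and surface `error` so the caller
      // can decide whether to join again.
      trackedRooms.delete(room.roomId);
      onRoomError(room.roomId, err);
    }
  }
}
```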
## Manual Controls

- `connect({ resetBackoff?: boolean })` or `retryNow()` starts/forces a reconnect and optionally resets backoff.
- `close()` stops auto‑reconnect and transitions to `disconnected`; callers must explicitly reconnect afterwards (see the sketch after this list).
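An illustrative call sequence for the controls named above; only `connect`, `retryNow`, `close`, and `waitConnected` come from this doc, and the surrounding client shape is assumed.

```ts
// Assumed client surface; only the method names come from the doc.
declare const client: {
  connect(opts?: { resetBackoff?: boolean }): void;
  retryNow(): void;
  close(): void;
  waitConnected(): Promise<void>;
};

async function manualReconnect(): Promise<void> {
  client.close();                         // stop auto-reconnect → `disconnected`
  client.connect({ resetBackoff: true }); // explicit reconnect, fresh backoff
  await client.waitConnected();           // resolves on next `connected`
}
```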
## Observability Hooks

- Client status listener: notifies transitions among the top‑level states.
- Per‑room status listener: notifies the per‑room states listed above.
- Optional latency callback fed by ping RTT measurements (see the wiring sketch after this list).
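A wiring sketch for the three hooks, reusing the `ClientStatus` and `RoomStatus` unions from the sketches above; the registration method names (`onStatusChange`, `onRoomStatusChange`, `onLatency`) are assumptions, not the actual API.

```ts
// Assumed registration methods; the doc names the hooks, not the API.
declare const client: {
  onStatusChange(cb: (status: ClientStatus) => void): void;
  onRoomStatusChange(cb: (roomId: string, status: RoomStatus) => void): void;
  onLatency(cb: (rttMs: number) => void): void;
};

client.onStatusChange((status) => {
  console.log("client:", status);         // top-level state transitions
});
client.onRoomStatusChange((roomId, status) => {
  console.log(`room ${roomId}:`, status); // per-room transitions
});
client.onLatency((rttMs) => {
  console.log("ping RTT:", rttMs, "ms");  // fed by the liveness ping loop
});
```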
## Success Criteria

- Retries pause while offline and resume promptly when online.
- Missing pongs or half‑open links recover via reconnect.
- Fatal closes stop retries; manual retry is still possible.
- Queued joins do not throw and complete once connected; failed rejoins surface as `error` so apps can respond.
