# Server: batched SUB command processing

Implementation plan for Part 1 of [RFC 2026-03-28-subscription-performance](../rfcs/2026-03-28-subscription-performance.md).

## Current state

When a batch of ~135 SUB commands arrives, the server already batches:
- Queue record lookups (`getQueueRecs` in `receive`, Server.hs:1151)
- Command verification (`verifyLoadedQueue`, Server.hs:1152)

Command processing, however, is per-command (`foldrM process` in `client`, Server.hs:1372-1375). Each SUB calls `subscribeQueueAndDeliver`, which calls `tryPeekMsg`: one DB query per queue. For Postgres, that is ~135 individual `SELECT ... FROM messages WHERE recipient_id = ? ORDER BY message_id ASC LIMIT 1` queries per batch.

## Goal

Replace ~135 individual message peek queries with 1 batched query per batch. No protocol changes.

## Implementation

### Step 1: Add `tryPeekMsgs` to MsgStoreClass

File: `src/Simplex/Messaging/Server/MsgStore/Types.hs`

Add to `MsgStoreClass`:

```haskell
tryPeekMsgs :: s -> [StoreQueue s] -> ExceptT ErrorType IO (Map RecipientId Message)
```

Returns a map from recipient ID to earliest pending message for each queue that has one. Queues with no messages are absent from the map.
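
The following is a minimal, compilable sketch of the class change. It makes some simplifying assumptions. `StoreQueue` is a plain record here, whereas in the real class it is the associated type `StoreQueue s`. `Message`, `ErrorType` and `RecipientId` are hypothetical stand-ins. The toy `MemStore` instance only makes the contract concrete: one map entry per requested queue that has a pending message, holding the earliest one.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Except (ExceptT, runExceptT)
import Data.IORef (IORef, newIORef, readIORef)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as M
import Data.Maybe (listToMaybe)
import qualified Data.Set as S

-- Hypothetical stand-ins for the server's types.
type RecipientId = String
data Message = Message {msgId :: Int, msgBody :: String} deriving (Eq, Show)
data ErrorType = INTERNAL deriving (Eq, Show)
newtype StoreQueue = StoreQueue {recipientId :: RecipientId}

class MsgStoreClass s where
  -- Existing method: earliest pending message of a single queue.
  tryPeekMsg :: s -> StoreQueue -> ExceptT ErrorType IO (Maybe Message)
  -- New method: earliest pending message per queue; empty queues are absent.
  tryPeekMsgs :: s -> [StoreQueue] -> ExceptT ErrorType IO (Map RecipientId Message)

-- Toy store (hypothetical): recipient ID -> pending messages, oldest first.
newtype MemStore = MemStore (IORef (Map RecipientId [Message]))

instance MsgStoreClass MemStore where
  tryPeekMsg (MemStore ref) q = do
    st <- liftIO (readIORef ref)
    pure (M.lookup (recipientId q) st >>= listToMaybe)
  tryPeekMsgs (MemStore ref) qs = do
    st <- liftIO (readIORef ref)
    -- Keep only the requested queues, then take the head of each non-empty list.
    pure (M.mapMaybe listToMaybe (M.restrictKeys st (S.fromList (map recipientId qs))))
```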

### Step 2: Parameterize `deliver` to accept a pre-fetched message

File: `src/Simplex/Messaging/Server.hs`

Currently `deliver` (inside `subscribeQueueAndDeliver`, line 1641) calls `tryPeekMsg ms q`. Add a parameter for an optional pre-fetched peek result:

```haskell
deliver :: Maybe (Maybe Message) -> (Bool, Maybe Sub) -> M s ResponseAndMessage
deliver prefetched_ (hasSub, sub_) = do
  stats <- asks serverStats
  fmap (either ((,Nothing) . err) id) $ liftIO $ runExceptT $ do
    -- Nothing: not pre-fetched, peek individually; Just msg_: use the batched result
    msg_ <- maybe (tryPeekMsg ms q) pure prefetched_
    ...
```

When `Nothing` is passed, `deliver` falls back to an individual `tryPeekMsg` (existing behavior). When `Just msg_` is passed, it uses `msg_` directly (batched path). This includes `Just Nothing` for a queue that the batch found empty. The nested `Maybe` is needed because queues without messages are absent from the `tryPeekMsgs` map. With a flat `Maybe Message`, an empty queue would be indistinguishable from one that was not pre-fetched, and it would still cost an individual query.
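
The following sketches the fallback rule in isolation, assuming the parameter is `Maybe (Maybe Message)`. `resolveMsg` and `countingPeek` are hypothetical helpers, and the counter stands in for individual DB queries. It shows that only the "not pre-fetched" case issues a query. A pre-fetched queue with no pending message does not.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for the server's message type.
newtype Message = Message {msgId :: Int} deriving (Eq, Show)

-- Nothing: not pre-fetched, so run the individual peek (existing path).
-- Just msg_: pre-fetched; msg_ is Nothing when the queue had no message.
resolveMsg :: Monad m => m (Maybe Message) -> Maybe (Maybe Message) -> m (Maybe Message)
resolveMsg peek = maybe peek pure

-- Toy peek that counts individual "DB queries" before returning a stored result.
countingPeek :: IORef Int -> Maybe Message -> IO (Maybe Message)
countingPeek counter stored = do
  modifyIORef' counter (+ 1)
  pure stored
```

With a flat `Maybe Message` parameter, the first case below (`Just Nothing`) would collapse into `Nothing` and trigger the query.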

### Step 3: Pre-fetch messages before the processing loop

File: `src/Simplex/Messaging/Server.hs`

Currently (lines 1372-1375):

```haskell
forever $
  atomically (readTBQueue rcvQ)
    >>= foldrM process ([], [])
    >>= \(rs_, msgs) -> ...
```

Add a pre-fetch step before the existing loop:

```haskell
forever $ do
  batch <- atomically (readTBQueue rcvQ)
  -- one batched peek for all verified SUBs in this batch
  msgMap <- prefetchMsgs batch
  foldrM (process msgMap) ([], []) batch
    >>= \(rs_, msgs) -> ...
```

`prefetchMsgs` scans the batch and collects the queues of SUB commands that have a verified queue (`q_ = Just (q, _)`). It calls `tryPeekMsgs` once and returns the map. For batches with no SUBs it returns an empty map without making a DB call.

`process` looks up each pre-fetched SUB in the map and passes the result through `processCommand` down to `deliver`. Commands that were not pre-fetched pass `Nothing`.

The `foldrM process` loop, `processCommand`, `subscribeQueueAndDeliver` and all other command handlers stay structurally the same. `process`, `processCommand` and `subscribeQueueAndDeliver` only thread the pre-fetched value through to `deliver`, which is the only function that uses it. The `client` loop gains one pre-fetch call.
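
The following is one possible shape for `prefetchMsgs`. It is written against simplified, hypothetical types: `Cmd`, `BatchItem`, and a `peekMany` function standing in for `tryPeekMsgs`. The sketch records every pre-fetched queue and maps empty ones to `Nothing`. As a result, `M.lookup` in `process` yields `Nothing` for "not pre-fetched" and `Just msg_` for "pre-fetched". This sketch also assumes that a failed batched call is treated as "nothing pre-fetched", so every SUB falls back to its own `tryPeekMsg`.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as M

-- Hypothetical simplified types; the real batch holds verified transmissions.
type RecipientId = String
newtype Message = Message {msgId :: Int} deriving (Eq, Show)
data Cmd = SUB | ACK | GET deriving (Eq, Show)
data ErrorType = AUTH deriving (Eq, Show)

-- Right (command, verified queue), or a verification error from `receive`.
type BatchItem = Either ErrorType (Cmd, Maybe RecipientId)

-- Collects verified SUB queues and makes one batched call. Every pre-fetched
-- queue gets an entry (Nothing = empty), so a lookup miss means "not pre-fetched".
prefetchMsgs ::
  ([RecipientId] -> IO (Either ErrorType (Map RecipientId Message))) ->
  [BatchItem] ->
  IO (Map RecipientId (Maybe Message))
prefetchMsgs peekMany batch
  | null subQs = pure M.empty -- no SUBs: no DB call
  | otherwise = do
      r <- peekMany subQs
      pure $ case r of
        Right found -> M.fromList [(q, M.lookup q found) | q <- subQs]
        Left _ -> M.empty -- batch failed: every SUB falls back to tryPeekMsg
  where
    -- Left items (verification errors) and non-SUB commands are skipped.
    subQs = [q | Right (SUB, Just q) <- batch]
```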

### Step 4: Review

Review the typeclass signature and server usage. Confirm the interface has the right shape before implementing store backends.

### Step 5: Implement for each store backend

#### Postgres

File: `src/Simplex/Messaging/Server/MsgStore/Postgres.hs`

Single query using `DISTINCT ON`:

```sql
-- DISTINCT ON keeps the first row per recipient_id in ORDER BY order,
-- i.e. each queue's earliest message
SELECT DISTINCT ON (recipient_id)
  recipient_id, msg_id, msg_ts, msg_quota, msg_ntf_flag, msg_body
FROM messages
WHERE recipient_id IN ? -- the batch's recipient IDs
ORDER BY recipient_id, message_id ASC
```

Build the `Map RecipientId Message` from the result rows. `DISTINCT ON` returns at most one row per queue.
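
The following sketches the row-to-map step. It assumes each result row is decoded into a tuple in `SELECT` column order, and that `msg_quota` selects between a regular message and a quota message. The tuple types and the two-constructor `Message` below are hypothetical stand-ins, not the server's actual types or row parser.

```haskell
import Data.Int (Int64)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as M

-- Hypothetical stand-ins; the real types and row decoding differ.
type RecipientId = String
type MsgId = String

data Message
  = Message {msgId :: MsgId, msgTs :: Int64, msgNtf :: Bool, msgBody :: String}
  | MessageQuota {msgId :: MsgId, msgTs :: Int64}
  deriving (Eq, Show)

-- Columns in SELECT order:
-- recipient_id, msg_id, msg_ts, msg_quota, msg_ntf_flag, msg_body
type MsgRow = (RecipientId, MsgId, Int64, Bool, Bool, String)

rowToEntry :: MsgRow -> (RecipientId, Message)
rowToEntry (rId, mId, ts, isQuota, ntf, body)
  | isQuota = (rId, MessageQuota mId ts)
  | otherwise = (rId, Message mId ts ntf body)

-- DISTINCT ON (recipient_id) yields at most one row per queue,
-- so M.fromList never has to resolve duplicate keys.
rowsToMap :: [MsgRow] -> Map RecipientId Message
rowsToMap = M.fromList . map rowToEntry
```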

#### STM

File: `src/Simplex/Messaging/Server/MsgStore/STM.hs`

Loop over queues, call `tryPeekMsg` for each, collect into map.

#### Journal

File: `src/Simplex/Messaging/Server/MsgStore/Journal.hs`

Loop over queues, call `tryPeekMsg` for each, collect into map.
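
The STM and Journal loops could share one helper. Below is a minimal sketch in which `peekEach` is a hypothetical name and the queue accessor and peek function are passed in. In `ExceptT ErrorType IO`, `foldM` stops at the first failing peek, so the whole call fails with that error.

```haskell
import Control.Monad (foldM)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as M

-- Hypothetical stand-ins for the server's types.
type RecipientId = String
newtype Message = Message {msgId :: Int} deriving (Eq, Show)

-- One individual peek per queue, collecting the results into a map.
-- Queues whose peek returns Nothing are left out of the map.
peekEach :: Monad m => (q -> RecipientId) -> (q -> m (Maybe Message)) -> [q] -> m (Map RecipientId Message)
peekEach rcvId peek = foldM step M.empty
  where
    step acc q = maybe acc (\msg -> M.insert (rcvId q) msg acc) <$> peek q
```

A backend instance could then define `tryPeekMsgs st = peekEach recipientIdOf (tryPeekMsg st)`. Here `recipientIdOf` is a placeholder for whatever accessor the backend's queue type provides.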

### Step 6: Handle edge cases

1. **Mixed batches**: `prefetchMsgs` collects only SUB queues. Non-SUB commands get `Nothing` for the pre-fetched message and are processed unchanged.

2. **Already-subscribed queues**: Include them in the pre-fetch, because `deliver` is also called for repeated SUBs (it delivers the pending message).

3. **Service subscriptions**: The pre-fetch does not depend on service state. `sharedSubscribeQueue` handles service association in STM, and the message peek is the same either way.

4. **Error queues**: Verification errors from `receive` are `Left` values in the batch. `prefetchMsgs` only looks at `Right` values with SUB commands.

5. **Empty pre-fetch**: If a batch has no SUBs (e.g., all ACKs), `prefetchMsgs` returns an empty map and makes no DB call.

### Step 7: Batch other commands (future, not in scope)

The same pattern (pre-fetch before loop, parameterize handler) can extend to:
- `ACK` with `tryDelPeekMsg` - batch delete+peek
- `GET` with `tryPeekMsg` - same map lookup

Lower priority since these don't have the N-at-once pattern of subscriptions.

## File changes summary

| File | Change |
|---|---|
| `src/Simplex/Messaging/Server/MsgStore/Types.hs` | Add `tryPeekMsgs` to typeclass |
| `src/Simplex/Messaging/Server/MsgStore/Postgres.hs` | Implement `tryPeekMsgs` with batch SQL |
| `src/Simplex/Messaging/Server/MsgStore/STM.hs` | Implement `tryPeekMsgs` as loop |
| `src/Simplex/Messaging/Server/MsgStore/Journal.hs` | Implement `tryPeekMsgs` as loop |
| `src/Simplex/Messaging/Server.hs` | Add `prefetchMsgs`, parameterize `deliver` |

## Testing

1. Existing server tests must pass unchanged (correctness preserved).
2. Add a test that subscribes a batch of queues (some with pending messages, some without) and verifies that all of them get the correct SOK + MSG responses.
3. Prometheus metrics: the existing `qSub` stat should still increment correctly.
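
For backend-level coverage, one option is an equivalence check. It runs the batched peek and one individual peek per queue over the same queues, then compares the results. `batchMatchesSingle` is a hypothetical helper written over any monad, so a test could run it against each store backend. The queue list should include empty and unknown queues so that absence from the map is checked too.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as M

-- Hypothetical stand-ins for the server's types.
type RecipientId = String
newtype Message = Message {msgId :: Int} deriving (Eq, Show)

-- True when the batched peek returns exactly what individual peeks return:
-- the same message for non-empty queues and no entry for empty ones.
batchMatchesSingle ::
  Monad m =>
  (q -> RecipientId) ->
  ([q] -> m (Map RecipientId Message)) ->
  (q -> m (Maybe Message)) ->
  [q] ->
  m Bool
batchMatchesSingle rcvId peekMany peekOne qs = do
  batched <- peekMany qs
  singles <- mapM peekOne qs
  let expected = M.fromList [(rcvId q, msg) | (q, Just msg) <- zip qs singles]
  pure (batched == expected)
```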

## Performance expectation

For 300K queues across ~2200 batches:
- Before: ~300K individual DB queries
- After: ~2200 batched DB queries (one per batch of ~135)
- ~136x reduction in DB round-trips
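
As a check on the arithmetic, assume that every batch carries exactly B = 135 SUB commands for N = 300,000 queues. The reduction factor is then roughly the batch size. The ~136x figure corresponds to rounding the batch count down to 2200.

```math
\begin{aligned}
\text{queries}_{\text{before}} &= N = 300\,000 \\
\text{queries}_{\text{after}} &= \lceil N / B \rceil = \lceil 300\,000 / 135 \rceil = 2\,223 \\
\text{reduction} &= 300\,000 / 2\,223 \approx 135
\end{aligned}
```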