## Memory Diagnostics Results

### Data Collection

Server: smp19.simplex.im, PostgreSQL backend, `useCache = False`
RTS flags: `+RTS -N -A16m -I0.01 -Iw15 -s -RTS` (16 cores)

### Mar 20 Data (1 hour, 07:19-08:19)

```
Time   rts_live  rts_heap  rts_large  rts_frag  clients  non-large
07:19   7.5 GB    8.2 GB    5.5 GB    0.03 GB   14,000    2.0 GB
07:24   6.4 GB   10.8 GB    5.2 GB    3.6 GB    14,806    1.2 GB
07:29   8.2 GB   10.8 GB    6.5 GB    1.8 GB    15,667    1.7 GB
07:34  10.0 GB   12.3 GB    7.9 GB    1.4 GB    15,845    2.1 GB
07:39   6.7 GB   13.0 GB    5.3 GB    5.6 GB    16,589    1.4 GB
07:44   8.5 GB   13.0 GB    6.7 GB    3.7 GB    16,283    1.8 GB
07:49   6.5 GB   13.0 GB    5.2 GB    5.8 GB    16,532    1.3 GB
07:54   6.0 GB   13.0 GB    4.8 GB    6.3 GB    16,636    1.2 GB
07:59   6.4 GB   13.0 GB    5.1 GB    5.9 GB    16,769    1.3 GB
08:04   8.3 GB   13.0 GB    6.5 GB    3.9 GB    17,352    1.8 GB
08:09  10.2 GB   13.0 GB    8.0 GB    1.9 GB    17,053    2.2 GB
08:14   5.6 GB   13.0 GB    4.5 GB    6.8 GB    17,147    1.1 GB
08:19   7.6 GB   13.0 GB    6.1 GB    4.6 GB    17,496    1.5 GB
```

non-large = rts_live - rts_large (normal Haskell heap objects: Maps, TVars, closures)

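A quick consistency check of the derived column, using two rows transcribed from the table above:

```python
# non-large = rts_live - rts_large, checked against two rows of the
# Mar 20 table above (values in GB, transcribed from the table).
rows = [
    # (time, rts_live, rts_large, reported non-large)
    ("07:24", 6.4, 5.2, 1.2),
    ("08:09", 10.2, 8.0, 2.2),
]
for time, live, large, reported in rows:
    derived = live - large
    assert abs(derived - reported) < 0.05, time
print("non-large column is consistent")
```
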
### Mar 19 Data (5.5 hours, 07:49-13:19)

rts_heap grew monotonically from 10.1 GB to 20.7 GB over 5.5 hours (~1.9 GB/hour) and never shrank.
The post-GC rts_live floor rose from 5.5 GB to 9.1 GB.

### Findings

**1. Large/pinned objects dominate live data (60-80%)**

`rts_large` = 4.5-8.0 GB out of 5.6-10.2 GB live. These are allocations larger than ~3 KB that go on GHC's large object heap. They oscillate (not growing monotonically), meaning they are being allocated and freed constantly — transient, not leaked.

**2. Fragmentation is the heap growth mechanism**

`rts_heap ≈ rts_live + rts_frag`. The heap grows because pinned/large objects fragment GHC's block allocator. Once GHC expands the heap, it never shrinks. Growth pattern:
- Large objects allocated → occupy blocks
- Large objects freed → blocks can't be reused if ANY other object shares the block
- New allocations need fresh blocks → heap expands
- Heap never returns memory to OS
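The reuse constraint in the second step is the crux. A toy model of a block allocator (a deliberate simplification, not GHC's actual allocator) shows how one small survivor per block forces the heap to keep expanding while live data stays flat:

```python
# Toy block allocator: a block is reusable only when EVERY object in it
# is dead. Each round allocates one large transient buffer plus one tiny
# long-lived object into the same fresh block, then frees the buffer.
# The survivor pins the block, so the next round must take a new one.
class Block:
    def __init__(self):
        self.live = set()            # ids of objects still alive in this block

blocks = []                          # every block the heap has taken from the OS

def alloc_block():
    # reuse a block only if it is completely empty, else expand the heap
    for b in blocks:
        if not b.live:
            return b
    b = Block()
    blocks.append(b)
    return b

for round_ in range(10):
    b = alloc_block()
    b.live.add(("buf", round_))      # large transient buffer
    b.live.add(("meta", round_))     # small long-lived object sharing the block
    b.live.discard(("buf", round_))  # buffer is freed at the end of the round

empty = sum(1 for b in blocks if not b.live)
print(f"blocks={len(blocks)} empty={empty}")   # heap grew to 10 blocks, none reusable
```

Ten tiny live objects end up holding ten blocks hostage; GHC's 4 KB blocks behave analogously when long-lived data shares a block with freed pinned buffers.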

**3. Non-large heap data is stable (~1.0-2.2 GB)**

Normal Haskell objects (Maps, TVars, closures, client structures) account for only 1-2 GB. This scales with client count at ~100-130 KB/client and does NOT grow over time.
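The scaling claim can be checked by averaging the non-large column against the client counts from the table above:

```python
# Average non-large heap per client across the Mar 20 table rows
# (values transcribed from the table above).
non_large_gb = [2.0, 1.2, 1.7, 2.1, 1.4, 1.8, 1.3, 1.2, 1.3, 1.8, 2.2, 1.1, 1.5]
clients = [14000, 14806, 15667, 15845, 16589, 16283, 16532, 16636,
           16769, 17352, 17053, 17147, 17496]
mean_gb = sum(non_large_gb) / len(non_large_gb)
mean_clients = sum(clients) / len(clients)
kb_per_client = mean_gb * 1024**2 / mean_clients   # GB -> KB
print(f"~{kb_per_client:.0f} KB/client")           # ~102 KB/client on average
```

Per-row ratios swing with the GC phase (post-GC valleys are lower, pre-GC peaks higher), but the average sits inside the stated 100-130 KB range.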

**4. None of the tracked data structures is the cause**

- `clientSndQ=0, clientMsgQ=0` — TBQueues empty, no message accumulation
- `smpQSubs` oscillates ~1.0-1.4M entries (~130 bytes each, so under ~200 MB) — entries are cleaned up, not leaking
- `ntfStore` < 2K entries — negligible
- All proxy agent maps near 0
- `loadedQ=0` — useCache=False confirmed working
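For scale, a back-of-the-envelope bound on smpQSubs, assuming the ~130 bytes/entry estimate:

```python
# Upper bound on smpQSubs memory, assuming ~130 bytes per entry.
entries = 1_400_000                # peak smpQSubs size observed
bytes_per_entry = 130              # assumed per-entry footprint
total_mb = entries * bytes_per_entry / 1024**2
print(f"~{total_mb:.0f} MB")       # ~174 MB, negligible next to 4.5-8 GB of large objects
```
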

**5. Source of large objects is unclear without heap profiling**

The 4.5-8.0 GB of large objects could come from:
- PostgreSQL driver (`postgresql-simple`/`libpq`) — pinned ByteStrings for query results
- TLS library (`tls`) — pinned buffers per connection
- Network socket I/O — pinned ByteStrings for recv/send
- SMP protocol message blocks

These cannot be distinguished without `-hT` heap profiling (which is too expensive for this server).

### Root Cause

**GHC heap fragmentation from constant churn of large/pinned ByteString allocations.**

Not a data structure leak. The live data itself is reasonable (5-10 GB for 15-17K clients). The problem is that GHC's copying GC cannot compact around pinned objects, so the heap grows with fragmentation and never shrinks.

### Mitigation Options

Options 1-3 are RTS flag changes — no rebuild needed, reversible by restart.

**1. `-F1.2`** (reduce GC trigger factor from default 2.0)
- Triggers major GC when heap reaches 1.2x live data instead of 2x
- Reclaims fragmented blocks sooner
- Trade-off: more frequent GC, slightly higher CPU
- Risk: low — just makes GC run more often
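Roughly, the factor sets how far the heap may grow past live data before a major collection; a sketch with a typical live value from the table (simplified, ignoring GHC's other sizing heuristics):

```python
# Approximate heap ceiling before a major GC: live data times the -F factor.
live_gb = 7.5                      # a typical rts_live value from the table
for factor in (2.0, 1.2):
    ceiling = live_gb * factor
    print(f"-F{factor}: heap may reach ~{ceiling:.1f} GB before major GC")
# -F2.0 lets the heap reach ~15 GB; -F1.2 collects near ~9 GB
```
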

**2. Reduce `-A16m` to `-A4m`** (smaller nursery)
- More frequent minor GC → short-lived pinned objects freed faster
- Trade-off: more GC cycles, but each is smaller
- Risk: low — may actually improve latency by reducing GC pause times
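Minor-GC frequency scales roughly inversely with nursery size; a sketch with a hypothetical per-core allocation rate (the real rate would come from `+RTS -s` output):

```python
# Minor GCs per second scale inversely with the per-capability nursery size.
alloc_rate_mb_s = 400              # hypothetical per-core allocation rate, MB/s
for nursery_mb in (16, 4):
    gcs = alloc_rate_mb_s / nursery_mb
    print(f"-A{nursery_mb}m: ~{gcs:.0f} minor GCs per second per core")
# 4x smaller nursery -> 4x more frequent, but individually smaller, collections
```
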

**3. `+RTS -xn`** (nonmoving GC)
- Collects the old generation in place, without copying — helps pinned-heavy workloads
- Available since GHC 8.10, improved in 9.x
- Trade-off: different GC characteristics, less battle-tested
- Risk: medium — different GC algorithm, should be tested first

**4. Limit concurrent connections** (application-level)
- Since large objects scale per-client, fewer clients = less fragmentation
- Trade-off: reduced capacity
- Risk: low but impacts users
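One way to enforce such a cap is a counting semaphore around the accept loop; a minimal sketch in Python with hypothetical names (the server itself is Haskell, so this only illustrates the shape of the change):

```python
import socket
import threading

MAX_CLIENTS = 15_000                      # hypothetical cap, tune to capacity
slots = threading.BoundedSemaphore(MAX_CLIENTS)

def handle(conn: socket.socket) -> None:
    try:
        pass                              # ... serve the client ...
    finally:
        conn.close()
        slots.release()                   # free the slot when the client leaves

def accept_loop(server: socket.socket) -> None:
    while True:
        slots.acquire()                   # blocks new accepts once at the cap
        conn, _addr = server.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()
```
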