---
title: "High Memory Usage During Merge in system.metric_log"
linkTitle: "Merge Memory in metric_log"
weight: 100
description: >-
  Resolving excessive memory consumption during merges in the ClickHouse® system.metric_log table.
---

# Problem: High Memory Usage During Merge in `system.metric_log`

## Overview

In recent versions of ClickHouse®, the **merge process (part compaction)** in the `system.metric_log` table can consume a large amount of memory.
The issue arises from an **unfortunate combination of settings**, where:

* the merge is already large enough to produce **wide parts**,
* but not yet large enough to enable **vertical merges**.

This problem has become more pronounced in newer ClickHouse® versions because the `system.metric_log` table has **expanded significantly**: many new metrics were added, increasing the total number of columns.

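To see how wide the table actually is on your server, count its columns; the exact number varies by version, since each `ProfileEvent_*` and `CurrentMetric_*` metric becomes its own column:

```sql
-- Count the columns of system.metric_log on the current server.
SELECT count() AS column_count
FROM system.columns
WHERE database = 'system' AND table = 'metric_log';
```
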
> **Wide vs Compact** — storage formats for table parts:
> * *Wide* — each column is stored in a separate file (more efficient for large datasets).
> * *Compact* — all data is stored in a single file (more efficient for small inserts).
>
> **Horizontal vs Vertical merge** — algorithms for combining data during merges:
> * *Horizontal merge* reads and merges all columns at once — all files are opened simultaneously, and buffers are allocated for each column of each part.
> * *Vertical merge* processes columns in batches — first merging only the columns of the `ORDER BY` key, then the rest one by one. This approach **significantly reduces memory usage**.

The most memory-intensive scenario is a **horizontal merge of wide parts** in a table with a large number of columns.

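To check which format the parts of a table currently use, query `system.parts`; the `part_type` column reports `Compact` or `Wide`:

```sql
-- Show how many active parts of each format metric_log currently has.
SELECT part_type, count() AS parts, sum(rows) AS total_rows
FROM system.parts
WHERE database = 'system' AND table = 'metric_log' AND active
GROUP BY part_type;
```
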
---

## Demonstrating the Problem

The issue can be reproduced easily by adjusting a few settings:

```sql
ALTER TABLE system.metric_log MODIFY SETTING min_bytes_for_wide_part = 100;
OPTIMIZE TABLE system.metric_log FINAL;
```

Example log output:

```
[c9d66aa9f9d1] 2025.11.10 10:04:59.091067 [97] <Debug> MemoryTracker: Background process (mutate/merge) peak memory usage: 6.00 GiB.
```

**The merge consumed 6 GB of memory** — far too much for this table.

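Instead of grepping the server log, the same peak-memory figure can be read from `system.part_log` (if part logging is enabled; recent versions also record which merge algorithm was used):

```sql
-- Peak memory and algorithm of the most recent merges on metric_log.
SELECT event_time, merge_algorithm, formatReadableSize(peak_memory_usage) AS peak_memory
FROM system.part_log
WHERE database = 'system' AND table = 'metric_log' AND event_type = 'MergeParts'
ORDER BY event_time DESC
LIMIT 5;
```
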
---

## Vertical Merges Are Not Affected

If you explicitly force vertical merges, memory consumption normalizes, although the process becomes slightly slower:

```sql
ALTER TABLE system.metric_log MODIFY SETTING
  min_bytes_for_wide_part = 100,
  vertical_merge_algorithm_min_rows_to_activate = 1;

OPTIMIZE TABLE system.metric_log FINAL;
```

Example log output:

```
[c9d66aa9f9d1] 2025.11.10 10:06:14.575832 [97] <Debug> MemoryTracker: Background process (mutate/merge) peak memory usage: 13.98 MiB.
```

Now memory usage **drops from 6 GB to only 14 MB**.

---

## Root Cause

The problem stems from the fact that:

* the threshold for switching to *wide* parts is configured in **bytes** (`min_bytes_for_wide_part`),
* while the threshold for enabling *vertical merges* is configured in **rows** (`vertical_merge_algorithm_min_rows_to_activate`).

When a table has very **wide rows** (many lightweight columns), this mismatch causes wide parts to appear too early, while vertical merges kick in much later.

---

## Default Settings

| Parameter                                        | Value             |
| ------------------------------------------------ | ----------------- |
| `vertical_merge_algorithm_min_rows_to_activate`  | 131072            |
| `vertical_merge_algorithm_min_bytes_to_activate` | 0                 |
| `min_bytes_for_wide_part`                        | 10485760 (10 MiB) |
| `min_rows_for_wide_part`                         | 0                 |

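These defaults can be confirmed on a running server:

```sql
-- Current values of the four thresholds involved.
SELECT name, value
FROM system.merge_tree_settings
WHERE name IN (
    'min_bytes_for_wide_part',
    'min_rows_for_wide_part',
    'vertical_merge_algorithm_min_rows_to_activate',
    'vertical_merge_algorithm_min_bytes_to_activate'
);
```
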
The average row size in `metric_log` is approximately **2.8 KB**, so wide parts are created after roughly:

```
10485760 / 2800 ≈ 3744 rows
```

Meanwhile, the vertical merge algorithm activates only after **131072 rows** — much later.

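The average row size of your own `metric_log` can be estimated from `system.parts` (the ~2.8 KB figure above presumably refers to the uncompressed size; the on-disk size is much smaller):

```sql
-- Average row size of metric_log, uncompressed and on disk.
SELECT
    round(sum(data_uncompressed_bytes) / sum(rows)) AS avg_row_bytes_uncompressed,
    round(sum(bytes_on_disk) / sum(rows)) AS avg_row_bytes_on_disk
FROM system.parts
WHERE database = 'system' AND table = 'metric_log' AND active;
```
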
---

## Possible Solutions

1. **Increase `min_bytes_for_wide_part`.**
   For example, set it to at least `2800 * 131072 ≈ 350 MB`.
   This delays the switch to the wide format until vertical merges can also be used.

2. **Switch to a row-based threshold.**
   Use `min_rows_for_wide_part` instead of `min_bytes_for_wide_part`.

3. **Lower the threshold for vertical merges.**
   Reduce `vertical_merge_algorithm_min_rows_to_activate`,
   or set a value for `vertical_merge_algorithm_min_bytes_to_activate` (see the sketch after this list).

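As a sketch, each option maps to a per-table `MODIFY SETTING`; the values below are illustrative, not tuned recommendations:

```sql
-- Option 1: raise the byte threshold (here to 512 MiB) so parts stay compact longer.
ALTER TABLE system.metric_log MODIFY SETTING min_bytes_for_wide_part = 536870912;

-- Option 2: use a row threshold aligned with the vertical-merge activation point
-- (combine with a high byte threshold so the byte limit no longer fires first).
ALTER TABLE system.metric_log MODIFY SETTING min_rows_for_wide_part = 131072;

-- Option 3: activate vertical merges by bytes as well, matching min_bytes_for_wide_part.
ALTER TABLE system.metric_log MODIFY SETTING vertical_merge_algorithm_min_bytes_to_activate = 10485760;
```
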
---

## Example Local Fix for `metric_log`

Apply the configuration below, then restart ClickHouse® and drop the `metric_log` table (so it is recreated with the updated settings):

```xml
<metric_log replace="1">
    <database>system</database>
    <table>metric_log</table>
    <engine>
        ENGINE = MergeTree
        PARTITION BY (event_date)
        ORDER BY (event_time)
        TTL event_date + INTERVAL 14 DAY DELETE
        SETTINGS min_bytes_for_wide_part = 536870912
    </engine>
    <flush_interval_milliseconds>7500</flush_interval_milliseconds>
</metric_log>
```

This configuration raises the wide-part threshold to **512 MiB**, preventing a premature switch to the wide format and reducing memory usage during merges.
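
After the restart, drop the old table; ClickHouse® recreates it with the new engine definition on the next flush:

```sql
DROP TABLE IF EXISTS system.metric_log;

-- Once the table reappears, verify that the new setting took effect:
SHOW CREATE TABLE system.metric_log;
```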

The PR [#89811](https://github.com/ClickHouse/ClickHouse/pull/89811) introduces a similar improvement.

---

## Global Fix (All Tables)

In addition to `metric_log`, other tables may also be affected — particularly those with **average row sizes greater than ~80 bytes** and **hundreds of columns**. The ~80-byte figure follows from the defaults: 10485760 bytes / 131072 rows = 80 bytes, so anything wider reaches the wide-part byte threshold before the vertical-merge row threshold.

```xml
<clickhouse>
    <merge_tree>
        <min_bytes_for_wide_part>134217728</min_bytes_for_wide_part>
        <vertical_merge_algorithm_min_bytes_to_activate>134217728</vertical_merge_algorithm_min_bytes_to_activate>
    </merge_tree>
</clickhouse>
```

These settings (both 128 MiB here) tell ClickHouse® to **keep using compact parts longer** and to **enable the vertical merge algorithm** simultaneously with the switch to the wide format, preventing sudden spikes in memory usage.

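The effect can be watched live: `system.merges` shows the memory usage of every merge currently running, and recent versions also expose the algorithm in use:

```sql
-- Currently running merges, heaviest memory consumers first.
SELECT database, table, merge_algorithm,
       formatReadableSize(memory_usage) AS memory,
       round(progress, 2) AS progress
FROM system.merges
ORDER BY memory_usage DESC;
```
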
---

### ⚠️ Potential Risks and Trade-offs

Raising `min_bytes_for_wide_part` globally keeps more data in **compact parts**, which can both help and hurt depending on the workload. Compact parts store all columns in a single `data.bin` file — this makes **inserts much faster**, especially for tables with **many columns**, since fewer files are created per part. It is also a big advantage when storing data on **S3 or other object storage**, where every extra file adds latency and increases the API call count.

The trade-off is that this layout makes **reads less efficient** for column-selective queries. Reading one or two columns from a large compact part means scanning and decompressing shared blocks instead of isolated per-column files. It can also reduce cache locality, slightly worsen compression (different columns are compressed together), and make **mutations and ALTERs** more expensive, because each change rewrites the entire part.

Lowering the thresholds for vertical merges further decreases merge memory but may make the first merges slower, as they process columns sequentially. This configuration works best for **wide, append-only tables** or **S3-based storage**, while analytical tables with frequent updates or narrow schemas may perform better with the defaults. If merge memory or S3 request overhead is your main concern, applying it globally is reasonable — otherwise, start with specific wide tables like `system.metric_log`, verify the performance improvements, and expand gradually.

---

✅ **Summary**

The root issue is a mismatch between the byte-based and row-based thresholds for wide parts and vertical merges.
Aligning these values — by adjusting one or both parameters — stabilizes memory usage and prevents excessive RAM consumption during merges in `system.metric_log` and similar tables.