---
title: "High Memory Usage During Merge in system.metric_log"
linkTitle: "Merge Memory in metric_log"
weight: 100
description: >-
  Resolving excessive memory consumption during merges in the ClickHouse® system.metric_log table.
---

# Problem: High Memory Usage During Merge in `system.metric_log`

## Overview

In recent versions of ClickHouse®, the **merge process (part compaction)** in the `system.metric_log` table can consume a large amount of memory.
The issue arises from an **unfortunate combination of settings**, where:

* the merge is already large enough to produce **wide parts**,
* but not yet large enough to enable **vertical merges**.

This problem has become more pronounced in newer ClickHouse® versions because the `system.metric_log` table has **expanded significantly**: many new metrics were added, increasing the total number of columns.

> **Wide vs Compact** — storage formats for table parts:
> * *Wide* — each column is stored in a separate file (more efficient for large datasets).
> * *Compact* — all data is stored in a single file (more efficient for small inserts).
>
> **Horizontal vs Vertical merge** — algorithms for combining data during merges:
> * *Horizontal merge* reads and merges all columns at once — all files are opened simultaneously, and buffers are allocated for each column of each part.
> * *Vertical merge* processes columns in batches — first merging only the columns from `ORDER BY`, then the rest one by one. This approach **significantly reduces memory usage**.

The most memory-intensive scenario is a **horizontal merge of wide parts** in a table with a large number of columns.
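
You can check which format the current `metric_log` parts use via the `part_type` column of `system.parts` (a sketch; column names as in recent ClickHouse® versions):

```sql
-- Count active parts of metric_log by storage format
SELECT part_type, count() AS parts, sum(rows) AS total_rows
FROM system.parts
WHERE database = 'system' AND table = 'metric_log' AND active
GROUP BY part_type;
```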

---

## Demonstrating the Problem

The issue can be reproduced easily by adjusting a few settings:

```sql
ALTER TABLE system.metric_log MODIFY SETTING min_bytes_for_wide_part = 100;
OPTIMIZE TABLE system.metric_log FINAL;
```

Example log output:

```
[c9d66aa9f9d1] 2025.11.10 10:04:59.091067 [97] <Debug> MemoryTracker: Background process (mutate/merge) peak memory usage: 6.00 GiB.
```

**The merge consumed 6 GB of memory** — far too much for this table.
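
Merge memory usage can also be inspected after the fact in `system.part_log` (a sketch, assuming `part_log` is enabled on your server; recent ClickHouse® versions record `peak_memory_usage` per merge):

```sql
-- Top merges of metric_log by peak memory consumption
SELECT event_time, formatReadableSize(peak_memory_usage) AS peak
FROM system.part_log
WHERE event_type = 'MergeParts' AND table = 'metric_log'
ORDER BY peak_memory_usage DESC
LIMIT 5;
```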

---

## Vertical Merges Are Not Affected

If you explicitly force vertical merges, memory consumption normalizes, although the process becomes slightly slower:

```sql
ALTER TABLE system.metric_log MODIFY SETTING
    min_bytes_for_wide_part = 100,
    vertical_merge_algorithm_min_rows_to_activate = 1;

OPTIMIZE TABLE system.metric_log FINAL;
```

Example log output:

```
[c9d66aa9f9d1] 2025.11.10 10:06:14.575832 [97] <Debug> MemoryTracker: Background process (mutate/merge) peak memory usage: 13.98 MiB.
```

Now memory usage **drops from 6 GB to only 14 MB**.

---

## Root Cause

The problem stems from the fact that:

* the threshold for enabling *wide* parts is configured in **bytes** (`min_bytes_for_wide_part`),
* while the threshold for enabling *vertical merges* is configured in **rows** (`vertical_merge_algorithm_min_rows_to_activate`).

When a table contains very **wide rows** (many lightweight columns), this mismatch causes wide parts to appear too early, while vertical merges are triggered much later.

---

## Default Settings

| Parameter                                        | Value            |
| ------------------------------------------------ | ---------------- |
| `vertical_merge_algorithm_min_rows_to_activate`  | 131072           |
| `vertical_merge_algorithm_min_bytes_to_activate` | 0                |
| `min_bytes_for_wide_part`                        | 10485760 (10 MB) |
| `min_rows_for_wide_part`                         | 0                |

The average row size in `metric_log` is approximately **2.8 KB**, meaning wide parts are created after roughly:

```
10485760 / 2800 ≈ 3744 rows
```

Meanwhile, the vertical merge algorithm activates only after **131072 rows** — much later.
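
The average row size can be checked directly (a sketch using `system.parts`; the exact number depends on your metric set):

```sql
-- Approximate uncompressed bytes per row in metric_log
SELECT round(sum(data_uncompressed_bytes) / sum(rows)) AS avg_row_bytes
FROM system.parts
WHERE database = 'system' AND table = 'metric_log' AND active;
```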

---

## Possible Solutions

1. **Increase `min_bytes_for_wide_part`**
   For example, set it to at least `2800 * 131072 ≈ 350 MB`.
   This delays the switch to the wide format until vertical merges can also be used.

2. **Switch to a row-based threshold**
   Use `min_rows_for_wide_part` instead of `min_bytes_for_wide_part`.

3. **Lower the threshold for vertical merges**
   Reduce `vertical_merge_algorithm_min_rows_to_activate`,
   or set `vertical_merge_algorithm_min_bytes_to_activate` to a non-zero value.
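
Each option can be applied per table with `ALTER TABLE ... MODIFY SETTING` (a sketch; the values shown are illustrative):

```sql
-- Option 1: delay wide parts until vertical merges also activate (~350 MB)
ALTER TABLE system.metric_log MODIFY SETTING min_bytes_for_wide_part = 367001600;

-- Option 3: activate vertical merges by bytes, matching the wide-part threshold
ALTER TABLE system.metric_log
    MODIFY SETTING vertical_merge_algorithm_min_bytes_to_activate = 10485760;
```

Note that settings applied this way to a system table may be lost if the table is recreated, whereas a configuration-file change survives restarts.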

---

## Example Local Fix for `metric_log`

Apply the configuration below, then restart ClickHouse® and drop the `metric_log` table (so that it is recreated with the updated settings):

```xml
<metric_log replace="1">
    <database>system</database>
    <table>metric_log</table>
    <engine>
        ENGINE = MergeTree
        PARTITION BY (event_date)
        ORDER BY (event_time)
        TTL event_date + INTERVAL 14 DAY DELETE
        SETTINGS min_bytes_for_wide_part = 536870912
    </engine>
    <flush_interval_milliseconds>7500</flush_interval_milliseconds>
</metric_log>
```

This configuration raises the wide-part threshold to **512 MB** (`536870912` bytes), preventing premature switching to the wide format and reducing memory usage during merges.

The PR [#89811](https://github.com/ClickHouse/ClickHouse/pull/89811) introduces a similar improvement.

---

## Global Fix (All Tables)

Besides `metric_log`, other tables may also be affected — particularly those with **hundreds of columns** and **average row sizes greater than ~80 bytes**. (With the defaults, a part of 131072 rows at 80 bytes per row is exactly 10 MB, so any wider rows reach the wide format before vertical merges activate.)
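
A rough way to spot candidate tables (a sketch; the 80-byte cutoff follows from the default thresholds above):

```sql
-- Tables whose rows are wide enough to hit wide parts before vertical merges
SELECT database, table,
       round(sum(data_uncompressed_bytes) / sum(rows)) AS avg_row_bytes
FROM system.parts
WHERE active
GROUP BY database, table
HAVING avg_row_bytes > 80
ORDER BY avg_row_bytes DESC;
```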

The following server-level configuration raises both thresholds at once:

```xml
<clickhouse>
    <merge_tree>
        <min_bytes_for_wide_part>134217728</min_bytes_for_wide_part>
        <vertical_merge_algorithm_min_bytes_to_activate>134217728</vertical_merge_algorithm_min_bytes_to_activate>
    </merge_tree>
</clickhouse>
```

These settings tell ClickHouse® to **keep using compact parts longer** and to **enable the vertical merge algorithm** at the same moment as the switch to the wide format (both thresholds are 128 MB), preventing sudden spikes in memory usage.
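
After a restart, the effective values can be verified in `system.merge_tree_settings` (a sketch):

```sql
-- Confirm the global merge_tree settings took effect
SELECT name, value, changed
FROM system.merge_tree_settings
WHERE name IN ('min_bytes_for_wide_part',
               'vertical_merge_algorithm_min_bytes_to_activate');
```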

---

### ⚠️ Potential Risks and Trade-offs

Raising `min_bytes_for_wide_part` globally keeps more data in **compact parts**, which can both help and hurt depending on the workload. Compact parts store all columns in a single `data.bin` file; this makes **inserts much faster**, especially for tables with **many columns**, since fewer files are created per part. It is also a big advantage when storing data on **S3 or other object storage**, where every extra file adds latency and increases API call counts.

The trade-off is that this layout makes **reads less efficient** for column-selective queries. Reading one or two columns from a large compact part means scanning and decompressing shared blocks instead of isolated files. It can also reduce cache locality, slightly worsen compression (different columns are compressed together), and make **mutations and ALTERs** more expensive, because each change rewrites the entire part.

Lowering the thresholds for vertical merges further decreases merge memory but may make the first merges slower, as they process columns sequentially. This configuration works best for **wide, append-only tables** and **S3-based storage**, while analytical tables with frequent updates or narrow schemas may perform better with the defaults. If merge memory or S3 request overhead is your main concern, applying it globally is reasonable; otherwise, start with specific wide tables such as `system.metric_log`, verify the performance improvement, and expand gradually.

---

**Summary**

The root issue is a mismatch between the byte-based and row-based thresholds for wide parts and vertical merges.
Aligning these values — by adjusting one or both parameters — stabilizes memory usage and prevents excessive RAM consumption during merges in `system.metric_log` and similar tables.