Commit d8b6625
Merge pull request #4172 from Blargian/oom_knowledgebase_article
Knowledge base: add article on OOM queries
2 files changed, +76 -0 lines changed

---
title: 'Memory limit exceeded for query'
description: 'Troubleshooting memory limit exceeded errors for a query'
date: 2025-07-25
tags: ['Errors and Exceptions']
keywords: ['OOM', 'memory limit exceeded']
---

{frontMatter.description}

{/* truncate */}

import Image from '@theme/IdealImage';
import joins from '@site/static/images/knowledgebase/memory-limit-exceeded-for-query.png';

## Memory limit exceeded for query {#troubleshooting-out-of-memory-issues}

To a new user, ClickHouse can often seem like magic: every query is super fast,
even on the largest datasets and most ambitious queries. Invariably, though,
real-world usage tests even the limits of ClickHouse. Queries can exceed memory
for a number of reasons. Most commonly, we see large joins or aggregations on
high-cardinality fields. If performance is critical and these queries are
required, we often recommend that users simply scale up - something ClickHouse
Cloud does automatically and effortlessly to ensure your queries remain
responsive. We appreciate, however, that in self-managed scenarios this is
sometimes not trivial, and optimal performance may not even be required. Users
in this case have a few options.

### Aggregations {#aggregations}

For memory-intensive aggregation or sorting scenarios, users can use the settings
[`max_bytes_before_external_group_by`](/operations/settings/settings#max_bytes_before_external_group_by)
and [`max_bytes_before_external_sort`](/operations/settings/settings#max_bytes_before_external_sort) respectively.
The former is discussed extensively [here](/sql-reference/statements/select/group-by/#group-by-in-external-memory).

In summary, this ensures any aggregations can "spill" out to disk if a memory
threshold is exceeded. This will invariably impact query performance but will
help ensure queries do not OOM. The latter sorting setting helps address similar
issues with memory-intensive sorts. This can be particularly important in
distributed environments, where a coordinating node receives sorted responses
from child shards. In this case, the coordinating server can be asked to sort a
dataset larger than its available memory. With [`max_bytes_before_external_sort`](/operations/settings/settings#max_bytes_before_external_sort),
sorting can be allowed to spill over to disk. This setting is also helpful for
cases where the user has an `ORDER BY` after a `GROUP BY` with a `LIMIT`,
especially when the query is distributed.
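
As a minimal sketch, both settings can be applied per query via a `SETTINGS`
clause. The table, columns, and 10 GB thresholds below are illustrative, not
recommendations - tune the thresholds to your available memory:

```sql
-- Allow this aggregation and sort to spill to disk once memory use
-- passes the given thresholds, trading speed for OOM safety.
SELECT
    user_id,              -- hypothetical high-cardinality column
    count() AS events
FROM events               -- hypothetical table
GROUP BY user_id
ORDER BY events DESC
SETTINGS
    max_bytes_before_external_group_by = 10000000000,
    max_bytes_before_external_sort = 10000000000;
```

Setting these to roughly half of `max_memory_usage` is a common starting point,
since the spill itself needs working memory.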

### Joins {#joins}

For joins, users can select different `JOIN` algorithms, which can assist in
lowering the required memory. By default, joins use the hash join, which offers
the most completeness with respect to features and often the best performance.
This algorithm loads the right-hand table of the `JOIN` into an in-memory hash
table, against which the left-hand table is then evaluated. To minimize memory,
users should thus place the smaller table on the right side. This approach still
has limitations in memory-bound cases, however. In these cases, the `partial_merge`
join can be enabled via the [`join_algorithm`](/operations/settings/settings#join_algorithm)
setting. This derivative of the [sort-merge algorithm](https://en.wikipedia.org/wiki/Sort-merge_join)
first sorts the right table into blocks and creates a min-max index for them.
It then sorts parts of the left table by the join key and joins them over the
right table. The min-max index is used to skip unneeded right table blocks.
This is less memory-intensive, at the expense of performance. Taking this
concept further, the `full_sorting_merge` algorithm allows a `JOIN` to be
performed when the right-hand side is very large, doesn't fit into memory, and
lookups into it are impossible, e.g. because it is a complex subquery. In this
case, both the right and left sides are sorted on disk if they do not fit in
memory, allowing large tables to be joined.
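
As a hedged sketch, the algorithm can be switched per session or per query; the
`orders` and `users` tables here are hypothetical:

```sql
-- By default the right-hand side (users) would be loaded into an
-- in-memory hash table; partial_merge sorts it into indexed blocks instead.
SET join_algorithm = 'partial_merge';

SELECT o.order_id, u.name
FROM orders AS o
INNER JOIN users AS u ON o.user_id = u.user_id;

-- When the right-hand side is too large for in-memory lookups entirely:
SET join_algorithm = 'full_sorting_merge';
```

The same setting can also be passed inline via a `SETTINGS` clause on an
individual query rather than a session-level `SET`.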
<Image img={joins} size="md" alt="Joins algorithms"/>

Since 20.3, ClickHouse has supported an `auto` value for the `join_algorithm`
setting. This instructs ClickHouse to apply an adaptive join approach, where the
hash-join algorithm is preferred until memory limits are violated, at which
point the `partial_merge` algorithm is attempted. Finally, concerning joins, we
encourage readers to be aware of the behavior of distributed joins and how to
minimize their memory consumption. More information can be found [here](/sql-reference/operators/in#distributed-subqueries).
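
A minimal sketch of the adaptive approach; the memory limit and tables are
illustrative:

```sql
-- Start with a hash join and fall back to partial_merge if the
-- query's memory limit would otherwise be exceeded.
SET join_algorithm = 'auto';
SET max_memory_usage = 10000000000;  -- illustrative 10 GB per-query limit

SELECT o.order_id, u.name            -- hypothetical tables and columns
FROM orders AS o
INNER JOIN users AS u ON o.user_id = u.user_id;
```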