content/en/engines/mergetree-table-engine-family/pick-keys.md
Lines changed: 9 additions & 8 deletions
@@ -9,18 +9,19 @@ description: >-
 Optimizing ClickHouse® MergeTree tables
 ---

-Good `order by` usually have 3 to 5 columns, from lowest cardinal on the left (and the most important for filtering) to highest cardinal (and less important for filtering).
+Good `order by` usually has 3 to 5 columns, from lowest cardinal on the left (and the most important for filtering) to highest cardinal (and less important for filtering).

-Practical approach to create an good ORDER BY for a table:
+Practical approach to create a good ORDER BY for a table:

 1. Pick the columns you use in filtering always
-2. The most important for filtering and the lowest cardinal should be the left-most. Typically it's something like `tenant_id`
-3. Next column is more cardinal, less important. It can be rounded time sometimes, or `site_id`, or `source_id`, or `group_id` or something similar.
+2. The most important for filtering and the lowest cardinal should be the left-most. Typically, it's something like `tenant_id`
+3. Next column is more cardinal, less important. It can be a rounded time sometimes, or `site_id`, or `source_id`, or `group_id` or something similar.
 4. Repeat step 3 once again (or a few times)
 5. If you already added all columns important for filtering and you're still not addressing a single row with your pk - you can add more columns which can help to put similar records close to each other (to improve the compression)
-6. If you have something like hierarchy / tree-like relations between the columns - put there the records from 'root' to 'leaves' for example (continent, country, cityname). This way ClickHouse® can do lookup by country / city even if continent is not specified (it will just 'check all continents')
+6. If you have something like hierarchy / tree-like relations between the columns - put there the records from 'root' to 'leaves' for example (continent, country, cityname). This way ClickHouse® can do a lookup by country/city even if the continent is not specified (it will just 'check all continents')
 special variants of MergeTree may require special ORDER BY to make the record unique etc.
-7. For [timeseries](https://altinity.com/blog/2019-5-23-handling-variable-time-series-efficiently-in-clickhouse) it usually make sense to put timestamp as latest column in ORDER BY, it helps with putting the same data near by for better locality. There is only 2 major patterns for timestamps in ORDER BY: (..., toStartOf(Day|Hour|...)(timestamp), ..., timestamp) and (..., timestamp). First one is useful when your often query small part of table partition. (table partitioned by months and your read only 1-4 days 90% of times)
+7. For [timeseries](https://altinity.com/blog/2019-5-23-handling-variable-time-series-efficiently-in-clickhouse), it usually makes sense to put the timestamp as the latest column in ORDER BY, which helps with putting the same data nearby for better locality. There are only 2 major patterns for timestamps in ORDER BY: (..., toStartOf(Day|Hour|...)(timestamp), ..., timestamp) and (..., timestamp). The first one is useful when you often query a small part of a table partition. (table partitioned by months, and you read only 1-4 days 90% of the time).
+8. There are exceptions to the rule "low cardinality - first" related to compression ratio. For example, data with a lot of repeated attributes in rows (like clickstream), ordering by session_id will benefit compression and reduce disk read, while setting a low cardinality column (like event type) in the first place makes compression and overall query time worse.
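
To illustrate the steps in the changed text, here is a minimal sketch of a table definition that follows them. The table name `events`, the `payload` column, and the column types are hypothetical; `tenant_id` and `site_id` come from the examples in steps 2 and 3, and the monthly partitioning mirrors the scenario in step 7. It is one possible layout under those assumptions, not a recommended schema.

```sql
-- Hypothetical table: names and types are illustrative assumptions.
CREATE TABLE events
(
    tenant_id UInt32,    -- step 2: lowest cardinality, always used in filters
    site_id   UInt32,    -- step 3: higher cardinality, still a common filter
    timestamp DateTime,  -- step 7: raw timestamp goes last in the key
    payload   String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)  -- monthly partitions, as in the step 7 scenario
-- First pattern from step 7: (..., toStartOfDay(timestamp), ..., timestamp)
ORDER BY (tenant_id, site_id, toStartOfDay(timestamp), timestamp);

-- A query shape this key is meant to serve: one tenant, one site, a few days.
SELECT count()
FROM events
WHERE tenant_id = 42
  AND site_id = 7
  AND timestamp >= toDateTime('2024-01-10 00:00:00')
  AND timestamp <  toDateTime('2024-01-13 00:00:00');
```

With a key like this, the filter on `tenant_id` and `site_id` matches a prefix of the sorting key, so the primary index should be able to narrow the read to a small range within each monthly part. Whether the rounded-time column actually helps depends on the data and query mix, so it is worth measuring on real workloads.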