
Commit f07b6b6

Fix grammar and clarity in pick-keys.md
Correct grammatical errors and improve clarity in the text.
1 parent 4add229 commit f07b6b6


1 file changed: +9 -8 lines changed

content/en/engines/mergetree-table-engine-family/pick-keys.md

````diff
@@ -9,18 +9,19 @@ description: >-
 Optimizing ClickHouse® MergeTree tables
 ---
 
-Good `order by` usually have 3 to 5 columns, from lowest cardinal on the left (and the most important for filtering) to highest cardinal (and less important for filtering).
+Good `order by` usually has 3 to 5 columns, from lowest cardinal on the left (and the most important for filtering) to highest cardinal (and less important for filtering).
 
-Practical approach to create an good ORDER BY for a table:
+Practical approach to create a good ORDER BY for a table:
 
 1. Pick the columns you use in filtering always
-2. The most important for filtering and the lowest cardinal should be the left-most. Typically it's something like `tenant_id`
-3. Next column is more cardinal, less important. It can be rounded time sometimes, or `site_id`, or `source_id`, or `group_id` or something similar.
+2. The most important for filtering and the lowest cardinal should be the left-most. Typically, it's something like `tenant_id`
+3. Next column is more cardinal, less important. It can be a rounded time sometimes, or `site_id`, or `source_id`, or `group_id` or something similar.
 4. Repeat step 3 once again (or a few times)
 5. If you already added all columns important for filtering and you're still not addressing a single row with your pk - you can add more columns which can help to put similar records close to each other (to improve the compression)
-6. If you have something like hierarchy / tree-like relations between the columns - put there the records from 'root' to 'leaves' for example (continent, country, cityname). This way ClickHouse® can do lookup by country / city even if continent is not specified (it will just 'check all continents')
+6. If you have something like hierarchy / tree-like relations between the columns - put there the records from 'root' to 'leaves' for example (continent, country, cityname). This way ClickHouse® can do a lookup by country/city even if the continent is not specified (it will just 'check all continents')
 special variants of MergeTree may require special ORDER BY to make the record unique etc.
-7. For [timeseries](https://altinity.com/blog/2019-5-23-handling-variable-time-series-efficiently-in-clickhouse) it usually make sense to put timestamp as latest column in ORDER BY, it helps with putting the same data near by for better locality. There is only 2 major patterns for timestamps in ORDER BY: (..., toStartOf(Day|Hour|...)(timestamp), ..., timestamp) and (..., timestamp). First one is useful when your often query small part of table partition. (table partitioned by months and your read only 1-4 days 90% of times)
+7. For [timeseries](https://altinity.com/blog/2019-5-23-handling-variable-time-series-efficiently-in-clickhouse), it usually makes sense to put the timestamp as the latest column in ORDER BY, which helps with putting the same data nearby for better locality. There are only 2 major patterns for timestamps in ORDER BY: (..., toStartOf(Day|Hour|...)(timestamp), ..., timestamp) and (..., timestamp). The first one is useful when you often query a small part of a table partition. (table partitioned by months, and you read only 1-4 days 90% of the time).
+8. There are exceptions to the rule "low cordinality - first" related to compression ratio. For example, data with a lot of repeated attributes in rows (like clickstream), ordering by session_id will benefit compression and reduce disk read, while setting a low cardinality column (like event type) in the first place makes compression and overall query time worse.
 
 Some examples of good `ORDER BY`:
 ```
@@ -38,9 +39,9 @@ PRIMARY KEY (site_id, toStartOfHour(timestamp), sessionid)
 
 All dimensions go to ORDER BY, all metrics - outside of that.
 
-The most important for filtering columns with the lowest cardinality should be the left most.
+The most important for filtering columns with the lowest cardinality should be the left-most.
 
-If number of dimensions is high it's typically make sense to use a prefix of ORDER BY as a PRIMARY KEY to avoid polluting sparse index.
+If the number of dimensions is high, it typically makes sense to use a prefix of ORDER BY as a PRIMARY KEY to avoid polluting the sparse index.
 
 Examples:
 
````
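Steps 1 to 4 of the list changed in this commit amount to ranking the always-filtered columns by their number of distinct values. The Python sketch below shows only that ranking step. It assumes a representative sample of rows fits in memory as dicts; the column names (`tenant_id`, `site_id`, `ts`) and the sample data are hypothetical. "Most important for filtering" is still a human decision: the sketch only preserves the caller's order for columns with equal cardinality.

```python
from typing import Dict, Hashable, List, Sequence


def suggest_order_by(rows: Sequence[Dict[str, Hashable]],
                     filter_columns: List[str]) -> List[str]:
    """Rank always-filtered columns from lowest to highest cardinality.

    Assumption: `rows` is a representative sample of the table. sorted() is
    stable, so columns with equal cardinality keep the caller's order, which
    is where "more important for filtering" can be expressed.
    """
    cardinality = {col: len({row[col] for row in rows}) for col in filter_columns}
    return sorted(filter_columns, key=lambda col: cardinality[col])


# Hypothetical sample: 3 tenants, 4 sites per tenant, 5 events per site.
sample = [
    {"tenant_id": t, "site_id": t * 10 + s, "ts": t * 1000 + s * 10 + i}
    for t in range(3)
    for s in range(4)
    for i in range(5)
]
print("ORDER BY (" + ", ".join(suggest_order_by(sample, ["ts", "site_id", "tenant_id"])) + ")")
# prints: ORDER BY (tenant_id, site_id, ts)
```

With this sample the timestamp, having the most distinct values, lands last, which is also where item 7 suggests putting it.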
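Item 6 says that with a (continent, country, cityname) key ClickHouse can still look up a country when the continent is not given, by "checking all continents". The toy model below illustrates why. It keeps only the first key of every granule, like a sparse primary index, and counts the granules a filter cannot rule out. This is a simplified sketch of the idea, not ClickHouse's actual index-analysis code; the integer codes for continents, countries and cities and the granule size of 100 rows are assumptions chosen for readability (ClickHouse's default `index_granularity` is 8192 rows).

```python
from typing import List, Optional, Tuple

Key = Tuple[int, int, int]  # (continent, country, city), integer-coded for this sketch


def may_contain(lo: Key, hi: Key, continent: Optional[int], country: Optional[int]) -> bool:
    """Can a granule whose keys lie in [lo, hi] hold rows matching the filter?"""
    if continent is not None:
        if country is not None:
            # Tuples compare lexicographically, like the ORDER BY key itself.
            return lo[:2] <= (continent, country) <= hi[:2]
        return lo[0] <= continent <= hi[0]
    if country is not None:
        if lo[0] == hi[0]:
            # Continent is constant inside the granule, so countries are sorted there.
            return lo[1] <= country <= hi[1]
        return True  # continent changes inside the granule: any country may appear
    return True


def granules_to_read(rows: List[Key], granule: int,
                     continent: Optional[int] = None,
                     country: Optional[int] = None) -> int:
    """Count granules that cannot be skipped using only per-granule first keys."""
    marks = rows[::granule]      # first key of each granule (the sparse index)
    bounds = marks + [rows[-1]]  # upper bound: next granule's first key, or the last row
    return sum(may_contain(lo, hi, continent, country)
               for lo, hi in zip(bounds, bounds[1:]))


# Hypothetical table: 5 continents x 20 countries x 100 cities, sorted by the key.
rows = [(c, k, city) for c in range(5) for k in range(20) for city in range(100)]
print(granules_to_read(rows, 100))                          # -> 100 (no filter)
print(granules_to_read(rows, 100, continent=2, country=7))  # -> 2 (full key prefix)
print(granules_to_read(rows, 100, country=7))               # -> 14 (continent omitted)
```

Omitting the continent makes the country filter touch 14 of 100 granules instead of 2, but not all 100: because the low-cardinality continent rarely changes inside a granule, the country range still prunes most of them. In this model, a high-cardinality first column would change inside most granules and block that pruning.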
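Step 5 and item 8 rely on compression working better when similar rows sit next to each other. One rough way to see the effect is sketched below in Python, with `zlib` standing in for ClickHouse's own codecs (such as LZ4 or ZSTD). The clickstream schema (`session_id`, `event_type`, `url`) and its data are hypothetical, with `url` playing the role of an attribute repeated in every row of a session. The sketch compresses the `url` column after arranging the same rows in different orders.

```python
import random
import zlib
from typing import Callable, List, Optional, Tuple

Row = Tuple[int, str, str]  # (session_id, event_type, url); hypothetical clickstream schema


def make_clickstream(n_sessions: int, events_per_session: int, seed: int = 42) -> List[Row]:
    """Each session repeats one URL; rows 'arrive' with sessions interleaved."""
    rng = random.Random(seed)
    event_types = ["view", "click", "scroll", "purchase"]
    rows: List[Row] = []
    for session_id in range(n_sessions):
        url = f"https://example.com/page/{rng.randrange(10_000)}"
        rows.extend((session_id, rng.choice(event_types), url) for _ in range(events_per_session))
    rng.shuffle(rows)
    return rows


def url_column_size(rows: List[Row], key: Optional[Callable[[Row], tuple]] = None) -> int:
    """zlib-compressed size of the url column; arrival order if key is None."""
    ordered = rows if key is None else sorted(rows, key=key)
    payload = "\n".join(row[2] for row in ordered).encode()
    return len(zlib.compress(payload, 6))


rows = make_clickstream(n_sessions=2000, events_per_session=20)
print("arrival order:             ", url_column_size(rows))
print("ORDER BY (event_type, ...):", url_column_size(rows, key=lambda r: (r[1], r[0])))
print("ORDER BY (session_id, ...):", url_column_size(rows, key=lambda r: (r[0], r[1])))
```

Session-first ordering should give the smallest `url` column here, because each session's URL forms one long run, while putting the four-valued `event_type` first splits every session into up to four shorter, distant runs. The sketch measures only the repeated column; `event_type` itself compresses best when it comes first, which is the trade-off item 8 describes.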
