## Storage Layer: Concurrent inserts are isolated from each other
<iframe width="768" height="432" src="https://www.youtube.com/embed/vsykFYns0Ws?si=hE2qnOf6cDKn-otP" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
In ClickHouse, each table consists of multiple "table parts". A [part](/docs/en/parts) is created whenever a user inserts data into the table (INSERT statement). A query is always executed against all table parts that exist at the time the query starts.
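As a minimal sketch of how inserts map to parts (the table and column names here are illustrative, not from the original article), each INSERT into a MergeTree table creates a new part, which can be observed in the `system.parts` system table:

```sql
-- Illustrative table; each INSERT below creates one new table part.
CREATE TABLE hits
(
    EventTime DateTime,
    UserID    UInt64,
    URL       String
)
ENGINE = MergeTree
ORDER BY (UserID, EventTime);

INSERT INTO hits VALUES (now(), 1, 'https://example.com/a'); -- creates e.g. part all_1_1_0
INSERT INTO hits VALUES (now(), 2, 'https://example.com/b'); -- creates e.g. part all_2_2_0

-- Inspect the active parts that the next query would run against:
SELECT name, rows
FROM system.parts
WHERE table = 'hits' AND active;
```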
To prevent too many parts from accumulating, ClickHouse runs a [merge](/docs/en/merges) operation in the background which continuously combines multiple smaller parts into a single bigger part.
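Merges normally happen automatically in the background, but for demonstration purposes a merge can be forced with `OPTIMIZE TABLE` (a sketch reusing the illustrative `hits` table from above):

```sql
-- Force a merge of all active parts into one:
OPTIMIZE TABLE hits FINAL;

-- The previously separate parts are now combined into a single bigger part:
SELECT name, rows
FROM system.parts
WHERE table = 'hits' AND active;
```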
## Storage Layer: Concurrent inserts and selects are isolated
<iframe width="768" height="432" src="https://www.youtube.com/embed/dvGlPh2bJFo?si=F3MSALPpe0gAoq5k" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
Inserts are fully isolated from SELECT queries, and merging inserted data parts happens in the background without affecting concurrent queries.
## Storage Layer: Merge-time computation
<iframe width="768" height="432" src="https://www.youtube.com/embed/_w3zQg695c0?si=g0Wa_Petn-LcmC-6" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
Unlike other databases, ClickHouse keeps data writes lightweight and efficient by performing all additional data transformations during the [merge](/docs/en/merges) background process. Examples of this include:
- **Replacing merges** which retain only the most recent version of a row in the input parts and discard all other row versions. Replacing merges can be thought of as a merge-time cleanup operation; a sketch follows below.
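As a hedged sketch of a replacing merge (the table and column names are illustrative), a `ReplacingMergeTree` table keeps only the most recent row version per sorting key once its parts are merged:

```sql
-- ReplacingMergeTree discards all but the latest row version per ORDER BY key
-- at merge time; the version column (UpdatedAt) decides which row is "latest".
CREATE TABLE user_profiles
(
    UserID    UInt64,
    Name      String,
    UpdatedAt DateTime
)
ENGINE = ReplacingMergeTree(UpdatedAt)
ORDER BY UserID;

INSERT INTO user_profiles VALUES (1, 'Alice',  '2024-01-01 00:00:00');
INSERT INTO user_profiles VALUES (1, 'Alicia', '2024-06-01 00:00:00');

-- Before a merge, both versions may still exist; FINAL applies the
-- replacing logic at query time and returns only the latest row:
SELECT * FROM user_profiles FINAL;
```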
## Storage Layer: Data pruning
<iframe width="768" height="432" src="https://www.youtube.com/embed/UJpVAx7o1aY?si=w-AfhBcRIO-e3Ysj" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
In practice, many queries are repetitive, i.e., they run unchanged or with only slight modifications (e.g. different parameter values) at periodic intervals. Running the same or similar queries again and again makes it possible to add indexes or re-organize the data so that frequent queries can access it faster. This approach is also known as "data pruning", and ClickHouse provides three techniques for it:
1. [Primary key indexes](https://clickhouse.com/docs/en/optimize/sparse-primary-indexes) which define the sort order of the table data. A well-chosen primary key allows filters (like the WHERE clauses in the above query) to be evaluated using fast binary searches instead of full-column scans. In more technical terms, the runtime of scans becomes logarithmic instead of linear in the data size.
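As a small sketch of primary key pruning (the table and query are illustrative, not from the original article), a filter on the leading primary key column lets ClickHouse skip most of the data instead of scanning it:

```sql
-- The ORDER BY clause defines the primary key and the on-disk sort order.
CREATE TABLE page_views
(
    EventDate Date,
    UserID    UInt64,
    URL       String
)
ENGINE = MergeTree
ORDER BY (UserID, EventDate);

-- This filter on the leading key column is evaluated with a binary search
-- over the sparse primary index rather than a full-column scan:
SELECT count()
FROM page_views
WHERE UserID = 42;

-- EXPLAIN indexes = 1 shows how many parts and granules the index pruned:
EXPLAIN indexes = 1
SELECT count() FROM page_views WHERE UserID = 42;
```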