Commit 8bdf1e2

Merge pull request #3273 from mneedham/fast-videos
first 4 videos
2 parents cb3c66e + 49ef62d

File tree

1 file changed: +8 additions, −0 deletions

docs/en/concepts/why-clickhouse-is-so-fast.md

Lines changed: 8 additions & 0 deletions
@@ -13,6 +13,8 @@ From an architectural perspective, databases consist (at least) of a storage layer

## Storage Layer: Concurrent inserts are isolated from each other

+<iframe width="768" height="432" src="https://www.youtube.com/embed/vsykFYns0Ws?si=hE2qnOf6cDKn-otP" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
+
In ClickHouse, each table consists of multiple "table parts". A [part](/docs/en/parts) is created whenever a user inserts data into the table (INSERT statement). A query is always executed against all table parts that exist at the time the query starts.

To prevent too many parts from accumulating, ClickHouse runs a [merge](/docs/en/merges) operation in the background which continuously combines multiple smaller parts into a single bigger part.
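
To make the parts model concrete, here is a minimal sketch (the table `t` and its schema are invented for this illustration, not taken from the page above): each INSERT creates an immutable part, parts can be inspected in the `system.parts` system table, and merges combine them in the background.

```sql
-- Hypothetical example table (MergeTree is ClickHouse's main table engine).
CREATE TABLE t
(
    id    UInt64,
    value String
)
ENGINE = MergeTree
ORDER BY id;

-- Each INSERT statement creates a new, immutable part on disk.
INSERT INTO t VALUES (1, 'a');
INSERT INTO t VALUES (2, 'b');

-- Two active parts now exist, one per insert.
SELECT name, rows, active
FROM system.parts
WHERE database = currentDatabase() AND table = 't';

-- Merges run automatically in the background; OPTIMIZE merely forces one
-- immediately for demonstration purposes.
OPTIMIZE TABLE t FINAL;
```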
@@ -21,10 +23,14 @@ This approach has several advantages: All data processing can be [offloaded to b

## Storage Layer: Concurrent inserts and selects are isolated

+<iframe width="768" height="432" src="https://www.youtube.com/embed/dvGlPh2bJFo?si=F3MSALPpe0gAoq5k" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
+
Inserts are fully isolated from SELECT queries, and merging inserted data parts happens in the background without affecting concurrent queries.
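
As a sketch of what this isolation means in practice (reusing the hypothetical table `t` from above): a query operates on the parts that existed when it started, so a concurrent insert never disturbs it.

```sql
-- Session A: a long-running query. It executes against the set of parts
-- that existed at the moment it started.
SELECT count() FROM t;

-- Session B, concurrently: the insert creates a brand-new part. That part
-- is invisible to the query already running in session A, but it is seen
-- by any query that starts afterwards.
INSERT INTO t VALUES (3, 'c');
```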

## Storage Layer: Merge-time computation

+<iframe width="768" height="432" src="https://www.youtube.com/embed/_w3zQg695c0?si=g0Wa_Petn-LcmC-6" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
+
Unlike other databases, ClickHouse keeps data writes lightweight and efficient by performing all additional data transformations during the [merge](/docs/en/merges) background process. Examples of this include:

- **Replacing merges** which retain only the most recent version of a row in the input parts and discard all other row versions. Replacing merges can be thought of as a merge-time cleanup operation; a sketch follows below.
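
As an illustration of replacing merges, here is a minimal sketch using ClickHouse's ReplacingMergeTree engine (the table and columns are invented for this example):

```sql
-- Rows with the same sorting key collapse to the version with the highest
-- `updated` value during background merges.
CREATE TABLE user_profiles
(
    user_id UInt64,
    email   String,
    updated DateTime
)
ENGINE = ReplacingMergeTree(updated)
ORDER BY user_id;

-- Two versions of the same row; both exist on disk until a merge runs.
INSERT INTO user_profiles VALUES (42, 'old@example.com', '2024-01-01 00:00:00');
INSERT INTO user_profiles VALUES (42, 'new@example.com', '2024-02-01 00:00:00');

-- FINAL applies the replacing logic at query time, even before any merge,
-- so only the most recent version of each row is returned.
SELECT * FROM user_profiles FINAL;
```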
@@ -41,6 +47,8 @@ On the other hand, the majority of the runtime of merges is consumed by loading

## Storage Layer: Data pruning

+<iframe width="768" height="432" src="https://www.youtube.com/embed/UJpVAx7o1aY?si=w-AfhBcRIO-e3Ysj" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
+
In practice, many queries are repetitive, i.e., they run unchanged or with only slight modifications (e.g. different parameter values) at periodic intervals. Running the same or similar queries again and again allows adding indexes or re-organizing the data so that frequent queries can access it faster. This approach is also known as "data pruning", and ClickHouse provides three techniques for it:

1. [Primary key indexes](https://clickhouse.com/docs/en/optimize/sparse-primary-indexes) which define the sort order of the table data. A well-chosen primary key makes it possible to evaluate filters (like the WHERE clauses in the above query) using fast binary searches instead of full-column scans (sketched below). In more technical terms, the runtime of scans becomes logarithmic instead of linear in the data size.
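
To illustrate, a minimal sketch of a primary key chosen to match a frequent filter (the table and query are invented for this example):

```sql
-- The ORDER BY clause defines the sort order and, by default, the primary key.
CREATE TABLE page_views
(
    event_time DateTime,
    url        String,
    user_id    UInt64
)
ENGINE = MergeTree
ORDER BY (url, event_time);

-- Because `url` is the leading primary-key column, this filter is evaluated
-- with a binary search over the sparse index rather than a full-column scan.
SELECT count()
FROM page_views
WHERE url = 'https://example.com/pricing';
```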
