@@ -49,7 +49,26 @@ ClickHouse automatically samples the primary keys and chooses a table that can
generate the same correct result, but requires the least amount of data to be
read as shown in the figure below:
- <Image img={projections_1} size="lg" alt="Projections in ClickHouse" />
+ <Image img={projections_1} size="md" alt="Projections in ClickHouse" />
+
+ ### Smarter storage with `_part_offset`
+
+ Since version 25.5, ClickHouse supports the virtual column `_part_offset` in
+ projections, which offers a new way to define a projection.
+
+ There are now two ways to define a projection:
+
+ - **Store full columns (the original behavior)**: The projection contains full
+   data and can be read directly, offering faster performance when filters match
+   the projection's sort order.
+
+ - **Store only the sorting key + `_part_offset`**: The projection works like an index.
+   ClickHouse uses the projection's primary index to locate matching rows, but reads the
+   actual data from the base table. This reduces storage overhead at the cost of
+   slightly more I/O at query time.
+
+ The approaches above can also be mixed, storing some columns in the projection and
+ others indirectly via `_part_offset`, as sketched below.
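+
+ As a rough sketch of the two styles, here is a hypothetical table (the `events`
+ table and its columns are illustrative assumptions, not part of this guide) that
+ defines one projection of each kind:
+
+ ```sql
+ -- Illustrative only: one full-column projection and one index-like projection
+ -- that stores just the sorting key plus _part_offset (requires ClickHouse 25.5+).
+ CREATE TABLE events
+ (
+     ts DateTime,
+     user_id UInt64,
+     url String,
+     -- Style 1: full columns, readable directly when filtering/sorting by user_id
+     PROJECTION by_user_full
+     (
+         SELECT user_id, url, ts ORDER BY user_id
+     ),
+     -- Style 2: sorting key + _part_offset only; rows are read from the base table
+     PROJECTION by_url_offset
+     (
+         SELECT _part_offset ORDER BY url
+     )
+ )
+ ENGINE = MergeTree
+ ORDER BY ts;
+ ```
+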
## When to use Projections? {#when-to-use-projections}
@@ -68,8 +87,6 @@ users should be aware of and thus should be deployed sparingly.
- Projections don't allow using different TTL for the source table and the
(hidden) target table, whereas materialized views allow different TTLs.
- - Projections don't currently support `optimize_read_in_order` for the (hidden)
- target table.
- Lightweight updates and deletes are not supported for tables with projections.
- Materialized Views can be chained: the target table of one Materialized View
can be the source table of another Materialized View, and so on. This is not
@@ -85,7 +102,7 @@ We recommend using projections when:
to exploit projections that use a simple reordering, i.e., `SELECT * ORDER BY x`.
Users can select a subset of columns in this expression to reduce storage
footprint, as in the sketch after this list.
- - Users are comfortable with the associated increase in storage footprint and
+ - Users are comfortable with the associated potential increase in storage footprint and
overhead of writing data twice. Test the impact on insertion speed and
[evaluate the storage overhead](/data-compression/compression-in-clickhouse).
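
As a sketch of this pattern (the `page_hits` table, its columns, and the projection
name are assumptions for illustration, not taken from this guide), such a reordering
projection could be added to an existing table and backfilled like this:

```sql
-- Hypothetical table page_hits(ts DateTime, user_id UInt64, url String) ordered
-- by ts; queries filtering on user_id cannot use that primary key, so we add a
-- projection that re-sorts a subset of columns by user_id.
ALTER TABLE page_hits
    ADD PROJECTION user_id_order
    (
        SELECT user_id, url, ts   -- subset of columns to limit storage overhead
        ORDER BY user_id
    );

-- Build the projection for parts that already exist (runs as a background mutation).
ALTER TABLE page_hits MATERIALIZE PROJECTION user_id_order;
```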
@@ -290,7 +307,7 @@ becomes `AggregatingMergeTree`, and all aggregate functions are converted to
The figure below is a visualization of the main table `uk_price_paid_with_projections`
and its two projections:
- <Image img={projections_2} size="lg" alt="Visualization of the main table uk_price_paid_with_projections and its two projections" />
+ <Image img={projections_2} size="md" alt="Visualization of the main table uk_price_paid_with_projections and its two projections" />
If we now run the query that lists the counties in London for the three highest
paid prices again, we see an improvement in query performance:
@@ -516,6 +533,125 @@ LIMIT 100
Again, the result is the same, but notice the improvement in query performance for the second query.
+ ### Combining projections in one query {#combining-projections}
+
+ Starting in version 25.6, building on the `_part_offset` support introduced in
+ the previous version, ClickHouse can now use multiple projections to accelerate
+ a single query with multiple filters.
+
+ Importantly, ClickHouse still reads data from only one projection (or the base table),
+ but can use other projections' primary indexes to prune unnecessary parts before reading.
+ This is especially useful for queries that filter on multiple columns, each
+ potentially matching a different projection.
+
+ > Currently, this mechanism only prunes entire parts. Granule-level pruning is
+ > not yet supported.
+
+ To demonstrate this, we define the table (with projections using `_part_offset` columns)
+ and insert five example rows matching the diagrams above.
+
+ ```sql
+ CREATE TABLE page_views
+ (
+     id UInt64,
+     event_date Date,
+     user_id UInt32,
+     url String,
+     region String,
+     PROJECTION region_proj
+     (
+         SELECT _part_offset ORDER BY region
+     ),
+     PROJECTION user_id_proj
+     (
+         SELECT _part_offset ORDER BY user_id
+     )
+ )
+ ENGINE = MergeTree
+ ORDER BY (event_date, id)
+ SETTINGS
+     index_granularity = 1, -- one row per granule
+     max_bytes_to_merge_at_max_space_in_pool = 1; -- disable merge
+ ```
+
+ Then we insert data into the table:
+
+ ```sql
+ INSERT INTO page_views VALUES (
+     1, '2025-07-01', 101, 'https://example.com/page1', 'europe');
+ INSERT INTO page_views VALUES (
+     2, '2025-07-01', 102, 'https://example.com/page2', 'us_west');
+ INSERT INTO page_views VALUES (
+     3, '2025-07-02', 106, 'https://example.com/page3', 'us_west');
+ INSERT INTO page_views VALUES (
+     4, '2025-07-02', 107, 'https://example.com/page4', 'us_west');
+ INSERT INTO page_views VALUES (
+     5, '2025-07-03', 104, 'https://example.com/page5', 'asia');
+ ```
+
+ :::note
+ The table uses custom settings for illustration, such as one-row granules
+ and disabled part merges, which are not recommended for production use.
+ :::
+
+ This setup produces:
+
+ - Five separate parts (one per inserted row)
+ - One primary index entry per row (in the base table and each projection)
+ - Each part contains exactly one row
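+
+ As an optional sanity check (an addition to the original walkthrough, assuming the
+ table lives in the current database), the part count can be confirmed via `system.parts`:
+
+ ```sql
+ -- Count active data parts of page_views; with the settings above this
+ -- should return 5, one part per single-row INSERT.
+ SELECT count() AS active_parts
+ FROM system.parts
+ WHERE database = currentDatabase()
+   AND table = 'page_views'
+   AND active;
+ ```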
+
+ With this setup, we run a query filtering on both `region` and `user_id`.
+ Since the base table's primary index is built from `event_date` and `id`, it
+ is unhelpful here. ClickHouse therefore uses:
+
+ - `region_proj` to prune parts by region
+ - `user_id_proj` to further prune by `user_id`
+
+ This behavior is visible using `EXPLAIN projections = 1`, which shows how
+ ClickHouse selects and applies projections.
+
+ ```sql
+ EXPLAIN projections = 1
+ SELECT * FROM page_views WHERE region = 'us_west' AND user_id = 107;
+ ```
+
+ ```response
+     ┌─explain────────────────────────────────────────────────────────────────────────────────┐
+  1. │ Expression ((Project names + Projection)) │
+  2. │ Expression │
+  3. │ ReadFromMergeTree (default.page_views) │
+  4. │ Projections: │
+  5. │ Name: region_proj │
+  6. │ Description: Projection has been analyzed and is used for part-level filtering │
+  7. │ Condition: (region in ['us_west', 'us_west']) │
+  8. │ Search Algorithm: binary search │
+  9. │ Parts: 3 │
+ 10. │ Marks: 3 │
+ 11. │ Ranges: 3 │
+ 12. │ Rows: 3 │
+ 13. │ Filtered Parts: 2 │
+ 14. │ Name: user_id_proj │
+ 15. │ Description: Projection has been analyzed and is used for part-level filtering │
+ 16. │ Condition: (user_id in [107, 107]) │
+ 17. │ Search Algorithm: binary search │
+ 18. │ Parts: 1 │
+ 19. │ Marks: 1 │
+ 20. │ Ranges: 1 │
+ 21. │ Rows: 1 │
+ 22. │ Filtered Parts: 2 │
+     └────────────────────────────────────────────────────────────────────────────────────────┘
+ ```
+
+ The `EXPLAIN` output (shown above) reveals the logical query plan, top to bottom:
+
+ | Row number | Description |
+ |------------|-------------|
+ | 3 | Plans to read from the `page_views` base table |
+ | 5-13 | Uses `region_proj` to identify 3 parts where `region = 'us_west'`, pruning 2 of the 5 parts |
+ | 14-22 | Uses `user_id_proj` to identify 1 part where `user_id = 107`, further pruning 2 of the 3 remaining parts |
+
+ In the end, just **1 out of 5 parts** is read from the base table.
+ By combining the index analysis of multiple projections, ClickHouse significantly reduces the amount of data scanned,
+ improving performance while keeping storage overhead low.
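+
+ To gauge the effect of the projections themselves, one option is to re-run the same
+ `EXPLAIN` with projection analysis disabled and compare the number of parts read. This
+ is a minimal sketch, assuming the `optimize_use_projections` query setting is available
+ in your ClickHouse version:
+
+ ```sql
+ -- With projection analysis disabled, part-level pruning via region_proj and
+ -- user_id_proj no longer applies, so more parts of page_views are selected.
+ EXPLAIN projections = 1
+ SELECT *
+ FROM page_views
+ WHERE region = 'us_west' AND user_id = 107
+ SETTINGS optimize_use_projections = 0;
+ ```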
## Related content {#related-content}
- [A Practical Introduction to Primary Indexes in ClickHouse](/guides/best-practices/sparse-primary-indexes#option-3-projections)