Commit 9f88738

committed
update projections docs
1 parent cb54bd4 commit 9f88738

4 files changed, +230 -57 lines changed

docs/data-modeling/projections/1_projections.md

Lines changed: 141 additions & 5 deletions
@@ -49,7 +49,26 @@ ClickHouse automatically samples the primary keys and chooses a table that can
 generate the same correct result, but requires the least amount of data to be
 read as shown in the figure below:
 
-<Image img={projections_1} size="lg" alt="Projections in ClickHouse"/>
+<Image img={projections_1} size="md" alt="Projections in ClickHouse"/>
+
+### Smarter storage with `_part_offset`
+
+Since version 25.5, ClickHouse supports the virtual column `_part_offset` in
+projections, which offers a new way to define a projection.
+
+There are now two ways to define a projection:
+
+- **Store full columns (the original behavior)**: The projection contains full
+data and can be read directly, offering faster performance when filters match
+the projection’s sort order.
+
+- **Store only the sorting key + `_part_offset`**: The projection works like an index.
+ClickHouse uses the projection’s primary index to locate matching rows, but reads the
+actual data from the base table. This reduces storage overhead at the cost of
+slightly more I/O at query time.
+
+The two approaches can also be mixed, storing some columns in the projection and
+others indirectly via `_part_offset`.
 
 ## When to use Projections? {#when-to-use-projections}
 
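The two projection styles added in the `_part_offset` hunk above can be illustrated with a toy Python model. This is a hypothetical sketch, not how ClickHouse stores projections on disk: it imagines the five `page_views` example rows from later in this commit living in a single part (a Python list), models a full-column projection as a re-sorted copy of those rows, and models a `_part_offset` projection as sorted `(region, offset)` pairs that must go back to the base part to fetch each row. The names `base_part`, `lookup_full`, and `lookup_by_offset` are illustrative only.

```python
from bisect import bisect_left, bisect_right

# Hypothetical toy model of a single data part: rows kept in the base table's
# sort order (event_date, id). Values reuse the page_views example rows.
base_part = [
    # (id, event_date, user_id, url, region)
    (1, "2025-07-01", 101, "https://example.com/page1", "europe"),
    (2, "2025-07-01", 102, "https://example.com/page2", "us_west"),
    (3, "2025-07-02", 106, "https://example.com/page3", "us_west"),
    (4, "2025-07-02", 107, "https://example.com/page4", "us_west"),
    (5, "2025-07-03", 104, "https://example.com/page5", "asia"),
]
REGION = 4  # position of the `region` column in each row tuple

# Style 1, "store full columns": a complete copy of the rows, re-sorted by region.
full_projection = sorted(base_part, key=lambda row: row[REGION])

# Style 2, "sorting key + _part_offset": only (region, offset) pairs sorted by
# region; the row data itself stays in the base part.
offset_projection = sorted((row[REGION], offset) for offset, row in enumerate(base_part))


def lookup_full(region):
    """Binary-search the full-column projection and return rows straight from it."""
    keys = [row[REGION] for row in full_projection]
    lo, hi = bisect_left(keys, region), bisect_right(keys, region)
    return full_projection[lo:hi]


def lookup_by_offset(region):
    """Binary-search the offset projection, then fetch each row from the base part."""
    keys = [key for key, _ in offset_projection]
    lo, hi = bisect_left(keys, region), bisect_right(keys, region)
    offsets = [offset for _, offset in offset_projection[lo:hi]]
    return [base_part[offset] for offset in offsets]  # extra read per match


print(lookup_by_offset("us_west") == lookup_full("us_west"))  # True
```

Both lookups return the same rows; the offset-based variant stores two values per row instead of five but pays for it with the extra `base_part[offset]` reads, which mirrors the storage-versus-I/O trade-off the new section describes.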
@@ -68,8 +87,6 @@ users should be aware of and thus should be deployed sparingly.
 
 - Projections don't allow using different TTL for the source table and the
 (hidden) target table, materialized views allow different TTLs.
-- Projections don't currently support `optimize_read_in_order` for the (hidden)
-target table.
 - Lightweight updates and deletes are not supported for tables with projections.
 - Materialized Views can be chained: the target table of one Materialized View
 can be the source table of another Materialized View, and so on. This is not
@@ -85,7 +102,7 @@ We recommend using projections when:
 to exploit projections that use a simple reordering, i.e., `SELECT * ORDER BY x`.
 Users can select a subset of columns in this expression to reduce storage
 footprint.
-- Users are comfortable with the associated increase in storage footprint and
+- Users are comfortable with the potential increase in storage footprint and
 overhead of writing data twice. Test the impact on insertion speed and
 [evaluate the storage overhead](/data-compression/compression-in-clickhouse).
 
@@ -290,7 +307,7 @@ becomes `AggregatingMergeTree`, and all aggregate functions are converted to
 The figure below is a visualization of the main table `uk_price_paid_with_projections`
 and its two projections:
 
-<Image img={projections_2} size="lg" alt="Visualization of the main table uk_price_paid_with_projections and its two projections"/>
+<Image img={projections_2} size="md" alt="Visualization of the main table uk_price_paid_with_projections and its two projections"/>
 
 If we now run the query that lists the counties in London for the three highest
 paid prices again, we see an improvement in query performance:
@@ -516,6 +533,125 @@ LIMIT 100
 
 Again, the result is the same but notice the improvement in query performance for the 2nd query.
 
+### Combining projections in one query {#combining-projections}
+
+Starting in version 25.6, building on the `_part_offset` support introduced in
+the previous version, ClickHouse can now use multiple projections to accelerate
+a single query with multiple filters.
+
+Importantly, ClickHouse still reads data from only one projection (or the base table),
+but can use other projections' primary indexes to prune unnecessary parts before reading.
+This is especially useful for queries that filter on multiple columns, each
+potentially matching a different projection.
+
+> Currently, this mechanism only prunes entire parts. Granule-level pruning is
+not yet supported.
+
+To demonstrate this, we define the table (with projections using `_part_offset` columns)
+and insert five example rows matching the diagrams above.
+
+```sql
+CREATE TABLE page_views
+(
+    id UInt64,
+    event_date Date,
+    user_id UInt32,
+    url String,
+    region String,
+    PROJECTION region_proj
+    (
+        SELECT _part_offset ORDER BY region
+    ),
+    PROJECTION user_id_proj
+    (
+        SELECT _part_offset ORDER BY user_id
+    )
+)
+ENGINE = MergeTree
+ORDER BY (event_date, id)
+SETTINGS
+    index_granularity = 1, -- one row per granule
+    max_bytes_to_merge_at_max_space_in_pool = 1; -- disable merge
+```
+
+Then we insert data into the table:
+
+```sql
+INSERT INTO page_views VALUES (
+    1, '2025-07-01', 101, 'https://example.com/page1', 'europe');
+INSERT INTO page_views VALUES (
+    2, '2025-07-01', 102, 'https://example.com/page2', 'us_west');
+INSERT INTO page_views VALUES (
+    3, '2025-07-02', 106, 'https://example.com/page3', 'us_west');
+INSERT INTO page_views VALUES (
+    4, '2025-07-02', 107, 'https://example.com/page4', 'us_west');
+INSERT INTO page_views VALUES (
+    5, '2025-07-03', 104, 'https://example.com/page5', 'asia');
+```
+
+:::note
+The table uses custom settings for illustration, such as one-row granules
+and disabled part merges, which are not recommended for production use.
+:::
+
+This setup produces:
+- Five separate parts (one per inserted row)
+- One primary index entry per row (in the base table and each projection)
+- Each part contains exactly one row
+
+With this setup, we run a query filtering on both `region` and `user_id`.
+Since the base table’s primary index is built from `event_date` and `id`, it
+is unhelpful here. ClickHouse therefore uses:
+
+- `region_proj` to prune parts by region
+- `user_id_proj` to further prune by `user_id`
+
+This behavior is visible using `EXPLAIN projections = 1`, which shows how
+ClickHouse selects and applies projections.
+
+```sql
+EXPLAIN projections = 1
+SELECT * FROM page_views WHERE region = 'us_west' AND user_id = 107;
+```
+
+```response
+    ┌─explain────────────────────────────────────────────────────────────────────────────────┐
+ 1. │ Expression ((Project names + Projection))                                              │
+ 2. │   Expression                                                                           │
+ 3. │     ReadFromMergeTree (default.page_views)                                             │
+ 4. │     Projections:                                                                       │
+ 5. │       Name: region_proj                                                                │
+ 6. │         Description: Projection has been analyzed and is used for part-level filtering │
+ 7. │         Condition: (region in ['us_west', 'us_west'])                                  │
+ 8. │         Search Algorithm: binary search                                                │
+ 9. │         Parts: 3                                                                       │
+10. │         Marks: 3                                                                       │
+11. │         Ranges: 3                                                                      │
+12. │         Rows: 3                                                                        │
+13. │         Filtered Parts: 2                                                              │
+14. │       Name: user_id_proj                                                               │
+15. │         Description: Projection has been analyzed and is used for part-level filtering │
+16. │         Condition: (user_id in [107, 107])                                             │
+17. │         Search Algorithm: binary search                                                │
+18. │         Parts: 1                                                                       │
+19. │         Marks: 1                                                                       │
+20. │         Ranges: 1                                                                      │
+21. │         Rows: 1                                                                        │
+22. │         Filtered Parts: 2                                                              │
+    └────────────────────────────────────────────────────────────────────────────────────────┘
+```
+
+The `EXPLAIN` output (shown above) reveals the logical query plan, top to bottom:
+
+| Row number | Description |
+|------------|----------------------------------------------------------------------------------------------------------|
+| 3 | Plans to read from the `page_views` base table |
+| 5-13 | Uses `region_proj` to identify 3 parts where `region = 'us_west'`, pruning 2 of the 5 parts |
+| 14-22 | Uses `user_id_proj` to identify 1 part where `user_id = 107`, further pruning 2 of the 3 remaining parts |
+
+In the end, just **1 out of 5 parts** is read from the base table.
+By combining the index analysis of multiple projections, ClickHouse significantly reduces the amount of data scanned,
+improving performance while keeping storage overhead low.
 
 ## Related content {#related-content}
 - [A Practical Introduction to Primary Indexes in ClickHouse](/guides/best-practices/sparse-primary-indexes#option-3-projections)
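The part-level pruning walked through in the "Combining projections in one query" section can likewise be sketched in Python. This is a simplified, hypothetical model, not ClickHouse's implementation: each part is a list of dicts holding only the filtered columns, each projection's primary index is reduced to the sorted key values of a part (exact here only because `index_granularity = 1` makes every row its own granule), and whole parts are pruned one projection at a time before the surviving parts are read from the base table. The helpers `build_projection_index`, `part_may_contain`, `prune_parts`, and `read_from_base` are illustrative names.

```python
from bisect import bisect_left

# Hypothetical toy model of the page_views example: five parts, one row each
# (one part per INSERT, merges disabled). Only the filtered columns are kept.
parts = {
    "part_1": [{"id": 1, "user_id": 101, "region": "europe"}],
    "part_2": [{"id": 2, "user_id": 102, "region": "us_west"}],
    "part_3": [{"id": 3, "user_id": 106, "region": "us_west"}],
    "part_4": [{"id": 4, "user_id": 107, "region": "us_west"}],
    "part_5": [{"id": 5, "user_id": 104, "region": "asia"}],
}


def build_projection_index(parts, column):
    """Per-part sorted key values, standing in for a projection's primary index."""
    return {name: sorted(row[column] for row in rows) for name, rows in parts.items()}


def part_may_contain(sorted_keys, value):
    """Binary search: can this part contain rows whose key equals value?"""
    i = bisect_left(sorted_keys, value)
    return i < len(sorted_keys) and sorted_keys[i] == value


def prune_parts(parts, filters):
    """Apply each projection's index in turn, keeping only whole parts that may match.

    Returns the surviving part names and, per filtered column, (kept, filtered) counts.
    """
    remaining = list(parts)
    stats = {}
    for column, value in filters:
        index = build_projection_index(parts, column)
        kept = [name for name in remaining if part_may_contain(index[name], value)]
        stats[column] = (len(kept), len(remaining) - len(kept))
        remaining = kept
    return remaining, stats


def read_from_base(parts, names, filters):
    """Read only the surviving parts and apply the full WHERE condition."""
    return [
        row
        for name in names
        for row in parts[name]
        if all(row[column] == value for column, value in filters)
    ]


filters = [("region", "us_west"), ("user_id", 107)]
survivors, stats = prune_parts(parts, filters)
rows = read_from_base(parts, survivors, filters)
print(survivors, stats)  # ['part_4'] {'region': (3, 2), 'user_id': (1, 2)}
```

With the five example rows, `stats` mirrors the `Parts` and `Filtered Parts` counts in the `EXPLAIN` output (3 kept and 2 filtered by `region`, then 1 kept and 2 filtered by `user_id`), and only one part is read from the base table.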
