Commit 5b93a7b

Merge pull request #3997 from Blargian/mv_v_projections
Materialized views vs projections
2 parents 6911ae2 + a3b5f6d commit 5b93a7b

5 files changed (+102, -1 lines)

.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -68,3 +68,4 @@ docs/cloud/reference/release-notes-index.md
 docs/whats-new/changelog/index.md
 docs/cloud/manage/api/api-reference-index.md
 docs/getting-started/index.md
+docs/data-modeling/projections/index.md
```

docs/data-modeling/projections.md renamed to docs/data-modeling/projections/1_projections.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -4,6 +4,7 @@ title: 'Projections'
 description: 'Page describing what projections are, how they can be used to improve
 query performance, and how they differ from materialized views.'
 keywords: ['projection', 'projections', 'query optimization']
+sidebar_order: 1
 ---
 
 import projections_1 from '@site/static/images/data-modeling/projections_1.png';
```

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
```yaml
---
slug: /managing-data/materialized-views-versus-projections
sidebar_label: 'Materialized views vs projections'
title: 'Materialized Views versus Projections'
hide_title: false
description: 'Article comparing materialized views and projections in ClickHouse, including their use cases, performance, and limitations.'
---
```

> A common question from users is when they should use materialized views versus
> projections. In this article, we explore the key differences between the two and
> why you may want to pick one over the other in certain scenarios.

## Summary of key differences {#key-differences}

The table below summarizes the key differences between materialized views and projections across several aspects.

| Aspect | Materialized views | Projections |
|---|---|---|
| Data storage and location | Store their results in a **separate, explicit target table**, populated by acting as insert triggers on a source table. | Projections create optimized data layouts that are physically **stored alongside the main table data** and are invisible to the user. |
| Update mechanism | Incremental materialized views operate **synchronously** on `INSERT` to the source table. Refreshable materialized views can instead be **scheduled**. | **Asynchronous** updates in the background upon `INSERT` to the main table. |
| Query interaction | Working with materialized views requires querying the **target table directly**, meaning that users need to be aware of the existence of materialized views when writing queries. | Projections are **automatically selected** by ClickHouse's query optimizer and are transparent: users do not have to modify their queries against the table in order to use them. From version 25.6, it is also possible to filter using more than one projection. |
| Handling `UPDATE` / `DELETE` | **Do not automatically react** to `UPDATE` or `DELETE` operations on the source table, because materialized views act only as insert triggers _on_ the source table and only see newly inserted rows. This can leave the target table stale relative to the source and requires workarounds or a periodic full refresh (via a refreshable materialized view). | By default, **incompatible with deleted rows** (especially lightweight deletes). The `lightweight_mutation_projection_mode` setting (v24.7+) can enable compatibility. |
| `JOIN` support | Yes. Refreshable materialized views can be used for complex denormalization. Incremental materialized views only trigger on inserts into the left-most table. | No. `JOIN` operations are not supported within projection definitions. |
| `WHERE` clause in definition | Yes. `WHERE` clauses can be included to filter data before materialization. | No. `WHERE` clauses are not supported within projection definitions for filtering the materialized data. |
| Chaining capabilities | Yes. The target table of one materialized view can be the source for another materialized view, enabling multi-stage pipelines. | No. Projections cannot be chained. |
| Applicable table engines | Can be used with various source table engines, but target tables are usually of the `MergeTree` family. | **Only available** for `MergeTree` family table engines. |
| Failure handling | A failure during data insertion means that data is lost in the target table, leading to potential inconsistency. | Failures are handled **silently** in the background. Queries can seamlessly mix materialized and unmaterialized parts. |
| Operational overhead | Requires explicit target table creation and often manual backfilling. Managing consistency with `UPDATE`/`DELETE` increases complexity. | Projections are automatically maintained and kept in sync, and generally have a lower operational burden. |
| `FINAL` query compatibility | Generally compatible, but often require `GROUP BY` on the target table. | **Do not work** with `FINAL` queries. |
| Lazy materialization | Yes. | Monitor for projection compatibility issues when using lazy materialization. You may need to set `query_plan_optimize_lazy_materialization = false`. |
| Parallel replicas | Yes. | No. |

## Comparing materialized views and projections {#choose-between}

### When to choose materialized views {#choosing-materialized-views}

You should consider using materialized views when:

- You are working with **real-time ETL and multi-stage data pipelines**: You need to perform complex transformations or aggregations, or route data as it arrives, potentially across multiple stages by chaining views.
- You require **complex denormalization**: You need to pre-join data from several sources (tables, subqueries or dictionaries) into a single, query-optimized table, especially if periodic full refreshes using refreshable materialized views are acceptable.
- You want **explicit schema control**: You require a separate, distinct target table with its own schema and engine for the pre-computed results, offering greater flexibility for data modeling.
- You want to **filter at ingestion**: You need to filter data _before_ it's materialized, reducing the volume of data written to the target table.

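To make the insert-trigger model behind these use cases concrete, here is a toy JavaScript sketch. It is not how ClickHouse implements materialized views, and the `Table` and `MaterializedView` classes are hypothetical. Each view runs its `select` function only over the newly inserted block, filters before anything is written (like a `WHERE` clause in the view definition), and inserts the result into its own target table, which can in turn trigger the next view in a chain.

```javascript
// Toy model of incremental materialized views as insert triggers.
// Hypothetical names; not ClickHouse's implementation.
class Table {
  constructor(name) {
    this.name = name;
    this.rows = [];
    this.triggers = []; // views attached to this table
  }

  insert(block) {
    this.rows.push(...block);
    // Each attached view sees only the newly inserted block.
    for (const view of this.triggers) view.onInsert(block);
  }
}

class MaterializedView {
  constructor(source, target, select) {
    this.target = target;
    this.select = select; // transforms one insert block
    source.triggers.push(this);
  }

  onInsert(block) {
    const out = this.select(block);
    if (out.length > 0) this.target.insert(out);
  }
}

// Stage 1: keep only server errors (filtering at ingestion).
const requests = new Table("requests");
const errors = new Table("errors");
new MaterializedView(requests, errors, (block) =>
  block.filter((r) => r.status >= 500).map((r) => ({ path: r.path, status: r.status }))
);

// Stage 2: a chained view reading from the first view's target table.
const errorCounts = new Table("error_counts");
new MaterializedView(errors, errorCounts, (block) => {
  const counts = {};
  for (const r of block) counts[r.path] = (counts[r.path] || 0) + 1;
  return Object.entries(counts).map(([path, n]) => ({ path, n }));
});

requests.insert([
  { path: "/a", status: 200 },
  { path: "/a", status: 503 },
  { path: "/b", status: 500 },
  { path: "/a", status: 500 },
]);

console.log(errors.rows.length); // 3
console.log(JSON.stringify(errorCounts.rows)); // [{"path":"/a","n":2},{"path":"/b","n":1}]
```

Because the second view aggregates each insert block independently, a further insert into `requests` would append more partial counts rather than update the existing ones. In ClickHouse, this is why aggregating materialized views typically write into engines such as `SummingMergeTree` or `AggregatingMergeTree`, which combine partial results during merges.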
### When to avoid materialized views {#avoid-materialized-views}

You should consider avoiding materialized views when:

- **Source data is frequently updated or deleted**: Without additional strategies for keeping the source and target tables consistent, incremental materialized views can become stale and inconsistent.
- **Simplicity and automatic optimization are preferred**: You want to avoid managing separate target tables.

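The staleness issue can be reproduced with a small toy model in JavaScript (hypothetical names; not ClickHouse code). The view logic runs only inside `insert`, so deleting rows from the source leaves the pre-aggregated target unchanged until something rebuilds it. Here `fullRefresh` stands in for that rebuild, in the spirit of a refreshable materialized view.

```javascript
// Toy model: an incremental "view" that only fires on insert.
// Hypothetical names; not ClickHouse's implementation.
const source = [];
const totalsByUser = new Map(); // target table maintained by the view

function insert(rows) {
  source.push(...rows);
  // Insert trigger: fold only the new rows into the target.
  for (const { user, amount } of rows) {
    totalsByUser.set(user, (totalsByUser.get(user) || 0) + amount);
  }
}

function deleteWhere(predicate) {
  // Deleting from the source does NOT notify the view.
  for (let i = source.length - 1; i >= 0; i--) {
    if (predicate(source[i])) source.splice(i, 1);
  }
}

function fullRefresh() {
  // Workaround: recompute the whole target from the current source.
  totalsByUser.clear();
  for (const { user, amount } of source) {
    totalsByUser.set(user, (totalsByUser.get(user) || 0) + amount);
  }
}

insert([
  { user: "alice", amount: 10 },
  { user: "bob", amount: 5 },
  { user: "alice", amount: 7 },
]);
deleteWhere((r) => r.user === "alice" && r.amount === 7);

console.log(totalsByUser.get("alice")); // 17 (stale: the deleted row is still counted)
fullRefresh();
console.log(totalsByUser.get("alice")); // 10
```

The same gap applies to updates: any change that does not arrive as a new insert is invisible to an incremental view.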
### When to choose projections {#choosing-projections}

You should consider using projections when:

- **Optimizing queries for a single table**: Your primary goal is to speed up queries on a single base table by providing alternative sort orders, optimizing filters on columns that are not part of the primary key, or pre-computing aggregations for a single table.
- You want **query transparency**: You want queries to target the original table without modification, relying on ClickHouse to pick the best data layout for a given query.

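The transparency of projections can be sketched with a toy JavaScript model. The `TableWithProjection` class and its methods are hypothetical and only loosely follow the fact that ClickHouse stores projection data inside each data part. Every insert writes a part containing the raw rows plus a pre-aggregated copy, and the `sumByKey` query is issued against the table itself, which answers from the stored aggregates where they exist and falls back to scanning raw rows in parts that lack them (for example, parts written before a projection was added and materialized).

```javascript
// Toy model of an aggregating projection stored per part.
// Hypothetical names; not ClickHouse internals.
class TableWithProjection {
  constructor() {
    this.parts = [];
  }

  // Each insert writes one part. withProjection = false simulates a part
  // that has no projection data (e.g. written before it was added).
  insert(rows, { withProjection = true } = {}) {
    let agg = null;
    if (withProjection) {
      agg = new Map();
      for (const { key, value } of rows) agg.set(key, (agg.get(key) || 0) + value);
    }
    this.parts.push({ rows, agg });
  }

  // Queried against the table; the projection is picked internally, per part.
  sumByKey(key) {
    let total = 0;
    let rawRowsScanned = 0;
    for (const part of this.parts) {
      if (part.agg) {
        total += part.agg.get(key) || 0; // read the pre-aggregated value
      } else {
        for (const r of part.rows) if (r.key === key) total += r.value;
        rawRowsScanned += part.rows.length; // fall back to a raw scan
      }
    }
    return { total, rawRowsScanned };
  }
}

const t = new TableWithProjection();
t.insert([
  { key: "a", value: 1 },
  { key: "b", value: 2 },
  { key: "a", value: 3 },
]);
t.insert([{ key: "a", value: 10 }, { key: "b", value: 20 }], { withProjection: false });

console.log(JSON.stringify(t.sumByKey("a"))); // {"total":14,"rawRowsScanned":2}
```

The caller never names the projection, which mirrors how a query in ClickHouse keeps targeting the original table while the optimizer decides whether a projection can serve it.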
### When to avoid projections {#avoid-projections}

You should consider avoiding projections when:

- **Complex data transformation or multi-stage ETL is required**: Projections do not support `JOIN` operations within their definitions, cannot be chained to build multi-step pipelines, and cannot handle some SQL features such as window functions or complex `CASE` statements. As such, they are not suited to complex data transformation.
- **Explicit filtering of materialized data is needed**: Projections do not support `WHERE` clauses in their definition to filter the data that gets materialized into the projection itself.
- **Non-MergeTree table engines are used**: Projections are only available for tables using the `MergeTree` family of engines.
- **`FINAL` queries are essential**: Projections do not work with `FINAL` queries, which are sometimes used for deduplication.
- **Parallel replicas are needed**: [Parallel replicas](/deployment-guides/parallel-replicas) are not supported with projections.

## Summary {#summary}

Materialized views and projections are both powerful tools in your toolkit for
optimizing queries and transforming data, and in general, we recommend not viewing
them as an either/or choice. Instead, they can be used in a complementary
manner to get the most out of your queries. As such, the choice between materialized
views and projections in ClickHouse depends on your specific use case and
access patterns.

As a rule of thumb, you should consider using materialized views when
you need to aggregate data from one or more source tables into a target table or
perform complex transformations at scale. Materialized views are excellent for shifting
the work of expensive aggregations from query time to insert time. They are a
great choice for daily or monthly rollups, real-time dashboards or data summaries.

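As a rough illustration of shifting aggregation work from query time to insert time, the toy JavaScript sketch below (hypothetical names; not ClickHouse code) updates a per-day rollup as each batch arrives, so reading a day's totals is a single lookup instead of a scan over every raw event. The query-time function is included only for comparison.

```javascript
// Toy sketch: maintain a daily rollup at insert time.
// Hypothetical names; not ClickHouse's implementation.
const events = []; // raw source rows
const dailyTotals = new Map(); // "YYYY-MM-DD" -> { count, revenue }

function insertEvents(batch) {
  events.push(...batch);
  // Insert-time work: fold only the new batch into the rollup.
  for (const e of batch) {
    const day = e.ts.slice(0, 10); // assumes ISO-8601 timestamps
    const agg = dailyTotals.get(day) || { count: 0, revenue: 0 };
    agg.count += 1;
    agg.revenue += e.revenue;
    dailyTotals.set(day, agg);
  }
}

// Query-time alternative: rescan every raw event on each request.
function dailyTotalsAtQueryTime(day) {
  let count = 0;
  let revenue = 0;
  for (const e of events) {
    if (e.ts.startsWith(day)) {
      count += 1;
      revenue += e.revenue;
    }
  }
  return { count, revenue };
}

insertEvents([
  { ts: "2024-05-01T09:00:00Z", revenue: 30 },
  { ts: "2024-05-01T17:30:00Z", revenue: 12 },
]);
insertEvents([{ ts: "2024-05-02T08:15:00Z", revenue: 50 }]);

console.log(JSON.stringify(dailyTotals.get("2024-05-01"))); // {"count":2,"revenue":42}
console.log(JSON.stringify(dailyTotalsAtQueryTime("2024-05-01"))); // same answer, but scans all events
```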
On the other hand, you should use projections when you need to optimize queries
that filter on columns other than those in the table's primary key, which
determines the physical ordering of the data on disk. They are particularly
useful when it's no longer possible to change the primary key of a table, or when
your access patterns are more diverse than the primary key can accommodate.

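The effect of an alternative sort order can be sketched with a toy JavaScript model (hypothetical names; ClickHouse actually uses a sparse primary index over granules rather than a per-row binary search). Rows stored in primary-key order (`ts`) must be scanned in full to find one `userId`, whereas a copy of the same rows ordered by `userId`, the role played by a projection with its own `ORDER BY`, lets a lookup binary-search straight to the matching range.

```javascript
// Toy model of a projection with a different sort order.
// Hypothetical names; not ClickHouse internals.
const rows = [
  { ts: 1, userId: 42 },
  { ts: 2, userId: 7 },
  { ts: 3, userId: 42 },
  { ts: 4, userId: 13 },
  { ts: 5, userId: 7 },
];
// Main table layout: ordered by the primary key (ts).
const byTs = [...rows].sort((a, b) => a.ts - b.ts);
// Projection layout: the same rows, ordered by userId.
const byUser = [...rows].sort((a, b) => a.userId - b.userId || a.ts - b.ts);

// First index i with arr[i].userId >= target (lower bound).
function lowerBound(arr, target) {
  let lo = 0;
  let hi = arr.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (arr[mid].userId < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

function findByUserFullScan(userId) {
  return { matches: byTs.filter((r) => r.userId === userId), rowsRead: byTs.length };
}

function findByUserWithProjection(userId) {
  const start = lowerBound(byUser, userId);
  const end = lowerBound(byUser, userId + 1); // assumes integer ids
  return { matches: byUser.slice(start, end), rowsRead: end - start };
}

console.log(findByUserFullScan(7).rowsRead); // 5
console.log(findByUserWithProjection(7).rowsRead); // 2
```

Here `rowsRead` counts only the rows in the matching range, not the few binary-search probes; the point is that the lookup cost no longer grows with the size of the whole table.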

scripts/autogenerate-table-of-contents.sh

Lines changed: 1 addition & 0 deletions
```diff
@@ -34,5 +34,6 @@ python3 scripts/table-of-contents-generator/toc_gen.py --single-toc --dir="docs/
 python3 scripts/table-of-contents-generator/toc_gen.py --single-toc --dir="docs/cloud/changelogs" --md="docs/cloud/reference/release-notes-index.md"
 python3 scripts/table-of-contents-generator/toc_gen.py --single-toc --dir="docs/development" --md="docs/development/index.md" --ignore images
 python3 scripts/table-of-contents-generator/toc_gen.py --single-toc --dir="docs/getting-started/example-datasets" --md="docs/getting-started/index.md" --ignore images
+python3 scripts/table-of-contents-generator/toc_gen.py --single-toc --dir="docs/data-modelling/projections" --md="docs/data-modelling/projections/index.md"
 deactivate
 rm -r venv
```

sidebars.js

Lines changed: 12 additions & 1 deletion
```diff
@@ -1153,7 +1153,18 @@ const sidebars = {
         "materialized-view/refreshable-materialized-view"
       ],
     },
-    "data-modeling/projections",
+    {
+      type: "category",
+      label: "Projections",
+      collapsed: true,
+      collapsible: true,
+      items: [
+        {
+          type: "autogenerated",
+          dirName: "data-modeling/projections",
+        }
+      ]
+    },
     {
       type: "category",
       label: "Data Compression",
```
