Commit 212716b

Separate the docs for dbt. Include a new section in MVs to cover the new target functionality. Fix some links.
1 parent 0b21576 commit 212716b

File tree

6 files changed

+805
-586
lines changed

docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md

Lines changed: 4 additions & 578 deletions
Large diffs are not rendered by default.

docs/integrations/data-ingestion/etl-tools/dbt/guides.md

Lines changed: 3 additions & 2 deletions
@@ -1,7 +1,7 @@
11
---
22
sidebar_label: 'Guides'
33
slug: /integrations/dbt/guides
4-
sidebar_position: 2
4+
sidebar_position: 4
55
description: 'Guides for using dbt with ClickHouse'
66
keywords: ['clickhouse', 'dbt', 'guides']
77
title: 'Guides'
@@ -32,7 +32,8 @@ This section provides guides on setting up dbt and the ClickHouse adapter, as we
3232
5. Creating a snapshot model.
3333
6. Using materialized views.
3434

35-
These guides are designed to be used in conjunction with the rest of the [documentation](/integrations/dbt) and the [features and configurations](/integrations/dbt/features-and-configurations).
35+
These guides are designed to be used in conjunction with the rest of the [documentation](/integrations/dbt), the [features and configurations](/integrations/dbt/features-and-configurations) and the
36+
[materializations reference](/integrations/dbt/materializations).
3637

3738
<TOCInline toc={toc} maxHeadingLevel={2} />
3839

docs/integrations/data-ingestion/etl-tools/dbt/index.md

Lines changed: 18 additions & 5 deletions
@@ -1,5 +1,4 @@
11
---
2-
sidebar_label: 'Overview'
32
slug: /integrations/dbt
43
sidebar_position: 1
54
description: 'You can transform and model your data in ClickHouse using dbt'
@@ -12,9 +11,22 @@ integration:
1211
- website: 'https://github.com/ClickHouse/dbt-clickhouse'
1312
---
1413

15-
import TOCInline from '@theme/TOCInline';
14+
import {useCurrentSidebarCategory} from '@docusaurus/theme-common/internal';
1615
import ClickHouseSupportedBadge from '@theme/badges/ClickHouseSupported';
1716

17+
export const DocPageList = () => {
18+
const category = useCurrentSidebarCategory();
19+
return (
20+
<ul>
21+
{category.items.map((item) => (
22+
<li key={item.docId || item.label}>
23+
<a href={item.href}>{item.label}</a>
24+
</li>
25+
))}
26+
</ul>
27+
);
28+
};
29+
1830
# Integrating dbt and ClickHouse {#integrate-dbt-clickhouse}
1931

2032
<ClickHouseSupportedBadge/>
@@ -26,7 +38,8 @@ Within dbt, these models can be cross-referenced and layered to allow the constr
2638

2739
dbt is compatible with ClickHouse through a [ClickHouse-supported adapter](https://github.com/ClickHouse/dbt-clickhouse).
2840

29-
<TOCInline toc={toc} maxHeadingLevel={2} />
41+
## Related pages
42+
<DocPageList />
3043

3144
## Supported features {#supported-features}
3245

@@ -73,7 +86,7 @@ The following are [experimental features](https://clickhouse.com/docs/en/beta-an
7386

7487
| Type | Supported? | Details |
7588
|-----------------------------------------|-------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
76-
| Materialized View materialization | YES, Experimental | Creates a [materialized view](https://clickhouse.com/docs/en/materialized-view). |
89+
| Materialized View materialization | YES, Experimental | Creates a [materialized view](/integrations/dbt/materialized-views). |
7790
| Distributed table materialization | YES, Experimental | Creates a [distributed table](https://clickhouse.com/docs/en/engines/table-engines/special/distributed). |
7891
| Distributed incremental materialization | YES, Experimental | Incremental model based on the same idea as distributed table. Note that not all strategies are supported, visit [this](https://github.com/ClickHouse/dbt-clickhouse?tab=readme-ov-file#distributed-incremental-materialization) for more info. |
7992
| Dictionary materialization | YES, Experimental | Creates a [dictionary](https://clickhouse.com/docs/en/engines/table-engines/special/dictionary). |
@@ -157,7 +170,7 @@ For deployment (i.e., the CD step), we recommend using the artifacts from your p
157170

158171
If you encounter issues connecting to ClickHouse from dbt, make sure the following criteria are met:
159172

160-
- The engine must be one of the [supported engines](/integrations/dbt/features-and-configurations#supported-table-engines).
173+
- The engine must be one of the [supported engines](/integrations/dbt/materializations#supported-table-engines).
161174
- You must have adequate permissions to access the database.
162175
- If you're not using the default table engine for the database, you must specify a table engine in your model
163176
configuration.
Lines changed: 301 additions & 0 deletions
@@ -0,0 +1,301 @@
1+
---
2+
sidebar_label: 'Materialization: materialized_view'
3+
slug: /integrations/dbt/materialization-materialized-view
4+
sidebar_position: 4
5+
description: 'Using the materialized_view materialization in dbt-clickhouse'
6+
keywords: ['clickhouse', 'dbt', 'materialized view', 'refreshable', 'external target table', 'catchup']
7+
title: 'Materialized Views'
8+
doc_type: 'guide'
9+
---
10+
11+
import ClickHouseSupportedBadge from '@theme/badges/ClickHouseSupported';
12+
13+
# Materialized Views
14+
15+
<ClickHouseSupportedBadge/>
16+
17+
:::note
18+
This materialization is experimental. For general materialization concepts and shared configurations, see the [Materializations](/integrations/dbt/materializations) page.
19+
:::
20+
21+
A `materialized_view` materialization should be a `SELECT` from an existing (source) table. The adapter will create a
22+
target table with the model name
23+
and a ClickHouse MATERIALIZED VIEW with the name `<model_name>_mv`. Unlike PostgreSQL, a ClickHouse materialized view is
24+
not "static" (and has
25+
no corresponding REFRESH operation). Instead, it acts as an "insert trigger", and will insert new rows into the target
26+
table using the defined `SELECT`
27+
"transformation" in the view definition on rows inserted into the source table. See the [test file](https://github.com/ClickHouse/dbt-clickhouse/blob/main/tests/integration/adapter/materialized_view/test_materialized_view.py)
28+
for an introductory example
29+
of how to use this functionality.
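
As a minimal sketch of what such a model might look like (the model name `active_users`, the source table `users`, and its columns are hypothetical and only serve as an illustration):

```sql
-- models/active_users.sql  (hypothetical model name)
{{ config(
    materialized='materialized_view',
    engine='MergeTree()',
    order_by='(id)'
) }}

-- Rows inserted into the source table pass through this SELECT.
-- Per the description above, the results land in the target table
-- `active_users`, written by a materialized view named `active_users_mv`.
SELECT
    id,
    name,
    created_at
FROM {{ source('raw', 'users') }}
WHERE is_active = 1
```

Here the `engine` and `order_by` settings apply to the target table that the adapter creates.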
30+
31+
## Multiple materialized views {#multiple-materialized-views}
32+
33+
ClickHouse provides the ability for more than one materialized view to write records to the same target table. To
34+
support this in dbt-clickhouse, you can construct a `UNION` in your model file, such that the SQL for each of your
35+
materialized views is wrapped with comments of the form `--my_mv_name:begin` and `--my_mv_name:end`.
36+
37+
For example, the following will build two materialized views, both writing data to the same destination table of the
38+
model. The names of the materialized views will take the form `<model_name>_mv1` and `<model_name>_mv2`:
39+
40+
```sql
41+
--mv1:begin
42+
select a,b,c from {{ source('raw', 'table_1') }}
43+
--mv1:end
44+
union all
45+
--mv2:begin
46+
select a,b,c from {{ source('raw', 'table_2') }}
47+
--mv2:end
48+
```
49+
50+
> IMPORTANT!
51+
>
52+
> When updating a model with multiple materialized views (MVs), especially when renaming one of the MV names,
53+
> dbt-clickhouse does not automatically drop the old MV. Instead,
54+
> you will encounter the following warning:
55+
`Warning - Table <previous table name> was detected with the same pattern as model name <your model name> but was not found in this run. In case it is a renamed mv that was previously part of this model, drop it manually (!!!) `
56+
57+
## How to iterate the target table schema {#how-to-iterate-the-target-table-schema}
58+
Starting with dbt-clickhouse version 1.9.8, you can control how the target table schema is iterated when `dbt run` encounters different columns in the MV's SQL.
59+
60+
By default, dbt will not apply any changes to the target table (`ignore` setting value), but you can change this setting to follow the same behavior as the `on_schema_change` config [in incremental models](https://docs.getdbt.com/docs/build/incremental-models#what-if-the-columns-of-my-incremental-model-change).
61+
62+
Also, you can use this setting as a safety mechanism. If you set it to `fail`, the build will fail if the columns in the MV's SQL differ from the target table that was created by the first `dbt run`.
63+
64+
```jinja2
65+
{{config(
66+
materialized='materialized_view',
67+
engine='MergeTree()',
68+
order_by='(id)',
69+
on_schema_change='fail'
70+
)}}
71+
```
72+
73+
## Data catch-up {#data-catch-up}
74+
75+
By default, when creating or recreating a materialized view (MV), the target table is first populated with historical data before the MV itself is created. You can disable this behavior by setting the `catchup` config to `False`.
76+
77+
| Operation | `catchup: True` (default) | `catchup: False` |
78+
|-----------|---------------------------|------------------|
79+
| Initial deployment (`dbt run`) | Target table backfilled with historical data | Target table created empty |
80+
| Full refresh (`dbt run --full-refresh`) | Target table rebuilt and backfilled | Target table recreated empty, **existing data lost** |
81+
| Normal operation | Materialized view captures new inserts | Materialized view captures new inserts |
82+
83+
```python
84+
{{config(
85+
materialized='materialized_view',
86+
engine='MergeTree()',
87+
order_by='(id)',
88+
catchup=False
89+
)}}
90+
```
91+
92+
:::warning Data Loss Risk with Full Refresh
93+
Using `catchup: False` with `dbt run --full-refresh` will **discard all existing data** in the target table. The table will be recreated empty and only capture new data going forward. Ensure you have backups if the historical data might be needed later.
94+
:::
95+
96+
## Refreshable Materialized Views {#refreshable-materialized-views}
97+
98+
To use a [Refreshable Materialized View](/materialized-view/refreshable-materialized-view),
99+
set the following options as needed in your MV model. All of these options must be set inside a
100+
`refreshable` config object:
101+
102+
| Option | Description | Required | Default Value |
103+
|-----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|---------------|
104+
| refresh_interval | The interval clause (required) | Yes | |
105+
| randomize | The randomization clause, will appear after `RANDOMIZE FOR` | | |
106+
| append | If set to `True`, each refresh inserts rows into the table without deleting existing rows. The insert is not atomic, just like a regular INSERT SELECT. | | False |
107+
| depends_on | A dependencies list for the refreshable mv. Please provide the dependencies in the following format `{schema}.{view_name}` | | |
108+
| depends_on_validation | Whether to validate the existence of the dependencies provided in `depends_on`. In case a dependency doesn't contain a schema, the validation occurs on schema `default` | | False |
109+
110+
A config example for a refreshable materialized view:
111+
112+
```python
113+
{{
114+
config(
115+
materialized='materialized_view',
116+
refreshable={
117+
"interval": "EVERY 5 MINUTE",
118+
"randomize": "1 MINUTE",
119+
"append": True,
120+
"depends_on": ['schema.depend_on_model'],
121+
"depends_on_validation": True
122+
}
123+
)
124+
}}
125+
```
126+
127+
### Limitations {#refreshable-limitations}
128+
129+
* When creating a refreshable materialized view (MV) in ClickHouse that has a dependency, ClickHouse does not throw an
130+
error if the specified dependency does not exist at the time of creation. Instead, the refreshable MV remains in an
131+
inactive state, waiting for the dependency to be satisfied before it starts processing updates or refreshing.
132+
This behavior is by design, but it may lead to delays in data availability if the required dependency is not addressed
133+
promptly. Users are advised to ensure all dependencies are correctly defined and exist before creating a refreshable
134+
materialized view.
135+
* There is currently no actual "dbt linkage" between the MV and its dependencies, so the creation order is not
136+
guaranteed.
137+
* The refreshable feature has not been tested with multiple MVs writing to the same target model.
138+
139+
## External Target Table (Experimental) {#external-target-table}
140+
141+
:::note
142+
This feature is experimental and available starting from dbt-clickhouse version 1.9.x. The API may change based on community feedback.
143+
:::
144+
145+
By default, dbt-clickhouse creates and manages both the target table and the materialized view(s) within a single model. This approach has some limitations:
146+
147+
- All resources (target table + MVs) share the same configuration
148+
- The MV SQL is used to infer the target table schema
149+
- Multiple MVs pointing to the same table must be defined together using `UNION ALL` syntax
150+
151+
The **external target table** feature allows you to define the target table separately as a regular `table` materialization and then reference it from your materialized view models. This provides more flexibility and follows dbt's philosophy of 1:1 resource mapping.
152+
153+
### Benefits {#external-target-benefits}
154+
155+
- **Separate configurations**: Target table and MVs can have different engines, settings, and configurations
156+
- **Cleaner model organization**: Each resource is defined in its own file
157+
- **Better readability**: No need for `UNION ALL` with comment markers
158+
- **Individual resource management**: Each MV can be managed independently
159+
- **Explicit schema definition**: Target table schema is defined explicitly, not inferred
160+
161+
### Usage {#external-target-usage}
162+
163+
**Step 1: Define the target table as a regular table model**
164+
165+
```sql
166+
-- models/events_daily.sql
167+
{{
168+
config(
169+
materialized='table',
170+
engine='SummingMergeTree()',
171+
order_by='(event_date, event_type)',
172+
partition_by='toYYYYMM(event_date)'
173+
)
174+
}}
175+
176+
SELECT
177+
toDate(now()) AS event_date,
178+
'' AS event_type,
179+
toUInt64(0) AS total
180+
WHERE 0 -- Creates empty table with correct schema
181+
```
182+
183+
The `WHERE 0` clause creates an empty table with the correct schema. This is necessary because the target table needs to exist before the MVs are created.
184+
185+
**Step 2: Define materialized views pointing to the target table**
186+
187+
```sql
188+
-- models/page_events_aggregator.sql
189+
{{ config(materialized='materialized_view') }}
190+
{{ materialization_target_table(ref('events_daily')) }}
191+
192+
SELECT
193+
toStartOfDay(event_time) AS event_date,
194+
event_type,
195+
count() AS total
196+
FROM {{ source('raw', 'page_events') }}
197+
GROUP BY event_date, event_type
198+
```
199+
200+
```sql
201+
-- models/mobile_events_aggregator.sql
202+
{{ config(materialized='materialized_view') }}
203+
{{ materialization_target_table(ref('events_daily')) }}
204+
205+
SELECT
206+
toStartOfDay(event_time) AS event_date,
207+
event_type,
208+
count() AS total
209+
FROM {{ source('raw', 'mobile_events') }}
210+
GROUP BY event_date, event_type
211+
```
212+
213+
The `materialization_target_table()` macro tells dbt-clickhouse to create the MV with a `TO` clause pointing to the specified table instead of creating its own target table.
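
To illustrate what a `TO` clause means on the ClickHouse side, the statement below is a rough sketch of the kind of DDL that `page_events_aggregator` corresponds to. The exact statement dbt-clickhouse generates is not shown in this document, so the database names (`analytics`, `raw_db`) and the MV name are assumptions made for illustration only.

```sql
-- Illustrative only: approximate shape of the statement, not the adapter's exact output.
-- `analytics` and `raw_db` are hypothetical database names; the MV name is also assumed.
CREATE MATERIALIZED VIEW analytics.page_events_aggregator
TO analytics.events_daily          -- rows are written to the existing target table
AS
SELECT
    toStartOfDay(event_time) AS event_date,
    event_type,
    count() AS total
FROM raw_db.page_events
GROUP BY event_date, event_type;
```

Because the view writes into an existing table, the table's engine and schema come from the `events_daily` model rather than from the MV's `SELECT`.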
214+
215+
### Configuration Options {#external-target-configuration}
216+
217+
When using external target tables, the following configurations apply:
218+
219+
**On the target table (`materialized='table'`):**
220+
221+
| Option | Description | Default |
222+
|--------|-------------|---------|
223+
| `on_schema_change` | How to handle schema changes when the table is used by dbt-managed MVs. Set to `fail` by default for tables with MVs pointing to them. | `fail` (when MVs exist) |
224+
| `repopulate_from_mvs_on_full_refresh` | On `--full-refresh`, instead of running the table's SQL, rebuild the table by executing INSERT-SELECTs using the SQL from all MVs pointing to it. | `False` |
225+
226+
**On the materialized view (`materialized='materialized_view'`):**
227+
228+
| Option | Description | Default |
229+
|--------|-------------|---------|
230+
| `catchup` | Whether to backfill historical data when the MV is created. | `True` |
231+
232+
### Behavior Comparison {#external-target-behavior}
233+
234+
| Operation | Standard MV | External Target MV |
235+
|-----------|-------------|-------------------|
236+
| First `dbt run` | Creates target table + MV(s) | Creates MV with `TO` clause (target table must exist) |
237+
| Subsequent `dbt run` | All resources managed together | MV updated with `ALTER TABLE MODIFY QUERY` |
238+
| `--full-refresh` | Recreates everything with optional catchup | Recreates MV only. Use `repopulate_from_mvs_on_full_refresh` on target table for atomic rebuild |
239+
| Schema changes | Controlled by `on_schema_change` | Target table: `on_schema_change` (defaults to `fail`). MV: uses `ALTER TABLE MODIFY QUERY` |
240+
241+
### Full Refresh with External Targets {#external-target-full-refresh}
242+
243+
When using `--full-refresh` with external target tables, you have two options:
244+
245+
**Option 1: Refresh MVs only (default)**
246+
247+
Each MV is dropped and recreated. If `catchup=True`, the MV backfills data from its source.
248+
249+
```sql
250+
-- models/page_events_aggregator.sql
251+
{{ config(
252+
materialized='materialized_view',
253+
catchup=True -- Will backfill after recreation
254+
) }}
255+
{{ materialization_target_table(ref('events_daily')) }}
256+
...
257+
```
258+
259+
**Option 2: Atomic table rebuild using MVs**
260+
261+
Set `repopulate_from_mvs_on_full_refresh=True` on the target table. This will:
262+
1. Create a new temporary table
263+
2. Execute INSERT-SELECT using each MV's SQL
264+
3. Atomically swap the tables
265+
266+
```sql
267+
-- models/events_daily.sql
268+
{{
269+
config(
270+
materialized='table',
271+
engine='SummingMergeTree()',
272+
order_by='(event_date, event_type)',
273+
repopulate_from_mvs_on_full_refresh=True
274+
)
275+
}}
276+
...
277+
```
278+
279+
:::warning
280+
When using `repopulate_from_mvs_on_full_refresh`, ensure all MVs are created before running `--full-refresh` on the target table, as it uses the MV definitions from ClickHouse.
281+
:::
282+
283+
### Changing the Target Table {#external-target-changing}
284+
285+
You cannot change the target table of an MV without a `--full-refresh`. If you try to run `dbt run` after changing the `materialization_target_table()` reference, the build will fail with an error message indicating that the target has changed.
286+
287+
To change the target:
288+
1. Update the `materialization_target_table()` call
289+
2. Run `dbt run --full-refresh -s your_mv_model`
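
The two steps above can be sketched as follows, assuming a hypothetical new target model named `events_daily_v2` that already exists as a regular `table` model:

```sql
-- models/page_events_aggregator.sql
{{ config(materialized='materialized_view') }}
-- The previous target was the events_daily model; events_daily_v2 is a hypothetical replacement.
{{ materialization_target_table(ref('events_daily_v2')) }}

SELECT
    toStartOfDay(event_time) AS event_date,
    event_type,
    count() AS total
FROM {{ source('raw', 'page_events') }}
GROUP BY event_date, event_type
```

With this change in place, running `dbt run --full-refresh -s page_events_aggregator` recreates the MV against the new target, whereas a plain `dbt run` would fail as described above.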
290+
291+
### Migration from Standard MVs {#external-target-migration}
292+
293+
To migrate from the standard MV approach to external target tables:
294+
295+
1. **Create the target table model** with the same schema as your current MV target
296+
2. **Update your MV models** to use `materialization_target_table()`
297+
3. **Run with `--full-refresh`** to recreate the MVs with `TO` clauses
298+
299+
:::note
300+
The old target table (created by the standard MV approach) will not be automatically dropped. You may need to clean it up manually after verifying the migration.
301+
:::
