| 1 | +--- |
| 2 | +sidebar_label: 'Materialization: materialized_view' |
| 3 | +slug: /integrations/dbt/materialization-materialized-view |
| 4 | +sidebar_position: 4 |
| 5 | +description: 'Using the materialized_view materialization in dbt-clickhouse' |
| 6 | +keywords: ['clickhouse', 'dbt', 'materialized view', 'refreshable', 'external target table', 'catchup'] |
| 7 | +title: 'Materialized Views' |
| 8 | +doc_type: 'guide' |
| 9 | +--- |
| 10 | + |
| 11 | +import ClickHouseSupportedBadge from '@theme/badges/ClickHouseSupported'; |
| 12 | + |
| 13 | +# Materialized Views |
| 14 | + |
| 15 | +<ClickHouseSupportedBadge/> |
| 16 | + |
| 17 | +:::note |
| 18 | +This materialization is experimental. For general materialization concepts and shared configurations, see the [Materializations](/integrations/dbt/materializations) page. |
| 19 | +::: |
| 20 | + |
| 21 | +A `materialized_view` model should be a `SELECT` from an existing (source) table. The adapter creates a |
| 22 | +target table named after the model |
| 23 | +and a ClickHouse MATERIALIZED VIEW named `<model_name>_mv`. Unlike in PostgreSQL, a standard ClickHouse materialized view is |
| 24 | +not "static" (and has |
| 25 | +no corresponding REFRESH operation). Instead, it acts as an "insert trigger": whenever rows are inserted into the source |
| 26 | +table, the view applies its `SELECT` |
| 27 | +"transformation" to those rows and inserts the result into the target table. See the [test file](https://github.com/ClickHouse/dbt-clickhouse/blob/main/tests/integration/adapter/materialized_view/test_materialized_view.py) |
| 28 | +for an introductory example |
| 29 | +of how to use this functionality. |
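
As a rough illustration of the "insert trigger" behavior described above, the following ClickHouse SQL sketches the kind of objects involved. The table names (`raw_events`, `positive_events`) and columns are hypothetical, and the DDL that dbt-clickhouse actually generates may differ in detail.

```sql
-- Hypothetical source table; in a dbt project this would be a source or another model.
CREATE TABLE raw_events
(
    id UInt64,
    value Int64
)
ENGINE = MergeTree()
ORDER BY id;

-- Target table, named after the model (here: positive_events).
CREATE TABLE positive_events
(
    id UInt64,
    value Int64
)
ENGINE = MergeTree()
ORDER BY id;

-- The view, named <model_name>_mv, applies the model's SELECT to each inserted block.
CREATE MATERIALIZED VIEW positive_events_mv
TO positive_events
AS
SELECT id, value
FROM raw_events
WHERE value > 0;

-- Rows inserted into the source table after this point flow through the view.
INSERT INTO raw_events VALUES (1, 10), (2, -5);

-- Only the row that passes the filter reaches the target table.
SELECT id, value FROM positive_events;
```

Rows that were already in `raw_events` before the view was created are not processed by the view itself.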
| 30 | + |
| 31 | +## Multiple materialized views {#multiple-materialized-views} |
| 32 | + |
| 33 | +ClickHouse provides the ability for more than one materialized view to write records to the same target table. To |
| 34 | +support this in dbt-clickhouse, you can construct a `UNION` in your model file, such that the SQL for each of your |
| 35 | +materialized views is wrapped with comments of the form `--my_mv_name:begin` and `--my_mv_name:end`. |
| 36 | + |
| 37 | +For example, the following will build two materialized views, both writing data to the same destination table of the |
| 38 | +model. The names of the materialized views will take the form `<model_name>_mv1` and `<model_name>_mv2`: |
| 39 | + |
| 40 | +```sql |
| 41 | +--mv1:begin |
| 42 | +select a,b,c from {{ source('raw', 'table_1') }} |
| 43 | +--mv1:end |
| 44 | +union all |
| 45 | +--mv2:begin |
| 46 | +select a,b,c from {{ source('raw', 'table_2') }} |
| 47 | +--mv2:end |
| 48 | +``` |
| 49 | + |
| 50 | +> IMPORTANT! |
| 51 | +> |
| 52 | +> When updating a model with multiple materialized views (MVs), especially when renaming one of the MVs, |
| 53 | +> dbt-clickhouse does not automatically drop the old MV. Instead, |
| 54 | +> you will encounter the following warning: |
| 55 | +> `Warning - Table <previous table name> was detected with the same pattern as model name <your model name> but was not found in this run. In case it is a renamed mv that was previously part of this model, drop it manually (!!!)` |
| 56 | + |
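
One possible way to act on this warning is to look up the leftover view and drop it by hand. The sketch below assumes a hypothetical database `analytics`, a model named `my_model`, and an MV that used to be called `mv1` in the model file; adjust the names to your project.

```sql
-- List the materialized views whose names start with the model name (hypothetical names).
SELECT name
FROM system.tables
WHERE database = 'analytics'
  AND engine = 'MaterializedView'
  AND startsWith(name, 'my_model_');

-- Drop the view that is no longer defined in the model file.
DROP VIEW IF EXISTS analytics.my_model_mv1;
```
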
| 57 | +## How to iterate the target table schema {#how-to-iterate-the-target-table-schema} |
| 58 | +Starting with dbt-clickhouse version 1.9.8, you can control how the target table schema is updated when `dbt run` detects that the columns in the MV's SQL differ from those of the target table. |
| 59 | + |
| 60 | +By default, dbt does not apply any changes to the target table (`on_schema_change` defaults to `ignore`), but you can change this setting to follow the same behavior as the `on_schema_change` config [in incremental models](https://docs.getdbt.com/docs/build/incremental-models#what-if-the-columns-of-my-incremental-model-change). |
| 61 | + |
| 62 | +Also, you can use this setting as a safety mechanism. If you set it to `fail`, the build will fail if the columns in the MV's SQL differ from the target table that was created by the first `dbt run`. |
| 63 | + |
| 64 | +```jinja2 |
| 65 | +{{config( |
| 66 | + materialized='materialized_view', |
| 67 | + engine='MergeTree()', |
| 68 | + order_by='(id)', |
| 69 | + on_schema_change='fail' |
| 70 | +)}} |
| 71 | +``` |
| 72 | + |
| 73 | +## Data catch-up {#data-catch-up} |
| 74 | + |
| 75 | +By default, when creating or recreating a materialized view (MV), the target table is first populated with historical data before the MV itself is created. You can disable this behavior by setting the `catchup` config to `False`. |
| 76 | + |
| 77 | +| Operation | `catchup: True` (default) | `catchup: False` | |
| 78 | +|-----------|---------------------------|------------------| |
| 79 | +| Initial deployment (`dbt run`) | Target table backfilled with historical data | Target table created empty | |
| 80 | +| Full refresh (`dbt run --full-refresh`) | Target table rebuilt and backfilled | Target table recreated empty, **existing data lost** | |
| 81 | +| Normal operation | Materialized view captures new inserts | Materialized view captures new inserts | |
| 82 | + |
| 83 | +```jinja2 |
| 84 | +{{config( |
| 85 | + materialized='materialized_view', |
| 86 | + engine='MergeTree()', |
| 87 | + order_by='(id)', |
| 88 | + catchup=False |
| 89 | +)}} |
| 90 | +``` |
| 91 | + |
| 92 | +:::warning Data Loss Risk with Full Refresh |
| 93 | +Using `catchup: False` with `dbt run --full-refresh` will **discard all existing data** in the target table. The table will be recreated empty and only capture new data going forward. Ensure you have backups if the historical data might be needed later. |
| 94 | +::: |
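
Conceptually, the default `catchup` behavior corresponds to a one-off `INSERT ... SELECT` into the target table followed by the creation of the view. The following is only a sketch with hypothetical names (`raw_events`, `my_model`) that assumes the source and target tables already exist; it is not the exact SQL the adapter runs.

```sql
-- Backfill: push the historical rows through the model's SELECT (skipped when catchup=False).
INSERT INTO my_model
SELECT id, value
FROM raw_events;

-- Then create the view so that rows inserted from now on are captured.
CREATE MATERIALIZED VIEW my_model_mv
TO my_model
AS
SELECT id, value
FROM raw_events;
```

In this simplified sketch, rows that arrive in the source table between the two statements would not reach the target table.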
| 95 | + |
| 96 | +## Refreshable Materialized Views {#refreshable-materialized-views} |
| 97 | + |
| 98 | +To use a [Refreshable Materialized View](/materialized-view/refreshable-materialized-view), |
| 99 | +set the following options as needed in your MV model. All of them must be placed inside a |
| 100 | +`refreshable` config object: |
| 101 | + |
| 102 | +| Option | Description | Required | Default Value | |
| 103 | +|-----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|---------------| |
| 104 | +| interval              | The interval clause (required)                                                                                                                                           | Yes      |               | |
| 105 | +| randomize | The randomization clause, will appear after `RANDOMIZE FOR` | | | |
| 106 | +| append | If set to `True`, each refresh inserts rows into the table without deleting existing rows. The insert is not atomic, just like a regular INSERT SELECT. | | False | |
| 107 | +| depends_on | A dependencies list for the refreshable mv. Please provide the dependencies in the following format `{schema}.{view_name}` | | | |
| 108 | +| depends_on_validation | Whether to validate the existence of the dependencies provided in `depends_on`. In case a dependency doesn't contain a schema, the validation occurs on schema `default` | | False | |
| 109 | + |
| 110 | +A config example for a refreshable materialized view: |
| 111 | + |
| 112 | +```jinja2 |
| 113 | +{{ |
| 114 | + config( |
| 115 | + materialized='materialized_view', |
| 116 | + refreshable={ |
| 117 | + "interval": "EVERY 5 MINUTE", |
| 118 | + "randomize": "1 MINUTE", |
| 119 | + "append": True, |
| 120 | + "depends_on": ['schema.depend_on_model'], |
| 121 | + "depends_on_validation": True |
| 122 | + } |
| 123 | + ) |
| 124 | +}} |
| 125 | +``` |
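
For orientation, the options above correspond to clauses of ClickHouse's `CREATE MATERIALIZED VIEW ... REFRESH` statement. A statement matching the example config might look roughly like the following. The object names and the `SELECT` are hypothetical, and the exact DDL generated by the adapter is an assumption.

```sql
CREATE MATERIALIZED VIEW analytics.my_model_mv
REFRESH EVERY 5 MINUTE               -- "interval"
RANDOMIZE FOR 1 MINUTE               -- "randomize"
DEPENDS ON analytics.depend_on_model -- "depends_on"
APPEND                               -- "append": True
TO analytics.my_model
AS
SELECT id, value
FROM raw_events;
```
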
| 126 | + |
| 127 | +### Limitations {#refreshable-limitations} |
| 128 | + |
| 129 | +* When creating a refreshable materialized view (MV) in ClickHouse that has a dependency, ClickHouse does not throw an |
| 130 | + error if the specified dependency does not exist at the time of creation. Instead, the refreshable MV remains in an |
| 131 | + inactive state, waiting for the dependency to be satisfied before it starts processing updates or refreshing. |
| 132 | + This behavior is by design, but it may lead to delays in data availability if the required dependency is not addressed |
| 133 | + promptly. Users are advised to ensure all dependencies are correctly defined and exist before creating a refreshable |
| 134 | + materialized view. |
| 135 | +* Currently, there is no actual "dbt linkage" between the MV and its dependencies, so the creation order is not |
| 136 | +  guaranteed. |
| 137 | +* The refreshable feature has not been tested with multiple MVs writing to the same target model. |
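
Because a missing dependency does not raise an error, it can help to check the state by hand. The queries below are a sketch based on ClickHouse system tables; `analytics`, `depend_on_model`, and `my_model_mv` are hypothetical names.

```sql
-- Does the dependency exist? An empty result means it has not been created yet.
SELECT database, name, engine
FROM system.tables
WHERE database = 'analytics'
  AND name = 'depend_on_model';

-- Inspect the refresh state of the refreshable MV.
SELECT database, view, status, exception
FROM system.view_refreshes
WHERE database = 'analytics'
  AND view = 'my_model_mv';
```
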
| 138 | + |
| 139 | +## External Target Table (Experimental) {#external-target-table} |
| 140 | + |
| 141 | +:::note |
| 142 | +This feature is experimental and available starting from dbt-clickhouse version 1.9.x. The API may change based on community feedback. |
| 143 | +::: |
| 144 | + |
| 145 | +By default, dbt-clickhouse creates and manages both the target table and the materialized view(s) within a single model. This approach has some limitations: |
| 146 | + |
| 147 | +- All resources (target table + MVs) share the same configuration |
| 148 | +- The MV SQL is used to infer the target table schema |
| 149 | +- Multiple MVs pointing to the same table must be defined together using `UNION ALL` syntax |
| 150 | + |
| 151 | +The **external target table** feature allows you to define the target table separately as a regular `table` materialization and then reference it from your materialized view models. This provides more flexibility and follows dbt's philosophy of 1:1 resource mapping. |
| 152 | + |
| 153 | +### Benefits {#external-target-benefits} |
| 154 | + |
| 155 | +- **Separate configurations**: Target table and MVs can have different engines, settings, and configurations |
| 156 | +- **Cleaner model organization**: Each resource is defined in its own file |
| 157 | +- **Better readability**: No need for `UNION ALL` with comment markers |
| 158 | +- **Individual resource management**: Each MV can be managed independently |
| 159 | +- **Explicit schema definition**: Target table schema is defined explicitly, not inferred |
| 160 | + |
| 161 | +### Usage {#external-target-usage} |
| 162 | + |
| 163 | +**Step 1: Define the target table as a regular table model** |
| 164 | + |
| 165 | +```sql |
| 166 | +-- models/events_daily.sql |
| 167 | +{{ |
| 168 | + config( |
| 169 | + materialized='table', |
| 170 | + engine='SummingMergeTree()', |
| 171 | + order_by='(event_date, event_type)', |
| 172 | + partition_by='toYYYYMM(event_date)' |
| 173 | + ) |
| 174 | +}} |
| 175 | + |
| 176 | +SELECT |
| 177 | + toDate(now()) AS event_date, |
| 178 | + '' AS event_type, |
| 179 | + toUInt64(0) AS total |
| 180 | +WHERE 0 -- Creates empty table with correct schema |
| 181 | +``` |
| 182 | + |
| 183 | +The `WHERE 0` clause creates an empty table with the correct schema. This is necessary because the target table needs to exist before the MVs are created. |
| 184 | + |
| 185 | +**Step 2: Define materialized views pointing to the target table** |
| 186 | + |
| 187 | +```sql |
| 188 | +-- models/page_events_aggregator.sql |
| 189 | +{{ config(materialized='materialized_view') }} |
| 190 | +{{ materialization_target_table(ref('events_daily')) }} |
| 191 | + |
| 192 | +SELECT |
| 193 | + toStartOfDay(event_time) AS event_date, |
| 194 | + event_type, |
| 195 | + count() AS total |
| 196 | +FROM {{ source('raw', 'page_events') }} |
| 197 | +GROUP BY event_date, event_type |
| 198 | +``` |
| 199 | + |
| 200 | +```sql |
| 201 | +-- models/mobile_events_aggregator.sql |
| 202 | +{{ config(materialized='materialized_view') }} |
| 203 | +{{ materialization_target_table(ref('events_daily')) }} |
| 204 | + |
| 205 | +SELECT |
| 206 | + toStartOfDay(event_time) AS event_date, |
| 207 | + event_type, |
| 208 | + count() AS total |
| 209 | +FROM {{ source('raw', 'mobile_events') }} |
| 210 | +GROUP BY event_date, event_type |
| 211 | +``` |
| 212 | + |
| 213 | +The `materialization_target_table()` macro tells dbt-clickhouse to create the MV with a `TO` clause pointing to the specified table instead of creating its own target table. |
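
To make the `TO` clause concrete, the first model above might translate into a statement along these lines. This is a sketch only: the database names (`analytics`, `raw`) and the name of the created view are assumptions, and the DDL emitted by the adapter may differ.

```sql
-- The view writes into the separately managed target table instead of its own.
CREATE MATERIALIZED VIEW analytics.page_events_aggregator
TO analytics.events_daily
AS
SELECT
    toStartOfDay(event_time) AS event_date,
    event_type,
    count() AS total
FROM raw.page_events
GROUP BY event_date, event_type;
```

The second model would produce an analogous statement that reads from `raw.mobile_events` and writes to the same `analytics.events_daily` table.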
| 214 | + |
| 215 | +### Configuration Options {#external-target-configuration} |
| 216 | + |
| 217 | +When using external target tables, the following configurations apply: |
| 218 | + |
| 219 | +**On the target table (`materialized='table'`):** |
| 220 | + |
| 221 | +| Option | Description | Default | |
| 222 | +|--------|-------------|---------| |
| 223 | +| `on_schema_change` | How to handle schema changes when the table is used by dbt-managed MVs. Set to `fail` by default for tables with MVs pointing to them. | `fail` (when MVs exist) | |
| 224 | +| `repopulate_from_mvs_on_full_refresh` | On `--full-refresh`, instead of running the table's SQL, rebuild the table by executing INSERT-SELECTs using the SQL from all MVs pointing to it. | `False` | |
| 225 | + |
| 226 | +**On the materialized view (`materialized='materialized_view'`):** |
| 227 | + |
| 228 | +| Option | Description | Default | |
| 229 | +|--------|-------------|---------| |
| 230 | +| `catchup` | Whether to backfill historical data when the MV is created. | `True` | |
| 231 | + |
| 232 | +### Behavior Comparison {#external-target-behavior} |
| 233 | + |
| 234 | +| Operation | Standard MV | External Target MV | |
| 235 | +|-----------|-------------|-------------------| |
| 236 | +| First `dbt run` | Creates target table + MV(s) | Creates MV with `TO` clause (target table must exist) | |
| 237 | +| Subsequent `dbt run` | All resources managed together | MV updated with `ALTER TABLE ... MODIFY QUERY` | |
| 238 | +| `--full-refresh` | Recreates everything with optional catchup | Recreates MV only. Use `repopulate_from_mvs_on_full_refresh` on target table for atomic rebuild | |
| 239 | +| Schema changes | Controlled by `on_schema_change` | Target table: `on_schema_change` (defaults to `fail`). MV: uses `ALTER TABLE ... MODIFY QUERY` | |
| 240 | + |
| 241 | +### Full Refresh with External Targets {#external-target-full-refresh} |
| 242 | + |
| 243 | +When using `--full-refresh` with external target tables, you have two options: |
| 244 | + |
| 245 | +**Option 1: Refresh MVs only (default)** |
| 246 | + |
| 247 | +Each MV is dropped and recreated. If `catchup=True`, the MV backfills data from its source. |
| 248 | + |
| 249 | +```sql |
| 250 | +-- models/page_events_aggregator.sql |
| 251 | +{{ config( |
| 252 | + materialized='materialized_view', |
| 253 | + catchup=True -- Will backfill after recreation |
| 254 | +) }} |
| 255 | +{{ materialization_target_table(ref('events_daily')) }} |
| 256 | +... |
| 257 | +``` |
| 258 | + |
| 259 | +**Option 2: Atomic table rebuild using MVs** |
| 260 | + |
| 261 | +Set `repopulate_from_mvs_on_full_refresh=True` on the target table. This will: |
| 262 | +1. Create a new temporary table |
| 263 | +2. Execute INSERT-SELECT using each MV's SQL |
| 264 | +3. Atomically swap the tables |
| 265 | + |
| 266 | +```sql |
| 267 | +-- models/events_daily.sql |
| 268 | +{{ |
| 269 | + config( |
| 270 | + materialized='table', |
| 271 | + engine='SummingMergeTree()', |
| 272 | + order_by='(event_date, event_type)', |
| 273 | + repopulate_from_mvs_on_full_refresh=True |
| 274 | + ) |
| 275 | +}} |
| 276 | +... |
| 277 | +``` |
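
The three steps listed for this option can be sketched in plain ClickHouse SQL as follows. The temporary table name and the database names are hypothetical, the `SELECT` statements are taken from the two example MVs, and the statements the adapter actually issues may differ.

```sql
-- 1. Create a new table with the same structure and engine as the target.
CREATE TABLE analytics.events_daily__new AS analytics.events_daily;

-- 2. Run each MV's SELECT as an INSERT-SELECT into the new table.
INSERT INTO analytics.events_daily__new
SELECT toStartOfDay(event_time) AS event_date, event_type, count() AS total
FROM raw.page_events
GROUP BY event_date, event_type;

INSERT INTO analytics.events_daily__new
SELECT toStartOfDay(event_time) AS event_date, event_type, count() AS total
FROM raw.mobile_events
GROUP BY event_date, event_type;

-- 3. Atomically swap the two tables (requires a database with the Atomic engine),
--    then drop the table that now holds the old data.
EXCHANGE TABLES analytics.events_daily__new AND analytics.events_daily;
DROP TABLE analytics.events_daily__new;
```
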
| 278 | + |
| 279 | +:::warning |
| 280 | +When using `repopulate_from_mvs_on_full_refresh`, ensure all MVs are created before running `--full-refresh` on the target table, as it uses the MV definitions from ClickHouse. |
| 281 | +::: |
| 282 | + |
| 283 | +### Changing the Target Table {#external-target-changing} |
| 284 | + |
| 285 | +You cannot change the target table of an MV without a `--full-refresh`. If you try to run `dbt run` after changing the `materialization_target_table()` reference, the build will fail with an error message indicating that the target has changed. |
| 286 | + |
| 287 | +To change the target: |
| 288 | +1. Update the `materialization_target_table()` call |
| 289 | +2. Run `dbt run --full-refresh -s your_mv_model` |
| 290 | + |
| 291 | +### Migration from Standard MVs {#external-target-migration} |
| 292 | + |
| 293 | +To migrate from the standard MV approach to external target tables: |
| 294 | + |
| 295 | +1. **Create the target table model** with the same schema as your current MV target |
| 296 | +2. **Update your MV models** to use `materialization_target_table()` |
| 297 | +3. **Run with `--full-refresh`** to recreate the MVs with `TO` clauses |
| 298 | + |
| 299 | +:::note |
| 300 | +The old target table (created by the standard MV approach) will not be automatically dropped. You may need to clean it up manually after verifying the migration. |
| 301 | +::: |