
Commit 562acc9

Merge pull request #4885 from ClickHouse/dbt-clickhouse/dbt-1.10-support
[dbt] notes on dbt 1.10 support
2 parents 1b318ab + 02ede43 commit 562acc9

File tree: 2 files changed, +75 −7 lines
docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md

Lines changed: 74 additions & 6 deletions
```diff
@@ -126,8 +126,8 @@ without `on cluster` clause for this model.
 #### Read-after-write Consistency {#read-after-write-consistency}
 
 dbt relies on a read-after-insert consistency model. This is not compatible with ClickHouse clusters that have more than one replica if you cannot guarantee that all operations will go to the same replica. You may not encounter problems in your day-to-day usage of dbt, but there are some strategies depending on your cluster to have this guarantee in place:
-- If you are using a ClickHouse Cloud cluster, you only need to set `select_sequential_consistency: 1` in your profile's `custom_settings` property. You can find more information about this setting [here](https://clickhouse.com/docs/operations/settings/settings#select_sequential_consistency).
-- If you are using a self-hosted cluster, make sure all dbt requests are sent to the same ClickHouse replica. If you have a load balancer on top of it, try using some `replica aware routing`/`sticky sessions` mechanism to be able to always reach the same replica. Adding the setting `select_sequential_consistency = 1` in clusters outside ClickHouse Cloud is [not recommended](https://clickhouse.com/docs/operations/settings/settings#select_sequential_consistency).
+- If you are using a ClickHouse Cloud cluster, you only need to set `select_sequential_consistency: 1` in your profile's `custom_settings` property. You can find more information about this setting [here](/operations/settings/settings#select_sequential_consistency).
+- If you are using a self-hosted cluster, make sure all dbt requests are sent to the same ClickHouse replica. If you have a load balancer on top of it, try using some `replica aware routing`/`sticky sessions` mechanism to be able to always reach the same replica. Adding the setting `select_sequential_consistency = 1` in clusters outside ClickHouse Cloud is [not recommended](/operations/settings/settings#select_sequential_consistency).
 
 ## General information about features {#general-information-about-features}
```
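For the ClickHouse Cloud case, a minimal sketch of how that setting lands in `profiles.yml` (profile name and connection values are hypothetical; the `custom_settings` entry is the point):

```yaml
clickhouse_cloud:                           # hypothetical profile name
  target: prod
  outputs:
    prod:
      type: clickhouse
      host: abc123.region.clickhouse.cloud  # hypothetical Cloud endpoint
      port: 8443
      user: default
      password: "<password>"
      secure: true
      custom_settings:
        # make reads wait until all prior inserts are visible on the replica
        select_sequential_consistency: 1
```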

```diff
@@ -269,7 +269,7 @@ group by event_type
 
 ### Materialization: view {#materialization-view}
 
-A dbt model can be created as a [ClickHouse view](https://clickhouse.com/docs/en/sql-reference/table-functions/view/)
+A dbt model can be created as a [ClickHouse view](/sql-reference/table-functions/view/)
 and configured using the following syntax:
 
 Project File (`dbt_project.yml`):
```
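The project-file example itself falls outside this hunk; as a minimal sketch of the standard dbt form (model and source names hypothetical), the equivalent in-model config block looks like:

```sql
-- models/events_by_type.sql (hypothetical model name)
{{ config(materialized='view') }}

select event_type, count() as event_count
from {{ source('raw', 'events') }}  -- hypothetical source
group by event_type
```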
```diff
@@ -286,7 +286,7 @@ Or config block (`models/<model_name>.sql`):
 
 ### Materialization: table {#materialization-table}
 
-A dbt model can be created as a [ClickHouse table](https://clickhouse.com/docs/en/operations/system-tables/tables/) and
+A dbt model can be created as a [ClickHouse table](/operations/system-tables/tables/) and
 configured using the following syntax:
 
 Project File (`dbt_project.yml`):
```
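Likewise, the full example lives outside the hunk; a minimal `dbt_project.yml` sketch (project and model names hypothetical, using the adapter's `engine` and `order_by` model configs):

```yaml
models:
  my_project:              # hypothetical project name
    events:                # hypothetical model name
      +materialized: table
      +engine: MergeTree()
      +order_by: [event_date, event_type]
```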
```diff
@@ -500,7 +500,7 @@ If you prefer not to preload historical data during MV creation, you can disable
 
 #### Refreshable Materialized Views {#refreshable-materialized-views}
 
-To use [Refreshable Materialized View](https://clickhouse.com/docs/en/materialized-view/refreshable-materialized-view),
+To use [Refreshable Materialized View](/materialized-view/refreshable-materialized-view),
 please adjust the following configs as needed in your MV model (all these configs are supposed to be set inside a
 refreshable config object):
```

```diff
@@ -711,7 +711,7 @@ keys used to populate the parameters of the S3 table function:
 | structure | The column structure of the data in bucket, as a list of name/datatype pairs, such as `['id UInt32', 'date DateTime', 'value String']` If not provided ClickHouse will infer the structure. |
 | aws_access_key_id | The S3 access key id. |
 | aws_secret_access_key | The S3 secret key. |
-| role_arn | The ARN of a ClickhouseAccess IAM role to use to securely access the S3 objects. See this [documentation](https://clickhouse.com/docs/en/cloud/security/secure-s3) for more information. |
+| role_arn | The ARN of a ClickhouseAccess IAM role to use to securely access the S3 objects. See this [documentation](/cloud/data-sources/secure-s3) for more information. |
 | compression | The compression method used with the S3 objects. If not provided ClickHouse will attempt to determine compression based on the file name. |
 
 See
```
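For context, these keys line up with the positional parameters of ClickHouse's `s3` table function, so the query dbt ultimately builds behaves roughly like this direct call (bucket URL and credentials hypothetical):

```sql
SELECT count()
FROM s3(
    'https://my-bucket.s3.amazonaws.com/data/*.csv',  -- url
    'AKIA...',                                        -- aws_access_key_id
    '<secret>',                                       -- aws_secret_access_key
    'CSVWithNames',                                   -- format
    'id UInt32, date DateTime, value String'          -- structure
)
```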
```diff
@@ -727,3 +727,71 @@ dbt-clickhouse supports most of the cross database macros now included in `dbt C
   interpreted as a string, not a column name
 * Similarly, the `replace` SQL function in ClickHouse requires constant strings for the `old_chars` and `new_chars`
   parameters, so those parameters will be interpreted as strings rather than column names when invoking this macro.
```

The rest of this hunk is pure addition; the new section reads:

## Catalog Support {#catalog-support}

### dbt Catalog Integration Status {#dbt-catalog-integration-status}

dbt Core v1.10 introduced catalog integration support, which allows adapters to materialize models into external catalogs that manage open table formats such as Apache Iceberg. **This feature is not yet natively implemented in dbt-clickhouse.** You can track the progress of the implementation in [GitHub issue #489](https://github.com/ClickHouse/dbt-clickhouse/issues/489).

### ClickHouse Catalog Support {#clickhouse-catalog-support}

ClickHouse recently added native support for Apache Iceberg tables and data catalogs. Most of these features are still `experimental`, but you can already use them on a recent ClickHouse version.

* You can use ClickHouse to **query Iceberg tables stored in object storage** (S3, Azure Blob Storage, Google Cloud Storage) using the [Iceberg table engine](/engines/table-engines/integrations/iceberg) and the [iceberg table function](/sql-reference/table-functions/iceberg); see the sketch after this list.

* Additionally, ClickHouse provides the [DataLakeCatalog database engine](/engines/database-engines/datalakecatalog), which enables **connections to external data catalogs** including AWS Glue Catalog, Databricks Unity Catalog, Hive Metastore, and REST catalogs. This allows you to query open table format data (Iceberg, Delta Lake) directly from external catalogs without data duplication.
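As a minimal sketch of the object-storage path (bucket URL and credentials hypothetical):

```sql
-- Query an Iceberg table sitting in S3, without any catalog
SELECT count()
FROM iceberg('https://my-bucket.s3.amazonaws.com/warehouse/events/', 'AKIA...', '<secret>');
```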
### Workarounds for Working with Iceberg and Catalogs {#workarounds-iceberg-catalogs}

If you have already defined Iceberg tables or catalog databases in your ClickHouse cluster with the tools described above, you can read them from your dbt project by leveraging dbt's `source` functionality. For example, to access tables in a REST catalog, you can:

1. **Create a database pointing to an external catalog:**

```sql
-- Example with a REST catalog
SET allow_experimental_database_iceberg = 1;

CREATE DATABASE iceberg_catalog
-- catalog endpoint, user, and password
ENGINE = DataLakeCatalog('http://rest:8181/v1', 'admin', 'password')
SETTINGS
    catalog_type = 'rest',
    storage_endpoint = 'http://minio:9000/lakehouse',
    warehouse = 'demo'
```

2. **Define the catalog database and its tables as sources in dbt** (the tables must already exist in ClickHouse):

```yaml
version: 2

sources:
  - name: external_catalog
    database: iceberg_catalog
    tables:
      - name: orders
      - name: customers
```

3. **Use the catalog tables in your dbt models:**

```sql
SELECT
    o.order_id,
    c.customer_name,
    o.order_date
FROM {{ source('external_catalog', 'orders') }} o
INNER JOIN {{ source('external_catalog', 'customers') }} c
    ON o.customer_id = c.customer_id
```
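At compile time dbt resolves those `source()` calls into fully qualified relation names, so the SQL ClickHouse actually receives should look roughly like this (given the source definitions above):

```sql
SELECT
    o.order_id,
    c.customer_name,
    o.order_date
FROM iceberg_catalog.orders AS o
INNER JOIN iceberg_catalog.customers AS c
    ON o.customer_id = c.customer_id
```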
### Notes on the Workarounds {#benefits-workarounds}

These workarounds have two main benefits:
* You get immediate access to external table types and external catalogs without waiting for native dbt catalog integration.
* You get a seamless migration path once native catalog support becomes available.

But there are currently some limitations:
* **Manual setup:** Iceberg tables and catalog databases must be created manually in ClickHouse before they can be referenced in dbt.
* **No catalog-level DDL:** dbt cannot manage catalog-level operations such as creating or dropping Iceberg tables in external catalogs, so you cannot create those tables from the dbt connector today. Creating tables with the `Iceberg` engine may be added in the future.
* **Write operations:** Writing to Iceberg/data catalog tables is currently limited. Check the ClickHouse documentation to understand which options are available.

docs/integrations/data-ingestion/etl-tools/dbt/index.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -49,7 +49,7 @@ List of supported features:
 - [x] ClickHouse-specific column configurations (Codec, TTL...)
 - [x] ClickHouse-specific table settings (indexes, projections...)
 
-All features up to dbt-core 1.9 are supported. We will soon add the features added in dbt-core 1.10.
+All features up to dbt-core 1.10 are supported, including the `--sample` flag, and all deprecation warnings have been fixed for future releases. **Catalog integrations** (e.g., Iceberg) introduced in dbt 1.10 are not yet natively supported in the adapter, but workarounds are available. See the [Catalog Support section](/integrations/dbt/features-and-configurations#catalog-support) for details.
 
 This adapter is still not available for use inside [dbt Cloud](https://docs.getdbt.com/docs/dbt-cloud/cloud-overview), but we expect to make it available soon. Please reach out to support to get more information on this.
```
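For reference, sample mode is driven from the CLI; a hypothetical invocation (the exact window syntax is defined by dbt, so check the dbt docs) looks like:

```bash
# build models against a small time window instead of the full data
dbt run --sample="3 days"
```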
