| 1 | +--- |
| 2 | +slug: /use-cases/data-lake/rest-catalog |
| 3 | +sidebar_label: 'REST Catalog' |
| 4 | +title: 'REST Catalog' |
| 5 | +pagination_prev: null |
| 6 | +pagination_next: null |
| 7 | +description: 'In this guide, we will walk you through the steps to query |
| 8 | + your data in S3 buckets using ClickHouse and the REST Catalog.' |
| 9 | +keywords: ['REST', 'Tabular', 'Data Lake', 'Iceberg'] |
| 10 | +show_related_blogs: true |
| 11 | +--- |
| 12 | + |
| 13 | +import ExperimentalBadge from '@theme/badges/ExperimentalBadge'; |
| 14 | + |
| 15 | +<ExperimentalBadge/> |
| 16 | + |
| 17 | +:::note |
| 18 | +Integration with the REST Catalog works with Iceberg tables only. |
| 19 | +This integration supports both AWS S3 and other cloud storage providers. |
| 20 | +::: |
| 21 | + |
| 22 | +ClickHouse supports integration with multiple catalogs (Unity, Glue, REST, Polaris, etc.). This guide will walk you through the steps to query your data using ClickHouse and the [REST Catalog](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml/) specification. |
| 23 | + |
| 24 | +The REST Catalog is a standardized API specification for Iceberg catalogs, supported by various platforms including: |
| 25 | +- **Local development environments** (using docker-compose setups) |
| 26 | +- **Managed services** like Tabular.io |
| 27 | +- **Self-hosted** REST catalog implementations |
| 28 | + |
| 29 | +:::note |
| 30 | +As this feature is experimental, you will need to enable it using: |
| 31 | +`SET allow_experimental_database_iceberg = 1;` |
| 32 | +::: |
| 33 | + |
| 34 | +## Local Development Setup {#local-development-setup} |
| 35 | + |
| 36 | +For local development and testing, you can use a containerized REST catalog setup. This approach is ideal for learning, prototyping, and development environments. |
| 37 | + |
| 38 | +### Prerequisites {#local-prerequisites} |
| 39 | + |
| 40 | +1. **Docker and Docker Compose**: Ensure Docker is installed and running |
| 41 | +2. **Sample Setup**: A docker-compose environment that runs a REST catalog (one option is described in the next section) |
| 42 | + |
| 43 | +### Setting up Local REST Catalog {#setting-up-local-rest-catalog} |
| 44 | + |
| 45 | +You can use any containerized REST catalog implementation. One example is **[Databricks docker-spark-iceberg](https://github.com/databricks/docker-spark-iceberg/blob/main/docker-compose.yml?ref=blog.min.io)**, which provides a complete Spark + Iceberg + REST catalog environment with docker-compose and is well suited to testing Iceberg integrations. |
| 46 | + |
| 47 | +You'll need to add ClickHouse as a service in your docker-compose setup: |
| 48 | + |
| 49 | +```yaml |
| 50 | +clickhouse: |
| 51 | +  image: clickhouse/clickhouse-server:main |
| 52 | +  container_name: clickhouse |
| 53 | +  user: '0:0'  # Run as root (UID 0, GID 0) |
| 56 | +  ports: |
| 57 | +    - "8123:8123" |
| 58 | +    - "9002:9000" |
| 59 | +  volumes: |
| 60 | +    - ./clickhouse:/var/lib/clickhouse |
| 61 | +    - ./clickhouse/data_import:/var/lib/clickhouse/data_import  # Mount dataset folder |
| 62 | +  networks: |
| 63 | +    - iceberg_net |
| 64 | +  environment: |
| 65 | +    - CLICKHOUSE_DB=default |
| 66 | +    - CLICKHOUSE_USER=default |
| 67 | +    - CLICKHOUSE_DO_NOT_CHOWN=1 |
| 68 | +    - CLICKHOUSE_PASSWORD= |
| 69 | +``` |
| 70 | + |
| 71 | +### Connecting to Local REST Catalog {#connecting-to-local-rest-catalog} |
| 72 | + |
| 73 | +Connect to your ClickHouse container: |
| 74 | + |
| 75 | +```bash |
| 76 | +docker exec -it clickhouse clickhouse-client |
| 77 | +``` |
| 78 | + |
| 79 | +Then create the database connection to the REST catalog: |
| 80 | + |
| 81 | +```sql |
| 82 | +CREATE DATABASE demo |
| 83 | +ENGINE = DataLakeCatalog('http://rest:8181/v1', 'admin', 'password') |
| 84 | +SETTINGS |
| 85 | + catalog_type = 'rest', |
| 86 | + storage_endpoint = 'http://minio:9000/lakehouse', |
| 87 | + warehouse = 'demo'; |
| 88 | +``` |
| 89 | + |
| 90 | +## Querying REST catalog tables using ClickHouse {#querying-rest-catalog-tables-using-clickhouse} |
| 91 | + |
| 92 | +Now that the connection is in place, you can start querying via the REST catalog. For example: |
| 93 | + |
| 94 | +```sql |
| 95 | +USE demo; |
| 96 | + |
| 97 | +SHOW TABLES; |
| 98 | +``` |
| 99 | + |
| 100 | +```sql title="Response" |
| 101 | +┌─name──────────┐ |
| 102 | +│ default.taxis │ |
| 103 | +└───────────────┘ |
| 104 | +``` |
| 105 | + |
| 106 | +To query a table: |
| 107 | + |
| 108 | +```sql |
| 109 | +SELECT count(*) FROM `default.taxis`; |
| 110 | +``` |
| 111 | + |
| 112 | +```sql title="Response" |
| 113 | +┌─count()─┐ |
| 114 | +│ 2171187 │ |
| 115 | +└─────────┘ |
| 116 | +``` |
| 117 | + |
| 118 | +:::note Backticks required |
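| 119 | +Backticks are required because the table name includes its Iceberg namespace (`default.taxis`) and ClickHouse doesn't support more than one namespace. Without backticks, ClickHouse would parse `default.taxis` as a table named `taxis` in a database named `default`. |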
| 120 | +::: |
| 121 | + |
| 122 | +To inspect the table DDL: |
| 123 | + |
| 124 | +```sql |
| 125 | +SHOW CREATE TABLE `default.taxis`; |
| 126 | +``` |
| 127 | + |
| 128 | +```sql title="Response" |
| 129 | +┌─statement─────────────────────────────────────────────────────────────────────────────────────┐ |
| 130 | +│ CREATE TABLE demo.`default.taxis` │ |
| 131 | +│ ( │ |
| 132 | +│ `VendorID` Nullable(Int64), │ |
| 133 | +│ `tpep_pickup_datetime` Nullable(DateTime64(6)), │ |
| 134 | +│ `tpep_dropoff_datetime` Nullable(DateTime64(6)), │ |
| 135 | +│ `passenger_count` Nullable(Float64), │ |
| 136 | +│ `trip_distance` Nullable(Float64), │ |
| 137 | +│ `RatecodeID` Nullable(Float64), │ |
| 138 | +│ `store_and_fwd_flag` Nullable(String), │ |
| 139 | +│ `PULocationID` Nullable(Int64), │ |
| 140 | +│ `DOLocationID` Nullable(Int64), │ |
| 141 | +│ `payment_type` Nullable(Int64), │ |
| 142 | +│ `fare_amount` Nullable(Float64), │ |
| 143 | +│ `extra` Nullable(Float64), │ |
| 144 | +│ `mta_tax` Nullable(Float64), │ |
| 145 | +│ `tip_amount` Nullable(Float64), │ |
| 146 | +│ `tolls_amount` Nullable(Float64), │ |
| 147 | +│ `improvement_surcharge` Nullable(Float64), │ |
| 148 | +│ `total_amount` Nullable(Float64), │ |
| 149 | +│ `congestion_surcharge` Nullable(Float64), │ |
| 150 | +│ `airport_fee` Nullable(Float64) │ |
| 151 | +│ ) │ |
| 152 | +│ ENGINE = Iceberg('http://minio:9000/lakehouse/warehouse/default/taxis/', 'admin', '[HIDDEN]') │ |
| 153 | +└───────────────────────────────────────────────────────────────────────────────────────────────┘ |
| 154 | +``` |
| 155 | + |
| 156 | +## Loading data from your Data Lake into ClickHouse {#loading-data-from-your-data-lake-into-clickhouse} |
| 157 | + |
| 158 | +If you need to load data from the REST catalog into ClickHouse, start by creating a local ClickHouse table: |
| 159 | + |
| 160 | +```sql |
| 161 | +CREATE TABLE taxis |
| 162 | +( |
| 163 | + `VendorID` Int64, |
| 164 | + `tpep_pickup_datetime` DateTime64(6), |
| 165 | + `tpep_dropoff_datetime` DateTime64(6), |
| 166 | + `passenger_count` Float64, |
| 167 | + `trip_distance` Float64, |
| 168 | + `RatecodeID` Float64, |
| 169 | + `store_and_fwd_flag` String, |
| 170 | + `PULocationID` Int64, |
| 171 | + `DOLocationID` Int64, |
| 172 | + `payment_type` Int64, |
| 173 | + `fare_amount` Float64, |
| 174 | + `extra` Float64, |
| 175 | + `mta_tax` Float64, |
| 176 | + `tip_amount` Float64, |
| 177 | + `tolls_amount` Float64, |
| 178 | + `improvement_surcharge` Float64, |
| 179 | + `total_amount` Float64, |
| 180 | + `congestion_surcharge` Float64, |
| 181 | + `airport_fee` Float64 |
| 182 | +) |
| 183 | +ENGINE = MergeTree() |
| 184 | +PARTITION BY toYYYYMM(tpep_pickup_datetime) |
| 185 | +ORDER BY (VendorID, tpep_pickup_datetime, PULocationID, DOLocationID); |
| 186 | +``` |
| 187 | + |
| 188 | +Then load the data from your REST catalog table via an `INSERT INTO SELECT`: |
| 189 | + |
| 190 | +```sql |
| 191 | +INSERT INTO taxis |
| 192 | +SELECT * FROM demo.`default.taxis`; |
| 193 | +``` |
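| 194 | + |
| 195 | +To check that the load completed, one option is to compare row counts between the local table and the catalog table. The query below is a minimal sketch that assumes the local `taxis` table and the `demo` database created earlier in this guide: |
| 196 | + |
| 197 | +```sql |
| 198 | +-- Sketch: compare the local MergeTree table with the REST catalog table. |
| 199 | +-- Assumes `taxis` and demo.`default.taxis` exist as created above. |
| 200 | +SELECT |
| 201 | +    (SELECT count() FROM taxis) AS local_rows, |
| 202 | +    (SELECT count() FROM demo.`default.taxis`) AS catalog_rows; |
| 203 | +``` |
| 204 | + |
| 205 | +If no new data was written to the Iceberg table during the load, the two counts should match. |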