Commit 470bbc0

Add REST catalog support in docs (#1)
1 parent 3fbd0b4 commit 470bbc0

File tree

4 files changed (+199, -3 lines)

docs/integrations/index.mdx

Lines changed: 1 addition & 0 deletions
@@ -244,6 +244,7 @@ We are actively compiling this list of ClickHouse integrations below, so it's no
 |RabbitMQ|<Rabbitmqsvg alt="RabbitMQ logo" style={{width: '3rem', 'height': '3rem'}}/>|Data ingestion|Allows ClickHouse to connect to [RabbitMQ](https://www.rabbitmq.com/).|[Documentation](/engines/table-engines/integrations/rabbitmq)|
 |Redis|<Redissvg alt="Redis logo" style={{width: '3rem', 'height': '3rem'}}/>|Data ingestion|Allows ClickHouse to use [Redis](https://redis.io/) as a dictionary source.|[Documentation](/sql-reference/dictionaries/index.md#redis)|
 |Redpanda|<Image img={redpanda} alt="Redpanda logo" size="logo"/>|Data ingestion|Redpanda is the streaming data platform for developers. It's API-compatible with Apache Kafka, but 10x faster, much easier to use, and more cost-effective.|[Blog](https://redpanda.com/blog/real-time-olap-database-clickhouse-redpanda)|
+|REST Catalog||Data ingestion|Integration with the REST Catalog specification for Iceberg tables, supporting multiple catalog providers including Tabular.io.|[Documentation](/use-cases/data-lake/rest-catalog)|
 |Rust|<Image img={rust} size="logo" alt="Rust logo"/>|Language client|A typed client for ClickHouse.|[Documentation](/integrations/language-clients/rust.md)|
 |SQLite|<Sqlitesvg alt="Sqlite logo" style={{width: '3rem', 'height': '3rem'}}/>|Data ingestion|Allows importing and exporting data to SQLite, and supports queries to SQLite tables directly from ClickHouse.|[Documentation](/engines/table-engines/integrations/sqlite)|
 |Superset|<Supersetsvg alt="Superset logo" style={{width: '3rem'}}/>|Data visualization|Explore and visualize your ClickHouse data with Apache Superset.|[Documentation](/integrations/data-visualization/superset-and-clickhouse.md)|

docs/use-cases/data_lake/index.md

Lines changed: 3 additions & 2 deletions
@@ -4,12 +4,13 @@ pagination_prev: null
 pagination_next: null
 slug: /use-cases/data-lake
 title: 'Data Lake'
-keywords: ['data lake', 'glue', 'unity']
+keywords: ['data lake', 'glue', 'unity', 'rest']
 ---
 
-ClickHouse supports integration with multiple catalogs (Unity, Glue, Polaris, etc.).
+ClickHouse supports integration with multiple catalogs (Unity, Glue, REST, Polaris, etc.).
 
 | Page | Description |
 |-----|-----|
 | [Querying data in S3 using ClickHouse and the Glue Data Catalog](/use-cases/data-lake/glue-catalog) | Query your data in S3 buckets using ClickHouse and the Glue Data Catalog. |
 | [Querying data in S3 using ClickHouse and the Unity Data Catalog](/use-cases/data-lake/unity-catalog) | Query your data using the Unity Catalog. |
+| [Querying data in S3 using ClickHouse and the REST Catalog](/use-cases/data-lake/rest-catalog) | Query your data using the REST Catalog (Tabular.io). |
Lines changed: 193 additions & 0 deletions
@@ -0,0 +1,193 @@
---
slug: /use-cases/data-lake/rest-catalog
sidebar_label: 'REST Catalog'
title: 'REST Catalog'
pagination_prev: null
pagination_next: null
description: 'In this guide, we will walk you through the steps to query your data in S3 buckets using ClickHouse and the REST Catalog.'
keywords: ['REST', 'Tabular', 'Data Lake', 'Iceberg']
show_related_blogs: true
---

import ExperimentalBadge from '@theme/badges/ExperimentalBadge';

<ExperimentalBadge/>

:::note
Integration with the REST Catalog works with Iceberg tables only.
This integration supports both AWS S3 and other cloud storage providers.
:::

ClickHouse supports integration with multiple catalogs (Unity, Glue, REST, Polaris, etc.). This guide will walk you through the steps to query your data using ClickHouse and the [REST Catalog](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml) specification.
The REST Catalog is a standardized API specification for Iceberg catalogs, supported by various platforms, including:

- **Local development environments** (using docker-compose setups)
- **Managed services** like Tabular.io
- **Self-hosted** REST catalog implementations

:::note
As this feature is experimental, you will need to enable it with:

`SET allow_experimental_database_rest_catalog = 1;`
:::
## Local Development Setup {#local-development-setup}

For local development and testing, you can use a containerized REST catalog setup. This approach is ideal for learning, prototyping, and development environments.

### Prerequisites {#local-prerequisites}

1. **Docker and Docker Compose**: ensure Docker is installed and running
2. **Sample setup**: use one of the various docker-compose setups available (see below)

### Setting up Local REST Catalog {#setting-up-local-rest-catalog}

You can use various containerized REST catalog implementations, such as **[Databricks docker-spark-iceberg](https://github.com/databricks/docker-spark-iceberg/blob/main/docker-compose.yml)**, which provides a complete Spark + Iceberg + REST catalog environment via docker-compose, making it ideal for testing Iceberg integrations.

You'll need to add ClickHouse as a service in your docker-compose setup:
```yaml
clickhouse:
  image: clickhouse/clickhouse-server:main
  container_name: clickhouse
  user: '0:0'  # Ensures root permissions
  ports:
    - "8123:8123"
    - "9002:9000"
  volumes:
    - ./clickhouse:/var/lib/clickhouse
    - ./clickhouse/data_import:/var/lib/clickhouse/data_import  # Mount dataset folder
  networks:
    - iceberg_net
  environment:
    - CLICKHOUSE_DB=default
    - CLICKHOUSE_USER=default
    - CLICKHOUSE_DO_NOT_CHOWN=1
    - CLICKHOUSE_PASSWORD=
```
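With the service defined, you can bring the stack up and check that ClickHouse is reachable. This is a minimal sketch, assuming you run it from the directory containing the compose file and that the `8123:8123` port mapping above is in effect:

```shell
# Start all services defined in the compose file in the background
docker compose up -d

# Ping ClickHouse over its HTTP interface, retrying while the container starts;
# a healthy server replies "Ok."
curl --retry 10 --retry-connrefused http://localhost:8123/ping
```

The exact service list that comes up depends on the docker-compose file you chose; only the `clickhouse` service and its port mapping are taken from the snippet above.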
### Connecting to Local REST Catalog {#connecting-to-local-rest-catalog}

Connect to your ClickHouse container:

```bash
docker exec -it clickhouse clickhouse-client
```

Then create the database connection to the REST catalog:

```sql
SET allow_experimental_database_rest_catalog = 1;

CREATE DATABASE demo
ENGINE = DataLakeCatalog('http://rest:8181/v1', 'admin', 'password')
SETTINGS
    catalog_type = 'rest',
    storage_endpoint = 'http://minio:9000/lakehouse',
    warehouse = 'demo'
```
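To confirm the catalog database was registered, you can query `system.databases`, a standard ClickHouse system table (shown here as a quick sanity check; it is not part of the original walkthrough):

```sql
SELECT name, engine
FROM system.databases
WHERE name = 'demo';
```

If the `CREATE DATABASE` succeeded, this returns one row for `demo` with the engine reported by the server.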
## Querying REST catalog tables using ClickHouse {#querying-rest-catalog-tables-using-clickhouse}

Now that the connection is in place, you can start querying via the REST catalog. For example:

```sql
USE demo;

SHOW TABLES;
```

```sql title="Response"
┌─name──────────┐
│ default.taxis │
└───────────────┘
```
To query a table:

```sql
SELECT count(*) FROM `default.taxis`;
```

```sql title="Response"
┌─count()─┐
│ 2171187 │
└─────────┘
```

:::note Backticks required
Backticks are required because ClickHouse doesn't support more than one namespace, so the Iceberg namespace and table name (`default.taxis`) are treated as a single identifier.
:::
To inspect the table DDL:

```sql
SHOW CREATE TABLE `default.taxis`;
```

```sql title="Response"
┌─statement─────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE TABLE demo.`default.taxis`                                                             │
│ (                                                                                             │
│     `VendorID` Nullable(Int64),                                                               │
│     `tpep_pickup_datetime` Nullable(DateTime64(6)),                                           │
│     `tpep_dropoff_datetime` Nullable(DateTime64(6)),                                          │
│     `passenger_count` Nullable(Float64),                                                      │
│     `trip_distance` Nullable(Float64),                                                        │
│     `RatecodeID` Nullable(Float64),                                                           │
│     `store_and_fwd_flag` Nullable(String),                                                    │
│     `PULocationID` Nullable(Int64),                                                           │
│     `DOLocationID` Nullable(Int64),                                                           │
│     `payment_type` Nullable(Int64),                                                           │
│     `fare_amount` Nullable(Float64),                                                          │
│     `extra` Nullable(Float64),                                                                │
│     `mta_tax` Nullable(Float64),                                                              │
│     `tip_amount` Nullable(Float64),                                                           │
│     `tolls_amount` Nullable(Float64),                                                         │
│     `improvement_surcharge` Nullable(Float64),                                                │
│     `total_amount` Nullable(Float64),                                                         │
│     `congestion_surcharge` Nullable(Float64),                                                 │
│     `airport_fee` Nullable(Float64)                                                           │
│ )                                                                                             │
│ ENGINE = Iceberg('http://minio:9000/lakehouse/warehouse/default/taxis/', 'admin', '[HIDDEN]') │
└───────────────────────────────────────────────────────────────────────────────────────────────┘
```
## Loading data from your Data Lake into ClickHouse {#loading-data-from-your-data-lake-into-clickhouse}

If you need to load data from the REST catalog into ClickHouse, start by creating a local ClickHouse table:

```sql
CREATE TABLE taxis
(
    `VendorID` Int64,
    `tpep_pickup_datetime` DateTime64(6),
    `tpep_dropoff_datetime` DateTime64(6),
    `passenger_count` Float64,
    `trip_distance` Float64,
    `RatecodeID` Float64,
    `store_and_fwd_flag` String,
    `PULocationID` Int64,
    `DOLocationID` Int64,
    `payment_type` Int64,
    `fare_amount` Float64,
    `extra` Float64,
    `mta_tax` Float64,
    `tip_amount` Float64,
    `tolls_amount` Float64,
    `improvement_surcharge` Float64,
    `total_amount` Float64,
    `congestion_surcharge` Float64,
    `airport_fee` Float64
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(tpep_pickup_datetime)
ORDER BY (VendorID, tpep_pickup_datetime, PULocationID, DOLocationID);
```

Then load the data from your REST catalog table via an `INSERT INTO SELECT`:

```sql
INSERT INTO taxis
SELECT * FROM demo.`default.taxis`;
```
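As a sanity check after the insert completes, you can compare row counts between the catalog table and the local copy; the two numbers should match:

```sql
SELECT
    (SELECT count(*) FROM demo.`default.taxis`) AS source_rows,
    (SELECT count(*) FROM taxis) AS loaded_rows;
```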

sidebars.js

Lines changed: 2 additions & 1 deletion
@@ -167,7 +167,8 @@ const sidebars = {
       link: { type: "doc", id: "use-cases/data_lake/index" },
       items: [
         "use-cases/data_lake/glue_catalog",
-        "use-cases/data_lake/unity_catalog"
+        "use-cases/data_lake/unity_catalog",
+        "use-cases/data_lake/rest_catalog"
       ]
     }
   ]
