
Commit 1ca009e

kbatuigas and Feediver1 authored
How to query Iceberg topics using Snowflake and Open Catalog (#957)
Co-authored-by: Joyce Fee <[email protected]>
1 parent f585b97 commit 1ca009e

4 files changed, +229 -3 lines changed

modules/ROOT/nav.adoc

Lines changed: 3 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -170,12 +170,14 @@
170170
*** xref:manage:security/iam-roles.adoc[]
171171
** xref:manage:tiered-storage-linux/index.adoc[Tiered Storage]
172172
*** xref:manage:tiered-storage.adoc[]
173-
*** xref:manage:topic-iceberg-integration.adoc[Iceberg topics]
174173
*** xref:manage:fast-commission-decommission.adoc[]
175174
*** xref:manage:mountable-topics.adoc[]
176175
*** xref:manage:remote-read-replicas.adoc[Remote Read Replicas]
177176
*** xref:manage:topic-recovery.adoc[Topic Recovery]
178177
*** xref:manage:whole-cluster-restore.adoc[Whole Cluster Restore]
178+
** xref:manage:iceberg/index.adoc[Iceberg]
179+
*** xref:manage:iceberg/topic-iceberg-integration.adoc[Iceberg topics]
180+
*** xref:manage:iceberg/redpanda-topics-iceberg-snowflake-catalog.adoc[Query Iceberg topics with Snowflake]
179181
** xref:manage:schema-reg/index.adoc[Schema Registry]
180182
*** xref:manage:schema-reg/schema-reg-overview.adoc[Overview]
181183
*** xref:manage:schema-reg/manage-schema-reg.adoc[]
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
= Integrate Redpanda with Iceberg
2+
:description: Generate Iceberg tables for your Redpanda topics for data lakehouse access.
3+
:page-layout: index
Lines changed: 214 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,214 @@
1+
= Query Iceberg Topics using Snowflake and Open Catalog
2+
:description: Add Redpanda topics as Iceberg tables that you can query in Snowflake using an Open Catalog integration.
3+
:page-categories: Iceberg, Tiered Storage, Management, High Availability, Data Replication, Integration
4+
:page-beta: true
5+
6+
[NOTE]
7+
====
8+
include::shared:partial$enterprise-license.adoc[]
9+
====
10+
11+
This guide walks you through querying Redpanda topics as Iceberg tables in https://docs.snowflake.com/en/user-guide/tables-iceberg[Snowflake^], with AWS S3 as object storage and a catalog integration using https://other-docs.snowflake.com/en/opencatalog/overview[Open Catalog^].
12+
13+
== Prerequisites
14+
15+
* xref:manage:tiered-storage.adoc#configure-object-storage[Object storage configured] for your cluster and xref:manage:tiered-storage.adoc#enable-tiered-storage[Tiered Storage enabled] for the topics for which you want to generate Iceberg tables.
16+
** The S3 bucket URI so that you can configure it as external storage for Open Catalog.
17+
* A Snowflake account.
18+
* An Open Catalog account. To https://other-docs.snowflake.com/en/opencatalog/create-open-catalog-account[create an Open Catalog account], you require ORGADMIN access in Snowflake.
19+
* An internal catalog created in Open Catalog with your Tiered Storage AWS S3 bucket configured as external storage.
20+
** Follow this guide to https://other-docs.snowflake.com/en/opencatalog/create-catalog#create-a-catalog-using-amazon-simple-storage-service-amazon-s3[create a catalog] with the S3 bucket configured as external storage. You require admin permissions to carry out these steps in AWS:
21+
. If you don't already have one, create an IAM policy that gives Open Catalog read and write access to your S3 bucket.
22+
. Create an IAM role and attach the IAM policy to the role.
23+
. After creating a new catalog in Open Catalog, grant the catalog's AWS IAM user access to the S3 bucket.
24+
* A Snowflake https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume[external volume] set up using the Tiered Storage bucket.
25+
** Follow this guide to https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume-s3[configure the external volume with S3]. You can use the same IAM policy as the catalog for the external volume's IAM role and user.
26+
27+
== Set up catalog integration using Open Catalog
28+
29+
=== Create a new Open Catalog service connection for Redpanda
30+
31+
To create a new service connection to integrate the Iceberg-enabled topics into Open Catalog:
32+
33+
. In Open Catalog, select *Connections*, then *+ Connection*.
34+
. In *Configure Service Connection*, provide a name. Open Catalog creates a new principal with this name.
35+
. Make sure *Create new principal role* is toggled on.
36+
. Enter a name for the principal role. Then, click *Create*.
37+
38+
After you create the connection, Open Catalog displays the client ID and client secret. Save these credentials so that you can add them to your cluster configuration in a later step.
39+
40+
=== Create a catalog role
41+
42+
Grant privileges to the principal created in the previous step:
43+
44+
. In Open Catalog, select *Catalogs*, and select your catalog.
45+
. On the *Roles* tab of your catalog, click *+ Catalog Role*.
46+
. Give the catalog role a name.
47+
. Under *Privileges*, select `CATALOG_MANAGE_CONTENT`. This provides full management https://other-docs.snowflake.com/en/opencatalog/access-control#catalog-privileges[privileges] for the catalog. Then, click *Create*.
48+
. On the *Roles* tab of the catalog, click *Grant to Principal Role*.
49+
. Select the catalog role you just created.
50+
. Select the principal role you created earlier. Click *Grant*.
51+
52+
=== Update cluster configuration
53+
54+
To enable Iceberg on a topic and set up the integration with Open Catalog, configure your Redpanda cluster as follows:
55+
56+
. Edit your cluster configuration to set the `iceberg_enabled` property to `true`, and set the catalog integration properties listed in the example below. If you change this configuration on a running cluster, you must restart the cluster. Run `rpk cluster config edit` to update these properties:
57+
+
58+
[,yaml]
59+
----
60+
iceberg_enabled: true
61+
iceberg_catalog_type: rest
62+
iceberg_rest_catalog_endpoint: https://<snowflake-orgname>-<open-catalog-account-name>.snowflakecomputing.com/polaris/api/catalog
63+
iceberg_rest_catalog_client_id: <open-catalog-connection-client-id>
64+
iceberg_rest_catalog_client_secret: <open-catalog-connection-client-secret>
65+
iceberg_rest_catalog_prefix: <open-catalog-name>
66+
67+
# Optional
68+
iceberg_translation_interval_ms_default: 1000
69+
iceberg_catalog_commit_interval_ms: 1000
70+
----
71+
+
72+
Use your own values for the following placeholders:
73+
+
74+
--
75+
- `<snowflake-orgname>` and `<open-catalog-account-name>`: Your https://docs.snowflake.com/en/sql-reference/sql/create-catalog-integration-open-catalog#required-parameters[Open Catalog account URI] is composed of these values.
76+
+
77+
TIP: In Snowflake, navigate to **Admin**, then **Accounts**. Click the ellipsis near your Open Catalog account name, and select **Manage URLs**. The **Current URL** contains `<snowflake-orgname>` and `<open-catalog-account-name>`.
78+
- `<open-catalog-connection-client-id>`: The client ID of the service connection you created in an earlier step.
79+
- `<open-catalog-connection-client-secret>`: The client secret of the service connection you created in an earlier step.
80+
- `<open-catalog-name>`: The name of your catalog in Open Catalog.
81+
--
82+
+
83+
[,bash,role=no-copy]
84+
----
85+
Successfully updated configuration. New configuration version is 2.
86+
----
87+
88+
. You must restart your cluster so that the configuration changes take effect.
89+
90+
. Enable the integration for a topic by setting the topic property `redpanda.iceberg.mode`. The `key_value` mode creates an Iceberg table for the topic with two columns: one for the record metadata, including the key, and a binary column for the record's value. See xref:manage:iceberg/topic-iceberg-integration.adoc#enable-iceberg-integration[Enable Iceberg integration] for more details on Iceberg modes. The following examples show how to use xref:get-started:rpk-install.adoc[`rpk`] to either create a new topic or alter the configuration of an existing topic, setting the Iceberg mode to `key_value`.
91+
+
92+
.Create a new topic and set `redpanda.iceberg.mode`:
93+
[,bash]
94+
----
95+
rpk topic create <topic-name> --topic-config=redpanda.iceberg.mode=key_value
96+
----
97+
+
98+
.Set `redpanda.iceberg.mode` for an existing topic:
99+
[,bash]
100+
----
101+
rpk topic alter-config <topic-name> --set redpanda.iceberg.mode=key_value
102+
----
103+
104+
. Produce to the topic. For example:
105+
+
106+
[,bash]
107+
----
108+
printf 'hello world\nfoo bar\nbaz qux\n' | rpk topic produce <topic-name> --format='%k %v\n'
109+
----
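The `--format='%k %v\n'` option in the produce command tells `rpk` to read each input line as a key, a space, and a value. The following is a minimal sketch of that split in plain shell, so you can preview which key and value each line yields before producing. The `split_records` function is a hypothetical helper for illustration only and does not call `rpk`:

```shell
# Hypothetical helper (not part of rpk): shows how each input line
# divides into a key (first word) and a value (rest of the line).
split_records() {
  while read -r key value; do
    printf 'key=%s value=%s\n' "$key" "$value"
  done
}

# Same three lines as the produce example.
printf 'hello world\nfoo bar\nbaz qux\n' | split_records
```

Using `printf` rather than `echo` for the input keeps the `\n` sequences portable, because not every shell's `echo` interprets backslash escapes.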
110+
111+
You should see the topic as a table in Open Catalog.
112+
113+
. In Open Catalog, select *Catalogs*, then open your catalog.
114+
. Under your catalog, you should see the `redpanda` namespace, and a table with the name of your topic. The `redpanda` namespace and the table are automatically added for you.
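The `iceberg_rest_catalog_endpoint` value in the cluster configuration is assembled from the Snowflake organization name and the Open Catalog account name. As a sketch, assuming you prefer setting properties one at a time with `rpk cluster config set` instead of the interactive editor, the following script builds the endpoint and prints the commands for review. The `catalog_endpoint` function and the `ORG_NAME` and `ACCOUNT_NAME` values are placeholders introduced here, not values from your environment:

```shell
# Hypothetical helper: joins the Snowflake organization name and the
# Open Catalog account name into the REST catalog endpoint URL.
catalog_endpoint() {
  printf 'https://%s-%s.snowflakecomputing.com/polaris/api/catalog\n' "$1" "$2"
}

# Placeholder values (assumptions): replace with your own.
ORG_NAME='myorg'
ACCOUNT_NAME='mycatalogaccount'

# Print the commands instead of running them, so they can be reviewed first.
printf 'rpk cluster config set iceberg_catalog_type rest\n'
printf 'rpk cluster config set iceberg_rest_catalog_endpoint %s\n' \
  "$(catalog_endpoint "$ORG_NAME" "$ACCOUNT_NAME")"
```

The script only prints text. Review the output, then run the commands against your cluster and restart it as described in the steps above.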
115+
116+
== Query Iceberg table in Snowflake
117+
118+
To query the topic in Snowflake, you must create a https://docs.snowflake.com/en/user-guide/tables-iceberg#catalog-integration[catalog integration^] so that Snowflake has access to the table data and metadata.
119+
120+
=== Configure catalog integration with Snowflake
121+
122+
. Run the https://docs.snowflake.com/sql-reference/sql/create-catalog-integration-open-catalog[`CREATE CATALOG INTEGRATION`] command in Snowflake:
123+
+
124+
[,sql]
125+
----
126+
CREATE CATALOG INTEGRATION <catalog-integration-name>
127+
CATALOG_SOURCE = POLARIS
128+
TABLE_FORMAT = ICEBERG
129+
CATALOG_NAMESPACE = 'redpanda'
130+
REST_CONFIG = (
131+
CATALOG_URI = '<open-catalog-uri>'
132+
WAREHOUSE = '<open-catalog-name>'
133+
)
134+
REST_AUTHENTICATION = (
135+
TYPE = OAUTH
136+
OAUTH_CLIENT_ID = '<open-catalog-connection-client-id>'
137+
OAUTH_CLIENT_SECRET = '<open-catalog-connection-client-secret>'
138+
OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
139+
)
140+
REFRESH_INTERVAL_SECONDS = 30
141+
ENABLED = TRUE;
142+
----
143+
+
144+
Use your own values for the following placeholders:
145+
+
146+
- `<catalog-integration-name>`: Provide a name for your Iceberg catalog integration in Snowflake.
147+
- `<open-catalog-uri>`: Your https://docs.snowflake.com/en/sql-reference/sql/create-catalog-integration-open-catalog#required-parameters[Open Catalog account URI] (`https://<snowflake-orgname>-<open-catalog-account-name>.snowflakecomputing.com/polaris/api/catalog`).
148+
- `<open-catalog-name>`: The name of your catalog in Open Catalog.
149+
- `<open-catalog-connection-client-id>`: The client ID of the service connection you created in an earlier step.
150+
- `<open-catalog-connection-client-secret>`: The client secret of the service connection you created in an earlier step.
151+
152+
. Run the following command to verify that the catalog is integrated correctly:
153+
+
154+
[,sql]
155+
----
156+
SELECT SYSTEM$LIST_ICEBERG_TABLES_FROM_CATALOG('<catalog-integration-name>');
157+
----
158+
+
159+
[,bash,role="no-copy no-placeholders"]
160+
----
161+
# Example result for redpanda.iceberg.mode=key_value
162+
+-----------------------------------------------------------------------+
163+
| SYSTEM$LIST_ICEBERG_TABLES_FROM_CATALOG('<catalog_integration_name>') |
164+
+-----------------------------------------------------------------------+
165+
| [{"namespace":"redpanda","name":"<table_name>"}] |
166+
+-----------------------------------------------------------------------+
167+
----
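If you capture the result of `SYSTEM$LIST_ICEBERG_TABLES_FROM_CATALOG` in a script, you can check that the table for your topic is listed. The following is a minimal sketch that assumes the result is the compact JSON shown in the example output. The `table_listed` function and the topic name `orders` are hypothetical:

```shell
# Hypothetical helper: succeeds if the JSON result lists a table with
# the given name in the `redpanda` namespace. Assumes the compact JSON
# formatting shown in the example result (no extra whitespace).
table_listed() {
  printf '%s\n' "$1" | grep -qF "{\"namespace\":\"redpanda\",\"name\":\"$2\"}"
}

# Example result for a topic named `orders` (placeholder name).
RESULT='[{"namespace":"redpanda","name":"orders"}]'

if table_listed "$RESULT" orders; then
  echo "table found"
else
  echo "table missing"
fi
```

A fixed-string match keeps the sketch free of extra dependencies. For results with different whitespace or many tables, a JSON-aware tool is a safer choice.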
168+
169+
=== Create Iceberg table in Snowflake
170+
171+
After creating the catalog integration, create an externally managed table in Snowflake. Run your Snowflake queries against this table.
172+
173+
In your Snowflake database, run the https://docs.snowflake.com/en/sql-reference/sql/create-iceberg-table-rest[`CREATE ICEBERG TABLE`] command. The following example also specifies that the table should automatically refresh metadata:
174+
175+
[,sql]
176+
----
177+
CREATE ICEBERG TABLE <table-name>
178+
CATALOG = '<catalog-integration-name>'
179+
EXTERNAL_VOLUME = '<iceberg-external-volume-name>'
180+
CATALOG_TABLE_NAME = '<topic-name>'
181+
AUTO_REFRESH = TRUE;
182+
----
183+
184+
Use your own values for the following placeholders:
185+
186+
- `<table-name>`: Provide a name for your table in Snowflake.
187+
- `<catalog-integration-name>`: The name of the catalog integration you configured in an earlier step.
188+
- `<iceberg-external-volume-name>`: The name of the external volume you configured using the Tiered Storage bucket.
189+
- `<topic-name>`: The name of the table in your catalog, which is the same as your Redpanda topic name.
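If you create tables for several topics, you can generate the statement from the four placeholder values. The following is a sketch only: the `render_create_table` function and the example names are hypothetical, and the script prints SQL text without connecting to Snowflake:

```shell
# Hypothetical helper: prints a CREATE ICEBERG TABLE statement.
# Arguments: table name, catalog integration, external volume, topic name.
# The values are inserted as-is, so they must not contain single quotes.
render_create_table() {
  cat <<EOF
CREATE ICEBERG TABLE $1
  CATALOG = '$2'
  EXTERNAL_VOLUME = '$3'
  CATALOG_TABLE_NAME = '$4'
  AUTO_REFRESH = TRUE;
EOF
}

# Placeholder names (assumptions): replace with your own.
render_create_table orders_tbl redpanda_catalog_int iceberg_vol orders
```

Paste the printed statement into a Snowflake worksheet, or pass it to the SQL client you already use.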
190+
191+
=== Query table
192+
193+
To verify that Snowflake has successfully created the table containing the topic data, run the following:
194+
195+
[,sql]
196+
----
197+
SELECT * FROM <table-name>;
198+
----
199+
200+
Your query results should look like the following:
201+
202+
[,bash,role=no-copy]
203+
----
204+
# Example for redpanda.iceberg.mode=key_value with 3 records produced to topic
205+
206+
+--------------------------------------------------------------------------------------------------------------+------------+
207+
| REDPANDA | VALUE |
208+
+--------------------------------------------------------------------------------------------------------------+------------+
209+
| { "partition": 0, "offset": 0, "timestamp": "2025-02-07 16:29:50.122", "headers": null, "key": "68656C6C6F"} | 776F726C64 |
210+
| { "partition": 0, "offset": 1, "timestamp": "2025-02-07 16:29:50.122", "headers": null, "key": "666F6F"} | 626172 |
211+
| { "partition": 0, "offset": 2, "timestamp": "2025-02-07 16:29:50.122", "headers": null, "key": "62617A" } | 717578 |
212+
+--------------------------------------------------------------------------------------------------------------+------------+
213+
214+
----
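In the example result, the `key` field and the `VALUE` column are binary and appear as hexadecimal strings. To confirm that they match what you produced, you can decode them. The following is a minimal sketch in portable shell that assumes the bytes are printable text. The `hex_to_text` function is a hypothetical helper:

```shell
# Hypothetical helper: decodes a hexadecimal string into text.
# Returns non-zero if the input is not an even-length hex string.
hex_to_text() {
  hex=$1
  case $hex in
    *[!0-9A-Fa-f]*) return 1 ;;
  esac
  [ $(( ${#hex} % 2 )) -eq 0 ] || return 1
  while [ -n "$hex" ]; do
    rest=${hex#??}        # everything after the first two characters
    pair=${hex%"$rest"}   # the first two characters (one byte)
    # Convert the byte to octal and emit it through a %b escape.
    printf '%b' "\\0$(printf '%o' "0x$pair")"
    hex=$rest
  done
  printf '\n'
}

hex_to_text 68656C6C6F   # key of the first record: hello
hex_to_text 776F726C64   # value of the first record: world
```

The decoded key and value match the first line, `hello world`, from the produce step.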

modules/manage/pages/topic-iceberg-integration.adoc renamed to modules/manage/pages/iceberg/topic-iceberg-integration.adoc

Lines changed: 9 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,8 @@
11
= Iceberg Topics
22
:description: Learn how to integrate Redpanda topics with Apache Iceberg.
3-
:page-context-links: [{"name": "Linux", "to": "manage:topic-iceberg-integration.adoc" } ]
3+
:page-context-links: [{"name": "Linux", "to": "manage:iceberg/topic-iceberg-integration.adoc" } ]
44
:page-categories: Iceberg, Tiered Storage, Management, High Availability, Data Replication, Integration
5+
:page-aliases: manage:topic-iceberg-integration.adoc
56
:page-beta: true
67

78
[NOTE]
@@ -272,7 +273,7 @@ Set the cluster configuration property `iceberg_catalog_type` with one of the fo
272273

273274
Once you have enabled the Iceberg integration for a topic and selected a catalog type, you cannot switch to another catalog type.
274275

275-
For production use cases, Redpanda recommends the `rest` option with REST-enabled Iceberg catalog services such as https://docs.tabular.io/[Tabular^], https://docs.databricks.com/en/data-governance/unity-catalog/index.html[Databricks Unity^] and https://github.com/apache/polaris[Apache Polaris^].
276+
For production use cases, Redpanda recommends the `rest` option with REST-enabled Iceberg catalog services such as https://docs.tabular.io/[Tabular^], https://docs.databricks.com/en/data-governance/unity-catalog/index.html[Databricks Unity^] and https://other-docs.snowflake.com/en/opencatalog/overview[Snowflake Open Catalog^].
276277

277278
For an Iceberg REST catalog, set the following additional cluster configuration properties:
278279

@@ -323,6 +324,8 @@ SELECT * FROM streaming.redpanda.ClickEvent;
323324

324325
Spark can use the REST catalog to automatically discover the topic's Iceberg table.
325326

327+
See also: xref:manage:iceberg/redpanda-topics-iceberg-snowflake-catalog.adoc[]
328+
326329
==== File system-based catalog (`object_storage`)
327330

328331
If you are using the `object_storage` catalog type, you must set up the catalog integration in your processing engine accordingly. For example, you can configure Spark to use a file system-based catalog with at least the following properties, if using AWS S3 for object storage:
@@ -432,6 +435,10 @@ FROM <catalog-name>.ClickEvent_key_value;
432435
+------------------------------------------------------------------------------+
433436
----
434437

438+
== Next steps
439+
440+
* xref:manage:iceberg/redpanda-topics-iceberg-snowflake-catalog.adoc[]
441+
435442
== Suggested reading
436443

437444
* xref:manage:schema-reg/schema-id-validation.adoc[]
