@@ -1,3 +1,10 @@
:::note Catalog options

We support different catalog options for Iceberg lakes.
The instructions below are not necessary when using Snowflake Open Catalog.

:::

The [Iceberg documentation](https://iceberg.apache.org/docs/latest/maintenance/) makes recommendations for running regular maintenance jobs to get the best performance from your lake. This guide expands on those recommendations specifically for your Snowplow events lake.

The Snowplow Lake Loader **does not** automatically run the maintenance tasks described below.
46 changes: 30 additions & 16 deletions docs/destinations/warehouses-lakes/iceberg/index.md
@@ -16,7 +16,7 @@ import TabItem from '@theme/TabItem';

:::info Cloud availability

The Iceberg integration is available for Snowplow pipelines running on **AWS** only.
The Iceberg integration is available for Snowplow pipelines running on **AWS** and **GCP** only.

:::

@@ -29,7 +29,13 @@ Iceberg data can be consumed using various tools and products, for example:
* Snowflake
* ClickHouse

We currently only support the Glue Iceberg catalog.
We currently support the following catalogs:
| Catalog | AWS | GCP |
| ------- | --- | --- |
| Glue | :white_check_mark: | :x: |
| REST¹ | :white_check_mark: | :white_check_mark: |

_¹The REST catalog has only been tested with the Snowflake Open Catalog implementation._

## What you will need

@@ -43,27 +49,35 @@ The list below is just a heads up. The Snowplow Console will guide you through t

Keep in mind that you will need to be able to:

* Specify your AWS account ID
* Provide an S3 bucket and an AWS Glue database
* Create an IAM role with the following permissions:
  * For the S3 bucket:
    * `s3:ListBucket`
    * `s3:GetObject`
    * `s3:PutObject`
    * `s3:DeleteObject`
  * For the Glue database:
    * `glue:CreateTable`
    * `glue:GetTable`
    * `glue:UpdateTable`
* Schedule a regular job to optimize the lake
<Tabs groupId="catalog" queryString lazy>
<TabItem value="rest" label="REST" default>
* Specify your Snowflake Open Catalog account ID, region, and namespace
* Create a service connection to the catalog and provide the client ID and client secret
</TabItem>
<TabItem value="glue" label="AWS Glue">
* Specify your AWS account ID
* Provide an S3 bucket and an AWS Glue database
* Create an IAM role with the following permissions:
  * For the S3 bucket:
    * `s3:ListBucket`
    * `s3:GetObject`
    * `s3:PutObject`
    * `s3:DeleteObject`
  * For the Glue database:
    * `glue:CreateTable`
    * `glue:GetTable`
    * `glue:UpdateTable`
* Schedule a regular job to optimize the lake
</TabItem>
</Tabs>
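
As an illustration of the AWS Glue permissions listed above, here is a minimal Python sketch that assembles a matching IAM policy document. The function name `build_lake_loader_policy`, the statement IDs, and the example bucket, region, account ID, and database values are hypothetical. The resource scoping (the bucket ARN for `s3:ListBucket`, the object ARNs for the other S3 actions, and the catalog, database, and table ARNs for Glue) is an assumption based on standard AWS ARN formats, not something the Snowplow Console prescribes.

```python
import json


def build_lake_loader_policy(bucket: str, region: str, account_id: str, database: str) -> dict:
    """Assemble an IAM policy document covering the S3 and Glue actions listed above."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    glue_prefix = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "LakeBucketList",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions apply to keys inside the bucket.
                "Sid": "LakeBucketObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
            {
                # Assumed scoping: the catalog, the database, and its tables.
                "Sid": "GlueTables",
                "Effect": "Allow",
                "Action": ["glue:CreateTable", "glue:GetTable", "glue:UpdateTable"],
                "Resource": [
                    f"{glue_prefix}:catalog",
                    f"{glue_prefix}:database/{database}",
                    f"{glue_prefix}:table/{database}/*",
                ],
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own bucket, region, account ID, and database.
    policy = build_lake_loader_policy("my-lake-bucket", "eu-west-1", "123456789012", "snowplow")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be reviewed and adapted before attaching it to the IAM role; your own security requirements may call for tighter or differently scoped resources.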

## Getting started

You can add an Iceberg destination through the Snowplow Console. (For self-hosted customers, please refer to the [Loader API reference](/docs/api-reference/loaders-storage-targets/lake-loader/index.md) instead.)

<SetupInstructions destinationName="Iceberg" connectionType="Iceberg" />

We recommend scheduling regular [lake maintenance jobs](/docs/api-reference/loaders-storage-targets/lake-loader/maintenance/index.md?lake-format=iceberg) to ensure the best long-term performance.
For AWS Glue, we recommend scheduling regular [lake maintenance jobs](/docs/api-reference/loaders-storage-targets/lake-loader/maintenance/index.md?lake-format=iceberg) to ensure the best long-term performance.

## How loading works
