This repository contains a Deephaven deployment for Apache Iceberg.
It deploys an Apache Iceberg REST catalog with MinIO as the S3-compatible object store. Deephaven can be used to query the catalog.
This Docker deployment is based on the Apache Spark quickstart for Iceberg. It extends that quickstart by adding Deephaven, as the query engine, to the same Docker network. It also automates the creation of a sample Iceberg catalog and table on startup.
This deployment uses Docker and Docker Compose. Additionally, you will need Git to clone the repository.
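As a quick sanity check before cloning, the short Python sketch below reports which of the required command-line tools are not on your PATH. The helper name `missing_tools` is made up for illustration and is not part of this repository. It only checks for the `docker` and `git` executables and does not verify that the Docker Compose plugin is installed.

```python
import shutil


def missing_tools(tools=("docker", "git")):
    """Return the names of tools that cannot be found on PATH.

    Hypothetical helper for illustration only. Docker Compose v2 is
    usually a plugin of the docker CLI, so it is not checked here.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("docker and git were found on PATH")
```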
The application launches five Docker containers:
- Deephaven
- MinIO
- The MinIO client
- An Apache Iceberg REST catalog
- An Apache Spark server with Iceberg support
The Spark container shuts down once the Iceberg catalog and table are created, leaving only four containers running.
This deployment is intended for local development and testing. It is not suitable for production use, for two main reasons:
- Deephaven uses anonymous authentication (no authentication), which is not secure.
- AWS credentials are hardcoded into the Docker configurations, which is not secure.
The deployment is defined in docker-compose.yml. All Docker containers use default images except for the spark-iceberg container, which uses a custom image built off the tabulario/spark-iceberg image. The custom image adds a Jupyter notebook that gets run on startup to build the catalog with a table. Once the Jupyter notebook completes, the container shuts down.
In addition, this deployment uses Deephaven's application mode to set several variables used in the guides from environment variables.
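The idea of populating script variables from environment variables can be sketched in plain Python as follows. This is not the repository's actual application-mode script: the variable names `CATALOG_URI` and `S3_ENDPOINT`, their default values, and the helper `load_settings` are all assumptions chosen for illustration. Check the Python or Groovy directory for the real names.

```python
import os

# Hypothetical variable names and placeholder defaults; the names and
# values used by this repository may differ.
DEFAULTS = {
    "CATALOG_URI": "http://rest:8181",
    "S3_ENDPOINT": "http://minio:9000",
}


def load_settings(env=None):
    """Read each setting from the environment, falling back to its default."""
    env = os.environ if env is None else env
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}


# A startup script could expose the values as ordinary top-level
# variables so that later code can refer to them by name.
settings = load_settings()
catalog_uri = settings["CATALOG_URI"]
s3_endpoint = settings["S3_ENDPOINT"]
```

Keeping the lookup in one function makes it easy to see which environment variables the script depends on, and lets you pass a plain dictionary instead of the real environment when experimenting.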
This deployment is used by Deephaven's Iceberg user guide. You can follow all of the steps in the guide using this deployment.
To start the deployment, cd into either the Python or Groovy directory and run:
docker compose up
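Once the containers have started, you may want to confirm which services are still running. The sketch below is one possible way to do that from Python, assuming a Docker Compose v2 CLI whose `docker compose ps --format json` output includes `Service` and `State` fields. Depending on the Compose version, that output is either a single JSON array or one JSON object per line, so the parser accepts both. The function names are illustrative and not part of this repository.

```python
import json
import subprocess


def running_services(ps_output):
    """Return the sorted names of services whose State is 'running'.

    Accepts either a JSON array or newline-delimited JSON objects.
    """
    text = ps_output.strip()
    if not text:
        return []
    if text.startswith("["):
        entries = json.loads(text)
    else:
        entries = [json.loads(line) for line in text.splitlines() if line.strip()]
    return sorted(
        entry["Service"]
        for entry in entries
        if entry.get("State") == "running" and "Service" in entry
    )


def query_compose():
    """Run `docker compose ps` in the current directory and return its stdout."""
    result = subprocess.run(
        ["docker", "compose", "ps", "--all", "--format", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Run from the same directory as the compose file, `print(running_services(query_compose()))` lists the services that are still up.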
See License.
Reach out to us on Slack!