---
title: 'Connecting to Analytics Buckets'
---

<Admonition type="caution">

This feature is in **Private Alpha**. API stability and backward compatibility are not guaranteed at this stage. Request access via this [form](https://forms.supabase.com/analytics-buckets).

</Admonition>

When interacting with Analytics Buckets, you authenticate against two main services: the Iceberg REST Catalog and the S3-compatible storage endpoint.

The **Iceberg REST Catalog** acts as the central management system for Iceberg tables. It allows Iceberg clients, such as PyIceberg and Apache Spark, to perform metadata operations, including:

- Creating and managing tables and namespaces
- Tracking schemas and handling schema evolution
- Managing partitions and snapshots
- Ensuring transactional consistency and isolation

The REST Catalog itself does not store the actual data. Instead, it stores metadata describing the structure, schema, and partitioning strategy of Iceberg tables.

Actual data storage and retrieval happen through the separate S3-compatible endpoint, which is optimized for reading and writing large analytical datasets stored as Parquet files.
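
Both services are exposed under your project's Storage URL. As a quick orientation, here is a minimal sketch of the two endpoint URLs used throughout this guide (assuming the default `supabase.co` domain; the project ref is a placeholder):

```python
# Placeholder for your own Supabase project ref
PROJECT_REF = "<your-supabase-project-ref>"

# Iceberg REST Catalog: metadata operations (namespaces, tables, schemas, snapshots)
CATALOG_URI = f"https://{PROJECT_REF}.supabase.co/storage/v1/iceberg"

# S3-compatible endpoint: data reads and writes (Parquet files)
S3_ENDPOINT = f"https://{PROJECT_REF}.supabase.co/storage/v1/s3"
```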

## Authentication

To connect to an Analytics Bucket, you will need:

- An Iceberg client (Spark, PyIceberg, etc.) that supports the REST Catalog interface.
- S3 credentials to authenticate your Iceberg client with the underlying S3 bucket.
  To create S3 credentials, go to [**Project Settings > Storage**](https://supabase.com/dashboard/project/_/settings/storage). For more information, see the [S3 Authentication Guide](https://supabase.com/docs/guides/storage/s3/authentication). We will support other authentication methods in the future.

- The project reference and Service key for your Supabase project.
  You can find your Service key in the Supabase Dashboard under [**Project Settings > API**](https://supabase.com/dashboard/project/_/settings/api-keys).

After creating S3 credentials, you will have an **Access Key** and a **Secret Key** that you can use to authenticate your Iceberg client.
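
Avoid hardcoding these values in scripts. A minimal sketch of loading them from environment variables instead (the variable names here are illustrative, not a Supabase convention):

```python
import os

# Illustrative variable names; use whatever fits your deployment
S3_ACCESS_KEY = os.environ["SUPABASE_S3_ACCESS_KEY"]
S3_SECRET_KEY = os.environ["SUPABASE_S3_SECRET_KEY"]
TOKEN = os.environ["SUPABASE_SERVICE_KEY"]
```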

## Connecting via PyIceberg

PyIceberg is a Python client for Apache Iceberg that makes it easy to interact with Analytics Buckets.

**Installation**

```bash
pip install pyiceberg pyarrow
```

Here's a comprehensive example using PyIceberg with clearly separated configuration:

```python
from pyiceberg.catalog import load_catalog
import pyarrow as pa
import datetime

# Supabase project ref
PROJECT_REF = "<your-supabase-project-ref>"

# Configuration for the Iceberg REST Catalog
WAREHOUSE = "your-analytics-bucket-name"
TOKEN = "SERVICE_KEY"

# Configuration for S3-compatible storage
S3_ACCESS_KEY = "KEY"
S3_SECRET_KEY = "SECRET"
S3_REGION = "PROJECT_REGION"

S3_ENDPOINT = f"https://{PROJECT_REF}.supabase.co/storage/v1/s3"
CATALOG_URI = f"https://{PROJECT_REF}.supabase.co/storage/v1/iceberg"

# Load the Iceberg catalog
catalog = load_catalog(
    "analytics-bucket",
    type="rest",
    warehouse=WAREHOUSE,
    uri=CATALOG_URI,
    token=TOKEN,
    **{
        "py-io-impl": "pyiceberg.io.pyarrow.PyArrowFileIO",
        "s3.endpoint": S3_ENDPOINT,
        "s3.access-key-id": S3_ACCESS_KEY,
        "s3.secret-access-key": S3_SECRET_KEY,
        "s3.region": S3_REGION,
        "s3.force-virtual-addressing": False,
    },
)

# Create the namespace if it doesn't exist
catalog.create_namespace_if_not_exists("default")

# Define the schema for the Iceberg table
schema = pa.schema([
    pa.field("event_id", pa.int64()),
    pa.field("event_name", pa.string()),
    pa.field("event_timestamp", pa.timestamp("ms")),
])

# Create the table (if it doesn't already exist)
table = catalog.create_table_if_not_exists(("default", "events"), schema=schema)

# Generate and insert sample data
current_time = datetime.datetime.now()
data = pa.table({
    "event_id": [1, 2, 3],
    "event_name": ["login", "logout", "purchase"],
    "event_timestamp": [current_time, current_time, current_time],
})

# Append the data to the Iceberg table
table.append(data)

# Scan the table and print its contents as a pandas DataFrame
df = table.scan().to_pandas()
print(df)
```
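
Scans can also push down row filters and column projections instead of reading the whole table into memory. A small sketch, reusing the `table` object from the example above:

```python
# Read only matching rows and selected columns; the filter is applied
# during the scan rather than after loading all data
logins = table.scan(
    row_filter="event_name == 'login'",
    selected_fields=("event_id", "event_timestamp"),
).to_pandas()
print(logins)
```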

## Connecting via Apache Spark

Apache Spark lets you run distributed analytical queries against Analytics Buckets.

```python
from pyspark.sql import SparkSession

# Supabase project ref
PROJECT_REF = "<your-supabase-project-ref>"

# Configuration for the Iceberg REST Catalog
WAREHOUSE = "your-analytics-bucket-name"
TOKEN = "SERVICE_KEY"

# Configuration for S3-compatible storage
S3_ACCESS_KEY = "KEY"
S3_SECRET_KEY = "SECRET"
S3_REGION = "PROJECT_REGION"

S3_ENDPOINT = f"https://{PROJECT_REF}.supabase.co/storage/v1/s3"
CATALOG_URI = f"https://{PROJECT_REF}.supabase.co/storage/v1/iceberg"

# Initialize a Spark session with Iceberg configuration
spark = SparkSession.builder \
    .master("local[*]") \
    .appName("SupabaseIceberg") \
    .config("spark.driver.host", "127.0.0.1") \
    .config("spark.driver.bindAddress", "127.0.0.1") \
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1,org.apache.iceberg:iceberg-aws-bundle:1.6.1") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.my_catalog.type", "rest") \
    .config("spark.sql.catalog.my_catalog.uri", CATALOG_URI) \
    .config("spark.sql.catalog.my_catalog.warehouse", WAREHOUSE) \
    .config("spark.sql.catalog.my_catalog.token", TOKEN) \
    .config("spark.sql.catalog.my_catalog.s3.endpoint", S3_ENDPOINT) \
    .config("spark.sql.catalog.my_catalog.s3.path-style-access", "true") \
    .config("spark.sql.catalog.my_catalog.s3.access-key-id", S3_ACCESS_KEY) \
    .config("spark.sql.catalog.my_catalog.s3.secret-access-key", S3_SECRET_KEY) \
    .config("spark.sql.catalog.my_catalog.s3.remote-signing-enabled", "false") \
    .config("spark.sql.defaultCatalog", "my_catalog") \
    .getOrCreate()

# SQL operations
spark.sql("CREATE NAMESPACE IF NOT EXISTS analytics")

spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.users (
        user_id BIGINT,
        username STRING
    )
    USING iceberg
""")

spark.sql("""
    INSERT INTO analytics.users (user_id, username)
    VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie')
""")

result_df = spark.sql("SELECT * FROM analytics.users")
result_df.show()
```
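
Because the catalog records every commit as a snapshot, you can inspect a table's history and query it as of an earlier state. A sketch using Iceberg's standard metadata tables and time-travel syntax, assuming the `analytics.users` table from the example above (the snapshot ID below is a placeholder; substitute one returned by the first query):

```python
# Inspect the table's commit history via the snapshots metadata table
spark.sql("SELECT committed_at, snapshot_id, operation FROM analytics.users.snapshots").show()

# Time travel: read the table as of a specific snapshot ID (placeholder value)
spark.sql("SELECT * FROM analytics.users VERSION AS OF 1234567890123456789").show()
```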

## Connecting to the Iceberg REST Catalog directly

To authenticate with the Iceberg REST Catalog directly, provide a valid Supabase **Service key** as a Bearer token.

```bash
curl \
  --request GET -sL \
  --url 'https://<your-supabase-project-ref>.supabase.co/storage/v1/iceberg/v1/config?warehouse=<bucket-name>' \
  --header 'Authorization: Bearer <your-service-key>'
```
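
Since the endpoint implements the Iceberg REST Catalog interface, any HTTP client works. A minimal Python sketch of the same config request, plus a namespace listing (this assumes the `requests` package; per the REST spec, the namespaces route may need a `prefix` segment if the config response returns one):

```python
import requests

PROJECT_REF = "<your-supabase-project-ref>"
HEADERS = {"Authorization": "Bearer <your-service-key>"}
BASE = f"https://{PROJECT_REF}.supabase.co/storage/v1/iceberg"

# Fetch catalog configuration for the warehouse
config = requests.get(f"{BASE}/v1/config", params={"warehouse": "<bucket-name>"}, headers=HEADERS)
print(config.json())

# List namespaces (standard Iceberg REST Catalog route; add the prefix
# segment here if your config response specifies one)
namespaces = requests.get(f"{BASE}/v1/namespaces", headers=HEADERS)
print(namespaces.json())
```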