
Commit 9575057

Authored by charislam, fenos, inian, and github-actions[bot]
feat(docs): add scaffolding for iceberg docs (supabase#36888)
* feat(docs): add scaffolding for iceberg docs
* docs(storage/iceberg): analytical buckets docs
* Apply suggestions from code review
* fix(docs): update analytics buckets docs
* nav reorg
* doc: analytics buckets limit section
* fix ci errors

Co-authored-by: fenos <[email protected]>
Co-authored-by: Inian <[email protected]>
Co-authored-by: Charis <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1 parent bd7d4f4 · commit 9575057

File tree

8 files changed, +296 -0 lines changed

apps/docs/components/Navigation/NavigationMenu/NavigationMenu.constants.ts

Lines changed: 19 additions & 0 deletions
```diff
@@ -1717,6 +1717,25 @@ export const storage: NavMenuConstant = {
       { name: 'API Compatibility', url: '/guides/storage/s3/compatibility' },
     ],
   },
+  {
+    name: 'Analytics Buckets',
+    url: undefined,
+    items: [
+      { name: 'Introduction', url: '/guides/storage/analytics/introduction' },
+      {
+        name: 'Creating Analytics Buckets',
+        url: '/guides/storage/analytics/creating-analytics-buckets',
+      },
+      {
+        name: 'Connecting to Analytics Buckets',
+        url: '/guides/storage/analytics/connecting-to-analytics-bucket',
+      },
+      {
+        name: 'Limits',
+        url: '/guides/storage/analytics/limits',
+      },
+    ],
+  },
   {
     name: 'CDN',
     url: undefined,
```
Lines changed: 187 additions & 0 deletions
---
title: 'Connecting to Analytics Buckets'
---

<Admonition type="caution">

This feature is in **Private Alpha**. API stability and backward compatibility are not guaranteed at this stage. Request access via this [form](https://forms.supabase.com/analytics-buckets).

</Admonition>

When interacting with Analytics Buckets, you authenticate against two main services: the Iceberg REST Catalog and the S3-compatible storage endpoint.

The **Iceberg REST Catalog** acts as the central management system for Iceberg tables. It allows Iceberg clients, such as PyIceberg and Apache Spark, to perform metadata operations including:

- Creating and managing tables and namespaces
- Tracking schemas and handling schema evolution
- Managing partitions and snapshots
- Ensuring transactional consistency and isolation

The REST Catalog itself does not store the actual data. Instead, it stores metadata describing the structure, schema, and partitioning strategy of Iceberg tables.

Actual data storage and retrieval happen through the separate S3-compatible endpoint, which is optimized for reading and writing large analytical datasets stored in Parquet files.

## Authentication

To connect to an Analytics Bucket, you will need:

- An Iceberg client (Spark, PyIceberg, etc.) which supports the REST Catalog interface.
- S3 credentials to authenticate your Iceberg client with the underlying S3 bucket.
  To create S3 credentials, go to [**Project Settings > Storage**](https://supabase.com/dashboard/project/_/settings/storage). For more information, see the [S3 Authentication Guide](https://supabase.com/docs/guides/storage/s3/authentication). We will support other authentication methods in the future.
- The project reference and Service key for your Supabase project.
  You can find your Service key in the Supabase Dashboard under [**Project Settings > API**](https://supabase.com/dashboard/project/_/settings/api-keys).

You will now have an **Access Key** and a **Secret Key** that you can use to authenticate your Iceberg client.

## Connecting via PyIceberg
PyIceberg is a Python client for Apache Iceberg that makes it easy to interact with Analytics Buckets.

**Installation**

```bash
pip install pyiceberg pyarrow
```

Here's a comprehensive example using PyIceberg with clearly separated configuration:

```python
from pyiceberg.catalog import load_catalog
import pyarrow as pa
import datetime

# Supabase project ref
PROJECT_REF = "<your-supabase-project-ref>"

# Configuration for Iceberg REST Catalog
WAREHOUSE = "your-analytics-bucket-name"
TOKEN = "SERVICE_KEY"

# Configuration for S3-Compatible Storage
S3_ACCESS_KEY = "KEY"
S3_SECRET_KEY = "SECRET"
S3_REGION = "PROJECT_REGION"

S3_ENDPOINT = f"https://{PROJECT_REF}.supabase.co/storage/v1/s3"
CATALOG_URI = f"https://{PROJECT_REF}.supabase.co/storage/v1/iceberg"

# Load the Iceberg catalog
catalog = load_catalog(
    "analytics-bucket",
    type="rest",
    warehouse=WAREHOUSE,
    uri=CATALOG_URI,
    token=TOKEN,
    **{
        "py-io-impl": "pyiceberg.io.pyarrow.PyArrowFileIO",
        "s3.endpoint": S3_ENDPOINT,
        "s3.access-key-id": S3_ACCESS_KEY,
        "s3.secret-access-key": S3_SECRET_KEY,
        "s3.region": S3_REGION,
        "s3.force-virtual-addressing": False,
    },
)

# Create namespace if it doesn't exist
catalog.create_namespace_if_not_exists("default")

# Define schema for your Iceberg table
schema = pa.schema([
    pa.field("event_id", pa.int64()),
    pa.field("event_name", pa.string()),
    pa.field("event_timestamp", pa.timestamp("ms")),
])

# Create table (if it doesn't exist already)
table = catalog.create_table_if_not_exists(("default", "events"), schema=schema)

# Generate and insert sample data
current_time = datetime.datetime.now()
data = pa.table({
    "event_id": [1, 2, 3],
    "event_name": ["login", "logout", "purchase"],
    "event_timestamp": [current_time, current_time, current_time],
})

# Append data to the Iceberg table
table.append(data)

# Scan table and print data as pandas DataFrame
df = table.scan().to_pandas()
print(df)
```
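
Once the table has data, you can push filters and column projections down into the scan rather than loading the whole table into memory. A minimal sketch, reusing the `table` handle from the example above:

```python
from pyiceberg.expressions import EqualTo

# Read only matching rows and a subset of columns; PyIceberg uses
# table metadata to prune data files before reading any Parquet.
logins = table.scan(
    row_filter=EqualTo("event_name", "login"),
    selected_fields=("event_id", "event_timestamp"),
).to_pandas()
print(logins)
```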
## Connecting via Apache Spark
Apache Spark lets you run distributed analytical queries against Analytics Buckets.

```python
from pyspark.sql import SparkSession

# Supabase project ref
PROJECT_REF = "<your-supabase-ref>"

# Configuration for Iceberg REST Catalog
WAREHOUSE = "your-analytics-bucket-name"
TOKEN = "SERVICE_KEY"

# Configuration for S3-Compatible Storage
S3_ACCESS_KEY = "KEY"
S3_SECRET_KEY = "SECRET"
S3_REGION = "PROJECT_REGION"

S3_ENDPOINT = f"https://{PROJECT_REF}.supabase.co/storage/v1/s3"
CATALOG_URI = f"https://{PROJECT_REF}.supabase.co/storage/v1/iceberg"

# Initialize Spark session with Iceberg configuration
spark = SparkSession.builder \
    .master("local[*]") \
    .appName("SupabaseIceberg") \
    .config("spark.driver.host", "127.0.0.1") \
    .config("spark.driver.bindAddress", "127.0.0.1") \
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1,org.apache.iceberg:iceberg-aws-bundle:1.6.1") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.my_catalog.type", "rest") \
    .config("spark.sql.catalog.my_catalog.uri", CATALOG_URI) \
    .config("spark.sql.catalog.my_catalog.warehouse", WAREHOUSE) \
    .config("spark.sql.catalog.my_catalog.token", TOKEN) \
    .config("spark.sql.catalog.my_catalog.s3.endpoint", S3_ENDPOINT) \
    .config("spark.sql.catalog.my_catalog.s3.path-style-access", "true") \
    .config("spark.sql.catalog.my_catalog.s3.access-key-id", S3_ACCESS_KEY) \
    .config("spark.sql.catalog.my_catalog.s3.secret-access-key", S3_SECRET_KEY) \
    .config("spark.sql.catalog.my_catalog.s3.remote-signing-enabled", "false") \
    .config("spark.sql.defaultCatalog", "my_catalog") \
    .getOrCreate()

# SQL Operations
spark.sql("CREATE NAMESPACE IF NOT EXISTS analytics")

spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.users (
        user_id BIGINT,
        username STRING
    )
    USING iceberg
""")

spark.sql("""
    INSERT INTO analytics.users (user_id, username)
    VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie')
""")

result_df = spark.sql("SELECT * FROM analytics.users")
result_df.show()
```
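
Iceberg keeps a snapshot history for every table, so Spark can also query the data as it existed at an earlier point in time. A brief sketch, assuming the `analytics.users` table above has committed snapshots; the snapshot ID and timestamp below are illustrative placeholders:

```python
# Inspect the table's snapshot history via Iceberg's metadata tables
spark.sql("SELECT committed_at, snapshot_id, operation FROM analytics.users.snapshots").show()

# Time travel: query the table as of a snapshot ID or a timestamp,
# substituting values taken from the snapshots output above
spark.sql("SELECT * FROM analytics.users VERSION AS OF 1234567890123456789").show()
spark.sql("SELECT * FROM analytics.users TIMESTAMP AS OF '2025-01-01 00:00:00'").show()
```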
## Connecting to the Iceberg REST Catalog directly
179+
180+
To authenticate with the Iceberg REST Catalog directly, you need to provide a valid Supabase **Service key** as a Bearer token.
181+
182+
```
183+
curl \
184+
--request GET -sL \
185+
--url 'https://<your-supabase-project>.supabase.co/storage/v1/iceberg/v1/config?warehouse=<bucket-name>' \
186+
--header 'Authorization: Bearer <your-service-key>'
187+
```
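
Beyond `/v1/config`, the catalog speaks the standard Iceberg REST Catalog API, so you can script metadata operations with any HTTP client. A sketch using Python's `requests` (`pip install requests`); the route paths follow the standard REST Catalog spec, and the assumption that no route `prefix` is required is ours — use any `prefix` returned by the config call:

```python
import requests

PROJECT_REF = "<your-supabase-project-ref>"
SERVICE_KEY = "<your-service-key>"
WAREHOUSE = "<bucket-name>"

CATALOG_URI = f"https://{PROJECT_REF}.supabase.co/storage/v1/iceberg"
headers = {"Authorization": f"Bearer {SERVICE_KEY}"}

# Fetch catalog configuration; the response may include a `prefix`
# that must be inserted into subsequent route paths
config = requests.get(
    f"{CATALOG_URI}/v1/config",
    params={"warehouse": WAREHOUSE},
    headers=headers,
).json()
print(config)

# List namespaces (standard REST Catalog route: /v1/{prefix}/namespaces);
# here we assume no prefix is needed
namespaces = requests.get(f"{CATALOG_URI}/v1/namespaces", headers=headers).json()
print(namespaces)
```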
Lines changed: 38 additions & 0 deletions
---
title: 'Creating Analytics Buckets'
subtitle: ''
---

<Admonition type="caution">

This feature is in **Private Alpha**. API stability and backward compatibility are not guaranteed at this stage. Request access via this [form](https://forms.supabase.com/analytics-buckets).

</Admonition>

Analytics Buckets use [Apache Iceberg](https://iceberg.apache.org/), an open table format for managing large analytical datasets.
You can interact with them using tools such as [PyIceberg](https://py.iceberg.apache.org/), [Apache Spark](https://spark.apache.org/), or any client which supports the [standard Iceberg REST Catalog API](https://editor-next.swagger.io/?url=https://raw.githubusercontent.com/apache/iceberg/main/open-api/rest-catalog-open-api.yaml).

You can create an Analytics Bucket using either the Supabase SDK or the Supabase Dashboard.

### Using the Supabase SDK

```ts
import { createClient } from '@supabase/supabase-js'

const supabase = createClient('https://your-project.supabase.co', 'your-service-key')

const { data, error } = await supabase.storage.createBucket('my-analytics-bucket', {
  type: 'ANALYTICS',
})
```
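
To confirm the bucket was created, you can list your project's buckets. A hedged sketch using the Python client (`pip install supabase`), assuming its storage API exposes `list_buckets` as in supabase-py:

```python
from supabase import create_client

# Substitute your own project URL and service key
supabase = create_client("https://your-project.supabase.co", "your-service-key")

# List all buckets and check that the new analytics bucket appears
buckets = supabase.storage.list_buckets()
print([b.name for b in buckets])
```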
### Using the Supabase Dashboard

1. Navigate to the Storage section in the Supabase Dashboard.
2. Click on "Create Bucket".
3. Enter a name for your bucket (e.g., `my-analytics-bucket`).
4. Select "Analytics Bucket" as the bucket type.

<img alt="Creating an Analytics Bucket in the Dashboard" src="/docs/img/storage/iceberg-bucket.png" />

Now that you have created your Analytics Bucket, you can start [connecting to it](/docs/guides/storage/analytics/connecting-to-analytics-bucket) with Iceberg clients like PyIceberg or Apache Spark.
Lines changed: 24 additions & 0 deletions
---
title: 'Analytics Buckets'
subtitle: ''
---

<Admonition type="caution">

This feature is in **Private Alpha**. API stability and backward compatibility are not guaranteed at this stage. Request access via this [form](https://forms.supabase.com/analytics-buckets).

</Admonition>

**Analytics Buckets** are designed for analytical workflows on large datasets without impacting your main database.

Postgres tables are optimized for real-time, transactional workloads with frequent inserts, updates, deletes, and low-latency queries. **Analytical workloads** have very different requirements: processing large volumes of historical data, running complex queries and aggregations, minimizing storage costs, and ensuring these analytical queries do not interfere with production traffic.

**Analytics Buckets** address these requirements using [Apache Iceberg](https://iceberg.apache.org/), an open table format for managing large analytical datasets efficiently.

Analytics Buckets are ideal for:

- Data warehousing and business intelligence
- Historical data archiving
- Periodically refreshed real-time analytics
- Complex analytical queries over large datasets

By separating transactional and analytical workloads, Supabase makes it easy to build scalable analytics pipelines without impacting your primary Postgres performance.
Lines changed: 23 additions & 0 deletions
---
title: 'Analytics Buckets Limits'
subtitle: ''
---

<Admonition type="caution">

This feature is in **Private Alpha**. API stability and backward compatibility are not guaranteed at this stage. Request access via this [form](https://forms.supabase.com/analytics-buckets).

</Admonition>

The following default limits apply while this feature is in the Private Alpha stage. They can be adjusted on a case-by-case basis:

| **Category**                             | **Limit** |
| ---------------------------------------- | --------- |
| Number of Analytics Buckets per project  | 2         |
| Number of namespaces per bucket          | 10        |
| Number of tables per namespace           | 10        |

## Pricing

Analytics Buckets are free to use during the Private Alpha phase; however, you'll still be charged for the underlying egress.
Binary image added: iceberg-bucket.png (47.3 KB)

supa-mdx-lint/Rule001HeadingCase.toml

Lines changed: 4 additions & 0 deletions
```diff
@@ -10,6 +10,7 @@ may_uppercase = [
   "Analytics",
   "Android",
   "Angular",
+  "Apache Spark",
   "Apple",
   "Assistant",
   "Audit Logs?",
@@ -25,6 +26,7 @@ may_uppercase = [
   "Branching",
   "Broadcast",
   "CAPTCHA",
+  "Catalog",
   "Channel",
   "ChatGPT",
   "Chrome",
@@ -95,6 +97,7 @@ may_uppercase = [
   "IPv4",
   "IPv6",
   "IVFFlat",
+  "Iceberg",
   "IdP",
   "Inbucket",
   "Index Advisor",
@@ -150,6 +153,7 @@ may_uppercase = [
   "Prisma",
   "PrivateLink",
   "Prometheus",
+  "PyIceberg",
   "Python",
   "Qodo Gen",
   "Queues?",
```

supa-mdx-lint/Rule003Spelling.toml

Lines changed: 1 addition & 0 deletions
```diff
@@ -246,6 +246,7 @@ allow_list = [
   "PubSub",
   "Prisma",
   "PrivateLink",
+  "PyIceberg",
   "Qodo",
   "README",
   "Redis",
```
