22 changes: 22 additions & 0 deletions src/content/changelog/r2/2025-04-10-r2-data-catalog-beta.mdx
@@ -0,0 +1,22 @@
---
title: R2 Data Catalog is a managed Apache Iceberg data catalog built directly into R2 buckets
description: A managed Apache Iceberg data catalog built directly into R2 buckets
products:
- r2
date: 2025-04-10T13:00:00Z
hidden: true
---

Today, we're launching [R2 Data Catalog](/r2/data-catalog/) in open beta, a managed Apache Iceberg catalog built directly into your [Cloudflare R2](/r2/) bucket.

If you're not already familiar with it, [Apache Iceberg](https://iceberg.apache.org/) is an open table format designed to handle large-scale analytics datasets stored in object storage, offering ACID transactions and schema evolution. R2 Data Catalog exposes a standard Iceberg REST catalog interface, so you can connect engines like [Spark](/r2/data-catalog/config-examples/spark-scala/), [Snowflake](/r2/data-catalog/config-examples/snowflake/), and [PyIceberg](/r2/data-catalog/config-examples/pyiceberg/) to start querying your tables using the tools you already know.

To enable a data catalog on your R2 bucket, find **R2 Data Catalog** in your bucket's settings in the dashboard, or run:

```bash
npx wrangler r2 bucket catalog enable my-bucket
```

And that's it. You'll get a catalog URI and warehouse name you can plug into your favorite Iceberg engines.
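
For instance, here's a minimal connection sketch using PyIceberg's `load_catalog` (the catalog name `r2` and the placeholder values are illustrative; see the engine-specific examples linked above for full walkthroughs):

```py
from pyiceberg.catalog import load_catalog

# Connect to R2 Data Catalog over the Iceberg REST protocol
# (replace the placeholders with your catalog URI, warehouse, and R2 API token)
catalog = load_catalog(
    "r2",
    type="rest",
    uri="<CATALOG_URI>",
    warehouse="<WAREHOUSE>",
    token="<TOKEN>",
)

print(catalog.list_namespaces())
```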

Visit our [getting started guide](/r2/data-catalog/get-started/) for step-by-step instructions on enabling R2 Data Catalog, creating tables, and running your first queries.
18 changes: 12 additions & 6 deletions src/content/docs/r2/api/tokens.mdx
@@ -45,12 +45,18 @@ Jurisdictional buckets can only be accessed via the corresponding jurisdictional

## Permissions

| Permission          | Description                                                                                                                                                                           |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Admin Read & Write  | Allows the ability to create, list, and delete buckets, edit bucket configuration, read, write, and list objects, and read and write to data catalog tables and associated metadata. |
| Admin Read only     | Allows the ability to list buckets and view bucket configuration, read and list objects, and read from data catalog tables and associated metadata.                                  |
| Object Read & Write | Allows the ability to read, write, and list objects in specific buckets.                                                                                                              |
| Object Read only    | Allows the ability to read and list objects in specific buckets.                                                                                                                      |

:::note

Currently, **Admin Read & Write** or **Admin Read only** permission is required to interact with [R2 Data Catalog](/r2/data-catalog/).

:::

## Create API tokens via API

16 changes: 16 additions & 0 deletions src/content/docs/r2/data-catalog/config-examples/index.mdx
@@ -0,0 +1,16 @@
---
pcx_content_type: navigation
title: Connect to Iceberg engines
head: []
sidebar:
order: 4
group:
hideIndex: true
description: Find detailed setup instructions for Apache Spark and other common query engines.
---

import { DirectoryListing } from "~/components";

Below are configuration examples for connecting various Iceberg engines to [R2 Data Catalog](/r2/data-catalog/):

<DirectoryListing />
50 changes: 50 additions & 0 deletions src/content/docs/r2/data-catalog/config-examples/pyiceberg.mdx
@@ -0,0 +1,50 @@
---
title: PyIceberg
pcx_content_type: example
---

Below is an example of using [PyIceberg](https://py.iceberg.apache.org/) to connect to R2 Data Catalog.

## Prerequisites

- Sign up for a [Cloudflare account](https://dash.cloudflare.com/sign-up/workers-and-pages).
- [Create an R2 bucket](/r2/buckets/create-buckets/) and [enable the data catalog](/r2/data-catalog/manage-catalogs/#enable-r2-data-catalog-on-a-bucket).
- [Create an R2 API token](/r2/api/tokens/) with both [R2 and data catalog permissions](/r2/api/tokens/#permissions).
- Install the [PyIceberg](https://py.iceberg.apache.org/#installation) and [PyArrow](https://arrow.apache.org/docs/python/install.html) libraries.

## Example usage

```py
import pyarrow as pa
from pyiceberg.catalog.rest import RestCatalog
from pyiceberg.exceptions import NamespaceAlreadyExistsError

# Define catalog connection details (replace variables)
WAREHOUSE = "<WAREHOUSE>"
TOKEN = "<TOKEN>"
CATALOG_URI = "<CATALOG_URI>"

# Connect to R2 Data Catalog
catalog = RestCatalog(
    name="my_catalog",
    warehouse=WAREHOUSE,
    uri=CATALOG_URI,
    token=TOKEN,
)

# Create default namespace if it does not already exist
try:
    catalog.create_namespace("default")
except NamespaceAlreadyExistsError:
    pass

# Create simple PyArrow table
df = pa.table({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
})

# Create an Iceberg table
test_table = ("default", "my_table")
table = catalog.create_table(
    test_table,
    schema=df.schema,
)
```
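
Assuming a recent PyIceberg release (0.6 or later), you can then append data to the table and read it back (a short sketch continuing the example above):

```py
# Append the PyArrow data to the Iceberg table
table.append(df)

# Read the table back into a PyArrow table and print it
print(table.scan().to_arrow())
```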
62 changes: 62 additions & 0 deletions src/content/docs/r2/data-catalog/config-examples/snowflake.mdx
@@ -0,0 +1,62 @@
---
title: Snowflake
pcx_content_type: example
---

Below is an example of using [Snowflake](https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-catalog-integration-rest) to connect to and query data from R2 Data Catalog (read-only).

## Prerequisites

- Sign up for a [Cloudflare account](https://dash.cloudflare.com/sign-up/workers-and-pages).
- [Create an R2 bucket](/r2/buckets/create-buckets/) and [enable the data catalog](/r2/data-catalog/manage-catalogs/#enable-r2-data-catalog-on-a-bucket).
- [Create an R2 API token](/r2/api/tokens/) with both [R2 and data catalog permissions](/r2/api/tokens/#permissions).
- A [Snowflake](https://www.snowflake.com/) account with the necessary privileges to create external volumes and catalog integrations.

## Example usage

In your Snowflake [SQL worksheet](https://docs.snowflake.com/en/user-guide/ui-snowsight-worksheets-gs) or [notebook](https://docs.snowflake.com/en/user-guide/ui-snowsight/notebooks), run the following commands:

```sql
-- Create a database (if you don't already have one) to organize your external data
CREATE DATABASE IF NOT EXISTS r2_example_db;

-- Use the database for the statements that follow
USE DATABASE r2_example_db;

-- Create an external volume pointing to your R2 bucket
CREATE OR REPLACE EXTERNAL VOLUME ext_vol_r2
  STORAGE_LOCATIONS = (
    (
      NAME = 'my_r2_storage_location'
      STORAGE_PROVIDER = 'S3COMPAT'
      STORAGE_BASE_URL = 's3compat://<bucket-name>'
      CREDENTIALS = (
        AWS_KEY_ID = '<access_key>'
        AWS_SECRET_KEY = '<secret_access_key>'
      )
      STORAGE_ENDPOINT = '<account_id>.r2.cloudflarestorage.com'
    )
  )
  ALLOW_WRITES = FALSE;

-- Create a catalog integration for R2 Data Catalog (read-only)
CREATE OR REPLACE CATALOG INTEGRATION r2_data_catalog
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'default'
  REST_CONFIG = (
    CATALOG_URI = '<catalog_uri>'
    CATALOG_NAME = '<warehouse_name>'
  )
  REST_AUTHENTICATION = (
    TYPE = BEARER
    BEARER_TOKEN = '<token>'
  )
  ENABLED = TRUE;

-- Create an Apache Iceberg table in your selected Snowflake database
CREATE ICEBERG TABLE my_iceberg_table
  CATALOG = 'r2_data_catalog'
  EXTERNAL_VOLUME = 'ext_vol_r2'
  CATALOG_TABLE_NAME = 'my_table'; -- Name of an existing table in your R2 data catalog

-- Query your Iceberg table
SELECT * FROM my_iceberg_table;
```
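
Before creating the table, you can optionally sanity-check the integration. `SHOW` and `DESCRIBE` are standard Snowflake commands; the `SYSTEM$VERIFY_CATALOG_INTEGRATION` check is assumed to be available in your account:

```sql
-- List catalog integrations visible to your role
SHOW CATALOG INTEGRATIONS;

-- Inspect the integration's configuration
DESCRIBE CATALOG INTEGRATION r2_data_catalog;

-- Verify connectivity to the remote catalog (assumed available in your account)
SELECT SYSTEM$VERIFY_CATALOG_INTEGRATION('r2_data_catalog');
```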
@@ -0,0 +1,71 @@
---
title: Spark (PySpark)
pcx_content_type: example
---

Below is an example of using [PySpark](https://spark.apache.org/docs/latest/api/python/index.html) to connect to R2 Data Catalog.

## Prerequisites

- Sign up for a [Cloudflare account](https://dash.cloudflare.com/sign-up/workers-and-pages).
- [Create an R2 bucket](/r2/buckets/create-buckets/) and [enable the data catalog](/r2/data-catalog/manage-catalogs/#enable-r2-data-catalog-on-a-bucket).
- [Create an R2 API token](/r2/api/tokens/) with both [R2 and data catalog permissions](/r2/api/tokens/#permissions).
- Install the [PySpark](https://spark.apache.org/docs/latest/api/python/getting_started/install.html) library.

## Example usage

```py
from pyspark.sql import SparkSession

# Define catalog connection details (replace variables)
WAREHOUSE = "<WAREHOUSE>"
TOKEN = "<TOKEN>"
CATALOG_URI = "<CATALOG_URI>"

# Build Spark session with Iceberg configurations
spark = SparkSession.builder \
    .appName("R2DataCatalogExample") \
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1,org.apache.iceberg:iceberg-aws-bundle:1.6.1") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.my_catalog.type", "rest") \
    .config("spark.sql.catalog.my_catalog.uri", CATALOG_URI) \
    .config("spark.sql.catalog.my_catalog.warehouse", WAREHOUSE) \
    .config("spark.sql.catalog.my_catalog.token", TOKEN) \
    .config("spark.sql.catalog.my_catalog.header.X-Iceberg-Access-Delegation", "vended-credentials") \
    .config("spark.sql.catalog.my_catalog.s3.remote-signing-enabled", "false") \
    .config("spark.sql.defaultCatalog", "my_catalog") \
    .getOrCreate()

spark.sql("USE my_catalog")

# Create namespace if it does not exist
spark.sql("CREATE NAMESPACE IF NOT EXISTS default")

# Create a table in the namespace using Iceberg
spark.sql("""
    CREATE TABLE IF NOT EXISTS default.my_table (
        id BIGINT,
        name STRING
    )
    USING iceberg
""")

# Create a simple DataFrame
df = spark.createDataFrame(
    [(1, "Alice"), (2, "Bob"), (3, "Charlie")],
    ["id", "name"],
)

# Write the DataFrame to the Iceberg table
df.write \
    .format("iceberg") \
    .mode("append") \
    .save("default.my_table")

# Read the data back from the Iceberg table
result_df = spark.read \
    .format("iceberg") \
    .load("default.my_table")

result_df.show()
```
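
You can also query the table with Spark SQL instead of the DataFrame reader; this continues the session configured above:

```py
# Equivalent read using Spark SQL against the default catalog
spark.sql("SELECT * FROM default.my_table ORDER BY id").show()
```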