22 changes: 22 additions & 0 deletions src/content/changelog/r2/2025-04-10-r2-data-catalog-beta.mdx
@@ -0,0 +1,22 @@
---
title: R2 Data Catalog is a managed Apache Iceberg data catalog built directly into R2 buckets
description: A managed Apache Iceberg data catalog built directly into R2 buckets
products:
- r2
date: 2025-04-10T13:00:00Z
hidden: true
---

Today, we're launching [R2 Data Catalog](/r2/data-catalog/) in open beta, a managed Apache Iceberg catalog built directly into your [Cloudflare R2](/r2/) bucket.

If you're not already familiar with it, [Apache Iceberg](https://iceberg.apache.org/) is an open table format designed to handle large-scale analytics datasets stored in object storage, offering ACID transactions and schema evolution. R2 Data Catalog exposes a standard Iceberg REST catalog interface, so you can connect engines like [Spark](/r2/data-catalog/config-examples/spark-scala/), [Snowflake](/r2/data-catalog/config-examples/snowflake/), and [PyIceberg](/r2/data-catalog/config-examples/pyiceberg/) to start querying your tables using the tools you already know.

To enable a data catalog on your R2 bucket, find **R2 Data Catalog** in your bucket's settings in the dashboard, or run:

```bash
npx wrangler r2 bucket catalog enable my-bucket
```

And that's it. You'll get a catalog URI and warehouse name you can plug into your favorite Iceberg engines.
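
For instance, here's a minimal connection sketch using PyIceberg's `load_catalog` (the catalog name `r2` and the placeholder values are illustrative; see the engine-specific examples linked above for full walkthroughs):

```py
from pyiceberg.catalog import load_catalog

# Connect to R2 Data Catalog over the Iceberg REST protocol
# (replace the placeholders with your catalog URI, warehouse, and R2 API token)
catalog = load_catalog(
    "r2",
    type="rest",
    uri="<CATALOG_URI>",
    warehouse="<WAREHOUSE>",
    token="<TOKEN>",
)

print(catalog.list_namespaces())
```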

Visit our [getting started guide](/r2/data-catalog/get-started/) for step-by-step instructions on enabling R2 Data Catalog, creating tables, and running your first queries.
18 changes: 12 additions & 6 deletions src/content/docs/r2/api/tokens.mdx
@@ -45,12 +45,18 @@ Jurisdictional buckets can only be accessed via the corresponding jurisdictional

## Permissions

| Permission          | Description                                                                                                                                                                           |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Admin Read & Write  | Allows the ability to create, list, and delete buckets, edit bucket configuration, read, write, and list objects, and read and write to data catalog tables and associated metadata. |
| Admin Read only     | Allows the ability to list buckets and view bucket configuration, read and list objects, and read from data catalog tables and associated metadata.                                  |
| Object Read & Write | Allows the ability to read, write, and list objects in specific buckets.                                                                                                              |
| Object Read only    | Allows the ability to read and list objects in specific buckets.                                                                                                                      |

:::note

Currently, **Admin Read & Write** or **Admin Read only** permission is required to interact with [R2 Data Catalog](/r2/data-catalog/).

:::

## Create API tokens via API

16 changes: 16 additions & 0 deletions src/content/docs/r2/data-catalog/config-examples/index.mdx
@@ -0,0 +1,16 @@
---
pcx_content_type: navigation
title: Connect to Iceberg engines
head: []
sidebar:
order: 4
group:
hideIndex: true
description: Find detailed setup instructions for Apache Spark and other common query engines.
---

import { DirectoryListing } from "~/components";

Below are configuration examples for connecting various Iceberg engines to [R2 Data Catalog](/r2/data-catalog/):

<DirectoryListing />
50 changes: 50 additions & 0 deletions src/content/docs/r2/data-catalog/config-examples/pyiceberg.mdx
@@ -0,0 +1,50 @@
---
title: PyIceberg
pcx_content_type: example
---

Below is an example of using [PyIceberg](https://py.iceberg.apache.org/) to connect to R2 Data Catalog.

## Prerequisites

- Sign up for a [Cloudflare account](https://dash.cloudflare.com/sign-up/workers-and-pages).
- [Create an R2 bucket](/r2/buckets/create-buckets/) and [enable the data catalog](/r2/data-catalog/manage-catalogs/#enable-r2-data-catalog-on-a-bucket).
- [Create an R2 API token](/r2/api/tokens/) with both [R2 and data catalog permissions](/r2/api/tokens/#permissions).
- Install the [PyIceberg](https://py.iceberg.apache.org/#installation) and [PyArrow](https://arrow.apache.org/docs/python/install.html) libraries.

## Example usage

```py
import pyarrow as pa
from pyiceberg.catalog.rest import RestCatalog
from pyiceberg.exceptions import NamespaceAlreadyExistsError

# Define catalog connection details (replace variables)
WAREHOUSE = "<WAREHOUSE>"
TOKEN = "<TOKEN>"
CATALOG_URI = "<CATALOG_URI>"

# Connect to R2 Data Catalog
catalog = RestCatalog(
    name="my_catalog",
    warehouse=WAREHOUSE,
    uri=CATALOG_URI,
    token=TOKEN,
)

# Create default namespace if it does not already exist
try:
    catalog.create_namespace("default")
except NamespaceAlreadyExistsError:
    pass

# Create simple PyArrow table
df = pa.table({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
})

# Create an Iceberg table
test_table = ("default", "my_table")
table = catalog.create_table(
    test_table,
    schema=df.schema,
)
```
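
Assuming a recent PyIceberg release (0.6 or later), you can then append data to the table and read it back (a short sketch continuing the example above):

```py
# Append the PyArrow data to the Iceberg table
table.append(df)

# Read the table back into a PyArrow table and print it
print(table.scan().to_arrow())
```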
62 changes: 62 additions & 0 deletions src/content/docs/r2/data-catalog/config-examples/snowflake.mdx
@@ -0,0 +1,62 @@
---
title: Snowflake
pcx_content_type: example
---

Below is an example of using [Snowflake](https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-catalog-integration-rest) to connect to and query data from R2 Data Catalog (read-only).

## Prerequisites

- Sign up for a [Cloudflare account](https://dash.cloudflare.com/sign-up/workers-and-pages).
- [Create an R2 bucket](/r2/buckets/create-buckets/) and [enable the data catalog](/r2/data-catalog/manage-catalogs/#enable-r2-data-catalog-on-a-bucket).
- [Create an R2 API token](/r2/api/tokens/) with both [R2 and data catalog permissions](/r2/api/tokens/#permissions).
- A [Snowflake](https://www.snowflake.com/) account with the necessary privileges to create external volumes and catalog integrations.

## Example usage

In your Snowflake [SQL worksheet](https://docs.snowflake.com/en/user-guide/ui-snowsight-worksheets-gs) or [notebook](https://docs.snowflake.com/en/user-guide/ui-snowsight/notebooks), run the following commands:

```sql
-- Create a database (if you don't already have one) to organize your external data
CREATE DATABASE IF NOT EXISTS r2_example_db;

-- Use the database for the statements that follow
USE DATABASE r2_example_db;

-- Create an external volume pointing to your R2 bucket
CREATE OR REPLACE EXTERNAL VOLUME ext_vol_r2
  STORAGE_LOCATIONS = (
    (
      NAME = 'my_r2_storage_location'
      STORAGE_PROVIDER = 'S3COMPAT'
      STORAGE_BASE_URL = 's3compat://<bucket-name>'
      CREDENTIALS = (
        AWS_KEY_ID = '<access_key>'
        AWS_SECRET_KEY = '<secret_access_key>'
      )
      STORAGE_ENDPOINT = '<account_id>.r2.cloudflarestorage.com'
    )
  )
  ALLOW_WRITES = FALSE;

-- Create a catalog integration for R2 Data Catalog (read-only)
CREATE OR REPLACE CATALOG INTEGRATION r2_data_catalog
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'default'
  REST_CONFIG = (
    CATALOG_URI = '<catalog_uri>'
    CATALOG_NAME = '<warehouse_name>'
  )
  REST_AUTHENTICATION = (
    TYPE = BEARER
    BEARER_TOKEN = '<token>'
  )
  ENABLED = TRUE;

-- Create an Apache Iceberg table in your selected Snowflake database
CREATE ICEBERG TABLE my_iceberg_table
  CATALOG = 'r2_data_catalog'
  EXTERNAL_VOLUME = 'ext_vol_r2'
  CATALOG_TABLE_NAME = 'my_table'; -- Name of an existing table in your R2 data catalog

-- Query your Iceberg table
SELECT * FROM my_iceberg_table;
```
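
Before creating the table, you can optionally sanity-check the integration. `SHOW` and `DESCRIBE` are standard Snowflake commands; the `SYSTEM$VERIFY_CATALOG_INTEGRATION` check is assumed to be available in your account:

```sql
-- List catalog integrations visible to your role
SHOW CATALOG INTEGRATIONS;

-- Inspect the integration's configuration
DESCRIBE CATALOG INTEGRATION r2_data_catalog;

-- Verify connectivity to the remote catalog (assumed available in your account)
SELECT SYSTEM$VERIFY_CATALOG_INTEGRATION('r2_data_catalog');
```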
@@ -0,0 +1,71 @@
---
title: Spark (PySpark)
pcx_content_type: example
---

Below is an example of using [PySpark](https://spark.apache.org/docs/latest/api/python/index.html) to connect to R2 Data Catalog.

## Prerequisites

- Sign up for a [Cloudflare account](https://dash.cloudflare.com/sign-up/workers-and-pages).
- [Create an R2 bucket](/r2/buckets/create-buckets/) and [enable the data catalog](/r2/data-catalog/manage-catalogs/#enable-r2-data-catalog-on-a-bucket).
- [Create an R2 API token](/r2/api/tokens/) with both [R2 and data catalog permissions](/r2/api/tokens/#permissions).
- Install the [PySpark](https://spark.apache.org/docs/latest/api/python/getting_started/install.html) library.

## Example usage

```py
from pyspark.sql import SparkSession

# Define catalog connection details (replace variables)
WAREHOUSE = "<WAREHOUSE>"
TOKEN = "<TOKEN>"
CATALOG_URI = "<CATALOG_URI>"

# Build Spark session with Iceberg configurations
spark = SparkSession.builder \
    .appName("R2DataCatalogExample") \
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1,org.apache.iceberg:iceberg-aws-bundle:1.6.1") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.my_catalog.type", "rest") \
    .config("spark.sql.catalog.my_catalog.uri", CATALOG_URI) \
    .config("spark.sql.catalog.my_catalog.warehouse", WAREHOUSE) \
    .config("spark.sql.catalog.my_catalog.token", TOKEN) \
    .config("spark.sql.catalog.my_catalog.header.X-Iceberg-Access-Delegation", "vended-credentials") \
    .config("spark.sql.catalog.my_catalog.s3.remote-signing-enabled", "false") \
    .config("spark.sql.defaultCatalog", "my_catalog") \
    .getOrCreate()

spark.sql("USE my_catalog")

# Create namespace if it does not exist
spark.sql("CREATE NAMESPACE IF NOT EXISTS default")

# Create a table in the namespace using Iceberg
spark.sql("""
    CREATE TABLE IF NOT EXISTS default.my_table (
        id BIGINT,
        name STRING
    )
    USING iceberg
""")

# Create a simple DataFrame
df = spark.createDataFrame(
    [(1, "Alice"), (2, "Bob"), (3, "Charlie")],
    ["id", "name"],
)

# Write the DataFrame to the Iceberg table
df.write \
    .format("iceberg") \
    .mode("append") \
    .save("default.my_table")

# Read the data back from the Iceberg table
result_df = spark.read \
    .format("iceberg") \
    .load("default.my_table")

result_df.show()
```
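
You can also query the table with Spark SQL instead of the DataFrame reader; this continues the session configured above:

```py
# Equivalent read using Spark SQL against the default catalog
spark.sql("SELECT * FROM default.my_table ORDER BY id").show()
```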