cloudflare
diff --git a/‎src/content/docs/r2/data-catalog/config-examples/index.mdx‎
Lines changed: 16 additions & 0 deletions b/‎src/content/docs/r2/data-catalog/config-examples/index.mdx‎
Lines changed: 16 additions & 0 deletions
diff --git a/‎src/content/docs/r2/data-catalog/config-examples/pyiceberg.mdx‎
Lines changed: 50 additions & 0 deletions b/‎src/content/docs/r2/data-catalog/config-examples/pyiceberg.mdx‎
Lines changed: 50 additions & 0 deletions
diff --git a/‎src/content/docs/r2/data-catalog/get-started.mdx‎
Lines changed: 291 additions & 0 deletions b/‎src/content/docs/r2/data-catalog/get-started.mdx‎
Lines changed: 291 additions & 0 deletions
@@ -0,0 +1,16 @@
+---
+pcx_content_type: navigation
+title: Configuration examples
+head: []
+sidebar:
+  order: 3
+  group:
+    hideIndex: true
+description: Find detailed setup instructions for Apache Spark and other common query engines.
+---
+
+import { DirectoryListing } from "~/components";
+
+Below are configuration examples to connect various Iceberg engines to [R2 Data Catalog](/r2/data-catalog/):
+
+<DirectoryListing />
@@ -0,0 +1,50 @@
+---
+title: PyIceberg
+pcx_content_type: example
+---
+
+Below is an example of using [PyIceberg](https://py.iceberg.apache.org/) to connect to R2 Data Catalog.
+
+## Prerequisites
+
+- Sign up for a [Cloudflare account](https://dash.cloudflare.com/sign-up/workers-and-pages).
+- Create an [R2 bucket](/r2/buckets/) and enable the data catalog.
+- Create an [R2 API token](/r2/api/s3/tokens/) with both R2 and data catalog permissions.
+- Install the [PyIceberg](https://py.iceberg.apache.org/#installation) and [PyArrow](https://arrow.apache.org/docs/python/install.html) libraries.
+
+## Example usage
+
+```py
+import pyarrow as pa
+from pyiceberg.catalog.rest import RestCatalog
+from pyiceberg.exceptions import NamespaceAlreadyExistsError
+
+# Define catalog connection details (replace variables)
+WAREHOUSE = "<WAREHOUSE>"
+TOKEN = "<TOKEN>"
+CATALOG_URL = f"https://catalog.cloudflarestorage.com/{WAREHOUSE}"
+
+# Connect to R2 Data Catalog
+catalog = RestCatalog(
+    name="my_catalog",
+    warehouse=WAREHOUSE,
+    uri=CATALOG_URL,
+    token=TOKEN,
+)
+
+# Create default namespace
+catalog.create_namespace("default")
+
+# Create simple PyArrow table
+df = pa.table({
+    "id": [1, 2, 3],
+    "name": ["Alice", "Bob", "Charlie"],
+})
+
+# Create an Iceberg table
+test_table = ("default", "my_table")
+table = catalog.create_table(
+    test_table,
+    schema=df.schema,
+)
+```
@@ -0,0 +1,291 @@
+---
+pcx_content_type: get-started
+title: Get Started
+head: []
+sidebar:
+  order: 2
+description: Learn how to enable the R2 Data Catalog on your bucket, load sample data, and run your first query.
+---
+
+import {
+	Render,
+	PackageManagers,
+	Steps,
+	FileTree,
+	Tabs,
+	TabItem,
+	TypeScriptExample,
+	WranglerConfig,
+	LinkCard,
+} from "~/components";
+
+## Overview
+
+This guide will instruct you through:
+
+- Creating your first [R2 bucket](/r2/buckets/) and enabling its [data catalog](/r2/data-catalog/).
+- Creating an API token needed for query engines to authenticate with your data catalog.
+- Using [PyIceberg](https://py.iceberg.apache.org/) to create your first Iceberg table in a [marimo](https://marimo.io/) Python notebook.
+- Using [PyIceberg](https://py.iceberg.apache.org/) to load sample data into your table and query it.
+
+## Prerequisites
+
+<Render file="prereqs" product="workers" />
+
+## 1. Create an R2 bucket
+
+<Tabs syncKey='CLIvDash'>
+<TabItem label='Wrangler CLI'>
+
+<Steps>
+1. If not already logged in, run:
+
+    	```
+    	npx wrangler login
+    	```
+
+2.  Then, enable the catalog on your chosen R2 bucket:
+
+        ```
+        npx wrangler r2 bucket r2-data-catalog-tutorial
+        ```
+
+</Steps>
+
+</TabItem>
+<TabItem label='Dashboard'>
+
+<Steps>
+1. From the Cloudflare dashboard, select **R2 Object Storage** from the sidebar.
+2. Select the bucket you want to enable as a data catalog.
+3. Switch to the **Settings** tab, scroll down to **R2 Data Catalog**, and select **Enable**.
+4. Once enabled, note the **Catalog URI** and **Warehouse name**.
+</Steps>
+</TabItem>
+</Tabs>
+
+## 2. Enable the data catalog for your bucket
+
+<Tabs syncKey='CLIvDash'>
+<TabItem label='Wrangler CLI'>
+
+Then, enable the catalog on your chosen R2 bucket:
+
+        ```
+        npx wrangler r2 bucket catalog enable r2-data-catalog-tutorial
+        ```
+
+</TabItem>
+<TabItem label='Dashboard'>
+
+<Steps>
+1. From the Cloudflare dashboard, select **R2 Object Storage** from the sidebar.
+2. Select the bucket you want to enable as a data catalog.
+3. Switch to the **Settings** tab, scroll down to **R2 Data Catalog**, and select **Enable**.
+4. Once enabled, note the **Catalog URI** and **Warehouse name**.
+</Steps>
+</TabItem>
+</Tabs>
+
+## 3. Create an API token
+
+Iceberg clients (including [PyIceberg](https://py.iceberg.apache.org/)) must authenticate to the catalog with a [Cloudflare API token](/fundamentals/api/get-started/create-token/) that has both R2 and catalog permissions.
+
+<Steps>
+1. From the Cloudflare dashboard, select **R2 Object Storage** from the sidebar.
+
+2. Expand the **API** dropdown and select **Manage API tokens**.
+
+3. Select **Create API token**.
+
+4. Select the **R2 Token** text to edit your API token name.
+
+5. Under **Permissions**, choose the **Admin Read & Write** permission.
+
+6. Select **Create API Token**.
+
+7. Note the **Token value**, you will need this.
+
+</Steps>
+
+## 4. Install uv
+
+Next, you'll need to install a Python package manager, in this guide we'll be using [uv](https://docs.astral.sh/uv/). If you don't already have uv installed, follow the [installing uv guide](https://docs.astral.sh/uv/getting-started/installation/).
+
+## 5. Install marimo
+
+We'll be using [marimo](https://github.com/marimo-team/marimo) as a Python notebook.
+
+<Steps>
+1. Create a directory where our notebook will live:
+
+    	```
+    	mkdir r2-data-catalog-notebook
+    	```
+
+2.  Change into our new directory:
+
+        ```
+        cd r2-data-catalog-notebook
+        ```
+
+3.  Create a new Python virtual environment:
+
+        ```
+        uv venv
+        ```
+
+4.  Activate the Python virtual environment:
+
+        ```
+        source .venv/bin/activate
+        ```
+
+5.  Install marimo with uv:
+
+        ```py
+        uv pip install marimo
+        ```
+
+</Steps>
+
+## 6. Create a Python notebook to interact with our data warehouse
+
+<Steps>
+1. Create a file called `r2-data-catalog-tutorial.py`.
+
+2.  Paste the following code snippet into your `r2-data-catalog-tutorial.py` file:
+
+        ```py
+        import marimo
+
+        __generated_with = "0.11.31"
+        app = marimo.App(width="medium")
+
+
+        @app.cell
+        def _():
+        		import marimo as mo
+        		return (mo,)
+
+
+        @app.cell
+        def _():
+        		import pandas
+        		import pyarrow as pa
+        		import pyarrow.compute as pc
+        		import pyarrow.parquet as pq
+
+        		from pyiceberg.catalog.rest import RestCatalog
+        		from pyiceberg.exceptions import NamespaceAlreadyExistsError
+
+        		# Define catalog connection details (replace variables)
+        		WAREHOUSE = "<WAREHOUSE>"
+        		TOKEN = "<TOKEN>"
+        		CATALOG_URL = f"https://catalog.cloudflarestorage.com/{WAREHOUSE}"
+
+        		# Connect to R2 Data Catalog
+        		catalog = RestCatalog(
+        				name="my_catalog",
+        				warehouse=WAREHOUSE,
+        				uri=CATALOG_URL,
+        				token=TOKEN,
+        		)
+        		return (
+        				CATALOG_URL,
+        				NamespaceAlreadyExistsError,
+        				RestCatalog,
+        				TOKEN,
+        				WAREHOUSE,
+        				catalog,
+        				pa,
+        				pandas,
+        				pc,
+        				pq,
+        		)
+
+
+        @app.cell
+        def _(NamespaceAlreadyExistsError, catalog):
+        		# Create default namespace if needed
+        		try:
+        				catalog.create_namespace("default")
+        		except NamespaceAlreadyExistsError:
+        				pass
+        		return
+
+
+        @app.cell
+        def _(pa):
+        		# Create simple PyArrow table
+        		df = pa.table({
+        				"id": [1, 2, 3],
+        				"name": ["Alice", "Bob", "Charlie"],
+        				"score": [80.0, 92.5, 88.0],
+        		})
+        		return (df,)
+
+
+        @app.cell
+        def _(catalog, df):
+        		# Create or load Iceberg table
+        		test_table = ("default", "people")
+        		if not catalog.table_exists(test_table):
+        				print(f"Creating table: {test_table}")
+        				table = catalog.create_table(
+        						test_table,
+        						schema=df.schema,
+        				)
+        		else:
+        				table = catalog.load_table(test_table)
+        		return table, test_table
+
+
+        @app.cell
+        def _(df, table):
+        		# Append data
+        		table.append(df)
+        		return
+
+
+        @app.cell
+        def _(table):
+        		print("Table contents:")
+        		scanned = table.scan().to_arrow()
+        		print(scanned.to_pandas())
+        		return (scanned,)
+
+
+        @app.cell
+        def _():
+        		# Optional cleanup. To run uncomment and run cell
+        		# print(f"Deleting table: {test_table}")
+        		# catalog.drop_table(test_table)
+        		# print("Table dropped.")
+        		return
+
+
+        if __name__ == "__main__":
+        		app.run()
+        ```
+
+3.  Replace the `WAREHOUSE` and `TOKEN` variables with your values from sections **2** and **3** respectively.
+
+</Steps>
+In the Python notebook above, you:
+
+1. Connect to your catalog.
+2. Create the `default` namespace.
+3. Create a simple PyArrow table.
+4. Create (or load) the `people` table in the `default` namespace.
+5. Append sample data to the table.
+6. Print the contents of the table.
+7. (Optional) Drop the `people` table we created for this tutorial.
+
+## Learn more
+
+<LinkCard
+	title="Configuration examples"
+	href="/r2/data-catalog/config-examples/"
+	description="Find detailed setup instructions for Apache Spark and other common query engines."
+/>