Merged

33 commits
cb03a30
initial docs
Marcinthecloud Sep 18, 2025
1cbebac
fixed link in index
Marcinthecloud Sep 18, 2025
dd0e8d5
fix indents in index, add query-data
Marcinthecloud Sep 19, 2025
87e5a32
Improved all docs, added index.mdx in platform
Marcinthecloud Sep 19, 2025
b8abf91
removed redundant command
Marcinthecloud Sep 19, 2025
1f9632f
A ton of changes and improvements
Marcinthecloud Sep 20, 2025
9512bde
Update get-started.mdx
Marcinthecloud Sep 20, 2025
579cbf2
Update end-to-end-pipeline.mdx
Marcinthecloud Sep 20, 2025
3b1acc7
added dash steps/tabs, moved out of r2, reformatted most of the R2 SQ…
Marcinthecloud Sep 22, 2025
5a27768
added new R2 SQL token env variable
Marcinthecloud Sep 22, 2025
dde1d62
adding wrangler commands
Marcinthecloud Sep 22, 2025
74b405c
Update .gitignore
Marcinthecloud Sep 22, 2025
d08b951
PCX Review
Oxyjun Sep 23, 2025
741c9ed
Update src/content/docs/r2-sql/reference/limitations-best-practices.mdx
Marcinthecloud Sep 23, 2025
79680f6
adding improvements from the latest round of reviews
Marcinthecloud Sep 23, 2025
39c0e8c
Merge branch 'mselwan-pipeline' of https://github.com/Marcinthecloud/…
Marcinthecloud Sep 23, 2025
be11616
fixed min permissions needed
Marcinthecloud Sep 23, 2025
205b4a4
more improvements from reviews
Marcinthecloud Sep 23, 2025
f199743
Merge branch 'mselwan-pipeline' of https://github.com/Marcinthecloud/…
Marcinthecloud Sep 23, 2025
f5d33f5
Update src/content/docs/r2-sql/platform/pricing.mdx
Marcinthecloud Sep 23, 2025
5004c64
Update src/content/docs/r2-sql/reference/limitations-best-practices.mdx
Marcinthecloud Sep 23, 2025
b7693e4
Update src/content/docs/r2-sql/reference/sql-reference.mdx
Marcinthecloud Sep 23, 2025
ca29d7d
Adding improvements from Nikita's review
Marcinthecloud Sep 23, 2025
b15a5a0
Merge branch 'mselwan-pipeline' of https://github.com/Marcinthecloud/…
Marcinthecloud Sep 23, 2025
8318059
changed the getting started to match Pipelines for consistency
Marcinthecloud Sep 23, 2025
5a67919
Small formatting + other changes
jonesphillip Sep 23, 2025
8f7f5e9
fixed typo and added the changelog
Marcinthecloud Sep 23, 2025
803fcd1
Add redirect for troubleshooting guide
jonesphillip Sep 23, 2025
c88356a
Chagnes to changelog
jonesphillip Sep 23, 2025
501babb
Fixes pipeline, broken links
jonesphillip Sep 24, 2025
6f1c557
adding our official r2-sql icon
Marcinthecloud Sep 24, 2025
83ab7c9
addressing Yevgen's feedback
Marcinthecloud Sep 24, 2025
2b18964
fix typo in getting started for r2 sql
jonesphillip Sep 24, 2025
2 changes: 1 addition & 1 deletion .gitignore
@@ -29,4 +29,4 @@ pnpm-debug.log*
/assets/secrets
/worker/functions/

.idea
.idea
2 changes: 1 addition & 1 deletion src/content/dash-routes/index.json
@@ -261,7 +261,7 @@
},
{
"name": "Pipelines",
"deeplink": "/?to=/:account/workers/pipelines",
"deeplink": "/?to=/:account/pipelines",
"parent": ["Storage & Databases"]
},
{
350 changes: 350 additions & 0 deletions src/content/docs/r2-sql/get-started.mdx
@@ -0,0 +1,350 @@
---
pcx_content_type: get-started
title: Getting started
head: []
sidebar:
order: 2
description: Learn how to get up and running with R2 SQL using R2 Data Catalog and Pipelines
---
import {
Render,
Steps,
Tabs,
TabItem,
DashButton,
LinkCard,
} from "~/components";

## Overview

This guide will walk you through:

- Creating an [R2 bucket](/r2/buckets/) and enabling its [data catalog](/r2/data-catalog/).
- Using Wrangler to create a Pipeline stream, a sink, and the SQL statement that reads from the stream and writes to the sink.
- Sending some data to the stream via the HTTP Streams endpoint.
- Querying the data using R2 SQL.

## Prerequisites

1. Sign up for a [Cloudflare account](https://dash.cloudflare.com/sign-up).
2. Install [Node.js](https://nodejs.org/en/).
3. Install [Wrangler](/workers/wrangler/install-and-update/).

:::note[Node.js version manager]
Use a Node version manager like [Volta](https://volta.sh/) or [nvm](https://github.com/nvm-sh/nvm) to avoid permission issues and change Node.js versions.

Wrangler requires a Node version of 16.17.0 or later.
:::
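
You can quickly confirm that your installed Node.js version meets this requirement:

```bash
# Wrangler requires Node.js 16.17.0 or later
node --version
```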

## 1. Set up authentication

You will need API tokens to interact with Cloudflare services.

<Steps>
1. In the Cloudflare dashboard, go to the **R2 object storage** page.

<DashButton url="/?to=/:account/r2/overview" />

2. Select **Manage API tokens**.

3. Select **Create API token**.

4. Select the **R2 Token** text to edit your API token name.

5. Under **Permissions**, choose the **Admin Read & Write** permission.

6. Select **Create API Token**.

7. Note the **Token value**.

</Steps>

Export your new token as an environment variable:

```bash
export WRANGLER_R2_SQL_AUTH_TOKEN= #paste your token here
```

If this is your first time using Wrangler, make sure to log in:

```bash
npx wrangler login
```
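
To confirm Wrangler is authenticated before continuing, `wrangler whoami` prints the account you are logged in to:

```bash
npx wrangler whoami
```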

## 2. Create an R2 bucket

<Tabs syncKey='CLIvDash'>
<TabItem label='Wrangler CLI'>

Create an R2 bucket:

```bash
npx wrangler r2 bucket create r2-sql-demo
```

</TabItem>
<TabItem label='Dashboard'>

<Steps>
1. In the Cloudflare dashboard, go to the **R2 object storage** page.

<DashButton url="/?to=/:account/r2/overview" />

2. Select **Create bucket**.

3. Enter the bucket name: `r2-sql-demo`

4. Select **Create bucket**.
</Steps>
</TabItem>
</Tabs>
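
Either way, you can verify that the bucket exists by listing your buckets from the CLI:

```bash
npx wrangler r2 bucket list
```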

## 3. Enable R2 Data Catalog

<Tabs syncKey='CLIvDash'>
<TabItem label='Wrangler CLI'>

Enable the catalog on your R2 bucket:

```bash
npx wrangler r2 bucket catalog enable r2-sql-demo
```

When you run this command, take note of the **Warehouse** in the output. You will need it later.

</TabItem>
<TabItem label='Dashboard'>

<Steps>
1. In the Cloudflare dashboard, go to the **R2 object storage** page.

<DashButton url="/?to=/:account/r2/overview" />

2. Select the bucket: `r2-sql-demo`.

3. Switch to the **Settings** tab, scroll down to **R2 Data Catalog**, and select **Enable**.

4. Once enabled, note the **Catalog URI** and **Warehouse name**.
</Steps>
</TabItem>
</Tabs>


:::note
Copy the warehouse (ACCOUNTID_BUCKETNAME) and paste it in the `export` below. We will use it later in the tutorial.
:::

```bash
export WAREHOUSE= #Paste your warehouse here
```
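
Since the warehouse is simply your account ID and bucket name joined by an underscore, you can also build it from a variable if you already have your account ID handy (a sketch; `ACCOUNT_ID` is a placeholder you would set yourself):

```bash
# Assumes ACCOUNT_ID holds your Cloudflare account ID
export WAREHOUSE="${ACCOUNT_ID}_r2-sql-demo"
```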

## 4. Create the data Pipeline

<Tabs syncKey='CLIvDash'>
<TabItem label='Wrangler CLI'>

### 4.1. Create the Pipeline Stream

First, create a schema file called `demo_schema.json` with the following `json` schema:

```json
{
"fields": [
{"name": "user_id", "type": "int64", "required": true},
{"name": "payload", "type": "string", "required": false},
{"name": "numbers", "type": "int32", "required": false}
]
}
```
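
If you prefer to stay in the terminal, one way to create the file is with a heredoc (same schema as above):

```bash
cat > demo_schema.json <<'EOF'
{
  "fields": [
    {"name": "user_id", "type": "int64", "required": true},
    {"name": "payload", "type": "string", "required": false},
    {"name": "numbers", "type": "int32", "required": false}
  ]
}
EOF
```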
Next, create the stream that will ingest events:

```bash
npx wrangler pipelines streams create demo_stream \
--schema-file demo_schema.json \
--http-enabled true \
--http-auth false
```
:::note
Note the **HTTP Ingest Endpoint URL** from the output. This is the endpoint you will use to send data to your pipeline.
:::

```bash
export STREAM_ENDPOINT= #paste the HTTP ingest endpoint from the output (see example below)
```

The output should look like this:

```sh
🌀 Creating stream 'demo_stream'...
✨ Successfully created stream 'demo_stream' with id 'stream_id'.

Creation Summary:
General:
Name: demo_stream

HTTP Ingest:
Enabled: Yes
Authentication: No
Endpoint: https://stream_id.ingest.cloudflare.com
CORS Origins: None

Input Schema:
┌────────────┬────────┬────────────┬──────────┐
│ Field Name │ Type │ Unit/Items │ Required │
├────────────┼────────┼────────────┼──────────┤
│ user_id │ int64 │ │ Yes │
├────────────┼────────┼────────────┼──────────┤
│ payload │ string │ │ No │
├────────────┼────────┼────────────┼──────────┤
│ numbers │ int32 │ │ No │
└────────────┴────────┴────────────┴──────────┘
```
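
If you lose track of the endpoint, you should be able to retrieve it again by listing your streams (assuming your Wrangler version includes this subcommand):

```bash
npx wrangler pipelines streams list
```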

### 4.2. Create the Pipeline Sink

Create a sink that writes data to your R2 bucket as Apache Iceberg tables:

```bash
npx wrangler pipelines sinks create demo_sink \
--type "r2-data-catalog" \
--bucket "r2-sql-demo" \
--roll-interval 30 \
--namespace "demo" \
--table "first_table" \
--catalog-token $WRANGLER_R2_SQL_AUTH_TOKEN
```

:::note
This creates a `sink` configuration that will write to the Iceberg table `demo.first_table` in your R2 Data Catalog every 30 seconds. Pipelines automatically appends an `__ingest_ts` column that is used to partition the table by `DAY`.
:::

### 4.3. Create the Pipeline

A pipeline is a SQL statement that reads data from the stream, optionally transforms it, and writes it to the sink.

```bash
npx wrangler pipelines create demo_pipeline \
--sql "INSERT INTO demo_sink SELECT * FROM demo_stream WHERE numbers > 5;"
```
:::note
Note that there is a filter on this statement that will only send events where `numbers` is greater than 5.
:::
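
To confirm the pipeline was created, you can list your pipelines (again assuming your Wrangler version includes this subcommand):

```bash
npx wrangler pipelines list
```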

</TabItem>
<TabItem label='Dashboard'>
<Steps>
1. In the Cloudflare dashboard, go to the **Pipelines** page.

<DashButton url="/?to=/:account/pipelines" />

2. Select **Create Pipeline**.

3. **Connect to a Stream**:
- Pipeline name: `demo`
- Enable HTTP endpoint for sending data: Enabled
- HTTP authentication: Disabled (default)
- Select **Next**

4. **Define Input Schema**:
- Select **JSON editor**
- Copy in the schema:
```json
{
"fields": [
{"name": "user_id", "type": "int64", "required": true},
{"name": "payload", "type": "string", "required": false},
{"name": "numbers", "type": "int32", "required": false}
]
}
```

- Select **Next**

5. **Define Sink**:
- Select your R2 bucket: `r2-sql-demo`
- Storage type: **R2 Data Catalog**
- Namespace: `demo`
- Table name: `first_table`
- **Advanced Settings**: Change **Maximum Time Interval** to `30 seconds`
- Select **Next**

6. **Credentials**:
- Disable **Automatically create an Account API token for your sink**
- Enter **Catalog Token** from step 1
- Select **Next**

7. **Pipeline Definition**:
- Leave the default SQL query:
```sql
INSERT INTO demo_sink SELECT * FROM demo_stream;
```
- Select **Create Pipeline**

8. :::note
Note the **HTTP Ingest Endpoint URL** from the output. This is the endpoint you will use to send data to your pipeline.
:::

</Steps>

```bash
export STREAM_ENDPOINT= #paste the HTTP ingest endpoint from the output
```
</TabItem>
</Tabs>


## 5. Send some data

Next, send some events to our stream:

```bash
curl -X POST "$STREAM_ENDPOINT" \
-H "Content-Type: application/json" \
-d '[
{
"user_id": 1,
"payload": "you should see this",
"numbers": 42
},
{
"user_id": 2,
"payload": "you should also see this",
"numbers": 100
},
{
"user_id": 3,
"payload": null,
"numbers": 1
},
{
"user_id": 4,
"numbers": null
}
]'
```

This sends four events in one `POST`. Since our pipeline only keeps records where `numbers` is greater than 5, the events for `user_id` `3` and `4` should not appear in the table. Feel free to change values and send more events, as in the sketch below.
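
For example, a small shell loop can send a batch of extra events with random `numbers` values, reusing the `STREAM_ENDPOINT` variable from earlier:

```bash
# Send 10 events with random numbers between 0 and 19;
# only those greater than 5 should pass the pipeline filter
for i in $(seq 1 10); do
  curl -s -X POST "$STREAM_ENDPOINT" \
    -H "Content-Type: application/json" \
    -d "[{\"user_id\": $i, \"payload\": \"event $i\", \"numbers\": $((RANDOM % 20))}]"
done
```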

## 6. Query the table with R2 SQL

After you have sent your events to the stream, it will take about 30 seconds for the data to appear in the table, since that is the roll interval we configured on the sink.

```bash
npx wrangler r2 sql query "$WAREHOUSE" "SELECT * FROM demo.first_table LIMIT 10"
```
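
Once rows appear, you can filter and project in the query as well. For example, this returns only the higher `numbers` values along with the `__ingest_ts` column that Pipelines appends (a sketch, assuming standard `WHERE` and `LIMIT` support in R2 SQL):

```bash
npx wrangler r2 sql query "$WAREHOUSE" \
  "SELECT user_id, payload, numbers, __ingest_ts FROM demo.first_table WHERE numbers > 50 LIMIT 10"
```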

## Additional resources

<LinkCard
title="Managing R2 Data Catalogs"
href="/r2/data-catalog/manage-catalogs/"
description="Enable or disable R2 Data Catalog on your bucket, retrieve configuration details, and authenticate your Iceberg engine."
/>

<LinkCard
title="Try another example"
href="/r2-sql/tutorials/end-to-end-pipeline"
description="Detailed tutorial for setting up a simple fraud detection data pipeline, and generate events for it in Python."
/>