Merged

33 commits
cb03a30
initial docs
Marcinthecloud Sep 18, 2025
1cbebac
fixed link in index
Marcinthecloud Sep 18, 2025
dd0e8d5
fix indents in index, add query-data
Marcinthecloud Sep 19, 2025
87e5a32
Improved all docs, added index.mdx in platform
Marcinthecloud Sep 19, 2025
b8abf91
removed redundant command
Marcinthecloud Sep 19, 2025
1f9632f
A ton of changes and improvements
Marcinthecloud Sep 20, 2025
9512bde
Update get-started.mdx
Marcinthecloud Sep 20, 2025
579cbf2
Update end-to-end-pipeline.mdx
Marcinthecloud Sep 20, 2025
3b1acc7
added dash steps/tabs, moved out of r2, reformatted most of the R2 SQ…
Marcinthecloud Sep 22, 2025
5a27768
added new R2 SQL token env variable
Marcinthecloud Sep 22, 2025
dde1d62
adding wrangler commands
Marcinthecloud Sep 22, 2025
74b405c
Update .gitignore
Marcinthecloud Sep 22, 2025
d08b951
PCX Review
Oxyjun Sep 23, 2025
741c9ed
Update src/content/docs/r2-sql/reference/limitations-best-practices.mdx
Marcinthecloud Sep 23, 2025
79680f6
adding improvements from the latest round of reviews
Marcinthecloud Sep 23, 2025
39c0e8c
Merge branch 'mselwan-pipeline' of https://github.com/Marcinthecloud/…
Marcinthecloud Sep 23, 2025
be11616
fixed min permissions needed
Marcinthecloud Sep 23, 2025
205b4a4
more improvements from reviews
Marcinthecloud Sep 23, 2025
f199743
Merge branch 'mselwan-pipeline' of https://github.com/Marcinthecloud/…
Marcinthecloud Sep 23, 2025
f5d33f5
Update src/content/docs/r2-sql/platform/pricing.mdx
Marcinthecloud Sep 23, 2025
5004c64
Update src/content/docs/r2-sql/reference/limitations-best-practices.mdx
Marcinthecloud Sep 23, 2025
b7693e4
Update src/content/docs/r2-sql/reference/sql-reference.mdx
Marcinthecloud Sep 23, 2025
ca29d7d
Adding improvements from Nikita's review
Marcinthecloud Sep 23, 2025
b15a5a0
Merge branch 'mselwan-pipeline' of https://github.com/Marcinthecloud/…
Marcinthecloud Sep 23, 2025
8318059
changed the getting started to match Pipelines for consistency
Marcinthecloud Sep 23, 2025
5a67919
Small formatting + other changes
jonesphillip Sep 23, 2025
8f7f5e9
fixed typo and added the changelog
Marcinthecloud Sep 23, 2025
803fcd1
Add redirect for troubleshooting guide
jonesphillip Sep 23, 2025
c88356a
Chagnes to changelog
jonesphillip Sep 23, 2025
501babb
Fixes pipeline, broken links
jonesphillip Sep 24, 2025
6f1c557
adding our official r2-sql icon
Marcinthecloud Sep 24, 2025
83ab7c9
addressing Yevgen's feedback
Marcinthecloud Sep 24, 2025
2b18964
fix typo in getting started for r2 sql
jonesphillip Sep 24, 2025
2 changes: 1 addition & 1 deletion .gitignore
@@ -29,4 +29,4 @@ pnpm-debug.log*
/assets/secrets
/worker/functions/

.idea
.idea
2 changes: 1 addition & 1 deletion src/content/dash-routes/index.json
@@ -261,7 +261,7 @@
},
{
"name": "Pipelines",
"deeplink": "/?to=/:account/workers/pipelines",
"deeplink": "/?to=/:account/pipelines",
"parent": ["Storage & Databases"]
},
{
350 changes: 350 additions & 0 deletions src/content/docs/r2-sql/get-started.mdx
@@ -0,0 +1,350 @@
---
pcx_content_type: get-started
title: Getting started
head: []
sidebar:
order: 2
description: Learn how to get up and running with R2 SQL using R2 Data Catalog and Pipelines
---
import {
Render,
Steps,
Tabs,
TabItem,
DashButton,
LinkCard,
} from "~/components";

## Overview

This guide will walk you through:

- Creating an [R2 bucket](/r2/buckets/) and enabling its [data catalog](/r2/data-catalog/).
- Using Wrangler to create a Pipeline stream, a sink, and the SQL statement that reads from the stream and writes to the sink.
- Sending some data to the stream via the HTTP Streams endpoint.
- Querying the data using R2 SQL.

## Prerequisites

1. Sign up for a [Cloudflare account](https://dash.cloudflare.com/sign-up).
2. Install [Node.js](https://nodejs.org/en/).
3. Install [Wrangler](/workers/wrangler/install-and-update/).

:::note[Node.js version manager]
Use a Node version manager like [Volta](https://volta.sh/) or [nvm](https://github.com/nvm-sh/nvm) to avoid permission issues and change Node.js versions.

Wrangler requires a Node version of 16.17.0 or later.
:::
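
You can quickly confirm that your installed Node.js version meets this requirement:

```bash
# Wrangler requires Node.js 16.17.0 or later
node --version
```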

## 1. Set up authentication

You will need API tokens to interact with Cloudflare services.

<Steps>
1. In the Cloudflare dashboard, go to the **R2 object storage** page.

<DashButton url="/?to=/:account/r2/overview" />

2. Select **Manage API tokens**.

3. Select **Create API token**.

4. Select the **R2 Token** text to edit your API token name.

5. Under **Permissions**, choose the **Admin Read & Write** permission.

6. Select **Create API Token**.

7. Note the **Token value**.

</Steps>

Export your new token as an environment variable:

```bash
export WRANGLER_R2_SQL_AUTH_TOKEN= #paste your token here
```

If this is your first time using Wrangler, make sure to log in:

```bash
npx wrangler login
```
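
To confirm Wrangler is authenticated before continuing, `wrangler whoami` prints the account you are logged in to:

```bash
npx wrangler whoami
```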

## 2. Create an R2 bucket

<Tabs syncKey='CLIvDash'>
<TabItem label='Wrangler CLI'>

Create an R2 bucket:

```bash
npx wrangler r2 bucket create r2-sql-demo
```

</TabItem>
<TabItem label='Dashboard'>

<Steps>
1. In the Cloudflare dashboard, go to the **R2 object storage** page.

<DashButton url="/?to=/:account/r2/overview" />

2. Select **Create bucket**.

3. Enter the bucket name: `r2-sql-demo`

4. Select **Create bucket**.
</Steps>
</TabItem>
</Tabs>
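
Either way, you can verify that the bucket exists by listing your buckets from the CLI:

```bash
npx wrangler r2 bucket list
```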

## 3. Enable R2 Data Catalog

<Tabs syncKey='CLIvDash'>
<TabItem label='Wrangler CLI'>

Enable the catalog on your R2 bucket:

```bash
npx wrangler r2 bucket catalog enable r2-sql-demo
```

When you run this command, take note of the **Warehouse** in the output. You will need it later.

</TabItem>
<TabItem label='Dashboard'>

<Steps>
1. In the Cloudflare dashboard, go to the **R2 object storage** page.

<DashButton url="/?to=/:account/r2/overview" />

2. Select the bucket: `r2-sql-demo`.

3. Switch to the **Settings** tab, scroll down to **R2 Data Catalog**, and select **Enable**.

4. Once enabled, note the **Catalog URI** and **Warehouse name**.
</Steps>
</TabItem>
</Tabs>


:::note
Copy the warehouse (ACCOUNTID_BUCKETNAME) and paste it in the `export` below. We will use it later in the tutorial.
:::

```bash
export WAREHOUSE= #Paste your warehouse here
```
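
Since the warehouse is simply your account ID and bucket name joined by an underscore, you can also build it from a variable if you already have your account ID handy (a sketch; `ACCOUNT_ID` is a placeholder you would set yourself):

```bash
# Assumes ACCOUNT_ID holds your Cloudflare account ID
export WAREHOUSE="${ACCOUNT_ID}_r2-sql-demo"
```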

## 4. Create the data Pipeline

<Tabs syncKey='CLIvDash'>
<TabItem label='Wrangler CLI'>

### 4.1. Create the Pipeline Stream

First, create a schema file called `demo_schema.json` with the following `json` schema:

```json
{
"fields": [
{"name": "user_id", "type": "int64", "required": true},
{"name": "payload", "type": "string", "required": false},
{"name": "numbers", "type": "int32", "required": false}
]
}
```
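
If you prefer to stay in the terminal, one way to create the file is with a heredoc (same schema as above):

```bash
cat > demo_schema.json <<'EOF'
{
  "fields": [
    {"name": "user_id", "type": "int64", "required": true},
    {"name": "payload", "type": "string", "required": false},
    {"name": "numbers", "type": "int32", "required": false}
  ]
}
EOF
```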
Next, create the stream that will ingest events:

```bash
npx wrangler pipelines streams create demo_stream \
--schema-file demo_schema.json \
--http-enabled true \
--http-auth false
```
:::note
Note the **HTTP Ingest Endpoint URL** from the output. This is the endpoint you will use to send data to your pipeline.
:::

```bash
export STREAM_ENDPOINT= #paste the HTTP ingest endpoint from the output (see example below)
```

The output should look like this:

```sh
🌀 Creating stream 'demo_stream'...
✨ Successfully created stream 'demo_stream' with id 'stream_id'.

Creation Summary:
General:
Name: demo_stream

HTTP Ingest:
Enabled: Yes
Authentication: No
Endpoint: https://stream_id.ingest.cloudflare.com
CORS Origins: None

Input Schema:
┌────────────┬────────┬────────────┬──────────┐
│ Field Name │ Type │ Unit/Items │ Required │
├────────────┼────────┼────────────┼──────────┤
│ user_id │ int64 │ │ Yes │
├────────────┼────────┼────────────┼──────────┤
│ payload │ string │ │ No │
├────────────┼────────┼────────────┼──────────┤
│ numbers │ int32 │ │ No │
└────────────┴────────┴────────────┴──────────┘
```
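
If you lose track of the endpoint, you should be able to retrieve it again by listing your streams (assuming your Wrangler version includes this subcommand):

```bash
npx wrangler pipelines streams list
```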

### 4.2. Create the Pipeline Sink

Create a sink that writes data to your R2 bucket as Apache Iceberg tables:

```bash
npx wrangler pipelines sinks create demo_sink \
--type "r2-data-catalog" \
--bucket "r2-sql-demo" \
--roll-interval 30 \
--namespace "demo" \
--table "first_table" \
--catalog-token $WRANGLER_R2_SQL_AUTH_TOKEN
```

:::note
This creates a `sink` configuration that will write to the Iceberg table `demo.first_table` in your R2 Data Catalog every 30 seconds. Pipelines automatically appends an `__ingest_ts` column that is used to partition the table by `DAY`.
:::

### 4.3. Create the Pipeline

A pipeline is a SQL statement that reads data from the stream, optionally transforms it, and writes it to the sink.

```bash
npx wrangler pipelines create demo_pipeline \
--sql "INSERT INTO demo_sink SELECT * FROM demo_stream WHERE numbers > 5;"
```
:::note
Note that there is a filter on this statement that will only send events where `numbers` is greater than 5.
:::
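
To confirm the pipeline was created, you can list your pipelines (again assuming your Wrangler version includes this subcommand):

```bash
npx wrangler pipelines list
```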

</TabItem>
<TabItem label='Dashboard'>
<Steps>
1. In the Cloudflare dashboard, go to the **Pipelines** page.

<DashButton url="/?to=/:account/pipelines" />

2. Select **Create Pipeline**.

3. **Connect to a Stream**:
- Pipeline name: `demo`
- Enable HTTP endpoint for sending data: Enabled
- HTTP authentication: Disabled (default)
- Select **Next**

4. **Define Input Schema**:
- Select **JSON editor**
- Copy in the schema:
```json
{
"fields": [
{"name": "user_id", "type": "int64", "required": true},
{"name": "payload", "type": "string", "required": false},
{"name": "numbers", "type": "int32", "required": false}
]
}
```

- Select **Next**

5. **Define Sink**:
- Select your R2 bucket: `r2-sql-demo`
- Storage type: **R2 Data Catalog**
- Namespace: `demo`
- Table name: `first_table`
- **Advanced Settings**: Change **Maximum Time Interval** to `30 seconds`
- Select **Next**

6. **Credentials**:
- Disable **Automatically create an Account API token for your sink**
- Enter **Catalog Token** from step 1
- Select **Next**

7. **Pipeline Definition**:
- Leave the default SQL query:
```sql
INSERT INTO demo_sink SELECT * FROM demo_stream;
```
- Select **Create Pipeline**

8. :::note
Note the **HTTP Ingest Endpoint URL** from the output. This is the endpoint you will use to send data to your pipeline.
:::

</Steps>

```bash
export STREAM_ENDPOINT= #paste the HTTP ingest endpoint from the output
```
</TabItem>
</Tabs>


## 5. Send some data

Next, send some events to our stream:

```bash
curl -X POST "$STREAM_ENDPOINT" \
-H "Content-Type: application/json" \
-d '[
{
"user_id": 1,
"payload": "you should see this",
"numbers": 42
},
{
"user_id": 2,
"payload": "you should also see this",
"numbers": 100
},
{
"user_id": 3,
"payload": null,
"numbers": 1
},
{
"user_id": 4,
"numbers": null
}
]'
```

This sends four events in one `POST`. Since our pipeline only keeps records where `numbers` is greater than 5, the events for `user_id` `3` and `4` should not appear in the table. Feel free to change values and send more events, as in the sketch below.
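
For example, a small shell loop can send a batch of extra events with random `numbers` values, reusing the `STREAM_ENDPOINT` variable from earlier:

```bash
# Send 10 events with random numbers between 0 and 19;
# only those greater than 5 should pass the pipeline filter
for i in $(seq 1 10); do
  curl -s -X POST "$STREAM_ENDPOINT" \
    -H "Content-Type: application/json" \
    -d "[{\"user_id\": $i, \"payload\": \"event $i\", \"numbers\": $((RANDOM % 20))}]"
done
```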

## 6. Query the table with R2 SQL

After you have sent your events to the stream, it will take about 30 seconds for the data to appear in the table, since that is the roll interval we configured on the sink.

```bash
npx wrangler r2 sql query "$WAREHOUSE" "SELECT * FROM demo.first_table LIMIT 10"
```
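
Once rows appear, you can filter and project in the query as well. For example, this returns only the higher `numbers` values along with the `__ingest_ts` column that Pipelines appends (a sketch, assuming standard `WHERE` and `LIMIT` support in R2 SQL):

```bash
npx wrangler r2 sql query "$WAREHOUSE" \
  "SELECT user_id, payload, numbers, __ingest_ts FROM demo.first_table WHERE numbers > 50 LIMIT 10"
```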

## Additional resources

<LinkCard
title="Managing R2 Data Catalogs"
href="/r2/data-catalog/manage-catalogs/"
description="Enable or disable R2 Data Catalog on your bucket, retrieve configuration details, and authenticate your Iceberg engine."
/>

<LinkCard
title="Try another example"
href="/r2-sql/tutorials/end-to-end-pipeline"
description="Detailed tutorial for setting up a simple fraud detection data pipeline, and generate events for it in Python."
/>