
Commit 1f9632f

A ton of changes and improvements
implemented Jerome's feedback in virtually all docs. Docs are properly organized now.
1 parent b8abf91 commit 1f9632f

9 files changed

+292
-72
lines changed

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -29,4 +29,5 @@ pnpm-debug.log*
2929
/assets/secrets
3030
/worker/functions/
3131

32-
.idea
32+
.idea
33+
package-lock.json
Lines changed: 210 additions & 0 deletions
@@ -0,0 +1,210 @@
1+
---
2+
pcx_content_type: get-started
3+
title: Getting started
4+
head: []
5+
sidebar:
6+
order: 2
7+
description: Learn how to get up and running with R2 SQL using R2 Data Catalog and Pipelines
8+
---
9+
import {
10+
Render,
11+
LinkCard,
12+
} from "~/components";
13+
14+
## Overview
15+
16+
This guide walks you through:
17+
18+
- Creating an [R2 bucket](/r2/buckets/) and enabling its [data catalog](/r2/data-catalog/).
19+
- Using Wrangler to create a Pipeline stream, a sink, and the SQL statement that reads from the stream and writes to the sink.
20+
- Sending some data to the stream via the HTTP Streams endpoint
21+
- Querying the data using R2 SQL
22+
23+
## Prerequisites
24+
25+
1. Sign up for a [Cloudflare account](https://dash.cloudflare.com/sign-up).
26+
2. Install [Node.js](https://nodejs.org/en/).
27+
3. Install [Wrangler](/workers/wrangler/install-and-update/).
28+
29+
:::note[Node.js version manager]
30+
Use a Node version manager like [Volta](https://volta.sh/) or [nvm](https://github.com/nvm-sh/nvm) to avoid permission issues and change Node.js versions. Wrangler requires a Node version of 16.17.0 or later.
31+
:::
32+
33+
## 1. Set up authentication
34+
35+
You'll need API tokens to interact with Cloudflare services.
36+
37+
### Custom API Token
38+
1. Go to **My Profile** > **API Tokens** in the Cloudflare dashboard.
39+
2. Select **Create Token** > **Custom token**.
40+
3. Add the following permissions:
41+
- **Workers Pipelines** - Read, Send, Edit
42+
- **Workers R2 Storage** - Edit, Read
43+
- **Workers R2 Data Catalog** - Edit, Read
44+
- **Workers R2 SQL** - Read
45+
46+
Export your new token as an environment variable:
47+
48+
```bash
49+
export WRANGLER_R2_SQL_AUTH_TOKEN=your_token_here
50+
```
51+
52+
If this is your first time using Wrangler, make sure to log in:
53+
```bash
54+
npx wrangler login
55+
```
56+
57+
## 2. Create an R2 bucket
58+
59+
Create a new R2 bucket:
60+
61+
```bash
62+
npx wrangler r2 bucket create r2-sql-demo
63+
```
64+
65+
## 3. Enable R2 Data Catalog
66+
67+
Enable the [R2 Data Catalog](/r2/data-catalog/) feature on your bucket to use Apache Iceberg tables:
68+
69+
```bash
70+
npx wrangler r2 bucket catalog enable r2-sql-demo
71+
```
72+
## 4. Create the data Pipeline
73+
74+
### 1. Create the Pipeline Stream
75+
76+
First, create a schema file called `demo_schema.json` with the following `json` schema:
77+
```json
78+
{
79+
"fields": [
80+
{"name": "user_id", "type": "int64", "required": true},
81+
{"name": "payload", "type": "string", "required": false},
82+
{"name": "numbers", "type": "int32", "required": false}
83+
]
84+
}
85+
```
86+
Next, create the stream that will ingest events:
87+
88+
```bash
89+
npx wrangler pipelines streams create demo_stream \
90+
--schema-file demo_schema.json \
91+
--http-enabled true \
92+
--http-auth false
93+
```
94+
:::note
95+
Note the **HTTP Ingest Endpoint URL** from the output. This is the endpoint you'll use to send data to your pipeline.
96+
:::
97+
98+
```bash
99+
# The http ingest endpoint from the output (see example below)
100+
export STREAM_ENDPOINT="<HTTP_INGEST_ENDPOINT_URL>"
101+
```
102+
The output should look like this:
103+
```sh
104+
🌀 Creating stream 'demo_stream'...
105+
✨ Successfully created stream 'demo_stream' with id 'stream_id'.
106+
107+
Creation Summary:
108+
General:
109+
Name: demo_stream
110+
111+
HTTP Ingest:
112+
Enabled: Yes
113+
Authentication: No
114+
Endpoint: https://stream_id.ingest.cloudflare.com
115+
CORS Origins: None
116+
117+
Input Schema:
118+
┌────────────┬────────┬────────────┬──────────┐
119+
│ Field Name │ Type │ Unit/Items │ Required │
120+
├────────────┼────────┼────────────┼──────────┤
121+
│ user_id │ int64 │ │ Yes │
122+
├────────────┼────────┼────────────┼──────────┤
123+
│ payload │ string │ │ No │
124+
├────────────┼────────┼────────────┼──────────┤
125+
│ numbers │ int32 │ │ No │
126+
└────────────┴────────┴────────────┴──────────┘
127+
```
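
Before sending data, it can help to confirm that `STREAM_ENDPOINT` is set and looks like an HTTPS URL. The following is a minimal sketch only: `check_endpoint` is a hypothetical shell helper, not a Wrangler command, and it checks the URL's shape, not whether the stream exists.

```bash
# Hypothetical helper (not part of Wrangler): succeed only if the
# argument looks like an HTTPS URL.
check_endpoint() {
  case "$1" in
    https://?*) return 0 ;;
    *) echo "STREAM_ENDPOINT is empty or not an https:// URL" >&2
       return 1 ;;
  esac
}

# Warn instead of failing if the variable has not been exported yet.
check_endpoint "${STREAM_ENDPOINT:-}" || echo "Set STREAM_ENDPOINT before sending data"
```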
128+
129+
130+
### 2. Create the Pipeline Sink
131+
132+
Create a sink that writes data to your R2 bucket as Apache Iceberg tables:
133+
134+
```bash
135+
npx wrangler pipelines sinks create demo_sink \
136+
--type "r2-data-catalog" \
137+
--bucket "r2-sql-demo" \
138+
--roll-interval 30 \
139+
--namespace "demo" \
140+
--table "first_table" \
141+
--catalog-token $WRANGLER_R2_SQL_AUTH_TOKEN
142+
```
143+
144+
:::note
145+
This creates a `sink` configuration that will write to the Iceberg table `demo.first_table` in your R2 Data Catalog every 30 seconds. Pipelines automatically appends an `__ingest_ts` column that is used to partition the table by `DAY`.
146+
:::
147+
148+
### 3. Create the Pipeline
149+
150+
A pipeline is a SQL statement that reads data from the stream, optionally transforms it, and writes it to the sink:
151+
152+
```bash
153+
npx wrangler pipelines create demo_pipeline \
154+
--sql "INSERT INTO demo_sink SELECT * FROM demo_stream WHERE numbers > 5;"
155+
```
156+
:::note
157+
This statement includes a filter, so the pipeline only sends events where `numbers` is greater than 5.
158+
:::
159+
160+
## 5. Send some data
161+
162+
Next, let's send some events to our stream:
163+
164+
```bash
165+
curl -X POST "$STREAM_ENDPOINT" \
166+
-H "Authorization: Bearer YOUR_API_TOKEN" \
167+
-H "Content-Type: application/json" \
168+
-d '[
169+
{
170+
"user_id": 1,
171+
"payload": "you should see this",
172+
"numbers": 42
173+
},
174+
{
175+
"user_id": 2,
176+
"payload": "you should also see this",
177+
"numbers": 100
178+
},
179+
{
180+
"user_id": 3,
181+
"payload": null,
182+
"numbers": 1
183+
},
184+
{
185+
"user_id": 4,
186+
"numbers": null
187+
}
188+
]'
189+
```
190+
This sends four events in one `POST`. Because the pipeline only keeps records where `numbers` is greater than 5, the events for `user_id` `3` (`numbers` is 1) and `user_id` `4` (`numbers` is null) should not appear in the table. Feel free to change values and send more events.
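
To see which of the four sample events pass the filter, the `WHERE numbers > 5` condition can be mimicked locally. This sketch is an illustration only: `passes_filter` is a hypothetical shell helper, not part of Wrangler or Pipelines, and it assumes that a missing or `null` value never satisfies the comparison, as in standard SQL.

```bash
# Hypothetical local stand-in for the pipeline's `WHERE numbers > 5` filter.
passes_filter() {
  case "$1" in
    ''|null|*[!0-9-]*) return 1 ;;  # missing, null, or non-numeric never passes
  esac
  [ "$1" -gt 5 ]
}

# Each entry is "user_id:numbers", taken from the sample payload above.
for event in "1:42" "2:100" "3:1" "4:null"; do
  user_id="${event%%:*}"
  numbers="${event##*:}"
  if passes_filter "$numbers"; then
    echo "user_id $user_id: kept"
  else
    echo "user_id $user_id: filtered out"
  fi
done
```

Running it reports `user_id` 1 and 2 as kept and `user_id` 3 and 4 as filtered out.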
191+
192+
## 6. Query the table with R2 SQL
193+
194+
After you send events to the stream, the data takes about 30 seconds to appear in the table, because the sink's `--roll-interval` is set to 30 seconds.
195+
196+
```bash
197+
npx wrangler r2 sql query "SELECT * FROM demo.first_table LIMIT 10"
198+
```
199+
200+
<LinkCard
201+
title="Managing R2 Data Catalogs"
202+
href="/r2/data-catalog/manage-catalogs/"
203+
description="Enable or disable R2 Data Catalog on your bucket, retrieve configuration details, and authenticate your Iceberg engine."
204+
/>
205+
206+
<LinkCard
207+
title="Try another example"
208+
href="/r2/sql/tutorials/end-to-end-pipeline"
209+
description="Detailed tutorial for setting up a simple fraud detection data pipeline and generating events for it in Python."
210+
/>

src/content/docs/r2/sql/platform/limitations-best-practices.mdx

Lines changed: 12 additions & 14 deletions
@@ -21,20 +21,20 @@ R2 SQL is designed for querying **partitioned** Apache Iceberg tables in your R2
2121

2222
| Feature | Supported | Notes |
2323
| :---- | :---- | :---- |
24-
| Basic SELECT | Yes | Columns, \*, aliases |
25-
| SQL Functions | No | No COUNT, AVG, etc. |
26-
| Single table FROM | Yes | With aliasing |
24+
| Basic SELECT | Yes | Columns, \* |
25+
| Aggregation functions | No | No COUNT, AVG, etc. |
26+
| Single table FROM | Yes | Aliasing not supported |
27+
| WHERE clause | Yes | Filters, comparisons, equality, etc |
2728
| JOINs | No | No table joins |
28-
| WHERE with time | Yes | Required |
2929
| Array filtering | No | No array type support |
3030
| JSON filtering | No | No nested object queries |
3131
| Simple LIMIT | Yes | 1-10,000 range |
32-
| ORDER BY | Yes | Only on partition key |
32+
| ORDER BY | Yes | Only columns of the partition key |
3333
| GROUP BY | No | Not supported |
3434

3535
## Supported SQL Clauses
3636

37-
R2 SQL supports a limited set of SQL clauses: `SELECT`, `FROM`, `WHERE`, and `LIMIT`. All other SQL clauses are not supported at the moment. New features will release often, keep an eye on this page and the changelog\[LINK TO CHANGE LOG\] for the latest.
37+
R2 SQL supports a limited set of SQL clauses: `SELECT`, `FROM`, `WHERE`, and `LIMIT`. All other SQL clauses are not supported at the moment. New features will be released in the future. Keep an eye on this page and the changelog\[LINK TO CHANGE LOG\] for the latest.
3838

3939
---
4040

@@ -50,7 +50,7 @@ R2 SQL supports a limited set of SQL clauses: `SELECT`, `FROM`, `WHERE`, and `LI
5050
- **No JSON field querying**: Cannot query individual fields from JSON objects
5151
- **No SQL functions**: Functions like `AVG()`, `COUNT()`, `MAX()`, `MIN()`, quantiles are not supported
5252
- **No synthetic data**: Cannot create synthetic columns like `SELECT 1 AS what, "hello" AS greeting`
53-
- **Field aliasing**: `SELECT field AS another_name`
53+
- **No field aliasing**: `SELECT field AS another_name`
5454

5555

5656
### Examples
@@ -85,7 +85,7 @@ SELECT 1 AS synthetic_column
8585
- **No schema evolution**: Schema cannot be altered (no ALTER TABLE, migrations)
8686
- **Immutable datasets**: No UPDATE or DELETE operations allowed
8787
- **Fully defined schema**: Dynamic or union-type fields are not supported
88-
- **Table aliasing**: `SELECT * FROM table_name AS alias`
88+
- **No table aliasing**: `SELECT * FROM table_name AS alias`
8989

9090
### Examples
9191

@@ -105,13 +105,12 @@ SELECT * FROM (SELECT * FROM events WHERE status = 200)
105105

106106
### Supported Features
107107

108-
- **Time filtering**: Queries should include a time filter
109-
- **Simple type filtering**: Supports `string`, `boolean`, and `number` types
108+
- **Simple type filtering**: Supports `string`, `boolean`, `number` types, and timestamps expressed as RFC3339
110109
- **Boolean logic**: Supports `AND`, `OR`, `NOT` operators
111110
- **Comparison operators**: `>`, `>=`, `=`, `<`, `<=`, `!=`
112111
- **Grouped conditions**: `WHERE col_a="hello" AND (col_b>5 OR col_c != 3)`
113-
- **Pattern mating:** `WHERE col_a LIKE ‘%hello w%’`
114-
- **NULL Handling:** `WHERE col_a IS NOT NULL`
112+
- **Pattern matching:** `WHERE col_a LIKE 'hello w%'` (prefix matching only)
113+
- **NULL Handling:** `WHERE col_a IS NOT NULL` (`IS`/`IS NOT`)
115114

116115
### Limitations
117116

@@ -208,5 +207,4 @@ The following SQL clauses are **not supported**:
208207
2. **Use specific column selection** instead of `SELECT *` when possible for better performance
209208
3. **Structure your data** to avoid nested JSON objects if you need to filter on those fields
210209

211-
---
212-
210+
---

src/content/docs/r2/sql/platform/pricing.mdx

Lines changed: 1 addition & 1 deletion
@@ -14,4 +14,4 @@ R2 SQL is currently not billed during open beta but will eventually be billed on
1414

1515
During the first phase of the R2 SQL open beta, you will not be billed for R2 SQL usage. You will be billed only for R2 usage.
1616

17-
We plan to price based on the volume of data queried by R2 SQL. We will provide at least 30 days' notice and exact pricing before charging.
17+
We plan to price based on the volume of data queried by R2 SQL. We will provide at least 30 days notice and exact pricing before charging.

src/content/docs/r2/sql/platform/sql-reference.mdx

Lines changed: 15 additions & 8 deletions
@@ -93,6 +93,7 @@ SELECT * WHERE condition [AND|OR condition ...]
9393
- `column_name <= value`
9494
- `column_name < value`
9595
- `column_name != value`
96+
- `column_name LIKE 'value%'`
9697

9798
#### Logical Operators
9899

@@ -104,11 +105,12 @@ SELECT * WHERE condition [AND|OR condition ...]
104105
- **integer** \- Whole numbers
105106
- **float** \- Decimal numbers
106107
- **string** \- Text values (quoted)
108+
- **timestamp** - RFC 3339 format (`'YYYY-MM-DDTHH:MM:SSZ'`)
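
When a query is built from a script, the timestamp literal can be generated with the `date` utility. This is a minimal sketch: `table_name` and the `timestamp` column are placeholders, and the query is only printed, not sent to R2 SQL.

```bash
# Format the current UTC time as an RFC 3339 timestamp,
# in the same shape as '2025-09-24T01:00:00Z'.
now="$(date -u +%Y-%m-%dT%H:%M:%SZ)"

# Use the timestamp as an upper bound in a WHERE clause.
# `table_name` and `timestamp` are placeholder names.
query="SELECT * FROM table_name WHERE timestamp <= '$now' LIMIT 100"
echo "$query"
```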
107109

108110
### Examples
109111

110112
```sql
111-
SELECT * FROM table_name WHERE timestamp BETWEEN '2025-01-01' AND '2025-01-02'
113+
SELECT * FROM table_name WHERE timestamp BETWEEN '2025-09-24T01:00:00Z' AND '2025-09-25T01:00:00Z'
112114
SELECT * FROM table_name WHERE status = 200
113115
SELECT * FROM table_name WHERE response_time > 1000
114116
SELECT * FROM table_name WHERE user_id IS NOT NULL
@@ -123,19 +125,21 @@ SELECT * FROM table_name WHERE (status = 404 OR status = 500) AND timestamp > '2
123125
### Syntax
124126

125127
```sql
126-
--Note: ORDERY BY only supports ordering by the partition key
128+
--Note: ORDER BY only supports ordering by the partition key
127129
ORDER BY partition_key [DESC]
128130
```
129131

130-
- **Default**: Ascending order (ASC)
132+
- **ASC**: Ascending order
131133
- **DESC**: Descending order
134+
- **Default**: partition_key DESC
135+
- Can contain any columns from the partition key
132136

133137
### Examples
134138

135139
```sql
136-
SELECT * FROM table_name WHERE ... ORDER BY partitionKey
137-
SELECT * FROM table_name WHERE ... ORDER BY partitionKey DESC
138-
SELECT * FROM table_name WHERE ... ORDER BY partitionKey DESC
140+
SELECT * FROM table_name WHERE ... ORDER BY partition_key_A
141+
SELECT * FROM table_name WHERE ... ORDER BY partition_key_B DESC
142+
SELECT * FROM table_name WHERE ... ORDER BY partition_key_A ASC
139143

140144
```
141145

@@ -151,6 +155,7 @@ LIMIT number
151155

152156
- **Range**: 1 to 10,000
153157
- **Type**: Integer only
158+
- **Default**: 500
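
A script that builds queries can check the `LIMIT` value against this range before sending the query. The sketch below assumes a client-side check is wanted. `valid_limit` is a hypothetical helper, not part of Wrangler or R2 SQL, and `table_name` is a placeholder.

```bash
# Hypothetical helper: accept only whole numbers from 1 to 10000.
valid_limit() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;  # empty, negative, decimal, or non-numeric
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 10000 ]
}

limit=100
if valid_limit "$limit"; then
  echo "SELECT * FROM table_name WHERE ... LIMIT $limit"
else
  echo "LIMIT must be an integer between 1 and 10000" >&2
fi
```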
154159

155160
### Examples
156161

@@ -167,7 +172,7 @@ SELECT * FROM table_name WHERE ... LIMIT 100
167172
```sql
168173
SELECT *
169174
FROM http_requests
170-
WHERE timestamp BETWEEN '2024-01-01' AND '2024-01-02'
175+
WHERE timestamp BETWEEN '2025-09-24T01:00:00Z' AND '2025-09-25T01:00:00Z'
171176
LIMIT 100
172177
```
173178

@@ -215,6 +220,8 @@ LIMIT 500
215220
| `integer` | Whole numbers | `1`, `42`, `-10`, `0` |
216221
| `float` | Decimal numbers | `1.5`, `3.14`, `-2.7`, `0.0` |
217222
| `string` | Text values | `'hello'`, `'GET'`, `'2024-01-01'` |
223+
| `boolean` | Boolean values | `true`, `false` |
224+
| `timestamp` | RFC3339 | `'2025-09-24T01:00:00Z'` |
218225

219226
### Type Usage in Conditions
220227

@@ -237,7 +244,7 @@ SELECT * FROM table_name WHERE country_code = 'US'
237244

238245
## Operator Precedence
239246

240-
1. **Comparison operators**: `=`, `!=`, `<`, `<=`, `>`, `>=`, `BETWEEN`, `IS NULL`, `IS NOT NULL`
247+
1. **Comparison operators**: `=`, `!=`, `<`, `<=`, `>`, `>=`, `LIKE`, `BETWEEN`, `IS NULL`, `IS NOT NULL`
241248
2. **AND** (higher precedence)
242249
3. **OR** (lower precedence)
243250
