Commit d08b951: PCX Review
1 parent 74b405c

File tree: 8 files changed (+174, -140 lines changed)


src/content/docs/r2-sql/get-started.mdx

Lines changed: 38 additions & 25 deletions
@@ -20,26 +20,29 @@ import {
 This guide will instruct you through:
 
 - Creating an [R2 bucket](/r2/buckets/) and enabling its [data catalog](/r2/data-catalog/).
-- Using Wrangler to create a Pipeline Stream, Sink, and the SQL that reads from the stream and writes it to the sink
-- Sending some data to the stream via the HTTP Streams endpoint
-- Querying the data using R2 SQL
+- Using Wrangler to create a Pipeline Stream, Sink, and the SQL that reads from the stream and writes it to the sink.
+- Sending some data to the stream via the HTTP Streams endpoint.
+- Querying the data using R2 SQL.
 
 ## Prerequisites
 
 1. Sign up for a [Cloudflare account](https://dash.cloudflare.com/sign-up).
 2. Install [Node.js](https://nodejs.org/en/).
-3. Install [Wrangler](/workers/wranger/install-and-update)
+3. Install [Wrangler](/workers/wrangler/install-and-update).
 
 :::note[Node.js version manager]
-Use a Node version manager like [Volta](https://volta.sh/) or [nvm](https://github.com/nvm-sh/nvm) to avoid permission issues and change Node.js versions. Wrangler requires a Node version of 16.17.0 or later.
+Use a Node version manager like [Volta](https://volta.sh/) or [nvm](https://github.com/nvm-sh/nvm) to avoid permission issues and change Node.js versions.
+
+Wrangler requires a Node version of 16.17.0 or later.
 :::
 
 ## 1. Set up authentication
 
-You'll need API tokens to interact with Cloudflare services.
+You will need API tokens to interact with Cloudflare services.
 
 <Steps>
 1. In the Cloudflare dashboard, go to the **R2 object storage** page.
+
 <DashButton url="/?to=/:account/r2/overview" />
 
 2. Select **Manage API tokens**.
@@ -63,6 +66,7 @@ export WRANGLER_R2_SQL_AUTH_TOKEN= #paste your token here
 ```
 
 If this is your first time using Wrangler, make sure to login.
+
 ```bash
 npx wrangler login
 ```
@@ -83,18 +87,19 @@ Create an R2 bucket:
 
 <Steps>
 1. In the Cloudflare dashboard, go to the **R2 object storage** page.
+
 <DashButton url="/?to=/:account/r2/overview" />
 
 2. Select **Create bucket**.
 
-3. Enter the bucket name: r2-sql-demo
+3. Enter the bucket name: `r2-sql-demo`
 
 4. Select **Create bucket**.
 </Steps>
 </TabItem>
 </Tabs>
 
-## 2. Enable R2 Data Catalog
+## 3. Enable R2 Data Catalog
 
 <Tabs syncKey='CLIvDash'>
 <TabItem label='Wrangler CLI'>
@@ -112,9 +117,10 @@ When you run this command, take note of the "Warehouse". You will need these lat
 
 <Steps>
 1. In the Cloudflare dashboard, go to the **R2 object storage** page.
+
 <DashButton url="/?to=/:account/r2/overview" />
 
-2. Select the bucket: r2-sql-demo.
+2. Select the bucket: `r2-sql-demo`.
 
 3. Switch to the **Settings** tab, scroll down to **R2 Data Catalog**, and select **Enable**.
 
@@ -125,20 +131,22 @@ When you run this command, take note of the "Warehouse". You will need these lat
 
 
 :::note
-Copy the warehouse (ACCOUNTID_BUCKETNAME) and paste it in the `export` below. We'll use it later in the tutorial.
+Copy the warehouse (ACCOUNTID_BUCKETNAME) and paste it in the `export` below. We will use it later in the tutorial.
 :::
 
 ```bash
 export WAREHOUSE= #Paste your warehouse here
 ```
 
-## 3. Create the data Pipeline
+## 4. Create the data Pipeline
 
 <Tabs syncKey='CLIvDash'>
 <TabItem label='Wrangler CLI'>
-### 1. Create the Pipeline Stream
+
+### 4.1. Create the Pipeline Stream
 
 First, create a schema file called `demo_schema.json` with the following `json` schema:
+
 ```json
 {
 "fields": [
@@ -148,7 +156,7 @@ First, create a schema file called `demo_schema.json` with the following `json`
 ]
 }
 ```
-Next, crete the stream we'll use to ingest events to:
+Next, create the stream we will ingest events into:
 
 ```bash
 npx wrangler pipelines streams create demo_stream \
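As an editorial aside, the shape of `demo_schema.json` can be sketched in Python. The field list is elided in this diff, so the field definitions and type names below are hypothetical; only `user_id` and `numbers` appear elsewhere in the guide:

```python
import json

# Hypothetical stand-in for demo_schema.json; the real field list is elided
# in the diff above. The field names user_id and numbers come from the guide's
# curl example and pipeline SQL; the type names here are assumptions.
schema = {
    "fields": [
        {"name": "user_id", "type": "string", "required": True},
        {"name": "numbers", "type": "int64", "required": True},
    ]
}

# Write the schema file that `wrangler pipelines streams create` would consume.
with open("demo_schema.json", "w") as f:
    json.dump(schema, f, indent=2)

# Round-trip to confirm the file is valid JSON with the expected shape.
with open("demo_schema.json") as f:
    loaded = json.load(f)
print([field["name"] for field in loaded["fields"]])
```

A sanity check like this catches malformed JSON before Wrangler ever sees the file.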
@@ -157,14 +165,16 @@ npx wrangler pipelines streams create demo_stream \
 --http-auth false
 ```
 :::note
-Note the **HTTP Ingest Endpoint URL** from the output. This is the endpoint you'll use to send data to your pipeline.
+Note the **HTTP Ingest Endpoint URL** from the output. This is the endpoint you will use to send data to your pipeline.
 :::
 
 ```bash
 # The http ingest endpoint from the output (see example below)
 export STREAM_ENDPOINT= #the http ingest endpoint from the output (see example below)
 ```
+
 The output should look like this:
+
 ```sh
 🌀 Creating stream 'demo_stream'...
 ✨ Successfully created stream 'demo_stream' with id 'stream_id'.
@@ -191,8 +201,7 @@ Input Schema:
 └────────────┴────────┴────────────┴──────────┘
 ```
 
-
-### 2. Create the Pipeline Sink
+### 4.2. Create the Pipeline Sink
 
 Create a sink that writes data to your R2 bucket as Apache Iceberg tables:
 
@@ -207,25 +216,26 @@ npx wrangler pipelines sinks create demo_sink \
 ```
 
 :::note
-This creates a `sink` configuration that will write to the Iceberg table demo.first_table in your R2 Data Catalog every 30 seconds. Pipelines automatically appends an `__ingest_ts` column that is used to partition the table by `DAY`
+This creates a `sink` configuration that will write to the Iceberg table `demo.first_table` in your R2 Data Catalog every 30 seconds. Pipelines automatically appends an `__ingest_ts` column that is used to partition the table by `DAY`.
 :::
 
-### 3. Create the Pipeline
+### 4.3. Create the Pipeline
 
-Pipelines are SQL statements read data from the stream, does some work, and writes it to the sink
+Pipelines are SQL statements that read data from the stream, do some work, and write it to the sink.
 
 ```bash
 npx wrangler pipelines create demo_pipeline \
 --sql "INSERT INTO demo_sink SELECT * FROM demo_stream WHERE numbers > 5;"
 ```
 :::note
-Note that there is a filter on this statement that will only send events where `numbers` is greater than 5
+Note that there is a filter on this statement that will only send events where `numbers` is greater than 5.
 :::
 
 </TabItem>
 <TabItem label='Dashboard'>
 <Steps>
-1. In the Cloudflare dashboard, go to **Pipelines** > **Pipelines**.
+1. In the Cloudflare dashboard, go to the Pipelines page.
+
 <DashButton url="/?to=/:account/pipelines" />
 
 2. Select **Create Pipeline**.
@@ -272,7 +282,7 @@ Note that there is a filter on this statement that will only send events where `
 - Select **Create Pipeline**
 
 8. :::note
-Note the **HTTP Ingest Endpoint URL** from the output. This is the endpoint you'll use to send data to your pipeline.
+Note the **HTTP Ingest Endpoint URL** from the output. This is the endpoint you will use to send data to your pipeline.
 :::
 
 </Steps>
@@ -287,7 +297,7 @@ export STREAM_ENDPOINT= #the http ingest endpoint from the output (see example b
 
 ## 5. Send some data
 
-Next, let's send some events to our stream:
+Next, send some events to our stream:
 
 ```curl
 curl -X POST "$STREAM_ENDPOINT" \
@@ -314,16 +324,19 @@ curl -X POST "$STREAM_ENDPOINT" \
 }
 ]'
 ```
+
 This will send 4 events in one `POST`. Since our Pipeline is filtering out records with `numbers` of 5 or less, `user_id` `3` and `4` should not appear in the table. Feel free to change values and send more events.
 
 ## 6. Query the table with R2 SQL
 
-After you've sent your events to the stream, it will take about 30 seconds for the data to show in the table since that's what we configured our `roll interval` to be in the Sink.
+After you have sent your events to the stream, it will take about 30 seconds for the data to show in the table, since that is the `roll interval` we configured in the Sink.
 
 ```bash
 npx wrangler r2 sql query "$WAREHOUSE" "SELECT * FROM demo.first_table LIMIT 10"
 ```
 
+## Additional resources
+
 <LinkCard
 title="Managing R2 Data Catalogs"
 href="/r2/data-catalog/manage-catalogs/"
@@ -333,5 +346,5 @@ npx wrangler r2 sql query "$WAREHOUSE" "SELECT * FROM demo.first_table LIMIT 10"
 
 <LinkCard
 title="Try another example"
 href="/r2-sql/tutorials/end-to-end-pipeline"
-description="Detailed tutorial for setting up a simple fruad detection data pipeline and generate events for it in Python."
+description="Detailed tutorial for setting up a simple fraud detection data pipeline and generating events for it in Python."
 />
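The filtering behavior this guide describes (`WHERE numbers > 5`) can be sketched outside the pipeline. The event values below are hypothetical, since the actual curl payload is elided in this diff; only the field names and the threshold come from the guide:

```python
# Hypothetical event batch; only the field names (user_id, numbers) and the
# filter threshold are taken from the guide. Values are made up.
events = [
    {"user_id": "1", "numbers": 10},
    {"user_id": "2", "numbers": 7},
    {"user_id": "3", "numbers": 3},  # filtered out: 3 is not > 5
    {"user_id": "4", "numbers": 5},  # filtered out: 5 is not > 5
]

# Equivalent of: INSERT INTO demo_sink SELECT * FROM demo_stream WHERE numbers > 5;
kept = [event for event in events if event["numbers"] > 5]

print([event["user_id"] for event in kept])
```

This mirrors why `user_id` 3 and 4 never reach the sink table: the boundary value 5 fails the strictly-greater-than comparison.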

src/content/docs/r2-sql/index.mdx

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ description: A distributed SQL engine for R2 Data Catalog
 R2 SQL is in public beta, and any developer with an R2 subscription can start using it. Currently, outside of standard R2 storage and operations, you will not be billed for your use of R2 SQL. We will update [the pricing page](/r2-sql/platform/pricing) and provide at least 30 days notice before enabling billing.
 :::
 
-R2 SQL is Cloudflare's serverless, distributed, analytics query engine for querying [Apache Iceberg](https://iceberg.apache.org/) tables stored in [R2 data catalog](https://developers.cloudflare.com/r2/data-catalog/). R2 SQL is designed to efficiently query large amounts of data by automatically utilizing file pruning, Cloudflare's distributed compute, and R2 object storage.
+R2 SQL is Cloudflare's serverless, distributed, analytics query engine for querying [Apache Iceberg](https://iceberg.apache.org/) tables stored in [R2 data catalog](/r2/data-catalog/). R2 SQL is designed to efficiently query large amounts of data by automatically utilizing file pruning, Cloudflare's distributed compute, and R2 object storage.
 
 ```sh
 ❯ npx wrangler r2 sql query "3373912de3f5202317188ae01300bd6_data-catalog" \

src/content/docs/r2-sql/platform/pricing.mdx

Lines changed: 2 additions & 2 deletions
@@ -10,8 +10,8 @@ head:
 ---
 
 
-R2 SQL is currently not billed during open beta but will eventually be billed on the amount of data queried.
+R2 SQL is currently not billed during open beta, but will eventually be billed on the amount of data queried.
 
-During the first phase of the R2 SQL open beta, you will not be billed for R2 SQL usage. You will be billed only for R2 usage.
+During the first phase of the R2 SQL open beta, you will not be billed for R2 SQL usage. You will only be billed for R2 usage.
 
 We plan to price based on the volume of data queried by R2 SQL. We will provide at least 30 days notice and exact pricing before charging.

src/content/docs/r2-sql/query-data.mdx

Lines changed: 11 additions & 7 deletions
@@ -11,7 +11,7 @@ import {
 } from "~/components";
 
 :::note
-R2 SQL is currently in open beta
+R2 SQL is currently in open beta.
 :::
 
 Learn how to:
@@ -32,11 +32,11 @@ Create an [API token](https://dash.cloudflare.com/profile/api-tokens) with:
 - Access to R2 storage (**minimum**: read-only)
 - Access to R2 SQL (**minimum**: read-only)
 
-Wrangler now supports the environment variable `WRANGLER_R2_SQL_AUTH_TOKEN` which you can `export` your token as.
+Wrangler now supports the environment variable `WRANGLER_R2_SQL_AUTH_TOKEN`, which you can use to `export` your token.
 
 ### Create API token via API
 
-To create an API token programmatically for use with R2 SQL, you'll need to specify R2 SQL, R2 Data Catalog, and R2 storage permission groups in your [Access Policy](/r2/api/tokens/#access-policy).
+To create an API token programmatically for use with R2 SQL, you will need to specify R2 SQL, R2 Data Catalog, and R2 storage permission groups in your [Access Policy](/r2/api/tokens/#access-policy).
 
 #### Example Access Policy
 
@@ -77,12 +77,13 @@ export WRANGLER_R2_SQL_AUTH_TOKEN=your_token_here
 ```
 
 If this is your first time using Wrangler, make sure to login.
+
 ```bash
 npx wrangler login
 ```
 
 :::note
-You'll want to copy the **warehouse** of the R2 Data Catalog:
+You will want to copy the `Warehouse` of the R2 Data Catalog:
 :::
 
 ```sh
@@ -103,10 +104,12 @@ To query R2 SQL with Wrangler, simply run:
 ```sh
 npx wrangler r2 sql query "YOUR_WAREHOUSE" "SELECT * FROM namespace.table_name limit 10;"
 ```
-For a full list of supported sql commands, check out the [R2 SQL reference page](/r2-sql/reference/sql-reference).
+
+For a full list of supported SQL commands, refer to the [R2 SQL reference page](/r2-sql/reference/sql-reference).
 
 
 ## REST API
+
 Below is an example of using R2 SQL via the REST endpoint:
 
 ```bash
@@ -119,7 +122,8 @@ curl -X POST \
 }'
 ```
 
-Learn more:
+## Additional resources
+
 <LinkCard
 title="Manage R2 Data Catalogs"
 href="/r2/data-catalog/manage-catalogs/"
@@ -129,5 +133,5 @@ Learn more:
 
 <LinkCard
 title="Build an end to end data pipeline"
 href="/r2-sql/tutorials/end-to-end-pipeline"
-description="Detailed tutorial for setting up a simple fruad detection data pipeline and generate events for it in Python."
+description="Detailed tutorial for setting up a simple fraud detection data pipeline and generating events for it in Python."
 />
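The REST call in this section is truncated in the diff, but the general shape of such a request can be sketched in Python. The endpoint URL and the `{"query": ...}` body field below are assumptions for illustration, not the confirmed REST contract:

```python
import json
import urllib.request

def build_r2_sql_request(endpoint: str, token: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a SQL query.

    The endpoint URL and the {"query": ...} body shape are assumptions;
    the actual REST contract is truncated in the diff above.
    """
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_r2_sql_request(
    "https://example.invalid/r2-sql/query",  # placeholder endpoint, not real
    "YOUR_TOKEN",
    "SELECT * FROM namespace.table_name LIMIT 10;",
)
print(req.get_method(), req.get_header("Content-type"))
```

Separating request construction from sending makes the payload easy to inspect before pointing it at a real endpoint.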

src/content/docs/r2-sql/reference/limitations-best-practices.mdx

Lines changed: 18 additions & 19 deletions
@@ -8,29 +8,28 @@ sidebar:
 
 ---
 
-# R2 SQL Limitations and Best Practices
-
 ## Overview
 
-R2 SQL is in public beta, limitations and best practices will change over time.
+:::note
+R2 SQL is in public beta. Limitations and best practices will change over time.
+:::
 
 R2 SQL is designed for querying **partitioned** Apache Iceberg tables in your R2 data catalog. This document outlines the supported features, limitations, and best practices of R2 SQL.
 
-
 ## Quick Reference
 
-| Feature | Supported | Notes |
-| :---- | :---- | :---- |
-| Basic SELECT | Yes | Columns, \* |
-| Aggregation functions | No | No COUNT, AVG, etc. |
-| Single table FROM | Yes | Note, aliasing not supported|
-| WHERE clause | Yes | Filters, comparisons, equality, etc |
-| JOINs | No | No table joins |
-| Array filtering | No | No array type support |
-| JSON filtering | No | No nested object queries |
-| Simple LIMIT | Yes | 1-10,000 range |
-| ORDER BY | Yes | Any columns of the partition key only|
-| GROUP BY | No | Not supported |
+| Feature | Supported | Notes |
+| :---- | :---- | :---- |
+| Basic SELECT | Yes | Columns, \* |
+| Aggregation functions | No | No COUNT, AVG, etc. |
+| Single table FROM | Yes | Note, aliasing not supported |
+| WHERE clause | Yes | Filters, comparisons, equality, etc |
+| JOINs | No | No table joins |
+| Array filtering | No | No array type support |
+| JSON filtering | No | No nested object queries |
+| Simple LIMIT | Yes | 1-10,000 range |
+| ORDER BY | Yes | Any columns of the partition key only |
+| GROUP BY | No | Not supported |
 
 ## Supported SQL Clauses
 
@@ -203,8 +202,8 @@ The following SQL clauses are **not supported**:
 
 ## Best Practices
 
-1. **Always include time filters** in your WHERE clause to ensure efficient queries
-2. **Use specific column selection** instead of `SELECT *` when possible for better performance
-3. **Structure your data** to avoid nested JSON objects if you need to filter on those fields
+1. Always include time filters in your WHERE clause to ensure efficient queries.
+2. Use specific column selection instead of `SELECT *` when possible for better performance.
+3. Structure your data to avoid nested JSON objects if you need to filter on those fields.
 
 ---
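The Quick Reference constraints above can be expressed as a small pre-flight check. This is an illustrative sketch derived only from the table in this diff (JOINs, GROUP BY, and aggregates unsupported; LIMIT restricted to 1-10,000), not an official validator:

```python
import re

# Patterns for constructs the Quick Reference table marks as unsupported.
UNSUPPORTED = [r"\bJOIN\b", r"\bGROUP\s+BY\b", r"\bCOUNT\s*\(", r"\bAVG\s*\("]

def check_r2_sql(query: str) -> list:
    """Return a list of problems; an empty list means no rule was violated."""
    problems = []
    q = query.upper()
    for pattern in UNSUPPORTED:
        if re.search(pattern, q):
            problems.append(f"unsupported construct: {pattern}")
    # LIMIT must fall in the 1-10,000 range per the table above.
    m = re.search(r"\bLIMIT\s+(\d+)", q)
    if m and not (1 <= int(m.group(1)) <= 10_000):
        problems.append("LIMIT must be in the 1-10,000 range")
    return problems

print(check_r2_sql("SELECT * FROM demo.first_table LIMIT 10"))  # []
print(check_r2_sql("SELECT COUNT(*) FROM t GROUP BY a LIMIT 20000"))
```

A regex screen like this is deliberately coarse; it flags obvious violations early rather than parsing SQL fully.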

src/content/docs/r2-sql/reference/sql-reference.mdx

Lines changed: 5 additions & 2 deletions
@@ -11,9 +11,12 @@ sidebar:
 
 ## Overview
 
-R2 SQL is in public beta, supported SQL grammar will change over time.
 
-This reference documents the R2 SQL syntax based on the currently supported grammar in public beta.
+:::note
+R2 SQL is in public beta. Supported SQL grammar may change over time.
+:::
+
+This page documents the R2 SQL syntax based on the currently supported grammar in public beta.
 
 ---