
Commit 79680f6

adding improvements from the latest round of reviews

1 parent: d08b951

3 files changed: +30 -30 lines changed

src/content/docs/r2-sql/get-started.mdx

Lines changed: 9 additions & 11 deletions
@@ -28,7 +28,7 @@ This guide will instruct you through:

 1. Sign up for a [Cloudflare account](https://dash.cloudflare.com/sign-up).
 2. Install [Node.js](https://nodejs.org/en/).
-3. Install [Wrangler](/workers/wranger/install-and-update).
+3. Install [Wrangler](/workers/wrangler/install-and-update).

 :::note[Node.js version manager]
 Use a Node version manager like [Volta](https://volta.sh/) or [nvm](https://github.com/nvm-sh/nvm) to avoid permission issues and change Node.js versions.
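For instance, with nvm you can install and switch to a current LTS release:

```bash
# Install and activate the latest LTS release of Node.js via nvm
nvm install --lts
nvm use --lts
```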
@@ -47,13 +47,13 @@ You will need API tokens to interact with Cloudflare services.

 2. Select **Manage API tokens**.

-3. Select **Create API token**.
+3. Select **Create User API token**.

 4. Select the **R2 Token** text to edit your API token name.

 5. Under **Permissions**, choose the **Admin Read & Write** permission.

-6. Select **Create API Token**.
+6. Select **Create User API Token**.

 7. Note the **Token value**.

@@ -99,8 +99,6 @@ Create an R2 bucket:
 </TabItem>
 </Tabs>

-## 3. Enable R2 Data Catalog
-
 <Tabs syncKey='CLIvDash'>
 <TabItem label='Wrangler CLI'>

@@ -138,12 +136,12 @@ Copy the warehouse (ACCOUNTID_BUCKETNAME) and paste it in the `export` below. We
 export $WAREHOUSE= #Paste your warehouse here
 ```
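Note that when assigning the variable the leading `$` should be dropped (`$` reads a variable rather than setting it); a minimal sketch with a hypothetical warehouse value:

```bash
# Hypothetical warehouse string (ACCOUNTID_BUCKETNAME); paste your own value
export WAREHOUSE=a1b2c3d4e5_my-demo-bucket
echo "$WAREHOUSE"
```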

-## 4. Create the data Pipeline
+## 3. Create the data Pipeline

 <Tabs syncKey='CLIvDash'>
 <TabItem label='Wrangler CLI'>

-### 4.1. Create the Pipeline Stream
+### 3.1. Create the Pipeline Stream

 First, create a schema file called `demo_schema.json` with the following `json` schema:
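The hunk cuts off before the schema itself; purely as an illustration (the field names below are guesses inferred from the events sent later in the guide, not the file's actual contents), such a schema file takes this shape:

```bash
# Hypothetical sketch of demo_schema.json; field names are assumptions
# based on the user_id/numbers fields referenced later in the guide.
cat > demo_schema.json << 'EOF'
{
  "fields": [
    {"name": "user_id", "type": "int64", "required": true},
    {"name": "numbers", "type": "int64", "required": false}
  ]
}
EOF
```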

@@ -201,7 +199,7 @@ Input Schema:
 └────────────┴────────┴────────────┴──────────┘
 ```

-### 4.2. Create the Pipeline Sink
+### 3.2. Create the Pipeline Sink

 Create a sink that writes data to your R2 bucket as Apache Iceberg tables:

@@ -219,7 +217,7 @@ npx wrangler pipelines sinks create demo_sink \
 This creates a `sink` configuration that will write to the Iceberg table `demo.first_table` in your R2 Data Catalog every 30 seconds. Pipelines automatically appends an `__ingest_ts` column that is used to partition the table by `DAY`.
 :::

-### 4.3. Create the Pipeline
+### 3.3. Create the Pipeline

 Pipelines are SQL statements that read data from the stream, do some work, and write it to the sink.
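As a hedged sketch of what such a statement looks like (the pipeline and stream names and the `--sql` flag are assumptions; the guide's own command follows in the file), filtering out `numbers` below 5 might read:

```bash
# Sketch only: demo_stream and the --sql flag are assumptions, not the
# guide's verbatim command; demo_sink is the sink created above.
npx wrangler pipelines create demo_pipeline \
  --sql "INSERT INTO demo_sink SELECT * FROM demo_stream WHERE numbers >= 5"
```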

@@ -295,7 +293,7 @@ export STREAM_ENDPOINT= #the http ingest endpoint from the output (see example b
 </Tabs>


-## 5. Send some data
+## 4. Send some data

 Next, send some events to our stream:

@@ -327,7 +325,7 @@ curl -X POST "$STREAM_ENDPOINT" \

 This will send 4 events in one `POST`. Since our Pipeline is filtering out records with `numbers` less than 5, `user_id` `3` and `4` should not appear in the table. Feel free to change values and send more events.
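For context, a hedged sketch of such a request (the payload shape and field values are illustrative assumptions consistent with the filter described above):

```bash
# Illustrative payload: user_id 3 and 4 carry numbers < 5, so the
# pipeline's filter should drop them before they reach the table.
curl -X POST "$STREAM_ENDPOINT" \
  -H "Content-Type: application/json" \
  -d '[{"user_id": 1, "numbers": 10},
       {"user_id": 2, "numbers": 7},
       {"user_id": 3, "numbers": 2},
       {"user_id": 4, "numbers": 1}]'
```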

-## 6. Query the table with R2 SQL
+## 5. Query the table with R2 SQL

 After you have sent your events to the stream, it will take about 30 seconds for the data to show in the table, since that is what we configured our `roll interval` to be in the Sink.
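Once the roll interval has elapsed, a query along these lines should confirm the filtered result (the `wrangler r2 sql query` form matches the tutorial later in this commit; the SELECT itself is illustrative):

```bash
# demo.first_table is the Iceberg table the sink above writes to.
npx wrangler r2 sql query "$WAREHOUSE" \
  "SELECT * FROM demo.first_table LIMIT 10"
```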

src/content/docs/r2-sql/query-data.mdx

Lines changed: 9 additions & 5 deletions
@@ -8,6 +8,9 @@ sidebar:
 import {
   Render,
   LinkCard,
+  Tabs,
+  TabItem,
+  Steps
 } from "~/components";

 :::note
@@ -24,8 +27,8 @@ R2 SQL can currently be accessed via Wrangler commands or a REST API.

 To query Apache Iceberg tables in R2 Data Catalog, you must provide a Cloudflare API token with R2 SQL, R2 Data Catalog, and R2 storage permissions.

-### Create API token in the dashboard
-
+<Tabs syncKey='CLIvDash'>
+<TabItem label='Dashboard'>
 Create an [API token](https://dash.cloudflare.com/profile/api-tokens) with:

 - Access to R2 Data Catalog (**minimum**: read-only)
@@ -34,8 +37,8 @@ Create an [API token](https://dash.cloudflare.com/profile/api-tokens) with:

 Wrangler now supports the environment variable `WRANGLER_R2_SQL_AUTH_TOKEN` which you can use to `export` your token.
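For example (placeholder value):

```bash
# Placeholder: substitute the token value you created above
export WRANGLER_R2_SQL_AUTH_TOKEN="your-api-token-here"
```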

-### Create API token via API
-
+</TabItem>
+<TabItem label='Via API'>
 To create an API token programmatically for use with R2 SQL, you will need to specify R2 SQL, R2 Data Catalog, and R2 storage permission groups in your [Access Policy](/r2/api/tokens/#access-policy).

 #### Example Access Policy
@@ -66,7 +69,8 @@ To create an API token programmatically for use with R2 SQL, you will need to sp
 }
 ]
 ```
-
+</TabItem>
+</Tabs>

 ## Query data via Wrangler

src/content/docs/r2-sql/tutorials/end-to-end-pipeline.mdx

Lines changed: 12 additions & 14 deletions
@@ -110,13 +110,11 @@ Create an R2 bucket:
 </TabItem>
 </Tabs>

-## 3. Enable R2 Data Catalog
+Enable the catalog on your R2 bucket:

 <Tabs syncKey='CLIvDash'>
 <TabItem label='Wrangler CLI'>

-Enable the catalog on your R2 bucket:
-
 ```bash
 npx wrangler r2 bucket catalog enable fraud-pipeline
 ```
@@ -177,9 +175,9 @@ npx wrangler r2 bucket catalog compaction enable fraud-pipeline --token $WRANGLE
 </TabItem>
 </Tabs>

-## 4. Set up the pipeline infrastructure
+## 3. Set up the pipeline infrastructure

-### 4.1. Create the Pipeline stream
+### 3.1. Create the Pipeline stream

 <Tabs syncKey='CLIvDash'>
 <TabItem label='Wrangler CLI'>
@@ -191,7 +189,7 @@ First, create a schema file called `raw_transactions_schema.json` with the follo
   "fields": [
     {"name": "transaction_id", "type": "string", "required": true},
     {"name": "user_id", "type": "int64", "required": true},
-    {"name": "amount", "type": "f64", "required": false},
+    {"name": "amount", "type": "float64", "required": false},
     {"name": "transaction_timestamp", "type": "string", "required": false},
     {"name": "location", "type": "string", "required": false},
     {"name": "merchant_category", "type": "string", "required": false},
@@ -242,7 +240,7 @@ Input Schema:
 ├───────────────────────┼────────┼────────────┼──────────┤
 │ user_id               │ int64  │            │ Yes      │
 ├───────────────────────┼────────┼────────────┼──────────┤
-│ amount                │ f64    │            │ No       │
+│ amount                │float64 │            │ No       │
 ├───────────────────────┼────────┼────────────┼──────────┤
 │ transaction_timestamp │ string │            │ No       │
 ├───────────────────────┼────────┼────────────┼──────────┤
@@ -254,7 +252,7 @@ Input Schema:
 └───────────────────────┴────────┴────────────┴──────────┘
 ```

-### 4.2. Create the data sink
+### 3.2. Create the data sink

 Create a sink that writes data to your R2 bucket as Apache Iceberg tables:

@@ -272,7 +270,7 @@ npx wrangler pipelines sinks create raw_events_sink \
 This creates a `sink` configuration that will write to the Iceberg table `fraud_detection.transactions` in your R2 Data Catalog every 30 seconds. Pipelines automatically appends an `__ingest_ts` column that is used to partition the table by `DAY`.
 :::

-### 4.3. Create the pipeline
+### 3.3. Create the pipeline

 Connect your stream to your sink with SQL:

@@ -304,7 +302,7 @@ npx wrangler pipelines create raw_events_pipeline \
   "fields": [
     {"name": "transaction_id", "type": "string", "required": true},
     {"name": "user_id", "type": "int64", "required": true},
-    {"name": "amount", "type": "f64", "required": false},
+    {"name": "amount", "type": "float64", "required": false},
     {"name": "transaction_timestamp", "type": "string", "required": false},
     {"name": "location", "type": "string", "required": false},
     {"name": "merchant_category", "type": "string", "required": false},
@@ -341,7 +339,7 @@ npx wrangler pipelines create raw_events_pipeline \
 </TabItem>
 </Tabs>

-## 5. Generate fraud detection data
+## 4. Generate sample fraud detection data

 Create a Python script to generate realistic transaction data with fraud patterns:

@@ -491,11 +489,11 @@ pip install requests
 python fraud_data_generator.py
 ```

-## 6. Query your fraud data with R2 SQL
+## 5. Query the data with R2 SQL

 Now you can analyze your fraud detection data using R2 SQL. Here are some example queries:

-### 6.1. View recent transactions
+### 5.1. View recent transactions

 ```bash
 npx wrangler r2 sql query "$WAREHOUSE" "
@@ -513,7 +511,7 @@ AND is_fraud = true
 LIMIT 10"
 ```
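The hunk shows only the tail of that query; an illustrative reconstruction (the SELECT list and any additional filter are assumptions; `fraud_detection.transactions` and `is_fraud` come from the tutorial itself) might be:

```bash
# Illustrative reconstruction; the tutorial's actual column list may differ.
npx wrangler r2 sql query "$WAREHOUSE" "
SELECT transaction_id, user_id, amount, transaction_timestamp
FROM fraud_detection.transactions
WHERE is_fraud = true
LIMIT 10"
```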

-### 6.2. Filter the raw transactions into a new table to highlight high-value transactions
+### 5.2. Filter the raw transactions into a new table to highlight high-value transactions

 Create a new sink that will write the filtered data to a new Apache Iceberg table in R2 Data Catalog:
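The diff ends before showing that step; as a hedged sketch of the idea it introduces (the pipeline and sink names, the stream name, the threshold, and the `--sql` flag are all assumptions):

```bash
# Sketch only: a pipeline that forwards only high-value transactions to a
# second sink; every name and the threshold here are assumptions.
npx wrangler pipelines create high_value_pipeline \
  --sql "INSERT INTO high_value_sink SELECT * FROM raw_events_stream WHERE amount > 500"
```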
