
Commit 4fc246c

Address PR feedback, made resource pages more concise, other minor changes
1 parent f0846b1 commit 4fc246c

10 files changed: +69 -76 lines changed


src/content/docs/pipelines/getting-started.mdx

Lines changed: 1 addition & 1 deletion
@@ -147,7 +147,7 @@ First, create a schema file that defines your ecommerce data structure:
     },
     {
       "name": "amount",
-      "type": "f64",
+      "type": "float64",
       "required": false
     }
   ]

src/content/docs/pipelines/index.mdx

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ Ingest, transform, and load streaming data into Apache Iceberg or Parquet in R2.
 
 <Plan type="paid" />
 
-Cloudflare Pipelines ingests streaming events from your applications, transforms them with SQL, and loads them into [R2](/r2/) as queryable Apache Iceberg tables managed by [R2 Data Catalog](/r2/data-catalog/) or as Parquet and JSON files.
+Cloudflare Pipelines ingests events, transforms them with SQL, and delivers them to R2 as [Iceberg tables](/r2/data-catalog/) or as Parquet and JSON files.
 
 Whether you're processing server logs, mobile application events, IoT telemetry, or clickstream data, Pipelines provides durable ingestion via HTTP endpoints or Worker bindings, SQL-based transformations, and exactly-once delivery to R2. This makes it easy to build analytics-ready data warehouses and lakehouses without managing streaming infrastructure.

src/content/docs/pipelines/pipelines/index.mdx

Lines changed: 2 additions & 4 deletions
@@ -2,16 +2,14 @@
 title: Pipelines
 pcx_content_type: navigation
 sidebar:
-  order: 3
+  order: 4
 ---
 
 import { LinkCard } from "~/components";
 
 Pipelines connect [streams](/pipelines/streams/) and [sinks](/pipelines/sinks/) via SQL transformations, which can modify events before writing them to storage. This enables you to shift left, pushing validation, schematization, and processing to your ingestion layer to make your queries easy, fast, and correct.
 
-## What are Pipelines?
-
-Pipelines define the SQL transformations that process data as it flows from streams to sinks. They enable you to filter, transform, enrich, and restructure events in real-time before they reach storage.
+Pipelines enable you to filter, transform, enrich, and restructure events in real-time as data flows from streams to sinks.
 
 ## Learn more

src/content/docs/pipelines/platform/limits.mdx

Lines changed: 4 additions & 0 deletions
@@ -5,6 +5,8 @@ sidebar:
   order: 2
 ---
 
+import { Render } from "~/components";
+
 While in open beta, the following limits are currently in effect:
 
 | Feature | Limit |
@@ -14,3 +16,5 @@ While in open beta, the following limits are currently in effect:
 | Maximum ingest rate per stream | 5 MB/s |
 | Maximum sinks per account | 20 |
 | Maximum pipelines per account | 20 |
+
+<Render file="limits_increase" product="workers" />

src/content/docs/pipelines/sinks/available-sinks/r2-data-catalog.mdx

Lines changed: 4 additions & 9 deletions
@@ -19,7 +19,7 @@ npx wrangler pipelines sinks create my-sink \
   --catalog-token YOUR_CATALOG_TOKEN
 ```
 
-If the specified namespace and table do not exist, the sink will create them automatically.
+The sink will create the specified namespace and table if they do not exist. Sinks cannot be created for existing Iceberg tables.
 
 ## Format
 
@@ -43,10 +43,10 @@ Configure Parquet compression for optimal storage and query performance:
 
 ### Row group size
 
-[Row groups](https://parquet.apache.org/docs/file-format/configurations/) are sets of rows in a Parquet file that are stored together, affecting memory usage and query performance. Configure the target row group size:
+[Row groups](https://parquet.apache.org/docs/file-format/configurations/) are sets of rows in a Parquet file that are stored together, affecting memory usage and query performance. Configure the target row group size in MB:
 
 ```bash
---target-row-group-size 1024MB
+--target-row-group-size 256
 ```
 
 ## Batching and rolling policy
@@ -77,10 +77,5 @@ Set maximum file size in MB before creating a new file:
 R2 Data Catalog sinks require an API token with [R2 Admin Read & Write permissions](/r2/data-catalog/manage-catalogs/#create-api-token-in-the-dashboard). This permission grants the sink access to both R2 Data Catalog and R2 storage.
 
 ```bash
-npx wrangler pipelines sinks create my-sink \
-  --type r2-data-catalog \
-  --bucket my-bucket \
-  --namespace my_namespace \
-  --table my_table \
-  --catalog-token YOUR_CATALOG_TOKEN
+--catalog-token YOUR_CATALOG_TOKEN
 ```
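For reference, the flags discussed in this file combine into a single create command along the lines of the sketch below. This is illustrative only, reusing the placeholder bucket, namespace, table, and token names from the page plus the new MB-based row group flag:

```bash
# Illustrative sketch combining the flags documented in this file;
# bucket, namespace, table, and token values are placeholders.
npx wrangler pipelines sinks create my-sink \
  --type r2-data-catalog \
  --bucket my-bucket \
  --namespace my_namespace \
  --table my_table \
  --catalog-token YOUR_CATALOG_TOKEN \
  --target-row-group-size 256
```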

src/content/docs/pipelines/sinks/available-sinks/r2.mdx

Lines changed: 11 additions & 22 deletions
@@ -25,22 +25,15 @@ R2 sinks support two output formats:
 Write data as newline-delimited JSON files:
 
 ```bash
-npx wrangler pipelines sinks create my-sink \
-  --type r2 \
-  --bucket my-bucket \
-  --format json
+--format json
 ```
 
 ### Parquet format
 
 Write data as Parquet files for better query performance and compression:
 
 ```bash
-npx wrangler pipelines sinks create my-sink \
-  --type r2 \
-  --bucket my-bucket \
-  --format parquet \
-  --compression zstd
+--format parquet --compression zstd
 ```
 
 **Compression options for Parquet:**
@@ -52,42 +45,38 @@ npx wrangler pipelines sinks create my-sink \
 - `uncompressed` - No compression
 
 **Row group size:**
-[Row groups](https://parquet.apache.org/docs/file-format/configurations/) are sets of rows in a Parquet file that are stored together, affecting memory usage and query performance. Configure the target row group size:
+[Row groups](https://parquet.apache.org/docs/file-format/configurations/) are sets of rows in a Parquet file that are stored together, affecting memory usage and query performance. Configure the target row group size in MB:
 
 ```bash
---target-row-group-size 1024MB
+--target-row-group-size 256
 ```
 
 ## File organization
 
-Files are written with UUID names within the partitioned directory structure. For example, with prefix `analytics` and default partitioning:
+Files are written with UUID names within the partitioned directory structure. For example, with path `analytics` and default partitioning:
 
 ```
 analytics/year=2025/month=09/day=18/002507a5-d449-48e8-a484-b1bea916102f.parquet
 ```
 
-### Path prefix
+### Path
 
 Set a base directory in your bucket where files will be written:
 
 ```bash
-npx wrangler pipelines sinks create my-sink \
-  --type r2 \
-  --bucket my-bucket \
-  --path analytics/events
+--path analytics/events
 ```
 
 ### Partitioning
 
-R2 sinks automatically partition files by time using a configurable pattern. The default pattern is `year=%Y/month=%m/day=%d`.
+R2 sinks automatically partition files by time using a configurable pattern. The default pattern is `year=%Y/month=%m/day=%d` (Hive-style partitioning).
 
 ```bash
-npx wrangler pipelines sinks create my-sink \
-  --type r2 \
-  --bucket my-bucket \
-  --partitioning "year=%Y/month=%m/day=%d/hour=%H"
+--partitioning "year=%Y/month=%m/day=%d/hour=%H"
 ```
 
+For available format specifiers, refer to [strftime documentation](https://docs.rs/chrono/latest/chrono/format/strftime/index.html).
+
 ## Batching and rolling policy
 
 Control when files are written to R2. Configure based on your needs:
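Putting the documented flags together, a full R2 sink create command might look like the sketch below; the sink name, bucket, path, and partition pattern are placeholders taken from the examples on this page:

```bash
# Illustrative sketch combining the flags documented in this file;
# names and values are placeholders from the page's examples.
npx wrangler pipelines sinks create my-sink \
  --type r2 \
  --bucket my-bucket \
  --format parquet \
  --compression zstd \
  --target-row-group-size 256 \
  --path analytics/events \
  --partitioning "year=%Y/month=%m/day=%d/hour=%H"
```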

src/content/docs/pipelines/sinks/index.mdx

Lines changed: 2 additions & 6 deletions
@@ -2,18 +2,14 @@
 title: Sinks
 pcx_content_type: navigation
 sidebar:
-  order: 4
+  order: 3
 ---
 
 import { LinkCard } from "~/components";
 
 Sinks define destinations for your data in Cloudflare Pipelines. They support writing to [R2 Data Catalog](/r2/data-catalog/) as Apache Iceberg tables or to [R2](/r2/) as raw JSON or Parquet files.
 
-## What are Sinks?
-
-Sinks write processed data from pipelines to R2. They provide exactly-once delivery guarantees, ensuring events are never duplicated or dropped.
-
-Sinks can be configured to write files frequently for low-latency ingestion or to write larger, less frequent files for better query performance. Configuration options include batching settings, compression, and output formats.
+Sinks provide exactly-once delivery guarantees, ensuring events are never duplicated or dropped. They can be configured to write files frequently for low-latency ingestion or to write larger, less frequent files for better query performance.
 
 ## Learn more

src/content/docs/pipelines/sql-reference/sql-data-types.mdx

Lines changed: 9 additions & 16 deletions
@@ -14,11 +14,11 @@ Cloudflare Pipelines supports a set of primitive and composite data types for SQ
 | `bool` | `BOOLEAN` | `TRUE`, `FALSE` |
 | `int32` | `INT`, `INTEGER` | `0`, `1`, `-2` |
 | `int64` | `BIGINT` | `0`, `1`, `-2` |
-| `f32` | `FLOAT`, `REAL` | `0.0`, `-2.4`, `1E-3` |
-| `f64` | `DOUBLE` | `0.0`, `-2.4`, `1E-35` |
+| `float32` | `FLOAT`, `REAL` | `0.0`, `-2.4`, `1E-3` |
+| `float64` | `DOUBLE` | `0.0`, `-2.4`, `1E-35` |
 | `string` | `VARCHAR`, `CHAR`, `TEXT`, `STRING` | `"hello"`, `"world"` |
 | `timestamp` | `TIMESTAMP` | `'2020-01-01'`, `'2023-05-17T22:16:00.648662+00:00'` |
-| `bytes` | `BYTEA` | `X'A123'` (hex) |
+| `binary` | `BYTEA` | `X'A123'` (hex) |
 | `json` | `JSON` | `'{"event": "purchase", "amount": 29.99}'` |
 
 ## Composite types
@@ -41,21 +41,14 @@ Pipelines provides array functions for manipulating list values, and lists may b
 
 ### Struct types
 
-Structs combine related fields into a single value. In stream schemas, structs are declared using the `struct` type with a `fields` array. In SQL, structs are declared using this syntax: `STRUCT<field_name field_type, ..>`, and may contain any other type, including lists and other structs.
+Structs combine related fields into a single value. In stream schemas, structs are declared using the `struct` type with a `fields` array. In SQL, structs can be created using the `struct` function.
 
-Example struct in SQL:
+Example creating a struct in SQL:
 
 ```sql
-CREATE TABLE events (
-  properties STRUCT <
-    user_id TEXT,
-    amounts INT[],
-    profile STRUCT <
-      first_name TEXT,
-      last_name TEXT
-    >
-  >
-)
+SELECT struct('user123', 'purchase', 29.99) as event_data FROM events
 ```
 
-Struct fields can be accessed via `.` notation, for example `properties.profile.first_name`.
+This creates a struct with fields `c0`, `c1`, `c2` containing the user ID, event type, and amount.
+
+Struct fields can be accessed via `.` notation, for example `event_data.c0` for the user ID.

src/content/docs/pipelines/streams/index.mdx

Lines changed: 3 additions & 5 deletions
@@ -7,13 +7,11 @@ sidebar:
 
 import { LinkCard } from "~/components";
 
-Streams are durable, buffered queues that receive and store events for processing in [Cloudflare Pipelines](/pipelines/). They provide reliable data ingestion and can accept events via HTTP endpoints or Worker bindings.
+Streams are durable, buffered queues that receive and store events for processing in [Cloudflare Pipelines](/pipelines/). They provide reliable data ingestion via HTTP endpoints and Worker bindings, ensuring no data loss even during downstream processing delays or failures.
 
-## What are Streams?
+A single stream can be read by multiple pipelines, allowing you to route the same data to different destinations or apply different transformations. For example, you might send user events to both a real-time analytics pipeline and a data warehouse pipeline.
 
-Streams act as the entry point for your data into Pipelines. They durably buffer incoming events, ensuring no data loss even during downstream processing delays or failures. Events are persisted until successfully processed by connected pipelines.
-
-Streams currently accept events in JSON format via [HTTP endpoints](/pipelines/streams/writing-to-streams/) and [Workers bindings](/pipelines/streams/writing-to-streams/). Streams support both structured events with defined schemas and unstructured JSON. When a schema is provided, Streams will validate and enforce it for incoming events.
+Streams currently accept events in JSON format and support both structured events with defined schemas and unstructured JSON. When a schema is provided, streams will validate and enforce it for incoming events.
 
 ## Learn more

src/content/docs/pipelines/streams/manage-streams.mdx

Lines changed: 32 additions & 12 deletions
@@ -65,15 +65,35 @@ Example schema file:
       "type": "string",
       "required": true
     },
-    {
-      "name": "event_type",
-      "type": "string",
-      "required": true
-    },
     {
       "name": "amount",
-      "type": "f64",
+      "type": "float64",
       "required": false
+    },
+    {
+      "name": "tags",
+      "type": "list",
+      "required": false,
+      "items": {
+        "type": "string"
+      }
+    },
+    {
+      "name": "metadata",
+      "type": "struct",
+      "required": false,
+      "fields": [
+        {
+          "name": "source",
+          "type": "string",
+          "required": false
+        },
+        {
+          "name": "priority",
+          "type": "int32",
+          "required": false
+        }
+      ]
     }
   ]
 }
@@ -83,16 +103,16 @@ Example schema file:
 
 - `string` - Text values
 - `int32`, `int64` - Integer numbers
-- `f32`, `f64` - Floating-point numbers
+- `float32`, `float64` - Floating-point numbers
 - `bool` - Boolean true/false
-- `timestamp` - ISO 8601 timestamps
-- `json` - Nested JSON objects
-- `bytes` - Binary data
+- `timestamp` - RFC 3339 timestamps, or numeric values parsed as Unix seconds, milliseconds, or microseconds (depending on unit)
+- `json` - JSON objects
+- `binary` - Binary data (base64-encoded)
- `list` - Arrays of values
 - `struct` - Nested objects with defined fields
 
 :::note
-Events with invalid schemas are accepted during ingestion but will be dropped during processing. Schema modifications are not supported after stream creation.
+Events that do not match the defined schema are accepted during ingestion but will be dropped during processing. Schema modifications are not supported after stream creation.
 :::
 
 ## View stream configuration
@@ -167,5 +187,5 @@ npx wrangler pipelines streams delete <STREAM_ID>
 ```
 
 :::caution
-Deleting a stream will permanently remove all buffered events that have not been processed. Ensure all data has been delivered to your sink before deletion.
+Deleting a stream will permanently remove all buffered events that have not been processed and will delete any dependent pipelines. Ensure all data has been delivered to your sink before deletion.
 :::
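For context, a hypothetical event payload matching the optional fields added to the example schema (`amount`, `tags`, and the `metadata` struct) could look like the sketch below; the values and file name are illustrative, and the schema's required string field from above the hunk is omitted:

```bash
# Hypothetical payload matching the optional fields added to the example
# schema (amount, tags, metadata); values and file name are illustrative.
cat > example-event.json <<'EOF'
{
  "amount": 29.99,
  "tags": ["checkout", "mobile"],
  "metadata": {
    "source": "web",
    "priority": 1
  }
}
EOF
```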
