
Commit 4fc246c

Address PR feedback, made resource pages more concise, other minor changes
1 parent f0846b1 commit 4fc246c

10 files changed: +69 -76 lines changed


src/content/docs/pipelines/getting-started.mdx

Lines changed: 1 addition & 1 deletion
@@ -147,7 +147,7 @@ First, create a schema file that defines your ecommerce data structure:
     },
     {
       "name": "amount",
-      "type": "f64",
+      "type": "float64",
       "required": false
     }
   ]

src/content/docs/pipelines/index.mdx

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ Ingest, transform, and load streaming data into Apache Iceberg or Parquet in R2.
 
 <Plan type="paid" />
 
-Cloudflare Pipelines ingests streaming events from your applications, transforms them with SQL, and loads them into [R2](/r2/) as queryable Apache Iceberg tables managed by [R2 Data Catalog](/r2/data-catalog/) or as Parquet and JSON files.
+Cloudflare Pipelines ingests events, transforms them with SQL, and delivers them to R2 as [Iceberg tables](/r2/data-catalog/) or as Parquet and JSON files.
 
 Whether you're processing server logs, mobile application events, IoT telemetry, or clickstream data, Pipelines provides durable ingestion via HTTP endpoints or Worker bindings, SQL-based transformations, and exactly-once delivery to R2. This makes it easy to build analytics-ready data warehouses and lakehouses without managing streaming infrastructure.

src/content/docs/pipelines/pipelines/index.mdx

Lines changed: 2 additions & 4 deletions
@@ -2,16 +2,14 @@
 title: Pipelines
 pcx_content_type: navigation
 sidebar:
-  order: 3
+  order: 4
 ---
 
 import { LinkCard } from "~/components";
 
 Pipelines connect [streams](/pipelines/streams/) and [sinks](/pipelines/sinks/) via SQL transformations, which can modify events before writing them to storage. This enables you to shift left, pushing validation, schematization, and processing to your ingestion layer to make your queries easy, fast, and correct.
 
-## What are Pipelines?
-
-Pipelines define the SQL transformations that process data as it flows from streams to sinks. They enable you to filter, transform, enrich, and restructure events in real-time before they reach storage.
+Pipelines enable you to filter, transform, enrich, and restructure events in real-time as data flows from streams to sinks.
 
 ## Learn more

src/content/docs/pipelines/platform/limits.mdx

Lines changed: 4 additions & 0 deletions
@@ -5,6 +5,8 @@ sidebar:
   order: 2
 ---
 
+import { Render } from "~/components";
+
 While in open beta, the following limits are currently in effect:
 
 | Feature | Limit |
@@ -14,3 +16,5 @@ While in open beta, the following limits are currently in effect:
 | Maximum ingest rate per stream | 5 MB/s |
 | Maximum sinks per account | 20 |
 | Maximum pipelines per account | 20 |
+
+<Render file="limits_increase" product="workers" />

src/content/docs/pipelines/sinks/available-sinks/r2-data-catalog.mdx

Lines changed: 4 additions & 9 deletions
@@ -19,7 +19,7 @@ npx wrangler pipelines sinks create my-sink \
   --catalog-token YOUR_CATALOG_TOKEN
 ```
 
-If the specified namespace and table do not exist, the sink will create them automatically.
+The sink will create the specified namespace and table if they do not exist. Sinks cannot be created for existing Iceberg tables.
 
 ## Format
 
@@ -43,10 +43,10 @@ Configure Parquet compression for optimal storage and query performance:
 
 ### Row group size
 
-[Row groups](https://parquet.apache.org/docs/file-format/configurations/) are sets of rows in a Parquet file that are stored together, affecting memory usage and query performance. Configure the target row group size:
+[Row groups](https://parquet.apache.org/docs/file-format/configurations/) are sets of rows in a Parquet file that are stored together, affecting memory usage and query performance. Configure the target row group size in MB:
 
 ```bash
---target-row-group-size 1024MB
+--target-row-group-size 256
 ```
 
 ## Batching and rolling policy
@@ -77,10 +77,5 @@ Set maximum file size in MB before creating a new file:
 R2 Data Catalog sinks require an API token with [R2 Admin Read & Write permissions](/r2/data-catalog/manage-catalogs/#create-api-token-in-the-dashboard). This permission grants the sink access to both R2 Data Catalog and R2 storage.
 
 ```bash
-npx wrangler pipelines sinks create my-sink \
-  --type r2-data-catalog \
-  --bucket my-bucket \
-  --namespace my_namespace \
-  --table my_table \
-  --catalog-token YOUR_CATALOG_TOKEN
+--catalog-token YOUR_CATALOG_TOKEN
 ```
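For reference, the flags discussed in this file combine into a single create command along the lines of the sketch below. This is illustrative only, reusing the placeholder bucket, namespace, table, and token names from the page plus the new MB-based row group flag:

```bash
# Illustrative sketch combining the flags documented in this file;
# bucket, namespace, table, and token values are placeholders.
npx wrangler pipelines sinks create my-sink \
  --type r2-data-catalog \
  --bucket my-bucket \
  --namespace my_namespace \
  --table my_table \
  --catalog-token YOUR_CATALOG_TOKEN \
  --target-row-group-size 256
```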

src/content/docs/pipelines/sinks/available-sinks/r2.mdx

Lines changed: 11 additions & 22 deletions
@@ -25,22 +25,15 @@ R2 sinks support two output formats:
 Write data as newline-delimited JSON files:
 
 ```bash
-npx wrangler pipelines sinks create my-sink \
-  --type r2 \
-  --bucket my-bucket \
-  --format json
+--format json
 ```
 
 ### Parquet format
 
 Write data as Parquet files for better query performance and compression:
 
 ```bash
-npx wrangler pipelines sinks create my-sink \
-  --type r2 \
-  --bucket my-bucket \
-  --format parquet \
-  --compression zstd
+--format parquet --compression zstd
 ```
 
 **Compression options for Parquet:**
@@ -52,42 +45,38 @@ npx wrangler pipelines sinks create my-sink \
 - `uncompressed` - No compression
 
 **Row group size:**
-[Row groups](https://parquet.apache.org/docs/file-format/configurations/) are sets of rows in a Parquet file that are stored together, affecting memory usage and query performance. Configure the target row group size:
+[Row groups](https://parquet.apache.org/docs/file-format/configurations/) are sets of rows in a Parquet file that are stored together, affecting memory usage and query performance. Configure the target row group size in MB:
 
 ```bash
---target-row-group-size 1024MB
+--target-row-group-size 256
 ```
 
 ## File organization
 
-Files are written with UUID names within the partitioned directory structure. For example, with prefix `analytics` and default partitioning:
+Files are written with UUID names within the partitioned directory structure. For example, with path `analytics` and default partitioning:
 
 ```
 analytics/year=2025/month=09/day=18/002507a5-d449-48e8-a484-b1bea916102f.parquet
 ```
 
-### Path prefix
+### Path
 
 Set a base directory in your bucket where files will be written:
 
 ```bash
-npx wrangler pipelines sinks create my-sink \
-  --type r2 \
-  --bucket my-bucket \
-  --path analytics/events
+--path analytics/events
 ```
 
 ### Partitioning
 
-R2 sinks automatically partition files by time using a configurable pattern. The default pattern is `year=%Y/month=%m/day=%d`.
+R2 sinks automatically partition files by time using a configurable pattern. The default pattern is `year=%Y/month=%m/day=%d` (Hive-style partitioning).
 
 ```bash
-npx wrangler pipelines sinks create my-sink \
-  --type r2 \
-  --bucket my-bucket \
-  --partitioning "year=%Y/month=%m/day=%d/hour=%H"
+--partitioning "year=%Y/month=%m/day=%d/hour=%H"
 ```
 
+For available format specifiers, refer to [strftime documentation](https://docs.rs/chrono/latest/chrono/format/strftime/index.html).
+
 ## Batching and rolling policy
 
 Control when files are written to R2. Configure based on your needs:
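Putting the documented flags together, a full R2 sink create command might look like the sketch below; the sink name, bucket, path, and partition pattern are placeholders taken from the examples on this page:

```bash
# Illustrative sketch combining the flags documented in this file;
# names and values are placeholders from the page's examples.
npx wrangler pipelines sinks create my-sink \
  --type r2 \
  --bucket my-bucket \
  --format parquet \
  --compression zstd \
  --target-row-group-size 256 \
  --path analytics/events \
  --partitioning "year=%Y/month=%m/day=%d/hour=%H"
```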

src/content/docs/pipelines/sinks/index.mdx

Lines changed: 2 additions & 6 deletions
@@ -2,18 +2,14 @@
 title: Sinks
 pcx_content_type: navigation
 sidebar:
-  order: 4
+  order: 3
 ---
 
 import { LinkCard } from "~/components";
 
 Sinks define destinations for your data in Cloudflare Pipelines. They support writing to [R2 Data Catalog](/r2/data-catalog/) as Apache Iceberg tables or to [R2](/r2/) as raw JSON or Parquet files.
 
-## What are Sinks?
-
-Sinks write processed data from pipelines to R2. They provide exactly-once delivery guarantees, ensuring events are never duplicated or dropped.
-
-Sinks can be configured to write files frequently for low-latency ingestion or to write larger, less frequent files for better query performance. Configuration options include batching settings, compression, and output formats.
+Sinks provide exactly-once delivery guarantees, ensuring events are never duplicated or dropped. They can be configured to write files frequently for low-latency ingestion or to write larger, less frequent files for better query performance.
 
 ## Learn more

src/content/docs/pipelines/sql-reference/sql-data-types.mdx

Lines changed: 9 additions & 16 deletions
@@ -14,11 +14,11 @@ Cloudflare Pipelines supports a set of primitive and composite data types for SQ
 | `bool` | `BOOLEAN` | `TRUE`, `FALSE` |
 | `int32` | `INT`, `INTEGER` | `0`, `1`, `-2` |
 | `int64` | `BIGINT` | `0`, `1`, `-2` |
-| `f32` | `FLOAT`, `REAL` | `0.0`, `-2.4`, `1E-3` |
-| `f64` | `DOUBLE` | `0.0`, `-2.4`, `1E-35` |
+| `float32` | `FLOAT`, `REAL` | `0.0`, `-2.4`, `1E-3` |
+| `float64` | `DOUBLE` | `0.0`, `-2.4`, `1E-35` |
 | `string` | `VARCHAR`, `CHAR`, `TEXT`, `STRING` | `"hello"`, `"world"` |
 | `timestamp` | `TIMESTAMP` | `'2020-01-01'`, `'2023-05-17T22:16:00.648662+00:00'` |
-| `bytes` | `BYTEA` | `X'A123'` (hex) |
+| `binary` | `BYTEA` | `X'A123'` (hex) |
 | `json` | `JSON` | `'{"event": "purchase", "amount": 29.99}'` |
 
 ## Composite types
@@ -41,21 +41,14 @@ Pipelines provides array functions for manipulating list values, and lists may b
 
 ### Struct types
 
-Structs combine related fields into a single value. In stream schemas, structs are declared using the `struct` type with a `fields` array. In SQL, structs are declared using this syntax: `STRUCT<field_name field_type, ..>`, and may contain any other type, including lists and other structs.
+Structs combine related fields into a single value. In stream schemas, structs are declared using the `struct` type with a `fields` array. In SQL, structs can be created using the `struct` function.
 
-Example struct in SQL:
+Example creating a struct in SQL:
 
 ```sql
-CREATE TABLE events (
-  properties STRUCT <
-    user_id TEXT,
-    amounts INT[],
-    profile STRUCT <
-      first_name TEXT,
-      last_name TEXT
-    >
-  >
-)
+SELECT struct('user123', 'purchase', 29.99) as event_data FROM events
 ```
 
-Struct fields can be accessed via `.` notation, for example `properties.profile.first_name`.
+This creates a struct with fields `c0`, `c1`, `c2` containing the user ID, event type, and amount.
+
+Struct fields can be accessed via `.` notation, for example `event_data.c0` for the user ID.

src/content/docs/pipelines/streams/index.mdx

Lines changed: 3 additions & 5 deletions
@@ -7,13 +7,11 @@ sidebar:
 
 import { LinkCard } from "~/components";
 
-Streams are durable, buffered queues that receive and store events for processing in [Cloudflare Pipelines](/pipelines/). They provide reliable data ingestion and can accept events via HTTP endpoints or Worker bindings.
+Streams are durable, buffered queues that receive and store events for processing in [Cloudflare Pipelines](/pipelines/). They provide reliable data ingestion via HTTP endpoints and Worker bindings, ensuring no data loss even during downstream processing delays or failures.
 
-## What are Streams?
+A single stream can be read by multiple pipelines, allowing you to route the same data to different destinations or apply different transformations. For example, you might send user events to both a real-time analytics pipeline and a data warehouse pipeline.
 
-Streams act as the entry point for your data into Pipelines. They durably buffer incoming events, ensuring no data loss even during downstream processing delays or failures. Events are persisted until successfully processed by connected pipelines.
-
-Streams currently accept events in JSON format via [HTTP endpoints](/pipelines/streams/writing-to-streams/) and [Workers bindings](/pipelines/streams/writing-to-streams/). Streams support both structured events with defined schemas and unstructured JSON. When a schema is provided, Streams will validate and enforce it for incoming events.
+Streams currently accept events in JSON format and support both structured events with defined schemas and unstructured JSON. When a schema is provided, streams will validate and enforce it for incoming events.
 
 ## Learn more

src/content/docs/pipelines/streams/manage-streams.mdx

Lines changed: 32 additions & 12 deletions
@@ -65,15 +65,35 @@ Example schema file:
       "type": "string",
       "required": true
     },
-    {
-      "name": "event_type",
-      "type": "string",
-      "required": true
-    },
     {
       "name": "amount",
-      "type": "f64",
+      "type": "float64",
       "required": false
+    },
+    {
+      "name": "tags",
+      "type": "list",
+      "required": false,
+      "items": {
+        "type": "string"
+      }
+    },
+    {
+      "name": "metadata",
+      "type": "struct",
+      "required": false,
+      "fields": [
+        {
+          "name": "source",
+          "type": "string",
+          "required": false
+        },
+        {
+          "name": "priority",
+          "type": "int32",
+          "required": false
+        }
+      ]
     }
   ]
 }
@@ -83,16 +103,16 @@ Example schema file:
 
 - `string` - Text values
 - `int32`, `int64` - Integer numbers
-- `f32`, `f64` - Floating-point numbers
+- `float32`, `float64` - Floating-point numbers
 - `bool` - Boolean true/false
-- `timestamp` - ISO 8601 timestamps
-- `json` - Nested JSON objects
-- `bytes` - Binary data
+- `timestamp` - RFC 3339 timestamps, or numeric values parsed as Unix seconds, milliseconds, or microseconds (depending on unit)
+- `json` - JSON objects
+- `binary` - Binary data (base64-encoded)
- `list` - Arrays of values
 - `struct` - Nested objects with defined fields
 
 :::note
-Events with invalid schemas are accepted during ingestion but will be dropped during processing. Schema modifications are not supported after stream creation.
+Events that do not match the defined schema are accepted during ingestion but will be dropped during processing. Schema modifications are not supported after stream creation.
 :::
 
 ## View stream configuration
@@ -167,5 +187,5 @@ npx wrangler pipelines streams delete <STREAM_ID>
 ```
 
 :::caution
-Deleting a stream will permanently remove all buffered events that have not been processed. Ensure all data has been delivered to your sink before deletion.
+Deleting a stream will permanently remove all buffered events that have not been processed and will delete any dependent pipelines. Ensure all data has been delivered to your sink before deletion.
 :::
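For context, a hypothetical event payload matching the optional fields added to the example schema (`amount`, `tags`, and the `metadata` struct) could look like the sketch below; the values and file name are illustrative, and the schema's required string field from above the hunk is omitted:

```bash
# Hypothetical payload matching the optional fields added to the example
# schema (amount, tags, metadata); values and file name are illustrative.
cat > example-event.json <<'EOF'
{
  "amount": 29.99,
  "tags": ["checkout", "mobile"],
  "metadata": {
    "source": "web",
    "priority": 1
  }
}
EOF
```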
