
Commit a9196f6

Small changes
1 parent c87125d commit a9196f6

File tree: 3 files changed, +31 -13 lines changed

(Binary image file changed, 206 KB)

src/content/docs/pipelines/concepts/how-pipelines-work.mdx

Lines changed: 26 additions & 12 deletions
@@ -5,32 +5,46 @@ sidebar:
order: 1
---

- Cloudflare Pipelines let you ingest data from a source, and deliver to a destination. It's built for high volume, real time data streams. Each pipeline can ingest up to 100 MB/s of data, via HTTP or a Worker, and load the data as files in an R2 bucket.
+ Cloudflare Pipelines lets you ingest data from a source and deliver it to a sink. It's built for high-volume, real-time data streams. Each pipeline can ingest up to 100 MB/s of data, via HTTP or a Worker, and load the data as files in an R2 bucket.

This guide explains how a pipeline works.

![Pipelines Architecture](~/assets/images/pipelines/architecture.png)

## Supported sources, data formats, and sinks
- Pipelines supports ingestion via [HTTP](/pipelines/build-with-pipelines/http), or from a [Cloudflare Worker](/workers/), using the [Pipelines Workers API](/pipelines/build-with-pipelines/workers-apis).

- A pipeline can ingest JSON-serializable records.
+ ### Sources
+ Pipelines supports the following sources:
+ * [HTTP Clients](/pipelines/build-with-pipelines/http), with optional authentication and CORS settings
+ * [Cloudflare Worker](/workers/), using the [Pipelines Workers API](/pipelines/build-with-pipelines/workers-apis)

- Finally, Pipelines supports writing data to [R2 Object Storage](/r2/). Ingested data is written to output files. These files are optionally compressed, and then delivered to an R2 bucket. Output files are generated as newline delimited JSON files (`ndjson`). The filename of each output file is prefixed by the event date and time, to make querying the data more efficient. For example, an output fle might be named like this: `event_date=2025-04-03/hr=15/01JQY361X75TMYSQZGWC6ZDMR2.json.gz`. Each line in an output file maps to a single record ingested by a pipeline.
+ Multiple sources can be active on a single pipeline simultaneously. For example, you can create a pipeline that accepts data both from a Worker and via HTTP. Multiple Workers can be configured to send data to the same pipeline. There is no limit to the number of source clients.
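
To make the Worker source concrete, here is a minimal sketch of sending a record through a pipeline binding from a Worker. The binding name `MY_PIPELINE` and the inline binding type are assumptions for this example; refer to the Pipelines Workers API page linked above for the exact interface.

```ts
// Minimal sketch of the Workers source: forward a JSON-serializable record to a pipeline.
// The binding name (MY_PIPELINE) and the inline binding type are assumptions for this example.
interface PipelineBinding {
  send(records: object[]): Promise<void>;
}

export interface Env {
  MY_PIPELINE: PipelineBinding;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const record = {
      event: "page_view",
      url: request.url,
      ts: Date.now(),
    };
    // send() accepts an array of JSON-serializable records.
    await env.MY_PIPELINE.send([record]);
    return new Response("ok");
  },
};
```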

- We plan to support more sources, data formats, and sinks, in the future.
+ ### Data format
+ Pipelines can ingest JSON-serializable records.

- ## Data durability, and the lifecycle of a request
- If you make a request to send data to a pipeline, and receive a successful response, we guarantee that the data will be delivered to your configured destination.
+ ### Sinks
+ Pipelines supports delivering data into [R2 Object Storage](/r2/). Ingested data is delivered as newline-delimited JSON files (`ndjson`), with optional compression. Multiple pipelines can be configured to deliver data to the same R2 bucket.
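
To illustrate the output shape, a delivered object might look roughly like the following, with one JSON record per line. The object key follows the date-prefixed naming described in the removed paragraph above; the record fields are illustrative only.

```
event_date=2025-04-03/hr=15/01JQY361X75TMYSQZGWC6ZDMR2.json.gz
{"event":"page_view","url":"https://example.com/","ts":1743692400000}
{"event":"page_view","url":"https://example.com/docs","ts":1743692401000}
```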

- Any data sent to a pipeline is durably committed to storage. Pipelines use [SQLite backed Durable Objects](/durable-objects/best-practices/access-durable-objects-storage/#sqlite-storage-backend) as a buffer for ingested records. A pipeline will only return a response after data has been successfully stored.
+ ## Data durability
+ Pipelines are designed to be reliable. Data sent to a pipeline should be delivered successfully to the configured R2 bucket, provided that the [R2 API credentials associated with a pipeline](/r2/api/s3/tokens/) remain valid.

- Ingested data continues to be stored, until a sufficiently large batch of data has accumulated. Batching is useful to reduce the number of output files written out to R2. [Batch sizes are customizable](/pipelines/build-with-pipelines/output-settings/#customize-batch-behavior), in terms of data volume, rows, or time.
+ Each pipeline maintains a storage buffer. Requests to send data to a pipeline receive a successful response only after the data is committed to this storage buffer.

- When a batch has reached its target size, the batch entire is written out to a file. The file is optionally compressed, and is delivered to an R2 bucket. Any transient failures, such as network failures, are automatically retried.
+ Ingested data accumulates until a sufficiently [large batch of data](/pipelines/build-with-pipelines/output-settings/#customize-batch-behavior) has been filled. Once the batch reaches its target size, the entire batch is converted to a file and delivered to R2.
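
As a conceptual sketch (not Cloudflare's implementation), batching can be thought of as delivering a batch whenever any configured threshold is reached. The setting names below are illustrative placeholders for the customizable batch settings linked above.

```ts
// Conceptual sketch only, not Cloudflare's implementation: a batch is delivered once any
// threshold is reached. The setting names are illustrative placeholders.
interface BatchSettings {
  maxBytes: number;   // flush once this much data has been buffered
  maxRows: number;    // flush once this many records have been buffered
  maxSeconds: number; // flush once the batch has been open this long
}

function shouldFlush(
  bufferedBytes: number,
  bufferedRows: number,
  batchAgeSeconds: number,
  settings: BatchSettings,
): boolean {
  return (
    bufferedBytes >= settings.maxBytes ||
    bufferedRows >= settings.maxRows ||
    batchAgeSeconds >= settings.maxSeconds
  );
}
```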
+
+ Transient failures, such as network connectivity issues, are automatically retried.
+
+ However, if the [R2 API credentials associated with a pipeline](/r2/api/s3/tokens/) expire or are revoked, data delivery will fail. In this scenario, some data might continue to accumulate in the buffers, but the pipeline will eventually start rejecting requests.

## How a Pipeline handles updates
- Data delivery is guaranteed even while updating an existing pipeline. Updating an existing pipeline effectively creates a new deployment, including all your previously configured options. Requests are gracefully re-routed to the new pipeline. The old pipeline continues to write data into your destination. Once the old pipeline is fully drained, it is spun down.
+ Pipelines update without dropping records. Updating an existing pipeline effectively creates a new instance of the pipeline. Requests are gracefully re-routed to the new instance. The old instance continues to write data into your configured sink. Once the old instance is fully drained, it is spun down.
+
+ This means that updates might take a few minutes to go into effect. For example, if you update a pipeline's sink, previously ingested data might continue to be delivered to the old sink.

## What if I send too much data? Do Pipelines communicate backpressure?
- If you send too much data, the pipeline will communicate backpressure by returning a 429 response to HTTP requests, or throwing an error if using the Workers API. Refer to the [limits](/pipelines/platform/limits) to learn how much volume a single pipeline can support.
+ If you send too much data, the pipeline will communicate backpressure by returning a 429 response to HTTP requests, or by throwing an error if you are using the Workers API. Refer to the [limits](/pipelines/platform/limits) to learn how much volume a single pipeline can support. You might see 429 responses if you are sending too many requests or too much data.
+
+ If you are consistently seeing backpressure from your pipeline, consider the following strategies (a sketch of the fallback approach follows this list):
+ * Increase the [shard count](/pipelines/build-with-pipelines/shards) to increase the maximum throughput of your pipeline.
+ * Send data to a second pipeline if you receive an error. You can set up multiple pipelines to write to the same R2 bucket.
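
The sketch below illustrates the second strategy under stated assumptions: two pipeline bindings (`PRIMARY_PIPELINE` and `FALLBACK_PIPELINE`, both hypothetical names) configured to write to the same R2 bucket, with the fallback used when the primary send throws.

```ts
// Illustrative fallback between two pipelines that write to the same R2 bucket.
// Binding names (PRIMARY_PIPELINE, FALLBACK_PIPELINE) and the inline type are assumptions.
interface PipelineBinding {
  send(records: object[]): Promise<void>;
}

export interface Env {
  PRIMARY_PIPELINE: PipelineBinding;
  FALLBACK_PIPELINE: PipelineBinding;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const records = [{ event: "click", ts: Date.now() }];
    try {
      await env.PRIMARY_PIPELINE.send(records);
    } catch {
      // The Workers API surfaces backpressure as an error; retry against the fallback pipeline.
      await env.FALLBACK_PIPELINE.send(records);
    }
    return new Response("accepted", { status: 202 });
  },
};
```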

src/content/docs/pipelines/platform/limits.mdx

Lines changed: 5 additions & 1 deletion
@@ -23,4 +23,8 @@ Many of these limits will increase during Pipelines' public beta period.
| Maximum batch duration | 300s |

## What happens if I exceed the requests per second or throughput limits?
- Consistently exceeding the requests per second or throughpht limits may result in requests being rejected by your Pipeline, with a status code of 429. This status code indicates that your Pipeline is unable to keep up with the volume.
+ If you consistently exceed the requests per second or throughput limits, your pipeline might not be able to keep up with the load. The pipeline will communicate backpressure by returning a 429 response to HTTP requests, or by throwing an error if you are using the Workers API.
+
+ If you are consistently seeing backpressure from your pipeline, consider the following strategies:
+ * Increase the [shard count](/pipelines/build-with-pipelines/shards) to increase the maximum throughput of your pipeline.
+ * Send data to a second pipeline if you receive an error. You can set up multiple pipelines to write to the same R2 bucket.
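
Beyond those strategies, a client sending over HTTP can also back off and retry when it receives a 429. A minimal client-side sketch follows; the endpoint URL is a placeholder for your own pipeline's HTTP endpoint, and the retry parameters are arbitrary.

```ts
// Illustrative client-side handling of 429 backpressure from a pipeline's HTTP endpoint.
// The endpoint URL is a placeholder and the retry parameters are arbitrary.
async function sendWithBackoff(records: object[], maxAttempts = 5): Promise<void> {
  const endpoint = "https://<your-pipeline-http-endpoint>";
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(records),
    });
    if (res.ok) return;
    if (res.status !== 429) throw new Error(`Unexpected status ${res.status}`);
    // 429 signals backpressure, so wait with exponential backoff before retrying.
    await new Promise((resolve) => setTimeout(resolve, 2 ** attempt * 250));
  }
  throw new Error("Pipeline still applying backpressure after retries");
}
```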
