`src/content/docs/pipelines/concepts/how-pipelines-work.mdx`
Cloudflare Pipelines let you ingest data from a source and deliver it to a sink. Pipelines are built for high-volume, real-time data streams. Each pipeline can ingest up to 100 MB/s of data, via HTTP or from a Worker, and load the data as files into an R2 bucket.
Pipelines supports ingestion via [HTTP](/pipelines/build-with-pipelines/http) or from a [Cloudflare Worker](/workers/) using the [Pipelines Workers API](/pipelines/build-with-pipelines/workers-apis).
### Sources
Pipelines supports the following sources:
* [HTTP clients](/pipelines/build-with-pipelines/http), with optional authentication and CORS settings
* A [Cloudflare Worker](/workers/), using the [Pipelines Workers API](/pipelines/build-with-pipelines/workers-apis)
Multiple sources can be active on a single pipeline simultaneously. For example, you can create a pipeline which accepts data both from a Worker and via HTTP, as the sketch below illustrates. Multiple Workers can be configured to send data to the same pipeline, and there is no limit to the number of source clients.
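To make the two ingestion paths concrete, here is a minimal sketch of sending records from a Worker via a pipeline binding and from any HTTP client. The binding name `MY_PIPELINE`, the ingestion URL, and the record fields are placeholder assumptions; substitute the values from your own pipeline configuration.

```ts
// Sketch: ingesting records into a pipeline from a Worker.
// `MY_PIPELINE` is an assumed binding name; configure it in your Wrangler config.
export interface Env {
  MY_PIPELINE: { send(records: object[]): Promise<void> };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Records must be JSON-serializable objects.
    await env.MY_PIPELINE.send([
      { event: "pageview", url: request.url, ts: Date.now() },
    ]);
    return new Response("Accepted", { status: 202 });
  },
};

// Sketch: ingesting the same records over HTTP from any client.
// The URL is a placeholder for your pipeline's HTTP ingestion endpoint.
export async function sendOverHttp(records: object[]): Promise<void> {
  const res = await fetch("https://<your-pipeline-id>.pipelines.cloudflare.com", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(records),
  });
  if (!res.ok) {
    throw new Error(`Ingestion failed with status ${res.status}`);
  }
}
```

In this sketch the HTTP request body is a JSON array of records; if you enable authentication or CORS settings on the HTTP source, the client also needs to supply the corresponding credentials and headers.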
### Data format
Pipelines can ingest JSON-serializable records.
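For illustration, the records below are valid because they survive `JSON.stringify` unchanged; values that JSON cannot represent, such as functions or circular references, are not suitable record fields. The field names here are invented for the example.

```ts
// Illustrative JSON-serializable records; nested objects and arrays are fine.
const records = [
  { user_id: "u_123", action: "click", ts: "2025-04-03T15:04:05Z" },
  { user_id: "u_456", action: "purchase", items: [{ sku: "A1", qty: 2 }], total: 19.99 },
];

// Values such as functions, BigInt, or circular structures cannot be represented
// in JSON and should not appear in a record.
```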
### Sinks
Pipelines supports delivering data into [R2 Object Storage](/r2/). Ingested data is delivered as newline-delimited JSON files (`ndjson`), with optional compression. Each line in an output file corresponds to a single ingested record, and the filename of each output file is prefixed by the event date and hour (for example, `event_date=2025-04-03/hr=15/01JQY361X75TMYSQZGWC6ZDMR2.json.gz`) to make querying the data more efficient. Multiple pipelines can be configured to deliver data to the same R2 bucket.
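As a sketch of what the output looks like from the consuming side, the Worker function below lists one hour's worth of output files from the bucket, decompresses them if needed, and parses each line back into a record. The `OUTPUT_BUCKET` binding name and the key prefix are assumptions for the example.

```ts
// Sketch: reading pipeline output back out of R2 from a Worker.
// `OUTPUT_BUCKET` is an assumed R2 bucket binding; the prefix mirrors the
// date/hour naming described above.
export interface Env {
  OUTPUT_BUCKET: R2Bucket;
}

export async function readOutput(env: Env): Promise<object[]> {
  const records: object[] = [];
  const listing = await env.OUTPUT_BUCKET.list({ prefix: "event_date=2025-04-03/hr=15/" });

  for (const obj of listing.objects) {
    const file = await env.OUTPUT_BUCKET.get(obj.key);
    if (!file) continue;

    // Output files may be gzip-compressed (`.json.gz`); decompress if needed.
    const stream = obj.key.endsWith(".gz")
      ? file.body.pipeThrough(new DecompressionStream("gzip"))
      : file.body;
    const text = await new Response(stream).text();

    // Each non-empty line of the newline-delimited JSON file is one record.
    for (const line of text.split("\n")) {
      if (line.trim().length > 0) records.push(JSON.parse(line));
    }
  }
  return records;
}
```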
## Data durability
Pipelines are designed to be reliable. Data sent to a pipeline should be delivered successfully to the configured R2 bucket, provided that the [R2 API credentials associated with a pipeline](/r2/api/s3/tokens/) remain valid.
Each pipeline maintains a storage buffer, backed by [SQLite-backed Durable Objects](/durable-objects/best-practices/access-durable-objects-storage/#sqlite-storage-backend). Requests to send data to a pipeline receive a successful response only after the data is committed to this storage buffer.
Ingested data accumulates until a sufficiently [large batch of data](/pipelines/build-with-pipelines/output-settings/#customize-batch-behavior) has been collected; batch sizes are customizable in terms of data volume, number of rows, or time. Once a batch reaches its target size, the entire batch is converted to a file and delivered to R2. Batching reduces the number of output files written to your bucket.
Transient failures, such as network connectivity issues, are automatically retried.
However, if the [R2 API credentials associated with a pipeline](/r2/api/s3/tokens/) expire or are revoked, data delivery will fail. In this scenario, some data might continue to accumulate in the buffers, but the pipeline will eventually start rejecting requests.
## How a Pipeline handles updates
Pipelines update without dropping records. Updating an existing pipeline effectively creates a new instance of the pipeline. Requests are gracefully re-routed to the new instance. The old instance continues to write data into your configured sink. Once the old instance is fully drained, it is spun down.
This means that updates might take a few minutes to go into effect. For example, if you update a pipeline's sink, previously ingested data might continue to be delivered into the old sink.
## What if I send too much data? Do Pipelines communicate backpressure?
If you send too much data, or too many requests, the pipeline will communicate backpressure by returning a 429 response to HTTP requests, or throwing an error if using the Workers API. Refer to the [limits](/pipelines/platform/limits) to learn how much volume a single pipeline can support.
If you are consistently seeing backpressure from your pipeline, consider the following strategies:
* Increase the [shard count](/pipelines/build-with-pipelines/shards) to increase the maximum throughput of your pipeline.
* Send data to a second pipeline if you receive an error; you can set up multiple pipelines to write to the same R2 bucket (see the sketch below).
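Here is a minimal sketch of how a client might combine those two strategies on the sending side: back off and retry on a 429 from the primary pipeline's HTTP endpoint, then fail over to a second pipeline that writes to the same bucket. Both endpoint URLs and the retry parameters are illustrative assumptions.

```ts
// Sketch: client-side handling of pipeline backpressure (HTTP ingestion).
// Both URLs are placeholders for the HTTP endpoints of two pipelines that
// deliver to the same R2 bucket.
const PRIMARY = "https://<primary-pipeline-id>.pipelines.cloudflare.com";
const FALLBACK = "https://<fallback-pipeline-id>.pipelines.cloudflare.com";

async function post(endpoint: string, records: object[]): Promise<Response> {
  return fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(records),
  });
}

export async function sendWithBackpressureHandling(records: object[]): Promise<void> {
  // Retry the primary pipeline a few times with exponential backoff on 429s.
  for (let attempt = 0; attempt < 3; attempt++) {
    const res = await post(PRIMARY, records);
    if (res.ok) return;
    if (res.status !== 429) throw new Error(`Ingestion failed: ${res.status}`);
    await new Promise((resolve) => setTimeout(resolve, 2 ** attempt * 250));
  }

  // Still saturated: fall back to the second pipeline.
  const res = await post(FALLBACK, records);
  if (!res.ok) throw new Error(`Fallback ingestion failed: ${res.status}`);
}
```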
`src/content/docs/pipelines/platform/limits.mdx`
Many of these limits will increase during Pipelines' public beta period.
| Maximum batch duration | 300s |
## What happens if I exceed the requests per second or throughput limits?
If you consistently exceed the requests per second or throughput limits, your pipeline might not be able to keep up with the load. The pipeline will communicate backpressure by returning a 429 response to HTTP requests, or throwing an error if using the Workers API.
If you are consistently seeing backpressure from your pipeline, consider the following strategies:
* Increase the [shard count](/pipelines/build-with-pipelines/shards) to increase the maximum throughput of your pipeline.
* Send data to a second pipeline if you receive an error; you can set up multiple pipelines to write to the same R2 bucket.