Pipelines support data ingestion over HTTP. When you create a new pipeline, you will receive a globally scalable ingestion endpoint. To ingest data, make HTTP POST requests to the endpoint.
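For example, a request to the ingestion endpoint might look like the following. This is a sketch: the endpoint placeholder matches the format above, and the JSON payload is illustrative.

```
# Hypothetical example: the payload is illustrative; pipelines accept JSON-serializable records.
curl -X POST https://<PIPELINE-ID>.pipelines.cloudflare.com \
  -H "Content-Type: application/json" \
  -d '[{"event": "pageview", "url": "https://example.com", "ts": "2025-01-01T00:00:00Z"}]'
```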
## Turning HTTP ingestion off
By default, ingestion via HTTP is turned on. You can turn it off by excluding HTTP from the list of sources passed with `--sources` when creating or updating a pipeline.
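For example, the following sketch updates a pipeline so that only the Worker binding remains as a source. The `worker` source identifier is an assumption; the exact accepted values may differ.

```
# Sketch: turn off HTTP ingestion by listing only the remaining source(s).
# The "worker" identifier is an assumption.
npx wrangler pipelines update [PIPELINE-NAME] --sources worker
```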
Once authentication is turned on, you will need to include a Cloudflare API token in your requests.
### Get API token
1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com) and select your account.
2. Navigate to your [API Tokens](https://dash.cloudflare.com/profile/api-tokens) page.
3. Select **Create Token**.
4. Choose the template for Workers Pipelines. Select **Continue to summary** > **Create token**. Make sure to copy the API token and save it securely.
### Making authenticated requests
Include the API token you created in the previous step in the headers for your request:
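A sketch of an authenticated request, assuming the token is sent as a standard bearer token:

```
# Sketch: pass the API token as a bearer token in the Authorization header.
curl -X POST https://<PIPELINE-ID>.pipelines.cloudflare.com \
  -H "Authorization: Bearer <API_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '[{"event": "pageview"}]'
```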
If you want to use your pipeline to ingest client-side data, such as website clicks, you will need to configure your [Cross-Origin Resource Sharing (CORS) settings](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS).
Without CORS settings, browsers will restrict requests made to your pipeline endpoint. For example, if your website domain is `https://my-website.com` and you want to post client-side data to your pipeline at `https://<PIPELINE-ID>.pipelines.cloudflare.com`, the request will fail unless CORS is configured.
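To allow requests from your website, set the allowed origins on the pipeline with `--cors-origins`. This is a sketch; the exact value format may differ.

```
# Sketch: allow cross-origin requests from https://my-website.com.
npx wrangler pipelines update [PIPELINE-NAME] --cors-origins https://my-website.com
```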
You can specify that all cross-origin requests are accepted. We recommend only using this while developing or testing.
After the `--cors-origins` have been set on your pipeline, your pipeline will respond to preflight requests and `POST` requests with the appropriate `Access-Control-Allow-Origin` headers set.
Pipelines convert a stream of records into output files and deliver the files to an R2 bucket in your account. This guide details how you can change the output destination and customize batch settings to generate query-ready files.
## Configure an R2 bucket as a destination
To create or update a pipeline using Wrangler, run the following command in a terminal:
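A minimal sketch of the command, assuming the destination bucket is passed with an `--r2-bucket` flag (the flag name is an assumption):

```
# Sketch: create (or update) a pipeline that delivers to an R2 bucket.
# The --r2-bucket flag name is an assumption.
npx wrangler pipelines create [PIPELINE-NAME] --r2-bucket [BUCKET-NAME]
```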
@@ -19,9 +19,9 @@ To create or update a pipeline using Wrangler, run the following command in a te
After running this command, you'll be prompted to authorize Cloudflare Workers Pipelines to create an R2 API token on your behalf. Your pipeline uses the R2 API token to load data into your bucket. You can approve the request through the browser link which will open automatically.
22
+
After running this command, you will be prompted to authorize Cloudflare Workers Pipelines to create an R2 API token on your behalf. Your pipeline uses the R2 API token to load data into your bucket. You can approve the request through the browser link, which will open automatically.
If you prefer not to authenticate this way, you can pass your [R2 API Token](/r2/api/tokens/) to Wrangler:
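A sketch of passing an existing R2 API token; the flag names below are assumptions and may differ from the actual Wrangler options.

```
# Sketch: supply existing R2 credentials instead of the OAuth flow.
# Flag names are assumptions.
npx wrangler pipelines update [PIPELINE-NAME] \
  --r2-bucket [BUCKET-NAME] \
  --r2-access-key-id [ACCESS-KEY-ID] \
  --r2-secret-access-key [SECRET-ACCESS-KEY]
```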
Output files are named using a [ULID](https://github.com/ulid/spec) slug.
When configuring your pipeline, you can define how records are batched before they are delivered to R2. Batches of records are written out to a single output file.
Batching can:
- Reduce the number of output files written to R2 and thus reduce the [cost of writing data to R2](/r2/pricing/#class-a-operations).
- Increase the size of output files, making them more efficient to query.
There are three ways to define how ingested data is batched:
1. `batch-max-mb`: The maximum amount of data that will be batched, in megabytes. Default is `10 MB`, maximum is `100 MB`.
2. `batch-max-rows`: The maximum number of rows or events in a batch before data is written. Default, and maximum, is `10,000` rows.
3. `batch-max-seconds`: The maximum duration of a batch before data is written, in seconds. Default is `15 seconds`, maximum is `300 seconds`.
Batch definitions are hints. A pipeline will follow these hints closely, but batches might not be exact.
All three batch definitions work together; whichever limit is reached first triggers the delivery of a batch.
For example, with `batch-max-mb` set to 100 MB and `batch-max-seconds` set to 100, a batch will be delivered as soon as 100 MB of events have been posted to the pipeline. However, if it takes longer than 100 seconds to receive 100 MB of events, a batch will be created from whatever was posted during those 100 seconds.
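The batch limits can be set with the corresponding flags when creating or updating a pipeline. This is a sketch; the values shown are within the documented maximums.

```
# Sketch: configure batch limits; a batch is delivered when any limit is hit.
npx wrangler pipelines update [PIPELINE-NAME] \
  --batch-max-mb 100 \
  --batch-max-rows 10000 \
  --batch-max-seconds 300
```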
To deliver output files under a prefix, update your pipeline with the `--r2-prefix` flag:

```
npx wrangler pipelines update [PIPELINE-NAME] --r2-prefix test
```
After running the above command, the output files generated by your pipeline will be stored under the prefix `test`. Files will remain partitioned. Your output will look like this:
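The listing below is illustrative; the partition keys and ULID file names are hypothetical.

```
# Hypothetical listing: partition keys and ULID file names are illustrative.
test/event_date=2025-01-01/hr=14/01JGQ0V6Z3X8K9T2M5R7W4B1CD.json.gz
test/event_date=2025-01-01/hr=15/01JGQ3H2P8N4D7S1Q6T9V5A2EF.json.gz
```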
`src/content/docs/pipelines/build-with-pipelines/shards.mdx`
The default shard count will be set to `auto` in the future, with support for automatic scaling.
Each pipeline is composed of stateless, independent shards. These shards are spun up when a pipeline is created. Each shard is composed of layers of [Durable Objects](/durable-objects). The Durable Objects buffer data, replicate it for durability, handle compression, and deliver the output to R2.
When a record is sent to a pipeline:
1. The Pipelines [Worker](/workers) receives the record.
2. The record is routed to one of the shards.
3. The record is handled by a set of Durable Objects, which commit the record to storage and replicate for durability.
4. Records accumulate until the [batch definitions](/pipelines/build-with-pipelines/output-settings/#customize-batch-behavior) are met.
5. The batch is written to an output file and optionally compressed.
6. The output file is delivered to the configured R2 bucket.
Increasing the number of shards will increase the maximum throughput of a pipeline, as well as the number of output files created.
## How should I decide the number of shards to use?
Choose a shard count based on these factors:
* The number of requests per second you will make to your pipeline
* The amount of data per second you will send to your pipeline
Each shard is capable of handling approximately 7,000 requests per second, or ingesting 7 MB/s of data. Either factor might act as the bottleneck, so choose the shard count based on the higher number.
For example, if you estimate that you will ingest 70 MB/s while making 70,000 requests per second, set up a pipeline with 10 shards. However, if you estimate that you will ingest 70 MB/s while making 100,000 requests per second, set up a pipeline with 15 shards.
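A sketch of setting the shard count when updating a pipeline; the `--shard-count` flag name is an assumption.

```
# Sketch: provision 10 shards for roughly 70 MB/s or 70,000 requests per second.
# The --shard-count flag name is an assumption.
npx wrangler pipelines update [PIPELINE-NAME] --shard-count 10
```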
`src/content/docs/pipelines/concepts/how-pipelines-work.mdx`
Cloudflare Pipelines lets you ingest data from a source and deliver it to a sink. It is built for high-volume, real-time data streams. Each pipeline can ingest up to 100 MB/s of data, via HTTP or a Worker, and load the data as files into an R2 bucket.
This guide explains how a pipeline works.
Multiple sources can be active on a single pipeline simultaneously.
Pipelines can ingest JSON-serializable records.
### Sinks
Pipelines supports delivering data into [R2 Object Storage](/r2/). Ingested data is delivered as newline-delimited JSON files (`ndjson`) with optional compression. Multiple pipelines can be configured to deliver data to the same R2 bucket.
## Data durability
Pipelines are designed to be reliable. Any data which is successfully ingested will be delivered to the configured R2 bucket, provided that the [R2 API credentials associated with a pipeline](/r2/api/tokens/) remain valid.
Pipelines update without dropping records.
This means that updates might take a few minutes to go into effect. For example, if you update a pipeline's sink, previously ingested data might continue to be delivered into the old sink.
## Backpressure behavior
If you send too much data, the pipeline will communicate backpressure by returning a 429 response to HTTP requests, or throwing an error if using the Workers API. Refer to the [limits](/pipelines/platform/limits) to learn how much volume a single pipeline can support. You might see 429 responses if you are sending too many requests or sending too much data.
If you are consistently seeing backpressure from your pipeline, consider the following strategies:
* Increase the [shard count](/pipelines/build-with-pipelines/shards) to increase the maximum throughput of your pipeline.
* Send data to a second pipeline if you receive an error. You can set up multiple pipelines to write to the same R2 bucket.
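A sketch of the second strategy: create an additional pipeline that writes to the same bucket, and fall back to it when the primary returns an error. The backup pipeline name is illustrative.

```
# Sketch: a second pipeline delivering to the same R2 bucket, used as a fallback.
# The --r2-bucket flag name is an assumption.
npx wrangler pipelines create [PIPELINE-NAME]-backup --r2-bucket [BUCKET-NAME]
```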
Cloudflare Pipelines allows you to ingest high volumes of real-time streaming data and load it into [R2 Object Storage](/r2/), without managing any infrastructure.
By following this guide, you will:
1. Set up an R2 bucket.
2. Create a pipeline with HTTP as a source and an R2 bucket as a sink.
3. Send data to your pipeline's HTTP ingestion endpoint.
4. Verify the output delivered to R2.
After running this command, you will be prompted to authorize Cloudflare Workers Pipelines to create an R2 API token on your behalf. These tokens are used by your pipeline when loading data into your bucket. You can approve the request through the browser link, which will open automatically.
If you prefer not to authenticate this way, you can pass your [R2 API Token](/r2/api/tokens/) to Wrangler instead.
When choosing a name for your pipeline:
- Ensure it is descriptive and relevant to the type of events you intend to ingest. You cannot change the name of the pipeline after creating it.
- The pipeline name must be between 1 and 63 characters long.
- The name cannot contain special characters outside dashes (`-`).
- The name must start and end with a letter or a number.
You will notice two optional flags are set while creating the pipeline: `--batch-max-seconds` and `--compression`. These flags are added to make it faster for you to see the output of your first pipeline. For production use cases, we recommend keeping the default settings.
Once you create your pipeline, you will receive an HTTP endpoint to which you can post data.
Open the [R2 dashboard](https://dash.cloudflare.com/?to=/:account/r2/overview) and select your bucket to verify that output files have been delivered.
## Next steps
* Learn how to [set up authentication or CORS settings](/pipelines/build-with-pipelines/http) on your HTTP endpoint.
* Send data to your pipeline from a Cloudflare Worker by following the [Workers API documentation](/pipelines/build-with-pipelines/workers-apis).
If you have any feature requests or notice any bugs, share your feedback directly with the Cloudflare team by joining the [Cloudflare Developers community on Discord](https://discord.cloudflare.com).
`src/content/docs/pipelines/observability/metrics.mdx`
## Query via the GraphQL API
You can programmatically query analytics for your pipelines via the [GraphQL Analytics API](/analytics/graphql-api/). This API queries the same datasets as the Cloudflare dashboard and supports GraphQL [introspection](/analytics/graphql-api/features/discovery/introspection/).
Pipelines GraphQL datasets require an `accountTag` filter with your Cloudflare account ID.
### Measure total bytes & records ingested over a time period
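A sketch of such a query via `curl` against the GraphQL Analytics API. The dataset and field names below are assumptions; use introspection to confirm the actual schema.

```
# Sketch: query ingested volume for an account over a time window.
# Dataset and field names (pipelinesIngestionAdaptiveGroups, ingestedBytes, ingestedRecords) are assumptions.
curl https://api.cloudflare.com/client/v4/graphql \
  -H "Authorization: Bearer <API_TOKEN>" \
  -H "Content-Type: application/json" \
  --data '{"query":"{ viewer { accounts(filter: {accountTag: \"<ACCOUNT_ID>\"}) { pipelinesIngestionAdaptiveGroups(limit: 100, filter: {datetime_geq: \"2025-01-01T00:00:00Z\", datetime_lt: \"2025-01-08T00:00:00Z\"}) { sum { ingestedBytes ingestedRecords } } } } }"}'
```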