Commit b85bae9
edits

1 parent 5ea7532 commit b85bae9

3 files changed: +34 −37 lines

src/content/docs/workers-ai/features/batch-api/batch-api-rest-api.mdx

Lines changed: 24 additions & 24 deletions

````diff
@@ -7,51 +7,51 @@ sidebar:
 
 import { Render, PackageManagers, WranglerConfig, CURL } from "~/components";
 
-If you prefer to work directly with the REST API instead of a Cloudflare Worker, below are the steps on how to do it:
+If you prefer to work directly with the REST API instead of a [Cloudflare Worker](/workers-ai/features/batch-api/get-started/), follow the steps below:
 
 ## 1. Sending a Batch Request
 
 Make a POST request to the following endpoint:
 
 <CURL
-  url="https://api.cloudflare.com/client/v4/accounts/<account-id>/ai/run/@cf/meta/llama-3.3-70b-instruct-fp8-fast?queueRequest=true"
+  url="https://api.cloudflare.com/client/v4/accounts/&lt;account-id&gt;/ai/run/@cf/meta/llama-3.3-70b-instruct-fp8-fast?queueRequest=true"
   method="POST"
   headers={{
     Authorization: "<token>",
     "Content-Type": "application/json",
   }}
+  json={{
+    requests: [
+      {
+        prompt: "Tell me a story",
+        external_reference: "reference2",
+      },
+      {
+        prompt: "Tell me a joke",
+        external_reference: "reference1",
+      },
+    ],
+  }}
+  code={{
+    mark: "external_reference",
+  }}
 />
 
-```json output
-{
-  "requests": [
-    {
-      "prompt": "Tell me a story",
-      "external_reference": "reference2"
-    },
-    {
-      "prompt": "Tell me a joke",
-      "external_reference": "reference1"
-    }
-  ]
-}
-```
-
 ## 2. Retrieving the Batch Response
 
 After receiving a `request_id` from your initial POST, you can poll for or retrieve the results with another POST request:
 
 <CURL
-  url="https://api.cloudflare.com/client/v4/accounts/<account-id>/ai/run/@cf/meta/llama-3.3-70b-instruct-fp8-fast"
+  url="https://api.cloudflare.com/client/v4/accounts/&lt;account-id&gt;/ai/run/@cf/meta/llama-3.3-70b-instruct-fp8-fast?queueRequest=true"
   method="POST"
   headers={{
     Authorization: "<token>",
     "Content-Type": "application/json",
   }}
+  json={{
+    request_id: "<uuid>",
+  }}
+  code={{
+    mark: "request_id",
+  }}
 />
-
-```json output
-{
-  "request_id": "<uuid>"
-}
-```
````
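Taken together, the two REST calls in this file amount to a queue-then-poll flow. Below is a minimal TypeScript sketch of how a client might construct those requests; the `buildQueueRequest`/`buildPollRequest` helper names are illustrative, `ACCOUNT_ID` and `API_TOKEN` are placeholders, and the `Bearer` prefix on the `Authorization` header is an assumption about the token type, not something the diff specifies:

```typescript
// Illustrative sketch (not part of the documented API) of the two REST
// calls described above. ACCOUNT_ID and API_TOKEN are placeholders.
const ACCOUNT_ID = "<account-id>";
const API_TOKEN = "<token>";
const MODEL = "@cf/meta/llama-3.3-70b-instruct-fp8-fast";
const BASE = `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/run/${MODEL}`;

interface BatchPrompt {
  prompt: string;
  external_reference?: string; // echoed back so responses can be matched to inputs
}

// Step 1: queue a batch of prompts (?queueRequest=true marks the call as asynchronous).
function buildQueueRequest(requests: BatchPrompt[]) {
  return {
    url: `${BASE}?queueRequest=true`,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${API_TOKEN}`, "Content-Type": "application/json" },
      body: JSON.stringify({ requests }),
    },
  };
}

// Step 2: poll for results with the request_id returned by step 1.
function buildPollRequest(request_id: string) {
  return {
    url: `${BASE}?queueRequest=true`,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${API_TOKEN}`, "Content-Type": "application/json" },
      body: JSON.stringify({ request_id }),
    },
  };
}
```

Each helper returns the two arguments you would pass to `fetch(url, init)`.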

src/content/docs/workers-ai/features/batch-api/get-started.mdx

Lines changed: 8 additions & 11 deletions

````diff
@@ -11,7 +11,7 @@ If you want to skip the steps and get started quickly, click the button below:
 
 [![Deploy to Workers](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/craigsdennis/batch-please-workers-ai)
 
-This will create a repository in your GitHub account and deploy a ready-to-use Worker that demonstrates how to use Cloudflare's Async Batch API. The template includes preconfigured AI bindings, and examples for sending and retrieving batch requests with and without external references. Once deployed, you can visit the live Worker and start experimenting with the Batch API immediately.
+This will create a repository in your GitHub account and deploy a ready-to-use Worker that demonstrates how to use Cloudflare's Asynchronous Batch API. The template includes preconfigured AI bindings and examples for sending and retrieving batch requests with and without external references. Once deployed, you can visit the live Worker and start experimenting with the Batch API immediately.
 
 ## 1. Prerequisites and setup
 
@@ -65,7 +65,7 @@ Your binding is [available in your Worker code](/workers/reference/migrate-to-mo
 
 ## 4. How to use the Batch API
 
-### 1. Sending a Batch request
+### Sending a Batch request
 
 Send your initial batch inference request by composing a JSON payload containing an array of individual inference requests.
 
@@ -107,11 +107,9 @@ const resp = env.AI.run(
 );
 ```
 
-#### Expected Response
-
 After sending your batch request, you will receive a response similar to:
 
-```json
+```json output
 {
   "status": "queued",
   "request_id": "000-000-000",
@@ -123,11 +121,11 @@ After sending your batch request, you will receive a response similar to:
 - **`request_id`**: A unique identifier for the batch request.
 - **`model`**: The model used for the batch inference.
 
-### 2. Polling the Batch Request Status
+### Polling the Batch Request Status
 
 Once your batch request is queued, use the `request_id` to poll for its status. During processing, the API returns a status of `queued` or `running`, indicating that the request is still in the queue or being processed.
 
-```javascript title=example
+```typescript title=example
 // Polling the status of the batch request using the request_id
 const status = env.AI.run("@cf/meta/llama-3.3-70b-instruct-fp8-fast", {
   request_id: "000-000-000",
@@ -141,7 +139,7 @@ const status = env.AI.run("@cf/meta/llama-3.3-70b-instruct-fp8-fast", {
 }
 ```
 
-### 3. Retrieving the Batch Inference results
+### Retrieving the Batch Inference results
 
 When the inference is complete, the API returns a final HTTP status code of `200` along with an array of responses. Each response object corresponds to an individual input prompt, identified by an `id` that maps to the index of the prompt in your original request.
 
@@ -190,7 +188,7 @@ When the inference is complete, the API returns a final HTTP status code of `200
 - **`success`**: A Boolean flag indicating if the request was processed successfully.
 - **`usage`**: Contains token usage details for the batch request.
 
-## 6. Implementing the Batch API in your Worker
+## 5. Implementing the Batch API in your Worker
 
 Below is a sample TypeScript Worker that receives a batch of inference requests, sends them to a batch-enabled AI model, and returns the results.
 
@@ -250,7 +248,6 @@ export default {
 };
 ```
 
-
 - **Receiving the Batch request:**
   The Worker expects a `POST` request with a `JSON` payload containing an array called `requests`. Each prompt is an individual inference request.
 
@@ -260,7 +257,7 @@ export default {
 - **Returning the results:**
   Once processed, the AI API returns the batch responses. These responses include an array where each object has an `id` (matching the prompt index) and the corresponding inference result.
 
-## 7. Deployment
+## 6. Deployment
 
 After completing your changes, deploy your Worker with the following command:
````
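The queued/running lifecycle described in this file lends itself to a small retry helper. The sketch below is not code from the docs: `pollBatch` is a hypothetical wrapper that accepts any async status check (inside a Worker that check would be the `env.AI.run(model, { request_id })` call) and loops until the status leaves `queued`/`running`:

```typescript
// Hypothetical polling helper, assuming the "queued"/"running"/complete
// lifecycle described in the documentation.
type BatchStatus = { status: string; responses?: unknown[] };

async function pollBatch(
  check: () => Promise<BatchStatus>, // in a Worker: () => env.AI.run(model, { request_id })
  delayMs = 1000,
  maxAttempts = 10,
): Promise<BatchStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await check();
    // Anything other than "queued"/"running" is treated as terminal.
    if (result.status !== "queued" && result.status !== "running") return result;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error("batch did not complete within the polling budget");
}
```

The delay and attempt cap are arbitrary illustrative defaults; a production Worker would likely use backoff tuned to its workload.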

src/content/docs/workers-ai/features/batch-api/index.mdx

Lines changed: 2 additions & 2 deletions

````diff
@@ -7,14 +7,14 @@ sidebar:
 
 import { Render, PackageManagers, WranglerConfig, CURL } from "~/components";
 
-This guide will walk you through the concepts behind asynchronous batch processing, explain why it matters, and show you how to create and deploy a Cloudflare Worker that leverages the [Batch API with the AI binding](/workers-ai/features/batch-api/get-started/), working with [REST API](/workers-ai/features/batch-api/batch-api-rest-api/) instead of a Cloudflare Worker and through the a template.
-
 ## What is Asynchronous Batch?
 
 Asynchronous batch processing lets you send a collection (batch) of inference requests in a single call. Instead of expecting immediate responses for every request, the system queues them for processing and returns the results later.
 
 When you send a batch request, the API immediately acknowledges receipt with a status like `queued` and provides a unique `request_id`. This ID is later used to poll for the final responses once the processing is complete.
 
+You can use the Batch API by either creating and deploying a Cloudflare Worker that leverages the [Batch API with the AI binding](/workers-ai/features/batch-api/get-started/), using the [REST API](/workers-ai/features/batch-api/batch-api-rest-api/) directly, or starting from a template.
+
 :::note[Note]
 
 Ensure that the total payload is under 10 MB.
````
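The 10 MB limit in that note can be checked client-side before queueing. This is an illustrative guard, not part of the documented API; only the limit value comes from the docs:

```typescript
// Illustrative pre-flight check against the 10 MB batch payload limit
// noted in the documentation. The helper itself is an assumption.
const MAX_BATCH_BYTES = 10 * 1024 * 1024;

function batchPayloadFits(payload: { requests: { prompt: string }[] }): boolean {
  // Measure the serialized size in bytes, since that is what goes over the wire.
  const bytes = new TextEncoder().encode(JSON.stringify(payload)).length;
  return bytes <= MAX_BATCH_BYTES;
}
```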
