Commit 100e7cc

daisyfaithauma, kodster28, and thatsKevinJain authored and committed
[Workers AI] Batch-api (#21413)
Squashed commit history: CLI, initial documentation, supported models, REST API, curl and TypeScript fixes, file restructure, added template, beta badge, and final review updates.

Co-authored-by: Kody Jackson <[email protected]>
Co-authored-by: Kevin Jain <[email protected]>
1 parent 704518e commit 100e7cc

File tree

3 files changed

+249
-0
lines changed

Lines changed: 41 additions & 0 deletions

---
pcx_content_type: configuration
title: Asynchronous Batch API
sidebar:
  order: 1
  group:
    badge: Beta
---

import { Render, PackageManagers, WranglerConfig, CURL } from "~/components";

Asynchronous batch processing lets you send a collection (batch) of inference requests in a single call. Instead of expecting immediate responses for every request, the system queues them for processing and returns the results later.

Batch processing is useful for large workloads such as summarization or embeddings when there is no human interaction. Using the Batch API guarantees that your requests are eventually fulfilled, rather than erroring out when Cloudflare does not have enough capacity at a given time.

When you send a batch request, the API immediately acknowledges receipt with a status such as `queued` and provides a unique `request_id`. This ID is later used to poll for the final responses once processing is complete.

You can use the Batch API by creating and deploying a Cloudflare Worker that leverages the [Batch API with the AI binding](/workers-ai/features/batch-api/workers-binding/), by using the [REST API](/workers-ai/features/batch-api/rest-api/) directly, or by starting from a [template](https://github.com/craigsdennis/batch-please-workers-ai).

:::note[Note]

Ensure that the total payload is under 10 MB.

:::
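The 10 MB cap can be checked on the client before enqueueing. A minimal sketch, assuming you assemble the JSON payload yourself; the helper name and byte-counting approach are illustrative, not part of the Workers AI API:

```typescript
// Illustrative helper (not part of the Workers AI API): serialize the
// payload and compare its byte length against the 10 MB batch limit.
const MAX_BATCH_BYTES = 10 * 1024 * 1024;

function isUnderBatchLimit(payload: unknown): boolean {
  const bytes = new TextEncoder().encode(JSON.stringify(payload)).length;
  return bytes <= MAX_BATCH_BYTES;
}
```

Running this guard before the request avoids a round trip that would be rejected for size.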

## Demo application

If you want to get started quickly, click the button below:

[![Deploy to Workers](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/craigsdennis/batch-please-workers-ai)

This will create a repository in your GitHub account and deploy a ready-to-use Worker that demonstrates how to use Cloudflare's Asynchronous Batch API. The template includes preconfigured AI bindings and examples for sending and retrieving batch requests, with and without external references. Once deployed, you can visit the live Worker and start experimenting with the Batch API immediately.

## Supported Models

- [@cf/meta/llama-3.3-70b-instruct-fp8-fast](/workers-ai/models/llama-3.3-70b-instruct-fp8-fast/)
- [@cf/baai/bge-small-en-v1.5](/workers-ai/models/bge-small-en-v1.5/)
- [@cf/baai/bge-base-en-v1.5](/workers-ai/models/bge-base-en-v1.5/)
- [@cf/baai/bge-large-en-v1.5](/workers-ai/models/bge-large-en-v1.5/)
- [@cf/baai/bge-m3](/workers-ai/models/bge-m3/)
- [@cf/meta/m2m100-1.2b](/workers-ai/models/m2m100-1.2b/)
Lines changed: 90 additions & 0 deletions

---
pcx_content_type: how-to
title: REST API
sidebar:
  order: 4
---

If you prefer to work directly with the REST API instead of a [Cloudflare Worker](/workers-ai/features/batch-api/workers-binding/), follow these steps:

## 1. Sending a Batch Request

Make a POST request using the following pattern. You can pass `external_reference` as a unique per-prompt ID that will be returned in the response.

```bash title="Sending a batch request" {11,15,19}
curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/ai/run/@cf/baai/bge-m3?queueRequest=true" \
  --header "Authorization: Bearer $API_TOKEN" \
  --header 'Content-Type: application/json' \
  --json '{
    "requests": [
      {
        "query": "This is a story about Cloudflare",
        "contexts": [
          {
            "text": "This is a story about an orange cloud",
            "external_reference": "story1"
          },
          {
            "text": "This is a story about a llama",
            "external_reference": "story2"
          },
          {
            "text": "This is a story about a hugging emoji",
            "external_reference": "story3"
          }
        ]
      }
    ]
  }'
```

```json output {4}
{
  "result": {
    "status": "queued",
    "request_id": "768f15b7-4fd6-4498-906e-ad94ffc7f8d2",
    "model": "@cf/baai/bge-m3"
  },
  "success": true,
  "errors": [],
  "messages": []
}
```
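The same request can be issued from any HTTP client. A hedged TypeScript sketch using `fetch`, mirroring the curl example; `accountId` and `apiToken` are placeholders you must supply, and `buildBatchBody` is an illustrative helper, not part of the API:

```typescript
// Illustrative helper: assemble the batch payload shape shown above.
function buildBatchBody(
  query: string,
  contexts: { text: string; external_reference?: string }[],
) {
  return { requests: [{ query, contexts }] };
}

// Sketch: send the batch and return the request_id needed for polling.
// The endpoint and ?queueRequest=true flag mirror the curl example.
async function sendBatch(accountId: string, apiToken: string): Promise<string> {
  const url = `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/@cf/baai/bge-m3?queueRequest=true`;
  const res = await fetch(url, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json" },
    body: JSON.stringify(
      buildBatchBody("This is a story about Cloudflare", [
        { text: "This is a story about an orange cloud", external_reference: "story1" },
        { text: "This is a story about a llama", external_reference: "story2" },
      ]),
    ),
  });
  const { result } = (await res.json()) as { result: { request_id: string } };
  return result.request_id; // keep this to retrieve results later
}
```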

## 2. Retrieving the Batch Response

After receiving a `request_id` from your initial POST, you can poll for or retrieve the results with another POST request:

```bash title="Retrieving a response"
curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/ai/run/@cf/baai/bge-m3?queueRequest=true" \
  --header "Authorization: Bearer $API_TOKEN" \
  --header 'Content-Type: application/json' \
  --json '{
    "request_id": "<uuid>"
  }'
```

```json output
{
  "result": {
    "responses": [
      {
        "id": 0,
        "result": {
          "response": [
            { "id": 0, "score": 0.73974609375 },
            { "id": 1, "score": 0.642578125 },
            { "id": 2, "score": 0.6220703125 }
          ]
        },
        "success": true,
        "external_reference": null
      }
    ],
    "usage": { "prompt_tokens": 12, "completion_tokens": 0, "total_tokens": 12 }
  },
  "success": true,
  "errors": [],
  "messages": []
}
```
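In practice you would poll until the `responses` array appears. A sketch under stated assumptions: the 2-second interval and the completion check are choices made here for illustration; only the `status`/`responses` fields come from the documented payloads:

```typescript
interface BatchResult {
  status?: string;       // "queued" while the batch is still processing
  responses?: unknown[]; // present once inference is complete
}

// Completion test used by the loop below: results are ready once the
// payload carries a responses array instead of a queued status.
function isComplete(result: BatchResult): boolean {
  return Array.isArray(result.responses);
}

// Sketch: repeatedly POST the request_id until the batch completes.
async function waitForBatch(
  accountId: string,
  apiToken: string,
  requestId: string,
): Promise<BatchResult> {
  const url = `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/@cf/baai/bge-m3?queueRequest=true`;
  for (;;) {
    const res = await fetch(url, {
      method: "POST",
      headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json" },
      body: JSON.stringify({ request_id: requestId }),
    });
    const { result } = (await res.json()) as { result: BatchResult };
    if (isComplete(result)) return result;
    await new Promise((r) => setTimeout(r, 2000)); // back off before retrying
  }
}
```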
Lines changed: 118 additions & 0 deletions

---
pcx_content_type: how-to
title: Workers Binding
sidebar:
  order: 2
---

import {
  Render,
  PackageManagers,
  TypeScriptExample,
  WranglerConfig,
  CURL,
} from "~/components";

You can use Workers Bindings to interact with the Batch API.

## Send a Batch request

Send your initial batch inference request by composing a JSON payload containing an array of individual inference requests and the `queueRequest: true` property (which is what controls queueing behavior).

:::note[Note]

Ensure that the total payload is under 10 MB.

:::

```ts {26} title="src/index.ts"
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const embeddings = await env.AI.run(
      "@cf/baai/bge-m3",
      {
        requests: [
          {
            query: "This is a story about Cloudflare",
            contexts: [
              {
                text: "This is a story about an orange cloud",
              },
              {
                text: "This is a story about a llama",
              },
              {
                text: "This is a story about a hugging emoji",
              },
            ],
          },
        ],
      },
      { queueRequest: true },
    );

    return Response.json(embeddings);
  },
} satisfies ExportedHandler<Env>;
```

```json output {4}
{
  "status": "queued",
  "model": "@cf/baai/bge-m3",
  "request_id": "000-000-000"
}
```

You will get a response with the following values:

- **`status`**: Indicates that your request is queued.
- **`request_id`**: A unique identifier for the batch request.
- **`model`**: The model used for the batch inference.

Of these, the `request_id` is important for when you need to [poll the batch status](#poll-batch-status).

### Poll batch status

Once your batch request is queued, use the `request_id` to poll for its status. During processing, the API returns a status of `queued` or `running`, indicating that the request is still in the queue or being processed.

```typescript title="src/index.ts"
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const status = await env.AI.run("@cf/baai/bge-m3", {
      request_id: "000-000-000",
    });

    return Response.json(status);
  },
} satisfies ExportedHandler<Env>;
```

```json output
{
  "responses": [
    {
      "id": 0,
      "result": {
        "response": [
          { "id": 0, "score": 0.73974609375 },
          { "id": 1, "score": 0.642578125 },
          { "id": 2, "score": 0.6220703125 }
        ]
      },
      "success": true,
      "external_reference": null
    }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 0, "total_tokens": 12 }
}
```

When the inference is complete, the API returns a final HTTP status code of `200` along with an array of responses. Each response object corresponds to an individual input prompt, identified by an `id` that maps to the index of the prompt in your original request.
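Because each response carries an `id` (the prompt's index) and, when you supplied one, an `external_reference`, you can reconnect results to your own records after polling. A small illustrative helper, not part of the binding, assuming the response shape shown above:

```typescript
interface BatchResponse {
  id: number;
  success: boolean;
  external_reference: string | null;
  result: unknown;
}

// Illustrative helper: index each response by the external_reference you
// attached when sending the batch, falling back to the numeric id.
function indexResponses(responses: BatchResponse[]): Map<string, BatchResponse> {
  return new Map(responses.map((r) => [r.external_reference ?? String(r.id), r]));
}
```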
