Commit 5cebaeb

removed the why
1 parent 2e5f403 commit 5cebaeb

1 file changed: +109 −120 lines changed

src/content/docs/workers-ai/features/async-batch-api.mdx

Lines changed: 109 additions & 120 deletions
@@ -15,17 +15,6 @@ Asynchronous batch processing lets you send a collection (batch) of inference re
 
 When you send a batch request, the API immediately acknowledges receipt with a status like `"queued"` and provides a unique `request_id`. This ID is later used to poll for the final responses once the processing is complete.
 
-### Why is it important?
-
-- **Handling large workloads:**
-  Ideal for use cases like summarizing large documents or generating embeddings from many data points. Instead of overwhelming the system with thousands of individual requests, you can bundle them into a single, manageable batch.
-
-- **Improved resource management:**
-  In a serverless environment, resources such as GPUs are limited. Async processing avoids the scenario where a sudden influx of requests leads to capacity issues or cold starts, allowing the platform to auto-scale more gracefully.
-
-- **Enhanced reliability:**
-  Guarantees that even if your batch request is queued due to high demand, every individual inference will eventually be processed. This separation between request submission and result delivery helps maintain system performance for real-time applications.
-
 ## 2. Prerequisites and setup
 
 <Render file="prereqs" product="workers" />
@@ -36,11 +25,7 @@ Open your terminal and run the following command:
 
 Create a new Worker project named `batch-api` by running:
 
-<PackageManagers
-  type="create"
-  pkg="cloudflare@latest"
-  args={"batch-api"}
-/>
+<PackageManagers type="create" pkg="cloudflare@latest" args={"batch-api"} />
 
 <Render
   file="c3-post-run-steps"
@@ -82,31 +67,30 @@ Your binding is [available in your Worker code](/workers/reference/migrate-to-mo
 
 ## 4. How to use the Batch API
 
-### 1. Sending a Batch request
+### 1. Sending a Batch request
 
 Send your initial batch inference request by composing a JSON payload containing an array of individual inference requests. Ensure that the total payload is under 25 MB.
 
-
 ```javascript title=Example code
 // Input: JSON with an array of individual request JSONs
 const resp = env.AI.run("@cf/meta/llama-3.3-70b-instruct-batch", {
-  "prompts": [
-    {
-      "prompt": "tell me a joke",
-      "temperature": 0.5,
-      "max_tokens": 100
-    },
-    {
-      "prompt": "write an email from user to provider.",
-      "temperature": 0.6,
-      "max_tokens": 101
-    },
-    {
-      "prompt": "tell me a joke about llamas",
-      "temperature": 0.7,
-      "max_tokens": 102
-    }
-  ]
+  prompts: [
+    {
+      prompt: "tell me a joke",
+      temperature: 0.5,
+      max_tokens: 100,
+    },
+    {
+      prompt: "write an email from user to provider.",
+      temperature: 0.6,
+      max_tokens: 101,
+    },
+    {
+      prompt: "tell me a joke about llamas",
+      temperature: 0.7,
+      max_tokens: 102,
+    },
+  ],
 });
 ```
 
@@ -116,9 +100,9 @@ After sending your batch request, you will receive a response similar to:
 
 ```json
 {
-  "status": "queued",
-  "request_id": "000-000-000",
-  "model": "@cf/meta/llama-3.3-70b-instruct-batch"
+  "status": "queued",
+  "request_id": "000-000-000",
+  "model": "@cf/meta/llama-3.3-70b-instruct-batch"
 }
 ```
 
@@ -132,15 +116,17 @@ Once your batch request is queued, use the `request_id` to poll for its status.
 
 ```javascript title=example
 // Polling the status of the batch request using the request_id
-const status = env.AI.run("@cf/meta/llama-3.3-70b-instruct-batch", { "request_id": "000-000-000" });
+const status = env.AI.run("@cf/meta/llama-3.3-70b-instruct-batch", {
+  request_id: "000-000-000",
+});
 ```
 
 #### Expected polling response (while queued)
 
 ```json
 {
-  "status": "queued",
-  "request_id": "000-000-000"
+  "status": "queued",
+  "request_id": "000-000-000"
 }
 ```
 
@@ -150,40 +136,40 @@ When the inference is complete, the API returns a final HTTP status code of `200
 
 ```json title=Example complete response
 {
-  "responses": [
-    {
-      "id": 2,
-      "result": {
-        "result": {
-          "response": "\nHere's one:\n\nWhy did the llama refuse to play poker?\n\nBecause he always got fleeced!\n\n(Sorry, it's a bit of a woolly joke, but I hope it made you smile!)"
-        }
-      },
-      "success": true
-    },
-    {
-      "id": 0,
-      "result": {
-        "result": {
-          "response": ", please!\nHere's one:\n\nWhat do you call a fake noodle?\n\n(wait for it...)\n\nAn impasta!\n\nHope that made you laugh! Do you want to hear another one? \n#joke #humor #funny #laugh #smile #noodle #impasta #pastajoke\nHow was that? Do you want another one? I have a million of them!\n\nHere's another one:\n\nWhat do you call a can opener that doesn't work?\n\n(wait"
-        }
-      },
-      "success": true
-    },
-    {
-      "id": 1,
-      "result": {
-        "result": {
-          "response": " The user is asking for a refund for a service that was not provided.\nHere is an example of an email that a user might send to a provider requesting a refund for a service that was not provided:\nSubject: Request for Refund for Undelivered Service\n\nDear [Provider's Name],\n\nI am writing to request a refund for the [service name] that I was supposed to receive from your company on [date]. Unfortunately, the service was not provided as agreed upon, and I have not"
-        }
-      },
-      "success": true
-    }
-  ],
-  "usage": {
-    "prompt_tokens": 22,
-    "completion_tokens": 243,
-    "total_tokens": 265
-  }
+  "responses": [
+    {
+      "id": 2,
+      "result": {
+        "result": {
+          "response": "\nHere's one:\n\nWhy did the llama refuse to play poker?\n\nBecause he always got fleeced!\n\n(Sorry, it's a bit of a woolly joke, but I hope it made you smile!)"
+        }
+      },
+      "success": true
+    },
+    {
+      "id": 0,
+      "result": {
+        "result": {
+          "response": ", please!\nHere's one:\n\nWhat do you call a fake noodle?\n\n(wait for it...)\n\nAn impasta!\n\nHope that made you laugh! Do you want to hear another one? \n#joke #humor #funny #laugh #smile #noodle #impasta #pastajoke\nHow was that? Do you want another one? I have a million of them!\n\nHere's another one:\n\nWhat do you call a can opener that doesn't work?\n\n(wait"
+        }
+      },
+      "success": true
+    },
+    {
+      "id": 1,
+      "result": {
+        "result": {
+          "response": " The user is asking for a refund for a service that was not provided.\nHere is an example of an email that a user might send to a provider requesting a refund for a service that was not provided:\nSubject: Request for Refund for Undelivered Service\n\nDear [Provider's Name],\n\nI am writing to request a refund for the [service name] that I was supposed to receive from your company on [date]. Unfortunately, the service was not provided as agreed upon, and I have not"
+        }
+      },
+      "success": true
+    }
+  ],
+  "usage": {
+    "prompt_tokens": 22,
+    "completion_tokens": 243,
+    "total_tokens": 265
+  }
 }
 ```
 
@@ -199,53 +185,56 @@ Below is a sample TypeScript Worker that receives a batch of inference requests,
 
 ```ts
 export interface Env {
-  AI: {
-    run: (model: string, payload: any) => Promise<any>;
-  };
+  AI: {
+    run: (model: string, payload: any) => Promise<any>;
+  };
 }
 
 export default {
-  async fetch(request: Request, env: Env): Promise<Response> {
-    // Only allow POST requests
-    if (request.method !== "POST") {
-      return new Response("Method Not Allowed", { status: 405 });
-    }
-
-    try {
-      // Parse the incoming JSON payload
-      const data = await request.json();
-
-      // Validate that we have a 'prompts' array in the payload
-      if (!data.prompts || !Array.isArray(data.prompts)) {
-        return new Response(
-          JSON.stringify({
-            error: "Missing or invalid 'prompts' array in request payload."
-          }),
-          { status: 400, headers: { "Content-Type": "application/json" } }
-        );
-      }
-
-      // Send the batch request to the AI model via the AI binding
-      // Replace "@cf/meta/llama-3.3-70b-instruct-batch" with your desired batch-enabled model if needed.
-      const batchResponse = await env.AI.run("@cf/meta/llama-3.3-70b-instruct-batch", {
-        prompts: data.prompts,
-      });
-
-      // Return the response from the AI API
-      return new Response(JSON.stringify(batchResponse), {
-        status: 200,
-        headers: { "Content-Type": "application/json" },
-      });
-    } catch (error: any) {
-      // Log the error if needed and return a 500 response
-      return new Response(
-        JSON.stringify({
-          error: error?.toString() || "An unknown error occurred."
-        }),
-        { status: 500, headers: { "Content-Type": "application/json" } }
-      );
-    }
-  },
+  async fetch(request: Request, env: Env): Promise<Response> {
+    // Only allow POST requests
+    if (request.method !== "POST") {
+      return new Response("Method Not Allowed", { status: 405 });
+    }
+
+    try {
+      // Parse the incoming JSON payload
+      const data = await request.json();
+
+      // Validate that we have a 'prompts' array in the payload
+      if (!data.prompts || !Array.isArray(data.prompts)) {
+        return new Response(
+          JSON.stringify({
+            error: "Missing or invalid 'prompts' array in request payload.",
+          }),
+          { status: 400, headers: { "Content-Type": "application/json" } },
+        );
+      }
+
+      // Send the batch request to the AI model via the AI binding
+      // Replace "@cf/meta/llama-3.3-70b-instruct-batch" with your desired batch-enabled model if needed.
+      const batchResponse = await env.AI.run(
+        "@cf/meta/llama-3.3-70b-instruct-batch",
+        {
+          prompts: data.prompts,
+        },
+      );
+
+      // Return the response from the AI API
+      return new Response(JSON.stringify(batchResponse), {
+        status: 200,
+        headers: { "Content-Type": "application/json" },
+      });
+    } catch (error: any) {
+      // Log the error if needed and return a 500 response
+      return new Response(
+        JSON.stringify({
+          error: error?.toString() || "An unknown error occurred.",
+        }),
+        { status: 500, headers: { "Content-Type": "application/json" } },
+      );
+    }
+  },
 };
 ```
 
@@ -268,4 +257,4 @@ After completing your changes, deploy your Worker with the following command:
 npm run deploy
 ```
 
-By following this guide, you can create a Worker that leverages the async batch API to efficiently handle large workloads and improve the performance of both batch and real-time applications.
+By following this guide, you can create a Worker that leverages the async batch API to efficiently handle large workloads and improve the performance of both batch and real-time applications.
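
For reference, the snippets in the page being edited can be combined into a single submit-then-poll flow. The sketch below is illustrative only: the `env.AI.run` calls, the `Env` interface, and the `@cf/meta/llama-3.3-70b-instruct-batch` model name come from the doc in this diff, while the `runBatchAndWait` helper name, the check for a `responses` field, and the 5-second polling interval with a 20-attempt cap are assumptions added for the example.

```ts
// Minimal sketch: submit a batch, then poll until responses are available.
// Assumption: a queued submission returns { status: "queued", request_id },
// and a completed poll returns an object with `responses` and `usage`,
// as in the example outputs shown on the page.
export interface Env {
  AI: {
    run: (model: string, payload: any) => Promise<any>;
  };
}

const MODEL = "@cf/meta/llama-3.3-70b-instruct-batch";

export async function runBatchAndWait(
  env: Env,
  prompts: object[],
): Promise<any> {
  // 1. Submit the batch; the API acknowledges with a request_id.
  const queued = await env.AI.run(MODEL, { prompts });

  // 2. Poll with the request_id until the final result appears.
  //    The interval and retry cap below are illustrative, not documented values.
  for (let attempt = 0; attempt < 20; attempt++) {
    const status = await env.AI.run(MODEL, { request_id: queued.request_id });
    if (status.responses) {
      return status; // completed: contains `responses` and `usage`
    }
    await new Promise((resolve) => setTimeout(resolve, 5000));
  }
  throw new Error(`Batch ${queued.request_id} did not complete in time.`);
}
```

A Worker route could call `runBatchAndWait(env, data.prompts)` in place of the single `env.AI.run` call in the sample Worker above, at the cost of holding the request open while polling.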
