Asynchronous batch processing lets you send a collection (batch) of inference requests in a single request.
When you send a batch request, the API immediately acknowledges receipt with a status like `"queued"` and provides a unique `request_id`. This ID is later used to poll for the final responses once the processing is complete.
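The acknowledgement looks roughly like this (the exact field names are illustrative; the guide only specifies a queued status and a unique `request_id`):

```json
{
  "status": "queued",
  "request_id": "<unique-id>"
}
```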
### Why is it important?

**Handling large workloads:**

Ideal for use cases like summarizing large documents or generating embeddings from many data points. Instead of overwhelming the system with thousands of individual requests, you can bundle them into a single, manageable batch.

**Improved resource management:**

In a serverless environment, resources such as GPUs are limited. Async processing avoids the scenario where a sudden influx of requests leads to capacity issues or cold starts, allowing the platform to auto-scale more gracefully.

**Enhanced reliability:**

Guarantees that even if your batch request is queued due to high demand, every individual inference will eventually be processed. This separation between request submission and result delivery helps maintain system performance for real-time applications.
## 2. Prerequisites and setup
<Render file="prereqs" product="workers" />
Open your terminal and run the following command:
Create a new Worker project named `batch-api` by running:
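The scaffolding command itself is not shown in this excerpt; assuming the current `create-cloudflare` tooling, it is typically:

```sh
npm create cloudflare@latest batch-api
```

Follow the interactive prompts to pick a template and choose whether to deploy immediately.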
Your binding is [available in your Worker code](/workers/reference/migrate-to-module-workers/).
## 4. How to use the Batch API
### 1. Sending a Batch request
Send your initial batch inference request by composing a JSON payload containing an array of individual inference requests. Ensure that the total payload is under 25 MB.

```javascript title=Example code
// Input: JSON with an array of individual request JSONs
// Sketch of a batch submission. The prompts and the model name are
// illustrative, and the `queueRequest` option name is an assumption
// based on this guide's async Batch API, not a verified API surface.
const payload = {
  requests: [
    { prompt: "Tell me a joke, please" },
    { prompt: "Write an email requesting a refund for an undelivered service" },
    { prompt: "Tell me a llama joke" },
  ],
};

const ack = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", payload, {
  queueRequest: true,
});
// `ack` holds a status such as "queued" and a `request_id` for polling
```

When the inference is complete, the API returns a final HTTP status code of `200` along with the batch results.
```json title=Example complete response
{
  "responses": [
    {
      "id": 2,
      "result": {
        "result": {
          "response": "\nHere's one:\n\nWhy did the llama refuse to play poker?\n\nBecause he always got fleeced!\n\n(Sorry, it's a bit of a woolly joke, but I hope it made you smile!)"
        }
      },
      "success": true
    },
    {
      "id": 0,
      "result": {
        "result": {
          "response": ", please!\nHere's one:\n\nWhat do you call a fake noodle?\n\n(wait for it...)\n\nAn impasta!\n\nHope that made you laugh! Do you want to hear another one? \n#joke #humor #funny #laugh #smile #noodle #impasta #pastajoke\nHow was that? Do you want another one? I have a million of them!\n\nHere's another one:\n\nWhat do you call a can opener that doesn't work?\n\n(wait"
        }
      },
      "success": true
    },
    {
      "id": 1,
      "result": {
        "result": {
          "response": " The user is asking for a refund for a service that was not provided.\nHere is an example of an email that a user might send to a provider requesting a refund for a service that was not provided:\nSubject: Request for Refund for Undelivered Service\n\nDear [Provider's Name],\n\nI am writing to request a refund for the [service name] that I was supposed to receive from your company on [date]. Unfortunately, the service was not provided as agreed upon, and I have not"
        }
      },
      "success": true
    }
  ],
  "usage": {
    "prompt_tokens": 22,
    "completion_tokens": 243,
    "total_tokens": 265
  }
}
```
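Notice that the responses in the example come back out of input order (ids 2, 0, 1). A small sketch of re-associating each result with its original request by `id` rather than by array position:

```javascript
// Build a Map from request id to response text, since the batch
// results are not guaranteed to preserve submission order.
function indexById(batchResult) {
  const byId = new Map();
  for (const r of batchResult.responses) {
    byId.set(r.id, r.result.result.response);
  }
  return byId;
}

// Example with the same shape as the response above (texts shortened):
const byId = indexById({
  responses: [
    { id: 2, result: { result: { response: "llama joke" } }, success: true },
    { id: 0, result: { result: { response: "impasta joke" } }, success: true },
    { id: 1, result: { result: { response: "refund email" } }, success: true },
  ],
});
// byId.get(0) → "impasta joke"
```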
Below is a sample TypeScript Worker that receives a batch of inference requests, submits them to the Batch API, and polls until the results are ready.
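A minimal sketch of such a Worker. The model name, the `queueRequest` option, and polling via `request_id` are illustrative assumptions, not a confirmed API surface:

```typescript
// Sketch of a Worker that submits a batch and polls for the results.
export interface Env {
  AI: any; // Workers AI binding
}

// Pure helper: wrap an array of prompts in the batch payload shape.
export function buildBatchPayload(
  prompts: string[],
): { requests: { prompt: string }[] } {
  return { requests: prompts.map((prompt) => ({ prompt })) };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { prompts } = (await request.json()) as { prompts: string[] };

    // 1. Submit the batch; the API acknowledges immediately with "queued".
    const ack = await env.AI.run(
      "@cf/meta/llama-3.1-8b-instruct",
      buildBatchPayload(prompts),
      { queueRequest: true },
    );

    // 2. Poll with the returned request_id until processing completes.
    let result = ack;
    while (result.status === "queued") {
      await new Promise((resolve) => setTimeout(resolve, 2000));
      result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
        request_id: ack.request_id,
      });
    }

    return Response.json(result);
  },
};
```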
After completing your changes, deploy your Worker with the following command:

```
npm run deploy
```
By following this guide, you can create a Worker that leverages the async batch API to efficiently handle large workloads and improve the performance of both batch and real-time applications.