Commit 7712905

file restructure and added template

1 parent 707c17c

3 files changed: +358 -0

Lines changed: 57 additions & 0 deletions

---
pcx_content_type: get-started
title: Using the Batch API via the REST API
sidebar:
  order: 4
---

import { Render, PackageManagers, WranglerConfig, CURL } from "~/components";

If you prefer to work directly with the REST API instead of a Cloudflare Worker, follow the steps below.

## 1. Sending a Batch Request

Make a POST request to the following endpoint, including the `queueRequest=true` query parameter and a JSON body containing the array of individual inference requests:

<CURL
	url="https://api.cloudflare.com/client/v4/accounts/<account-id>/ai/run/@cf/meta/llama-3.3-70b-instruct-fp8-fast?queueRequest=true"
	method="POST"
	headers={{
		Authorization: "<token>",
		"Content-Type": "application/json",
	}}
/>

```json title="Request body"
{
	"requests": [
		{
			"prompt": "Tell me a story",
			"external_reference": "reference2"
		},
		{
			"prompt": "Tell me a joke",
			"external_reference": "reference1"
		}
	]
}
```
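
The API should acknowledge the batch immediately. A sketch of the expected acknowledgement, based on the response format shown in the Workers guide (the `request_id` value is a placeholder):

```json output
{
	"status": "queued",
	"request_id": "<uuid>",
	"model": "@cf/meta/llama-3.3-70b-instruct-fp8-fast"
}
```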

## 2. Retrieving the Batch Response

After receiving a `request_id` from your initial POST, you can poll for or retrieve the results with another POST request to the same endpoint (this time without the `queueRequest` query parameter), passing the `request_id` in the body:

<CURL
	url="https://api.cloudflare.com/client/v4/accounts/<account-id>/ai/run/@cf/meta/llama-3.3-70b-instruct-fp8-fast"
	method="POST"
	headers={{
		Authorization: "<token>",
		"Content-Type": "application/json",
	}}
/>

```json title="Request body"
{
	"request_id": "<uuid>"
}
```
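
While the batch is still processing, this call returns a `queued` or `running` status. Once it completes, you receive the full results. A trimmed sketch of a completed response, based on the example in the Workers guide:

```json output
{
	"responses": [
		{
			"id": 0,
			"result": { "result": { "response": "..." } },
			"success": true
		}
	],
	"usage": {
		"prompt_tokens": 22,
		"completion_tokens": 243,
		"total_tokens": 265
	}
}
```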

Lines changed: 270 additions & 0 deletions

---
pcx_content_type: get-started
title: Using Batch API via Workers
sidebar:
  order: 2
---

import { Render, PackageManagers, WranglerConfig, CURL } from "~/components";

If you want to skip the steps and get started quickly, click the button below:

[![Deploy to Workers](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/craigsdennis/batch-please-workers-ai)

This will create a repository in your GitHub account and deploy a ready-to-use Worker that demonstrates how to use Cloudflare's Async Batch API. The template includes preconfigured AI bindings and examples for sending and retrieving batch requests, with and without external references. Once deployed, you can visit the live Worker and start experimenting with the Batch API immediately.

## 1. Prerequisites and setup

<Render file="prereqs" product="workers" />

## 2. Creating your Cloudflare Worker project

Open your terminal and create a new Worker project named `batch-api` by running:

<PackageManagers type="create" pkg="cloudflare@latest" args={"batch-api"} />

<Render
	file="c3-post-run-steps"
	product="workers"
	params={{
		category: "hello-world",
		type: "Worker only",
		lang: "TypeScript",
	}}
/>

This will create a new `batch-api` directory, which will include:

- A `"Hello World"` [Worker](/workers/get-started/guide/#3-write-code) at `src/index.ts`.
- A [`wrangler.jsonc`](/workers/wrangler/configuration/) configuration file.

Go to your application directory:

```sh
cd batch-api
```

## 3. Configure wrangler

You must create an AI binding for your Worker to connect to Workers AI. [Bindings](/workers/runtime-apis/bindings/) allow your Workers to interact with resources, like Workers AI, on the Cloudflare Developer Platform.

To bind Workers AI to your Worker, add the following to the end of your Wrangler file:

<WranglerConfig>

```toml
[ai]
binding = "AI"
```

</WranglerConfig>

Your binding is [available in your Worker code](/workers/reference/migrate-to-module-workers/#bindings-in-es-modules-format) on [`env.AI`](/workers/runtime-apis/handlers/fetch/).
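
For example, inside your Worker's `fetch` handler (a minimal sketch with a loosely typed `env`; in a real project you would use the generated `Env` interface):

```ts
export default {
	async fetch(request: Request, env: { AI: any }): Promise<Response> {
		// The binding configured under [ai] in wrangler is exposed as env.AI.
		const answer = await env.AI.run("@cf/meta/llama-3.3-70b-instruct-fp8-fast", {
			prompt: "Hello!",
		});
		return Response.json(answer);
	},
};
```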

## 4. How to use the Batch API

### 1. Sending a Batch request

Send your initial batch inference request by composing a JSON payload containing an array of individual inference requests.

:::note[Note]

Ensure that the total payload is under 10 MB.

:::

```typescript title="src/index.ts"
interface AIRequest {
	prompt: string;
	temperature: number;
	max_tokens: number;
}

const resp = await env.AI.run(
	"@cf/meta/llama-3.3-70b-instruct-fp8-fast",
	{
		requests: [
			{
				prompt: "tell me a joke",
				temperature: 0.5,
				max_tokens: 100,
			},
			{
				prompt: "write an email from user to provider.",
				temperature: 0.6,
				max_tokens: 101,
			},
			{
				prompt: "tell me a joke about llamas",
				temperature: 0.7,
				max_tokens: 102,
			},
		] as AIRequest[],
	},
	{ queueRequest: true },
);
```

#### Expected Response

After sending your batch request, you will receive a response similar to:

```json
{
	"status": "queued",
	"request_id": "000-000-000",
	"model": "@cf/meta/llama-3.3-70b-instruct-fp8-fast"
}
```

- **`status`**: Indicates that your request is queued.
- **`request_id`**: A unique identifier for the batch request.
- **`model`**: The model used for the batch inference.

### 2. Polling the Batch Request Status

Once your batch request is queued, use the `request_id` to poll for its status. While the request is still in the queue or being processed, the API returns a status of `queued` or `running`.

```javascript title="example"
// Poll the status of the batch request using the request_id
const status = await env.AI.run("@cf/meta/llama-3.3-70b-instruct-fp8-fast", {
	request_id: "000-000-000",
});
```

```json output
{
	"status": "queued",
	"request_id": "000-000-000"
}
```
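
In practice, you will usually poll on an interval until the status changes. A minimal sketch of that loop (the `pollBatch` helper, the one-second delay, and the loose `env` typing are illustrative assumptions, not a prescribed API):

```ts
// Hypothetical helper: keeps polling until the batch is no longer queued/running.
async function pollBatch(
	env: { AI: { run: (model: string, payload: any) => Promise<any> } },
	requestId: string,
) {
	while (true) {
		const res = await env.AI.run("@cf/meta/llama-3.3-70b-instruct-fp8-fast", {
			request_id: requestId,
		});
		// While processing, the API reports `queued` or `running`.
		if (res.status !== "queued" && res.status !== "running") {
			return res; // completed payload with `responses` and `usage`
		}
		await new Promise((resolve) => setTimeout(resolve, 1000)); // wait before retrying
	}
}
```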

### 3. Retrieving the Batch Inference results

When the inference is complete, the API returns a final HTTP status code of `200` along with an array of responses. Each response object corresponds to an individual input prompt, identified by an `id` that maps to the index of the prompt in your original request.

```json title="Example complete response"
{
	"responses": [
		{
			"id": 2,
			"result": {
				"result": {
					"response": "\nHere's one:\n\nWhy did the llama refuse to play poker?\n\nBecause he always got fleeced!\n\n(Sorry, it's a bit of a woolly joke, but I hope it made you smile!)"
				}
			},
			"success": true
		},
		{
			"id": 0,
			"result": {
				"result": {
					"response": ", please!\nHere's one:\n\nWhat do you call a fake noodle?\n\n(wait for it...)\n\nAn impasta!\n\nHope that made you laugh! Do you want to hear another one? \n#joke #humor #funny #laugh #smile #noodle #impasta #pastajoke\nHow was that? Do you want another one? I have a million of them!\n\nHere's another one:\n\nWhat do you call a can opener that doesn't work?\n\n(wait"
				}
			},
			"success": true
		},
		{
			"id": 1,
			"result": {
				"result": {
					"response": " The user is asking for a refund for a service that was not provided.\nHere is an example of an email that a user might send to a provider requesting a refund for a service that was not provided:\nSubject: Request for Refund for Undelivered Service\n\nDear [Provider's Name],\n\nI am writing to request a refund for the [service name] that I was supposed to receive from your company on [date]. Unfortunately, the service was not provided as agreed upon, and I have not"
				}
			},
			"success": true
		}
	],
	"usage": {
		"prompt_tokens": 22,
		"completion_tokens": 243,
		"total_tokens": 265
	}
}
```

- **`responses`**: An array of response objects. Each object includes:
  - **`id`**: The index of the corresponding prompt.
  - **`result`**: The inference output, which may be nested depending on your implementation.
  - **`success`**: A Boolean flag indicating whether the request was processed successfully.
- **`usage`**: Contains token usage details for the batch request.

Note that responses are not guaranteed to arrive in input order (in the example above, the response with `id: 2` comes first); the sketch below shows one way to restore the original order.
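
A minimal sketch of re-ordering and pairing results with their prompts, assuming the completed payload shape shown above (the `BatchResult` type and `matchResponses` helper are illustrative, not part of the API):

```ts
interface BatchResult {
	responses: {
		id: number;
		result: { result: { response: string } };
		success: boolean;
	}[];
}

// Pair each response with the prompt that produced it, restoring input order.
function matchResponses(prompts: string[], batch: BatchResult) {
	return [...batch.responses]
		.sort((a, b) => a.id - b.id) // ids map to prompt indices
		.map((r) => ({
			prompt: prompts[r.id],
			response: r.result.result.response,
			success: r.success,
		}));
}
```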

## 5. Implementing the Batch API in your Worker

Below is a sample TypeScript Worker that receives a batch of inference requests, sends them to a batch-enabled AI model, and returns the results.

```ts title="src/index.ts"
export interface Env {
	AI: {
		run: (model: string, payload: any, options: any) => Promise<any>;
	};
}

export default {
	async fetch(request: Request, env: Env): Promise<Response> {
		// Only allow POST requests
		if (request.method !== "POST") {
			return new Response("Method Not Allowed", { status: 405 });
		}

		try {
			// Parse the incoming JSON payload
			const data = (await request.json()) as { requests?: unknown };

			// Validate that we have a 'requests' array in the payload
			if (!data.requests || !Array.isArray(data.requests)) {
				return new Response(
					JSON.stringify({
						error: "Missing or invalid 'requests' array in request payload.",
					}),
					{ status: 400, headers: { "Content-Type": "application/json" } },
				);
			}

			// Send the batch request to the AI model via the AI binding
			// Replace "@cf/meta/llama-3.3-70b-instruct-fp8-fast" with your desired batch-enabled model if needed.
			const batchResponse = await env.AI.run(
				"@cf/meta/llama-3.3-70b-instruct-fp8-fast",
				{
					requests: data.requests,
				},
				{ queueRequest: true },
			);

			// Return the response from the AI API
			return new Response(JSON.stringify(batchResponse), {
				status: 200,
				headers: { "Content-Type": "application/json" },
			});
		} catch (error: any) {
			// Log the error if needed and return a 500 response
			return new Response(
				JSON.stringify({
					error: error?.toString() || "An unknown error occurred.",
				}),
				{ status: 500, headers: { "Content-Type": "application/json" } },
			);
		}
	},
};
```

### How it works

- **Receiving the Batch request:**
  The Worker expects a `POST` request with a JSON payload containing an array called `requests`. Each entry in the array is an individual inference request.

- **Processing the request:**
  The code validates the payload and uses the AI binding (`env.AI.run()`) to send the batch request to a designated model (such as `@cf/meta/llama-3.3-70b-instruct-fp8-fast`).

- **Returning the results:**
  Once processed, the AI API returns the batch responses. These responses include an array where each object has an `id` (matching the prompt index) and the corresponding inference result.

## 6. Deployment

After completing your changes, deploy your Worker with the following command:

```sh
npm run deploy
```
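
Once deployed, you can exercise the Worker with a quick client call. A minimal sketch using `fetch` (the URL is a placeholder for your deployed endpoint):

```ts
// Hypothetical client call to the deployed Worker.
const res = await fetch("https://batch-api.<your-subdomain>.workers.dev", {
	method: "POST",
	headers: { "Content-Type": "application/json" },
	body: JSON.stringify({
		requests: [{ prompt: "tell me a joke" }, { prompt: "tell me a story" }],
	}),
});
// Expect a `queued` acknowledgement containing a request_id.
console.log(await res.json());
```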

Lines changed: 31 additions & 0 deletions

---
pcx_content_type: configuration
title: Asynchronous Batch API
sidebar:
  order: 1
---

import { Render, PackageManagers, WranglerConfig, CURL } from "~/components";

This guide walks you through the concepts behind asynchronous batch processing, explains why it matters, and shows you three ways to use it: by creating and deploying a Cloudflare Worker that leverages the [Batch API with the AI binding](/workers-ai/features/batch-api/get-started/), by working with the [REST API](/workers-ai/features/batch-api/batch-api-rest-api/) directly instead of a Worker, or by starting from a template.

## What is Asynchronous Batch?

Asynchronous batch processing lets you send a collection (batch) of inference requests in a single call. Instead of expecting immediate responses for every request, the system queues them for processing and returns the results later.

When you send a batch request, the API immediately acknowledges receipt with a status like `queued` and provides a unique `request_id`. This ID is later used to poll for the final responses once processing is complete.
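
For example, the initial acknowledgement looks like this (format taken from the get-started guide):

```json
{
	"status": "queued",
	"request_id": "000-000-000",
	"model": "@cf/meta/llama-3.3-70b-instruct-fp8-fast"
}
```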

:::note[Note]

Ensure that the total payload is under 10 MB.

:::

## Supported Models

- [@cf/meta/llama-3.3-70b-instruct-fp8-fast](/workers-ai/models/llama-3.3-70b-instruct-fp8-fast/)
- [@cf/baai/bge-small-en-v1.5](/workers-ai/models/bge-small-en-v1.5/)
- [@cf/baai/bge-base-en-v1.5](/workers-ai/models/bge-base-en-v1.5/)
- [@cf/baai/bge-large-en-v1.5](/workers-ai/models/bge-large-en-v1.5/)
- [@cf/baai/bge-m3](/workers-ai/models/bge-m3/)
- [@cf/meta/m2m100-1.2b](/workers-ai/models/m2m100-1.2b/)
