[Workers AI] Batch-api #21413
Merged
Commits (49)
9c43fe2 CLI (daisyfaithauma)
3b60888 Initial documentation (daisyfaithauma)
2e5f403 Update src/content/docs/workers-ai/get-started/workers-wrangler.mdx (kodster28)
5cebaeb removed the why (daisyfaithauma)
3218dc0 Update src/content/docs/workers-ai/features/async-batch-api.mdx (daisyfaithauma)
27a9944 Update src/content/docs/workers-ai/features/async-batch-api.mdx (daisyfaithauma)
4d6d683 Update src/content/docs/workers-ai/features/async-batch-api.mdx (daisyfaithauma)
4377c68 Update src/content/docs/workers-ai/features/async-batch-api.mdx (daisyfaithauma)
8435e21 Update src/content/docs/workers-ai/features/async-batch-api.mdx (daisyfaithauma)
1e9e606 Update src/content/docs/workers-ai/features/async-batch-api.mdx (daisyfaithauma)
2911331 Update src/content/docs/workers-ai/features/async-batch-api.mdx (daisyfaithauma)
c335075 Update src/content/docs/workers-ai/features/async-batch-api.mdx (daisyfaithauma)
95838b8 Update src/content/docs/workers-ai/features/async-batch-api.mdx (daisyfaithauma)
508fde3 Update src/content/docs/workers-ai/features/async-batch-api.mdx (daisyfaithauma)
837b314 supported models (daisyfaithauma)
dc23ec7 rest API (daisyfaithauma)
0b01f90 Update src/content/docs/workers-ai/features/async-batch-api.mdx (daisyfaithauma)
fe3c647 Update src/content/docs/workers-ai/features/async-batch-api.mdx (daisyfaithauma)
bcb0cf Update src/content/docs/workers-ai/features/async-batch-api.mdx (daisyfaithauma)
ac4e9a3 Update src/content/docs/workers-ai/features/async-batch-api.mdx (daisyfaithauma)
cb9c4de Update src/content/docs/workers-ai/features/async-batch-api.mdx (daisyfaithauma)
578b436 Update src/content/docs/workers-ai/features/async-batch-api.mdx (daisyfaithauma)
870d2fa Update src/content/docs/workers-ai/features/async-batch-api.mdx (daisyfaithauma)
c1ff1e2 minor fixes (daisyfaithauma)
ac0ffe2 curl fix (daisyfaithauma)
44c162a typescript fixes (daisyfaithauma)
cbb0802 Update src/content/docs/workers-ai/features/async-batch-api.mdx (daisyfaithauma)
707c17c Update src/content/docs/workers-ai/features/async-batch-api.mdx (daisyfaithauma)
7712905 file restructure and added template (daisyfaithauma)
4b22ee5 deleted file (daisyfaithauma)
244d0ff Update src/content/docs/workers-ai/features/batch-api/get-started.mdx (daisyfaithauma)
5ea7532 Update src/content/docs/workers-ai/features/batch-api/get-started.mdx (daisyfaithauma)
b85bae9 edits (daisyfaithauma)
3c1fc05 template link (daisyfaithauma)
c461b93 Update src/content/docs/workers-ai/features/batch-api/get-started.mdx (daisyfaithauma)
855ad6a Update src/content/docs/workers-ai/features/batch-api/get-started.mdx (daisyfaithauma)
03088a8 Update src/content/docs/workers-ai/features/batch-api/get-started.mdx (daisyfaithauma)
7a9ed2f Update src/content/docs/workers-ai/features/batch-api/get-started.mdx (daisyfaithauma)
b2b8ca9 Update src/content/docs/workers-ai/features/batch-api/get-started.mdx (daisyfaithauma)
1e190b6 Update src/content/docs/workers-ai/features/batch-api/get-started.mdx (daisyfaithauma)
741b59b Update src/content/docs/workers-ai/features/batch-api/index.mdx (kodster28)
1c8f8ce Added beta badge (kodster28)
f5ca4bc Small updates (kodster28)
50501e9 update (kodster28)
02b681c change title of code block (kodster28)
7192946 Updated response (kodster28)
eff4fbe match order (kodster28)
4bed317 Updates (kodster28)
b6fe2d6 Remove unused components (kodster28)
src/content/docs/workers-ai/features/batch-api/batch-api-rest-api.mdx (57 additions, 0 deletions)

@@ -0,0 +1,57 @@
---
pcx_content_type: get-started
title: Using the Batch API via the REST API
sidebar:
  order: 4
---

import { Render, PackageManagers, WranglerConfig, CURL } from "~/components";

If you prefer to work directly with the REST API instead of a [Cloudflare Worker](/workers-ai/features/batch-api/get-started/), follow these steps:

## 1. Sending a Batch Request

Make a POST request to the following endpoint:

<CURL
	url="https://api.cloudflare.com/client/v4/accounts/<account-id>/ai/run/@cf/meta/llama-3.3-70b-instruct-fp8-fast?queueRequest=true"
	method="POST"
	headers={{
		Authorization: "<token>",
		"Content-Type": "application/json",
	}}
	json={{
		requests: [
			{
				prompt: "Tell me a story",
				external_reference: "reference2",
			},
			{
				prompt: "Tell me a joke",
				external_reference: "reference1",
			},
		],
	}}
	code={{
		mark: "external_reference",
	}}
/>

## 2. Retrieving the Batch Response

After receiving a `request_id` from your initial POST, you can poll for or retrieve the results with another POST request:

<CURL
	url="https://api.cloudflare.com/client/v4/accounts/<account-id>/ai/run/@cf/meta/llama-3.3-70b-instruct-fp8-fast?queueRequest=true"
	method="POST"
	headers={{
		Authorization: "<token>",
		"Content-Type": "application/json",
	}}
	json={{
		request_id: "<uuid>",
	}}
	code={{
		mark: "request_id",
	}}
/>
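
If you are scripting these calls, the same flow in TypeScript looks roughly like the sketch below. The `ACCOUNT_ID` and `API_TOKEN` placeholders and the `Bearer` prefix are illustrative assumptions, and the acknowledgment envelope may differ; treat this as an illustration, not the documented client.

```typescript
// Minimal sketch of the two REST calls above, using fetch.
// ACCOUNT_ID and API_TOKEN are placeholders you must supply yourself.
const ACCOUNT_ID = "<account-id>";
const API_TOKEN = "<token>";
const ENDPOINT =
	`https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}` +
	`/ai/run/@cf/meta/llama-3.3-70b-instruct-fp8-fast?queueRequest=true`;

async function callBatchEndpoint(body: unknown): Promise<any> {
	const res = await fetch(ENDPOINT, {
		method: "POST",
		headers: {
			Authorization: `Bearer ${API_TOKEN}`,
			"Content-Type": "application/json",
		},
		body: JSON.stringify(body),
	});
	return res.json();
}

// 1. Queue the batch; the acknowledgment includes a request_id.
const queued = await callBatchEndpoint({
	requests: [
		{ prompt: "Tell me a story", external_reference: "reference2" },
		{ prompt: "Tell me a joke", external_reference: "reference1" },
	],
});

// 2. Poll for the results with that request_id. Depending on the API
// envelope, the id may live at queued.request_id or queued.result.request_id.
const requestId = queued.request_id ?? queued.result?.request_id;
const results = await callBatchEndpoint({ request_id: requestId });
console.log(results);
```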
src/content/docs/workers-ai/features/batch-api/get-started.mdx (266 additions, 0 deletions)

@@ -0,0 +1,266 @@
---
pcx_content_type: get-started
title: Using Batch API via Workers
sidebar:
  order: 2
---

import { Render, PackageManagers, WranglerConfig, CURL } from "~/components";

If you want to skip the steps and get started quickly, click the button below:

[![Deploy to Workers](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/craigsdennis/batch-please-workers-ai)

This will create a repository in your GitHub account and deploy a ready-to-use Worker that demonstrates how to use Cloudflare's Asynchronous Batch API. The template includes preconfigured AI bindings and examples for sending and retrieving batch requests, with and without external references. Once deployed, you can visit the live Worker and start experimenting with the Batch API immediately.

## 1. Prerequisites and setup

<Render file="prereqs" product="workers" />

## 2. Creating your Cloudflare Worker project

Open your terminal and create a new Worker project named `batch-api` by running:

<PackageManagers type="create" pkg="cloudflare@latest" args={"batch-api"} />

<Render
	file="c3-post-run-steps"
	product="workers"
	params={{
		category: "hello-world",
		type: "Worker only",
		lang: "TypeScript",
	}}
/>

This will create a new `batch-api` directory, which includes:

- A `"Hello World"` [Worker](/workers/get-started/guide/#3-write-code) at `src/index.ts`.
- A [`wrangler.jsonc`](/workers/wrangler/configuration/) configuration file.

Go to your application directory:

```sh
cd batch-api
```

## 3. Configure Wrangler

You must create an AI binding for your Worker to connect to Workers AI. [Bindings](/workers/runtime-apis/bindings/) allow your Workers to interact with resources, like Workers AI, on the Cloudflare Developer Platform.

To bind Workers AI to your Worker, add the following to the end of your Wrangler file:

<WranglerConfig>

```toml
[ai]
binding = "AI"
```

</WranglerConfig>

Your binding is [available in your Worker code](/workers/reference/migrate-to-module-workers/#bindings-in-es-modules-format) on [`env.AI`](/workers/runtime-apis/handlers/fetch/).
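
Since the scaffolded project uses `wrangler.jsonc` rather than TOML, here is a minimal sketch of the equivalent JSON configuration entry:

```jsonc
// wrangler.jsonc, equivalent of the [ai] TOML block above.
{
	"ai": {
		"binding": "AI"
	}
}
```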
## 4. How to use the Batch API

### Sending a Batch request

Send your initial batch inference request by composing a JSON payload containing an array of individual inference requests.

:::note[Note]

Ensure that the total payload is under 10 MB.

:::

```typescript title="src/index.ts"
interface AIRequest {
	prompt: string;
	temperature: number;
	max_tokens: number;
}

const resp = await env.AI.run(
	"@cf/meta/llama-3.3-70b-instruct-fp8-fast",
	{
		requests: [
			{
				prompt: "tell me a joke",
				temperature: 0.5,
				max_tokens: 100,
			},
			{
				prompt: "write an email from user to provider.",
				temperature: 0.6,
				max_tokens: 101,
			},
			{
				prompt: "tell me a joke about llamas",
				temperature: 0.7,
				max_tokens: 102,
			},
		] as AIRequest[],
	},
	{ queueRequest: true },
);
```

After sending your batch request, you will receive a response similar to:

```json output
{
	"status": "queued",
	"request_id": "000-000-000",
	"model": "@cf/meta/llama-3.3-70b-instruct-fp8-fast"
}
```

- **`status`**: Indicates that your request is queued.
- **`request_id`**: A unique identifier for the batch request.
- **`model`**: The model used for the batch inference.
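
Each request in the array can also carry an `external_reference` string, as the [REST API examples](/workers-ai/features/batch-api/batch-api-rest-api/) and the template do, so you can match each answer back to the prompt that produced it. A minimal sketch, assuming the same `env.AI` binding and model as above:

```typescript
// Sketch: tagging each queued prompt with an external_reference
// so responses can be correlated with their prompts later.
const tagged = await env.AI.run(
	"@cf/meta/llama-3.3-70b-instruct-fp8-fast",
	{
		requests: [
			{ prompt: "tell me a joke", external_reference: "joke-1" },
			{ prompt: "tell me a joke about llamas", external_reference: "joke-2" },
		],
	},
	{ queueRequest: true },
);
```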
### Polling the Batch Request Status

Once your batch request is queued, use the `request_id` to poll for its status. While processing, the API returns a status of `queued` or `running`, indicating that the request is still in the queue or being processed.

```typescript title="example"
// Polling the status of the batch request using the request_id
const status = await env.AI.run("@cf/meta/llama-3.3-70b-instruct-fp8-fast", {
	request_id: "000-000-000",
});
```

```json output
{
	"status": "queued",
	"request_id": "000-000-000"
}
```
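
In practice you would repeat this call until the status changes. A minimal polling sketch (the `AIBinding` type is a local stand-in for the `env.AI` binding, and the five-second interval is an arbitrary choice):

```typescript
// Sketch: poll until the batch is no longer queued or running.
// Assumes the response keeps a `status` field while in progress and
// carries the `responses` array once complete, as shown in this guide.
type AIBinding = {
	run: (model: string, payload: unknown, options?: unknown) => Promise<any>;
};

async function waitForBatch(ai: AIBinding, requestId: string): Promise<any> {
	while (true) {
		const res = await ai.run("@cf/meta/llama-3.3-70b-instruct-fp8-fast", {
			request_id: requestId,
		});
		if (res.status !== "queued" && res.status !== "running") {
			return res; // final payload containing the responses array
		}
		// Pause between polls to avoid hammering the API.
		await new Promise((resolve) => setTimeout(resolve, 5000));
	}
}
```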
### Retrieving the Batch Inference results

When the inference is complete, the API returns a final HTTP status code of `200` along with an array of responses. Each response object corresponds to an individual input prompt, identified by an `id` that maps to the index of the prompt in your original request.

```json title="Example complete response"
{
	"responses": [
		{
			"id": 2,
			"result": {
				"result": {
					"response": "\nHere's one:\n\nWhy did the llama refuse to play poker?\n\nBecause he always got fleeced!\n\n(Sorry, it's a bit of a woolly joke, but I hope it made you smile!)"
				}
			},
			"success": true
		},
		{
			"id": 0,
			"result": {
				"result": {
					"response": ", please!\nHere's one:\n\nWhat do you call a fake noodle?\n\n(wait for it...)\n\nAn impasta!\n\nHope that made you laugh! Do you want to hear another one? \n#joke #humor #funny #laugh #smile #noodle #impasta #pastajoke\nHow was that? Do you want another one? I have a million of them!\n\nHere's another one:\n\nWhat do you call a can opener that doesn't work?\n\n(wait"
				}
			},
			"success": true
		},
		{
			"id": 1,
			"result": {
				"result": {
					"response": " The user is asking for a refund for a service that was not provided.\nHere is an example of an email that a user might send to a provider requesting a refund for a service that was not provided:\nSubject: Request for Refund for Undelivered Service\n\nDear [Provider's Name],\n\nI am writing to request a refund for the [service name] that I was supposed to receive from your company on [date]. Unfortunately, the service was not provided as agreed upon, and I have not"
				}
			},
			"success": true
		}
	],
	"usage": {
		"prompt_tokens": 22,
		"completion_tokens": 243,
		"total_tokens": 265
	}
}
```

- **`responses`**: An array of response objects. Each object includes:
  - **`id`**: The index of the corresponding prompt.
  - **`result`**: The inference output, which may be nested depending on your implementation.
  - **`success`**: A Boolean flag indicating if the request was processed successfully.
- **`usage`**: Contains token usage details for the batch request.
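
Because the `responses` array may not arrive in the order you sent the prompts (note the `2, 0, 1` ordering above), you may want to restore the original order before using the results. A small sketch:

```typescript
// Sketch: sort batch responses back into the order of the original prompts,
// using the `id` field that maps to each prompt's index.
interface BatchResponseItem {
	id: number;
	result: { result: { response: string } };
	success: boolean;
}

function inOriginalOrder(responses: BatchResponseItem[]): string[] {
	return [...responses]
		.sort((a, b) => a.id - b.id)
		.map((item) => item.result.result.response);
}
```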
## 5. Implementing the Batch API in your Worker

Below is a sample TypeScript Worker that receives a batch of inference requests, sends them to a batch-enabled AI model, and returns the results.

```ts title="src/index.ts"
export interface Env {
	AI: {
		run: (model: string, payload: any, options: any) => Promise<any>;
	};
}

export default {
	async fetch(request: Request, env: Env): Promise<Response> {
		// Only allow POST requests
		if (request.method !== "POST") {
			return new Response("Method Not Allowed", { status: 405 });
		}

		try {
			// Parse the incoming JSON payload
			const data: any = await request.json();

			// Validate that we have a 'requests' array in the payload
			if (!data.requests || !Array.isArray(data.requests)) {
				return new Response(
					JSON.stringify({
						error: "Missing or invalid 'requests' array in request payload.",
					}),
					{ status: 400, headers: { "Content-Type": "application/json" } },
				);
			}

			// Send the batch request to the AI model via the AI binding
			// Replace "@cf/meta/llama-3.3-70b-instruct-fp8-fast" with your desired batch-enabled model if needed.
			const batchResponse = await env.AI.run(
				"@cf/meta/llama-3.3-70b-instruct-fp8-fast",
				{
					requests: data.requests,
				},
				{ queueRequest: true },
			);

			// Return the response from the AI API
			return new Response(JSON.stringify(batchResponse), {
				status: 200,
				headers: { "Content-Type": "application/json" },
			});
		} catch (error: any) {
			// Log the error if needed and return a 500 response
			return new Response(
				JSON.stringify({
					error: error?.toString() || "An unknown error occurred.",
				}),
				{ status: 500, headers: { "Content-Type": "application/json" } },
			);
		}
	},
};
```

- **Receiving the Batch request:**
  The Worker expects a `POST` request with a `JSON` payload containing an array called `requests`. Each entry in the array is an individual inference request.

- **Processing the request:**
  The code validates the payload and uses the AI binding (`env.AI.run()`) to send the batch request to a designated model (such as `@cf/meta/llama-3.3-70b-instruct-fp8-fast`).

- **Returning the results:**
  Once processed, the AI API returns the batch responses. These responses include an array where each object has an `id` (matching the prompt index) and the corresponding inference result.
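
To try the Worker before deploying, you can run it locally with `npx wrangler dev` and send it a batch payload. A hypothetical client-side sketch, assuming the default local address:

```typescript
// Sketch: exercising the Worker above from a TypeScript client.
// The localhost URL assumes `wrangler dev`; substitute your
// workers.dev URL after deployment.
const response = await fetch("http://localhost:8787", {
	method: "POST",
	headers: { "Content-Type": "application/json" },
	body: JSON.stringify({
		requests: [
			{ prompt: "tell me a joke", temperature: 0.5, max_tokens: 100 },
			{ prompt: "tell me a story", temperature: 0.7, max_tokens: 200 },
		],
	}),
});

console.log(await response.json()); // e.g. { status: "queued", request_id: ... }
```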
## 6. Deployment

After completing your changes, deploy your Worker with the following command:

```sh
npm run deploy
```
src/content/docs/workers-ai/features/batch-api/index.mdx (31 additions, 0 deletions)

@@ -0,0 +1,31 @@
---
pcx_content_type: configuration
title: Asynchronous Batch API
sidebar:
  order: 1
---

import { Render, PackageManagers, WranglerConfig, CURL } from "~/components";

## What is Asynchronous Batch?

Asynchronous batch processing lets you send a collection (batch) of inference requests in a single call. Instead of expecting immediate responses for every request, the system queues them for processing and returns the results later.

When you send a batch request, the API immediately acknowledges receipt with a status like `queued` and provides a unique `request_id`. This ID is later used to poll for the final responses once the processing is complete.
You can use the Batch API either by creating and deploying a Cloudflare Worker that leverages the [Batch API with the AI binding](/workers-ai/features/batch-api/get-started/), by using the [REST API](/workers-ai/features/batch-api/batch-api-rest-api/) directly, or by starting from a [template](https://github.com/craigsdennis/batch-please-workers-ai).

:::note[Note]

Ensure that the total payload is under 10 MB.

:::

## Supported Models

- [@cf/meta/llama-3.3-70b-instruct-fp8-fast](/workers-ai/models/llama-3.3-70b-instruct-fp8-fast/)
- [@cf/baai/bge-small-en-v1.5](/workers-ai/models/bge-small-en-v1.5/)
- [@cf/baai/bge-base-en-v1.5](/workers-ai/models/bge-base-en-v1.5/)
- [@cf/baai/bge-large-en-v1.5](/workers-ai/models/bge-large-en-v1.5/)
- [@cf/baai/bge-m3](/workers-ai/models/bge-m3/)
- [@cf/meta/m2m100-1.2b](/workers-ai/models/m2m100-1.2b/)