
Commit f5ca4bc

Small updates
1 parent 1c8f8ce commit f5ca4bc


2 files changed: +6 -2 lines changed


src/content/docs/workers-ai/features/batch-api/batch-api-rest-api.mdx

Lines changed: 4 additions & 2 deletions

@@ -14,7 +14,7 @@ If you prefer to work directly with the REST API instead of a [Cloudflare Worker
 Make a POST request to the following endpoint:
 
 <CURL
-	url="https://api.cloudflare.com/client/v4/accounts/<account-id>/ai/run/@cf/meta/ray-llama-3.3-70b-instruct-fp8-fast?queueRequest=true"
+	url="https://api.cloudflare.com/client/v4/accounts/<account-id>/ai/run/@cf/meta/llama-3.3-70b-instruct-fp8-fast?queueRequest=true"
 	method="POST"
 	headers={{
 		Authorization: "<token>",
@@ -37,12 +37,14 @@ Make a POST request to the following endpoint:
 	}}
 />
 
+You can pass `external_reference` as a unique ID per-prompt that will be returned in the response.
+
 ## 2. Retrieving the Batch Response
 
 After receiving a `request_id` from your initial POST, you can poll for or retrieve the results with another POST request:
 
 <CURL
-	url="https://api.cloudflare.com/client/v4/accounts/<account-id>/ai/run/@cf/meta/ray-llama-3.3-70b-instruct-fp8-fast?queueRequest=true"
+	url="https://api.cloudflare.com/client/v4/accounts/<account-id>/ai/run/@cf/meta/llama-3.3-70b-instruct-fp8-fast?queueRequest=true"
 	method="POST"
 	headers={{
 		Authorization: "<token>",
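The submission step in the diff above ("make a POST request to the endpoint with `queueRequest=true`") can be sketched outside the `<CURL>` component. This is a rough illustration only: the account ID, token, request payload shape (`"requests"`), and response field names (`"result"`, `"request_id"`) are assumptions, not confirmed by this commit.

```python
# Hypothetical sketch of submitting a batch inference request to the
# corrected endpoint, using only the Python standard library.
import json
import urllib.request

MODEL = "@cf/meta/llama-3.3-70b-instruct-fp8-fast"


def batch_url(account_id: str, model: str = MODEL) -> str:
    # queueRequest=true asks Workers AI to queue the batch for asynchronous
    # processing instead of answering synchronously.
    return (
        "https://api.cloudflare.com/client/v4/accounts/"
        f"{account_id}/ai/run/{model}?queueRequest=true"
    )


def submit_batch(account_id: str, token: str, prompts: list) -> str:
    req = urllib.request.Request(
        batch_url(account_id),
        data=json.dumps({"requests": prompts}).encode(),  # payload shape assumed
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["result"]["request_id"]  # response field names assumed
```

The returned `request_id` is what the "Retrieving the Batch Response" step above polls with.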

src/content/docs/workers-ai/features/batch-api/index.mdx

Lines changed: 2 additions & 0 deletions

@@ -11,6 +11,8 @@ import { Render, PackageManagers, WranglerConfig, CURL } from "~/components";
 
 Asynchronous batch processing lets you send a collection (batch) of inference requests in a single call. Instead of expecting immediate responses for every request, the system queues them for processing and returns the results later.
 
+Batch processing is useful for large workloads such as summarization or embeddings when there is no human interaction. Using the Batch API guarantees that your requests are eventually fulfilled, rather than erroring out if Cloudflare does not have enough capacity at a given time.
+
 When you send a batch request, the API immediately acknowledges receipt with a status like `queued` and provides a unique `request_id`. This ID is later used to poll for the final responses once the processing is complete.
 
 You can use the Batch API by either creating and deploying a Cloudflare Worker that leverages the [Batch API with the AI binding](/workers-ai/features/batch-api/get-started/), using the [REST API](/workers-ai/features/batch-api/batch-api-rest-api/) directly, or by starting from a [template](https://github.com/craigsdennis/batch-please-workers-ai).
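The `queued`/`request_id` flow this file describes (submit, get acknowledged as `queued`, poll until processing completes) can be sketched as a polling loop. The status value `queued` comes from the text; the payload shape, response field names, and reuse of the submission endpoint for polling are assumptions for illustration.

```python
# Hypothetical polling loop for a queued batch request, stdlib only.
import json
import time
import urllib.request


def poll_payload(request_id: str) -> dict:
    # Body sent when asking for the results of an earlier batch submission.
    return {"request_id": request_id}


def poll_batch(url: str, token: str, request_id: str,
               interval: float = 5.0, attempts: int = 12) -> dict:
    data = json.dumps(poll_payload(request_id)).encode()
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    for _ in range(attempts):
        req = urllib.request.Request(url, data=data, headers=headers,
                                     method="POST")
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        # Keep waiting while the batch is still queued
        # (field names "result"/"status" are assumed).
        if body.get("result", {}).get("status") != "queued":
            return body
        time.sleep(interval)
    raise TimeoutError(f"batch {request_id} still queued after {attempts} polls")
```

The loop returns as soon as the reported status is anything other than `queued`; a production version would also distinguish failure states from completed results.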
