---
title: Send batch requests to Azure OpenAI LLMs
content_type: how_to
related_resources:
  - text: AI Gateway
    url: /ai-gateway/
  - text: AI Proxy
    url: /plugins/ai-proxy/

description: Reduce costs by using llm/v1/files and llm/v1/batches route_types to send asynchronous batched requests to Azure OpenAI.

products:
  - gateway
  - ai-gateway

works_on:
  - on-prem
  - konnect

min_version:
  gateway: '3.11'

plugins:
  - ai-proxy

entities:
  - service
  - route
  - plugin

tags:
  - ai
  - azure

tldr:
  q: How can I run many Azure OpenAI LLM requests at once?
  a: |
    Package your prompts into a JSONL file and upload it to the `/files` endpoint. Then launch a batch job with `/batches` to process everything asynchronously, and download the output from `/files` once the run completes.

tools:
  - deck

prereqs:
  inline:
    - title: Azure OpenAI
      icon_url: /assets/icons/azure.svg
      content: |
        This tutorial uses Azure OpenAI service. Configure it as follows:

        1. [Create an Azure account](https://azure.microsoft.com/en-us/get-started/azure-portal).
        2. In the Azure Portal, click **Create a resource**.
        3. Search for **Azure OpenAI** and select **Azure OpenAI Service**.
        4. Configure your Azure resource.
        5. Export your instance name:
           ```bash
           export DECK_AZURE_INSTANCE_NAME='YOUR_AZURE_RESOURCE_NAME'
           ```
        6. Deploy your model in [Azure AI Foundry](https://ai.azure.com/):
           1. Go to **My assets → Models and deployments → Deploy model**.

              {:.warning}
              > Use a `globalbatch` or `datazonebatch` deployment type for batch operations. Standard deployments (`GlobalStandard`) can't process batch files.

           2. Export the API key and deployment ID:
              ```bash
              export DECK_AZURE_OPENAI_API_KEY='YOUR_AZURE_OPENAI_MODEL_API_KEY'
              export DECK_AZURE_DEPLOYMENT_ID='YOUR_AZURE_OPENAI_DEPLOYMENT_NAME'
              ```
    - title: Batch .jsonl file
      content: |
        To complete this tutorial, create a `batch.jsonl` file to generate asynchronous batched LLM responses. We use `/v1/chat/completions` because it handles chat-based generation requests, instructing the LLM to produce conversational completions in batch mode.

        Run the following command to create the file:

        ```bash
        cat <<EOF > batch.jsonl
        {"custom_id": "prod1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write a compelling product description for a solar-powered smart garden light."}], "max_tokens": 60}}
        {"custom_id": "prod2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write a product description for an energy-efficient smart thermostat for home use."}], "max_tokens": 60}}
        {"custom_id": "prod3", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write an engaging product description for a biodegradable bamboo kitchen utensil set."}], "max_tokens": 60}}
        {"custom_id": "prod4", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write a detailed product description for a water-saving smart shower head."}], "max_tokens": 60}}
        {"custom_id": "prod5", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write a concise product description for a compact indoor air purifier that uses natural filters."}], "max_tokens": 60}}
        EOF
        ```
  entities:
    services:
      - files-service
      - batches-service
    routes:
      - files-route
      - batches-route

cleanup:
  inline:
    - title: Clean up Konnect environment
      include_content: cleanup/platform/konnect
      icon_url: /assets/icons/gateway.svg
    - title: Destroy the {{site.base_gateway}} container
      include_content: cleanup/products/gateway
      icon_url: /assets/icons/gateway.svg

automated_tests: false
---
## Configure AI Proxy plugins for /files route

Let's create an AI Proxy plugin for the `llm/v1/files` route type. It handles the upload and retrieval of JSONL files containing batch input and output data, ensuring that input data is correctly staged for batch processing and that the results can be downloaded once the batch job completes.

{% entity_examples %}
entities:
  plugins:
    - name: ai-proxy
      service: files-service
      config:
        model_name_header: false
        route_type: llm/v1/files
        auth:
          header_name: Authorization
          header_value: Bearer ${azure_key}
        model:
          provider: azure
          options:
            azure_api_version: "2025-01-01-preview"
            azure_instance: ${azure_instance}
            azure_deployment_id: ${azure_deployment}
variables:
  azure_key:
    value: "$AZURE_OPENAI_API_KEY"
  azure_instance:
    value: "$AZURE_INSTANCE_NAME"
  azure_deployment:
    value: "$AZURE_DEPLOYMENT_ID"
{% endentity_examples %}

## Configure AI Proxy plugins for /batches route

Next, create an AI Proxy plugin for the `llm/v1/batches` route type. This plugin manages the submission, monitoring, and retrieval of asynchronous batch jobs. It communicates with Azure OpenAI's batch deployment to process multiple LLM requests in a batch.

{% entity_examples %}
entities:
  plugins:
    - name: ai-proxy
      service: batches-service
      config:
        model_name_header: false
        route_type: llm/v1/batches
        auth:
          header_name: Authorization
          header_value: Bearer ${azure_key}
        model:
          provider: azure
          options:
            azure_api_version: "2025-01-01-preview"
            azure_instance: ${azure_instance}
            azure_deployment_id: ${azure_deployment}
variables:
  azure_key:
    value: "$AZURE_OPENAI_API_KEY"
  azure_instance:
    value: "$AZURE_INSTANCE_NAME"
  azure_deployment:
    value: "$AZURE_DEPLOYMENT_ID"
{% endentity_examples %}

## Upload a .jsonl file for batching

Now, let's use the following command to upload our [batching file](/#batch-jsonl-file) to the `/files` route:

<!-- vale off -->
{% validation request-check %}
url: "/files"
status_code: 200
form_data:
  purpose: "batch"
  file: "@batch.jsonl"
extract_body:
  - name: 'id'
    variable: FILE_ID
{% endvalidation %}
<!-- vale on -->
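
If you're calling the gateway directly instead of using the rendered request above, the upload is a standard multipart form request. Here's a minimal curl sketch, assuming {{site.base_gateway}} is proxying on `localhost:8000` and the files Route is exposed at `/files`, as in the retrieval command later in this guide:

```bash
# Minimal multipart upload of the batch input file
# (the localhost:8000 address and /files path are assumptions based on this guide's setup)
curl -s -X POST http://localhost:8000/files \
  -F purpose=batch \
  -F file=@batch.jsonl
```

If you have `jq` installed, you can pipe the response through `jq -r '.id'` to capture the file ID directly.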

Once processed, you will see a JSON response like this:

```json
{
  "status": "processed",
  "bytes": 1648,
  "purpose": "batch",
  "filename": "batch.jsonl",
  "id": "file-da4364d8fd714dd9b29706b91236ab02",
  "created_at": 1761817541,
  "object": "file"
}
```

Now, let's export the file ID:

```bash
export FILE_ID=YOUR_FILE_ID
```

## Create a batching request

Now, we can send a `POST` request to the `/batches` Route to create a batch using our uploaded file:

{:.info}
> The completion window must be set to `24h`, as it's the only value currently supported by the [OpenAI `/batches` API](https://platform.openai.com/docs/api-reference/batch/create).
>
> In this example, we target the `/v1/chat/completions` endpoint for batching because we are sending multiple structured chat-style prompts in OpenAI's chat completions format to be processed in bulk.

<!-- vale off -->
{% validation request-check %}
url: '/batches'
status_code: 201
body:
  input_file_id: $FILE_ID
  endpoint: "/v1/chat/completions"
  completion_window: "24h"
extract_body:
  - name: 'id'
    variable: BATCH_ID
{% endvalidation %}
<!-- vale on -->
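
Outside the rendered docs, the equivalent call is a plain JSON `POST`. Here's a minimal curl sketch (the `localhost:8000` address and `/batches` path are assumptions based on the Routes configured for this guide):

```bash
# Create a batch job from the uploaded file; completion_window must be "24h"
curl -s -X POST http://localhost:8000/batches \
  -H "Content-Type: application/json" \
  -d "{
    \"input_file_id\": \"$FILE_ID\",
    \"endpoint\": \"/v1/chat/completions\",
    \"completion_window\": \"24h\"
  }"
```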

You will receive a response similar to:

```json
{
  "cancelled_at": null,
  "cancelling_at": null,
  "completed_at": null,
  "completion_window": "24h",
  "created_at": 1761817562,
  "error_file_id": "",
  "expired_at": null,
  "expires_at": 1761903959,
  "failed_at": null,
  "finalizing_at": null,
  "id": "batch_379f1007-8057-4f43-be38-12f3d456c7da",
  "in_progress_at": null,
  "input_file_id": "file-da4364d8fd714dd9b29706b91236ab02",
  "errors": null,
  "metadata": null,
  "object": "batch",
  "output_file_id": "",
  "request_counts": {
    "total": 0,
    "completed": 0,
    "failed": 0
  },
  "status": "validating",
  "endpoint": ""
}
```
{:.no-copy-code}

Copy the batch ID from this response and export it as an environment variable so you can check the batch status:

```bash
export BATCH_ID=YOUR_BATCH_ID
```

## Check batching status

Wait a moment for the batch to complete, then check its status by sending the following request:

<!-- vale off -->
{% validation request-check %}
url: /batches/$BATCH_ID
extract_body:
  - name: 'output_file_id'
    variable: OUTPUT_FILE_ID
{% endvalidation %}
<!-- vale on -->
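
Batches can take a little while to move from `validating` to `completed`. If you'd rather poll from the command line, here's a small sketch that checks the status every ten seconds (it assumes the gateway is at `localhost:8000` and that `jq` is installed):

```bash
# Poll the batch until it reaches a terminal state
while true; do
  STATUS=$(curl -s http://localhost:8000/batches/$BATCH_ID | jq -r '.status')
  echo "Batch status: $STATUS"
  case "$STATUS" in
    completed|failed|expired|cancelled) break ;;
  esac
  sleep 10
done
```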

A completed batch response looks like this:

```json
{
  "cancelled_at": null,
  "cancelling_at": null,
  "completed_at": 1761817685,
  "completion_window": "24h",
  "created_at": 1761817562,
  "error_file_id": null,
  "expired_at": null,
  "expires_at": 1761903959,
  "failed_at": null,
  "finalizing_at": 1761817662,
  "id": "batch_379f1007-8057-4f43-be38-12f3d456c7da",
  "in_progress_at": null,
  "input_file_id": "file-da4364d8fd714dd9b29706b91236ab02",
  "errors": null,
  "metadata": null,
  "object": "batch",
  "output_file_id": "file-93d91f55-0418-abcd-1234-81f4bb334951",
  "request_counts": {
    "total": 5,
    "completed": 5,
    "failed": 0
  },
  "status": "completed",
  "endpoint": "/v1/chat/completions"
}
```
{:.no-copy-code}

The `request_counts` object shows that all five requests in the batch completed successfully (`"completed": 5`, `"failed": 0`).

Now, you can copy the `output_file_id` to retrieve your batched responses and export it as an environment variable:

```bash
export OUTPUT_FILE_ID=YOUR_OUTPUT_FILE_ID
```

The output file ID is only available once the batch request has completed. If the status is `"in_progress"`, it won't be set yet.

## Retrieve batched responses

Now, we can download the batched responses from the `/files` endpoint by appending `/content` to the file ID URL. For details, see the [OpenAI API documentation](https://platform.openai.com/docs/api-reference/files/retrieve-contents).

```bash
curl http://localhost:8000/files/$OUTPUT_FILE_ID/content > batched-response.jsonl
```

This command saves the batched responses to the `batched-response.jsonl` file.

The batched response file contains one JSON object per line, each representing a single batched request's response. Here is an example of content from `batched-response.jsonl`, which contains the individual completion results for each request we submitted in the batch input file:

```json
{"custom_id": "prod4", "response": {"body": {"id": "chatcmpl-AB12CD34EF56GH78IJ90KL12MN", "object": "chat.completion", "created": 1761909664, "model": "gpt-4o-2024-11-20", "choices": [{"index": 0, "message": {"role": "assistant", "content": "**EcoFlow Smart Shower Head: Revolutionize Your Daily Routine While Saving Water**\n\nExperience the perfect blend of luxury, sustainability, and smart technology with the **EcoFlow Smart Shower Head** — a cutting-edge solution for modern households looking to conserve water without compromising on comfort. Designed to elevate your shower experience", "refusal": null, "annotations": []}, "finish_reason": "length", "logprobs": null}], "usage": {"completion_tokens": 60, "prompt_tokens": 30, "total_tokens": 90}, "system_fingerprint": "fp_random1234"},"request_id": "req-111aaa22-bb33-cc44-dd55-ee66ff778899", "status_code": 200}, "error": null}
{"custom_id": "prod3", "response": {"body": {"id": "chatcmpl-ZX98YW76VU54TS32RQ10PO98LK", "object": "chat.completion", "created": 1761909664, "model": "gpt-4o-2024-11-20", "choices": [{"index": 0, "message": {"role": "assistant", "content": "**Eco-Friendly Elegance: Biodegradable Bamboo Kitchen Utensil Set**\n\nElevate your cooking experience while making a positive impact on the planet with our **Biodegradable Bamboo Kitchen Utensil Set**. Crafted from 100% natural, sustainably sourced bamboo, this set combines durability, functionality", "refusal": null, "annotations": []}, "finish_reason": "length", "logprobs": null}], "usage": {"completion_tokens": 60, "prompt_tokens": 31, "total_tokens": 91}, "system_fingerprint": "fp_random1234"},"request_id": "req-222bbb33-cc44-dd55-ee66-ff7788990011", "status_code": 200}, "error": null}
{"custom_id": "prod1", "response": {"body": {"id": "chatcmpl-MN34OP56QR78ST90UV12WX34YZ", "object": "chat.completion", "created": 1761909664, "model": "gpt-4o-2024-11-20", "choices": [{"index": 0, "message": {"role": "assistant", "content": "**Illuminate Your Garden with Brilliance: The Solar-Powered Smart Garden Light** \n\nTransform your outdoor space into a haven of sustainable beauty with the **Solar-Powered Smart Garden Light**—a perfect blend of modern innovation and eco-friendly design. Powered entirely by the sun, this smart light delivers effortless", "refusal": null, "annotations": []}, "finish_reason": "length", "logprobs": null}], "usage": {"completion_tokens": 60, "prompt_tokens": 30, "total_tokens": 90}, "system_fingerprint": "fp_random1234"},"request_id": "req-333ccc44-dd55-ee66-ff77-889900112233", "status_code": 200}, "error": null}
{"custom_id": "prod5", "response": {"body": {"id": "chatcmpl-AQ12WS34ED56RF78TG90HY12UJ", "object": "chat.completion", "created": 1761909664, "model": "gpt-4o-2024-11-20", "choices": [{"index": 0, "message": {"role": "assistant", "content": "Breathe easy with our compact indoor air purifier, designed to deliver fresh and clean air using natural filters. This eco-friendly purifier quietly removes allergens, dust, and odors without synthetic materials, making it perfect for any small space. Stylish, efficient, and sustainable—experience pure air, naturally.", "refusal": null, "annotations": []}, "finish_reason": "stop", "logprobs": null}], "usage": {"completion_tokens": 59, "prompt_tokens": 33, "total_tokens": 92}, "system_fingerprint": "fp_random1234"},"request_id": "req-444ddd55-ee66-ff77-8899-001122334455", "status_code": 200}, "error": null}
{"custom_id": "prod2", "response": {"body": {"id": "chatcmpl-PO98LK76JI54HG32FE10DC98VB", "object": "chat.completion", "created": 1761909664, "model": "gpt-4o-2024-11-20", "choices": [{"index": 0, "message": {"role": "assistant", "content": "**EcoSmart Pro Wi-Fi Thermostat: Energy Efficiency Meets Smart Technology** \n\nUpgrade your home’s comfort and save energy with the EcoSmart Pro Wi-Fi Thermostat. Designed for modern living, this sleek and intuitive thermostat lets you take control of your heating and cooling while minimizing energy waste. Whether you're", "refusal": null, "annotations": []}, "finish_reason": "length", "logprobs": null}], "usage": {"completion_tokens": 60, "prompt_tokens": 31, "total_tokens": 91}, "system_fingerprint": "fp_random1234"},"request_id": "req-555eee66-ff77-8899-0011-223344556677", "status_code": 200}, "error": null}
```
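
Each line pairs a `custom_id` with the full chat completion response. As an optional convenience, here's a one-liner to print just the ID and generated text from each line (it assumes `jq` is installed; the field paths follow the output format shown above):

```bash
# Print each request's custom_id followed by its generated product description
jq -r '"\(.custom_id): \(.response.body.choices[0].message.content)"' batched-response.jsonl
```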