Commit dcacc45

Merge pull request #4062 from mrbullwinkle/mrb_04_11_2025_batch_fail_fast
[Azure OpenAI] Batch fail fast
2 parents 1f6b52c + d99f0a8 commit dcacc45

2 files changed: +129 −2 lines changed

articles/ai-services/openai/how-to/batch.md

Lines changed: 6 additions & 1 deletion
@@ -6,7 +6,7 @@ manager: nitinme
 ms.service: azure-ai-openai
 ms.custom: references_regions
 ms.topic: how-to
-ms.date: 01/14/2025
+ms.date: 04/14/2025
 author: mrbullwinkle
 ms.author: mbullwin
 recommendations: false
@@ -33,6 +33,11 @@ Key use cases include:
 
 * **Marketing and Personalization:** Generate personalized content and recommendations at scale.
 
+> [!TIP]
+> If your batch jobs are so large that you are hitting the enqueued token limit even after maxing out the quota for your deployment, certain regions now support a new feature that allows you to queue multiple batch jobs with exponential backoff.
+>
+> Once your enqueued token quota is available, the next batch job can be created and kicked off automatically. To learn more, see [**automating retries of large batch jobs with exponential backoff**](#queueing-batch-jobs).
+
 > [!IMPORTANT]
 > We aim to process batch requests within 24 hours; we don't expire the jobs that take longer. You can [cancel](#cancel-batch) the job anytime. When you cancel the job, any remaining work is cancelled and any already completed work is returned. You'll be charged for any completed work.
 >

articles/ai-services/openai/includes/batch/batch-python.md

Lines changed: 123 additions & 1 deletion
@@ -201,6 +201,8 @@ print(batch_response.model_dump_json(indent=2))
 }
 ```
 
+If your batch jobs are so large that you are hitting the enqueued token limit even after maxing out the quota for your deployment, certain regions now support a new [fail fast](#queueing-batch-jobs) feature that allows you to queue multiple batch jobs with exponential backoff, so once one large batch job completes the next can be kicked off automatically. To learn more about which regions support this feature and how to adapt your code to take advantage of it, see [queueing batch jobs](#queueing-batch-jobs).
+
 ## Track batch job progress
 
 Once you have created a batch job successfully, you can monitor its progress either in the Studio or programmatically. When checking batch job progress, we recommend waiting at least 60 seconds in between each status call.
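
As a minimal sketch of the programmatic tracking mentioned above (not part of this change; it assumes the `client` object and a saved `batch_id` from the creation example, and the standard Batch API lifecycle statuses):

```python
import time

# Terminal states for a batch job; anything else means the job is still
# moving through validating -> in_progress -> finalizing.
terminal_statuses = {"completed", "failed", "expired", "cancelled"}

while True:
    batch_response = client.batches.retrieve(batch_id)
    print(f"Current status: {batch_response.status}")
    if batch_response.status in terminal_statuses:
        break
    # Wait at least 60 seconds between status calls, per the guidance above.
    time.sleep(60)
```
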
@@ -620,4 +622,124 @@ else:
   "has_more": false,
   "last_id": "batch_6287485f-50fc-4efa-bcc5-b86690037f43"
 }
-```
+```
+
+## Queueing batch jobs
+
+If your batch jobs are so large that you are hitting the enqueued token limit even after maxing out the quota for your deployment, certain regions now support a new fail fast feature that allows you to queue multiple batch jobs with exponential backoff. Once one large batch job completes and your enqueued token quota is once again available, the next batch job can be created and kicked off automatically.
+
+**Old behavior:**
+
+1. Large batch job(s) already running and using all available tokens for your deployment.
+2. New batch job submitted.
+3. New batch job goes into the validation phase, which can last up to a few minutes.
+4. Token count for the new job is checked against the currently available quota.
+5. New batch job fails with an error reporting that the token limit was exceeded.
+
+**New behavior:**
+
+1. Large batch job(s) already running and using all available tokens for your deployment.
+2. New batch job submitted.
+3. Approximate token count of the new job is immediately compared against the currently available batch quota; the job fails fast, allowing you to handle retries programmatically more easily.
+
+### Region support
+
+The following regions support the new fail fast behavior:
+
+- australiaeast
+- eastus
+- germanywestcentral
+- italynorth
+- northcentralus
+- polandcentral
+- swedencentral
+- eastus2
+- westus
+
+The code below demonstrates the basic mechanics of handling the fail fast behavior to allow automating retries and batch job queueing with exponential backoff.
+
+Depending on the size of your batch jobs, you may need to greatly increase `max_retries` or alter this example further.
+
+```python
+import time
+from openai import BadRequestError
+
+max_retries = 10
+retries = 0
+initial_delay = 5
+delay = initial_delay
+
+while True:
+    try:
+        batch_response = client.batches.create(
+            input_file_id=file_id,
+            endpoint="/chat/completions",
+            completion_window="24h",
+        )
+
+        # Save batch ID for later use
+        batch_id = batch_response.id
+
+        print(f"✅ Batch created successfully after {retries} retries")
+        print(batch_response.model_dump_json(indent=2))
+        break
+
+    except BadRequestError as e:
+        error_message = str(e)
+
+        # Check if it's a token limit error
+        if 'token_limit_exceeded' in error_message:
+            retries += 1
+            if retries >= max_retries:
+                print(f"❌ Maximum retries ({max_retries}) reached. Giving up.")
+                raise
+
+            print(f"⏳ Token limit exceeded. Waiting {delay} seconds before retry {retries}/{max_retries}...")
+            time.sleep(delay)
+
+            # Exponential backoff - increase delay for next attempt
+            delay *= 2
+        else:
+            # If it's a different error, raise it immediately
+            print(f"❌ Encountered non-token limit error: {error_message}")
+            raise
+```
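
A note on the backoff parameters above: the loop sleeps before retries 1 through 9 (the 10th failure raises immediately), so with `initial_delay = 5` the total wait before giving up is 5 + 10 + 20 + … + 1,280 = 5 × (2^9 − 1) = 2,555 seconds, roughly 43 minutes. If the jobs ahead of yours routinely take longer than that to drain, consider raising `max_retries` or capping `delay` at a maximum value rather than letting it keep doubling.
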
+
+**Output:**
+
+```console
+⏳ Token limit exceeded. Waiting 5 seconds before retry 1/10...
+⏳ Token limit exceeded. Waiting 10 seconds before retry 2/10...
+⏳ Token limit exceeded. Waiting 20 seconds before retry 3/10...
+⏳ Token limit exceeded. Waiting 40 seconds before retry 4/10...
+⏳ Token limit exceeded. Waiting 80 seconds before retry 5/10...
+⏳ Token limit exceeded. Waiting 160 seconds before retry 6/10...
+⏳ Token limit exceeded. Waiting 320 seconds before retry 7/10...
+✅ Batch created successfully after 7 retries
+{
+  "id": "batch_1e1e7b9f-d4b4-41fa-bd2e-8d2ec50fb8cc",
+  "completion_window": "24h",
+  "created_at": 1744402048,
+  "endpoint": "/chat/completions",
+  "input_file_id": "file-e2ba4ccaa4a348e0976c6fe3c018ea92",
+  "object": "batch",
+  "status": "validating",
+  "cancelled_at": null,
+  "cancelling_at": null,
+  "completed_at": null,
+  "error_file_id": "",
+  "errors": null,
+  "expired_at": null,
+  "expires_at": 1744488444,
+  "failed_at": null,
+  "finalizing_at": null,
+  "in_progress_at": null,
+  "metadata": null,
+  "output_file_id": "",
+  "request_counts": {
+    "completed": 0,
+    "failed": 0,
+    "total": 0
+  }
+}
+```
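
For repeated use, the same retry pattern can be factored into a function. A minimal sketch follows; `create_batch_with_backoff` is a hypothetical helper name (not part of this change), and `client`/`file_id` are the same objects used in the examples above:

```python
import time
from openai import BadRequestError

def create_batch_with_backoff(client, file_id, max_retries=10, initial_delay=5):
    """Create a batch job, retrying with exponential backoff while the
    enqueued token limit is exceeded (hypothetical helper; a sketch only)."""
    delay = initial_delay
    for attempt in range(1, max_retries + 1):
        try:
            return client.batches.create(
                input_file_id=file_id,
                endpoint="/chat/completions",
                completion_window="24h",
            )
        except BadRequestError as e:
            # Retry only on the fail fast token limit error; re-raise anything else.
            if 'token_limit_exceeded' not in str(e):
                raise
            print(f"⏳ Token limit exceeded. Waiting {delay} seconds before retry {attempt}/{max_retries}...")
            time.sleep(delay)
            delay *= 2  # exponential backoff
    raise RuntimeError(f"Batch creation failed after {max_retries} retries")

# Usage:
# batch_response = create_batch_with_backoff(client, file_id)
# batch_id = batch_response.id
```
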
