articles/ai-services/openai/how-to/batch.md
+6 −1 lines changed: 6 additions & 1 deletion
@@ -6,7 +6,7 @@ manager: nitinme
ms.service: azure-ai-openai
ms.custom: references_regions
ms.topic: how-to
- ms.date: 01/14/2025
+ ms.date: 04/14/2025
author: mrbullwinkle
ms.author: mbullwin
recommendations: false
@@ -33,6 +33,11 @@ Key use cases include:
* **Marketing and Personalization:** Generate personalized content and recommendations at scale.

+ > [!TIP]
+ > If your batch jobs are so large that you are hitting the enqueued token limit even after maxing out the quota for your deployment, certain regions now support a new feature that allows you to queue multiple batch jobs with exponential backoff.
+ >
+ > Once your enqueued token quota is available, the next batch job can be created and kicked off automatically. To learn more, see [**automating retries of large batch jobs with exponential backoff**](#queueing-batch-jobs).
+
> [!IMPORTANT]
> We aim to process batch requests within 24 hours; we don't expire the jobs that take longer. You can [cancel](#cancel-batch) the job anytime. When you cancel the job, any remaining work is cancelled and any already completed work is returned. You'll be charged for any completed work.
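If you do need to stop a job, a minimal cancellation sketch with the OpenAI Python client could look like the following. It assumes a `client` already configured for your Azure OpenAI resource and a `batch_id` saved when the job was created, as shown later in this article:

```python
# Minimal sketch: cancel a running batch job.
# Assumes `client` is an already configured AzureOpenAI client and
# `batch_id` was saved when the batch job was created.
cancel_response = client.batches.cancel(batch_id)
print(cancel_response.status)  # for example "cancelling" or "cancelled"
```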
+ If your batch jobs are so large that you are hitting the enqueued token limit even after maxing out the quota for your deployment, certain regions now support a new [fail fast](#queueing-batch-jobs) feature that allows you to queue multiple batch jobs with exponential backoff, so that once one large batch job completes the next can be kicked off automatically. To learn more about which regions support this feature and how to adapt your code to take advantage of it, see [queueing batch jobs](#queueing-batch-jobs).

## Track batch job progress

Once you have created a batch job successfully, you can monitor its progress either in the Studio or programmatically. When checking batch job progress, we recommend waiting at least 60 seconds in between each status call.
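A rough sketch of such a polling loop with the OpenAI Python client (assuming `client` and `batch_id` from the job creation step) could look like this:

```python
import time

# Minimal polling sketch; assumes `client` and `batch_id` already exist.
status = "validating"
while status not in ("completed", "failed", "cancelled", "expired"):
    time.sleep(60)  # wait at least 60 seconds between status calls
    batch_response = client.batches.retrieve(batch_id)
    status = batch_response.status
    print(f"Batch status: {status}")
```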
## Queueing batch jobs

If your batch jobs are so large that you are hitting the enqueued token limit even after maxing out the quota for your deployment, certain regions now support a new fail fast feature that allows you to queue multiple batch jobs with exponential backoff. Once one large batch job completes and your enqueued token quota is once again available, the next batch job can be created and kicked off automatically.

**Old behavior:**

1. Large batch job(s) already running and using all available tokens for your deployment.
2. New batch job submitted.
3. New batch job goes into the validation phase, which can last up to a few minutes.
4. Token count for the new job is checked against the currently available quota.
5. New batch job fails with an error reporting that the token limit was exceeded.

**New behavior:**

1. Large batch job(s) already running and using all available tokens for your deployment.
2. New batch job submitted.
3. Approximate token count of the new job is immediately compared against the currently available batch quota; the job fails fast, allowing you to handle retries programmatically.

### Region support

The following regions support the new fail fast behavior:

- australiaeast
- eastus
- germanywestcentral
- italynorth
- northcentralus
- polandcentral
- swedencentral
- eastus2
- westus

The code below demonstrates the basic mechanics of handling the fail fast behavior to automate retries and batch job queueing with exponential backoff.

Depending on the size of your batch jobs, you may need to greatly increase `max_retries` or alter this example further.
```python
import time
from openai import BadRequestError

# client and file_id are assumed to have been created in the earlier steps of this article.
max_retries = 10
retries = 0
initial_delay = 5
delay = initial_delay

while True:
    try:
        batch_response = client.batches.create(
            input_file_id=file_id,
            endpoint="/chat/completions",
            completion_window="24h",
        )

        # Save batch ID for later use
        batch_id = batch_response.id

        print(f"✅ Batch created successfully after {retries} retries")
        print(batch_response.model_dump_json(indent=2))
        break

    except BadRequestError as e:
        error_message = str(e)

        # Check if it's a token limit error
        if 'token_limit_exceeded' in error_message:
            retries += 1
            if retries >= max_retries:
                print(f"❌ Maximum retries ({max_retries}) reached. Giving up.")
                raise

            print(f"⏳ Token limit exceeded. Waiting {delay} seconds before retry {retries}/{max_retries}...")
            time.sleep(delay)

            # Exponential backoff - increase delay for next attempt
            delay *= 2
        else:
            # Any other bad request error is surfaced immediately
            raise
```
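Once the loop exits with a successfully created job, the results still have to be collected after the job finishes. A rough sketch, reusing the `batch_id` captured above (the output file ID is only populated once the job reaches the `completed` state):

```python
# Sketch: after the batch job completes, download its results.
# Assumes `client` and `batch_id` from the retry loop above.
batch_response = client.batches.retrieve(batch_id)
if batch_response.status == "completed" and batch_response.output_file_id:
    file_response = client.files.content(batch_response.output_file_id)
    print(file_response.text)  # JSONL output, one response per line
```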