
Commit 45e8b68

Merge branch 'main' into release-openai-audio-models
2 parents ab6fb05 + 05f6b58 commit 45e8b68

22 files changed: +183 −79 lines changed

articles/ai-foundry/concepts/model-lifecycle-retirement.md

Lines changed: 6 additions & 0 deletions

@@ -80,6 +80,12 @@ The following tables list the timelines for models that are on track for retirem
  | [Cohere-rerank-v3-english](https://ai.azure.com/explore/models/Cohere-rerank-v3-english/version/1/registry/azureml-cohere) | February 28, 2025 | March 31, 2025 | June 30, 2025 | [Cohere-rerank-v3.5-english](https://ai.azure.com/explore/models/Cohere-rerank-v3.5/version/1/registry/azureml-cohere) |
  | [Cohere-rerank-v3-multilingual](https://ai.azure.com/explore/models/Cohere-rerank-v3-multilingual/version/1/registry/azureml-cohere) | February 28, 2025 | March 31, 2025 | June 30, 2025 | [Cohere-rerank-v3.5-multilingual](https://ai.azure.com/explore/models/Cohere-rerank-v3.5/version/1/registry/azureml-cohere) |

+ #### DeepSeek
+
+ | Model | Legacy date (UTC) | Deprecation date (UTC) | Retirement date (UTC) | Suggested replacement model |
+ |-------|-------------------|------------------------|-----------------------|-----------------------------|
+ | [DeepSeek-V3](https://aka.ms/azureai/landing/DeepSeek-V3) | April 10, 2025 | May 31, 2025 | August 31, 2025 | [DeepSeek-V3-0324](https://aka.ms/azureai/landing/DeepSeek-V3-0324) |
+
  #### Meta

  | Model | Legacy date (UTC) | Deprecation date (UTC) | Retirement date (UTC) | Suggested replacement model |

articles/ai-foundry/concepts/models-featured.md

Lines changed: 3 additions & 2 deletions

@@ -141,11 +141,12 @@ For more examples of how to use Jais models, see the following examples:

  ## DeepSeek

- DeepSeek family of models includes DeepSeek-R1, which excels at reasoning tasks using a step-by-step training process, such as language, scientific reasoning, and coding tasks, and DeepSeek-V3, a Mixture-of-Experts (MoE) language model.
+ The DeepSeek family of models includes DeepSeek-R1, which excels at reasoning tasks (language, scientific reasoning, and coding) through a step-by-step training process; DeepSeek-V3-0324, a Mixture-of-Experts (MoE) language model; and more.

  | Model | Type | Capabilities |
  | ------ | ---- | --- |
- | [DeepSeek-V3](https://ai.azure.com/explore/models/deepseek-v3/version/1/registry/azureml-deepseek) | [chat-completion](../model-inference/how-to/use-chat-completions.md?context=/azure/ai-foundry/context/context) | - **Input:** text (131,072 tokens) <br /> - **Output:** text (131,072 tokens) <br /> - **Tool calling:** No <br /> - **Response formats:** Text, JSON |
+ | [DeepSeek-V3-0324](https://ai.azure.com/explore/models/deepseek-v3-0324/version/1/registry/azureml-deepseek) | [chat-completion](../model-inference/how-to/use-chat-completions.md?context=/azure/ai-foundry/context/context) | - **Input:** text (131,072 tokens) <br /> - **Output:** text (131,072 tokens) <br /> - **Tool calling:** No <br /> - **Response formats:** Text, JSON |
+ | [DeepSeek-V3](https://ai.azure.com/explore/models/deepseek-v3/version/1/registry/azureml-deepseek) <br />(Legacy) | [chat-completion](../model-inference/how-to/use-chat-completions.md?context=/azure/ai-foundry/context/context) | - **Input:** text (131,072 tokens) <br /> - **Output:** text (131,072 tokens) <br /> - **Tool calling:** No <br /> - **Response formats:** Text, JSON |
  | [DeepSeek-R1](https://ai.azure.com/explore/models/deepseek-r1/version/1/registry/azureml-deepseek) | [chat-completion with reasoning content](../model-inference/how-to/use-chat-reasoning.md?context=/azure/ai-foundry/context/context) | - **Input:** text (163,840 tokens) <br /> - **Output:** text (163,840 tokens) <br /> - **Tool calling:** No <br /> - **Response formats:** Text. |

  For a tutorial on DeepSeek-R1, see [Tutorial: Get started with DeepSeek-R1 reasoning model in Azure AI model inference](../model-inference/tutorials/get-started-deepseek-r1.md?context=/azure/ai-foundry/context/context).
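
As a quick illustration of the chat-completion capability listed in the table, here's a minimal sketch that calls a DeepSeek-V3-0324 serverless deployment with the `azure-ai-inference` Python package. The endpoint URL and key are placeholders, and the model name must match your own deployment:

```python
# Minimal sketch: chat completion against a DeepSeek-V3-0324 deployment.
# Endpoint and key are placeholders; adjust them to your resource.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://YOUR_RESOURCE_NAME.services.ai.azure.com/models",
    credential=AzureKeyCredential("YOUR_API_KEY"),
)

response = client.complete(
    model="DeepSeek-V3-0324",
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Summarize what a Mixture-of-Experts model is."),
    ],
)
print(response.choices[0].message.content)
```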

articles/ai-foundry/includes/region-availability-maas.md

Lines changed: 1 addition & 0 deletions

@@ -41,6 +41,7 @@ Cohere Embed v3 - Multilingual | [Microsoft Managed Countries/Regions](/par

  | Model | Offer Availability Region | Hub/Project Region for Deployment | Hub/Project Region for Fine tuning |
  |---------|---------|---------|---------|
+ DeepSeek-V3-0324 | Not applicable | East US <br> East US 2 <br> North Central US <br> South Central US <br> West US <br> West US 3 | Not available |
  DeepSeek-V3 | Not applicable | East US <br> East US 2 <br> North Central US <br> South Central US <br> West US <br> West US 3 | Not available |
  DeepSeek-R1 | Not applicable | East US <br> East US 2 <br> North Central US <br> South Central US <br> West US <br> West US 3 | Not available |

articles/ai-foundry/model-inference/concepts/models.md

Lines changed: 2 additions & 1 deletion

@@ -110,7 +110,8 @@ DeepSeek family of models includes DeepSeek-R1, which excels at reasoning tasks
  | Model | Type | Tier | Capabilities |
  | ------ | ---- | --- | ------------ |
  | [DeepSeek-R1](https://ai.azure.com/explore/models/deepseek-r1/version/1/registry/azureml-deepseek) | chat-completion <br /> [(with reasoning content)](../how-to/use-chat-reasoning.md) | Global standard | - **Input:** text (163,840 tokens) <br /> - **Output:** text (163,840 tokens) <br /> - **Languages:** `en` and `zh` <br /> - **Tool calling:** No <br /> - **Response formats:** Text. |
- | [DeepSeek-V3](https://ai.azure.com/explore/models/deepseek-v3/version/1/registry/azureml-deepseek) | chat-completion | Global standard | - **Input:** text (131,072 tokens) <br /> - **Output:** text (131,072 tokens) <br /> - **Languages:** `en` and `zh` <br /> - **Tool calling:** No <br /> - **Response formats:** Text, JSON |
+ | [DeepSeek-V3](https://ai.azure.com/explore/models/deepseek-v3/version/1/registry/azureml-deepseek) <br />(Legacy) | chat-completion | Global standard | - **Input:** text (131,072 tokens) <br /> - **Output:** text (131,072 tokens) <br /> - **Languages:** `en` and `zh` <br /> - **Tool calling:** No <br /> - **Response formats:** Text, JSON |
+ | [DeepSeek-V3-0324](https://ai.azure.com/explore/models/deepseek-v3-0324/version/1/registry/azureml-deepseek) | chat-completion | Global standard | - **Input:** text (131,072 tokens) <br /> - **Output:** text (131,072 tokens) <br /> - **Languages:** `en` and `zh` <br /> - **Tool calling:** No <br /> - **Response formats:** Text, JSON |

  For a tutorial on DeepSeek-R1, see [Tutorial: Get started with DeepSeek-R1 reasoning model in Azure AI model inference](../tutorials/get-started-deepseek-r1.md).

articles/ai-foundry/model-inference/quotas-limits.md

Lines changed: 3 additions & 3 deletions

@@ -32,9 +32,9 @@ Azure uses quotas and limits to prevent budget overruns due to fraud, and to hon
  | -------------------- | ------------------- | ----------- |
  | Tokens per minute | Azure OpenAI models | Varies per model and SKU. See [limits for Azure OpenAI](../../ai-services/openai/quotas-limits.md). |
  | Requests per minute | Azure OpenAI models | Varies per model and SKU. See [limits for Azure OpenAI](../../ai-services/openai/quotas-limits.md). |
- | Tokens per minute | DeepSeek-R1 | 5,000,000 |
- | Requests per minute | DeepSeek-R1 | 5,000 |
- | Concurrent requests | DeepSeek-R1 | 300 |
+ | Tokens per minute | DeepSeek-R1<br />DeepSeek-V3-0324 | 5,000,000 |
+ | Requests per minute | DeepSeek-R1<br />DeepSeek-V3-0324 | 5,000 |
+ | Concurrent requests | DeepSeek-R1<br />DeepSeek-V3-0324 | 300 |
  | Tokens per minute | Rest of models | 400,000 |
  | Requests per minute | Rest of models | 1,000 |
  | Concurrent requests | Rest of models | 300 |
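
Clients that exceed the per-minute limits above receive HTTP 429 responses. As one possible mitigation, here's a sketch of exponential backoff with jitter; `call_endpoint` is a hypothetical stand-in for whatever SDK or HTTP call you use:

```python
# Hedged sketch: retry a request with exponential backoff when rate limited.
# Assumes `call_endpoint` returns a requests-style response object.
import random
import time

def with_backoff(call_endpoint, max_retries: int = 5):
    for attempt in range(max_retries):
        response = call_endpoint()
        if response.status_code != 429:
            return response
        # Prefer the service-provided Retry-After header if it exists.
        retry_after = response.headers.get("Retry-After")
        delay = float(retry_after) if retry_after else (2 ** attempt) + random.random()
        time.sleep(delay)
    raise RuntimeError("Rate limited after retries")
```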

articles/ai-services/openai/includes/batch/batch-python.md

Lines changed: 34 additions & 13 deletions

@@ -75,7 +75,7 @@ The `custom_id` is required to allow you to identify which individual batch requ

  ### Create input file

- For this article we'll create a file named `test.jsonl` and will copy the contents from standard input code block above to the file. You will need to modify and add your global batch deployment name to each line of the file. Save this file in the same directory that you're executing your Jupyter Notebook.
+ For this article, we'll create a file named `test.jsonl` and copy the contents from the standard input code block above into it. You'll need to add your global batch deployment name to each line of the file. Save this file in the same directory where you're running your Jupyter Notebook.

  ## Upload batch file

@@ -101,10 +101,15 @@ client = AzureOpenAI(
  # Upload a file with a purpose of "batch"
  file = client.files.create(
      file=open("test.jsonl", "rb"),
-     purpose="batch"
+     purpose="batch",
+     #extra_body={"expires_after":{"seconds": 1209600, "anchor": "created_at"}} # Optional: set between 1209600 and 2592000 seconds (14-30 days)
  )

  print(file.model_dump_json(indent=2))
+
+ #print(f"File expiration: {datetime.fromtimestamp(file.expires_at) if file.expires_at is not None else 'Not set'}")
+
  file_id = file.id
  ```

@@ -125,30 +130,41 @@ client = AzureOpenAI(
  # Upload a file with a purpose of "batch"
  file = client.files.create(
      file=open("test.jsonl", "rb"),
-     purpose="batch"
+     purpose="batch",
+     #extra_body={"expires_after":{"seconds": 1209600, "anchor": "created_at"}} # Optional: set between 1209600 and 2592000 seconds (14-30 days)
  )

  print(file.model_dump_json(indent=2))
+
+ #print(f"File expiration: {datetime.fromtimestamp(file.expires_at) if file.expires_at is not None else 'Not set'}")
+
  file_id = file.id
  ```

  ---

+ By uncommenting `extra_body={"expires_after":{"seconds": 1209600, "anchor": "created_at"}}`, you set your uploaded file to expire in 14 days. There's a maximum of 500 batch files per resource when no expiration is set; setting an expiration raises the limit to 10,000 files per resource. This feature isn't currently available in all regions. Output when file upload expiration is set:

  **Output:**

  ```json
  {
-   "id": "file-9f3a81d899b4442f98b640e4bc3535dd",
-   "bytes": 815,
-   "created_at": 1722476551,
+   "id": "file-655111ec9cfc44489d9af078f08116ef",
+   "bytes": 176064,
+   "created_at": 1743391067,
    "filename": "test.jsonl",
    "object": "file",
    "purpose": "batch",
-   "status": null,
+   "status": "processed",
+   "expires_at": 1744600667,
    "status_details": null
  }
+ File expiration: 2025-04-13 23:17:47
  ```
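
For illustration, here's a sketch of the upload with the optional expiration uncommented, assuming the same `AzureOpenAI` client as above and an `openai` SDK version that forwards `extra_body`:

```python
# Sketch: upload with a 14-day expiration enabled (assumes `client` from above).
from datetime import datetime

file = client.files.create(
    file=open("test.jsonl", "rb"),
    purpose="batch",
    extra_body={"expires_after": {"seconds": 1209600, "anchor": "created_at"}},  # 14 days
)

# expires_at is a Unix timestamp when an expiration is set, otherwise None.
print(f"File expiration: {datetime.fromtimestamp(file.expires_at) if file.expires_at is not None else 'Not set'}")
file_id = file.id
```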
  ## Create batch job

  Once your file has uploaded successfully you can submit the file for batch processing.

@@ -159,16 +175,21 @@ batch_response = client.batches.create(
      input_file_id=file_id,
      endpoint="/chat/completions",
      completion_window="24h",
+     #extra_body={"output_expires_after":{"seconds": 1209600, "anchor": "created_at"}} # Optional: set between 1209600 and 2592000 seconds (14-30 days)
  )

  # Save batch ID for later use
  batch_id = batch_response.id

  print(batch_response.model_dump_json(indent=2))
  ```

+ The default 500 max file limit per resource also applies to output files. You can uncomment the line above to add `extra_body={"output_expires_after":{"seconds": 1209600, "anchor": "created_at"}}` so that your output files expire in 14 days. Setting an expiration raises the limit to 10,000 batch files per resource. This feature isn't currently available in all regions.

  > [!NOTE]
- > Currently the completion window must be set to 24h. If you set any other value than 24h your job will fail. Jobs taking longer than 24 hours will continue to execute until canceled.
+ > Currently the completion window must be set to `24h`. If you set any value other than `24h`, your job will fail. Jobs taking longer than 24 hours will continue to execute until canceled.
  **Output:**

@@ -178,7 +199,7 @@ print(batch_response.model_dump_json(indent=2))
    "completion_window": "24h",
    "created_at": 1722476583,
    "endpoint": null,
-   "input_file_id": "file-9f3a81d899b4442f98b640e4bc3535dd",
+   "input_file_id": "file-655111ec9cfc44489d9af078f08116ef",
    "object": "batch",
    "status": "validating",
    "cancelled_at": null,

@@ -201,7 +222,7 @@ print(batch_response.model_dump_json(indent=2))
  }
  ```

- If your batch jobs are so large that you are hitting the enqueued token limit even after maxing out the quota for your deployment, certain regions now support a new [fail fast](#queueing-batch-jobs) feature that allows you to queue multiple batch jobs with exponential backoff so once one large batch job completes the next can be kicked off automatically. To learn more about what regions support this feature and how to adapt your code to take advantage of it, see [queuing batch jobs](#queueing-batch-jobs).
+ If your batch jobs are so large that you're hitting the enqueued token limit even after maxing out the quota for your deployment, certain regions now support a new [fail fast](#queueing-batch-jobs) feature that lets you queue multiple batch jobs with exponential backoff, so that once one large batch job completes the next can be kicked off automatically. To learn more about which regions support this feature and how to adapt your code to take advantage of it, see [queuing batch jobs](#queueing-batch-jobs).
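
Putting the optional pieces together, here's a sketch of job creation with output-file expiration enabled, followed by a single status check. It assumes the `client` and `file_id` from the upload step:

```python
# Sketch: create a batch job whose output files expire in 14 days,
# then check its status once (assumes `client` and `file_id` from above).
batch_response = client.batches.create(
    input_file_id=file_id,
    endpoint="/chat/completions",
    completion_window="24h",
    extra_body={"output_expires_after": {"seconds": 1209600, "anchor": "created_at"}},  # 14 days
)

batch = client.batches.retrieve(batch_response.id)
print(batch.status)  # for example: "validating", "in_progress", "completed"
```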

  ## Track batch job progress

@@ -311,7 +332,7 @@ if output_file_id:

  **Output:**

- For brevity, we are only including a single chat completion response of output. If you follow the steps in this article you should have three responses similar to the one below:
+ For brevity, we're including only a single chat completion response in the output. If you follow the steps in this article, you should have three responses similar to the one below:

  ```json
  {
@@ -429,7 +450,7 @@ print(all_jobs)

  Use the REST API to list all batch jobs with additional sorting/filtering options.

- In the examples below we are providing the `generate_time_filter` function to make constructing the filter easier. If you don't wish to use this function the format of the filter string would look like `created_at gt 1728860560 and status eq 'Completed'`.
+ In the examples below, we provide the `generate_time_filter` function to make constructing the filter easier. If you prefer not to use this function, the filter string format looks like `created_at gt 1728860560 and status eq 'Completed'`.
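
The actual `generate_time_filter` implementation appears in the tabs that follow. As a rough idea of its shape, a hypothetical version might build the filter string like this (the name, signature, and parameters here are illustrative, not the article's exact code):

```python
# Hypothetical sketch of a generate_time_filter-style helper: builds a filter
# string such as "created_at gt 1728860560 and status eq 'Completed'".
import time
from typing import Optional

def generate_time_filter(days: int, status: Optional[str] = None) -> str:
    cutoff = int(time.time()) - days * 24 * 60 * 60
    clauses = [f"created_at gt {cutoff}"]
    if status:
        clauses.append(f"status eq '{status}'")
    return " and ".join(clauses)

print(generate_time_filter(7, "Completed"))
```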

  # [Python (Microsoft Entra ID)](#tab/python-secure)

@@ -626,7 +647,7 @@ else:

  ## Queueing batch jobs

- If your batch jobs are so large that you are hitting the enqueued token limit even after maxing out the quota for your deployment, certain regions now support a new fail fast feature that allows you to queue multiple batch jobs with exponential backoff. Once one large batch job completes and your enqueued token quota is once again available, the next batch job can be created and kicked off automatically.
+ If your batch jobs are so large that you're hitting the enqueued token limit even after maxing out the quota for your deployment, certain regions now support a new fail fast feature that lets you queue multiple batch jobs with exponential backoff. Once one large batch job completes and your enqueued token quota is once again available, the next batch job can be created and kicked off automatically.

  **Old behavior:**
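A minimal sketch of that pattern follows. The token-limit detection is illustrative; match it to the actual error your SDK version raises:

```python
# Sketch: queue batch jobs with exponential backoff until quota frees up.
# The token-limit check below is a hypothetical placeholder.
import time

def submit_with_backoff(client, file_id, max_retries: int = 8):
    delay = 5
    for _ in range(max_retries):
        try:
            return client.batches.create(
                input_file_id=file_id,
                endpoint="/chat/completions",
                completion_window="24h",
            )
        except Exception as err:
            if "token_limit_exceeded" not in str(err):  # hypothetical error code
                raise
            time.sleep(delay)
            delay = min(delay * 2, 300)  # cap the backoff at 5 minutes
    raise RuntimeError("Could not enqueue batch job after retries")
```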

articles/ai-services/openai/includes/batch/batch-rest.md

Lines changed: 26 additions & 11 deletions

@@ -65,7 +65,7 @@ The `custom_id` is required to allow you to identify which individual batch requ

  ### Create input file

- For this article we'll create a file named `test.jsonl` and will copy the contents from standard input code block above to the file. You will need to modify and add your global batch deployment name to each line of the file.
+ For this article, we'll create a file named `test.jsonl` and copy the contents from the standard input code block above into it. You'll need to add your global batch deployment name to each line of the file.

  ## Upload batch file

@@ -78,21 +78,29 @@ curl -X POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/files?api-versio
    -H "Content-Type: multipart/form-data" \
    -H "api-key: $AZURE_OPENAI_API_KEY" \
    -F "purpose=batch" \
-   -F "file=@C:\\batch\\test.jsonl;type=application/json"
+   -F "file=@C:\\batch\\test.jsonl;type=application/json" \
+   -F "expires_after.seconds=1209600" \
+   -F "expires_after.anchor=created_at"
  ```

  The above code assumes a particular file path for your test.jsonl file. Adjust this file path as necessary for your local system.

+ By adding the optional `"expires_after.seconds=1209600"` and `"expires_after.anchor=created_at"` parameters, you set your uploaded file to expire in 14 days. You can set a value between 1209600 and 2592000 seconds (14-30 days). There's a maximum of 500 batch files per resource when no expiration is set; setting an expiration raises the limit to 10,000 files per resource. This feature isn't currently available in all regions.

  **Output:**

  ```json
  {
-   "status": "pending",
-   "bytes": 686,
+   "status": "processed",
+   "bytes": 817,
    "purpose": "batch",
    "filename": "test.jsonl",
-   "id": "file-21006e70789246658b86a1fc205899a4",
-   "created_at": 1721408291,
+   "expires_at": 1744607747,
+   "id": "file-7733bc35e32841e297a62a9ee50b3461",
+   "created_at": 1743398147,
    "object": "file"
  }
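
The same upload can be scripted from Python with the `requests` package; a sketch, with the resource name and API version left as placeholders:

```python
# Sketch: REST file upload with optional expiration, mirroring the curl example.
import os
import requests

url = "https://YOUR_RESOURCE_NAME.openai.azure.com/openai/files?api-version=YOUR_API_VERSION"
headers = {"api-key": os.environ["AZURE_OPENAI_API_KEY"]}
files = {"file": ("test.jsonl", open("test.jsonl", "rb"), "application/json")}
data = {
    "purpose": "batch",
    "expires_after.seconds": "1209600",    # optional: 14 days
    "expires_after.anchor": "created_at",  # optional
}

response = requests.post(url, headers=headers, files=files, data=data)
print(response.json())
```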

@@ -116,7 +124,8 @@ curl https://YOUR_RESOURCE_NAME.openai.azure.com/openai/files/{file-id}?api-vers
    "bytes": 686,
    "purpose": "batch",
    "filename": "test.jsonl",
-   "id": "file-21006e70789246658b86a1fc205899a4",
+   "expires_at": 1744607747,
+   "id": "file-7733bc35e32841e297a62a9ee50b3461",
    "created_at": 1721408291,
    "object": "file"
  }
@@ -134,12 +143,18 @@ curl -X POST https://YOUR_RESOURCE_NAME.openai.azure.com/openai/batches?api-vers
  -d '{
    "input_file_id": "file-abc123",
    "endpoint": "/chat/completions",
-   "completion_window": "24h"
+   "completion_window": "24h",
+   "output_expires_after": {
+     "seconds": 1209600
+   },
+   "anchor": "created_at"
  }'
  ```

+ The default 500 max file limit per resource also applies to output files. You can optionally add `"output_expires_after": {"seconds": 1209600}` and `"anchor": "created_at"` so that your output files expire in 14 days. Setting an expiration raises the limit to 10,000 batch files per resource. The file expiration feature isn't currently available in all regions.

  > [!NOTE]
- > Currently the completion window must be set to 24h. If you set any other value than 24h your job will fail. Jobs taking longer than 24 hours will continue to execute until canceled.
+ > Currently the completion window must be set to `24h`. If you set any value other than `24h`, your job will fail. Jobs taking longer than 24 hours will continue to execute until canceled.

  **Output:**

@@ -221,7 +236,7 @@ The following status values are possible:
  | `in_progress`|The input file was successfully validated and the batch is currently running. |
  | `finalizing`|The batch has completed and the results are being prepared. |
  | `completed`|The batch has been completed and the results are ready. |
- | `expired`|The batch was not able to be completed within the 24-hour time window.|
+ | `expired`|The batch wasn't able to be completed within the 24-hour time window.|
  | `cancelling`|The batch is being `cancelled`. (This can take up to 10 minutes to go into effect.) |
  | `cancelled`|The batch was `cancelled`.|
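
To watch for these status transitions programmatically, here's a sketch that polls the batch endpoint until a terminal state is reached (resource name, API version, and batch ID are placeholders):

```python
# Sketch: poll a batch job via the REST API until it reaches a terminal status.
import os
import time
import requests

url = "https://YOUR_RESOURCE_NAME.openai.azure.com/openai/batches/{batch-id}?api-version=YOUR_API_VERSION"
headers = {"api-key": os.environ["AZURE_OPENAI_API_KEY"]}

while True:
    status = requests.get(url, headers=headers).json()["status"]
    print(status)
    if status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(60)  # batch jobs are long-running; poll sparingly
```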
