Commit 0a1443c

Commit message: update
2 parents 42aaef1 + 8b0b3ab

28 files changed: +825 -617 lines

articles/ai-foundry/how-to/develop/cloud-evaluation.md

Lines changed: 12 additions & 0 deletions
@@ -243,6 +243,18 @@ print("Versioned evaluator id:", registered_evaluator.id)
 
 After you log your custom evaluator to your Azure AI project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab of your Azure AI project.
 
+### Troubleshooting: Job stuck in Running state
+
+If your evaluation job remains in the **Running** state for an extended period when you use an Azure AI Foundry project or hub, the Azure OpenAI model you selected might not have enough capacity.
+
+**Resolution**
+
+1. Cancel the current evaluation job.
+1. Increase the model capacity to handle larger input data.
+1. Re-run the evaluation.
+
 ## Related content
 
 - [Evaluate your generative AI applications locally](./evaluate-sdk.md)
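
For context, a minimal sketch of how the stuck-job check could be automated, assuming the `azure-ai-projects` Python client; the `evaluations.get` call, the status strings, and the one-hour threshold are illustrative assumptions rather than a documented recipe:

```python
import time

from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

project_client = AIProjectClient(
    endpoint="https://YOUR-PROJECT-ENDPOINT",  # placeholder project endpoint
    credential=DefaultAzureCredential(),
)

deadline = time.time() + 3600  # treat more than an hour in Running as stuck (assumption)
while time.time() < deadline:
    evaluation = project_client.evaluations.get("YOUR-EVALUATION-ID")  # placeholder ID
    if evaluation.status not in ("Queued", "Running"):
        break
    time.sleep(60)
else:
    # Likely capacity-bound: cancel the job, increase the model deployment's
    # capacity, and re-run the evaluation as the steps above describe.
    print("Evaluation still running; the selected model may lack capacity.")
```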

articles/ai-foundry/openai/api-version-lifecycle.md

Lines changed: 140 additions & 66 deletions
@@ -6,7 +6,7 @@ manager: nitinme
 ms.service: azure-ai-foundry
 ms.subservice: azure-ai-foundry-openai
 ms.topic: conceptual
-ms.date: 09/05/2025
+ms.date: 10/01/2025
 author: mrbullwinkle
 ms.author: mbullwin
 recommendations: false
@@ -27,7 +27,7 @@ Previously, Azure OpenAI received monthly updates of new API versions. Taking ad
 
 Starting in August 2025, you can now opt in to our next generation v1 Azure OpenAI APIs which add support for:
 
-- Ongoing access to the latest features with no need specify new `api-version`'s each month.
+- Ongoing access to the latest features with no need to specify a new `api-version` each month.
 - Faster API release cycle with new features launching more frequently.
 - OpenAI client support with minimal code changes to swap between OpenAI and Azure OpenAI when using key-based authentication.
 - OpenAI client support for token based authentication and automatic token refresh without the need to take a dependency on a separate Azure OpenAI client.
@@ -43,29 +43,13 @@ For the initial v1 Generally Available (GA) API launch we're only supporting a s
 
 ## Code changes
 
-# [API Key](#tab/key)
+# [Python](#tab/python)
 
-### Last generation API
+### v1 API
 
-```python
-import os
-from openai import AzureOpenAI
-
-client = AzureOpenAI(
-    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
-    api_version="2025-04-01-preview",
-    azure_endpoint="https://YOUR-RESOURCE-NAME.openai.azure.com")
-)
-
-response = client.responses.create(
-    model="gpt-4.1-nano", # Replace with your model deployment name
-    input="This is a test."
-)
+[Python v1 examples](./supported-languages.md)
 
-print(response.model_dump_json(indent=2))
-```
-
-### Next generation API
+**API Key**:
 
 ```python
 import os
@@ -88,33 +72,13 @@ print(response.model_dump_json(indent=2))
 - `base_url` passes the Azure OpenAI endpoint and `/openai/v1` is appended to the endpoint address.
 - `api-version` is no longer a required parameter with the v1 GA API.
 
-# [Microsoft Entra ID](#tab/entra)
-
-### Last generation API
+**API Key** with environment variables set for `OPENAI_BASE_URL` and `OPENAI_API_KEY`:
 
 ```python
-from openai import AzureOpenAI
-from azure.identity import DefaultAzureCredential, get_bearer_token_provider
-
-token_provider = get_bearer_token_provider(
-    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
-)
-
-client = AzureOpenAI(
-    azure_endpoint = "https://YOUR-RESOURCE-NAME.openai.azure.com/",
-    azure_ad_token_provider=token_provider,
-    api_version="2025-04-01-preview"
-)
-
-response = client.responses.create(
-    model="gpt-4.1-nano", # Replace with your model deployment name
-    input="This is a test."
-)
-
-print(response.model_dump_json(indent=2))
+client = OpenAI()
 ```
 
-### Next generation API
+**Microsoft Entra ID**:
 
 > [!IMPORTANT]
 > Automatic token refresh was previously handled through the `AzureOpenAI()` client. The v1 API removes this dependency by adding automatic token refresh support to the `OpenAI()` client.
@@ -143,40 +107,150 @@ print(response.model_dump_json(indent=2))
 - `base_url` passes the Azure OpenAI endpoint and `/openai/v1` is appended to the endpoint address.
 - `api_key` parameter is set to `token_provider`, enabling automatic retrieval and refresh of an authentication token instead of using a static API key.
 
-# [REST](#tab/rest)
+# [C#](#tab/dotnet)
+
+### v1 API
 
-### Last generation API
+[C# v1 examples](./supported-languages.md)
 
 **API Key**:
 
-```bash
-curl -X POST https://YOUR-RESOURCE-NAME.openai.azure.com/openai/responses?api-version=2025-04-01-preview \
-  -H "Content-Type: application/json" \
-  -H "api-key: $AZURE_OPENAI_API_KEY" \
-  -d '{
-      "model": "gpt-4.1-nano",
-      "input": "This is a test"
-    }'
+```csharp
+OpenAIClient client = new(
+    new ApiKeyCredential("{your-api-key}"),
+    new OpenAIClientOptions()
+    {
+        Endpoint = new("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/"),
+    });
 ```
 
 **Microsoft Entra ID**:
 
-```bash
-curl -X POST https://YOUR-RESOURCE-NAME.openai.azure.com/openai/responses?api-version=2025-04-01-preview \
-  -H "Content-Type: application/json" \
-  -H "Authorization: Bearer $AZURE_OPENAI_AUTH_TOKEN" \
-  -d '{
-      "model": "gpt-4.1-nano",
-      "input": "This is a test"
-    }'
+```csharp
+#pragma warning disable OPENAI001
+
+BearerTokenPolicy tokenPolicy = new(
+    new DefaultAzureCredential(),
+    "https://cognitiveservices.azure.com/.default");
+OpenAIClient client = new(
+    authenticationPolicy: tokenPolicy,
+    options: new OpenAIClientOptions()
+    {
+        Endpoint = new("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/"),
+    });
+```
+
+# [JavaScript](#tab/javascript)
+
+### v1 API
+
+[JavaScript v1 examples](./supported-languages.md)
+
+**API Key**:
+
+```javascript
+const client = new OpenAI({
+  baseURL: "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
+  apiKey: "{your-api-key}"
+});
+```
+
+**API Key** with environment variables set for `OPENAI_BASE_URL` and `OPENAI_API_KEY`:
+
+```javascript
+const client = new OpenAI();
+```
+
+**Microsoft Entra ID**:
+
+```javascript
+const tokenProvider = getBearerTokenProvider(
+  new DefaultAzureCredential(),
+  'https://cognitiveservices.azure.com/.default');
+const client = new OpenAI({
+  baseURL: "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
+  apiKey: tokenProvider
+});
+```
+
+# [Go](#tab/go)
+
+### v1 API
+
+[Go v1 examples](./supported-languages.md)
+
+**API Key**:
+
+```go
+client := openai.NewClient(
+    option.WithBaseURL("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/"),
+    option.WithAPIKey("{your-api-key}"),
+)
+```
+
+**API Key** with environment variables set for `OPENAI_BASE_URL` and `OPENAI_API_KEY`:
+
+```go
+client := openai.NewClient()
+```
+
+**Microsoft Entra ID**:
+
+```go
+tokenCredential, err := azidentity.NewDefaultAzureCredential(nil)
+if err != nil {
+    // handle credential creation error
+}
+
+client := openai.NewClient(
+    option.WithBaseURL("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/"),
+    azure.WithTokenCredential(tokenCredential),
+)
 ```
 
-### Next generation API
+# [Java](#tab/Java)
+
+### v1 API
+
+[Java v1 examples](./supported-languages.md)
+
+**API Key**:
+
+```java
+OpenAIClient client = OpenAIOkHttpClient.builder()
+    .baseUrl("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/")
+    .apiKey(apiKey)
+    .build();
+```
+
+**API Key** with environment variables set for `OPENAI_BASE_URL` and `OPENAI_API_KEY`:
+
+```java
+OpenAIClient client = OpenAIOkHttpClient.builder()
+    .fromEnv()
+    .build();
+```
+
+**Microsoft Entra ID**:
+
+```java
+Credential tokenCredential = BearerTokenCredential.create(
+    AuthenticationUtil.getBearerTokenSupplier(
+        new DefaultAzureCredentialBuilder().build(),
+        "https://cognitiveservices.azure.com/.default"));
+OpenAIClient client = OpenAIOkHttpClient.builder()
+    .baseUrl("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/")
+    .credential(tokenCredential)
+    .build();
+```
+
+# [REST](#tab/rest)
+
+### v1 API
 
 **API Key**:
 
 ```bash
-curl -X POST https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/responses?api-version=preview \
+curl -X POST https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/responses \
   -H "Content-Type: application/json" \
   -H "api-key: $AZURE_OPENAI_API_KEY" \
   -d '{
@@ -188,7 +262,7 @@ curl -X POST https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/responses?api
 **Microsoft Entra ID**:
 
 ```bash
-curl -X POST https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/responses?api-version=preview \
+curl -X POST https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/responses \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer $AZURE_OPENAI_AUTH_TOKEN" \
   -d '{
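
For reference, the hunks above truncate the full v1 Python samples. A minimal end-to-end sketch of the two v1 authentication patterns they describe, assuming the current `openai` and `azure-identity` packages (endpoint and deployment names are placeholders):

```python
import os

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import OpenAI

# Key-based authentication: /openai/v1 is appended to the resource endpoint,
# and no api-version parameter is required with the v1 GA API.
key_client = OpenAI(
    base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
)

# Microsoft Entra ID: the OpenAI() client accepts a token provider as api_key,
# enabling automatic token retrieval and refresh without the AzureOpenAI() client.
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)
entra_client = OpenAI(
    base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
    api_key=token_provider,
)

response = entra_client.responses.create(
    model="gpt-4.1-nano",  # replace with your model deployment name
    input="This is a test.",
)
print(response.model_dump_json(indent=2))
```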

articles/ai-foundry/openai/azure-government.md

Lines changed: 21 additions & 0 deletions
@@ -56,6 +56,27 @@ To request quota increases for these models, submit a request at [https://aka.ms
 
 <br>
 
+### Model retirements
+In some cases, models are retired in Azure Government ahead of their dates in the commercial cloud. General information on model retirement policies, dates, and other details can be found at [Azure OpenAI in Azure AI Foundry model deprecations and retirements](/azure/ai-foundry/openai/concepts/model-retirements). The following table shows model retirement differences in Azure Government.
+
+| Model | Version | Azure Government status | Public retirement date |
+|----------------|------------------|:------------------------|-------------------------|
+| `gpt-35-turbo` | 1106 | Retired | November 11, 2025 |
+| `gpt-4` | turbo-2024-04-09 | Retired | November 11, 2025 |
+
+<br>
+
+### Default model versions
+In some cases, new model versions are designated as the default in Azure Government ahead of the commercial cloud. General information on model upgrades can be found at [Working with Azure OpenAI models](/azure/ai-foundry/openai/how-to/working-with-models?tabs=powershell&branch=main#model-deployment-upgrade-configuration).
+
+The following table shows default model differences in Azure Government.
+
+| Model | Azure Government default version | Public default version | Default upgrade date |
+|-----------|----------------------------------|------------------------|-------------------------------|
+| `gpt-4o` | 2024-11-20 | 2024-08-06 | Starting on October 13, 2025 |
+
+<br>
+
 ## Azure OpenAI features
 
 The following feature differences exist when comparing Azure OpenAI in Azure Government vs commercial cloud.

articles/ai-foundry/openai/how-to/provisioned-throughput-onboarding.md

Lines changed: 8 additions & 8 deletions
@@ -83,14 +83,14 @@ For example, for gpt-5 1 output token counts as 8 input tokens towards your util
 > [!NOTE]
 > gpt-4.1, gpt-4.1-mini and gpt-4.1-nano don't support long context (requests estimated at larger than 128k prompt tokens).
 
-|Topic| **gpt-5** | **gpt-4.1** | **gpt-4.1-mini** | **gpt-4.1-nano** | **o3** | **o4-mini** |
-| --- | --- | --- | --- | --- | --- | --- |
-|Global & data zone provisioned minimum deployment| 15 | 15 | 15 | 15 | 15 | 15 |
-|Global & data zone provisioned scale increment| 5 | 5 | 5 | 5 | 5 | 5 |
-|Regional provisioned minimum deployment| 50 | 50 | 25 | 25 | 50 | 25 |
-|Regional provisioned scale increment| 50 | 50 | 25 | 25 | 50 | 25 |
-|Input TPM per PTU| 4,750 | 3,000 | 14,900 | 59,400 | 3,000 | 5,400 |
-|Latency Target Value| 99% > 50 Tokens Per Second\* | 99% > 80 Tokens Per Second\* | 99% > 90 Tokens Per Second\* | 99% > 100 Tokens Per Second\* | 99% > 80 Tokens Per Second\* | 99% > 90 Tokens Per Second\* |
+|Topic| **gpt-5** | **gpt-5-mini** | **gpt-4.1** | **gpt-4.1-mini** | **gpt-4.1-nano** | **o3** | **o4-mini** |
+| --- | --- | --- | --- | --- | --- | --- | --- |
+|Global & data zone provisioned minimum deployment| 15 | 15 | 15 | 15 | 15 | 15 | 15 |
+|Global & data zone provisioned scale increment| 5 | 5 | 5 | 5 | 5 | 5 | 5 |
+|Regional provisioned minimum deployment| 50 | 25 | 50 | 25 | 25 | 50 | 25 |
+|Regional provisioned scale increment| 50 | 25 | 50 | 25 | 25 | 50 | 25 |
+|Input TPM per PTU| 4,750 | 23,750 | 3,000 | 14,900 | 59,400 | 3,000 | 5,400 |
+|Latency Target Value| 99% > 50 Tokens Per Second\* | 99% > 80 Tokens Per Second\* | 99% > 80 Tokens Per Second\* | 99% > 90 Tokens Per Second\* | 99% > 100 Tokens Per Second\* | 99% > 80 Tokens Per Second\* | 99% > 90 Tokens Per Second\* |
 
 \* Calculated as p50 request latency on a per 5 minute basis.
 
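
The hunk context above notes that for gpt-5, 1 output token counts as 8 input tokens toward utilization. A small sketch of the resulting sizing arithmetic, using the Input TPM per PTU figures from the updated table; the traffic numbers are illustrative assumptions:

```python
import math

# Illustrative traffic estimate for a gpt-5 deployment (assumptions).
input_tpm = 100_000   # expected input tokens per minute
output_tpm = 10_000   # expected output tokens per minute

# For gpt-5, 1 output token counts as 8 input tokens toward utilization.
weighted_tpm = input_tpm + 8 * output_tpm   # 180,000

tpm_per_ptu = 4_750   # Input TPM per PTU for gpt-5, from the table above

raw_ptus = weighted_tpm / tpm_per_ptu       # ~37.9
# Round up to the global & data zone minimum (15) and scale increment (5).
minimum, increment = 15, 5
ptus = max(minimum, math.ceil(raw_ptus / increment) * increment)
print(ptus)  # 40 PTUs for a global or data zone deployment
```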

articles/ai-foundry/openai/how-to/spillover-traffic-management.md

Lines changed: 25 additions & 4 deletions
@@ -6,15 +6,15 @@ ms.author: mopeakande
 ms.service: azure-ai-foundry
 ms.subservice: azure-ai-foundry-openai
 ms.topic: how-to
-ms.date: 09/03/2025
+ms.date: 10/02/2025
 ---
 
 # Manage traffic with spillover for provisioned deployments
 
 Spillover manages traffic fluctuations on provisioned deployments by routing overage traffic to a corresponding standard deployment. Spillover is an optional capability that can be set for all requests on a given deployment or can be managed on a per-request basis. When spillover is enabled, Azure OpenAI in Azure AI Foundry Models sends any overage traffic from your provisioned deployment to a standard deployment for processing.
 
 > [!NOTE]
-> Spillover is currently not available for the `/v1` [API endpoint](../reference-preview-latest.md) or [responses API](./responses.md).
+> Spillover is currently not available for the [responses API](./responses.md).
 
 ## Prerequisites
 - You need to have a provisioned managed deployment and a standard deployment.
2727
To maximize the utilization of your provisioned deployment, you can enable spillover for all global and data zone provisioned deployments. With spillover, bursts or fluctuations in traffic can be automatically managed by the service. This capability reduces the risk of experiencing disruptions when a provisioned deployment is fully utilized. Alternatively, spillover is configurable per-request to provide flexibility across different scenarios and workloads. Spillover can also now be used for the [Azure AI Foundry Agent Service](../../agents/overview.md).
2828

2929
## When does spillover come into effect?
30-
When spillover is enabled for a deployment or configured for a given inference request, spillover is initiated when a non-200 response code is received for a given inference request. When a request results in a non-200 response code, the Azure OpenAI automatically sends the request from your provisioned deployment to your standard deployment to be processed. Even if a subset of requests is routed to the standard deployment, the service prioritizes sending requests to the provisioned deployment before sending any overage requests to the standard deployment, which may incur additional latency.
30+
When you enable spillover for a deployment or configure it for a given inference request, spillover initiates when a specific non-`200` response code is received for a given inference request as a result of one of these scenarios:
31+
32+
- Provisioned throughput units (PTU) are completely used, resulting in a `429` response code.
33+
34+
- You send a long context token request, resulting in a `400` error code. For example, when using `gpt 4.1` series models, PTU supports only context lengths less than 128k and returns HTTP 400.
35+
36+
- Server errors when processing your request, resulting in error code `500` or `503`.
37+
38+
When a request results in one of these non-`200` response codes, Azure OpenAI automatically sends the request from your provisioned deployment to your standard deployment to be processed.
39+
40+
> [!NOTE]
41+
> Even if a subset of requests is routed to the standard deployment, the service prioritizes sending requests to the provisioned deployment before sending any overage requests to the standard deployment, which might incur additional latency.
42+
43+
## How to know a request spilled over
44+
45+
The following HTTP response headers indicate that a specific request spilled over:
46+
47+
- `x-ms-spillover-from-<deployment-name>`. This header contains the PTU deployment name. The presence of this header indicates that the request was a spillover request.
48+
49+
- `x-ms-<deployment-name>`. This header contains the name of the deployment that served the request. If the request spilled over, the deployment name is the name of the standard deployment.
50+
51+
For a request that spilled over, if the standard deployment request failed for any reason, the original PTU response is used in the response to the customer. The customer sees a header `x-ms-spillover-error` that contains the response code of the spillover request (such as `429` or `500`) so that they know the reason for the failed spillover.
3152

3253
## How does spillover affect cost?
3354
Since spillover uses a combination of provisioned and standard deployments to manage traffic fluctuations, billing for spillover involves two components:
@@ -124,4 +145,4 @@ Applying the `IsSpillover` split lets you view the requests to your deployment t
 ## See also
 
 * [What is provisioned throughput](../concepts/provisioned-throughput.md)
-* [Onboarding to provisioned throughput](./provisioned-throughput-onboarding.md)
\ No newline at end of file
+* [Onboarding to provisioned throughput](./provisioned-throughput-onboarding.md)
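
To make the new headers section concrete, a sketch of checking whether a single request spilled over, assuming Python with the `openai` package's `with_raw_response` accessor; the endpoint, key, and deployment names are placeholders:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="YOUR-API-KEY",
    api_version="2024-10-21",
    azure_endpoint="https://YOUR-RESOURCE-NAME.openai.azure.com",
)

# with_raw_response exposes the HTTP response so its headers can be inspected.
raw = client.chat.completions.with_raw_response.create(
    model="YOUR-PTU-DEPLOYMENT",  # provisioned deployment with spillover enabled
    messages=[{"role": "user", "content": "This is a test."}],
)

# Per the article: x-ms-spillover-from-<deployment-name> is present only on
# spillover requests, and x-ms-spillover-error carries the original failure
# code when the spillover attempt itself failed.
spilled_over = any(
    name.lower().startswith("x-ms-spillover-from-") for name in raw.headers.keys()
)
spillover_error = raw.headers.get("x-ms-spillover-error")

completion = raw.parse()  # the parsed chat completion
print(spilled_over, spillover_error, completion.choices[0].message.content)
```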
