Commit fa1679b

Merge pull request #6788 from aahill/ptu-spillover
fixes
2 parents 747267f + 64700f1

2 files changed: +3 -3 lines changed

articles/ai-foundry/openai/how-to/provisioned-throughput-onboarding.md

Lines changed: 1 addition & 1 deletion

@@ -80,7 +80,7 @@ For example, for gpt-5 1 output token counts as 8 input tokens towards your util
 ## Latest Azure OpenAI models
 
 > [!NOTE]
-> gpt-5, gpt-4.1, gpt-4.1-mini and gpt-4.1-nano don't support long context (requests estimated at larger than 128k prompt tokens).
+> gpt-4.1, gpt-4.1-mini and gpt-4.1-nano don't support long context (requests estimated at larger than 128k prompt tokens).
 
 |Topic| **gpt-5** | **gpt-4.1** | **gpt-4.1-mini** | **gpt-4.1-nano** | **o3** | **o4-mini** |
 | --- | --- | --- | --- | --- | --- | --- |
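
For context, the hunk header above quotes the onboarding article's utilization rule: for gpt-5, 1 output token counts as 8 input tokens toward utilization. A minimal sketch of that arithmetic follows; the 8x weight comes from the quoted context, while the function name and example numbers are illustrative:

```python
# Utilization-weighted token arithmetic quoted in the hunk context above:
# for gpt-5, 1 output token counts as 8 input tokens toward utilization.
# The 8x weight is from the quoted doc; everything else is illustrative.
GPT5_OUTPUT_TOKEN_WEIGHT = 8

def utilization_tokens(input_tokens: int, output_tokens: int,
                       output_weight: int = GPT5_OUTPUT_TOKEN_WEIGHT) -> int:
    """Tokens counted toward provisioned-deployment utilization."""
    return input_tokens + output_weight * output_tokens

# A request with 1,000 prompt tokens and 200 completion tokens counts as
# 1,000 + 8 * 200 = 2,600 tokens toward utilization.
print(utilization_tokens(1_000, 200))  # 2600
```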

articles/ai-foundry/openai/how-to/spillover-traffic-management.md

Lines changed: 2 additions & 2 deletions

@@ -13,7 +13,7 @@ ms.date: 08/27/2025
 Spillover manages traffic fluctuations on provisioned deployments by routing overage traffic to a corresponding standard deployment. Spillover is an optional capability that can be set for all requests on a given deployment or can be managed on a per-request basis. When spillover is enabled, Azure OpenAI in Azure AI Foundry Models sends any overage traffic from your provisioned deployment to a standard deployment for processing.
 
 > [!NOTE]
-> Spillover is currently not available for the `/v1` [API endpoint](../reference-preview-latest.md).
+> Spillover is currently not available for the `/v1` [API endpoint](../reference-preview-latest.md) or [responses API](./responses.md).
 
 ## Prerequisites
 - You need to have a provisioned managed deployment and a standard deployment.
@@ -23,7 +23,7 @@ Spillover manages traffic fluctuations on provisioned deployments by routing ove
 - The data processing level of your standard deployment must match your provisioned deployment (for example, a global provisioned deployment must be used with a global standard spillover deployment).
 
 ## When to enable spillover on provisioned deployments
-To maximize the utilization of your provisioned deployment, you can enable spillover for all global and data zone provisioned deployments. With spillover, bursts or fluctuations in traffic can be automatically managed by the service. This capability reduces the risk of experiencing disruptions when a provisioned deployment is fully utilized. Alternatively, spillover is configurable per-request to provide flexibility across different scenarios and workloads. Spillover can also now be used for the [Azure AI Foundry Agent Service](../../agents/overview.md) and [responses API](./responses.md).
+To maximize the utilization of your provisioned deployment, you can enable spillover for all global and data zone provisioned deployments. With spillover, bursts or fluctuations in traffic can be automatically managed by the service. This capability reduces the risk of experiencing disruptions when a provisioned deployment is fully utilized. Alternatively, spillover is configurable per-request to provide flexibility across different scenarios and workloads. Spillover can also now be used for the [Azure AI Foundry Agent Service](../../agents/overview.md).
 
 ## When does spillover come into effect?
 When spillover is enabled for a deployment or configured for a given inference request, spillover is initiated when a non-200 response code is received for a given inference request. When a request results in a non-200 response code, the Azure OpenAI automatically sends the request from your provisioned deployment to your standard deployment to be processed. Even if a subset of requests is routed to the standard deployment, the service prioritizes sending requests to the provisioned deployment before sending any overage requests to the standard deployment, which may incur additional latency.
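
The per-request option described in the diff above can be illustrated with a short call. This is a minimal sketch, assuming the `x-ms-spillover-deployment` request header that the spillover article documents for per-request spillover; the endpoint, API version, and deployment names are placeholders, not values from this commit:

```python
# Minimal sketch of per-request spillover, assuming the
# `x-ms-spillover-deployment` header from the spillover article.
# Deployment names and endpoint values are placeholders.
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",
)

response = client.chat.completions.create(
    model="my-provisioned-deployment",  # provisioned deployment is tried first
    messages=[{"role": "user", "content": "Hello!"}],
    # Hypothetical illustration: if this request gets a non-200 response from
    # the provisioned deployment, route it to the paired standard deployment.
    extra_headers={"x-ms-spillover-deployment": "my-standard-deployment"},
)
print(response.choices[0].message.content)
```

Omitting the header leaves the request pinned to the provisioned deployment; per the article's overview, spillover can instead be set at the deployment level to apply to all requests.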
