Commit 3982b72

Merge pull request #7459 from msakande/patch-1
Update spillover traffic management documentation
2 parents 3195e9c + 100476f commit 3982b72

File tree: 1 file changed (+6, -3 lines)


articles/ai-foundry/openai/how-to/spillover-traffic-management.md

Lines changed: 6 additions & 3 deletions
```diff
@@ -14,7 +14,7 @@ ms.date: 10/02/2025
 Spillover manages traffic fluctuations on provisioned deployments by routing overage traffic to a corresponding standard deployment. Spillover is an optional capability that can be set for all requests on a given deployment or can be managed on a per-request basis. When spillover is enabled, Azure OpenAI in Azure AI Foundry Models sends any overage traffic from your provisioned deployment to a standard deployment for processing.
 
 > [!NOTE]
-> Spillover is currently not available for the `/v1` [API endpoint](../reference-preview-latest.md) or [responses API](./responses.md).
+> Spillover is currently not available for the [responses API](./responses.md).
 
 ## Prerequisites
 - You need to have a provisioned managed deployment and a standard deployment.
```
```diff
@@ -35,7 +35,10 @@ When you enable spillover for a deployment or configure it for a given inference
 
 - Server errors when processing your request, resulting in error code `500` or `503`.
 
-When a request results in one of these non-`200` response codes, Azure OpenAI automatically sends the request from your provisioned deployment to your standard deployment to be processed. Even if a subset of requests is routed to the standard deployment, the service prioritizes sending requests to the provisioned deployment before sending any overage requests to the standard deployment, which might incur additional latency.
+When a request results in one of these non-`200` response codes, Azure OpenAI automatically sends the request from your provisioned deployment to your standard deployment to be processed.
+
+> [!NOTE]
+> Even if a subset of requests is routed to the standard deployment, the service prioritizes sending requests to the provisioned deployment before sending any overage requests to the standard deployment, which might incur additional latency.
 
 ## How to know a request spilled over
 
```
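The hunk above describes the routing rule: the service always tries the provisioned deployment first and re-routes a request to the standard deployment only when the provisioned call returns a spillover-eligible non-`200` code (`429`, `500`, or `503`). A minimal client-side simulation of that rule, purely for illustration (the real routing happens inside the service, and all names here are hypothetical):

```python
# Hypothetical simulation of the spillover rule described in the diff above.
# The service tries the provisioned deployment first and re-routes the request
# to the standard deployment only on a spillover-eligible code (429, 500, 503).

SPILLOVER_CODES = {429, 500, 503}

def route_with_spillover(call_provisioned, call_standard):
    """Return ((status, body), spilled_over) for a single request."""
    status, body = call_provisioned()
    if status == 200 or status not in SPILLOVER_CODES:
        # Non-eligible responses (e.g. 400) go back to the caller unchanged.
        return (status, body), False
    # Eligible overage/error: re-send the same request to the standard deployment.
    return call_standard(), True

# Example: the provisioned deployment is throttled (429), so the request spills over.
result, spilled = route_with_spillover(
    lambda: (429, "throttled"),
    lambda: (200, "ok from standard"),
)
print(result, spilled)  # ((200, 'ok from standard'), True)
```

Note that this sketch also reflects the latency caveat in the new `[!NOTE]`: a spilled-over request pays for the provisioned attempt before the standard deployment answers.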

```diff
@@ -142,4 +145,4 @@ Applying the `IsSpillover` split lets you view the requests to your deployment t
 ## See also
 
 * [What is provisioned throughput](../concepts/provisioned-throughput.md)
-* [Onboarding to provisioned throughput](./provisioned-throughput-onboarding.md)
+* [Onboarding to provisioned throughput](./provisioned-throughput-onboarding.md)
```
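The final hunk's context line mentions applying an `IsSpillover` split to the deployment metrics. Conceptually, that split partitions request counts by whether each request was served by the provisioned deployment or spilled over. A small self-contained illustration of the idea (the records and field names are hypothetical, not the Azure Monitor schema):

```python
# Hypothetical illustration of what an `IsSpillover` metric split does:
# partition request counts by whether each request spilled over to the
# standard deployment. Record shape is invented for this sketch.
from collections import Counter

requests = [
    {"deployment": "my-provisioned", "IsSpillover": False},
    {"deployment": "my-provisioned", "IsSpillover": False},
    {"deployment": "my-provisioned", "IsSpillover": True},
]

split = Counter(r["IsSpillover"] for r in requests)
print(split[True], split[False])  # 1 2
```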
