
Commit ba8d8d7

PM feedback

1 parent fccb93e commit ba8d8d7

File tree

1 file changed: +3 −3 lines changed


articles/ai-foundry/openai/how-to/spillover-traffic-management.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -27,21 +27,21 @@ Spillover manages traffic fluctuations on provisioned deployments by routing ove
 To maximize the utilization of your provisioned deployment, you can enable spillover for all global and data zone provisioned deployments. With spillover, bursts or fluctuations in traffic can be automatically managed by the service. This capability reduces the risk of experiencing disruptions when a provisioned deployment is fully utilized. Alternatively, spillover is configurable per-request to provide flexibility across different scenarios and workloads. Spillover can also now be used for the [Azure AI Foundry Agent Service](../../agents/overview.md).
 
 ## When does spillover come into effect?
-When you enable spillover for a deployment or configure it for a given inference request, spillover initiates when a non-`200` response code is received for a given inference request. A non-`200` response code can result from any of these scenarios:
+When you enable spillover for a deployment or configure it for a given inference request, spillover initiates when a specific non-`200` response code is received for a given inference request as a result of one of these scenarios:
 
 - Provisioned throughput units (PTU) are completely used, resulting in a `429` response code.
 
 - You send a long context token request, resulting in a `400` error code. For example, when using `gpt-4.1` series models, PTU supports only context lengths less than 128k and returns HTTP 400.
 
 - Server errors when processing your request, resulting in error code `500` or `503`.
 
-When a request results in a non-`200` response code, Azure OpenAI automatically sends the request from your provisioned deployment to your standard deployment to be processed. Even if a subset of requests is routed to the standard deployment, the service prioritizes sending requests to the provisioned deployment before sending any overage requests to the standard deployment, which might incur additional latency.
+When a request results in one of these non-`200` response codes, Azure OpenAI automatically sends the request from your provisioned deployment to your standard deployment to be processed. Even if a subset of requests is routed to the standard deployment, the service prioritizes sending requests to the provisioned deployment before sending any overage requests to the standard deployment, which might incur additional latency.
 
 ## How to know a request spilled over
 
 The following HTTP response headers indicate that a specific request spilled over:
 
-- `x-ms-spillover-from-<deployment>`. This header contains the PTU deployment name. The presence of this header indicates that the request was a spillover request.
+- `x-ms-spillover-from-<deployment-name>`. This header contains the PTU deployment name. The presence of this header indicates that the request was a spillover request.
 
 - `x-ms-<deployment-name>`. This header contains the name of the deployment that served the request. If the request spilled over, the deployment name is the name of the standard deployment.
```
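Per the headers documented above, a client can tell whether a request spilled over by looking for a header whose name starts with `x-ms-spillover-from-`. A minimal sketch of that check — the helper name `spillover_source` and the sample header values are hypothetical; only the header name pattern comes from the article:

```python
# Minimal sketch: detect spillover from HTTP response headers, per the
# `x-ms-spillover-from-<deployment-name>` header described in the article.
# The helper name and the sample header dict below are hypothetical.

SPILLOVER_PREFIX = "x-ms-spillover-from-"

def spillover_source(headers):
    """Return the PTU deployment name the request spilled over from, or None."""
    for name in headers:
        if name.lower().startswith(SPILLOVER_PREFIX):
            # The header name suffix carries the provisioned (PTU) deployment name.
            return name.lower()[len(SPILLOVER_PREFIX):]
    return None

# Hypothetical headers for a request that spilled over:
headers = {
    "x-ms-spillover-from-my-ptu-deployment": "true",
    "content-type": "application/json",
}
print(spillover_source(headers))  # my-ptu-deployment
```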

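For reference, the trigger scenarios earlier in this section (`429` when PTU capacity is exhausted, `400` for long-context requests on PTU, and `500`/`503` server errors) can be summarized in a small sketch; the helper name is hypothetical, and note that in practice a `400` triggers spillover only in the long-context scenario:

```python
# Status codes that can trigger spillover, per the scenarios in this section:
# 429 (PTU fully utilized), 400 (long-context request rejected by PTU),
# 500/503 (server errors). The helper name is hypothetical, and this is a
# simplification: a 400 only qualifies in the long-context case.

SPILLOVER_TRIGGER_CODES = {400, 429, 500, 503}

def can_trigger_spillover(status_code):
    """True if this non-200 response code is one the service may spill over on."""
    return status_code in SPILLOVER_TRIGGER_CODES

print(can_trigger_spillover(429))  # True
print(can_trigger_spillover(404))  # False
```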