Commit d51893d (parent 462264d)
Learn Editor: Update spillover-traffic-management.md

1 file changed: articles/ai-services/openai/how-to/spillover-traffic-management.md (14 additions, 12 deletions)

ms.date: 03/05/2025
# Manage traffic with spillover for provisioned deployments (Preview)

Spillover manages traffic fluctuations on provisioned deployments by routing overage traffic to a corresponding standard deployment. Spillover is an optional capability that can be set for all requests on a given deployment or can be managed on a per-request basis. When spillover is enabled, the Azure OpenAI Service sends any overage traffic from your provisioned deployment to a designated standard deployment for processing.
## Prerequisites
- A global provisioned or data zone provisioned deployment to be used as your primary deployment.
- The data processing level of your standard deployment must match your provisioned deployment (for example, a global provisioned deployment must be used with a global standard spillover deployment).
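
This matching rule can be sketched as a simple lookup. The mapping below is illustrative, not an authoritative list: the sku names follow the deployment example later in this article, and the pairings are an assumption based on the prerequisite above.

```python
# Illustrative check that a spillover pairing keeps data processing levels
# aligned (global with global, data zone with data zone). The sku names and
# the mapping itself are assumptions for illustration only.

COMPATIBLE_SPILLOVER = {
    "GlobalProvisionedManaged": "GlobalStandard",
    "DataZoneProvisionedManaged": "DataZoneStandard",
}

def is_valid_spillover_pair(provisioned_sku: str, standard_sku: str) -> bool:
    # The standard deployment must match the provisioned deployment's
    # data processing level.
    return COMPATIBLE_SPILLOVER.get(provisioned_sku) == standard_sku
```
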
## When to enable spillover on provisioned deployments
To maximize the utilization of your provisioned deployment, it is recommended to enable spillover for all global and data zone provisioned deployments. With spillover, bursts or fluctuations in traffic can be automatically managed by the service. This capability reduces the risk of disruptions when a provisioned deployment is fully utilized. Alternatively, spillover can be controlled on a per-request basis to provide flexibility across different scenarios and workloads.
## When does spillover come into effect?
When spillover is enabled for a deployment or configured for a given inference request, spillover is initiated when a request sent to your primary provisioned deployment returns a non-200 response code. When a request results in a non-200 response code, the Azure OpenAI Service automatically sends it from your provisioned deployment to your standard deployment for processing. Even if a subset of requests is routed to the standard deployment, the service prioritizes sending requests to the provisioned deployment before sending any overage requests to the standard deployment.
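
This routing behavior can be sketched as follows. The sketch is an illustrative model of the sequence described above, not the service implementation; the function and deployment names are hypothetical.

```python
# Illustrative sketch of spillover routing (not the actual service code):
# try the provisioned deployment first, and resend the request to the
# standard deployment only when a non-200 response is received and a
# spillover target is configured.

def send_with_spillover(request, call_deployment, primary, spillover=None):
    """call_deployment(deployment, request) -> (status_code, body)."""
    status, body = call_deployment(primary, request)
    if status == 200 or spillover is None:
        # Success, or no spillover target configured: return as-is.
        return status, body, primary
    # Non-200 with a spillover target set: route the overage request
    # to the standard deployment instead.
    status, body = call_deployment(spillover, request)
    return status, body, spillover
```
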
## How does spillover impact cost?
Since spillover uses a combination of provisioned and standard deployments to manage traffic fluctuations, billing for spillover involves two components:

- For any requests processed by your provisioned deployment, only the hourly provisioned deployment cost applies. No additional costs are incurred for these requests.

- For any requests routed to your standard deployment, the request is billed at the associated input token, cached token, and output token rates for the specified model version and deployment type.
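
As a back-of-envelope illustration, the two billing components can be combined as follows. All rates below are made-up placeholders, not actual Azure OpenAI prices; substitute your own provisioned hourly rate and standard token rates.

```python
# Back-of-envelope spillover cost estimate. All rates are placeholders --
# substitute your actual provisioned hourly rate and the standard
# (pay-as-you-go) token rates for your model and deployment type.

HOURLY_PTU_COST = 2.00          # $/hour for the provisioned deployment (placeholder)
INPUT_RATE = 0.15 / 1_000_000   # $ per input token (placeholder)
OUTPUT_RATE = 0.60 / 1_000_000  # $ per output token (placeholder)

def spillover_cost(hours, spilled_input_tokens, spilled_output_tokens):
    # Provisioned side: flat hourly cost, regardless of request volume.
    provisioned = HOURLY_PTU_COST * hours
    # Spillover side: per-token billing for overage requests only.
    overage = (spilled_input_tokens * INPUT_RATE
               + spilled_output_tokens * OUTPUT_RATE)
    return provisioned + overage

# Example: 24 hours of provisioned capacity plus 2M input / 0.5M output
# tokens spilled to the standard deployment.
total = spillover_cost(24, 2_000_000, 500_000)
print(f"${total:.2f}")
```
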
## How to enable spillover
The spillover capability can be enabled for all requests on a provisioned deployment using a deployment property, or it can be managed on a per-request basis using request headers. The following sections explain how to configure spillover for each of these scenarios.
### Enable spillover for all requests on a provisioned deployment
To enable spillover for all requests on a provisioned deployment, set the deployment property `spilloverDeploymentName` to the name of the standard deployment that should receive spillover requests. This property can be set during the creation of a new provisioned deployment or added to an existing provisioned deployment. The designated standard deployment must reside within the same Azure OpenAI Service resource as your provisioned deployment.
```bash
curl -X PUT https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/resource-group-temp/providers/Microsoft.CognitiveServices/accounts/docs-openai-test-001/deployments/spillover-ptu-deployment?api-version=2024-10-01 \
  -H "Content-Type: application/json" \
  -H 'Authorization: Bearer YOUR_AUTH_TOKEN' \
  -d '{"sku":{"name":"GlobalProvisionedManaged","capacity":100},"properties": {"spilloverDeploymentName": "spillover-standard-deployment", "model":{"format": "OpenAI","name": "gpt-4o-mini","version": "2024-07-18"}}}'
```
### Enable spillover for select inference requests
To selectively enable spillover on a per-request basis, set the `x-ms-spillover-deployment` inference request header to the name of the standard deployment that should process spillover requests in the event of a non-200 response code. If the `x-ms-spillover-deployment` header is not set on a given request, spillover is not initiated when a non-200 response occurs. The use or omission of this header provides the flexibility to control when spillover should or should not be initiated for a given workload or scenario.
```bash
curl $AZURE_OPENAI_ENDPOINT/openai/deployments/spillover-ptu-deployment/chat/completions?api-version=2025-02-01-preview \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -H "x-ms-spillover-deployment: spillover-standard-deployment" \
  -d '{"messages": [{"role": "user", "content": "Your prompt here"}]}'
```
> [!NOTE]
> If the spillover capability is enabled for the deployment using the `spilloverDeploymentName` property and also enabled at the request level using the `x-ms-spillover-deployment` header, the system defaults to the setting of the deployment property. If you would like to ensure that spillover is only enabled on a per-request basis, do not set the `spilloverDeploymentName` property on the provisioned deployment and rely only on the `x-ms-spillover-deployment` header.
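
This precedence rule can be sketched as follows. The sketch is illustrative only; `spilloverDeploymentName` and `x-ms-spillover-deployment` are the property and header described above, and the function is hypothetical.

```python
# Illustrative resolution of the effective spillover target: the
# deployment-level property, when set, takes precedence over the
# per-request header.

def effective_spillover_target(spillover_deployment_name, request_headers):
    """spillover_deployment_name: the deployment's spilloverDeploymentName
    property (or None); request_headers: dict of request headers."""
    if spillover_deployment_name is not None:
        # Deployment property wins, even if the header is also present.
        return spillover_deployment_name
    # Otherwise fall back to the per-request header, if any.
    return request_headers.get("x-ms-spillover-deployment")
```
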
## How do I monitor my spillover usage?
Since the spillover capability relies on a combination of provisioned and standard deployments to manage traffic overages, monitoring can be conducted at the deployment level for each deployment. To view how many requests were processed on the primary provisioned deployment versus the spillover standard deployment, apply the splitting feature within Azure Monitor metrics to view the requests processed by each deployment and their respective status codes. Similarly, the splitting feature can be used to view how many tokens were processed on the primary provisioned deployment versus the spillover standard deployment for a given time period. For more information on observability within Azure OpenAI, review the [Monitor Azure OpenAI](./monitor-openai.md) documentation.
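
As a local illustration of the kind of per-deployment split this produces, the snippet below groups a simulated request log by deployment and status code. The log and counts are fabricated for illustration, not real telemetry.

```python
# Local illustration of splitting request counts by deployment and status
# code -- the same view Azure Monitor's splitting feature provides.
from collections import Counter

# Simulated request log: (deployment, status_code) pairs.
requests = [
    ("gpt-4o-ptu", 200), ("gpt-4o-ptu", 200), ("gpt-4o-ptu", 429),
    ("gpt-4o-paygo-spillover", 200),  # the 429 above, rerouted to spillover
]

split = Counter(requests)
for (deployment, status), count in sorted(split.items()):
    print(f"{deployment} {status}: {count}")
```
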
The following Azure Monitor metrics chart provides an example of the split of requests between the primary provisioned deployment and the spillover standard deployment when spillover is initiated. As shown in the chart, for every request that has a non-200 response code for the provisioned deployment ("gpt-4o-ptu"), there is a corresponding request with a 200 response code on the spillover standard deployment ("gpt-4o-paygo-spillover"), indicating that these overage requests were routed to the spillover standard deployment for successful processing. ![Azure Monitor chart showing spillover requests from a provisioned deployment to a standard deployment.](media/spillover-traffic-management/monitor-spillover-usage.png)
