You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/ai-services/openai/how-to/provisioned-throughput-onboarding.md
+10-11Lines changed: 10 additions & 11 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -27,7 +27,7 @@ You should consider switching from pay-as-you-go to provisioned throughput when
27
27
> [!NOTE]
28
28
> In function calling and agent use cases, token usage can be variable. You should understand your expected Tokens Per Minute (TPM) usage in detail prior to migrating workloads to PTU.
29
29
30
-
## Sizing and estimation: provisioned managed only
30
+
## Sizing and estimation: provisioned and global provisioned
31
31
32
32
Determining the right amount of provisioned throughput, or PTUs, you require for your workload is an essential step to optimizing performance and cost. This section describes how to use the Azure OpenAI capacity planning tool. The tool provides you with an estimate of the required PTU to meet the needs of your workload.
33
33
@@ -56,38 +56,37 @@ The values in the output column are the estimated value of PTU units required fo
56
56
57
57
## Understanding the Provisioned Throughput Purchase Model
58
58
59
-
Azure OpenAI Provisioned is purchased on-demand at an hourly basis based on the number of deployed PTUs, with substantial term discount available via the purchase of Azure Reservations.
59
+
Azure OpenAI Provisioned and Global Provisiones are purchased on-demand at an hourly basis based on the number of deployed PTUs, with substantial term discount available via the purchase of Azure Reservations.
60
60
61
-
The hourly model is useful for short-term deployment needs, such as validating new models or acquiring capacity for a hackathon. However, the discounts provided by the Azure Reservation for Azure OpenAI Provisioned are considerable and most customers with consistent long-term usage will find a reserved model to be a better value proposition.
61
+
The hourly model is useful for short-term deployment needs, such as validating new models or acquiring capacity for a hackathon. However, the discounts provided by the Azure Reservation for Azure OpenAI Provisioned and Global Provisioned are considerable and most customers with consistent long-term usage will find a reserved model to be a better value proposition.
62
62
63
63
> [!NOTE]
64
64
> Azure OpenAI Provisioned customers onboarded prior to the August self-service update use a purchase model called the Commitment model. These customers can continue to use this older purchase model alongside the Hourly/reservation purchase model. The Commitment model is not available for new customers. For details on the Commitment purchase model and options for coexistence and migration, please see the [Azure OpenAI Provisioned August Update](../concepts/provisioned-migration.md).
65
-
66
65
## Hourly Usage
67
66
68
-
Provisioned Throughput deployments are charged an hourly rate ($/PTU/hr) on the number of PTUs that have been deployed. For example, a 300 PTU deployment will be charged the hourly rate times 300. All Azure OpenAI pricing is available in the Azure Pricing Calculator.
67
+
Provisioned and Global Provisioned deployments are charged an hourly rate ($/PTU/hr) on the number of PTUs that have been deployed. For example, a 300 PTU deployment will be charged the hourly rate times 300. All Azure OpenAI pricing is available in the Azure Pricing Calculator.
69
68
70
69
If a deployment exists for a partial hour, it will receive a prorated charge based on the number of minutes it was deployed during the hour. For example, a deployment that exists for 15 minutes during an hour will receive 1/4th the hourly charge.
71
70
72
71
If the deployment size is changed, the costs of the deployment will adjust to match the new number of PTUs.
Paying for provisioned deployments on an hourly basis is ideal for short-term deployment scenarios. For example: Quality and performance benchmarking of new models, or temporarily increasing PTU capacity to cover an event such as a hackathon.
75
+
Paying for provisioned and global provisioned deployments on an hourly basis is ideal for short-term deployment scenarios. For example: Quality and performance benchmarking of new models, or temporarily increasing PTU capacity to cover an event such as a hackathon.
77
76
78
-
Customers that require long-term usage of provisioned deployments, however, might pay significantly less per month by purchasing a term discount via an Azure Reservation as discussed in the next section.
77
+
Customers that require long-term usage of provisioned and global provisioned deployments, however, might pay significantly less per month by purchasing a term discount via an Azure Reservation as discussed in the next section.
79
78
80
79
> [!NOTE]
81
80
> It is not recommended to scale production deployments according to incoming traffic and pay for them purely on an hourly basis. There are two reasons for this:
82
81
> * The cost savings achieved by purchasing an Azure Reservation for Azure OpenAI Provisioned are significant, and it will be less expensive in many cases to maintain a deployment sized for full production volume paid for via a reservation than it would be to scale the deployment with incoming traffic.
83
82
> * Having unused provisioned quota (PTUs) does not guarentee that capacity will be available to support increasing the size of the deployment when required. Quota limits the maximum number of PTUs that can be deployed, but it is not a capacity guarantee. Provisioned capacity for each region and modal dynamically changes throughout the day and might not be available when required. As a result, it is recommended to maintain a permanant deployment to cover your traffic needs (paid for via a reservation).
84
83
> * Charges for deployments on a deleted resource will continue until the resource is purged. To prevent this, delete a resource’s deployment before deleting the resource. For more information, see [Recover or purge deleted Azure AI services resources](../../recover-purge-resources.md).
85
84
86
-
## Azure Reservations for Azure OpenAI Provisioned
85
+
## Azure Reservations for Azure OpenAI Provisioned and Global Provisioned
87
86
88
-
Discounts on top of the hourly usage price can be obtained by purchasing an Azure Reservation for Azure OpenAI Provisioned. An Azure Reservation is a term-discounting mechanism shared by many Azure products. For example, Compute and Cosmos DB. For Azure OpenAI Provisioned, the reservation provides a discount for committing to payment for fixed number of PTUs for a one-month or one-year period.
87
+
Discounts on top of the hourly usage price can be obtained by purchasing an Azure Reservation for Azure OpenAI Provisioned and Global Provisioned. An Azure Reservation is a term-discounting mechanism shared by many Azure products. For example, Compute and Cosmos DB. For Azure OpenAI Provisioned and Global Provisioned, the reservation provides a discount for committing to payment for fixed number of PTUs for a one-month or one-year period.
89
88
90
-
* Azure Reservations are purchased via the Azure portal, not Azure OpenAI Studio Link to Azure reservation portal.
89
+
* Azure Reservations are purchased via the Azure portal, not Azure OpenAI Studio Link to Azure reservation portal.
91
90
92
91
* Reservations are purchased regionally and can be flexibly scoped to cover usage from a group of deployments. Reservation scopes include:
93
92
@@ -110,7 +109,7 @@ Discounts on top of the hourly usage price can be obtained by purchasing an Azur
110
109
>
111
110
> * The Azure role and tenant policy requirements to purchase a reservation are different than those required to create a deployment or Azure OpenAI resource. Verify authorization to purchase reservations in advance of needing to do so. See Azure OpenAI [Provisioned reservation documentation](https://aka.ms/oai/docs/ptum-reservations) for more details.
## Important: Sizing Azure OpenAI Provisioned & Global Provisioned Reservations
114
113
115
114
The PTU amounts in reservation purchases are independent of PTUs allocated in quota or used in deployments. It is possible to purchase a reservation for more PTUs than you have in quota, or can deploy for the desired region, model, or version. Credits for over-purchasing a reservation are limited, and customers must take steps to ensure they maintain their reservation sizes in line with their deployed PTUs.
0 commit comments