-In mid-August, 2024, Microsoft launched improvements to its Provisioned Throughput offering that address customer feedback on usability and operational agility that open new payment options and deployment scenarios.
+Microsoft launched improvements to its Provisioned Throughput offering that address customer feedback on usability and operational agility and open new payment options and deployment scenarios.
 
 This article is intended for existing users of the provisioned throughput offering. New customers should refer to the [Azure OpenAI provisioned onboarding guide](../how-to/provisioned-throughput-onboarding.md).
 
 ## What's changing?
 
-The capabilities below are rolling out for the Provisioned Managed offering.
 
 > [!IMPORTANT]
 > The changes in this article do not apply to the older *"Provisioned Classic (PTU-C)"* offering. They only affect the Provisioned (also known as the Provisioned Managed) offering.
@@ -39,11 +38,11 @@ The capabilities below are rolling out for the Provisioned Managed offering.
 
 |Feature | Benefit|
 |---|---|
-|Hourly, uncommitted usage| Hourly payment option without a required commitment enables short-term deployment scenarios. |
+|Non-binding hourly option| An hourly payment option with no binding term enables short-term deployment scenarios. It's ideal for testing new models and assessing the benefits of Provisioned Throughput. |
 |Term discounts via Azure Reservations | Azure reservations provide substantial discounts over the hourly rate for one month and one year terms, and provide flexible scopes that minimize administration and associated with today’s resource-bound commitments.|
 | Default provisioned-managed quota in many regions | Get started quickly in new regions without having to first request quota. |
-| Flexible choice of payment model for existing provisioned customers | Customers with commitments can stay on the commitment model at least through the end of 2024, and can choose to migrate existing commitments to hourly/reservations via a self-service or managed process. |
-| Supports latest model generations | The hourly/reservation model is required to deploy models released after August 1, 2024. |
+| Flexible choice of payment model for existing provisioned customers | Customers with commitments can stay on the commitment model until the currently supported models reach end of life, and can choose to migrate existing commitments to hourly/reservations via a managed process. We recommend migrating to hourly/reservations to take advantage of term discounts and to work with the latest models. |
+| Supports latest model generations | In the provisioned offering, the latest models are available only on the hourly/reservation payment model. |
 
 ## Usability improvement details
 
@@ -87,7 +86,7 @@ See the following links for more information. The guidance for reservations and
 > [!NOTE]
 > The following description of payment models doesn't apply to the older "Provisioned Classic (PTU-C)" offering. They only affect the Provisioned (also known as Provisioned Managed) offering. Provisioned Classic continues to be governed by the unchanged monthly commitment payment model.
 
-Microsoft has introduced a new "Hourly/reservation" payment model for provisioned deployments. This is in addition to the current **Commitment** payment model, which will continue to be supported at least through the end of 2024.
+Microsoft has introduced a new "Hourly/reservation" payment model for provisioned deployments. This is in addition to the current **Commitment** payment model, which will continue to be supported until the models on its limited supported list reach end of life. For that list, see [supported models on **Commitment payment model**](./provisioned-migration.md#supported-models-on-commitment-payment-model).
 
 ### Commitment payment model
 
@@ -97,7 +96,18 @@ Microsoft has introduced a new "Hourly/reservation" payment model for provisione
 
 - Commitments can't be canceled or altered during their term, except to add new PTUs.
 
-- Supports models released prior to August 1, 2024.
+#### Supported models on Commitment payment model
+Only the following Azure OpenAI models are supported on the Commitment payment model. To onboard any model that isn't in this list, including newer models, onto the provisioned throughput offering, refer to the [Azure OpenAI provisioned onboarding guide](../how-to/provisioned-throughput-onboarding.md) and [Azure Reservations for Azure OpenAI provisioned deployments](../how-to/provisioned-throughput-onboarding.md#azure-reservations-for-azure-openai-provisioned-deployments).
+
+|Supported models on Commitment plan |
+|-|
+|gpt-35-turbo|
+|gpt-4|
+|gpt-4-32k|
+|gpt-4o|
+
+
+
 
 ### Hourly reservation payment model
 
@@ -112,7 +122,7 @@ Microsoft has introduced a new "Hourly/reservation" payment model for provisione
 - Supports all models, both old and new.
 
 > [!IMPORTANT]
-> **Models released after August 1, 2024 require the use of the Hourly/Reservation payment model.** They are not deployable on Azure OpenAI resources that have active commitments. To deploy models released after August 1, existing customers must either:
+> Newer models are available in the provisioned offering on the Hourly/Reservation payment model. Check the [model availability list](https://learn.microsoft.com/azure/ai-services/openai/concepts/models?tabs=provisioned%2Cstandard-chat-completions#global-standard-model-availability) for details. Models that aren't in the [list of models supported on the Commitment payment model](./provisioned-migration.md#supported-models-on-commitment-payment-model) can't be deployed on Azure OpenAI resources that have active commitments. To deploy newer models, you must either:
 > - Create deployments on Azure OpenAI resources without commitments.
 > - Migrate an existing resource off its commitments.
 
@@ -142,11 +152,11 @@ Steps 1 and 2 are the same in all cases. The difference is whether a commitment
 |Discount type |Available Scopes (within a region) |
 |---------|---------|
 |Commitment | Azure OpenAI resource |
-|Row2| Resource group, single subscription, management group (group of subscriptions), shared (all subscriptions in a billing account) |
+|Reservation| Resource group, single subscription, management group (group of subscriptions), shared (all subscriptions in a billing account) |
 
 * The discounted price is applied to deployed PTUs up to the number of discounted PTUs in the discount.
 * The number of deployed PTUs exceeding the discounted PTUs (or not covered by any discount) are charged the hourly rate.
-* The best practice is to create deployments first, and then to apply discounts. This is to guarantee that service. capacity is available to support your deployments prior to creating a term commitment for PTUs you cannot use.
+* The best practice is to create deployments first, and then to apply discounts. This guarantees that service capacity is available to support your deployments before you create a term agreement for PTUs you can't use.
 
 > [!NOTE]
 > When you follow best practices, you might receive hourly charges between the time you create the deployment and increase your discount (commitment or reservation).
@@ -155,12 +165,12 @@ Steps 1 and 2 are the same in all cases. The difference is whether a commitment
 
 ## Mapping deployments to discounting method
 
-Customers using Azure OpenAI Provisioned prior to August 2024 can use either or both payment models simultaneously within a subscription. The payment model used for each deployment is determined based on its Azure OpenAI resource:
+Customers using the Azure OpenAI Provisioned offering prior to August 2024 can use either or both payment models simultaneously within a subscription. The payment model used for each deployment is determined based on its Azure OpenAI resource:
 
 
 **Resource has an active Commitment**
 
-* The commitment discounts all deployments on the resource up to the number of PTUs on the commitment. Any excess PTUs will be billed hourly.
+* The commitment discounts all deployments on the resource up to the number of PTUs on the commitment. Any excess PTUs are billed hourly unless they are in the scope of an active reservation. Excess PTUs in the scope of an active reservation are discounted as a group, up to the number of PTUs on the reservation, and any PTUs still left over are billed hourly.
 
 **Resource does not have an active commitment**
 
@@ -169,9 +179,10 @@ Customers using Azure OpenAI Provisioned prior to August 2024 can use either or
 
 ### Changes to the existing payment mode
 
-Customers that have commitments today can continue to use them at least through the end of 2024. This includes purchasing new PTUs on new or existing commitments and managing commitment renewal behaviors. However, the August update has changed certain aspects of commitment operation.
+Customers that have commitments today can continue to use them at least until their supported models are retired. This includes purchasing new PTUs on new or existing commitments and managing commitment renewals. However, the August update has changed certain aspects of how commitments operate.
 
-- Only models released as provisioned prior to August 1, 2024 or before can be deployed on a resource with a commitment.
+- Starting August 1, 2024, Azure OpenAI no longer supports enrollment in new commitments.
+- Only a limited set of models can be deployed on a resource with a commitment. See the [list of supported models](./provisioned-migration.md#supported-models-on-commitment-payment-model).
 
 - If the deployed PTUs under a commitment exceed the committed PTUs, the hourly overage charges will be emitted against the same hourly meter as used for the new hourly/reservation payment model. This allows the overage charges to be discounted via an Azure Reservation.
 - It is possible to deploy more PTUs than are committed on the resource. This supports the ability to guarantee capacity availability prior to increasing the commitment size to cover it.
@@ -206,7 +217,7 @@ An alternative approach to self-service migration is to switch the reservation p
 * There will be a short period of double-billing or hourly charges during the switchover from committed to hourly/reservation billing.
 
 > [!IMPORTANT]
-> Both self-service approaches generate some additional charges as the payment mode is switched from Committed to Hourly/Reservation. These are characteristics of the migration approaches and customers aren't credited for these charges. Customers can choose to use the managed migration approach described below to avoid them.
+> The self-service approach generates additional charges as the payment mode is switched from Committed to Hourly/Reservation. These charges are characteristic of this migration approach, and customers aren't credited for them. Alternatively, customers can choose the managed migration approach described below to avoid additional charges.
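The discount order described in the changes above (a resource's commitment covers its own deployed PTUs first, excess PTUs in the scope of an active reservation are discounted as a group up to the reservation size, and anything left is billed hourly) can be sketched as a small allocation function. This is a minimal illustration, not Azure's billing implementation: the function, dataclass, and resource names are hypothetical, it assumes every resource falls inside one reservation's scope, and it counts PTUs for a single hour without prices or per-model rates.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BillingBreakdown:
    commitment_ptus: int   # PTUs discounted by a resource's own commitment
    reservation_ptus: int  # PTUs discounted by the (single, assumed) reservation
    hourly_ptus: int       # PTUs left over and charged at the hourly rate


def allocate_ptus(deployed_by_resource: Dict[str, int],
                  committed_by_resource: Dict[str, int],
                  reserved_ptus: int) -> BillingBreakdown:
    """Split deployed PTUs across discount types for one billing hour.

    Assumes every resource is inside the reservation's scope; a resource
    with no entry in committed_by_resource has no active commitment.
    """
    commitment_total = 0
    spillover = 0  # PTUs not covered by their own resource's commitment
    for resource, deployed in deployed_by_resource.items():
        # A commitment only discounts deployments on its own resource.
        covered = min(deployed, committed_by_resource.get(resource, 0))
        commitment_total += covered
        spillover += deployed - covered
    # The reservation discounts the pooled excess as a group, up to its size.
    reservation_total = min(spillover, reserved_ptus)
    return BillingBreakdown(
        commitment_ptus=commitment_total,
        reservation_ptus=reservation_total,
        hourly_ptus=spillover - reservation_total,
    )


if __name__ == "__main__":
    # Hypothetical resources: "res-a" has a 200-PTU commitment, "res-b" has none.
    print(allocate_ptus({"res-a": 300, "res-b": 100}, {"res-a": 200}, reserved_ptus=150))
```

With these inputs, 200 PTUs are covered by the commitment, 150 of the remaining 200 are covered by the reservation, and 50 are billed at the hourly rate. Pooling the spillover before applying the reservation mirrors the "discounted as a group" wording; a resource without a commitment simply contributes all of its PTUs to that pool.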
articles/ai-services/openai/concepts/provisioned-throughput.md
Lines changed: 6 additions & 2 deletions
@@ -21,9 +21,13 @@ The provisioned throughput capability allows you to specify the amount of throug
 ## What do the provisioned deployment types provide?
 
 -**Predictable performance:** stable max latency and throughput for uniform workloads.
--**Reserved processing capacity:** A deployment configures the amount of throughput. Once deployed, the throughput is available whether used or not.
+-**Allocated processing capacity:** A deployment configures the amount of throughput. Once deployed, the throughput is available whether used or not.
 -**Cost savings:** High throughput workloads might provide cost savings vs token-based consumption.
 
+> [!NOTE]
+> Customers can take advantage of additional cost savings on provisioned deployments when they buy [Microsoft Azure OpenAI Service reservations](/azure/cost-management-billing/reservations/azure-openai#buy-a-microsoft-azure-openai-service-reservation).
+
+
 An Azure OpenAI Deployment is a unit of management for a specific OpenAI Model. A deployment provides customer access to a model for inference and integrates more features like Content Moderation ([See content moderation documentation](content-filter.md)). Global provisioned deployments are available in the same Azure OpenAI resources as all other deployment types but allow you to leverage Azure's global infrastructure to dynamically route traffic to the data center with the best availability for each request. Similarly, data zone provisioned deployments are also available in the same resources as all other deployment types but allow you to leverage Azure's global infrastructure to dynamically route traffic to the data center within the Microsoft specified data zone with the best availability for each request.
 
 ## What do you get?
@@ -165,7 +169,7 @@ For provisioned deployments, we use a variation of the leaky bucket algorithm to
 
 a. When the current utilization is above 100%, the service returns a 429 code with the `retry-after-ms` header set to the time until utilization is below 100%
 
-b. Otherwise, the service estimates the incremental change to utilization required to serve the request by combining the prompt tokens, less any cacehd tokens, and the specified `max_tokens` in the call. A customer can receive up to a 100% discount on their prompt tokens depending on the size of their cached tokens. If the `max_tokens` parameter is not specified, the service estimates a value. This estimation can lead to lower concurrency than expected when the number of actual generated tokens is small. For highest concurrency, ensure that the `max_tokens` value is as close as possible to the true generation size.
+b. Otherwise, the service estimates the incremental change to utilization required to serve the request by combining the prompt tokens, less any cached tokens, and the specified `max_tokens` in the call. A customer can receive up to a 100% discount on their prompt tokens depending on the size of their cached tokens. If the `max_tokens` parameter is not specified, the service estimates a value. This estimation can lead to lower concurrency than expected when the number of actual generated tokens is small. For highest concurrency, ensure that the `max_tokens` value is as close as possible to the true generation size.
 
 1. When a request finishes, we now know the actual compute cost for the call. To ensure an accurate accounting, we correct the utilization using the following logic:
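The admission logic in steps a and b can be illustrated with a toy leaky-bucket model. This is a sketch under stated assumptions, not the service's implementation: utilization is tracked in tokens against a hypothetical `capacity_tokens`, the bucket drains at a hypothetical constant `leak_per_second`, `DEFAULT_MAX_TOKENS_ESTIMATE` is a made-up stand-in for the service's own estimate when `max_tokens` is omitted, and `complete` assumes the post-request correction simply adds or subtracts the difference between the actual and estimated cost.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical stand-in for the service's own estimate when max_tokens is omitted.
DEFAULT_MAX_TOKENS_ESTIMATE = 1000


@dataclass
class LeakyBucket:
    """Toy utilization tracker for one provisioned deployment."""
    capacity_tokens: float    # hypothetical: bucket level that equals 100% utilization
    leak_per_second: float    # hypothetical: tokens drained from the bucket per second
    level: float = 0.0        # tokens currently counted against the deployment
    last_update: float = 0.0  # time (seconds) of the last drain

    def _drain(self, now: float) -> None:
        # Remove whatever has leaked out since the previous call.
        elapsed = max(0.0, now - self.last_update)
        self.level = max(0.0, self.level - elapsed * self.leak_per_second)
        self.last_update = now

    def try_admit(self, prompt_tokens: int, cached_tokens: int,
                  max_tokens: Optional[int], now: float) -> Tuple[bool, int]:
        """Return (True, estimated_cost) if admitted, else (False, retry_after_ms)."""
        self._drain(now)
        if self.level > self.capacity_tokens:
            # Step a: above 100%, reject (HTTP 429) until the level drains back to 100%.
            excess = self.level - self.capacity_tokens
            return False, math.ceil(excess / self.leak_per_second * 1000)
        # Step b: prompt tokens less cached tokens, plus max_tokens (or an estimate).
        generation = max_tokens if max_tokens is not None else DEFAULT_MAX_TOKENS_ESTIMATE
        estimate = max(0, prompt_tokens - cached_tokens) + generation
        self.level += estimate
        return True, estimate

    def complete(self, estimated: int, actual: int, now: float) -> None:
        # Assumed correction: swap the up-front estimate for the actual cost.
        self._drain(now)
        self.level = max(0.0, self.level + (actual - estimated))
```

For example, with `LeakyBucket(capacity_tokens=10_000, leak_per_second=1_000)`, a request with 3,000 prompt tokens, 1,000 cached tokens, and `max_tokens=500` is admitted with an estimate of 2,500. A request can still be admitted when utilization is at or below 100% even if it pushes the level above 100%; only the next request is rejected, with a retry time derived from how long the excess takes to drain. Because `max_tokens` enters the estimate directly, setting it close to the real output length keeps the estimate small, which is why the text recommends it for higher concurrency.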