Commit 34fb310

Merge branch 'ambadal-pr-CommitmentsUpdates1' of https://github.com/AmarBadal/azure-ai-docs-pr into mrb_02_12_2025_pm_assist
2 parents 0eb9b0a + ac0ad6f commit 34fb310

2 files changed: +33 -18 lines changed

articles/ai-services/openai/concepts/provisioned-migration.md

Lines changed: 27 additions & 16 deletions
@@ -13,15 +13,14 @@ ms.author: aahi
1313
recommendations: false
1414
---
1515

16-
# Azure OpenAI provisioned August 2024 update
16+
# Azure OpenAI provisioned Managed offering updates
1717

18-
In mid-August, 2024, Microsoft launched improvements to its Provisioned Throughput offering that address customer feedback on usability and operational agility that open new payment options and deployment scenarios.
18+
Microsoft launched improvements to its Provisioned Throughput offering that address customer feedback on usability and operational agility, and that open new payment options and deployment scenarios.
1919

2020
This article is intended for existing users of the provisioned throughput offering. New customers should refer to the [Azure OpenAI provisioned onboarding guide](../how-to/provisioned-throughput-onboarding.md).
2121

2222
## What's changing?
2323

24-
The capabilities below are rolling out for the Provisioned Managed offering.
2524

2625
> [!IMPORTANT]
2726
> The changes in this article do not apply to the older *"Provisioned Classic (PTU-C)"* offering. They only affect the Provisioned (also known as the Provisioned Managed) offering.
@@ -39,11 +38,11 @@ The capabilities below are rolling out for the Provisioned Managed offering.
3938

4039
|Feature | Benefit|
4140
|---|---|
42-
|Hourly, uncommitted usage | Hourly payment option without a required commitment enables short-term deployment scenarios. |
41+
|Non-binding, Hourly option | Hourly payment option without any binding enables short-term deployment scenarios. Ideal for testing new models and assessing benefits of Provisioned Throughput. |
4342
|Term discounts via Azure Reservations | Azure reservations provide substantial discounts over the hourly rate for one month and one year terms, and provide flexible scopes that minimize the administration associated with today’s resource-bound commitments.|
4443
| Default provisioned-managed quota in many regions | Get started quickly in new regions without having to first request quota. |
45-
| Flexible choice of payment model for existing provisioned customers | Customers with commitments can stay on the commitment model at least through the end of 2024, and can choose to migrate existing commitments to hourly/reservations via a self-service or managed process. |
46-
| Supports latest model generations | The hourly/reservation model is required to deploy models released after August 1, 2024. |
44+
| Flexible choice of payment model for existing provisioned customers | Customers with commitments can stay on the commitment model until the end of life of the currently supported models, and can choose to migrate existing commitments to hourly/reservations via a managed process. We recommend migrating to hourly/reservations to take advantage of term discounts and to work with the latest models. |
45+
| Supports latest model generations | In the provisioned offering, the latest models are available only on hourly/reservations. |
4746

4847
## Usability improvement details
4948

@@ -87,7 +86,7 @@ See the following links for more information. The guidance for reservations and
8786
> [!NOTE]
8887
> The following description of payment models doesn't apply to the older "Provisioned Classic (PTU-C)" offering. They only affect the Provisioned (also known as Provisioned Managed) offering. Provisioned Classic continues to be governed by the unchanged monthly commitment payment model.
8988
90-
Microsoft has introduced a new "Hourly/reservation" payment model for provisioned deployments. This is in addition to the current **Commitment** payment model, which will continue to be supported at least through the end of 2024.
89+
Microsoft has introduced a new "Hourly/reservation" payment model for provisioned deployments. This is in addition to the current **Commitment** payment model, which will continue to be supported until the end of life of the limited set of models it currently supports. For that list, see [supported models on **Commitment payment model**](./provisioned-migration.md#supported-models-on-commitment-payment-model).
9190

9291
### Commitment payment model
9392

@@ -97,7 +96,18 @@ Microsoft has introduced a new "Hourly/reservation" payment model for provisione
9796

9897
- Commitments can't be canceled or altered during their term, except to add new PTUs.
9998

100-
- Supports models released prior to August 1, 2024.
99+
#### Supported models on Commitment payment model:
100+
Only the following Azure OpenAI models are supported in Commitments. To onboard any model that is not in the list below, or any newer model on the provisioned throughput offering, refer to the [Azure OpenAI provisioned onboarding guide](../how-to/provisioned-throughput-onboarding.md) and [Azure Reservations for Azure OpenAI provisioned deployments](../how-to/provisioned-throughput-onboarding.md#azure-reservations-for-azure-openai-provisioned-deployments).
101+
102+
|Supported models on Commitment plan |
103+
|-|
104+
|gpt-35-turbo|
105+
|gpt-4|
106+
|gpt-4-32k|
107+
|gpt-4o|
108+
109+
110+
101111

102112
### Hourly reservation payment model
103113

@@ -112,7 +122,7 @@ Microsoft has introduced a new "Hourly/reservation" payment model for provisione
112122
- Supports all models, both old and new.
113123

114124
> [!IMPORTANT]
115-
> **Models released after August 1, 2024 require the use of the Hourly/Reservation payment model.** They are not deployable on Azure OpenAI resources that have active commitments. To deploy models released after August 1, existing customers must either:
125+
> More recent models are available in the provisioned offering with the Hourly/Reservation payment model. Check the list [**here**](https://learn.microsoft.com/azure/ai-services/openai/concepts/models?tabs=provisioned%2Cstandard-chat-completions#global-standard-model-availability) for availability. Models that are not in the above [**list**](./provisioned-migration.md#supported-models-on-commitment-payment-model) are not deployable on Azure OpenAI resources that have active commitments. To deploy newer models, you must either:
116126
> - Create deployments on Azure OpenAI resources without commitments.
117127
> - Migrate an existing resource off its commitments.
118128
@@ -142,11 +152,11 @@ Steps 1 and 2 are the same in all cases. The difference is whether a commitment
142152
|Discount type |Available Scopes (within a region) |
143153
|---------|---------|
144154
|Commitment | Azure OpenAI resource |
145-
|Row2 | Resource group, single subscription, management group (group of subscriptions), shared (all subscriptions in a billing account) |
155+
|Reservation | Resource group, single subscription, management group (group of subscriptions), shared (all subscriptions in a billing account) |
146156

147157
* The discounted price is applied to deployed PTUs up to the number of discounted PTUs in the discount.
148158
* The number of deployed PTUs exceeding the discounted PTUs (or not covered by any discount) are charged the hourly rate.
149-
* The best practice is to create deployments first, and then to apply discounts. This is to guarantee that service. capacity is available to support your deployments prior to creating a term commitment for PTUs you cannot use.
159+
* The best practice is to create deployments first, and then to apply discounts. This is to guarantee that service capacity is available to support your deployments prior to creating a term agreement for PTUs you cannot use.
150160

151161
> [!NOTE]
152162
> When you follow best practices, you might receive hourly charges between the time you create the deployment and increase your discount (commitment or reservation).
@@ -155,12 +165,12 @@ Steps 1 and 2 are the same in all cases. The difference is whether a commitment
155165
156166
## Mapping deployments to discounting method
157167

158-
Customers using Azure OpenAI Provisioned prior to August 2024 can use either or both payment models simultaneously within a subscription. The payment model used for each deployment is determined based on its Azure OpenAI resource:
168+
Customers using the Azure OpenAI Provisioned offering prior to August 2024 can use either or both payment models simultaneously within a subscription. The payment model used for each deployment is determined based on its Azure OpenAI resource:
159169

160170

161171
**Resource has an active Commitment**
162172

163-
* The commitment discounts all deployments on the resource up to the number of PTUs on the commitment. Any excess PTUs will be billed hourly.
173+
* The commitment discounts all deployments on the resource up to the number of PTUs on the commitment. Any excess PTUs are billed hourly unless they fall within the scope of an active reservation. Excess PTUs within the scope of an active reservation are discounted as a group up to the number of PTUs on the reservation, and any PTUs still left over are billed hourly.
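
The order in which discounts apply to a resource with an active commitment can be sketched as a simple waterfall. This is a minimal illustration, not the billing system's implementation: the function name `split_ptus` and its parameters are hypothetical, and it treats the reservation as covering only this one resource, whereas a real reservation scope can span several resources or subscriptions.

```python
def split_ptus(deployed: int, committed: int, reserved: int) -> tuple[int, int, int]:
    """Split deployed PTUs into (commitment-covered, reservation-covered, hourly).

    Hypothetical helper for illustration only. Assumes the reservation
    applies solely to this resource's excess PTUs.
    """
    if min(deployed, committed, reserved) < 0:
        raise ValueError("PTU counts must be non-negative")

    # The commitment discounts deployments first, up to its PTU count.
    by_commitment = min(deployed, committed)
    excess = deployed - by_commitment

    # Excess PTUs in scope of a reservation are discounted next.
    by_reservation = min(excess, reserved)

    # Anything still left over is billed at the hourly rate.
    hourly = excess - by_reservation
    return by_commitment, by_reservation, hourly


if __name__ == "__main__":
    # 300 PTUs deployed, 200 on the commitment, 50 on a reservation.
    print(split_ptus(300, 200, 50))
```

With 300 deployed PTUs, a 200 PTU commitment, and a 50 PTU reservation, this sketch attributes 200 PTUs to the commitment, 50 to the reservation, and 50 to hourly billing.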
164174

165175
**Resource does not have an active commitment**
166176

@@ -169,9 +179,10 @@ Customers using Azure OpenAI Provisioned prior to August 2024 can use either or
169179

170180
### Changes to the existing payment mode
171181

172-
Customers that have commitments today can continue to use them at least through the end of 2024. This includes purchasing new PTUs on new or existing commitments and managing commitment renewal behaviors. However, the August update has changed certain aspects of commitment operation.
182+
Customers that have commitments today can continue to use them until the supported models are retired. This includes purchasing new PTUs on new or existing commitments and managing commitment renewals. However, the August update has changed certain aspects of how commitments operate.
173183

174-
- Only models released as provisioned prior to August 1, 2024 or before can be deployed on a resource with a commitment.
184+
- Azure OpenAI stopped supporting enrollment in new commitments on August 1, 2024.
185+
- Only a limited set of models can be deployed on a resource with a commitment. See the [list of models](./provisioned-migration.md#supported-models-on-commitment-payment-model).
175186

176187
- If the deployed PTUs under a commitment exceed the committed PTUs, the hourly overage charges will be emitted against the same hourly meter as used for the new hourly/reservation payment model. This allows the overage charges to be discounted via an Azure Reservation.
177188
- It is possible to deploy more PTUs than are committed on the resource. This supports the ability to guarantee capacity availability prior to increasing the commitment size to cover it.
@@ -206,7 +217,7 @@ An alternative approach to self-service migration is to switch the reservation p
206217
* There will be a short period of double-billing or hourly charges during the switchover from committed to hourly/reservation billing.
207218

208219
> [!IMPORTANT]
209-
> Both self-service approaches generate some additional charges as the payment mode is switched from Committed to Hourly/Reservation. These are characteristics of the migration approaches and customers aren't credited for these charges. Customers can choose to use the managed migration approach described below to avoid them.
220+
> The self-service approach generates additional charges as the payment mode is switched from Committed to Hourly/Reservation. These charges are a characteristic of this migration approach, and customers aren't credited for them. Alternatively, customers can choose the managed migration approach described below to avoid additional charges.
210221
211222
### Managed migration
212223

articles/ai-services/openai/concepts/provisioned-throughput.md

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -21,9 +21,13 @@ The provisioned throughput capability allows you to specify the amount of throug
2121
## What do the provisioned deployment types provide?
2222

2323
- **Predictable performance:** stable max latency and throughput for uniform workloads.
24-
- **Reserved processing capacity:** A deployment configures the amount of throughput. Once deployed, the throughput is available whether used or not.
24+
- **Allocated processing capacity:** A deployment configures the amount of throughput. Once deployed, the throughput is available whether used or not.
2525
- **Cost savings:** High throughput workloads might provide cost savings vs token-based consumption.
2626

27+
> [!NOTE]
28+
> Customers can take advantage of additional cost savings on provisioned deployments when they buy [Microsoft Azure OpenAI Service reservations](/azure/cost-management-billing/reservations/azure-openai#buy-a-microsoft-azure-openai-service-reservation).
29+
30+
2731
An Azure OpenAI Deployment is a unit of management for a specific OpenAI Model. A deployment provides customer access to a model for inference and integrates more features like Content Moderation ([See content moderation documentation](content-filter.md)). Global provisioned deployments are available in the same Azure OpenAI resources as all other deployment types but allow you to leverage Azure's global infrastructure to dynamically route traffic to the data center with the best availability for each request. Similarly, data zone provisioned deployments are also available in the same resources as all other deployment types but allow you to leverage Azure's global infrastructure to dynamically route traffic to the data center within the Microsoft specified data zone with the best availability for each request.
2832

2933
## What do you get?
@@ -165,7 +169,7 @@ For provisioned deployments, we use a variation of the leaky bucket algorithm to
165169

166170
a. When the current utilization is above 100%, the service returns a 429 code with the `retry-after-ms` header set to the time until utilization is below 100%
167171

168-
b. Otherwise, the service estimates the incremental change to utilization required to serve the request by combining the prompt tokens, less any cacehd tokens, and the specified `max_tokens` in the call. A customer can receive up to a 100% discount on their prompt tokens depending on the size of their cached tokens. If the `max_tokens` parameter is not specified, the service estimates a value. This estimation can lead to lower concurrency than expected when the number of actual generated tokens is small. For highest concurrency, ensure that the `max_tokens` value is as close as possible to the true generation size.
172+
b. Otherwise, the service estimates the incremental change to utilization required to serve the request by combining the prompt tokens, less any cached tokens, and the specified `max_tokens` in the call. A customer can receive up to a 100% discount on their prompt tokens depending on the size of their cached tokens. If the `max_tokens` parameter is not specified, the service estimates a value. This estimation can lead to lower concurrency than expected when the number of actual generated tokens is small. For highest concurrency, ensure that the `max_tokens` value is as close as possible to the true generation size.
169173

170174
1. When a request finishes, we now know the actual compute cost for the call. To ensure an accurate accounting, we correct the utilization using the following logic:
171175
