articles/ai-services/openai/concepts/provisioned-throughput.md (4 additions, 20 deletions)
@@ -3,18 +3,17 @@ title: Azure OpenAI Service provisioned throughput
 description: Learn about provisioned throughput and Azure OpenAI.
 ms.service: azure-ai-openai
 ms.topic: conceptual
-ms.date: 08/07/2024
+ms.date: 03/31/2025
 manager: nitinme
-author: mrbullwinkle #ChrisHMSFT
-ms.author: mbullwin #chrhoder
+author: aahill #ChrisHMSFT
+ms.author: aahi #chrhoder
 recommendations: false
 ---
 
 # What is provisioned throughput?
 
 > [!NOTE]
-> The Azure OpenAI Provisioned offering received significant updates on August 12, 2024, including aligning the purchase model with Azure standards and moving to model-independent quota. It is highly recommended that customers onboarded before this date read the Azure [OpenAI provisioned August update](./provisioned-migration.md) to learn more about these changes.
-
+> If you're looking for what's recently changed with the provisioned throughput offering, see the [update article](./provisioned-migration.md) for more information.
 
 The provisioned throughput offering is a model deployment type that allows you to specify the amount of throughput you require in a model deployment. The Azure OpenAI service then allocates the necessary model processing capacity and ensures it's ready for you. Provisioned throughput provides:
 
@@ -36,21 +35,6 @@ An Azure OpenAI deployment is a unit of management for a specific OpenAI Model.
 
 You should consider switching from standard deployments to provisioned managed deployments when you have well-defined, predictable throughput and latency requirements. Typically, this occurs when the application is ready for production or has already been deployed in production and there's an understanding of the expected traffic. This allows users to accurately forecast the required capacity and avoid unexpected billing. Provisioned managed deployments are also useful for applications that have real-time/latency sensitive requirements.
 
-<!--
-## What do you get?
-
-
-| Topic | Description |
-|---|---|
-| What is it? |Provides guaranteed throughput at smaller increments than the existing provisioned offer. Deployments have a consistent max latency for a given model-version. |
-| Who is it for? | Customers who want guaranteed throughput with minimal latency variance. |
-| Quota |Provisioned Managed Throughput Unit, Global Provisioned Managed Throughput Unit, or Data Zone Provisioned Managed Throughput Unit assigned per region. Quota can be used across any available Azure OpenAI model.|
-| Latency | Max latency constrained from the model. Overall latency is a factor of call shape. |
@@ -390,7 +390,7 @@ Global Standard fine-tuned deployments currently support structured outputs only
 - `gpt-4o-mini-2024-07-18`
 - `gpt-4o-2024-08-06`
 
-[Provisioned managed](./deployment-types.md#provisioned) fine-tuned deployments offer [predictable performance](../concepts/provisioned-throughput.md#what-do-the-provisioned-deployment-types-provide) for fine-tuned deployments. As part of public preview, provisioned managed deployments may be created regionally via the data-plane [REST API](../reference.md#data-plane-inference) version `2024-10-01` or newer. See below for examples.
+[Provisioned managed](./deployment-types.md#provisioned) fine-tuned deployments offer [predictable performance](../concepts/provisioned-throughput.md) for fine-tuned deployments. As part of public preview, provisioned managed deployments may be created regionally via the data-plane [REST API](../reference.md#data-plane-inference) version `2024-10-01` or newer. See below for examples.
 
 Provisioned Managed fine-tuned deployments currently support structured outputs only on GPT-4o.
 
@@ -424,7 +424,7 @@ curl -X PUT "https://management.azure.com/subscriptions/<SUBSCRIPTION>/resourceG
 
 #### Scaling a fine-tuned model on Provisioned Managed
 
-To scale a fine-tuned provision managed deployment to increase or decrease PTU capacity, perform the same `PUT` REST API call as you did when [creating the deployment](#creating-a-provisioned-managed-deployment) and provide an updated `capacity` value for the `sku`. Keep in mind, provisioned deployments must scale in [minimum increments](../concepts/provisioned-throughput.md#how-much-throughput-per-ptu-you-get-for-each-model).
+To scale a fine-tuned provision managed deployment to increase or decrease PTU capacity, perform the same `PUT` REST API call as you did when [creating the deployment](#creating-a-provisioned-managed-deployment) and provide an updated `capacity` value for the `sku`. Keep in mind, provisioned deployments must scale in [minimum increments](../how-to/provisioned-throughput-onboarding.md#how-much-throughput-per-ptu-you-get-for-each-model).
 
 For example, to scale the model deployed in the previous section from 25 to 40 PTU, make another `PUT` call and increase the capacity:
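
As a rough sketch of what that scaling call could look like (not taken from this diff: every `<PLACEHOLDER>`, the fine-tuned model value, and the `api-version` are assumptions), the request keeps the same shape as the creation call and only raises `sku.capacity`:

```bash
# Hypothetical example: scale an existing provisioned managed fine-tuned
# deployment from 25 to 40 PTU. Replace every <PLACEHOLDER>; the api-version
# and model values below are illustrative assumptions, not values from the doc.
curl -X PUT "https://management.azure.com/subscriptions/<SUBSCRIPTION>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.CognitiveServices/accounts/<ACCOUNT>/deployments/<DEPLOYMENT_NAME>?api-version=2024-10-01" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(az account get-access-token --query accessToken --output tsv)" \
  -d '{
    "sku": { "name": "ProvisionedManaged", "capacity": 40 },
    "properties": {
      "model": {
        "format": "OpenAI",
        "name": "<FINE_TUNED_MODEL_NAME>",
        "version": "1"
      }
    }
  }'
```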
articles/ai-services/openai/how-to/provisioned-throughput-onboarding.md (10 additions, 10 deletions)
@@ -1,12 +1,12 @@
 ---
-title: Azure OpenAI Service Provisioned Throughput Units (PTU) onboarding
-description: Learn about provisioned throughput units onboarding and Azure OpenAI.
+title: Understanding costs associated with provisioned throughput units (PTU)
+description: Learn about provisioned throughput costs and billing in Azure OpenAI.
 ms.service: azure-ai-openai
 ms.topic: conceptual
-ms.date: 10/18/2024
+ms.date: 03/31/2025
 manager: nitinme
-author: mrbullwinkle
-ms.author: mbullwin
+author: aahill
+ms.author: aahi
 recommendations: false
 ---
 
@@ -21,14 +21,14 @@ Use this article to learn about calculating and understanding costs associated w
 
 Provisioned throughput units (PTUs) are generic units of model processing capacity that you can use to size provisioned deployments to achieve the required throughput for processing prompts and generating completions. Provisioned throughput units are granted to a subscription as quota. Each quota is specific to a region and defines the maximum number of PTUs that can be assigned to deployments in that subscription and region.
 
-## Understanding the provisioned throughput purchase model
+## Understanding provisioned throughput billing
 
-Azure OpenAI [Provisioned](../how-to/deployment-types.md#provisioned), [Data Zone Provisioned](../how-to/deployment-types.md#data-zone-provisioned), and [Global Provisioned](../how-to/deployment-types.md#global-provisioned) are purchased on-demand at an hourly basis based on the number of deployed PTUs, with substantial term discount available via the purchase of [Azure Reservations](#azure-reservations-for-azure-openai-provisioned-deployments).
+Azure OpenAI [Provisioned](../how-to/deployment-types.md#provisioned), [Data Zone Provisioned](../how-to/deployment-types.md#data-zone-provisioned) (also known as regional), and [Global Provisioned](../how-to/deployment-types.md#global-provisioned) are purchased on-demand at an hourly basis based on the number of deployed PTUs, with substantial term discount available via the purchase of [Azure Reservations](#azure-reservations-for-azure-openai-provisioned-deployments).
 
 The hourly model is useful for short-term deployment needs, such as validating new models or acquiring capacity for a hackathon. However, the discounts provided by the Azure Reservation for Azure OpenAI Provisioned, Data Zone Provisioned, and Global Provisioned are considerable and most customers with consistent long-term usage will find a reserved model to be a better value proposition.
 
 > [!NOTE]
-> Azure OpenAI Provisioned customers onboarded prior to the August self-service update use a purchase model called the Commitment model. These customers can continue to use this older purchase model alongside the Hourly/reservation purchase model. The Commitment model is not available for new customers or new models introduced after August 2024. For details on the Commitment purchase model and options for coexistence and migration, please see the [Azure OpenAI Provisioned August Update](../concepts/provisioned-migration.md).
+> Azure OpenAI Provisioned customers onboarded prior to the August self-service update use a purchase model called the Commitment model. These customers can continue to use this older purchase model alongside the Hourly/reservation purchase model. The Commitment model is not available for new customers or [certain new models](../concepts/provisioned-migration.md#supported-models-on-commitment-payment-model) introduced after August 2024. For details on the Commitment purchase model and options for coexistence and migration, please see the [Azure OpenAI Provisioned August Update](../concepts/provisioned-migration.md).
 
 
 ## Model independent quota
@@ -49,7 +49,7 @@ Quota for provisioned deployments shows up in Azure AI Foundry as the following
 
 
 > [!NOTE]
-> Global provisioned and data zone provisioned deployments are only supported for gpt-4o and gpt-4o-mini models at this time. For more information on model availability, review the [models documentation](./models.md).
+> Global provisioned and data zone provisioned deployments are only supported for gpt-4o and gpt-4o-mini models at this time. For more information on model availability, review the [models documentation](../concepts/models.md).
 
 ## Hourly usage
 
@@ -108,7 +108,7 @@ PTU quota is available by default in many regions. If more quota is required, cu
 
 ### Per-Model PTU minimums
 
-The minimum PTU deployment, increments, and processing capacity associated with each unit varies by model type & version.
+The minimum PTU deployment, increments, and processing capacity associated with each unit varies by model type & version. See the above [table](#how-much-throughput-per-ptu-you-get-for-each-model) for more information.
@@ -289,7 +289,7 @@ Provisioned deployments support distinct model management practices. Provisioned
 ### Prerequisites
 - Validate that the target model version or model family is supported for your existing deployment type. Migrations can only occur between provisioned deployments of the same deployment type. For more information on deployment types, review the [deployment type documentation](./deployment-types.md).
 - Validate capacity availability for your target model version or model family prior to attempting a migration. For more information on determining capacity availability, review the [capacity transparency documentation](../concepts/provisioned-throughput.md#capacity-transparency).
-- For multi-deployment migrations, validate that you have sufficient quota to support multiple deployments simultaneously. For more information on how to validate quota for each provisioned deployment type, review the [provisioned quota documentation](../concepts/provisioned-throughput.md#quota).
+- For multi-deployment migrations, validate that you have sufficient quota to support multiple deployments simultaneously. For more information on how to validate quota for each provisioned deployment type, review the [provisioned throughput cost documentation](../how-to/provisioned-throughput-onboarding.md).
 
 ### In-place migrations for provisioned deployments
 In-place migrations allow you to maintain the same provisioned deployment name and size while changing the model version or model family assigned to that deployment. With in-place migrations, Azure OpenAI Service takes care of migrating any existing traffic between model versions or model families throughout the migration over a 20-30 minute window. Throughout the migration window, your provisioned deployment will display an "updating" provisioned state. You can continue to use your provisioned deployment as you normally would. Once the in-place migration is complete, the provisioned state will be updated to "succeeded", indicating that all traffic has been migrated over to the target model version or model family.
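
To make the in-place migration flow above concrete, one way to watch it progress is to read the deployment resource back while the migration window is open. The sketch below is an assumption-laden illustration (placeholder names and `api-version`), not a procedure defined in these articles:

```bash
# Hypothetical check on migration progress: read the deployment and inspect
# its provisioning state. Replace every <PLACEHOLDER>; the api-version is an
# illustrative assumption.
curl -s "https://management.azure.com/subscriptions/<SUBSCRIPTION>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.CognitiveServices/accounts/<ACCOUNT>/deployments/<DEPLOYMENT_NAME>?api-version=2024-10-01" \
  -H "Authorization: Bearer $(az account get-access-token --query accessToken --output tsv)"
# In the JSON response, "provisioningState" is expected to read "Updating"
# during the 20-30 minute migration window and "Succeeded" once all traffic
# has moved to the target model version or model family.
```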