`articles/ai-services/openai/concepts/provisioned-throughput.md` (7 additions, 7 deletions)
```diff
@@ -13,7 +13,7 @@ recommendations: false
 # What is provisioned throughput?
 
 > [!NOTE]
-> The Azure OpenAI Provisioned offering received significant updates on August 12, 2024, including aligning the purchase model with Azure standards and moving to model-independent quota. It is highly recommneded that customers onboarded before this date read the Azure [OpenAI provisioned august update](../how-to/provisioned-throughput-onboarding.md) to learn more about these changes.
+> The Azure OpenAI Provisioned offering received significant updates on August 12, 2024, including aligning the purchase model with Azure standards and moving to model-independent quota. It is highly recommended that customers onboarded before this date read the [Azure OpenAI provisioned August update](../provisioned-migration.md) to learn more about these changes.
 
 The provisioned throughput capability allows you to specify the amount of throughput you require in a deployment. The service then allocates the necessary model processing capacity and ensures it's ready for you. Throughput is defined in terms of provisioned throughput units (PTU), which are a normalized way of representing the throughput for your deployment. Each model-version pair requires a different amount of PTU to deploy and provides a different amount of throughput per PTU.
```
```diff
@@ -67,7 +67,7 @@ az cognitiveservices account deployment create \
 
 #### Provisioned throughput units
 
-Provisioned throughput units (PTU) are generic units of model processing capacity that you can use to size provisioned deployments to achieve the required throughput and deploy for processing prompts and generating completions. Provisioned throughput units are granted to a subscription as quota on a regional basis, which defines the maximum number of PTUs that can be assigned to deployments in that subscription and region.
+Provisioned throughput units (PTU) are generic units of model processing capacity that you can use to size provisioned deployments to achieve the required throughput for processing prompts and generating completions. Provisioned throughput units are granted to a subscription as quota on a regional basis, which defines the maximum number of PTUs that can be assigned to deployments in that subscription and region.
 
 
 #### Model independent quota
```
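The model-independent quota behavior described above (PTU quota granted per region and shared across all deployments in that subscription and region) amounts to a simple sum check. A minimal sketch with hypothetical figures; real quota and deployment sizes come from your subscription:

```python
def can_deploy(requested_ptu: int, existing_deployments: list[int], regional_quota: int) -> bool:
    """Quota is model-independent: only the PTU total per region matters,
    regardless of which models the deployments use."""
    return sum(existing_deployments) + requested_ptu <= regional_quota

# Hypothetical region with a 500-PTU quota and two existing deployments.
print(can_deploy(200, [100, 150], 500))  # True: 100 + 150 + 200 <= 500
print(can_deploy(300, [100, 150], 500))  # False: would exceed the regional quota
```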
```diff
@@ -80,17 +80,17 @@ The new quota shows up in Azure OpenAI Studio as a quota item named **Provisione
 
 :::image type="content" source="../media/provisioned/ptu-quota-page.png" alt-text="Screenshot of quota UI for Azure OpenAI provisioned." lightbox="../media/provisioned/ptu-quota-page.png":::
 
-## Obtaining PTU Quota
+#### Obtaining PTU Quota
 
-Like with other offerings, PTU quota is available by default in many regions. If additional quota is required, customers can request additional quota via the Request Quota link to the right of the Provisioned Managed Throughput Unit quota item in Azure OpenAI Studio.
+PTU quota is available by default in many regions. If additional quota is required, customers can request it via the Request Quota link to the right of the Provisioned Managed Throughput Unit quota item in Azure OpenAI Studio.
 
 The form allows the customer to request an increase in PTU quota for a specified region. The customer receives an email at the included address once the request is approved, typically within 2 business days.
 
-## Per-Model PTU Minimums
+#### Per-Model PTU Minimums
 
 The minimum PTU deployment, increments, and processing capacity associated with each unit vary by model type & version.
 
-## Capacity transparency and quota definitions
+## Capacity transparency
 
 Azure OpenAI is a highly sought-after service where customer demand may exceed service GPU capacity. Microsoft strives to provide capacity for all in-demand regions and models, but selling out a region is always a possibility. This can limit some customers' ability to create a deployment of their desired model, version, or number of PTUs in a desired region, even if they have quota available in that region. Generally speaking:
 
@@ -99,7 +99,7 @@ Azure OpenAI is a highly sought-after service where customer demand may exceed s
 - Customers use real-time information on quota/capacity availability to choose an appropriate region for their scenario with the necessary model capacity
 - Scaling down or deleting a deployment releases capacity back to the region. There is no guarantee that the capacity will be available should the deployment be scaled up or re-created later.
 
-## Regional capacity transparency
+#### Regional capacity guidance
 
 To help users find the capacity needed for their deployments, customers can use a new API and Studio experience that provide real-time information on capacity availability.
```
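The per-model PTU minimums and increments mentioned in the hunk above can be expressed as a small validation rule. A sketch with hypothetical numbers; the actual minimum and increment for each model and version are published in the documentation:

```python
def valid_deployment_size(ptu: int, minimum: int, increment: int) -> bool:
    """A provisioned deployment must meet the model's minimum PTU size
    and grow only in fixed increments above that minimum."""
    return ptu >= minimum and (ptu - minimum) % increment == 0

# Hypothetical model with a 50-PTU minimum and 50-PTU increments:
print(valid_deployment_size(100, 50, 50))  # True: 50 + one 50-PTU increment
print(valid_deployment_size(75, 50, 50))   # False: not on a 50-PTU boundary
print(valid_deployment_size(25, 50, 50))   # False: below the minimum
```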
`articles/ai-services/openai/how-to/provisioned-throughput-onboarding.md` (2 additions, 2 deletions)
```diff
@@ -55,7 +55,7 @@ The values in the output column are the estimated value of PTU units required fo
 :::image type="content" source="../media/how-to/provisioned-onboarding/capacity-calculator.png" alt-text="Screenshot of the Azure OpenAI Studio landing page." lightbox="../media/how-to/provisioned-onboarding/capacity-calculator.png":::
 
 > [!NOTE]
-> The capacity planner is an estimate based on simple input criteria. The most accurate way to determine your capacity is to benchmark a deployment with a representational workload for your use case.
+> The capacity calculator provides an estimate based on simple input criteria. The most accurate way to determine your capacity is to benchmark a deployment with a representational workload for your use case.
 
 ## Understanding the Provisioned Throughput Purchase Model
```
```diff
@@ -64,7 +64,7 @@ Azure OpenAI Provisioned is purchased on-demand at an hourly basis based on the
 The hourly model is useful for short-term deployment needs, such as validating new models or acquiring capacity for a hackathon. However, the discounts provided by the Azure Reservation for Azure OpenAI Provisioned are considerable and most customers with consistent long-term usage will find a reserved model to be a better value proposition.
 
 > [!NOTE]
-> Azure OpenAI Provisioned customers onboarded prior to August 12, 2024 use a purchase model called the Commitment model. These customers may continue to use this older purchase model alongside the current Hourly/reservation purchase model. For details on the older purchase model and options for coexistence and migration, please see the [Azure OpenAI Provisioned August Update](../provisioned-migration.md).
+> Azure OpenAI Provisioned customers onboarded prior to the August self-service update use a purchase model called the Commitment model. These customers may continue to use this older purchase model alongside the Hourly/reservation purchase model. The Commitment model is not available for new customers. For details on the Commitment purchase model and options for coexistence and migration, please see the [Azure OpenAI Provisioned August Update](../provisioned-migration.md).
```
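The hourly-versus-reservation trade-off discussed in the hunk above reduces to a break-even comparison. A sketch using placeholder rates (these are illustrative numbers, not actual Azure pricing; consult the pricing page for real rates):

```python
def reservation_is_cheaper(ptu: int, hours_used: float,
                           hourly_rate: float,
                           monthly_reservation_rate: float) -> bool:
    """Compare one month of on-demand hourly charges for a deployment
    against a monthly reservation covering the same number of PTUs."""
    hourly_cost = ptu * hours_used * hourly_rate
    reserved_cost = ptu * monthly_reservation_rate
    return reserved_cost < hourly_cost

# Placeholder rates: $1.00/PTU-hour on demand vs. $260/PTU-month reserved.
# Running 24x7 (~730 hours/month) favors the reservation:
print(reservation_is_cheaper(100, hours_used=730, hourly_rate=1.00,
                             monthly_reservation_rate=260.0))  # True
# A short hackathon-style burst favors the hourly model:
print(reservation_is_cheaper(100, hours_used=100, hourly_rate=1.00,
                             monthly_reservation_rate=260.0))  # False
```

This is why the note steers consistent long-term workloads toward reservations while keeping hourly for short-lived needs such as model validation.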