
Commit f78c132

aahilldhuntley1023 authored
Apply suggestions from code review
Co-authored-by: David Huntley <[email protected]>
1 parent 00771b6 commit f78c132

File tree

2 files changed (+9, -9 lines changed)


articles/ai-services/openai/concepts/provisioned-throughput.md

Lines changed: 7 additions & 7 deletions
```diff
@@ -13,7 +13,7 @@ recommendations: false
 # What is provisioned throughput?
 
 > [!NOTE]
-> The Azure OpenAI Provisioned offering received significant updates on August 12, 2024, including aligning the purchase model with Azure standards and moving to model-independent quota. It is highly recommneded that customers onboarded before this date read the Azure [OpenAI provisioned august update](../how-to/provisioned-throughput-onboarding.md) to learn more about these changes.
+> The Azure OpenAI Provisioned offering received significant updates on August 12, 2024, including aligning the purchase model with Azure standards and moving to model-independent quota. It is highly recommended that customers onboarded before this date read the Azure [OpenAI provisioned August update](../provisioned-migration.md) to learn more about these changes.
 
 The provisioned throughput capability allows you to specify the amount of throughput you require in a deployment. The service then allocates the necessary model processing capacity and ensures it's ready for you. Throughput is defined in terms of provisioned throughput units (PTU) which is a normalized way of representing the throughput for your deployment. Each model-version pair requires different amounts of PTU to deploy and provide different amounts of throughput per PTU.
 
```
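The paragraph above defines PTUs as a normalized capacity unit whose per-PTU throughput differs by model-version pair. As a rough illustrative sketch of that relationship (the model names and tokens-per-minute figures below are hypothetical, not published service values):

```python
# Hypothetical per-PTU throughput figures, for illustration only; the real
# values vary by model-version pair and are published by the service.
THROUGHPUT_PER_PTU = {
    ("gpt-4o", "2024-05-13"): 2500,  # tokens/minute per PTU (made-up number)
    ("gpt-4", "0613"): 800,          # tokens/minute per PTU (made-up number)
}

def deployment_throughput(model: str, version: str, ptus: int) -> int:
    """Total tokens/minute a deployment of `ptus` units could sustain."""
    return THROUGHPUT_PER_PTU[(model, version)] * ptus

print(deployment_throughput("gpt-4o", "2024-05-13", 100))  # → 250000
```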

```diff
@@ -67,7 +67,7 @@ az cognitiveservices account deployment create \
 
 #### Provisioned throughput units
 
-Provisioned throughput units (PTU) are generic units of model processing capacity that you can use to size provisioned deployments to achieve the required throughput and deploy for processing prompts and generating completions. Provisioned throughput units are granted to a subscription as quota on a regional basis, which defines the maximum number of PTUs that can be assigned to deployments in that subscription and region.
+Provisioned throughput units (PTU) are generic units of model processing capacity that you can use to size provisioned deployments to achieve the required throughput for processing prompts and generating completions. Provisioned throughput units are granted to a subscription as quota on a regional basis, which defines the maximum number of PTUs that can be assigned to deployments in that subscription and region.
 
 
 #### Model independent quota
```
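Because PTU quota caps the total PTUs assignable across all deployments in a subscription and region, a sizing check amounts to comparing a requested deployment against the remaining regional pool. A minimal sketch of that bookkeeping, with hypothetical numbers:

```python
def remaining_quota(regional_quota: int, deployed_ptus: list[int]) -> int:
    """PTUs still assignable to new deployments in a subscription/region."""
    return regional_quota - sum(deployed_ptus)

def fits_in_quota(regional_quota: int, deployed_ptus: list[int], requested: int) -> bool:
    """Whether a new deployment of `requested` PTUs stays within regional quota."""
    return requested <= remaining_quota(regional_quota, deployed_ptus)

# Hypothetical subscription with 300 PTUs of regional quota, 150 already deployed
print(fits_in_quota(300, [100, 50], 150))  # → True (exactly fills the pool)
print(fits_in_quota(300, [100, 50], 200))  # → False (exceeds remaining quota)
```

Note that, as the diff below describes, fitting within quota does not guarantee the region has physical capacity available at deployment time.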
```diff
@@ -80,17 +80,17 @@ The new quota shows up in Azure OpenAI Studio as a quota item named **Provisione
 
 :::image type="content" source="../media/provisioned/ptu-quota-page.png" alt-text="Screenshot of quota UI for Azure OpenAI provisioned." lightbox="../media/provisioned/ptu-quota-page.png":::
 
-## Obtaining PTU Quota
+#### Obtaining PTU Quota
 
-Like with other offerings, PTU quota is available by default in many regions. If additional quota is required, customers can request additional quota via the Request Quota link to the right of the Provisioned Managed Throughput Unit quota item in Azure OpenAI Studio.
+PTU quota is available by default in many regions. If additional quota is required, customers can request additional quota via the Request Quota link to the right of the Provisioned Managed Throughput Unit quota item in Azure OpenAI Studio.
 
 The form will allow the customer to request an increase in PTU quota for a specified region. The customer will receive an email at the included address once the request is approved, typically within 2 business days.
 
-## Per-Model PTU Minimums
+#### Per-Model PTU Minimums
 
 The minimum PTU deployment, increments, and processing capacity associated with each unit varies by model type & version.
 
-## Capacity transparency and quota definitions
+## Capacity transparency
 
 Azure OpenAI is a highly sought-after service where customer demand may exceed service GPU capacity. Microsoft strives to provide capacity for all in-demand regions and models, but selling out a region is always a possibility. This can limit some customers’ ability to create a deployment of their desired model, version, or number of PTUs in a desired region -- even if they have quota available in that region. Generally speaking:
 
```
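Since the minimum deployment size and purchase increment vary per model, a raw PTU estimate has to be rounded up to a valid size. A small sketch of that rounding, assuming a hypothetical model with a 50-PTU minimum sold in 50-PTU increments:

```python
import math

def valid_deployment_size(raw_ptus: float, minimum: int, increment: int) -> int:
    """Round a raw PTU estimate up to the model's minimum and increment."""
    size = max(raw_ptus, minimum)
    steps = math.ceil((size - minimum) / increment)
    return minimum + steps * increment

# A 130-PTU estimate rounds up to the next 50-PTU step above the minimum
print(valid_deployment_size(130, minimum=50, increment=50))  # → 150
# An estimate below the minimum is raised to the minimum
print(valid_deployment_size(30, minimum=50, increment=50))   # → 50
```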

```diff
@@ -99,7 +99,7 @@ Azure OpenAI is a highly sought-after service where customer demand may exceed s
 - Customers use real-time information on quota/capacity availability to choose an appropriate region for their scenario with the necessary model capacity
 - Scaling down or deleting a deployment releases capacity back to the region. There is no guarantee that the capacity will be available should the deployment be scaled up or re-created later.
 
-## Regional capacity transparency
+#### Regional capacity guidance
 
 To help users find the capacity needed for their deployments, customers will use a new API and Studio experience to provide real-time information on.
 
```
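The bullets above describe customers using real-time quota/capacity availability to choose a region. Once such an availability snapshot is in hand (however it is obtained; the region names and figures below are hypothetical), region selection reduces to a simple filter:

```python
def regions_with_capacity(capacity_by_region: dict[str, int], needed_ptus: int) -> list[str]:
    """Regions whose currently available capacity covers the requested PTUs."""
    return sorted(r for r, avail in capacity_by_region.items() if avail >= needed_ptus)

# Hypothetical availability snapshot; real values would come from the
# service's capacity API / Studio experience mentioned above.
snapshot = {"eastus": 500, "westeurope": 100, "swedencentral": 300}
print(regions_with_capacity(snapshot, 200))  # → ['eastus', 'swedencentral']
```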

articles/ai-services/openai/how-to/provisioned-throughput-onboarding.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -55,7 +55,7 @@ The values in the output column are the estimated value of PTU units required fo
 :::image type="content" source="../media/how-to/provisioned-onboarding/capacity-calculator.png" alt-text="Screenshot of the Azure OpenAI Studio landing page." lightbox="../media/how-to/provisioned-onboarding/capacity-calculator.png":::
 
 > [!NOTE]
-> The capacity planner is an estimate based on simple input criteria. The most accurate way to determine your capacity is to benchmark a deployment with a representational workload for your use case.
+> The capacity calculator provides an estimate based on simple input criteria. The most accurate way to determine your capacity is to benchmark a deployment with a representational workload for your use case.
 
 ## Understanding the Provisioned Throughput Purchase Model
 
```

```diff
@@ -64,7 +64,7 @@ Azure OpenAI Provisioned is purchased on-demand at an hourly basis based on the
 The hourly model is useful for short-term deployment needs, such as validating new models or acquiring capacity for a hackathon.  However, the discounts provided by the Azure Reservation for Azure OpenAI Provisioned are considerable and most customers with consistent long-term usage will find a reserved model to be a better value proposition.
 
 > [!NOTE]
-> Azure OpenAI Provisioned customers onboarded prior to August 12, 2024 use a purchase model called the Commitment model. These customers may continue to use this older purchase model alongside the current Hourly/reservation purchase model. For details on the older purchase model and options for coexistence and migration, please see the [Azure OpenAI Provisioned August Update](../provisioned-migration.md).
+> Azure OpenAI Provisioned customers onboarded prior to the August self-service update use a purchase model called the Commitment model. These customers may continue to use this older purchase model alongside the Hourly/reservation purchase model. The Commitment model is not available for new customers. For details on the Commitment purchase model and options for coexistence and migration, please see the [Azure OpenAI Provisioned August Update](../provisioned-migration.md).
 
 ## Hourly Usage
 
```
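The hunk above contrasts hourly on-demand purchase with reservation discounts. The decision is essentially a utilization break-even calculation; a minimal sketch, with hypothetical prices rather than published rates:

```python
def monthly_on_demand_cost(hourly_rate: float, hours_used: float) -> float:
    """Pay-as-you-go cost for the hours a PTU deployment actually runs."""
    return hourly_rate * hours_used

def cheaper_option(hourly_rate: float, monthly_reservation_rate: float,
                   hours_used: float) -> str:
    """Compare on-demand hours against a flat monthly reservation.
    All rates here are hypothetical, for illustration only."""
    on_demand = monthly_on_demand_cost(hourly_rate, hours_used)
    return "reservation" if monthly_reservation_rate < on_demand else "hourly"

# e.g. $1/PTU-hour on demand vs a hypothetical $260/PTU-month reservation:
print(cheaper_option(1.0, 260.0, hours_used=730))  # → reservation (runs all month)
print(cheaper_option(1.0, 260.0, hours_used=100))  # → hourly (short-term use)
```

This matches the guidance in the diff: short-lived deployments (model validation, a hackathon) favor hourly, while consistent long-term usage favors a reservation.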
