
Commit 06a5488

doc updates
1 parent 37a3ce1 commit 06a5488

12 files changed: +158 -180 lines changed

articles/ai-services/openai/concepts/provisioned-throughput.md

Lines changed: 29 additions & 24 deletions
@@ -12,7 +12,10 @@ recommendations: false

 # What is provisioned throughput?

-The provisioned throughput capability allows you to specify the amount of throughput you require in a deployment. The service then allocates the necessary model processing capacity and ensures it's ready for you. Throughput is defined in terms of provisioned throughput units (PTU) which is a normalized way of representing the throughput for your deployment. Each model-version pair requires different amounts of PTU to deploy and provide different amounts of throughput per PTU. For more information on the PTU model, see the [migration guide](../provisioned-migration.md).
+> [!NOTE]
+> The Azure OpenAI Provisioned offering received significant updates on August 12, 2024, including aligning the purchase model with Azure standards and moving to model-independent quota. It is highly recommended that customers onboarded before this date read the [Azure OpenAI provisioned August update](../how-to/provisioned-throughput-onboarding.md) to learn more about these changes.
+
+The provisioned throughput capability allows you to specify the amount of throughput you require in a deployment. The service then allocates the necessary model processing capacity and ensures it's ready for you. Throughput is defined in terms of provisioned throughput units (PTU), which is a normalized way of representing the throughput for your deployment. Each model-version pair requires a different amount of PTU to deploy and provides a different amount of throughput per PTU.

 ## What does the provisioned deployment type provide?

@@ -22,9 +25,6 @@ The provisioned throughput capability allows you to specify the amount of throug

 An Azure OpenAI Deployment is a unit of management for a specific OpenAI Model. A deployment provides customer access to a model for inference and integrates more features like Content Moderation ([See content moderation documentation](content-filter.md)).

-> [!NOTE]
-> Provisioned throughput unit (PTU) quota is different from standard quota in Azure OpenAI and is not available by default. To learn more about this offering contact your Microsoft Account Team.
-
 ## What do you get?

 | Topic | Provisioned|
@@ -36,12 +36,6 @@ An Azure OpenAI Deployment is a unit of management for a specific OpenAI Model.
 | Utilization | Provisioned-managed Utilization measure provided in Azure Monitor. |
 | Estimating size | Provided calculator in the studio & benchmarking script. |

-### Hourly/reservation commercial model
-
-On July 29th 2024, Microsoft switched to an hourly/reservation PTU offering that offers usability improvements. For more details, see the [PTU migration article](../provisioned-migration.md#whats-changing).
-
-[!INCLUDE [hourly-ptu-description](../includes/hourly-ptu-description.md)]
-
 ## What models and regions are available for provisioned throughput?

 [!INCLUDE [Provisioned](../includes/model-matrix/provisioned-models.md)]
@@ -51,10 +45,6 @@ On July 29th 2024, Microsoft switched to an hourly/reservation PTU offering that

 ## Key concepts

-### Provisioned throughput units
-
-Provisioned throughput units (PTU) are units of model processing capacity that you can reserve and deploy for processing prompts and generating completions. The minimum PTU deployment, increments, and processing capacity associated with each unit varies by model type & version.
-
 ### Deployment types

 When creating a provisioned deployment in Azure OpenAI Studio, the deployment type on the Create Deployment dialog is Provisioned-Managed.
@@ -75,34 +65,49 @@ az cognitiveservices account deployment create \

 ### Quota

+#### Provisioned throughput units
+
+Provisioned throughput units (PTU) are generic units of model processing capacity that you can use to size provisioned deployments to achieve the required throughput for processing prompts and generating completions. Provisioned throughput units are granted to a subscription as quota on a regional basis, which defines the maximum number of PTUs that can be assigned to deployments in that subscription and region.
+
+
 #### Model independent quota

-Provisioned quota is granted on a per subscription/region basis, and unlike Standard offering quota, is model-independent. Each quota item per subscription and region limits the total number of PTUs that can be deployed across all supported models and versions.
+Unlike TPM quota used by other Azure OpenAI offerings, PTUs are model-independent. PTUs may be used to deploy any supported model/version in the region.

 :::image type="content" source="../media/provisioned/model-independent-quota.png" alt-text="Diagram of model independent quota with one pool of PTUs available to multiple Azure OpenAI models." lightbox="../media/provisioned/model-independent-quota.png":::

-The new quota shows up in the AI Studio and Azure OpenAI Studio as a quota item named **Provisioned Managed Throughput Unit**. In the Studio Quota pane, expanding the quota item will show the deployments contributing to usage of the quota.
+The new quota shows up in Azure OpenAI Studio as a quota item named **Provisioned Managed Throughput Unit**. In the Studio Quota pane, expanding the quota item will show the deployments contributing to usage of the quota.
+
+:::image type="content" source="../media/provisioned/ptu-quota-page.png" alt-text="Screenshot of quota UI for Azure OpenAI provisioned." lightbox="../media/provisioned/ptu-quota-page.png":::
+
+## Obtaining PTU Quota
+
+As with other offerings, PTU quota is available by default in many regions. If additional quota is required, customers can request it via the Request Quota link to the right of the Provisioned Managed Throughput Unit quota item in Azure OpenAI Studio.
+
+The form allows the customer to request an increase in PTU quota for a specified region. The customer will receive an email at the included address once the request is approved, typically within 2 business days.
+
+## Per-Model PTU Minimums

-:::image type="content" source="../media/provisioned/quota.png" alt-text="Screenshot of quota UI for Azure OpenAI provisioned." lightbox="../media/provisioned/quota.png":::
+The minimum PTU deployment, increments, and processing capacity associated with each unit vary by model type & version.

 ## Capacity transparency and quota definitions

 Azure OpenAI is a highly sought-after service where customer demand may exceed service GPU capacity. Microsoft strives to provide capacity for all in-demand regions and models, but selling out a region is always a possibility. This can limit some customers’ ability to create a deployment of their desired model, version, or number of PTUs in a desired region -- even if they have quota available in that region. Generally speaking:

-- Quota places a limit on the maximum number of PTUs that can be deployed in a subscription and region, but is not a guarantee of capacity availability. This is algned with how quota works for other Azure services, such as VMs.
-- Capacity is allocated to a customer at deployment time and is held for as long as the deployment exists. If service capacity is not available, the deployment will fail
+- Quota places a limit on the maximum number of PTUs that can be deployed in a subscription and region, and is not a guarantee of capacity availability.
+- Capacity is allocated at deployment time and is held for as long as the deployment exists. If service capacity is not available, the deployment will fail.
 - Customers use real-time information on quota/capacity availability to choose an appropriate region for their scenario with the necessary model capacity
-- Scaling down or deleting a deployment releases capacity back to the region. There is no guarantee that the capacity will be available should the customer scale up or re-create the deployment later.
+- Scaling down or deleting a deployment releases capacity back to the region. There is no guarantee that the capacity will be available should the deployment be scaled up or re-created later.

 ## Regional capacity transparency

-To assist users to find the capacity needed for their deployments, customers will use a new API and Studio experience to provide real-time information on.
+To help users find the capacity needed for their deployments, a new API and Studio experience provide real-time information on quota and capacity availability.

-In AI Studio and Azure OpenAI Studio, the deployment experience will identify when a region lacks the capacity to support the desired model, version and number of PTUs, and will direct the user to a select an alternative region when needed.
+In Azure OpenAI Studio, the deployment experience will identify when a region lacks the capacity to support the desired model, version and number of PTUs, and will direct the user to select an alternative region when needed.

-:::image type="content" source="../media/provisioned/check-capacity.png" alt-text="Screenshot of the check capacity experience for quota for Azure OpenAI provisioned." lightbox="../media/provisioned/check-capacity.png":::
+<!--:::image type="content" source="../media/provisioned/check-capacity.png" alt-text="Screenshot of the check capacity experience for quota for Azure OpenAI provisioned." lightbox="../media/provisioned/check-capacity.png":::-->

-Details on the new deployment experience can be found in the updated Azure OpenAI [provisioned onboarding guide](../how-to/provisioned-throughput-onboarding.md).
+Details on the new deployment experience can be found in the Azure OpenAI [Provisioned get started guide](../how-to/provisioned-throughput-onboarding.md).

 The new [model capacities API](/rest/api/aiservices/accountmanagement/model-capacities/list?view=rest-aiservices-accountmanagement-2024-04-01-preview&tabs=HTTP&preserve-view=true) can also be used to programmatically identify the maximum sized deployment of a specified model that can be created in each region based on the availability of both quota in the subscription and service capacity in the region.
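
To make that capacity check concrete, here is a rough Python sketch of calling the model capacities API with the `azure-identity` and `requests` packages. The query parameters and response fields shown are assumptions based on the linked API reference and its preview API version, and the model name/version values are placeholders; verify against the reference before relying on it.

```python
# Sketch only: list per-region capacity for a model via the model capacities API.
# Assumes azure-identity and requests are installed and you are logged in to Azure.
import requests
from azure.identity import DefaultAzureCredential

subscription_id = "<your-subscription-id>"  # placeholder
token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

url = (
    f"https://management.azure.com/subscriptions/{subscription_id}"
    "/providers/Microsoft.CognitiveServices/modelCapacities"
)
params = {
    "api-version": "2024-04-01-preview",  # assumption: preview API version from the link above
    "modelFormat": "OpenAI",
    "modelName": "gpt-4o",                # placeholder model
    "modelVersion": "2024-05-13",         # placeholder version
}

resp = requests.get(url, params=params, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()

# Print each region and its reported available capacity (field names are assumptions).
for item in resp.json().get("value", []):
    props = item.get("properties", {})
    print(item.get("location"), props.get("availableCapacity"))
```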

articles/ai-services/openai/how-to/provisioned-get-started.md

Lines changed: 57 additions & 9 deletions
@@ -14,24 +14,47 @@ recommendations: false

 # Get started using Provisioned Deployments on the Azure OpenAI Service

-The following guide walks you through setting up a provisioned deployment with your Azure OpenAI Service resource.
+The following guide walks you through key steps in creating a provisioned deployment with your Azure OpenAI Service resource. For more details on the concepts discussed here, see:
+* [Azure OpenAI Provisioned Onboarding Guide](./provisioned-throughput-onboarding.md)
+* [Azure OpenAI Provisioned Concepts](../concepts/provisioned-throughput.md)

 ## Prerequisites

 - An Azure subscription - [Create one for free](https://azure.microsoft.com/free/cognitive-services?azure-portal=true)
-- Access granted to Azure OpenAI in the desired Azure subscription.
-  Currently, access to this service is by application. You can apply for access to Azure OpenAI Service by completing the form at [https://aka.ms/oai/access](https://aka.ms/oai/access?azure-portal=true).
-- Obtained Quota for a provisioned deployment and purchased a commitment.
+- Assignment of the Contributor or Cognitive Services Contributor role to the user in the subscription.
+- Access to Azure OpenAI Studio

-> [!NOTE]
-> Provisioned Throughput Units (PTU) are different from standard quota in Azure OpenAI and are not available by default. To learn more about this offering contact your Microsoft Account Team.
+## Obtain/verify PTU quota availability
+
+Provisioned throughput deployments are sized in units called Provisioned Throughput Units (PTUs). PTU quota is granted to a subscription regionally and limits the total number of PTUs that can be deployed in that region across all models and versions.
+
+Creating a new deployment requires available (unused) quota to cover the desired size of the deployment. For example, if a subscription has the following in South Central US:
+
+* Total PTU Quota = 500 PTUs
+* Deployments:
+  * 100 PTUs: GPT-4o, 2024-05-13
+  * 100 PTUs: GPT-4, 0613
+
+Then 200 PTUs of quota are considered used, and there are 300 PTUs available to create new deployments.
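
As a trivial sketch, the same quota arithmetic in Python (numbers taken from the example above):

```python
# Used vs. available PTU quota in a region, using the example figures above.
total_ptu_quota = 500
deployments = {"gpt-4o 2024-05-13": 100, "gpt-4 0613": 100}

used_ptus = sum(deployments.values())          # 200 PTUs in use
available_ptus = total_ptu_quota - used_ptus   # 300 PTUs available for new deployments
print(used_ptus, available_ptus)
```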

+A default amount of PTU quota is assigned to all subscriptions in several regions. You can view the quota available to you in a region by visiting the Quotas blade in Azure OpenAI Studio and selecting the desired subscription and region. For example, the screenshot below shows a quota limit of 500 PTUs in West US for the selected subscription. (Note: You may see lower values of available default quota).
+
+:::image type="content" source="../media/provisioned/available-quota.png" alt-text="A screenshot of the available quota in Azure OpenAI studio." lightbox="../media/provisioned/available-quota.png":::
+
+Additional quota can be requested by clicking the Request Quota link to the right of the “Usage/Limit” column. (This is off-screen in the screenshot above).
+
+## Create an Azure OpenAI resource
+
+Provisioned Throughput deployments are created via Azure OpenAI resource objects within Azure. You must have an Azure OpenAI resource in each region where you intend to create a deployment. Use the Azure portal to [create a resource](./create-resource.md) in a region with available quota, if required.
+
+> [!NOTE]
+> Azure OpenAI resources can be used with all types of Azure OpenAI deployments. There is no requirement to create dedicated resources for your provisioned deployment.

-## Create your provisioned deployment
+## Create your provisioned deployment - capacity is available

 After you purchase a commitment on your quota, you can create a deployment. To create a provisioned deployment, you can follow these steps; the choices described reflect the entries shown in the screenshot.

-:::image type="content" source="../media/provisioned/deployment-screen.jpg" alt-text="Screenshot of the Azure OpenAI Studio deployment page for a provisioned deployment." lightbox="../media/provisioned/deployment-screen.jpg":::
+:::image type="content" source="../media/provisioned/deployment-screen.png" alt-text="Screenshot of the Azure OpenAI Studio deployment page for a provisioned deployment." lightbox="../media/provisioned/deployment-screen.png":::

@@ -50,6 +73,9 @@ After you purchase a commitment on your quota, you can create a deployment. To c
 | Deployment Type |This impacts the throughput and performance. Choose Provisioned-Managed for your provisioned deployment | Provisioned-Managed |
 | Provisioned Throughput Units | Choose the amount of throughput you wish to include in the deployment. | 100 |

+Important things to note:
+* The deployment dialog contains a reminder that you can purchase an Azure Reservation for Azure OpenAI Provisioned to obtain a significant discount for a term commitment.
+* There is a message that tells you the list (hourly) price of the deployment that you would be charged if this deployment is not covered by a reservation. This is a list price that does not include any negotiated discounts for your company.

 If you wish to create your deployment programmatically, you can do so with the following Azure CLI command. Update the `sku-capacity` with the desired number of provisioned throughput units.

@@ -67,7 +93,29 @@ az cognitiveservices account deployment create \

 REST, ARM template, Bicep and Terraform can also be used to create deployments. See the section on automating deployments in the [Managing Quota](quota.md?tabs=rest#automate-deployment) how-to guide and replace the `sku.name` with "ProvisionedManaged" rather than "Standard."
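
If you would rather automate this step from Python than the CLI, REST, or template options mentioned above, the following is a minimal sketch using the `azure-mgmt-cognitiveservices` management SDK. The resource group, account, deployment name, and model values are placeholders, and the SDK types shown are an assumption based on current versions of that package.

```python
# Sketch only: create a Provisioned-Managed deployment with the Python management SDK.
# Assumes azure-mgmt-cognitiveservices and azure-identity; all names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Deployment,
    DeploymentModel,
    DeploymentProperties,
    Sku,
)

client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",  # placeholder
)

poller = client.deployments.begin_create_or_update(
    resource_group_name="<resource-group>",        # placeholder
    account_name="<azure-openai-resource-name>",   # placeholder
    deployment_name="gpt-4-provisioned",           # placeholder deployment name
    deployment=Deployment(
        sku=Sku(name="ProvisionedManaged", capacity=100),  # capacity = number of PTUs
        properties=DeploymentProperties(
            model=DeploymentModel(format="OpenAI", name="gpt-4", version="0613")
        ),
    ),
)
print(poller.result().name)
```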

-## Make your first calls
+## Create your provisioned deployment - capacity is not available
+
+Due to the dynamic nature of capacity availability, it is possible that the region of your selected resource may not have the service capacity to create the deployment of the specified model, version and number of PTUs.
+
+In this event, Azure OpenAI Studio will direct you to other regions with available quota and capacity to create a deployment of the desired model. If this happens, the deployment dialog will look like this:
+
+:::image type="content" source="../media/provisioned/deployment-screen-2.png" alt-text="Screenshot of the Azure OpenAI Studio deployment page for a provisioned deployment with no capacity available." lightbox="../media/provisioned/deployment-screen-2.png":::
+
+Things to notice:
+
+* A message displays showing you how many PTUs you have in available quota, and how many can currently be deployed at this time.
+* If you select a number of PTUs greater than service capacity, a message will appear that provides options for you to obtain more capacity, and a button to allow you to select an alternate region. Clicking the "See other regions" button will display a dialog that shows a list of Azure OpenAI resources where you can create a deployment, along with the maximum sized deployment that can be created based on available quota and service capacity in each region.
+
+:::image type="content" source="../media/provisioned/choose-different-resource.png" alt-text="Screenshot of the Azure OpenAI Studio deployment page for choosing a different resource and region." lightbox="../media/provisioned/choose-different-resource.png":::
+
+Selecting a resource and clicking **Switch resource** will cause the deployment dialog to redisplay using the selected resource. You can then proceed to create your deployment in the new region.
+
+Learn more about the purchase model and how to purchase a reservation:
+
+* [Azure OpenAI provisioned onboarding guide](./provisioned-throughput-onboarding.md)
+* [Guide for Azure OpenAI provisioned reservations](../concepts/provisioned-throughput.md)
+
+## Make your first inferencing calls
 The inferencing code for provisioned deployments is the same as a standard deployment type. The following code snippet shows a chat completions call to a GPT-4 model. For your first time using these models programmatically, we recommend starting with our [quickstart guide](../quickstart.md). Our recommendation is to use the OpenAI library with version 1.0 or greater since this includes retry logic within the library.
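
The snippet that paragraph refers to is not part of this diff; as a rough sketch, a chat completions call with the OpenAI Python library (version 1.x) against a provisioned deployment might look like the following, where the endpoint, key, API version, and deployment name are placeholders.

```python
# Sketch of a chat completions call against a provisioned deployment (OpenAI Python library >= 1.0).
# The endpoint, key, API version, and deployment name below are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource-name>.openai.azure.com/",
    api_key="<your-api-key>",
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-4",  # the name of your provisioned deployment
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What does provisioned throughput provide?"},
    ],
)

print(response.choices[0].message.content)
```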