
Commit 06a5488

doc updates
1 parent 37a3ce1 commit 06a5488

12 files changed: +158 -180 lines changed

articles/ai-services/openai/concepts/provisioned-throughput.md

Lines changed: 29 additions & 24 deletions
@@ -12,7 +12,10 @@ recommendations: false

 # What is provisioned throughput?

-The provisioned throughput capability allows you to specify the amount of throughput you require in a deployment. The service then allocates the necessary model processing capacity and ensures it's ready for you. Throughput is defined in terms of provisioned throughput units (PTU) which is a normalized way of representing the throughput for your deployment. Each model-version pair requires different amounts of PTU to deploy and provide different amounts of throughput per PTU. For more information on the PTU model, see the [migration guide](../provisioned-migration.md).
+> [!NOTE]
+> The Azure OpenAI Provisioned offering received significant updates on August 12, 2024, including aligning the purchase model with Azure standards and moving to model-independent quota. It is highly recommended that customers onboarded before this date read the [Azure OpenAI provisioned August update](../how-to/provisioned-throughput-onboarding.md) to learn more about these changes.
+
+The provisioned throughput capability allows you to specify the amount of throughput you require in a deployment. The service then allocates the necessary model processing capacity and ensures it's ready for you. Throughput is defined in terms of provisioned throughput units (PTU), which is a normalized way of representing the throughput for your deployment. Each model-version pair requires a different amount of PTU to deploy and provides a different amount of throughput per PTU.

 ## What does the provisioned deployment type provide?

@@ -22,9 +25,6 @@ The provisioned throughput capability allows you to specify the amount of throug

 An Azure OpenAI Deployment is a unit of management for a specific OpenAI Model. A deployment provides customer access to a model for inference and integrates more features like Content Moderation ([See content moderation documentation](content-filter.md)).

-> [!NOTE]
-> Provisioned throughput unit (PTU) quota is different from standard quota in Azure OpenAI and is not available by default. To learn more about this offering contact your Microsoft Account Team.
-
 ## What do you get?

 | Topic | Provisioned|
@@ -36,12 +36,6 @@ An Azure OpenAI Deployment is a unit of management for a specific OpenAI Model.
 | Utilization | Provisioned-managed Utilization measure provided in Azure Monitor. |
 | Estimating size | Provided calculator in the studio & benchmarking script. |

-### Hourly/reservation commercial model
-
-On July 29th 2024, Microsoft switched to an hourly/reservation PTU offering that offers usability improvements. For more details, see the [PTU migration article](../provisioned-migration.md#whats-changing).
-
-[!INCLUDE [hourly-ptu-description](../includes/hourly-ptu-description.md)]
-
 ## What models and regions are available for provisioned throughput?

 [!INCLUDE [Provisioned](../includes/model-matrix/provisioned-models.md)]
@@ -51,10 +45,6 @@ On July 29th 2024, Microsoft switched to an hourly/reservation PTU offering that

 ## Key concepts

-### Provisioned throughput units
-
-Provisioned throughput units (PTU) are units of model processing capacity that you can reserve and deploy for processing prompts and generating completions. The minimum PTU deployment, increments, and processing capacity associated with each unit varies by model type & version.
-
 ### Deployment types

 When creating a provisioned deployment in Azure OpenAI Studio, the deployment type on the Create Deployment dialog is Provisioned-Managed.
@@ -75,34 +65,49 @@ az cognitiveservices account deployment create \

 ### Quota

+#### Provisioned throughput units
+
+Provisioned throughput units (PTU) are generic units of model processing capacity that you can use to size provisioned deployments to achieve the required throughput for processing prompts and generating completions. Provisioned throughput units are granted to a subscription as quota on a regional basis, which defines the maximum number of PTUs that can be assigned to deployments in that subscription and region.
+
+
 #### Model independent quota

-Provisioned quota is granted on a per subscription/region basis, and unlike Standard offering quota, is model-independent. Each quota item per subscription and region limits the total number of PTUs that can be deployed across all supported models and versions.
+Unlike TPM quota used by other Azure OpenAI offerings, PTUs are model-independent. PTUs may be used to deploy any supported model/version in the region.

 :::image type="content" source="../media/provisioned/model-independent-quota.png" alt-text="Diagram of model independent quota with one pool of PTUs available to multiple Azure OpenAI models." lightbox="../media/provisioned/model-independent-quota.png":::

-The new quota shows up in the AI Studio and Azure OpenAI Studio as a quota item named **Provisioned Managed Throughput Unit**. In the Studio Quota pane, expanding the quota item will show the deployments contributing to usage of the quota.
+The new quota shows up in Azure OpenAI Studio as a quota item named **Provisioned Managed Throughput Unit**. In the Studio Quota pane, expanding the quota item will show the deployments contributing to usage of the quota.
+
+:::image type="content" source="../media/provisioned/ptu-quota-page.png" alt-text="Screenshot of quota UI for Azure OpenAI provisioned." lightbox="../media/provisioned/ptu-quota-page.png":::
+
+## Obtaining PTU Quota
+
+As with other offerings, PTU quota is available by default in many regions. If additional quota is required, customers can request it via the Request Quota link to the right of the Provisioned Managed Throughput Unit quota item in Azure OpenAI Studio.
+
+The form allows the customer to request an increase in PTU quota for a specified region. The customer will receive an email at the included address once the request is approved, typically within 2 business days.
+
+## Per-Model PTU Minimums

-:::image type="content" source="../media/provisioned/quota.png" alt-text="Screenshot of quota UI for Azure OpenAI provisioned." lightbox="../media/provisioned/quota.png":::
+The minimum PTU deployment, increments, and processing capacity associated with each unit vary by model type & version.

 ## Capacity transparency and quota definitions

 Azure OpenAI is a highly sought-after service where customer demand may exceed service GPU capacity. Microsoft strives to provide capacity for all in-demand regions and models, but selling out a region is always a possibility. This can limit some customers’ ability to create a deployment of their desired model, version, or number of PTUs in a desired region -- even if they have quota available in that region. Generally speaking:

-- Quota places a limit on the maximum number of PTUs that can be deployed in a subscription and region, but is not a guarantee of capacity availability. This is algned with how quota works for other Azure services, such as VMs.
-- Capacity is allocated to a customer at deployment time and is held for as long as the deployment exists. If service capacity is not available, the deployment will fail
+- Quota places a limit on the maximum number of PTUs that can be deployed in a subscription and region, and is not a guarantee of capacity availability.
+- Capacity is allocated at deployment time and is held for as long as the deployment exists. If service capacity is not available, the deployment will fail.
 - Customers use real-time information on quota/capacity availability to choose an appropriate region for their scenario with the necessary model capacity
-- Scaling down or deleting a deployment releases capacity back to the region. There is no guarantee that the capacity will be available should the customer scale up or re-create the deployment later.
+- Scaling down or deleting a deployment releases capacity back to the region. There is no guarantee that the capacity will be available should the deployment be scaled up or re-created later.

 ## Regional capacity transparency

-To assist users to find the capacity needed for their deployments, customers will use a new API and Studio experience to provide real-time information on.
+To help users find the capacity needed for their deployments, a new API and Studio experience provide real-time information on quota and capacity availability.

-In AI Studio and Azure OpenAI Studio, the deployment experience will identify when a region lacks the capacity to support the desired model, version and number of PTUs, and will direct the user to a select an alternative region when needed.
+In Azure OpenAI Studio, the deployment experience will identify when a region lacks the capacity to support the desired model, version and number of PTUs, and will direct the user to select an alternative region when needed.

-:::image type="content" source="../media/provisioned/check-capacity.png" alt-text="Screenshot of the check capacity experience for quota for Azure OpenAI provisioned." lightbox="../media/provisioned/check-capacity.png":::
+<!--:::image type="content" source="../media/provisioned/check-capacity.png" alt-text="Screenshot of the check capacity experience for quota for Azure OpenAI provisioned." lightbox="../media/provisioned/check-capacity.png":::-->

-Details on the new deployment experience can be found in the updated Azure OpenAI [provisioned onboarding guide](../how-to/provisioned-throughput-onboarding.md).
+Details on the new deployment experience can be found in the Azure OpenAI [Provisioned get started guide](../how-to/provisioned-throughput-onboarding.md).

 The new [model capacities API](/rest/api/aiservices/accountmanagement/model-capacities/list?view=rest-aiservices-accountmanagement-2024-04-01-preview&tabs=HTTP&preserve-view=true) can also be used to programmatically identify the maximum sized deployment of a specified model that can be created in each region based on the availability of both quota in the subscription and service capacity in the region.
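
To make that capacity check concrete, here is a rough Python sketch of calling the model capacities API with the `azure-identity` and `requests` packages. The query parameters and response fields shown are assumptions based on the linked API reference and its preview API version, and the model name/version values are placeholders; verify against the reference before relying on it.

```python
# Sketch only: list per-region capacity for a model via the model capacities API.
# Assumes azure-identity and requests are installed and you are logged in to Azure.
import requests
from azure.identity import DefaultAzureCredential

subscription_id = "<your-subscription-id>"  # placeholder
token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

url = (
    f"https://management.azure.com/subscriptions/{subscription_id}"
    "/providers/Microsoft.CognitiveServices/modelCapacities"
)
params = {
    "api-version": "2024-04-01-preview",  # assumption: preview API version from the link above
    "modelFormat": "OpenAI",
    "modelName": "gpt-4o",                # placeholder model
    "modelVersion": "2024-05-13",         # placeholder version
}

resp = requests.get(url, params=params, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()

# Print each region and its reported available capacity (field names are assumptions).
for item in resp.json().get("value", []):
    props = item.get("properties", {})
    print(item.get("location"), props.get("availableCapacity"))
```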

articles/ai-services/openai/how-to/provisioned-get-started.md

Lines changed: 57 additions & 9 deletions
@@ -14,24 +14,47 @@ recommendations: false

 # Get started using Provisioned Deployments on the Azure OpenAI Service

-The following guide walks you through setting up a provisioned deployment with your Azure OpenAI Service resource.
+The following guide walks you through key steps in creating a provisioned deployment with your Azure OpenAI Service resource. For more details on the concepts discussed here, see:
+* [Azure OpenAI Provisioned Onboarding Guide](./provisioned-throughput-onboarding.md)
+* [Azure OpenAI Provisioned Concepts](../concepts/provisioned-throughput.md)

 ## Prerequisites

 - An Azure subscription - [Create one for free](https://azure.microsoft.com/free/cognitive-services?azure-portal=true)
-- Access granted to Azure OpenAI in the desired Azure subscription.
-  Currently, access to this service is by application. You can apply for access to Azure OpenAI Service by completing the form at [https://aka.ms/oai/access](https://aka.ms/oai/access?azure-portal=true).
-- Obtained Quota for a provisioned deployment and purchased a commitment.
+- Assignment of the Contributor or Cognitive Services Contributor role to the user in the subscription.
+- Access to Azure OpenAI Studio

-> [!NOTE]
-> Provisioned Throughput Units (PTU) are different from standard quota in Azure OpenAI and are not available by default. To learn more about this offering contact your Microsoft Account Team.
+## Obtain/verify PTU quota availability
+
+Provisioned throughput deployments are sized in units called Provisioned Throughput Units (PTUs). PTU quota is granted to a subscription regionally and limits the total number of PTUs that can be deployed in that region across all models and versions.
+
+Creating a new deployment requires available (unused) quota to cover the desired size of the deployment. For example, if a subscription has the following in South Central US:
+
+* Total PTU Quota = 500 PTUs
+* Deployments:
+  * 100 PTUs: GPT-4o, 2024-05-13
+  * 100 PTUs: GPT-4, 0613
+
+Then 200 PTUs of quota are considered used, and there are 300 PTUs available to create new deployments.
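
As a trivial sketch, the same quota arithmetic in Python (numbers taken from the example above):

```python
# Used vs. available PTU quota in a region, using the example figures above.
total_ptu_quota = 500
deployments = {"gpt-4o 2024-05-13": 100, "gpt-4 0613": 100}

used_ptus = sum(deployments.values())          # 200 PTUs in use
available_ptus = total_ptu_quota - used_ptus   # 300 PTUs available for new deployments
print(used_ptus, available_ptus)
```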

+A default amount of PTU quota is assigned to all subscriptions in several regions. You can view the quota available to you in a region by visiting the Quotas blade in Azure OpenAI Studio and selecting the desired subscription and region. For example, the screenshot below shows a quota limit of 500 PTUs in West US for the selected subscription. (Note: You may see lower values of available default quota).
+
+:::image type="content" source="../media/provisioned/available-quota.png" alt-text="A screenshot of the available quota in Azure OpenAI studio." lightbox="../media/provisioned/available-quota.png":::
+
+Additional quota can be requested by clicking the Request Quota link to the right of the “Usage/Limit” column. (This is off-screen in the screenshot above).
+
+## Create an Azure OpenAI resource
+
+Provisioned Throughput deployments are created via Azure OpenAI resource objects within Azure. You must have an Azure OpenAI resource in each region where you intend to create a deployment. Use the Azure portal to [create a resource](./create-resource.md) in a region with available quota, if required.
+
+> [!NOTE]
+> Azure OpenAI resources can be used with all types of Azure OpenAI deployments. There is no requirement to create dedicated resources for your provisioned deployment.

-## Create your provisioned deployment
+## Create your provisioned deployment - capacity is available

 After you purchase a commitment on your quota, you can create a deployment. To create a provisioned deployment, you can follow these steps; the choices described reflect the entries shown in the screenshot.

-:::image type="content" source="../media/provisioned/deployment-screen.jpg" alt-text="Screenshot of the Azure OpenAI Studio deployment page for a provisioned deployment." lightbox="../media/provisioned/deployment-screen.jpg":::
+:::image type="content" source="../media/provisioned/deployment-screen.png" alt-text="Screenshot of the Azure OpenAI Studio deployment page for a provisioned deployment." lightbox="../media/provisioned/deployment-screen.png":::

@@ -50,6 +73,9 @@ After you purchase a commitment on your quota, you can create a deployment. To c
 | Deployment Type |This impacts the throughput and performance. Choose Provisioned-Managed for your provisioned deployment | Provisioned-Managed |
 | Provisioned Throughput Units | Choose the amount of throughput you wish to include in the deployment. | 100 |

+Important things to note:
+* The deployment dialog contains a reminder that you can purchase an Azure Reservation for Azure OpenAI Provisioned to obtain a significant discount for a term commitment.
+* There is a message that tells you the list (hourly) price of the deployment that you would be charged if this deployment is not covered by a reservation. This is a list price that does not include any negotiated discounts for your company.

 If you wish to create your deployment programmatically, you can do so with the following Azure CLI command. Update the `sku-capacity` with the desired number of provisioned throughput units.

@@ -67,7 +93,29 @@ az cognitiveservices account deployment create \

 REST, ARM template, Bicep and Terraform can also be used to create deployments. See the section on automating deployments in the [Managing Quota](quota.md?tabs=rest#automate-deployment) how-to guide and replace the `sku.name` with "ProvisionedManaged" rather than "Standard."
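
If you would rather automate this step from Python than the CLI, REST, or template options mentioned above, the following is a minimal sketch using the `azure-mgmt-cognitiveservices` management SDK. The resource group, account, deployment name, and model values are placeholders, and the SDK types shown are an assumption based on current versions of that package.

```python
# Sketch only: create a Provisioned-Managed deployment with the Python management SDK.
# Assumes azure-mgmt-cognitiveservices and azure-identity; all names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Deployment,
    DeploymentModel,
    DeploymentProperties,
    Sku,
)

client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",  # placeholder
)

poller = client.deployments.begin_create_or_update(
    resource_group_name="<resource-group>",        # placeholder
    account_name="<azure-openai-resource-name>",   # placeholder
    deployment_name="gpt-4-provisioned",           # placeholder deployment name
    deployment=Deployment(
        sku=Sku(name="ProvisionedManaged", capacity=100),  # capacity = number of PTUs
        properties=DeploymentProperties(
            model=DeploymentModel(format="OpenAI", name="gpt-4", version="0613")
        ),
    ),
)
print(poller.result().name)
```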

-## Make your first calls
+## Create your provisioned deployment - capacity is not available
+
+Due to the dynamic nature of capacity availability, it is possible that the region of your selected resource may not have the service capacity to create the deployment of the specified model, version and number of PTUs.
+
+In this event, Azure OpenAI Studio will direct you to other regions with available quota and capacity to create a deployment of the desired model. If this happens, the deployment dialog will look like this:
+
+:::image type="content" source="../media/provisioned/deployment-screen-2.png" alt-text="Screenshot of the Azure OpenAI Studio deployment page for a provisioned deployment with no capacity available." lightbox="../media/provisioned/deployment-screen-2.png":::
+
+Things to notice:
+
+* A message displays showing you how many PTUs you have in available quota, and how many can currently be deployed at this time.
+* If you select a number of PTUs greater than service capacity, a message will appear that provides options for you to obtain more capacity, and a button to allow you to select an alternate region. Clicking the "See other regions" button will display a dialog that shows a list of Azure OpenAI resources where you can create a deployment, along with the maximum sized deployment that can be created based on available quota and service capacity in each region.
+
+:::image type="content" source="../media/provisioned/choose-different-resource.png" alt-text="Screenshot of the Azure OpenAI Studio deployment page for choosing a different resource and region." lightbox="../media/provisioned/choose-different-resource.png":::
+
+Selecting a resource and clicking **Switch resource** will cause the deployment dialog to redisplay using the selected resource. You can then proceed to create your deployment in the new region.
+
+Learn more about the purchase model and how to purchase a reservation:
+
+* [Azure OpenAI provisioned onboarding guide](./provisioned-throughput-onboarding.md)
+* [Guide for Azure OpenAI provisioned reservations](../concepts/provisioned-throughput.md)
+
+## Make your first inferencing calls
 The inferencing code for provisioned deployments is the same as a standard deployment type. The following code snippet shows a chat completions call to a GPT-4 model. For your first time using these models programmatically, we recommend starting with our [quickstart guide](../quickstart.md). Our recommendation is to use the OpenAI library with version 1.0 or greater since this includes retry logic within the library.
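
The snippet that paragraph refers to is not part of this diff; as a rough sketch, a chat completions call with the OpenAI Python library (version 1.x) against a provisioned deployment might look like the following, where the endpoint, key, API version, and deployment name are placeholders.

```python
# Sketch of a chat completions call against a provisioned deployment (OpenAI Python library >= 1.0).
# The endpoint, key, API version, and deployment name below are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource-name>.openai.azure.com/",
    api_key="<your-api-key>",
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-4",  # the name of your provisioned deployment
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What does provisioned throughput provide?"},
    ],
)

print(response.choices[0].message.content)
```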