
Commit d1e7724

Learn Editor: Update provisioned-get-started.md
1 parent 8555dbf commit d1e7724


1 file changed, +16 -14 lines changed


articles/ai-services/openai/how-to/provisioned-get-started.md

Lines changed: 16 additions & 14 deletions
@@ -26,7 +26,7 @@ The following guide walks you through key steps in creating a provisioned deploy
2626

2727
## Obtain/verify PTU quota availability.
2828

29-
Provisioned throughput deployments are sized in units called Provisioned Throughput Units (PTUs). PTU quota is granted to a subscription regionally and limits the total number of PTUs that can be deployed in that region across all models and versions.
29+
Provisioned throughput deployments are sized in units called Provisioned Throughput Units (PTUs). PTU quota for each provisioned deployment type is granted to a subscription regionally and limits the total number of PTUs that can be deployed in that region across all models and versions.
3030

3131
Creating a new deployment requires available (unused) quota to cover the desired size of the deployment. For example: If a subscription has the following in South Central US:
3232

@@ -37,28 +37,29 @@ Creating a new deployment requires available (unused) quota to cover the desired
3737

3838
Then 200 PTUs of quota are considered used, and there are 300 PTUs available for use to create new deployments.
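The quota check in this example subtracts the deployed PTUs from the regional limit. The following is a minimal sketch. It assumes a 500-PTU limit (the 200 used plus 300 available PTUs stated above) and a hypothetical split of the used quota into two 100-PTU deployments. The function names are illustrative and are not part of any Azure API.

```python
# Minimal sketch of the PTU quota arithmetic described above.
# Assumption: quota is tracked per subscription, region, and deployment type;
# the deployment sizes used below are hypothetical.


def available_ptus(quota_limit: int, deployed_ptus: list) -> int:
    """Return the unused PTU quota after subtracting existing deployments."""
    used = sum(deployed_ptus)
    if used > quota_limit:
        raise ValueError("existing deployments exceed the quota limit")
    return quota_limit - used


def can_create(quota_limit: int, deployed_ptus: list, requested: int) -> bool:
    """A new deployment fits only if unused quota covers its full size."""
    return requested <= available_ptus(quota_limit, deployed_ptus)


# Hypothetical: a 500-PTU limit with 200 PTUs already deployed.
limit = 500
existing = [100, 100]
print(available_ptus(limit, existing))   # 300
print(can_create(limit, existing, 300))  # True
print(can_create(limit, existing, 400))  # False
```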
3939

40-
A default amount of provisioned and global provisioned quota is assigned to all subscriptions in several regions. You can view the quota available to you in a region by visiting the Quotas blade in Azure OpenAI Studio and selecting the desired subscription and region. For example, the screenshot below shows a quota limit of 500 PTUs in West US for the selected subscription. Note that you might see lower values of available default quotas.
40+
A default amount of global, data zone, and regional provisioned quota is assigned to eligible subscriptions in several regions. To view the quota available to you in a region, visit the Quotas blade in Azure AI Foundry and select the desired subscription and region. For example, the screenshot below shows a quota limit of 500 PTUs in West US for the selected subscription. You might see lower default quota values.
4141

4242
:::image type="content" source="../media/provisioned/available-quota.png" alt-text="A screenshot of the available quota in Azure OpenAI studio." lightbox="../media/provisioned/available-quota.png":::
4343

4444
To request additional quota, select the Request Quota link to the right of the “Usage/Limit” column (off-screen in the screenshot above).
4545

4646
## Create an Azure OpenAI resource
4747

48-
Provisioned and global provisioned deployments are created via Azure OpenAI resource objects within Azure. You must have an Azure OpenAI resource in each region where you intend to create a deployment. Use the Azure portal to [create a resource](./create-resource.md) in a region with available quota, if required.
48+
Provisioned deployments are created via Azure OpenAI resource objects within Azure. You must have an Azure OpenAI resource in each region where you intend to create a deployment. Use the Azure portal to [create a resource](./create-resource.md) in a region with available quota, if required.
4949

5050
> [!NOTE]
51-
> Azure OpenAI resources can support multiple types of Azure OpenAI deployments at the same time. It is not necessary to dedicate new resources for your provisioned or global provisioned deployments.
52-
## Create your provisioned or global provisioned deployment - capacity is available
51+
> Azure OpenAI resources can support multiple types of Azure OpenAI deployments at the same time. You don't need to create dedicated resources for your provisioned deployments.
52+
53+
## Create your provisioned deployment - Capacity is available
5354

5455
Once you have verified your quota, you can create a deployment. To create a provisioned deployment, follow these steps; the choices described reflect the entries shown in the screenshot.
5556

5657
:::image type="content" source="../media/provisioned/deployment-screen.png" alt-text="Screenshot of the Azure OpenAI Studio deployment page for a provisioned deployment." lightbox="../media/provisioned/deployment-screen.png":::
5758

5859

5960

60-
1. Sign into the [Azure OpenAI Studio](https://oai.azure.com)
61-
1. Choose the subscription that was enabled for provisioned and global provisioned deployments & select the desired resource in a region where you have the quota.
61+
1. Sign in to [Azure AI Foundry](https://oai.azure.com).
62+
1. Choose the subscription that was enabled for provisioned deployments, and select the desired resource in a region where you have quota.
6263

6364
3. Under **Management** in the left-nav select **Deployments**.
6465
4. Select Create new deployment and configure the following fields. Expand the **advanced options** drop-down menu.
@@ -70,7 +71,7 @@ once you have verified your quota, you can create a deployment. To create a prov
7071
| Model version | Choose the version of the model to deploy. | 0613 |
7172
| Deployment Name | The deployment name is used in your code to call the model by using the client libraries and the REST APIs. | gpt-4|
7273
| Content filter | Specify the filtering policy to apply to the deployment. Learn more on our [Content Filtering](../concepts/content-filter.md) how-to. | Default |
73-
| Deployment Type |This impacts the throughput and performance. Choose Provisioned-Managed or Global Provisioned-Managed for your deployment | Provisioned-Managed |
74+
| Deployment Type | This affects throughput and performance. Choose Global Provisioned-Managed, DataZone Provisioned-Managed, or Provisioned-Managed from the deployment type drop-down in the deployment dialog. | Provisioned-Managed |
7475
| Provisioned Throughput Units | Choose the amount of throughput you wish to include in the deployment. | 100 |
7576

7677
Important things to note:
@@ -87,7 +88,7 @@ The image below shows the pricing confirmation you will see. The price shown is
8788

8889
:::image type="content" source="../media/provisioned/confirm-pricing.png" alt-text="Screenshot showing the pricing confirmation screen." lightbox="../media/provisioned/confirm-pricing.png":::
8990

90-
If you wish to create your deployment programmatically, you can do so with the following Azure CLI command. To specify the deployment type, modify the `sku-name` to `ProvisionedManaged` or `GlobalProvisionedManaged` based on the intended deployment type. Update the `sku-capacity` with the desired number of provisioned throughput units.
91+
If you want to create your deployment programmatically, you can use the following Azure CLI command. To specify the deployment type, set `sku-name` to `GlobalProvisionedManaged`, `DataZoneProvisionedManaged`, or `ProvisionedManaged`. Set `sku-capacity` to the desired number of provisioned throughput units.
9192

9293
```cli
9394
az cognitiveservices account deployment create \
@@ -101,13 +102,13 @@ az cognitiveservices account deployment create \
101102
--sku-name ProvisionedManaged
102103
```
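The portal labels in the Deployment Type row correspond to the `sku-name` values listed above. The sketch below makes that mapping explicit and builds an argument list for `az cognitiveservices account deployment create`. Because the diff omits the middle of the original command, the `--resource-group`, `--name`, `--deployment-name`, `--model-name`, `--model-version`, and `--model-format` flags come from the Azure CLI reference rather than from this article. The resource group, account name, and `gpt-4` model name are hypothetical placeholders. Only the deployment name, model version, deployment type, and PTU count come from the example table.

```python
import shlex

# Portal "Deployment Type" labels (as written in the table) -> CLI --sku-name values.
SKU_NAMES = {
    "Global Provisioned-Managed": "GlobalProvisionedManaged",
    "DataZone Provisioned-Managed": "DataZoneProvisionedManaged",
    "Provisioned-Managed": "ProvisionedManaged",
}


def deployment_create_args(resource_group: str, account_name: str,
                           deployment_name: str, model_name: str,
                           model_version: str, deployment_type: str,
                           ptus: int) -> list:
    """Build CLI arguments for a provisioned deployment (hypothetical helper)."""
    if deployment_type not in SKU_NAMES:
        raise ValueError(f"unknown deployment type: {deployment_type!r}")
    if ptus <= 0:
        raise ValueError("PTU count must be positive")
    return [
        "az", "cognitiveservices", "account", "deployment", "create",
        "--resource-group", resource_group,
        "--name", account_name,
        "--deployment-name", deployment_name,
        "--model-name", model_name,
        "--model-version", model_version,
        "--model-format", "OpenAI",
        "--sku-capacity", str(ptus),  # number of PTUs
        "--sku-name", SKU_NAMES[deployment_type],
    ]


# Placeholder names are hypothetical; the other values mirror the example table.
args = deployment_create_args("my-resource-group", "my-openai-resource", "gpt-4",
                              "gpt-4", "0613", "Provisioned-Managed", 100)
print(shlex.join(args))
```

If the Azure CLI is installed and signed in, passing the list to `subprocess.run(args, check=True)` would run the command. Building a list rather than a single string avoids shell-quoting problems.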
103104

104-
REST, ARM template, Bicep, and Terraform can also be used to create deployments. See the section on automating deployments in the [Managing Quota](quota.md?tabs=rest#automate-deployment) how-to guide and replace the `sku.name` with "ProvisionedManaged" or "GlobalProvisionedManaged" rather than "Standard."
105+
REST, ARM templates, Bicep, and Terraform can also be used to create deployments. See the section on automating deployments in the [Managing Quota](quota.md?tabs=rest#automate-deployment) how-to guide, and set `sku.name` to `GlobalProvisionedManaged`, `DataZoneProvisionedManaged`, or `ProvisionedManaged` instead of `Standard`.
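For REST, ARM, Bicep, or Terraform, the deployment type is set in the `sku` object of the deployment resource. The following minimal sketch assumes the `Microsoft.CognitiveServices/accounts/deployments` resource shape (`sku.name`, `sku.capacity`, and `properties.model`). The values mirror the example table, and the helper name is hypothetical.

```python
import json

# Provisioned sku names that replace "Standard" in the automation examples.
PROVISIONED_SKUS = {
    "GlobalProvisionedManaged",
    "DataZoneProvisionedManaged",
    "ProvisionedManaged",
}


def provisioned_deployment_body(model_name: str, model_version: str,
                                sku_name: str, ptus: int) -> dict:
    """Return a deployment resource body with a provisioned sku (hypothetical helper)."""
    if sku_name not in PROVISIONED_SKUS:
        raise ValueError(f"not a provisioned sku: {sku_name!r}")
    return {
        # For provisioned skus, capacity is the number of PTUs.
        "sku": {"name": sku_name, "capacity": ptus},
        "properties": {
            "model": {"format": "OpenAI", "name": model_name, "version": model_version},
        },
    }


body = provisioned_deployment_body("gpt-4", "0613", "ProvisionedManaged", 100)
print(json.dumps(body, indent=2))
```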
105106

106-
## Create your provisioned or global provisioned deployment – Capacity is not available
107+
## Create your provisioned deployment – Capacity is not available
107108

108109
Due to the dynamic nature of capacity availability, it is possible that the region of your selected resource might not have the service capacity to create the deployment of the specified model, version, and number of PTUs.
109110

110-
In this event, Azure OpenAI Studio will direct you to other regions with available quota and capacity to create a deployment of the desired model. If this happens, the deployment dialog will look like this:
111+
In this event, Azure AI Foundry will direct you to other regions with available quota and capacity to create a deployment of the desired model. If this happens, the deployment dialog will look like this:
111112

112113
:::image type="content" source="../media/provisioned/deployment-screen-2.png" alt-text="Screenshot of the Azure OpenAI Studio deployment page for a provisioned deployment with no capacity available." lightbox="../media/provisioned/deployment-screen-2.png":::
113114

@@ -166,7 +167,8 @@ The inferencing code for provisioned deployments is the same a standard deployme
166167

167168
## Understanding expected throughput
168169
The throughput you can achieve on the endpoint depends on the number of PTUs deployed, input size, output size, and call rate. The number of concurrent calls and total tokens processed can vary based on these values. We recommend determining the throughput for your deployment as follows:
169-
1. Use the Capacity calculator for a sizing estimate. You can find the capacity calculator in the Azure OpenAI Studio under the quotas page and Provisioned tab.
170+
1. Use the capacity calculator for a sizing estimate. You can find it in Azure AI Foundry on the **Quotas** page, under the **Provisioned** tab.
171+
170172
2. Benchmark the load using real traffic workload. For more information about benchmarking, see the [benchmarking](#run-a-benchmark) section.
171173

172174

@@ -183,7 +185,7 @@ For more information about monitoring your deployments, see the [Monitoring Azur
183185

184186

185187
## Handling high utilization
186-
Provisioned deployments provide you with an allocated amount of compute capacity to run a given model. The ‘Provisioned-Managed Utilization’ metric in Azure Monitor measures the utilization of the deployment in one-minute increments. Provisioned-Managed deployments are also optimized so that calls accepted are processed with a consistent per-call max latency. When the workload exceeds its allocated capacity, the service returns a 429 HTTP status code until the utilization drops down below 100%. The time before retrying is provided in the `retry-after` and `retry-after-ms` response headers that provide the time in seconds and milliseconds respectively. This approach maintains the per-call latency targets while giving the developer control over how to handle high-load situations – for example retry or divert to another experience/endpoint.
188+
Provisioned deployments provide you with an allocated amount of compute capacity to run a given model. The ‘Provisioned-Managed Utilization V2’ metric in Azure Monitor measures the utilization of the deployment in one-minute increments. Provisioned-Managed deployments are also optimized so that accepted calls are processed with a consistent per-call maximum latency. When the workload exceeds its allocated capacity, the service returns HTTP status code 429 until utilization drops below 100%. The `retry-after` and `retry-after-ms` response headers give the time to wait before retrying, in seconds and milliseconds respectively. This approach maintains the per-call latency targets while giving the developer control over how to handle high-load situations, for example by retrying or diverting traffic to another experience or endpoint.
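The header handling described above can be sketched as follows. This is a minimal illustration and not an SDK feature. `send` is a hypothetical callable that returns a `(status, headers)` pair rather than any particular client's response type. The sketch prefers `retry-after-ms` and falls back to `retry-after`. The one-second default and the three-attempt cap are arbitrary assumptions.

```python
import time
from typing import Callable, List, Mapping, Tuple

# Hypothetical response shape: (HTTP status code, response headers).
Response = Tuple[int, Mapping[str, str]]


def retry_delay_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Seconds to wait after a 429, preferring retry-after-ms over retry-after."""
    lowered = {name.lower(): value for name, value in headers.items()}
    for name, divisor in (("retry-after-ms", 1000.0), ("retry-after", 1.0)):
        value = lowered.get(name)
        if value is None:
            continue
        try:
            return max(float(value) / divisor, 0.0)
        except ValueError:
            continue  # unparseable value (e.g. an HTTP-date); try the next header
    return default


def call_with_retries(send: Callable[[], Response], max_attempts: int = 3,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send(), waiting and retrying while it returns 429, up to max_attempts calls."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, headers = send()
    attempts = 1
    while status == 429 and attempts < max_attempts:
        sleep(retry_delay_seconds(headers))
        status, headers = send()
        attempts += 1
    return status, headers  # may still be 429; the caller decides what to do next


# Usage with canned responses instead of a real endpoint.
canned = iter([(429, {"retry-after-ms": "250", "retry-after": "1"}), (200, {})])
waits: List[float] = []
status, _ = call_with_retries(lambda: next(canned), sleep=waits.append)
print(status, waits)  # 200 [0.25]
```

When retries are exhausted, the function returns the final 429 instead of raising. This leaves the choice described in the text to the caller: retry later or divert to another experience or endpoint.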
187189

188190
### What should I do when I receive a 429 response?
189191
A 429 response indicates that the allocated PTUs are fully consumed at the time of the call. The response includes the `retry-after-ms` and `retry-after` headers that tell you the time to wait before the next call will be accepted. How you choose to handle a 429 response depends on your application requirements. Here are some considerations:
