 | Estimating size | Provided calculator in the studio & benchmarking script. |
 
-## How do I get access to Provisioned?
-
-You need to speak with your Microsoft sales/account team to acquire provisioned throughput. If you don't have a sales/account team, unfortunately at this time, you cannot purchase provisioned throughput.
-
 ## What models and regions are available for provisioned throughput?
@@ -55,7 +51,9 @@ Provisioned throughput units (PTU) are units of model processing capacity that y
 
 ### Deployment types
 
-When deploying a model in Azure OpenAI, you need to set the `sku-name` to be Provisioned-Managed. The `sku-capacity` specifies the number of PTUs assigned to the deployment.
+When creating a provisioned deployment in Azure OpenAI Studio, the deployment type on the Create Deployment dialog is Provisioned-Managed.
+
+When creating a provisioned deployment in Azure OpenAI via CLI or API, you need to set the `sku-name` to be Provisioned-Managed. The `sku-capacity` specifies the number of PTUs assigned to the deployment.
 
 ```azurecli
 az cognitiveservices account deployment create \
@@ -83,12 +81,12 @@ The new quota shows up in the AI Studio and Azure OpenAI Studio as a quota item
 
 ## Capacity transparency and quota definitions
 
-Azure OpenAI is a highly sought-after service where customer demand may exceed service GPU capacity. While Microsoft strives to provide capacity for all in-demand regions and models, selling out a region is always a possibility. This can impact some customers’ ability to create a deployment of the desired model, version, PTU count in a desired region. For existing customers, there is a negotiation about regional availability during the quota allocation process that results in a region selection. With the release of the self-service quota changes, this process changes in a few ways:
+Azure OpenAI is a highly sought-after service where customer demand may exceed service GPU capacity. Microsoft strives to provide capacity for all in-demand regions and models, but selling out a region is always a possibility. This can limit some customers’ ability to create a deployment of their desired model, version, or number of PTUs in a desired region -- even if they have quota available in that region. Generally speaking:
 
-- Quota is a limit on the maximum number of PTUs that can be deployed but is not a guarantee of capacity availability. This is the same quota meaning as for other Azure services, such as VMs.
-- Capacity is allocated to a customer at deployment time and is held by that customer for as long as the deployment exists. It isn't released unless the customer deletes or scales down the deployment.
--Quota won't be considered an implicit commitment to pay, and it will not be reclaimed if not used.
--Customers use real-time information on quota/capacity availability to choose for themselves an appropriate region for their scenario that can support the deployment they require.
+- Quota places a limit on the maximum number of PTUs that can be deployed in a subscription and region, but is not a guarantee of capacity availability. This is aligned with how quota works for other Azure services, such as VMs.
+- Capacity is allocated to a customer at deployment time and is held for as long as the deployment exists. If service capacity is not available, the deployment will fail.
+- Customers use real-time information on quota/capacity availability to choose an appropriate region for their scenario with the necessary model capacity.
+- Scaling down or deleting a deployment releases capacity back to the region. There is no guarantee that the capacity will be available should the customer scale up or re-create the deployment later.