Skip to content

Commit ad5f431

Browse files
committed
update
1 parent 9415df9 commit ad5f431

File tree

1 file changed

+2
-3
lines changed

1 file changed

+2
-3
lines changed

articles/ai-services/openai/concepts/provisioned-throughput.md

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -68,13 +68,12 @@ az cognitiveservices account deployment create \
6868

6969
### Quota
7070

71-
Provisioned throughput quota represents a specific amount of total throughput you can deploy. Quota in the Azure OpenAI Service is managed at the subscription level. All Azure OpenAI resources within the subscription share this quota.
71+
Provisioned throughput quota represents a specific amount of total throughput you can deploy. Quota in the Azure OpenAI Service is managed at the subscription level. All Azure OpenAI resources within the subscription share this quota.
7272

73-
Quota is specified in Provisioned throughput units and is specific to a (deployment type, model, region) triplet. Quota isn't interchangeable. Meaning you can't use quota for GPT-4 to deploy GPT-35-turbo. You can raise a support request to move quota across deployment types, models, or regions but the swap isn't guaranteed.
73+
Quota is specified in Provisioned throughput units and is specific to a (deployment type, model, region) triplet. Quota isn't interchangeable. Meaning you can't use quota for GPT-4 to deploy GPT-3.5-Turbo.
7474

7575
While we make every attempt to ensure that quota is deployable, quota doesn't represent a guarantee that the underlying capacity is available. The service assigns capacity during the deployment operation and if capacity is unavailable the deployment fails with an out of capacity error.
7676

77-
7877
### Determining the number of PTUs needed for a workload
7978

8079
PTUs represent an amount of model processing capacity. Similar to your computer or databases, different workloads or requests to the model will consume different amounts of underlying processing capacity. The conversion from call shape characteristics (prompt size, generation size and call rate) to PTUs is complex and non-linear. To simplify this process, you can use the [Azure OpenAI Capacity calculator](https://oai.azure.com/portal/calculator) to size specific workload shapes.

0 commit comments

Comments
 (0)