Commit 4dad929

Learn Editor: Update provisioned-throughput.md

1 parent 366e727 commit 4dad929

File tree

1 file changed, +6 −6 lines changed


articles/ai-services/openai/concepts/provisioned-throughput.md

Lines changed: 6 additions & 6 deletions
```diff
@@ -58,7 +58,7 @@ For a full list see the [Azure OpenAI Service in Azure AI Foundry portal calcula
 
 
 > [!NOTE]
-> Global provisioned deployments are only supported for gpt-4o, 2024-08-06 and gpt-4o-mini, 2024-07-18 models at this time. Data zone provisioned deployments are only supported for gpt-4o, 2024-08-06, gpt-4o, 2024-05-13, and gpt-4o-mini, 2024-07-18 models at this time. For more information on model availability, review the [models documentation](./models.md).
+> Global provisioned deployments are only supported for gpt-4o and gpt-4o-mini models at this time. Data zone provisioned deployments are only supported for gpt-4o and gpt-4o-mini models at this time. For more information on model availability, review the [models documentation](./models.md).
 
 ## Key concepts
```

````diff
@@ -73,11 +73,11 @@ az cognitiveservices account deployment create \
 --name <myResourceName> \
 --resource-group <myResourceGroupName> \
 --deployment-name MyDeployment \
---model-name gpt-4 \
---model-version 0613 \
+--model-name gpt-4o \
+--model-version 2024-08-06 \
 --model-format OpenAI \
---sku-capacity 100 \
---sku-name ProvisionedManaged
+--sku-capacity 15 \
+--sku-name GlobalProvisionedManaged
 ```
 
 ### Quota
````
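For readability, here is the full deployment command as it stands after this commit's changes, assembled from the diff's added and unchanged lines (the `<myResourceName>` and `<myResourceGroupName>` placeholders come from the original snippet):

```shell
# Create a global provisioned deployment of gpt-4o (version 2024-08-06)
# with 15 units of capacity, per the updated example in this commit.
az cognitiveservices account deployment create \
  --name <myResourceName> \
  --resource-group <myResourceGroupName> \
  --deployment-name MyDeployment \
  --model-name gpt-4o \
  --model-version 2024-08-06 \
  --model-format OpenAI \
  --sku-capacity 15 \
  --sku-name GlobalProvisionedManaged
```

Note that this is a CLI invocation against a live Azure subscription, not something runnable standalone; it is included only to show the post-commit state of the example.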
```diff
@@ -132,7 +132,7 @@ If an acceptable region isn't available to support the desire model, version and
 
 ### Determining the number of PTUs needed for a workload
 
-PTUs represent an amount of model processing capacity. Similar to your computer or databases, different workloads or requests to the model will consume different amounts of underlying processing capacity. The conversion from call shape characteristics (prompt size, generation size and call rate) to PTUs is complex and nonlinear. To simplify this process, you can use the [Azure OpenAI Capacity calculator](https://oai.azure.com/portal/calculator) to size specific workload shapes.
+PTUs represent an amount of model processing capacity. Similar to your computer or databases, different workloads or requests to the model will consume different amounts of underlying processing capacity. The conversion from throughput needs to PTUs can be approximated using historical token usage data or call shape estimations (input tokens, output tokens, and requests per minute) as outlined in our [performance and latency](../how-to/latency.md) documentation. To simplify this process, you can use the [Azure OpenAI Capacity calculator](https://oai.azure.com/portal/calculator) to size specific workload shapes.
 
 A few high-level considerations:
 - Generations require more capacity than prompts
```
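The updated paragraph says PTU needs can be approximated from call-shape estimates (input tokens, output tokens, requests per minute), and that generations cost more than prompts. A minimal sketch of that kind of workload-intensity estimate follows; the `OUTPUT_WEIGHT` factor and the function itself are hypothetical illustrations, not the calculator's actual formula, which the doc describes as complex and nonlinear:

```python
# Hedged sketch: rough workload-intensity estimate from call-shape inputs.
# The real PTU conversion is nonlinear and model-specific; use the
# Azure OpenAI Capacity calculator for actual sizing.

OUTPUT_WEIGHT = 3  # hypothetical: generations require more capacity than prompts


def weighted_tokens_per_minute(input_tokens: int, output_tokens: int,
                               requests_per_minute: float) -> float:
    """Approximate workload intensity from per-call token counts and rate."""
    per_call = input_tokens + OUTPUT_WEIGHT * output_tokens
    return per_call * requests_per_minute


# Example call shape: 1,000 input tokens, 200 output tokens, 30 requests/min.
load = weighted_tokens_per_minute(1000, 200, 30)
print(load)  # 48000
```

A number like this only ranks workload shapes against each other; mapping it to a PTU count still requires the capacity calculator or a benchmark against a real deployment.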
