Commit e6e5b64

Merge pull request #3324 from sydneemayers/docs-editor/provisioned-throughput-1741123288
Update provisioned-throughput.md
2 parents 5d46386 + fa93f1e commit e6e5b64

1 file changed, +8 -8 lines changed

articles/ai-services/openai/concepts/provisioned-throughput.md

Lines changed: 8 additions & 8 deletions
@@ -48,14 +48,14 @@ The amount of throughput (tokens per minute or TPM) a deployment gets per PTU is

 To help with simplifying the sizing effort, the following table outlines the TPM per PTU for the specified models. To understand the impact of output tokens on the TPM per PTU limit, use the 3 input token to 1 output token ratio. For a detailed understanding of how different ratios of input and output tokens impact the throughput your workload needs, see the [Azure OpenAI capacity calculator](https://oai.azure.com/portal/calculator). The table also shows Service Level Agreement (SLA) Latency Target Values per model. For more information about the SLA for Azure OpenAI Service, see the [Service Level Agreements (SLA) for Online Services page](https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services?lang=1)

-|Topic| **gpt-4o** | **gpt-4o-mini** |
-| --- | --- | --- |
-|Global & data zone provisioned minimum deployment|15|15|
-|Global & data zone provisioned scale increment|5|5|
-|Regional provisioned minimum deployment | 50 | 25|
-|Regional provisioned scale increment|50|25|
-|Input TPM per PTU | 2,500 | 37,000 |
-|Latency Target Value |25 Tokens Per Second|33 Tokens Per Second|
+|Topic| **gpt-4o** | **gpt-4o-mini** | **o1**|
+| --- | --- | --- | --- |
+|Global & data zone provisioned minimum deployment|15|15|15|
+|Global & data zone provisioned scale increment|5|5|5|
+|Regional provisioned minimum deployment|50|25|50|
+|Regional provisioned scale increment|50|25|50|
+|Input TPM per PTU |2,500|37,000|230|
+|Latency Target Value |25 Tokens Per Second|33 Tokens Per Second|25 Tokens Per Second|

 For a full list see the [Azure OpenAI Service in Azure AI Foundry portal calculator](https://oai.azure.com/portal/calculator).
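
Reviewer note: to sanity-check the new `o1` column, the sketch below shows one way the table's values might be combined into a rough PTU estimate. It is not part of the commit or of any Azure SDK. The helper `estimate_ptus` is hypothetical, and it assumes that a deployment size is the minimum deployment plus a whole number of scale increments and that the workload is measured in input tokens only. Output tokens are not modeled here, so the capacity calculator linked in the changed file remains the authoritative source for real sizing.

```python
import math

# Values copied from the post-commit table in this diff.
INPUT_TPM_PER_PTU = {"gpt-4o": 2_500, "gpt-4o-mini": 37_000, "o1": 230}

# (minimum deployment, scale increment), both in PTUs.
GLOBAL_DATAZONE = {"gpt-4o": (15, 5), "gpt-4o-mini": (15, 5), "o1": (15, 5)}
REGIONAL = {"gpt-4o": (50, 50), "gpt-4o-mini": (25, 25), "o1": (50, 50)}


def estimate_ptus(model: str, input_tpm: int, regional: bool = False) -> int:
    """Hypothetical helper: smallest deployable PTU count covering input_tpm.

    Assumption: valid sizes are minimum + k * increment for k = 0, 1, 2, ...
    Assumption: only input tokens count; output tokens are ignored.
    """
    minimum, increment = (REGIONAL if regional else GLOBAL_DATAZONE)[model]
    # PTUs needed before applying the deployment constraints.
    raw = math.ceil(input_tpm / INPUT_TPM_PER_PTU[model])
    if raw <= minimum:
        return minimum
    # Round the excess over the minimum up to a whole number of increments.
    steps = math.ceil((raw - minimum) / increment)
    return minimum + steps * increment


if __name__ == "__main__":
    for model in INPUT_TPM_PER_PTU:
        print(model, estimate_ptus(model, 100_000), estimate_ptus(model, 100_000, regional=True))
```

Because `o1` is listed at 230 input TPM per PTU against 2,500 for `gpt-4o`, the same input load needs roughly ten times as many PTUs under these assumptions, which is the main practical effect of the added column.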
