Commit dac0360

deduplicating

1 parent 071e5c7 commit dac0360

File tree

1 file changed (+0 −9 lines)


articles/ai-services/openai/how-to/provisioned-throughput-onboarding.md

Lines changed: 0 additions & 9 deletions
```diff
@@ -81,15 +81,6 @@ The amount of throughput (measured in tokens per minute or TPM) a deployment get
 
 For example, for `gpt-4.1:2025-04-14`, 1 output token counts as 4 input tokens towards your utilization limit which matches the [pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). Older models use a different ratio and for a deeper understanding on how different ratios of input and output tokens impact the throughput your workload needs, see the [Azure OpenAI capacity calculator](https://ai.azure.com/resource/calculator).
 
-|Topic| **gpt-4.1** | **gpt-4.1-mini** | **gpt-4o** | **gpt-4o-mini** | **o3-mini** | **o1** |
-| --- | --- | --- | --- | --- | --- | --- |
-|Global & data zone provisioned minimum deployment|15|15|15|15|15|15|
-|Global & data zone provisioned scale increment|5|5|5|5|5|5|
-|Regional provisioned minimum deployment|50|25|50|25|25|25|
-|Regional provisioned scale increment|50|25|50|25|25|50|
-|Input TPM per PTU|3,000|14,900|2,500|37,000|2,500|230|
-|Latency Target Value|44 Tokens Per Second|50 Tokens Per Second|25 Tokens Per Second|33 Tokens Per Second| |25 Tokens Per Second|
-
 |Topic| **gpt-4.1** | **gpt-4.1-mini** | **o3-mini** | **o1** | **gpt-4o** | **gpt-4o-mini** |
 | --- | --- | --- | --- | --- | --- | --- |
 |Global & data zone provisioned minimum deployment|15|15|15|15|15|15|
```
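The context line retained by this diff states that, for `gpt-4.1:2025-04-14`, 1 output token counts as 4 input tokens toward the utilization limit. A minimal sketch of that arithmetic, assuming a hypothetical helper name (the function and the default ratio parameter are illustrative, not part of any Azure API):

```python
def utilization_tokens(input_tokens: int, output_tokens: int, output_ratio: int = 4) -> int:
    """Return the tokens counted toward the per-minute utilization limit.

    Each output token counts as `output_ratio` input tokens; the docs
    give a 4:1 ratio for gpt-4.1:2025-04-14, and note that older models
    use different ratios (this helper is a sketch, not an official API).
    """
    return input_tokens + output_tokens * output_ratio

# 1,000 input tokens + 500 output tokens at a 4:1 ratio
# count as 1,000 + 500 * 4 = 3,000 tokens of utilization.
print(utilization_tokens(1000, 500))
```

For a workload's actual PTU sizing across different input/output mixes, the diff's context line defers to the Azure OpenAI capacity calculator rather than a fixed formula.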
