Skip to content

Commit d87fb1e

Browse files
committed
fixing conflict
2 parents a1e6526 + 9e32297 commit d87fb1e

File tree

3 files changed

+37
-37
lines changed

3 files changed

+37
-37
lines changed

articles/ai-services/openai/concepts/provisioned-throughput.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -192,7 +192,7 @@ The following points are some important takeaways from the table:
192192
| | O4 mini |||||
193193
| **Azure DeepSeek** | DeepSeek-R1 || | | |
194194
| | DeepSeek-V3-0324 || | | |
195-
195+
| | DeepSeek-R1-0528 || | | |
196196

197197

198198
### Region availability for provisioned throughput capability

articles/ai-services/openai/how-to/provisioned-throughput-onboarding.md

Lines changed: 8 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -77,14 +77,14 @@ The amount of throughput (measured in tokens per minute or TPM) a deployment get
7777

7878
For example, for `gpt-4.1:2025-04-14`, 1 output token counts as 4 input tokens towards your utilization limit which matches the [pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). Older models use a different ratio and for a deeper understanding on how different ratios of input and output tokens impact the throughput your workload needs, see the [Azure AI Foundry PTU quota calculator](https://ai.azure.com/resource/calculator).
7979

80-
|Topic| **o4-mini** | **gpt-4.1** | **gpt-4.1-mini** | **gpt-4.1-nano** | **o3** | **o3-mini** | **o1** | **gpt-4o** | **gpt-4o-mini** | **DeepSeek-R1** | **DeepSeek-V3-0324** |
81-
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
82-
|Global & data zone provisioned minimum deployment| 15 | 15|15| 15 | 15 |15|15|15|15| 100|100|
83-
|Global & data zone provisioned scale increment| 5 | 5|5| 5 | 5 |5|5|5|5| 100|100|
84-
|Regional provisioned minimum deployment|25| 50|25| 25 |50 | 25|25|50|25| NA|NA|
85-
|Regional provisioned scale increment|25| 50|25| 25 | 50 | 25|50|50|25|NA|NA|
86-
|Input TPM per PTU|5,400 | 3,000|14,900| 59,400 | 3,000 | 2,500|230|2,500|37,000|4,000|4,000|
87-
|Latency Target Value| 99% > 66 Tokens Per Second\* | 99% > 40 Tokens Per Second\* | 99% > 50 Tokens Per Second\*| 99% > 60 Tokens Per Second\* | 99% > 40 Tokens Per Second\* | 99% > 66 Tokens Per Second\* | 99% > 25 Tokens Per Second\* | 99% > 25 Tokens Per Second\* | 99% > 33 Tokens Per Second\* | 99% > 50 Tokens Per Second\*| 99% > 50 Tokens Per Second\*|
80+
|Topic| **o4-mini** | **gpt-4.1** | **gpt-4.1-mini** | **gpt-4.1-nano** | **o3** | **o3-mini** | **o1** | **gpt-4o** | **gpt-4o-mini** | **DeepSeek-R1** | **DeepSeek-V3-0324** | **DeepSeek-R1-0528** |
81+
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
82+
|Global & data zone provisioned minimum deployment| 15 | 15|15| 15 | 15 |15|15|15|15| 100|100|100|
83+
|Global & data zone provisioned scale increment| 5 | 5|5| 5 | 5 |5|5|5|5| 100|100|100|
84+
|Regional provisioned minimum deployment|25| 50|25| 25 |50 | 25|25|50|25| NA|NA|NA|
85+
|Regional provisioned scale increment|25| 50|25| 25 | 50 | 25|50|50|25|NA|NA|NA|
86+
|Input TPM per PTU|5,400 | 3,000|14,900| 59,400 | 3,000 | 2,500|230|2,500|37,000|4,000|4,000|4,000|
87+
|Latency Target Value| 99% > 66 Tokens Per Second\* | 99% > 40 Tokens Per Second\* | 99% > 50 Tokens Per Second\*| 99% > 60 Tokens Per Second\* | 99% > 40 Tokens Per Second\* | 99% > 66 Tokens Per Second\* | 99% > 25 Tokens Per Second\* | 99% > 25 Tokens Per Second\* | 99% > 33 Tokens Per Second\* | 99% > 50 Tokens Per Second\*| 99% > 50 Tokens Per Second\*| 99% > 50 Tokens Per Second\*|
8888

8989
\* Calculated as the average request latency on a per-minute basis across the month.
9090

0 commit comments

Comments
 (0)