@@ -83,14 +83,14 @@ For example, for gpt-5 1 output token counts as 8 input tokens towards your util
 > [!NOTE]
 > gpt-4.1, gpt-4.1-mini and gpt-4.1-nano don't support long context (requests estimated at larger than 128k prompt tokens).

-| Topic| **gpt-5** | **gpt-4.1** | **gpt-4.1-mini** | **gpt-4.1-nano** | **o3** | **o4-mini** |
-| --- | --- | --- | --- | --- | --- | --- |
-| Global & data zone provisioned minimum deployment| 15 | 15| 15| 15 | 15 | 15 |
-| Global & data zone provisioned scale increment| 5 | 5| 5| 5 | 5 | 5 |
-| Regional provisioned minimum deployment| 50 | 50| 25| 25 | 50 | 25|
-| Regional provisioned scale increment| 50 | 50| 25| 25 | 50 | 25|
-| Input TPM per PTU| 4,750 | 3,000| 14,900| 59,400 | 3,000 | 5,400 |
-| Latency Target Value| 99% > 50 Tokens Per Second\* | 99% > 80 Tokens Per Second\* | 99% > 90 Tokens Per Second\* | 99% > 100 Tokens Per Second\* | 99% > 80 Tokens Per Second\* | 99% > 90 Tokens Per Second\* |
+| Topic| **gpt-5** | **gpt-5-mini** | **gpt-4.1** | **gpt-4.1-mini** | **gpt-4.1-nano** | **o3** | **o4-mini** |
+| --- | --- | --- | --- | --- | --- | --- | --- |
+| Global & data zone provisioned minimum deployment| 15 | 15 | 15 | 15 | 15 | 15 | 15 |
+| Global & data zone provisioned scale increment| 5 | 5 | 5 | 5 | 5 | 5 | 5 |
+| Regional provisioned minimum deployment| 50 | 25 | 50 | 25 | 25 | 50 | 25 |
+| Regional provisioned scale increment| 50 | 25 | 50 | 25 | 25 | 50 | 25 |
+| Input TPM per PTU| 4,750 | 23,750 | 3,000 | 14,900 | 59,400 | 3,000 | 5,400 |
+| Latency Target Value| 99% > 50 Tokens Per Second\* | 99% > 80 Tokens Per Second\* | 99% > 80 Tokens Per Second\* | 99% > 90 Tokens Per Second\* | 99% > 100 Tokens Per Second\* | 99% > 80 Tokens Per Second\* | 99% > 90 Tokens Per Second\* |

 \* Calculated as p50 request latency on a per 5 minute basis.
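The hunk's context line notes that for gpt-5, 1 output token counts as 8 input tokens towards the utilization limit, and the table gives each model's Input TPM per PTU, minimum deployment, and scale increment. As an illustrative sketch only (not the official capacity calculator; the rounding-to-increment logic here is an assumption), those numbers could be combined into a rough PTU estimate:

```python
import math

# Figures from the table above (global & data zone deployments).
INPUT_TPM_PER_PTU = {"gpt-5": 4_750, "gpt-5-mini": 23_750, "gpt-4.1": 3_000}
OUTPUT_TOKEN_WEIGHT = 8  # gpt-5: 1 output token counts as 8 input tokens
MINIMUM_PTUS = 15        # global & data zone provisioned minimum deployment
SCALE_INCREMENT = 5      # global & data zone provisioned scale increment

def estimate_ptus(model: str, input_tpm: int, output_tpm: int) -> int:
    """Rough PTU sizing sketch; real workloads should use the capacity calculator."""
    weighted_tpm = input_tpm + OUTPUT_TOKEN_WEIGHT * output_tpm
    raw = math.ceil(weighted_tpm / INPUT_TPM_PER_PTU[model])
    if raw <= MINIMUM_PTUS:
        return MINIMUM_PTUS
    # Assumed: round up to the next valid scale increment above the minimum.
    return MINIMUM_PTUS + math.ceil((raw - MINIMUM_PTUS) / SCALE_INCREMENT) * SCALE_INCREMENT

# 100k input + 10k output tokens/min weighs in at 180k TPM -> 38 raw PTUs -> 40 deployable.
print(estimate_ptus("gpt-5", 100_000, 10_000))
```

Note that the 8:1 weighting is stated for gpt-5 in this doc; other models may weight output tokens differently.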