Commit cecef0d

Update provisioned-throughput-onboarding.md
1 parent 99513a1 commit cecef0d

1 file changed: +0 -20 lines changed

articles/ai-services/openai/how-to/provisioned-throughput-onboarding.md

Lines changed: 0 additions & 20 deletions
@@ -82,26 +82,6 @@ The amount of throughput (measured in tokens per minute or TPM) a deployment get
 For example, for 'gpt-4.1:2025-04-14', 1 output token counts as 4 input tokens towards your utilization limit which matches the [pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). Older models use a different ratio and for a deeper understanding on how different ratios of input and output tokens impact the throughput your workload needs, see the [Azure OpenAI capacity calculator](https://ai.azure.com/resource/calculator).
 
 
-
-
-
-
-
-
-To understand how much throughput (TPU) you get, keep the following in mind:
-
-* The amount of throughput (measured in tokens per minute or TPM) a deployment gets per PTU is a function of the input and output tokens in a given minute.
-* Generating output tokens requires more processing than input tokens. Provisioned-Managed matches the Standard offering for gpt-4.1 models and later.
-* 1 output token now counts as 4 input tokens towards your TPM-per-PTU limit.
-* In the standard offering, 1 output token is 4 times expensive as an input token. See the [pricing page for details](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/).
-
-To help simplify the sizing effort, the following table outlines the TPM-per-PTU for the specified models.
-To understand the impact of output tokens on the TPM-per-PTU limit, use the 4 input token to 1 output token ratio.
-
-For a detailed understanding of how different ratios of input and output tokens impact the throughput your workload needs, see the [Azure OpenAI capacity calculator](https://ai.azure.com/resource/calculator). The table also shows the [Service Level Agreement (SLA)](https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services?lang=1) Latency Target Values per model.
-
-
-
 |Topic| **gpt-4o** | **gpt-4o-mini** | **o1**|
 | --- | --- | --- | --- |
 |Global & data zone provisioned minimum deployment|15|15|15|
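
The 4-to-1 output-token weighting that survives in the retained paragraph can be expressed as a short calculation. The sketch below is illustrative only, not part of any documented API: the function names `weighted_tpm` and `ptus_needed` and the 2,500 TPM-per-PTU figure are hypothetical placeholders, and real per-model rates should come from the [Azure OpenAI capacity calculator](https://ai.azure.com/resource/calculator).

```python
# Minimal sketch of the 4:1 input/output token weighting described above.
# Assumption: tpm_per_ptu is a placeholder parameter; look up the actual
# per-model value in the Azure OpenAI docs or the capacity calculator.
import math


def weighted_tpm(input_tpm: int, output_tpm: int, output_weight: int = 4) -> int:
    """Tokens per minute counted against the utilization limit.

    For 'gpt-4.1:2025-04-14' and later, 1 output token counts as 4 input
    tokens, so the weighted total is input_tpm + 4 * output_tpm.
    """
    return input_tpm + output_weight * output_tpm


def ptus_needed(input_tpm: int, output_tpm: int, tpm_per_ptu: int) -> int:
    """Rough PTU estimate for a workload, rounded up to a whole PTU."""
    return math.ceil(weighted_tpm(input_tpm, output_tpm) / tpm_per_ptu)


# A workload with 60,000 input TPM and 10,000 output TPM consumes
# 60,000 + 4 * 10,000 = 100,000 weighted TPM.
print(weighted_tpm(60_000, 10_000))        # 100000
print(ptus_needed(60_000, 10_000, 2_500))  # 40, with a hypothetical 2,500 TPM per PTU
```

A real sizing pass would also respect the per-model minimum deployment size and purchase increments shown in the table above.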
