
Commit 99513a1

Update provisioned-throughput-onboarding.md
1 parent bc0bca6 commit 99513a1

File tree

1 file changed: +14 -0 lines changed

articles/ai-services/openai/how-to/provisioned-throughput-onboarding.md

Lines changed: 14 additions & 0 deletions
@@ -74,6 +74,20 @@ Customers that require long-term usage of provisioned, data zoned provisioned, a
## How much throughput per PTU you get for each model

The amount of throughput (measured in tokens per minute, or TPM) a deployment gets per PTU is a function of the input and output tokens in a given minute. Generating output tokens requires more processing than input tokens. Starting with the GPT-4.1 models, the system matches the global standard price ratio between input and output tokens. Cached tokens are deducted 100% from the utilization.

For example, for 'gpt-4.1:2025-04-14', 1 output token counts as 4 input tokens toward your utilization limit, which matches the [pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). Older models use a different ratio; for a deeper understanding of how different ratios of input and output tokens affect the throughput your workload needs, see the [Azure OpenAI capacity calculator](https://ai.azure.com/resource/calculator).
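To make this concrete, the following is a minimal sketch of how one minute of utilization could be estimated in input-token equivalents, assuming the 4:1 output-to-input ratio for gpt-4.1 and the 100% cached-token deduction described above. The function name and sample numbers are illustrative, not part of any Azure OpenAI API.

```python
# Illustrative sketch only: estimate one minute of PTU utilization in
# input-token equivalents for a gpt-4.1 deployment. The 4:1 ratio and the
# 100% cached-token deduction follow the text above; the function name and
# sample numbers are hypothetical.

OUTPUT_TO_INPUT_RATIO = 4  # gpt-4.1: 1 output token counts as 4 input tokens

def input_equivalent_tpm(input_tokens: int, cached_tokens: int, output_tokens: int) -> int:
    """Return one minute's utilization, expressed as input-token equivalents.

    Cached tokens are deducted 100%, so only uncached input tokens count;
    each output token is weighted by the output-to-input ratio.
    """
    uncached_input = input_tokens - cached_tokens
    return uncached_input + OUTPUT_TO_INPUT_RATIO * output_tokens

# Example minute: 60,000 input tokens (20,000 of them served from cache)
# and 10,000 output tokens:
#   (60,000 - 20,000) + 4 * 10,000 = 80,000 input-equivalent TPM
print(input_equivalent_tpm(60_000, 20_000, 10_000))  # 80000
```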
To understand how much throughput per PTU you get, keep the following in mind:

* The amount of throughput (measured in tokens per minute or TPM) a deployment gets per PTU is a function of the input and output tokens in a given minute.
