
Commit 99513a1

Update provisioned-throughput-onboarding.md
1 parent bc0bca6 commit 99513a1

File tree

1 file changed: +14 -0 lines changed

articles/ai-services/openai/how-to/provisioned-throughput-onboarding.md

Lines changed: 14 additions & 0 deletions
@@ -74,6 +74,20 @@ Customers that require long-term usage of provisioned, data zoned provisioned, a
## How much throughput per PTU you get for each model

The amount of throughput (measured in tokens per minute, or TPM) a deployment gets per PTU is a function of the input and output tokens in a given minute. Generating output tokens requires more processing than input tokens. Starting with the GPT-4.1 models, the system matches the global standard price ratio between input and output tokens. Cached tokens are deducted 100% from the utilization.

For example, for 'gpt-4.1:2025-04-14', 1 output token counts as 4 input tokens toward your utilization limit, which matches the [pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). Older models use a different ratio; for a deeper understanding of how different ratios of input and output tokens affect the throughput your workload needs, see the [Azure OpenAI capacity calculator](https://ai.azure.com/resource/calculator).
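To make this concrete, the following is a minimal sketch of how one minute of utilization could be estimated in input-token equivalents, assuming the 4:1 output-to-input ratio for gpt-4.1 and the 100% cached-token deduction described above. The function name and sample numbers are illustrative, not part of any Azure OpenAI API.

```python
# Illustrative sketch only: estimate one minute of PTU utilization in
# input-token equivalents for a gpt-4.1 deployment. The 4:1 ratio and the
# 100% cached-token deduction follow the text above; the function name and
# sample numbers are hypothetical.

OUTPUT_TO_INPUT_RATIO = 4  # gpt-4.1: 1 output token counts as 4 input tokens

def input_equivalent_tpm(input_tokens: int, cached_tokens: int, output_tokens: int) -> int:
    """Return one minute's utilization, expressed as input-token equivalents.

    Cached tokens are deducted 100%, so only uncached input tokens count;
    each output token is weighted by the output-to-input ratio.
    """
    uncached_input = input_tokens - cached_tokens
    return uncached_input + OUTPUT_TO_INPUT_RATIO * output_tokens

# Example minute: 60,000 input tokens (20,000 of them served from cache)
# and 10,000 output tokens:
#   (60,000 - 20,000) + 4 * 10,000 = 80,000 input-equivalent TPM
print(input_equivalent_tpm(60_000, 20_000, 10_000))  # 80000
```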
To understand how much throughput per PTU you get, keep the following in mind:

* The amount of throughput (measured in tokens per minute or TPM) a deployment gets per PTU is a function of the input and output tokens in a given minute.
