Merge pull request #1691 from sydneemayers/docs-editor/provisioned-throughput-1732290457

prmerger-automator[bot] · web-flow · commit 956b22b22532 · 2024-11-22T15:59:37.000Z
Update provisioned-throughput.md
diff --git a/articles/ai-services/openai/concepts/provisioned-throughput.md b/articles/ai-services/openai/concepts/provisioned-throughput.md
@@ -41,14 +41,17 @@ An Azure OpenAI Deployment is a unit of management for a specific OpenAI Model.
 ## How much throughput per PTU you get for each model
 The amount of throughput (tokens per minute, or TPM) a deployment gets per PTU depends on the mix of input and output tokens processed in that minute. Generating output tokens requires more processing than reading input tokens, so the more output tokens a workload generates, the lower its overall TPM. The service dynamically balances the input and output costs, so users do not have to set specific input and output limits. This approach makes your deployment resilient to fluctuations in the workload shape.
 
-To help with simplifying the sizing effort, the following table outlines the TPM per PTU for the `gpt-4o` and `gpt-4o-mini` models which represent the max all the traffic is either input or output. The table also shows Service Level Agreement (SLA) Latency Target Values per model.  For more information about the SLA for Azure OpenAI Service, see the [Service Level Agreements (SLA) for Online Services page].(https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services?lang=1)
+To simplify sizing, the following table lists the TPM per PTU for the `gpt-4o` and `gpt-4o-mini` models. These values represent the maximum TPM, assuming all traffic is either input or output. To understand how different ratios of input and output tokens affect your maximum TPM per PTU, see the [Azure OpenAI capacity calculator](https://oai.azure.com/portal/calculator). The table also shows Service Level Agreement (SLA) Latency Target Values per model. For more information about the SLA for Azure OpenAI Service, see the [Service Level Agreements (SLA) for Online Services page](https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services?lang=1).
 
-|     | **gpt-4o**, **2024-05-13**   & **gpt-4o**, **2024-08-06**   | **gpt-4o-mini**, **2024-07-18**   |
+|| **gpt-4o**, **2024-05-13**   & **gpt-4o**, **2024-08-06**   | **gpt-4o-mini**, **2024-07-18**   |
 | --- | --- | --- |
-| Deployable Increments | 50 | 25|
+|Global provisioned minimum deployment|15|15|
+|Global provisioned scale increment|5|5|
+| Regional provisioned minimum deployment | 50 | 25|
+|Regional provisioned scale increment|50|25|
 |Max Input TPM per PTU | 2,500 | 37,000  |
 |Max Output TPM per PTU| 833|12,333|
-| Latency Target Value |25 Tokens Per Second* |33 Tokens Per Second*|
+| Latency Target Value |25 Tokens Per Second|33 Tokens Per Second|
 
 For a full list, see the [AOAI Studio calculator](https://oai.azure.com/portal/calculator).
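
As a rough illustration of how the table values combine, the sketch below sizes a `gpt-4o` deployment for a workload that is entirely input tokens or entirely output tokens. The per-PTU limits, minimums, and increments are copied from the table; the 100,000 TPM target, the function names, and the dictionary layout are hypothetical, and the code assumes that deployable sizes are the minimum plus whole multiples of the scale increment. It does not model mixed traffic, which is what the capacity calculator is for.

```python
# Values copied from the gpt-4o column of the table above.
# The dictionary layout and function names are illustrative, not part of any Azure SDK.
GPT_4O = {
    "max_input_tpm_per_ptu": 2_500,
    "max_output_tpm_per_ptu": 833,
    "global": {"minimum": 15, "increment": 5},
    "regional": {"minimum": 50, "increment": 50},
}


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def round_up_to_deployable(ptus: int, minimum: int, increment: int) -> int:
    """Smallest deployable size >= ptus.

    Assumption: deployable sizes are minimum, minimum + increment, and so on.
    """
    if ptus <= minimum:
        return minimum
    return minimum + ceil_div(ptus - minimum, increment) * increment


def size_for_single_direction(target_tpm: int, tpm_per_ptu: int, offer: dict) -> int:
    """PTUs to deploy for a workload that is all input or all output tokens."""
    raw_ptus = ceil_div(target_tpm, tpm_per_ptu)
    return round_up_to_deployable(raw_ptus, offer["minimum"], offer["increment"])


if __name__ == "__main__":
    target_tpm = 100_000  # hypothetical workload target
    for offer_name in ("global", "regional"):
        offer = GPT_4O[offer_name]
        in_ptus = size_for_single_direction(
            target_tpm, GPT_4O["max_input_tpm_per_ptu"], offer
        )
        out_ptus = size_for_single_direction(
            target_tpm, GPT_4O["max_output_tpm_per_ptu"], offer
        )
        print(offer_name, in_ptus, out_ptus)
    # prints:
    # global 40 125
    # regional 50 150
```

The two results bracket the real requirement: an input-only workload is the cheapest case and an output-only workload is the most expensive, since the table puts the output limit at roughly one third of the input limit per PTU.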