`articles/ai-services/openai/how-to/provisioned-throughput-onboarding.md` (7 additions, 5 deletions)
---
title: Azure OpenAI Service Provisioned Throughput Units (PTU) onboarding
description: Learn about provisioned throughput units onboarding and Azure OpenAI.
ms.service: azure-ai-openai
ms.topic: conceptual
ms.date: 06/25/2024
manager: nitinme
author: mrbullwinkle
ms.author: mbullwin
---
The **Provisioned** option and the capacity planner are only available in certain regions.

| Input | Description |
|---|---|
| Model | OpenAI model you plan to use. For example: GPT-4 |
| Version | Version of the model you plan to use. For example: 0614 |
| Peak calls per min | The number of calls per minute that are expected to be sent to the model. |
| Tokens in prompt call | The number of tokens in the prompt for each call to the model. Calls with larger prompts utilize more of the PTU deployment. Currently, this calculator assumes a single prompt value, so for workloads with wide variance we recommend benchmarking your deployment on your traffic to determine the most accurate estimate of PTU needed for your deployment. |
| Tokens in model response | The number of tokens generated from each call to the model. Calls with larger generation sizes utilize more of the PTU deployment. Currently, this calculator assumes a single prompt value, so for workloads with wide variance we recommend benchmarking your deployment on your traffic to determine the most accurate estimate of PTU needed for your deployment. |
After you fill in the required details, select the **Calculate** button in the output column.

The values in the output column are the estimated PTU required for the provided workload inputs. The first output value is the estimated PTU requirement rounded to the nearest PTU scale increment. The second output value is the raw estimated PTU requirement. The token totals are calculated using the following equation: `Total = Peak calls per minute * (Tokens in prompt call + Tokens in model response)`.
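As a rough illustration, the arithmetic above can be sketched in Python. The function names, the sample workload numbers, and the rounding increment below are assumptions for illustration only; they are not part of the official calculator.

```python
def peak_tokens_per_minute(peak_calls_per_min: int,
                           tokens_in_prompt: int,
                           tokens_in_response: int) -> int:
    # Total = Peak calls per minute * (Tokens in prompt call + Tokens in model response)
    return peak_calls_per_min * (tokens_in_prompt + tokens_in_response)

def round_to_increment(raw_ptu: float, increment: int) -> int:
    # Round a raw PTU estimate to the nearest PTU scale increment.
    # The increment varies by model/deployment; 50 here is only a placeholder.
    return int(round(raw_ptu / increment)) * increment

# Hypothetical workload: 30 peak calls/min, 1,000 prompt tokens, 500 response tokens.
total = peak_tokens_per_minute(30, 1000, 500)
print(total)                          # 45000 tokens per minute at peak
print(round_to_increment(237.0, 50))  # 250
```

For spiky traffic, feeding a single "typical" prompt size into this estimate understates peak usage, which is why the calculator guidance recommends benchmarking against real traffic.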
:::image type="content" source="../media/how-to/provisioned-onboarding/capacity-calculator.png" alt-text="Screenshot of the Azure OpenAI capacity calculator." lightbox="../media/how-to/provisioned-onboarding/capacity-calculator.png":::