Commit f1884b5: update

1 parent: feb2f4b

File tree: 1 file changed (+7, -4 lines)

articles/ai-services/openai/how-to/provisioned-throughput-onboarding.md

Lines changed: 7 additions & 4 deletions
@@ -44,11 +44,14 @@ The **Provisioned** option and the capacity planner are only available in certain
 |---|---|
 | Model | OpenAI model you plan to use. For example: GPT-4 |
 | Version | Version of the model you plan to use, for example 0614 |
-| Prompt tokens | Number of tokens in the prompt for each call |
-| Generation tokens | Number of tokens generated by the model on each call |
-| Peak calls per minute | Peak concurrent load to the endpoint, measured in calls per minute |
+| Peak calls per minute | The number of calls per minute expected to be sent to the model |
+| Tokens in prompt call | The number of tokens in the prompt for each call to the model. Calls with larger prompts use more of the PTU deployment. The calculator currently assumes a single prompt value, so for workloads with wide variance we recommend benchmarking your deployment against your own traffic to determine the most accurate PTU estimate for your deployment. |
+| Tokens in model response | The number of tokens generated from each call to the model. Calls with larger responses use more of the PTU deployment. The calculator currently assumes a single response value, so for workloads with wide variance we recommend benchmarking your deployment against your own traffic to determine the most accurate PTU estimate for your deployment. |
 
-After you fill in the required details, select **Calculate** to view the suggested PTU for your scenario.
+After you fill in the required details, select the **Calculate** button in the output column.
+
+The values in the output column are the estimated PTU units required for the provided workload inputs. The first output value is the estimate rounded to the nearest PTU scale increment; the second is the raw estimate. The token total is calculated as `Total = Peak calls per minute * (Tokens in prompt call + Tokens in model response)`.
 
 :::image type="content" source="../media/how-to/provisioned-onboarding/capacity-calculator.png" alt-text="Screenshot of the Azure OpenAI Studio landing page." lightbox="../media/how-to/provisioned-onboarding/capacity-calculator.png":::
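As an illustration, the token-total equation from the added paragraph can be sketched in Python. The function name and the workload numbers below are hypothetical examples for demonstration, not values from the article:

```python
def total_tokens_per_minute(peak_calls_per_minute: int,
                            tokens_in_prompt_call: int,
                            tokens_in_model_response: int) -> int:
    """Total = Peak calls per minute * (Tokens in prompt call + Tokens in model response)."""
    return peak_calls_per_minute * (tokens_in_prompt_call + tokens_in_model_response)

# Hypothetical workload: 60 calls/min, 1,000-token prompts, 500-token responses.
print(total_tokens_per_minute(60, 1000, 500))  # 90000 tokens per minute
```

Note this yields total tokens per minute only; the calculator then converts that figure into a PTU estimate and rounds it to the nearest PTU scale increment.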
