Commit f1884b5: update

1 parent: feb2f4b

File tree: 1 file changed (+7, -4 lines)

articles/ai-services/openai/how-to/provisioned-throughput-onboarding.md

Lines changed: 7 additions & 4 deletions
@@ -44,11 +44,14 @@ The **Provisioned** option and the capacity planner are only available in certain
 |---|---|
 | Model | OpenAI model you plan to use. For example: GPT-4 |
 | Version | Version of the model you plan to use, for example 0614 |
-| Prompt tokens | Number of tokens in the prompt for each call |
-| Generation tokens | Number of tokens generated by the model on each call |
-| Peak calls per minute | Peak concurrent load to the endpoint, measured in calls per minute |
+| Peak calls per minute | The number of calls per minute expected to be sent to the model |
+| Tokens in prompt call | The number of tokens in the prompt for each call to the model. Calls with larger prompts use more of the PTU deployment. The calculator currently assumes a single prompt value, so for workloads with wide variance we recommend benchmarking your deployment against your own traffic to determine the most accurate PTU estimate for your deployment. |
+| Tokens in model response | The number of tokens generated from each call to the model. Calls with larger responses use more of the PTU deployment. The calculator currently assumes a single response value, so for workloads with wide variance we recommend benchmarking your deployment against your own traffic to determine the most accurate PTU estimate for your deployment. |
 
-After you fill in the required details, select **Calculate** to view the suggested PTU for your scenario.
+After you fill in the required details, select the **Calculate** button in the output column.
+
+The values in the output column are the estimated PTU units required for the provided workload inputs. The first output value is the estimate rounded to the nearest PTU scale increment; the second is the raw estimate. The token total is calculated as `Total = Peak calls per minute * (Tokens in prompt call + Tokens in model response)`.
 
 :::image type="content" source="../media/how-to/provisioned-onboarding/capacity-calculator.png" alt-text="Screenshot of the Azure OpenAI Studio landing page." lightbox="../media/how-to/provisioned-onboarding/capacity-calculator.png":::
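As an illustration, the token-total equation from the added paragraph can be sketched in Python. The function name and the workload numbers below are hypothetical examples for demonstration, not values from the article:

```python
def total_tokens_per_minute(peak_calls_per_minute: int,
                            tokens_in_prompt_call: int,
                            tokens_in_model_response: int) -> int:
    """Total = Peak calls per minute * (Tokens in prompt call + Tokens in model response)."""
    return peak_calls_per_minute * (tokens_in_prompt_call + tokens_in_model_response)

# Hypothetical workload: 60 calls/min, 1,000-token prompts, 500-token responses.
print(total_tokens_per_minute(60, 1000, 500))  # 90000 tokens per minute
```

Note this yields total tokens per minute only; the calculator then converts that figure into a PTU estimate and rounds it to the nearest PTU scale increment.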
