Commit 9df751c

Authored by Jill Grant
Merge pull request #279225 from mrbullwinkle/mrb_06_25_2024_PTU_calculator
[Azure OpenAI] update screenshot
2 parents: a59859b + cda1528

File tree

2 files changed: +7 -5 lines changed

2 files changed

+7
-5
lines changed

articles/ai-services/openai/how-to/provisioned-throughput-onboarding.md

Lines changed: 7 additions & 5 deletions
@@ -3,7 +3,7 @@ title: Azure OpenAI Service Provisioned Throughput Units (PTU) onboarding
 description: Learn about provisioned throughput units onboarding and Azure OpenAI.
 ms.service: azure-ai-openai
 ms.topic: conceptual
-ms.date: 05/02/2024
+ms.date: 06/25/2024
 manager: nitinme
 author: mrbullwinkle
 ms.author: mbullwin
@@ -44,11 +44,13 @@ The **Provisioned** option and the capacity planner are only available in certain
 |---|---|
 |Model | OpenAI model you plan to use. For example: GPT-4 |
 | Version | Version of the model you plan to use, for example 0614 |
-| Prompt tokens | Number of tokens in the prompt for each call |
-| Generation tokens | Number of tokens generated by the model on each call |
-| Peak calls per minute | Peak concurrent load to the endpoint measured in calls per minute |
+| Peak calls per min | The number of calls per minute that are expected to be sent to the model |
+| Tokens in prompt call | The number of tokens in the prompt for each call to the model. Calls with larger prompts utilize more of the PTU deployment. Currently, this calculator assumes a single prompt value, so for workloads with wide variance we recommend benchmarking your deployment on your traffic to determine the most accurate estimate of PTU needed for your deployment. |
+| Tokens in model response | The number of tokens generated from each call to the model. Calls with larger generation sizes utilize more of the PTU deployment. Currently, this calculator assumes a single response value, so for workloads with wide variance we recommend benchmarking your deployment on your traffic to determine the most accurate estimate of PTU needed for your deployment. |

-After you fill in the required details, select **Calculate** to view the suggested PTU for your scenario.
+After you fill in the required details, select the **Calculate** button in the output column.
+
+The values in the output column are the estimated number of PTU units required for the provided workload inputs. The first output value represents the estimated PTU units required for the workload, rounded to the nearest PTU scale increment. The second output value represents the raw estimated PTU units required for the workload. The token totals are calculated using the following equation: `Total = Peak calls per minute * (Tokens in prompt call + Tokens in model response)`.

 :::image type="content" source="../media/how-to/provisioned-onboarding/capacity-calculator.png" alt-text="Screenshot of the Azure OpenAI Studio landing page." lightbox="../media/how-to/provisioned-onboarding/capacity-calculator.png":::

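The token-total equation added in this commit can be sketched as a small worked example. This is a minimal illustration of the arithmetic only; the workload numbers below are hypothetical, and the calculator's mapping from token totals to a PTU count is not shown here because it is not specified in this diff.

```python
# Worked example of the capacity calculator's token-total equation:
#   Total = Peak calls per minute * (Tokens in prompt call + Tokens in model response)
# All input values below are hypothetical, for illustration only.

def total_tokens_per_minute(peak_calls_per_min: int,
                            prompt_tokens: int,
                            response_tokens: int) -> int:
    """Peak tokens per minute, per the calculator's stated equation."""
    return peak_calls_per_min * (prompt_tokens + response_tokens)

total = total_tokens_per_minute(peak_calls_per_min=60,
                                prompt_tokens=1000,
                                response_tokens=500)
print(total)  # 60 * (1000 + 500) = 90000
```

As the table rows note, a single prompt/response size is assumed; for workloads with wide variance, benchmarking against real traffic gives a more accurate estimate.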
articles/ai-services/openai/media/how-to/provisioned-onboarding/capacity-calculator.png (binary image, 26.1 KB)

0 commit comments
