Commit 2e7df4b

Learn Editor: Update latency.md
1 parent 21d1369 commit 2e7df4b

1 file changed

+4
-0
lines changed

articles/ai-services/openai/how-to/latency.md

Lines changed: 4 additions & 0 deletions
@@ -34,6 +34,10 @@ Understanding system level throughput for any workload involves multiple factors
##### Determining TPM from Azure Monitor metrics

+One approach to estimating system level throughput for a given workload is to use historical usage data. For Azure OpenAI workloads, all historical usage data can be accessed and visualized with the native monitoring capabilities offered within Azure OpenAI. Two metrics are needed to estimate system level throughput for Azure OpenAI workloads: (1) Processed Prompt Tokens and (2) Generated Completion Tokens.
+
+When combined, the Processed Prompt Tokens (input TPM) and Generated Completion Tokens (output TPM) provide an aggregated view of system level throughput based on actual historical traffic. These metrics can be analyzed using minimum, average, and maximum aggregations over a range of time periods. Analyze this data over a multi-week time horizon so that there are enough data points to assess.
+
##### Calculating TPM from request data

##### Estimating TPM from common workload shapes
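
As an illustration of the paragraphs added under "Determining TPM from Azure Monitor metrics", the following is a minimal sketch of combining the two metrics into a total tokens-per-minute (TPM) figure and summarizing it with minimum, average, and maximum aggregations. It assumes the per-minute values have already been exported from the Processed Prompt Tokens and Generated Completion Tokens metrics. The function name `summarize_tpm` and the sample numbers are hypothetical and not part of the commit or of any Azure SDK.

```python
from statistics import mean


def summarize_tpm(prompt_tokens_per_min, completion_tokens_per_min):
    """Combine per-minute input and output token counts into total TPM stats.

    Both arguments are assumed to be equal-length sequences in which index i
    holds the token count for the same one-minute interval.
    """
    if len(prompt_tokens_per_min) != len(completion_tokens_per_min):
        raise ValueError("both series must cover the same minutes")
    if not prompt_tokens_per_min:
        raise ValueError("at least one data point is required")

    # Total TPM for each minute = input tokens + output tokens.
    total = [p + c for p, c in zip(prompt_tokens_per_min, completion_tokens_per_min)]
    return {"min": min(total), "avg": mean(total), "max": max(total)}


# Hypothetical samples standing in for exported metric values.
prompt = [1200, 1500, 900, 2100]
completion = [300, 450, 250, 700]

stats = summarize_tpm(prompt, completion)
print(stats)
```

In practice the input series would span several weeks of traffic rather than four minutes, in line with the multi-week horizon recommended above.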
