
Commit 0b26e5c

Learn Editor: Update latency.md
1 parent 7d98100 commit 0b26e5c

File tree

1 file changed (+1, -1 lines changed)


articles/ai-services/openai/how-to/latency.md

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ One approach to estimating system level throughput for a given workload is using

When combined, the **Processed Prompt Tokens** (input TPM) and **Generated Completion Tokens** (output TPM) metrics provide an estimated view of system level throughput based on actual workload traffic. This approach does not account for benefits from prompt caching, so it will be a conservative system throughput estimate. These metrics can be analyzed using minimum, average, and maximum aggregation over 1-minute windows; it is recommended to analyze this data across a multi-week time horizon to ensure there are enough data points to assess. The following screenshot shows an example of the **Processed Prompt Tokens** metric visualized in Azure Monitor, which is available directly through the Azure portal.

-![processed-prompt-token-graph](media/latency/processed-prompt-token-graph.png)
+![Screenshot of Azure Monitor graph showcasing the Processed Prompt Tokens metric split by model and version.](media/latency/processed-prompt-token-graph.png)

##### Estimating TPM from request data
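
For reference, the context paragraph in this diff describes reading the **Processed Prompt Tokens** and **Generated Completion Tokens** metrics with minimum, average, and maximum aggregation over 1-minute windows. A minimal sketch of that query using the `azure-monitor-query` Python SDK follows; the placeholder resource ID, the 28-day lookback, and the metric names `ProcessedPromptTokens` and `GeneratedTokens` are illustrative assumptions, not values taken from the article, so confirm the exact metric names in your resource's Metrics blade.

```python
# Sketch only: query the per-minute token metrics described above so they can be
# summed into an estimated system-level TPM. Metric names are assumed, not taken
# from the article -- verify them in the Azure portal before relying on them.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricAggregationType, MetricsQueryClient

# Hypothetical Azure OpenAI resource ID (placeholder values).
resource_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.CognitiveServices/accounts/<aoai-resource>"
)

client = MetricsQueryClient(DefaultAzureCredential())

# Min/avg/max aggregation over 1-minute windows, across a multi-week lookback
# (28 days assumed here) so there are enough data points to assess.
result = client.query_resource(
    resource_id,
    metric_names=["ProcessedPromptTokens", "GeneratedTokens"],  # assumed metric names
    timespan=timedelta(days=28),
    granularity=timedelta(minutes=1),
    aggregations=[
        MetricAggregationType.MINIMUM,
        MetricAggregationType.AVERAGE,
        MetricAggregationType.MAXIMUM,
    ],
)

# Each data point is one 1-minute window; printing the three aggregations shows
# the per-minute token volume for each metric.
for metric in result.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(metric.name, point.timestamp, point.minimum, point.average, point.maximum)
```

Summing the per-minute values of the two metrics gives the combined input-plus-output TPM view the paragraph describes, which remains a conservative estimate because prompt caching is not reflected.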
