Commit ba5529c

Learn Editor: Update latency.md
1 parent 58797c7 commit ba5529c

File tree

1 file changed: +2 -2 lines changed

articles/ai-services/openai/how-to/latency.md

Lines changed: 2 additions & 2 deletions
@@ -26,7 +26,7 @@ For a standard deployment, the quota assigned to your deployment partially deter
 
 In a provisioned deployment, a set amount of model processing capacity is allocated to your endpoint. The amount of throughput that you can achieve on the endpoint is a factor of the workload shape including input token amount, output amount, call rate and cache match rate. The number of concurrent calls and total tokens processed can vary based on these values.
 
-For all deployment types, understanding system level throughput is a key component of optimizing performance. The following section explains several approaches that can be used to estimate system level throughput with existing metrics and data from your Azure OpenAI Service environment.
+For all deployment types, understanding system level throughput is a key component of optimizing performance. It is important to consider system level throughput for a given model, version, and workload combination as the throughput will vary across these factors. The following section explains several approaches that can be used to estimate system level throughput with existing metrics and data from your Azure OpenAI Service environment.
 
 #### Estimating system level throughput
 
@@ -36,7 +36,7 @@ Understanding system level throughput for any workload involves multiple factors
 
 One approach to estimating system level throughput for a given workload is using historical usage data. For Azure OpenAI workloads, all historical usage data can be accessed and visualized with the native Monitoring capabilities offered within Azure OpenAI. Two metrics are needed to estimate system level throughput for Azure OpenAI workloads: (1) **Processed Prompt Tokens** and (2) **Generated Completion Tokens**.
 
-When combined, the Processed Prompt Tokens (input TPM) and Generated Completion Tokens (output TPM) provide an aggregated view of system level throughput based on actual traffic in the past. These metrics can be analyzed using minimum, average, and maximum aggregation windows over numerous time periods. It is recommended to analyze this data over a multi-week time horizon to ensure there are enough data points to assess.
+When combined, the Processed Prompt Tokens (input TPM) and Generated Completion Tokens (output TPM) metrics provide an estimated view of system level throughput based on actual workload traffic. These metrics can be analyzed using minimum, average, and maximum aggregation windows over numerous time periods. It is recommended to analyze this data over a multi-week time horizon to ensure there are enough data points to assess.
 
 ##### Calculating TPM from request data
 
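To make the TPM estimate in the patched text concrete, here is a minimal Python sketch (not part of the article or this commit) that derives input and output TPM from per-request data, mirroring what the Processed Prompt Tokens and Generated Completion Tokens metrics report in aggregate. The request-log schema and values are hypothetical and stand in for whatever logging your application produces.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical request log: one entry per Azure OpenAI call,
# with token counts as returned in each response's usage field.
requests = [
    {"timestamp": "2024-05-01T09:00:12", "prompt_tokens": 1200, "completion_tokens": 350},
    {"timestamp": "2024-05-01T09:00:48", "prompt_tokens": 800,  "completion_tokens": 420},
    {"timestamp": "2024-05-01T09:01:05", "prompt_tokens": 1500, "completion_tokens": 600},
]

input_tpm = defaultdict(int)   # tokens/minute of processed prompt tokens
output_tpm = defaultdict(int)  # tokens/minute of generated completion tokens

for r in requests:
    # Bucket each request into its calendar minute.
    minute = datetime.fromisoformat(r["timestamp"]).replace(second=0, microsecond=0)
    input_tpm[minute] += r["prompt_tokens"]
    output_tpm[minute] += r["completion_tokens"]

for minute in sorted(input_tpm):
    print(f"{minute:%H:%M}  input TPM={input_tpm[minute]}  output TPM={output_tpm[minute]}")
```

Computed over a multi-week window, the minimum, average, and maximum of these per-minute buckets give the same view the article recommends pulling from Azure Monitor.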
