}
}
```
Assuming all requests for a given workload are uniform, the prompt tokens and completion tokens from the API response data can each be multiplied by the estimated RPM to identify the input and output TPM for the given workload.
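The calculation above can be sketched as a short helper. This is an illustrative example, not part of the article; the function name and the sample token counts are hypothetical, and the per-request token figures would come from the `usage` field of your own API responses.

```python
# Hypothetical sketch: estimate input and output TPM for a uniform workload
# from per-request token counts and an estimated requests-per-minute (RPM).

def estimate_tpm(prompt_tokens: int, completion_tokens: int, rpm: float) -> tuple[float, float]:
    """Return (input_tpm, output_tpm), assuming every request is uniform."""
    input_tpm = prompt_tokens * rpm       # input tokens consumed per minute
    output_tpm = completion_tokens * rpm  # output tokens generated per minute
    return input_tpm, output_tpm

# Example: 1,000 prompt tokens and 200 completion tokens per request at 15 RPM
input_tpm, output_tpm = estimate_tpm(1000, 200, 15)
print(input_tpm, output_tpm)  # 15000 3000
```

Because the workload is assumed uniform, a single representative request is enough; for mixed workloads you would instead average token counts across a sample of real traffic.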
##### How to use system level throughput estimates
Once system level throughput has been estimated for a given workload, these estimates can be used to size Standard and Provisioned deployments.
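As a rough illustration of the sizing step, the throughput estimate can be divided by a per-unit capacity figure and rounded up. Everything here is a hypothetical sketch: the `TPM_PER_UNIT` value is a placeholder, not an official rate, and the actual per-model throughput figures for provisioned deployments come from the Azure OpenAI capacity documentation and calculator.

```python
import math

# Placeholder throughput per deployment unit -- NOT an official Azure figure.
TPM_PER_UNIT = 2_500

def units_needed(total_tpm: float, tpm_per_unit: float = TPM_PER_UNIT) -> int:
    """Round up so the deployment covers the estimated peak throughput."""
    return math.ceil(total_tpm / tpm_per_unit)

# Example: an estimated 18,000 TPM workload at the placeholder per-unit rate
print(units_needed(18_000))  # 8
```

Rounding up matters: undersizing a provisioned deployment means requests beyond its capacity are throttled rather than merely slowed.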