articles/ai-services/openai/how-to/latency.md
1 addition & 1 deletion
@@ -17,7 +17,7 @@ ms.custom:
This article provides background on how latency and throughput work with Azure OpenAI and how to optimize your environment to improve performance.
## Understanding throughput vs latency
- There are two key concepts to think about when sizing an application: (1) System level throughput measured in tokens per minute (TPM) and (2) Per-call response times (also known as Latency).
+ There are two key concepts to think about when sizing an application: (1) System level throughput measured in tokens per minute (TPM) and (2) Per-call response times (also known as latency).
### System level throughput
This looks at the overall capacity of your deployment: how many requests per minute and how many total tokens can be processed.
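
The two measurements above relate in a simple way: per-call latency and token counts aggregate into a tokens-per-minute figure. The sketch below is a minimal illustration of that arithmetic, assuming a hypothetical `call_fn` that stands in for an Azure OpenAI request and returns the tokens it consumed; it is not part of any SDK.

```python
import time

def measure_call(call_fn):
    """Time one request; return (latency_seconds, tokens_used).

    `call_fn` is a hypothetical stand-in for an Azure OpenAI request
    that returns the number of tokens consumed (prompt + completion).
    """
    start = time.perf_counter()
    tokens = call_fn()
    return time.perf_counter() - start, tokens

def throughput_tpm(samples):
    """Aggregate (latency_seconds, tokens) samples into tokens per minute."""
    total_seconds = sum(latency for latency, _ in samples)
    total_tokens = sum(tokens for _, tokens in samples)
    if total_seconds == 0:
        return 0.0
    return total_tokens / (total_seconds / 60)

# Example: three calls totalling 3 seconds and 300 tokens
# yield 300 / (3/60) = 6000 tokens per minute.
samples = [(1.0, 100), (1.0, 100), (1.0, 100)]
print(throughput_tpm(samples))
```

The point of the sketch is that system-level throughput is a property of the aggregate workload, while latency is a property of each individual call; improving one does not automatically improve the other.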