Commit 0224093
Update articles/ai-services/openai/how-to/latency.md
Co-authored-by: Michael <[email protected]>
1 parent 289f00c commit 0224093

File tree

1 file changed (+1, −1 lines)

articles/ai-services/openai/how-to/latency.md

1 addition, 1 deletion

```diff
@@ -17,7 +17,7 @@ ms.custom:
 This article provides you with background around how latency and throughput works with Azure OpenAI and how to optimize your environment to improve performance.
 
 ## Understanding throughput vs latency
-There are two key concepts to think about when sizing an application: (1) System level throughput measured in tokens per minute (TPM) and (2) Per-call response times (also known as Latency).
+There are two key concepts to think about when sizing an application: (1) System level throughput measured in tokens per minute (TPM) and (2) Per-call response times (also known as latency).
 
 ### System level throughput
 This looks at the overall capacity of your deployment – how many requests per minute and total tokens that can be processed.
```
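The two metrics the changed article contrasts can be sketched in code. Below is a minimal, illustrative Python helper that derives both from a log of completed requests; `summarize_usage`, the tuple format, and the assumption that requests ran sequentially are all hypothetical and not part of any Azure OpenAI API.

```python
def summarize_usage(calls):
    """Compute the two sizing metrics from a per-request log.

    `calls` is a list of (latency_seconds, total_tokens) tuples, one per
    completed request. Assumes requests ran sequentially, so summed
    latency approximates wall-clock time; concurrent traffic would need
    real start/end timestamps instead.
    """
    if not calls:
        return 0.0, 0.0
    total_seconds = sum(latency for latency, _ in calls)
    total_tokens = sum(tokens for _, tokens in calls)
    # System level throughput: tokens processed per minute.
    tpm = total_tokens / (total_seconds / 60) if total_seconds else 0.0
    # Per-call response time: average latency of a single request.
    avg_latency = total_seconds / len(calls)
    return tpm, avg_latency
```

For example, two calls taking 2 s and 4 s that return 100 and 200 tokens yield a throughput of 3000 TPM and an average latency of 3 s; high TPM with high per-call latency is exactly the trade-off the article goes on to discuss.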
