
Commit 95bdb11

Learn Editor: Update latency.md
1 parent 7329554 commit 95bdb11

File tree: 1 file changed (+5 −8 lines)

articles/ai-services/openai/how-to/latency.md

Lines changed: 5 additions & 8 deletions
````diff
@@ -56,19 +56,16 @@ A second approach to estimated system level throughput involves collecting token
   }
 }
 ```
-Assuming all requests for a given workload are uniform, the prompt tokens and completion tokens can each be multiplied by the estimated RPM to identify the input and output TPM for the given workload.
+Assuming all requests for a given workload are uniform, the prompt tokens and completion tokens from the API response data can each be multiplied by the estimated RPM to identify the input and output TPM for the given workload.
 
-##### Estimating TPM from common workload shapes
+##### How to use system level throughput estimates
 
-There are two approaches that can be used to estimate the amount of model processing capacity needed to support a given workload:
 
-1. Use the built-in capacity calculator in the Azure OpenAI deployment creation workflow in the Azure AI Studio
+Once system level throughput has been estimated for a given workload, these estimates can be used to size Standard and Provisioned deployments.
 
-1. Use the expanded Azure OpenAI capacity calculator in the Azure AI Studio
+Here are a few examples for GPT-4o mini model:
 
-Here are a few examples for GPT-4 0613 model:
-
-| Prompt Size (tokens) | Generation size (tokens) | Calls per minute | PTUs required |
+| Prompt Size (tokens) | Generation size (tokens) | Requests per minute | PTUs required |
 |--|--|--|--|
 | 800 | 150 | 30 | 100 |
 | 1000 | 50 | 300 | 700 |
````
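The arithmetic described in the updated paragraph (per-request prompt and completion tokens multiplied by estimated RPM to get input and output TPM) can be sketched as follows. This is a minimal illustration of the calculation only; the function name and the tuple return shape are assumptions, not part of the Azure documentation:

```python
# Sketch of the TPM estimate from the diff's paragraph (illustrative names):
# input TPM  = avg prompt tokens per request  x requests per minute
# output TPM = avg completion tokens per request x requests per minute

def estimate_tpm(prompt_tokens: int, completion_tokens: int, rpm: int) -> tuple[int, int]:
    """Return (input_tpm, output_tpm) for a uniform workload shape."""
    input_tpm = prompt_tokens * rpm
    output_tpm = completion_tokens * rpm
    return input_tpm, output_tpm

# First workload shape from the table: 800 prompt tokens,
# 150 generation tokens, 30 requests per minute.
print(estimate_tpm(800, 150, 30))  # (24000, 4500)
```

Note that the PTU figures in the table come from the Azure capacity calculator, not from this arithmetic; the sketch covers only the TPM step.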
