Commit d42bd7f

Learn Editor: Update latency.md

1 parent a06436b commit d42bd7f

File tree

1 file changed: +5 −5 lines changed

articles/ai-services/openai/how-to/latency.md

Lines changed: 5 additions & 5 deletions
@@ -65,11 +65,11 @@ Once system level throughput has been estimated for a given workload, these esti

 Here are a few examples for GPT-4o mini model:

-| Prompt Size (tokens) |Generation size (tokens) |Requests per minute |Input TPM|Output TPM|PTUs required |
-|--|--|--| -------- | -------- |--|
-|800 |150 |30 |24,000|4,500|15|
-|5,000 |50 |1,000|5,000,000|50,000|140|
-|1,000 |300 | 500 |500,000|150,000|30|
+| Prompt Size (tokens) |Generation size (tokens) |Requests per minute |Input TPM|Output TPM|Total TPM|PTUs required |
+|--|--|--| -------- | -------- | -------- |--|
+|800 |150 |30 |24,000|4,500|28,500|15|
+|5,000 |50 |1,000|5,000,000|50,000|5,050,000|140|
+|1,000 |300 | 500 |500,000|150,000|650,000|30|

 The number of PTUs scales roughly linearly with call rate (might be sublinear) when the workload distribution remains constant.
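The added Total TPM column is simply the sum of Input TPM and Output TPM, each of which is the per-request token count multiplied by the request rate. A minimal sketch of that arithmetic (the helper name is illustrative, not from the docs):

```python
# Illustrative helper: derive the Input TPM, Output TPM, and Total TPM
# columns of the table from prompt size, generation size, and request rate.
def tpm_columns(prompt_tokens, generation_tokens, requests_per_minute):
    input_tpm = prompt_tokens * requests_per_minute
    output_tpm = generation_tokens * requests_per_minute
    return input_tpm, output_tpm, input_tpm + output_tpm

# The three GPT-4o mini example rows from the updated table.
for prompt, gen, rpm in [(800, 150, 30), (5_000, 50, 1_000), (1_000, 300, 500)]:
    print(tpm_columns(prompt, gen, rpm))
```

Running this reproduces the table's TPM values, e.g. (24000, 4500, 28500) for the first row.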

0 commit comments
