Commit baa2ec5

Merge pull request #1173 from mpande98/docs-editor/provisioned-throughput-1730392636
Update provisioned-throughput.md
2 parents 2e64bf0 + 379875b commit baa2ec5

1 file changed, +4 -3 lines changed

articles/ai-services/openai/concepts/provisioned-throughput.md

Lines changed: 4 additions & 3 deletions
@@ -41,13 +41,14 @@ An Azure OpenAI Deployment is a unit of management for a specific OpenAI Model.
 ## How much throughput per PTU you get for each model
 The amount of throughput (tokens per minute or TPM) a deployment gets per PTU is a function of the input and output tokens in the minute. Generating output tokens requires more processing than input tokens and so the more output tokens generated the lower your overall TPM. The service dynamically balances the input & output costs, so users do not have to set specific input and output limits. This approach means your deployment is resilient to fluctuations in the workload shape.
 
-To help with simplifying the sizing effort, the following table outlines the TPM per PTU for the `gpt-4o` and `gpt-4o-mini` models
+To help with simplifying the sizing effort, the following table outlines the TPM per PTU for the `gpt-4o` and `gpt-4o-mini` models. The table also shows Service Level Agreement (SLA) Latency Target Values per model. For more information about the SLA for Azure OpenAI Service, see the [Service Level Agreements (SLA) for Online Services page].(https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services?lang=1)
 
-| | **gpt-4o**, **2024-05-13** & **gpt-4o**, **2024-08-06** | **gpt-4o-mini**, **2024-07-18** |
+| | **gpt-4o**, **2024-05-13** & **gpt-4o**, **2024-08-06** | **gpt-4o-mini**, **2024-07-18** |
 | --- | --- | --- |
 | Deployable Increments | 50 | 25|
 | Input TPM per PTU | 2,500 | 37,000 |
-| Output TPM per PTU | 833 | 12,333 |
+| Output TPM per PTU| 833|12,333|
+| Latency Target Value |25 Tokens Per Second* |33 Tokens Per Second*|
 
 For a full list see the [AOAI Studio calculator](https://oai.azure.com/portal/calculator).
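
Sizing note: the table in this change can be turned into a rough PTU estimate. The sketch below is only an illustration. It assumes input and output load combine linearly (each PTU "spent" either on input tokens at the input rate or on output tokens at the output rate) and that the result is rounded up to the next deployable increment. The changed article does not state the service's actual balancing formula, and `estimate_ptus` is a hypothetical helper, not part of any Azure SDK. The AOAI Studio calculator linked above remains the authoritative source.

```python
import math

# Per-PTU rates and deployable increments copied from the table in this change.
TPM_PER_PTU = {
    "gpt-4o": {"input": 2500, "output": 833, "increment": 50},
    "gpt-4o-mini": {"input": 37000, "output": 12333, "increment": 25},
}


def estimate_ptus(model: str, input_tpm: float, output_tpm: float) -> int:
    """Hypothetical helper: rough PTU estimate for a per-minute workload.

    ASSUMPTION: input and output costs add linearly. The real service
    balances them dynamically and may behave differently.
    """
    rates = TPM_PER_PTU[model]
    # PTUs consumed by input tokens plus PTUs consumed by output tokens.
    raw_ptus = input_tpm / rates["input"] + output_tpm / rates["output"]
    # Round up to the next multiple of the deployable increment.
    increment = rates["increment"]
    return math.ceil(raw_ptus / increment) * increment


if __name__ == "__main__":
    # 100,000 input TPM and 8,330 output TPM on gpt-4o: 40 + 10 = 50 raw PTUs.
    print(estimate_ptus("gpt-4o", 100_000, 8_330))
```

Under this assumed model, one output token costs roughly three times as much as one input token for both models (2,500 / 833 and 37,000 / 12,333 are both about 3), which matches the article's statement that more output tokens lower overall TPM.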
