articles/ai-services/openai/concepts/provisioned-throughput.md (18 additions & 18 deletions)
@@ -163,7 +163,7 @@ For provisioned deployments, we use a variation of the leaky bucket algorithm to
The number of concurrent calls you can achieve depends on each call's shape (prompt size, `max_tokens` parameter, etc.). The service continues to accept calls until utilization reaches 100%. To determine the approximate number of concurrent calls, you can model the maximum requests per minute for a particular call shape in the [capacity calculator](https://ai.azure.com/resource/calculator). If the system generates fewer output tokens than the value set for the `max_tokens` parameter, the provisioned deployment accepts more requests.
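The relationship between the calculator's requests-per-minute figure and concurrency can be sketched with Little's law. The snippet below is a back-of-the-envelope estimate only, not the service's actual admission logic; the `requests_per_minute` value is assumed to come from the capacity calculator for your call shape, and `avg_call_seconds` is an assumed end-to-end latency you would measure yourself.

```python
# Rough estimate of steady-state concurrent calls for a provisioned deployment.
# This is a sketch using Little's law (concurrency = arrival rate x time in
# system), not the exact leaky bucket accounting the service performs.

def estimate_concurrency(requests_per_minute: float, avg_call_seconds: float) -> float:
    """Approximate calls in flight at steady state for a given call shape."""
    arrivals_per_second = requests_per_minute / 60.0
    return arrivals_per_second * avg_call_seconds

# Example (hypothetical numbers): the calculator reports 300 RPM for your call
# shape, and each call takes about 4 seconds end to end -> ~20 calls in flight.
print(estimate_concurrency(requests_per_minute=300, avg_call_seconds=4.0))
```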
-## Foundry Models with provisioned throughput capability
+## Provisioned throughput capability for Models Sold Directly by Azure
This section lists Foundry Models that support the provisioned throughput capability. You can use your PTU quota and PTU reservation across the models shown in the table.
@@ -179,23 +179,23 @@ The following points are some important takeaways from the table:
- Spillover is an optional capability that manages traffic fluctuations on provisioned deployments. For more information on spillover, see [Manage traffic with spillover for provisioned deployments (Preview)](../how-to/spillover-traffic-management.md).
-| Model Family | Model name | Global provisioned | Data zone provisioned | Regional provisioned | Spillover feature |