
Commit 0f707be

Author: Jill Grant
Merge pull request #264756 from mrbullwinkle/mrb_01_30_2024_Dynamic_quota
[Azure OpenAI] Dynamic Quota
2 parents 7dd0dbd + c69cfc9 commit 0f707be

5 files changed: 85 additions and 2 deletions

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
---
title: Azure OpenAI Service dynamic quota
titleSuffix: Azure AI services
description: Learn how to use Azure OpenAI dynamic quota
#services: cognitive-services
author: mrbullwinkle
manager: nitinme
ms.service: azure-ai-openai
ms.topic: how-to
ms.date: 01/30/2024
ms.author: mbullwin
---

# Azure OpenAI Dynamic quota (Preview)

Dynamic quota is an Azure OpenAI feature that enables a standard (pay-as-you-go) deployment to opportunistically take advantage of more quota when extra capacity is available. When dynamic quota is off, your deployment can process at most the throughput set by your Tokens Per Minute (TPM) setting, and requests that exceed it return HTTP 429 responses. When dynamic quota is enabled, the deployment can reach higher throughput before returning 429 responses, so you can complete more calls sooner. The extra requests are still billed at the [regular pricing rates](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/).

Dynamic quota can only temporarily *increase* your available quota: your quota never drops below your configured value.

## When to use dynamic quota

Dynamic quota is useful in most scenarios, particularly when your application can use extra capacity opportunistically or when the application itself drives the rate at which it calls the Azure OpenAI API.

You might prefer to avoid dynamic quota when your application would provide an adverse experience if quota is volatile or increased.

For dynamic quota, consider scenarios such as:

* Bulk processing
* Creating summarizations or embeddings for Retrieval Augmented Generation (RAG)
* Offline analysis of logs for generation of metrics and evaluations
* Low-priority research
* Apps that have a small amount of quota allocated

### When does dynamic quota come into effect?

The Azure OpenAI backend decides if, when, and how much extra dynamic quota is added to or removed from different deployments. Changes aren't forecasted or announced in advance, and they aren't predictable. Azure OpenAI signals that no more quota is currently available by responding with HTTP 429 and not letting more API calls through; when extra quota is available, those responses become less frequent. To take advantage of dynamic quota, your application code must be able to issue more requests as HTTP 429 responses become infrequent.
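
One way to issue more requests as HTTP 429 responses become infrequent is additive-increase, multiplicative-decrease pacing. The following Python sketch is illustrative only: the `AdaptiveRate` class, its parameters, and the choice to pace in requests per minute rather than tokens are assumptions, not part of any Azure SDK.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveRate:
    """Additive-increase, multiplicative-decrease pacing (hypothetical helper)."""

    base_rpm: float        # assumed rate that fits your configured quota
    max_rpm: float         # self-imposed upper bound on the request rate
    step_rpm: float = 5.0  # amount added after each successful call
    backoff: float = 0.5   # multiplier applied after an HTTP 429

    def __post_init__(self) -> None:
        self.current_rpm = self.base_rpm

    def record(self, status_code: int) -> float:
        """Update the target rate from one response and return it."""
        if status_code == 429:
            # Throttled: cut the rate, but never below the base rate.
            self.current_rpm = max(self.base_rpm, self.current_rpm * self.backoff)
        elif 200 <= status_code < 300:
            # Success: probe for extra capacity a little at a time.
            self.current_rpm = min(self.max_rpm, self.current_rpm + self.step_rpm)
        return self.current_rpm

    def delay_seconds(self) -> float:
        """Pause to insert between requests at the current rate."""
        return 60.0 / self.current_rpm
```

Call `record()` with each response's status code and wait `delay_seconds()` between requests. The rate climbs while calls succeed and falls back toward the base rate when throttling returns.
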

### How does dynamic quota change costs?

* Calls made above your base quota cost the same as regular calls.

* There's no extra cost to turn on dynamic quota on a deployment, though the increased throughput could ultimately result in increased cost depending on the amount of traffic your deployment receives.

> [!NOTE]
> With dynamic quota, no "ceiling" quota or throughput is enforced. Azure OpenAI will process as many requests as it can above your baseline quota. If you need to control the rate of spend even when quota is less constrained, your application code needs to hold back requests accordingly.
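
Holding back requests in application code can be as simple as a per-minute token budget. The following Python sketch assumes a fixed one-minute window and a caller-supplied token estimate per request; the `TokenBudget` class and its parameters are hypothetical names, not an Azure OpenAI API.

```python
import time
from typing import Callable


class TokenBudget:
    """Fixed-window cap on tokens sent per minute (hypothetical helper)."""

    def __init__(self, max_tokens_per_minute: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_tokens = max_tokens_per_minute
        self.clock = clock  # injectable so the window can be tested
        self.window_start = clock()
        self.used = 0

    def try_spend(self, tokens: int) -> bool:
        """Reserve tokens for a request; return False if it should be held back."""
        now = self.clock()
        if now - self.window_start >= 60.0:
            # A new one-minute window begins: reset the counter.
            self.window_start = now
            self.used = 0
        if self.used + tokens > self.max_tokens:
            return False
        self.used += tokens
        return True
```

Check `try_spend()` before each call and delay or queue the request when it returns `False`. Setting the budget above your base TPM lets you benefit from dynamic quota while still capping spend.
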

## How to use dynamic quota

To use dynamic quota, you must:

* Turn on the dynamic quota property in your Azure OpenAI deployment.
* Make sure your application can take advantage of dynamic quota.

### Enable dynamic quota

To activate dynamic quota for your deployment, go to the advanced properties in the resource configuration and switch it on:

:::image type="content" source="../media/how-to/dynamic-quota/dynamic-quota.png" alt-text="Screenshot of advanced configuration UI for deployments." lightbox="../media/how-to/dynamic-quota/dynamic-quota.png":::

Alternatively, you can enable it programmatically with Azure CLI's [`az rest`](/cli/azure/reference-index?view=azure-cli-latest#az-rest&preserve-view=true).

In the following command, replace `{subscriptionId}`, `{resourceGroupName}`, `{accountName}`, and `{deploymentName}` with the relevant values for your resource. In this case, `accountName` is the name of your Azure OpenAI resource:

```azurecli
az rest --method patch --url "https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.CognitiveServices/accounts/{accountName}/deployments/{deploymentName}?api-version=2023-10-01-preview" --body '{"properties": {"dynamicThrottlingEnabled": true} }'
```
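
If you script this change for several deployments, it can help to build the request URL and body in code. The following Python sketch only assembles the two strings; it sends nothing and handles no authentication. The function name `dynamic_quota_patch` and its parameters are hypothetical. The resource path, preview version, and `dynamicThrottlingEnabled` property come from the command above, and Azure Resource Manager expects the version in an `api-version` query parameter.

```python
import json
from urllib.parse import quote

ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2023-10-01-preview"  # preview version used in this article


def dynamic_quota_patch(subscription_id: str, resource_group: str,
                        account_name: str, deployment_name: str,
                        enabled: bool = True) -> tuple:
    """Return (url, body) for the PATCH that toggles dynamic quota."""
    # Percent-encode each path segment in case a name has special characters.
    sub, rg, acct, dep = (
        quote(part, safe="")
        for part in (subscription_id, resource_group, account_name, deployment_name)
    )
    url = (
        f"{ARM_ENDPOINT}/subscriptions/{sub}/resourceGroups/{rg}"
        f"/providers/Microsoft.CognitiveServices/accounts/{acct}"
        f"/deployments/{dep}?api-version={API_VERSION}"
    )
    body = json.dumps({"properties": {"dynamicThrottlingEnabled": enabled}})
    return url, body
```

You can pass the returned values to `az rest --method patch --url ... --body ...`, or send them with any HTTP client that supplies an Azure Resource Manager bearer token.
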

### How do I know how much throughput dynamic quota is adding to my app?

To monitor how dynamic quota is working, track the throughput of your application in Azure Monitor. During the Preview of dynamic quota, there's no specific metric or log to indicate if quota has been dynamically increased or decreased.
Dynamic quota is less likely to be engaged for your deployment if it runs in heavily utilized regions, and during peak hours of use for those regions.
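
Because no dedicated metric exists, one rough way to see whether you received extra throughput is to compare your own per-minute token totals with your configured TPM. The following Python sketch assumes you log a Unix timestamp and a token count for each successful call; the function names and record shape are hypothetical, and the service's own accounting against TPM may differ from a simple sum.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tokens_per_minute(records: Iterable[Tuple[float, int]]) -> Dict[int, int]:
    """Sum tokens per minute bucket from (unix_seconds, tokens) pairs."""
    buckets: Dict[int, int] = defaultdict(int)
    for timestamp, tokens in records:
        # Bucket index is the number of whole minutes since the epoch.
        buckets[int(timestamp // 60)] += tokens
    return dict(buckets)


def minutes_above_base(records: Iterable[Tuple[float, int]],
                       base_tpm: int) -> List[int]:
    """Return the minute buckets whose token total exceeded base_tpm."""
    totals = tokens_per_minute(records)
    return sorted(minute for minute, total in totals.items() if total > base_tpm)
```

Minutes returned by `minutes_above_base()` are candidates for periods when dynamic quota was engaged, which you can then cross-check against the throughput you see in Azure Monitor.
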

## Next steps

* Learn more about how [quota works](./quota.md).
* Learn more about [monitoring Azure OpenAI](./monitoring.md).

articles/ai-services/openai/includes/create-resource-portal.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ description: Learn how to use the Azure portal to create an Azure OpenAI resourc
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: include
-ms.date: 08/25/2023
+ms.date: 01/30/2024
 ---

 ## Prerequisites
@@ -109,7 +109,7 @@ To deploy a model, follow these steps:
 |---|---|
 | **Select a model** | Model availability varies by region. For a list of available models per region, see [Model summary table and region availability](../concepts/models.md#model-summary-table-and-region-availability). |
 | **Deployment name** | Choose a name carefully. The deployment name is used in your code to call the model by using the client libraries and the REST APIs. |
-| **Advanced options** (Optional) | You can set optional advanced settings, as needed for your resource. <br> - For the **Content Filter**, assign a content filter to your deployment.<br> - For the **Tokens per Minute Rate Limit**, adjust the Tokens per Minute (TPM) to set the effective rate limit for your deployment. You can modify this value at any time by using the [**Quotas**](../how-to/quota.md) menu. |
+| **Advanced options** (Optional) | You can set optional advanced settings, as needed for your resource. <br> - For the **Content Filter**, assign a content filter to your deployment.<br> - For the **Tokens per Minute Rate Limit**, adjust the Tokens per Minute (TPM) to set the effective rate limit for your deployment. You can modify this value at any time by using the [**Quotas**](../how-to/quota.md) menu. [**Dynamic Quota**](../how-to/dynamic-quota.md) allows you to take advantage of more quota when extra capacity is available. |

 5. Select a model from the dropdown list.

articles/ai-services/openai/index.yml

Lines changed: 2 additions & 0 deletions
@@ -62,6 +62,8 @@ landingContent:
 links:
   - text: Quota
     url: ./how-to/quota.md
+  - text: Dynamic quota
+    url: ./how-to/dynamic-quota.md
   - text: Provisioned Throughput Units (PTU)
     url: ./concepts/provisioned-throughput.md
   - text: Content filtering

articles/ai-services/openai/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -127,6 +127,8 @@ items:
   href: ./how-to/switching-endpoints.md
 - name: Manage quota
   href: ./how-to/quota.md
+- name: Dynamic quota
+  href: ./how-to/dynamic-quota.md
 - name: Monitor Azure OpenAI
   href: ./how-to/monitoring.md
 - name: Onboarding to Provisioned Throughput Units (PTU)
