**articles/ai-services/openai/how-to/quota.md** (+269 −2 lines changed)
```diff
@@ -8,7 +8,7 @@ manager: nitinme
 ms.service: cognitive-services
 ms.subservice: openai
 ms.topic: how-to
-ms.date: 07/20/2023
+ms.date: 08/01/2023
 ms.author: mbullwin
 ---
```
```diff
@@ -19,7 +19,11 @@ Quota provides the flexibility to actively manage the allocation of rate limits
 ## Prerequisites

 > [!IMPORTANT]
-> Quota requires the **Cognitive Services Usages Reader** role. This role provides the minimal access necessary to view quota usage across an Azure subscription. This role can be found in the Azure portal under **Subscriptions** > **Access control (IAM)** > **Add role assignment** > search for **Cognitive Services Usages Reader**.
+> Viewing quota and deploying models requires the **Cognitive Services Usages Reader** role. This role provides the minimal access necessary to view quota usage across an Azure subscription.
+>
+> This role can be found in the Azure portal under **Subscriptions** > **Access control (IAM)** > **Add role assignment** > search for **Cognitive Services Usages Reader**. This role **must be applied at the subscription level**; it does not exist at the resource level.
+>
+> If you do not wish to use this role, the subscription **Reader** role will provide equivalent access, but it will also grant read access beyond the scope of what is needed for viewing quota and model deployment.

 ## Introduction to quota
```
To minimize issues related to rate limits, it's a good idea to use the following techniques:

- Avoid sharp changes in the workload. Increase the workload gradually.
- Test different load increase patterns.
## Automate deployment

This section contains brief example templates to help get you started programmatically creating deployments that use quota to set TPM rate limits. With the introduction of quota, you must use API version `2023-05-01` for resource management activities. This API version is only for managing your resources and doesn't affect the API version used for inferencing calls such as completions, chat completions, embeddings, and image generation.
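For example, creating a deployment goes through the management plane with `2023-05-01`, while inference requests against that deployment continue to use a data-plane API version (`2023-05-15` below is just one example of a valid inference API version; the host and deployment names are placeholders):

```http
PUT https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.CognitiveServices/accounts/{accountName}/deployments/{deploymentName}?api-version=2023-05-01

POST https://{accountName}.openai.azure.com/openai/deployments/{deploymentName}/chat/completions?api-version=2023-05-15
```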
# [REST](#tab/rest)

### Deployment

```http
PUT https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.CognitiveServices/accounts/{accountName}/deployments/{deploymentName}?api-version=2023-05-01
```

**Path parameters**

| Parameter | Type | Required? | Description |
|--|--|--|--|
| ```accountName``` | string | Required | The name of your Azure OpenAI resource. |
| ```deploymentName``` | string | Required | The deployment name you chose when you deployed an existing model, or the name you would like a new model deployment to have. |
| ```resourceGroupName``` | string | Required | The name of the associated resource group for this model deployment. |
| ```subscriptionId``` | string | Required | Subscription ID for the associated subscription. |
| ```api-version``` | string | Required | The API version to use for this operation. This follows the YYYY-MM-DD format. |
**Request body**

This is only a subset of the available request body parameters. For the full list of the parameters, you can refer to the [REST API reference documentation](/rest/api/cognitiveservices/accountmanagement/deployments/create-or-update?tabs=HTTP).

| Parameter | Type | Description |
|--|--|--|
| sku | Sku | The resource model definition representing the SKU. |
| capacity | integer | This represents the amount of [quota](../how-to/quota.md) you are assigning to this deployment. A value of 1 equals 1,000 Tokens per Minute (TPM). A value of 10 equals 10K TPM. |
#### Example request

```Bash
# Creates a Standard gpt-35-turbo (0613) deployment with a 10K TPM limit
curl -X PUT https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/resource-group-temp/providers/Microsoft.CognitiveServices/accounts/docs-openai-test-001/deployments/gpt-35-turbo-test-deployment?api-version=2023-05-01 \
  -H "Content-Type: application/json" \
  -H 'Authorization: Bearer YOUR_AUTH_TOKEN' \
  -d '{"sku": {"name": "Standard", "capacity": 10}, "properties": {"model": {"format": "OpenAI", "name": "gpt-35-turbo", "version": "0613"}}}'
```

> There are multiple ways to generate an authorization token. The easiest method for initial testing is to launch the Cloud Shell from the [Azure portal](https://portal.azure.com). Then run [`az account get-access-token`](/cli/azure/account?view=azure-cli-latest#az-account-get-access-token&preserve-view=true). You can use this token as your temporary authorization token for API testing.
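If you're testing from a local shell rather than Cloud Shell, one way to capture that token into the `YOUR_AUTH_TOKEN` placeholder used in these examples (a sketch; it assumes you've already run `az login`) is:

```Bash
YOUR_AUTH_TOKEN=$(az account get-access-token --query accessToken --output tsv)
```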
For more information, refer to the REST API reference documentation for [usages](/rest/api/cognitiveservices/accountmanagement/usages/list?branch=main&tabs=HTTP) and [deployment](/rest/api/cognitiveservices/accountmanagement/deployments/create-or-update).
### Usage

To query your quota usage in a given region, for a specific subscription:

```http
GET https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.CognitiveServices/locations/{location}/usages?api-version=2023-05-01
```

**Path parameters**

| Parameter | Type | Required? | Description |
|--|--|--|--|
| ```subscriptionId``` | string | Required | Subscription ID for the associated subscription. |
| ```location``` | string | Required | Location to view usage for, for example: `eastus`. |
| ```api-version``` | string | Required | The API version to use for this operation. This follows the YYYY-MM-DD format. |
#### Example request

```Bash
curl -X GET https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01 \
  -H "Content-Type: application/json" \
  -H 'Authorization: Bearer YOUR_AUTH_TOKEN'
```
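The response is a list of usage objects, one per quota in the region. A trimmed, illustrative sketch of the shape (the values are made up and the exact set of fields may differ):

```json
{
  "value": [
    {
      "name": {
        "value": "OpenAI.Standard.gpt-35-turbo",
        "localizedValue": "Tokens Per Minute (thousands) - gpt-35-turbo"
      },
      "currentValue": 10,
      "limit": 240,
      "unit": "Count"
    }
  ]
}
```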
# [Azure CLI](#tab/cli)

Install the [Azure CLI](/cli/azure/install-azure-cli). Quota requires Azure CLI version `2.51.0`. If you already have the Azure CLI installed locally, run `az upgrade` to update to the latest version.

To check which version of the Azure CLI you are running, use `az version`. Azure Cloud Shell is currently still running version 2.50.0, so in the interim a local installation of the Azure CLI is required to take advantage of the latest Azure OpenAI features.
### Deployment

```azurecli
az cognitiveservices account deployment create --model-format
                                               --model-name
                                               --model-version
                                               --name
                                               --resource-group
                                               [--capacity]
                                               [--deployment-name]
                                               [--scale-capacity]
                                               [--scale-settings-scale-type {Manual, Standard}]
                                               [--sku]
```

To sign into your local installation of the CLI, run the [az login](/cli/azure/reference-index#az-login) command:

```azurecli
az login
```

<!--TODO:You can also use the green **Try It** button to run these commands in your browser in the Azure Cloud Shell.-->

By setting `sku-capacity` to 10 in the command below, this deployment will be set with a 10K TPM limit.
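A sketch of that command follows; the resource group, resource, and deployment names are placeholders:

```azurecli
az cognitiveservices account deployment create \
  --resource-group test-resource-group \
  --name test-resource-name \
  --deployment-name test-deployment-name \
  --model-name gpt-35-turbo \
  --model-version "0613" \
  --model-format OpenAI \
  --sku-capacity 10 \
  --sku-name "Standard"
```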
To [query your quota usage](/cli/azure/cognitiveservices/usage?view=azure-cli-latest&preserve-view=true) in a given region, for a specific subscription:

```azurecli
az cognitiveservices usage list --location
```

### Example

```azurecli
az cognitiveservices usage list -l eastus
```

This command runs in the context of the currently active subscription for the Azure CLI. Use `az account set --subscription` to [modify the active subscription](/cli/azure/manage-azure-subscriptions-azure-cli#change-the-active-subscription).
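For example, to switch subscriptions before listing usage (the subscription ID is a placeholder):

```azurecli
az account set --subscription "00000000-0000-0000-0000-000000000000"
az cognitiveservices usage list -l eastus
```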
For more details on `az cognitiveservices account` and `az cognitiveservices usage`, consult the [Azure CLI reference documentation](/cli/azure/cognitiveservices/account/deployment?view=azure-cli-latest&preserve-view=true).
# [Azure Resource Manager](#tab/arm)

```json
//
// This Azure Resource Manager template shows how to use the new schema introduced in the 2023-05-01 API version to
// create deployments that set the model version and the TPM limits for standard deployments.
//
{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "resources": [
        {
            "type": "Microsoft.CognitiveServices/accounts/deployments",
            "apiVersion": "2023-05-01",
            "name": "arm-je-aoai-test-resource/arm-je-std-deployment", // Update reference to parent Azure OpenAI resource
            "dependsOn": [
                "[resourceId('Microsoft.CognitiveServices/accounts', 'arm-je-aoai-test-resource')]" // Update reference to parent Azure OpenAI resource
            ],
            "sku": {
                "name": "Standard",
                "capacity": 10 // The deployment will be created with a 10K TPM limit
            },
            "properties": {
                "model": {
                    "format": "OpenAI",
                    "name": "gpt-35-turbo",
                    "version": "0613" // Version 0613 of gpt-35-turbo will be used
                }
            }
        }
    ]
}
```

For more details, consult the [full Azure Resource Manager reference documentation](/azure/templates/microsoft.cognitiveservices/accounts/deployments?pivots=deployment-language-arm-template).
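If you save the template locally, one way to deploy it (a sketch assuming a file named `deployment.json` and an existing resource group) is with the Azure CLI:

```azurecli
az deployment group create --resource-group <resource-group> --template-file deployment.json
```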
# [Bicep](#tab/bicep)

```bicep
//
// This Bicep template shows how to use the new schema introduced in the 2023-05-01 API version to
// create deployments that set the model version and the TPM limits for standard deployments.
//
resource arm_je_std_deployment 'Microsoft.CognitiveServices/accounts/deployments@2023-05-01' = {
  parent: arm_je_aoai_resource // Replace this with a reference to the parent Azure OpenAI resource
  name: 'arm-je-std-deployment'
  sku: {
    name: 'Standard'
    capacity: 10 // The deployment will be created with a 10K TPM limit
  }
  properties: {
    model: {
      format: 'OpenAI'
      name: 'gpt-35-turbo'
      version: '0613' // gpt-35-turbo version 0613 will be used
    }
  }
}
```

For more details, consult the [full Bicep reference documentation](/azure/templates/microsoft.cognitiveservices/accounts/deployments?pivots=deployment-language-bicep).
# [Terraform](#tab/terraform)

```terraform
# This Terraform template shows how to use the new schema introduced in the 2023-05-01 API version to
# create deployments that set the model version and the TPM limits for standard deployments.
#
# The new schema is not yet available in the AzureRM provider (target v4.0), so this template uses the AzAPI
# provider, which provides a Terraform-compatible interface to the underlying ARM structures.

    sku = { # The sku object specifies the deployment type and limit in 2023-05-01
      name     = "Standard",
      capacity = 10 # This deployment will be set with a 10K TPM limit
    },
    properties = {
      model = {
        format  = "OpenAI",
        name    = "gpt-35-turbo",
        version = "0613" # Deploy gpt-35-turbo version 0613
      }
    }
  })
}
```
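The excerpt above omits the provider configuration and the enclosing `azapi_resource` block contained in the collapsed portion of the template. A minimal sketch of that scaffolding, with placeholder names, and assuming the parent Azure OpenAI account already exists:

```terraform
terraform {
  required_providers {
    azapi = {
      source = "Azure/azapi"
    }
  }
}

provider "azapi" {}

# Placeholder names; parent_id must point at an existing Azure OpenAI (Cognitive Services) account.
resource "azapi_resource" "std_deployment" {
  type      = "Microsoft.CognitiveServices/accounts/deployments@2023-05-01"
  name      = "example-std-deployment"
  parent_id = "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.CognitiveServices/accounts/<aoai-account-name>"

  body = jsonencode({
    # sku and properties settings as shown in the excerpt above
    sku        = { name = "Standard", capacity = 10 }
    properties = { model = { format = "OpenAI", name = "gpt-35-turbo", version = "0613" } }
  })
}
```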
For more details, consult the [full Terraform reference documentation](/azure/templates/microsoft.cognitiveservices/accounts/deployments?pivots=deployment-language-terraform).

---
## Resource deletion

When an attempt is made to delete an Azure OpenAI resource from the Azure portal, if any deployments are still present, deletion is blocked until the associated deployments are deleted. Deleting the deployments first allows quota allocations to be properly freed up so that they can be used on new deployments.
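One way to do that cleanup from the command line (a sketch with placeholder names; the same steps can be performed in the Azure portal):

```azurecli
az cognitiveservices account deployment delete --name <aoai-resource-name> --resource-group <resource-group> --deployment-name <deployment-name>
az cognitiveservices account delete --name <aoai-resource-name> --resource-group <resource-group>
```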