
Commit d2c7e95

Author: Jill Grant
Merge pull request #246836 from mrbullwinkle/mrb_07_31_2023_quota
[Azure AI] [Azure OpenAI] Automate deployment
2 parents c2e080a + f8345fe commit d2c7e95

File tree

1 file changed: +269 −2 lines
  • articles/ai-services/openai/how-to

articles/ai-services/openai/how-to/quota.md

Lines changed: 269 additions & 2 deletions
@@ -8,7 +8,7 @@ manager: nitinme
 ms.service: cognitive-services
 ms.subservice: openai
 ms.topic: how-to
-ms.date: 07/20/2023
+ms.date: 08/01/2023
 ms.author: mbullwin
 ---

@@ -19,7 +19,11 @@ Quota provides the flexibility to actively manage the allocation of rate limits
 ## Prerequisites
 
 > [!IMPORTANT]
-> Quota requires the **Cognitive Services Usages Reader** role. This role provides the minimal access necessary to view quota usage across an Azure subscription. This role can be found in the Azure portal under **Subscriptions** > **Access control (IAM)** > **Add role assignment** > search for **Cognitive Services Usages Reader**.
+> Viewing quota and deploying models requires the **Cognitive Services Usages Reader** role. This role provides the minimal access necessary to view quota usage across an Azure subscription.
+>
+> This role can be found in the Azure portal under **Subscriptions** > **Access control (IAM)** > **Add role assignment** > search for **Cognitive Services Usages Reader**. This role **must be applied at the subscription level**; it does not exist at the resource level.
+>
+> If you do not wish to use this role, the subscription **Reader** role provides equivalent access, but it also grants read access beyond the scope of what is needed for viewing quota and model deployment.
 
 ## Introduction to quota

@@ -104,6 +108,269 @@ To minimize issues related to rate limits, it's a good idea to use the following
- Avoid sharp changes in the workload. Increase the workload gradually.
- Test different load increase patterns.

## Automate deployment

This section contains brief example templates to help you get started programmatically creating deployments that use quota to set TPM rate limits. With the introduction of quota, you must use API version `2023-05-01` for resource management activities. This API version is only for managing your resources; it does not affect the API version used for inferencing calls like completions, chat completions, embeddings, image generation, etc.

# [REST](#tab/rest)

### Deployment

```http
PUT https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.CognitiveServices/accounts/{accountName}/deployments/{deploymentName}?api-version=2023-05-01
```

**Path parameters**

| Parameter | Type | Required? | Description |
|--|--|--|--|
| `accountName` | string | Required | The name of your Azure OpenAI resource. |
| `deploymentName` | string | Required | The deployment name you chose when you deployed an existing model, or the name you would like a new model deployment to have. |
| `resourceGroupName` | string | Required | The name of the associated resource group for this model deployment. |
| `subscriptionId` | string | Required | Subscription ID for the associated subscription. |
| `api-version` | string | Required | The API version to use for this operation. This follows the YYYY-MM-DD format. |

**Supported versions**

- `2023-05-01` [Swagger spec](https://github.com/Azure/azure-rest-api-specs/blob/1e71ad94aeb8843559d59d863c895770560d7c93/specification/cognitiveservices/resource-manager/Microsoft.CognitiveServices/stable/2023-05-01/cognitiveservices.json)

**Request body**

This is only a subset of the available request body parameters. For the full list of the parameters, refer to the [REST API reference documentation](/rest/api/cognitiveservices/accountmanagement/deployments/create-or-update?tabs=HTTP).

| Parameter | Type | Description |
|--|--|--|
| sku | Sku | The resource model definition representing SKU. |
| capacity | integer | The amount of [quota](../how-to/quota.md) you're assigning to this deployment. A value of 1 equals 1,000 Tokens per Minute (TPM). A value of 10 equals 10,000 TPM. |
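
As a quick illustration of the `capacity` math described above, here is a minimal Python sketch. The helper function is hypothetical (not part of any Azure SDK); it only encodes the rule that one capacity unit equals 1,000 TPM:

```python
def tpm_to_capacity(tokens_per_minute: int) -> int:
    """Convert a desired Tokens-per-Minute (TPM) rate limit into the
    sku.capacity value used by the 2023-05-01 deployments API.
    One unit of capacity corresponds to 1,000 TPM."""
    if tokens_per_minute % 1000 != 0:
        raise ValueError("TPM values map to whole capacity units of 1,000 TPM")
    return tokens_per_minute // 1000

print(tpm_to_capacity(10_000))  # prints 10, i.e. a 10K TPM deployment
```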

#### Example request

```bash
curl -X PUT https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/resource-group-temp/providers/Microsoft.CognitiveServices/accounts/docs-openai-test-001/deployments/gpt-35-turbo-test-deployment?api-version=2023-05-01 \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_AUTH_TOKEN" \
  -d '{"sku":{"name":"Standard","capacity":10},"properties": {"model": {"format": "OpenAI","name": "gpt-35-turbo","version": "0613"}}}'
```

> [!NOTE]
> There are multiple ways to generate an authorization token. The easiest method for initial testing is to launch the Cloud Shell from the [Azure portal](https://portal.azure.com). Then run [`az account get-access-token`](/cli/azure/account?view=azure-cli-latest#az-account-get-access-token&preserve-view=true). You can use this token as your temporary authorization token for API testing.
For more information, refer to the REST API reference documentation for [usages](/rest/api/cognitiveservices/accountmanagement/usages/list?branch=main&tabs=HTTP) and [deployment](/rest/api/cognitiveservices/accountmanagement/deployments/create-or-update).
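
If you're scripting the REST call from Python rather than curl, a sketch like the following assembles the same URL and body as the example request above. All identifiers are caller-supplied placeholders, and actually sending the request (with a bearer token and an HTTP client of your choice) is omitted:

```python
def build_deployment_request(subscription_id: str, resource_group: str,
                             account_name: str, deployment_name: str,
                             model_name: str, model_version: str,
                             capacity: int):
    """Assemble the URL and JSON body for the 2023-05-01 deployments PUT call.
    Mirrors the curl example above; one unit of capacity equals 1,000 TPM."""
    url = (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.CognitiveServices/accounts/{account_name}"
        f"/deployments/{deployment_name}"
        "?api-version=2023-05-01"
    )
    body = {
        "sku": {"name": "Standard", "capacity": capacity},
        "properties": {
            "model": {"format": "OpenAI", "name": model_name, "version": model_version}
        },
    }
    return url, body

url, body = build_deployment_request(
    "00000000-0000-0000-0000-000000000000", "resource-group-temp",
    "docs-openai-test-001", "gpt-35-turbo-test-deployment",
    "gpt-35-turbo", "0613", 10)
# Send with any HTTP client, for example:
# requests.put(url, json=body, headers={"Authorization": f"Bearer {token}"})
```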

### Usage

To query your quota usage in a given region, for a specific subscription:

```http
GET https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.CognitiveServices/locations/{location}/usages?api-version=2023-05-01
```

**Path parameters**

| Parameter | Type | Required? | Description |
|--|--|--|--|
| `subscriptionId` | string | Required | Subscription ID for the associated subscription. |
| `location` | string | Required | Location to view usage for. For example: `eastus`. |
| `api-version` | string | Required | The API version to use for this operation. This follows the YYYY-MM-DD format. |

**Supported versions**

- `2023-05-01` [Swagger spec](https://github.com/Azure/azure-rest-api-specs/blob/1e71ad94aeb8843559d59d863c895770560d7c93/specification/cognitiveservices/resource-manager/Microsoft.CognitiveServices/stable/2023-05-01/cognitiveservices.json)

#### Example request

```bash
curl -X GET https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01 \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_AUTH_TOKEN"
```
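
The usages call returns a JSON object with a `value` array of quota entries. As a sketch of working with that response, the snippet below computes the remaining quota for one entry; the payload is a trimmed, illustrative sample (the entry name and numbers are made up for the example):

```python
import json

# Trimmed, illustrative sample of a usages response; real responses
# contain many more entries and fields.
sample = json.loads("""
{
  "value": [
    {
      "name": {"value": "OpenAI.Standard.gpt-35-turbo"},
      "currentValue": 10.0,
      "limit": 240.0,
      "unit": "Count"
    }
  ]
}
""")

def remaining_quota(usages: dict, quota_name: str) -> float:
    """Return limit - currentValue for the named quota entry."""
    for entry in usages["value"]:
        if entry["name"]["value"] == quota_name:
            return entry["limit"] - entry["currentValue"]
    raise KeyError(quota_name)

print(remaining_quota(sample, "OpenAI.Standard.gpt-35-turbo"))  # 230.0
```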

# [Azure CLI](#tab/cli)

Install the [Azure CLI](/cli/azure/install-azure-cli). Quota requires Azure CLI version `2.51.0`. If you already have the Azure CLI installed locally, run `az upgrade` to update to the latest version.

To check which version of the Azure CLI you're running, use `az version`. Azure Cloud Shell is currently still running 2.50.0, so in the interim, a local installation of the Azure CLI is required to take advantage of the latest Azure OpenAI features.

### Deployment

```azurecli
az cognitiveservices account deployment create --model-format
                                               --model-name
                                               --model-version
                                               --name
                                               --resource-group
                                               [--capacity]
                                               [--deployment-name]
                                               [--scale-capacity]
                                               [--scale-settings-scale-type {Manual, Standard}]
                                               [--sku]
```

To sign in to your local installation of the CLI, run the [az login](/cli/azure/reference-index#az-login) command:

```azurecli
az login
```

<!--TODO:You can also use the green **Try It** button to run these commands in your browser in the Azure Cloud Shell.-->

By setting `sku-capacity` to 10 in the following command, this deployment is created with a 10K TPM limit:

```azurecli
az cognitiveservices account deployment create -g test-resource-group -n test-resource-name --deployment-name test-deployment-name --model-name gpt-35-turbo --model-version "0613" --model-format OpenAI --sku-capacity 10 --sku-name "Standard"
```

### Usage

To [query your quota usage](/cli/azure/cognitiveservices/usage?view=azure-cli-latest&preserve-view=true) in a given region, for a specific subscription:

```azurecli
az cognitiveservices usage list --location
```

### Example

```azurecli
az cognitiveservices usage list -l eastus
```

This command runs in the context of the currently active subscription for the Azure CLI. Use `az account set --subscription` to [modify the active subscription](/cli/azure/manage-azure-subscriptions-azure-cli#change-the-active-subscription).

For more details on `az cognitiveservices account` and `az cognitiveservices usage`, consult the [Azure CLI reference documentation](/cli/azure/cognitiveservices/account/deployment?view=azure-cli-latest&preserve-view=true).

# [Azure Resource Manager](#tab/arm)

```json
//
// This Azure Resource Manager template shows how to use the new schema introduced in the 2023-05-01 API version to
// create deployments that set the model version and the TPM limits for standard deployments.
//
{
    "type": "Microsoft.CognitiveServices/accounts/deployments",
    "apiVersion": "2023-05-01",
    "name": "arm-je-aoai-test-resource/arm-je-std-deployment", // Update the reference to the parent Azure OpenAI resource
    "dependsOn": [
        "[resourceId('Microsoft.CognitiveServices/accounts', 'arm-je-aoai-test-resource')]" // Update the reference to the parent Azure OpenAI resource
    ],
    "sku": {
        "name": "Standard",
        "capacity": 10 // The deployment is created with a 10K TPM limit
    },
    "properties": {
        "model": {
            "format": "OpenAI",
            "name": "gpt-35-turbo",
            "version": "0613" // Version 0613 of gpt-35-turbo will be used
        }
    }
}
```

For more details, consult the [full Azure Resource Manager reference documentation](/azure/templates/microsoft.cognitiveservices/accounts/deployments?pivots=deployment-language-arm-template).

# [Bicep](#tab/bicep)

```bicep
//
// This Bicep template shows how to use the new schema introduced in the 2023-05-01 API version to
// create deployments that set the model version and the TPM limits for standard deployments.
//
resource arm_je_std_deployment 'Microsoft.CognitiveServices/accounts/deployments@2023-05-01' = {
  parent: arm_je_aoai_resource // Replace this with a reference to the parent Azure OpenAI resource
  name: 'arm-je-std-deployment'
  sku: {
    name: 'Standard'
    capacity: 10 // The deployment is created with a 10K TPM limit
  }
  properties: {
    model: {
      format: 'OpenAI'
      name: 'gpt-35-turbo'
      version: '0613' // gpt-35-turbo version 0613 will be used
    }
  }
}
```

For more details, consult the [full Bicep reference documentation](/azure/templates/microsoft.cognitiveservices/accounts/deployments?pivots=deployment-language-bicep).

# [Terraform](#tab/terraform)

```terraform
#
# This Terraform template shows how to use the new schema introduced in the 2023-05-01 API version to
# create deployments that set the model version and the TPM limits for standard deployments.
#
# The new schema is not yet available in the AzureRM provider (target v4.0), so this template uses the AzAPI
# provider, which provides a Terraform-compatible interface to the underlying ARM structures.
#
# For more details on these providers:
# AzureRM: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
# AzAPI:   https://registry.terraform.io/providers/azure/azapi/latest/docs
#
terraform {
  required_providers {
    azapi   = { source = "Azure/azapi" }
    azurerm = { source = "hashicorp/azurerm" }
  }
}

provider "azapi" {
  # Insert auth info here as necessary
}

provider "azurerm" {
  # Insert auth info here as necessary
  features {
  }
}

#
# To create a complete example, AzureRM is used to create a new resource group and Azure OpenAI resource
#
resource "azurerm_resource_group" "TERRAFORM-AOAI-TEST-GROUP" {
  name     = "TERRAFORM-AOAI-TEST-GROUP"
  location = "canadaeast"
}

resource "azurerm_cognitive_account" "TERRAFORM-AOAI-TEST-ACCOUNT" {
  name                  = "terraform-aoai-test-account"
  location              = "canadaeast"
  resource_group_name   = azurerm_resource_group.TERRAFORM-AOAI-TEST-GROUP.name
  kind                  = "OpenAI"
  sku_name              = "S0"
  custom_subdomain_name = "terraform-test-account-"
}

#
# AzAPI is used to create the deployment so that the TPM limit and model version can be set
#
resource "azapi_resource" "TERRAFORM-AOAI-STD-DEPLOYMENT" {
  type      = "Microsoft.CognitiveServices/accounts/deployments@2023-05-01"
  name      = "TERRAFORM-AOAI-STD-DEPLOYMENT"
  parent_id = azurerm_cognitive_account.TERRAFORM-AOAI-TEST-ACCOUNT.id

  body = jsonencode({
    sku = { # The sku object specifies the deployment type and limit in 2023-05-01
      name     = "Standard",
      capacity = 10 # This deployment is set with a 10K TPM limit
    },
    properties = {
      model = {
        format  = "OpenAI",
        name    = "gpt-35-turbo",
        version = "0613" # Deploy gpt-35-turbo version 0613
      }
    }
  })
}
```

For more details, consult the [full Terraform reference documentation](/azure/templates/microsoft.cognitiveservices/accounts/deployments?pivots=deployment-language-terraform).

---

## Resource deletion

When you attempt to delete an Azure OpenAI resource from the Azure portal, deletion is blocked if any deployments are still present, until the associated deployments are deleted. Deleting the deployments first allows quota allocations to be properly freed up so they can be used on new deployments.
