
Commit 46b6dca

Merge pull request #245710 from mrbullwinkle/mrb_07_20_2023_quota_rbac
[Azure AI] [Azure OpenAI] update info on RBAC prereqs
2 parents: 363b8b8 + e5d9430

File tree

  • articles/ai-services/openai/how-to

1 file changed: +6 −1 lines changed

articles/ai-services/openai/how-to/quota.md

Lines changed: 6 additions & 1 deletion
```diff
@@ -8,14 +8,19 @@ manager: nitinme
 ms.service: cognitive-services
 ms.subservice: openai
 ms.topic: how-to
-ms.date: 07/18/2023
+ms.date: 07/20/2023
 ms.author: mbullwin
 ---
 
 # Manage Azure OpenAI Service quota
 
 Quota provides the flexibility to actively manage the allocation of rate limits across the deployments within your subscription. This article walks through the process of managing your Azure OpenAI quota.
 
+## Prerequisites
+
+> [!IMPORTANT]
+> Quota requires the **Cognitive Services Usages Reader** role. This role provides the minimal access necessary to view quota usage across an Azure subscription. This role can be found in the Azure portal under **Subscriptions** > **Access control (IAM)** > **Add role assignment** > search for **Cognitive Services Usages Reader**.
+
 ## Introduction to quota
 
 Azure OpenAI's quota feature enables assignment of rate limits to your deployments, up to a global limit called your "quota." Quota is assigned to your subscription on a per-region, per-model basis in units of **Tokens-per-Minute (TPM)**. When you onboard a subscription to Azure OpenAI, you'll receive default quota for most available models. Then, you'll assign TPM to each deployment as it is created, and the available quota for that model will be reduced by that amount. You can continue to create deployments and assign them TPM until you reach your quota limit. Once that happens, you can only create new deployments of that model by reducing the TPM assigned to other deployments of the same model (thus freeing TPM for use), or by requesting and being approved for a model quota increase in the desired region.
```
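The quota accounting the introduction describes can be sketched as a toy model (a hypothetical `QuotaPool` class for illustration, not the Azure API): each deployment draws TPM from a per-region, per-model pool, and a new deployment can only be created if enough unassigned TPM remains.

```python
# Illustrative model of Azure OpenAI quota accounting (not the real
# Azure API). Quota is tracked per (region, model) in Tokens-per-Minute.

class QuotaPool:
    def __init__(self, limit_tpm: int):
        self.limit_tpm = limit_tpm             # total quota for this region/model
        self.deployments: dict[str, int] = {}  # deployment name -> assigned TPM

    @property
    def available_tpm(self) -> int:
        return self.limit_tpm - sum(self.deployments.values())

    def create_deployment(self, name: str, tpm: int) -> bool:
        """Assign TPM to a new deployment if enough quota remains."""
        if tpm > self.available_tpm:
            return False  # free TPM elsewhere or request a quota increase
        self.deployments[name] = tpm
        return True

    def delete_deployment(self, name: str) -> None:
        """Deleting a deployment returns its TPM to the pool."""
        self.deployments.pop(name, None)

pool = QuotaPool(limit_tpm=240_000)                # example quota limit
assert pool.create_deployment("prod", 120_000)
assert pool.create_deployment("dev", 100_000)
assert not pool.create_deployment("test", 40_000)  # only 20,000 TPM left
pool.delete_deployment("dev")                      # frees 100,000 TPM
assert pool.create_deployment("test", 40_000)
print(pool.available_tpm)                          # 80000
```

Reducing TPM on (or deleting) an existing deployment immediately frees quota for new deployments of the same model in the same region, which is the workaround the article suggests before filing a quota increase request.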

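Besides the portal steps in the added **Prerequisites** note, the same role can be granted with the Azure CLI's `az role assignment create` command; a minimal sketch, assuming placeholder values for the subscription ID and user (replace both with your own):

```shell
# Grant the Cognitive Services Usages Reader role at subscription scope.
# The assignee and subscription ID below are placeholders.
az role assignment create \
  --assignee "user@example.com" \
  --role "Cognitive Services Usages Reader" \
  --scope "/subscriptions/00000000-0000-0000-0000-000000000000"
```

Scoping the assignment to the subscription matches the note above, since quota usage is viewed across the whole subscription.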