Commit 8b28e79

Merge pull request #3998 from mrbullwinkle/mrb_04_08_2025_quota_capacity_updates
[Azure OpenAI] Add info on capacity API
2 parents 1663575 + 2b8fe3f commit 8b28e79

2 files changed: +40 -3 lines changed

articles/ai-services/openai/quotas-limits.md

Lines changed: 39 additions & 2 deletions
@@ -55,8 +55,6 @@ The following sections provide you with a quick guide to the default quotas and
 
 <sup>1</sup> Our current APIs allow up to 10 custom headers, which are passed through the pipeline, and returned. Some customers now exceed this header count resulting in HTTP 431 errors. There's no solution for this error, other than to reduce header volume. **In future API versions we will no longer pass through custom headers**. We recommend customers not depend on custom headers in future system architectures.
 
-## Regional quota limits
-
 > [!NOTE]
 > Quota limits are subject to change.
 
@@ -274,6 +272,45 @@ Quota increase requests can be submitted via the [quota increase request form](h
 
 For other rate limits, [submit a service request](../cognitive-services-support-options.md?context=/azure/ai-services/openai/context/context).
 
+## Regional quota capacity limits
+
+You can view quota availability by region for your subscription in the [Azure AI Foundry portal](https://ai.azure.com/resource/quota).
+
+Alternatively to view quota capacity by region for a specific model/version you can query the [capacity API](/rest/api/aiservices/accountmanagement/model-capacities/list) for your subscription. Provide a `subscriptionId`, `model_name`, and `model_version` and the API will return the available capacity for that model across all regions, and deployment types for your subscription.
+
+> [!NOTE]
+> Currently both the Azure AI Foundry portal and the capacity API will return quota/capacity information for models that are [retired](./concepts/model-retirements.md) and no longer available.
+
+[API Reference](/rest/api/aiservices/accountmanagement/model-capacities/list)
+
+```python
+import requests
+import json
+from azure.identity import DefaultAzureCredential
+
+subscriptionId = "Replace with your subscription ID" #replace with your subscription ID
+model_name = "gpt-4o" # Example value, replace with model name
+model_version = "2024-08-06" # Example value, replace with model version
+
+token_credential = DefaultAzureCredential()
+token = token_credential.get_token('https://management.azure.com/.default')
+headers = {'Authorization': 'Bearer ' + token.token}
+
+url = f"https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.CognitiveServices/modelCapacities"
+params = {
+    "api-version": "2024-06-01-preview",
+    "modelFormat": "OpenAI",
+    "modelName": model_name,
+    "modelVersion": model_version
+}
+
+response = requests.get(url, params=params, headers=headers)
+model_capacity = response.json()
+
+print(json.dumps(model_capacity, indent=2))
+
+```
+
 ## Next steps
 
 Explore how to [manage quota](./how-to/quota.md) for your Azure OpenAI deployments.
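
Note on the snippet added to quotas-limits.md: it prints the raw JSON returned by the capacity API. As a rough sketch of how that output might be condensed into a per-region view, the helper below groups available capacity by region and deployment type. It assumes the response has a top-level `value` list whose items carry `location`, `properties.skuName`, and `properties.availableCapacity`; those field names are an assumption to verify against the linked API reference. `summarize_capacity` and `sample` are hypothetical names, and the sample values are made up for illustration.

```python
from collections import defaultdict


def summarize_capacity(payload):
    """Group available capacity by region, then by deployment type (SKU).

    Assumes each item in payload["value"] has a "location" and a
    "properties" object with "skuName" and "availableCapacity".
    Items missing any of these fields are skipped.
    """
    summary = defaultdict(dict)
    for item in payload.get("value", []):
        props = item.get("properties") or {}
        location = item.get("location")
        sku = props.get("skuName")
        capacity = props.get("availableCapacity")
        if location is None or sku is None or capacity is None:
            continue
        summary[location][sku] = capacity
    return dict(summary)


# Hypothetical response fragment, for illustration only (not real quota data).
sample = {
    "value": [
        {"location": "eastus",
         "properties": {"skuName": "Standard", "availableCapacity": 300}},
        {"location": "eastus",
         "properties": {"skuName": "GlobalStandard", "availableCapacity": 450}},
        {"location": "westus",
         "properties": {"skuName": "Standard", "availableCapacity": 0}},
    ]
}

# Print one line per region with its per-SKU capacity.
for region, skus in sorted(summarize_capacity(sample).items()):
    print(region, skus)
```

With the committed snippet, the same helper could be called as `summarize_capacity(model_capacity)`, ideally after checking `response.status_code` so that an error body is not mistaken for an empty capacity list.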

articles/ai-services/openai/whats-new.md

Lines changed: 1 addition & 1 deletion
@@ -771,7 +771,7 @@ Azure OpenAI Service now supports speech to text APIs powered by OpenAI's Whispe
 
 ### Regional quota limits increases
 
-- Increases to the max default quota limits for certain models and regions. Migrating workloads to [these models and regions](./quotas-limits.md#regional-quota-limits) will allow you to take advantage of higher Tokens per minute (TPM).
+- Increases to the max default quota limits for certain models and regions. Migrating workloads to [these models and regions](./quotas-limits.md) will allow you to take advantage of higher Tokens per minute (TPM).
 
 ## August 2023
 