
Commit 5158dd6

Merge pull request #275739 from mrbullwinkle/mrb_05_19_2024_global_endpoints
[Azure OpenAI] Global standard
2 parents 165c978 + a31150f commit 5158dd6

File tree

8 files changed: +139 additions, −6 deletions

articles/ai-services/openai/how-to/create-resource.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ manager: nitinme
ms.service: azure-ai-openai
ms.custom: devx-track-azurecli, build-2023, build-2023-dataai, devx-track-azurepowershell
ms.topic: how-to
- ms.date: 08/25/2023
+ ms.date: 05/20/2024
zone_pivot_groups: openai-create-resource
author: mrbullwinkle
ms.author: mbullwin
articles/ai-services/openai/how-to/deployment-types.md (new file; path per the toc.yml entry below)

Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
---
title: Understanding Azure OpenAI Service deployment types
titleSuffix: Azure AI services
description: Learn how to use Azure OpenAI deployment types | Global-Standard | Standard | Provisioned.
#services: cognitive-services
author: mrbullwinkle
manager: nitinme
ms.service: azure-ai-openai
ms.topic: how-to
ms.date: 05/19/2024
ms.author: mbullwin
---

# Azure OpenAI deployment types

Azure OpenAI provides customers with choices on the hosting structure that fits their business and usage patterns. The service offers two main types of deployment: **standard** and **provisioned**. Standard is offered with a global deployment option, routing traffic globally to provide higher throughput. All deployments can perform the same inference operations; however, billing, scale, and performance differ substantially. As part of your solution design, you need to make two key decisions:

- **Data residency needs**: global vs. regional resources
- **Call volume**: standard vs. provisioned

## Global versus regional deployment types

For standard deployments you have a choice of two configurations within your resource: **global** or **regional**. Global standard is the recommended starting point for development and experimentation. Global deployments use Azure's global infrastructure to dynamically route customer traffic to the data center with the best availability for each inference request. Global deployments offer higher initial throughput limits, though your latency may vary at high usage levels. For customers that require lower latency variance at large workloads, we recommend purchasing provisioned throughput.

Our global deployments will be the first location for all new models and features. Customers with very large throughput requirements should consider our provisioned deployment offering.
## Deployment types

Azure OpenAI offers three types of deployments, each providing a different level of capability with trade-offs in throughput, SLAs, and price. Below is a summary of the options followed by a deeper description of each.

| **Offering** | **Global-Standard** <sup>**1**</sup> | **Standard** | **Provisioned** |
|---|---|---|---|
| **Best suited for** | Applications that don't require data residency. Recommended starting place for customers. | For customers with data residency requirements. Optimized for low to medium volume. | Real-time scoring for large consistent volume. Includes the highest commitments and limits. |
| **How it works** | Traffic may be routed anywhere in the world | | |
| **Getting started** | [Model deployment](./create-resource.md) | [Model deployment](./create-resource.md) | [Provisioned onboarding](./provisioned-throughput-onboarding.md) |
| **Cost** | [Baseline](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) | [Regional pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) | May experience cost savings for consistent usage |
| **What you get** | Easy access to all new models with highest default pay-per-call limits.<br><br>Customers with high volume usage may see higher latency variability. | Easy access with [SLA on availability](https://azure.microsoft.com/support/legal/sla/). Optimized for low to medium volume workloads with high burstiness.<br><br>Customers with high consistent volume may experience greater latency variability. | Regional access with very high and predictable throughput. Determine throughput per PTU using the provided [capacity calculator](./provisioned-throughput-onboarding.md#estimate-provisioned-throughput-and-cost). |
| **What you don't get** | Data residency guarantees | High volume with consistent low latency | Pay-per-call flexibility |
| **Per-call latency** | Optimized for real-time calling and low to medium volume usage. Customers with high volume usage may see higher latency variability. Threshold set per model. | Optimized for real-time calling and low to medium volume usage. Customers with high volume usage may see higher latency variability. Threshold set per model. | Optimized for real-time. |
| **SKU name in code** | `GlobalStandard` | `Standard` | `ProvisionedManaged` |
| **Billing model** | Pay-per-token | Pay-per-token | Monthly commitments |

<sup>**1**</sup> The Global-Standard deployment type is currently in preview.
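The SKU names in the table above are the values you pass when creating a deployment programmatically. As a minimal illustration, the helper below is hypothetical (not part of any SDK) and simply maps the offerings to their in-code SKU names:

```python
# Hypothetical helper: map the deployment offerings in the table above to
# the SKU names used in code when creating a deployment.
SKU_NAMES = {
    "global-standard": "GlobalStandard",
    "standard": "Standard",
    "provisioned": "ProvisionedManaged",
}

def sku_name(offering: str) -> str:
    """Return the in-code SKU name for a deployment offering."""
    try:
        return SKU_NAMES[offering.lower()]
    except KeyError:
        raise ValueError(f"Unknown deployment offering: {offering!r}")

print(sku_name("global-standard"))  # GlobalStandard
```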

## Provisioned

Provisioned deployments allow you to specify the amount of throughput you require in a deployment. The service then allocates the necessary model processing capacity and ensures it's ready for you. Throughput is defined in terms of provisioned throughput units (PTU), which is a normalized way of representing the throughput for your deployment. Each model-version pair requires a different number of PTUs to deploy and provides a different amount of throughput per PTU. Learn more from our [Provisioned throughput concepts article](../concepts/provisioned-throughput.md).
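As a back-of-the-envelope sketch of PTU sizing: the throughput-per-PTU figure below is a made-up placeholder, not a real model value — use the capacity calculator linked above for actual numbers.

```python
import math

def ptus_needed(required_tpm: int, tpm_per_ptu: int, min_ptus: int = 1) -> int:
    """Estimate PTUs needed for a target tokens-per-minute (TPM) load.

    tpm_per_ptu varies by model-version pair; the value used in the
    example call below is illustrative only.
    """
    return max(min_ptus, math.ceil(required_tpm / tpm_per_ptu))

# Example: a 250,000 TPM workload against a hypothetical 2,500 TPM per PTU.
print(ptus_needed(250_000, 2_500))  # 100
```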

## Standard

Standard deployments provide a pay-per-call billing model on the chosen model. This option provides the fastest way to get started, as you only pay for what you consume. Models available in each region, as well as throughput, may be limited.

Standard deployments are optimized for low to medium volume workloads with high burstiness. Customers with high consistent volume may experience greater latency variability.

## Global standard (preview)

Global deployments are available in the same Azure OpenAI resources as non-global offers but allow you to use Azure's global infrastructure to dynamically route traffic to the data center with the best availability for each request. Global standard provides the highest default quota for new models and eliminates the need to load balance across multiple resources.

This deployment type is optimized for low to medium volume workloads with high burstiness. Customers with high consistent volume may experience greater latency variability. The threshold is set per model. See the [quota page to learn more](./quota.md).

For customers that require lower latency variance at large workloads, we recommend purchasing provisioned throughput.

### How to disable access to global deployments in your subscription

Azure Policy helps to enforce organizational standards and to assess compliance at scale. Through its compliance dashboard, it provides an aggregated view to evaluate the overall state of the environment, with the ability to drill down to per-resource, per-policy granularity. It also helps to bring your resources to compliance through bulk remediation for existing resources and automatic remediation for new resources. [Learn more about Azure Policy and specific built-in controls for AI services](/azure/ai-services/security-controls-policy).

You can use the following policy to disable access to Azure OpenAI global standard deployments.

```json
{
    "mode": "All",
    "policyRule": {
        "if": {
            "allOf": [
                {
                    "field": "type",
                    "equals": "Microsoft.CognitiveServices/accounts/deployments"
                },
                {
                    "field": "Microsoft.CognitiveServices/accounts/deployments/sku.name",
                    "equals": "GlobalStandard"
                }
            ]
        },
        "then": {
            "effect": "deny"
        }
    }
}
```

The `deny` effect in the `then` block is what blocks creation of deployments that match both conditions; without a `then` block, a policy rule has no effect.
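To see what the rule matches, here's a minimal local sketch — not an Azure Policy engine, just a check of the two `allOf` conditions against hypothetical resource descriptions:

```python
# The two `allOf` conditions from the policy rule above.
CONDITIONS = [
    {"field": "type",
     "equals": "Microsoft.CognitiveServices/accounts/deployments"},
    {"field": "Microsoft.CognitiveServices/accounts/deployments/sku.name",
     "equals": "GlobalStandard"},
]

def matches(resource: dict) -> bool:
    """True when every condition's field equals the resource's value."""
    return all(resource.get(c["field"]) == c["equals"] for c in CONDITIONS)

# Hypothetical deployment descriptions, keyed by the policy field aliases.
global_deployment = {
    "type": "Microsoft.CognitiveServices/accounts/deployments",
    "Microsoft.CognitiveServices/accounts/deployments/sku.name": "GlobalStandard",
}
regional_deployment = {
    "type": "Microsoft.CognitiveServices/accounts/deployments",
    "Microsoft.CognitiveServices/accounts/deployments/sku.name": "Standard",
}

print(matches(global_deployment), matches(regional_deployment))  # True False
```

Only the `GlobalStandard` deployment is matched (and therefore denied); `Standard` and `ProvisionedManaged` deployments are unaffected.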

## Deploy models

:::image type="content" source="../media/deployment-types/deploy-models.png" alt-text="Screenshot that shows the model deployment dialog in Azure OpenAI Studio with three deployment types highlighted." lightbox="../media/deployment-types/deploy-models.png":::

To learn about creating resources and deploying models, refer to the [resource creation guide](./create-resource.md).

## See also

- [Quotas & limits](./quota.md)
- [Provisioned throughput units (PTU) onboarding](./provisioned-throughput-onboarding.md)
- [Provisioned throughput units (PTU) getting started](./provisioned-get-started.md)

articles/ai-services/openai/includes/create-resource-cli.md

Lines changed: 5 additions & 2 deletions
@@ -7,7 +7,7 @@ manager: nitinme
ms.service: azure-ai-openai
ms.custom: devx-track-azurecli
ms.topic: include
- ms.date: 08/25/2023
+ ms.date: 05/20/2024
---

## Prerequisites

@@ -80,7 +80,7 @@ az cognitiveservices account keys list \

## Deploy a model

To deploy a model, use the [az cognitiveservices account deployment create](/cli/azure/cognitiveservices/account/deployment?view=azure-cli-latest&preserve-view=true#az-cognitiveservices-account-deployment-create) command. In the following example, you deploy an instance of the `text-embedding-ada-002` model and give it the name _MyModel_. When you try the example, update the code to use your values for the resource group and resource. You don't need to change the `model-version`, `model-format` or `sku-capacity`, and `sku-name` values.

```azurecli
az cognitiveservices account deployment create \
@@ -94,6 +94,9 @@ az cognitiveservices account deployment create \
--sku-name "Standard"
```

`--sku-name` accepts the following deployment types: `Standard`, `GlobalStandard`, and `ProvisionedManaged`. Learn more about [deployment type options](../how-to/deployment-types.md).

> [!IMPORTANT]
> When you access the model via the API, you need to refer to the deployment name rather than the underlying model name in API calls, which is one of the [key differences](../how-to/switching-endpoints.yml) between OpenAI and Azure OpenAI. OpenAI only requires the model name. Azure OpenAI always requires the deployment name, even when using the model parameter. In our docs, we often have examples where deployment names are represented as identical to model names to help indicate which model works with a particular API endpoint. Ultimately your deployment names can follow whatever naming convention is best for your use case.
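The note above can be made concrete: in the Azure OpenAI REST API, the deployment name appears in the request path. A minimal sketch of building such a URL — the endpoint, deployment name, and API version here are placeholders:

```python
def embeddings_url(endpoint: str, deployment: str, api_version: str) -> str:
    """Build an Azure OpenAI embeddings request URL. The path segment is
    the deployment name (for example 'MyModel'), not the model name."""
    return (f"{endpoint}/openai/deployments/{deployment}"
            f"/embeddings?api-version={api_version}")

print(embeddings_url("https://contoso.openai.azure.com", "MyModel", "2024-02-01"))
# https://contoso.openai.azure.com/openai/deployments/MyModel/embeddings?api-version=2024-02-01
```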

articles/ai-services/openai/includes/create-resource-portal.md

Lines changed: 2 additions & 1 deletion
@@ -6,7 +6,7 @@ description: Learn how to use the Azure portal to create an Azure OpenAI resourc
manager: nitinme
ms.service: azure-ai-openai
ms.topic: include
- ms.date: 01/30/2024
+ ms.date: 05/20/2024
---

## Prerequisites

@@ -111,6 +111,7 @@ To deploy a model, follow these steps:
|---|---|
| **Select a model** | Model availability varies by region. For a list of available models per region, see [Model summary table and region availability](../concepts/models.md#model-summary-table-and-region-availability). |
| **Deployment name** | Choose a name carefully. The deployment name is used in your code to call the model by using the client libraries and the REST APIs. |
| **Deployment type** | **Standard**, **Global-Standard**, or **Provisioned-Managed**. Learn more about [deployment type options](../how-to/deployment-types.md). |
| **Advanced options** (Optional) | You can set optional advanced settings, as needed for your resource. <br> - For the **Content Filter**, assign a content filter to your deployment.<br> - For the **Tokens per Minute Rate Limit**, adjust the Tokens per Minute (TPM) to set the effective rate limit for your deployment. You can modify this value at any time by using the [**Quotas**](../how-to/quota.md) menu. [**Dynamic Quota**](../how-to/dynamic-quota.md) allows you to take advantage of more quota when extra capacity is available. |

5. Select a model from the dropdown list.

articles/ai-services/openai/includes/create-resource-powershell.md

Lines changed: 3 additions & 1 deletion
@@ -7,7 +7,7 @@ manager: nitinme
ms.service: azure-ai-openai
ms.custom: devx-track-azurepowershell
ms.topic: include
- ms.date: 08/28/2023
+ ms.date: 05/20/2024
---

## Prerequisites

@@ -89,6 +89,8 @@ $sku = New-Object -TypeName "Microsoft.Azure.Management.CognitiveServices.Models
New-AzCognitiveServicesAccountDeployment -ResourceGroupName OAIResourceGroup -AccountName MyOpenAIResource -Name MyModel -Properties $properties -Sku $sku
```

The `Name` property of the `$sku` variable accepts the following deployment types: `Standard`, `GlobalStandard`, and `ProvisionedManaged`. Learn more about [deployment type options](../how-to/deployment-types.md).

> [!IMPORTANT]
> When you access the model via the API, you need to refer to the deployment name rather than the underlying model name in API calls, which is one of the [key differences](../how-to/switching-endpoints.yml) between OpenAI and Azure OpenAI. OpenAI only requires the model name. Azure OpenAI always requires the deployment name, even when using the model parameter. In our docs, we often have examples where deployment names are represented as identical to model names to help indicate which model works with a particular API endpoint. Ultimately your deployment names can follow whatever naming convention is best for your use case.
Binary file added (image, 38.1 KB)

articles/ai-services/openai/quotas-limits.md

Lines changed: 26 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@ ms.custom:
1010
- ignite-2023
1111
- references_regions
1212
ms.topic: conceptual
13-
ms.date: 02/27/2024
13+
ms.date: 05/19/2024
1414
ms.author: mbullwin
1515
---
1616

@@ -50,6 +50,31 @@ The following sections provide you with a quick guide to the default quotas and
5050

5151
[!INCLUDE [Quota](includes/model-matrix/quota.md)]
5252

## gpt-4o rate limits

`gpt-4o` introduces rate limit tiers with higher limits for certain customer types.

### gpt-4o global standard

> [!NOTE]
> The [global standard model deployment type](./how-to/deployment-types.md#deployment-types) is currently in public preview.

| Tier | Quota limit in tokens per minute (TPM) | Requests per minute |
|---|:---:|:---:|
| Enterprise agreement | 10 M | 60 K |
| Default | 450 K | 2.7 K |

M = million | K = thousand

### gpt-4o standard

| Tier | Quota limit in tokens per minute (TPM) | Requests per minute |
|---|:---:|:---:|
| Enterprise agreement | 1 M | 6 K |
| Default | 150 K | 900 |

M = million | K = thousand
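The tables above can be applied mechanically: given an expected request rate and average tokens per request, check whether a tier's TPM and RPM limits are both respected. The helper is illustrative; the limits in the example calls are the default gpt-4o standard tier from the table.

```python
def within_limits(requests_per_min: int, avg_tokens_per_request: int,
                  tpm_limit: int, rpm_limit: int) -> bool:
    """Check a workload against a tier's TPM and RPM quota limits."""
    tokens_per_min = requests_per_min * avg_tokens_per_request
    return tokens_per_min <= tpm_limit and requests_per_min <= rpm_limit

# Default gpt-4o standard tier: 150 K TPM, 900 RPM.
print(within_limits(600, 200, 150_000, 900))  # True  (120 K TPM, 600 RPM)
print(within_limits(600, 300, 150_000, 900))  # False (180 K TPM > 150 K)
```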

### General best practices to remain within rate limits

To minimize issues related to rate limits, it's a good idea to use the following techniques:
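One widely used technique for staying within rate limits, retrying with exponential backoff after a 429 response, can be sketched as follows (an illustrative sketch, not part of any SDK; `RateLimitError` stands in for an HTTP 429):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 response from the service."""

def call_with_backoff(request, max_retries=5, sleep=time.sleep):
    """Retry `request` on RateLimitError with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return request()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Wait 2^attempt seconds plus jitter, capped at 60 seconds.
            sleep(min(2 ** attempt + random.random(), 60))

# Demo with a fake request that succeeds on the third try.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError()
    return "ok"

print(call_with_backoff(flaky, sleep=lambda s: None))  # ok
```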

articles/ai-services/openai/toc.yml

Lines changed: 3 additions & 0 deletions
@@ -8,6 +8,9 @@ items:
    href: overview.md
  - name: Quotas and limits
    href: quotas-limits.md
  - name: Deployment types
    href: ./how-to/deployment-types.md
    displayName: global, Global, globalstandard, global-standard, Global-Standard, standard, provisioned
  - name: Models
    href: ./concepts/models.md
  - name: Model retirements
