
Commit 9fcd5d6

Jill Grant authored
Merge pull request #1131 from sydneemayers/docs-editor/deployment-types-1730243482
Data Zone Standard deployment launch
2 parents 273e6e1 + 770723e commit 9fcd5d6

4 files changed: +62 -20 lines changed

articles/ai-services/openai/concepts/models.md

Lines changed: 11 additions & 0 deletions
@@ -405,6 +405,17 @@ For more information on Provisioned deployments, see our [Provisioned guidance](
405405

406406
This table doesn't include fine-tuning regional availability information. Consult the [fine-tuning section](#fine-tuning-models) for this information.
407407

408+
### Data zone standard model availability
409+
410+
#### Select customer access
411+
412+
In addition to the regions above, which are available to all Azure OpenAI customers, select pre-existing customers have been granted access to versions of GPT-4 in additional regions:
413+
414+
| Model | US Data zone region | EUR Data zone region |
415+
|---|:---|:---|
416+
| `gpt-4o`(2024-08-06) <br> `gpt-4o`(2024-05-13) | East US 2 <br> West US 3 <br> | Spain Central <br> West Europe |
417+
| `gpt-4o-mini` (2024-07-18) | East US 2 <br> West US 3 <br> | Spain Central <br> West Europe |
418+
408419
### Standard models by endpoint
409420

410421
# [Chat Completions](#tab/standard-chat-completions)

articles/ai-services/openai/how-to/deployment-types.md

Lines changed: 31 additions & 17 deletions
@@ -13,18 +13,24 @@ ms.author: mbullwin
1313

1414
# Azure OpenAI deployment types
1515

16-
Azure OpenAI provides customers with choices on the hosting structure that fits their business and usage patterns. The service offers two main types of deployment: **standard** and **provisioned**. Standard is offered with a global deployment option, routing traffic globally to provide higher throughput. Provisioned is also offered with a global deployment option, allowing customers to purchase and deploy provisioned throughput units across Azure global infrastructure. All deployments can perform the exact same inference operations, however the billing, scale and performance are substantially different. As part of your solution design, you will need to make two key decisions:
16+
Azure OpenAI provides customers with choices on the hosting structure that fits their business and usage patterns. The service offers two main types of deployments: **standard** and **provisioned**. For a given deployment type, customers can align their workloads with their data processing requirements by choosing an Azure geography (`Standard` or `Provisioned`), Microsoft specified data zone (`DataZone-Standard`), or Global (`Global-Standard` or `Global Provisioned-Managed`) processing options.
1717

18-
- **Data processing needs**: global vs. regional resources
19-
- **Call volume**: standard vs. provisioned
18+
All deployments can perform the exact same inference operations; however, the billing, scale, and performance are substantially different. As part of your solution design, you will need to make two key decisions:
2019

21-
## Global versus regional deployment types
20+
- **Data processing location**
21+
- **Call volume**
2222

23-
For standard and provisioned deployments, you have an option of two types of configurations within your resource – **global** or **regional**. Global standard is the recommended starting point.
23+
## Azure OpenAI Deployment Data Processing Locations
2424

25-
Global deployments leverage Azure's global infrastructure, dynamically route customer traffic to the data center with best availability for the customer’s inference requests. This means you will get the highest initial throughput limits and best model availability with Global while still providing our uptime SLA and low latency. For high volume workloads above the specified usage tiers on standard and global standard, you may experience increased latency variation. For customers that require the lower latency variance at large workload usage, we recommend purchasing provisioned throughput.
25+
For standard deployments, there are three deployment type options to choose from - global, data zone, and Azure geography. For provisioned deployments, there are two deployment type options to choose from - global and Azure geography. Global standard is the recommended starting point.
2626

27-
Our global deployments will be the first location for all new models and features. Customers with very large throughput requirements should consider our provisioned deployment offering.
27+
Global deployments leverage Azure's global infrastructure to dynamically route customer traffic to the data center with the best availability for the customer’s inference requests. This means you will get the highest initial throughput limits and best model availability with Global while still providing our uptime SLA and low latency. For high volume workloads above the specified usage tiers on standard and global standard, you may experience increased latency variation. For customers that require the lower latency variance at large workload usage, we recommend leveraging our provisioned deployment types.
28+
29+
Our global deployments will be the first location for all new models and features. Depending on call volume, customers with large volume and low latency variance requirements should consider our provisioned deployment types.
30+
31+
Data zone deployments leverage Azure's global infrastructure to dynamically route customer traffic to the data center with the best availability for the customer's inference requests within the data zone defined by Microsoft. Positioned between our Azure geography and Global deployment offerings, data zone deployments provide elevated quota limits while keeping data processing within the Microsoft specified data zone. Data stored at rest will continue to remain in the geography of the Azure OpenAI resource (e.g., for an Azure OpenAI resource created in the Sweden Central Azure region, the Azure geography is Sweden).
32+
33+
If the Azure OpenAI resource used in your Data Zone deployment is located in the United States, the data will be processed within the United States. If the Azure OpenAI resource used in your Data Zone deployment is located in a European Union Member Nation, the data will be processed within the European Union Member Nation geographies. For all Azure OpenAI service deployment types, any data stored at rest will continue to remain in the geography of the Azure OpenAI resource. Azure data processing and compliance commitments remain applicable.
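The processing rule described in the added paragraphs can be read as a small lookup. The sketch below is illustrative only and is not part of the commit: the function name `processing_scope`, the returned labels, and the region table are assumptions, and the region list is limited to the four regions named in the models.md table in this diff.

```python
# Illustrative sketch only; names and labels are hypothetical, not an Azure API.

# Regions taken from the data zone table added to models.md in this commit.
DATA_ZONE_BY_REGION = {
    "East US 2": "US",
    "West US 3": "US",
    "Spain Central": "EU",
    "West Europe": "EU",
}

# Deployment type labels as written in the deployment-types.md introduction.
GLOBAL_TYPES = {"Global-Standard", "Global Provisioned-Managed"}
DATA_ZONE_TYPES = {"DataZone-Standard"}
GEOGRAPHY_TYPES = {"Standard", "Provisioned"}


def processing_scope(deployment_type: str, resource_region: str) -> str:
    """Return a label for where inference data may be processed."""
    if deployment_type in GLOBAL_TYPES:
        return "global"
    if deployment_type in DATA_ZONE_TYPES:
        zone = DATA_ZONE_BY_REGION.get(resource_region)
        if zone is None:
            raise ValueError(f"no data zone known for region {resource_region!r}")
        return f"{zone} data zone"
    if deployment_type in GEOGRAPHY_TYPES:
        return "Azure geography of the resource"
    raise ValueError(f"unknown deployment type {deployment_type!r}")
```

The function models processing location only. For every deployment type, data stored at rest remains in the geography of the Azure OpenAI resource.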
2834

2935
## Deployment types
3036

@@ -42,16 +48,6 @@ Azure OpenAI offers three types of deployments. These provide a varied level of
4248
| **Sku Name in code** | `GlobalBatch` | `GlobalStandard` |`GlobalProvisionedManaged`| `Standard` | `ProvisionedManaged` |
4349
| **Billing model** | Pay-per-token |Pay-per-token |Hourly billing with optional purchase of monthly or yearly reservations| Pay-per-token |Hourly billing with optional purchase of monthly or yearly reservations|
4450

45-
## Provisioned
46-
47-
Provisioned deployments allow you to specify the amount of throughput you require in a deployment. The service then allocates the necessary model processing capacity and ensures it's ready for you. Throughput is defined in terms of provisioned throughput units (PTU) which is a normalized way of representing the throughput for your deployment. Each model-version pair requires different amounts of PTU to deploy and provide different amounts of throughput per PTU. Learn more from our [Provisioned throughput concepts article](../concepts/provisioned-throughput.md).
48-
49-
## Standard
50-
51-
Standard deployments provide a pay-per-call billing model on the chosen model. Provides the fastest way to get started as you only pay for what you consume. Models available in each region as well as throughput may be limited.
52-
53-
Standard deployments are optimized for low to medium volume workloads with high burstiness. Customers with high consistent volume may experience greater latency variability.
54-
5551
## Global standard
5652

5753
> [!IMPORTANT]
@@ -91,6 +87,24 @@ Key use cases include:
9187

9288
* **Marketing and Personalization:** Generate personalized content and recommendations at scale.
9389

90+
## Data zone standard
91+
> [!IMPORTANT]
92+
> Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location within the Microsoft specified data zone. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
93+
94+
Data zone standard deployments are available in the same Azure OpenAI resource as all other Azure OpenAI deployment types but allow you to leverage Azure global infrastructure to dynamically route traffic to the data center within the Microsoft defined data zone with the best availability for each request. Data zone standard provides higher default quotas than our Azure geography-based deployment types.
95+
96+
Customers with high consistent volume may experience greater latency variability. The threshold is set per model. See the [Quotas and limits](/azure/ai-services/openai/quotas-limits#usage-tiers) page to learn more. For workloads that require low latency variance at large volume, we recommend leveraging the provisioned deployment offerings.
97+
98+
## Standard
99+
100+
Standard deployments provide a pay-per-call billing model on the chosen model. Provides the fastest way to get started as you only pay for what you consume. Models available in each region as well as throughput may be limited.
101+
102+
Standard deployments are optimized for low to medium volume workloads with high burstiness. Customers with high consistent volume may experience greater latency variability.
103+
104+
## Provisioned
105+
106+
Provisioned deployments allow you to specify the amount of throughput you require in a deployment. The service then allocates the necessary model processing capacity and ensures it's ready for you. Throughput is defined in terms of provisioned throughput units (PTU) which is a normalized way of representing the throughput for your deployment. Each model-version pair requires different amounts of PTU to deploy and provide different amounts of throughput per PTU. Learn more from our [Provisioned throughput concepts article](../concepts/provisioned-throughput.md).
107+
94108
### How to disable access to global deployments in your subscription
95109

96110
Azure Policy helps to enforce organizational standards and to assess compliance at-scale. Through its compliance dashboard, it provides an aggregated view to evaluate the overall state of the environment, with the ability to drill down to the per-resource, per-policy granularity. It also helps to bring your resources to compliance through bulk remediation for existing resources and automatic remediation for new resources. [Learn more about Azure Policy and specific built-in controls for AI services](/azure/ai-services/security-controls-policy).

articles/ai-services/openai/quotas-limits.md

Lines changed: 15 additions & 3 deletions
@@ -108,6 +108,18 @@ The following sections provide you with a quick guide to the default quotas and
108108

109109
M = million | K = thousand
110110

111+
### gpt-4o data zone standard
112+
113+
| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
114+
|---|---|:---:|:---:|
115+
|`gpt-4o`|Enterprise agreement | 10 M | 60 K |
116+
|`gpt-4o-mini` | Enterprise agreement | 20 M | 120 K |
117+
|`gpt-4o` |Default | 300 K | 1.8 K |
118+
|`gpt-4o-mini` | Default | 1 M | 6 K |
119+
120+
M = million | K = thousand
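As a worked reading of the data zone standard table above (not part of the commit), the sketch below encodes the four rows and checks a planned workload against them. The names `DATA_ZONE_STANDARD_QUOTAS`, `fits_quota`, and `rpm_per_1k_tpm` are hypothetical; only the numbers come from the table.

```python
# Illustrative sketch only; the values mirror the table above, the names are hypothetical.

# (model, tier) -> (tokens per minute, requests per minute)
DATA_ZONE_STANDARD_QUOTAS = {
    ("gpt-4o", "Enterprise agreement"): (10_000_000, 60_000),
    ("gpt-4o-mini", "Enterprise agreement"): (20_000_000, 120_000),
    ("gpt-4o", "Default"): (300_000, 1_800),
    ("gpt-4o-mini", "Default"): (1_000_000, 6_000),
}


def fits_quota(model: str, tier: str, tokens_per_minute: int, requests_per_minute: int) -> bool:
    """Return True if the planned load stays within both limits for the row."""
    tpm_limit, rpm_limit = DATA_ZONE_STANDARD_QUOTAS[(model, tier)]
    return tokens_per_minute <= tpm_limit and requests_per_minute <= rpm_limit


def rpm_per_1k_tpm(model: str, tier: str) -> float:
    """Requests per minute allowed for each 1,000 tokens per minute of quota."""
    tpm_limit, rpm_limit = DATA_ZONE_STANDARD_QUOTAS[(model, tier)]
    return rpm_limit / (tpm_limit / 1000)
```

Every row in the table works out to 6 requests per minute per 1,000 tokens per minute, so a workload that averages fewer than about 167 tokens per request would hit the request limit before the token limit.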
121+
122+
111123
### gpt-4o standard
112124

113125
| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
@@ -121,14 +133,14 @@ M = million | K = thousand
121133

122134
#### Usage tiers
123135

124-
Global Standard deployments use Azure's global infrastructure, dynamically routing customer traffic to the data center with best availability for the customer’s inference requests. This enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage might see more variability in response latency.
136+
Global standard deployments use Azure's global infrastructure, dynamically routing customer traffic to the data center with best availability for the customer’s inference requests. Similarly, Data zone standard deployments allow you to leverage Azure global infrastructure to dynamically route traffic to the data center within the Microsoft defined data zone with the best availability for each request. This enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage might see more variability in response latency.
125137

126138
The Usage Limit determines the level of usage above which customers might see larger variability in response latency. A customer’s usage is defined per model and is the total tokens consumed across all deployments in all subscriptions in all regions for a given tenant.
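The per-model usage definition above amounts to a sum over every deployment a tenant owns. A minimal sketch, not part of the commit and assuming hypothetical record fields (`model`, `subscription`, `region`, `deployment`, `tokens`):

```python
# Illustrative sketch only; the record layout is an assumption, not an Azure export format.
from collections import defaultdict


def usage_by_model(records):
    """Sum tokens per model across all deployments, subscriptions, and regions."""
    totals = defaultdict(int)
    for record in records:
        # Subscription, region, and deployment are deliberately ignored:
        # usage is counted per model for the whole tenant.
        totals[record["model"]] += record["tokens"]
    return dict(totals)


sample = [
    {"model": "gpt-4o", "subscription": "sub-a", "region": "East US 2",
     "deployment": "dz-1", "tokens": 1_000},
    {"model": "gpt-4o", "subscription": "sub-b", "region": "West Europe",
     "deployment": "std-1", "tokens": 2_500},
    {"model": "gpt-4o-mini", "subscription": "sub-a", "region": "East US 2",
     "deployment": "dz-2", "tokens": 400},
]
tenant_usage = usage_by_model(sample)
```

The resulting per-model total is the figure that would be compared against the Usage Limit for that model.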
127139

128140
> [!NOTE]
129-
> Usage tiers only apply to standard and global standard deployment types. Usage tiers do not apply to global batch and provisioned throughput deployments.
141+
> Usage tiers only apply to standard, data zone standard, and global standard deployment types. Usage tiers do not apply to global batch and provisioned throughput deployments.
130142
131-
#### GPT-4o global standard & standard
143+
#### GPT-4o global standard, data zone standard, & standard
132144

133145
|Model| Usage Tiers per month |
134146
|----|----|

articles/ai-services/openai/whats-new.md

Lines changed: 5 additions & 0 deletions
@@ -20,6 +20,11 @@ This article provides a summary of the latest releases and major documentation u
2020

2121
## October 2024
2222

23+
### NEW data zone standard deployment type
24+
Data zone standard deployments are available in the same Azure OpenAI resource as all other Azure OpenAI deployment types but allow you to leverage Azure global infrastructure to dynamically route traffic to the data center within the Microsoft defined data zone with the best availability for each request. Data zone standard provides higher default quotas than our Azure geography-based deployment types. Data zone standard deployments are supported on `gpt-4o-2024-08-06`, `gpt-4o-2024-05-13`, and `gpt-4o-mini-2024-07-18` models.
25+
26+
For more information, see the [deployment types guide](https://aka.ms/aoai/docs/deployment-types).
27+
2328
### Global Batch GA
2429

2530
Azure OpenAI global batch is now generally available.
