articles/ai-services/openai/concepts/models.md
Lines changed: 11 additions & 0 deletions
@@ -405,6 +405,17 @@ For more information on Provisioned deployments, see our [Provisioned guidance](
405
405
406
406
This table doesn't include fine-tuning regional availability information. Consult the [fine-tuning section](#fine-tuning-models) for this information.
407
407
408
+
### Data zone standard model availability
409
+
410
+
#### Select customer access
411
+
412
+
In addition to the regions above, which are available to all Azure OpenAI customers, select pre-existing customers have been granted access to versions of GPT-4 in additional regions:
413
+
414
+
| Model | US Data zone region | EUR Data zone region |
415
+
|---|:---|:---|
416
+
|`gpt-4o` (2024-08-06) <br> `gpt-4o` (2024-05-13) | East US 2 <br> West US 3 | Spain Central <br> West Europe |
417
+
|`gpt-4o-mini` (2024-07-18) | East US 2 <br> West US 3 | Spain Central <br> West Europe |
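
As an illustration only, the availability table above can be expressed as a small lookup. This is a minimal sketch: the dictionary layout and the helper names `data_zone_for` and `is_available` are hypothetical and not part of any Azure SDK. The regions, models, and versions are copied from the table.

```python
# Data zone regions and model versions, copied from the table above.
DATA_ZONE_REGIONS = {
    "US": ["East US 2", "West US 3"],
    "EUR": ["Spain Central", "West Europe"],
}

MODEL_VERSIONS = {
    "gpt-4o": ["2024-08-06", "2024-05-13"],
    "gpt-4o-mini": ["2024-07-18"],
}


def data_zone_for(region):
    """Return the data zone ("US" or "EUR") listing this region, or None."""
    for zone, regions in DATA_ZONE_REGIONS.items():
        if region in regions:
            return zone
    return None


def is_available(model, version, region):
    """True if the table lists this model version in this region."""
    return (
        version in MODEL_VERSIONS.get(model, [])
        and data_zone_for(region) is not None
    )


print(data_zone_for("West US 3"))
print(is_available("gpt-4o-mini", "2024-07-18", "Spain Central"))
```

Because every listed model appears in every listed region, the check reduces to two membership tests. A table with per-model differences would need a nested mapping instead.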
articles/ai-services/openai/how-to/deployment-types.md
Lines changed: 31 additions & 17 deletions
@@ -13,18 +13,24 @@ ms.author: mbullwin
13
13
14
14
# Azure OpenAI deployment types
15
15
16
-
Azure OpenAI provides customers with choices on the hosting structure that fits their business and usage patterns. The service offers two main types of deployment: **standard** and **provisioned**. Standard is offered with a global deployment option, routing traffic globally to provide higher throughput. Provisioned is also offered with a global deployment option, allowing customers to purchase and deploy provisioned throughput units across Azure global infrastructure. All deployments can perform the exact same inference operations, however the billing, scale and performance are substantially different. As part of your solution design, you will need to make two key decisions:
16
+
Azure OpenAI provides customers with choices on the hosting structure that fits their business and usage patterns. The service offers two main types of deployments: **standard** and **provisioned**. For a given deployment type, customers can align their workloads with their data processing requirements by choosing an Azure geography (`Standard` or `Provisioned`), Microsoft-specified data zone (`DataZone-Standard`), or Global (`Global-Standard` or `Global Provisioned-Managed`) processing option.
17
17
18
-
-**Data processing needs**: global vs. regional resources
19
-
-**Call volume**: standard vs. provisioned
18
+
All deployments can perform the exact same inference operations; however, the billing, scale, and performance are substantially different. As part of your solution design, you need to make two key decisions:
20
19
21
-
## Global versus regional deployment types
20
+
- **Data processing location**
21
+
- **Call volume**
22
22
23
-
For standard and provisioned deployments, you have an option of two types of configurations within your resource – **global** or **regional**. Global standard is the recommended starting point.
23
+
## Azure OpenAI Deployment Data Processing Locations
24
24
25
-
Global deployments leverage Azure's global infrastructure, dynamically route customer traffic to the data center with best availability for the customer’s inference requests. This means you will get the highest initial throughput limits and best model availability with Global while still providing our uptime SLA and low latency. For high volume workloads above the specified usage tiers on standard and global standard, you may experience increased latency variation. For customers that require the lower latency variance at large workload usage, we recommend purchasing provisioned throughput.
25
+
For standard deployments, there are three deployment type options to choose from: global, data zone, and Azure geography. For provisioned deployments, there are two: global and Azure geography. Global standard is the recommended starting point.
26
26
27
-
Our global deployments will be the first location for all new models and features. Customers with very large throughput requirements should consider our provisioned deployment offering.
27
+
Global deployments leverage Azure's global infrastructure to dynamically route customer traffic to the data center with the best availability for the customer’s inference requests. This means you get the highest initial throughput limits and best model availability with Global, while the uptime SLA and low latency still apply. For high-volume workloads above the specified usage tiers on standard and global standard, you may experience increased latency variation. For customers that require lower latency variance at large workload usage, we recommend our provisioned deployment types.
28
+
29
+
Our global deployments will be the first location for all new models and features. Customers with large call volumes and low latency variance requirements should consider our provisioned deployment types.
30
+
31
+
Data zone deployments leverage Azure's global infrastructure to dynamically route customer traffic to the data center with the best availability for the customer's inference requests within the data zone defined by Microsoft. Positioned between our Azure geography and Global deployment offerings, data zone deployments provide elevated quota limits while keeping data processing within the Microsoft specified data zone. Data stored at rest will continue to remain in the geography of the Azure OpenAI resource (e.g., for an Azure OpenAI resource created in the Sweden Central Azure region, the Azure geography is Sweden).
32
+
33
+
If the Azure OpenAI resource used in your Data Zone deployment is located in the United States, the data will be processed within the United States. If the Azure OpenAI resource used in your Data Zone deployment is located in a European Union Member Nation, the data will be processed within the European Union Member Nation geographies. For all Azure OpenAI service deployment types, any data stored at rest will continue to remain in the geography of the Azure OpenAI resource. Azure data processing and compliance commitments remain applicable.
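
The two decisions described above (deployment type and data processing location) can be sketched as a lookup. This is a minimal illustration, not an Azure API: the `Processing` enum and `pick_option` helper are hypothetical, and the returned strings are the option labels used in the introduction of this article, which may differ from the SKU names used in code.

```python
from enum import Enum


class Processing(Enum):
    GLOBAL = "global"
    DATA_ZONE = "data zone"
    GEOGRAPHY = "Azure geography"


# Option labels as written in this article's introduction.
# Provisioned has no data zone option in the text above.
OPTIONS = {
    ("standard", Processing.GLOBAL): "Global-Standard",
    ("standard", Processing.DATA_ZONE): "DataZone-Standard",
    ("standard", Processing.GEOGRAPHY): "Standard",
    ("provisioned", Processing.GLOBAL): "Global Provisioned-Managed",
    ("provisioned", Processing.GEOGRAPHY): "Provisioned",
}


def pick_option(deployment_type, processing):
    """Map (deployment type, processing location) to an option label."""
    key = (deployment_type.lower(), processing)
    if key not in OPTIONS:
        raise ValueError(
            f"No {processing.value} option for {deployment_type} deployments"
        )
    return OPTIONS[key]


print(pick_option("standard", Processing.DATA_ZONE))
```

Raising an error for the missing provisioned data zone combination keeps the sketch aligned with the text, which lists only global and Azure geography for provisioned deployments.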
28
34
29
35
## Deployment types
30
36
@@ -42,16 +48,6 @@ Azure OpenAI offers three types of deployments. These provide a varied level of
42
48
|**Sku Name in code**|`GlobalBatch`|`GlobalStandard`|`GlobalProvisionedManaged`|`Standard`|`ProvisionedManaged`|
43
49
|**Billing model**| Pay-per-token |Pay-per-token |Hourly billing with optional purchase of monthly or yearly reservations| Pay-per-token |Hourly billing with optional purchase of monthly or yearly reservations|
44
50
45
-
## Provisioned
46
-
47
-
Provisioned deployments allow you to specify the amount of throughput you require in a deployment. The service then allocates the necessary model processing capacity and ensures it's ready for you. Throughput is defined in terms of provisioned throughput units (PTU) which is a normalized way of representing the throughput for your deployment. Each model-version pair requires different amounts of PTU to deploy and provide different amounts of throughput per PTU. Learn more from our [Provisioned throughput concepts article](../concepts/provisioned-throughput.md).
48
-
49
-
## Standard
50
-
51
-
Standard deployments provide a pay-per-call billing model on the chosen model. Provides the fastest way to get started as you only pay for what you consume. Models available in each region as well as throughput may be limited.
52
-
53
-
Standard deployments are optimized for low to medium volume workloads with high burstiness. Customers with high consistent volume may experience greater latency variability.
54
-
55
51
## Global standard
56
52
57
53
> [!IMPORTANT]
@@ -91,6 +87,24 @@ Key use cases include:
91
87
92
88
* **Marketing and Personalization:** Generate personalized content and recommendations at scale.
93
89
90
+
## Data zone standard
91
+
> [!IMPORTANT]
92
+
> Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location within the Microsoft specified data zone. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
93
+
94
+
Data zone standard deployments are available in the same Azure OpenAI resource as all other Azure OpenAI deployment types but allow you to leverage Azure global infrastructure to dynamically route traffic to the data center within the Microsoft defined data zone with the best availability for each request. Data zone standard provides higher default quotas than our Azure geography-based deployment types.
95
+
96
+
Customers with high consistent volume may experience greater latency variability. The threshold is set per model. See the [Quotas and limits](/azure/ai-services/openai/quotas-limits#usage-tiers) page to learn more. For workloads that require low latency variance at large volume, we recommend leveraging the provisioned deployment offerings.
97
+
98
+
## Standard
99
+
100
+
Standard deployments provide a pay-per-call billing model on the chosen model. They are the fastest way to get started because you only pay for what you consume. Model availability and throughput may be limited in each region.
101
+
102
+
Standard deployments are optimized for low to medium volume workloads with high burstiness. Customers with high consistent volume may experience greater latency variability.
103
+
104
+
## Provisioned
105
+
106
+
Provisioned deployments allow you to specify the amount of throughput you require in a deployment. The service then allocates the necessary model processing capacity and ensures it's ready for you. Throughput is defined in terms of provisioned throughput units (PTU), a normalized way of representing the throughput for your deployment. Each model-version pair requires a different amount of PTU to deploy and provides a different amount of throughput per PTU. Learn more from our [Provisioned throughput concepts article](../concepts/provisioned-throughput.md).
107
+
94
108
### How to disable access to global deployments in your subscription
95
109
96
110
Azure Policy helps to enforce organizational standards and to assess compliance at-scale. Through its compliance dashboard, it provides an aggregated view to evaluate the overall state of the environment, with the ability to drill down to the per-resource, per-policy granularity. It also helps to bring your resources to compliance through bulk remediation for existing resources and automatic remediation for new resources. [Learn more about Azure Policy and specific built-in controls for AI services](/azure/ai-services/security-controls-policy).
articles/ai-services/openai/quotas-limits.md
Lines changed: 15 additions & 3 deletions
@@ -108,6 +108,18 @@ The following sections provide you with a quick guide to the default quotas and
108
108
109
109
M = million | K = thousand
110
110
111
+
### gpt-4o data zone standard
112
+
113
+
| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
114
+
|---|---|:---:|:---:|
115
+
|`gpt-4o`|Enterprise agreement | 10 M | 60 K |
116
+
|`gpt-4o-mini`| Enterprise agreement | 20 M | 120 K |
117
+
|`gpt-4o`|Default | 300 K | 1.8 K |
118
+
|`gpt-4o-mini`| Default | 1 M | 6 K |
119
+
120
+
M = million | K = thousand
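
To make the table concrete, here is a minimal sketch of checking a planned workload against these default quotas. The `QUOTAS` dictionary and `fits_quota` helper are hypothetical names for illustration. The limits are the table values with M and K expanded.

```python
# (model, tier) -> (tokens per minute, requests per minute), from the table above.
QUOTAS = {
    ("gpt-4o", "Enterprise agreement"): (10_000_000, 60_000),
    ("gpt-4o-mini", "Enterprise agreement"): (20_000_000, 120_000),
    ("gpt-4o", "Default"): (300_000, 1_800),
    ("gpt-4o-mini", "Default"): (1_000_000, 6_000),
}


def fits_quota(model, tier, tokens_per_minute, requests_per_minute):
    """True if the workload stays within both the TPM and RPM limits."""
    tpm_limit, rpm_limit = QUOTAS[(model, tier)]
    return tokens_per_minute <= tpm_limit and requests_per_minute <= rpm_limit


print(fits_quota("gpt-4o", "Default", 250_000, 1_000))
print(fits_quota("gpt-4o", "Default", 350_000, 1_000))
```

Every row in the table works out to 6 requests per minute for each 1,000 tokens per minute, so a workload averaging fewer than about 167 tokens per request would hit the request limit before the token limit.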
121
+
122
+
111
123
### gpt-4o standard
112
124
113
125
| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
@@ -121,14 +133,14 @@ M = million | K = thousand
121
133
122
134
#### Usage tiers
123
135
124
-
Global Standard deployments use Azure's global infrastructure, dynamically routing customer traffic to the data center with best availability for the customer’s inference requests. This enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage might see more variability in response latency.
136
+
Global standard deployments use Azure's global infrastructure, dynamically routing customer traffic to the data center with best availability for the customer’s inference requests. Similarly, Data zone standard deployments allow you to leverage Azure global infrastructure to dynamically route traffic to the data center within the Microsoft defined data zone with the best availability for each request. This enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage might see more variability in response latency.
125
137
126
138
The Usage Limit determines the level of usage above which customers might see larger variability in response latency. A customer’s usage is defined per model and is the total tokens consumed across all deployments in all subscriptions in all regions for a given tenant.
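
The aggregation described above can be sketched as follows. This is an illustration under stated assumptions: the record fields (`subscription`, `region`, `deployment`, `model`, `tokens`) and the helper names are hypothetical, the records are assumed to belong to a single tenant and a single measurement period, and the limit values in the example are placeholders rather than published usage limits.

```python
from collections import defaultdict


def usage_per_model(records):
    """Sum tokens per model across all deployments, subscriptions, and regions."""
    totals = defaultdict(int)
    for record in records:
        totals[record["model"]] += record["tokens"]
    return dict(totals)


def models_over_limit(records, usage_limits):
    """Return the models whose total usage exceeds the given usage limit."""
    totals = usage_per_model(records)
    return sorted(
        model
        for model, total in totals.items()
        if model in usage_limits and total > usage_limits[model]
    )


# Placeholder data for one tenant; token counts and limits are made up.
records = [
    {"subscription": "sub-a", "region": "East US 2", "deployment": "d1",
     "model": "gpt-4o", "tokens": 700},
    {"subscription": "sub-b", "region": "West Europe", "deployment": "d2",
     "model": "gpt-4o", "tokens": 500},
    {"subscription": "sub-a", "region": "East US 2", "deployment": "d3",
     "model": "gpt-4o-mini", "tokens": 300},
]
print(models_over_limit(records, {"gpt-4o": 1_000, "gpt-4o-mini": 1_000}))
```

Neither `gpt-4o` deployment exceeds the placeholder limit on its own, but their combined total does, which is the point of defining usage per model across the whole tenant.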
127
139
128
140
> [!NOTE]
129
-
> Usage tiers only apply to standard and global standard deployment types. Usage tiers do not apply to global batch and provisioned throughput deployments.
141
+
> Usage tiers only apply to standard, data zone standard, and global standard deployment types. Usage tiers do not apply to global batch and provisioned throughput deployments.
130
142
131
-
#### GPT-4o global standard & standard
143
+
#### GPT-4o global standard, data zone standard, & standard
articles/ai-services/openai/whats-new.md
Lines changed: 5 additions & 0 deletions
@@ -20,6 +20,11 @@ This article provides a summary of the latest releases and major documentation u
20
20
21
21
## October 2024
22
22
23
+
### NEW data zone standard deployment type
24
+
Data zone standard deployments are available in the same Azure OpenAI resource as all other Azure OpenAI deployment types but allow you to leverage Azure global infrastructure to dynamically route traffic to the data center within the Microsoft defined data zone with the best availability for each request. Data zone standard provides higher default quotas than our Azure geography-based deployment types. Data zone standard deployments are supported on `gpt-4o-2024-08-06`, `gpt-4o-2024-05-13`, and `gpt-4o-mini-2024-07-18` models.
25
+
26
+
For more information, see the [deployment types guide](https://aka.ms/aoai/docs/deployment-types).
27
+
23
28
### Global Batch GA
24
29
25
30
Azure OpenAI global batch is now generally available.