Commit 5dfb750

Merge pull request #2491 from mrbullwinkle/mrb_01_24_2025_deployment_types
[Azure OpenAI] Deployment types SKUs for code
2 parents b74668b + 45e257e commit 5dfb750

1 file changed, +18 -1 lines changed

articles/ai-services/openai/how-to/deployment-types.md

Lines changed: 18 additions & 1 deletion
@@ -7,7 +7,7 @@ author: mrbullwinkle
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: how-to
-ms.date: 07/11/2024
+ms.date: 01/24/2025
 ms.author: mbullwin
 ---

@@ -39,6 +39,8 @@ For any [deployment type](/azure/ai-services/openai/how-to/deployment-types) lab
 > [!IMPORTANT]
 > Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).

+**SKU name in code:** `GlobalStandard`
+
 Global deployments are available in the same Azure OpenAI resources as non-global deployment types but allow you to leverage Azure's global infrastructure to dynamically route traffic to the data center with best availability for each request. Global standard provides the highest default quota and eliminates the need to load balance across multiple resources.

 Customers with high consistent volume may experience greater latency variability. The threshold is set per model. See the [quotas page to learn more](./quota.md). For applications that require the lower latency variance at large workload usage, we recommend purchasing provisioned throughput.
@@ -48,6 +50,8 @@ Customers with high consistent volume may experience greater latency variability
 > [!IMPORTANT]
 > Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).

+**SKU name in code:** `GlobalProvisionedManaged`
+
 Global deployments are available in the same Azure OpenAI resources as non-global deployment types but allow you to leverage Azure's global infrastructure to dynamically route traffic to the data center with best availability for each request. Global provisioned deployments provide reserved model processing capacity for high and predictable throughput using Azure global infrastructure.

 ## Global batch
@@ -57,6 +61,8 @@ Global deployments are available in the same Azure OpenAI resources as non-globa

 [Global batch](./batch.md) is designed to handle large-scale and high-volume processing tasks efficiently. Process asynchronous groups of requests with separate quota, with 24-hour target turnaround, at [50% less cost than global standard](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). With batch processing, rather than send one request at a time you send a large number of requests in a single file. Global batch requests have a separate enqueued token quota avoiding any disruption of your online workloads.

+**SKU name in code:** `GlobalBatch`
+
 Key use cases include:

 * **Large-Scale Data Processing:** Quickly analyze extensive datasets in parallel.
@@ -74,9 +80,12 @@ Key use cases include:
 * **Marketing and Personalization:** Generate personalized content and recommendations at scale.

 ## Data zone standard
+
 > [!IMPORTANT]
 > Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location within the Microsoft specified data zone. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).

+**SKU name in code:** `DataZoneStandard`
+
 Data zone standard deployments are available in the same Azure OpenAI resource as all other Azure OpenAI deployment types but allow you to leverage Azure global infrastructure to dynamically route traffic to the data center within the Microsoft defined data zone with the best availability for each request. Data zone standard provides higher default quotas than our Azure geography-based deployment types.

 Customers with high consistent volume may experience greater latency variability. The threshold is set per model. See the [Quotas and limits](/azure/ai-services/openai/quotas-limits#usage-tiers) page to learn more. For workloads that require low latency variance at large volume, we recommend leveraging the provisioned deployment offerings.
@@ -86,23 +95,31 @@ Customers with high consistent volume may experience greater latency variability
 > [!IMPORTANT]
 > Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location within the Microsoft specified data zone.[Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).

+**SKU name in code:** `DataZoneProvisionedManaged`
+
 Data zone provisioned deployments are available in the same Azure OpenAI resource as all other Azure OpenAI deployment types but allow you to leverage Azure global infrastructure to dynamically route traffic to the data center within the Microsoft specified data zone with the best availability for each request. Data zone provisioned deployments provide reserved model processing capacity for high and predictable throughput using Azure infrastructure within the Microsoft specified data zone.

 ## Data zone batch

 > [!IMPORTANT]
 > Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location within the Microsoft specified data zone. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
+
+**SKU name in code:** `DataZoneBatch`

 Data zone batch deployments provide all the same functionality as [global batch deployments](./batch.md) while allowing you to leverage Azure global infrastructure to dynamically route traffic to only data centers within the Microsoft defined data zone with the best availability for each request.

 ## Standard

+**SKU name in code:** `Standard`
+
 Standard deployments provide a pay-per-call billing model on the chosen model. Provides the fastest way to get started as you only pay for what you consume. Models available in each region as well as throughput may be limited.

 Standard deployments are optimized for low to medium volume workloads with high burstiness. Customers with high consistent volume may experience greater latency variability.

 ## Provisioned

+**SKU name in code:** `ProvisionedManaged`
+
 Provisioned deployments allow you to specify the amount of throughput you require in a deployment. The service then allocates the necessary model processing capacity and ensures it's ready for you. Throughput is defined in terms of provisioned throughput units (PTU) which is a normalized way of representing the throughput for your deployment. Each model-version pair requires different amounts of PTU to deploy and provide different amounts of throughput per PTU. Learn more from our [Provisioned throughput concepts article](../concepts/provisioned-throughput.md).

 ### How to disable access to global deployments in your subscription
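
The eight SKU names this commit adds are the strings a script or template would pass when it creates a deployment. As a minimal sketch, the Python below collects them in a lookup table and builds a deployment request body. The SKU strings are taken from the diff. Everything else is an assumption for illustration: the helper name `build_deployment_body`, the `sku`/`properties.model` body shape (modelled on the Azure Resource Manager deployment resource, which this diff does not show), and the placeholder model name, version, and capacity.

```python
import json

# SKU names exactly as added by this commit, keyed by deployment type.
SKU_BY_DEPLOYMENT_TYPE = {
    "Global standard": "GlobalStandard",
    "Global provisioned": "GlobalProvisionedManaged",
    "Global batch": "GlobalBatch",
    "Data zone standard": "DataZoneStandard",
    "Data zone provisioned": "DataZoneProvisionedManaged",
    "Data zone batch": "DataZoneBatch",
    "Standard": "Standard",
    "Provisioned": "ProvisionedManaged",
}


def build_deployment_body(deployment_type, model_name, model_version, capacity):
    """Return a request body dict for creating a deployment.

    Hypothetical helper: the body layout is an assumption and is not
    part of the diff. Raises ValueError for an unknown deployment type.
    """
    try:
        sku_name = SKU_BY_DEPLOYMENT_TYPE[deployment_type]
    except KeyError:
        known = ", ".join(sorted(SKU_BY_DEPLOYMENT_TYPE))
        raise ValueError(
            f"Unknown deployment type {deployment_type!r}; expected one of: {known}"
        ) from None
    return {
        "sku": {"name": sku_name, "capacity": capacity},
        "properties": {
            "model": {
                "format": "OpenAI",
                "name": model_name,
                "version": model_version,
            }
        },
    }


if __name__ == "__main__":
    # Placeholder model name, version, and capacity; substitute real values.
    body = build_deployment_body("Data zone standard", "example-model", "1", 10)
    print(json.dumps(body, indent=2))
```

Keeping the names in one table means a typo such as `GlobalProvisioned` fails locally with a clear error instead of at request time.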
