You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/ai-services/openai/how-to/deployment-types.md
+18-1Lines changed: 18 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -7,7 +7,7 @@ author: mrbullwinkle
7
7
manager: nitinme
8
8
ms.service: azure-ai-openai
9
9
ms.topic: how-to
10
-
ms.date: 07/11/2024
10
+
ms.date: 01/24/2025
11
11
ms.author: mbullwin
12
12
---
13
13
@@ -39,6 +39,8 @@ For any [deployment type](/azure/ai-services/openai/how-to/deployment-types) lab
39
39
> [!IMPORTANT]
40
40
> Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
41
41
42
+
**SKU name in code:**`GlobalStandard`
43
+
42
44
Global deployments are available in the same Azure OpenAI resources as non-global deployment types but allow you to leverage Azure's global infrastructure to dynamically route traffic to the data center with best availability for each request. Global standard provides the highest default quota and eliminates the need to load balance across multiple resources.
43
45
44
46
Customers with high consistent volume may experience greater latency variability. The threshold is set per model. See the [quotas page to learn more](./quota.md). For applications that require the lower latency variance at large workload usage, we recommend purchasing provisioned throughput.
@@ -48,6 +50,8 @@ Customers with high consistent volume may experience greater latency variability
48
50
> [!IMPORTANT]
49
51
> Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
50
52
53
+
**SKU name in code:**`GlobalProvisionedManaged`
54
+
51
55
Global deployments are available in the same Azure OpenAI resources as non-global deployment types but allow you to leverage Azure's global infrastructure to dynamically route traffic to the data center with best availability for each request. Global provisioned deployments provide reserved model processing capacity for high and predictable throughput using Azure global infrastructure.
52
56
53
57
## Global batch
@@ -57,6 +61,8 @@ Global deployments are available in the same Azure OpenAI resources as non-globa
57
61
58
62
[Global batch](./batch.md) is designed to handle large-scale and high-volume processing tasks efficiently. Process asynchronous groups of requests with separate quota, with 24-hour target turnaround, at [50% less cost than global standard](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). With batch processing, rather than send one request at a time you send a large number of requests in a single file. Global batch requests have a separate enqueued token quota avoiding any disruption of your online workloads.
59
63
64
+
**SKU name in code:**`GlobalBatch`
65
+
60
66
Key use cases include:
61
67
62
68
***Large-Scale Data Processing:** Quickly analyze extensive datasets in parallel.
@@ -74,9 +80,12 @@ Key use cases include:
74
80
***Marketing and Personalization:** Generate personalized content and recommendations at scale.
75
81
76
82
## Data zone standard
83
+
77
84
> [!IMPORTANT]
78
85
> Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location within the Microsoft specified data zone. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
79
86
87
+
**SKU name in code:**`DataZoneStandard`
88
+
80
89
Data zone standard deployments are available in the same Azure OpenAI resource as all other Azure OpenAI deployment types but allow you to leverage Azure global infrastructure to dynamically route traffic to the data center within the Microsoft defined data zone with the best availability for each request. Data zone standard provides higher default quotas than our Azure geography-based deployment types.
81
90
82
91
Customers with high consistent volume may experience greater latency variability. The threshold is set per model. See the [Quotas and limits](/azure/ai-services/openai/quotas-limits#usage-tiers) page to learn more. For workloads that require low latency variance at large volume, we recommend leveraging the provisioned deployment offerings.
@@ -86,23 +95,31 @@ Customers with high consistent volume may experience greater latency variability
86
95
> [!IMPORTANT]
87
96
> Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location within the Microsoft specified data zone.[Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
88
97
98
+
**SKU name in code:**`DataZoneProvisionedManaged`
99
+
89
100
Data zone provisioned deployments are available in the same Azure OpenAI resource as all other Azure OpenAI deployment types but allow you to leverage Azure global infrastructure to dynamically route traffic to the data center within the Microsoft specified data zone with the best availability for each request. Data zone provisioned deployments provide reserved model processing capacity for high and predictable throughput using Azure infrastructure within the Microsoft specified data zone.
90
101
91
102
## Data zone batch
92
103
93
104
> [!IMPORTANT]
94
105
> Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location within the Microsoft specified data zone. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
106
+
107
+
**SKU name in code:**`DataZoneBatch`
95
108
96
109
Data zone batch deployments provide all the same functionality as [global batch deployments](./batch.md) while allowing you to leverage Azure global infrastructure to dynamically route traffic to only data centers within the Microsoft defined data zone with the best availability for each request.
97
110
98
111
## Standard
99
112
113
+
**SKU name in code:**`Standard`
114
+
100
115
Standard deployments provide a pay-per-call billing model on the chosen model. Provides the fastest way to get started as you only pay for what you consume. Models available in each region as well as throughput may be limited.
101
116
102
117
Standard deployments are optimized for low to medium volume workloads with high burstiness. Customers with high consistent volume may experience greater latency variability.
103
118
104
119
## Provisioned
105
120
121
+
**SKU name in code:**`ProvisionedManaged`
122
+
106
123
Provisioned deployments allow you to specify the amount of throughput you require in a deployment. The service then allocates the necessary model processing capacity and ensures it's ready for you. Throughput is defined in terms of provisioned throughput units (PTU) which is a normalized way of representing the throughput for your deployment. Each model-version pair requires different amounts of PTU to deploy and provide different amounts of throughput per PTU. Learn more from our [Provisioned throughput concepts article](../concepts/provisioned-throughput.md).
107
124
108
125
### How to disable access to global deployments in your subscription
0 commit comments