You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
# Getting started with Azure OpenAI global batch deployments (preview)
17
17
18
-
The Azure OpenAI Batch API is designed to handle large-scale and high-volume processing tasks efficiently. Process asynchronous groups of requests with separate quota, a 24-hour turnaround time, at [50% less cost than global standard](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). With batch processing, rather than send one request at a time you send a large number of requests in a single file. Global batch requests have a separate enqueued token quota avoiding any disruption of your online workloads.
18
+
The Azure OpenAI Batch API is designed to handle large-scale and high-volume processing tasks efficiently. Process asynchronous groups of requests with separate quota, with 24-hour target turnaround, at [50% less cost than global standard](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). With batch processing, rather than send one request at a time you send a large number of requests in a single file. Global batch requests have a separate enqueued token quota avoiding any disruption of your online workloads.
19
19
20
20
Key use cases include:
21
21
@@ -36,7 +36,7 @@ Key use cases include:
36
36
> [!IMPORTANT]
37
37
> We aim to process batch requests within 24 hours; we do not expire the jobs that take longer. You can [cancel](#cancel-batch) the job anytime. When you cancel the job, any remaining work is cancelled and any already completed work is returned. You will be charged for any completed work.
38
38
>
39
-
> Data may be processed outside of the resource’s Azure geography, but data storage remains in its Azure geography. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
39
+
> Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
40
40
41
41
## Global batch support
42
42
@@ -80,7 +80,7 @@ In the Studio UI the deployment type will appear as `Global-Batch`.
80
80
:::image type="content" source="../media/how-to/global-batch/global-batch.png" alt-text="Screenshot that shows the model deployment dialog in Azure OpenAI Studio with Global-Batch deployment type highlighted." lightbox="../media/how-to/global-batch/global-batch.png":::
81
81
82
82
> [!TIP]
83
-
> Each line of your input file for batch processing requires the unique **deployment name** that you chose during model deployment to be present. This value wil be assigned to the `model` attribute. This is different from OpenAI where the concept of model deployments does not exist.
83
+
> Each line of your input file for batch processing has a `model` attribute that requires a global batch **deployment name**. For a given input file, all names must be the same deployment name. This is different from OpenAI where the concept of model deployments does not exist.
|**Best suited for**| Offline scoring <br><br> Workloads that are not latency sensitive and can be completed in hours.| Applications that don’t require data residency. Recommended starting place for customers. | For customers with data residency requirements. Optimized for low to medium volume. | Real-time scoring for large consistent volume. Includes the highest commitments and limits.|
33
+
|**Best suited for**| Offline scoring <br><br> Workloads that are not latency sensitive and can be completed in hours.<br><br> For use cases that do not have data processing residency requirements.| Recommended starting place for customers. <br><br>Global-Standard will have the higher default quota and larger number of models available than Standard. <br><br> For production applications that do not have data processing residency requirements. | For customers with data residency requirements. Optimized for low to medium volume. | Real-time scoring for large consistent volume. Includes the highest commitments and limits.|
34
34
|**How it works**| Offline processing via files |Traffic may be routed anywhere in the world |||
|**Cost**|[Least expensive option](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) <br> 50% less cost compared to Global Standard prices. Access to all new models with larger quota allocations. |[Global deployment pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/)|[Regional pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/)| May experience cost savings for consistent usage |
37
37
|**What you get**|[Significant discount compared to Global Standard](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/)| Easy access to all new models with highest default pay-per-call limits.<br><br> Customers with high volume usage may see higher latency variability | Easy access with [SLA on availability](https://azure.microsoft.com/support/legal/sla/). Optimized for low to medium volume workloads with high burstiness. <br><br>Customers with high consistent volume may experience greater latency variability. | Regional access with very high & predictable throughput. Determine throughput per PTU using the provided [capacity calculator](./provisioned-throughput-onboarding.md#estimate-provisioned-throughput-and-cost)|
38
-
|**What you don’t get**|❌Real-time call performance |❌Data processing guarantee<br> <br> Data might be processed outside of the resource's Azure geography, but data storage remains in its Azure geography. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/)| ❌High volume w/consistent low latency | ❌Pay-per-call flexibility |
38
+
|**What you don’t get**|❌Real-time call performance <br><br>❌Data processing guarantee<br> <br> Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/)|❌Data processing guarantee<br> <br> Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/)| ❌High volume w/consistent low latency | ❌Pay-per-call flexibility |
39
39
|**Per-call Latency**| Not Applicable (file based async process) | Optimized for real-time calling & low to medium volume usage. Customers with high volume usage may see higher latency variability. Threshold set per model | Optimized for real-time calling & low to medium volume usage. Customers with high volume usage may see higher latency variability. Threshold set per model | Optimized for real-time. |
40
40
|**Sku Name in code**|`GlobalBatch`|`GlobalStandard`|`Standard`|`ProvisionedManaged`|
@@ -53,7 +53,7 @@ Standard deployments are optimized for low to medium volume workloads with high
53
53
## Global standard
54
54
55
55
> [!IMPORTANT]
56
-
> Data might be processed outside of the resource's Azure geography, but data storage remains in its Azure geography. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
56
+
> Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
57
57
58
58
Global deployments are available in the same Azure OpenAI resources as non-global deployment types but allow you to leverage Azure's global infrastructure to dynamically route traffic to the data center with best availability for each request. Global standard provides the highest default quota and eliminates the need to load balance across multiple resources.
59
59
@@ -62,9 +62,9 @@ Customers with high consistent volume may experience greater latency variability
62
62
## Global batch
63
63
64
64
> [!IMPORTANT]
65
-
> Data might be processed outside of the resource's Azure geography, but data storage remains in its Azure geography. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
65
+
> Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
66
66
67
-
[Global batch](./batch.md) is designed to handle large-scale and high-volume processing tasks efficiently. Process asynchronous groups of requests with separate quota, a 24-hour turnaround time, at [50% less cost than global standard](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). With batch processing, rather than send one request at a time you send a large number of requests in a single file. Global batch requests have a separate enqueued token quota avoiding any disruption of your online workloads.
67
+
[Global batch](./batch.md) is designed to handle large-scale and high-volume processing tasks efficiently. Process asynchronous groups of requests with separate quota, with 24-hour target turnaround, at [50% less cost than global standard](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). With batch processing, rather than send one request at a time you send a large number of requests in a single file. Global batch requests have a separate enqueued token quota avoiding any disruption of your online workloads.
Copy file name to clipboardExpand all lines: articles/ai-services/openai/whats-new.md
+25-1Lines changed: 25 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -10,14 +10,38 @@ ms.custom:
10
10
- ignite-2023
11
11
- references_regions
12
12
ms.topic: whats-new
13
-
ms.date: 08/02/2024
13
+
ms.date: 08/05/2024
14
14
recommendations: false
15
15
---
16
16
17
17
# What's new in Azure OpenAI Service
18
18
19
19
This article provides a summary of the latest releases and major documentation updates for Azure OpenAI.
20
20
21
+
## August 2024
22
+
23
+
### Global batch deployments are now available
24
+
25
+
The Azure OpenAI Batch API is designed to handle large-scale and high-volume processing tasks efficiently. Process asynchronous groups of requests with separate quota, with 24-hour target turnaround, at [50% less cost than global standard](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). With batch processing, rather than send one request at a time you send a large number of requests in a single file. Global batch requests have a separate enqueued token quota avoiding any disruption of your online workloads.
26
+
27
+
Key use cases include:
28
+
29
+
***Large-Scale Data Processing:** Quickly analyze extensive datasets in parallel.
30
+
31
+
***Content Generation:** Create large volumes of text, such as product descriptions or articles.
32
+
33
+
***Document Review and Summarization:** Automate the review and summarization of lengthy documents.
34
+
35
+
***Customer Support Automation:** Handle numerous queries simultaneously for faster responses.
36
+
37
+
***Data Extraction and Analysis:** Extract and analyze information from vast amounts of unstructured data.
38
+
39
+
***Natural Language Processing (NLP) Tasks:** Perform tasks like sentiment analysis or translation on large datasets.
40
+
41
+
***Marketing and Personalization:** Generate personalized content and recommendations at scale.
42
+
43
+
For more information on [getting started with global batch deployments](./how-to/batch.md).
0 commit comments