Commit d36eb57

Merge pull request #283630 from mrbullwinkle/mrb_08_01_2024_batch-007
[Azure OpenAI] Small fixes
2 parents ec86304 + ee7d53c commit d36eb57

4 files changed: +34 -10 lines changed

articles/ai-services/openai/how-to/batch.md

Lines changed: 3 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -15,7 +15,7 @@ zone_pivot_groups: openai-fine-tuning-batch
1515

1616
# Getting started with Azure OpenAI global batch deployments (preview)
1717

18-
The Azure OpenAI Batch API is designed to handle large-scale and high-volume processing tasks efficiently. Process asynchronous groups of requests with separate quota, a 24-hour turnaround time, at [50% less cost than global standard](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). With batch processing, rather than send one request at a time you send a large number of requests in a single file. Global batch requests have a separate enqueued token quota avoiding any disruption of your online workloads.
18+
The Azure OpenAI Batch API is designed to handle large-scale and high-volume processing tasks efficiently. Process asynchronous groups of requests with separate quota, with 24-hour target turnaround, at [50% less cost than global standard](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). With batch processing, rather than send one request at a time you send a large number of requests in a single file. Global batch requests have a separate enqueued token quota avoiding any disruption of your online workloads.
1919

2020
Key use cases include:
2121

@@ -36,7 +36,7 @@ Key use cases include:
3636
> [!IMPORTANT]
3737
> We aim to process batch requests within 24 hours; we do not expire the jobs that take longer. You can [cancel](#cancel-batch) the job anytime. When you cancel the job, any remaining work is cancelled and any already completed work is returned. You will be charged for any completed work.
3838
>
39-
> Data may be processed outside of the resource’s Azure geography, but data storage remains in its Azure geography. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/). 
39+
> Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/). 
4040
4141
## Global batch support
4242

@@ -80,7 +80,7 @@ In the Studio UI the deployment type will appear as `Global-Batch`.
8080
:::image type="content" source="../media/how-to/global-batch/global-batch.png" alt-text="Screenshot that shows the model deployment dialog in Azure OpenAI Studio with Global-Batch deployment type highlighted." lightbox="../media/how-to/global-batch/global-batch.png":::
8181

8282
> [!TIP]
83-
> Each line of your input file for batch processing requires the unique **deployment name** that you chose during model deployment to be present. This value wil be assigned to the `model` attribute. This is different from OpenAI where the concept of model deployments does not exist.
83+
> Each line of your input file for batch processing has a `model` attribute that requires a global batch **deployment name**. For a given input file, all names must be the same deployment name. This is different from OpenAI where the concept of model deployments does not exist.
8484
8585
::: zone pivot="programming-language-ai-studio"
8686
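
Note on the revised tip at line 83: it requires every line of a batch input file to carry the same global batch deployment name in its `model` attribute. A minimal sketch of a pre-upload check follows. It assumes a JSONL input file in which each line is a JSON object with `model` nested under a `body` object, as in the OpenAI batch format. The function names `deployment_names` and `check_single_deployment` are hypothetical and not part of any SDK.

```python
import json


def deployment_names(jsonl_text):
    """Collect the distinct `model` values across all non-empty lines."""
    names = set()
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        # Assumption: `model` lives inside the `body` object of each request.
        model = record.get("body", {}).get("model")
        if model is None:
            raise ValueError(f"line {line_no}: missing model attribute")
        names.add(model)
    return names


def check_single_deployment(jsonl_text):
    """Return the single deployment name, or raise if there is not exactly one."""
    names = deployment_names(jsonl_text)
    if len(names) != 1:
        raise ValueError(f"expected one deployment name, found {sorted(names)}")
    return next(iter(names))
```

Running `check_single_deployment(open("batch_input.jsonl").read())` before upload (the file name here is only an example) would surface mixed or missing deployment names locally rather than after the job is submitted.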

articles/ai-services/openai/how-to/deployment-types.md

Lines changed: 5 additions & 5 deletions
@@ -30,12 +30,12 @@ Azure OpenAI offers three types of deployments. These provide a varied level of
3030

3131
| **Offering** | **Global-Batch** | **Global-Standard** | **Standard** | **Provisioned** |
3232
|---|:---|:---|:---|:---|
33-
| **Best suited for** | Offline scoring <br><br> Workloads that are not latency sensitive and can be completed in hours.| Applications that don’t require data residency. Recommended starting place for customers. | For customers with data residency requirements. Optimized for low to medium volume. | Real-time scoring for large consistent volume. Includes the highest commitments and limits.|
33+
| **Best suited for** | Offline scoring <br><br> Workloads that are not latency sensitive and can be completed in hours.<br><br> For use cases that do not have data processing residency requirements.| Recommended starting place for customers. <br><br>Global-Standard will have the higher default quota and larger number of models available than Standard. <br><br> For production applications that do not have data processing residency requirements. | For customers with data residency requirements. Optimized for low to medium volume. | Real-time scoring for large consistent volume. Includes the highest commitments and limits.|
3434
| **How it works** | Offline processing via files |Traffic may be routed anywhere in the world | | |
3535
| **Getting started** | [Global-Batch](./batch.md) | [Model deployment](./create-resource.md) | [Model deployment](./create-resource.md) | [Provisioned onboarding](./provisioned-throughput-onboarding.md) |
3636
| **Cost** | [Least expensive option](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) <br> 50% less cost compared to Global Standard prices. Access to all new models with larger quota allocations. | [Global deployment pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) | [Regional pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) | May experience cost savings for consistent usage |
3737
| **What you get** |[Significant discount compared to Global Standard](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) | Easy access to all new models with highest default pay-per-call limits.<br><br> Customers with high volume usage may see higher latency variability | Easy access with [SLA on availability](https://azure.microsoft.com/support/legal/sla/). Optimized for low to medium volume workloads with high burstiness. <br><br>Customers with high consistent volume may experience greater latency variability. | Regional access with very high & predictable throughput. Determine throughput per PTU using the provided [capacity calculator](./provisioned-throughput-onboarding.md#estimate-provisioned-throughput-and-cost) |
38-
| **What you don’t get** |❌Real-time call performance |❌Data processing guarantee<br> <br> Data might be processed outside of the resource's Azure geography, but data storage remains in its Azure geography. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/) | ❌High volume w/consistent low latency | ❌Pay-per-call flexibility |
38+
| **What you don’t get** |❌Real-time call performance <br><br>❌Data processing guarantee<br> <br> Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/) |❌Data processing guarantee<br> <br> Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/) | ❌High volume w/consistent low latency | ❌Pay-per-call flexibility |
3939
| **Per-call Latency** | Not Applicable (file based async process) | Optimized for real-time calling & low to medium volume usage. Customers with high volume usage may see higher latency variability. Threshold set per model | Optimized for real-time calling & low to medium volume usage. Customers with high volume usage may see higher latency variability. Threshold set per model | Optimized for real-time. |
4040
| **Sku Name in code** | `GlobalBatch` | `GlobalStandard` | `Standard` | `ProvisionedManaged` |
4141
| **Billing model** | Pay-per-token |Pay-per-token | Pay-per-token | Monthly Commitments |
@@ -53,7 +53,7 @@ Standard deployments are optimized for low to medium volume workloads with high
5353
## Global standard
5454

5555
> [!IMPORTANT]
56-
> Data might be processed outside of the resource's Azure geography, but data storage remains in its Azure geography. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
56+
> Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
5757
5858
Global deployments are available in the same Azure OpenAI resources as non-global deployment types but allow you to leverage Azure's global infrastructure to dynamically route traffic to the data center with best availability for each request. Global standard provides the highest default quota and eliminates the need to load balance across multiple resources.
5959

@@ -62,9 +62,9 @@ Customers with high consistent volume may experience greater latency variability
6262
## Global batch
6363

6464
> [!IMPORTANT]
65-
> Data might be processed outside of the resource's Azure geography, but data storage remains in its Azure geography. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
65+
> Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
6666
67-
[Global batch](./batch.md) is designed to handle large-scale and high-volume processing tasks efficiently. Process asynchronous groups of requests with separate quota, a 24-hour turnaround time, at [50% less cost than global standard](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). With batch processing, rather than send one request at a time you send a large number of requests in a single file. Global batch requests have a separate enqueued token quota avoiding any disruption of your online workloads.
67+
[Global batch](./batch.md) is designed to handle large-scale and high-volume processing tasks efficiently. Process asynchronous groups of requests with separate quota, with 24-hour target turnaround, at [50% less cost than global standard](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). With batch processing, rather than send one request at a time you send a large number of requests in a single file. Global batch requests have a separate enqueued token quota avoiding any disruption of your online workloads.
6868

6969
Key use cases include:
7070

articles/ai-services/openai/includes/global-batch-limits.md

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ The table shows the batch quota limit. Quota values for global batch are represe
2323
|Model|Enterprise agreement|Default| Monthly credit card based subscriptions | MSDN subscriptions | Azure for Students, Free Trials |
2424
|---|---|---|---|---|---|
2525
| `gpt-4o` | 5 B | 50 M | 1.35 M | 90 K | N/A|
26-
| `gpt-4-turbo`<sup>*</sup> | 300 M | 40 M | 1.35 M | 90 K | N/A |
26+
| `gpt-4-turbo` | 300 M | 40 M | 1.35 M | 90 K | N/A |
2727
| `gpt-4` | 150 M | 5 M | 200 K | 100 K | N/A |
2828
| `gpt-35-turbo` | 10 B | 100 M | 5 M | 2 M | 50 K |
2929

articles/ai-services/openai/whats-new.md

Lines changed: 25 additions & 1 deletion
@@ -10,14 +10,38 @@ ms.custom:
1010
- ignite-2023
1111
- references_regions
1212
ms.topic: whats-new
13-
ms.date: 08/02/2024
13+
ms.date: 08/05/2024
1414
recommendations: false
1515
---
1616

1717
# What's new in Azure OpenAI Service
1818

1919
This article provides a summary of the latest releases and major documentation updates for Azure OpenAI.
2020

21+
## August 2024
22+
23+
### Global batch deployments are now available
24+
25+
The Azure OpenAI Batch API is designed to handle large-scale and high-volume processing tasks efficiently. Process asynchronous groups of requests with separate quota, with 24-hour target turnaround, at [50% less cost than global standard](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). With batch processing, rather than send one request at a time you send a large number of requests in a single file. Global batch requests have a separate enqueued token quota avoiding any disruption of your online workloads.
26+
27+
Key use cases include:
28+
29+
* **Large-Scale Data Processing:** Quickly analyze extensive datasets in parallel.
30+
31+
* **Content Generation:** Create large volumes of text, such as product descriptions or articles.
32+
33+
* **Document Review and Summarization:** Automate the review and summarization of lengthy documents.
34+
35+
* **Customer Support Automation:** Handle numerous queries simultaneously for faster responses.
36+
37+
* **Data Extraction and Analysis:** Extract and analyze information from vast amounts of unstructured data.
38+
39+
* **Natural Language Processing (NLP) Tasks:** Perform tasks like sentiment analysis or translation on large datasets.
40+
41+
* **Marketing and Personalization:** Generate personalized content and recommendations at scale.
42+
43+
For more information, see [getting started with global batch deployments](./how-to/batch.md).
44+
2145
## July 2024
2246

2347
### GPT-4o mini is now available for fine-tuning
