Merge pull request #283630 from mrbullwinkle/mrb_08_01_2024_batch-007

prmerger-automator[bot] · web-flow · commit d36eb57e1cea · 2024-08-05T21:05:57.000Z
[Azure OpenAI]  Small fixes
diff --git a/articles/ai-services/openai/how-to/batch.md b/articles/ai-services/openai/how-to/batch.md
@@ -15,7 +15,7 @@ zone_pivot_groups: openai-fine-tuning-batch
 
 # Getting started with Azure OpenAI global batch deployments (preview)
 
-The Azure OpenAI Batch API is designed to handle large-scale and high-volume processing tasks efficiently. Process asynchronous groups of requests with separate quota, a 24-hour turnaround time, at [50% less cost than global standard](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). With batch processing, rather than send one request at a time you send a large number of requests in a single file. Global batch requests have a separate enqueued token quota avoiding any disruption of your online workloads.  
+The Azure OpenAI Batch API is designed to handle large-scale and high-volume processing tasks efficiently. Process asynchronous groups of requests with separate quota, with 24-hour target turnaround, at [50% less cost than global standard](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). With batch processing, rather than send one request at a time you send a large number of requests in a single file. Global batch requests have a separate enqueued token quota avoiding any disruption of your online workloads.  
 
 Key use cases include:
 
@@ -36,7 +36,7 @@ Key use cases include:
 > [!IMPORTANT]
 > We aim to process batch requests within 24 hours; we do not expire the jobs that take longer. You can [cancel](#cancel-batch) the job anytime. When you cancel the job, any remaining work is cancelled and any already completed work is returned. You will be charged for any completed work.
 >
-> Data may be processed outside of the resource’s Azure geography, but data storage remains in its Azure geography. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).  
+> Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).  
 
 ## Global batch support
 
@@ -80,7 +80,7 @@ In the Studio UI the deployment type will appear as `Global-Batch`.
 :::image type="content" source="../media/how-to/global-batch/global-batch.png" alt-text="Screenshot that shows the model deployment dialog in Azure OpenAI Studio with Global-Batch deployment type highlighted." lightbox="../media/how-to/global-batch/global-batch.png":::
 
 > [!TIP]
-> Each line of your input file for batch processing requires the unique **deployment name** that you chose during model deployment to be present. This value wil be assigned to the `model` attribute. This is different from OpenAI where the concept of model deployments does not exist.
+> Each line of your input file for batch processing has a `model` attribute that requires a global batch **deployment name**. For a given input file, all names must be the same deployment name. This is different from OpenAI where the concept of model deployments does not exist.
 
 ::: zone pivot="programming-language-ai-studio"
 
diff --git a/articles/ai-services/openai/how-to/deployment-types.md b/articles/ai-services/openai/how-to/deployment-types.md
@@ -30,12 +30,12 @@ Azure OpenAI offers three types of deployments. These provide a varied level of
 
 | **Offering** | **Global-Batch** | **Global-Standard** | **Standard** | **Provisioned**  |
 |---|:---|:---|:---|:---|
-| **Best suited for**      | Offline scoring <br><br> Workloads that are not latency sensitive and can be completed in hours.| Applications that don’t require data residency. Recommended starting place for customers. | For customers with data residency requirements. Optimized for low to medium volume. | Real-time scoring for large consistent volume. Includes the highest commitments and limits.|
+| **Best suited for**      | Offline scoring <br><br> Workloads that are not latency sensitive and can be completed in hours.<br><br> For use cases that do not have data processing residency requirements.| Recommended starting place for customers. <br><br>Global-Standard will have the higher default quota and larger number of models available than Standard. <br><br> For production applications that do not have data processing residency requirements. | For customers with data residency requirements. Optimized for low to medium volume. | Real-time scoring for large consistent volume. Includes the highest commitments and limits.|
 | **How it works**         | Offline processing via files |Traffic may be routed anywhere in the world | | |
 | **Getting started**      | [Global-Batch](./batch.md) | [Model deployment](./create-resource.md) | [Model deployment](./create-resource.md) | [Provisioned onboarding](./provisioned-throughput-onboarding.md) |
 | **Cost**                 | [Least expensive option](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) <br> 50% less cost compared to Global Standard prices. Access to all new models with larger quota allocations.  | [Global deployment pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) |  [Regional pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) | May experience cost savings for consistent usage |
 | **What you get**         |[Significant discount compared to Global Standard](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) | Easy access to all new models with highest default pay-per-call limits.<br><br> Customers with high volume usage may see higher latency variability | Easy access with [SLA on availability](https://azure.microsoft.com/support/legal/sla/). Optimized for low to medium volume workloads with high burstiness. <br><br>Customers with high consistent volume may experience greater latency variability. | Regional access with very high & predictable throughput. Determine throughput per PTU using the provided [capacity calculator](./provisioned-throughput-onboarding.md#estimate-provisioned-throughput-and-cost) |
-| **What you don’t get**   |❌Real-time call performance |❌Data processing guarantee<br> <br> Data might be processed outside of the resource's Azure geography, but data storage remains in its Azure geography. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/) | ❌High volume w/consistent low latency | ❌Pay-per-call flexibility |
+| **What you don’t get**   |❌Real-time call performance <br><br>❌Data processing guarantee<br> <br> Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/)  |❌Data processing guarantee<br> <br> Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/) | ❌High volume w/consistent low latency | ❌Pay-per-call flexibility |
 | **Per-call Latency**     | Not Applicable (file based async process) | Optimized for real-time calling & low to medium volume usage. Customers with high volume usage may see higher latency variability. Threshold set per model | Optimized for real-time calling & low to medium volume usage. Customers with high volume usage may see higher latency variability. Threshold set per model | Optimized for real-time. |
 | **Sku Name in code**     |  `GlobalBatch` |   `GlobalStandard`               | `Standard`   |      `ProvisionedManaged`       |
 | **Billing model**        |  Pay-per-token |Pay-per-token | Pay-per-token | Monthly Commitments |
@@ -53,7 +53,7 @@ Standard deployments are optimized for low to medium volume workloads with high
 ## Global standard
 
 > [!IMPORTANT]
-> Data might be processed outside of the resource's Azure geography, but data storage remains in its Azure geography. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
+> Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
 
 Global deployments are available in the same Azure OpenAI resources as non-global deployment types but allow you to leverage Azure's global infrastructure to dynamically route traffic to the data center with best availability for each request.  Global standard provides the highest default quota and eliminates the need to load balance across multiple resources.  
 
@@ -62,9 +62,9 @@ Customers with high consistent volume may experience greater latency variability
 ## Global batch
 
 > [!IMPORTANT]
-> Data might be processed outside of the resource's Azure geography, but data storage remains in its Azure geography. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
+> Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure OpenAI location. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
 
-[Global batch](./batch.md) is designed to handle large-scale and high-volume processing tasks efficiently. Process asynchronous groups of requests with separate quota, a 24-hour turnaround time, at [50% less cost than global standard](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). With batch processing, rather than send one request at a time you send a large number of requests in a single file. Global batch requests have a separate enqueued token quota avoiding any disruption of your online workloads.  
+[Global batch](./batch.md) is designed to handle large-scale and high-volume processing tasks efficiently. Process asynchronous groups of requests with separate quota, with 24-hour target turnaround, at [50% less cost than global standard](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). With batch processing, rather than send one request at a time you send a large number of requests in a single file. Global batch requests have a separate enqueued token quota avoiding any disruption of your online workloads.  
 
 Key use cases include:
 
diff --git a/articles/ai-services/openai/includes/global-batch-limits.md b/articles/ai-services/openai/includes/global-batch-limits.md
@@ -23,7 +23,7 @@ The table shows the batch quota limit. Quota values for global batch are represe
 |Model|Enterprise agreement|Default| Monthly credit card based subscriptions | MSDN subscriptions | Azure for Students, Free Trials |
 |---|---|---|---|---|---|
 | `gpt-4o` | 5 B | 50 M | 1.35 M | 90 K | N/A|
-| `gpt-4-turbo`<sup>*</sup> | 300 M | 40 M | 1.35 M | 90 K | N/A |
+| `gpt-4-turbo` | 300 M | 40 M | 1.35 M | 90 K | N/A |
 | `gpt-4` | 150 M | 5 M | 200 K | 100 K | N/A |
 | `gpt-35-turbo` | 10 B | 100 M | 5 M | 2 M | 50 K |
 
diff --git a/articles/ai-services/openai/whats-new.md b/articles/ai-services/openai/whats-new.md
@@ -10,14 +10,38 @@ ms.custom:
   - ignite-2023
   - references_regions
 ms.topic: whats-new
-ms.date: 08/02/2024
+ms.date: 08/05/2024
 recommendations: false
 ---
 
 # What's new in Azure OpenAI Service
 
 This article provides a summary of the latest releases and major documentation updates for Azure OpenAI.
 
+## August 2024
+
+### Global batch deployments are now available
+
+The Azure OpenAI Batch API is designed to handle large-scale and high-volume processing tasks efficiently. Process asynchronous groups of requests with separate quota, with 24-hour target turnaround, at [50% less cost than global standard](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). With batch processing, rather than send one request at a time you send a large number of requests in a single file. Global batch requests have a separate enqueued token quota avoiding any disruption of your online workloads.  
+
+Key use cases include:
+
+* **Large-Scale Data Processing:** Quickly analyze extensive datasets in parallel.
+
+* **Content Generation:** Create large volumes of text, such as product descriptions or articles.
+
+* **Document Review and Summarization:** Automate the review and summarization of lengthy documents.
+
+* **Customer Support Automation:** Handle numerous queries simultaneously for faster responses.
+
+* **Data Extraction and Analysis:** Extract and analyze information from vast amounts of unstructured data.
+
+* **Natural Language Processing (NLP) Tasks:** Perform tasks like sentiment analysis or translation on large datasets.
+
+* **Marketing and Personalization:** Generate personalized content and recommendations at scale.
+
+For more information on [getting started with global batch deployments](./how-to/batch.md).
+
 ## July 2024
 
 ### GPT-4o mini is now available for fine-tuning