Merge pull request #281839 from mrbullwinkle/mrb_07_25_2024_gpt4o-mini

prmerger-automator[bot] · web-flow · commit db7ad0bc47dc · 2024-07-31T21:41:42.000Z
[Azure OpenAI] GPT-4o mini
diff --git a/articles/ai-services/openai/concepts/models.md b/articles/ai-services/openai/concepts/models.md
@@ -4,7 +4,7 @@ titleSuffix: Azure OpenAI
 description: Learn about the different model capabilities that are available with Azure OpenAI.
 ms.service: azure-ai-openai
 ms.topic: conceptual
-ms.date: 07/18/2024
+ms.date: 07/31/2024
 ms.custom: references_regions, build-2023, build-2023-dataai, refefences_regions
 manager: nitinme
 author: mrbullwinkle #ChrisHMSFT
@@ -18,7 +18,7 @@ Azure OpenAI Service is powered by a diverse set of models with different capabi
 
 | Models | Description |
 |--|--|
-| [GPT-4o & GPT-4 Turbo](#gpt-4o-and-gpt-4-turbo) | The latest most capable Azure OpenAI models with multimodal versions, which can accept both text and images as input. |
+| [GPT-4o, GPT-4o mini & GPT-4 Turbo](#gpt-4o-and-gpt-4-turbo) | The latest, most capable Azure OpenAI models with multimodal versions, which can accept both text and images as input. |
 | [GPT-4](#gpt-4) | A set of models that improve on GPT-3.5 and can understand and generate natural language and code. |
 | [GPT-3.5](#gpt-35) | A set of models that improve on GPT-3 and can understand and generate natural language and code. |
 | [Embeddings](#embeddings-models) | A set of models that can convert text into numerical vector form to facilitate text similarity. |
@@ -30,27 +30,16 @@ Azure OpenAI Service is powered by a diverse set of models with different capabi
 
 GPT-4o integrates text and images in a single model, enabling it to handle multiple data types simultaneously. This multimodal approach enhances accuracy and responsiveness in human-computer interactions. GPT-4o matches GPT-4 Turbo in English text and coding tasks while offering superior performance in non-English languages and vision tasks, setting new benchmarks for AI capabilities.
 
-### Early access playground
+### How do I access the GPT-4o and GPT-4o mini models?
 
-Existing Azure OpenAI customers can test out the **NEW GPT-4o mini** model in the **Azure OpenAI Studio Early Access Playground (Preview)**.  
-
-To test the latest model:
-
-> [!NOTE]
-> The GPT-4o mini early access playground is currently only available for resources in **West US3** and **East US**, and is limited to 10 requests every five minutes per subscription. Azure OpenAI content filters are enabled at the default configuration and cannot be modified. GPT-4o mini is a preview model and is currently not available for deployment/direct API access.
-
-1. Navigate to Azure OpenAI Studio at https://oai.azure.com/ and sign-in with credentials that have access to your OpenAI resources.
-2. Select an Azure OpenAI resource in the **West US3** or **East US** regions. If you don't have a resource in one of these regions you will need to [create a resource](../how-to/create-resource.md).
-3. From the main [Azure OpenAI Studio](https://oai.azure.com/) page select the **Early Access Playground (Preview)** button from under the **Get started** section. (This button will only be visible when a resource in **West US3** or **East US** is selected.)
-4. Now you can start asking the model questions just as you would before in the existing [chat playground](../chatgpt-quickstart.md).
-
-### How do I access the GPT-4o model?
-
-GPT-4o is available for **standard** and **global-standard** model deployment.
+GPT-4o and GPT-4o mini are available for **standard** and **global-standard** model deployment.
 
 You need to [create](../how-to/create-resource.md) or use an existing resource in a [supported standard](#gpt-4-and-gpt-4-turbo-model-availability) or [global standard](#global-standard-model-availability) region where the model is available.
 
-When your resource is created, you can [deploy](../how-to/create-resource.md#deploy-a-model) the GPT-4o model. If you are performing a programmatic deployment, the **model** name is `gpt-4o`, and the **version** is `2024-05-13`.
+When your resource is created, you can [deploy](../how-to/create-resource.md#deploy-a-model) the GPT-4o models. If you are performing a programmatic deployment, the **model** names are:
+
+- `gpt-4o`, **Version** `2024-05-13`
+- `gpt-4o-mini`, **Version** `2024-07-18`
 
 ### GPT-4 Turbo
 
@@ -76,7 +65,8 @@ See [model versions](../concepts/model-versions.md) to learn about how Azure Ope
 
 |  Model ID  | Description | Max Request (tokens) | Training Data (up to)  |
 |  --- |  :--- |:--- |:---: |
-|`gpt-4o` (2024-05-13) <br> **GPT-4o (Omni)** | **Latest GA model** <br> - Text, image processing <br> - JSON Mode <br> - parallel function calling <br> - Enhanced accuracy and responsiveness <br> - Parity with English text and coding tasks compared to GPT-4 Turbo with Vision <br> - Superior performance in non-English languages and in vision tasks <br> - **Does not support enhancements** |Input: 128,000  <br> Output: 4,096| Oct 2023 |
+|`gpt-4o-mini` (2024-07-18) <br> **GPT-4o mini** | **Latest small GA model** <br> - Fast, inexpensive, capable model ideal for replacing GPT-3.5 Turbo series models. <br> - Text, image processing <br>- JSON Mode <br> - parallel function calling <br> - **Does not support enhancements** | Input: 128,000 <br> Output: 16,384  | Oct 2023 |
+|`gpt-4o` (2024-05-13) <br> **GPT-4o (Omni)** | **Latest large GA model** <br> - Text, image processing <br> - JSON Mode <br> - parallel function calling <br> - Enhanced accuracy and responsiveness <br> - Parity with English text and coding tasks compared to GPT-4 Turbo with Vision <br> - Superior performance in non-English languages and in vision tasks <br> - **Does not support enhancements** |Input: 128,000  <br> Output: 4,096| Oct 2023 |
 | `gpt-4` (turbo-2024-04-09) <br>**GPT-4 Turbo with Vision** | **New GA model** <br> - Replacement for all previous GPT-4 preview models (`vision-preview`, `1106-Preview`, `0125-Preview`). <br> - [**Feature availability**](#gpt-4o-and-gpt-4-turbo) is currently different depending on method of input, and deployment type. <br> - **Does not support enhancements**. | Input: 128,000  <br> Output: 4,096  | Dec 2023 |
 | `gpt-4` (0125-Preview)*<br>**GPT-4 Turbo Preview** | **Preview Model** <br> -Replaces 1106-Preview <br>- Better code generation performance <br> - Reduces cases where the model doesn't complete a task <br> - JSON Mode <br> - parallel function calling <br> - reproducible output (preview) | Input: 128,000  <br> Output: 4,096           | Dec 2023         |
 | `gpt-4` (vision-preview)<br>**GPT-4 Turbo with Vision Preview**  | **Preview model** <br> - Accepts text and image input. <br> - Supports enhancements <br> - JSON Mode <br> - parallel function calling <br> - reproducible output (preview) | Input: 128,000  <br> Output: 4,096              | Apr 2023       |
@@ -180,9 +170,7 @@ For more information on Provisioned deployments, see our [Provisioned guidance](
 
 ### Global standard model availability
 
-**Supported models:**
-
-- `gpt-4o` **Version:** `2024-05-13`  
+`gpt-4o` **Version:** `2024-05-13`  
 
 **Supported regions:**
 
@@ -208,14 +196,18 @@ For more information on Provisioned deployments, see our [Provisioned guidance](
 - westus            
 - westus3           
 
+`gpt-4o-mini` **Version:** `2024-07-18`  
+
+**Supported regions:**
+
+- eastus
+
 ### GPT-4 and GPT-4 Turbo model availability
 
 #### Public cloud regions
 
 [!INCLUDE [GPT-4](../includes/model-matrix/standard-gpt-4.md)]
 
-
-
 #### Select customer access
 
 In addition to the regions above which are available to all Azure OpenAI customers, some select pre-existing customers have been granted access to versions of GPT-4 in additional regions:
@@ -283,9 +275,9 @@ These models can only be used with Embedding API requests.
 | `gpt-35-turbo` (0613) | East US2 <br> North Central US <br> Sweden Central <br> Switzerland West | 4,096 | Sep 2021 |
 | `gpt-35-turbo` (1106) | East US2 <br> North Central US <br> Sweden Central <br> Switzerland West | Input: 16,385<br> Output: 4,096 |  Sep 2021|
 | `gpt-35-turbo` (0125)  | East US2 <br> North Central US <br> Sweden Central <br> Switzerland West | 16,385 | Sep 2021 |
-| `gpt-4` (0613) <sup>**1**<sup> | North Central US <br> Sweden Central | 8192 | Sep 2021 |
+| `gpt-4` (0613) <sup>**1**</sup> | North Central US <br> Sweden Central | 8192 | Sep 2021 |
 
-**<sup>1<sup>** GPT-4 fine-tuning is currently in public preview. See our [GPT-4 fine-tuning safety evaluation guidance](/azure/ai-services/openai/how-to/fine-tuning?tabs=turbo%2Cpython-new&pivots=programming-language-python#safety-evaluation-gpt-4-fine-tuning---public-preview) for more information.
+**<sup>1</sup>** GPT-4 fine-tuning is currently in public preview. See our [GPT-4 fine-tuning safety evaluation guidance](/azure/ai-services/openai/how-to/fine-tuning?tabs=turbo%2Cpython-new&pivots=programming-language-python#safety-evaluation-gpt-4-fine-tuning---public-preview) for more information.
 
 ### Whisper models
 
diff --git a/articles/ai-services/openai/how-to/fine-tuning.md b/articles/ai-services/openai/how-to/fine-tuning.md
@@ -7,7 +7,7 @@ manager: nitinme
 ms.service: azure-ai-openai
 ms.custom: build-2023, build-2023-dataai, devx-track-python
 ms.topic: how-to
-ms.date: 05/16/2024
+ms.date: 07/25/2024
 author: mrbullwinkle
 ms.author: mbullwin
 zone_pivot_groups: openai-fine-tuning-new
diff --git a/articles/ai-services/openai/how-to/function-calling.md b/articles/ai-services/openai/how-to/function-calling.md
@@ -36,6 +36,7 @@ At a high level you can break down working with functions into three steps:
 * `gpt-4` (vision-preview)
 * `gpt-4` (2024-04-09)
 * `gpt-4o` (2024-05-13)
+* `gpt-4o-mini` (2024-07-18)
 
 Support for parallel function was first added in API version [`2023-12-01-preview`](https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/preview/2023-12-01-preview/inference.json)
 
diff --git a/articles/ai-services/openai/includes/fine-tuning-openai-in-ai-studio.md b/articles/ai-services/openai/includes/fine-tuning-openai-in-ai-studio.md
@@ -31,7 +31,9 @@ The following models support fine-tuning:
 - `gpt-35-turbo` (0613)
 - `gpt-35-turbo` (1106)
 - `gpt-35-turbo` (0125)
-- `gpt-4` (0613)
+- `gpt-4` (0613)**<sup>*</sup>** 
+
+**<sup>*</sup>** Fine-tuning for this model is currently in public preview.
 
 Consult the [models page](../concepts/models.md#fine-tuning-models) to check which regions currently support fine-tuning.
 
diff --git a/articles/ai-services/openai/includes/fine-tuning-python.md b/articles/ai-services/openai/includes/fine-tuning-python.md
@@ -31,7 +31,9 @@ The following models support fine-tuning:
 - `gpt-35-turbo` (0613)
 - `gpt-35-turbo` (1106)
 - `gpt-35-turbo` (0125)
-- `gpt-4` (0613)
+- `gpt-4` (0613)**<sup>*</sup>**
+
+**<sup>*</sup>** Fine-tuning for this model is currently in public preview.
 
 If you plan to use `gpt-4` for fine-tuning, please refer to the [GPT-4 public preview safety evaluation guidance](#safety-evaluation-gpt-4-fine-tuning---public-preview)
 
diff --git a/articles/ai-services/openai/includes/fine-tuning-rest.md b/articles/ai-services/openai/includes/fine-tuning-rest.md
@@ -30,7 +30,9 @@ The following models support fine-tuning:
 - `gpt-35-turbo` (0613)
 - `gpt-35-turbo` (1106)
 - `gpt-35-turbo` (0125)
-- `gpt-4` (0613)
+- `gpt-4` (0613)**<sup>*</sup>** 
+
+**<sup>*</sup>** Fine-tuning for this model is currently in public preview.
 
 Consult the [models page](../concepts/models.md#fine-tuning-models) to check which regions currently support fine-tuning.
 
diff --git a/articles/ai-services/openai/includes/fine-tuning-studio.md b/articles/ai-services/openai/includes/fine-tuning-studio.md
@@ -29,7 +29,9 @@ The following models support fine-tuning:
 - `gpt-35-turbo` (0613)
 - `gpt-35-turbo` (1106)
 - `gpt-35-turbo` (0125)
-- `gpt-4` (0613)
+- `gpt-4` (0613)**<sup>*</sup>** 
+
+**<sup>*</sup>** Fine-tuning for this model is currently in public preview.
 
 Consult the [models page](../concepts/models.md#fine-tuning-models) to check which regions currently support fine-tuning.
 
diff --git a/articles/ai-services/openai/quotas-limits.md b/articles/ai-services/openai/quotas-limits.md
@@ -10,7 +10,7 @@ ms.custom:
   - ignite-2023
   - references_regions
 ms.topic: conceptual
-ms.date: 07/24/2024
+ms.date: 07/31/2024
 ms.author: mbullwin
 ---
 
@@ -55,23 +55,27 @@ The following sections provide you with a quick guide to the default quotas and
 
 ## gpt-4o rate limits
 
-`gpt-4o` introduces rate limit tiers with higher limits for certain customer types.
+`gpt-4o` and `gpt-4o-mini` have rate limit tiers with higher limits for certain customer types.
 
 ### gpt-4o global standard
 
-|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
-|---|:---:|:---:|
-|Enterprise agreement | 30 M | 180 K |
-|Default | 450 K | 2.7 K |
+| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
+|---|---|:---:|:---:|
+|`gpt-4o`|Enterprise agreement | 30 M | 180 K |
+|`gpt-4o-mini` | Enterprise agreement | 50 M | 300 K |
+|`gpt-4o` |Default | 450 K | 2.7 K |
+|`gpt-4o-mini` | Default | 2 M | 12 K  |
 
 M = million | K = thousand
 
 ### gpt-4o standard
 
-|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
-|---|:---:|:---:|
-|Enterprise agreement | 1 M | 6 K |
-|Default | 150 K | 900 |
+| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
+|---|---|:---:|:---:|
+|`gpt-4o`|Enterprise agreement | 1 M | 6 K |
+|`gpt-4o-mini` | Enterprise agreement | 2 M | 12 K |
+|`gpt-4o`|Default | 150 K | 900 |
+|`gpt-4o-mini` | Default | 450 K | 2.7 K |
 
 M = million | K = thousand
 
diff --git a/articles/ai-services/openai/whats-new.md b/articles/ai-services/openai/whats-new.md
@@ -10,7 +10,7 @@ ms.custom:
   - ignite-2023
   - references_regions
 ms.topic: whats-new
-ms.date: 07/18/2024
+ms.date: 07/31/2024
 recommendations: false
 ---
 
@@ -20,15 +20,15 @@ This article provides a summary of the latest releases and major documentation u
 
 ## July 2024
 
-### GPT-4o mini preview model available for early access
+### GPT-4o mini model available for deployment
 
-GPT-4o mini is the latest model from OpenAI [launched on July 18, 2024](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/).
+GPT-4o mini is the latest Azure OpenAI model, first [announced on July 18, 2024](https://azure.microsoft.com/blog/openais-fastest-model-gpt-4o-mini-is-now-available-on-azure-ai/):
 
-From OpenAI:
+*"GPT-4o mini allows customers to deliver stunning applications at a lower cost with blazing speed. GPT-4o mini is significantly smarter than GPT-3.5 Turbo—scoring 82% on Measuring Massive Multitask Language Understanding (MMLU) compared to 70%—and is more than 60% cheaper. The model delivers an expanded 128K context window and integrates the improved multilingual capabilities of GPT-4o, bringing greater quality to languages from around the world."*
 
-*"GPT-4o mini surpasses GPT-3.5 Turbo and other small models on academic benchmarks across both textual intelligence and multimodal reasoning, and supports the same range of languages as GPT-4o. It also demonstrates strong performance in function calling, which can enable developers to build applications that fetch data or take actions with external systems, and improved long-context performance compared to GPT-3.5 Turbo."*
+The model is currently available for both [standard and global standard deployment](./how-to/deployment-types.md) in the East US region.
 
-To start testing out the model today in Azure OpenAI, see the [**Azure OpenAI Studio early access playground**](./concepts/models.md#early-access-playground).
+For information on model quota, consult the [quota and limits page](./quotas-limits.md), and for the latest information on model availability, refer to the [models page](./concepts/models.md).
 
 ### New Responsible AI default content filtering policy