Merge pull request #563 from MicrosoftDocs/release-2024-openai-oct

Daidihuang · web-flow · commit f766e0c8d96c · 2024-10-02T04:08:38.000+08:00
[Azure OpenAI] Release branch to main tracking PR
diff --git a/articles/ai-services/openai/concepts/models.md b/articles/ai-services/openai/concepts/models.md
@@ -4,7 +4,7 @@ titleSuffix: Azure OpenAI
 description: Learn about the different model capabilities that are available with Azure OpenAI.
 ms.service: azure-ai-openai
 ms.topic: conceptual
-ms.date: 09/12/2024
+ms.date: 09/30/2024
 ms.custom: references_regions, build-2023, build-2023-dataai, refefences_regions
 manager: nitinme
 author: mrbullwinkle #ChrisHMSFT
@@ -19,6 +19,7 @@ Azure OpenAI Service is powered by a diverse set of models with different capabi
 | Models | Description |
 |--|--|
 | [GPT-4o & GPT-4o mini & GPT-4 Turbo](#gpt-4o-and-gpt-4-turbo) | The latest most capable Azure OpenAI models with multimodal versions, which can accept both text and images as input. |
+| [GPT-4o audio](#gpt-4o-audio) | A GPT-4o model that supports low-latency, "speech in, speech out" conversational interactions. |
 | [GPT-4](#gpt-4) | A set of models that improve on GPT-3.5 and can understand and generate natural language and code. |
 | [GPT-3.5](#gpt-35) | A set of models that improve on GPT-3 and can understand and generate natural language and code. |
 | [Embeddings](#embeddings-models) | A set of models that can convert text into numerical vector form to facilitate text similarity. |
@@ -43,6 +44,20 @@ Once access has been granted, you will need to:
 1. Navigate to https://ai.azure.com/resources and select a resource in the `eastus2` region. If you do not have an Azure OpenAI resource in this region you will need to [create one](https://portal.azure.com/#create/Microsoft.CognitiveServicesOpenAI).  
 2. Once the `eastus2` Azure OpenAI resource is selected, in the upper left-hand panel under **Playgrounds** select **Early access playground (preview)**.
 
+## GPT-4o audio
+
+The `gpt-4o-realtime-preview` model is part of the GPT-4o model family and supports low-latency, "speech in, speech out" conversational interactions. This makes GPT-4o audio a good fit for support agents, assistants, translators, and other use cases that need highly responsive back-and-forth with a user.
+
+GPT-4o audio is available in the East US 2 (`eastus2`) and Sweden Central (`swedencentral`) regions. To use GPT-4o audio, you need to [create](../how-to/create-resource.md) a resource, or use an existing one, in one of the supported regions.
+
+When your resource is created, you can [deploy](../how-to/create-resource.md#deploy-a-model) the GPT-4o audio model. If you are performing a programmatic deployment, the **model** name is `gpt-4o-realtime-preview`. For more information on how to use GPT-4o audio, see the [GPT-4o audio documentation](../how-to/audio-real-time.md).
+
+Details about maximum request tokens and training data are available in the following table.
+
+|  Model ID  | Description | Max Request (tokens) | Training Data (up to)  |
+|  --- |  :--- |:--- |:---: |
+|`gpt-4o-realtime-preview` (2024-10-01-preview) <br> **GPT-4o audio** | **Audio model** for real-time audio processing |Input: 128,000  <br> Output: 4,096 | Oct 2023 |
+
 ## GPT-4o and GPT-4 Turbo
 
 GPT-4o integrates text and images in a single model, enabling it to handle multiple data types simultaneously. This multimodal approach enhances accuracy and responsiveness in human-computer interactions. GPT-4o matches GPT-4 Turbo in English text and coding tasks while offering superior performance in non-English languages and vision tasks, setting new benchmarks for AI capabilities.
@@ -96,15 +111,17 @@ See [model versions](../concepts/model-versions.md) to learn about how Azure Ope
 | `gpt-4` (0314) | **Older GA model** <br> - [Retirement information](./model-retirements.md#current-models)  | 8,192 | Sep 2021         |
 
 > [!CAUTION]
-> We don't recommend using preview models in production. We will upgrade all deployments of preview models to either future preview versions or to the latest stable/GA version. Models designated preview do not follow the standard Azure OpenAI model lifecycle.
+> We don't recommend using preview models in production. We will upgrade all deployments of preview models to either future preview versions or to the latest stable GA version. Models designated preview do not follow the standard Azure OpenAI model lifecycle.
 
 - GPT-4 version 0125-preview is an updated version of the GPT-4 Turbo preview previously released as version 1106-preview.  
 - GPT-4 version 0125-preview completes tasks such as code generation more completely compared to gpt-4-1106-preview. Because of this, depending on the task, customers may find that GPT-4-0125-preview generates more output compared to the gpt-4-1106-preview.  We recommend customers compare the outputs of the new model.  GPT-4-0125-preview also addresses bugs in gpt-4-1106-preview with UTF-8 handling for non-English languages. 
 - GPT-4 version `turbo-2024-04-09` is the latest GA release and replaces `0125-Preview`, `1106-preview`, and `vision-preview`.
 
 > [!IMPORTANT]
->
-> - `gpt-4` versions 1106-Preview, 0125-Preview, and vision-preview will be upgraded with a stable version of `gpt-4` in the future. Deployments of `gpt-4` versions 1106-Preview, 0125-Preview, and vision-preview set to "Auto-update to default" and "Upgrade when expired" will start to be upgraded after the stable version is released. For each deployment, a model version upgrade takes place with no interruption in service for API calls.  Upgrades are staged by region and the full upgrade process is expected to take 2 weeks. Deployments of `gpt-4` versions 1106-Preview, 0125-Preview, and vision-preview set to "No autoupgrade" will not be upgraded and will stop operating when the preview version is upgraded in the region. See [Azure OpenAI model retirements and deprecations](./model-retirements.md) for more information on the timing of the upgrade.
+> The GPT-4 (`gpt-4`) versions `1106-Preview`, `0125-Preview`, and `vision-preview` will be upgraded with a stable version of `gpt-4` in the future. 
+> - Deployments of `gpt-4` versions `1106-Preview`, `0125-Preview`, and `vision-preview` set to "Auto-update to default" and "Upgrade when expired" will start to be upgraded after the stable version is released. For each deployment, a model version upgrade takes place with no interruption in service for API calls. Upgrades are staged by region and the full upgrade process is expected to take 2 weeks. 
+> - Deployments of `gpt-4` versions  `1106-Preview`, `0125-Preview`, and `vision-preview` set to "No autoupgrade" will not be upgraded and will stop operating when the preview version is upgraded in the region. 
+> See [Azure OpenAI model retirements and deprecations](./model-retirements.md) for more information on the timing of the upgrade.
 
 ## GPT-3.5
 
diff --git a/articles/ai-services/openai/how-to/audio-real-time.md b/articles/ai-services/openai/how-to/audio-real-time.md
@@ -0,0 +1,100 @@
+---
+title: 'How to use GPT-4o real-time audio with Azure OpenAI Service'
+titleSuffix: Azure OpenAI
+description: Learn how to use GPT-4o real-time audio with Azure OpenAI Service.
+manager: nitinme
+ms.service: azure-ai-openai
+ms.topic: how-to
+ms.date: 10/1/2024
+author: eric-urban
+ms.author: eur
+ms.custom: references_regions
+recommendations: false
+---
+
+# GPT-4o real-time audio
+
+Azure OpenAI GPT-4o audio is part of the GPT-4o model family that supports low-latency, "speech in, speech out" conversational interactions. The GPT-4o audio `realtime` API is a good fit for use cases involving live interactions between a user and a model, such as customer support agents, voice assistants, and real-time translators.
+
+Most users of this API need to deliver and receive audio from an end user in real time, including applications that use WebRTC or a telephony system. The real-time API isn't designed to connect directly to end-user devices; it relies on client integrations to terminate end-user audio streams.
+
+## Supported models
+
+Currently, only `gpt-4o-realtime-preview` version `2024-10-01-preview` supports real-time audio.
+
+The `gpt-4o-realtime-preview` model is available for global deployments in the [East US 2 and Sweden Central regions](../concepts/models.md#global-standard-model-availability).
+
+> [!IMPORTANT]
+> The system stores your prompts and completions as described in the "Data Use and Access for Abuse Monitoring" section of the service-specific Product Terms for Azure OpenAI Service, except that the Limited Exception does not apply. Abuse monitoring will be turned on for use of the `gpt-4o-realtime-preview` API even for customers who otherwise are approved for modified abuse monitoring.
+
+## API support
+
+Support for real-time audio was first added in API version `2024-10-01-preview`. 
+
+> [!NOTE]
+> For more information about the API and architecture, see the [Azure OpenAI GPT-4o real-time audio repository on GitHub](https://github.com/azure-samples/aoai-realtime-audio-sdk).
+
+## Prerequisites
+
+- An Azure subscription - <a href="https://azure.microsoft.com/free/cognitive-services" target="_blank">Create one for free</a>.
+- An Azure OpenAI resource created in a [supported region](#supported-models). For more information, see [Create a resource and deploy a model with Azure OpenAI](../how-to/create-resource.md).
+
+## Deploy a model for real-time audio
+
+Before you can use GPT-4o real-time audio, you need a deployment of the `gpt-4o-realtime-preview` model in a supported region as described in the [supported models](#supported-models) section.
+
+You can deploy the model from the Azure OpenAI model catalog or from your project in AI Studio. Follow these steps to deploy a `gpt-4o-realtime-preview` model from the [AI Studio model catalog](../../../ai-studio/how-to/model-catalog-overview.md):
+
+1. Sign in to [AI Studio](https://ai.azure.com) and go to the **Home** page.
+1. Select **Model catalog** from the left sidebar.
+1. Search for and select the `gpt-4o-realtime-preview` model from the Azure OpenAI collection.
+1. Select **Deploy** to open the deployment window.
+1. Enter a deployment name and select an Azure OpenAI resource.
+1. Select `2024-10-01` from the **Model version** dropdown.
+1. Modify other default settings depending on your requirements.
+1. Select **Deploy**. You land on the deployment details page. 
+
+Now that you have a deployment of the `gpt-4o-realtime-preview` model, you can use the playground to interact with the model in real time. Select **Early access playground** from the list of playgrounds in the left pane.
+
+## Use the GPT-4o real-time audio API
+
+> [!TIP]
+> A playground for GPT-4o real-time audio is coming soon to [Azure AI Studio](https://ai.azure.com). You can already use the API directly in your application.
+
+Right now, the fastest way to get started with GPT-4o real-time audio is to download the sample code from the [Azure OpenAI GPT-4o real-time audio repository on GitHub](https://github.com/azure-samples/aoai-realtime-audio-sdk).
+
+The JavaScript web sample demonstrates how to use the GPT-4o real-time audio API to interact with the model in real time. The sample code includes a simple web interface that captures audio from the user's microphone and sends it to the model for processing. The model responds with text and audio, which the sample code renders in the web interface.
+ 
+1. Clone the repository to your local machine:
+    
+    ```bash
+    git clone https://github.com/Azure-Samples/aoai-realtime-audio-sdk.git
+    ```
+
+1. Go to the `javascript/samples/web` folder in your preferred code editor.
+
+    ```bash
+    cd javascript/samples/web
+    ```
+
+1. If you don't have Node.js installed, download and install the [LTS version of Node.js](https://nodejs.org/).
+
+1. Run `npm install` to download a few dependency packages. For more information, see the `package.json` file in the same `web` folder.
+
+1. Run `npm run dev` to start the web server, navigating any firewall permissions prompts as needed.
+1. Go to any of the provided URIs from the console output (such as `http://localhost:5173/`) in a browser.
+1. Enter the following information in the web interface:
+    - **Endpoint**: The resource endpoint of an Azure OpenAI resource. You don't need to append the `/realtime` path. An example structure might be `https://my-azure-openai-resource-from-portal.openai.azure.com`.
+    - **API Key**: A corresponding API key for the Azure OpenAI resource.
+    - **Deployment**: The name of the `gpt-4o-realtime-preview` deployment that [you created in the previous section](#deploy-a-model-for-real-time-audio).
+    - **System Message**: Optionally, you can provide a system message such as "You always talk like a friendly pirate."
+    - **Temperature**: Optionally, you can provide a custom temperature.
+    - **Voice**: Optionally, you can select a voice.
+1. Select the **Record** button to start the session. Accept permissions to use your microphone if prompted.
+1. You should see a `<< Session Started >>` message in the main output. Then you can speak into the microphone to start a chat.
+1. You can interrupt the chat at any time by speaking. You can end the chat by selecting the **Stop** button.
+
+## Related content
+
+* Learn more about Azure OpenAI [deployment types](./deployment-types.md)
+* Learn more about Azure OpenAI [quotas and limits](../quotas-limits.md)
diff --git a/articles/ai-services/openai/toc.yml b/articles/ai-services/openai/toc.yml
@@ -105,6 +105,8 @@ items:
           href: ./how-to/assistants-logic-apps.md
       - name: File search
         href: ./how-to/file-search.md
+  - name: Audio in real time
+    href: ./how-to/audio-real-time.md
   - name: Batch
     href: ./how-to/batch.md
   - name: Completions & chat completions
diff --git a/articles/ai-services/openai/whats-new.md b/articles/ai-services/openai/whats-new.md
@@ -18,6 +18,16 @@ recommendations: false
 
 This article provides a summary of the latest releases and major documentation updates for Azure OpenAI.
 
+## October 2024
+
+### New GPT-4o real-time audio public preview
+
+Azure OpenAI GPT-4o audio is part of the GPT-4o model family that supports low-latency, "speech in, speech out" conversational interactions. The GPT-4o audio `realtime` API is a good fit for use cases involving live interactions between a user and a model, such as customer support agents, voice assistants, and real-time translators.
+
+The `gpt-4o-realtime-preview` model is available for global deployments in the [East US 2 and Sweden Central regions](./concepts/models.md#global-standard-model-availability).
+
+For more information, see the [GPT-4o real-time audio documentation](./how-to/audio-real-time.md).
+
 ## September 2024
 
 ### Azure OpenAI Studio UX updates
diff --git a/zone-pivots/zone-pivot-groups.yml b/zone-pivots/zone-pivot-groups.yml
@@ -120,7 +120,7 @@ groups:
   prompt: Choose your preferred usage method
   pivots:
   - id: programming-language-ai-studio
-    title: AI Studio (Preview)
+    title: AI Studio
   - id: programming-language-csharp
     title: C#
   - id: programming-language-python
@@ -760,7 +760,7 @@ groups:
   - id: programming-language-studio
     title: Studio
   - id: programming-language-ai-studio
-    title: AI Studio (Preview)
+    title: AI Studio
   - id: programming-language-python
     title: Python
   - id: rest-api
@@ -840,4 +840,4 @@ groups:
   - id: programming-language-python
     title: Python
   - id: programming-language-powershell
-    title: PowerShell
+    title: PowerShell