Commit f766e0c

Merge pull request #563 from MicrosoftDocs/release-2024-openai-oct
[Azure OpenAI] Release branch to main tracking PR
2 parents ac60ccf + 10f603f commit f766e0c

5 files changed, +136 -7 lines changed


articles/ai-services/openai/concepts/models.md

Lines changed: 21 additions & 4 deletions
@@ -4,7 +4,7 @@ titleSuffix: Azure OpenAI
 description: Learn about the different model capabilities that are available with Azure OpenAI.
 ms.service: azure-ai-openai
 ms.topic: conceptual
-ms.date: 09/12/2024
+ms.date: 09/30/2024
 ms.custom: references_regions, build-2023, build-2023-dataai, refefences_regions
 manager: nitinme
 author: mrbullwinkle #ChrisHMSFT
@@ -19,6 +19,7 @@ Azure OpenAI Service is powered by a diverse set of models with different capabi
 | Models | Description |
 |--|--|
 | [GPT-4o & GPT-4o mini & GPT-4 Turbo](#gpt-4o-and-gpt-4-turbo) | The latest most capable Azure OpenAI models with multimodal versions, which can accept both text and images as input. |
+| [GPT-4o audio](#gpt-4o-audio) | A GPT-4o model that supports low-latency, "speech in, speech out" conversational interactions. |
 | [GPT-4](#gpt-4) | A set of models that improve on GPT-3.5 and can understand and generate natural language and code. |
 | [GPT-3.5](#gpt-35) | A set of models that improve on GPT-3 and can understand and generate natural language and code. |
 | [Embeddings](#embeddings-models) | A set of models that can convert text into numerical vector form to facilitate text similarity. |
@@ -43,6 +44,20 @@ Once access has been granted, you will need to:
 1. Navigate to https://ai.azure.com/resources and select a resource in the `eastus2` region. If you do not have an Azure OpenAI resource in this region you will need to [create one](https://portal.azure.com/#create/Microsoft.CognitiveServicesOpenAI).
 2. Once the `eastus2` Azure OpenAI resource is selected, in the upper left-hand panel under **Playgrounds** select **Early access playground (preview)**.
 
+## GPT-4o audio
+
+The `gpt-4o-realtime-preview` model is part of the GPT-4o model family and supports low-latency, "speech in, speech out" conversational interactions. GPT-4o audio is designed to handle real-time, low-latency conversational interactions, making it a great fit for support agents, assistants, translators, and other use cases that need highly responsive back-and-forth with a user.
+
+GPT-4o audio is available in the East US 2 (`eastus2`) and Sweden Central (`swedencentral`) regions. To use GPT-4o audio, you need to [create](../how-to/create-resource.md) or use an existing resource in one of the supported regions.
+
+When your resource is created, you can [deploy](../how-to/create-resource.md#deploy-a-model) the GPT-4o audio model. If you are performing a programmatic deployment, the **model** name is `gpt-4o-realtime-preview`. For more information on how to use GPT-4o audio, see the [GPT-4o audio documentation](../how-to/audio-real-time.md).
+
+Details about maximum request tokens and training data are available in the following table.
+
+| Model ID | Description | Max Request (tokens) | Training Data (up to) |
+| --- | :--- |:--- |:---: |
+|`gpt-4o-realtime-preview` (2024-10-01-preview) <br> **GPT-4o audio** | **Audio model** for real-time audio processing |Input: 128,000 <br> Output: 4,096 | Oct 2023 |
+
 ## GPT-4o and GPT-4 Turbo
 
 GPT-4o integrates text and images in a single model, enabling it to handle multiple data types simultaneously. This multimodal approach enhances accuracy and responsiveness in human-computer interactions. GPT-4o matches GPT-4 Turbo in English text and coding tasks while offering superior performance in non-English languages and vision tasks, setting new benchmarks for AI capabilities.
@@ -96,15 +111,17 @@ See [model versions](../concepts/model-versions.md) to learn about how Azure Ope
 | `gpt-4` (0314) | **Older GA model** <br> - [Retirement information](./model-retirements.md#current-models) | 8,192 | Sep 2021 |
 
 > [!CAUTION]
-> We don't recommend using preview models in production. We will upgrade all deployments of preview models to either future preview versions or to the latest stable/GA version. Models designated preview do not follow the standard Azure OpenAI model lifecycle.
+> We don't recommend using preview models in production. We will upgrade all deployments of preview models to either future preview versions or to the latest stable GA version. Models designated preview do not follow the standard Azure OpenAI model lifecycle.
 
 - GPT-4 version 0125-preview is an updated version of the GPT-4 Turbo preview previously released as version 1106-preview.
 - GPT-4 version 0125-preview completes tasks such as code generation more completely compared to gpt-4-1106-preview. Because of this, depending on the task, customers may find that GPT-4-0125-preview generates more output compared to the gpt-4-1106-preview. We recommend customers compare the outputs of the new model. GPT-4-0125-preview also addresses bugs in gpt-4-1106-preview with UTF-8 handling for non-English languages.
 - GPT-4 version `turbo-2024-04-09` is the latest GA release and replaces `0125-Preview`, `1106-preview`, and `vision-preview`.
 
 > [!IMPORTANT]
->
-> - `gpt-4` versions 1106-Preview, 0125-Preview, and vision-preview will be upgraded with a stable version of `gpt-4` in the future. Deployments of `gpt-4` versions 1106-Preview, 0125-Preview, and vision-preview set to "Auto-update to default" and "Upgrade when expired" will start to be upgraded after the stable version is released. For each deployment, a model version upgrade takes place with no interruption in service for API calls. Upgrades are staged by region and the full upgrade process is expected to take 2 weeks. Deployments of `gpt-4` versions 1106-Preview, 0125-Preview, and vision-preview set to "No autoupgrade" will not be upgraded and will stop operating when the preview version is upgraded in the region. See [Azure OpenAI model retirements and deprecations](./model-retirements.md) for more information on the timing of the upgrade.
+> The GPT-4 (`gpt-4`) versions `1106-Preview`, `0125-Preview`, and `vision-preview` will be upgraded with a stable version of `gpt-4` in the future.
+> - Deployments of `gpt-4` versions `1106-Preview`, `0125-Preview`, and `vision-preview` set to "Auto-update to default" and "Upgrade when expired" will start to be upgraded after the stable version is released. For each deployment, a model version upgrade takes place with no interruption in service for API calls. Upgrades are staged by region and the full upgrade process is expected to take 2 weeks.
+> - Deployments of `gpt-4` versions `1106-Preview`, `0125-Preview`, and `vision-preview` set to "No autoupgrade" will not be upgraded and will stop operating when the preview version is upgraded in the region.
+> See [Azure OpenAI model retirements and deprecations](./model-retirements.md) for more information on the timing of the upgrade.
 
 ## GPT-3.5

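Illustrative note, not part of the commit: the new GPT-4o audio section says that a programmatic deployment uses the model name `gpt-4o-realtime-preview` and is limited to `eastus2` and `swedencentral`. A minimal Python sketch of that check follows. The helper name `build_deployment_body`, the request-body layout (`sku`, `properties.model`), and the `GlobalStandard` SKU name are assumptions rather than anything the diff specifies. The model version is a caller-supplied argument because the diff shows both `2024-10-01-preview` and `2024-10-01`.

```python
# Regions named in the GPT-4o audio section of models.md.
SUPPORTED_REGIONS = {"eastus2", "swedencentral"}

# Model name the doc gives for programmatic deployments.
MODEL_NAME = "gpt-4o-realtime-preview"


def build_deployment_body(region: str, model_version: str, capacity: int = 1) -> dict:
    """Return a deployment request body, or raise if the region is unsupported.

    Assumption: the body layout and the "GlobalStandard" SKU name are
    illustrative and are not taken from the commit.
    """
    normalized = region.strip().lower()
    if normalized not in SUPPORTED_REGIONS:
        raise ValueError(
            f"{MODEL_NAME} is documented only for {sorted(SUPPORTED_REGIONS)}, got {region!r}"
        )
    return {
        "sku": {"name": "GlobalStandard", "capacity": capacity},
        "properties": {
            "model": {"format": "OpenAI", "name": MODEL_NAME, "version": model_version}
        },
    }


if __name__ == "__main__":
    import json

    # The version string here is only an example value.
    print(json.dumps(build_deployment_body("eastus2", "2024-10-01"), indent=2))
```

Validating the region before building the body keeps the failure local and readable, instead of surfacing later as a service-side error.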
articles/ai-services/openai/how-to/audio-real-time.md

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
+---
+title: 'How to use GPT-4o real-time audio with Azure OpenAI Service'
+titleSuffix: Azure OpenAI
+description: Learn how to use GPT-4o real-time audio with Azure OpenAI Service.
+manager: nitinme
+ms.service: azure-ai-openai
+ms.topic: how-to
+ms.date: 10/1/2024
+author: eric-urban
+ms.author: eur
+ms.custom: references_regions
+recommendations: false
+---
+
+# GPT-4o real-time audio
+
+Azure OpenAI GPT-4o audio is part of the GPT-4o model family that supports low-latency, "speech in, speech out" conversational interactions. The GPT-4o audio `realtime` API is designed to handle real-time, low-latency conversational interactions, making it a great fit for use cases involving live interactions between a user and a model, such as customer support agents, voice assistants, and real-time translators.
+
+Most users of this API need to deliver and receive audio from an end-user in real time, including applications that use WebRTC or a telephony system. The real-time API isn't designed to connect directly to end user devices and relies on client integrations to terminate end user audio streams.
+
+## Supported models
+
+Currently only `gpt-4o-realtime-preview` version: `2024-10-01-preview` supports real-time audio.
+
+The `gpt-4o-realtime-preview` model is available for global deployments in [East US 2 and Sweden Central regions](../concepts/models.md#global-standard-model-availability).
+
+> [!IMPORTANT]
+> The system stores your prompts and completions as described in the "Data Use and Access for Abuse Monitoring" section of the service-specific Product Terms for Azure OpenAI Service, except that the Limited Exception does not apply. Abuse monitoring will be turned on for use of the `gpt-4o-realtime-preview` API even for customers who otherwise are approved for modified abuse monitoring.
+
+## API support
+
+Support for real-time audio was first added in API version `2024-10-01-preview`.
+
+> [!NOTE]
+> For more information about the API and architecture, see the [Azure OpenAI GPT-4o real-time audio repository on GitHub](https://github.com/azure-samples/aoai-realtime-audio-sdk).
+
+## Prerequisites
+
+- An Azure subscription - <a href="https://azure.microsoft.com/free/cognitive-services" target="_blank">Create one for free</a>.
+- An Azure OpenAI resource created in a [supported region](#supported-models). For more information, see [Create a resource and deploy a model with Azure OpenAI](../how-to/create-resource.md).
+
+## Deploy a model for real-time audio
+
+Before you can use GPT-4o real-time audio, you need a deployment of the `gpt-4o-realtime-preview` model in a supported region as described in the [supported models](#supported-models) section.
+
+You can deploy the model from the Azure OpenAI model catalog or from your project in AI Studio. Follow these steps to deploy a `gpt-4o-realtime-preview` model from the [AI Studio model catalog](../../../ai-studio/how-to/model-catalog-overview.md):
+
+1. Sign in to [AI Studio](https://ai.azure.com) and go to the **Home** page.
+1. Select **Model catalog** from the left sidebar.
+1. Search for and select the `gpt-4o-realtime-preview` model from the Azure OpenAI collection.
+1. Select **Deploy** to open the deployment window.
+1. Enter a deployment name and select an Azure OpenAI resource.
+1. Select `2024-10-01` from the **Model version** dropdown.
+1. Modify other default settings depending on your requirements.
+1. Select **Deploy**. You land on the deployment details page.
+
+Now that you have a deployment of the `gpt-4o-realtime-preview` model, you can use the playground to interact with the model in real time. Select **Early access playground** from the list of playgrounds in the left pane.
+
+## Use the GPT-4o real-time audio API
+
+> [!TIP]
+> A playground for GPT-4o real-time audio is coming soon to [Azure AI Studio](https://ai.azure.com). You can already use the API directly in your application.
+
+Right now, the fastest way to get started with GPT-4o real-time audio is to download the sample code from the [Azure OpenAI GPT-4o real-time audio repository on GitHub](https://github.com/azure-samples/aoai-realtime-audio-sdk).
+
+The JavaScript web sample demonstrates how to use the GPT-4o real-time audio API to interact with the model in real time. The sample code includes a simple web interface that captures audio from the user's microphone and sends it to the model for processing. The model responds with text and audio, which the sample code renders in the web interface.
+
+1. Clone the repository to your local machine:
+
+    ```bash
+    git clone https://github.com/Azure-Samples/aoai-realtime-audio-sdk.git
+    ```
+
+1. Go to the `javascript/samples/web` folder in your preferred code editor.
+
+    ```bash
+    cd ./javascript/samples/web/
+    ```
+
+1. If you don't have Node.js installed, download and install the [LTS version of Node.js](https://nodejs.org/).
+
+1. Run `npm install` to download a few dependency packages. For more information, see the `package.json` file in the same `web` folder.
+
+1. Run `npm run dev` to start the web server, navigating any firewall permissions prompts as needed.
+1. Go to any of the provided URIs from the console output (such as `http://localhost:5173/`) in a browser.
+1. Enter the following information in the web interface:
+    - **Endpoint**: The resource endpoint of an Azure OpenAI resource. You don't need to append the `/realtime` path. An example structure might be `https://my-azure-openai-resource-from-portal.openai.azure.com`.
+    - **API Key**: A corresponding API key for the Azure OpenAI resource.
+    - **Deployment**: The name of the `gpt-4o-realtime-preview` model that [you deployed in the previous section](#deploy-a-model-for-real-time-audio).
+    - **System Message**: Optionally, you can provide a system message such as "You always talk like a friendly pirate."
+    - **Temperature**: Optionally, you can provide a custom temperature.
+    - **Voice**: Optionally, you can select a voice.
+1. Select the **Record** button to start the session. Accept permissions to use your microphone if prompted.
+1. You should see a `<< Session Started >>` message in the main output. Then you can speak into the microphone to start a chat.
+1. You can interrupt the chat at any time by speaking. You can end the chat by selecting the **Stop** button.
+
+## Related content
+
+* Learn more about Azure OpenAI [deployment types](./deployment-types.md)
+* Learn more about Azure OpenAI [quotas and limits](../quotas-limits.md)
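
Illustrative note, not part of the commit: the new how-to asks for an **Endpoint** (without the `/realtime` path), a **Deployment** name, and relies on API version `2024-10-01-preview`. A minimal Python sketch of how a client might combine those three values into one WebSocket URL follows. The `/openai/realtime` path, the `api-version` and `deployment` query parameter names, and the helper name `build_realtime_url` are assumptions. The sample repository is the authority on the actual connection details.

```python
from urllib.parse import urlencode, urlsplit

# API version the how-to names as the first to support real-time audio.
API_VERSION = "2024-10-01-preview"


def build_realtime_url(endpoint: str, deployment: str, api_version: str = API_VERSION) -> str:
    """Turn an https resource endpoint into a wss URL for the realtime API.

    Assumption: the "/openai/realtime" path and the query parameter names
    are illustrative and are not confirmed by the commit.
    """
    parts = urlsplit(endpoint.strip())
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError(f"expected an https:// resource endpoint, got {endpoint!r}")
    query = urlencode({"api-version": api_version, "deployment": deployment})
    # Only the host is reused, so any path on the endpoint is dropped.
    return f"wss://{parts.netloc}/openai/realtime?{query}"


if __name__ == "__main__":
    # "my-realtime-deployment" is a hypothetical deployment name.
    print(
        build_realtime_url(
            "https://my-azure-openai-resource-from-portal.openai.azure.com",
            "my-realtime-deployment",
        )
    )
```

Keeping the API key out of the URL is deliberate here. How the key is sent is left to the client library.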

articles/ai-services/openai/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -105,6 +105,8 @@ items:
   href: ./how-to/assistants-logic-apps.md
 - name: File search
   href: ./how-to/file-search.md
+- name: Audio in real time
+  href: ./how-to/audio-real-time.md
 - name: Batch
   href: ./how-to/batch.md
 - name: Completions & chat completions

articles/ai-services/openai/whats-new.md

Lines changed: 10 additions & 0 deletions
@@ -18,6 +18,16 @@ recommendations: false
 
 This article provides a summary of the latest releases and major documentation updates for Azure OpenAI.
 
+## October 2024
+
+### New GPT-4o real-time audio public preview
+
+Azure OpenAI GPT-4o audio is part of the GPT-4o model family that supports low-latency, "speech in, speech out" conversational interactions. The GPT-4o audio `realtime` API is designed to handle real-time, low-latency conversational interactions, making it a great fit for use cases involving live interactions between a user and a model, such as customer support agents, voice assistants, and real-time translators.
+
+The `gpt-4o-realtime-preview` model is available for global deployments in [East US 2 and Sweden Central regions](./concepts/models.md#global-standard-model-availability).
+
+For more information, see the [GPT-4o real-time audio documentation](./how-to/audio-real-time.md).
+
 ## September 2024
 
 ### Azure OpenAI Studio UX updates

zone-pivots/zone-pivot-groups.yml

Lines changed: 3 additions & 3 deletions
@@ -120,7 +120,7 @@ groups:
 prompt: Choose your preferred usage method
 pivots:
 - id: programming-language-ai-studio
-  title: AI Studio (Preview)
+  title: AI Studio
 - id: programming-language-csharp
   title: C#
 - id: programming-language-python
@@ -760,7 +760,7 @@ groups:
 - id: programming-language-studio
   title: Studio
 - id: programming-language-ai-studio
-  title: AI Studio (Preview)
+  title: AI Studio
 - id: programming-language-python
   title: Python
 - id: rest-api
@@ -840,4 +840,4 @@ groups:
 - id: programming-language-python
   title: Python
 - id: programming-language-powershell
-  title: PowerShell
+  title: PowerShell
