
Commit c18cb3f

Merge pull request #2128 from MicrosoftDocs/main
Merge main to live, 4 AM
2 parents dfca068 + aedc677 commit c18cb3f

15 files changed (+934 −88 lines)

articles/ai-services/openai/how-to/fine-tuning.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -44,12 +44,12 @@ We use LoRA, or low rank approximation, to fine-tune models in a way that reduce
 
 ::: zone-end
 
-## Global Standard
+## Global Standard (preview)
 
 Azure OpenAI fine-tuning supports [global standard deployments](./deployment-types.md#global-standard) in East US2, North Central US, and Sweden Central for:
 
-- `gpt-4o-2024-08-06`
 - `gpt-4o-mini-2024-07-18`
+- `gpt-4o-2024-08-06` (New deployments aren't available until January 2025)
 
 Global standard fine-tuned deployments offer [cost savings](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/), but custom model weights may temporarily be stored outside the geography of your Azure OpenAI resource.
```
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@

---
author: eric-urban
ms.author: eur
ms.service: azure-ai-openai
ms.topic: include
ms.date: 12/27/2024
---

| Variable name | Value |
|---------------|-------|
| `AZURE_OPENAI_ENDPOINT` | You can find this value in the **Keys and Endpoint** section when examining your resource from the Azure portal. |
| `AZURE_OPENAI_API_KEY` | You can find this value in the **Keys and Endpoint** section when examining your resource from the Azure portal. You can use either `KEY1` or `KEY2`. |
| `AZURE_OPENAI_DEPLOYMENT_NAME` | This value corresponds to the custom name you chose for your deployment when you deployed a model. You can find this value under **Resource Management** > **Model Deployments** in the Azure portal. |
| `OPENAI_API_VERSION` | Learn more about [API versions](/azure/ai-services/openai/api-version-deprecation). |

Learn more about [finding API keys](/azure/ai-services/cognitive-services-environment-variables) and [setting environment variables](/azure/ai-services/cognitive-services-environment-variables).
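
For example, a minimal sketch of setting these variables for the current session in a bash shell. The values shown are placeholders — substitute your own, and on Windows you can use `setx` to set persistent user variables instead:

```bash
# Placeholder values: replace with the values from your Azure OpenAI resource.
export AZURE_OPENAI_ENDPOINT="https://your-resource-name.openai.azure.com"
export AZURE_OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_DEPLOYMENT_NAME="your-deployment-name"
# Check the API version deprecation page for a current version.
export OPENAI_API_VERSION="2024-10-01-preview"
```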
[!INCLUDE [Azure key vault](~/reusable-content/ce-skilling/azure/includes/ai-services/security/azure-key-vault.md)]
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@

---
author: eric-urban
ms.author: eur
ms.service: azure-ai-openai
ms.topic: include
ms.date: 12/27/2024
---

| Variable name | Value |
|---------------|-------|
| `AZURE_OPENAI_ENDPOINT` | You can find this value in the **Keys and Endpoint** section when examining your resource from the Azure portal. |
| `AZURE_OPENAI_DEPLOYMENT_NAME` | This value corresponds to the custom name you chose for your deployment when you deployed a model. You can find this value under **Resource Management** > **Model Deployments** in the Azure portal. |
| `OPENAI_API_VERSION` | Learn more about [API versions](/azure/ai-services/openai/api-version-deprecation). |

Learn more about [keyless authentication](/azure/ai-services/authentication) and [setting environment variables](/azure/ai-services/cognitive-services-environment-variables).
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@

---
manager: nitinme
author: eric-urban
ms.author: eur
ms.service: azure-ai-openai
ms.topic: include
ms.date: 12/26/2024
---

To deploy the `gpt-4o-realtime-preview` model in the Azure AI Foundry portal:

1. Go to the [Azure AI Foundry portal](https://ai.azure.com) and make sure you're signed in with the Azure subscription that has your Azure OpenAI Service resource (with or without model deployments).
1. Select the **Real-time audio** playground from under **Playgrounds** in the left pane.
1. Select **Create new deployment** to open the deployment window.
1. Search for and select the `gpt-4o-realtime-preview` model, and then select **Confirm**.
1. In the deployment wizard, make sure to select the `2024-10-01` model version.
1. Follow the wizard to finish deploying the model.

Now that you have a deployment of the `gpt-4o-realtime-preview` model, you can interact with it in real time in the Azure AI Foundry portal's **Real-time audio** playground or with the Realtime API.
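
The portal steps above are the documented path. As a rough, non-authoritative alternative, a deployment like this can also be scripted with the Azure CLI. The resource group and account names below are placeholders, and parameter names can vary by CLI version, so check `az cognitiveservices account deployment create --help` first:

```bash
# Sketch only: placeholder names; verify parameters for your CLI version.
az cognitiveservices account deployment create \
  --resource-group "my-resource-group" \
  --name "my-openai-resource" \
  --deployment-name "gpt-4o-realtime-preview" \
  --model-name "gpt-4o-realtime-preview" \
  --model-version "2024-10-01" \
  --model-format "OpenAI" \
  --sku-name "GlobalStandard" \
  --sku-capacity 1
```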
Lines changed: 278 additions & 0 deletions
@@ -0,0 +1,278 @@

---
manager: nitinme
author: eric-urban
ms.author: eur
ms.service: azure-ai-openai
ms.topic: include
ms.date: 12/26/2024
---

## Prerequisites

- An Azure subscription - <a href="https://azure.microsoft.com/free/cognitive-services" target="_blank">Create one for free</a>
- <a href="https://nodejs.org/" target="_blank">Node.js LTS</a> (or another version with ESM support)
- An Azure OpenAI resource created in the East US 2 or Sweden Central regions. See [Region availability](/azure/ai-services/openai/concepts/models#model-summary-table-and-region-availability).
- A `gpt-4o-realtime-preview` model deployed with your Azure OpenAI resource. For more information, see [Create a resource and deploy a model with Azure OpenAI](../how-to/create-resource.md).

## Microsoft Entra ID prerequisites

For the recommended keyless authentication with Microsoft Entra ID, you need to:

- Install the [Azure CLI](/cli/azure/install-azure-cli) used for keyless authentication with Microsoft Entra ID.
- Assign the `Cognitive Services User` role to your user account. You can assign roles in the Azure portal under **Access control (IAM)** > **Add role assignment**, or with the Azure CLI as sketched after this list.
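
For example, a sketch of the role assignment with the Azure CLI. The assignee and scope are placeholders — substitute your user and resource details:

```bash
# Sketch with placeholder values: substitute your user and resource details.
az role assignment create \
  --role "Cognitive Services User" \
  --assignee "user@contoso.com" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.CognitiveServices/accounts/<openai-resource-name>"
```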

## Deploy a model for real-time audio

[!INCLUDE [Deploy model](realtime-deploy-model.md)]

## Set up

1. Create a new folder `realtime-audio-quickstart` to contain the application and open Visual Studio Code in that folder with the following command:

    ```shell
    mkdir realtime-audio-quickstart && code realtime-audio-quickstart
    ```

1. Create the `package.json` with the following command:

    ```shell
    npm init -y
    ```

1. Update the `package.json` to ECMAScript with the following command:

    ```shell
    npm pkg set type=module
    ```

1. Install the real-time audio client library for JavaScript with:

    ```console
    npm install https://github.com/Azure-Samples/aoai-realtime-audio-sdk/releases/download/js/v0.5.2/rt-client-0.5.2.tgz
    ```

1. For the **recommended** keyless authentication with Microsoft Entra ID, install the `@azure/identity` package with:

    ```console
    npm install @azure/identity
    ```
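
The sample code shown later in this quickstart also imports the `dotenv` package to load configuration from a `.env` file, and the API key variant imports `AzureKeyCredential` from `@azure/core-auth`. Neither is installed explicitly by the previous steps, so install them with:

```console
npm install dotenv @azure/core-auth
```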

## Retrieve resource information

#### [Microsoft Entra ID](#tab/javascript-keyless)

[!INCLUDE [keyless-environment-variables](env-var-without-key.md)]

#### [API key](#tab/javascript-key)

[!INCLUDE [key-environment-variables](env-var-key.md)]

---

> [!CAUTION]
> To use the recommended keyless authentication with the SDK, make sure that the `AZURE_OPENAI_API_KEY` environment variable isn't set.
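The code in the next section loads these values with `dotenv`, so as an alternative to system environment variables you can keep them in a `.env` file in the project folder. A minimal sketch with placeholder values — substitute your own, and don't commit real keys to source control:

```text
# Illustrative .env sketch; placeholder values only.
AZURE_OPENAI_ENDPOINT=https://your-resource-name.openai.azure.com
# Only needed for the API key tab; leave unset for keyless authentication.
AZURE_OPENAI_API_KEY=your-api-key
```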
## Text in audio out

#### [Microsoft Entra ID](#tab/javascript-keyless)

1. Create the `text-in-audio-out.js` file with the following code:

    ```javascript
    import { DefaultAzureCredential } from "@azure/identity";
    import { LowLevelRTClient } from "rt-client";
    import dotenv from "dotenv";
    dotenv.config();

    async function text_in_audio_out() {
        // Set environment variables or edit the corresponding values here.
        const endpoint = process.env["AZURE_OPENAI_ENDPOINT"] || "yourEndpoint";
        const deployment = "gpt-4o-realtime-preview";
        if (!endpoint || !deployment) {
            throw new Error("You didn't set the environment variables.");
        }

        // Authenticate with Microsoft Entra ID (keyless) via DefaultAzureCredential.
        const client = new LowLevelRTClient(new URL(endpoint), new DefaultAzureCredential(), { deployment: deployment });
        try {
            // Request a response with both audio and text modalities.
            await client.send({
                type: "response.create",
                response: {
                    modalities: ["audio", "text"],
                    instructions: "Please assist the user."
                }
            });

            // Stream server events until the response completes or errors.
            for await (const message of client.messages()) {
                switch (message.type) {
                    case "response.done": {
                        break;
                    }
                    case "error": {
                        console.error(message.error);
                        break;
                    }
                    case "response.audio_transcript.delta": {
                        console.log(`Received text delta: ${message.delta}`);
                        break;
                    }
                    case "response.audio.delta": {
                        // Audio arrives as base64-encoded chunks.
                        const buffer = Buffer.from(message.delta, "base64");
                        console.log(`Received ${buffer.length} bytes of audio data.`);
                        break;
                    }
                }
                if (message.type === "response.done" || message.type === "error") {
                    break;
                }
            }
        } finally {
            client.close();
        }
    }

    await text_in_audio_out();
    ```

1. Sign in to Azure with the following command:

    ```shell
    az login
    ```

1. Run the JavaScript file.

    ```shell
    node text-in-audio-out.js
    ```

#### [API key](#tab/javascript-key)

1. Create the `text-in-audio-out.js` file with the following code:

    ```javascript
    import { AzureKeyCredential } from "@azure/core-auth";
    import { LowLevelRTClient } from "rt-client";
    import dotenv from "dotenv";
    dotenv.config();

    async function text_in_audio_out() {
        // Set environment variables or edit the corresponding values here.
        const apiKey = process.env["AZURE_OPENAI_API_KEY"] || "yourKey";
        const endpoint = process.env["AZURE_OPENAI_ENDPOINT"] || "yourEndpoint";
        const deployment = "gpt-4o-realtime-preview";
        if (!endpoint || !deployment) {
            throw new Error("You didn't set the environment variables.");
        }

        // Authenticate with the Azure OpenAI resource's API key.
        const client = new LowLevelRTClient(new URL(endpoint), new AzureKeyCredential(apiKey), { deployment: deployment });
        try {
            // Request a response with both audio and text modalities.
            await client.send({
                type: "response.create",
                response: {
                    modalities: ["audio", "text"],
                    instructions: "Please assist the user."
                }
            });

            // Stream server events until the response completes or errors.
            for await (const message of client.messages()) {
                switch (message.type) {
                    case "response.done": {
                        break;
                    }
                    case "error": {
                        console.error(message.error);
                        break;
                    }
                    case "response.audio_transcript.delta": {
                        console.log(`Received text delta: ${message.delta}`);
                        break;
                    }
                    case "response.audio.delta": {
                        // Audio arrives as base64-encoded chunks.
                        const buffer = Buffer.from(message.delta, "base64");
                        console.log(`Received ${buffer.length} bytes of audio data.`);
                        break;
                    }
                }
                if (message.type === "response.done" || message.type === "error") {
                    break;
                }
            }
        } finally {
            client.close();
        }
    }

    await text_in_audio_out();
    ```

1. Run the JavaScript file.

    ```shell
    node text-in-audio-out.js
    ```

---

Wait a few moments to get the response.

## Output

The script gets a response from the model and prints the transcript and audio data received.

The output looks similar to the following:

```console
Received text delta: Hello
Received text delta: !
Received text delta: How
Received text delta: can
Received text delta: I
Received 4800 bytes of audio data.
Received 7200 bytes of audio data.
Received text delta: help
Received 12000 bytes of audio data.
Received text delta: you
Received text delta: today
Received text delta: ?
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 24000 bytes of audio data.
```
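As a rough sanity check on those numbers — assuming the Realtime API's default audio output of 16-bit mono PCM at 24 kHz (`pcm16`), which this quickstart doesn't state, so verify your session's actual format — each byte count maps to a playback duration:

```javascript
// Assumes pcm16 output: 16-bit (2-byte) mono samples at 24,000 Hz.
const BYTES_PER_SECOND = 24000 * 2;

function chunkDurationMs(byteLength) {
    return (byteLength / BYTES_PER_SECOND) * 1000;
}

console.log(chunkDurationMs(4800));  // 100 ms of audio
console.log(chunkDurationMs(24000)); // 500 ms of audio
```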
## Web application sample

Our JavaScript web sample [on GitHub](https://github.com/azure-samples/aoai-realtime-audio-sdk) demonstrates how to use the GPT-4o Realtime API to interact with the model in real time. The sample code includes a simple web interface that captures audio from the user's microphone and sends it to the model for processing. The model responds with text and audio, which the sample code renders in the web interface.

You can run the sample code locally on your machine by following these steps. Refer to the [repository on GitHub](https://github.com/azure-samples/aoai-realtime-audio-sdk) for the most up-to-date instructions.

1. If you don't have Node.js installed, download and install the [LTS version of Node.js](https://nodejs.org/).

1. Clone the repository to your local machine:

    ```bash
    git clone https://github.com/Azure-Samples/aoai-realtime-audio-sdk.git
    ```

1. Go to the `javascript/samples` folder in your preferred code editor.

    ```bash
    cd ./javascript/samples
    ```

1. Run `download-pkg.ps1` or `download-pkg.sh` to download the required packages.

1. Go to the `web` folder from the `./javascript/samples` folder.

    ```bash
    cd ./web
    ```

1. Run `npm install` to install package dependencies.

1. Run `npm run dev` to start the web server, accepting any firewall permission prompts as needed.

1. Go to any of the provided URIs from the console output (such as `http://localhost:5173/`) in a browser.

1. Enter the following information in the web interface:

    - **Endpoint**: The resource endpoint of an Azure OpenAI resource. You don't need to append the `/realtime` path. An example structure might be `https://my-azure-openai-resource-from-portal.openai.azure.com`.
    - **API Key**: A corresponding API key for the Azure OpenAI resource.
    - **Deployment**: The name of the `gpt-4o-realtime-preview` model that [you deployed in the previous section](#deploy-a-model-for-real-time-audio).
    - **System Message**: Optionally, you can provide a system message such as "You always talk like a friendly pirate."
    - **Temperature**: Optionally, you can provide a custom temperature.
    - **Voice**: Optionally, you can select a voice.

1. Select the **Record** button to start the session. Accept permissions to use your microphone if prompted.

1. You should see a `<< Session Started >>` message in the main output. Then you can speak into the microphone to start a chat.

1. You can interrupt the chat at any time by speaking. You can end the chat by selecting the **Stop** button.
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@

---
manager: nitinme
author: eric-urban
ms.author: eur
ms.service: azure-ai-openai
ms.topic: include
ms.date: 12/26/2024
---

## Deploy a model for real-time audio

[!INCLUDE [Deploy model](realtime-deploy-model.md)]

## Use the GPT-4o real-time audio playground

To chat with your deployed `gpt-4o-realtime-preview` model in the [Azure AI Foundry](https://ai.azure.com) **Real-time audio** playground, follow these steps:

1. Go to the [Azure OpenAI Service page](https://ai.azure.com/resource/overview) in the Azure AI Foundry portal. Make sure you're signed in with the Azure subscription that has your Azure OpenAI Service resource and the deployed `gpt-4o-realtime-preview` model.
1. Select the **Real-time audio** playground from under **Playgrounds** in the left pane.
1. Select your deployed `gpt-4o-realtime-preview` model from the **Deployment** dropdown.
1. Select **Enable microphone** to allow the browser to access your microphone. If you already granted permission, you can skip this step.

    :::image type="content" source="../media/how-to/real-time/real-time-playground.png" alt-text="Screenshot of the real-time audio playground with the deployed model selected." lightbox="../media/how-to/real-time/real-time-playground.png":::

1. Optionally, you can edit the contents of the **Give the model instructions and context** text box. Give the model instructions about how it should behave and any context it should reference when generating a response. You can describe the assistant's personality, tell it what it should and shouldn't answer, and tell it how to format responses.
1. Optionally, change settings such as threshold, prefix padding, and silence duration (see the sketch after these steps).
1. Select **Start listening** to start the session. You can speak into the microphone to start a chat.

    :::image type="content" source="../media/how-to/real-time/real-time-playground-start-listening.png" alt-text="Screenshot of the real-time audio playground with the start listening button and microphone access enabled." lightbox="../media/how-to/real-time/real-time-playground-start-listening.png":::

1. You can interrupt the chat at any time by speaking. You can end the chat by selecting the **Stop listening** button.
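
The threshold, prefix padding, and silence duration settings in the playground appear to correspond to the Realtime API's server-side voice activity detection (VAD) options. As a non-authoritative sketch of setting them from code — field names follow the Realtime API's `server_vad` turn detection, the numeric values are illustrative rather than recommended, and `client` is assumed to be an existing Realtime client such as `LowLevelRTClient`:

```javascript
// Sketch: configure server VAD turn detection on an existing Realtime session.
await client.send({
    type: "session.update",
    session: {
        turn_detection: {
            type: "server_vad",
            threshold: 0.5,           // speech-detection sensitivity (0 to 1)
            prefix_padding_ms: 300,   // audio retained before detected speech
            silence_duration_ms: 500  // trailing silence that ends a turn
        }
    }
});
```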
