---
manager: nitinme
author: eric-urban
ms.author: eur
ms.service: azure-ai-openai
ms.topic: include
ms.date: 12/26/2024
---

## Prerequisites

- An Azure subscription - <a href="https://azure.microsoft.com/free/cognitive-services" target="_blank">Create one for free</a>
- <a href="https://nodejs.org/" target="_blank">Node.js LTS</a> with ECMAScript module (ESM) support.
- An Azure OpenAI resource created in the East US 2 or Sweden Central regions. See [Region availability](/azure/ai-services/openai/concepts/models#model-summary-table-and-region-availability).
- A `gpt-4o-realtime-preview` model deployment on your Azure OpenAI resource. For more information, see [Create a resource and deploy a model with Azure OpenAI](../how-to/create-resource.md).

## Microsoft Entra ID prerequisites

For the recommended keyless authentication with Microsoft Entra ID, you need to:
- Install the [Azure CLI](/cli/azure/install-azure-cli) used for keyless authentication with Microsoft Entra ID.
- Assign the `Cognitive Services User` role to your user account. You can assign roles in the Azure portal under **Access control (IAM)** > **Add role assignment**, or with the Azure CLI as shown after this list.

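For example, the following Azure CLI command assigns the role to your user account. The principal name and scope values are placeholders for your own account, subscription, resource group, and Azure OpenAI resource name:

```shell
az role assignment create --role "Cognitive Services User" --assignee "<your-user-principal-name>" --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group-name>/providers/Microsoft.CognitiveServices/accounts/<azure-openai-resource-name>"
```
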
## Deploy a model for real-time audio

[!INCLUDE [Deploy model](realtime-deploy-model.md)]

## Set up

1. Create a new folder `realtime-audio-quickstart` to contain the application and open Visual Studio Code in that folder with the following command:

    ```shell
    mkdir realtime-audio-quickstart && code realtime-audio-quickstart
    ```

1. Create the `package.json` with the following command:

    ```shell
    npm init -y
    ```

1. Update the `package.json` to use ECMAScript modules with the following command:

    ```shell
    npm pkg set type=module
    ```

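    This command adds `"type": "module"` to `package.json`, which tells Node.js to treat the project's `.js` files as ECMAScript modules (required for the `import` statements used later in this quickstart). The added entry looks like this:

    ```json
    {
      "type": "module"
    }
    ```
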
1. Install the real-time audio client library for JavaScript with:

    ```console
    npm install https://github.com/Azure-Samples/aoai-realtime-audio-sdk/releases/download/js/v0.5.2/rt-client-0.5.2.tgz
    ```

1. For the **recommended** keyless authentication with Microsoft Entra ID, install the `@azure/identity` package with:

    ```console
    npm install @azure/identity
    ```

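1. The quickstart code loads its settings from a `.env` file by using the `dotenv` package, so install it as well:

    ```console
    npm install dotenv
    ```
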
## Retrieve resource information

#### [Microsoft Entra ID](#tab/javascript-keyless)

[!INCLUDE [keyless-environment-variables](env-var-without-key.md)]

#### [API key](#tab/javascript-key)

[!INCLUDE [key-environment-variables](env-var-key.md)]

---

> [!CAUTION]
> To use the recommended keyless authentication with the SDK, make sure that the `AZURE_OPENAI_API_KEY` environment variable isn't set.

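Because the quickstart code calls `dotenv.config()`, you can optionally keep these values in a `.env` file in the project folder instead of setting environment variables in your shell. For example, with a placeholder endpoint value (add `AZURE_OPENAI_API_KEY` only if you use API key authentication):

```text
AZURE_OPENAI_ENDPOINT=https://your-resource-name.openai.azure.com
```
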
## Text in audio out

#### [Microsoft Entra ID](#tab/javascript-keyless)

1. Create the `text-in-audio-out.js` file with the following code:

    ```javascript
    import { DefaultAzureCredential } from "@azure/identity";
    import { LowLevelRTClient } from "rt-client";
    import dotenv from "dotenv";
    dotenv.config();

    async function text_in_audio_out() {
        // Set environment variables or edit the corresponding values here.
        const endpoint = process.env["AZURE_OPENAI_ENDPOINT"] || "yourEndpoint";
        const deployment = "gpt-4o-realtime-preview";
        if (!endpoint || !deployment) {
            throw new Error("You didn't set the environment variables.");
        }
        const client = new LowLevelRTClient(new URL(endpoint), new DefaultAzureCredential(), { deployment: deployment });
        try {
            // Ask the model for a response that includes both audio and a text transcript.
            await client.send({
                type: "response.create",
                response: {
                    modalities: ["audio", "text"],
                    instructions: "Please assist the user."
                }
            });
            // Stream server events until the response finishes or an error occurs.
            for await (const message of client.messages()) {
                switch (message.type) {
                    case "response.done": {
                        break;
                    }
                    case "error": {
                        console.error(message.error);
                        break;
                    }
                    case "response.audio_transcript.delta": {
                        console.log(`Received text delta: ${message.delta}`);
                        break;
                    }
                    case "response.audio.delta": {
                        const buffer = Buffer.from(message.delta, "base64");
                        console.log(`Received ${buffer.length} bytes of audio data.`);
                        break;
                    }
                }
                if (message.type === "response.done" || message.type === "error") {
                    break;
                }
            }
        }
        finally {
            client.close();
        }
    }

    await text_in_audio_out();
    ```

1. Sign in to Azure with the following command:

    ```shell
    az login
    ```

1. Run the JavaScript file.

    ```shell
    node text-in-audio-out.js
    ```

#### [API key](#tab/javascript-key)

1. Create the `text-in-audio-out.js` file with the following code:

    ```javascript
    import { AzureKeyCredential } from "@azure/core-auth";
    import { LowLevelRTClient } from "rt-client";
    import dotenv from "dotenv";
    dotenv.config();

    async function text_in_audio_out() {
        // Set environment variables or edit the corresponding values here.
        const apiKey = process.env["AZURE_OPENAI_API_KEY"] || "yourKey";
        const endpoint = process.env["AZURE_OPENAI_ENDPOINT"] || "yourEndpoint";
        const deployment = "gpt-4o-realtime-preview";
        if (!endpoint || !deployment) {
            throw new Error("You didn't set the environment variables.");
        }
        const client = new LowLevelRTClient(new URL(endpoint), new AzureKeyCredential(apiKey), { deployment: deployment });
        try {
            // Ask the model for a response that includes both audio and a text transcript.
            await client.send({
                type: "response.create",
                response: {
                    modalities: ["audio", "text"],
                    instructions: "Please assist the user."
                }
            });
            // Stream server events until the response finishes or an error occurs.
            for await (const message of client.messages()) {
                switch (message.type) {
                    case "response.done": {
                        break;
                    }
                    case "error": {
                        console.error(message.error);
                        break;
                    }
                    case "response.audio_transcript.delta": {
                        console.log(`Received text delta: ${message.delta}`);
                        break;
                    }
                    case "response.audio.delta": {
                        const buffer = Buffer.from(message.delta, "base64");
                        console.log(`Received ${buffer.length} bytes of audio data.`);
                        break;
                    }
                }
                if (message.type === "response.done" || message.type === "error") {
                    break;
                }
            }
        }
        finally {
            client.close();
        }
    }

    await text_in_audio_out();
    ```

1. Run the JavaScript file.

    ```shell
    node text-in-audio-out.js
    ```

---

Wait a few moments to get the response.

## Output

The script gets a response from the model and prints the transcript and audio data received.

The output will look similar to the following:

```console
Received text delta: Hello
Received text delta: !
Received text delta: How
Received text delta: can
Received text delta: I
Received 4800 bytes of audio data.
Received 7200 bytes of audio data.
Received text delta: help
Received 12000 bytes of audio data.
Received text delta: you
Received text delta: today
Received text delta: ?
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 12000 bytes of audio data.
Received 24000 bytes of audio data.
```

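The audio deltas are base64-encoded PCM chunks. If you'd like to play back the audio rather than only counting bytes, one option is to collect the decoded chunks and wrap them in a WAV header after the response completes. The following is a minimal sketch that assumes the default `pcm16` output format (24-kHz, 16-bit, mono); the `writeWavFile` helper is illustrative and isn't part of the quickstart code:

```javascript
import { writeFileSync } from "node:fs";

// Illustrative helper: wrap raw 16-bit mono PCM in a minimal WAV header so the
// collected audio deltas can be played with a standard media player.
function writeWavFile(path, pcmBuffer, sampleRate = 24000) {
    const header = Buffer.alloc(44);
    header.write("RIFF", 0);
    header.writeUInt32LE(36 + pcmBuffer.length, 4);
    header.write("WAVE", 8);
    header.write("fmt ", 12);
    header.writeUInt32LE(16, 16);             // fmt chunk size
    header.writeUInt16LE(1, 20);              // audio format: PCM
    header.writeUInt16LE(1, 22);              // channels: mono
    header.writeUInt32LE(sampleRate, 24);     // sample rate
    header.writeUInt32LE(sampleRate * 2, 28); // byte rate for 16-bit mono
    header.writeUInt16LE(2, 32);              // block align
    header.writeUInt16LE(16, 34);             // bits per sample
    header.write("data", 36);
    header.writeUInt32LE(pcmBuffer.length, 40);
    writeFileSync(path, Buffer.concat([header, pcmBuffer]));
}

// Usage idea: push Buffer.from(message.delta, "base64") into an array in the
// "response.audio.delta" case, then call this once after "response.done":
// writeWavFile("response.wav", Buffer.concat(audioChunks));
```
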
## Web application sample

Our JavaScript web sample [on GitHub](https://github.com/azure-samples/aoai-realtime-audio-sdk) demonstrates how to use the GPT-4o Realtime API to interact with the model in real time. The sample code includes a simple web interface that captures audio from the user's microphone and sends it to the model for processing. The model responds with text and audio, which the sample code renders in the web interface.

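To give a rough idea of what the microphone capture side involves, here's a minimal browser-side sketch (not the sample's actual implementation) that uses the Web Audio API to deliver 16-bit PCM chunks to a callback, which an app could then base64-encode and stream to the Realtime API:

```javascript
// Minimal sketch, not the sample's code: capture microphone audio and pass
// 16-bit PCM chunks to a callback supplied by the caller.
async function captureMicrophone(onPcmChunk) {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const audioContext = new AudioContext({ sampleRate: 24000 });
    const source = audioContext.createMediaStreamSource(stream);
    // ScriptProcessorNode keeps the sketch short; production code would
    // typically use an AudioWorklet instead.
    const processor = audioContext.createScriptProcessor(4096, 1, 1);
    processor.onaudioprocess = (event) => {
        const float32 = event.inputBuffer.getChannelData(0);
        const pcm16 = new Int16Array(float32.length);
        for (let i = 0; i < float32.length; i++) {
            const s = Math.max(-1, Math.min(1, float32[i]));
            pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
        }
        onPcmChunk(pcm16.buffer);
    };
    source.connect(processor);
    processor.connect(audioContext.destination);
}
```
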
You can run the sample code locally on your machine by following these steps. Refer to the [repository on GitHub](https://github.com/azure-samples/aoai-realtime-audio-sdk) for the most up-to-date instructions.

1. If you don't have Node.js installed, download and install the [LTS version of Node.js](https://nodejs.org/).

1. Clone the repository to your local machine:

    ```bash
    git clone https://github.com/Azure-Samples/aoai-realtime-audio-sdk.git
    ```

1. Go to the `javascript/samples` folder in your preferred code editor.

    ```bash
    cd ./javascript/samples
    ```

1. Run `download-pkg.ps1` or `download-pkg.sh` to download the required packages.

1. Go to the `web` folder from the `./javascript/samples` folder.

    ```bash
    cd ./web
    ```

1. Run `npm install` to install package dependencies.

1. Run `npm run dev` to start the web server, accepting any firewall permission prompts as needed.
1. Go to any of the provided URIs from the console output (such as `http://localhost:5173/`) in a browser.
1. Enter the following information in the web interface:
    - **Endpoint**: The resource endpoint of an Azure OpenAI resource. You don't need to append the `/realtime` path. An example structure might be `https://my-azure-openai-resource-from-portal.openai.azure.com`.
    - **API Key**: A corresponding API key for the Azure OpenAI resource.
    - **Deployment**: The name of the `gpt-4o-realtime-preview` model that [you deployed in the previous section](#deploy-a-model-for-real-time-audio).
    - **System Message**: Optionally, you can provide a system message such as "You always talk like a friendly pirate."
    - **Temperature**: Optionally, you can provide a custom temperature.
    - **Voice**: Optionally, you can select a voice.
1. Select the **Record** button to start the session. Accept permissions to use your microphone if prompted.
1. You should see a `<< Session Started >>` message in the main output. Then you can speak into the microphone to start a chat.
1. You can interrupt the chat at any time by speaking. You can end the chat by selecting the **Stop** button.