
Commit 2c4f33c

Fixed voice live API naming consistency
1 parent ea9a48f commit 2c4f33c

3 files changed: +55 −55 lines changed

articles/ai-services/speech-service/voice-live-how-to.md

Lines changed: 16 additions & 16 deletions
@@ -1,43 +1,43 @@
 ---
-title: How to use the Voice Live API (Preview)
+title: How to use the voice live API (Preview)
 titleSuffix: Azure AI services
-description: Learn how to use the Voice Live API for real-time voice agents.
+description: Learn how to use the voice live API for real-time voice agents.
 manager: nitinme
 author: eric-urban
 ms.author: eur
 ms.service: azure-ai-speech
 ms.topic: how-to
 ms.date: 7/1/2025
 ms.custom: references_regions
-# Customer intent: As a developer, I want to learn how to use the Voice Live API for real-time voice agents.
+# Customer intent: As a developer, I want to learn how to use the voice live API for real-time voice agents.
 ---
 
-# How to use the Voice Live API (Preview)
+# How to use the voice live API (Preview)
 
 [!INCLUDE [Feature preview](./includes/previews/preview-generic.md)]
 
-The Voice Live API provides a capable WebSocket interface compared to the [Azure OpenAI Realtime API](../../ai-foundry/openai/how-to/realtime-audio.md).
+The voice live API provides a capable WebSocket interface compared to the [Azure OpenAI Realtime API](../../ai-foundry/openai/how-to/realtime-audio.md).
 
-Unless otherwise noted, the Voice Live API uses the same events as the [Azure OpenAI Realtime API](/azure/ai-services/openai/realtime-audio-reference?context=/azure/ai-services/speech-service/context/context). This document provides a reference for the event message properties that are specific to the Voice Live API.
+Unless otherwise noted, the voice live API uses the same events as the [Azure OpenAI Realtime API](/azure/ai-services/openai/realtime-audio-reference?context=/azure/ai-services/speech-service/context/context). This document provides a reference for the event message properties that are specific to the voice live API.
 
 ## Supported models and regions
 
-For a table of supported models and regions, see the [Voice Live API overview](./voice-live.md#supported-models-and-regions).
+For a table of supported models and regions, see the [voice live API overview](./voice-live.md#supported-models-and-regions).
 
 ## Authentication
 
-An [Azure AI Foundry resource](../multi-service-resource.md) is required to access the Voice Live API.
+An [Azure AI Foundry resource](../multi-service-resource.md) is required to access the voice live API.
 
 ### WebSocket endpoint
 
-The WebSocket endpoint for the Voice Live API is `wss://<your-ai-foundry-resource-name>.cognitiveservices.azure.com/voice-live/realtime?api-version=2025-05-01-preview`.
+The WebSocket endpoint for the voice live API is `wss://<your-ai-foundry-resource-name>.cognitiveservices.azure.com/voice-live/realtime?api-version=2025-05-01-preview`.
 The endpoint is the same for all models. The only difference is the required `model` query parameter.
 
 For example, an endpoint for a resource with a custom domain would be `wss://<your-ai-foundry-resource-name>.cognitiveservices.azure.com/voice-live/realtime?api-version=2025-05-01-preview&model=gpt-4o-mini-realtime-preview`
 
 ### Credentials
 
-The Voice Live API supports two authentication methods:
+The voice live API supports two authentication methods:
 
 - **Microsoft Entra** (recommended): Use token-based authentication for an Azure AI Foundry resource. Apply a retrieved authentication token using a `Bearer` token with the `Authorization` header.
 - **API key**: An `api-key` can be provided in one of two ways:
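As an editorial sketch of the endpoint and authentication rules described in this diff (the helper names `build_voice_live_url` and `auth_headers` are hypothetical, not part of any SDK), the URL construction and the two documented header shapes might look like:

```python
from urllib.parse import urlencode


def build_voice_live_url(resource: str, model: str,
                         api_version: str = "2025-05-01-preview") -> str:
    """Build the voice live realtime WebSocket URL for a resource and model."""
    # The endpoint is the same for all models; only the model query parameter varies.
    query = urlencode({"api-version": api_version, "model": model})
    return (f"wss://{resource}.cognitiveservices.azure.com"
            f"/voice-live/realtime?{query}")


def auth_headers(token: str = "", api_key: str = "") -> dict:
    """Headers for either Microsoft Entra (Bearer token) or API key auth."""
    if token:
        # Recommended: Microsoft Entra token in the Authorization header.
        return {"Authorization": f"Bearer {token}"}
    # Alternative: api-key header.
    return {"api-key": api_key}


url = build_voice_live_url("my-foundry-resource", "gpt-4o-mini-realtime-preview")
print(url)
```

This reproduces the custom-domain example URL from the article when given the same resource name and model.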
@@ -52,7 +52,7 @@ For the recommended keyless authentication with Microsoft Entra ID, you need to:
 
 ## Session configuration
 
-Often, the first event sent by the caller on a newly established Voice Live API session is the [`session.update`](../openai/realtime-audio-reference.md?context=/azure/ai-services/speech-service/context/context#realtimeclienteventsessionupdate) event. This event controls a wide set of input and output behavior, with output and response generation properties then later overridable using the [`response.create`](../openai/realtime-audio-reference.md?context=/azure/ai-services/speech-service/context/context#realtimeclienteventresponsecreate) event.
+Often, the first event sent by the caller on a newly established voice live API session is the [`session.update`](../openai/realtime-audio-reference.md?context=/azure/ai-services/speech-service/context/context#realtimeclienteventsessionupdate) event. This event controls a wide set of input and output behavior, with output and response generation properties then later overridable using the [`response.create`](../openai/realtime-audio-reference.md?context=/azure/ai-services/speech-service/context/context#realtimeclienteventresponsecreate) event.
 
 Here's an example `session.update` message that configures several aspects of the session, including turn detection, input audio processing, and voice output. Most session parameters are optional and can be omitted if not needed.
 
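The article's own example message falls outside this diff hunk. As an editorial sketch (not the elided example), a minimal `session.update` payload built only from turn detection properties documented in this article might look like:

```python
import json

# Editorial sketch, not the article's example. Property names come from the
# turn detection table in this article; the numeric values are illustrative.
session_update = {
    "type": "session.update",
    "session": {
        "turn_detection": {
            "type": "azure_semantic_vad",   # semantic VAD instead of server_vad
            "threshold": 0.3,               # higher = more confidence required
            "prefix_padding_ms": 200,       # audio kept before detected speech start
            "silence_duration_ms": 200,     # silence that marks the end of speech
        }
    },
}
print(json.dumps(session_update))
```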
@@ -88,7 +88,7 @@ The server responds with a [`session.updated`](../openai/realtime-audio-referenc
 The following sections describe the properties of the `session` object that can be configured in the `session.update` message.
 
 > [!TIP]
-> For comprehensive descriptions of supported events and properties, see the [Azure OpenAI Realtime API events reference documentation](../openai/realtime-audio-reference.md?context=/azure/ai-services/speech-service/context/context). This document provides a reference for the event message properties that are enhancements via the Voice Live API.
+> For comprehensive descriptions of supported events and properties, see the [Azure OpenAI Realtime API events reference documentation](../openai/realtime-audio-reference.md?context=/azure/ai-services/speech-service/context/context). This document provides a reference for the event message properties that are enhancements via the voice live API.
 
 ### Input audio properties
 
@@ -134,19 +134,19 @@ Server echo cancellation enhances the input audio quality by removing the echo f
 
 ## Conversational enhancements
 
-The Voice Live API offers conversational enhancements to provide robustness to the natural end-user conversation flow.
+The voice live API offers conversational enhancements to provide robustness to the natural end-user conversation flow.
 
 ### Turn Detection Parameters
 
-Turn detection is the process of detecting when the end-user started or stopped speaking. The Voice Live API builds on the Azure OpenAI Realtime API `turn_detection` property to configure turn detection. The `azure_semantic_vad` type is one differentiator between the Voice Live API and the Azure OpenAI Realtime API.
+Turn detection is the process of detecting when the end-user started or stopped speaking. The voice live API builds on the Azure OpenAI Realtime API `turn_detection` property to configure turn detection. The `azure_semantic_vad` type is one differentiator between the voice live API and the Azure OpenAI Realtime API.
 
 | Property | Type | Required or optional | Description |
 |----------|----------|----------|------------|
 | `type` | string | Optional | The type of turn detection system to use. Type `server_vad` detects start and end of speech based on audio volume.<br/><br/>Type `azure_semantic_vad` detects start and end of speech based on semantic meaning. Azure semantic voice activity detection (VAD) improves turn detection by removing filler words to reduce the false alarm rate. The current list of filler words are `['ah', 'umm', 'mm', 'uh', 'huh', 'oh', 'yeah', 'hmm']`. The service ignores these words when there's an ongoing response. Remove feature words feature assumes the client plays response audio as soon as it receives them.<br/><br/>The default value is `server_vad`. |
 | `threshold` | number | Optional | A higher threshold requires a higher confidence signal of the user trying to speak. |
 | `prefix_padding_ms` | integer | Optional | The amount of audio, measured in milliseconds, to include before the start of speech detection signal. |
 | `silence_duration_ms` | integer | Optional | The duration of user's silence, measured in milliseconds, to detect the end of speech. |
-| `end_of_utterance_detection` | object | Optional | Configuration for end of utterance detection. The Voice Live API offers advanced end-of-turn detection to indicate when the end-user stopped speaking while allowing for natural pauses. End of utterance detection can significantly reduce premature end-of-turn signals without adding user-perceivable latency. End of utterance detection is only available when using `azure_semantic_vad`.<br/><br/>Properties of `end_of_utterance_detection` include:<br/>- `model`: The model to use for end of utterance detection. The supported value is `semantic_detection_v1`.<br/>- `threshold`: Threshold to determine the end of utterance (0.0 to 1.0). The default value is 0.01.<br/>- `timeout`: Timeout in seconds. The default value is 2 seconds.|
+| `end_of_utterance_detection` | object | Optional | Configuration for end of utterance detection. The voice live API offers advanced end-of-turn detection to indicate when the end-user stopped speaking while allowing for natural pauses. End of utterance detection can significantly reduce premature end-of-turn signals without adding user-perceivable latency. End of utterance detection is only available when using `azure_semantic_vad`.<br/><br/>Properties of `end_of_utterance_detection` include:<br/>- `model`: The model to use for end of utterance detection. The supported value is `semantic_detection_v1`.<br/>- `threshold`: Threshold to determine the end of utterance (0.0 to 1.0). The default value is 0.01.<br/>- `timeout`: Timeout in seconds. The default value is 2 seconds.|
 
 Here's an example of end of utterance detection in a session object:
 
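The session object example itself is elided from this hunk. As a hedged editorial sketch using only the defaults stated in the table above (`semantic_detection_v1`, threshold 0.01, timeout 2 seconds), such a `turn_detection` object might contain:

```python
import json

# Editorial sketch built from the documented defaults; end_of_utterance_detection
# is only available with the azure_semantic_vad turn detection type.
turn_detection = {
    "type": "azure_semantic_vad",
    "end_of_utterance_detection": {
        "model": "semantic_detection_v1",  # the only supported value
        "threshold": 0.01,                 # default, range 0.0 to 1.0
        "timeout": 2,                      # default, in seconds
    },
}
print(json.dumps(turn_detection))
```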
@@ -445,5 +445,5 @@ Then you can connect the avatar with the server SDP.
 
 ## Related content
 
-- Try out the [Voice Live API quickstart](./voice-live-quickstart.md)
+- Try out the [voice live API quickstart](./voice-live-quickstart.md)
 - See the [audio events reference](/azure/ai-services/openai/realtime-audio-reference?context=/azure/ai-services/speech-service/context/context)

articles/ai-services/speech-service/voice-live-quickstart.md

Lines changed: 4 additions & 4 deletions
@@ -1,7 +1,7 @@
 ---
-title: 'How to use Voice Live API for real-time voice agents with Azure AI Speech'
+title: 'How to use voice live API for real-time voice agents with Azure AI Speech'
 titleSuffix: Azure AI services
-description: Learn how to use Voice Live API for real-time voice agents with Azure AI Speech.
+description: Learn how to use voice live API for real-time voice agents with Azure AI Speech.
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: how-to
@@ -13,7 +13,7 @@ ms.custom: build-2025
 recommendations: false
 ---
 
-# Quickstart: Voice Live API for real-time voice agents (Preview)
+# Quickstart: Voice live API for real-time voice agents (Preview)
 
 [!INCLUDE [Feature preview](./includes/previews/preview-generic.md)]

@@ -27,5 +27,5 @@ recommendations: false
 
 ## Related content
 
-- Learn more about [How to use the Voice Live API](./voice-live-how-to.md)
+- Learn more about [How to use the voice live API](./voice-live-how-to.md)
 - See the [audio events reference](/azure/ai-services/openai/realtime-audio-reference?context=/azure/ai-services/speech-service/context/context)
