Commit bcef902

Merge pull request #5567 from MicrosoftDocs/main
6/17/2025 AM Publish
2 parents f577e77 + 3f345e5

File tree

4 files changed (+22 -17 lines)

articles/ai-services/openai/how-to/predicted-outputs.md

Lines changed: 4 additions & 2 deletions

@@ -6,7 +6,7 @@ services: cognitive-services
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: how-to
-ms.date: 04/14/2025
+ms.date: 06/17/2025
 author: mrbullwinkle
 ms.author: mbullwin
 recommendations: false
@@ -22,10 +22,12 @@ Predicted outputs can improve model response latency for chat completions calls
 - `gpt-4o` version: `2024-08-06`
 - `gpt-4o` version: `2024-11-20`
 - `gpt-4.1` version: `2025-04-14`
+- `gpt-4.1-nano` version: `2025-04-14`
+- `gpt-4.1-mini` version: `2025-04-14`
 
 ## API support
 
-- `2025-01-01-preview`
+First introduced in `2025-01-01-preview`. Supported in all subsequent releases.
 
 ## Unsupported features
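For context, the article being updated documents the `prediction` request parameter. Here's a minimal sketch of a predicted outputs call with the `openai` Python SDK; the endpoint, deployment name, and prompt are assumptions for illustration, not part of this commit:

```python
# Minimal sketch of predicted outputs against Azure OpenAI, assuming a chat
# deployment named "gpt-4.1" and keyless Microsoft Entra ID authentication.
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE-NAME.openai.azure.com/",  # hypothetical
    azure_ad_token_provider=token_provider,
    api_version="2025-01-01-preview",  # first API version with predicted outputs support
)

code = """def sum_numbers(a, b):
    return a + b
"""

completion = client.chat.completions.create(
    model="gpt-4.1",  # deployment name (assumption)
    messages=[
        {
            "role": "user",
            "content": "Rename the function to add_numbers and return only the code.",
        },
        {"role": "user", "content": code},
    ],
    # The expected output: tokens that match the prediction can be reused,
    # which is what reduces response latency.
    prediction={"type": "content", "content": code},
)

print(completion.choices[0].message.content)
```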

articles/ai-services/speech-service/includes/quickstarts/voice-live-api/realtime-python.md

Lines changed: 2 additions & 2 deletions

@@ -159,8 +159,8 @@ For the recommended keyless authentication with Microsoft Entra ID, you need to:
             "remove_filler_words": False,
             "end_of_utterance_detection": {
                 "model": "semantic_detection_v1",
-                "threshold": 0.1,
-                "timeout": 4,
+                "threshold": 0.01,
+                "timeout": 2,
             },
         },
         "input_audio_noise_reduction": {

articles/ai-services/speech-service/voice-live-how-to.md

Lines changed: 13 additions & 13 deletions

@@ -16,7 +16,7 @@ ms.custom: references_regions
 
 [!INCLUDE [Feature preview](./includes/previews/preview-generic.md)]
 
-The Voice Live API provides a capable WebSocket interface compared to the [Azure OpenAI Realtime API](../openai/how-to/realtime-audio.md).
+The Voice Live API provides a capable WebSocket interface compared to the [Azure OpenAI Realtime API](../openai/how-to/realtime-audio.md).
 
 Unless otherwise noted, the Voice Live API uses the same events as the [Azure OpenAI Realtime API](/azure/ai-services/openai/realtime-audio-reference?context=/azure/ai-services/speech-service/context/context). This document provides a reference for the event message properties that are specific to the Voice Live API.
 
@@ -26,7 +26,7 @@ For a table of supported models and regions, see the [Voice Live API overview](.
 
 ## Authentication
 
-An [Azure AI Foundry resource](../multi-service-resource.md) is required to access the Voice Live API.
+An [Azure AI Foundry resource](../multi-service-resource.md) is required to access the Voice Live API.
 
 ### WebSocket endpoint
 
@@ -66,8 +66,8 @@ Here's an example `session.update` message that configures several aspects of th
             "remove_filler_words": false,
             "end_of_utterance_detection": {
                 "model": "semantic_detection_v1",
-                "threshold": 0.1,
-                "timeout": 4,
+                "threshold": 0.01,
+                "timeout": 2,
             },
         },
         "input_audio_noise_reduction": {"type": "azure_deep_noise_suppression"},
@@ -84,10 +84,10 @@ The server responds with a [`session.updated`](../openai/realtime-audio-referenc
 
 ## Session Properties
 
-The following sections describe the properties of the `session` object that can be configured in the `session.update` message.
+The following sections describe the properties of the `session` object that can be configured in the `session.update` message.
 
 > [!TIP]
-> For comprehensive descriptions of supported events and properties, see the [Azure OpenAI Realtime API events reference documentation](../openai/realtime-audio-reference.md?context=/azure/ai-services/speech-service/context/context). This document provides a reference for the event message properties that are enhancements via the Voice Live API.
+> For comprehensive descriptions of supported events and properties, see the [Azure OpenAI Realtime API events reference documentation](../openai/realtime-audio-reference.md?context=/azure/ai-services/speech-service/context/context). This document provides a reference for the event message properties that are enhancements via the Voice Live API.
 
 ### Input audio properties
 
@@ -99,7 +99,7 @@ You can use input audio properties to configure the input audio stream.
 | `input_audio_echo_cancellation` | object | Optional | Enhances the input audio quality by removing the echo from the model's own voice without requiring any client-side echo cancellation.<br/><br/>Set the `type` property of `input_audio_echo_cancellation` to enable echo cancellation.<br/><br/>The supported value for `type` is `server_echo_cancellation` which is used when the model's voice is played back to the end-user through a speaker, and the microphone picks up the model's own voice. |
 | `input_audio_noise_reduction` | object | Optional | Enhances the input audio quality by suppressing or removing environmental background noise.<br/><br/>Set the `type` property of `input_audio_noise_reduction` to enable noise suppression.<br/><br/>The supported value for `type` is `azure_deep_noise_suppression` which optimizes for speakers closest to the microphone. |
 
-Here's an example of input audio properties is a session object:
+Here's an example of input audio properties is a session object:
 
 ```json
 {
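The hunk above cuts off at the start of that JSON example. Based on the two table rows shown, a session object that enables both input audio enhancements would plausibly look like the following sketch (not the file's exact example):

```json
{
    "type": "session.update",
    "session": {
        "input_audio_echo_cancellation": {"type": "server_echo_cancellation"},
        "input_audio_noise_reduction": {"type": "azure_deep_noise_suppression"}
    }
}
```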
@@ -137,15 +137,15 @@ The Voice Live API offers conversational enhancements to provide robustness to t
 
 ### Turn Detection Parameters
 
-Turn detection is the process of detecting when the end-user started or stopped speaking. The Voice Live API builds on the Azure OpenAI Realtime API `turn_detection` property to configure turn detection. The `azure_semantic_vad` type is one differentiator between the Voice Live API and the Azure OpenAI Realtime API.
+Turn detection is the process of detecting when the end-user started or stopped speaking. The Voice Live API builds on the Azure OpenAI Realtime API `turn_detection` property to configure turn detection. The `azure_semantic_vad` type is one differentiator between the Voice Live API and the Azure OpenAI Realtime API.
 
 | Property | Type | Required or optional | Description |
 |----------|----------|----------|------------|
 | `type` | string | Optional | The type of turn detection system to use. Type `server_vad` detects start and end of speech based on audio volume.<br/><br/>Type `azure_semantic_vad` detects start and end of speech based on semantic meaning. Azure semantic voice activity detection (VAD) improves turn detection by removing filler words to reduce the false alarm rate. The current list of filler words are `['ah', 'umm', 'mm', 'uh', 'huh', 'oh', 'yeah', 'hmm']`. The service ignores these words when there's an ongoing response. Remove feature words feature assumes the client plays response audio as soon as it receives them. The `azure_semantic_vad` type isn't supported with the `gpt-4o-realtime-preview` and `gpt-4o-mini-realtime-preview` models.<br/><br/>The default value is `server_vad`. |
 | `threshold` | number | Optional | A higher threshold requires a higher confidence signal of the user trying to speak. |
 | `prefix_padding_ms` | integer | Optional | The amount of audio, measured in milliseconds, to include before the start of speech detection signal. |
 | `silence_duration_ms` | integer | Optional | The duration of user's silence, measured in milliseconds, to detect the end of speech. |
-| `end_of_utterance_detection` | object | Optional | Configuration for end of utterance detection. The Voice Live API offers advanced end-of-turn detection to indicate when the end-user stopped speaking while allowing for natural pauses. End of utterance detection can significantly reduce premature end-of-turn signals without adding user-perceivable latency. End of utterance detection is only available when using `azure_semantic_vad`.<br/><br/>Properties of `end_of_utterance_detection` include:<br/>-`model`: The model to use for end of utterance detection. The supported value is `semantic_detection_v1`.<br/>- `threshold`: Threshold to determine the end of utterance (0.0 to 1.0). The default value is 0.1.<br/>- `timeout`: Timeout in seconds. The default value is 4 seconds.|
+| `end_of_utterance_detection` | object | Optional | Configuration for end of utterance detection. The Voice Live API offers advanced end-of-turn detection to indicate when the end-user stopped speaking while allowing for natural pauses. End of utterance detection can significantly reduce premature end-of-turn signals without adding user-perceivable latency. End of utterance detection is only available when using `azure_semantic_vad`.<br/><br/>Properties of `end_of_utterance_detection` include:<br/>-`model`: The model to use for end of utterance detection. The supported value is `semantic_detection_v1`.<br/>- `threshold`: Threshold to determine the end of utterance (0.0 to 1.0). The default value is 0.01.<br/>- `timeout`: Timeout in seconds. The default value is 2 seconds.|
 
 Here's an example of end of utterance detection in a session object:
 
@@ -160,8 +160,8 @@ Here's an example of end of utterance detection in a session object:
             "remove_filler_words": false,
             "end_of_utterance_detection": {
                 "model": "semantic_detection_v1",
-                "threshold": 0.1,
-                "timeout": 4
+                "threshold": 0.01,
+                "timeout": 2
             }
         }
     }
@@ -170,7 +170,7 @@ Here's an example of end of utterance detection in a session object:
 
 ### Audio output through Azure text to speech
 
-You can use the `voice` parameter to specify a standard or custom voice. The voice is used for audio output.
+You can use the `voice` parameter to specify a standard or custom voice. The voice is used for audio output.
 
 The `voice` object has the following properties:
 
@@ -357,7 +357,7 @@ To configure the viseme, you can set the `animation.outputs` in the `session.upd
 }
 ```
 
-The `output_audio_timestamp_types` parameter is optional. It configures which audio timestamps should be returned for generated audio. Currently, it only supports `word`.
+The `output_audio_timestamp_types` parameter is optional. It configures which audio timestamps should be returned for generated audio. Currently, it only supports `word`.
 
 The service returns the viseme alignment in the response when the audio is generated.
 
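The `voice` parameter mentioned in the unchanged context isn't shown in this diff. As a hedged sketch of what such a session object might contain (the voice name and property values here are assumptions, not taken from the file):

```json
{
    "type": "session.update",
    "session": {
        "voice": {
            "name": "en-US-AvaNeural",
            "type": "azure-standard",
            "temperature": 0.8
        }
    }
}
```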

articles/machine-learning/toc.yml

Lines changed: 3 additions & 0 deletions

@@ -514,6 +514,9 @@ items:
     - name: Configure & submit training run
       displayName: run config, script run config, scriptrunconfig, compute target, dsvm, Data Science Virtual Machine, local, cluster, ACI, container instance, Databricks, data lake, lake, HDI, HDInsight
       href: ./v1/how-to-set-up-training-targets.md
+    - name: Log metrics, parameters, and files
+      displayName: troubleshoot, log, files, tracing, metrics
+      href: ./v1/how-to-log-view-metrics.md
     # end v1
     - name: Training with CLI and SDK
       href: how-to-train-model.md

0 commit comments
