Commit 5bc558c

Merge pull request #6328 from eric-urban/eur/speech-updates
speech updates for voice live and freshness
2 parents: 3fe9473 + 04442a9

File tree

3 files changed: +13 additions, -18 deletions

articles/ai-services/includes/quickstarts/management-azportal.md

Lines changed: 7 additions & 13 deletions
@@ -5,7 +5,7 @@ description: Get started with Azure AI services by creating an AI Foundry resour
 author: eric-urban
 ms.author: eur
 manager: nitinme
-ms.date: 8/1/2024
+ms.date: 7/31/2025
 ms.service: azure-ai-services
 ms.topic: quickstart
 ms.custom:
@@ -24,24 +24,18 @@ keywords:

 ## Create a new Azure AI Foundry resource

-[Azure AI Foundry portal](https://ai.azure.com/?cid=learnDocs) provides a way to create a new Azure resource with basic, defaulted, settings. If your organization requires customized Azure configurations like alternative names, security controls or cost tags, you may need to instead use [Azure portal](https://portal.azure.com) or [template options](../../../ai-foundry/how-to/create-resource-template.md) to comply with your organization's Azure Policy compliance.
+If your organization requires customized Azure configurations such as alternative names, security controls, or cost tags, you might need to use the [Azure portal](https://portal.azure.com) or [template options](../../../ai-foundry/how-to/create-resource-template.md) to comply with your organization's Azure Policy requirements.

-The Azure AI Foundry multi-service resource is listed under **AI Foundry** > **AI Foundry** in the portal. The API kind is **AIServices**. Look for the logo as shown here:
+The Azure AI Foundry multi-service resource is listed under **AI Foundry** > **AI Foundry** in the Azure portal. The API kind is **AIServices**. Look for the logo as shown here:

 :::image type="content" source="../../media/ai-services-resource-portal.png" alt-text="Screenshot of the Azure AI Foundry resource in the Azure portal." lightbox="../../media/ai-services-resource-portal.png":::

-> [!IMPORTANT]
-> Azure provides more than one resource kinds named Azure AI services. Be sure to select the one that is listed under **AI Foundry** > **AI Foundry** with the logo as shown previously.
-
-To create an AI Foundry resource follow these instructions:
-
 > [!TIP]
-> If you need to create an [!INCLUDE [fdp](../../../ai-foundry/includes/fdp-project-name.md)] or [!INCLUDE [hub](../../../ai-foundry/includes/hub-project-name.md)] resource, you can also use the [Azure Foundry portal](https://ai.azure.com/?cid=learnDocs) to create the resource. For more information, see the following articles:
->
-> - [Create an Azure AI Foundry project](/azure/ai-foundry/how-to/create-projects?tabs=ai-foundry&pivots=fdp-project).
-> - [Create an Azure AI hub based project](/azure/ai-foundry/how-to/create-projects?tabs=ai-foundry&pivots=hub-project).
+> The [Azure AI Foundry portal](https://ai.azure.com/?cid=learnDocs) provides a way to [create a new Azure AI Foundry resource](/azure/ai-foundry/how-to/create-projects?tabs=ai-foundry&pivots=fdp-project) with basic, defaulted settings.
+
+To create an AI Foundry resource in the Azure portal, follow these instructions:

-1. Select this link to create an **AI Foundry** resource: [https://portal.azure.com/#create/Microsoft.CognitiveServicesAIFoundry](https://portal.azure.com/#create/Microsoft.CognitiveServicesAIFoundry)
+1. Select this **AI Foundry** resource link: [https://portal.azure.com/#create/Microsoft.CognitiveServicesAIFoundry](https://portal.azure.com/#create/Microsoft.CognitiveServicesAIFoundry)

 1. On the **Create** page, provide the following information:
articles/ai-services/speech-service/text-to-speech-avatar/what-is-text-to-speech-avatar.md

Lines changed: 2 additions & 2 deletions
@@ -5,14 +5,14 @@ description: Get an overview of the Text to speech avatar feature of speech serv
 manager: nitinme
 ms.service: azure-ai-speech
 ms.topic: overview
-ms.date: 4/28/2025
+ms.date: 7/31/2025
 ms.reviewer: eur
 ms.author: eur
 author: eric-urban
 ms.custom: references_regions
 ---

-# Text to speech avatar overview
+# What is Text to speech avatar?

 Text to speech avatar converts text into a digital video of a photorealistic human (either a standard avatar or a [custom text to speech avatar](#custom-text-to-speech-avatar)) speaking with a natural-sounding voice. The text to speech avatar video can be synthesized asynchronously or in real time. Developers can build applications integrated with text to speech avatar through an API, or use a content creation tool on Speech Studio to create video content without coding.

articles/ai-services/speech-service/voice-live-how-to.md

Lines changed: 4 additions & 3 deletions
@@ -7,7 +7,7 @@ author: eric-urban
 ms.author: eur
 ms.service: azure-ai-speech
 ms.topic: how-to
-ms.date: 7/1/2025
+ms.date: 7/31/2025
 ms.custom: references_regions
 # Customer intent: As a developer, I want to learn how to use the voice live API for real-time voice agents.
 ---
@@ -98,7 +98,7 @@ You can use input audio properties to configure the input audio stream.
 |----------|----------|----------|------------|
 | `input_audio_sampling_rate` | integer | Optional | The sampling rate of the input audio.<br/><br/>The supported values are `16000` and `24000`. The default value is `24000`. |
 | `input_audio_echo_cancellation` | object | Optional | Enhances the input audio quality by removing the echo from the model's own voice without requiring any client-side echo cancellation.<br/><br/>Set the `type` property of `input_audio_echo_cancellation` to enable echo cancellation.<br/><br/>The supported value for `type` is `server_echo_cancellation`, which is used when the model's voice is played back to the end-user through a speaker, and the microphone picks up the model's own voice. |
-| `input_audio_noise_reduction` | object | Optional | Enhances the input audio quality by suppressing or removing environmental background noise.<br/><br/>Set the `type` property of `input_audio_noise_reduction` to enable noise suppression.<br/><br/>The supported value for `type` is `azure_deep_noise_suppression`, which optimizes for speakers closest to the microphone. |
+| `input_audio_noise_reduction` | object | Optional | Enhances the input audio quality by suppressing or removing environmental background noise.<br/><br/>Set the `type` property of `input_audio_noise_reduction` to enable noise suppression.<br/><br/>The supported value for `type` is `azure_deep_noise_suppression`, which optimizes for speakers closest to the microphone.<br/><br/>You can set this property to `near_field` or `far_field` if you're using the [Azure OpenAI Realtime API](../../ai-foundry/openai/realtime-audio-reference.md#realtimeaudioinputaudionoisereductionsettings). |

 Here's an example of input audio properties in a session object:

@@ -142,10 +142,11 @@ Turn detection is the process of detecting when the end-user started or stopped

 | Property | Type | Required or optional | Description |
 |----------|----------|----------|------------|
-| `type` | string | Optional | The type of turn detection system to use. Type `server_vad` detects start and end of speech based on audio volume.<br/><br/>Type `azure_semantic_vad` detects start and end of speech based on semantic meaning. Azure semantic voice activity detection (VAD) improves turn detection by removing filler words to reduce the false alarm rate. The current list of filler words are `['ah', 'umm', 'mm', 'uh', 'huh', 'oh', 'yeah', 'hmm']`. The service ignores these words when there's an ongoing response. Remove feature words feature assumes the client plays response audio as soon as it receives them.<br/><br/>The default value is `server_vad`. |
+| `type` | string | Optional | The type of turn detection system to use. Type `server_vad` detects the start and end of speech based on audio volume.<br/><br/>Type `azure_semantic_vad` detects the start and end of speech based on semantic meaning. Azure semantic voice activity detection (VAD) improves turn detection by removing filler words to reduce the false alarm rate. The `remove_filler_words` property must be set to `true`. The current list of filler words is `['ah', 'umm', 'mm', 'uh', 'huh', 'oh', 'yeah', 'hmm']`. The service ignores these words when there's an ongoing response. The filler word removal feature assumes that the client plays response audio as soon as it receives it.<br/><br/>The default value is `server_vad`. |
 | `threshold` | number | Optional | A higher threshold requires a higher confidence signal of the user trying to speak. |
 | `prefix_padding_ms` | integer | Optional | The amount of audio, measured in milliseconds, to include before the start of speech detection signal. |
 | `silence_duration_ms` | integer | Optional | The duration of the user's silence, measured in milliseconds, to detect the end of speech. |
+| `remove_filler_words` | boolean | Optional | Determines whether to remove filler words to reduce the false alarm rate. This property must be set to `true` when using `azure_semantic_vad`.<br/><br/>The default value is `false`. |
 | `end_of_utterance_detection` | object | Optional | Configuration for end of utterance detection. The voice live API offers advanced end-of-turn detection to indicate when the end-user stopped speaking while allowing for natural pauses. End of utterance detection can significantly reduce premature end-of-turn signals without adding user-perceivable latency. End of utterance detection is only available when using `azure_semantic_vad`.<br/><br/>Properties of `end_of_utterance_detection` include:<br/>- `model`: The model to use for end of utterance detection. The supported value is `semantic_detection_v1`.<br/>- `threshold`: Threshold to determine the end of utterance (0.0 to 1.0). The default value is 0.01.<br/>- `timeout`: Timeout in seconds. The default value is 2 seconds. |

 Here's an example of end of utterance detection in a session object:
