  Voice live provides multiple options to optimize performance and quality by using custom models. The following customization options are currently available:

  - Speech input customization:
- - Phrase-list: A lightweight just-in-time customization based on a list of words or phrases provided as part of the session configuration to help improve recognition quality. See [Improve recognition accuracy with phrase list](./improve-accuracy-phrase-list) to learn more.
- - Custom Speech: With custom speech, you can evaluate and improve the accuracy of speech recognition for your applications and products and fine-tune the recognition quality to your business needs. See [What is custom speech?](./custom-speech-overview) to learn more.
+ - Phrase-list: A lightweight just-in-time customization based on a list of words or phrases provided as part of the session configuration to help improve recognition quality. See [Improve recognition accuracy with phrase list](./improve-accuracy-phrase-list.md) to learn more.
+ - Custom Speech: With custom speech, you can evaluate and improve the accuracy of speech recognition for your applications and products and fine-tune the recognition quality to your business needs. See [What is custom speech?](./custom-speech-overview.md) to learn more.
  - Speech output customization:
- - Custom lexicon: Custom lexicon allows you to easily customize pronunciation for both standard Azure text to speech voices and custom voices to improve speech synthesis accuracy for your use case. See [custom lexicon for text to speech](./speech-synthesis-markup-pronunciation.md#custom-lexicon) to learn more.
- - Custom voice: Custom voice lets you create a one-of-a-kind, customized, synthetic voice for your applications. With custom voice, you can build a highly natural-sounding voice for your brand or characters by providing human speech samples as fine-tuning data. See [What is custom voice?](./custom-neural-voice) to learn more.
- - Custom avatar: Custom text to speech avatar allows you to create a customized, one-of-a-kind synthetic talking avatar for your application. With custom text to speech avatar, you can build a unique and natural-looking avatar for your product or brand by providing video recording data of your selected actors. See [What is custom text to speech avatar?](./text-to-speech-avatar/what-is-custom-text-to-speech-avatar) to learn more.
+ - Custom lexicon: Custom lexicon allows you to easily customize pronunciation for both standard Azure text to speech voices and custom voices to improve speech synthesis accuracy for your use case. See [custom lexicon for text to speech](./speech-synthesis-markup-pronunciation.md#custom-lexicon.md) to learn more.
+ - Custom voice: Custom voice lets you create a one-of-a-kind, customized, synthetic voice for your applications. With custom voice, you can build a highly natural-sounding voice for your brand or characters by providing human speech samples as fine-tuning data. See [What is custom voice?](./custom-neural-voice.md) to learn more.
+ - Custom avatar: Custom text to speech avatar allows you to create a customized, one-of-a-kind synthetic talking avatar for your application. With custom text to speech avatar, you can build a unique and natural-looking avatar for your product or brand by providing video recording data of your selected actors. See [What is custom text to speech avatar?](./text-to-speech-avatar/what-is-custom-text-to-speech-avatar.md) to learn more.

  ## Speech input customization

@@ -48,7 +48,7 @@ Use phrase list for lightweight just-in-time customization on audio input. To co

  ### Custom speech configuration

- You can use the custom_speech field to specify your custom speech models. This field is defined as a dictionary, where each key represents a locale code and each value corresponds to the `Model ID` of the custom speech model. For more information about custom speech, please see [What is custom speech?](./custom-speech-overview).
+ You can use the custom_speech field to specify your custom speech models. This field is defined as a dictionary, where each key represents a locale code and each value corresponds to the `Model ID` of the custom speech model. For more information about custom speech, please see [What is custom speech?](./custom-speech-overview.md).

  Voice live supports using a combination of base models and custom models as long as each type is unique per locale with a maximum of 10 languages specified in total.

@@ -76,7 +76,7 @@ Example session configuration with custom speech models. In this case, if the de

  ### Custom lexicon

- Use the `custom_lexicon_url` string property to customize pronunciation for both standard Azure text to speech voices and custom voices. To learn more about how to format the custom lexicon (the same as Speech Synthesis Markup Language (SSML)), see [custom lexicon for text to speech](./speech-synthesis-markup-pronunciation.md#custom-lexicon).
+ Use the `custom_lexicon_url` string property to customize pronunciation for both standard Azure text to speech voices and custom voices. To learn more about how to format the custom lexicon (the same as Speech Synthesis Markup Language (SSML)), see [custom lexicon for text to speech](./speech-synthesis-markup-pronunciation.md#custom-lexicon.md).

  ```json
  {
@@ -112,7 +112,7 @@ You can use a custom voice for audio output. For information about how to create

  [Text to speech avatar](./text-to-speech-avatar/what-is-text-to-speech-avatar.md) converts text into a digital video of a photorealistic human (either a standard avatar or a [custom text to speech avatar](./text-to-speech-avatar/what-is-custom-text-to-speech-avatar.md)) speaking with a natural-sounding voice.

- The configuration for a custom avatar does not differ from the configuration of a standard avatar. Please refer to [How to use the voice live API - Azure text to speech avatar](./voice-live-how-to#azure-text-to-speech-avatar) for a detailed example.
+ The configuration for a custom avatar does not differ from the configuration of a standard avatar. Please refer to [How to use the voice live API - Azure text to speech avatar](./voice-live-how-to.md#azure-text-to-speech-avatar) for a detailed example.

  > [!NOTE]
  > In order to use a custom voice model with voice live API, the model must be available on the same Azure AI Foundry resource you are using to call the voice live API. If you trained the model on a different Azure AI Foundry or Azure AI Speech resource you have to copy the model to the resource you are using to call the voice live API.
@@ -122,4 +122,4 @@ The configuration for a custom avatar does not differ from the configuration of
  ## Related content

  - Try out the [voice live API quickstart](./voice-live-quickstart.md)
- - See the [audio events reference](/azure/ai-foundry/openai/realtime-audio-reference?context=/azure/ai-services/speech-service/context/context)
+ - See the [audio events reference](/azure/ai-foundry/openai/realtime-audio-reference.md?context=/azure/ai-services/speech-service/context/context)
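
The `custom_speech` description in the hunk above (a dictionary of locale code to `Model ID`, with at most 10 languages in total) can be sketched as a small validation helper. This is only an illustration: the function name `build_custom_speech_config`, the locale pattern, the placeholder model ID, and the placement of `custom_speech` next to `model` inside `input_audio_transcription` are assumptions, and the check covers only the custom entries, not any base-model languages configured elsewhere.

```python
import re

MAX_LOCALES = 10  # limit quoted in the documentation text above

# Assumed shape of a locale code such as "en-US"; the service may accept other forms.
LOCALE_PATTERN = re.compile(r"^[a-z]{2,3}(-[A-Za-z0-9]{2,8})*$")


def build_custom_speech_config(models: dict) -> dict:
    """Build a hypothetical input_audio_transcription fragment.

    `models` maps a locale code to the Model ID of a custom speech model.
    A dict cannot hold duplicate keys, so each locale appears at most once.
    """
    if len(models) > MAX_LOCALES:
        raise ValueError(f"at most {MAX_LOCALES} locales are allowed, got {len(models)}")
    for locale, model_id in models.items():
        if not LOCALE_PATTERN.match(locale):
            raise ValueError(f"unexpected locale code: {locale!r}")
        if not model_id:
            raise ValueError(f"missing Model ID for locale {locale!r}")
    return {
        "input_audio_transcription": {
            "model": "azure-speech",         # value named in voice-live-how-to.md
            "custom_speech": dict(models),   # nesting here is an assumption
        }
    }


if __name__ == "__main__":
    # The model ID below is a placeholder, not a real identifier.
    print(build_custom_speech_config({"en-US": "hypothetical-model-id"}))
```

Validating on the client keeps an oversized or malformed mapping from reaching the service, where the failure would be harder to trace back to the configuration.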
articles/ai-services/speech-service/voice-live-how-to.md
2 additions & 2 deletions
@@ -168,7 +168,7 @@ Here's an example of end of utterance detection in a session object:

  Azure speech to text will automatically be active when you are using a non-multimodal model like gpt-4o.

- In order to explicitly configure it you can set the `model` to `azure-speech` in `input_audio_transcription`. This can be useful to improve the recognition quality for specific language situations. See [How to customize voice live input and output](./voice-live-how-to-customize) learn more about speech input customization configuration.
+ In order to explicitly configure it you can set the `model` to `azure-speech` in `input_audio_transcription`. This can be useful to improve the recognition quality for specific language situations. See [How to customize voice live input and output](./voice-live-how-to-customize.md) learn more about speech input customization configuration.

  ```json
  {
@@ -193,7 +193,7 @@ The `voice` object has the following properties:
  |`type`| string | Required | Configuration of the type of Azure voice between `azure-standard` and `azure-custom`. |
  |`temperature`| number | Optional | Specifies temperature applicable to Azure HD voices. Higher values provide higher levels of variability in intonation, prosody, etc. |

- See [How to customize voice live input and output](./voice-live-how-to-customize) learn more about speech input customization configuration.
+ See [How to customize voice live input and output](./voice-live-how-to-customize.md) learn more about speech input customization configuration.
articles/ai-services/speech-service/voice-live.md
1 addition & 1 deletion
@@ -41,7 +41,7 @@ Azure AI voice live API is ideal for scenarios where voice-driven interactions i
  The voice live API includes a comprehensive set of features to support diverse use cases and ensure superior voice interactions:

  - **Broad locale coverage**: Supports over 15 locales for speech to text and offers over 600 standard voices across 140+ locales for text to speech, ensuring global accessibility.
- - **Customizable input and output**: Use phrase list for lightweight just-in-time customization on audio input or custom speech models for advanced speech recognizion fine-tuning. Use custom voice to create unique, brand-aligned voices for audio output. See [How to customize voice live input and output](./voice-live-how-to-customize) to learn more.
+ - **Customizable input and output**: Use phrase list for lightweight just-in-time customization on audio input or custom speech models for advanced speech recognizion fine-tuning. Use custom voice to create unique, brand-aligned voices for audio output. See [How to customize voice live input and output](./voice-live-how-to-customize.md) to learn more.
  - **Flexible generative AI model options**: [Choose from multiple models](#supported-models-and-regions), including GPT-5, GPT-4.1, GPT-4o, Phi, and more tailored to conversational requirements.
  - **Advanced conversational features**:
  - Noise suppression: Reduces environmental noise for clearer communication.
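
Most changes in this commit append `.md` to relative link targets, and two of the added custom lexicon links end in `#custom-lexicon.md`, with the extension after the fragment rather than before it. The following is a minimal sketch of one way such a rewrite could keep the fragment or query string intact. It is not the tool used for this commit; `add_md_extension`, `fix_links`, and the regular expression are hypothetical names and choices, and only inline links whose target starts with `./` are handled.

```python
import re

# Inline Markdown links whose target starts with "./" (relative links only).
RELATIVE_LINK = re.compile(r"\]\((\./[^)\s]*)\)")


def add_md_extension(target: str) -> str:
    """Insert ".md" before the first "?" or "#" in a relative link target."""
    cut = len(target)
    for sep in ("?", "#"):
        idx = target.find(sep)
        if idx != -1:
            cut = min(cut, idx)
    path, suffix = target[:cut], target[cut:]
    if path.endswith(".md"):
        return target  # already has the extension; leave the target unchanged
    return path + ".md" + suffix


def fix_links(text: str) -> str:
    """Rewrite every relative inline link in `text`; other links are untouched."""
    return RELATIVE_LINK.sub(
        lambda m: "](" + add_md_extension(m.group(1)) + ")", text
    )


if __name__ == "__main__":
    sample = "See [avatar](./voice-live-how-to#azure-text-to-speech-avatar)."
    print(fix_links(sample))
```

Splitting the target before appending the extension is the design point: a target that already ends in `.md` before its fragment is returned unchanged, so the anchor is never modified.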