Commit 47b847c

Merge pull request #7402 from goergenj/main
Add Voice Live FAQ
2 parents 708ffc4 + 0b1ba1d commit 47b847c

2 files changed

+181
-0
lines changed

articles/ai-services/speech-service/toc.yml

Lines changed: 3 additions & 0 deletions
@@ -254,6 +254,9 @@ items:
254254
href: voice-live-how-to.md
255255
- name: How to customize voice live input and output
256256
href: voice-live-how-to-customize.md
257+
- name: Voice live FAQ
258+
href: voice-live-faq.yml
259+
displayName: FAQ,frequently asked questions
257260
- name: Reference
258261
items:
259262
- name: Voice live API reference
Lines changed: 178 additions & 0 deletions
@@ -0,0 +1,178 @@
1+
### YamlMime:FAQ
2+
metadata:
3+
title: Voice live frequently asked questions (FAQ)
4+
titleSuffix: Azure AI services
5+
description: Get answers to frequently asked questions about the Voice live API in Azure AI Speech.
6+
author: goergenj
7+
reviewers: pafarley
8+
manager: nitinme
9+
ms.service: azure-ai-speech
10+
ms.topic: faq
11+
ms.date: 09/30/2025
12+
ms.author: jagoerge
13+
ms.reviewer: pafarley
14+
title: Voice live FAQ
15+
summary: |
16+
This article answers commonly asked questions about the Voice live API. If you can't find answers to your questions here, check out [other support options](../cognitive-services-support-options.md?context=%2fazure%2fcognitive-services%2fspeech-service%2fcontext%2fcontext%253fcontext%253d%2fazure%2fcognitive-services%2fspeech-service%2fcontext%2fcontext).
17+
18+
sections:
19+
- name: General
20+
questions:
21+
- question: |
22+
What scenarios does Voice live support?
23+
answer: |
24+
The Voice live API supports a wide range of real-time, natural voice interaction scenarios, including contact centers, automotive assistants, accessibility applications, virtual tutors and learning companions, multilingual public service agents, HR support, and training. Customers such as eClinicalWorks and the Government of Malta use it.
25+
- question: |
26+
How does Voice live compare to the Azure OpenAI (AOAI) Realtime API? When should I choose which?
27+
answer: |
28+
The Voice live API enhances the AOAI Realtime API by offering expanded model selection (including GPT-Realtime, GPT-5, GPT-4.1, and Phi), more natural voice options, more supported speech languages, avatar integration, advanced semantic voice activity detection (VAD), seamless Azure AI Foundry Agent Service integration, and telephony integration via Azure Communication Services. Consider Voice live when your scenario needs one or more of these capabilities.
29+
- question: |
30+
What regions does Voice live support?
31+
answer: |
32+
Voice live is available in 10+ Azure regions. For more information, see [Region support](./regions.md?tabs=voice-live).
33+
- question: |
34+
What is the tokens-per-minute threshold?
35+
answer: |
36+
The current limit is 100,000 tokens per minute per resource. Customers can request an increase. For more information, see [Speech service quotas and limits](./speech-services-quotas-and-limits.md).
37+
- name: Generative AI Models
38+
questions:
39+
- question: |
40+
What generative AI models are supported?
41+
answer: |
42+
Voice live supports OpenAI models in Azure AI Foundry, Phi-based LLMs, and SLMs. For more information, see [Voice live overview](./voice-live.md). Voice live also provides a bring-your-own-model option (preview).
43+
- question: |
44+
How do I choose the LLM for my use case?
45+
answer: |
46+
Consider the following factors: accuracy (Azure Speech-based models are more robust for noisy audio), existing LLM solutions (you can reuse prompts and grounding data), latency (text-based LLMs can have slightly higher latency), and inference cost (smaller models can be more cost-effective).
47+
- question: |
48+
What is response instruction?
49+
answer: |
50+
A response instruction guides model behavior and context. Use it to define the agent's personality, specify questions, and control response formatting. Responses should be concise and normalized for optimal audio synthesis.
51+
- question: |
52+
What is response temperature?
53+
answer: |
54+
Response temperature controls the randomness of the output. Lower values produce more deterministic responses, and higher values produce more creative ones. Adjust either temperature or Top-P, but not both.
55+
- name: Speech Input
56+
questions:
57+
- question: |
58+
What languages does Voice live support?
59+
answer: |
60+
Voice live supports 146 languages and locales for speech input, 151 for speech output, and more than 600 neural voices. For more information, see [Voice live language support](./voice-live-language-support.md?tabs=speechinput).
61+
- question: |
62+
How do I get the live transcripts from the call?
63+
answer: |
64+
Use text output events. For details, see the [Voice live API reference](./voice-live-api-reference.md).
65+
- question: |
66+
What is a phrase list?
67+
answer: |
68+
A phrase list is a set of domain-specific terms that improves recognition. Limit the list to fewer than 500 words or phrases. For more information, see [How to customize Voice live](./voice-live-how-to-customize.md).
69+
- question: |
70+
Are there other ways to improve speech input recognition accuracy?
71+
answer: |
72+
Yes. You can use Azure AI Custom Speech models and configure multiple custom models per language. For more information, see [How to customize Voice live](./voice-live-how-to-customize.md).
73+
- name: Speech Output
74+
questions:
75+
- question: |
76+
What voices does Voice live support?
77+
answer: |
78+
Voice live supports native audio output from your preferred model and Azure AI Speech text to speech (TTS) voices (600+ voices, 150+ locales, and 30+ Neural HD voices). Custom voice models are available via professional voice fine-tuning. For more information, see [Voice live API supported languages](./voice-live-language-support.md?tabs=speechoutput).
79+
- question: |
80+
How do I pick a voice?
81+
answer: |
82+
Use the [Voice Gallery](https://speech.microsoft.com/portal/voicegallery) in the Azure AI Foundry Speech Playground. Consider gender, age, capability, style, and personality.
83+
- question: |
84+
What is voice temperature?
85+
answer: |
86+
Voice temperature controls expressiveness. Higher values produce more dynamic, emotive speech, and lower values produce more neutral speech. It applies to Neural HD voices.
87+
- question: |
88+
What is speaking rate?
89+
answer: |
90+
Speaking rate controls how fast the agent speaks.
91+
- question: |
92+
What is a custom lexicon?
93+
answer: |
94+
A custom lexicon defines pronunciation rules for specific words. For more information, see [How to customize Voice live](./voice-live-how-to-customize.md#speech-output-customization).
95+
- question: |
96+
What is Custom Voice?
97+
answer: |
98+
Custom Voice lets you create brand-specific synthetic voices using your own audio data. For more information, see [How to customize Voice live](./voice-live-how-to-customize.md#azure-custom-voices).
99+
- question: |
100+
What is Avatar support?
101+
answer: |
102+
Avatar support pairs speech output with visual avatars for multimodal experiences.
103+
- question: |
104+
What is Custom Avatar?
105+
answer: |
106+
A custom avatar is a photorealistic digital human that uses Azure AI TTS. It's built from video recordings and tailored to a specific actor’s appearance and voice.
107+
- name: Conversational Enhancements
108+
questions:
109+
- question: |
110+
What is the difference between Azure Semantic VAD and Basic Server VAD?
111+
answer: |
112+
Azure Semantic VAD is more robust to noise and more accurate at detecting utterance boundaries than Basic Server VAD.
113+
- question: |
114+
What is EOU (End of Utterance) detection?
115+
answer: |
116+
EOU detection uses context to determine whether a user has finished speaking or has just paused.
117+
- question: |
118+
How does noise suppression work?
119+
answer: |
120+
Noise suppression uses advanced technology to filter out background noise.
121+
- question: |
122+
How does echo cancellation work?
123+
answer: |
124+
Echo cancellation removes the echo of the agent’s own voice that the microphone picks up.
125+
- name: Function Calling
126+
questions:
127+
- question: |
128+
Does Voice live support function calling?
129+
answer: |
130+
Yes, including asynchronous function calling.
131+
- question: |
132+
Is there model context protocol (MCP) support?
133+
answer: |
134+
Currently MCP isn't supported.
135+
- name: Pricing
136+
questions:
137+
- question: |
138+
Where is the pricing listed?
139+
answer: |
140+
Pricing is listed in the [Voice live overview](./voice-live.md#pricing).
141+
- question: |
142+
How do I estimate the cost based on my use case?
143+
answer: |
144+
Estimate your usage in audio minutes; tokens are the billing unit. For more information, see [pricing](./voice-live.md#pricing) and [token usage and cost estimation](./voice-live.md#token-usage-and-cost-estimation).
145+
- question: |
146+
Are there separate quota and throttling limits for Voice live?
147+
answer: |
148+
Yes, a quota applies specifically to the Voice live API (default: 100,000 tokens per minute).
149+
- name: Additional
150+
questions:
151+
- question: |
152+
Does this service provide an SDK?
153+
answer: |
154+
Yes, SDKs are available for Python and C#. See [Voice live - Reference - Voice live SDK](./voice-live.md).
155+
- question: |
156+
Does this service include content filtering?
157+
answer: |
158+
Yes, content filtering is included.
159+
- question: |
160+
Can you modify or disable the content filtering in Voice live API?
161+
answer: |
162+
No. If you need custom content filtering, you can use the bring-your-own-model (PREVIEW) feature.
163+
- question: |
164+
Does Voice live API support WebRTC?
165+
answer: |
166+
WebRTC is currently not supported.
167+
- question: |
168+
Is SIP supported?
169+
answer: |
170+
SIP is currently not supported.
171+
172+
additionalContent: |
173+
174+
## Next steps
175+
176+
- Learn more about [How to use the Voice live API](./voice-live-how-to.md)
177+
- See the [Voice live API reference](./voice-live-api-reference.md)
178+
- [What's new](releasenotes.md)