### YamlMime:FAQ
metadata:
  title: Voice live frequently asked questions (FAQ)
  titleSuffix: Azure AI services
  description: Get answers to frequently asked questions about the Voice live API in Azure AI Speech.
  author: goergenj
  reviewers: pafarley
  manager: nitinme
  ms.service: azure-ai-speech
  ms.topic: faq
  ms.date: 09/30/2025
  ms.author: jagoerge
  ms.reviewer: pafarley
title: Voice live FAQ
summary: |
  This article answers commonly asked questions about the Voice live API. If you can't find answers to your questions here, check out [other support options](../cognitive-services-support-options.md?context=%2fazure%2fcognitive-services%2fspeech-service%2fcontext%2fcontext%253fcontext%253d%2fazure%2fcognitive-services%2fspeech-service%2fcontext%2fcontext).

sections:
  - name: General
    questions:
      - question: |
          What scenarios does Voice live support?
        answer: |
          The Voice live API supports a wide range of real-time, natural voice interaction scenarios, including contact centers, automotive assistants, accessibility applications, virtual tutors and learning companions, multilingual public service agents, HR support, and training. Customers such as eClinicalWorks and the Government of Malta use it.
      - question: |
          How does Voice live compare to the Azure OpenAI (AOAI) Realtime API? When should I choose which?
        answer: |
          Compared with the Azure OpenAI Realtime API, the Voice live API offers:

          - Expanded model selection, including GPT-Realtime, GPT-5, GPT-4.1, and Phi
          - More natural voice options
          - Support for more speech languages
          - Avatar integration
          - Advanced semantic voice activity detection (VAD)
          - Seamless integration with Azure AI Foundry Agent Service
          - Telephony integration through Azure Communication Services

          Choose Voice live when your scenario needs one or more of these capabilities.
      - question: |
          What regions does Voice live support?
        answer: |
          Voice live is available in more than 10 Azure regions. For more information, see [Region support](./regions.md?tabs=voice-live).
      - question: |
          What is the tokens-per-minute threshold?
        answer: |
          The current limit is 100,000 tokens per minute per resource. You can request an increase. For more information, see [Speech service quotas and limits](./speech-services-quotas-and-limits.md).
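
          To estimate how many sessions can run at once under this limit, divide the limit by the token rate of a single session. The following Python sketch assumes a steady per-session rate that you measure from your own traffic; `max_concurrent_sessions` and `tokens_per_session_minute` are hypothetical names, not part of a Voice live SDK.

          ```python
          # Default per-resource limit stated in this FAQ: 100,000 tokens per minute.
          TPM_LIMIT = 100_000


          def max_concurrent_sessions(tokens_per_session_minute: int, limit: int = TPM_LIMIT) -> int:
              """Return how many sessions at a steady token rate fit under the per-minute limit.

              tokens_per_session_minute is a hypothetical value measured from your own
              traffic; it isn't a published rate.
              """
              if tokens_per_session_minute <= 0:
                  raise ValueError("tokens_per_session_minute must be positive")
              return limit // tokens_per_session_minute


          # Made-up rate of 2,000 tokens per session per minute.
          print(max_concurrent_sessions(2_000))  # prints 50
          ```

          Real traffic is bursty, so plan to stay below the computed value or request a higher limit.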
  - name: Generative AI Models
    questions:
      - question: |
          What generative AI models are supported?
        answer: |
          Voice live supports OpenAI models in Azure AI Foundry as well as Phi-based LLMs and SLMs. For more information, see [Voice live overview](./voice-live.md). Voice live also offers a bring-your-own-model option (PREVIEW).
      - question: |
          How do I choose an LLM for my use case?
        answer: |
          Consider the following factors:

          - Accuracy: Azure Speech-based models are more robust for noisy audio.
          - Existing LLM solutions: You can reuse your existing prompts and grounding data.
          - Latency: Text-based LLMs can have slightly higher latency.
          - Inference cost: Smaller models can be more cost-effective.
      - question: |
          What is response instruction?
        answer: |
          Response instruction guides the model's behavior and provides context. Use it to define the agent's personality, specify questions, and control response formatting. Keep responses concise and normalized for optimal audio synthesis.
      - question: |
          What is response temperature?
        answer: |
          Response temperature controls the randomness of the model's output. Lower values make responses more deterministic; higher values make them more creative. Adjust either temperature or Top-P, but not both.
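
          The following Python sketch illustrates the general idea behind temperature in LLM sampling: raw candidate scores (logits) are divided by the temperature before a softmax, so low temperatures concentrate probability on the top candidate and high temperatures spread it out. It's a generic illustration with made-up scores, not Voice live's internal implementation.

          ```python
          import math


          def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
              """Turn raw scores into probabilities, scaled by the sampling temperature."""
              if temperature <= 0:
                  raise ValueError("temperature must be positive")
              scaled = [x / temperature for x in logits]
              peak = max(scaled)  # Subtract the max for numerical stability.
              exps = [math.exp(x - peak) for x in scaled]
              total = sum(exps)
              return [e / total for e in exps]


          # Made-up scores for three candidate tokens.
          logits = [2.0, 1.0, 0.5]
          for t in (0.2, 1.0, 2.0):
              probs = softmax_with_temperature(logits, t)
              print(f"temperature={t}: {[round(p, 3) for p in probs]}")
          ```

          Top-P instead keeps only the smallest set of candidates whose cumulative probability reaches P. Because both settings reshape the same distribution, changing one at a time makes their effect easier to reason about.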
  - name: Speech Input
    questions:
      - question: |
          What languages does Voice live support?
        answer: |
          Voice live supports 146 languages and locales for speech input, 151 for speech output, and more than 600 neural voices. For more information, see [Voice live language support](./voice-live-language-support.md?tabs=speechinput).
      - question: |
          How do I get the live transcripts from the call?
        answer: |
          Use the text output events. For details, see the [Voice live API reference](./voice-live-api-reference.md).
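
          As a minimal sketch, the following Python function collects user and agent transcripts from the JSON messages the service sends. It assumes Realtime-style event names (`conversation.item.input_audio_transcription.completed` for user speech, `response.audio_transcript.delta` and `response.audio_transcript.done` for agent speech) and `transcript`/`delta` fields; confirm the exact names in the API reference. `collect_transcripts` is a hypothetical helper.

          ```python
          import json

          # Event names assumed from the Realtime-style protocol; verify them in the API reference.
          USER_TRANSCRIPT_DONE = "conversation.item.input_audio_transcription.completed"
          AGENT_TRANSCRIPT_DELTA = "response.audio_transcript.delta"
          AGENT_TRANSCRIPT_DONE = "response.audio_transcript.done"


          def collect_transcripts(raw_messages):
              """Return (speaker, text) pairs from a sequence of JSON server events."""
              transcript = []
              agent_parts = []
              for raw in raw_messages:
                  event = json.loads(raw)
                  kind = event.get("type")
                  if kind == USER_TRANSCRIPT_DONE:
                      transcript.append(("user", event.get("transcript", "")))
                  elif kind == AGENT_TRANSCRIPT_DELTA:
                      # Agent speech text arrives incrementally; buffer the pieces.
                      agent_parts.append(event.get("delta", ""))
                  elif kind == AGENT_TRANSCRIPT_DONE:
                      # Prefer the full transcript on the done event if it's present.
                      transcript.append(("agent", event.get("transcript") or "".join(agent_parts)))
                      agent_parts = []
              return transcript
          ```

          In a live app, you'd apply the same logic to each message as it arrives and display the deltas immediately to get live captions.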
      - question: |
          What is a phrase list?
        answer: |
          A phrase list is a set of domain-specific words and phrases that improves recognition of those terms. Keep the list to fewer than 500 words or phrases. For more information, see [How to customize Voice live](./voice-live-how-to-customize.md).
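
          As a minimal sketch for preparing a phrase list, the following Python function trims and deduplicates entries, enforces the 500-entry guidance above, and returns a transcription settings block. The helper is hypothetical, and the block's shape (`model` set to `azure-speech`, plus a `phrase_list` key) is an assumption to verify in the customization guide.

          ```python
          MAX_PHRASES = 500  # This FAQ recommends keeping the list under 500 entries.


          def build_transcription_settings(phrases):
              """Trim and deduplicate phrases, enforce the size guidance, and build a settings block.

              The returned keys ("model", "phrase_list") and the "azure-speech" value are
              assumptions; verify them in the customization guide.
              """
              seen = set()
              cleaned = []
              for phrase in phrases:
                  text = phrase.strip()
                  key = text.casefold()  # Treat "Contoso" and "contoso" as duplicates.
                  if text and key not in seen:
                      seen.add(key)
                      cleaned.append(text)
              if len(cleaned) >= MAX_PHRASES:
                  raise ValueError(f"{len(cleaned)} phrases; keep the list under {MAX_PHRASES}")
              return {"model": "azure-speech", "phrase_list": cleaned}


          print(build_transcription_settings(["Contoso", " contoso ", "Fabrikam"]))
          ```

          Case-insensitive deduplication is a design choice in this sketch; drop the `casefold()` call if casing matters for your terms.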
      - question: |
          Are there other ways to improve speech input recognition accuracy?
        answer: |
          Yes. You can use Azure AI Custom Speech models and configure multiple custom models, one for each language. For more information, see [How to customize Voice live](./voice-live-how-to-customize.md).
  - name: Speech Output
    questions:
      - question: |
          What voices does Voice live support?
        answer: |
          Voice live supports:

          - Native audio output from your preferred model
          - Azure AI Speech text to speech voices: more than 600 voices across more than 150 locales, including more than 30 Neural HD voices
          - Custom voice models created through Professional Voice Fine-tuning

          For more information, see [Voice live API supported languages](./voice-live-language-support.md?tabs=speechoutput).
      - question: |
          How do I pick a voice?
        answer: |
          Browse the [Voice Gallery](https://speech.microsoft.com/portal/voicegallery) in the Azure AI Foundry Speech Playground. Consider each voice's gender, age, capabilities, style, and personality.
      - question: |
          What is voice temperature?
        answer: |
          Voice temperature controls how expressive the voice sounds. Higher values produce more dynamic, emotive speech; lower values produce more neutral speech. This setting applies to Neural HD voices.
      - question: |
          What is speaking rate?
        answer: |
          Speaking rate controls how fast the agent speaks.
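
          Voice temperature and speaking rate are both voice settings. As a sketch, the following Python helper builds a voice settings block that includes only the options you set. The helper is hypothetical, and the field names (`name`, `type`, `temperature`, `rate`), the `azure-standard` type, passing the rate as a string, and the example HD voice name are assumptions to verify in the Voice live API reference.

          ```python
          from typing import Optional


          def build_voice_config(name: str, voice_type: str = "azure-standard",
                                 temperature: Optional[float] = None,
                                 rate: Optional[str] = None) -> dict:
              """Build a voice settings block, including only the options you set.

              Field names, the "azure-standard" type, and passing rate as a string are
              assumptions; verify them in the Voice live API reference.
              """
              config = {"name": name, "type": voice_type}
              if temperature is not None:
                  config["temperature"] = temperature  # Per this FAQ, applies to Neural HD voices.
              if rate is not None:
                  config["rate"] = rate
              return config


          # Example: an HD voice with moderate expressiveness and a slightly faster rate.
          print(build_voice_config("en-US-Ava:DragonHDLatestNeural", temperature=0.8, rate="1.1"))
          ```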
      - question: |
          What is a custom lexicon?
        answer: |
          A custom lexicon defines pronunciation rules for specific words, such as brand names or acronyms. For more information, see [How to customize Voice live](./voice-live-how-to-customize.md#speech-output-customization).
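
          Azure AI Speech custom lexicons use the W3C Pronunciation Lexicon Specification (PLS) XML format. As a minimal sketch, the following Python function generates a lexicon that maps each word to a spoken alias; `build_lexicon` is a hypothetical helper. How to host the file and reference it from a Voice live session is covered in the customization guide, and this sketch doesn't assume the setting name.

          ```python
          import xml.etree.ElementTree as ET

          PLS_NAMESPACE = "http://www.w3.org/2005/01/pronunciation-lexicon"


          def build_lexicon(aliases: dict[str, str], lang: str = "en-US") -> str:
              """Return a PLS lexicon that tells the voice to say each word as its alias."""
              root = ET.Element("lexicon", {
                  "version": "1.0",
                  "xmlns": PLS_NAMESPACE,
                  "alphabet": "ipa",
                  "xml:lang": lang,
              })
              for word, alias in aliases.items():
                  lexeme = ET.SubElement(root, "lexeme")
                  ET.SubElement(lexeme, "grapheme").text = word  # The written form to match.
                  ET.SubElement(lexeme, "alias").text = alias  # What the voice says instead.
              return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(root, encoding="unicode")


          print(build_lexicon({"BTW": "By the way", "AOAI": "Azure OpenAI"}))
          ```

          For words that need a specific pronunciation rather than a substitute spelling, PLS also supports a `phoneme` element in place of `alias`.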
      - question: |
          What is Custom Voice?
        answer: |
          Custom Voice lets you create brand-specific synthetic voices from your own audio data. For more information, see [How to customize Voice live](./voice-live-how-to-customize.md#azure-custom-voices).
      - question: |
          What is Avatar support?
        answer: |
          Avatar support pairs speech output with a visual avatar to create multimodal experiences.
      - question: |
          What is Custom Avatar?
        answer: |
          Custom Avatar is a photorealistic digital human that speaks with Azure AI text to speech. It's built from video recordings of a specific actor and tailored to that actor's appearance and voice.
  - name: Conversational Enhancements
    questions:
      - question: |
          What is the difference between Azure Semantic VAD and Basic Server VAD?
        answer: |
          Azure Semantic VAD is more robust to noise and detects utterance boundaries more accurately than Basic Server VAD.
      - question: |
          What is EOU (End of Utterance) detection?
        answer: |
          End-of-utterance (EOU) detection uses conversational context to determine whether the user has finished speaking or has only paused.
      - question: |
          How does noise suppression work?
        answer: |
          Noise suppression filters background noise out of the user's input audio, so that speech recognition and turn detection work with cleaner audio.
      - question: |
          How does echo cancellation work?
        answer: |
          Echo cancellation removes the echo of the agent's own voice that the user's microphone picks up, so the agent's output isn't mistaken for user speech.
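
          In the Voice live API, these features are typically configured through session settings. As a sketch, the following Python function builds a `session.update` message that enables Azure Semantic VAD with end-of-utterance detection, noise suppression, and echo cancellation. The field names and type values (`azure_semantic_vad`, `semantic_detection_v1`, `azure_deep_noise_suppression`, `server_echo_cancellation`) are assumptions to verify in the Voice live API reference, which also documents tuning parameters, such as thresholds, that this sketch leaves out.

          ```python
          import json


          def build_audio_session_update() -> str:
              """Build a session.update message for VAD, EOU, noise suppression, and echo cancellation.

              All field names and type values are assumptions; verify them in the API reference.
              """
              message = {
                  "type": "session.update",
                  "session": {
                      "turn_detection": {
                          "type": "azure_semantic_vad",
                          # Assumed nesting: EOU settings live inside turn_detection.
                          "end_of_utterance_detection": {"model": "semantic_detection_v1"},
                      },
                      "input_audio_noise_reduction": {"type": "azure_deep_noise_suppression"},
                      "input_audio_echo_cancellation": {"type": "server_echo_cancellation"},
                  },
              }
              return json.dumps(message)
          ```

          Send the returned string over the open WebSocket connection before the conversation starts.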
  - name: Function Calling
    questions:
      - question: |
          Does Voice live support function calling?
        answer: |
          Yes. Voice live supports function calling, including asynchronous function calling.
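
          As a sketch of the round trip, the following Python code defines one tool and turns a function-call event into the client events that return the result. It assumes the Realtime-style protocol: a tool schema in `session.update`, a `response.function_call_arguments.done` event carrying `name`, `call_id`, and JSON-encoded `arguments`, and a `function_call_output` item followed by `response.create`. Verify these names in the API reference. `get_order_status` is a hypothetical tool, and the sketch runs it synchronously; it doesn't show the asynchronous pattern mentioned above.

          ```python
          import json


          # Hypothetical local tool; replace it with your own business logic.
          def get_order_status(order_id: str) -> dict:
              return {"order_id": order_id, "status": "shipped"}


          TOOLS = {"get_order_status": get_order_status}

          # Tool definition to include in session.update (assumed Realtime-style schema).
          SESSION_TOOLS = [{
              "type": "function",
              "name": "get_order_status",
              "description": "Look up the status of a customer's order.",
              "parameters": {
                  "type": "object",
                  "properties": {"order_id": {"type": "string"}},
                  "required": ["order_id"],
              },
          }]


          def handle_function_call(event: dict) -> list:
              """Run the tool named in a function-call event and return the client events to send back."""
              args = json.loads(event["arguments"])  # Arguments arrive as a JSON string.
              result = TOOLS[event["name"]](**args)
              return [
                  # Add the tool result to the conversation, linked by call_id.
                  {"type": "conversation.item.create",
                   "item": {"type": "function_call_output",
                            "call_id": event["call_id"],
                            "output": json.dumps(result)}},
                  # Ask the model to continue its response using that result.
                  {"type": "response.create"},
              ]
          ```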
      - question: |
          Is there Model Context Protocol (MCP) support?
        answer: |
          No. MCP isn't currently supported.
  - name: Pricing
    questions:
      - question: |
          Where is the pricing listed?
        answer: |
          See the pricing section of the [Voice live overview](./voice-live.md#pricing).
      - question: |
          How do I estimate the cost based on my use case?
        answer: |
          Tokens are the billing unit, so estimate your token usage from the number of audio minutes you expect, and then apply the per-token prices. For more information, see [pricing](./voice-live.md#pricing) and [token usage and cost estimation](./voice-live.md#token-usage-and-cost-estimation).
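
          Once you know how many tokens an audio minute produces in each billed category and the price of those tokens, the estimate is straightforward multiplication. The following Python sketch applies it per category; the category names and every rate in it are placeholders, not published prices, so take real values from the pricing page and your own measurements.

          ```python
          def estimate_cost(audio_minutes: float, rates: dict[str, tuple[float, float]]) -> float:
              """Estimate cost for a number of audio minutes.

              rates maps a token category (for example, input audio or output audio) to a
              (tokens per audio minute, price per 1M tokens) pair.
              """
              total = 0.0
              for tokens_per_minute, price_per_million in rates.values():
                  tokens = audio_minutes * tokens_per_minute
                  total += tokens / 1_000_000 * price_per_million
              return total


          # Placeholder rates for illustration only; they are NOT real prices.
          placeholder_rates = {
              "input_audio": (1_000.0, 10.0),
              "output_audio": (2_000.0, 20.0),
          }
          print(estimate_cost(10_000, placeholder_rates))  # prints 500.0
          ```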
      - question: |
          Are there separate quota and throttling limits for Voice live?
        answer: |
          Yes. Quota applies specifically to the Voice live API. The default is 100,000 tokens per minute.
  - name: Additional
    questions:
      - question: |
          Does this service provide an SDK?
        answer: |
          Yes. SDKs are available for Python and C#. For more information, see [Voice live overview](./voice-live.md).
      - question: |
          Does this service include content filtering?
        answer: |
          Yes, content filtering is included.
      - question: |
          Can I modify or disable the content filtering in the Voice live API?
        answer: |
          No. If you need custom content filtering, you can use the bring-your-own-model (PREVIEW) feature.
      - question: |
          Does the Voice live API support WebRTC?
        answer: |
          No. WebRTC isn't currently supported.
      - question: |
          Is SIP supported?
        answer: |
          No. SIP isn't currently supported.

additionalContent: |

  ## Next steps

  - Learn more about [how to use the Voice live API](./voice-live-how-to.md)
  - See the [Voice live API reference](./voice-live-api-reference.md)
  - Read [What's new](releasenotes.md)