Merge pull request #5811 from eric-urban/eur/voice-live

prmerger-automator[bot] · web-flow · commit 5bdb549bb55f · 2025-07-01T14:19:17.000Z
voice live fast follow 2
diff --git a/articles/ai-services/speech-service/voice-live.md b/articles/ai-services/speech-service/voice-live.md
@@ -176,9 +176,12 @@ You're charged separately for the training and model hosting of:
 
 Tokens are the units that generative AI models use to process input and generate output. 
 
-You can estimate the cost with Voice Live API based on audio length as follows:
-- Each second of input audio is approximately 10 tokens.
-- Each second of output audio is approximately 20 tokens.
+You can estimate token usage for different model families with the Voice Live API based on audio length. The following token calculations apply to each model family:
+
+| Model family | Input audio (tokens per second) | Output audio (tokens per second) |
+| ----- | ----- | ----- |
+| Azure OpenAI models | ~10 tokens | ~20 tokens |
+| Phi models | ~12.5 tokens | ~20 tokens |
 
 You're also charged for cached audio and text inputs, including the prompt and the context of the conversations.
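
As a rough illustration of the per-second rates in the table added by this change, the following Python sketch estimates audio token counts from audio length. The dictionary keys, the function name `estimate_audio_tokens`, and the returned structure are hypothetical and are not part of the Voice Live API. The rates are the approximate values from the table, so the results are estimates only, and they don't include the cached audio and text inputs that are also charged.

```python
# Approximate tokens per second of audio, copied from the table in this diff.
# The keys ("azure_openai", "phi") are illustrative labels, not API identifiers.
TOKENS_PER_SECOND = {
    "azure_openai": {"input": 10.0, "output": 20.0},
    "phi": {"input": 12.5, "output": 20.0},
}


def estimate_audio_tokens(model_family: str, input_seconds: float, output_seconds: float) -> dict:
    """Estimate audio token usage for one model family.

    Returns a dict with approximate input, output, and total token counts.
    Cached audio and text inputs are not included.
    """
    if input_seconds < 0 or output_seconds < 0:
        raise ValueError("Audio durations must be non-negative.")
    rates = TOKENS_PER_SECOND[model_family]  # raises KeyError for unknown families
    input_tokens = rates["input"] * input_seconds
    output_tokens = rates["output"] * output_seconds
    return {
        "input": input_tokens,
        "output": output_tokens,
        "total": input_tokens + output_tokens,
    }


if __name__ == "__main__":
    # Example: 60 seconds of user speech and 30 seconds of generated speech.
    for family in TOKENS_PER_SECOND:
        print(family, estimate_audio_tokens(family, 60, 30))
```

To turn these counts into a cost estimate, multiply each count by the current per-token price for the corresponding model, which this sketch deliberately leaves out.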