missed models

Yulin Li · Yulin Li · commit cbdc84b43a6b · 2025-09-27T01:02:10.000+08:00
diff --git a/articles/ai-services/speech-service/regions.md b/articles/ai-services/speech-service/regions.md
@@ -174,20 +174,20 @@ The regions in these tables support most of the core features of the Speech serv
 
 # [Voice live](#tab/voice-live)
 
-| **Region** | **gpt-realtime** | **gpt-4o-mini-realtime** (Preview) | **gpt-4o** | **gpt-4o-mini**  | **gpt-4.1** | **gpt-4.1-mini** | **gpt-5** (Preview) | **gpt-5-mini** (Preview) | **gpt-5-nano** (Preview) | **phi4-mm-realtime** (Preview) | **phi4-mini** (Preview) | 
-|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
-| centralindia       | Cross-region<sup>1</sup> | Cross-region<sup>1</sup> | Global standard | Global standard | Global standard | Global standard | - | - | - | - | - |
-| eastus2       | Global standard | Global standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Regional | Regional |
-| southeastasia       | - | - | - | - | Global standard | Global standard | - | - | - | Regional | Regional |
-| swedencentral       | Global standard | Global standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Regional | Regional |
-| westus2       | Cross-region<sup>2</sup> | Cross-region<sup>2</sup> | Data zone standard | Data zone standard | Data zone standard | Data zone standard | - | - | - | Regional | Regional |
-|australiaeast| - | - | Global standard | Global standard | Global standard | Global standard | - | - | - | - | - |
-|japaneast| - | - | Global standard | Global standard | Global standard | Global standard | - | - | - | Regional | Regional |
-|eastus| - | - | Data zone standard | Data zone standard | Data zone standard | Data zone standard | - | - | - | - | - |
-|uksouth| - | - | Global standard | Global standard | Global standard | Global standard | - | - | - | - | - |
-|westeurope| - | - | Data zone standard | Data zone standard | Data zone standard | Data zone standard | - | - | - | - | - |
-
-<sup>1</sup> The Azure AI Foundry resource must be in Central India. Azure AI Speech features remain in Central India. The voice live API uses Sweden Central as needed for generative AI load balancing.  
+| **Region** | **gpt-realtime** | **gpt-4o-realtime-preview** (Preview) | **gpt-4o-mini-realtime-preview** (Preview) | **gpt-4o** | **gpt-4o-mini**  | **gpt-4.1** | **gpt-4.1-mini** | **gpt-5** (Preview) | **gpt-5-mini** (Preview) | **gpt-5-nano** (Preview) | **gpt-5-chat** (Preview) | **phi4-mm-realtime** (Preview) | **phi4-mini** (Preview) |
+|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
+| centralindia       | Cross-region<sup>1</sup> | Cross-region<sup>1</sup> | Cross-region<sup>1</sup> | Global standard | Global standard | Global standard | Global standard | - | - | - | - | - | - |
+| eastus2       | Global standard | Global standard | Global standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Global standard | Regional | Regional |
+| southeastasia       | - | - | - | - | - | Global standard | Global standard | - | - | - | - | Regional | Regional |
+| swedencentral       | Global standard | Global standard | Global standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Global standard | Regional | Regional |
+| westus2       | Cross-region<sup>2</sup> | Cross-region<sup>2</sup> | Cross-region<sup>2</sup> | Data zone standard | Data zone standard | Data zone standard | Data zone standard | - | - | - | - | Regional | Regional |
+|australiaeast| - | - | - | Global standard | Global standard | Global standard | Global standard | - | - | - | - | - | - |
+|japaneast| - | - | - | Global standard | Global standard | Global standard | Global standard | - | - | - | - | Regional | Regional |
+|eastus| - | - | - | Data zone standard | Data zone standard | Data zone standard | Data zone standard | - | - | - | - | - | - |
+|uksouth| - | - | - | Global standard | Global standard | Global standard | Global standard | - | - | - | - | - | - |
+|westeurope| - | - | - | Data zone standard | Data zone standard | Data zone standard | Data zone standard | - | - | - | - | - | - |
+
+<sup>1</sup> The Azure AI Foundry resource must be in Central India. Azure AI Speech features remain in Central India. The voice live API uses Sweden Central as needed for generative AI load balancing.
 
 <sup>2</sup> The Azure AI Foundry resource must be in West US 2. Azure AI Speech features remain in West US 2. The voice live API uses East US 2 as needed for generative AI load balancing.
 
@@ -267,7 +267,7 @@ The regions in these tables support most of the core features of the Speech serv
 
 # [Scenarios](#tab/scenarios)
 
-| **Region** | **Pronunciation assessment** | **Speaker recognition** | **Voice assistants** | 
+| **Region** | **Pronunciation assessment** | **Speaker recognition** | **Voice assistants** |
 |-----|-----|-----|
 | australiaeast      | ✅ | ✅ |  |
 | brazilsouth        | ✅ |  |  |
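Footnotes 1 and 2 in the updated voice live table describe cross-region routing for Central India and West US 2. The Python below is a minimal sketch, assuming you want to check availability in code rather than read the table. It encodes only the columns this change adds or renames (the three realtime models and `gpt-5-chat`) plus the two footnotes. `AVAILABILITY`, `LOAD_BALANCING_REGION`, and `voice_live_deployment` are hypothetical names, not part of any Azure SDK, and the data is a hand copy that can drift from the published table.

```python
from typing import Optional, Tuple

# Hypothetical lookup table (not an Azure API), hand-copied from the updated
# voice live regions table. Regions shown as "-" in the table are omitted.
_REALTIME_REGIONS = {
    "centralindia": "Cross-region",
    "eastus2": "Global standard",
    "swedencentral": "Global standard",
    "westus2": "Cross-region",
}

AVAILABILITY = {
    "gpt-realtime": _REALTIME_REGIONS,
    "gpt-4o-realtime-preview": _REALTIME_REGIONS,
    "gpt-4o-mini-realtime-preview": _REALTIME_REGIONS,
    "gpt-5-chat": {"eastus2": "Global standard", "swedencentral": "Global standard"},
}

# Footnotes 1 and 2: the Foundry resource and Speech features stay in the
# listed region; generative AI load balancing may use this other region.
LOAD_BALANCING_REGION = {
    "centralindia": "swedencentral",
    "westus2": "eastus2",
}


def voice_live_deployment(model: str, region: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (deployment type, load-balancing region or None), or None if not offered."""
    regions = AVAILABILITY.get(model)
    if regions is None:
        raise KeyError(f"model not covered by this sketch: {model}")
    kind = regions.get(region)
    if kind is None:
        return None
    balancer = LOAD_BALANCING_REGION.get(region) if kind == "Cross-region" else None
    return kind, balancer


print(voice_live_deployment("gpt-4o-realtime-preview", "centralindia"))
# ('Cross-region', 'swedencentral')
```

Sharing one region map across the three realtime models mirrors the table, where their columns are identical row for row.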
diff --git a/articles/ai-services/speech-service/voice-live-how-to.md b/articles/ai-services/speech-service/voice-live-how-to.md
@@ -33,7 +33,7 @@ An [Azure AI Foundry resource](../multi-service-resource.md) is required to acce
 The WebSocket endpoint for the voice live API is `wss://<your-ai-foundry-resource-name>.services.ai.azure.com/voice-live/realtime?api-version=2025-05-01-preview` or, for older resources, `wss://<your-ai-foundry-resource-name>.cognitiveservices.azure.com/voice-live/realtime?api-version=2025-05-01-preview`.
 The endpoint is the same for all models. The only difference is the required `model` query parameter, or, when using the Agent service, the `agent_id` and `project_id` parameters.
 
-For example, an endpoint for a resource with a custom domain would be `wss://<your-ai-foundry-resource-name>.services.ai.azure.com/voice-live/realtime?api-version=2025-05-01-preview&model=gpt-4o-mini-realtime`
+For example, an endpoint for a resource with a custom domain would be `wss://<your-ai-foundry-resource-name>.services.ai.azure.com/voice-live/realtime?api-version=2025-05-01-preview&model=gpt-realtime`
 
 ### Credentials
 
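The how-to text above says the voice live endpoint is the same for every model and differs only in its query parameters (`model`, or `agent_id` and `project_id` for the Agent service). A minimal sketch of building that URL, assuming Python's standard `urllib.parse.urlencode` for the query string. `voice_live_url` and its keyword arguments are hypothetical, and the function only builds the URL; authentication, covered under Credentials, is out of scope.

```python
from typing import Optional
from urllib.parse import urlencode

API_VERSION = "2025-05-01-preview"  # version shown in the how-to endpoint


def voice_live_url(
    resource: str,
    model: Optional[str] = None,
    *,
    agent_id: Optional[str] = None,
    project_id: Optional[str] = None,
    legacy_domain: bool = False,
) -> str:
    """Build the voice live WebSocket URL (hypothetical helper, not an SDK call)."""
    # Older resources use the cognitiveservices.azure.com domain.
    host = "cognitiveservices.azure.com" if legacy_domain else "services.ai.azure.com"
    params = {"api-version": API_VERSION}
    if model is not None:
        if agent_id is not None or project_id is not None:
            raise ValueError("pass either model or agent_id/project_id, not both")
        params["model"] = model
    elif agent_id is not None and project_id is not None:
        # Agent service: agent_id and project_id replace the model parameter.
        params["agent_id"] = agent_id
        params["project_id"] = project_id
    else:
        raise ValueError("model, or both agent_id and project_id, is required")
    return f"wss://{resource}.{host}/voice-live/realtime?{urlencode(params)}"


print(voice_live_url("my-resource", "gpt-realtime"))
# wss://my-resource.services.ai.azure.com/voice-live/realtime?api-version=2025-05-01-preview&model=gpt-realtime
```

`my-resource` is a placeholder resource name. Raising on ambiguous input keeps a caller from sending both a model and an agent by accident.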
diff --git a/articles/ai-services/speech-service/voice-live-language-support.md b/articles/ai-services/speech-service/voice-live-language-support.md
@@ -22,7 +22,7 @@ The voice live API supports multiple languages and configuration options. In thi
 
 ## [Speech input](#tab/speechinput)
 
-Depending on which model is being used voice live speech input is processed either by one of the multimodal models (for example, `gpt-realtime`, `gpt-4o-mini-realtime`, and `phi4-mm-realtime`) or by `azure speech to text` models.
+Depending on which model is used, voice live speech input is processed either by one of the multimodal models (for example, `gpt-realtime`, `gpt-4o-realtime-preview`, `gpt-4o-mini-realtime-preview`, and `phi4-mm-realtime`) or by Azure speech to text models.
 
 ### Azure speech to text supported languages
 
@@ -78,11 +78,11 @@ To configure a single or multiple languages not supported by the multimodal mode
 }
 ```
 
-### gpt-realtime and gpt-4o-mini-realtime supported languages
+### gpt-realtime, gpt-4o-realtime-preview, and gpt-4o-mini-realtime-preview supported languages
 
 While the underlying model was trained on 98 languages, OpenAI only lists the languages that exceeded <50% word error rate (WER) which is an industry standard benchmark for speech to text model accuracy. The model returns results for languages not listed but the quality will be low.
 
-The following languages are supported by `gpt-realtime` and `gpt-4o-mini-realtime`:
+The following languages are supported by `gpt-realtime`, `gpt-4o-realtime-preview`, and `gpt-4o-mini-realtime-preview`:
 - Afrikaans
 - Arabic
 - Armenian
@@ -175,7 +175,7 @@ Multimodal models don't require a language configuration for the general process
 
 ## [Speech output](#tab/speechoutput)
 
-Depending on which model is being used voice live speech output is processed either by one of the multimodal OpenAI voices integrated into `gpt-realtime` and `gpt-4o-mini-realtime` or by `azure text to speech` voices.
+Depending on which model is used, voice live speech output is processed either by one of the multimodal OpenAI voices integrated into `gpt-realtime`, `gpt-4o-realtime-preview`, and `gpt-4o-mini-realtime-preview` or by Azure text to speech voices.
 
 ### Azure text to speech supported languages
 
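The speech input and output paragraphs above, together with the model table and pricing categories in the voice-live.md diff, split models by how audio is handled and by pricing tier. The Python below is a sketch of that split as a lookup. `CATALOG`, `VoiceLiveModel`, and the helper functions are hypothetical names, and the sketch assumes `gpt-4o-realtime-preview` is billed at the pro rate, since the pricing table lists it alongside `gpt-realtime`.

```python
from typing import NamedTuple, Tuple


class VoiceLiveModel(NamedTuple):
    native_audio_input: bool  # True: the multimodal model processes audio itself
    openai_voices: bool       # True: integrated OpenAI voices available for output
    pricing_tier: str         # "pro", "basic", or "lite"


# Hand copy of the voice live model table and pricing categories (illustrative only).
CATALOG = {
    "gpt-realtime": VoiceLiveModel(True, True, "pro"),
    "gpt-4o-realtime-preview": VoiceLiveModel(True, True, "pro"),
    "gpt-4o-mini-realtime-preview": VoiceLiveModel(True, True, "basic"),
    "gpt-4o": VoiceLiveModel(False, False, "pro"),
    "gpt-4o-mini": VoiceLiveModel(False, False, "basic"),
    "gpt-4.1": VoiceLiveModel(False, False, "pro"),
    "gpt-4.1-mini": VoiceLiveModel(False, False, "basic"),
    "gpt-5": VoiceLiveModel(False, False, "pro"),
    "gpt-5-mini": VoiceLiveModel(False, False, "basic"),
    "gpt-5-nano": VoiceLiveModel(False, False, "lite"),
    "gpt-5-chat": VoiceLiveModel(False, False, "pro"),
    "phi4-mm-realtime": VoiceLiveModel(True, False, "lite"),
    "phi4-mini": VoiceLiveModel(False, False, "lite"),
}


def speech_input_path(model: str) -> str:
    """Which component transcribes incoming audio for this model."""
    return "multimodal model" if CATALOG[model].native_audio_input else "Azure speech to text"


def speech_output_options(model: str) -> Tuple[str, ...]:
    """Voice families available for output; all models can use Azure text to speech."""
    if CATALOG[model].openai_voices:
        return ("OpenAI voice", "Azure text to speech")
    return ("Azure text to speech",)


print(speech_input_path("gpt-5-chat"), CATALOG["gpt-5-chat"].pricing_tier)
```

Keeping input path, output options, and tier in one record makes it harder for the three lists to fall out of sync, which is the kind of drift this commit corrects.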
diff --git a/articles/ai-services/speech-service/voice-live.md b/articles/ai-services/speech-service/voice-live.md
@@ -59,7 +59,7 @@ The voice live API is fully managed, eliminating the need for customers to handl
 
 The voice live API is designed for compatibility with the Azure OpenAI Realtime API. The supported real-time events are mostly in parity with the [Azure OpenAI Realtime API events](/azure/ai-foundry/openai/realtime-audio-reference?context=/azure/ai-services/speech-service/context/context), with some exceptions as described in the [voice live API how to guide](./voice-live-how-to.md).
 
-Features that are unique to the voice live API are designed to be optional and additive. You can add Azure AI Speech capabilities such as noise suppression, echo cancellation, and advanced end-of-turn detection to your existing applications without needing to change your existing architecture. 
+Features that are unique to the voice live API are designed to be optional and additive. You can add Azure AI Speech capabilities such as noise suppression, echo cancellation, and advanced end-of-turn detection to your existing applications without needing to change your existing architecture.
 
 The API is supported through WebSocket events, allowing for an easy server-to-server integration. Your backend or middle-tier service connects to the voice live API via WebSockets. You can use the WebSocket messages directly to interact with the API.
 
@@ -74,14 +74,16 @@ The voice live API supports the following models. For supported regions, see the
 | Model | Description |
 | ------------------------------ | ----------- |
 | `gpt-realtime`      | GPT real-time + option to use Azure text to speech voices including custom voice for audio. |
-| `gpt-4o-mini-realtime` | GPT-4o mini real-time + option to use Azure text to speech voices including custom voice for audio. |
+| `gpt-4o-realtime-preview` | GPT-4o real-time preview + option to use Azure text to speech voices including custom voice for audio. |
+| `gpt-4o-mini-realtime-preview` | GPT-4o mini real-time preview + option to use Azure text to speech voices including custom voice for audio. |
 | `gpt-4o` | GPT-4o + audio input through Azure speech to text + audio output through Azure text to speech voices including custom voice. |
 | `gpt-4o-mini` | GPT-4o mini + audio input through Azure speech to text + audio output through Azure text to speech voices including custom voice. |
 | `gpt-4.1` | GPT-4.1 + audio input through Azure speech to text + audio output through Azure text to speech voices including custom voice. |
 | `gpt-4.1-mini` | GPT-4.1 mini + audio input through Azure speech to text + audio output through Azure text to speech voices including custom voice. |
 | `gpt-5` | GPT-5 + audio input through Azure speech to text + audio output through Azure text to speech voices including custom voice. |
 | `gpt-5-mini` | GPT-5 mini + audio input through Azure speech to text + audio output through Azure text to speech voices including custom voice. |
 | `gpt-5-nano` | GPT-5 nano + audio input through Azure speech to text + audio output through Azure text to speech voices including custom voice. |
+| `gpt-5-chat` | GPT-5 chat + audio input through Azure speech to text + audio output through Azure text to speech voices including custom voice. |
 | `phi4-mm-realtime` | Phi4-mm + audio output through Azure text to speech voices including custom voice. |
 | `phi4-mini` | Phi4-mm + audio input through Azure speech to text + audio output through Azure text to speech voices including custom voice. |
 
@@ -103,35 +105,35 @@ To meet your requirements, you can either build your own solution or use the voi
 
 ## Pricing
 
-Pricing for the voice live API is in effect from July 1, 2025. 
+Pricing for the voice live API is in effect from July 1, 2025.
 
 Pricing for the voice live API is tiered (**Pro**, **Basic**, and **Lite**) based on the generative AI model used.
 
 You don't select a tier. You choose a generative AI model and the corresponding pricing applies.
 
 | Pricing category | Models |
 | ----- | ------ |
-| Voice live pro | `gpt-realtime`, `gpt-4o`, `gpt-4.1`, `gpt-5` |
-| Voice live basic | `gpt-4o-mini-realtime`, `gpt-4o-mini`, `gpt-4.1-mini`, `gpt-5-mini` |
+| Voice live pro | `gpt-realtime`, `gpt-4o-realtime-preview`, `gpt-4o`, `gpt-4.1`, `gpt-5`, `gpt-5-chat` |
+| Voice live basic | `gpt-4o-mini-realtime-preview`, `gpt-4o-mini`, `gpt-4.1-mini`, `gpt-5-mini` |
 | Voice live lite | `gpt-5-nano`,`phi4-mm-realtime`, `phi4-mini` |
 
 If you choose to use custom voice for your speech output, you're charged separately for custom voice model training and hosting. Refer to the [Text to Speech – Custom Voice – Professional](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services) pricing for details. Custom voice is a limited access feature. [Learn more about how to create custom voices.](https://aka.ms/CNVPro)
 
-Avatars are charged separately with [the interactive avatar pricing published here.](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services)  
+Avatars are charged separately with [the interactive avatar pricing published here.](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services)
 
-For more details regarding custom voice and avatar training charges, [refer to this pricing note.](/azure/ai-services/speech-service/text-to-speech#model-training-and-hosting-time-for-custom-voice) 
+For more details regarding custom voice and avatar training charges, [refer to this pricing note.](/azure/ai-services/speech-service/text-to-speech#model-training-and-hosting-time-for-custom-voice)
 
 ### Example pricing scenarios
 
 Here are some example pricing scenarios to help you understand how the voice live API is charged:
 
 #### Scenario 1
 
-A customer service agent built with standard Azure AI Speech input, GPT-4.1, custom Azure AI Speech output, and a custom avatar.  
+A customer service agent built with standard Azure AI Speech input, GPT-4.1, custom Azure AI Speech output, and a custom avatar.
 
 You're charged at the voice live pro rate for:
 - Text
-- Audio with Azure AI Speech - Standard 
+- Audio with Azure AI Speech - Standard
 - Audio with Azure AI Speech - Custom
 
 You're charged separately for the training and model hosting of:
@@ -140,7 +142,7 @@ You're charged separately for the training and model hosting of:
 
 #### Scenario 2
 
-A learning agent built with `gpt-realtime` native audio input and standard Azure AI Speech output. 
+A learning agent built with `gpt-realtime` native audio input and standard Azure AI Speech output.
 
 You're charged at the voice live pro rate for:
 - Text
@@ -149,19 +151,19 @@ You're charged at the voice live pro rate for:
 
 #### Scenario 3
 
-A talent interview agent built with `gpt-4o-mini-realtime` native audio input, and standard Azure AI Speech output and standard avatar. 
+A talent interview agent built with `gpt-4o-mini-realtime-preview` native audio input, standard Azure AI Speech output, and a standard avatar.
 
 You're charged at the voice live basic rate for:
 - Text
-- Native audio with `gpt-4o-mini-realtime`
+- Native audio with `gpt-4o-mini-realtime-preview`
 - Audio with Azure AI Speech - Standard
 
 You're charged separately for:
 - Text to speech avatar (standard)
 
 #### Scenario 4
 
-An in-car assistant built with `phi4-mm-realtime` and Azure custom voice.  
+An in-car assistant built with `phi4-mm-realtime` and Azure custom voice.
 
 You're charged at the voice live lite rate for:
 - Text
@@ -184,7 +186,7 @@ You can estimate token usage for different model families with the voice live AP
 | Azure OpenAI models | ~10 tokens | ~20 tokens |
 | Phi models | ~12.5 tokens | ~20 tokens |
 
-You're also charged for cached audio and text inputs, including the prompt and the context of the conversations. 
+You're also charged for cached audio and text inputs, including the prompt and the context of the conversations.
 
 ## Related content