Commit cbdc84b

author
Yulin Li
committed
missed models
1 parent d891946 commit cbdc84b

File tree

4 files changed

+36
-34
lines changed

articles/ai-services/speech-service/regions.md

Lines changed: 15 additions & 15 deletions
Original file line number | Diff line number | Diff line change
@@ -174,20 +174,20 @@ The regions in these tables support most of the core features of the Speech serv
174174

175175
# [Voice live](#tab/voice-live)
176176

177-
| **Region** | **gpt-realtime** | **gpt-4o-mini-realtime** (Preview) | **gpt-4o** | **gpt-4o-mini** | **gpt-4.1** | **gpt-4.1-mini** | **gpt-5** (Preview) | **gpt-5-mini** (Preview) | **gpt-5-nano** (Preview) | **phi4-mm-realtime** (Preview) | **phi4-mini** (Preview) |
178-
|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
179-
| centralindia | Cross-region<sup>1</sup> | Cross-region<sup>1</sup> | Global standard | Global standard | Global standard | Global standard | - | - | - | - | - |
180-
| eastus2 | Global standard | Global standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Regional | Regional |
181-
| southeastasia | - | - | - | - | Global standard | Global standard | - | - | - | Regional | Regional |
182-
| swedencentral | Global standard | Global standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Regional | Regional |
183-
| westus2 | Cross-region<sup>2</sup> | Cross-region<sup>2</sup> | Data zone standard | Data zone standard | Data zone standard | Data zone standard | - | - | - | Regional | Regional |
184-
|australiaeast| - | - | Global standard | Global standard | Global standard | Global standard | - | - | - | - | - |
185-
|japaneast| - | - | Global standard | Global standard | Global standard | Global standard | - | - | - | Regional | Regional |
186-
|eastus| - | - | Data zone standard | Data zone standard | Data zone standard | Data zone standard | - | - | - | - | - |
187-
|uksouth| - | - | Global standard | Global standard | Global standard | Global standard | - | - | - | - | - |
188-
|westeurope| - | - | Data zone standard | Data zone standard | Data zone standard | Data zone standard | - | - | - | - | - |
189-
190-
<sup>1</sup> The Azure AI Foundry resource must be in Central India. Azure AI Speech features remain in Central India. The voice live API uses Sweden Central as needed for generative AI load balancing.
177+
| **Region** | **gpt-realtime** | **gpt-4o-realtime-preview** (Preview) | **gpt-4o-mini-realtime-preview** (Preview) | **gpt-4o** | **gpt-4o-mini** | **gpt-4.1** | **gpt-4.1-mini** | **gpt-5** (Preview) | **gpt-5-mini** (Preview) | **gpt-5-nano** (Preview) | **gpt-5-chat** (Preview) | **phi4-mm-realtime** (Preview) | **phi4-mini** (Preview) |
178+
|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
179+
| centralindia | Cross-region<sup>1</sup> | Cross-region<sup>1</sup> | Cross-region<sup>1</sup> | Global standard | Global standard | Global standard | Global standard | - | - | - | - | - | - |
180+
| eastus2 | Global standard | Global standard | Global standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Global standard | Regional | Regional |
181+
| southeastasia | - | - | - | - | - | Global standard | Global standard | - | - | - | - | Regional | Regional |
182+
| swedencentral | Global standard | Global standard | Global standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Data zone standard | Global standard | Regional | Regional |
183+
| westus2 | Cross-region<sup>2</sup> | Cross-region<sup>2</sup> | Cross-region<sup>2</sup> | Data zone standard | Data zone standard | Data zone standard | Data zone standard | - | - | - | - | Regional | Regional |
184+
|australiaeast| - | - | - | Global standard | Global standard | Global standard | Global standard | - | - | - | - | - | - |
185+
|japaneast| - | - | - | Global standard | Global standard | Global standard | Global standard | - | - | - | - | Regional | Regional |
186+
|eastus| - | - | - | Data zone standard | Data zone standard | Data zone standard | Data zone standard | - | - | - | - | - | - |
187+
|uksouth| - | - | - | Global standard | Global standard | Global standard | Global standard | - | - | - | - | - | - |
188+
|westeurope| - | - | - | Data zone standard | Data zone standard | Data zone standard | Data zone standard | - | - | - | - | - | - |
189+
190+
<sup>1</sup> The Azure AI Foundry resource must be in Central India. Azure AI Speech features remain in Central India. The voice live API uses Sweden Central as needed for generative AI load balancing.
191191

192192
<sup>2</sup> The Azure AI Foundry resource must be in West US 2. Azure AI Speech features remain in West US 2. The voice live API uses East US 2 as needed for generative AI load balancing.
193193

@@ -267,7 +267,7 @@ The regions in these tables support most of the core features of the Speech serv
267267

268268
# [Scenarios](#tab/scenarios)
269269

270-
| **Region** | **Pronunciation assessment** | **Speaker recognition** | **Voice assistants** |
270+
| **Region** | **Pronunciation assessment** | **Speaker recognition** | **Voice assistants** |
271271
|-----|-----|-----|
272272
| australiaeast ||| |
273273
| brazilsouth || | |

articles/ai-services/speech-service/voice-live-how-to.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -33,7 +33,7 @@ An [Azure AI Foundry resource](../multi-service-resource.md) is required to acce
3333
The WebSocket endpoint for the voice live API is `wss://<your-ai-foundry-resource-name>.services.ai.azure.com/voice-live/realtime?api-version=2025-05-01-preview` or, for older resources, `wss://<your-ai-foundry-resource-name>.cognitiveservices.azure.com/voice-live/realtime?api-version=2025-05-01-preview`.
3434
The endpoint is the same for all models. The only difference is the required `model` query parameter, or, when using the Agent service, the `agent_id` and `project_id` parameters.
3535

36-
For example, an endpoint for a resource with a custom domain would be `wss://<your-ai-foundry-resource-name>.services.ai.azure.com/voice-live/realtime?api-version=2025-05-01-preview&model=gpt-4o-mini-realtime`
36+
For example, an endpoint for a resource with a custom domain would be `wss://<your-ai-foundry-resource-name>.services.ai.azure.com/voice-live/realtime?api-version=2025-05-01-preview&model=gpt-realtime`
3737

3838
### Credentials
3939

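The endpoint rule described in the hunk above (one base URL for every model, with only the query parameters changing) can be sketched as follows. This is a minimal illustration, not part of the commit: the function name `voice_live_endpoint` and the `legacy_domain` flag are hypothetical, and the `agent_id` and `project_id` parameter names are taken as written in the changed text.

```python
from urllib.parse import urlencode

# API version string as it appears in the documented endpoint.
API_VERSION = "2025-05-01-preview"


def voice_live_endpoint(resource_name, model=None, agent_id=None,
                        project_id=None, legacy_domain=False):
    """Build a voice live WebSocket URL (hypothetical helper).

    Either `model` or both `agent_id` and `project_id` must be supplied,
    mirroring the rule stated in the how-to text.
    """
    # Older resources use the cognitiveservices.azure.com domain.
    host_suffix = ("cognitiveservices.azure.com" if legacy_domain
                   else "services.ai.azure.com")

    params = {"api-version": API_VERSION}
    if model is not None:
        params["model"] = model
    elif agent_id is not None and project_id is not None:
        # Parameter names assumed from the documentation text.
        params["agent_id"] = agent_id
        params["project_id"] = project_id
    else:
        raise ValueError("Provide model, or both agent_id and project_id")

    return (f"wss://{resource_name}.{host_suffix}"
            f"/voice-live/realtime?{urlencode(params)}")


if __name__ == "__main__":
    print(voice_live_endpoint("my-resource", model="gpt-realtime"))
```

Only the query string differs between models, so switching from `gpt-realtime` to another supported model is a one-argument change.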
articles/ai-services/speech-service/voice-live-language-support.md

Lines changed: 4 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -22,7 +22,7 @@ The voice live API supports multiple languages and configuration options. In thi
2222

2323
## [Speech input](#tab/speechinput)
2424

25-
Depending on which model is being used voice live speech input is processed either by one of the multimodal models (for example, `gpt-realtime`,`gpt-4o-mini-realtime`, and`phi4-mm-realtime`) or by `azure speech to text` models.
25+
Depending on which model is being used voice live speech input is processed either by one of the multimodal models (for example, `gpt-realtime`, `gpt-4o-realtime-preview`, `gpt-4o-mini-realtime-preview`, and `phi4-mm-realtime`) or by `azure speech to text` models.
2626

2727
### Azure speech to text supported languages
2828

@@ -78,11 +78,11 @@ To configure a single or multiple languages not supported by the multimodal mode
7878
}
7979
```
8080

81-
### gpt-realtime and gpt-4o-mini-realtime supported languages
81+
### gpt-realtime, gpt-4o-realtime-preview and gpt-4o-mini-realtime-preview supported languages
8282

8383
While the underlying model was trained on 98 languages, OpenAI only lists the languages that exceeded <50% word error rate (WER) which is an industry standard benchmark for speech to text model accuracy. The model returns results for languages not listed but the quality will be low.
8484

85-
The following languages are supported by `gpt-realtime` and `gpt-4o-mini-realtime`:
85+
The following languages are supported by `gpt-realtime`, `gpt-4o-realtime-preview` and `gpt-4o-mini-realtime-preview`:
8686
- Afrikaans
8787
- Arabic
8888
- Armenian
@@ -175,7 +175,7 @@ Multimodal models don't require a language configuration for the general process
175175

176176
## [Speech output](#tab/speechoutput)
177177

178-
Depending on which model is being used voice live speech output is processed either by one of the multimodal OpenAI voices integrated into `gpt-realtime` and`gpt-4o-mini-realtime` or by `azure text to speech` voices.
178+
Depending on which model is being used voice live speech output is processed either by one of the multimodal OpenAI voices integrated into `gpt-realtime`, `gpt-4o-realtime-preview`, and `gpt-4o-mini-realtime-preview` or by `azure text to speech` voices.
179179

180180
### Azure text to speech supported languages
181181

articles/ai-services/speech-service/voice-live.md

Lines changed: 16 additions & 14 deletions
Original file line number | Diff line number | Diff line change
@@ -59,7 +59,7 @@ The voice live API is fully managed, eliminating the need for customers to handl
5959

6060
The voice live API is designed for compatibility with the Azure OpenAI Realtime API. The supported real-time events are mostly in parity with the [Azure OpenAI Realtime API events](/azure/ai-foundry/openai/realtime-audio-reference?context=/azure/ai-services/speech-service/context/context), with some exceptions as described in the [voice live API how to guide](./voice-live-how-to.md).
6161

62-
Features that are unique to the voice live API are designed to be optional and additive. You can add Azure AI Speech capabilities such as noise suppression, echo cancellation, and advanced end-of-turn detection to your existing applications without needing to change your existing architecture.
62+
Features that are unique to the voice live API are designed to be optional and additive. You can add Azure AI Speech capabilities such as noise suppression, echo cancellation, and advanced end-of-turn detection to your existing applications without needing to change your existing architecture.
6363

6464
The API is supported through WebSocket events, allowing for an easy server-to-server integration. Your backend or middle-tier service connects to the voice live API via WebSockets. You can use the WebSocket messages directly to interact with the API.
6565

@@ -74,14 +74,16 @@ The voice live API supports the following models. For supported regions, see the
7474
| Model | Description |
7575
| ------------------------------ | ----------- |
7676
| `gpt-realtime` | GPT real-time + option to use Azure text to speech voices including custom voice for audio. |
77-
| `gpt-4o-mini-realtime` | GPT-4o mini real-time + option to use Azure text to speech voices including custom voice for audio. |
77+
| `gpt-4o-realtime-preview` | GPT-4o real-time preview + option to use Azure text to speech voices including custom voice for audio. |
78+
| `gpt-4o-mini-realtime-preview` | GPT-4o mini real-time preview + option to use Azure text to speech voices including custom voice for audio. |
7879
| `gpt-4o` | GPT-4o + audio input through Azure speech to text + audio output through Azure text to speech voices including custom voice. |
7980
| `gpt-4o-mini` | GPT-4o mini + audio input through Azure speech to text + audio output through Azure text to speech voices including custom voice. |
8081
| `gpt-4.1` | GPT-4.1 + audio input through Azure speech to text + audio output through Azure text to speech voices including custom voice. |
8182
| `gpt-4.1-mini` | GPT-4.1 mini + audio input through Azure speech to text + audio output through Azure text to speech voices including custom voice. |
8283
| `gpt-5` | GPT-5 + audio input through Azure speech to text + audio output through Azure text to speech voices including custom voice. |
8384
| `gpt-5-mini` | GPT-5 mini + audio input through Azure speech to text + audio output through Azure text to speech voices including custom voice. |
8485
| `gpt-5-nano` | GPT-5 nano + audio input through Azure speech to text + audio output through Azure text to speech voices including custom voice. |
86+
| `gpt-5-chat` | GPT-5 chat + audio input through Azure speech to text + audio output through Azure text to speech voices including custom voice. |
8587
| `phi4-mm-realtime` | Phi4-mm + audio output through Azure text to speech voices including custom voice. |
8688
| `phi4-mini` | Phi4-mm + audio input through Azure speech to text + audio output through Azure text to speech voices including custom voice. |
8789

@@ -103,35 +105,35 @@ To meet your requirements, you can either build your own solution or use the voi
103105

104106
## Pricing
105107

106-
Pricing for the voice live API is in effect from July 1, 2025.
108+
Pricing for the voice live API is in effect from July 1, 2025.
107109

108110
Pricing for the voice live API is tiered (**Pro**, **Basic**, and **Lite**) based on the generative AI model used.
109111

110112
You don't select a tier. You choose a generative AI model and the corresponding pricing applies.
111113

112114
| Pricing category | Models |
113115
| ----- | ------ |
114-
| Voice live pro | `gpt-realtime`, `gpt-4o`, `gpt-4.1`, `gpt-5` |
115-
| Voice live basic | `gpt-4o-mini-realtime`, `gpt-4o-mini`, `gpt-4.1-mini`, `gpt-5-mini` |
116+
| Voice live pro | `gpt-realtime`, `gpt-4o-realtime`, `gpt-4o`, `gpt-4.1`, `gpt-5`, `gpt-5-chat` |
117+
| Voice live basic | `gpt-4o-mini-realtime-preview`, `gpt-4o-mini`, `gpt-4.1-mini`, `gpt-5-mini` |
116118
| Voice live lite | `gpt-5-nano`,`phi4-mm-realtime`, `phi4-mini` |
117119

118120
If you choose to use custom voice for your speech output, you're charged separately for custom voice model training and hosting. Refer to the [Text to Speech – Custom Voice – Professional](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services) pricing for details. Custom voice is a limited access feature. [Learn more about how to create custom voices.](https://aka.ms/CNVPro)
119121

120-
Avatars are charged separately with [the interactive avatar pricing published here.](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services)
122+
Avatars are charged separately with [the interactive avatar pricing published here.](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services)
121123

122-
For more details regarding custom voice and avatar training charges, [refer to this pricing note.](/azure/ai-services/speech-service/text-to-speech#model-training-and-hosting-time-for-custom-voice)
124+
For more details regarding custom voice and avatar training charges, [refer to this pricing note.](/azure/ai-services/speech-service/text-to-speech#model-training-and-hosting-time-for-custom-voice)
123125

124126
### Example pricing scenarios
125127

126128
Here are some example pricing scenarios to help you understand how the voice live API is charged:
127129

128130
#### Scenario 1
129131

130-
A customer service agent built with standard Azure AI Speech input, GPT-4.1, custom Azure AI Speech output, and a custom avatar.
132+
A customer service agent built with standard Azure AI Speech input, GPT-4.1, custom Azure AI Speech output, and a custom avatar.
131133

132134
You're charged at the voice live pro rate for:
133135
- Text
134-
- Audio with Azure AI Speech - Standard
136+
- Audio with Azure AI Speech - Standard
135137
- Audio with Azure AI Speech - Custom
136138

137139
You're charged separately for the training and model hosting of:
@@ -140,7 +142,7 @@ You're charged separately for the training and model hosting of:
140142

141143
#### Scenario 2
142144

143-
A learning agent built with `gpt-realtime` native audio input and standard Azure AI Speech output.
145+
A learning agent built with `gpt-realtime` native audio input and standard Azure AI Speech output.
144146

145147
You're charged at the voice live pro rate for:
146148
- Text
@@ -149,19 +151,19 @@ You're charged at the voice live pro rate for:
149151

150152
#### Scenario 3
151153

152-
A talent interview agent built with `gpt-4o-mini-realtime` native audio input, and standard Azure AI Speech output and standard avatar.
154+
A talent interview agent built with `gpt-4o-mini-realtime-preview` native audio input, and standard Azure AI Speech output and standard avatar.
153155

154156
You're charged at the voice live basic rate for:
155157
- Text
156-
- Native audio with `gpt-4o-mini-realtime`
158+
- Native audio with `gpt-4o-mini-realtime-preview`
157159
- Audio with Azure AI Speech - Standard
158160

159161
You're charged separately for:
160162
- Text to speech avatar (standard)
161163

162164
#### Scenario 4
163165

164-
An in-car assistant built with `phi4-mm-realtime` and Azure custom voice.
166+
An in-car assistant built with `phi4-mm-realtime` and Azure custom voice.
165167

166168
You're charged at the voice live lite rate for:
167169
- Text
@@ -184,7 +186,7 @@ You can estimate token usage for different model families with the voice live AP
184186
| Azure OpenAI models | ~10 tokens | ~20 tokens |
185187
| Phi models | ~12.5 tokens | ~20 tokens |
186188

187-
You're also charged for cached audio and text inputs, including the prompt and the context of the conversations.
189+
You're also charged for cached audio and text inputs, including the prompt and the context of the conversations.
188190

189191
## Related content
190192

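The pricing table changed in voice-live.md states that the tier follows from the chosen model rather than being selected directly. A minimal sketch of that lookup is shown here, using the model lists from the added lines of the table. The names `PRICING_TIERS` and `pricing_tier` are hypothetical, and the sketch keeps `gpt-4o-realtime` exactly as the pricing row lists it, even though the other changed lines in this commit use `gpt-4o-realtime-preview`.

```python
# Tier membership copied from the added rows of the pricing table.
PRICING_TIERS = {
    "pro": [
        "gpt-realtime",
        "gpt-4o-realtime",  # as written in the pricing row of this commit
        "gpt-4o",
        "gpt-4.1",
        "gpt-5",
        "gpt-5-chat",
    ],
    "basic": [
        "gpt-4o-mini-realtime-preview",
        "gpt-4o-mini",
        "gpt-4.1-mini",
        "gpt-5-mini",
    ],
    "lite": ["gpt-5-nano", "phi4-mm-realtime", "phi4-mini"],
}

# Invert the table so each model maps to exactly one tier.
MODEL_TO_TIER = {
    model: tier
    for tier, models in PRICING_TIERS.items()
    for model in models
}


def pricing_tier(model):
    """Return the voice live pricing tier for a model name."""
    try:
        return MODEL_TO_TIER[model]
    except KeyError:
        raise ValueError(f"Unknown voice live model: {model}") from None


if __name__ == "__main__":
    for name in ("gpt-4.1", "gpt-4o-mini-realtime-preview", "phi4-mm-realtime"):
        print(name, "->", pricing_tier(name))
```

The three models in the usage loop correspond to Scenarios 1, 3, and 4 in the pricing examples, which are charged at the pro, basic, and lite rates respectively.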