articles/ai-services/speech-service/voice-live-language-support.md
14 additions & 14 deletions
@@ -18,13 +18,13 @@ ms.custom: languages

## Introduction

-The voice live API supports multiple languages and configuration options. In this document you will which languages are supported by the voice live API and how to configure them.
+The voice live API supports multiple languages and configuration options. In this document, you learn which languages the voice live API supports and how to configure them.

## [Speech input](#tab/speechinput)

-Depending on which model is being used voice live speech input is processed either by one of the multimodal models (e.g.`gpt-4o-realtime-preview`, `gpt-4o-mini-realtime-preview`, and `phi4-mm-realtime`) or by `azure speech to text` models.
+Depending on which model is used, voice live speech input is processed either by one of the multimodal models (for example, `gpt-4o-realtime-preview`, `gpt-4o-mini-realtime-preview`, and `phi4-mm-realtime`) or by Azure speech to text models.

-### azure speech to text supported languages
+### Azure speech to text supported languages

Azure speech to text is used for all configurations where a non-multimodal model is used, and for speech input transcriptions with `phi4-mm-realtime`.
It supports all languages documented on the [Language and voice support for the Speech service - Speech to text](./language-support.md?tabs=stt) tab.
@@ -51,7 +51,7 @@ The current multi-lingual model supports the following languages:
- Spanish (Mexico) [es-MX]
- Spanish (Spain) [es-ES]

-To use **Automatic multilingual configuration using multilingual model** no additional configuration is required. If you do add the `language` string to the session`session.update` message, make sure to leave it empty.
+To use **Automatic multilingual configuration using multilingual model**, no extra configuration is required. If you do add the `language` string to the `session.update` message, make sure to leave it empty.

```json
{
@@ -64,9 +64,9 @@ To use **Automatic multilingual configuration using multilingual model** no addi
```
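
The diff view cuts off the JSON payloads in this article, so the full message isn't visible here. As a rough, hypothetical sketch only: a `session.update` message that leaves `language` empty for automatic multilingual detection might look like the following. The placement of `language` under `input_audio_transcription`, and the `azure-speech` model identifier, are assumptions based on realtime-style session protocols and the notes later in this article, not on the truncated payload itself.

```json
{
  "type": "session.update",
  "session": {
    "input_audio_transcription": {
      "model": "azure-speech",
      "language": ""
    }
  }
}
```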

> [!NOTE]
-> The multilingual model will also generate results for unsupported languages, if no language is defined. In these cases transcription quality will be low. Ensure to configure defined languages, if you are setting up application with languages unsupported by the multilingual model.
+> The multilingual model also generates results for unsupported languages if no language is defined. In these cases, transcription quality is low. Make sure to configure defined languages if you're setting up an application with languages that the multilingual model doesn't support.

-To configure a single or multiple languages not supported by the multimodal model you must add them to the `language` string in the session`session.update` message. A maximum of 10 languages are supported.
+To configure one or more languages not supported by the multilingual model, add them to the `language` string in the `session.update` message. A maximum of 10 languages is supported.

```json
{
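
The explicit configuration payload is likewise truncated by the diff. A hypothetical sketch under the same assumed message shape, with a comma-separated list of locales; the separator format is an assumption, and only the 10-language maximum comes from the text above.

```json
{
  "type": "session.update",
  "session": {
    "input_audio_transcription": {
      "model": "azure-speech",
      "language": "en-US,de-DE,ja-JP"
    }
  }
}
```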
@@ -80,7 +80,7 @@ To configure a single or multiple languages not supported by the multimodal mode

### gpt-4o-realtime-preview and gpt-4o-mini-realtime-preview supported languages

-While the underlying model was trained on 98 languages, OpenAI only lists the languages that exceeded <50% word error rate (WER) which is an industry standard benchmark for speech to text model accuracy. The model will return results for languages not listed below but the quality will be low.
+While the underlying model was trained on 98 languages, OpenAI only lists the languages that scored below a 50% word error rate (WER), an industry-standard benchmark for speech to text model accuracy. The model returns results for languages that aren't listed, but the quality is low.

The following languages are supported by `gpt-4o-realtime-preview` and `gpt-4o-mini-realtime-preview`:
- Afrikaans
@@ -141,7 +141,7 @@ The following languages are supported by `gpt-4o-realtime-preview` and `gpt-4o-m
- Vietnamese
- Welsh

-Multimodal models do not require a language configuration for the general processing. If you configure input audio transcription you can provide the transcription models with a language hint to improve transcription quality. In this case you need to add the `language`string to the session`session.update` message.
+Multimodal models don't require a language configuration for general processing. If you configure input audio transcription, you can provide the transcription models with a language hint to improve transcription quality. In this case, you need to add the `language` string to the `session.update` message.

```json
{
@@ -154,7 +154,7 @@ Multimodal models do not require a language configuration for the general proces
```
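
Once more, the hint payload itself is cut off above. A sketch under the same assumptions, pairing one of the transcription models named in the note that follows with a language hint; the two-letter code is illustrative.

```json
{
  "type": "session.update",
  "session": {
    "input_audio_transcription": {
      "model": "whisper-1",
      "language": "en"
    }
  }
}
```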

> [!NOTE]
-> Multimodal gpt models only support the following transcription models: `whisper-1`, `gpt-4o-transcribe` and `gpt-4o-mini-transcribe`.
+> Multimodal gpt models only support the following transcription models: `whisper-1`, `gpt-4o-transcribe`, and `gpt-4o-mini-transcribe`.

### phi4-mm-realtime supported languages

@@ -168,7 +168,7 @@ The following languages are supported by `phi4-mm-realtime`:
- Portuguese
- Spanish

-Multimodal models do not require a language configuration for the general processing. If you configure input audio transcription for `phi4-mm-realtime` you need to use the same configuration as for all non-mulitmodal model configuration where azure-speech is used for transcription as described above.
+Multimodal models don't require a language configuration for general processing. If you configure input audio transcription for `phi4-mm-realtime`, use the same configuration as for all non-multimodal model configurations, where `azure-speech` is used for transcription as described earlier.

> [!NOTE]
> Multimodal phi models only support the following transcription models: `azure-speech`.
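
To make that concrete: a hypothetical `phi4-mm-realtime` transcription setup would mirror the earlier Azure speech sketch, reusing the `azure-speech` model identifier with explicit locales (same assumed message shape as above; the locale list is illustrative).

```json
{
  "type": "session.update",
  "session": {
    "input_audio_transcription": {
      "model": "azure-speech",
      "language": "en-US,es-ES"
    }
  }
}
```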
@@ -177,7 +177,7 @@ Multimodal models do not require a language configuration for the general proces

Depending on which model is used, voice live speech output is processed either by one of the multimodal OpenAI voices integrated into `gpt-4o-realtime-preview` and `gpt-4o-mini-realtime-preview`, or by Azure text to speech voices.

-### azure text to speech supported languages
+### Azure text to speech supported languages

Azure text to speech is used by default for all configurations where a non-multimodal OpenAI model is used, and it can be configured manually in all configurations.
It supports all voices documented on the [Language and voice support for the Speech service - Text to speech](./language-support.md?tabs=tts) tab.
@@ -187,7 +187,7 @@ The following types of voices are supported:
1. Multilingual voices
1. Custom voices

-The supported language is tied to the voice used. To configure specific Azure text to speech voices you need to add the `voice` configuration to the session`session.update` message.
+The supported language is tied to the voice used. To configure a specific Azure text to speech voice, add the `voice` configuration to the `session.update` message.

```json
{
@@ -201,9 +201,9 @@ The supported language is tied to the voice used. To configure specific Azure te
}
```
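
The `voice` payload is also truncated in this view. A rough sketch only: the `name`/`type` field names and the `azure-standard` value are assumptions based on common Azure voice configuration patterns rather than this article, and `en-US-AvaMultilingualNeural` is just an example voice name.

```json
{
  "type": "session.update",
  "session": {
    "voice": {
      "name": "en-US-AvaMultilingualNeural",
      "type": "azure-standard"
    }
  }
}
```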

-For more details see how to configure [Audio output through Azure text to speech](./voice-live-how-to.md#audio-output-through-azure-text-to-speech).
+For more information, see how to configure [Audio output through Azure text to speech](./voice-live-how-to.md#audio-output-through-azure-text-to-speech).

-In case of *Multilingual Voices* the language output can optionally be controlled by setting specific SSML tags. You can learn more about this in the [Customize voice and sound with SSML](./speech-synthesis-markup-voice.md#lang-examples) how to.
+If *multilingual voices* are used, the language output can optionally be controlled by setting specific SSML tags. You can learn more about SSML tags in the [Customize voice and sound with SSML](./speech-synthesis-markup-voice.md#lang-examples) how-to guide.