### May 2025 release
#### Pronunciation assessment
Announcing the release of six new pronunciation assessment models (de-DE, es-MX, it-IT, ja-JP, ko-KR, pt-BR), which bring substantial improvements in the Pearson correlation coefficient (PCC). The relative improvement for each locale is shown below:

|Locale|Relative improvement in PCC (%)|
|:-----|:-----------------------------:|
|de-DE |13.4|
|es-MX |32.4|
|it-IT |4.0|
|ja-JP |20.5|
|ko-KR |15.9|
|pt-BR |12.3|

The new models are expected to provide a more accurate, efficient, and satisfying experience for all users, and they're available through the API and the Azure AI Foundry playground. Feedback is encouraged to further refine their capabilities.
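As a rough illustration of how the updated models are exercised, here's a minimal sketch using the Speech SDK for Python to run pronunciation assessment against one of the refreshed locales (de-DE). The key, region, audio file name, and reference text are placeholders; the calls shown (`PronunciationAssessmentConfig`, `apply_to`, `PronunciationAssessmentResult`) follow the documented pronunciation assessment flow, but check the pronunciation assessment how-to for current parameter guidance.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholders: replace with your own Speech resource key, region, and audio file.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY", region="YOUR_REGION")
speech_config.speech_recognition_language = "de-DE"  # one of the updated locales
audio_config = speechsdk.audio.AudioConfig(filename="speech.wav")

# Assess pronunciation against a reference transcript.
pronunciation_config = speechsdk.PronunciationAssessmentConfig(
    reference_text="Der schnelle braune Fuchs springt über den faulen Hund.",
    grading_system=speechsdk.PronunciationAssessmentGradingSystem.HundredMark,
    granularity=speechsdk.PronunciationAssessmentGranularity.Phoneme,
    enable_miscue=True,
)

recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
pronunciation_config.apply_to(recognizer)

result = recognizer.recognize_once()
assessment = speechsdk.PronunciationAssessmentResult(result)
print(f"Accuracy: {assessment.accuracy_score}, Fluency: {assessment.fluency_score}")
```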
#### Fast transcription API - Multi-lingual speech transcription
To transcribe multi-lingual content continuously and accurately in an audio file, you can now use the latest multi-lingual model through the fast transcription API without specifying locale codes. For more information, see [multi-lingual transcription in fast transcription](../../fast-transcription-create.md?tabs=multilingual-transcription-on).
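For illustration, here's a minimal Python sketch of calling the fast transcription API without specifying locales, so the service applies the multi-lingual model. The endpoint shape, `api-version` value, and the empty `locales` list are assumptions based on the fast transcription article linked above; verify them against that article before use.

```python
import json
import requests

# Placeholders: replace with your Speech resource key and region.
key = "YOUR_SPEECH_KEY"
region = "YOUR_REGION"

# Assumed endpoint and API version; confirm against the fast transcription documentation.
url = f"https://{region}.api.cognitive.microsoft.com/speechtotext/transcriptions:transcribe?api-version=2024-11-15"

# Leaving "locales" empty lets the service pick the multi-lingual model instead of a fixed locale.
definition = {"locales": []}

with open("multilingual-meeting.wav", "rb") as audio_file:
    response = requests.post(
        url,
        headers={"Ocp-Apim-Subscription-Key": key},
        files={"audio": audio_file},
        data={"definition": json.dumps(definition)},
    )

response.raise_for_status()
print(response.json())
```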
Fast transcription now supports additional locales including fi-FI, he-IL, id-ID.
Conversation transcription multichannel diarization is retiring on March 28, 2025.
To continue using speech to text with diarization, use the following features instead:
- [Real-time speech to text with diarization](../../get-started-stt-diarization.md)
- [Fast transcription with diarization](../../fast-transcription-create.md)
- [Batch transcription with diarization](../../batch-transcription.md)
These speech to text features only support diarization for single-channel audio. Multichannel audio that you used with conversation transcription multichannel diarization isn't supported.
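For example, a minimal real-time diarization sketch with the Speech SDK for Python might look like the following. It assumes a single-channel WAV file and placeholder credentials; the `ConversationTranscriber` class and `speaker_id` property come from the SDK's transcription namespace, as described in the real-time diarization quickstart linked above.

```python
import threading
import azure.cognitiveservices.speech as speechsdk

# Placeholders: replace with your Speech resource key, region, and a single-channel audio file.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY", region="YOUR_REGION")
speech_config.speech_recognition_language = "en-US"
audio_config = speechsdk.audio.AudioConfig(filename="meeting-mono.wav")

transcriber = speechsdk.transcription.ConversationTranscriber(
    speech_config=speech_config, audio_config=audio_config
)

done = threading.Event()

# Print each final result together with the detected speaker.
transcriber.transcribed.connect(
    lambda evt: print(f"{evt.result.speaker_id}: {evt.result.text}")
)
transcriber.session_stopped.connect(lambda evt: done.set())
transcriber.canceled.connect(lambda evt: done.set())

transcriber.start_transcribing_async().get()
done.wait()  # blocks until the audio file has been fully processed
transcriber.stop_transcribing_async().get()
```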
### January 2025 release
### September 2024 release
#### Real-time speech to text

[Real-time speech to text](../../how-to-recognize-speech.md) has released new models with better quality for the following languages.
Speech [pronunciation assessment](../../how-to-pronunciation-assessment.md) now supports 33 generally available languages, and each language is available in all Speech to text [regions](../../regions.md#regions). For more information, see the full [language list for Pronunciation assessment](../../language-support.md?tabs=pronunciation-assessment).

| Language | Locale (BCP-47) |
|--|--|
|Arabic (Egypt)|`ar-EG`|
|Arabic (Saudi Arabia)|`ar-SA`|
|English (Canada)|`en-CA`|
|English (India)|`en-IN`|
|English (United Kingdom)|`en-GB`|
|English (United States)|`en-US`|
|Finnish (Finland)|`fi-FI`|
|French (Canada)|`fr-CA`|
|French (France)|`fr-FR`|
|German (Germany)|`de-DE`|
|Hindi (India)|`hi-IN`|
|Italian (Italy)|`it-IT`|
|Portuguese (Brazil)|`pt-BR`|
|Portuguese (Portugal)|`pt-PT`|
|Russian (Russia)|`ru-RU`|
|Spanish (Mexico)|`es-MX`|
|Spanish (Spain)|`es-ES`|
|Swedish (Sweden)|`sv-SE`|
|Tamil (India)|`ta-IN`|
|Thai (Thailand)|`th-TH`|
|Vietnamese (Vietnam)|`vi-VN`|
#### Speech to text REST API v3.2 general availability
The Speech to text REST API version 3.2 is now generally available. For more information about speech to text REST API v3.2, see the [Speech to text REST API v3.2 reference documentation](/rest/api/speechtotext/operation-groups?view=rest-speechtotext-v3.2&preserve-view=true) and the [Speech to text REST API guide](../../rest-speech-to-text.md).
> [!NOTE]
> Preview versions *3.2-preview.1* and *3.2-preview.2* are retired as of September 2024.
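
As a quick sanity check that you're calling the GA version, a request like the following lists batch transcriptions through the v3.2 path. The host name pattern and placeholder key and region are assumptions; see the reference documentation linked above for the authoritative endpoint details.

```python
import requests

# Placeholders: replace with your Speech resource key and region.
key = "YOUR_SPEECH_KEY"
region = "YOUR_REGION"

# v3.2 path for batch transcriptions; the host pattern is an assumption, verify it in the reference docs.
url = f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.2/transcriptions"

response = requests.get(url, headers={"Ocp-Apim-Subscription-Key": key})
response.raise_for_status()

for transcription in response.json().get("values", []):
    print(transcription.get("displayName"), transcription.get("status"))
```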
#### Speech to text model update
[Real-time speech to text](../../how-to-recognize-speech.md) has released new models with bilingual capabilities. The `en-IN` model now supports both English and Hindi bilingual scenarios and offers improved accuracy. Arabic locales (`ar-AE`, `ar-BH`, `ar-DZ`, `ar-IL`, `ar-IQ`, `ar-KW`, `ar-LB`, `ar-LY`, `ar-MA`, `ar-OM`, `ar-PS`, `ar-QA`, `ar-SA`, `ar-SY`, `ar-TN`, `ar-YE`) are now equipped with bilingual support for English, enhanced accuracy, and call center support.
[Batch transcription](../../batch-transcription.md) provides models with new architecture for these locales: `es-ES`, `es-MX`, `fr-FR`, `it-IT`, `ja-JP`, `ko-KR`, `pt-BR`, and `zh-CN`. These models significantly enhance readability and entity recognition.
### March 2024 release
#### Whisper general availability (GA)
The Whisper speech to text model with Azure AI Speech is now generally available.
Check out [What is the Whisper model?](../../whisper-overview.md) to learn more about when to use Azure AI Speech vs. Azure OpenAI Service.
### February 2024 release
#### Introducing Bilingual Speech Modeling!
We're thrilled to unveil a groundbreaking addition to our real-time speech modeling: bilingual speech modeling. This significant enhancement allows our speech model to seamlessly support bilingual language pairs, such as English and Spanish, and English and French. This feature empowers users to effortlessly switch between languages during real-time interactions, marking a pivotal moment in our commitment to enhancing communication experiences.
Key Highlights:
- Bilingual Support: With our latest release, users can seamlessly switch between English and Spanish or between English and French during real-time speech interactions. This functionality is tailored to accommodate bilingual speakers who frequently transition between the two languages in a pair.
- Enhanced User Experience: Bilingual speakers, whether at work, home, or in various community settings, will find this feature immensely beneficial. The model's ability to comprehend and respond to both languages of a pair in real time opens up new possibilities for effective and fluid communication.
How to Use:
Choose `es-US` (Spanish and English) or `fr-CA` (French and English) when you call the Speech service API, or try it out in Speech Studio. Feel free to speak either language or mix them together. The model is designed to adapt dynamically, providing accurate and context-aware responses in both languages.
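For instance, a minimal Speech SDK for Python sketch that selects the Spanish/English bilingual model looks like the following; swap `es-US` for `fr-CA` to get the French/English pair. The key and region are placeholders.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholders: replace with your Speech resource key and region.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY", region="YOUR_REGION")

# es-US selects the Spanish/English bilingual model; use fr-CA for French/English.
speech_config.speech_recognition_language = "es-US"

audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

print("Speak in English, Spanish, or a mix of both...")
result = recognizer.recognize_once_async().get()
print(result.text)
```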
#### Whisper public preview
Azure AI Speech now supports OpenAI's Whisper model via the batch transcription API. To learn more, check out the [Create a batch transcription](../../batch-transcription-create.md#use-a-whisper-model) guide.
> [!NOTE]
> Azure OpenAI Service also supports OpenAI's Whisper model for speech to text with a synchronous REST API. To learn more, check out the [quickstart](../../../openai/whisper-quickstart.md).
Check out [What is the Whisper model?](../../whisper-overview.md) to learn more about when to use Azure AI Speech vs. Azure OpenAI Service.
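
As a sketch of the flow, the request below creates a batch transcription whose `model` property points at a Whisper base model. The host pattern, the `<whisper-model-id>` placeholder, the audio URL, and the use of the current v3.2 path are illustrative assumptions; retrieve the actual Whisper model URI from the base models list as described in the linked guide.

```python
import requests

# Placeholders: replace with your Speech resource key, region, audio URL, and a Whisper base model ID.
key = "YOUR_SPEECH_KEY"
region = "YOUR_REGION"
base = f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.2"

body = {
    "displayName": "whisper-batch-sample",
    "locale": "en-US",
    "contentUrls": ["https://example.com/audio/sample.wav"],
    # Point the transcription at a Whisper base model; GET {base}/models/base to find its URI.
    "model": {"self": f"{base}/models/base/<whisper-model-id>"},
}

response = requests.post(
    f"{base}/transcriptions",
    headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
    json=body,
)
response.raise_for_status()
print(response.json()["self"])  # URL to poll for the transcription status
```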
#### Speech to text REST API v3.2 public preview
#### Pronunciation Assessment
- Speech [Pronunciation Assessment](../../how-to-pronunciation-assessment.md) now supports three more generally available languages: German (Germany), Japanese (Japan), and Spanish (Mexico), with four more languages available in preview. For more information, see the full [language list for Pronunciation Assessment](../../language-support.md?tabs=pronunciation-assessment).
- You can now use the standard Speech to text commitment tier for pronunciation assessment in all public regions. If you purchase a commitment tier for standard Speech to text, the spend for pronunciation assessment goes toward meeting the commitment. See [commitment tier pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services).
### February 2023 release
#### Pronunciation Assessment
- Speech [Pronunciation Assessment](../../how-to-pronunciation-assessment.md) now supports five more generally available languages: English (United Kingdom), English (Australia), French (France), Spanish (Spain), and Chinese (Mandarin, Simplified), with other languages available in preview.
- Added sample code that shows how to use Pronunciation Assessment in streaming mode in your own application:
  - **C#**: See [sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/speech_recognition_samples.cs#:~:text=PronunciationAssessmentWithStream).
  - **C++**: See [sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/cpp/windows/console/samples/speech_recognition_samples.cpp#:~:text=PronunciationAssessmentWithStream).