articles/ai-services/speech-service/includes/release-notes/release-notes-stt.md
#### New locales supported in Fast Transcription
Fast transcription now supports more locales, including `fi-FI`, `he-IL`, `id-ID`, `pl-PL`, `pt-PT`, and `sv-SE`. For more information, see [speech to text supported languages](../../language-support.md?tabs=stt).
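For illustration, here's a minimal sketch of what a fast transcription request for one of the new locales might look like. The endpoint shape, `api-version` value, and key header are assumptions to verify against the fast transcription guide; the request is built but not sent.

```python
import json
import urllib.request

# Assumed endpoint shape for fast transcription; verify the path and
# api-version against the fast transcription guide for your region.
region = "eastus"
url = (f"https://{region}.api.cognitive.microsoft.com/speechtotext/"
       "transcriptions:transcribe?api-version=2024-11-15")

# One of the newly supported locales goes in the transcription definition.
definition = {"locales": ["sv-SE"]}

req = urllib.request.Request(
    url,
    method="POST",
    headers={
        # Placeholder key; a real call also sends multipart/form-data with
        # an "audio" file part and a "definition" JSON part.
        "Ocp-Apim-Subscription-Key": "<your-speech-resource-key>",
    },
)
print(req.full_url)
```

The request is left unsent here; a real call returns the transcription result synchronously in the response body.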
### April 2025 release
#### Pronunciation assessment
We are excited to announce substantial improvements to our pronunciation assessment models for these locales: `de-DE`, `es-MX`, `it-IT`, `ja-JP`, `ko-KR`, and `pt-BR`. These enhancements bring significant advancements in Pearson Correlation Coefficients (PCC), ensuring more accurate and reliable assessments.
As before, the models are available through the API and Azure AI Foundry playground.
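As a sketch of how an application can request an assessment over the REST API for short audio, the pronunciation assessment parameters can be sent as a base64-encoded JSON header. The field names below are assumptions taken from the pronunciation assessment how-to, and the German reference text is only a sample for one of the improved locales:

```python
import base64
import json

# Assessment parameters (field names per the pronunciation assessment
# how-to; verify against current docs). "Guten Morgen" is a sample
# reference text for the improved de-DE model.
params = {
    "ReferenceText": "Guten Morgen",
    "GradingSystem": "HundredMark",
    "Granularity": "Phoneme",
    "Dimension": "Comprehensive",
}

# The JSON is base64-encoded and sent as the Pronunciation-Assessment header.
header_value = base64.b64encode(json.dumps(params).encode("utf-8")).decode("ascii")
print(header_value[:16])
```

The same parameters can also be set through the Speech SDK's pronunciation assessment configuration, as shown in the linked how-to.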
Conversation transcription multichannel diarization is retiring on March 28, 2025.
To continue using speech to text with diarization, use the following features instead:
- [Real-time speech to text with diarization](../../get-started-stt-diarization.md)
- [Fast transcription with diarization](../../fast-transcription-create.md)
- [Batch transcription with diarization](../../batch-transcription.md)
These speech to text features only support diarization for single-channel audio. Multichannel audio that you used with conversation transcription multichannel diarization isn't supported.
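As one migration path, a batch transcription with diarization enabled might be created like this. The v3.2 path, property names, and content URL below are assumptions to check against the batch transcription guide; the request is built but not sent.

```python
import json
import urllib.request

# Assumed v3.2 path and property names; verify against the batch
# transcription guide. Region, key, and audio URL are placeholders.
region = "eastus"
body = {
    "displayName": "diarization sample",
    "locale": "en-US",
    "contentUrls": ["https://example.com/audio.wav"],  # placeholder audio URL
    "properties": {
        "diarizationEnabled": True,  # single-channel audio only
    },
}

req = urllib.request.Request(
    f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.2/transcriptions",
    data=json.dumps(body).encode("utf-8"),
    method="POST",
    headers={
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": "<your-speech-resource-key>",
    },
)
```

Left unsent as a sketch: a real call returns a transcription resource whose status you poll until the diarized result files are ready.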
### January 2025 release
### September 2024 release
#### Real-time speech to text
[Real-time speech to text](../../how-to-recognize-speech.md) has released new models, with better quality, for the following languages.
Speech [pronunciation assessment](../../how-to-pronunciation-assessment.md) now supports 33 generally available languages, and each language is available in all Speech to text [regions](../../regions.md#regions). For more information, see the full [language list for Pronunciation assessment](../../language-support.md?tabs=pronunciation-assessment).
| Language | Locale (BCP-47) |
|--|--|
|Arabic (Egypt)|`ar-EG`|
|Arabic (Saudi Arabia)|`ar-SA`|
|English (Canada)|`en-CA`|
|English (India)|`en-IN`|
|English (United Kingdom)|`en-GB`|
|English (United States)|`en-US`|
|Finnish (Finland)|`fi-FI`|
|French (Canada)|`fr-CA`|
|French (France)|`fr-FR`|
|German (Germany)|`de-DE`|
|Hindi (India)|`hi-IN`|
|Italian (Italy)|`it-IT`|
|Portuguese (Brazil)|`pt-BR`|
|Portuguese (Portugal)|`pt-PT`|
|Russian (Russia)|`ru-RU`|
|Spanish (Mexico)|`es-MX`|
|Spanish (Spain)|`es-ES`|
|Swedish (Sweden)|`sv-SE`|
|Tamil (India)|`ta-IN`|
|Thai (Thailand)|`th-TH`|
|Vietnamese (Vietnam)|`vi-VN`|
#### Speech to text REST API v3.2 general availability
The Speech to text REST API version 3.2 is now generally available. For more information about speech to text REST API v3.2, see the [Speech to text REST API v3.2 reference documentation](/rest/api/speechtotext/operation-groups?view=rest-speechtotext-v3.2&preserve-view=true) and the [Speech to text REST API guide](../../rest-speech-to-text.md).
> [!NOTE]
> Preview versions *3.2-preview.1* and *3.2-preview.2* are retired as of September 2024.
#### Speech to text model update
[Real-time speech to text](../../how-to-recognize-speech.md) has released new models with bilingual capabilities. The `en-IN` model now supports both English and Hindi bilingual scenarios and offers improved accuracy. Arabic locales (`ar-AE`, `ar-BH`, `ar-DZ`, `ar-IL`, `ar-IQ`, `ar-KW`, `ar-LB`, `ar-LY`, `ar-MA`, `ar-OM`, `ar-PS`, `ar-QA`, `ar-SA`, `ar-SY`, `ar-TN`, `ar-YE`) are now equipped with bilingual support for English, enhanced accuracy and call center support.
[Batch transcription](../../batch-transcription.md) provides models with new architecture for these locales: `es-ES`, `es-MX`, `fr-FR`, `it-IT`, `ja-JP`, `ko-KR`, `pt-BR`, and `zh-CN`. These models significantly enhance readability and entity recognition.
### March 2024 release
#### Whisper general availability (GA)
The Whisper speech to text model with Azure AI Speech is now generally available.
Check out [What is the Whisper model?](../../whisper-overview.md) to learn more about when to use Azure AI Speech vs. Azure OpenAI Service.
### February 2024 release
#### Introducing Bilingual Speech Modeling!
We're thrilled to unveil a groundbreaking addition to our real-time speech modeling—Bilingual Speech Modeling. This significant enhancement allows our speech model to seamlessly support bilingual language pairs, such as English and Spanish, as well as English and French. This feature empowers users to effortlessly switch between languages during real-time interactions, marking a pivotal moment in our commitment to enhancing communication experiences.
Key Highlights:
- Bilingual Support: With our latest release, users can seamlessly switch between English and Spanish or between English and French during real-time speech interactions. This functionality is tailored to accommodate bilingual speakers who frequently transition between these two languages.
- Enhanced User Experience: Bilingual speakers, whether at work, home, or in various community settings, will find this feature immensely beneficial. The model's ability to comprehend and respond to both English and Spanish in real time opens up new possibilities for effective and fluid communication.
How to Use:
Choose es-US (Spanish and English) or fr-CA (French and English) when you call the Speech Service API or try it out on Speech Studio. Feel free to speak either language or mix them together—the model is designed to adapt dynamically, providing accurate and context-aware responses in both languages.
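In other words, selecting a bilingual model is only a matter of the locale you pass. As a sketch against the short-audio REST endpoint (endpoint shape assumed from the speech to text REST guide; the region is a placeholder):

```python
import urllib.parse

# The locale alone selects the bilingual model: es-US covers Spanish and
# English, fr-CA covers French and English. Region is a placeholder.
base = ("https://eastus.stt.speech.microsoft.com/speech/recognition/"
        "conversation/cognitiveservices/v1")
urls = [f"{base}?{urllib.parse.urlencode({'language': loc})}"
        for loc in ("es-US", "fr-CA")]
for url in urls:
    print(url)
```

With the Speech SDK, the same choice is made by setting the recognition language to `es-US` or `fr-CA` in the speech configuration.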
#### Whisper public preview
Azure AI Speech now supports OpenAI's Whisper model via the batch transcription API. To learn more, check out the [Create a batch transcription](../../batch-transcription-create.md#use-a-whisper-model) guide.
> [!NOTE]
> Azure OpenAI Service also supports OpenAI's Whisper model for speech to text with a synchronous REST API. To learn more, check out the [quickstart](../../../openai/whisper-quickstart.md).
Check out [What is the Whisper model?](../../whisper-overview.md) to learn more about when to use Azure AI Speech vs. Azure OpenAI Service.
#### Speech to text REST API v3.2 public preview
#### Pronunciation Assessment
- Speech [Pronunciation Assessment](../../how-to-pronunciation-assessment.md) now supports three more generally available languages: German (Germany), Japanese (Japan), and Spanish (Mexico), with four more languages available in preview. For more information, see the full [language list for Pronunciation Assessment](../../language-support.md?tabs=pronunciation-assessment).
- You can now use the standard Speech to Text commitment tier for pronunciation assessment on all public regions. If you purchase a commitment tier for standard Speech to text, the spend for pronunciation assessment goes towards meeting the commitment. See [commitment tier pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services).
### February 2023 release
#### Pronunciation Assessment
- Speech [Pronunciation Assessment](../../how-to-pronunciation-assessment.md) now supports five more generally available languages: English (United Kingdom), English (Australia), French (France), Spanish (Spain), and Chinese (Mandarin, Simplified), with other languages available in preview.
- Added sample code showing how to use Pronunciation Assessment in streaming mode in your own application.
  - **C#**: See [sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/speech_recognition_samples.cs#:~:text=PronunciationAssessmentWithStream).
  - **C++**: See [sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/cpp/windows/console/samples/speech_recognition_samples.cpp#:~:text=PronunciationAssessmentWithStream).