Commit 0f736b6

Merge branch 'main' of https://github.com/MicrosoftDocs/azure-ai-docs-pr into mrb_05_08_2025_autogen_authoring_ref
2 parents 59628d3 + 1135e39 commit 0f736b6

1 file changed, +33 -26 lines changed


articles/ai-services/speech-service/includes/release-notes/release-notes-stt.md

Lines changed: 33 additions & 26 deletions
@@ -16,19 +16,27 @@ To transcribe multi-lingual contents continuously and accurately in an audio fil
 #### New locales supported in Fast Transcription
 Fast transcription now supports additional locales including fi-FI, he-IL, id-ID, pl-PL, pt-PT, sv-SE, etc. For more information, see [speech to text supported languages](../../language-support.md?tabs=stt).

+### April 2025 release
+
+#### Pronunciation assessment
+
+We are excited to announce substantial improvements to our pronunciation assessment models for these locales: `de-DE`, `es-MX`, `it-IT`, `ja-JP`, `ko-KR`, and `pt-BR`. These enhancements bring significant advancements in Pearson Correlation Coefficients (PCC), ensuring more accurate and reliable assessments.
+
+As before, the models are available through the API and Azure AI Foundry playground.
+
 ### March 2025 release

 #### Conversation transcription multichannel diarization (retired)

-Conversation transcription multichannel diarization is retiring on March 28, 2025.
+Conversation transcription multichannel diarization is retiring on March 28, 2025.

 To continue using speech to text with diarization, use the following features instead:

 - [Real-time speech to text with diarization](../../get-started-stt-diarization.md)
 - [Fast transcription with diarization](../../fast-transcription-create.md)
 - [Batch transcription with diarization](../../batch-transcription.md)

-These speech to text features only support diarization for single-channel audio. Multichannel audio that you used with conversation transcription multichannel diarization isn't supported.
+These speech to text features only support diarization for single-channel audio. Multichannel audio that you used with conversation transcription multichannel diarization isn't supported.

 ### January 2025 release

@@ -92,11 +100,11 @@ The video translation API is now available in public preview. For more informati

 ### September 2024 release

-#### Real-time speech to text
+#### Real-time speech to text

-[Real-time speech to text](../../how-to-recognize-speech.md) has released new models, with better quality, for the following languages.
+[Real-time speech to text](../../how-to-recognize-speech.md) has released new models, with better quality, for the following languages.

-fi-FI/id-ID/zh-TW/pl-PL/pt-PT
+fi-FI/id-ID/zh-TW/pl-PL/pt-PT
 es-SV/es-EC/es-BO/es-PY/es-AR/es-DO/es-UY/es-CR/es-VE/es-NI/es-HN/es-PR/es-CO/es-CL/es-CU/es-PE/es-PA/es-GT/es-GQ

 #### Fast transcription (Preview)
@@ -112,7 +120,7 @@ Language learning is now available in public preview. Interactive language learn

 Speech [pronunciation assessment](../../how-to-pronunciation-assessment.md) now supports 33 languages generally available, and each language is available on all Speech to text [regions](../../regions.md#regions). For more information, see the full [language list for Pronunciation assessment](../../language-support.md?tabs=pronunciation-assessment).

-| Language | Locale (BCP-47) |
+| Language | Locale (BCP-47) |
 |--|--|
 |Arabic (Egypt)|`ar-EG` |
 |Arabic (Saudi Arabia)|`ar-SA` |
@@ -126,10 +134,10 @@ Speech [pronunciation assessment](../../how-to-pronunciation-assessment.md) now
 |English (Canada)|`en-CA` |
 |English (India)|`en-IN` |
 |English (United Kingdom)|`en-GB`|
-|English (United States)|`en-US`|
-|Finnish (Finland)|`fi-FI`|
-|French (Canada)|`fr-CA`|
-|French (France)|`fr-FR`|
+|English (United States)|`en-US`|
+|Finnish (Finland)|`fi-FI`|
+|French (Canada)|`fr-CA`|
+|French (France)|`fr-FR`|
 |German (Germany)|`de-DE`|
 |Hindi (India)|`hi-IN`|
 |Italian (Italy)|`it-IT`|
@@ -141,11 +149,11 @@ Speech [pronunciation assessment](../../how-to-pronunciation-assessment.md) now
 |Portuguese (Brazil)|`pt-BR`|
 |Portuguese (Portugal)|`pt-PT`|
 |Russian (Russia)|`ru-RU`|
-|Spanish (Mexico)|`es-MX` |
-|Spanish (Spain)|`es-ES` |
+|Spanish (Mexico)|`es-MX` |
+|Spanish (Spain)|`es-ES` |
 |Swedish (Sweden)|`sv-SE`|
-|Tamil (India)|`ta-IN` |
-|Thai (Thailand)|`th-TH` |
+|Tamil (India)|`ta-IN` |
+|Thai (Thailand)|`th-TH` |
 |Vietnamese (Vietnam)|`vi-VN` |


@@ -162,7 +170,7 @@ Fast transcription is now available in public preview. Fast transcription allows

 #### Speech to text REST API v3.2 general availability

-The Speech to text REST API version 3.2 is now generally available. For more information about speech to text REST API v3.2, see the [Speech to text REST API v3.2 reference documentation](/rest/api/speechtotext/operation-groups?view=rest-speechtotext-v3.2&preserve-view=true) and the [Speech to text REST API guide](../../rest-speech-to-text.md).
+The Speech to text REST API version 3.2 is now generally available. For more information about speech to text REST API v3.2, see the [Speech to text REST API v3.2 reference documentation](/rest/api/speechtotext/operation-groups?view=rest-speechtotext-v3.2&preserve-view=true) and the [Speech to text REST API guide](../../rest-speech-to-text.md).

 > [!NOTE]
 > Preview versions *3.2-preview.1* and *3.2-preview.2* are retired as of September 2024.
@@ -209,17 +217,17 @@ You can create speech to text applications that use diarization to distinguish b

 #### Speech to text model Update

-[Real-time speech to text](../../how-to-recognize-speech.md) has released new models with bilingual capabilities. The `en-IN` model now supports both English and Hindi bilingual scenarios and offers improved accuracy. Arabic locales (`ar-AE`, `ar-BH`, `ar-DZ`, `ar-IL`, `ar-IQ`, `ar-KW`, `ar-LB`, `ar-LY`, `ar-MA`, `ar-OM`, `ar-PS`, `ar-QA`, `ar-SA`, `ar-SY`, `ar-TN`, `ar-YE`) are now equipped with bilingual support for English, enhanced accuracy and call center support.
+[Real-time speech to text](../../how-to-recognize-speech.md) has released new models with bilingual capabilities. The `en-IN` model now supports both English and Hindi bilingual scenarios and offers improved accuracy. Arabic locales (`ar-AE`, `ar-BH`, `ar-DZ`, `ar-IL`, `ar-IQ`, `ar-KW`, `ar-LB`, `ar-LY`, `ar-MA`, `ar-OM`, `ar-PS`, `ar-QA`, `ar-SA`, `ar-SY`, `ar-TN`, `ar-YE`) are now equipped with bilingual support for English, enhanced accuracy and call center support.

-[Batch transcription](../../batch-transcription.md) provides models with new architecture for these locales: `es-ES`, `es-MX`, `fr-FR`, `it-IT`, `ja-JP`, `ko-KR`, `pt-BR`, and `zh-CN`. These models significantly enhance readability and entity recognition.
+[Batch transcription](../../batch-transcription.md) provides models with new architecture for these locales: `es-ES`, `es-MX`, `fr-FR`, `it-IT`, `ja-JP`, `ko-KR`, `pt-BR`, and `zh-CN`. These models significantly enhance readability and entity recognition.

 ### March 2024 release

 #### Whisper general availability (GA)

 The Whisper speech to text model with Azure AI Speech is now generally available.

-Check out [What is the Whisper model?](../../whisper-overview.md) to learn more about when to use Azure AI Speech vs. Azure OpenAI Service.
+Check out [What is the Whisper model?](../../whisper-overview.md) to learn more about when to use Azure AI Speech vs. Azure OpenAI Service.

 ### February 2024 release

@@ -235,11 +243,11 @@ Added phrase list support for the following locales: ar-SA, de-CH, en-IE, en-ZA,

 #### Introducing Bilingual Speech Modeling!
 We're thrilled to unveil a groundbreaking addition to our real-time speech modeling—Bilingual Speech Modeling. This significant enhancement allows our speech model to seamlessly support bilingual language pairs, such as English and Spanish, as well as English and French. This feature empowers users to effortlessly switch between languages during real-time interactions, marking a pivotal moment in our commitment to enhancing communication experiences.
-
+
 Key Highlights:
 - Bilingual Support: With our latest release, users can seamlessly switch between English and Spanish or between English and French during real-time speech interactions. This functionality is tailored to accommodate bilingual speakers who frequently transition between these two languages.
 - Enhanced User Experience: Bilingual speakers, whether at work, home, or in various community settings, will find this feature immensely beneficial. The model's ability to comprehend and respond to both English and Spanish in real time opens up new possibilities for effective and fluid communication.
-
+
 How to Use:

 Choose es-US (Spanish and English) or fr-CA (French and English) when you call the Speech Service API or try it out on Speech Studio. Feel free to speak either language or mix them together—the model is designed to adapt dynamically, providing accurate and context-aware responses in both languages.
@@ -275,12 +283,12 @@ We encourage you to explore these improvements and consider potential issues for

 #### Whisper public preview

-Azure AI Speech now supports OpenAI's Whisper model via the batch transcription API. To learn more, check out the [Create a batch transcription](../../batch-transcription-create.md#use-a-whisper-model) guide.
+Azure AI Speech now supports OpenAI's Whisper model via the batch transcription API. To learn more, check out the [Create a batch transcription](../../batch-transcription-create.md#use-a-whisper-model) guide.

 > [!NOTE]
-> Azure OpenAI Service also supports OpenAI's Whisper model for speech to text with a synchronous REST API. To learn more, check out the [quickstart](../../../openai/whisper-quickstart.md).
+> Azure OpenAI Service also supports OpenAI's Whisper model for speech to text with a synchronous REST API. To learn more, check out the [quickstart](../../../openai/whisper-quickstart.md).

-Check out [What is the Whisper model?](../../whisper-overview.md) to learn more about when to use Azure AI Speech vs. Azure OpenAI Service.
+Check out [What is the Whisper model?](../../whisper-overview.md) to learn more about when to use Azure AI Speech vs. Azure OpenAI Service.

 #### Speech to text REST API v3.2 public preview

@@ -306,13 +314,13 @@ Speech to text supports two new locales as shown in the following table. Refer t
 #### Pronunciation Assessment

 - Speech [Pronunciation Assessment](../../how-to-pronunciation-assessment.md) now supports 3 additional languages generally available in German (Germany), Japanese (Japan), and Spanish (Mexico), with 4 additional languages available in preview. For more information, see the full [language list for Pronunciation Assessment](../../language-support.md?tabs=pronunciation-assessment).
-- You can now use the standard Speech to Text commitment tier for pronunciation assessment on all public regions. If you purchase a commitment tier for standard Speech to text, the spend for pronunciation assessment goes towards meeting the commitment. See [commitment tier pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services).
+- You can now use the standard Speech to Text commitment tier for pronunciation assessment on all public regions. If you purchase a commitment tier for standard Speech to text, the spend for pronunciation assessment goes towards meeting the commitment. See [commitment tier pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services).

 ### February 2023 release

 #### Pronunciation Assessment

-- Speech [Pronunciation Assessment](../../how-to-pronunciation-assessment.md) now supports 5 additional languages generally available in English (United Kingdom), English (Australia), French (France), Spanish (Spain), and Chinese (Mandarin, Simplified), with other languages available in preview.
+- Speech [Pronunciation Assessment](../../how-to-pronunciation-assessment.md) now supports 5 additional languages generally available in English (United Kingdom), English (Australia), French (France), Spanish (Spain), and Chinese (Mandarin, Simplified), with other languages available in preview.
 - Added sample codes showing how to use Pronunciation Assessment in streaming mode in your own application.
 - **C#**: See [sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/speech_recognition_samples.cs#:~:text=PronunciationAssessmentWithStream).
 - **C++**: See [sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/cpp/windows/console/samples/speech_recognition_samples.cpp#:~:text=PronunciationAssessmentWithStream).
@@ -510,4 +518,3 @@ Speech to text released 26 new locales in August: 2 European languages `cs-CZ` a
 | `es-UY` | Spanish (Uruguay) |
 | `es-VE` | Spanish (Venezuela) |
 | `hu-HU` | Hungarian (Hungary) |
-
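
The April 2025 entry added in this commit describes the pronunciation assessment improvements in terms of Pearson Correlation Coefficients (PCC). The release note does not say how the metric was measured, so the following is only a minimal sketch of the standard PCC formula, assuming it is computed between model scores and human ratings for the same utterances. The function name `pearson` and both score lists are hypothetical and are not taken from the Speech service.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to each variance).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: human ratings and model scores for five utterances.
human_ratings = [55.0, 62.0, 70.0, 81.0, 90.0]
model_scores = [58.0, 60.0, 73.0, 79.0, 93.0]
print(round(pearson(human_ratings, model_scores), 3))
```

A value close to 1 means the model ranks and spaces utterances much as the human raters do, which is presumably the sense in which a higher PCC indicates a more reliable assessment.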
