Commit ff24b9e

author
Ke WANG
committed
feat(pronscore): feature release note
1 parent 7447963 commit ff24b9e

1 file changed

+40
-26
lines changed

articles/ai-services/speech-service/includes/release-notes/release-notes-stt.md

Lines changed: 40 additions & 26 deletions
@@ -9,6 +9,21 @@ ms.custom: references_regions
99

1010
### May 2025 release
1111

12+
#### Pronunciation assessment
13+
14+
Announcing the release of six new pronunciation assessment models (de-DE, es-MX, it-IT, ja-JP, ko-KR, pt-BR), which bring substantial improvements in the Pearson correlation coefficient (PCC). The relative improvement for each locale is shown below:
15+
16+
|Locale|Relative improvement on PCC (%)|
17+
|:-----|:-----------------------------:|
18+
|de-DE |13.4|
19+
|es-MX |32.4|
20+
|it-IT |4.0|
21+
|ja-JP |20.5|
22+
|ko-KR |15.9|
23+
|pt-BR |12.3|
24+
25+
The new models are expected to provide a more accurate, efficient, and satisfying experience for all users and are available through the API and the Azure AI Foundry playground. Feedback is encouraged to help further refine their capabilities.
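
To make the metric concrete, here is a minimal sketch of how a Pearson correlation coefficient and a relative improvement percentage could be computed. The release note doesn't say how its figures were derived, so the formula `(new - old) / old * 100` and the function names are assumptions for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all without the 1/n factor (it cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def relative_improvement_pct(old_pcc, new_pcc):
    """Assumed definition: percentage change of the new PCC over the old PCC."""
    return (new_pcc - old_pcc) / old_pcc * 100.0
```

In an evaluation of this kind, `xs` would typically be human-rated scores and `ys` the model's scores for the same utterances. The PCC is computed once per model and the two values are then compared.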
26+
1227
#### Fast transcription API - Multi-lingual speech transcription
1328

1429
To transcribe multi-lingual content in an audio file continuously and accurately, you can now use the latest multi-lingual model through the fast transcription API without specifying locale codes. For more information, see [multi-lingual transcription in fast transcription](../../fast-transcription-create.md?tabs=multilingual-transcription-on).
@@ -20,15 +35,15 @@ Fast transcription now supports additional locales including fi-FI, he-IL, id-ID
2035

2136
#### Conversation transcription multichannel diarization (retired)
2237

23-
Conversation transcription multichannel diarization is retiring on March 28, 2025.
38+
Conversation transcription multichannel diarization is retiring on March 28, 2025.
2439

2540
To continue using speech to text with diarization, use the following features instead:
2641

2742
- [Real-time speech to text with diarization](../../get-started-stt-diarization.md)
2843
- [Fast transcription with diarization](../../fast-transcription-create.md)
2944
- [Batch transcription with diarization](../../batch-transcription.md)
3045

31-
These speech to text features only support diarization for single-channel audio. Multichannel audio that you used with conversation transcription multichannel diarization isn't supported.
46+
These speech to text features only support diarization for single-channel audio. Multichannel audio that you used with conversation transcription multichannel diarization isn't supported.
3247

3348
### January 2025 release
3449

@@ -92,11 +107,11 @@ The video translation API is now available in public preview. For more informati
92107

93108
### September 2024 release
94109

95-
#### Real-time speech to text
110+
#### Real-time speech to text
96111

97-
[Real-time speech to text](../../how-to-recognize-speech.md) has released new models, with better quality, for the following languages.
112+
[Real-time speech to text](../../how-to-recognize-speech.md) has released new models, with better quality, for the following languages.
98113

99-
fi-FI/id-ID/zh-TW/pl-PL/pt-PT
114+
fi-FI/id-ID/zh-TW/pl-PL/pt-PT
100115
es-SV/es-EC/es-BO/es-PY/es-AR/es-DO/es-UY/es-CR/es-VE/es-NI/es-HN/es-PR/es-CO/es-CL/es-CU/es-PE/es-PA/es-GT/es-GQ
101116

102117
#### Fast transcription (Preview)
@@ -112,7 +127,7 @@ Language learning is now available in public preview. Interactive language learn
112127

113128
Speech [pronunciation assessment](../../how-to-pronunciation-assessment.md) now supports 33 languages generally available, and each language is available on all Speech to text [regions](../../regions.md#regions). For more information, see the full [language list for Pronunciation assessment](../../language-support.md?tabs=pronunciation-assessment).
114129

115-
| Language | Locale (BCP-47) |
130+
| Language | Locale (BCP-47) |
116131
|--|--|
117132
|Arabic (Egypt)|`ar-EG` |
118133
|Arabic (Saudi Arabia)|`ar-SA` |
@@ -126,10 +141,10 @@ Speech [pronunciation assessment](../../how-to-pronunciation-assessment.md) now
126141
|English (Canada)|`en-CA` |
127142
|English (India)|`en-IN` |
128143
|English (United Kingdom)|`en-GB`|
129-
|English (United States)|`en-US`|
130-
|Finnish (Finland)|`fi-FI`|
131-
|French (Canada)|`fr-CA`|
132-
|French (France)|`fr-FR`|
144+
|English (United States)|`en-US`|
145+
|Finnish (Finland)|`fi-FI`|
146+
|French (Canada)|`fr-CA`|
147+
|French (France)|`fr-FR`|
133148
|German (Germany)|`de-DE`|
134149
|Hindi (India)|`hi-IN`|
135150
|Italian (Italy)|`it-IT`|
@@ -141,11 +156,11 @@ Speech [pronunciation assessment](../../how-to-pronunciation-assessment.md) now
141156
|Portuguese (Brazil)|`pt-BR`|
142157
|Portuguese (Portugal)|`pt-PT`|
143158
|Russian (Russia)|`ru-RU`|
144-
|Spanish (Mexico)|`es-MX` |
145-
|Spanish (Spain)|`es-ES` |
159+
|Spanish (Mexico)|`es-MX` |
160+
|Spanish (Spain)|`es-ES` |
146161
|Swedish (Sweden)|`sv-SE`|
147-
|Tamil (India)|`ta-IN` |
148-
|Thai (Thailand)|`th-TH` |
162+
|Tamil (India)|`ta-IN` |
163+
|Thai (Thailand)|`th-TH` |
149164
|Vietnamese (Vietnam)|`vi-VN` |
150165

151166

@@ -162,7 +177,7 @@ Fast transcription is now available in public preview. Fast transcription allows
162177

163178
#### Speech to text REST API v3.2 general availability
164179

165-
The Speech to text REST API version 3.2 is now generally available. For more information about speech to text REST API v3.2, see the [Speech to text REST API v3.2 reference documentation](/rest/api/speechtotext/operation-groups?view=rest-speechtotext-v3.2&preserve-view=true) and the [Speech to text REST API guide](../../rest-speech-to-text.md).
180+
The Speech to text REST API version 3.2 is now generally available. For more information about speech to text REST API v3.2, see the [Speech to text REST API v3.2 reference documentation](/rest/api/speechtotext/operation-groups?view=rest-speechtotext-v3.2&preserve-view=true) and the [Speech to text REST API guide](../../rest-speech-to-text.md).
166181

167182
> [!NOTE]
168183
> Preview versions *3.2-preview.1* and *3.2-preview.2* are retired as of September 2024.
@@ -209,17 +224,17 @@ You can create speech to text applications that use diarization to distinguish b
209224

210225
#### Speech to text model Update
211226

212-
[Real-time speech to text](../../how-to-recognize-speech.md) has released new models with bilingual capabilities. The `en-IN` model now supports both English and Hindi bilingual scenarios and offers improved accuracy. Arabic locales (`ar-AE`, `ar-BH`, `ar-DZ`, `ar-IL`, `ar-IQ`, `ar-KW`, `ar-LB`, `ar-LY`, `ar-MA`, `ar-OM`, `ar-PS`, `ar-QA`, `ar-SA`, `ar-SY`, `ar-TN`, `ar-YE`) are now equipped with bilingual support for English, enhanced accuracy and call center support.
227+
[Real-time speech to text](../../how-to-recognize-speech.md) has released new models with bilingual capabilities. The `en-IN` model now supports both English and Hindi bilingual scenarios and offers improved accuracy. Arabic locales (`ar-AE`, `ar-BH`, `ar-DZ`, `ar-IL`, `ar-IQ`, `ar-KW`, `ar-LB`, `ar-LY`, `ar-MA`, `ar-OM`, `ar-PS`, `ar-QA`, `ar-SA`, `ar-SY`, `ar-TN`, `ar-YE`) are now equipped with bilingual support for English, enhanced accuracy and call center support.
213228

214-
[Batch transcription](../../batch-transcription.md) provides models with new architecture for these locales: `es-ES`, `es-MX`, `fr-FR`, `it-IT`, `ja-JP`, `ko-KR`, `pt-BR`, and `zh-CN`. These models significantly enhance readability and entity recognition.
229+
[Batch transcription](../../batch-transcription.md) provides models with new architecture for these locales: `es-ES`, `es-MX`, `fr-FR`, `it-IT`, `ja-JP`, `ko-KR`, `pt-BR`, and `zh-CN`. These models significantly enhance readability and entity recognition.
215230

216231
### March 2024 release
217232

218233
#### Whisper general availability (GA)
219234

220235
The Whisper speech to text model with Azure AI Speech is now generally available.
221236

222-
Check out [What is the Whisper model?](../../whisper-overview.md) to learn more about when to use Azure AI Speech vs. Azure OpenAI Service.
237+
Check out [What is the Whisper model?](../../whisper-overview.md) to learn more about when to use Azure AI Speech vs. Azure OpenAI Service.
223238

224239
### February 2024 release
225240

@@ -235,11 +250,11 @@ Added phrase list support for the following locales: ar-SA, de-CH, en-IE, en-ZA,
235250

236251
#### Introducing Bilingual Speech Modeling!
237252
We're thrilled to unveil a groundbreaking addition to our real-time speech modeling—Bilingual Speech Modeling. This significant enhancement allows our speech model to seamlessly support bilingual language pairs, such as English and Spanish, as well as English and French. This feature empowers users to effortlessly switch between languages during real-time interactions, marking a pivotal moment in our commitment to enhancing communication experiences.
238-
253+
239254
Key Highlights:
240255
- Bilingual Support: With our latest release, users can seamlessly switch between English and Spanish or between English and French during real-time speech interactions. This functionality is tailored to accommodate bilingual speakers who frequently transition between the two languages of a pair.
241256
- Enhanced User Experience: Bilingual speakers, whether at work, home, or in various community settings, will find this feature immensely beneficial. The model's ability to comprehend and respond to both languages of a supported pair in real time opens up new possibilities for effective and fluid communication.
242-
257+
243258
How to Use:
244259

245260
Choose es-US (Spanish and English) or fr-CA (French and English) when you call the Speech Service API or try it out on Speech Studio. Feel free to speak either language or mix them together—the model is designed to adapt dynamically, providing accurate and context-aware responses in both languages.
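
As a rough illustration of selecting a bilingual locale, the sketch below builds a request URL with the `language` query parameter set to `es-US` or `fr-CA`. It assumes the speech to text REST API for short audio as the endpoint. The helper name, the region value, and the locale table are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Locales named in the release note as bilingual (assumed mapping for this sketch).
BILINGUAL_LOCALES = {
    "es-US": "Spanish and English",
    "fr-CA": "French and English",
}


def recognition_url(region, locale):
    """Build a recognition URL for a bilingual locale (hypothetical helper)."""
    if locale not in BILINGUAL_LOCALES:
        raise ValueError(f"{locale!r} is not one of {sorted(BILINGUAL_LOCALES)}")
    query = urlencode({"language": locale})
    # Assumed endpoint shape: speech to text REST API for short audio.
    return (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )


if __name__ == "__main__":
    # "westus" is only a placeholder region.
    print(recognition_url("westus", "es-US"))
```

With the Speech SDK, the equivalent step would be setting the recognition language on the speech configuration to the same locale code.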
@@ -275,12 +290,12 @@ We encourage you to explore these improvements and consider potential issues for
275290

276291
#### Whisper public preview
277292

278-
Azure AI Speech now supports OpenAI's Whisper model via the batch transcription API. To learn more, check out the [Create a batch transcription](../../batch-transcription-create.md#use-a-whisper-model) guide.
293+
Azure AI Speech now supports OpenAI's Whisper model via the batch transcription API. To learn more, check out the [Create a batch transcription](../../batch-transcription-create.md#use-a-whisper-model) guide.
279294

280295
> [!NOTE]
281-
> Azure OpenAI Service also supports OpenAI's Whisper model for speech to text with a synchronous REST API. To learn more, check out the [quickstart](../../../openai/whisper-quickstart.md).
296+
> Azure OpenAI Service also supports OpenAI's Whisper model for speech to text with a synchronous REST API. To learn more, check out the [quickstart](../../../openai/whisper-quickstart.md).
282297
283-
Check out [What is the Whisper model?](../../whisper-overview.md) to learn more about when to use Azure AI Speech vs. Azure OpenAI Service.
298+
Check out [What is the Whisper model?](../../whisper-overview.md) to learn more about when to use Azure AI Speech vs. Azure OpenAI Service.
284299

285300
#### Speech to text REST API v3.2 public preview
286301

@@ -306,13 +321,13 @@ Speech to text supports two new locales as shown in the following table. Refer t
306321
#### Pronunciation Assessment
307322

308323
- Speech [Pronunciation Assessment](../../how-to-pronunciation-assessment.md) now supports 3 additional languages generally available in German (Germany), Japanese (Japan), and Spanish (Mexico), with 4 additional languages available in preview. For more information, see the full [language list for Pronunciation Assessment](../../language-support.md?tabs=pronunciation-assessment).
309-
- You can now use the standard Speech to Text commitment tier for pronunciation assessment on all public regions. If you purchase a commitment tier for standard Speech to text, the spend for pronunciation assessment goes towards meeting the commitment. See [commitment tier pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services).
324+
- You can now use the standard Speech to Text commitment tier for pronunciation assessment on all public regions. If you purchase a commitment tier for standard Speech to text, the spend for pronunciation assessment goes towards meeting the commitment. See [commitment tier pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services).
310325

311326
### February 2023 release
312327

313328
#### Pronunciation Assessment
314329

315-
- Speech [Pronunciation Assessment](../../how-to-pronunciation-assessment.md) now supports 5 additional languages generally available in English (United Kingdom), English (Australia), French (France), Spanish (Spain), and Chinese (Mandarin, Simplified), with other languages available in preview.
330+
- Speech [Pronunciation Assessment](../../how-to-pronunciation-assessment.md) now supports 5 additional languages generally available in English (United Kingdom), English (Australia), French (France), Spanish (Spain), and Chinese (Mandarin, Simplified), with other languages available in preview.
316331
- Added sample code showing how to use Pronunciation Assessment in streaming mode in your own application.
317332
- **C#**: See [sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/speech_recognition_samples.cs#:~:text=PronunciationAssessmentWithStream).
318333
- **C++**: See [sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/cpp/windows/console/samples/speech_recognition_samples.cpp#:~:text=PronunciationAssessmentWithStream).
@@ -510,4 +525,3 @@ Speech to text released 26 new locales in August: 2 European languages `cs-CZ` a
510525
| `es-UY` | Spanish (Uruguay) |
511526
| `es-VE` | Spanish (Venezuela) |
512527
| `hu-HU` | Hungarian (Hungary) |
513-
