Commit 359abb6

release notes and updates
1 parent 768291e commit 359abb6

4 files changed: 54 additions & 9 deletions

articles/ai-services/speech-service/includes/release-notes/release-notes-stt.md

Lines changed: 22 additions & 3 deletions
@@ -8,9 +8,28 @@ ms.author: eur

 ### November 2023 release

+#### Speech To text models update
+
+We're excited to introduce a significant update to our speech models, promising enhanced accuracy, improved readability, and refined entity recognition. This upgrade comes with a robust new structure, bolstered by an expanded training dataset, ensuring a marked advancement in overall performance. It includes newly released models for en-US, zh-CN, ja-JP, it-IT, pt-BR, es-MX, es-ES, fr-FR, de-DE, ko-KR, tr-TR, sv-SE, and he-IL.
+
+Highlights:
+- Better accuracy with new model structure: The redefined model structure, coupled with a richer training dataset, elevates accuracy levels, promising more precise speech output.
+- Readability improvement: Our latest model brings a substantial boost to readability, enhancing the coherence and clarity of spoken content.
+- Advanced entity recognition: Entity recognition receives a substantial upgrade, resulting in more accurate and nuanced results.
+
+Potential impacts: Despite these advancements, it's crucial to be mindful of potential impacts:
+- Custom Silence Timeout Feature: Users employing custom silence timeout, especially with low settings, might encounter over-segmentation and potential omissions of single-word phrases.
+- The new model might exhibit compatibility issues with the Keyword prefix feature, and users are advised to assess its performance in their specific applications.
+- Reduced disfluency words or phrases: Users might notice a reduction in disfluency words or phrases like "um" or "uh" in the speech output.
+- Inaccuracies in word timestamp duration: Some disfluency words might display inaccuracies in timestamp duration, requiring attention in applications dependent on precise timing.
+- Confidence score distribution variance: Users relying on confidence scores and associated thresholds should be aware of potential variations in distribution, necessitating adjustments for optimal performance.
+- The accuracy enhancement of the phrase list feature might be affected by the misrecognition of certain phrases.
+
+We encourage you to explore these improvements and consider potential issues for a seamless transition, and as always, your feedback is instrumental in refining and advancing our services.
+
 #### Pronunciation Assessment

-- Speech [Pronunciation Assessment](../../how-to-pronunciation-assessment.md) now supports 18 languages generally available, with 6 additional languages available in public preview. For more information, see the full [language list for Pronunciation Assessment](../../language-support.md?tabs=pronunciation-assessment).
+- Speech [Pronunciation Assessment](../../how-to-pronunciation-assessment.md) now supports 18 languages generally available, with six more languages available in public preview. For more information, see the full [language list for Pronunciation Assessment](../../language-support.md?tabs=pronunciation-assessment).

 | Language | Locale (BCP-47) |
 |--|--|
@@ -41,7 +60,7 @@ ms.author: eur

 <sup>1</sup> The language is in public preview for pronunciation assessment.

-- We are excited to announce that Pronunciation Assessment is introducing new features starting November 1, 2023: Prosody, Grammar, Vocabulary, and Topic. These enhancements aim to provide an even more comprehensive language learning experience for both reading and speaking assessments. Explore further details in the [How to use pronunciation assessment](../../how-to-pronunciation-assessment.md) and [Pronunciation assessment in Speech Studio](../../pronunciation-assessment-tool.md).
+- We're excited to announce that Pronunciation Assessment is introducing new features starting November 1, 2023: Prosody, Grammar, Vocabulary, and Topic. These enhancements aim to provide an even more comprehensive language learning experience for both reading and speaking assessments. Explore further details in the [How to use pronunciation assessment](../../how-to-pronunciation-assessment.md) and [Pronunciation assessment in Speech Studio](../../pronunciation-assessment-tool.md).

 ### September 2023 release

@@ -99,7 +118,7 @@ Speech to text supports two new locales as shown in the following table. Refer t

 <sup>1</sup> The language is in public preview for pronunciation assessment.

-### May 2023 release
+### might 2023 release

 #### Pronunciation Assessment

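The confidence score note in the November 2023 speech to text entry says that thresholds might need adjusting after the model update. One possible way to approach that is sketched below in Python. It assumes you have collected confidence scores for the same utterances from both the previous and the updated model. The sample scores and the helper names `acceptance_rate` and `recalibrate_threshold` are hypothetical and not part of the Speech SDK. The idea is to choose a new threshold that accepts roughly the same share of results as the old one did.

```python
def acceptance_rate(scores, threshold):
    """Share of results whose confidence is at or above the threshold."""
    return sum(score >= threshold for score in scores) / len(scores)


def recalibrate_threshold(old_scores, new_scores, old_threshold):
    """Return a threshold for new_scores that keeps roughly the old acceptance rate."""
    target = acceptance_rate(old_scores, old_threshold)
    ranked = sorted(new_scores, reverse=True)
    keep = round(target * len(ranked))
    if keep == 0:
        # Nothing passed the old threshold, so nothing should pass the new one.
        return float("inf")
    return ranked[keep - 1]


# Hypothetical confidence scores for the same utterances from each model.
old_scores = [0.95, 0.90, 0.85, 0.60, 0.40]
new_scores = [0.88, 0.80, 0.72, 0.55, 0.30]

new_threshold = recalibrate_threshold(old_scores, new_scores, old_threshold=0.80)
print(new_threshold)  # 0.72
```

Matching the acceptance rate is only one heuristic. If labeled data is available, tuning the threshold against measured precision and recall is likely a better choice, and tied scores can push the actual acceptance rate slightly above the target.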
articles/ai-services/speech-service/includes/release-notes/release-notes-tts.md

Lines changed: 27 additions & 4 deletions
@@ -24,7 +24,30 @@ For more information, see [text to speech avatar](../../text-to-speech-avatar/wh

 #### Custom neural voice

-- Added support for the 24 new locales for cross-lingual voice. See the [full language list](../../language-support.md?tabs=tts#custom-neural-voice) for more information.
+Added support for the 24 new locales for cross-lingual voice. See the [full language list](../../language-support.md?tabs=tts#custom-neural-voice) for more information.
+
+#### Prebuilt neural voice
+Introducing new voices for public preview:
+
+| Locale (BCP-47) | Language | Text to speech voices |
+| ----- | ----- | ----- |
+| `de-DE` | German (Germany) | `SeraphinaNeural` (Female) |
+| `es-ES` | Spanish (Spain) | `XimenaNeural` (Female) |
+| `fr-CA` | French (Canada) | `ThierryNeural` (Male) |
+| `fr-FR` | French (France) | `VivienneNeural` (Female) |
+| `it-IT` | Italian (Italy) | `GiuseppeNeural` (Male) |
+| `ko-KR` | Korean (Korea) | `HyunsuNeural` (Male) |
+| `pt-BR` | Portuguese (Brazil) | `ThalitaNeural` (Female) |
+
+Models updated with bugs fixed and quality improvement:
+
+| Locale (BCP-47) | Language | Text to speech voices |
+| ----- | ----- | ----- |
+| `es-ES` | Spanish (Spain) | `AlvaroNeural` (Male) |
+| `en-GB` | English (United Kingdom) | `RyanNeural` (Male) |
+| `ko-KR` | Korean (Korea) | `InjoonNeural` (Male) |
+
+See the [full language and voice list](../../language-support.md?tabs=tts#custom-neural-voice) for more information.

 ### October 2023 release

@@ -99,7 +122,7 @@ Introducing new features in public preview for below voices:
 #### Audio Content Creation

 - All prebuilt voices with speaking styles and multi-style custom voices support style degree adjustment.
-- Now you can fix the pronunciation of a word by simply speaking the word and recording it. The phonemes can be automatically recognized from your recording. The **Recognize by speaking** feature is now in public previw.
+- Now you can fix the pronunciation of a word by speaking the word and recording it. The phonemes can be automatically recognized from your recording. The **Recognize by speaking** feature is now in public preview.

 ### April 2023 release

@@ -121,7 +144,7 @@ For more information, see the [language and voice list](../../language-support.m

 #### New features

-Speech Synthesis Markup Language (SSML) has been updated to support audio effect processor elements that optimize the quality of the synthesized speech output for specific scenarios on devices. Learn more at [speech synthesis markup](../../speech-synthesis-markup-voice.md#use-voice-elements).
+Speech Synthesis Markup Language (SSML) is updated to support audio effect processor elements that optimize the quality of the synthesized speech output for specific scenarios on devices. Learn more at [speech synthesis markup](../../speech-synthesis-markup-voice.md#use-voice-elements).

 #### Custom neural voice

@@ -158,7 +181,7 @@ The following voices are now generally available. See the [full language and voi

 #### Batch synthesis REST API (Preview)

-The Batch synthesis API is currently in public preview. Once it's generally available, the Long Audio API will be deprecated. For more information, see [Migrate to batch synthesis API](../../migrate-to-batch-synthesis.md).
+The Batch synthesis API is currently in public preview. Once it's generally available, the Long Audio API is deprecated. For more information, see [Migrate to batch synthesis API](../../migrate-to-batch-synthesis.md).

 ### November 2022 release

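To try one of the preview voices listed in the prebuilt neural voice table, a request typically names the voice in SSML. The Python sketch below builds such a document. It assumes the common `<locale>-<voice>` naming pattern (for example `de-DE-SeraphinaNeural`), which these release notes don't state, so check the language and voice list for the exact names. The `PREVIEW_VOICES` mapping and the `build_ssml` helper are illustrative and not part of any SDK.

```python
from xml.sax.saxutils import escape, quoteattr

# Voice names copied from the preview table; the locale prefix is an assumption.
PREVIEW_VOICES = {
    "de-DE": "SeraphinaNeural",
    "es-ES": "XimenaNeural",
    "fr-CA": "ThierryNeural",
    "fr-FR": "VivienneNeural",
    "it-IT": "GiuseppeNeural",
    "ko-KR": "HyunsuNeural",
    "pt-BR": "ThalitaNeural",
}


def build_ssml(locale, text):
    """Return an SSML document that speaks `text` with the preview voice for `locale`."""
    voice = f"{locale}-{PREVIEW_VOICES[locale]}"
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(locale)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


print(build_ssml("de-DE", "Guten Tag"))
```

The text is escaped so that characters such as `&` or `<` in the input don't break the markup.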
articles/ai-services/speech-service/sovereign-clouds.md

Lines changed: 4 additions & 2 deletions
@@ -34,8 +34,10 @@ Available to US government entities and their partners only. See more informatio
 - Neural voice
 - Speech translation
 - **Unsupported features:**
-- Custom Voice
-- Custom Commands
+- Custom commands
+- Custom neural voice
+- Personal voice
+- Text to speech avatar
 - **Supported languages:**
 - See the list of supported languages [here](language-support.md)

articles/ai-services/speech-service/speech-services-quotas-and-limits.md

Lines changed: 1 addition & 0 deletions
@@ -41,6 +41,7 @@ You can use real-time speech to text with the [Speech SDK](speech-sdk.md) or the
 |--|--|--|
 | Concurrent request limit - base model endpoint | 1 <br/><br/>This limit isn't adjustable. | 100 (default value)<br/><br/>The rate is adjustable for Standard (S0) resources. See [additional explanations](#detailed-description-quota-adjustment-and-best-practices), [best practices](#general-best-practices-to-mitigate-throttling-during-autoscaling), and [adjustment instructions](#speech-to-text-increase-real-time-speech-to-text-concurrent-request-limit). |
 | Concurrent request limit - custom endpoint | 1 <br/><br/>This limit isn't adjustable. | 100 (default value)<br/><br/>The rate is adjustable for Standard (S0) resources. See [additional explanations](#detailed-description-quota-adjustment-and-best-practices), [best practices](#general-best-practices-to-mitigate-throttling-during-autoscaling), and [adjustment instructions](#speech-to-text-increase-real-time-speech-to-text-concurrent-request-limit). |
+| Max audio length for [real-time diarization](./get-started-stt-diarization.md). | N/A | 240 minutes per file |

 #### Batch transcription

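The quota row added above caps audio for real-time diarization at 240 minutes per file. A client could check a recording against that limit before submitting it. The Python sketch below does this for PCM WAV input using only the standard library. The helper names and the in-memory silent test file are hypothetical, and other audio formats would need a different duration probe.

```python
import io
import wave

MAX_DIARIZATION_MINUTES = 240  # limit taken from the quota table row


def wav_duration_minutes(source):
    """Return the duration in minutes of a PCM WAV file (path or file-like object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate() / 60


def within_diarization_limit(source, limit_minutes=MAX_DIARIZATION_MINUTES):
    """True if the audio is no longer than the real-time diarization limit."""
    return wav_duration_minutes(source) <= limit_minutes


def make_silent_wav(seconds, rate=8000):
    """Build an in-memory silent mono 16-bit WAV, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    buffer.seek(0)
    return buffer


print(within_diarization_limit(make_silent_wav(60)))  # True
```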