articles/ai-services/speech-service/includes/release-notes/release-notes-stt.md
Lines changed: 22 additions & 3 deletions
@@ -8,9 +8,28 @@ ms.author: eur
 
 ### November 2023 release
 
+#### Speech To text models update
+
+We're excited to introduce a significant update to our speech models, promising enhanced accuracy, improved readability, and refined entity recognition. This upgrade comes with a robust new structure, bolstered by an expanded training dataset, ensuring a marked advancement in overall performance. It includes newly released models for en-US, zh-CN, ja-JP, it-IT, pt-BR, es-MX, es-ES, fr-FR, de-DE, ko-KR, tr-TR, sv-SE, and he-IL.
+
+Highlights:
+- Better accuracy with new model structure: The redefined model structure, coupled with a richer training dataset, elevates accuracy levels, promising more precise speech output.
+- Readability improvement: Our latest model brings a substantial boost to readability, enhancing the coherence and clarity of spoken content.
+- Advanced entity recognition: Entity recognition receives a substantial upgrade, resulting in more accurate and nuanced results.
+
+Potential impacts: Despite these advancements, it's crucial to be mindful of potential impacts:
+- Custom Silence Timeout Feature: Users employing custom silence timeout, especially with low settings, might encounter over-segmentation and potential omissions of single-word phrases.
+- The new model might exhibit compatibility issues with the Keyword prefix feature, and users are advised to assess its performance in their specific applications.
+- Reduced disfluency words or phrases: Users might notice a reduction in disfluency words or phrases like "um" or "uh" in the speech output.
+- Inaccuracies in word timestamp duration: Some disfluency words might display inaccuracies in timestamp duration, requiring attention in applications dependent on precise timing.
+- Confidence score distribution variance: Users relying on confidence scores and associated thresholds should be aware of potential variations in distribution, necessitating adjustments for optimal performance.
+- The accuracy enhancement of the phrase list feature might be affected by the misrecognition of certain phrases.
+
+We encourage you to explore these improvements and consider potential issues for a seamless transition, and as always, your feedback is instrumental in refining and advancing our services.
+
 #### Pronunciation Assessment
 
-- Speech [Pronunciation Assessment](../../how-to-pronunciation-assessment.md) now supports 18 languages generally available, with 6 additional languages available in public preview. For more information, see the full [language list for Pronunciation Assessment](../../language-support.md?tabs=pronunciation-assessment).
+- Speech [Pronunciation Assessment](../../how-to-pronunciation-assessment.md) now supports 18 languages generally available, with six more languages available in public preview. For more information, see the full [language list for Pronunciation Assessment](../../language-support.md?tabs=pronunciation-assessment).
 
 | Language | Locale (BCP-47) |
 |--|--|
@@ -41,7 +60,7 @@ ms.author: eur
 
 <sup>1</sup> The language is in public preview for pronunciation assessment.
 
-- We are excited to announce that Pronunciation Assessment is introducing new features starting November 1, 2023: Prosody, Grammar, Vocabulary, and Topic. These enhancements aim to provide an even more comprehensive language learning experience for both reading and speaking assessments. Explore further details in the [How to use pronunciation assessment](../../how-to-pronunciation-assessment.md) and [Pronunciation assessment in Speech Studio](../../pronunciation-assessment-tool.md).
+- We're excited to announce that Pronunciation Assessment is introducing new features starting November 1, 2023: Prosody, Grammar, Vocabulary, and Topic. These enhancements aim to provide an even more comprehensive language learning experience for both reading and speaking assessments. Explore further details in the [How to use pronunciation assessment](../../how-to-pronunciation-assessment.md) and [Pronunciation assessment in Speech Studio](../../pronunciation-assessment-tool.md).
 
 ### September 2023 release
 
@@ -99,7 +118,7 @@ Speech to text supports two new locales as shown in the following table. Refer t
 
 <sup>1</sup> The language is in public preview for pronunciation assessment.
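
The "Potential impacts" list added in the November 2023 notes above says that confidence score distributions can shift with the new models and that thresholds might need adjusting. One way to approach that is sketched below, as an illustration only: it assumes you have confidence scores for the same sample of utterances from the previous and the updated model, and it picks a new threshold that accepts roughly the same share of utterances as before. The function names `acceptance_rate` and `matched_threshold` are hypothetical helpers, not part of the Speech SDK, and matching the acceptance rate is just one possible calibration policy.

```python
import math
from typing import Sequence


def acceptance_rate(scores: Sequence[float], threshold: float) -> float:
    """Share of utterances whose confidence is at or above the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(score >= threshold for score in scores) / len(scores)


def matched_threshold(
    old_scores: Sequence[float],
    new_scores: Sequence[float],
    old_threshold: float,
) -> float:
    """Pick a threshold for new_scores that accepts about the same share
    of utterances as old_threshold accepted for old_scores.

    Assumption: both score lists come from the same audio sample, so any
    difference reflects the model change rather than the data.
    """
    if not new_scores:
        raise ValueError("new_scores must not be empty")
    rate = acceptance_rate(old_scores, old_threshold)
    ranked = sorted(new_scores, reverse=True)
    keep = round(rate * len(ranked))  # how many utterances to keep accepting
    if keep == 0:
        # Nothing passed the old threshold, so accept nothing here either.
        return math.inf
    # The lowest score among the top `keep` utterances becomes the threshold.
    return ranked[keep - 1]


# Made-up scores, for illustration only.
old = [0.9, 0.8, 0.7, 0.6]
new = [0.95, 0.85, 0.80, 0.5]
print(matched_threshold(old, new, old_threshold=0.75))
```

Tied scores at the cut-off can push the resulting acceptance rate slightly above the original one, so it is worth checking the result with `acceptance_rate` on held-out data before changing a production threshold.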
articles/ai-services/speech-service/includes/release-notes/release-notes-tts.md
Lines changed: 27 additions & 4 deletions
@@ -24,7 +24,30 @@ For more information, see [text to speech avatar](../../text-to-speech-avatar/wh
 
 #### Custom neural voice
 
-- Added support for the 24 new locales for cross-lingual voice. See the [full language list](../../language-support.md?tabs=tts#custom-neural-voice) for more information.
+Added support for the 24 new locales for cross-lingual voice. See the [full language list](../../language-support.md?tabs=tts#custom-neural-voice) for more information.
+
+#### Prebuilt neural voice
+Introducing new voices for public preview:
+
+| Locale (BCP-47) | Language | Text to speech voices |
+| ----- | ----- | ----- |
+|`de-DE`| German (Germany) |`SeraphinaNeural` (Female) |
+|`en-GB`| English (United Kingdom) |`RyanNeural` (Male) |
+|`ko-KR`| Korean (Korea) |`InjoonNeural` (Male) |
+
+See the [full language and voice list](../../language-support.md?tabs=tts#custom-neural-voice) for more information.
 
 ### October 2023 release
 
@@ -99,7 +122,7 @@ Introducing new features in public preview for below voices:
 #### Audio Content Creation
 
 - All prebuilt voices with speaking styles and multi-style custom voices support style degree adjustment.
-- Now you can fix the pronunciation of a word by simply speaking the word and recording it. The phonemes can be automatically recognized from your recording. The **Recognize by speaking** feature is now in public previw.
+- Now you can fix the pronunciation of a word by speaking the word and recording it. The phonemes can be automatically recognized from your recording. The **Recognize by speaking** feature is now in public preview.
 
 ### April 2023 release
 
@@ -121,7 +144,7 @@ For more information, see the [language and voice list](../../language-support.m
 
 #### New features
 
-Speech Synthesis Markup Language (SSML) has been updated to support audio effect processor elements that optimize the quality of the synthesized speech output for specific scenarios on devices. Learn more at [speech synthesis markup](../../speech-synthesis-markup-voice.md#use-voice-elements).
+Speech Synthesis Markup Language (SSML) is updated to support audio effect processor elements that optimize the quality of the synthesized speech output for specific scenarios on devices. Learn more at [speech synthesis markup](../../speech-synthesis-markup-voice.md#use-voice-elements).
 
 #### Custom neural voice
 
@@ -158,7 +181,7 @@ The following voices are now generally available. See the [full language and voi
 
 #### Batch synthesis REST API (Preview)
 
-The Batch synthesis API is currently in public preview. Once it's generally available, the Long Audio API will be deprecated. For more information, see [Migrate to batch synthesis API](../../migrate-to-batch-synthesis.md).
+The Batch synthesis API is currently in public preview. Once it's generally available, the Long Audio API is deprecated. For more information, see [Migrate to batch synthesis API](../../migrate-to-batch-synthesis.md).
articles/ai-services/speech-service/speech-services-quotas-and-limits.md
Lines changed: 1 addition & 0 deletions
@@ -41,6 +41,7 @@ You can use real-time speech to text with the [Speech SDK](speech-sdk.md) or the
 |--|--|--|
 | Concurrent request limit - base model endpoint | 1 <br/><br/>This limit isn't adjustable. | 100 (default value)<br/><br/>The rate is adjustable for Standard (S0) resources. See [additional explanations](#detailed-description-quota-adjustment-and-best-practices), [best practices](#general-best-practices-to-mitigate-throttling-during-autoscaling), and [adjustment instructions](#speech-to-text-increase-real-time-speech-to-text-concurrent-request-limit). |
 | Concurrent request limit - custom endpoint | 1 <br/><br/>This limit isn't adjustable. | 100 (default value)<br/><br/>The rate is adjustable for Standard (S0) resources. See [additional explanations](#detailed-description-quota-adjustment-and-best-practices), [best practices](#general-best-practices-to-mitigate-throttling-during-autoscaling), and [adjustment instructions](#speech-to-text-increase-real-time-speech-to-text-concurrent-request-limit). |
+| Max audio length for [real-time diarization](./get-started-stt-diarization.md). | N/A | 240 minutes per file |
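
The new quota row caps real-time diarization at 240 minutes of audio per file. A client could check a file's length locally before sending it; the sketch below does this for uncompressed WAV files using only Python's standard `wave` module. The helper names and the constant `MAX_DIARIZATION_MINUTES` are hypothetical, and treating a file of exactly 240 minutes as allowed is an assumption, since the table doesn't say whether the limit is inclusive.

```python
import wave

# Taken from the quota table row above; the name is illustrative.
MAX_DIARIZATION_MINUTES = 240


def wav_duration_seconds(path: str) -> float:
    """Return the playback length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Frames divided by frames-per-second gives the duration.
        return wav.getnframes() / wav.getframerate()


def within_diarization_limit(
    path: str, max_minutes: float = MAX_DIARIZATION_MINUTES
) -> bool:
    """True when the file is no longer than the limit.

    Assumption: the limit is inclusive; the source table doesn't specify.
    """
    return wav_duration_seconds(path) <= max_minutes * 60
```

Compressed formats such as MP3 or Opus would need a different way to read the duration, because `wave` only handles WAV containers.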