Commit 47c51e4

Merge pull request #192123 from eric-urban/docs-editor/text-to-speech-1647615816
trim notes
2 parents 6d7f0f5 + 7898591 commit 47c51e4

1 file changed: +12 -15 lines changed
articles/cognitive-services/Speech-Service/text-to-speech.md

Lines changed: 12 additions & 15 deletions
@@ -16,27 +16,24 @@ keywords: text to speech

 # What is text-to-speech?

-In this overview, you learn about the benefits and capabilities of the text-to-speech feature of the Speech service, which is part of Azure Cognitive Services.
+In this overview, you learn about the benefits and capabilities of the text-to-speech feature of the Speech service, which is part of Azure Cognitive Services.

 Text-to-speech enables your applications, tools, or devices to convert text into humanlike synthesized speech. The text-to-speech capability is also known as speech synthesis. Use humanlike prebuilt neural voices out of the box, or create a custom neural voice that's unique to your product or brand. For a full list of supported voices, languages, and locales, see [Language and voice support for the Speech service](language-support.md#text-to-speech).

-> [!NOTE]
->
-> Bing Speech was decommissioned on October 15, 2019. If your applications, tools, or products are using the Bing Speech APIs or Custom Speech, see [Migrate from Bing Speech to the Speech service](how-to-migrate-from-bing-speech.md).
-
 ## Core features

-Text-to-speech includes the following features:
+Text-to-speech includes the following features:

-| Feature| Summary | Demo |
-|--------|----|------|
+| Feature | Summary | Demo |
+| --- | --- | --- |
 | Prebuilt neural voice (called *Neural* on the [pricing page](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/)) | Highly natural out-of-the-box voices. Create an Azure account and Speech service subscription, and then use the [Speech SDK](./get-started-text-to-speech.md) or visit the [Speech Studio portal](https://speech.microsoft.com/portal) and select prebuilt neural voices to get started. Check the [pricing details](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/). | Check the [voice samples](https://azure.microsoft.com/services/cognitive-services/text-to-speech/#overview) and determine the right voice for your business needs. |
 | Custom neural voice (called *Custom Neural* on the [pricing page](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/)) | Easy-to-use self-service for creating a natural brand voice, with limited access for responsible use. Create an Azure account and Speech service subscription (with the S0 tier), and [apply](https://aka.ms/customneural) to use the custom neural feature. After you've been granted access, visit the [Speech Studio portal](https://speech.microsoft.com/portal) and select **Custom Voice** to get started. Check the [pricing details](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/). | Check the [voice samples](https://aka.ms/customvoice). |

 ### More about neural text-to-speech features
+
 The text-to-speech feature of the Speech service on Azure has been fully upgraded to the neural text-to-speech engine. This engine uses deep neural networks to make the voices of computers nearly indistinguishable from the recordings of people. With the clear articulation of words, neural text-to-speech significantly reduces listening fatigue when users interact with AI systems.

-The patterns of stress and intonation in spoken language are called _prosody_. Traditional text-to-speech systems break down prosody into separate linguistic analysis and acoustic prediction steps that are governed by independent models. That can result in muffled, buzzy voice synthesis.
+The patterns of stress and intonation in spoken language are called _prosody_. Traditional text-to-speech systems break down prosody into separate linguistic analysis and acoustic prediction steps that are governed by independent models. That can result in muffled, buzzy voice synthesis.

 Here's more information about neural text-to-speech features in the Speech service, and how they overcome the limits of traditional text-to-speech systems:

@@ -48,22 +45,22 @@ Here's more information about neural text-to-speech features in the Speech servi

 - Make interactions with chatbots and voice assistants more natural and engaging.
 - Convert digital texts such as e-books into audiobooks.
-- Enhance in-car navigation systems.
-
+- Enhance in-car navigation systems.
+
 For a full list of platform neural voices, see [Language and voice support for the Speech service](language-support.md#text-to-speech).

-* **Fine-tuning text-to-speech output with SSML**: Speech Synthesis Markup Language (SSML) is an XML-based markup language that's used to customize text-to-speech outputs. With SSML, you can adjust pitch, add pauses, improve pronunciation, change speaking rate, adjust volume, and attribute multiple voices to a single document.
+* **Fine-tuning text-to-speech output with SSML**: Speech Synthesis Markup Language (SSML) is an XML-based markup language that's used to customize text-to-speech outputs. With SSML, you can adjust pitch, add pauses, improve pronunciation, change speaking rate, adjust volume, and attribute multiple voices to a single document.

 You can use SSML to define your own lexicons or switch to different speaking styles. With the [multilingual voices](https://techcommunity.microsoft.com/t5/azure-ai/azure-text-to-speech-updates-at-build-2021/ba-p/2382981), you can also adjust the speaking languages via SSML. To fine-tune the voice output for your scenario, see [Improve synthesis with Speech Synthesis Markup Language](speech-synthesis-markup.md).

-* **Visemes**: [Visemes](how-to-speech-synthesis-viseme.md) are the key poses in observed speech, including the position of the lips, jaw, and tongue in producing a particular phoneme. Visemes have a strong correlation with voices and phonemes.
+* **Visemes**: [Visemes](how-to-speech-synthesis-viseme.md) are the key poses in observed speech, including the position of the lips, jaw, and tongue in producing a particular phoneme. Visemes have a strong correlation with voices and phonemes.

 By using viseme events in Speech SDK, you can generate facial animation data. This data can be used to animate faces in lip-reading communication, education, entertainment, and customer service. Viseme is currently supported only for the `en-US` (US English) [neural voices](language-support.md#text-to-speech).

 > [!NOTE]
 > We plan to retire the traditional/standard voices and non-neural custom voice in 2024. After that, we'll no longer support them.
->
-> If your applications, tools, or products are using any of the standard voices and custom voices, we've created guides to help you migrate to the neural version. For more information, see [Migrate to neural voices](migration-overview-neural-voice.md).
+>
+> If your applications, tools, or products are using any of the standard voices and custom voices, you must migrate to the neural version. For more information, see [Migrate to neural voices](migration-overview-neural-voice.md).

 ## Get started

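Reviewer note on the SSML bullet in the diff above: the article says SSML can adjust pitch, speaking rate, and volume, add pauses, and attribute multiple voices to a single document. A minimal sketch of what such a document might look like, built with Python's standard library only. The helper `build_ssml`, the segment dictionary keys, and the voice names are illustrative assumptions and are not taken from the article; the `speak`, `voice`, `prosody`, and `break` elements come from the W3C SSML vocabulary. Check the linked SSML article for what the service actually accepts.

```python
import xml.etree.ElementTree as ET

# W3C SSML namespace, declared on the root <speak> element.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(segments, lang="en-US"):
    """Build an SSML string from a list of segment dicts (hypothetical helper).

    Each segment needs "voice" and "text"; "rate", "pitch", "volume",
    and "pause_ms" are optional.
    """
    speak = ET.Element(
        "speak", {"version": "1.0", "xmlns": SSML_NS, "xml:lang": lang}
    )
    for seg in segments:
        # One <voice> element per segment lets a single document use several voices.
        voice = ET.SubElement(speak, "voice", {"name": seg["voice"]})
        # <prosody> carries the pitch, rate, and volume adjustments.
        attrs = {k: seg[k] for k in ("rate", "pitch", "volume") if k in seg}
        prosody = ET.SubElement(voice, "prosody", attrs)
        prosody.text = seg["text"]
        # <break> inserts a pause after the spoken text.
        if seg.get("pause_ms"):
            ET.SubElement(voice, "break", {"time": f"{seg['pause_ms']}ms"})
    return ET.tostring(speak, encoding="unicode")


# Voice names below are assumptions used only for illustration.
ssml = build_ssml([
    {"voice": "en-US-JennyNeural", "text": "Welcome back.",
     "rate": "-10%", "pitch": "+5%", "pause_ms": 300},
    {"voice": "en-US-GuyNeural", "text": "Here is your route.",
     "volume": "loud"},
])
print(ssml)
```

The resulting string is what would be handed to a synthesis call that accepts SSML; building it with an XML library rather than string concatenation keeps special characters in the text escaped.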

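Reviewer note on the viseme paragraph in the diff above: the article says viseme events from the Speech SDK can be used to generate facial animation data. A rough sketch of that post-processing step, assuming each event has already been captured as an `(audio_offset_ticks, viseme_id)` pair with the offset in 100-nanosecond ticks. The tuple shape, the tick unit, the `Keyframe` type, and `visemes_to_keyframes` are assumptions for illustration; no SDK call is made here.

```python
from dataclasses import dataclass

# Assumption: offsets arrive in 100-nanosecond ticks, so 10,000 ticks = 1 ms.
TICKS_PER_MS = 10_000


@dataclass
class Keyframe:
    time_ms: float      # when the mouth pose starts
    viseme_id: int      # which pose to show
    duration_ms: float  # how long to hold it


def visemes_to_keyframes(events, final_hold_ms=100.0):
    """Turn (audio_offset_ticks, viseme_id) pairs into timed keyframes.

    Each pose is held until the next event begins; the last pose is held
    for final_hold_ms because no later event bounds it.
    """
    ordered = sorted(events)
    keyframes = []
    for i, (offset, viseme_id) in enumerate(ordered):
        start = offset / TICKS_PER_MS
        if i + 1 < len(ordered):
            duration = ordered[i + 1][0] / TICKS_PER_MS - start
        else:
            duration = final_hold_ms
        keyframes.append(Keyframe(start, viseme_id, duration))
    return keyframes


# Made-up sample events, not real service output.
events = [(500_000, 0), (1_250_000, 19), (2_000_000, 6)]
for frame in visemes_to_keyframes(events):
    print(frame)
```

An animation layer could then map each `viseme_id` to a mouth shape and interpolate between consecutive keyframes.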