Commit c4c75aa

Merge pull request #190548 from sally-baolian/TTS_Voice_Container

Standard TTS Container

2 parents e2a62e9 + 2d6c640

File tree

5 files changed: +13 -783 lines changed

articles/cognitive-services/Speech-Service/how-to-migrate-to-custom-neural-voice.md

Lines changed: 6 additions & 3 deletions
@@ -19,10 +19,12 @@ ms.author: v-baolianzou
 
 The custom neural voice lets you build higher-quality voice models while requiring less data. You can develop more realistic, natural, and conversational voices. Your customers and end users will benefit from the latest Text-to-Speech technology, in a responsible way.
 
-|Custom voice |Custom neural voice |
+|Custom voice |Custom neural voice |
 |--|--|
 | The standard, or "traditional," method of custom voice breaks down spoken language into phonetic snippets that can be remixed and matched using classical programming or statistical methods. | Custom neural voice synthesizes speech using deep neural networks that have "learned" the way phonetics are combined in natural human speech rather than using classical programming or statistical methods.|
-| Custom voice requires a large volume of voice data to produce a more human-like voice model. With fewer recorded lines, a standard custom voice model will tend to sound more obviously robotic. |The custom neural voice capability enables you to create a unique brand voice in multiple languages and styles by using a small set of recordings.|
+| Custom voice<sup>1</sup> requires a large volume of voice data to produce a more human-like voice model. With fewer recorded lines, a standard custom voice model will tend to sound more obviously robotic. |The custom neural voice capability enables you to create a unique brand voice in multiple languages and styles by using a small set of recordings.|
+
+<sup>1</sup> When creating a custom voice model, the maximum number of data files allowed to be imported per subscription is 10 .zip files for free subscription (F0) users, and 500 for standard subscription (S0) users.
 
 ## Action required

@@ -41,7 +43,7 @@ Before you can migrate to custom neural voice, your [application](https://aka.ms
 3. After the custom neural voice model is created, deploy the voice model to a new endpoint. To create a new custom voice endpoint with your neural voice model, go to **Text-to-Speech > Custom Voice > Deploy model**. Select **Deploy models** and enter a **Name** and **Description** for your custom endpoint. Then select the custom neural voice model you would like to associate with this endpoint and confirm the deployment.
 4. Update your code in your apps if you have created a new endpoint with a new model.
 
-## Custom voice details (retired)
+## Custom voice details (deprecated)
 
 Read the following sections for details on custom voice.

@@ -90,6 +92,7 @@ If you've created a custom voice font, use the endpoint that you've created. You
 | West US | `https://westus.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}` |
 | West US 2 | `https://westus2.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}` |
 
+
 ## Next steps
 
 > [!div class="nextstepaction"]
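As an editorial aside for readers of this diff: every regional endpoint row in the table above follows the same pattern, which can be sketched as a small helper. This is a hypothetical illustration, not part of the docs; the function name and region-slug parameter are assumptions.

```python
def custom_voice_endpoint(region: str, deployment_id: str) -> str:
    """Build a regional custom-voice endpoint URL following the pattern
    shown in the table above (region is the URL slug, e.g. "westus2")."""
    return (
        f"https://{region}.voice.speech.microsoft.com"
        f"/cognitiveservices/v1?deploymentId={deployment_id}"
    )

# Example: reproduce the West US 2 row of the table.
print(custom_voice_endpoint("westus2", "{deploymentId}"))
```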

articles/cognitive-services/Speech-Service/how-to-migrate-to-prebuilt-neural-voice.md

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ The prebuilt neural voice provides more natural sounding speech output, and thus
 1. Review the [price](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/) structure and listen to the neural voice [samples](https://azure.microsoft.com/services/cognitive-services/text-to-speech/#overview) at the bottom of that page to determine the right voice for your business needs.
 2. To make the change, [follow the sample code](speech-synthesis-markup.md#choose-a-voice-for-text-to-speech) to update the voice name in your speech synthesis request to one of the supported neural voice names in your chosen languages. Use neural voices for your speech synthesis requests, in the cloud or on-premises. For the on-premises container, use the [neural voice containers](../containers/container-image-tags.md) and follow the [instructions](speech-container-howto.md).
 
-## Standard voice details (retired)
+## Standard voice details (deprecated)
 
 Read the following sections for details on standard voice.

articles/cognitive-services/Speech-Service/includes/text-to-speech-container-query-endpoint.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ ms.author: eur
 
 The container provides [REST-based endpoint APIs](../rest-text-to-speech.md). Many [sample source code projects](https://github.com/Azure-Samples/Cognitive-Speech-TTS) for platform, framework, and language variations are available.
 
-With the standard or neural text-to-speech containers, you should rely on the locale and voice of the image tag you downloaded. For example, if you downloaded the `latest` tag, the default locale is `en-US` and the `AriaNeural` voice. The `{VOICE_NAME}` argument would then be [`en-US-AriaNeural`](../language-support.md#prebuilt-neural-voices). See the following example SSML:
+With the neural text-to-speech containers, you should rely on the locale and voice of the image tag you downloaded. For example, if you downloaded the `latest` tag, the default locale is `en-US` and the default voice is `AriaNeural`. The `{VOICE_NAME}` argument would then be [`en-US-AriaNeural`](../language-support.md#prebuilt-neural-voices). See the following example SSML:
 
 ```xml
 <speak version="1.0" xml:lang="en-US">
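(The diff truncates the SSML example here.) A minimal sketch of building such an SSML payload in code, assuming only the `speak`/`voice` structure described in the paragraph above; the helper name is hypothetical:

```python
def build_ssml(text: str, voice_name: str = "en-US-AriaNeural",
               lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML envelope. voice_name should match
    the locale and voice of the container image tag you pulled."""
    return (
        f'<speak version="1.0" xml:lang="{lang}">'
        f'<voice name="{voice_name}">{text}</voice>'
        "</speak>"
    )

print(build_ssml("Hello, world."))
```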

articles/cognitive-services/Speech-Service/speech-container-howto.md

Lines changed: 2 additions & 63 deletions
@@ -27,7 +27,6 @@ With Speech containers, you can build a speech application architecture that's o
 |--|--|--|--|
 | Speech-to-text | Analyzes sentiment and transcribes continuous real-time speech or batch audio recordings with intermediate results. | 3.0.0 | Generally available |
 | Custom speech-to-text | Using a custom model from the [Custom Speech portal](https://speech.microsoft.com/customspeech), transcribes continuous real-time speech or batch audio recordings into text with intermediate results. | 3.0.0 | Generally available |
-| Text-to-speech | Converts text to natural-sounding speech with plain text input or Speech Synthesis Markup Language (SSML). | 1.15.0 | Generally available |
 | Speech language identification | Detects the language spoken in audio files. | 1.5.0 | Preview |
 | Neural text-to-speech | Converts text to natural-sounding speech by using deep neural network technology, which allows for more natural synthesized speech. | 2.0.0 | Generally available |

@@ -58,7 +57,6 @@ The following table describes the minimum and recommended allocation of resource
 |-----------|---------|-------------|
 | Speech-to-text | 2 core, 2-GB memory | 4 core, 4-GB memory |
 | Custom speech-to-text | 2 core, 2-GB memory | 4 core, 4-GB memory |
-| Text-to-speech | 1 core, 2-GB memory | 2 core, 3-GB memory |
 | Speech language identification | 1 core, 1-GB memory | 1 core, 1-GB memory |
 | Neural text-to-speech | 6 core, 12-GB memory | 8 core, 16-GB memory |

@@ -101,12 +99,6 @@ Container images for Speech are available in the following container registry.
 |-----------|------------|
 | Custom speech-to-text | `mcr.microsoft.com/azure-cognitive-services/speechservices/custom-speech-to-text:latest` |
 
-# [Text-to-speech](#tab/tts)
-
-| Container | Repository |
-|-----------|------------|
-| Text-to-speech | `mcr.microsoft.com/azure-cognitive-services/speechservices/text-to-speech:latest` |
-
 # [Neural text-to-speech](#tab/ntts)
 
 | Container | Repository |
@@ -170,38 +162,6 @@ docker pull mcr.microsoft.com/azure-cognitive-services/speechservices/custom-spe
 > [!NOTE]
 > The `locale` and `voice` for custom Speech containers is determined by the custom model ingested by the container.
 
-# [Text-to-speech](#tab/tts)
-
-#### Docker pull for the text-to-speech container
-
-Use the [docker pull](https://docs.docker.com/engine/reference/commandline/pull/) command to download a container image from Microsoft Container Registry:
-
-```Docker
-docker pull mcr.microsoft.com/azure-cognitive-services/speechservices/text-to-speech:latest
-```
-
-> [!IMPORTANT]
-> The `latest` tag pulls the `en-US` locale and `ariarus` voice. For more locales, see [Text-to-speech locales](#text-to-speech-locales).
-
-#### Text-to-speech locales
-
-All tags, except for `latest`, are in the following format and are case sensitive:
-
-```
-<major>.<minor>.<patch>-<platform>-<locale>-<voice>-<prerelease>
-```
-
-The following tag is an example of the format:
-
-```
-1.8.0-amd64-en-us-ariarus
-```
-
-For all the supported locales and corresponding voices of the text-to-speech container, see [Text-to-speech image tags](../containers/container-image-tags.md#text-to-speech).
-
-> [!IMPORTANT]
-> When you construct a text-to-speech HTTP POST, the [SSML](speech-synthesis-markup.md) message requires a `voice` element with a `name` attribute. The value is the corresponding container locale and voice, which is also known as the [short name](how-to-migrate-to-prebuilt-neural-voice.md). For example, the `latest` tag would have a voice name of `en-US-AriaRUS`.
-
 
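The removed section above documents the image-tag format `<major>.<minor>.<patch>-<platform>-<locale>-<voice>[-<prerelease>]`. As an illustration only, here is a hypothetical parser for that format; it assumes a two-part locale such as `en-us` and treats the prerelease segment as optional:

```python
def parse_tts_tag(tag: str) -> dict:
    """Split a text-to-speech image tag of the form
    <major>.<minor>.<patch>-<platform>-<locale>-<voice>[-<prerelease>],
    assuming the locale is a two-part code such as en-us."""
    version, platform, lang, region, voice, *prerelease = tag.split("-")
    return {
        "version": version,
        "platform": platform,
        "locale": f"{lang}-{region}",
        "voice": voice,
        "prerelease": prerelease[0] if prerelease else None,
    }

# The example tag from the removed section:
print(parse_tts_tag("1.8.0-amd64-en-us-ariarus"))
```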
 # [Neural text-to-speech](#tab/ntts)
 
 #### Docker pull for the neural text-to-speech container
@@ -456,25 +416,6 @@ Checking available base model for en-us
 
 Starting in v2.5.0 of the custom-speech-to-text container, you can get custom pronunciation results in the output. All you need to do is have your own custom pronunciation rules set up in your custom model and mount the model to a custom-speech-to-text container.
 
-# [Text-to-speech](#tab/tts)
-
-To run the standard text-to-speech container, execute the following `docker run` command:
-
-```bash
-docker run --rm -it -p 5000:5000 --memory 2g --cpus 1 \
-mcr.microsoft.com/azure-cognitive-services/speechservices/text-to-speech \
-Eula=accept \
-Billing={ENDPOINT_URI} \
-ApiKey={API_KEY}
-```
-
-This command:
-
-* Runs a standard text-to-speech container from the container image.
-* Allocates 1 CPU core and 2 GB of memory.
-* Exposes TCP port 5000 and allocates a pseudo-TTY for the container.
-* Automatically removes the container after it exits. The container image is still available on the host computer.
-
 # [Neural text-to-speech](#tab/ntts)
 
 To run the neural text-to-speech container, execute the following `docker run` command:
@@ -534,7 +475,7 @@ Increasing the number of concurrent calls can affect reliability and latency. Fo
 | Containers | SDK Host URL | Protocol |
 |--|--|--|
 | Standard speech-to-text and custom speech-to-text | `ws://localhost:5000` | WS |
-| Text-to-speech (including standard and neural), Speech language identification | `http://localhost:5000` | HTTP |
+| Neural text-to-speech, Speech language identification | `http://localhost:5000` | HTTP |
 
 For more information on using WSS and HTTPS protocols, see [Container security](../cognitive-services-container-support.md#azure-cognitive-services-container-security).
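The WS-versus-HTTP split in the table above can be captured in a small helper. This is a hypothetical sketch that mirrors the table, not an official API; the container-type keys are invented names:

```python
# Scheme per container type, mirroring the SDK host URL table above.
CONTAINER_SCHEMES = {
    "speech-to-text": "ws",
    "custom-speech-to-text": "ws",
    "neural-text-to-speech": "http",
    "speech-language-identification": "http",
}

def sdk_host(container: str, hostname: str = "localhost",
             port: int = 5000) -> str:
    """Return the SDK host URL for a locally running Speech container."""
    return f"{CONTAINER_SCHEMES[container]}://{hostname}:{port}"

print(sdk_host("neural-text-to-speech"))
```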

@@ -660,7 +601,7 @@ speech_config.set_service_property(
 )
 ```
 
-### Text-to-speech (standard and neural)
+### Neural text-to-speech
 
 [!INCLUDE [Query Text-to-speech container endpoint](includes/text-to-speech-container-query-endpoint.md)]

@@ -699,8 +640,6 @@ In this article, you learned concepts and workflow for how to download, install,
 * Speech provides four Linux containers for Docker that have various capabilities:
   * Speech-to-text
   * Custom speech-to-text
-  * Text-to-speech
-  * Custom text-to-speech
   * Neural text-to-speech
   * Speech language identification
 * Container images are downloaded from the container registry in Azure.
