Commit c4c75aa

Merge pull request #190548 from sally-baolian/TTS_Voice_Container

Standard TTS Container

2 parents e2a62e9 + 2d6c640

File tree

5 files changed: +13 -783 lines changed

articles/cognitive-services/Speech-Service/how-to-migrate-to-custom-neural-voice.md

Lines changed: 6 additions & 3 deletions
@@ -19,10 +19,12 @@ ms.author: v-baolianzou
 
 The custom neural voice lets you build higher-quality voice models while requiring less data. You can develop more realistic, natural, and conversational voices. Your customers and end users will benefit from the latest Text-to-Speech technology, in a responsible way.
 
-|Custom voice |Custom neural voice |
+|Custom voice |Custom neural voice |
 |--|--|
 | The standard, or "traditional," method of custom voice breaks down spoken language into phonetic snippets that can be remixed and matched using classical programming or statistical methods. | Custom neural voice synthesizes speech using deep neural networks that have "learned" the way phonetics are combined in natural human speech rather than using classical programming or statistical methods.|
-| Custom voice requires a large volume of voice data to produce a more human-like voice model. With fewer recorded lines, a standard custom voice model will tend to sound more obviously robotic. |The custom neural voice capability enables you to create a unique brand voice in multiple languages and styles by using a small set of recordings.|
+| Custom voice<sup>1</sup> requires a large volume of voice data to produce a more human-like voice model. With fewer recorded lines, a standard custom voice model will tend to sound more obviously robotic. |The custom neural voice capability enables you to create a unique brand voice in multiple languages and styles by using a small set of recordings.|
+
+<sup>1</sup> When creating a custom voice model, the maximum number of data files allowed to be imported per subscription is 10 .zip files for free subscription (F0) users, and 500 for standard subscription (S0) users.
 
 ## Action required

@@ -41,7 +43,7 @@ Before you can migrate to custom neural voice, your [application](https://aka.ms
 3. After the custom neural voice model is created, deploy the voice model to a new endpoint. To create a new custom voice endpoint with your neural voice model, go to **Text-to-Speech > Custom Voice > Deploy model**. Select **Deploy models** and enter a **Name** and **Description** for your custom endpoint. Then select the custom neural voice model you would like to associate with this endpoint and confirm the deployment.
 4. Update your code in your apps if you have created a new endpoint with a new model.
 
-## Custom voice details (retired)
+## Custom voice details (deprecated)
 
 Read the following sections for details on custom voice.

@@ -90,6 +92,7 @@ If you've created a custom voice font, use the endpoint that you've created. You
 | West US | `https://westus.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}` |
 | West US 2 | `https://westus2.voice.speech.microsoft.com/cognitiveservices/v1?deploymentId={deploymentId}` |
 
+
 ## Next steps
 
 > [!div class="nextstepaction"]
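As an editorial aside for readers of this diff: every regional endpoint row in the table above follows the same pattern, which can be sketched as a small helper. This is a hypothetical illustration, not part of the docs; the function name and region-slug parameter are assumptions.

```python
def custom_voice_endpoint(region: str, deployment_id: str) -> str:
    """Build a regional custom-voice endpoint URL following the pattern
    shown in the table above (region is the URL slug, e.g. "westus2")."""
    return (
        f"https://{region}.voice.speech.microsoft.com"
        f"/cognitiveservices/v1?deploymentId={deployment_id}"
    )

# Example: reproduce the West US 2 row of the table.
print(custom_voice_endpoint("westus2", "{deploymentId}"))
```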

articles/cognitive-services/Speech-Service/how-to-migrate-to-prebuilt-neural-voice.md

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ The prebuilt neural voice provides more natural sounding speech output, and thus
 1. Review the [price](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/) structure and listen to the neural voice [samples](https://azure.microsoft.com/services/cognitive-services/text-to-speech/#overview) at the bottom of that page to determine the right voice for your business needs.
 2. To make the change, [follow the sample code](speech-synthesis-markup.md#choose-a-voice-for-text-to-speech) to update the voice name in your speech synthesis request to one of the supported neural voice names in your chosen languages. Use neural voices for your speech synthesis requests, in the cloud or on-premises. For the on-premises container, use the [neural voice containers](../containers/container-image-tags.md) and follow the [instructions](speech-container-howto.md).
 
-## Standard voice details (retired)
+## Standard voice details (deprecated)
 
 Read the following sections for details on standard voice.

articles/cognitive-services/Speech-Service/includes/text-to-speech-container-query-endpoint.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ ms.author: eur
 
 The container provides [REST-based endpoint APIs](../rest-text-to-speech.md). Many [sample source code projects](https://github.com/Azure-Samples/Cognitive-Speech-TTS) for platform, framework, and language variations are available.
 
-With the standard or neural text-to-speech containers, you should rely on the locale and voice of the image tag you downloaded. For example, if you downloaded the `latest` tag, the default locale is `en-US` and the `AriaNeural` voice. The `{VOICE_NAME}` argument would then be [`en-US-AriaNeural`](../language-support.md#prebuilt-neural-voices). See the following example SSML:
+With the neural text-to-speech containers, you should rely on the locale and voice of the image tag you downloaded. For example, if you downloaded the `latest` tag, the default locale is `en-US` and the default voice is `AriaNeural`. The `{VOICE_NAME}` argument would then be [`en-US-AriaNeural`](../language-support.md#prebuilt-neural-voices). See the following example SSML:
 
 ```xml
 <speak version="1.0" xml:lang="en-US">
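(The diff truncates the SSML example here.) A minimal sketch of building such an SSML payload in code, assuming only the `speak`/`voice` structure described in the paragraph above; the helper name is hypothetical:

```python
def build_ssml(text: str, voice_name: str = "en-US-AriaNeural",
               lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML envelope. voice_name should match
    the locale and voice of the container image tag you pulled."""
    return (
        f'<speak version="1.0" xml:lang="{lang}">'
        f'<voice name="{voice_name}">{text}</voice>'
        "</speak>"
    )

print(build_ssml("Hello, world."))
```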

articles/cognitive-services/Speech-Service/speech-container-howto.md

Lines changed: 2 additions & 63 deletions
@@ -27,7 +27,6 @@ With Speech containers, you can build a speech application architecture that's o
 |--|--|--|--|
 | Speech-to-text | Analyzes sentiment and transcribes continuous real-time speech or batch audio recordings with intermediate results. | 3.0.0 | Generally available |
 | Custom speech-to-text | Using a custom model from the [Custom Speech portal](https://speech.microsoft.com/customspeech), transcribes continuous real-time speech or batch audio recordings into text with intermediate results. | 3.0.0 | Generally available |
-| Text-to-speech | Converts text to natural-sounding speech with plain text input or Speech Synthesis Markup Language (SSML). | 1.15.0 | Generally available |
 | Speech language identification | Detects the language spoken in audio files. | 1.5.0 | Preview |
 | Neural text-to-speech | Converts text to natural-sounding speech by using deep neural network technology, which allows for more natural synthesized speech. | 2.0.0 | Generally available |

@@ -58,7 +57,6 @@ The following table describes the minimum and recommended allocation of resource
 |-----------|---------|-------------|
 | Speech-to-text | 2 core, 2-GB memory | 4 core, 4-GB memory |
 | Custom speech-to-text | 2 core, 2-GB memory | 4 core, 4-GB memory |
-| Text-to-speech | 1 core, 2-GB memory | 2 core, 3-GB memory |
 | Speech language identification | 1 core, 1-GB memory | 1 core, 1-GB memory |
 | Neural text-to-speech | 6 core, 12-GB memory | 8 core, 16-GB memory |

@@ -101,12 +99,6 @@ Container images for Speech are available in the following container registry.
 |-----------|------------|
 | Custom speech-to-text | `mcr.microsoft.com/azure-cognitive-services/speechservices/custom-speech-to-text:latest` |
 
-# [Text-to-speech](#tab/tts)
-
-| Container | Repository |
-|-----------|------------|
-| Text-to-speech | `mcr.microsoft.com/azure-cognitive-services/speechservices/text-to-speech:latest` |
-
 # [Neural text-to-speech](#tab/ntts)
 
 | Container | Repository |
@@ -170,38 +162,6 @@ docker pull mcr.microsoft.com/azure-cognitive-services/speechservices/custom-spe
 > [!NOTE]
 > The `locale` and `voice` for custom Speech containers is determined by the custom model ingested by the container.
 
-# [Text-to-speech](#tab/tts)
-
-#### Docker pull for the text-to-speech container
-
-Use the [docker pull](https://docs.docker.com/engine/reference/commandline/pull/) command to download a container image from Microsoft Container Registry:
-
-```Docker
-docker pull mcr.microsoft.com/azure-cognitive-services/speechservices/text-to-speech:latest
-```
-
-> [!IMPORTANT]
-> The `latest` tag pulls the `en-US` locale and `ariarus` voice. For more locales, see [Text-to-speech locales](#text-to-speech-locales).
-
-#### Text-to-speech locales
-
-All tags, except for `latest`, are in the following format and are case sensitive:
-
-```
-<major>.<minor>.<patch>-<platform>-<locale>-<voice>-<prerelease>
-```
-
-The following tag is an example of the format:
-
-```
-1.8.0-amd64-en-us-ariarus
-```
-
-For all the supported locales and corresponding voices of the text-to-speech container, see [Text-to-speech image tags](../containers/container-image-tags.md#text-to-speech).
-
-> [!IMPORTANT]
-> When you construct a text-to-speech HTTP POST, the [SSML](speech-synthesis-markup.md) message requires a `voice` element with a `name` attribute. The value is the corresponding container locale and voice, which is also known as the [short name](how-to-migrate-to-prebuilt-neural-voice.md). For example, the `latest` tag would have a voice name of `en-US-AriaRUS`.
-
 
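The removed section above documents the image-tag format `<major>.<minor>.<patch>-<platform>-<locale>-<voice>[-<prerelease>]`. As an illustration only, here is a hypothetical parser for that format; it assumes a two-part locale such as `en-us` and treats the prerelease segment as optional:

```python
def parse_tts_tag(tag: str) -> dict:
    """Split a text-to-speech image tag of the form
    <major>.<minor>.<patch>-<platform>-<locale>-<voice>[-<prerelease>],
    assuming the locale is a two-part code such as en-us."""
    version, platform, lang, region, voice, *prerelease = tag.split("-")
    return {
        "version": version,
        "platform": platform,
        "locale": f"{lang}-{region}",
        "voice": voice,
        "prerelease": prerelease[0] if prerelease else None,
    }

# The example tag from the removed section:
print(parse_tts_tag("1.8.0-amd64-en-us-ariarus"))
```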
 # [Neural text-to-speech](#tab/ntts)
 
 #### Docker pull for the neural text-to-speech container
@@ -456,25 +416,6 @@ Checking available base model for en-us
 
 Starting in v2.5.0 of the custom-speech-to-text container, you can get custom pronunciation results in the output. All you need to do is have your own custom pronunciation rules set up in your custom model and mount the model to a custom-speech-to-text container.
 
-# [Text-to-speech](#tab/tts)
-
-To run the standard text-to-speech container, execute the following `docker run` command:
-
-```bash
-docker run --rm -it -p 5000:5000 --memory 2g --cpus 1 \
-mcr.microsoft.com/azure-cognitive-services/speechservices/text-to-speech \
-Eula=accept \
-Billing={ENDPOINT_URI} \
-ApiKey={API_KEY}
-```
-
-This command:
-
-* Runs a standard text-to-speech container from the container image.
-* Allocates 1 CPU core and 2 GB of memory.
-* Exposes TCP port 5000 and allocates a pseudo-TTY for the container.
-* Automatically removes the container after it exits. The container image is still available on the host computer.
-
 # [Neural text-to-speech](#tab/ntts)
 
 To run the neural text-to-speech container, execute the following `docker run` command:
@@ -534,7 +475,7 @@ Increasing the number of concurrent calls can affect reliability and latency. Fo
 | Containers | SDK Host URL | Protocol |
 |--|--|--|
 | Standard speech-to-text and custom speech-to-text | `ws://localhost:5000` | WS |
-| Text-to-speech (including standard and neural), Speech language identification | `http://localhost:5000` | HTTP |
+| Neural text-to-speech, Speech language identification | `http://localhost:5000` | HTTP |
 
 For more information on using WSS and HTTPS protocols, see [Container security](../cognitive-services-container-support.md#azure-cognitive-services-container-security).
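The WS-versus-HTTP split in the table above can be captured in a small helper. This is a hypothetical sketch that mirrors the table, not an official API; the container-type keys are invented names:

```python
# Scheme per container type, mirroring the SDK host URL table above.
CONTAINER_SCHEMES = {
    "speech-to-text": "ws",
    "custom-speech-to-text": "ws",
    "neural-text-to-speech": "http",
    "speech-language-identification": "http",
}

def sdk_host(container: str, hostname: str = "localhost",
             port: int = 5000) -> str:
    """Return the SDK host URL for a locally running Speech container."""
    return f"{CONTAINER_SCHEMES[container]}://{hostname}:{port}"

print(sdk_host("neural-text-to-speech"))
```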

@@ -660,7 +601,7 @@ speech_config.set_service_property(
 )
 ```
 
-### Text-to-speech (standard and neural)
+### Neural text-to-speech
 
 [!INCLUDE [Query Text-to-speech container endpoint](includes/text-to-speech-container-query-endpoint.md)]

@@ -699,8 +640,6 @@ In this article, you learned concepts and workflow for how to download, install,
 * Speech provides four Linux containers for Docker that have various capabilities:
   * Speech-to-text
   * Custom speech-to-text
-  * Text-to-speech
-  * Custom text-to-speech
   * Neural text-to-speech
   * Speech language identification
 * Container images are downloaded from the container registry in Azure.
