Merge pull request #192392 from eric-urban/eur/custom-stt-container

PRMerger12 · web-flow · commit ff4222d8c162 · 2022-03-22T06:51:33.000-07:00
suggestions for PR 191965
diff --git a/articles/cognitive-services/Speech-Service/speech-container-howto.md b/articles/cognitive-services/Speech-Service/speech-container-howto.md
@@ -25,8 +25,8 @@ With Speech containers, you can build a speech application architecture that's o
 
 | Container | Features | Latest | Release status |
 |--|--|--|--|
-| Speech-to-text | Analyzes sentiment and transcribes continuous real-time speech or batch audio recordings with intermediate results.  | 3.0.0 | Generally available |
-| Custom speech-to-text | Using a custom model from the [Custom Speech portal](https://speech.microsoft.com/customspeech), transcribes continuous real-time speech or batch audio recordings into text with intermediate results. | 3.0.0 | Generally available |
+| Speech-to-text | Analyzes sentiment and transcribes continuous real-time speech or batch audio recordings with intermediate results.  | 3.1.0 | Generally available |
+| Custom speech-to-text | Using a custom model from the [Custom Speech portal](https://speech.microsoft.com/customspeech), transcribes continuous real-time speech or batch audio recordings into text with intermediate results. | 3.1.0 | Generally available |
 | Speech language identification | Detects the language spoken in audio files. | 1.5.0 | Preview |
 | Neural text-to-speech | Converts text to natural-sounding speech by using deep neural network technology, which allows for more natural synthesized speech. | 2.0.0 | Generally available |
 
@@ -378,7 +378,7 @@ This command:
 
 #### Base model download on the custom speech-to-text container
 
-Starting in v2.6.0 of the custom-speech-to-text container, you can get the available base model information by using option `BaseModelLocale=<locale>`. This option gives you a list of available base models on that locale under your billing account. For example:
+Starting in v2.6.0 of the custom-speech-to-text container, you can get the available base model information by using option `BaseModelLocale={LOCALE}`. This option gives you a list of available base models on that locale under your billing account. For example:
 
 ```bash
 docker run --rm -it \
@@ -412,6 +412,49 @@ Checking available base model for en-us
 2020/10/30 21:54:21 [Fatal] Please run this tool again and assign --modelId '<one above base model id>'. If no model id listed above, it means currently there is no available base model for en-us
 ```
 
+#### Display model download on the custom speech-to-text container
+Starting in v3.1.0 of the custom-speech-to-text container, you can get the available display models information and choose to download those models into your speech-to-text container to get highly improved final display output. 
+
+You can query or download any or all of these display model types: Rescoring (`Rescore`), Punctuation (`Punct`), resegmentation (`Resegment`), and wfstitn (`Wfstitn`). Otherwise, you can use the `FullDisplay` option (and omit the other types) to query or download all types of display models. 
+
+Set the `BaseModelLocale` to query the latest available display model on the target locale. If you include multiple display model types, the command will return the latest available display model for each type. For example:
+
+```bash
+docker run --rm -it \
+mcr.microsoft.com/azure-cognitive-services/speechservices/custom-speech-to-text \
+FullDisplay Punct Rescore Resegment Wfstitn \   # Space separated list of display model types
+BaseModelLocale={LOCALE} \           
+Eula=accept \
+Billing={ENDPOINT_URI} \
+ApiKey={API_KEY}
+```
+
+Set the `DisplayLocale` to download the latest available display model on the target locale. When you set `DisplayLocale`, you must also include a space separated list of one or more display model types. If you include multiple display model types, the command will return the latest available display model for each type. For example:
+
+```bash
+docker run --rm -it \
+mcr.microsoft.com/azure-cognitive-services/speechservices/custom-speech-to-text \
+FullDisplay Punct Rescore Resegment Wfstitn \   # Space separated list of display model types
+DisplayLocale={LOCALE} \           
+Eula=accept \
+Billing={ENDPOINT_URI} \
+ApiKey={API_KEY}
+```
+
+Set the `ModelId` to download a specific display model. This is the same as how you would download a base model. For example:
+
+```bash
+docker run --rm -it \
+mcr.microsoft.com/azure-cognitive-services/speechservices/custom-speech-to-text \
+ModelId={MODEL_ID} \         
+Eula=accept \
+Billing={ENDPOINT_URI} \
+ApiKey={API_KEY}
+```
+
+> [!NOTE]
+> If you set more than one query or download parameter, the command will prioritize in this order: `BaseModelLocale`, `ModelId`, and `DisplayLocale` (only applicable for display models).
+
 #### Custom pronunciation on the custom speech-to-text container
 
 Starting in v2.5.0 of the custom-speech-to-text container, you can get custom pronunciation results in the output. All you need to do is have your own custom pronunciation rules set up in your custom model and mount the model to a custom-speech-to-text container.