articles/cognitive-services/Speech-Service/speech-container-howto.md
46 additions & 3 deletions
@@ -25,8 +25,8 @@ With Speech containers, you can build a speech application architecture that's o
| Container | Features | Latest | Release status |
|--|--|--|--|
-| Speech-to-text | Analyzes sentiment and transcribes continuous real-time speech or batch audio recordings with intermediate results. | 3.0.0 | Generally available |
-| Custom speech-to-text | Using a custom model from the [Custom Speech portal](https://speech.microsoft.com/customspeech), transcribes continuous real-time speech or batch audio recordings into text with intermediate results. | 3.0.0 | Generally available |
+| Speech-to-text | Analyzes sentiment and transcribes continuous real-time speech or batch audio recordings with intermediate results. | 3.1.0 | Generally available |
+| Custom speech-to-text | Using a custom model from the [Custom Speech portal](https://speech.microsoft.com/customspeech), transcribes continuous real-time speech or batch audio recordings into text with intermediate results. | 3.1.0 | Generally available |
| Speech language identification | Detects the language spoken in audio files. | 1.5.0 | Preview |
| Neural text-to-speech | Converts text to natural-sounding speech by using deep neural network technology, which allows for more natural synthesized speech. | 2.0.0 | Generally available |
@@ -378,7 +378,7 @@ This command:
#### Base model download on the custom speech-to-text container

-Starting in v2.6.0 of the custom-speech-to-text container, you can get the available base model information by using option `BaseModelLocale=<locale>`. This option gives you a list of available base models on that locale under your billing account. For example:
+Starting in v2.6.0 of the custom-speech-to-text container, you can get the available base model information by using option `BaseModelLocale={LOCALE}`. This option gives you a list of available base models on that locale under your billing account. For example:

```bash
docker run --rm -it \
@@ -412,6 +412,49 @@ Checking available base model for en-us
2020/10/30 21:54:21 [Fatal] Please run this tool again and assign --modelId '<one above base model id>'. If no model id listed above, it means currently there is no available base model for en-us
```
+#### Display model download on the custom speech-to-text container
+Starting in v3.1.0 of the custom-speech-to-text container, you can query the available display models and download them into your speech-to-text container to improve the final display output.
+
+You can query or download any or all of these display model types: rescoring (`Rescore`), punctuation (`Punct`), resegmentation (`Resegment`), and wfstitn (`Wfstitn`). Alternatively, you can use the `FullDisplay` option (and omit the other types) to query or download all types of display models.
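
As a rough illustration of the type-selection rule above, the following sketch normalizes a list of display model types before it is handed to the container. The function name `expand_display_types` is hypothetical and is not part of the container or its tooling; only the five type names come from the text above.

```bash
# Hypothetical helper (not part of the Speech container tooling).
# Accepts display model type names and prints the effective list:
# FullDisplay stands for all four types; unknown names are rejected.
expand_display_types() {
  local t out="" full=0
  for t in "$@"; do
    case "$t" in
      FullDisplay) full=1 ;;
      Rescore|Punct|Resegment|Wfstitn) out="${out:+$out }$t" ;;
      *) echo "unknown display model type: $t" >&2; return 1 ;;
    esac
  done
  if [ "$full" -eq 1 ]; then
    # FullDisplay covers every type, so any other names are redundant.
    echo "Rescore Punct Resegment Wfstitn"
  else
    echo "$out"
  fi
}
```

For example, `expand_display_types Punct Rescore` prints `Punct Rescore`, and `expand_display_types FullDisplay` prints all four type names.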
+
+Set `BaseModelLocale` to query the latest available display model for the target locale. If you include multiple display model types, the command returns the latest available display model for each type. For example:
+FullDisplay Punct Rescore Resegment Wfstitn \ # Space separated list of display model types
+BaseModelLocale={LOCALE} \
+Eula=accept \
+Billing={ENDPOINT_URI} \
+ApiKey={API_KEY}
+```
+
+Set `DisplayLocale` to download the latest available display model for the target locale. When you set `DisplayLocale`, you must also include a space-separated list of one or more display model types. If you include multiple display model types, the command returns the latest available display model for each type. For example:
+> If you set more than one query or download parameter, the command prioritizes them in this order: `BaseModelLocale`, `ModelId`, and `DisplayLocale` (only applicable for display models).
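
The priority order in the note above can be sketched as a small shell function. This is only an illustration of the stated rule, not the container's actual logic; the function name `effective_parameter` is hypothetical, and the arguments are assumed to be `Key=Value` pairs in the same style the container accepts.

```bash
# Hypothetical helper (not part of the Speech container tooling).
# Given Key=Value arguments, prints which query/download parameter would
# take effect under the order: BaseModelLocale, ModelId, DisplayLocale.
effective_parameter() {
  local has_base=0 has_model=0 has_display=0 arg
  for arg in "$@"; do
    case "$arg" in
      BaseModelLocale=*) has_base=1 ;;
      ModelId=*)         has_model=1 ;;
      DisplayLocale=*)   has_display=1 ;;
    esac
  done
  if [ "$has_base" -eq 1 ]; then
    echo "BaseModelLocale"
  elif [ "$has_model" -eq 1 ]; then
    echo "ModelId"
  elif [ "$has_display" -eq 1 ]; then
    echo "DisplayLocale"
  else
    echo "none"
  fi
}
```

For example, `effective_parameter DisplayLocale=en-us BaseModelLocale=en-us` prints `BaseModelLocale`, because that parameter ranks first regardless of argument order.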
+
#### Custom pronunciation on the custom speech-to-text container
Starting in v2.5.0 of the custom-speech-to-text container, you can get custom pronunciation results in the output. All you need to do is have your own custom pronunciation rules set up in your custom model and mount the model to a custom-speech-to-text container.