
Commit cd3e122

Merge pull request #111192 from aahill/speech-update
[CogSvcs] Speech-to-text container update
2 parents 0488379 + af2d348 commit cd3e122

4 files changed: +181 −15 lines changed

articles/cognitive-services/Speech-Service/includes/speech-to-text-chart-config.md

Lines changed: 29 additions & 2 deletions
@@ -8,7 +8,7 @@ manager: nitinme
ms.service: cognitive-services
ms.subservice: speech-service
ms.topic: include
-ms.date: 08/22/2019
+ms.date: 04/15/2020
ms.author: trbye
---

@@ -34,4 +34,31 @@ To override the "umbrella" chart, add the prefix `speechToText.` on any paramete
| `service.port`| The port of the **speech-to-text** service. | `80` |
| `service.annotations` | The **speech-to-text** annotations for the service metadata. Annotations are key value pairs. <br>`annotations:`<br>&nbsp;&nbsp;`some/annotation1: value1`<br>&nbsp;&nbsp;`some/annotation2: value2` | |
| `service.autoScaler.enabled` | Whether the [Horizontal Pod Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) is enabled. If `true`, the `speech-to-text-autoscaler` will be deployed in the Kubernetes cluster. | `true` |
| `service.podDisruption.enabled` | Whether the [Pod Disruption Budget](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/) is enabled. If `true`, the `speech-to-text-poddisruptionbudget` will be deployed in the Kubernetes cluster. | `true` |
+
+#### Sentiment analysis (sub-chart: charts/speechToText)
+
+Starting with v2.2.0 of the speech-to-text container, the following parameters are used for sentiment analysis using the Text Analytics API.
+
+|Parameter|Description|Values|Default|
+| --- | --- | --- | --- |
+|`textanalytics.enabled`| Whether the **text-analytics** service is enabled| true/false| `false`|
+|`textanalytics.image.registry`| The **text-analytics** docker image registry| valid docker image registry| |
+|`textanalytics.image.repository`| The **text-analytics** docker image repository| valid docker image repository| |
+|`textanalytics.image.tag`| The **text-analytics** docker image tag| valid docker image tag| |
+|`textanalytics.image.pullSecrets`| The image secrets for pulling the **text-analytics** docker image| valid secrets name| |
+|`textanalytics.image.pullByHash`| Specifies whether the docker image is pulled by hash. If `true`, `image.hash` is also required. Otherwise, set it to `false`. The default is `false`.| true/false| `false`|
+|`textanalytics.image.hash`| The **text-analytics** docker image hash. Only use it with `image.pullByHash:true`.| valid docker image hash | |
+|`textanalytics.image.args.eula`| One of the arguments required by the **text-analytics** container; it indicates that you've accepted the license. The value of this option must be `accept`.| `accept`, if you want to use the container | |
+|`textanalytics.image.args.billing`| One of the arguments required by the **text-analytics** container; it specifies the billing endpoint URI. The billing endpoint URI value is available on the Azure portal's Speech overview page.| valid billing endpoint URI| |
+|`textanalytics.image.args.apikey`| One of the arguments required by the **text-analytics** container; it's used to track billing information.| valid apikey| |
+|`textanalytics.cpuRequest`| The requested CPU for the **text-analytics** container| int| `3000m`|
+|`textanalytics.cpuLimit`| The CPU limit for the **text-analytics** container| | `8000m`|
+|`textanalytics.memoryRequest`| The requested memory for the **text-analytics** container| | `3Gi`|
+|`textanalytics.memoryLimit`| The memory limit for the **text-analytics** container| | `8Gi`|
+|`textanalytics.service.sentimentURISuffix`| The sentiment analysis URI suffix. The full URI is in the format `http://<service>:<port>/<sentimentURISuffix>`. | | `text/analytics/v3.0-preview/sentiment`|
+|`textanalytics.service.type`| The type of the **text-analytics** service in Kubernetes. See [Kubernetes service types](https://kubernetes.io/docs/concepts/services-networking/service/).| valid Kubernetes service type | `LoadBalancer` |
+|`textanalytics.service.port`| The port of the **text-analytics** service| int| `50085`|
+|`textanalytics.service.annotations`| The annotations users can add to the **text-analytics** service metadata. For instance:<br>`annotations:`<br>&nbsp;&nbsp;`some/annotation1: value1`<br>&nbsp;&nbsp;`some/annotation2: value2`| annotations, one per line| |
+|`textanalytics.service.autoScaler.enabled`| Whether the [Horizontal Pod Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) is enabled. If enabled, `text-analytics-autoscaler` will be deployed in the Kubernetes cluster.| true/false| `true`|
+|`textanalytics.service.podDisruption.enabled`| Whether the [Pod Disruption Budget](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/) is enabled. If enabled, `text-analytics-poddisruptionbudget` will be deployed in the Kubernetes cluster.| true/false| `true`|
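For example, a minimal umbrella-chart values file that enables these sentiment analysis parameters might be generated as in the following sketch. The registry, repository, and tag values are placeholders, not real image details, and PyYAML is assumed to be installed.

```python
# Sketch: write an umbrella-chart values file that turns on sentiment analysis in the
# speech-to-text sub-chart. Replace the placeholder image details and billing values.
import yaml  # PyYAML

values = {
    "speechToText": {  # prefix used to override the "umbrella" chart
        "textanalytics": {
            "enabled": True,
            "image": {
                "registry": "<registry>",
                "repository": "<text-analytics-repository>",
                "tag": "<tag>",
                "args": {
                    "eula": "accept",
                    "billing": "{ENDPOINT_URI}",
                    "apikey": "{API_KEY}",
                },
            },
        }
    }
}

with open("sentiment-values.yaml", "w") as f:
    yaml.safe_dump(values, f)

# Deploy with, for example: helm install <release> <chart> -f sentiment-values.yaml
```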

articles/cognitive-services/Speech-Service/includes/speech-to-text-container-query-endpoint.md

Lines changed: 3 additions & 9 deletions
@@ -4,7 +4,7 @@ manager: nitinme
ms.service: cognitive-services
ms.subservice: speech-service
ms.topic: include
-ms.date: 04/01/2020
+ms.date: 04/29/2020
ms.author: aahi
---

@@ -31,6 +31,7 @@ to this call using the container [host](https://docs.microsoft.com/dotnet/api/mi
var config = SpeechConfig.FromHost(
    new Uri("ws://localhost:5000"));
```
+
# [Python](#tab/python)

Change from using this Azure-cloud initialization call:
@@ -40,11 +41,4 @@ speech_config = speechsdk.SpeechConfig(
    subscription=speech_key, region=service_region)
```

-to this call using the container [host](https://docs.microsoft.com/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.speechconfig?view=azure-python):
-
-```python
-speech_config = speechsdk.SpeechConfig(
-    host="ws://localhost:5000")
-```
-
-***
+---

articles/cognitive-services/Speech-Service/speech-container-howto-on-premises.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ manager: nitinme
ms.service: cognitive-services
ms.subservice: speech-service
ms.topic: conceptual
-ms.date: 04/01/2020
+ms.date: 04/29/2020
ms.author: aahi
---

articles/cognitive-services/Speech-Service/speech-container-howto.md

Lines changed: 148 additions & 3 deletions
@@ -8,7 +8,7 @@ manager: nitinme
ms.service: cognitive-services
ms.subservice: speech-service
ms.topic: conceptual
-ms.date: 04/01/2020
+ms.date: 04/29/2020
ms.author: aahi
---

@@ -23,7 +23,7 @@ Speech containers enable customers to build a speech application architecture th

| Function | Features | Latest |
|--|--|--|
-| Speech-to-text | Transcribes continuous real-time speech or batch audio recordings into text with intermediate results. | 2.1.1 |
+| Speech-to-text | Transcribes continuous real-time speech or batch audio recordings into text with intermediate results, and analyzes sentiment. | 2.2.0 |
| Custom Speech-to-text | Using a custom model from the [Custom Speech portal](https://speech.microsoft.com/customspeech), transcribes continuous real-time speech or batch audio recordings into text with intermediate results. | 2.1.1 |
| Text-to-speech | Converts text to natural-sounding speech with plain text input or Speech Synthesis Markup Language (SSML). | 1.3.0 |
| Custom Text-to-speech | Using a custom model from the [Custom Voice portal](https://aka.ms/custom-voice-portal), converts text to natural-sounding speech with plain text input or Speech Synthesis Markup Language (SSML). | 1.3.0 |
@@ -159,7 +159,7 @@ All tags, except for `latest` are in the following format and are case-sensitive
The following tag is an example of the format:

```
-2.1.1-amd64-en-us-preview
+2.2.0-amd64-en-us-preview
```

For all of the supported locales of the **speech-to-text** container, please see [Speech-to-text image tags](../containers/container-image-tags.md#speech-to-text).
@@ -254,6 +254,33 @@ This command:
* Exposes TCP port 5000 and allocates a pseudo-TTY for the container.
* Automatically removes the container after it exits. The container image is still available on the host computer.

+
+#### Analyze sentiment on the speech-to-text output
+
+Starting in v2.2.0 of the speech-to-text container, you can call the [sentiment analysis v3 API](../text-analytics/how-tos/text-analytics-how-to-sentiment-analysis.md) on the output. To call sentiment analysis, you'll need a Text Analytics API resource endpoint. For example:
+* `https://westus2.api.cognitive.microsoft.com/text/analytics/v3.0-preview.1/sentiment`
+* `https://localhost:5000/text/analytics/v3.0-preview.1/sentiment`
+
+If you're accessing a Text Analytics endpoint in the cloud, you'll need a key. If you're running Text Analytics locally, you may not need to provide one.
+
+The key and endpoint are passed to the Speech container as arguments, as in the following example.
+
+```bash
+docker run -it --rm -p 5000:5000 \
+containerpreview.azurecr.io/microsoft/cognitive-services-speech-to-text:latest \
+Eula=accept \
+Billing={ENDPOINT_URI} \
+ApiKey={API_KEY} \
+CloudAI:SentimentAnalysisSettings:TextAnalyticsHost={TEXT_ANALYTICS_HOST} \
+CloudAI:SentimentAnalysisSettings:SentimentAnalysisApiKey={SENTIMENT_APIKEY}
+```
+
+This command:
+
+* Performs the same steps as the command above.
+* Stores a Text Analytics API endpoint and key for sending sentiment analysis requests.
+
# [Custom Speech-to-text](#tab/cstt)

The *Custom Speech-to-text* container relies on a custom speech model. The custom model has to have been [trained](how-to-custom-speech-train-model.md) using the [custom speech portal](https://speech.microsoft.com/customspeech).
@@ -375,6 +402,9 @@ This command:

## Query the container's prediction endpoint

+> [!NOTE]
+> Use a unique port number if you're running multiple containers.
+
| Containers | SDK Host URL | Protocol |
|--|--|--|
| Speech-to-text and Custom Speech-to-text | `ws://localhost:5000` | WS |
@@ -384,6 +414,121 @@ For more information on using WSS and HTTPS protocols, see [container security](

[!INCLUDE [Query Speech-to-text container endpoint](includes/speech-to-text-container-query-endpoint.md)]

+#### Analyze sentiment
+
+If you provided your Text Analytics API credentials [to the container](#analyze-sentiment-on-the-speech-to-text-output), you can use the Speech SDK to send speech recognition requests with sentiment analysis. You can configure the API responses to use either a *simple* or *detailed* format. The snippets in the tabs below assume a `speech_config` that points to your container.
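A minimal configuration for a locally running container might look like this, assuming the default port mapping from the `docker run` examples above:

```python
import azure.cognitiveservices.speech as speechsdk

# Point the Speech SDK at the local speech-to-text container instead of the Azure cloud.
speech_config = speechsdk.SpeechConfig(host="ws://localhost:5000")
```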
+
+# [Simple format](#tab/simple-format)
+
+To configure the Speech client to use a simple format, add `"Sentiment"` as a value for `Simple.Extensions`. If you want to choose a specific Text Analytics model version, replace `'latest'` in the `speechcontext-phraseDetection.sentimentAnalysis.modelversion` property with the version you want.
+
+```python
+speech_config.set_service_property(
+    name='speechcontext-PhraseOutput.Simple.Extensions',
+    value='["Sentiment"]',
+    channel=speechsdk.ServicePropertyChannel.UriQueryParameter
+)
+speech_config.set_service_property(
+    name='speechcontext-phraseDetection.sentimentAnalysis.modelversion',
+    value='latest',
+    channel=speechsdk.ServicePropertyChannel.UriQueryParameter
+)
+```
+
+`Simple.Extensions` returns the sentiment result in the root layer of the response.
+
+```json
+{
+  "DisplayText":"What's the weather like?",
+  "Duration":13000000,
+  "Id":"6098574b79434bd4849fee7e0a50f22e",
+  "Offset":4700000,
+  "RecognitionStatus":"Success",
+  "Sentiment":{
+    "Negative":0.03,
+    "Neutral":0.79,
+    "Positive":0.18
+  }
+}
+```
+
+# [Detailed format](#tab/detailed-format)
+
+To configure the Speech client to use a detailed format, add `"Sentiment"` as a value for `Detailed.Extensions`, `Detailed.Options`, or both. If you want to choose a specific Text Analytics model version, replace `'latest'` in the `speechcontext-phraseDetection.sentimentAnalysis.modelversion` property with the version you want.
+
+```python
+speech_config.set_service_property(
+    name='speechcontext-PhraseOutput.Detailed.Options',
+    value='["Sentiment"]',
+    channel=speechsdk.ServicePropertyChannel.UriQueryParameter
+)
+speech_config.set_service_property(
+    name='speechcontext-PhraseOutput.Detailed.Extensions',
+    value='["Sentiment"]',
+    channel=speechsdk.ServicePropertyChannel.UriQueryParameter
+)
+speech_config.set_service_property(
+    name='speechcontext-phraseDetection.sentimentAnalysis.modelversion',
+    value='latest',
+    channel=speechsdk.ServicePropertyChannel.UriQueryParameter
+)
+```
+
+`Detailed.Extensions` provides the sentiment result in the root layer of the response. `Detailed.Options` provides the result in the `NBest` layer of the response. They can be used separately or together.
+
+```json
+{
+  "DisplayText":"What's the weather like?",
+  "Duration":13000000,
+  "Id":"6a2aac009b9743d8a47794f3e81f7963",
+  "NBest":[
+    {
+      "Confidence":0.973695,
+      "Display":"What's the weather like?",
+      "ITN":"what's the weather like",
+      "Lexical":"what's the weather like",
+      "MaskedITN":"What's the weather like",
+      "Sentiment":{
+        "Negative":0.03,
+        "Neutral":0.79,
+        "Positive":0.18
+      }
+    },
+    {
+      "Confidence":0.9164971,
+      "Display":"What is the weather like?",
+      "ITN":"what is the weather like",
+      "Lexical":"what is the weather like",
+      "MaskedITN":"What is the weather like",
+      "Sentiment":{
+        "Negative":0.02,
+        "Neutral":0.88,
+        "Positive":0.1
+      }
+    }
+  ],
+  "Offset":4700000,
+  "RecognitionStatus":"Success",
+  "Sentiment":{
+    "Negative":0.03,
+    "Neutral":0.79,
+    "Positive":0.18
+  }
+}
+```
+
+---
+
+If you want to completely disable sentiment analysis, set `sentimentanalysis.enabled` to `false`.
+
+```python
+speech_config.set_service_property(
+    name='speechcontext-phraseDetection.sentimentanalysis.enabled',
+    value='false',
+    channel=speechsdk.ServicePropertyChannel.UriQueryParameter
+)
+```
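As a minimal end-to-end sketch, the following combines the container host configuration, the simple-format sentiment property, and a one-shot recognition, then reads the `Sentiment` scores from the raw JSON result. The audio file name is a placeholder.

```python
import json
import azure.cognitiveservices.speech as speechsdk

# Configure the SDK to call the local speech-to-text container.
speech_config = speechsdk.SpeechConfig(host="ws://localhost:5000")

# Ask for sentiment in the simple output format (see the Simple format tab above).
speech_config.set_service_property(
    name='speechcontext-PhraseOutput.Simple.Extensions',
    value='["Sentiment"]',
    channel=speechsdk.ServicePropertyChannel.UriQueryParameter
)

# Recognize a single utterance from an audio file (placeholder file name).
audio_config = speechsdk.AudioConfig(filename="whats_the_weather_like.wav")
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
result = recognizer.recognize_once()

# When sentiment analysis is enabled, the scores are returned in the raw JSON payload.
raw_json = result.properties.get_property(speechsdk.PropertyId.SpeechServiceResponse_JsonResult)
print(result.text)
print(json.loads(raw_json).get("Sentiment"))
```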
+
### Text-to-speech or Custom Text-to-speech

[!INCLUDE [Query Text-to-speech container endpoint](includes/text-to-speech-container-query-endpoint.md)]
