articles/cognitive-services/Speech-Service/index-speech-to-text.yml (+3 −1)
@@ -1,7 +1,7 @@
 ### YamlMime:Landing
 
 title: Speech-to-text documentation
-summary: Speech-to-text from the Speech service, also known as speech recognition, enables real-time and batch transcription of audio streams into text.
+summary: Speech-to-text from the Speech service, also known as speech recognition, enables real-time and batch transcription of audio streams into text. With additional reference text input, it also enables real-time pronunciation assessment and gives speakers feedback on the accuracy and fluency of spoken audio.
 metadata:
   title: Speech-to-text documentation - Tutorials, API Reference - Azure Cognitive Services | Microsoft Docs
   titleSuffix: Azure Cognitive Services
@@ -34,6 +34,8 @@ landingContent:
       url: quickstarts/speech-to-text-from-file.md
     - text: Recognize speech stored in blob storage
       url: quickstarts/from-blob.md
+    - text: Pronunciation assessment with reference input
articles/cognitive-services/Speech-Service/rest-speech-to-text.md (+15 −6)
@@ -8,7 +8,7 @@ manager: nitinme
 ms.service: cognitive-services
 ms.subservice: speech-service
 ms.topic: conceptual
-ms.date: 04/23/2020
+ms.date: 05/13/2020
 ms.author: yinhew
 ---
@@ -49,7 +49,6 @@ These parameters may be included in the query string of the REST request.
 |`language`| Identifies the spoken language that is being recognized. See [Supported languages](language-support.md#speech-to-text). | Required |
 |`format`| Specifies the result format. Accepted values are `simple` and `detailed`. Simple results include `RecognitionStatus`, `DisplayText`, `Offset`, and `Duration`. Detailed responses include four different representations of display text. The default setting is `simple`. | Optional |
 |`profanity`| Specifies how to handle profanity in recognition results. Accepted values are `masked`, which replaces profanity with asterisks; `removed`, which removes all profanity from the result; or `raw`, which includes the profanity in the result. The default setting is `masked`. | Optional |
-|`pronunciationScoreParams`| Specifies the parameters for showing pronunciation scores in recognition results, which assess the pronunciation quality of speech input with indicators of accuracy, fluency, completeness, and so on. This parameter is base64-encoded JSON containing multiple detailed parameters. See [Pronunciation assessment parameters](#pronunciation-assessment-parameters) for how to build this parameter. | Optional |
 |`cid`| When using the [Custom Speech portal](how-to-custom-speech.md) to create custom models, you can use custom models via their **Endpoint ID** found on the **Deployment** page. Use the **Endpoint ID** as the argument to the `cid` query string parameter. | Optional |
 
 ## Request headers
@@ -60,6 +59,7 @@ This table lists required and optional headers for speech-to-text requests.
 |------|-------------|---------------------|
 |`Ocp-Apim-Subscription-Key`| Your Speech service subscription key. | Either this header or `Authorization` is required. |
 |`Authorization`| An authorization token preceded by the word `Bearer`. For more information, see [Authentication](#authentication). | Either this header or `Ocp-Apim-Subscription-Key` is required. |
+|`Pronunciation-Assessment`| Specifies the parameters for showing pronunciation scores in recognition results, which assess the pronunciation quality of speech input with indicators of accuracy, fluency, completeness, and so on. This parameter is base64-encoded JSON containing multiple detailed parameters. See [Pronunciation assessment parameters](#pronunciation-assessment-parameters) for how to build this header. | Optional |
 |`Content-type`| Describes the format and codec of the provided audio data. Accepted values are `audio/wav; codecs=audio/pcm; samplerate=16000` and `audio/ogg; codecs=opus`. | Required |
 |`Transfer-Encoding`| Specifies that chunked audio data is being sent, rather than a single file. Only use this header if chunking audio data. | Optional |
 |`Expect`| If using chunked transfer, send `Expect: 100-continue`. The Speech service acknowledges the initial request and awaits additional data. | Required if sending chunked audio data. |
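As the header table notes, `Pronunciation-Assessment` carries base64-encoded JSON. A minimal Python sketch of that encoding follows; the field names in the payload are illustrative assumptions, and the authoritative list is in the pronunciation assessment parameters section of the article:

```python
import base64
import json

# Example parameter payload. The exact field names and values are assumed
# here for illustration; consult the pronunciation assessment parameters
# documentation for the real schema.
params = {
    "ReferenceText": "Good morning.",
    "GradingSystem": "HundredMark",
    "Dimension": "Comprehensive",
}

# The header value is the UTF-8 JSON payload, base64-encoded.
header_value = base64.b64encode(json.dumps(params).encode("utf-8")).decode("ascii")

headers = {"Pronunciation-Assessment": header_value}
print(header_value[:10])  # a JSON object beginning '{"' always encodes to "eyJ..."
```

Decoding the value with `base64.b64decode` and `json.loads` recovers the original parameters, which is a handy way to sanity-check the header before sending a request.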
@@ -101,14 +101,17 @@ Below is an example JSON containing the pronunciation assessment parameters:
 }
 ```
 
-The following sample code shows how to build the pronunciation assessment parameters into the URL query parameter:
+The following sample code shows how to build the pronunciation assessment parameters into the `Pronunciation-Assessment` header:
+
+> The pronunciation assessment feature is currently available only in the `westus` and `eastasia` regions, and it currently supports only the `en-US` language.
 
 ## Sample request
 
 The sample below includes the hostname and required headers. Note that the service also expects audio data, which is not included in this sample. As mentioned earlier, chunking is recommended but not required.
@@ -123,6 +126,12 @@ Transfer-Encoding: chunked
 Expect: 100-continue
 ```
 
+To enable pronunciation assessment, add the header below. See [Pronunciation assessment parameters](#pronunciation-assessment-parameters) for how to build this header.
+
+```HTTP
+Pronunciation-Assessment: eyJSZWZlcm...
+```
+
 ## HTTP status codes
 
 The HTTP status code for each response indicates success or common errors.
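Pulling the query parameters and headers above together, here is a rough Python sketch of assembling such a request. The hostname path, subscription key, and pronunciation-assessment field names are placeholders and assumptions for illustration, not verified endpoints or credentials:

```python
import base64
import json
import urllib.parse


def build_stt_request(region: str, language: str, reference_text: str):
    """Assemble the URL and headers for a short-audio recognition request.

    The query parameters and header names follow the tables above; the
    endpoint path and all values are placeholders, not working credentials.
    """
    query = urllib.parse.urlencode({"language": language, "format": "detailed"})
    # Assumed endpoint shape for the speech-to-text REST API.
    url = (f"https://{region}.stt.speech.microsoft.com/"
           f"speech/recognition/conversation/cognitiveservices/v1?{query}")
    # Assumed pronunciation assessment fields; see the parameters section.
    pa_params = {"ReferenceText": reference_text, "GradingSystem": "HundredMark"}
    headers = {
        "Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY",  # placeholder
        "Content-type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Pronunciation-Assessment": base64.b64encode(
            json.dumps(pa_params).encode("utf-8")).decode("ascii"),
        "Transfer-Encoding": "chunked",
        "Expect": "100-continue",
    }
    return url, headers


url, headers = build_stt_request("westus", "en-US", "Good morning.")
```

The returned `url` and `headers` could then be passed to any HTTP client along with the audio body; the sketch deliberately stops short of sending a request.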
articles/cognitive-services/Speech-Service/speech-to-text.md (+5 −0)
@@ -20,6 +20,8 @@ Speech-to-text from the Speech service, also known as speech recognition, enable
 
 The speech-to-text service defaults to using the Universal language model. This model was trained using Microsoft-owned data and is deployed in the cloud. It's optimal for conversational and dictation scenarios. When using speech-to-text for recognition and transcription in a unique environment, you can create and train custom acoustic, language, and pronunciation models. Customization is helpful for addressing ambient noise or industry-specific vocabulary.
 
+With additional reference text as input, the speech-to-text service also enables a [pronunciation assessment](rest-speech-to-text.md#pronunciation-assessment-parameters) capability that evaluates speech pronunciation and gives speakers feedback on the accuracy and fluency of spoken audio. With pronunciation assessment, language learners can practice, get instant feedback, and improve their pronunciation so that they can speak and present with confidence. Educators can use the capability to evaluate the pronunciation of multiple speakers in real time. The feature currently supports American English and correlates highly with speech assessments conducted by experts.
+
 > [!NOTE]
 > Bing Speech was decommissioned on October 15, 2019. If your applications, tools, or products are using the Bing Speech APIs, we've created guides to help you migrate to the Speech service.
 > - [Migrate from Bing Speech to the Speech service](how-to-migrate-from-bing-speech.md)
@@ -34,6 +36,8 @@ The speech-to-text service is available via the [Speech SDK](speech-sdk.md). The
 
 If you prefer to use the speech-to-text REST service, see [REST APIs](rest-speech-to-text.md).
 
+- [Quickstart: Pronunciation assessment with reference input](rest-speech-to-text.md#pronunciation-assessment-parameters)
+
 ## Tutorials and sample code
 
 After you've had a chance to use the Speech service, try our tutorial that teaches you how to recognize intents from speech using the Speech SDK and LUIS.
@@ -44,6 +48,7 @@ Sample code for the Speech SDK is available on GitHub. These samples cover commo
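On the consuming side, a client that requested pronunciation assessment would read the scores back out of the recognition result. The sketch below parses a hypothetical `detailed`-format response; the score field names (`AccuracyScore`, `FluencyScore`, `PronScore`) are assumptions based on the feature description above, not taken verbatim from this change set:

```python
import json

# A hypothetical detailed recognition result. The score field names here
# are illustrative assumptions, not a documented response schema.
response_body = json.dumps({
    "RecognitionStatus": "Success",
    "NBest": [{
        "Display": "Good morning.",
        "AccuracyScore": 92.0,
        "FluencyScore": 88.5,
        "PronScore": 90.4,
    }],
})

result = json.loads(response_body)
if result["RecognitionStatus"] == "Success":
    best = result["NBest"][0]
    # Collect every pronunciation-related score for feedback to the speaker.
    feedback = {k: v for k, v in best.items() if k.endswith("Score")}
    print(feedback)
```

A real client would substitute the HTTP response body for `response_body` and decide how to surface the per-dimension scores to the learner.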