Commit 864941f

Merge pull request #115087 from yinhew/master
Update document for pronunciation assessment
2 parents b557984 + 5b996db

5 files changed (+26, -7 lines changed)


articles/cognitive-services/Speech-Service/includes/speech-reference-doc-links.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -38,5 +38,6 @@ The [Speech Devices SDK](../speech-devices-sdk.md) is a superset of the Speech S
 For references of various Speech service REST APIs, refer to the listing below:
 
 - [REST API: Speech-to-text](../rest-speech-to-text.md)
+- [REST API: Pronunciation assessment](../rest-speech-to-text.md#pronunciation-assessment-parameters)
 - [REST API: Text-to-speech](../rest-text-to-speech.md)
 - <a href="https://cris.ai/swagger/ui/index" target="_blank" rel="noopener">REST API: Batch transcription and customization <span class="docon docon-navigate-external x-hidden-focus"></span></a>
```

articles/cognitive-services/Speech-Service/index-speech-to-text.yml

Lines changed: 3 additions & 1 deletion
```diff
@@ -1,7 +1,7 @@
 ### YamlMime:Landing
 
 title: Speech-to-text documentation
-summary: Speech-to-text from the Speech service, also known as speech recognition, enables real-time and batch transcription of audio streams into text.
+summary: Speech-to-text from the Speech service, also known as speech recognition, enables real-time and batch transcription of audio streams into text. With additional reference text input, it also enables real-time pronunciation assessment and gives speakers feedback on the accuracy and fluency of spoken audio.
 metadata:
   title: Speech-to-text documentation - Tutorials, API Reference - Azure Cognitive Services | Microsoft Docs
   titleSuffix: Azure Cognitive Services
@@ -34,6 +34,8 @@ landingContent:
         url: quickstarts/speech-to-text-from-file.md
       - text: Recognize speech stored in blob storage
         url: quickstarts/from-blob.md
+      - text: Pronunciation assessment with reference input
+        url: rest-speech-to-text.md#pronunciation-assessment-parameters
 - title: Develop with speech-to-text
   linkLists:
     - linkListType: how-to-guide
```

articles/cognitive-services/Speech-Service/rest-speech-to-text.md

Lines changed: 15 additions & 6 deletions
```diff
@@ -8,7 +8,7 @@ manager: nitinme
 ms.service: cognitive-services
 ms.subservice: speech-service
 ms.topic: conceptual
-ms.date: 04/23/2020
+ms.date: 05/13/2020
 ms.author: yinhew
 ---
@@ -49,7 +49,6 @@ These parameters may be included in the query string of the REST request.
 | `language` | Identifies the spoken language that is being recognized. See [Supported languages](language-support.md#speech-to-text). | Required |
 | `format` | Specifies the result format. Accepted values are `simple` and `detailed`. Simple results include `RecognitionStatus`, `DisplayText`, `Offset`, and `Duration`. Detailed responses include four different representations of display text. The default setting is `simple`. | Optional |
 | `profanity` | Specifies how to handle profanity in recognition results. Accepted values are `masked`, which replaces profanity with asterisks, `removed`, which removes all profanity from the result, or `raw`, which includes the profanity in the result. The default setting is `masked`. | Optional |
-| `pronunciationScoreParams` | Specifies the parameters for showing pronunciation scores in recognition results, which assess the pronunciation quality of speech input, with indicators of accuracy, fluency, completeness, etc. This parameter is a base64 encoded json containing multiple detailed parameters. See [Pronunciation assessment parameters](#pronunciation-assessment-parameters) for how to build this parameter. | Optional |
 | `cid` | When using the [Custom Speech portal](how-to-custom-speech.md) to create custom models, you can use custom models via their **Endpoint ID** found on the **Deployment** page. Use the **Endpoint ID** as the argument to the `cid` query string parameter. | Optional |
 
 ## Request headers
@@ -60,6 +59,7 @@ This table lists required and optional headers for speech-to-text requests.
 |------|-------------|---------------------|
 | `Ocp-Apim-Subscription-Key` | Your Speech service subscription key. | Either this header or `Authorization` is required. |
 | `Authorization` | An authorization token preceded by the word `Bearer`. For more information, see [Authentication](#authentication). | Either this header or `Ocp-Apim-Subscription-Key` is required. |
+| `Pronunciation-Assessment` | Specifies the parameters for showing pronunciation scores in recognition results, which assess the pronunciation quality of speech input with indicators of accuracy, fluency, completeness, and so on. The value is a base64-encoded JSON string containing multiple detailed parameters. See [Pronunciation assessment parameters](#pronunciation-assessment-parameters) for how to build this header. | Optional |
 | `Content-type` | Describes the format and codec of the provided audio data. Accepted values are `audio/wav; codecs=audio/pcm; samplerate=16000` and `audio/ogg; codecs=opus`. | Required |
 | `Transfer-Encoding` | Specifies that chunked audio data is being sent, rather than a single file. Only use this header if chunking audio data. | Optional |
 | `Expect` | If using chunked transfer, send `Expect: 100-continue`. The Speech service acknowledges the initial request and awaits additional data. | Required if sending chunked audio data. |
```
````diff
@@ -101,14 +101,17 @@ Below is an example JSON containing the pronunciation assessment parameters:
 }
 ```
 
-The following sample code shows how to build the pronunciation assessment parameters into the URL query parameter:
+The following sample code shows how to build the pronunciation assessment parameters into the `Pronunciation-Assessment` header:
 
 ```csharp
-var pronunciationScoreParamsJson = $"{{\"ReferenceText\":\"Good morning.\",\"GradingSystem\":\"HundredMark\",\"Granularity\":\"FullText\",\"Dimension\":\"Comprehensive\"}}";
-var pronunciationScoreParamsBytes = Encoding.UTF8.GetBytes(pronunciationScoreParamsJson);
-var pronunciationScoreParams = Convert.ToBase64String(pronunciationScoreParamsBytes);
+var pronAssessmentParamsJson = $"{{\"ReferenceText\":\"Good morning.\",\"GradingSystem\":\"HundredMark\",\"Granularity\":\"FullText\",\"Dimension\":\"Comprehensive\"}}";
+var pronAssessmentParamsBytes = Encoding.UTF8.GetBytes(pronAssessmentParamsJson);
+var pronAssessmentHeader = Convert.ToBase64String(pronAssessmentParamsBytes);
 ```
 
+> [!NOTE]
+> The pronunciation assessment feature is currently available only in the `westus` and `eastasia` regions, and only for the `en-US` language.
+
 ## Sample request
 
 The sample below includes the hostname and required headers. It's important to note that the service also expects audio data, which is not included in this sample. As mentioned earlier, chunking is recommended but not required.
````
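As a cross-check of the encoding that the C# snippet in this hunk performs, the same `Pronunciation-Assessment` header value can be built in Python. This is an illustrative sketch, not part of the commit; the parameter names and values come from the example JSON referenced above.

```python
import base64
import json

# Pronunciation assessment parameters, matching the documented example JSON.
pron_assessment_params = {
    "ReferenceText": "Good morning.",
    "GradingSystem": "HundredMark",
    "Granularity": "FullText",
    "Dimension": "Comprehensive",
}

# Serialize without whitespace, then base64-encode the UTF-8 bytes.
# This mirrors Encoding.UTF8.GetBytes + Convert.ToBase64String in the C# sample.
params_json = json.dumps(pron_assessment_params, separators=(",", ":"))
pron_assessment_header = base64.b64encode(params_json.encode("utf-8")).decode("ascii")

print(pron_assessment_header[:10])  # prints "eyJSZWZlcm"
```

The `eyJSZWZlcm` prefix matches the truncated header value shown in the sample request below, since any JSON object beginning with `{"Referen…` base64-encodes to that prefix.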
````diff
@@ -123,6 +126,12 @@ Transfer-Encoding: chunked
 Expect: 100-continue
 ```
 
+To enable pronunciation assessment, add the following header. See [Pronunciation assessment parameters](#pronunciation-assessment-parameters) for how to build this header.
+
+```HTTP
+Pronunciation-Assessment: eyJSZWZlcm...
+```
+
 ## HTTP status codes
 
 The HTTP status code for each response indicates success or common errors.
````
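Putting the pieces of this page together, a minimal Python sketch of assembling the full REST request might look like the following. This is illustrative only and not part of the commit: the region and subscription key are placeholders, the query parameters are the documented `language`/`format` options, and the request is constructed but never sent.

```python
from urllib.parse import urlencode

def build_recognition_request(region, subscription_key, pron_assessment_header=None):
    """Build the URL and headers for a short-audio recognition request.

    pron_assessment_header is the base64-encoded parameter JSON described
    under "Pronunciation assessment parameters"; pass None to skip it.
    """
    query = urlencode({"language": "en-US", "format": "detailed"})
    url = (f"https://{region}.stt.speech.microsoft.com/"
           f"speech/recognition/conversation/cognitiveservices/v1?{query}")
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Transfer-Encoding": "chunked",
        "Expect": "100-continue",
    }
    if pron_assessment_header:
        # Per the note above: westus/eastasia regions and en-US only.
        headers["Pronunciation-Assessment"] = pron_assessment_header
    return url, headers

# Placeholder key and truncated header value, as in the sample request.
url, headers = build_recognition_request(
    "westus", "YOUR_SUBSCRIPTION_KEY", pron_assessment_header="eyJSZWZlcm...")
```

To actually send audio, you would POST the WAV bytes to `url` with these headers (for example with an HTTP client of your choice), chunking the body as the `Transfer-Encoding` header indicates.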

articles/cognitive-services/Speech-Service/speech-to-text.md

Lines changed: 5 additions & 0 deletions
```diff
@@ -20,6 +20,8 @@ Speech-to-text from the Speech service, also known as speech recognition, enable
 
 The speech-to-text service defaults to using the Universal language model. This model was trained using Microsoft-owned data and is deployed in the cloud. It's optimal for conversational and dictation scenarios. When using speech-to-text for recognition and transcription in a unique environment, you can create and train custom acoustic, language, and pronunciation models. Customization is helpful for addressing ambient noise or industry-specific vocabulary.
 
+With additional reference text as input, the speech-to-text service also enables [pronunciation assessment](rest-speech-to-text.md#pronunciation-assessment-parameters), which evaluates speech pronunciation and gives speakers feedback on the accuracy and fluency of spoken audio. With pronunciation assessment, language learners can practice, get instant feedback, and improve their pronunciation so that they can speak and present with confidence. Educators can use the capability to evaluate the pronunciation of multiple speakers in real time. The feature currently supports American English and correlates highly with speech assessments conducted by experts.
+
 > [!NOTE]
 > Bing Speech was decommissioned on October 15, 2019. If your applications, tools, or products are using the Bing Speech APIs, we've created guides to help you migrate to the Speech service.
 > - [Migrate from Bing Speech to the Speech service](how-to-migrate-from-bing-speech.md)
@@ -34,6 +36,8 @@ The speech-to-text service is available via the [Speech SDK](speech-sdk.md). The
 
 If you prefer to use the speech-to-text REST service, see [REST APIs](rest-speech-to-text.md).
 
+- [Quickstart: Pronunciation assessment with reference input](rest-speech-to-text.md#pronunciation-assessment-parameters)
+
 ## Tutorials and sample code
 
 After you've had a chance to use the Speech service, try our tutorial that teaches you how to recognize intents from speech using the Speech SDK and LUIS.
@@ -44,6 +48,7 @@ Sample code for the Speech SDK is available on GitHub. These samples cover commo
 
 - [Speech-to-text samples (SDK)](https://github.com/Azure-Samples/cognitive-services-speech-sdk)
 - [Batch transcription samples (REST)](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/batch)
+- [Pronunciation assessment samples (REST)](rest-speech-to-text.md#pronunciation-assessment-parameters)
 
 ## Customization
```

articles/cognitive-services/Speech-Service/toc.yml

Lines changed: 2 additions & 0 deletions
```diff
@@ -32,6 +32,8 @@
       href: quickstarts/speech-to-text-from-file.md
     - name: Recognize speech stored in blob storage
       href: quickstarts/from-blob.md
+    - name: Pronunciation assessment with reference input
+      href: rest-speech-to-text.md#pronunciation-assessment-parameters
 - name: How-to guides
   items:
     - name: Choose speech recognition mode
```
