Commit 864941f

Merge pull request #115087 from yinhew/master
Update document for pronunciation assessment
2 parents b557984 + 5b996db

5 files changed (+26, -7 lines changed)


articles/cognitive-services/Speech-Service/includes/speech-reference-doc-links.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -38,5 +38,6 @@ The [Speech Devices SDK](../speech-devices-sdk.md) is a superset of the Speech S
 For references of various Speech service REST APIs, refer to the listing below:
 
 - [REST API: Speech-to-text](../rest-speech-to-text.md)
+- [REST API: Pronunciation assessment](../rest-speech-to-text.md#pronunciation-assessment-parameters)
 - [REST API: Text-to-speech](../rest-text-to-speech.md)
 - <a href="https://cris.ai/swagger/ui/index" target="_blank" rel="noopener">REST API: Batch transcription and customization <span class="docon docon-navigate-external x-hidden-focus"></span></a>
```

articles/cognitive-services/Speech-Service/index-speech-to-text.yml

Lines changed: 3 additions & 1 deletion
```diff
@@ -1,7 +1,7 @@
 ### YamlMime:Landing
 
 title: Speech-to-text documentation
-summary: Speech-to-text from the Speech service, also known as speech recognition, enables real-time and batch transcription of audio streams into text.
+summary: Speech-to-text from the Speech service, also known as speech recognition, enables real-time and batch transcription of audio streams into text. With additional reference text input, it also enables real-time pronunciation assessment and gives speakers feedback on the accuracy and fluency of spoken audio.
 metadata:
   title: Speech-to-text documentation - Tutorials, API Reference - Azure Cognitive Services | Microsoft Docs
   titleSuffix: Azure Cognitive Services
@@ -34,6 +34,8 @@ landingContent:
         url: quickstarts/speech-to-text-from-file.md
       - text: Recognize speech stored in blob storage
         url: quickstarts/from-blob.md
+      - text: Pronunciation assessment with reference input
+        url: rest-speech-to-text.md#pronunciation-assessment-parameters
 - title: Develop with speech-to-text
   linkLists:
     - linkListType: how-to-guide
```

articles/cognitive-services/Speech-Service/rest-speech-to-text.md

Lines changed: 15 additions & 6 deletions
```diff
@@ -8,7 +8,7 @@ manager: nitinme
 ms.service: cognitive-services
 ms.subservice: speech-service
 ms.topic: conceptual
-ms.date: 04/23/2020
+ms.date: 05/13/2020
 ms.author: yinhew
 ---
@@ -49,7 +49,6 @@ These parameters may be included in the query string of the REST request.
 | `language` | Identifies the spoken language that is being recognized. See [Supported languages](language-support.md#speech-to-text). | Required |
 | `format` | Specifies the result format. Accepted values are `simple` and `detailed`. Simple results include `RecognitionStatus`, `DisplayText`, `Offset`, and `Duration`. Detailed responses include four different representations of display text. The default setting is `simple`. | Optional |
 | `profanity` | Specifies how to handle profanity in recognition results. Accepted values are `masked`, which replaces profanity with asterisks, `removed`, which removes all profanity from the result, or `raw`, which includes the profanity in the result. The default setting is `masked`. | Optional |
-| `pronunciationScoreParams` | Specifies the parameters for showing pronunciation scores in recognition results, which assess the pronunciation quality of speech input, with indicators of accuracy, fluency, completeness, etc. This parameter is a base64 encoded json containing multiple detailed parameters. See [Pronunciation assessment parameters](#pronunciation-assessment-parameters) for how to build this parameter. | Optional |
 | `cid` | When using the [Custom Speech portal](how-to-custom-speech.md) to create custom models, you can use custom models via their **Endpoint ID** found on the **Deployment** page. Use the **Endpoint ID** as the argument to the `cid` query string parameter. | Optional |
 
 ## Request headers
@@ -60,6 +59,7 @@ This table lists required and optional headers for speech-to-text requests.
 |------|-------------|---------------------|
 | `Ocp-Apim-Subscription-Key` | Your Speech service subscription key. | Either this header or `Authorization` is required. |
 | `Authorization` | An authorization token preceded by the word `Bearer`. For more information, see [Authentication](#authentication). | Either this header or `Ocp-Apim-Subscription-Key` is required. |
+| `Pronunciation-Assessment` | Specifies the parameters for showing pronunciation scores in recognition results, which assess the pronunciation quality of speech input with indicators of accuracy, fluency, completeness, and so on. The value is a base64-encoded JSON string containing multiple detailed parameters. See [Pronunciation assessment parameters](#pronunciation-assessment-parameters) for how to build this header. | Optional |
 | `Content-type` | Describes the format and codec of the provided audio data. Accepted values are `audio/wav; codecs=audio/pcm; samplerate=16000` and `audio/ogg; codecs=opus`. | Required |
 | `Transfer-Encoding` | Specifies that chunked audio data is being sent, rather than a single file. Only use this header if chunking audio data. | Optional |
 | `Expect` | If using chunked transfer, send `Expect: 100-continue`. The Speech service acknowledges the initial request and awaits additional data. | Required if sending chunked audio data. |
```
````diff
@@ -101,14 +101,17 @@ Below is an example JSON containing the pronunciation assessment parameters:
 }
 ```
 
-The following sample code shows how to build the pronunciation assessment parameters into the URL query parameter:
+The following sample code shows how to build the pronunciation assessment parameters into the `Pronunciation-Assessment` header:
 
 ```csharp
-var pronunciationScoreParamsJson = $"{{\"ReferenceText\":\"Good morning.\",\"GradingSystem\":\"HundredMark\",\"Granularity\":\"FullText\",\"Dimension\":\"Comprehensive\"}}";
-var pronunciationScoreParamsBytes = Encoding.UTF8.GetBytes(pronunciationScoreParamsJson);
-var pronunciationScoreParams = Convert.ToBase64String(pronunciationScoreParamsBytes);
+var pronAssessmentParamsJson = $"{{\"ReferenceText\":\"Good morning.\",\"GradingSystem\":\"HundredMark\",\"Granularity\":\"FullText\",\"Dimension\":\"Comprehensive\"}}";
+var pronAssessmentParamsBytes = Encoding.UTF8.GetBytes(pronAssessmentParamsJson);
+var pronAssessmentHeader = Convert.ToBase64String(pronAssessmentParamsBytes);
 ```
 
+> [!NOTE]
+> The pronunciation assessment feature is currently available only in the `westus` and `eastasia` regions, and only for the `en-US` language.
+
 ## Sample request
 
 The sample below includes the hostname and required headers. It's important to note that the service also expects audio data, which is not included in this sample. As mentioned earlier, chunking is recommended but not required.
````
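As a cross-check of the encoding that the C# snippet in this hunk performs, the same `Pronunciation-Assessment` header value can be built in Python. This is an illustrative sketch, not part of the commit; the parameter names and values come from the example JSON referenced above.

```python
import base64
import json

# Pronunciation assessment parameters, matching the documented example JSON.
pron_assessment_params = {
    "ReferenceText": "Good morning.",
    "GradingSystem": "HundredMark",
    "Granularity": "FullText",
    "Dimension": "Comprehensive",
}

# Serialize without whitespace, then base64-encode the UTF-8 bytes.
# This mirrors Encoding.UTF8.GetBytes + Convert.ToBase64String in the C# sample.
params_json = json.dumps(pron_assessment_params, separators=(",", ":"))
pron_assessment_header = base64.b64encode(params_json.encode("utf-8")).decode("ascii")

print(pron_assessment_header[:10])  # prints "eyJSZWZlcm"
```

The `eyJSZWZlcm` prefix matches the truncated header value shown in the sample request below, since any JSON object beginning with `{"Referen…` base64-encodes to that prefix.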
````diff
@@ -123,6 +126,12 @@ Transfer-Encoding: chunked
 Expect: 100-continue
 ```
 
+To enable pronunciation assessment, add the following header. See [Pronunciation assessment parameters](#pronunciation-assessment-parameters) for how to build this header.
+
+```HTTP
+Pronunciation-Assessment: eyJSZWZlcm...
+```
+
 ## HTTP status codes
 
 The HTTP status code for each response indicates success or common errors.
````
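Putting the pieces of this page together, a minimal Python sketch of assembling the full REST request might look like the following. This is illustrative only and not part of the commit: the region and subscription key are placeholders, the query parameters are the documented `language`/`format` options, and the request is constructed but never sent.

```python
from urllib.parse import urlencode

def build_recognition_request(region, subscription_key, pron_assessment_header=None):
    """Build the URL and headers for a short-audio recognition request.

    pron_assessment_header is the base64-encoded parameter JSON described
    under "Pronunciation assessment parameters"; pass None to skip it.
    """
    query = urlencode({"language": "en-US", "format": "detailed"})
    url = (f"https://{region}.stt.speech.microsoft.com/"
           f"speech/recognition/conversation/cognitiveservices/v1?{query}")
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Transfer-Encoding": "chunked",
        "Expect": "100-continue",
    }
    if pron_assessment_header:
        # Per the note above: westus/eastasia regions and en-US only.
        headers["Pronunciation-Assessment"] = pron_assessment_header
    return url, headers

# Placeholder key and truncated header value, as in the sample request.
url, headers = build_recognition_request(
    "westus", "YOUR_SUBSCRIPTION_KEY", pron_assessment_header="eyJSZWZlcm...")
```

To actually send audio, you would POST the WAV bytes to `url` with these headers (for example with an HTTP client of your choice), chunking the body as the `Transfer-Encoding` header indicates.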

articles/cognitive-services/Speech-Service/speech-to-text.md

Lines changed: 5 additions & 0 deletions
```diff
@@ -20,6 +20,8 @@ Speech-to-text from the Speech service, also known as speech recognition, enable
 
 The speech-to-text service defaults to using the Universal language model. This model was trained using Microsoft-owned data and is deployed in the cloud. It's optimal for conversational and dictation scenarios. When using speech-to-text for recognition and transcription in a unique environment, you can create and train custom acoustic, language, and pronunciation models. Customization is helpful for addressing ambient noise or industry-specific vocabulary.
 
+With additional reference text as input, the speech-to-text service also enables [pronunciation assessment](rest-speech-to-text.md#pronunciation-assessment-parameters), which evaluates speech pronunciation and gives speakers feedback on the accuracy and fluency of spoken audio. With pronunciation assessment, language learners can practice, get instant feedback, and improve their pronunciation so that they can speak and present with confidence. Educators can use the capability to evaluate the pronunciation of multiple speakers in real time. The feature currently supports American English and correlates highly with speech assessments conducted by experts.
+
 > [!NOTE]
 > Bing Speech was decommissioned on October 15, 2019. If your applications, tools, or products are using the Bing Speech APIs, we've created guides to help you migrate to the Speech service.
 > - [Migrate from Bing Speech to the Speech service](how-to-migrate-from-bing-speech.md)
@@ -34,6 +36,8 @@ The speech-to-text service is available via the [Speech SDK](speech-sdk.md). The
 
 If you prefer to use the speech-to-text REST service, see [REST APIs](rest-speech-to-text.md).
 
+- [Quickstart: Pronunciation assessment with reference input](rest-speech-to-text.md#pronunciation-assessment-parameters)
+
 ## Tutorials and sample code
 
 After you've had a chance to use the Speech service, try our tutorial that teaches you how to recognize intents from speech using the Speech SDK and LUIS.
@@ -44,6 +48,7 @@ Sample code for the Speech SDK is available on GitHub. These samples cover commo
 
 - [Speech-to-text samples (SDK)](https://github.com/Azure-Samples/cognitive-services-speech-sdk)
 - [Batch transcription samples (REST)](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/batch)
+- [Pronunciation assessment samples (REST)](rest-speech-to-text.md#pronunciation-assessment-parameters)
 
 ## Customization
```

articles/cognitive-services/Speech-Service/toc.yml

Lines changed: 2 additions & 0 deletions
```diff
@@ -32,6 +32,8 @@
       href: quickstarts/speech-to-text-from-file.md
     - name: Recognize speech stored in blob storage
       href: quickstarts/from-blob.md
+    - name: Pronunciation assessment with reference input
+      href: rest-speech-to-text.md#pronunciation-assessment-parameters
 - name: How-to guides
   items:
     - name: Choose speech recognition mode
```
