Commit bca626f

Merge pull request #112439 from yinhew/master
Update cognitive-services/Speech-Service/rest-speech-to-text.md to introduce pronunciation assessment feature on STT REST API
2 parents 1e2ce16 + 80f45d0 commit bca626f

1 file changed: +80 -3 lines changed
articles/cognitive-services/Speech-Service/rest-speech-to-text.md

Lines changed: 80 additions & 3 deletions
@@ -3,13 +3,13 @@ title: Speech-to-text API reference (REST) - Speech service
 titleSuffix: Azure Cognitive Services
 description: Learn how to use the speech-to-text REST API. In this article, you'll learn about authorization options, query options, how to structure a request and receive a response.
 services: cognitive-services
-author: trevorbye
+author: yinhew
 manager: nitinme
 ms.service: cognitive-services
 ms.subservice: speech-service
 ms.topic: conceptual
-ms.date: 03/16/2020
-ms.author: trbye
+ms.date: 04/23/2020
+ms.author: yinhew
 ---
 
 # Speech-to-text REST API
@@ -49,6 +49,7 @@ These parameters may be included in the query string of the REST request.
 | `language` | Identifies the spoken language that is being recognized. See [Supported languages](language-support.md#speech-to-text). | Required |
 | `format` | Specifies the result format. Accepted values are `simple` and `detailed`. Simple results include `RecognitionStatus`, `DisplayText`, `Offset`, and `Duration`. Detailed responses include multiple results with confidence values and four different representations. The default setting is `simple`. | Optional |
 | `profanity` | Specifies how to handle profanity in recognition results. Accepted values are `masked`, which replaces profanity with asterisks, `removed`, which removes all profanity from the result, or `raw`, which includes the profanity in the result. The default setting is `masked`. | Optional |
+| `pronunciationScoreParams` | Specifies the parameters for showing pronunciation scores in recognition results. The scores assess the pronunciation quality of the speech input, with indicators such as accuracy, fluency, and completeness. This parameter is a Base64-encoded JSON object containing multiple detailed parameters. See [Pronunciation assessment parameters](#pronunciation-assessment-parameters) for how to build this parameter. | Optional |
 | `cid` | When using the [Custom Speech portal](how-to-custom-speech.md) to create custom models, you can use custom models via their **Endpoint ID** found on the **Deployment** page. Use the **Endpoint ID** as the argument to the `cid` query string parameter. | Optional |
 
 ## Request headers
@@ -76,6 +77,38 @@ Audio is sent in the body of the HTTP `POST` request. It must be in one of the f
 >[!NOTE]
 >The above formats are supported through REST API and WebSocket in the Speech service. The [Speech SDK](speech-sdk.md) currently supports the WAV format with PCM codec as well as [other formats](how-to-use-codec-compressed-audio-input-streams.md).
 
+## Pronunciation assessment parameters
+
+This table lists required and optional parameters for pronunciation assessment.
+
+| Parameter | Description | Required / Optional |
+|-----------|-------------|---------------------|
+| ReferenceText | The text that the pronunciation will be evaluated against. | Required |
+| GradingSystem | The point system for score calibration. Accepted values are `FivePoint` and `HundredMark`. The default setting is `FivePoint`. | Optional |
+| Granularity | The evaluation granularity. Accepted values are `Phoneme`, which shows the score on the full text, word, and phoneme levels; `Word`, which shows the score on the full text and word levels; and `FullText`, which shows the score on the full text level only. The default setting is `Phoneme`. | Optional |
+| Dimension | Defines the output criteria. Accepted values are `Basic`, which shows the accuracy score only, and `Comprehensive`, which shows scores on more dimensions (for example, fluency score and completeness score on the full text level, and error type on the word level). See [Response parameters](#response-parameters) for definitions of the score dimensions and word error types. The default setting is `Basic`. | Optional |
+| EnableMiscue | Enables miscue calculation. With this enabled, the pronounced words are compared to the reference text and marked as omissions or insertions based on the comparison. Accepted values are `False` and `True`. The default setting is `False`. | Optional |
+| ScenarioId | A GUID indicating a customized point system. | Optional |
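
As a rough illustration of the table above, the following Python sketch assembles a parameter dictionary and rejects values the table does not list. The helper name `build_assessment_params` is hypothetical and not part of the Speech service. The sketch assumes that omitted optional parameters fall back to the documented defaults on the service side, and it passes `EnableMiscue` and `ScenarioId` through unchecked because the table does not say how they are encoded in JSON.

```python
# Accepted values copied from the parameter table; nothing here calls the service.
ACCEPTED_VALUES = {
    "GradingSystem": ("FivePoint", "HundredMark"),
    "Granularity": ("Phoneme", "Word", "FullText"),
    "Dimension": ("Basic", "Comprehensive"),
}
# Parameters the table lists without an enumerated JSON encoding (passed through as-is).
PASS_THROUGH = ("EnableMiscue", "ScenarioId")


def build_assessment_params(reference_text, **options):
    """Hypothetical helper: return a dict of pronunciation assessment parameters."""
    if not reference_text:
        raise ValueError("ReferenceText is required")
    params = {"ReferenceText": reference_text}
    for name, value in options.items():
        if name in ACCEPTED_VALUES:
            if value not in ACCEPTED_VALUES[name]:
                raise ValueError(f"{name} must be one of {ACCEPTED_VALUES[name]}")
        elif name not in PASS_THROUGH:
            raise ValueError(f"unknown parameter: {name}")
        params[name] = value
    return params
```

Calling `build_assessment_params("Good morning.", GradingSystem="HundredMark")` would return a dictionary containing only those two keys, leaving `Granularity` and `Dimension` at their defaults.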
+
+Below is an example JSON containing the pronunciation assessment parameters:
+
+```json
+{
+  "ReferenceText": "Good morning.",
+  "GradingSystem": "HundredMark",
+  "Granularity": "FullText",
+  "Dimension": "Comprehensive"
+}
+```
+
+The following sample code shows how to build the pronunciation assessment parameters into the URL query parameter:
+
+```csharp
+var pronunciationScoreParamsJson = $"{{\"ReferenceText\":\"Good morning.\",\"GradingSystem\":\"HundredMark\",\"Granularity\":\"FullText\",\"Dimension\":\"Comprehensive\"}}";
+var pronunciationScoreParamsBytes = Encoding.UTF8.GetBytes(pronunciationScoreParamsJson);
+var pronunciationScoreParams = Convert.ToBase64String(pronunciationScoreParamsBytes);
+```
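
For readers not working in C#, the same encoding steps might look like this in Python. This is a sketch under stated assumptions, not the service's reference code: the function names are hypothetical, the base URL is a placeholder that the caller must supply, and `en-US` in the usage note is only an example value for the required `language` parameter. Because Base64 output can contain `+`, `/`, and `=`, the sketch percent-encodes the value when placing it in the query string.

```python
import base64
import json
from urllib.parse import urlencode


def encode_assessment_params(params):
    """Serialize the parameters to JSON, then Base64-encode the UTF-8 bytes."""
    raw = json.dumps(params).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def build_request_url(base_url, language, params):
    """Append `language` and `pronunciationScoreParams` to a caller-supplied base URL."""
    query = urlencode({
        "language": language,
        "pronunciationScoreParams": encode_assessment_params(params),
    })
    return f"{base_url}?{query}"
```

A call such as `build_request_url("https://example.invalid/stt", "en-US", {"ReferenceText": "Good morning."})` returns the placeholder URL with both query parameters attached. The host shown here is deliberately invalid and stands in for the real endpoint.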
+
 ## Sample request
 
 The sample below includes the hostname and required headers. It's important to note that the service also expects audio data, which is not included in this sample. As mentioned earlier, chunking is recommended but not required.
@@ -173,6 +206,11 @@ Each object in the `NBest` list includes:
 | `ITN` | The inverse-text-normalized ("canonical") form of the recognized text, with phone numbers, numbers, abbreviations ("doctor smith" to "dr smith"), and other transformations applied. |
 | `MaskedITN` | The ITN form with profanity masking applied, if requested. |
 | `Display` | The display form of the recognized text, with punctuation and capitalization added. This parameter is the same as `DisplayText` provided when format is set to `simple`. |
+| `AccuracyScore` | The score indicating the pronunciation accuracy of the given speech. |
+| `FluencyScore` | The score indicating the fluency of the given speech. |
+| `CompletenessScore` | The score indicating the completeness of the given speech, calculated as the ratio of pronounced words to the entire input. |
+| `PronScore` | The overall score indicating the pronunciation quality of the given speech. This is calculated as a weighted combination of `AccuracyScore`, `FluencyScore`, and `CompletenessScore`. |
+| `ErrorType` | This value indicates whether a word is omitted, inserted, or badly pronounced, compared to `ReferenceText`. Possible values are `None` (meaning no error on this word), `Omission`, `Insertion`, and `Mispronunciation`. |
 
 ## Sample responses
 
@@ -213,6 +251,45 @@ A typical response for `detailed` recognition:
 }
 ```
 
+A typical response for recognition with pronunciation assessment:
+
+```json
+{
+  "RecognitionStatus": "Success",
+  "Offset": "400000",
+  "Duration": "11000000",
+  "NBest": [
+      {
+        "Confidence" : "0.87",
+        "Lexical" : "good morning",
+        "ITN" : "good morning",
+        "MaskedITN" : "good morning",
+        "Display" : "Good morning.",
+        "PronScore" : 84.4,
+        "AccuracyScore" : 100.0,
+        "FluencyScore" : 74.0,
+        "CompletenessScore" : 100.0,
+        "Words": [
+            {
+              "Word" : "Good",
+              "AccuracyScore" : 100.0,
+              "ErrorType" : "None",
+              "Offset" : 500000,
+              "Duration" : 2700000
+            },
+            {
+              "Word" : "morning",
+              "AccuracyScore" : 100.0,
+              "ErrorType" : "None",
+              "Offset" : 5300000,
+              "Duration" : 900000
+            }
+        ]
+      }
+  ]
+}
+```
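
To show how a client might consume a response shaped like the sample above, here is a minimal Python sketch. The function name `summarize_assessment` is hypothetical, and the sketch assumes the first `NBest` entry is the one of interest. Scores that the response omits (for example, when `Dimension` is `Basic`) come back as `None`.

```python
SCORE_FIELDS = ("PronScore", "AccuracyScore", "FluencyScore", "CompletenessScore")


def summarize_assessment(response):
    """Hypothetical helper: collect overall scores and words flagged with an error.

    `response` is the already-parsed JSON body (a dict). Returns None when
    recognition did not succeed.
    """
    if response.get("RecognitionStatus") != "Success":
        return None
    best = response["NBest"][0]  # assumption: the first entry is the top hypothesis
    scores = {field: best.get(field) for field in SCORE_FIELDS}
    flagged = [
        (word["Word"], word["ErrorType"])
        for word in best.get("Words", [])
        if word.get("ErrorType", "None") != "None"
    ]
    return {"scores": scores, "flagged_words": flagged}
```

Applied to the sample response, this would report the four overall scores and an empty `flagged_words` list, since both words have `ErrorType` set to `None`.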
+
 ## Next steps
 
 - [Get your Speech trial subscription](https://azure.microsoft.com/try/cognitive-services/)
