Commit d5ddf70

Update cognitive-services/Speech-Service/rest-speech-to-text.md to introduce pronunciation assessment feature on STT API
1 parent 8142502 commit d5ddf70

1 file changed

+80
-3
lines changed

articles/cognitive-services/Speech-Service/rest-speech-to-text.md

Lines changed: 80 additions & 3 deletions
@@ -3,13 +3,13 @@ title: Speech-to-text API reference (REST) - Speech service
 titleSuffix: Azure Cognitive Services
 description: Learn how to use the speech-to-text REST API. In this article, you'll learn about authorization options, query options, how to structure a request and receive a response.
 services: cognitive-services
-author: trevorbye
+author: yinhew
 manager: nitinme
 ms.service: cognitive-services
 ms.subservice: speech-service
 ms.topic: conceptual
-ms.date: 03/16/2020
-ms.author: trbye
+ms.date: 04/23/2020
+ms.author: yinhew
 ---
 
 # Speech-to-text REST API
@@ -50,6 +50,39 @@ These parameters may be included in the query string of the REST request.
 | `format` | Specifies the result format. Accepted values are `simple` and `detailed`. Simple results include `RecognitionStatus`, `DisplayText`, `Offset`, and `Duration`. Detailed responses include multiple results with confidence values and four different representations. The default setting is `simple`. | Optional |
 | `profanity` | Specifies how to handle profanity in recognition results. Accepted values are `masked`, which replaces profanity with asterisks, `removed`, which removes all profanity from the result, or `raw`, which includes the profanity in the result. The default setting is `masked`. | Optional |
 | `cid` | When using the [Custom Speech portal](how-to-custom-speech.md) to create custom models, you can use custom models via their **Endpoint ID** found on the **Deployment** page. Use the **Endpoint ID** as the argument to the `cid` query string parameter. | Optional |
+| `pronunciationScoreParams` | Specifies the parameters for showing pronunciation scores in recognition results. These scores assess the pronunciation quality of the speech input, with indicators such as accuracy, fluency, and completeness. This parameter is a Base64-encoded JSON object containing multiple detailed parameters. See [Pronunciation assessment parameters](#pronunciation-assessment-parameters) for how to build this parameter. | Optional |
+
+## Pronunciation assessment parameters
+
+This table lists required and optional parameters for pronunciation assessment.
+
+| Parameter | Description | Required / Optional |
+|-----------|-------------|---------------------|
+| ReferenceText | The reference text that the speech audio is expected to follow. | Required |
+| GradingSystem | The point system for score calibration. Accepted values are `FivePoint` and `HundredMark`. The default setting is `FivePoint`. | Optional |
+| Granularity | The evaluation granularity. Accepted values are `Phoneme`, which shows the score on the full text, word, and phoneme levels; `Word`, which shows the score on the full text and word levels; and `FullText`, which shows the score on the full text level only. The default setting is `Phoneme`. | Optional |
+| Dimension | Defines the output criteria. Accepted values are `Basic`, which shows the accuracy score only, and `Comprehensive`, which shows scores on more dimensions (for example, fluency score and completeness score on the full text level, and error type on the word level). See [Response parameters](#response-parameters) for definitions of the score dimensions and word error types. The default setting is `Basic`. | Optional |
+| EnableMiscue | Enables miscue calculation. With this enabled, the pronounced words are compared to the reference text and marked as omission or insertion based on the comparison. Accepted values are `False` and `True`. The default setting is `False`. | Optional |
+| ScenarioId | A GUID indicating a customized point system. | Optional |
+
+Below is an example JSON containing the pronunciation assessment parameters:
+
+```json
+{
+  "ReferenceText": "Good morning.",
+  "GradingSystem": "HundredMark",
+  "Granularity": "FullText",
+  "Dimension": "Comprehensive"
+}
+```
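Before encoding, a client could check a parameter set against the accepted values listed in the table above. The following Python sketch is one way to do that. `validate_params` and `ACCEPTED_VALUES` are illustrative names, not part of the Speech service, and `EnableMiscue` and `ScenarioId` are left unchecked because the table does not state their JSON types.

```python
# Accepted values copied from the parameter table; the helper itself is hypothetical.
ACCEPTED_VALUES = {
    "GradingSystem": {"FivePoint", "HundredMark"},
    "Granularity": {"Phoneme", "Word", "FullText"},
    "Dimension": {"Basic", "Comprehensive"},
}


def validate_params(params: dict) -> list:
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    # ReferenceText is the only parameter the table marks as required.
    if not params.get("ReferenceText"):
        problems.append("ReferenceText is required")
    # Optional parameters are only checked when the caller supplies them.
    for name, allowed in ACCEPTED_VALUES.items():
        if name in params and params[name] not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")
    return problems


example = {
    "ReferenceText": "Good morning.",
    "GradingSystem": "HundredMark",
    "Granularity": "FullText",
    "Dimension": "Comprehensive",
}
print(validate_params(example))
print(validate_params({"GradingSystem": "TenPoint"}))
```

The check runs client-side only; the service remains the authority on which values it accepts.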
+
+The following sample code shows how to build the pronunciation assessment parameters into the URL query parameter:
+
+```csharp
+var pronunciationScoreParamsJson = $"{{\"ReferenceText\":\"Good morning.\",\"GradingSystem\":\"HundredMark\",\"Granularity\":\"FullText\",\"Dimension\":\"Comprehensive\"}}";
+var pronunciationScoreParamsBytes = Encoding.UTF8.GetBytes(pronunciationScoreParamsJson);
+var pronunciationScoreParams = Convert.ToBase64String(pronunciationScoreParamsBytes);
+```
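The same encoding steps (JSON, then UTF-8 bytes, then Base64) can be sketched in Python, followed by attaching the result to the query string. This is a minimal sketch under stated assumptions: `BASE_URL` is a placeholder rather than a real endpoint, the helper names are made up for illustration, and `format=detailed` is included on the assumption that the scores are wanted in the `NBest` list.

```python
import base64
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical placeholder; substitute the speech-to-text endpoint for your region.
BASE_URL = "https://example.invalid/speech-to-text"


def encode_params(params: dict) -> str:
    """Serialize to JSON, encode as UTF-8, then Base64-encode."""
    return base64.b64encode(json.dumps(params).encode("utf-8")).decode("ascii")


def build_url(base_url: str, params: dict) -> str:
    """Attach the encoded parameters to the query string."""
    query = {
        "format": "detailed",
        "pronunciationScoreParams": encode_params(params),
    }
    # urlencode percent-escapes '+', '/', and '=', which can appear in Base64 output.
    return base_url + "?" + urlencode(query)


def decode_params_from_url(url: str) -> dict:
    """Round-trip helper: recover the parameter dictionary from a built URL."""
    encoded = parse_qs(urlsplit(url).query)["pronunciationScoreParams"][0]
    return json.loads(base64.b64decode(encoded))


params = {
    "ReferenceText": "Good morning.",
    "GradingSystem": "HundredMark",
    "Granularity": "FullText",
    "Dimension": "Comprehensive",
}
url = build_url(BASE_URL, params)
print(url)
print(decode_params_from_url(url) == params)
```

Percent-escaping matters here because an unescaped `+` in a query string is commonly read back as a space, which would corrupt the Base64 payload.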
 
 ## Request headers
 
@@ -173,6 +206,11 @@ Each object in the `NBest` list includes:
 | `ITN` | The inverse-text-normalized ("canonical") form of the recognized text, with phone numbers, numbers, abbreviations ("doctor smith" to "dr smith"), and other transformations applied. |
 | `MaskedITN` | The ITN form with profanity masking applied, if requested. |
 | `Display` | The display form of the recognized text, with punctuation and capitalization added. This parameter is the same as `DisplayText` provided when format is set to `simple`. |
+| `AccuracyScore` | The score indicating the pronunciation accuracy of the given speech. |
+| `FluencyScore` | The score indicating the fluency of the given speech. |
+| `CompletenessScore` | The score indicating the completeness of the given speech, calculated as the ratio of pronounced words to the entire input. |
+| `PronScore` | The overall score indicating the pronunciation quality of the given speech. This is calculated as a weighted combination of `AccuracyScore`, `FluencyScore`, and `CompletenessScore`. |
+| `ErrorType` | This value indicates whether a word is omitted, inserted, or badly pronounced compared to `ReferenceText`. Possible values are `None` (meaning no error on this word), `Omission`, `Insertion`, and `Mispronunciation`. |
 
 ## Sample responses
 
@@ -213,6 +251,45 @@ A typical response for `detailed` recognition:
 }
 ```
 
+A typical response for recognition with pronunciation assessment:
+
+```json
+{
+  "RecognitionStatus": "Success",
+  "Offset": "400000",
+  "Duration": "11000000",
+  "NBest": [
+    {
+      "Confidence" : "0.87",
+      "Lexical" : "good morning",
+      "ITN" : "good morning",
+      "MaskedITN" : "good morning",
+      "Display" : "Good morning.",
+      "PronScore" : 84.4,
+      "AccuracyScore" : 100.0,
+      "FluencyScore" : 74.0,
+      "CompletenessScore" : 100.0,
+      "Words": [
+        {
+          "Word" : "Good",
+          "AccuracyScore" : 100.0,
+          "ErrorType" : "None",
+          "Offset" : 500000,
+          "Duration" : 2700000
+        },
+        {
+          "Word" : "morning",
+          "AccuracyScore" : 100.0,
+          "ErrorType" : "None",
+          "Offset" : 5300000,
+          "Duration" : 900000
+        }
+      ]
+    }
+  ]
+}
+```
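A client would typically read the overall scores from the first `NBest` entry and then look for words whose `ErrorType` is not `None`. The Python sketch below does this against a trimmed copy of the sample response above. `summarize` and `SCORE_KEYS` are illustrative names, and the sketch assumes the field layout shown in the sample.

```python
import json

# Trimmed copy of the sample response; values are taken from the sample as-is.
SAMPLE_RESPONSE = """
{
  "RecognitionStatus": "Success",
  "NBest": [
    {
      "Display": "Good morning.",
      "PronScore": 84.4,
      "AccuracyScore": 100.0,
      "FluencyScore": 74.0,
      "CompletenessScore": 100.0,
      "Words": [
        {"Word": "Good", "AccuracyScore": 100.0, "ErrorType": "None"},
        {"Word": "morning", "AccuracyScore": 100.0, "ErrorType": "None"}
      ]
    }
  ]
}
"""

SCORE_KEYS = ("PronScore", "AccuracyScore", "FluencyScore", "CompletenessScore")


def summarize(response: dict):
    """Return (overall scores, flagged words) for the first NBest entry."""
    best = response["NBest"][0]
    # Some scores may be absent (for example, with the Basic dimension), so copy only what exists.
    scores = {key: best[key] for key in SCORE_KEYS if key in best}
    # A word is flagged when its ErrorType is anything other than "None".
    flagged = [
        (word["Word"], word["ErrorType"])
        for word in best.get("Words", [])
        if word.get("ErrorType", "None") != "None"
    ]
    return scores, flagged


scores, flagged = summarize(json.loads(SAMPLE_RESPONSE))
print(scores)
print(flagged)
```

For the sample response, no word is flagged because both words report `ErrorType` as `None`.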
+
 ## Next steps
 
 - [Get your Speech trial subscription](https://azure.microsoft.com/try/cognitive-services/)
