articles/cognitive-services/Speech-Service/rest-speech-to-text.md (80 additions & 3 deletions)
@@ -3,13 +3,13 @@ title: Speech-to-text API reference (REST) - Speech service
 titleSuffix: Azure Cognitive Services
 description: Learn how to use the speech-to-text REST API. In this article, you'll learn about authorization options, query options, how to structure a request and receive a response.
 services: cognitive-services
-author: trevorbye
+author: yinhew
 manager: nitinme
 ms.service: cognitive-services
 ms.subservice: speech-service
 ms.topic: conceptual
-ms.date: 03/16/2020
-ms.author: trbye
+ms.date: 04/23/2020
+ms.author: yinhew
 ---
 
 # Speech-to-text REST API
@@ -50,6 +50,39 @@ These parameters may be included in the query string of the REST request.
 |`format`| Specifies the result format. Accepted values are `simple` and `detailed`. Simple results include `RecognitionStatus`, `DisplayText`, `Offset`, and `Duration`. Detailed responses include multiple results with confidence values and four different representations. The default setting is `simple`. | Optional |
 |`profanity`| Specifies how to handle profanity in recognition results. Accepted values are `masked`, which replaces profanity with asterisks, `removed`, which removes all profanity from the result, or `raw`, which includes the profanity in the result. The default setting is `masked`. | Optional |
 |`cid`| When using the [Custom Speech portal](how-to-custom-speech.md) to create custom models, you can use custom models via their **Endpoint ID** found on the **Deployment** page. Use the **Endpoint ID** as the argument to the `cid` query string parameter. | Optional |
+|`pronunciationScoreParams`| Specifies the parameters for showing pronunciation scores in recognition results. These scores assess the pronunciation quality of the speech input, with indicators such as accuracy, fluency, and completeness. This parameter is a Base64-encoded JSON string that contains multiple detailed parameters. See [Pronunciation assessment parameters](#pronunciation-assessment-parameters) for how to build this parameter. | Optional |
+
+## Pronunciation assessment parameters
+
+This table lists required and optional parameters for pronunciation assessment.
+
+| Parameter | Description | Required / Optional |
+|-----------|-------------|---------------------|
+| ReferenceText | The text that the speaker is expected to read. The pronunciation is evaluated against it. | Required |
+| GradingSystem | The point system for score calibration. Accepted values are `FivePoint` and `HundredMark`. The default setting is `FivePoint`. | Optional |
+| Granularity | The evaluation granularity. Accepted values are `Phoneme`, which shows the score at the full-text, word, and phoneme levels; `Word`, which shows the score at the full-text and word levels; and `FullText`, which shows the score at the full-text level only. The default setting is `Phoneme`. | Optional |
+| Dimension | Defines the output criteria. Accepted values are `Basic`, which shows the accuracy score only, and `Comprehensive`, which shows scores on more dimensions (for example, fluency score and completeness score at the full-text level, and error type at the word level). See [Response parameters](#response-parameters) for definitions of the score dimensions and word error types. The default setting is `Basic`. | Optional |
+| EnableMiscue | Enables miscue calculation. When enabled, the pronounced words are compared to the reference text and marked as an omission or insertion based on the comparison. Accepted values are `False` and `True`. The default setting is `False`. | Optional |
+| ScenarioId | A GUID indicating a customized point system. | Optional |
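
Reviewer sketch (not part of the diff): the accepted values in the table above could be checked on the client before a request is sent. The Python helper below is a minimal illustration under that assumption. The names `ACCEPTED_VALUES` and `build_assessment_params` are hypothetical, and the values are kept as the strings the table spells out; whether the service also accepts a JSON boolean for `EnableMiscue` is not stated here.

```python
import uuid

# Accepted values copied from the parameter table above.
ACCEPTED_VALUES = {
    "GradingSystem": ("FivePoint", "HundredMark"),
    "Granularity": ("Phoneme", "Word", "FullText"),
    "Dimension": ("Basic", "Comprehensive"),
    "EnableMiscue": ("False", "True"),
}


def build_assessment_params(reference_text: str, **options: str) -> dict:
    """Return a parameter dict, rejecting values the table does not list."""
    if not reference_text:
        raise ValueError("ReferenceText is required")
    params = {"ReferenceText": reference_text}
    for name, value in options.items():
        if name == "ScenarioId":
            uuid.UUID(value)  # raises ValueError if the value is not a GUID
        elif name not in ACCEPTED_VALUES:
            raise ValueError(f"unknown parameter: {name}")
        elif value not in ACCEPTED_VALUES[name]:
            raise ValueError(
                f"{name} must be one of {ACCEPTED_VALUES[name]}, got {value!r}"
            )
        params[name] = value
    return params


print(build_assessment_params("Good morning.", GradingSystem="HundredMark"))
```

Optional parameters that are not passed are left out of the dict, so the service-side defaults listed in the table still apply.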
+
+The following example JSON contains the pronunciation assessment parameters:
+
+```json
+{
+  "ReferenceText": "Good morning.",
+  "GradingSystem": "HundredMark",
+  "Granularity": "FullText",
+  "Dimension": "Comprehensive"
+}
+```
+
+The following sample code shows how to build the pronunciation assessment parameters into the URL query parameter:
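
Reviewer sketch (not part of the diff): the article's own sample falls outside this hunk, so the Python below is only a rough illustration of the step the sentence describes, not the author's code. It assumes the JSON is UTF-8 encoded before Base64 encoding and that the value is percent-escaped when placed in the query string. The function name `encode_pronunciation_params` is hypothetical.

```python
import base64
import json
from urllib.parse import urlencode


def encode_pronunciation_params(params: dict) -> str:
    """Serialize the assessment parameters to JSON, then Base64-encode them."""
    raw = json.dumps(params).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


params = {
    "ReferenceText": "Good morning.",
    "GradingSystem": "HundredMark",
    "Granularity": "FullText",
    "Dimension": "Comprehensive",
}

# urlencode percent-escapes '+', '/' and '=' from the Base64 alphabet,
# so the encoded value survives intact inside a query string.
query = urlencode({
    "format": "detailed",
    "pronunciationScoreParams": encode_pronunciation_params(params),
})
print(query)
```

The resulting string would be appended after `?` on the request URL, alongside whatever other query parameters the request needs.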
@@ -173,6 +206,11 @@ Each object in the `NBest` list includes:
 |`ITN`| The inverse-text-normalized ("canonical") form of the recognized text, with phone numbers, numbers, abbreviations ("doctor smith" to "dr smith"), and other transformations applied. |
 |`MaskedITN`| The ITN form with profanity masking applied, if requested. |
 |`Display`| The display form of the recognized text, with punctuation and capitalization added. This parameter is the same as `DisplayText` provided when format is set to `simple`. |
+|`AccuracyScore`| The score indicating the pronunciation accuracy of the given speech. |
+|`FluencyScore`| The score indicating the fluency of the given speech. |
+|`CompletenessScore`| The score indicating the completeness of the given speech, calculated as the ratio of pronounced words to the entire input. |
+|`PronScore`| The overall score indicating the pronunciation quality of the given speech. It is a weighted combination of `AccuracyScore`, `FluencyScore`, and `CompletenessScore`. |
+|`ErrorType`| Indicates whether a word is omitted, inserted, or badly pronounced compared to `ReferenceText`. Possible values are `None` (meaning no error on this word), `Omission`, `Insertion`, and `Mispronunciation`. |
 
 ## Sample responses
 
@@ -213,6 +251,45 @@ A typical response for `detailed` recognition:
 }
 ```
 
+A typical response for recognition with pronunciation assessment:
+
+```json
+{
+  "RecognitionStatus": "Success",
+  "Offset": "400000",
+  "Duration": "11000000",
+  "NBest": [
+    {
+      "Confidence" : "0.87",
+      "Lexical" : "good morning",
+      "ITN" : "good morning",
+      "MaskedITN" : "good morning",
+      "Display" : "Good morning.",
+      "PronScore" : 84.4,
+      "AccuracyScore" : 100.0,
+      "FluencyScore" : 74.0,
+      "CompletenessScore" : 100.0,
+      "Words": [
+        {
+          "Word" : "Good",
+          "AccuracyScore" : 100.0,
+          "ErrorType" : "None",
+          "Offset" : 500000,
+          "Duration" : 2700000
+        },
+        {
+          "Word" : "morning",
+          "AccuracyScore" : 100.0,
+          "ErrorType" : "None",
+          "Offset" : 5300000,
+          "Duration" : 900000
+        }
+      ]
+    }
+  ]
+}
+```
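
Reviewer sketch (not part of the diff): a response shaped like the one above could be reduced to its scores and flagged words along these lines. This is a minimal Python illustration. It assumes the first `NBest` entry is the one to report and that `Words` and the non-accuracy scores may be absent depending on the requested `Granularity` and `Dimension`. The function name `summarize_assessment` is hypothetical, and the embedded sample is a trimmed copy of the response shown above.

```python
import json

SCORE_KEYS = ("PronScore", "AccuracyScore", "FluencyScore", "CompletenessScore")


def summarize_assessment(response_text: str) -> dict:
    """Extract the display text, overall scores, and words with an error type."""
    body = json.loads(response_text)
    if body.get("RecognitionStatus") != "Success":
        raise ValueError(f"recognition failed: {body.get('RecognitionStatus')}")
    best = body["NBest"][0]  # assumption: the first entry is the one to report
    return {
        "text": best["Display"],
        "scores": {key: best[key] for key in SCORE_KEYS if key in best},
        "flagged_words": [
            (word["Word"], word["ErrorType"])
            for word in best.get("Words", [])
            if word.get("ErrorType", "None") != "None"
        ],
    }


sample = """
{
  "RecognitionStatus": "Success",
  "NBest": [
    {
      "Display": "Good morning.",
      "PronScore": 84.4,
      "AccuracyScore": 100.0,
      "FluencyScore": 74.0,
      "CompletenessScore": 100.0,
      "Words": [
        {"Word": "Good", "AccuracyScore": 100.0, "ErrorType": "None"},
        {"Word": "morning", "AccuracyScore": 100.0, "ErrorType": "None"}
      ]
    }
  ]
}
"""
print(summarize_assessment(sample))
```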
+
 ## Next steps
 
 -[Get your Speech trial subscription](https://azure.microsoft.com/try/cognitive-services/)