Commit 56f50a6

Update based on Oliver's comments
1 parent d5ddf70 commit 56f50a6

1 file changed: +31 -31 lines changed

articles/cognitive-services/Speech-Service/rest-speech-to-text.md

Lines changed: 31 additions & 31 deletions
@@ -49,20 +49,45 @@ These parameters may be included in the query string of the REST request.
 | `language` | Identifies the spoken language that is being recognized. See [Supported languages](language-support.md#speech-to-text). | Required |
 | `format` | Specifies the result format. Accepted values are `simple` and `detailed`. Simple results include `RecognitionStatus`, `DisplayText`, `Offset`, and `Duration`. Detailed responses include multiple results with confidence values and four different representations. The default setting is `simple`. | Optional |
 | `profanity` | Specifies how to handle profanity in recognition results. Accepted values are `masked`, which replaces profanity with asterisks, `removed`, which removes all profanity from the result, or `raw`, which includes the profanity in the result. The default setting is `masked`. | Optional |
-| `cid` | When using the [Custom Speech portal](how-to-custom-speech.md) to create custom models, you can use custom models via their **Endpoint ID** found on the **Deployment** page. Use the **Endpoint ID** as the argument to the `cid` query string parameter. | Optional |
 | `pronunciationScoreParams` | Specifies the parameters for showing pronunciation scores in recognition results. These scores assess the pronunciation quality of speech input with indicators such as accuracy, fluency, and completeness. This parameter is a Base64-encoded JSON object containing multiple detailed parameters. See [Pronunciation assessment parameters](#pronunciation-assessment-parameters) for how to build this parameter. | Optional |
+| `cid` | When using the [Custom Speech portal](how-to-custom-speech.md) to create custom models, you can use custom models via their **Endpoint ID** found on the **Deployment** page. Use the **Endpoint ID** as the argument to the `cid` query string parameter. | Optional |
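The query parameters in this table can be combined into a request URL along the following lines. This C# sketch is illustrative only: `BuildRecognitionUri` is a hypothetical helper, the base URL is a placeholder because the regional endpoint isn't shown in this section, and each value is percent-encoded with `Uri.EscapeDataString` so that characters such as `+`, `/`, and `=` in a Base64 `pronunciationScoreParams` value survive in the query string.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of any SDK): appends the query parameters from the table
// to a base endpoint URL. Optional parameters left as null are omitted.
static Uri BuildRecognitionUri(string baseUrl, string language, string? format = null,
    string? profanity = null, string? cid = null, string? pronunciationScoreParams = null)
{
    var parameters = new (string Name, string? Value)[]
    {
        ("language", language),                                 // required
        ("format", format),                                     // "simple" (default) or "detailed"
        ("profanity", profanity),                               // "masked" (default), "removed", or "raw"
        ("cid", cid),                                           // Custom Speech Endpoint ID
        ("pronunciationScoreParams", pronunciationScoreParams), // Base64-encoded JSON
    };

    // Percent-encode each value so reserved characters such as '+', '/', and '=' are preserved.
    string query = string.Join("&", parameters
        .Where(p => p.Value is not null)
        .Select(p => $"{p.Name}={Uri.EscapeDataString(p.Value!)}"));
    return new Uri($"{baseUrl}?{query}");
}

// Example call; the base URL is a placeholder assumption, not the real service endpoint.
Uri uri = BuildRecognitionUri("https://example.com/speech-to-text", "en-US", format: "detailed");
Console.WriteLine(uri.AbsoluteUri);
```

Because unset optional parameters are left out of the query string, the service falls back to the defaults listed in the table.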
+
+## Request headers
+
+This table lists required and optional headers for speech-to-text requests.
+
+|Header| Description | Required / Optional |
+|------|-------------|---------------------|
+| `Ocp-Apim-Subscription-Key` | Your Speech service subscription key. | Either this header or `Authorization` is required. |
+| `Authorization` | An authorization token preceded by the word `Bearer`. For more information, see [Authentication](#authentication). | Either this header or `Ocp-Apim-Subscription-Key` is required. |
+| `Content-type` | Describes the format and codec of the provided audio data. Accepted values are `audio/wav; codecs=audio/pcm; samplerate=16000` and `audio/ogg; codecs=opus`. | Required |
+| `Transfer-Encoding` | Specifies that chunked audio data is being sent, rather than a single file. Only use this header if chunking audio data. | Optional |
+| `Expect` | If using chunked transfer, send `Expect: 100-continue`. The Speech service acknowledges the initial request and awaits additional data.| Required if sending chunked audio data. |
+| `Accept` | If provided, it must be `application/json`. The Speech service provides results in JSON. Some request frameworks provide an incompatible default value. It is good practice to always include `Accept`. | Optional, but recommended. |
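The following C# sketch builds, without sending, a `POST` request that carries the headers from this table. It assumes subscription-key authentication and WAV audio, and `BuildRecognitionRequest` is a hypothetical helper. The `Content-Type` value is added with `TryAddWithoutValidation` because the `codecs=audio/pcm` parameter contains a `/`, which the strict media-type parser in `System.Net.Http` may reject.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper: builds (but does not send) a speech-to-text POST request
// carrying the headers described in the table.
static HttpRequestMessage BuildRecognitionRequest(Uri endpoint, string subscriptionKey, Stream wavAudio, bool chunked)
{
    var request = new HttpRequestMessage(HttpMethod.Post, endpoint);

    // Either Ocp-Apim-Subscription-Key or "Authorization: Bearer <token>" is required.
    request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);

    // Optional, but recommended: ask for JSON explicitly.
    request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));

    if (chunked)
    {
        // Transfer-Encoding: chunked, plus Expect: 100-continue, which the table requires for chunked audio.
        request.Headers.TransferEncodingChunked = true;
        request.Headers.ExpectContinue = true;
    }

    var content = new StreamContent(wavAudio);
    // Added without validation because the parameter value "audio/pcm" contains '/'.
    content.Headers.TryAddWithoutValidation("Content-Type", "audio/wav; codecs=audio/pcm; samplerate=16000");
    request.Content = content;
    return request;
}

// Example: a small in-memory buffer stands in for real WAV audio; URL and key are placeholders.
using var audio = new MemoryStream(new byte[16]);
using HttpRequestMessage demoRequest = BuildRecognitionRequest(
    new Uri("https://example.com/speech-to-text?language=en-US"), "YOUR_SUBSCRIPTION_KEY", audio, chunked: true);
Console.WriteLine(demoRequest.Headers);
```

Passing the returned request to `HttpClient.SendAsync` would send it; whether to use chunked transfer for a given upload is left to the caller.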
+
+## Audio formats
+
+Audio is sent in the body of the HTTP `POST` request. It must be in one of the formats in this table:
+
+| Format | Codec | Bitrate | Sample Rate |
+|--------|-------|---------|--------------|
+| WAV | PCM | 16-bit | 16 kHz, mono |
+| OGG | OPUS | 16-bit | 16 kHz, mono |
+
+>[!NOTE]
+>The above formats are supported through REST API and WebSocket in the Speech service. The [Speech SDK](speech-sdk.md) currently supports the WAV format with PCM codec as well as [other formats](how-to-use-codec-compressed-audio-input-streams.md).
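Before uploading a WAV file, a client can check its header against the WAV row of this table. The following C# sketch assumes the canonical 44-byte RIFF header, in which the `fmt ` chunk immediately follows `WAVE`; files with other chunks before `fmt ` are rejected rather than parsed. `IsSupportedWav` is a hypothetical helper.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Minimal sketch (hypothetical helper): checks whether a WAV header matches the WAV row of the
// table (PCM codec, 16-bit samples, 16 kHz, mono). Assumes the canonical layout in which the
// "fmt " chunk starts at byte 12; headers with other chunks before "fmt " are rejected.
static bool IsSupportedWav(ReadOnlySpan<byte> header)
{
    if (header.Length < 36) return false;
    if (Encoding.ASCII.GetString(header.Slice(0, 4)) != "RIFF" ||
        Encoding.ASCII.GetString(header.Slice(8, 4)) != "WAVE" ||
        Encoding.ASCII.GetString(header.Slice(12, 4)) != "fmt ")
    {
        return false;
    }

    ushort audioFormat = BinaryPrimitives.ReadUInt16LittleEndian(header.Slice(20, 2)); // 1 = PCM
    ushort channels = BinaryPrimitives.ReadUInt16LittleEndian(header.Slice(22, 2));
    uint sampleRate = BinaryPrimitives.ReadUInt32LittleEndian(header.Slice(24, 4));
    ushort bitsPerSample = BinaryPrimitives.ReadUInt16LittleEndian(header.Slice(34, 2));

    return audioFormat == 1 && channels == 1 && sampleRate == 16000 && bitsPerSample == 16;
}

// Example: a synthetic 44-byte header describing 16-bit, 16 kHz, mono PCM audio.
byte[] sampleHeader = new byte[44];
Encoding.ASCII.GetBytes("RIFF").CopyTo(sampleHeader, 0);
Encoding.ASCII.GetBytes("WAVE").CopyTo(sampleHeader, 8);
Encoding.ASCII.GetBytes("fmt ").CopyTo(sampleHeader, 12);
BinaryPrimitives.WriteUInt32LittleEndian(sampleHeader.AsSpan(16), 16);    // fmt chunk size
BinaryPrimitives.WriteUInt16LittleEndian(sampleHeader.AsSpan(20), 1);     // audio format: PCM
BinaryPrimitives.WriteUInt16LittleEndian(sampleHeader.AsSpan(22), 1);     // channels: mono
BinaryPrimitives.WriteUInt32LittleEndian(sampleHeader.AsSpan(24), 16000); // sample rate in Hz
BinaryPrimitives.WriteUInt16LittleEndian(sampleHeader.AsSpan(34), 16);    // bits per sample
Console.WriteLine(IsSupportedWav(sampleHeader));
```

To check a file on disk, its bytes (for example, from `File.ReadAllBytes`) can be passed in. OGG/Opus input would need a separate check, which this sketch doesn't attempt.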

 ## Pronunciation assessment parameters

 This table lists required and optional parameters for pronunciation assessment.

 | Parameter | Description | Required / Optional |
 |-----------|-------------|---------------------|
-| ReferenceText | The text that the speech audio is following. | Required |
+| ReferenceText | The text that the pronunciation will be evaluated against. | Required |
 | GradingSystem | The point system for score calibration. Accepted values are `FivePoint` and `HundredMark`. The default setting is `FivePoint`. | Optional |
-| Granularity | The evaluation granularity. Accepted values are `Phoneme`, which shows the score on full text, word and phoneme level, `Word`, which shows the score on full text and word level, `FullText`, which shows the score on full text level only. The default setting is `Phoneme`. | Optional |
-| Dimension | Defines the output criteria. Accepted values are `Basic`, which shows the accuracy score only, `Comprehensive` shows scores on more dimensions (e.g. fluency score and completeness score on full text level, error type on word level). Check [Response parameters](#response-parameters) to see definitions of different score dimensions and word error types. The default setting is `Basic`. | Optional |
-| EnableMiscue | Enables miscue calculation. With this enabled, the pronounced words will be compared to reference text, and will be marked omission/insertion based on the comparison. Accepted values are `False` and `True`. The default setting is `False`. | Optional |
+| Granularity | The evaluation granularity. Accepted values are `Phoneme`, which shows the score on the full text, word and phoneme level, `Word`, which shows the score on the full text and word level, `FullText`, which shows the score on the full text level only. The default setting is `Phoneme`. | Optional |
+| Dimension | Defines the output criteria. Accepted values are `Basic`, which shows the accuracy score only, `Comprehensive` shows scores on more dimensions (e.g. fluency score and completeness score on the full text level, error type on word level). Check [Response parameters](#response-parameters) to see definitions of different score dimensions and word error types. The default setting is `Basic`. | Optional |
+| EnableMiscue | Enables miscue calculation. With this enabled, the pronounced words will be compared to the reference text, and will be marked with omission/insertion based on the comparison. Accepted values are `False` and `True`. The default setting is `False`. | Optional |
 | ScenarioId | A GUID indicating a customized point system. | Optional |

 Below is an example JSON object containing the pronunciation assessment parameters:
@@ -76,39 +101,14 @@ Below is an example JSON object containing the pronunciation assessment parameters:
 }
 ```

-Below sample code shows how to build the pronunciation assessment parameters into the URL query parameter:
+The following sample code shows how to build the pronunciation assessment parameters into the URL query parameter:

 ```csharp
 var pronunciationScoreParamsJson = $"{{\"ReferenceText\":\"Good morning.\",\"GradingSystem\":\"HundredMark\",\"Granularity\":\"FullText\",\"Dimension\":\"Comprehensive\"}}";
 var pronunciationScoreParamsBytes = Encoding.UTF8.GetBytes(pronunciationScoreParamsJson);
 var pronunciationScoreParams = Convert.ToBase64String(pronunciationScoreParamsBytes);
 ```
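As an alternative to hand-escaping the JSON string, the following sketch serializes the same parameters with `System.Text.Json`, Base64-encodes the UTF-8 bytes, and then percent-encodes the result. That last step is a precaution: Base64 output can contain `+`, `/`, and `=`, which can be misinterpreted in a query string. `BuildPronunciationScoreParams` is a hypothetical helper, and its fixed grading options simply mirror the sample.

```csharp
using System;
using System.Text;
using System.Text.Json;

// Hypothetical helper: serializes the pronunciation assessment parameters and Base64-encodes them.
// Anonymous-type property names are written to JSON unchanged (no naming policy is applied).
static string BuildPronunciationScoreParams(string referenceText)
{
    var parameters = new
    {
        ReferenceText = referenceText,
        GradingSystem = "HundredMark",
        Granularity = "FullText",
        Dimension = "Comprehensive",
    };
    string json = JsonSerializer.Serialize(parameters);
    return Convert.ToBase64String(Encoding.UTF8.GetBytes(json));
}

string encoded = BuildPronunciationScoreParams("Good morning.");
// Percent-encode before placing the value in the URL query string.
string queryValue = Uri.EscapeDataString(encoded);
Console.WriteLine($"pronunciationScoreParams={queryValue}");
```

Letting the serializer produce the JSON also escapes any quotes or backslashes that appear in the reference text, which the hand-built string would not do.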

-## Request headers
-
-This table lists required and optional headers for speech-to-text requests.
-
-|Header| Description | Required / Optional |
-|------|-------------|---------------------|
-| `Ocp-Apim-Subscription-Key` | Your Speech service subscription key. | Either this header or `Authorization` is required. |
-| `Authorization` | An authorization token preceded by the word `Bearer`. For more information, see [Authentication](#authentication). | Either this header or `Ocp-Apim-Subscription-Key` is required. |
-| `Content-type` | Describes the format and codec of the provided audio data. Accepted values are `audio/wav; codecs=audio/pcm; samplerate=16000` and `audio/ogg; codecs=opus`. | Required |
-| `Transfer-Encoding` | Specifies that chunked audio data is being sent, rather than a single file. Only use this header if chunking audio data. | Optional |
-| `Expect` | If using chunked transfer, send `Expect: 100-continue`. The Speech service acknowledges the initial request and awaits additional data.| Required if sending chunked audio data. |
-| `Accept` | If provided, it must be `application/json`. The Speech service provides results in JSON. Some request frameworks provide an incompatible default value. It is good practice to always include `Accept`. | Optional, but recommended. |
-
-## Audio formats
-
-Audio is sent in the body of the HTTP `POST` request. It must be in one of the formats in this table:
-
-| Format | Codec | Bitrate | Sample Rate |
-|--------|-------|---------|--------------|
-| WAV | PCM | 16-bit | 16 kHz, mono |
-| OGG | OPUS | 16-bit | 16 kHz, mono |
-
->[!NOTE]
->The above formats are supported through REST API and WebSocket in the Speech service. The [Speech SDK](speech-sdk.md) currently supports the WAV format with PCM codec as well as [other formats](how-to-use-codec-compressed-audio-input-streams.md).
-
 ## Sample request

 The following sample includes the hostname and required headers. Note that the service also expects audio data, which isn't included in this sample. As mentioned earlier, chunking is recommended but not required.
