articles/cognitive-services/Speech-Service/rest-speech-to-text.md
13 additions & 12 deletions
@@ -14,21 +14,22 @@ ms.author: erhopf

# Speech-to-text REST API

-As an alternative to the [Speech SDK](speech-sdk.md), Speech Services allow you to convert speech-to-text using a REST API. Each accessible endpoint is associated with a region. Your application requires a subscription key for the endpoint you plan to use.
+As an alternative to the [Speech SDK](speech-sdk.md), Speech Services allows you to convert speech-to-text using a REST API. Each accessible endpoint is associated with a region. Your application requires a subscription key for the endpoint you plan to use.

Before using the speech-to-text REST API, understand:
-* Requests that use the REST API can only contain 10 seconds of recorded audio.
+
+* Requests that use the REST API and transmit audio directly can only contain up to 60 seconds of audio.
* The speech-to-text REST API only returns final results. Partial results are not provided.

-If sending longer audio is a requirement for your application, consider using the [Speech SDK](speech-sdk.md) or [batch transcription](batch-transcription.md).
+If sending longer audio is a requirement for your application, consider using the [Speech SDK](speech-sdk.md) or a file-based REST API, like [batch transcription](batch-transcription.md).
These regions are supported for speech-to-text transcription using the REST API. Make sure that you select the endpoint that matches your subscription region.
@@ -38,7 +39,7 @@ These parameters may be included in the query string of the REST request.
|-----------|-------------|---------------------|
|`language`| Identifies the spoken language that is being recognized. See [Supported languages](language-support.md#speech-to-text). | Required |
|`format`| Specifies the result format. Accepted values are `simple` and `detailed`. Simple results include `RecognitionStatus`, `DisplayText`, `Offset`, and `Duration`. Detailed responses include multiple results with confidence values and four different representations. The default setting is `simple`. | Optional |
-|`profanity`| Specifies how to handle profanity in recognition results. Accepted values are `masked`, which replaces profanity with asterisks, `removed`, which remove all profanity from the result, or `raw`, which includes the profanity in the result. The default setting is `masked`. | Optional |
+|`profanity`| Specifies how to handle profanity in recognition results. Accepted values are `masked`, which replaces profanity with asterisks, `removed`, which removes all profanity from the result, or `raw`, which includes the profanity in the result. The default setting is `masked`. | Optional |

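Reviewer's illustration, not part of the commit: the query parameters in the table above could be assembled as in this minimal Python sketch. The helper name `build_request_target` is hypothetical, the path is copied from the sample request later in this diff, and the host is deliberately left out because it depends on the subscription region.

```python
from urllib.parse import urlencode

# Path copied from the sample request in this diff; the regional host is omitted.
RECOGNITION_PATH = "speech/recognition/conversation/cognitiveservices/v1"

# Accepted values as listed in the parameter table.
ACCEPTED_FORMATS = {"simple", "detailed"}
ACCEPTED_PROFANITY = {"masked", "removed", "raw"}


def build_request_target(language, result_format="simple", profanity="masked"):
    """Return the path plus query string for a recognition request.

    Hypothetical helper for illustration only; it is not part of any SDK.
    """
    if not language:
        raise ValueError("language is required")
    if result_format not in ACCEPTED_FORMATS:
        raise ValueError(f"unsupported format: {result_format!r}")
    if profanity not in ACCEPTED_PROFANITY:
        raise ValueError(f"unsupported profanity setting: {profanity!r}")
    query = urlencode(
        {"language": language, "format": result_format, "profanity": profanity}
    )
    return f"{RECOGNITION_PATH}?{query}"
```

The defaults mirror the table (`simple` and `masked`), so only `language` has to be supplied by the caller.
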
## Request headers

@@ -50,8 +51,8 @@ This table lists required and optional headers for speech-to-text requests.
|`Authorization`| An authorization token preceded by the word `Bearer`. For more information, see [Authentication](#authentication). | Either this header or `Ocp-Apim-Subscription-Key` is required. |
|`Content-type`| Describes the format and codec of the provided audio data. Accepted values are `audio/wav; codecs=audio/pcm; samplerate=16000` and `audio/ogg; codecs=opus`. | Required |
|`Transfer-Encoding`| Specifies that chunked audio data is being sent, rather than a single file. Only use this header if chunking audio data. | Optional |
-|`Expect`| If using chunked transfer, send `Expect: 100-continue`. The Speech Services acknowledge the initial request and awaits additional data.| Required if sending chunked audio data. |
-|`Accept`| If provided, it must be `application/json`. The Speech Services provide results in JSON. Some Web request frameworks provide an incompatible default value if you do not specify one, so it is good practice to always include `Accept`. | Optional, but recommended. |
+|`Expect`| If using chunked transfer, send `Expect: 100-continue`. The Speech Services acknowledges the initial request and awaits additional data.| Required if sending chunked audio data. |
+|`Accept`| If provided, it must be `application/json`. The Speech Services provides results in JSON. Some request frameworks provide an incompatible default value. It is good practice to always include `Accept`. | Optional, but recommended. |

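Reviewer's illustration, not part of the commit: the header rules in the rows above can be read as the following Python sketch. `build_headers` is a hypothetical helper. Accepting exactly one credential is a simplifying assumption of this sketch (the table only says that one of the two headers is required), and the WAV content type is just one of the two accepted values.

```python
def build_headers(subscription_key=None, bearer_token=None, chunked=False):
    """Assemble request headers following the table in this diff.

    Hypothetical helper for illustration only. Requiring exactly one
    credential is a choice made by this sketch, not a documented rule.
    """
    if bool(subscription_key) == bool(bearer_token):
        raise ValueError("provide exactly one of subscription_key or bearer_token")

    headers = {
        # One of the two accepted Content-type values from the table.
        "Content-type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        # Optional but recommended, because some frameworks send an
        # incompatible default.
        "Accept": "application/json",
    }

    if subscription_key:
        headers["Ocp-Apim-Subscription-Key"] = subscription_key
    else:
        headers["Authorization"] = f"Bearer {bearer_token}"

    if chunked:
        # Both headers apply only when audio is sent in chunks.
        headers["Transfer-Encoding"] = "chunked"
        headers["Expect"] = "100-continue"

    return headers
```

Calling `build_headers(subscription_key="YOUR_KEY", chunked=True)` yields the subscription-key variant with both chunking headers set; `YOUR_KEY` is a placeholder.
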
## Audio formats

@@ -67,7 +68,7 @@ Audio is sent in the body of the HTTP `POST` request. It must be in one of the f

## Sample request

-This is a typical HTTP request. The sample below includes the hostname and required headers. It's important to note that the service also expects audio data, which is not included in this sample. As mentioned earlier, chunking is recommended, however, not required.
+The sample below includes the hostname and required headers. It's important to note that the service also expects audio data, which is not included in this sample. As mentioned earlier, chunking is recommended, however, not required.

```HTTP
POST speech/recognition/conversation/cognitiveservices/v1?language=en-US&format=detailed HTTP/1.1
@@ -87,13 +88,13 @@ The HTTP status code for each response indicates success or common errors.
-Chunked transfer (`Transfer-Encoding: chunked`) can help reduce recognition latency because it allows the Speech Services to begin processing the audio file while it's being transmitted. The REST API does not provide partial or interim results. This option is intended solely to improve responsiveness.
+Chunked transfer (`Transfer-Encoding: chunked`) can help reduce recognition latency. It allows the Speech Services to begin processing the audio file while it is transmitted. The REST API does not provide partial or interim results.

This code sample shows how to send audio in chunks. Only the first chunk should contain the audio file's header. `request` is an HTTPWebRequest object connected to the appropriate REST endpoint. `audioFile` is the path to an audio file on disk.

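Reviewer's illustration, not part of the commit and not the article's own code sample (which this diff does not show): a minimal Python sketch of how a body sent with `Transfer-Encoding: chunked` is framed on the wire in HTTP/1.1. The function names and the 1024-byte chunk size are assumptions made for the example. Because the stream is read sequentially, the audio file's header naturally ends up in the first chunk only.

```python
import io


def iter_audio_chunks(stream, chunk_size=1024):
    """Yield successive blocks of at most chunk_size bytes from a binary stream."""
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield data


def frame_chunk(data):
    """Frame one chunk: hexadecimal length, CRLF, payload, CRLF."""
    return f"{len(data):X}".encode("ascii") + b"\r\n" + data + b"\r\n"


# A zero-length chunk marks the end of a chunked body.
LAST_CHUNK = b"0\r\n\r\n"


def encode_chunked_body(stream, chunk_size=1024):
    """Yield the framed chunks for a whole stream, followed by the terminator."""
    for data in iter_audio_chunks(stream, chunk_size):
        yield frame_chunk(data)
    yield LAST_CHUNK


if __name__ == "__main__":
    # Stand-in bytes instead of a real audio file.
    fake_audio = io.BytesIO(b"RIFF" + bytes(20))
    for framed in encode_chunked_body(fake_audio, chunk_size=8):
        print(framed)
```

In practice most HTTP client libraries perform this framing themselves when given an iterable body, so the sketch is mainly useful for seeing what the service receives while the upload is still in progress.
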
@@ -172,7 +173,7 @@ Each object in the `NBest` list includes:

## Sample responses

-This is a typical response for `simple` recognition.
+A typical response for `simple` recognition:

```json
{
@@ -183,7 +184,7 @@ This is a typical response for `simple` recognition.
}
```

-This is a typical response for `detailed` recognition.