# What is batch transcription?
Batch transcription is a set of REST API operations that enables you to transcribe a large amount of audio in storage. You can point to audio files with a shared access signature (SAS) URI and asynchronously receive transcription results.
Asynchronous speech-to-text transcription is just one of the features. You can use batch transcription REST APIs to call the following methods:
- Create batch processing requests
- Query the status
- Download transcription results
- Delete transcription information from the service
| Description | Method | REST API call |
|-------------|--------|---------------|
| Gets the transcription identified by the given ID. | GET | api/speechtotext/v2.0/transcriptions/{id} |
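
The operations above map to plain REST calls. As a minimal sketch, assuming the standard `Ocp-Apim-Subscription-Key` header and illustrative request body field names (neither is defined in this section), the calls could look like:

```python
import json
import urllib.request

REGION = "westus"  # assumption: your Speech service region
BASE = f"https://{REGION}.cris.ai/api/speechtotext/v2.0/transcriptions"

def _call(method, url, key, body=None):
    """Send one REST call with the subscription key header (assumed name)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method, headers={
        "Ocp-Apim-Subscription-Key": key,  # assumption: standard Azure header
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(req) as resp:
        payload = resp.read()
        return json.loads(payload) if payload else None

def create_transcription(key, audio_sas_uri):
    """POST a new batch transcription request; field names are illustrative."""
    return _call("POST", BASE, key, {
        "RecordingsUrl": audio_sas_uri,
        "Locale": "en-US",
        "Name": "Batch transcription sample",
    })

def get_transcription(key, transcription_id):
    """GET the transcription identified by the given ID."""
    return _call("GET", f"{BASE}/{transcription_id}", key)

def delete_transcription(key, transcription_id):
    """DELETE transcription information once results are downloaded."""
    return _call("DELETE", f"{BASE}/{transcription_id}", key)
```

This is a sketch under stated assumptions, not the article's sample code; the official C# sample referenced later in this article is the supported starting point.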
You can review and test the detailed API, which is available as a [Swagger document](https://westus.cris.ai/swagger/ui/index#/Custom%20Speech%20transcriptions%3A), under the heading `Custom Speech transcriptions`.
Batch transcription jobs are scheduled on a best-effort basis. Currently there is no estimate for when a job changes to the running state. Under normal system load, that should happen within minutes. Once in the running state, the transcription is processed faster than the audio's real-time duration.
In addition to the easy-to-use API, batch transcription doesn't require you to deploy custom endpoints, and there are no concurrency requirements to observe.
As with all features of the Speech service, you create a subscription key from the [Azure portal](https://portal.azure.com) by following our [Get started guide](get-started.md).
>[!NOTE]
> A standard subscription (S0) for Speech service is required to use batch transcription. Free subscription keys (F0) don't work. For more information, see [pricing and limits](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/).
### Custom models

Use these optional properties to configure transcription:

:::row:::
:::column span="1":::
`AddDiarization`
:::column-end:::
:::column span="2":::
Specifies that diarization analysis is carried out on the input, which is expected to be a mono channel recording containing two voices. Accepted values are `true` to enable diarization and `false` (the default) to disable it. Diarization also requires `AddWordLevelTimestamps` to be set to `true`.
:::row-end:::
:::row:::
:::column span="1":::
`TranscriptionResultsContainerUrl`
:::column-end:::
:::column span="2":::
Optional URL with a [service SAS](../../storage/common/storage-sas-overview.md) to a writable container in Azure. The result is stored in this container.
:::row-end:::
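
Putting the optional properties above together, a request body might be assembled as follows. This is a sketch: the top-level field names and the string-typed property values are assumptions, not defined in this section.

```python
def make_transcription_definition(recordings_sas_uri,
                                  results_container_sas=None,
                                  diarization=False):
    """Build a transcription request body with the optional properties (sketch)."""
    properties = {}
    if diarization:
        properties["AddDiarization"] = "True"
        # Diarization requires word-level timestamps to be enabled as well.
        properties["AddWordLevelTimestamps"] = "True"
    if results_container_sas is not None:
        # Results are written to this writable container via its service SAS URL.
        properties["TranscriptionResultsContainerUrl"] = results_container_sas
    return {
        "RecordingsUrl": recordings_sas_uri,  # assumption: field name
        "Locale": "en-US",
        "Name": "Transcription with options",
        "properties": properties,
    }
```
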
### Storage

The result contains these forms:

| Form | Content |
|------|---------|
|`Lexical`| The actual words recognized. |
|`ITN`| Inverse-text-normalized form of the recognized text. Abbreviations ("doctor smith" to "dr smith"), phone numbers, and other transformations are applied. |
|`MaskedITN`| The ITN form with profanity masking applied. |
|`Display`| The display form of the recognized text. Added punctuation and capitalization are included. |
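
For illustration, assuming each recognized phrase in the result file carries these forms inside an `NBest` list (the surrounding JSON shape here is an assumption, not specified by this table), picking out one form is straightforward:

```python
# Illustrative phrase from a result file; the exact JSON shape is an assumption.
phrase = {
    "NBest": [{
        "Confidence": 0.97,
        "Lexical": "doctor smith will see you now",
        "ITN": "dr smith will see you now",
        "MaskedITN": "dr smith will see you now",
        "Display": "Dr. Smith will see you now.",
    }],
}

def best_form(phrase, form="Display"):
    """Return the requested form of the top hypothesis."""
    return phrase["NBest"][0][form]
```
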
## Speaker separation (Diarization)
Diarization is the process of separating speakers in a piece of audio. The batch pipeline supports diarization and can recognize two speakers on mono channel recordings. The feature is not available on stereo recordings.
All transcription output contains a `SpeakerId`. If diarization is not used, it shows `"SpeakerId": null` in the JSON output. For diarization we support two voices, so the speakers are identified as `"1"` or `"2"`.
To request diarization, add the relevant parameter in the HTTP request, as shown below.
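
As a sketch of such a request body (the field names other than the two `properties` entries are assumptions, and the placeholder SAS URI is intentionally not a real value):

```python
# Request body for a diarized transcription of a mono recording (sketch).
definition = {
    "RecordingsUrl": "<SAS URI of a mono recording with two voices>",
    "Locale": "en-US",
    "Name": "Transcription with diarization",
    "properties": {
        "AddDiarization": "True",
        # Word-level timestamps are required when diarization is enabled.
        "AddWordLevelTimestamps": "True",
    },
}
```
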
## Best practices
The transcription service can handle a large number of submitted transcriptions. You can query the status of your transcriptions through a `GET` on the [transcriptions method](https://westus.cris.ai/swagger/ui/index#/Custom%20Speech%20transcriptions%3A/GetTranscriptions). Keep the amount of information returned to a reasonable size by specifying the `take` parameter (a few hundred). [Delete transcriptions](https://westus.cris.ai/swagger/ui/index#/Custom%20Speech%20transcriptions%3A/DeleteTranscription) regularly from the service once you have retrieved the results. This keeps replies from the transcription management calls quick.
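
For example, a housekeeping pass might page through transcriptions with `take` and collect the finished ones for deletion. This is a sketch: the `skip` parameter and the terminal status values are assumptions, not stated in this article.

```python
def transcriptions_page_url(base_url, take=200, skip=0):
    """Build a GET URL that limits how many transcriptions are returned."""
    return f"{base_url}?take={take}&skip={skip}"

def deletable_ids(transcriptions):
    """IDs of transcriptions whose results can be retrieved and then deleted."""
    finished = {"Succeeded", "Failed"}  # assumption: terminal status values
    return [t["id"] for t in transcriptions if t.get("status") in finished]
```
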
## Sample code

You have to customize the sample code with your subscription information.

[!code-csharp[Configuration variables for batch transcription](~/samples-cognitive-services-speech-sdk/samples/batch/csharp/program.cs#batchdefinition)]
The sample code sets up the client and submits the transcription request. It then polls for status information and prints details about the transcription progress.
[!code-csharp[Code to check batch transcription status](~/samples-cognitive-services-speech-sdk/samples/batch/csharp/program.cs#batchstatus)]

You can find the sample in the `samples/batch` directory in the GitHub sample repository.

## Next steps
- [Get your Speech trial subscription](https://azure.microsoft.com/try/cognitive-services/)