Commit 6cd0e8f

Merge pull request #108426 from YassinePBI/letstest
Edited Article
2 parents 03eff03 + 3babe2a commit 6cd0e8f

1 file changed

+25
-16
lines changed

articles/cognitive-services/Speech-Service/batch-transcription.md

Lines changed: 25 additions & 16 deletions
@@ -14,18 +14,27 @@ ms.author: wolfma
1414

1515
# What is batch transcription?
1616

17-
Batch transcription is ideal for transcribing a large amount of audio in storage. By using the dedicated REST API, you can point to audio files with a shared access signature (SAS) URI and asynchronously receive transcription results.
17+
Batch transcription is a set of REST API operations that enables you to transcribe a large amount of audio in storage. You can point to audio files with a shared access signature (SAS) URI and asynchronously receive transcription results.
1818

19-
The API offers asynchronous speech-to-text transcription and other features. You can use REST API to expose methods to:
19+
Asynchronous speech-to-text transcription is just one of the features. You can use batch transcription REST APIs to call the following methods:
2020

21-
- Create a batch processing requests
22-
- Query the status
23-
- Download transcription results
24-
- Delete transcription information from the service
2521

26-
The detailed API is available as a [Swagger document](https://westus.cris.ai/swagger/ui/index#/Custom%20Speech%20transcriptions%3A), under the heading `Custom Speech transcriptions`.
2722

28-
Batch transcription jobs are scheduled on a best effort basis. Currently there is no estimate for when a job will change into the running state. Under normal system load, it should happen within minutes. Once in the running state, the actual transcription is processed faster than the audio real time.
23+
| Batch Transcription Operation | Method | REST API Call |
24+
|------------------------------------------------------------------------------|--------------|----------------------------------------------------|
25+
| Creates a new transcription. | POST | api/speechtotext/v2.0/transcriptions |
26+
| Retrieves a list of transcriptions for the authenticated subscription. | GET | api/speechtotext/v2.0/transcriptions |
27+
| Gets a list of supported locales for offline transcriptions. | GET | api/speechtotext/v2.0/transcriptions/locales |
28+
| Updates the mutable details of the transcription identified by its ID. | PATCH | api/speechtotext/v2.0/transcriptions/{id} |
29+
| Deletes the specified transcription task. | DELETE | api/speechtotext/v2.0/transcriptions/{id} |
30+
| Gets the transcription identified by the given ID. | GET | api/speechtotext/v2.0/transcriptions/{id} |
31+
32+
33+
34+
35+
You can review and test the detailed API, which is available as a [Swagger document](https://westus.cris.ai/swagger/ui/index#/Custom%20Speech%20transcriptions%3A), under the heading `Custom Speech transcriptions`.
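
The operations in the table map onto method and path pairs that are easy to assemble programmatically. The following Python sketch builds them without sending anything over the network. The host `westus.cris.ai` is borrowed from the Swagger link and is only an assumption about where the API is served; `OPERATIONS` and `build_request` are hypothetical names used for illustration.

```python
# Illustrative sketch only: maps the operations from the table to (method, URL) pairs.
# BASE_URL is an assumption taken from the Swagger link; adjust it for your region.
BASE_URL = "https://westus.cris.ai"
API_ROOT = "api/speechtotext/v2.0/transcriptions"

# Hypothetical operation names -> (HTTP method, path suffix), as listed in the table.
OPERATIONS = {
    "create": ("POST", ""),
    "list": ("GET", ""),
    "locales": ("GET", "/locales"),
    "update": ("PATCH", "/{id}"),
    "delete": ("DELETE", "/{id}"),
    "get": ("GET", "/{id}"),
}


def build_request(operation, transcription_id=None):
    """Return the (method, url) pair for one of the table's operations."""
    method, suffix = OPERATIONS[operation]
    if "{id}" in suffix:
        if not transcription_id:
            raise ValueError(f"operation '{operation}' needs a transcription ID")
        suffix = suffix.replace("{id}", transcription_id)
    return method, f"{BASE_URL}/{API_ROOT}{suffix}"
```

The returned pair can be handed to whichever HTTP client you prefer, together with your subscription key.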
36+
37+
Batch transcription jobs are scheduled on a best-effort basis. Currently there is no estimate for when a job changes into the running state. Under normal system load, it should happen within minutes. Once in the running state, the actual transcription is processed faster than real time.
2938

3039
Next to the easy-to-use API, you don't need to deploy custom endpoints, and you don't have any concurrency requirements to observe.
3140

@@ -36,7 +45,7 @@ Next to the easy-to-use API, you don't need to deploy custom endpoints, and you
3645
As with all features of the Speech service, you create a subscription key from the [Azure portal](https://portal.azure.com) by following our [Get started guide](get-started.md).
3746

3847
>[!NOTE]
39-
> A standard subscription (S0) for Speech service is required to use batch transcription. Free subscription keys (F0) will not work. For more information, see [pricing and limits](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/).
48+
> A standard subscription (S0) for Speech service is required to use batch transcription. Free subscription keys (F0) don't work. For more information, see [pricing and limits](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/).
4049
4150
### Custom models
4251

@@ -122,14 +131,14 @@ Use these optional properties to configure transcription:
122131
`AddDiarization`
123132
:::column-end:::
124133
:::column span="2":::
125-
Specifies that diarization analysis should be carried out on the input which is expected to be mono channel containing two voices. Accepted values are `true` enabling diarization and `false` (the default value) to disable it. It also requires `AddWordLevelTimestamps` to be set to true.
134+
Specifies that diarization analysis should be carried out on the input, which is expected to be mono channel containing two voices. Accepted values are `true` to enable diarization and `false` (the default value) to disable it. It also requires `AddWordLevelTimestamps` to be set to `true`.
126135
:::row-end:::
127136
:::row:::
128137
:::column span="1":::
129138
`TranscriptionResultsContainerUrl`
130139
:::column-end:::
131140
:::column span="2":::
132-
Optional URL with [service SAS](../../storage/common/storage-sas-overview.md) to a writeable container in Azure. The result will be stored in this container.
141+
Optional URL with [service SAS](../../storage/common/storage-sas-overview.md) to a writeable container in Azure. The result is stored in this container.
133142
:::row-end:::
134143

135144
### Storage
@@ -209,13 +218,13 @@ The result contains these forms:
209218
| `Lexical` | The actual words recognized. |
210219
| `ITN` | Inverse-text-normalized form of the recognized text. Abbreviations ("doctor smith" to "dr smith"), phone numbers, and other transformations are applied. |
211220
| `MaskedITN` | The ITN form with profanity masking applied. |
212-
| `Display` | The display form of the recognized text. This includes added punctuation and capitalization. |
221+
| `Display` | The display form of the recognized text. Added punctuation and capitalization are included. |
213222

214223
## Speaker separation (Diarization)
215224

216225
Diarization is the process of separating speakers in a piece of audio. Our Batch pipeline supports diarization and is capable of recognizing two speakers on mono channel recordings. The feature is not available on stereo recordings.
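
Because `AddDiarization` only works when `AddWordLevelTimestamps` is also set to true, it can help to check the optional properties before submitting a request. The Python sketch below is one way to do that. Both helper names are hypothetical, and representing the values as lowercase strings is an assumption rather than something the service documentation here confirms.

```python
def diarization_properties(enable_diarization):
    """Build the optional properties for a request (hypothetical helper)."""
    properties = {"AddDiarization": "false"}  # false is the documented default
    if enable_diarization:
        properties["AddDiarization"] = "true"
        # Diarization requires word-level timestamps, so enable them together.
        properties["AddWordLevelTimestamps"] = "true"
    return properties


def check_properties(properties):
    """Raise ValueError if diarization is requested without word-level timestamps."""
    def is_true(name):
        # A missing property is treated as not enabled (assumption).
        return str(properties.get(name, "false")).lower() == "true"

    if is_true("AddDiarization") and not is_true("AddWordLevelTimestamps"):
        raise ValueError("AddDiarization requires AddWordLevelTimestamps to be true")
    return properties
```

Calling `check_properties(diarization_properties(True))` passes, while a hand-built dictionary that enables only `AddDiarization` is rejected before any request is sent.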
217226

218-
All transcription output contains a `SpeakerId`. If diarization is not used, it will show `"SpeakerId": null` in the JSON output. For diarization we support two voices, so the speakers will be identified as `"1"` or `"2"`.
227+
All transcription output contains a `SpeakerId`. If diarization is not used, it shows `"SpeakerId": null` in the JSON output. For diarization we support two voices, so the speakers are identified as `"1"` or `"2"`.
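
As a rough illustration of how the `SpeakerId` values might be consumed, the following Python sketch groups recognized text by speaker. The flat list of records, each with a `SpeakerId` and a `Display` field, is an assumed simplification and not the service's actual JSON layout; the sample records are invented for the example.

```python
from collections import defaultdict


def group_by_speaker(segments):
    """Collect display text per speaker; a null SpeakerId goes under 'unknown'."""
    grouped = defaultdict(list)
    for segment in segments:
        speaker = segment.get("SpeakerId")
        # SpeakerId is null (None in Python) when diarization was not requested.
        key = "unknown" if speaker is None else str(speaker)
        grouped[key].append(segment["Display"])
    return dict(grouped)


# Invented sample records in the assumed shape, for illustration only.
segments = [
    {"SpeakerId": "1", "Display": "Hello."},
    {"SpeakerId": "2", "Display": "Hi there."},
    {"SpeakerId": "1", "Display": "How are you?"},
]
by_speaker = group_by_speaker(segments)
```

Mapping a null `SpeakerId` to a single `"unknown"` bucket keeps the same code path usable for transcriptions created without diarization.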
219228

220229
To request diarization, you simply have to add the relevant parameter in the HTTP request as shown below.
221230

@@ -285,7 +294,7 @@ A JSON output sample looks like below:
285294

286295
## Best practices
287296

288-
The transcription service can handle large number of submitted transcriptions. You can query the status of your transcriptions through a `GET` on the [transcriptions method](https://westus.cris.ai/swagger/ui/index#/Custom%20Speech%20transcriptions%3A/GetTranscriptions). Keep the information returned to a reasonable size by specifying the `take` parameter (a few hundred). [Delete transcriptions](https://westus.cris.ai/swagger/ui/index#/Custom%20Speech%20transcriptions%3A/DeleteTranscription) regularly from the service once you retrieved the results. This will guarantee quick replies from the transcription management calls.
297+
The transcription service can handle a large number of submitted transcriptions. You can query the status of your transcriptions through a `GET` on the [transcriptions method](https://westus.cris.ai/swagger/ui/index#/Custom%20Speech%20transcriptions%3A/GetTranscriptions). Keep the information returned to a reasonable size by specifying the `take` parameter (a few hundred). [Delete transcriptions](https://westus.cris.ai/swagger/ui/index#/Custom%20Speech%20transcriptions%3A/DeleteTranscription) regularly from the service once you have retrieved the results. This guarantees quick replies from the transcription management calls.
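
A minimal Python sketch of these two habits follows: it builds a list URL bounded by `take` and the delete URLs for transcriptions whose results you have already saved. The host is assumed from the Swagger link, and `list_url` and `delete_urls` are hypothetical helper names. Nothing is sent over the network here.

```python
from urllib.parse import urlencode

# Assumed endpoint: host taken from the Swagger link, path from the operations table.
TRANSCRIPTIONS_URL = "https://westus.cris.ai/api/speechtotext/v2.0/transcriptions"


def list_url(take=200):
    """Return the GET URL for listing transcriptions, limited by `take`."""
    if take < 1:
        raise ValueError("take must be a positive integer")
    return f"{TRANSCRIPTIONS_URL}?{urlencode({'take': take})}"


def delete_urls(retrieved_ids):
    """Return the DELETE URLs for transcriptions whose results were already saved."""
    return [f"{TRANSCRIPTIONS_URL}/{transcription_id}" for transcription_id in retrieved_ids]
```

Keeping `take` to a few hundred and issuing the deletes soon after download is what keeps the management calls responsive.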
289298

290299
## Sample code
291300

@@ -298,7 +307,7 @@ You have to customize the sample code with your subscription information, the se
298307

299308
[!code-csharp[Configuration variables for batch transcription](~/samples-cognitive-services-speech-sdk/samples/batch/csharp/program.cs#batchdefinition)]
300309

301-
The sample code will set up the client and submit the transcription request. It will then poll for status information and print details about the transcription progress.
310+
The sample code sets up the client and submits the transcription request. It then polls for the status information and prints details about the transcription progress.
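
The polling pattern described here can be sketched in a language-neutral way. The Python example below is not the referenced C# sample; it takes a caller-supplied `fetch_status` function so that the loop itself stays independent of any HTTP client. The status names `Succeeded` and `Failed` are assumptions about the terminal states, and `poll_until_done` is a hypothetical helper.

```python
import time

# Assumed names of the terminal states; check the API reference for the real values.
TERMINAL_STATES = {"Succeeded", "Failed"}


def poll_until_done(fetch_status, interval_seconds=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch_status() until it reports a terminal state, printing progress."""
    for attempt in range(1, max_attempts + 1):
        status = fetch_status()
        print(f"attempt {attempt}: status = {status}")
        if status in TERMINAL_STATES:
            return status
        if attempt < max_attempts:
            # Wait between polls; jobs are scheduled on a best-effort basis.
            sleep(interval_seconds)
    raise TimeoutError(f"no terminal status after {max_attempts} attempts")
```

Passing `sleep` as a parameter makes the loop easy to exercise in tests without real delays.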
302311

303312
[!code-csharp[Code to check batch transcription status](~/samples-cognitive-services-speech-sdk/samples/batch/csharp/program.cs#batchstatus)]
304313

@@ -317,4 +326,4 @@ You can find the sample in the `samples/batch` directory in the [GitHub sample r
317326

318327
## Next steps
319328

320-
* [Get your Speech trial subscription](https://azure.microsoft.com/try/cognitive-services/)
329+
- [Get your Speech trial subscription](https://azure.microsoft.com/try/cognitive-services/)
