
Commit 61b4a03

batch transcription version 2024-11-15
1 parent 8ea9235 commit 61b4a03

10 files changed: +162 -160 lines

articles/ai-services/speech-service/batch-transcription-audio-data.md

Lines changed: 8 additions & 9 deletions
@@ -7,7 +7,7 @@ author: eric-urban
 ms.author: eur
 ms.service: azure-ai-speech
 ms.topic: how-to
-ms.date: 3/10/2025
+ms.date: 5/25/2025
 ms.devlang: csharp
 ms.custom: devx-track-csharp, devx-track-azurecli
 # Customer intent: As a user who implements audio transcription, I want to learn how to locate audio files for batch transcription.
@@ -27,7 +27,7 @@ You can specify one or multiple audio files when creating a transcription. We re
 
 ## Supported audio formats and codecs
 
-The batch transcription API (and [fast transcription API](./fast-transcription-create.md)) supports multiple formats and codecs, such as:
+The [batch transcription API](./batch-transcription.md) and [fast transcription API](./fast-transcription-create.md) support multiple formats and codecs, such as:
 
 - WAV
 - MP3
@@ -41,11 +41,10 @@ The batch transcription API (and [fast transcription API](./fast-transcription-c
 - WebM
 - SPEEX
 
-
 > [!NOTE]
-> Batch transcription service integrates [GStreamer](./how-to-use-codec-compressed-audio-input-streams.md) and might accept more formats and codecs without returning errors. We suggest to use lossless formats such as WAV (PCM encoding) and FLAC to ensure best transcription quality.
+> Batch transcription service integrates [GStreamer](./how-to-use-codec-compressed-audio-input-streams.md) and might accept more formats and codecs without returning errors. We suggest using lossless formats such as WAV (PCM encoding) and FLAC to ensure best transcription quality.
 
-## Azure Blob Storage upload
+## Upload to Azure Blob Storage
 
 When audio files are located in an [Azure Blob Storage](/azure/storage/blobs/storage-blobs-overview) account, you can request transcription of individual audio files or an entire Azure Blob Storage container. You can also [write transcription results](batch-transcription-create.md#specify-a-destination-container-url) to a Blob container.

@@ -89,7 +88,7 @@ Follow these steps to create a storage account and upload wav files from your lo
 ```
 
 > [!TIP]
-> When you are finished with batch transcriptions and want to delete your storage account, use the [`az storage delete create`](/cli/azure/storage/account#az-storage-account-delete) command.
+> When you're finished with batch transcriptions and want to delete your storage account, use the [`az storage account delete`](/cli/azure/storage/account#az-storage-account-delete) command.
 
 1. Get your new storage account keys with the [`az storage account keys list`](/cli/azure/storage/account#az-storage-account-keys-list) command.
 
@@ -125,7 +124,7 @@ Follow these steps to create a storage account and upload wav files from your lo
 This section explains how to set up and limit access to your batch transcription source audio files in an Azure Storage account using the [trusted Azure services security mechanism](/azure/storage/common/storage-network-security#trusted-access-based-on-a-managed-identity).
 
 > [!NOTE]
-> With the trusted Azure services security mechanism, you need to use [Azure Blob storage](/azure/storage/blobs/storage-blobs-overview) to store audio files. Usage of [Azure Files](/azure/storage/files/storage-files-introduction) is not supported.
+> With the trusted Azure services security mechanism, you need to use [Azure Blob storage](/azure/storage/blobs/storage-blobs-overview) to store audio files. Usage of [Azure Files](/azure/storage/files/storage-files-introduction) isn't supported.
 
 If you perform all actions in this section, your Storage account is configured as follows:
 - Access to all external network traffic is prohibited.
@@ -288,9 +287,9 @@ You could otherwise specify individual files in the container. You must generate
 }
 ```
 
-## Next steps
+## Related content
 
-- [Batch transcription overview](batch-transcription.md)
+- [Learn more about batch transcription](batch-transcription.md)
 - [Create a batch transcription](batch-transcription-create.md)
 - [Get batch transcription results](batch-transcription-get.md)
 - [See batch transcription code samples at GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/batch/)
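The article this file's diff belongs to is about pointing batch transcription at audio in Blob Storage. As a hedged sketch of the request body it builds up to — individual files listed by SAS URL via `contentUrls` — something like the following could work (the SAS URL is a placeholder, and `displayName` is an arbitrary label):

```python
import json

def build_batch_request(sas_urls, locale="en-US", name="My transcription"):
    """Assemble a batch transcription request body that lists individual
    audio files by SAS URL (contentUrls), rather than pointing at an
    entire container (contentContainerUrl)."""
    return {
        "contentUrls": list(sas_urls),
        "locale": locale,
        "displayName": name,
    }

# Placeholder SAS URL; a real one carries sv/sig query parameters.
body = build_batch_request(["https://contoso.blob.core.windows.net/audio/a1.wav"])
print(json.dumps(body, indent=2))
```

The same body shape is what the "destination container URL" and SAS discussion above feeds into.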

articles/ai-services/speech-service/batch-transcription-create.md

Lines changed: 59 additions & 54 deletions
Large diffs are not rendered by default.

articles/ai-services/speech-service/batch-transcription-get.md

Lines changed: 56 additions & 55 deletions
Large diffs are not rendered by default.

articles/ai-services/speech-service/batch-transcription.md

Lines changed: 3 additions & 6 deletions
@@ -7,16 +7,13 @@ author: eric-urban
 ms.author: eur
 ms.service: azure-ai-speech
 ms.topic: overview
-ms.date: 3/10/2025
+ms.date: 5/25/2025
 ms.devlang: csharp
 ms.custom: devx-track-csharp
 ---
 
 # What is batch transcription?
 
-> [!IMPORTANT]
-> New pricing is in effect for batch transcription via [Speech to text REST API v3.2](./migrate-v3-1-to-v3-2.md). For more information, see the [pricing guide](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services).
-
 Batch transcription is used to transcribe a large amount of audio data in storage. Both the [Speech to text REST API](rest-speech-to-text.md#batch-transcription) and [Speech CLI](spx-basics.md) support batch transcription.
 
 You should provide multiple files per request or point to an Azure Blob Storage container with the audio files to transcribe. The batch transcription service can handle a large number of submitted transcriptions. The service transcribes the files concurrently, which reduces the turnaround time.
@@ -35,9 +32,9 @@ To use the batch transcription REST API:
 1. [Get batch transcription results](batch-transcription-get.md) - Check transcription status and retrieve transcription results asynchronously.
 
 > [!IMPORTANT]
-> Batch transcription jobs are scheduled on a best-effort basis. At peak hours it may take up to 30 minutes or longer for a transcription job to start processing. See how to check the current status of a batch transcription job in [this section](batch-transcription-get.md#get-transcription-status).
+> Batch transcription jobs are scheduled on a best-effort basis. At peak hours it might take up to 30 minutes or longer for a transcription job to start processing. See how to check the current status of a batch transcription job in [this section](batch-transcription-get.md#get-transcription-status).
 
-## Next steps
+## Related content
 
 - [Locate audio files for batch transcription](batch-transcription-audio-data.md)
 - [Create a batch transcription](batch-transcription-create.md)
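Since jobs are scheduled on a best-effort basis, as the IMPORTANT note in this hunk says, callers typically poll the transcription status until it reaches a terminal state. A minimal polling sketch, with the HTTP status call abstracted behind a callable so it can be stubbed (status names follow the Speech to text REST API; interval and attempt limits are arbitrary choices):

```python
import time

TERMINAL = {"Succeeded", "Failed"}

def wait_for_transcription(fetch_status, poll_seconds=30, max_polls=120):
    """Poll a batch transcription until it reaches a terminal state.
    fetch_status is any callable returning the job's current status
    string ("NotStarted", "Running", "Succeeded", or "Failed")."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("transcription did not finish in time")

# Stubbed example: a job observed twice before succeeding.
states = iter(["NotStarted", "Running", "Succeeded"])
print(wait_for_transcription(lambda: next(states), poll_seconds=0))
```

In a real client, `fetch_status` would issue the GET described in the "Get batch transcription results" article linked above.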

articles/ai-services/speech-service/fast-transcription-create.md

Lines changed: 7 additions & 7 deletions
@@ -7,7 +7,7 @@ author: eric-urban
 ms.author: eur
 ms.service: azure-ai-speech
 ms.topic: how-to
-ms.date: 5/4/2025
+ms.date: 5/25/2025
 # Customer intent: As a user who implements audio transcription, I want create transcriptions as quickly as possible.
 ---

@@ -31,10 +31,7 @@ Unlike the batch transcription API, fast transcription API only produces transcr
 > [!TIP]
 > Try out fast transcription in the [Azure AI Foundry portal](https://aka.ms/fasttranscription/studio).
 
-> [!NOTE]
-> Speech service is an elastic service. If you receive 429 error code (too many requests), please follow the [best practices to mitigate throttling during autoscaling](speech-services-quotas-and-limits.md#general-best-practices-to-mitigate-throttling-during-autoscaling).
-
-We learn how to use the fast transcription API (via [Transcriptions - Transcribe](https://go.microsoft.com/fwlink/?linkid=2296107)) with the following scenarios:
+We learn how to use the fast transcription API (via [Transcriptions - Transcribe](/rest/api/speechtotext/transcriptions/transcribe)) with the following scenarios:
 - [Known locale specified](?tabs=locale-specified): Transcribe an audio file with a specified locale. If you know the locale of the audio file, you can specify it to improve transcription accuracy and minimize the latency.
 - [Language identification on](?tabs=language-identification-on): Transcribe an audio file with language identification on. If you're not sure about the locale of the audio file, you can turn on language identification to let the Speech service identify the locale (one locale per audio).
 - [Multi-lingual transcription (preview)](?tabs=multilingual-transcription-on): Transcribe an audio file with the latest multi-lingual speech transcription model. If your audio contains multi-lingual contents that you want to transcribe continuously and accurately, you can use the latest multi-lingual speech transcription model without specifying the locale codes.
@@ -1722,9 +1719,12 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 ```
 ---
 
+> [!NOTE]
+> Speech service is an elastic service. If you receive 429 error code (too many requests), please follow the [best practices to mitigate throttling during autoscaling](speech-services-quotas-and-limits.md#general-best-practices-to-mitigate-throttling-during-autoscaling).
+
 ## Request configuration options
 
-Here are some property options to configure a transcription when you call the [Transcriptions - Transcribe](https://go.microsoft.com/fwlink/?linkid=2296107) operation.
+Here are some property options to configure a transcription when you call the [Transcriptions - Transcribe](/rest/api/speechtotext/transcriptions/transcribe) operation.
 
 | Property | Description | Required or optional |
 |----------|-------------|----------------------|
@@ -1735,6 +1735,6 @@ Here are some property options to configure a transcription when you call the [T
 
 ## Related content
 
-- [Fast transcription REST API reference](https://go.microsoft.com/fwlink/?linkid=2296107)
+- [Fast transcription REST API reference](/rest/api/speechtotext/transcriptions/transcribe)
 - [Speech to text supported languages](./language-support.md?tabs=stt)
 - [Batch transcription](./batch-transcription.md)
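The 429 note that this commit moves down the page pairs naturally with a retry strategy. One hedged sketch of exponential backoff, with the actual Transcribe call stubbed out as a callable (attempt counts and delays are illustrative, not service guidance):

```python
import time

def with_backoff(send, max_attempts=5, base_delay=1.0):
    """Retry a request-sending callable when it reports HTTP 429,
    doubling the wait between attempts (exponential backoff)."""
    for attempt in range(max_attempts):
        status, payload = send()
        if status != 429:
            return status, payload
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("still throttled after retries")

# Stubbed example: throttled twice, then accepted.
responses = iter([(429, None), (429, None), (200, {"text": "hello"})])
print(with_backoff(lambda: next(responses), base_delay=0))
```

In practice `send` would POST the multipart request to the Transcriptions - Transcribe operation; honoring any `Retry-After` header the service returns would be a reasonable refinement.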

articles/ai-services/speech-service/how-to-custom-speech-model-and-endpoint-lifecycle.md

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@ manager: nitinme
 ms.author: eur
 ms.service: azure-ai-speech
 ms.topic: how-to
-ms.date: 5/19/2025
+ms.date: 5/25/2025
 ms.reviewer: heikora
 zone_pivot_groups: foundry-speech-studio-cli-rest
 #Customer intent: As a developer, I want to understand the lifecycle of custom speech models and endpoints so that I can plan for the expiration of my models.
@@ -43,7 +43,7 @@ When a custom model or base model expires, it's no longer available for transcri
 |Transcription route |Expired model result |Recommendation |
 |---------|---------|---------|
 |Custom endpoint|Speech recognition requests fall back to the most recent base model for the same [locale](language-support.md?tabs=stt). You get results, but recognition might not accurately transcribe your domain data. |Update the endpoint's model as described in the [Deploy a custom speech model](how-to-custom-speech-deploy-model.md) guide. |
-|Batch transcription |[Batch transcription](batch-transcription.md) requests for expired models fail with a 4xx error. |In each [Transcriptions_Create](/rest/api/speechtotext/transcriptions/create) REST API request body, set the `model` property to a base model or custom model that isn't expired. Otherwise don't include the `model` property to always use the latest base model. |
+|Batch transcription |[Batch transcription](batch-transcription.md) requests for expired models fail with a 4xx error. |In each [Transcriptions - Submit](/rest/api/speechtotext/transcriptions/submit) REST API request body, set the `model` property to a base model or custom model that isn't expired. Otherwise don't include the `model` property to always use the latest base model. |
 
 ## Get base model expiration dates
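The table's recommendation — pin the `model` property to a non-expired model, or omit the property to always use the latest base model — might be encoded like this (a hedged sketch: the model URL is a placeholder, and referencing the model by its self-URL is an assumption about the request shape):

```python
def transcription_body(content_urls, locale, model_url=None):
    """Build a Transcriptions - Submit request body. Passing a model
    self-URL pins the job to that (non-expired) model; omitting it
    lets the service pick the latest base model for the locale."""
    body = {
        "contentUrls": list(content_urls),
        "locale": locale,
        "displayName": "expiration-safe job",
    }
    if model_url is not None:
        # The model is referenced by URL; placeholder value here.
        body["model"] = {"self": model_url}
    return body

pinned = transcription_body(["https://example/audio.wav"], "en-US",
                            model_url="https://example/models/base/123")
latest = transcription_body(["https://example/audio.wav"], "en-US")
print("model" in pinned, "model" in latest)  # True False
```

Omitting `model` is the lower-maintenance option when you don't depend on a custom model.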

articles/ai-services/speech-service/how-to-get-speech-session-id.md

Lines changed: 5 additions & 5 deletions
@@ -7,7 +7,7 @@ ms.author: eur
 manager: nitinme
 ms.service: azure-ai-speech
 ms.topic: how-to
-ms.date: 3/10/2025
+ms.date: 5/25/2025
 ms.reviewer: alexeyo
 #Customer intent: As a developer, I need to know how to get the session ID and transcription ID for speech to text so that I can debug issues with my application.
 ---
@@ -91,11 +91,11 @@ https://eastus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiv
 
 ## Getting Transcription ID for Batch transcription
 
-[Batch transcription API](batch-transcription.md) is a subset of the [Speech to text REST API](rest-speech-to-text.md).
+[Batch transcription API](batch-transcription.md) is part of the [Speech to text REST API](rest-speech-to-text.md).
 
-The required Transcription ID is the GUID value contained in the main `self` element of the Response body returned by requests, like [Transcriptions_Create](/rest/api/speechtotext/transcriptions/create).
+The required Transcription ID is the GUID value contained in the main `self` element of the response body returned by requests, like [Transcriptions - Submit](/rest/api/speechtotext/transcriptions/submit).
 
-The following is and example response body of a [Transcriptions_Create](/rest/api/speechtotext/transcriptions/create) request. GUID value `537216f8-0620-4a10-ae2d-00bdb423b36f` found in the first `self` element is the Transcription ID.
+The following is an example response body of a [Transcriptions - Submit](/rest/api/speechtotext/transcriptions/submit) request. The GUID value `537216f8-0620-4a10-ae2d-00bdb423b36f` found in the first `self` element is the Transcription ID.
 
 ```json
 {
@@ -127,4 +127,4 @@ The following is and example response body of a [Transcriptions_Create](/rest/ap
 > Use the same technique to determine different IDs required for debugging issues related to [custom speech](custom-speech-overview.md), like uploading a dataset using [Datasets_Create](/rest/api/speechtotext/datasets/create) request.
 
 > [!NOTE]
-> You can also see all existing transcriptions and their Transcription IDs for a given Speech resource by using [Transcriptions_Get](/rest/api/speechtotext/transcriptions/get) request.
+> You can also see all existing transcriptions and their Transcription IDs for a given Speech resource by using a [Transcriptions - Get](/rest/api/speechtotext/transcriptions/get) request.
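The ID-extraction step this article describes can be sketched in a few lines — the Transcription ID is simply the last path segment of the top-level `self` URL (the host and version segment in this sample are illustrative; the GUID is the one from the article's own example):

```python
import json

# Trimmed sample response; a real body carries many more fields.
response_body = json.loads("""
{
  "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/transcriptions/537216f8-0620-4a10-ae2d-00bdb423b36f"
}
""")

def transcription_id(body):
    """The transcription ID is the GUID at the end of the top-level
    `self` URL in the response body."""
    return body["self"].rstrip("/").rsplit("/", 1)[-1]

print(transcription_id(response_body))  # 537216f8-0620-4a10-ae2d-00bdb423b36f
```

The same slicing works for the dataset and other resource IDs mentioned in the TIP above, since they follow the same `self`-URL pattern.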

articles/ai-services/speech-service/language-identification.md

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@ manager: nitinme
 ms.service: azure-ai-speech
 ms.custom: devx-track-extended-java, devx-track-js, devx-track-python
 ms.topic: how-to
-ms.date: 3/10/2025
+ms.date: 5/25/2025
 ms.author: eur
 zone_pivot_groups: programming-languages-speech-services-nomore-variant
 #customer intent: As an application developer, I want to use language recognition or translations in order to make my apps work seamlessly for more customers.
@@ -1075,7 +1075,7 @@ For more information about containers, see the [language identification speech c
 
 ## Implement speech to text batch transcription
 
-To identify languages with [Batch transcription REST API](batch-transcription.md), use `languageIdentification` property in the body of your [Transcriptions_Create](/rest/api/speechtotext/transcriptions/create) request.
+To identify languages with the [Batch transcription REST API](batch-transcription.md), use the `languageIdentification` property in the body of your [Transcriptions - Submit](/rest/api/speechtotext/transcriptions/submit) request.
 
 > [!WARNING]
 > Batch transcription only supports language identification for default base models. If both language identification and a custom model are specified in the transcription request, the service falls back to use the base models for the specified candidate languages. This might result in unexpected recognition results.
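As an illustration of the `languageIdentification` property this hunk refers to, a Transcriptions - Submit request body with candidate locales could look like this (a hedged sketch: the blob URL is a placeholder, and note that, per the warning, no custom model is set):

```python
import json

# languageIdentification lives under `properties`; candidateLocales lists
# the locales the service may pick between (one locale per audio file).
body = {
    "contentUrls": ["https://contoso.blob.core.windows.net/audio/call.wav"],
    "locale": "en-US",
    "displayName": "batch job with language identification",
    "properties": {
        "languageIdentification": {
            "candidateLocales": ["en-US", "de-DE", "es-ES"],
        },
    },
}
print(json.dumps(body["properties"], indent=2))
```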

articles/ai-services/speech-service/rest-speech-to-text-short.md

Lines changed: 6 additions & 6 deletions
@@ -6,7 +6,7 @@ author: eric-urban
 manager: nitinme
 ms.service: azure-ai-speech
 ms.topic: how-to
-ms.date: 3/10/2025
+ms.date: 5/25/2025
 ms.author: eur
 ms.devlang: csharp
 ms.custom: devx-track-csharp
@@ -15,13 +15,13 @@ ms.custom: devx-track-csharp
 
 # Speech to text REST API for short audio
 
-Use cases for the Speech to text REST API for short audio are limited. Use it only in cases where you can't use the [Speech SDK](speech-sdk.md).
+Use cases for the Speech to text REST API for short audio are limited. Use it only in cases where you can't use the [Speech SDK](speech-sdk.md) or [fast transcription API](fast-transcription-create.md).
 
 Before you use the Speech to text REST API for short audio, consider the following limitations:
 
 * Requests that use the REST API for short audio and transmit audio directly can contain no more than 60 seconds of audio. For pronunciation assessment, the audio duration should be no more than 30 seconds. The input [audio formats](#audio-formats) are more limited compared to the [Speech SDK](speech-sdk.md).
 * The REST API for short audio returns only final results. It doesn't provide partial results.
-* [Speech translation](speech-translation.md) isn't supported via REST API for short audio. You need to use [Speech SDK](speech-sdk.md).
+* [Speech translation](speech-translation.md) isn't supported via REST API for short audio. You need to use the [Speech SDK](speech-sdk.md).
 * [Batch transcription](batch-transcription.md) and [custom speech](custom-speech-overview.md) aren't supported via REST API for short audio. You should always use the [Speech to text REST API](rest-speech-to-text.md) for batch transcription and custom speech.
 
 Before you use the Speech to text REST API for short audio, understand that you need to complete a token exchange as part of authentication to access the service. For more information, see [Authentication](#authentication).
@@ -49,7 +49,7 @@ Audio is sent in the body of the HTTP `POST` request. It must be in one of the f
 | OGG | OPUS | 256 kbps | 16 kHz, mono |
 
 > [!NOTE]
-> The preceding formats are supported through the REST API for short audio and WebSocket in the Speech service. The [Speech SDK](speech-sdk.md) supports the WAV format with PCM codec as well as [other formats](how-to-use-codec-compressed-audio-input-streams.md).
+> The preceding formats are supported through the REST API for short audio and WebSockets in the Speech service. The [Speech SDK](speech-sdk.md) supports the WAV format with PCM codec as well as [other formats](how-to-use-codec-compressed-audio-input-streams.md).
 
 ## Request headers

@@ -77,7 +77,6 @@ These parameters might be included in the query string of the REST request.
 | `language` | Identifies the spoken language that's being recognized. See [Supported languages](language-support.md?tabs=stt). | Required |
 | `format` | Specifies the result format. Accepted values are `simple` and `detailed`. Simple results include `RecognitionStatus`, `DisplayText`, `Offset`, and `Duration`. Detailed responses include four different representations of display text. The default setting is `simple`. | Optional |
 | `profanity` | Specifies how to handle profanity in recognition results. Accepted values are: <br><br>`masked`, which replaces profanity with asterisks. <br>`removed`, which removes all profanity from the result. <br>`raw`, which includes profanity in the result. <br><br>The default setting is `masked`. | Optional |
-| `cid` | When you're using the [Speech Studio](speech-studio-overview.md) to create [custom models](./custom-speech-overview.md), you can take advantage of the **Endpoint ID** value from the **Deployment** page. Use the **Endpoint ID** value as the argument to the `cid` query string parameter. | Optional |
 
 ### Pronunciation assessment parameters
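Combining the query parameters in this table with the short-audio endpoint shown earlier on the page, a URL builder might look like the following (the region and the default values chosen here are illustrative; `language`, `format`, and `profanity` are the parameters from the table):

```python
from urllib.parse import urlencode

def short_audio_url(region, language, fmt="detailed", profanity="masked"):
    """Compose the short-audio recognition endpoint URL with the query
    parameters from the table: language (required), format and
    profanity (optional)."""
    base = (f"https://{region}.stt.speech.microsoft.com"
            "/speech/recognition/conversation/cognitiveservices/v1")
    query = urlencode({"language": language, "format": fmt,
                       "profanity": profanity})
    return f"{base}?{query}"

print(short_audio_url("eastus", "en-US"))
```

The audio payload and authentication headers would then be supplied on the `POST` request itself, as described in the surrounding sections.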

@@ -360,7 +359,8 @@ using (var fs = new FileStream(audioFile, FileMode.Open, FileAccess.Read))
 
 [!INCLUDE [](includes/cognitive-services-speech-service-rest-auth.md)]
 
-## Next steps
+## Related content
 
+- [Fast transcription API](fast-transcription-create.md)
 - [Customize speech models](./how-to-custom-speech-train-model.md)
 - [Get familiar with batch transcription](batch-transcription.md)

0 commit comments
