| 1 | +--- |
| 2 | +title: Locate audio files for batch transcription - Speech service |
| 3 | +titleSuffix: Azure Cognitive Services |
| 4 | +description: Batch transcription is used to transcribe a large amount of audio in storage. You should provide multiple files per request or point to an Azure Blob Storage container with the audio files to transcribe. |
| 5 | +services: cognitive-services |
| 6 | +manager: nitinme |
| 7 | +author: eric-urban |
| 8 | +ms.author: eur |
| 9 | +ms.service: cognitive-services |
| 10 | +ms.subservice: speech-service |
| 11 | +ms.topic: how-to |
| 12 | +ms.date: 09/11/2022 |
| 13 | +ms.devlang: csharp |
| 14 | +ms.custom: devx-track-csharp |
| 15 | +--- |
| 16 | + |
| 17 | +# Locate audio files for batch transcription |
| 18 | + |
| 19 | +Batch transcription is used to transcribe a large amount of audio in storage. Batch transcription can read audio files from a public URI (such as "https://crbn.us/hello.wav") or a [shared access signature (SAS)](../../storage/common/storage-sas-overview.md) URI. |
| 20 | + |
| 21 | +You should provide multiple files per request or point to an Azure Blob Storage container with the audio files to transcribe. The batch transcription service can handle a large number of submitted transcriptions. The service transcribes the files concurrently, which reduces the turnaround time. |
| 22 | + |
| 23 | +## Supported audio formats |
| 24 | + |
| 25 | +The batch transcription API supports the following formats: |
| 26 | + |
| 27 | +| Format | Codec | Bits per sample | Sample rate | |
| 28 | +|--------|-------|---------|---------------------------------| |
| 29 | +| WAV | PCM | 16-bit | 8 kHz or 16 kHz, mono or stereo | |
| 30 | +| MP3 | PCM | 16-bit | 8 kHz or 16 kHz, mono or stereo | |
| 31 | +| OGG | OPUS | 16-bit | 8 kHz or 16 kHz, mono or stereo | |
| 32 | + |
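As a rough pre-upload check against the WAV row of the table above, the following sketch reads a file header with Python's standard `wave` module. The function name `wav_format_problems` is hypothetical and not part of the Speech service; the check covers only WAV files, not MP3 or OGG.

```python
import wave

# Values taken from the WAV row of the supported-formats table.
SUPPORTED_SAMPLE_RATES_HZ = {8000, 16000}
SUPPORTED_SAMPLE_WIDTH_BYTES = 2   # 16 bits per sample
SUPPORTED_CHANNEL_COUNTS = {1, 2}  # mono or stereo


def wav_format_problems(source):
    """Return a list of reasons why a WAV header does not match the table.

    `source` is a file path or a binary file-like object.
    An empty list means no mismatch was found.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != SUPPORTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"{wav.getsampwidth() * 8}-bit samples (expected 16-bit)")
        if wav.getframerate() not in SUPPORTED_SAMPLE_RATES_HZ:
            problems.append(f"{wav.getframerate()} Hz sample rate (expected 8 kHz or 16 kHz)")
        if wav.getnchannels() not in SUPPORTED_CHANNEL_COUNTS:
            problems.append(f"{wav.getnchannels()} channels (expected mono or stereo)")
    return problems
```

Calling `wav_format_problems("hello.wav")` on a local file returns an empty list when the header matches the table.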
| 33 | +For stereo audio streams, the left and right channels are split during the transcription. A JSON result file is created for each input audio file. To create an ordered final transcript, use the timestamps that are generated per utterance. |
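The ordering step described above can be sketched as a sort on the per-utterance timestamps. The `Utterance` record and its field names are hypothetical stand-ins chosen for illustration; they are not the schema of the JSON result file, so map the real fields onto them before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """Hypothetical, simplified view of one utterance from a result file."""
    channel: int           # 0 = left, 1 = right (assumed numbering)
    offset_seconds: float  # start time of the utterance
    text: str


def ordered_transcript(utterances):
    """Merge utterances from both channels into one time-ordered list."""
    # Sort by start time; break ties by channel so the order is deterministic.
    return sorted(utterances, key=lambda u: (u.offset_seconds, u.channel))


if __name__ == "__main__":
    left = [Utterance(0, 0.5, "Hello."), Utterance(0, 4.0, "Fine, thanks.")]
    right = [Utterance(1, 2.1, "Hi, how are you?")]
    for u in ordered_transcript(left + right):
        print(f"[{u.offset_seconds:5.1f}s] channel {u.channel}: {u.text}")
```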
| 34 | + |
| 35 | +## Azure Blob Storage example |
| 36 | + |
| 37 | +You can provide individual audio files or an entire Azure Blob Storage container. You can also read or write transcription results in a container. This example shows how to transcribe audio files in [Azure Blob Storage](../../storage/blobs/storage-blobs-overview.md). |
| 38 | + |
| 39 | +The [SAS URI](../../storage/common/storage-sas-overview.md) must have `r` (read) and `l` (list) permissions. The storage container can hold at most 5 GB of audio data and at most 10,000 blobs. The maximum size of a single blob is 2.5 GB. |
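The container limits above can be checked locally before you upload. This is a minimal sketch, assuming that 1 GB means 10^9 bytes (the text does not say whether decimal or binary gigabytes are meant) and using a hypothetical helper name, `container_limit_problems`.

```python
GB = 10**9  # assumption: decimal gigabytes

MAX_CONTAINER_BYTES = 5 * GB    # at most 5 GB of audio per container
MAX_BLOB_BYTES = int(2.5 * GB)  # at most 2.5 GB per blob
MAX_BLOB_COUNT = 10_000         # at most 10,000 blobs per container


def container_limit_problems(blob_sizes):
    """Check planned uploads against the documented container limits.

    `blob_sizes` maps a blob name to its size in bytes.
    Returns a list of human-readable problems; empty means within limits.
    """
    problems = []
    if len(blob_sizes) > MAX_BLOB_COUNT:
        problems.append(f"{len(blob_sizes)} blobs exceeds {MAX_BLOB_COUNT}")
    total = sum(blob_sizes.values())
    if total > MAX_CONTAINER_BYTES:
        problems.append(f"total size {total} bytes exceeds {MAX_CONTAINER_BYTES}")
    for name, size in blob_sizes.items():
        if size > MAX_BLOB_BYTES:
            problems.append(f"{name}: {size} bytes exceeds {MAX_BLOB_BYTES}")
    return problems
```

For local files, a mapping such as `{p.name: p.stat().st_size for p in pathlib.Path(".").glob("*.wav")}` is one way to build the input.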
| 40 | + |
| 41 | +Follow these steps to create a storage account, upload WAV files from your local directory to a new container, and generate a SAS URL that you can use for batch transcriptions. |
| 42 | + |
| 43 | +1. Set the `RESOURCE_GROUP` environment variable to the name of an existing resource group where the new storage account will be created. |
| 44 | + |
| 45 | + ```azurecli-interactive |
| 46 | + set RESOURCE_GROUP=<your existing resource group name> |
| 47 | + ``` |
| 48 | + |
| 49 | +1. Set the `AZURE_STORAGE_ACCOUNT` environment variable to the name of a storage account that you want to create. |
| 50 | + |
| 51 | + ```azurecli-interactive |
| 52 | + set AZURE_STORAGE_ACCOUNT=<choose new storage account name> |
| 53 | + ``` |
| 54 | + |
| 55 | +1. Create a new storage account with the [`az storage account create`](/cli/azure/storage/account#az-storage-account-create) command. Replace `eastus` with the region of your resource group. |
| 56 | + |
| 57 | + ```azurecli-interactive |
| 58 | + az storage account create -n %AZURE_STORAGE_ACCOUNT% -g %RESOURCE_GROUP% -l eastus |
| 59 | + ``` |
| 60 | + |
| 61 | + > [!TIP] |
| 62 | + > When you are finished with batch transcriptions and want to delete your storage account, use the [`az storage account delete`](/cli/azure/storage/account#az-storage-account-delete) command. |
| 63 | + |
| 64 | +1. Get your new storage account keys with the [`az storage account keys list`](/cli/azure/storage/account#az-storage-account-keys-list) command. |
| 65 | + |
| 66 | + ```azurecli-interactive |
| 67 | + az storage account keys list -g %RESOURCE_GROUP% -n %AZURE_STORAGE_ACCOUNT% |
| 68 | + ``` |
| 69 | + |
| 70 | +1. Set the `AZURE_STORAGE_KEY` environment variable to one of the key values retrieved in the previous step. |
| 71 | + |
| 72 | + ```azurecli-interactive |
| 73 | + set AZURE_STORAGE_KEY=<your storage account key> |
| 74 | + ``` |
| 75 | + |
| 76 | + > [!IMPORTANT] |
| 77 | + > The remaining steps use the `AZURE_STORAGE_ACCOUNT` and `AZURE_STORAGE_KEY` environment variables. If you didn't set the environment variables, you can pass the values as parameters to the commands. See the [az storage container create](/cli/azure/storage/) documentation for more information. |
| 78 | + |
| 79 | +1. Create a container with the [`az storage container create`](/cli/azure/storage/container#az-storage-container-create) command. Replace `<mycontainer>` with a name for your container. |
| 80 | + |
| 81 | + ```azurecli-interactive |
| 82 | + az storage container create -n <mycontainer> |
| 83 | + ``` |
| 84 | + |
| 85 | +1. The following [`az storage blob upload-batch`](/cli/azure/storage/blob#az-storage-blob-upload-batch) command uploads all .wav files from the current local directory. Replace `<mycontainer>` with the name of your container. Optionally, you can modify the command to upload files from a different directory. |
| 86 | + |
| 87 | + ```azurecli-interactive |
| 88 | + az storage blob upload-batch -d <mycontainer> -s . --pattern *.wav |
| 89 | + ``` |
| 90 | + |
| 91 | +1. Generate a SAS URL with read (r) and list (l) permissions for the container with the [`az storage container generate-sas`](/cli/azure/storage/container#az-storage-container-generate-sas) command. Replace `<mycontainer>` with the name of your container, and set `--expiry` to a date in the future; the `2022-09-09` value shown is only an example. |
| 92 | + |
| 93 | + ```azurecli-interactive |
| 94 | + az storage container generate-sas -n <mycontainer> --expiry 2022-09-09 --permissions rl --https-only |
| 95 | + ``` |
| 96 | + |
| 97 | +The previous command returns a SAS token. Append the SAS token to your container blob URL to create a SAS URL. For example: `https://<storage_account_name>.blob.core.windows.net/<container_name>?SAS_TOKEN`. |
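The append step above can be sketched in a few lines of Python. The function name `build_container_sas_url` is hypothetical. The sketch assumes the token may arrive wrapped in double quotes (as when copied from JSON-formatted CLI output) or with a leading `?`, and strips both.

```python
def build_container_sas_url(storage_account_name, container_name, sas_token):
    """Append a SAS token to a container URL, following the pattern above."""
    # Tolerate surrounding whitespace, JSON-style quotes, and a leading "?".
    token = sas_token.strip().strip('"').lstrip("?")
    if not token:
        raise ValueError("SAS token is empty")
    return (
        f"https://{storage_account_name}.blob.core.windows.net/"
        f"{container_name}?{token}"
    )


if __name__ == "__main__":
    # Placeholder values only; substitute your own account, container, and token.
    print(build_container_sas_url("mystorageaccount", "mycontainer", '"<SAS_TOKEN>"'))
```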
| 98 | + |
| 99 | +You will use the SAS URL when you [create a batch transcription](batch-transcription-create.md) request. For example: |
| 100 | + |
| 101 | +```json |
| 102 | +{ |
| 103 | + "contentContainerUrl": "https://<storage_account_name>.blob.core.windows.net/<container_name>?SAS_TOKEN" |
| 104 | +} |
| 105 | +``` |
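If you assemble the request in code, a small guard can catch a URL that is missing its token before the request is sent. This is a sketch only: `transcription_request_fragment` is a hypothetical helper, and it emits just the `contentContainerUrl` property shown above, not a complete request body.

```python
import json
from urllib.parse import urlsplit


def transcription_request_fragment(container_sas_url):
    """Return the JSON fragment shown above for a container SAS URL."""
    parts = urlsplit(container_sas_url)
    # The SAS token in this article is generated with --https-only.
    if parts.scheme != "https":
        raise ValueError("container SAS URL must use https")
    # A SAS URL carries the token as its query string.
    if not parts.query:
        raise ValueError("container URL has no SAS token appended")
    return json.dumps({"contentContainerUrl": container_sas_url}, indent=2)
```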
| 106 | + |
| 107 | +## Next steps |
| 108 | + |
| 109 | +- [Batch transcription overview](batch-transcription.md) |
| 110 | +- [Create a batch transcription](batch-transcription-create.md) |
| 111 | +- [Get batch transcription results](batch-transcription-get.md) |