Commit e0e3e2c

Merge pull request #214711 from eric-urban/eur/batch-secure-storage
batch secure storage
2 parents 87b80cf + 4d4ca6b commit e0e3e2c

6 files changed

+227
-39
lines changed

articles/cognitive-services/Speech-Service/batch-transcription-audio-data.md

Lines changed: 187 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -9,16 +9,22 @@ ms.author: eur
99
ms.service: cognitive-services
1010
ms.subservice: speech-service
1111
ms.topic: how-to
12-
ms.date: 09/11/2022
12+
ms.date: 10/21/2022
1313
ms.devlang: csharp
1414
ms.custom: devx-track-csharp
1515
---
1616

1717
# Locate audio files for batch transcription
1818

19-
Batch transcription is used to transcribe a large amount of audio in storage. Batch transcription can read audio files from a public URI (such as "https://crbn.us/hello.wav") or a [shared access signature (SAS)](../../storage/common/storage-sas-overview.md) URI.
19+
Batch transcription is used to transcribe a large amount of audio in storage. Batch transcription can access audio files from inside or outside of Azure.
2020

21-
You should provide multiple files per request or point to an Azure Blob Storage container with the audio files to transcribe. The batch transcription service can handle a large number of submitted transcriptions. The service transcribes the files concurrently, which reduces the turnaround time.
21+
When source audio files are stored outside of Azure, they can be accessed via a public URI (such as "https://crbn.us/hello.wav"). Files should be directly accessible; URIs that require authentication or that invoke interactive scripts before the file can be accessed aren't supported.
22+
23+
Audio files that are stored in Azure Blob storage can be accessed via one of two methods:
24+
- [Trusted Azure services security mechanism](#trusted-azure-services-security-mechanism)
25+
- [Shared access signature (SAS)](#sas-url-for-batch-transcription) URI.
26+
27+
You can specify one or multiple audio files when creating a transcription. We recommend that you provide multiple files per request or point to an Azure Blob storage container with the audio files to transcribe. The batch transcription service can handle a large number of submitted transcriptions. The service transcribes the files concurrently, which reduces the turnaround time.
2228

2329
## Supported audio formats
2430

@@ -32,15 +38,32 @@ The batch transcription API supports the following formats:
3238

3339
For stereo audio streams, the left and right channels are split during the transcription. A JSON result file is created for each input audio file. To create an ordered final transcript, use the timestamps that are generated per utterance.
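
As a rough illustration of ordering by timestamp, the following Python sketch merges utterances from two channels into one transcript. The field names `channel`, `offset_seconds`, and `text` are hypothetical stand-ins chosen for this example; they are not the service's actual result schema, so adapt them to the fields in your JSON result files.

```python
from typing import Dict, List


def ordered_transcript(utterances: List[Dict]) -> List[str]:
    """Merge per-channel utterances into one transcript ordered by start time."""
    # Sort by start offset; ties fall back to the channel number for stable output.
    ordered = sorted(utterances, key=lambda u: (u["offset_seconds"], u["channel"]))
    return [f'[ch{u["channel"]} @ {u["offset_seconds"]:.1f}s] {u["text"]}' for u in ordered]


# Hypothetical utterances from the left (0) and right (1) channels, listed per channel.
utterances = [
    {"channel": 0, "offset_seconds": 0.5, "text": "Hello, thanks for calling."},
    {"channel": 0, "offset_seconds": 4.0, "text": "Sure, go ahead."},
    {"channel": 1, "offset_seconds": 2.1, "text": "Hi, I have a question."},
]

for line in ordered_transcript(utterances):
    print(line)
```

Sorting on the start offset interleaves the two speakers in the order they spoke, regardless of how the utterances were grouped in the input.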
3440

35-
## Azure Blob Storage example
41+
## Azure Blob Storage upload
42+
43+
When audio files are located in an [Azure Blob Storage](../../storage/blobs/storage-blobs-overview.md) account, you can request transcription of individual audio files or an entire Azure Blob Storage container. You can also [write transcription results](batch-transcription-create.md#destination-container-url) to a Blob container.
44+
45+
> [!NOTE]
46+
> For blob and container limits, see [batch transcription quotas and limits](speech-services-quotas-and-limits.md#batch-transcription).
47+
48+
# [Azure portal](#tab/portal)
3649

37-
Batch transcription can read audio files from a public URI (such as "https://crbn.us/hello.wav") or a [shared access signature (SAS)](../../storage/common/storage-sas-overview.md) URI. You can provide individual audio files, or an entire Azure Blob Storage container. You can also read or write transcription results in a container. This example shows how to transcribe audio files in [Azure Blob Storage](../../storage/blobs/storage-blobs-overview.md).
50+
Follow these steps to create a storage account and upload WAV files from your local directory to a new container.
3851

39-
The [SAS URI](../../storage/common/storage-sas-overview.md) must have `r` (read) and `l` (list) permissions. The storage container must have at most 5GB of audio data and a maximum number of 10,000 blobs. The maximum size for a blob is 2.5GB.
52+
1. Go to the [Azure portal](https://portal.azure.com/) and sign in to your Azure account.
53+
1. <a href="https://portal.azure.com/#create/Microsoft.StorageAccount-ARM" title="Create a Storage account resource" target="_blank">Create a Storage account resource</a> in the Azure portal. Use the same subscription and resource group as your Speech resource.
54+
1. Select the Storage account.
55+
1. In the **Data storage** group in the left pane, select **Containers**.
56+
1. Select **+ Container**.
57+
1. Enter a name for the new container and select **Create**.
58+
1. Select the new container.
59+
1. Select **Upload**.
60+
1. Choose the files to upload and select **Upload**.
4061

41-
Follow these steps to create a storage account, upload wav files from your local directory to a new container, and generate a SAS URL that you can use for batch transcriptions.
62+
# [Azure CLI](#tab/azure-cli)
4263

43-
1. Set the `RESOURCE_GROUP` environment variable to the name of an existing resource group where the new storage account will be created.
64+
Follow these steps to create a storage account and upload WAV files from your local directory to a new container.
65+
66+
1. Set the `RESOURCE_GROUP` environment variable to the name of an existing resource group where the new storage account will be created. Use the same subscription and resource group as your Speech resource.
4467

4568
```azurecli-interactive
4669
set RESOURCE_GROUP=<your existing resource group name>
@@ -88,14 +111,157 @@ Follow these steps to create a storage account, upload wav files from your local
88111
az storage blob upload-batch -d <mycontainer> -s . --pattern *.wav
89112
```
90113
91-
1. Generate a SAS URL with read (r) and list (l) permissions for the container with the [`az storage container generate-sas`](/cli/azure/storage/container#az-storage-container-generate-sas) command. Replace `<mycontainer>` with the name of your container.
114+
---
115+
116+
## Trusted Azure services security mechanism
117+
118+
This section explains how to set up and limit access to your batch transcription source audio files in an Azure Storage account using the [trusted Azure services security mechanism](../../storage/common/storage-network-security.md#trusted-access-based-on-a-managed-identity).
119+
120+
> [!NOTE]
121+
> With the trusted Azure services security mechanism, you need to use [Azure Blob storage](../../storage/blobs/storage-blobs-overview.md) to store audio files. Usage of [Azure Files](../../storage/files/storage-files-introduction.md) is not supported.
122+
123+
If you perform all actions in this section, your Storage account will be in the following configuration:
124+
- All external network traffic is prohibited.
125+
- Access to the Storage account using the Storage account key is prohibited.
126+
- Access to Storage account blob storage using [shared access signatures (SAS)](../../storage/common/storage-sas-overview.md) is prohibited.
127+
- Access by the selected Speech resource is allowed using the resource's [system assigned managed identity](../../active-directory/managed-identities-azure-resources/overview.md).
128+
129+
In effect, your Storage account becomes completely "locked": it can only be used to transcribe audio files that were already present when the new configuration was applied. Treat this configuration as a baseline for securing your audio data, and customize it according to your needs.
130+
131+
For example, you may allow traffic from selected public IP addresses and Azure Virtual networks. You may also set up access to your Storage account using [private endpoints](../../storage/common/storage-private-endpoints.md) (see also [this tutorial](../../private-link/tutorial-private-endpoint-storage-portal.md)), re-enable access using the Storage account key, allow access to other Azure trusted services, and so on.
132+
133+
> [!NOTE]
134+
> Using [private endpoints for Speech](speech-services-private-link.md) isn't required to secure the storage account. You can use a private endpoint for batch transcription API requests, while separately accessing the source audio files from a secure storage account, or the other way around.
135+
136+
By following the steps below, you'll severely restrict access to the storage account. Then you'll assign the minimum required permissions for Speech resource managed identity to access the Storage account.
137+
138+
### Enable system assigned managed identity for the Speech resource
139+
140+
Follow these steps to enable system assigned managed identity for the Speech resource that you will use for batch transcription.
141+
142+
1. Go to the [Azure portal](https://portal.azure.com/) and sign in to your Azure account.
143+
1. Select the Speech resource.
144+
1. In the **Resource Management** group in the left pane, select **Identity**.
145+
1. On the **System assigned** tab, select **On** for the status.
146+
147+
> [!IMPORTANT]
148+
> User assigned managed identity won't meet requirements for the batch transcription storage account scenario. Be sure to enable system assigned managed identity.
149+
150+
1. Select **Save**.
151+
152+
Now the managed identity for your Speech resource can be granted access to your storage account.
153+
154+
### Restrict access to the storage account
155+
156+
Follow these steps to restrict access to the storage account.
157+
158+
> [!IMPORTANT]
159+
> Upload audio files in a Blob container before locking down the storage account access.
160+
161+
1. Go to the [Azure portal](https://portal.azure.com/) and sign in to your Azure account.
162+
1. Select the Storage account.
163+
1. In the **Settings** group in the left pane, select **Configuration**.
164+
1. Select **Disabled** for **Allow Blob public access**.
165+
1. Select **Disabled** for **Allow storage account key access**.
166+
1. Select **Save**.
167+
168+
For more information, see [Prevent anonymous public read access to containers and blobs](/azure/storage/blobs/anonymous-read-access-prevent) and [Prevent Shared Key authorization for an Azure Storage account](/azure/storage/common/shared-key-authorization-prevent).
169+
170+
### Configure Azure Storage firewall
171+
172+
Having restricted access to the Storage account, you need to grant access to specific managed identities. Follow these steps to add access for the Speech resource.
173+
174+
1. Go to the [Azure portal](https://portal.azure.com/) and sign in to your Azure account.
175+
1. Select the Storage account.
176+
1. In the **Security + networking** group in the left pane, select **Networking**.
177+
1. In the **Firewalls and virtual networks** tab, select **Enabled from selected virtual networks and IP addresses**.
178+
1. Deselect all check boxes.
179+
1. Make sure **Microsoft network routing** is selected.
180+
1. Under the **Resource instances** section, select **Microsoft.CognitiveServices/accounts** as the resource type and select your Speech resource as the instance name.
181+
1. Select **Save**.
182+
183+
> [!NOTE]
184+
> It may take up to 5 minutes for the network changes to propagate.
185+
186+
Although network access is now permitted, the Speech resource can't yet access the data in the Storage account. You need to assign a specific access role to the Speech resource managed identity.
187+
188+
### Assign resource access role
189+
190+
Follow these steps to assign the **Storage Blob Data Reader** role to the managed identity of your Speech resource.
191+
192+
> [!IMPORTANT]
193+
> You need to be assigned the *Owner* role of the Storage account or higher scope (like Subscription) to perform the operation in the next steps. This is because assigning roles to others requires a role that includes that permission, such as *Owner*. See details [here](../../role-based-access-control/built-in-roles.md).
194+
195+
1. Go to the [Azure portal](https://portal.azure.com/) and sign in to your Azure account.
196+
1. Select the Storage account.
197+
1. Select the **Access Control (IAM)** menu in the left pane.
198+
1. Select **Add role assignment** in the **Grant access to this resource** tile.
199+
1. Select **Storage Blob Data Reader** under **Role** and then select **Next**.
200+
1. Select **Managed identity** under **Members** > **Assign access to**.
201+
1. Assign the managed identity of your Speech resource and then select **Review + assign**.
202+
203+
:::image type="content" source="media/storage/storage-identity-access-management-role.png" alt-text="Screenshot of the managed role assignment review.":::
204+
205+
1. After confirming the settings, select **Review + assign**.
206+
207+
Now the Speech resource managed identity has access to the Storage account and can access the audio files for batch transcription.
208+
209+
With system assigned managed identity, you'll use a plain Storage Account URL (no SAS or other additions) when you [create a batch transcription](batch-transcription-create.md) request. For example:
210+
211+
```json
212+
{
213+
"contentContainerUrl": "https://<storage_account_name>.blob.core.windows.net/<container_name>"
214+
}
215+
```
216+
217+
Alternatively, you can specify individual files in the container. For example:
218+
219+
```json
220+
{
221+
"contentUrls": [
222+
"https://<storage_account_name>.blob.core.windows.net/<container_name>/<file_name_1>",
223+
"https://<storage_account_name>.blob.core.windows.net/<container_name>/<file_name_2>"
224+
]
225+
}
226+
```
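
If you assemble these request fragments in code, a minimal Python sketch might look like the following. The helper names `plain_blob_url` and `content_fragment` and the account, container, and file names are made up for this example; only the `contentContainerUrl` and `contentUrls` property names and the URL shape come from the examples above.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import quote


def plain_blob_url(account: str, container: str, blob: Optional[str] = None) -> str:
    """Return a container or blob URL with no SAS token or other query string."""
    url = f"https://{account}.blob.core.windows.net/{container}"
    # Percent-encode the blob name so spaces and similar characters stay valid.
    return f"{url}/{quote(blob)}" if blob else url


def content_fragment(account: str, container: str, blobs: Optional[List[str]] = None) -> Dict:
    """Build the audio-location part of a transcription request body."""
    if not blobs:
        # No blob names given: point at the whole container.
        return {"contentContainerUrl": plain_blob_url(account, container)}
    return {"contentUrls": [plain_blob_url(account, container, b) for b in blobs]}


# Placeholder names; replace them with your own storage account and container.
print(json.dumps(content_fragment("mystorageaccount", "mycontainer"), indent=2))
print(json.dumps(content_fragment("mystorageaccount", "mycontainer", ["a.wav", "b.wav"]), indent=2))
```

The sketch only builds the audio-location properties; a complete request needs the other properties described in the create a batch transcription article.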
227+
228+
## SAS URL for batch transcription
229+
230+
A shared access signature (SAS) is a URI that grants restricted access to an Azure Storage container. Use it when you want to grant access to your batch transcription files for a specific time range without sharing your storage account key.
231+
232+
> [!TIP]
233+
> If the container with batch transcription source files should only be accessed by your Speech resource, use the [trusted Azure services security mechanism](#trusted-azure-services-security-mechanism) instead.
234+
235+
# [Azure portal](#tab/portal)
236+
237+
Follow these steps to generate a SAS URL that you can use for batch transcriptions.
238+
239+
1. Complete the steps in [Azure Blob Storage upload](#azure-blob-storage-upload) to create a Storage account and upload audio files to a new container.
240+
1. Select the new container.
241+
1. In the **Settings** group in the left pane, select **Shared access tokens**.
242+
1. Select **+ Container**.
243+
1. Select **Read** and **List** for **Permissions**.
244+
245+
:::image type="content" source="media/storage/storage-container-shared-access-signature.png" alt-text="Screenshot of the container SAS URI permissions.":::
246+
247+
1. Enter the start and expiry times for the SAS URI, or leave the defaults.
248+
1. Select **Generate SAS token and URL**.
249+
250+
# [Azure CLI](#tab/azure-cli)
251+
252+
Follow these steps to generate a SAS URL that you can use for batch transcriptions.
253+
254+
1. Complete the steps in [Azure Blob Storage upload](#azure-blob-storage-upload) to create a Storage account and upload audio files to a new container.
255+
1. Generate a SAS URL with read (r) and list (l) permissions for the container with the [`az storage container generate-sas`](/cli/azure/storage/container#az-storage-container-generate-sas) command. Choose a new expiry date and replace `<mycontainer>` with the name of your container.
92256

93257
```azurecli-interactive
94-
az storage container generate-sas -n <mycontainer> --expiry 2022-09-09 --permissions rl --https-only
258+
az storage container generate-sas -n <mycontainer> --expiry 2022-10-10 --permissions rl --https-only
95259
```
96260
97261
The previous command returns a SAS token. Append the SAS token to your container blob URL to create a SAS URL. For example: `https://<storage_account_name>.blob.core.windows.net/<container_name>?SAS_TOKEN`.
98262
263+
---
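
Whichever tab you follow, you end up combining a SAS token with the container URL. As a hedged sketch, the Python below does that and checks that the token's `sp` (signed permissions) field includes read and list. The helper names and the example token are made up for illustration, and stripping surrounding quotes assumes the token was copied from JSON-formatted CLI output.

```python
from urllib.parse import parse_qs


def clean_token(sas_token: str) -> str:
    """Strip whitespace, surrounding quotes, and a leading '?' from a SAS token."""
    return sas_token.strip().strip('"').lstrip("?")


def container_sas_url(account: str, container: str, sas_token: str) -> str:
    """Append the SAS token to the container URL to form a SAS URL."""
    return f"https://{account}.blob.core.windows.net/{container}?{clean_token(sas_token)}"


def has_read_and_list(sas_token: str) -> bool:
    """Check that the token's 'sp' field grants both read (r) and list (l)."""
    params = parse_qs(clean_token(sas_token))
    permissions = params.get("sp", [""])[0]
    return "r" in permissions and "l" in permissions


# Made-up token for illustration only; a real token comes from the portal or the CLI.
example_token = "se=2022-10-10&sp=rl&spr=https&sr=c&sig=PLACEHOLDER"
print(container_sas_url("mystorageaccount", "mycontainer", example_token))
print(has_read_and_list(example_token))
```

Checking the permissions locally before submitting a request can catch a token that was generated without list access, which a container-level transcription needs.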
264+
99265
You will use the SAS URL when you [create a batch transcription](batch-transcription-create.md) request. For example:
100266
101267
```json
@@ -104,6 +270,17 @@ You will use the SAS URL when you [create a batch transcription](batch-transcrip
104270
}
105271
```
106272

273+
Alternatively, you can specify individual files in the container. You must generate and use a different SAS URL with read (r) permissions for each file. For example:
274+
275+
```json
276+
{
277+
"contentUrls": [
278+
"https://<storage_account_name>.blob.core.windows.net/<container_name>/<file_name_1>?SAS_TOKEN_1",
279+
"https://<storage_account_name>.blob.core.windows.net/<container_name>/<file_name_2>?SAS_TOKEN_2"
280+
]
281+
}
282+
```
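
Because each file needs its own SAS token, it can help to build the `contentUrls` list from a mapping of file names to tokens. The following Python sketch shows one way to do that; the function name `sas_content_urls` and the file names and tokens are placeholders invented for this example.

```python
import json
from typing import Dict
from urllib.parse import quote


def sas_content_urls(account: str, container: str, tokens_by_blob: Dict[str, str]) -> Dict:
    """Pair each blob with its own SAS token and return a contentUrls fragment."""
    base = f"https://{account}.blob.core.windows.net/{container}"
    urls = []
    for blob, token in tokens_by_blob.items():
        # Drop a leading '?' so the URL never ends up with '??'.
        urls.append(f"{base}/{quote(blob)}?{token.lstrip('?')}")
    return {"contentUrls": urls}


# Placeholder tokens; generate a real read-only SAS token for each file.
tokens = {
    "file1.wav": "SAS_TOKEN_1",
    "file2.wav": "?SAS_TOKEN_2",
}
print(json.dumps(sas_content_urls("mystorageaccount", "mycontainer", tokens), indent=2))
```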
283+
107284
## Next steps
108285

109286
- [Batch transcription overview](batch-transcription.md)
