|
| 1 | +--- |
| 2 | +title: 'How to configure Azure Blob Storage with Azure OpenAI Batch' |
| 3 | +titleSuffix: Azure OpenAI |
| 4 | +description: Learn how to configure Azure Blob Storage with Azure OpenAI Batch |
| 5 | +manager: nitinme |
| 6 | +ms.service: azure-ai-openai |
| 7 | +ms.custom: references_regions |
| 8 | +ms.topic: how-to |
| 9 | +ms.date: 05/18/2025 |
| 10 | +author: mrbullwinkle |
| 11 | +ms.author: mbullwin |
| 12 | +recommendations: false |
| 13 | +--- |
| 14 | + |
| 15 | +# Configuring Azure Blob Storage for Azure OpenAI |
| 16 | + |
| 17 | +Azure OpenAI now supports using [Azure Blob Storage](/azure/storage/blobs/storage-blobs-introduction) for Azure OpenAI Batch input and output files. By using your own storage, you aren't subject to the batch restrictions on the number of files. |
| 18 | + |
| 19 | +## Region Support |
| 20 | + |
| 21 | +- australiaeast |
| 22 | +- eastus |
| 23 | +- germanywestcentral |
| 24 | +- northcentralus |
| 25 | +- polandcentral |
| 26 | +- swedencentral |
| 27 | +- switzerlandnorth |
| 28 | +- eastus2 |
| 29 | +- westus |
| 30 | + |
| 31 | +## Azure Blob Storage configuration |
| 32 | + |
| 33 | +### Prerequisites |
| 34 | + |
| 35 | +- An [Azure Blob Storage account](/azure/storage/blobs/storage-blobs-introduction). |
| 36 | +- An Azure OpenAI resource with a model of the deployment type `Global-Batch` or `DataZoneBatch` deployed. You can refer to the [resource creation and model deployment guide](./create-resource.md) for help with this process. |
| 37 | + |
| 38 | +### Managed identity |
| 39 | + |
| 40 | +In order for your Azure OpenAI resource to securely access your Azure Blob Storage account, you need to set up your resource with a **system assigned managed identity**. |
| 41 | + |
| 42 | +> [!NOTE] |
| 43 | +> Currently user assigned managed identities aren't supported. |
| 44 | + |
| 45 | +1. Sign in to [https://portal.azure.com](https://portal.azure.com). |
| 46 | +2. Find your Azure OpenAI resource > Select **Resource Management** > **Identity** > **System assigned** > set status to **On**. |
| 47 | + |
| 48 | + :::image type="content" source="../media/how-to/batch-blob-storage/identity.png" alt-text="Screenshot that shows system managed identity configuration." lightbox="../media/how-to/batch-blob-storage/identity.png"::: |
| 49 | + |
| 50 | +### Role-based access control |
| 51 | + |
| 52 | +Once your Azure OpenAI resource has been configured for system assigned managed identity, you need to give it access to your Azure Blob Storage account. |
| 53 | + |
| 54 | +1. From [https://portal.azure.com](https://portal.azure.com) find and select your Azure Blob Storage resource. |
| 55 | +2. Select **Access Control (IAM)** > **Add** > **Add role assignment**. |
| 56 | + |
| 57 | + :::image type="content" source="../media/how-to/batch-blob-storage/access-control.png" alt-text="Screenshot that shows access control interface for an Azure Blob Storage resource." lightbox="../media/how-to/batch-blob-storage/access-control.png"::: |
| 58 | + |
| 59 | +3. Search for **Storage Blob Data Contributor** > **Next**. |
| 60 | +4. Select **Managed identity** > **+Select members** > Select your Azure OpenAI resource's managed identity. |
| 61 | + |
| 62 | + :::image type="content" source="../media/how-to/batch-blob-storage/add-role.png" alt-text="Screenshot that shows Storage Blob Data Contributor role assignment." lightbox="../media/how-to/batch-blob-storage/add-role.png"::: |
| 63 | + |
| 64 | +If you prefer using custom roles for more granular access, the following permissions are required: |
| 65 | + |
| 66 | +**Input data**: |
| 67 | + |
| 68 | +- `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read` |
| 69 | + |
| 70 | +**Output data/folders**: |
| 71 | + |
| 72 | +- `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read` |
| 73 | +- `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write` |
| 74 | + |
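As a rough illustration of the custom-role option, the sketch below builds a role definition that combines the input and output permissions listed above. The role name, description, and the placeholder scope are assumptions made for this example, and the layout follows the JSON shape generally accepted by `az role definition create`; verify it against the current Azure RBAC documentation before relying on it.

```python
import json

# Hypothetical placeholder scope: replace with your own subscription and resource group.
SCOPE = "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"

# Data-plane permissions listed in this article.
BLOB_READ = "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read"
BLOB_WRITE = "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write"


def build_batch_role(name: str, scope: str) -> dict:
    """Build a custom role definition covering batch input (read) and output (read/write)."""
    return {
        "Name": name,  # illustrative name, not an official role
        "IsCustom": True,
        "Description": "Read batch input blobs and read/write batch output blobs.",
        "Actions": [],
        "NotActions": [],
        # Blob read/write are data actions, so they go under DataActions.
        "DataActions": [BLOB_READ, BLOB_WRITE],
        "NotDataActions": [],
        "AssignableScopes": [scope],
    }


role = build_batch_role("Azure OpenAI Batch Blob Access (example)", SCOPE)
print(json.dumps(role, indent=2))
```

You could save the printed JSON to a file and pass it to `az role definition create --role-definition`, then assign the resulting role to the Azure OpenAI resource's managed identity in place of **Storage Blob Data Contributor**.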
| 75 | +### Create containers |
| 76 | + |
| 77 | +For this example, you'll create two containers named `batch-input` and `batch-output`. You can name these whatever you want, but if you use different names you'll need to adjust the examples in the following steps. |
| 78 | + |
| 79 | +To create a container, go to **Data storage** in your storage account, select **+Container**, and enter a name. Repeat for the second container. |
| 80 | + |
| 81 | +:::image type="content" source="../media/how-to/batch-blob-storage/container.png" alt-text="Screenshot that shows Storage Blob Data containers." lightbox="../media/how-to/batch-blob-storage/container.png"::: |
| 82 | + |
| 83 | +Once your containers are created, retrieve the URL for each container by selecting the container > **Settings** > **Properties** > Copy the URLs. |
| 84 | + |
| 85 | +In this case we have: |
| 86 | + |
| 87 | +- `https://{AZURE-BLOB-STORAGE-RESOURCE-NAME}.blob.core.windows.net/batch-input` |
| 88 | +- `https://{AZURE-BLOB-STORAGE-RESOURCE-NAME}.blob.core.windows.net/batch-output` |
| 89 | + |
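If you would rather derive these URLs in code than copy them from the portal, a minimal sketch follows. The helper names `container_url` and `blob_url` are made up for this example, and `docstest002` is simply the sample storage account name used in this article's outputs.

```python
def container_url(account_name: str, container: str) -> str:
    """Return the URL of a container in the public Azure cloud blob endpoint."""
    return f"https://{account_name}.blob.core.windows.net/{container}"


def blob_url(account_name: str, container: str, blob_name: str) -> str:
    """Return the URL of a single blob inside a container."""
    return f"{container_url(account_name, container)}/{blob_name}"


account = "docstest002"  # replace with your storage account name

input_container = container_url(account, "batch-input")
output_container = container_url(account, "batch-output")
input_blob = blob_url(account, "batch-input", "test.jsonl")

print(input_container)
print(output_container)
print(input_blob)
```

The `input_blob` value is the form expected for the batch job's input, and `output_container` is the form expected for its output folder URL.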
| 90 | +### Create input file |
| 91 | + |
| 92 | +For this article, we'll create a file named `test.jsonl` and will copy the contents below to the file. You'll need to modify and add your global batch deployment name to each line of the file. |
| 93 | + |
| 94 | +```json |
| 95 | +{"custom_id": "task-0", "method": "POST", "url": "/chat/completions", "body": {"model": "REPLACE-WITH-MODEL-DEPLOYMENT-NAME", "messages": [{"role": "system", "content": "You are an AI assistant that helps people find information."}, {"role": "user", "content": "When was Microsoft founded?"}]}} |
| 96 | +{"custom_id": "task-1", "method": "POST", "url": "/chat/completions", "body": {"model": "REPLACE-WITH-MODEL-DEPLOYMENT-NAME", "messages": [{"role": "system", "content": "You are an AI assistant that helps people find information."}, {"role": "user", "content": "When was the first XBOX released?"}]}} |
| 97 | +{"custom_id": "task-2", "method": "POST", "url": "/chat/completions", "body": {"model": "REPLACE-WITH-MODEL-DEPLOYMENT-NAME", "messages": [{"role": "system", "content": "You are an AI assistant that helps people find information."}, {"role": "user", "content": "What is Altair Basic?"}]}} |
| 98 | +``` |
| 99 | + |
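Instead of editing the file by hand, you could generate it. The sketch below writes the same three requests to `test.jsonl`, one JSON object per line; `build_requests` and `write_jsonl` are hypothetical helper names, and `DEPLOYMENT_NAME` is a placeholder you would replace with your own batch deployment name.

```python
import json

DEPLOYMENT_NAME = "REPLACE-WITH-MODEL-DEPLOYMENT-NAME"  # your batch deployment name
SYSTEM_PROMPT = "You are an AI assistant that helps people find information."
QUESTIONS = [
    "When was Microsoft founded?",
    "When was the first XBOX released?",
    "What is Altair Basic?",
]


def build_requests(deployment: str, questions: list[str]) -> list[dict]:
    """Build one batch request per question, each with a unique custom_id."""
    return [
        {
            "custom_id": f"task-{index}",
            "method": "POST",
            "url": "/chat/completions",
            "body": {
                "model": deployment,
                "messages": [
                    {"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": question},
                ],
            },
        }
        for index, question in enumerate(questions)
    ]


def write_jsonl(path: str, requests: list[dict]) -> None:
    """Write each request as a single line of JSON (JSON Lines format)."""
    with open(path, "w", encoding="utf-8") as handle:
        for request in requests:
            handle.write(json.dumps(request) + "\n")


requests = build_requests(DEPLOYMENT_NAME, QUESTIONS)
write_jsonl("test.jsonl", requests)
print(f"Wrote {len(requests)} requests to test.jsonl")
```

Generating the file this way keeps every `custom_id` unique and guarantees that each line is valid JSON.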
| 100 | +### Upload input file |
| 101 | + |
| 102 | +From your Azure Blob Storage account, open your **batch-input** container that you created previously. |
| 103 | + |
| 104 | +Select **Upload** and select your `test.jsonl` file. |
| 105 | + |
| 106 | +:::image type="content" source="../media/how-to/batch-blob-storage/upload.png" alt-text="Screenshot that shows Azure Storage Blob container upload UX." lightbox="../media/how-to/batch-blob-storage/upload.png"::: |
| 107 | + |
| 108 | +Don't modify your `jsonl` file while the batch job is processing it. If the file changes while the batch job is running, the job fails. |
| 109 | + |
| 110 | +## Create batch job |
| 111 | + |
| 112 | +> [!NOTE] |
| 113 | +> `metadata` is currently not supported with this capability. |
| 114 | + |
| 115 | +# [Python](#tab/python) |
| 116 | + |
| 117 | +```python |
| 118 | +import os |
| 119 | +from datetime import datetime |
| 120 | +from openai import AzureOpenAI |
| 121 | +from azure.identity import DefaultAzureCredential, get_bearer_token_provider |
| 122 | + |
| 123 | +token_provider = get_bearer_token_provider( |
| 124 | + DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default" |
| 125 | +) |
| 126 | + |
| 127 | +client = AzureOpenAI( |
| 128 | + azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"), |
| 129 | + azure_ad_token_provider=token_provider, |
| 130 | + api_version="2025-04-01-preview" |
| 131 | +) |
| 132 | + |
| 133 | +batch_response = client.batches.create( |
| 134 | + input_file_id=None, |
| 135 | + endpoint="/chat/completions", |
| 136 | + completion_window="24h", |
| 137 | + extra_body={ |
| 138 | + "input_blob": "https://{AZURE-BLOB-STORAGE-RESOURCE-NAME}.blob.core.windows.net/batch-input/test.jsonl", |
| 139 | + "output_folder": { |
| 140 | + "url": "https://{AZURE-BLOB-STORAGE-RESOURCE-NAME}.blob.core.windows.net/batch-output", |
| 141 | + } |
| 142 | + } |
| 143 | +) |
| 144 | + |
| 145 | +# Save batch ID for later use |
| 146 | +batch_id = batch_response.id |
| 147 | + |
| 148 | +print(batch_response.model_dump_json(indent=2)) |
| 149 | + |
| 150 | +``` |
| 151 | + |
| 152 | +# [REST](#tab/rest) |
| 153 | + |
| 154 | +```HTTP |
| 155 | +curl -X POST "https://YOUR_RESOURCE_NAME.openai.azure.com/openai/batches?api-version=2025-04-01-preview" \ |
| 156 | + -H "Authorization: Bearer $AZURE_OPENAI_AUTH_TOKEN" \ |
| 157 | + -H "Content-Type: application/json" \ |
| 158 | + -d '{ |
| 159 | + "input_file_id": null, |
| 160 | + "endpoint": "/chat/completions", |
| 161 | + "completion_window": "24h", |
| 162 | + "input_blob": "https://{AZURE-BLOB-STORAGE-RESOURCE-NAME}.blob.core.windows.net/batch-input/test.jsonl", |
| 163 | + "output_folder": { |
| 164 | + "url": "https://{AZURE-BLOB-STORAGE-RESOURCE-NAME}.blob.core.windows.net/batch-output" |
| 165 | + } |
| 166 | + }' |
| 167 | +``` |
| 168 | + |
| 169 | +--- |
| 170 | + |
| 171 | +**Output:** |
| 172 | + |
| 173 | +```json |
| 174 | +{ |
| 175 | + "id": "batch_b632a805-797b-49ed-9c9c-86eb4057f2a2", |
| 176 | + "completion_window": "24h", |
| 177 | + "created_at": 1747516485, |
| 178 | + "endpoint": "/chat/completions", |
| 179 | + "input_file_id": null, |
| 180 | + "object": "batch", |
| 181 | + "status": "validating", |
| 182 | + "cancelled_at": null, |
| 183 | + "cancelling_at": null, |
| 184 | + "completed_at": null, |
| 185 | + "error_file_id": null, |
| 186 | + "errors": null, |
| 187 | + "expired_at": null, |
| 188 | + "expires_at": 1747602881, |
| 189 | + "failed_at": null, |
| 190 | + "finalizing_at": null, |
| 191 | + "in_progress_at": null, |
| 192 | + "metadata": null, |
| 193 | + "output_file_id": null, |
| 194 | + "request_counts": { |
| 195 | + "completed": 0, |
| 196 | + "failed": 0, |
| 197 | + "total": 0 |
| 198 | + }, |
| 199 | + "error_blob": "", |
| 200 | + "input_blob": "https://docstest002.blob.core.windows.net/batch-input/test.jsonl", |
| 201 | + "output_blob": "" |
| 202 | +} |
| 203 | +``` |
| 204 | + |
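The `created_at` and `expires_at` fields in the response are Unix timestamps in seconds. As a small sketch, the snippet below converts the values from the sample output into UTC datetimes; `to_utc` is a hypothetical helper, and the dictionary simply copies a few fields from the response shown above.

```python
import datetime


def to_utc(timestamp):
    """Convert a Unix timestamp in seconds to an aware UTC datetime (None stays None)."""
    if timestamp is None:
        return None
    return datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)


# Values copied from the sample response in this article.
batch_fields = {
    "created_at": 1747516485,
    "expires_at": 1747602881,
    "completed_at": None,
}

for name, value in batch_fields.items():
    print(f"{name}: {to_utc(value)}")

# Time remaining between creation and expiry for this sample job.
window = to_utc(batch_fields["expires_at"]) - to_utc(batch_fields["created_at"])
print(f"window: {window}")
```

With the real client you could pass `batch_response.created_at` and `batch_response.expires_at` to the same helper.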
| 205 | +You can monitor the job status the same way as before, as outlined in our [comprehensive guide on using Azure OpenAI batch](./batch.md). |
| 206 | + |
| 207 | +```python |
| 208 | +import time |
| 209 | +import datetime |
| 210 | + |
| 211 | +status = "validating" |
| 212 | +while status not in ("completed", "failed", "canceled", "cancelled", "expired"): |
| 213 | + time.sleep(60) |
| 214 | + batch_response = client.batches.retrieve(batch_id) |
| 215 | + status = batch_response.status |
| 216 | + print(f"{datetime.datetime.now()} Batch Id: {batch_id}, Status: {status}") |
| 217 | + |
| 218 | +if batch_response.status == "failed": |
| 219 | + for error in batch_response.errors.data: |
| 220 | + print(f"Error code {error.code} Message {error.message}") |
| 221 | +``` |
| 222 | + |
| 223 | +**Output:** |
| 224 | + |
| 225 | +```cmd |
| 226 | +2025-05-17 17:16:56.950427 Batch Id: batch_b632a805-797b-49ed-9c9c-86eb4057f2a2, Status: validating |
| 227 | +2025-05-17 17:17:57.532054 Batch Id: batch_b632a805-797b-49ed-9c9c-86eb4057f2a2, Status: validating |
| 228 | +2025-05-17 17:18:58.156793 Batch Id: batch_b632a805-797b-49ed-9c9c-86eb4057f2a2, Status: in_progress |
| 229 | +2025-05-17 17:19:58.739708 Batch Id: batch_b632a805-797b-49ed-9c9c-86eb4057f2a2, Status: in_progress |
| 230 | +2025-05-17 17:20:59.398508 Batch Id: batch_b632a805-797b-49ed-9c9c-86eb4057f2a2, Status: finalizing |
| 231 | +2025-05-17 17:22:00.242371 Batch Id: batch_b632a805-797b-49ed-9c9c-86eb4057f2a2, Status: completed |
| 232 | +``` |
| 233 | + |
| 234 | +Once the `status` is `completed`, you can retrieve your `output_blob` path: |
| 235 | + |
| 236 | +```python |
| 237 | +print(batch_response.model_dump_json(indent=2)) |
| 238 | +``` |
| 239 | + |
| 240 | +**Output:** |
| 241 | + |
| 242 | +```json |
| 243 | +{ |
| 244 | + "id": "batch_b632a805-797b-49ed-9c9c-86eb4057f2a2", |
| 245 | + "completion_window": "24h", |
| 246 | + "created_at": 1747516485, |
| 247 | + "endpoint": "/chat/completions", |
| 248 | + "input_file_id": null, |
| 249 | + "object": "batch", |
| 250 | + "status": "completed", |
| 251 | + "cancelled_at": null, |
| 252 | + "cancelling_at": null, |
| 253 | + "completed_at": 1747516883, |
| 254 | + "error_file_id": null, |
| 255 | + "errors": null, |
| 256 | + "expired_at": null, |
| 257 | + "expires_at": 1747602881, |
| 258 | + "failed_at": null, |
| 259 | + "finalizing_at": 1747516834, |
| 260 | + "in_progress_at": 1747516722, |
| 261 | + "metadata": null, |
| 262 | + "output_file_id": null, |
| 263 | + "request_counts": { |
| 264 | + "completed": 3, |
| 265 | + "failed": 0, |
| 266 | + "total": 3 |
| 267 | + }, |
| 268 | + "error_blob": "https://docstest002.blob.core.windows.net/batch-output/{GUID}/errors.jsonl", |
| 269 | + "input_blob": "https://docstest002.blob.core.windows.net/batch-input/test.jsonl", |
| 270 | + "output_blob": "https://docstest002.blob.core.windows.net/batch-output/{GUID}/results.jsonl" |
| 271 | +} |
| 272 | +``` |
| 273 | + |
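To download the results programmatically you need the storage account name, the container name, and the blob path inside the container. One way to get them is to split the `output_blob` URL, as sketched below; `split_blob_url` is a hypothetical helper, and the example URL is the placeholder value from the sample response above.

```python
from urllib.parse import urlparse


def split_blob_url(url: str) -> tuple[str, str, str]:
    """Split a blob URL into (storage account, container, blob path)."""
    parsed = urlparse(url)
    # The account name is the first label of the host, e.g. "docstest002".
    account = parsed.netloc.split(".")[0]
    # The first path segment is the container; the rest is the blob path.
    container, _, blob_path = parsed.path.lstrip("/").partition("/")
    return account, container, blob_path


output_blob = "https://docstest002.blob.core.windows.net/batch-output/{GUID}/results.jsonl"
account, container, blob_path = split_blob_url(output_blob)

print(account)
print(container)
print(blob_path)
```

In practice you would pass the `output_blob` value from your own batch response, which contains the real GUID folder name instead of the `{GUID}` placeholder.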
| 274 | +Once your batch job is complete, you can download the `error_blob` and `output_blob` via the Azure Blob Storage interface in the Azure portal or you can download programmatically: |
| 275 | + |
| 276 | +> [!NOTE] |
| 277 | +> `error_blob` and `output_blob` paths are always returned in the response, even in cases where a corresponding file isn't created. In this case there were no errors, so `errors.jsonl` wasn't created and only `results.jsonl` exists. |
| 278 | + |
| 279 | +```cmd |
| 280 | +pip install azure-identity azure-storage-blob |
| 281 | +``` |
| 282 | + |
| 283 | +Keep in mind that granting the Azure OpenAI resource access to your Azure Blob Storage account doesn't grant access to you. To download the results, you might also need to give the user account that runs the script below access to the storage account. For downloading the file, `Storage Blob Data Reader` access is sufficient. |
| 284 | + |
| 285 | +```python |
| 286 | +# Import required libraries |
| 287 | +from azure.identity import DefaultAzureCredential |
| 288 | +from azure.storage.blob import BlobServiceClient |
| 289 | + |
| 290 | +# Define storage account and container information |
| 291 | +storage_account_name = "docstest002" # replace with your storage account name |
| 292 | +container_name = "batch-output" |
| 293 | + |
| 294 | +# Define the blob paths to download |
| 295 | +blob_paths = [ |
| 296 | + "{REPLACE-WITH-YOUR-GUID}/results.jsonl", |
| 297 | +] |
| 298 | + |
| 299 | +credential = DefaultAzureCredential() |
| 300 | +account_url = f"https://{storage_account_name}.blob.core.windows.net" |
| 301 | +blob_service_client = BlobServiceClient(account_url=account_url, credential=credential) |
| 302 | +container_client = blob_service_client.get_container_client(container_name) |
| 303 | + |
| 304 | +for blob_path in blob_paths: |
| 305 | + blob_client = container_client.get_blob_client(blob_path) |
| 306 | + |
| 307 | + file_name = blob_path.split("/")[-1] |
| 308 | + |
| 309 | + print(f"Downloading {file_name}...") |
| 310 | + with open(file_name, "wb") as file: |
| 311 | + download_stream = blob_client.download_blob() |
| 312 | + file.write(download_stream.readall()) |
| 313 | + |
| 314 | + print(f"Downloaded {file_name} successfully!") |
| 315 | +``` |
| 316 | + |
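After downloading `results.jsonl`, you will usually want to match each answer back to its `custom_id`. The sketch below assumes each line follows the common batch output shape, with a `custom_id` and a `response.body.choices[0].message.content` field. This article doesn't show the file's contents, so treat that shape, the helper name `summarize_results`, and the made-up sample line as assumptions and check them against your own output.

```python
import json


def summarize_results(lines):
    """Map each custom_id to the assistant's reply text, or None if no reply is present."""
    summary = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        response = record.get("response") or {}
        body = response.get("body") or {}
        choices = body.get("choices") or []
        content = choices[0]["message"]["content"] if choices else None
        summary[record["custom_id"]] = content
    return summary


# Made-up sample line for illustration only, not real model output.
sample_lines = [
    '{"custom_id": "task-0", "response": {"status_code": 200, "body": '
    '{"choices": [{"message": {"role": "assistant", "content": "Illustrative answer text."}}]}}, '
    '"error": null}',
]

summary = summarize_results(sample_lines)
print(summary)
```

To run it on the downloaded file, you could open `results.jsonl` and pass the file object to `summarize_results`, since iterating a text file yields one line at a time.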
| 317 | +## See also |
| 318 | + |
| 319 | +For more information on Azure OpenAI Batch, see the [comprehensive batch guide](./batch.md). |