File: articles/batch/tutorial-parallel-python.md
Lines changed: 35 additions & 39 deletions
Original file line number
Diff line number
Diff line change
@@ -1,67 +1,67 @@
1
1
---
2
-
title: Tutorial - Run a parallel workload using the Python API
3
-
description: Tutorial - Process media files in parallel with ffmpeg in Azure Batch using the Batch Python client library
2
+
title: "Tutorial: Run a parallel workload using the Python API"
3
+
description: Learn how to process media files in parallel using ffmpeg in Azure Batch with the Batch Python client library.
4
4
ms.devlang: python
5
5
ms.topic: tutorial
6
-
ms.date: 12/13/2021
6
+
ms.date: 04/19/2023
7
7
ms.custom: mvc, devx-track-python
8
8
---
9
9
10
10
# Tutorial: Run a parallel workload with Azure Batch using the Python API
11
11
12
-
Use Azure Batch to run large-scale parallel and high-performance computing (HPC) batch jobs efficiently in Azure. This tutorial walks through a Python example of running a parallel workload using Batch. You learn a common Batch application workflow and how to interact programmatically with Batch and Storage resources. You learn how to:
12
+
Use Azure Batch to run large-scale parallel and high-performance computing (HPC) batch jobs efficiently in Azure. This tutorial walks through a Python example of running a parallel workload using Batch. You learn a common Batch application workflow and how to interact programmatically with Batch and Storage resources.
13
13
14
14
> [!div class="checklist"]
15
-
> * Authenticate with Batch and Storage accounts
16
-
> * Upload input files to Storage
17
-
> * Create a pool of compute nodes to run an application
18
-
> * Create a job and tasks to process input files
19
-
> * Monitor task execution
20
-
> * Retrieve output files
15
+
> * Authenticate with Batch and Storage accounts.
16
+
> * Upload input files to Storage.
17
+
> * Create a pool of compute nodes to run an application.
18
+
> * Create a job and tasks to process input files.
19
+
> * Monitor task execution.
20
+
> * Retrieve output files.
21
21
22
-
In this tutorial, you convert MP4 media files in parallel to MP3 format using the [ffmpeg](https://ffmpeg.org/) open-source tool.
22
+
In this tutorial, you convert MP4 media files to MP3 format, in parallel, by using the [ffmpeg](https://ffmpeg.org/) open-source tool.
* An Azure Batch account and a linked Azure Storage account. To create these accounts, see the Batch quickstarts using the [Azure portal](quick-create-portal.md) or [Azure CLI](quick-create-cli.md).
32
+
* An Azure Batch account and a linked Azure Storage account. To create these accounts, see the Batch quickstart guides for [Azure portal](quick-create-portal.md) or [Azure CLI](quick-create-cli.md).
33
33
34
34
## Sign in to Azure
35
35
36
-
Sign in to the Azure portal at [https://portal.azure.com](https://portal.azure.com).
36
+
Sign in to the [Azure portal](https://portal.azure.com).
[Download or clone the sample app](https://github.com/Azure-Samples/batch-python-ffmpeg-tutorial) from GitHub. To clone the sample app repo with a Git client, use the following command:
Navigate to the directory that contains the file `batch_python_tutorial_ffmpeg.py`.
50
+
Navigate to the directory that contains the file *batch_python_tutorial_ffmpeg.py*.
51
51
52
52
In your Python environment, install the required packages using `pip`.
53
53
54
54
```bash
55
55
pip install -r requirements.txt
56
56
```
57
57
58
-
Open the file `config.py`. Update the Batch and storage account credential strings with the values unique to your accounts. For example:
58
+
Use a code editor to open the file *config.py*. Update the Batch and storage account credential strings with the values unique to your accounts. For example:
Go to your Batch account in the Azure portal to monitor the pool, compute nodes, job, and tasks. For example, to see a heat map of the compute nodes in your pool, click **Pools** > *LinuxFFmpegPool*.
100
+
Go to your Batch account in the Azure portal to monitor the pool, compute nodes, job, and tasks. For example, to see a heat map of the compute nodes in your pool, select **Pools** > **LinuxFFmpegPool**.
101
101
102
102
When tasks are running, the heat map is similar to the following:
The app uses the `blob_client` reference to create a storage container for the input MP4 files and a container for the task output. Then, it calls the `upload_file_to_container` function to upload MP4 files in the local `InputFiles` directory to the container. The files in storage are defined as Batch [ResourceFile](/python/api/azure-batch/azure.batch.models.resourcefile) objects that Batch can later download to compute nodes.
137
+
The app uses the `blob_client` reference to create a storage container for the input MP4 files and a container for the task output. Then, it calls the `upload_file_to_container` function to upload MP4 files in the local *InputFiles* directory to the container. The files in storage are defined as Batch [ResourceFile](/python/api/azure-batch/azure.batch.models.resourcefile) objects that Batch can later download to compute nodes.
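As a rough illustration of the first half of that step, the following sketch gathers the MP4 files from a local input directory before any upload happens. It uses only the Python standard library. The helper name `find_input_files` is hypothetical and isn't taken from the sample, and the upload and `ResourceFile` creation are left out because they depend on your storage account.

```python
import os


def find_input_files(input_dir, extension=".mp4"):
    """Return sorted paths of files in input_dir that end with extension."""
    matches = []
    for name in os.listdir(input_dir):
        path = os.path.join(input_dir, name)
        # Skip subdirectories and compare extensions case-insensitively.
        if os.path.isfile(path) and name.lower().endswith(extension):
            matches.append(path)
    return sorted(matches)
```

Each returned path could then be passed to an upload function such as the sample's `upload_file_to_container`.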
Next, the sample creates a pool of compute nodes in the Batch account with a call to `create_pool`. This defined function uses the Batch [PoolAddParameter](/python/api/azure-batch/azure.batch.models.pooladdparameter) class to set the number of nodes, VM size, and a pool configuration. Here, a [VirtualMachineConfiguration](/python/api/azure-batch/azure.batch.models.virtualmachineconfiguration) object specifies an [ImageReference](/python/api/azure-batch/azure.batch.models.imagereference) to an Ubuntu Server 18.04 LTS image published in the Azure Marketplace. Batch supports a wide range of VM images in the Azure Marketplace, as well as custom VM images.
159
159
160
-
The number of nodes and VM size are set using defined constants. Batch supports dedicated nodes and [Spot nodes](batch-spot-vms.md), and you can use either or both in your pools. Dedicated nodes are reserved for your pool. Spot nodes are offered at a reduced price from surplus VM capacity in Azure. Spot nodes become unavailable if Azure does not have enough capacity. The sample by default creates a pool containing only 5 Spot nodes in size *Standard_A1_v2*.
160
+
The number of nodes and VM size are set using defined constants. Batch supports dedicated nodes and [Spot nodes](batch-spot-vms.md), and you can use either or both in your pools. Dedicated nodes are reserved for your pool. Spot nodes are offered at a reduced price from surplus VM capacity in Azure. Spot nodes become unavailable if Azure doesn't have enough capacity. The sample by default creates a pool containing only five Spot nodes in size *Standard_A1_v2*.
161
161
162
162
In addition to physical node properties, this pool configuration includes a [StartTask](/python/api/azure-batch/azure.batch.models.starttask) object. The StartTask executes on each node as that node joins the pool, and each time a node is restarted. In this example, the StartTask runs Bash shell commands to install the ffmpeg package and dependencies on the nodes.
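Batch command lines don't run under a shell by default, so a StartTask that chains several commands needs to invoke the shell explicitly. The following sketch shows one way to build such a command line with the standard library. The helper name `wrap_in_bash` is hypothetical, and the two `apt-get` commands are assumed for an Ubuntu node rather than copied from the sample.

```python
import shlex


def wrap_in_bash(commands):
    """Join shell commands with && and wrap them in a single bash -c call."""
    script = " && ".join(commands)
    # shlex.quote keeps the joined script as one argument to bash.
    return "/bin/bash -c " + shlex.quote(script)


# Assumed install commands for an Ubuntu node; adjust them for your image.
start_task_command = wrap_in_bash(["apt-get update", "apt-get install -y ffmpeg"])
```

The resulting string is the kind of value you would supply as the StartTask command line. Joining with `&&` stops the sequence at the first command that fails.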
163
163
@@ -259,22 +259,18 @@ while datetime.datetime.now() < timeout_expiration:
259
259
260
260
After it runs the tasks, the app automatically deletes the input storage container it created, and gives you the option to delete the Batch pool and job. The BatchClient's [JobOperations](/python/api/azure-batch/azure.batch.operations.joboperations) and [PoolOperations](/python/api/azure-batch/azure.batch.operations.pooloperations) classes both have delete methods, which are called if you confirm deletion. Although you're not charged for jobs and tasks themselves, you are charged for compute nodes. Thus, we recommend that you allocate pools only as needed. When you delete the pool, all task output on the nodes is deleted. However, the input and output files remain in the storage account.
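The confirm-then-delete flow described here can be sketched without any Azure dependency. In the following example, `clean_up` is a hypothetical helper and the two callables are stand-ins. In a real app they would wrap the delete methods of the job and pool operations classes, for example `batch_client.job.delete(job_id)` and `batch_client.pool.delete(pool_id)`.

```python
def clean_up(answer, delete_job, delete_pool):
    """Call both delete callables only if the answer is affirmative.

    Returns True when deletion was requested, False otherwise.
    """
    if answer.strip().lower() in ("y", "yes"):
        # Delete the job first, then the pool that was running it.
        delete_job()
        delete_pool()
        return True
    return False
```

Passing the deletions in as callables keeps the decision logic testable without a live Batch account.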
261
261
262
-
When no longer needed, delete the resource group, Batch account, and storage account. To do so in the Azure portal, select the resource group for the Batch account and click **Delete resource group**.
262
+
When no longer needed, delete the resource group, Batch account, and storage account. To do so in the Azure portal, select the resource group for the Batch account and choose **Delete resource group**.
263
263
264
264
## Next steps
265
265
266
266
In this tutorial, you learned how to:
267
267
268
268
> [!div class="checklist"]
269
-
> * Authenticate with Batch and Storage accounts
270
-
> * Upload input files to Storage
271
-
> * Create a pool of compute nodes to run an application
272
-
> * Create a job and tasks to process input files
273
-
> * Monitor task execution
274
-
> * Retrieve output files
275
-
276
-
For more examples of using the Python API to schedule and process Batch workloads, see the samples on GitHub.
> * Create a pool of compute nodes to run an application.
272
+
> * Create a job and tasks to process input files.
273
+
> * Monitor task execution.
274
+
> * Retrieve output files.
275
+
276
+
For more examples of using the Python API to schedule and process Batch workloads, see the [Batch Python samples](https://github.com/Azure/azure-batch-samples/tree/master/Python/Batch) on GitHub.