Commit e50e455

Merge pull request #234970 from cdpark/parallel-workload-python
Freshness Pass for User Story: 79612 Python workloads
2 parents 2464602 + d190272 commit e50e455

1 file changed: +35 -39 lines changed

articles/batch/tutorial-parallel-python.md

Lines changed: 35 additions & 39 deletions
@@ -1,67 +1,67 @@
 ---
-title: Tutorial - Run a parallel workload using the Python API
-description: Tutorial - Process media files in parallel with ffmpeg in Azure Batch using the Batch Python client library
+title: "Tutorial: Run a parallel workload using the Python API"
+description: Learn how to process media files in parallel using ffmpeg in Azure Batch with the Batch Python client library.
 ms.devlang: python
 ms.topic: tutorial
-ms.date: 12/13/2021
+ms.date: 04/19/2023
 ms.custom: mvc, devx-track-python
 ---
 
 # Tutorial: Run a parallel workload with Azure Batch using the Python API
 
-Use Azure Batch to run large-scale parallel and high-performance computing (HPC) batch jobs efficiently in Azure. This tutorial walks through a Python example of running a parallel workload using Batch. You learn a common Batch application workflow and how to interact programmatically with Batch and Storage resources. You learn how to:
+Use Azure Batch to run large-scale parallel and high-performance computing (HPC) batch jobs efficiently in Azure. This tutorial walks through a Python example of running a parallel workload using Batch. You learn a common Batch application workflow and how to interact programmatically with Batch and Storage resources.
 
 > [!div class="checklist"]
-> * Authenticate with Batch and Storage accounts
-> * Upload input files to Storage
-> * Create a pool of compute nodes to run an application
-> * Create a job and tasks to process input files
-> * Monitor task execution
-> * Retrieve output files
+> * Authenticate with Batch and Storage accounts.
+> * Upload input files to Storage.
+> * Create a pool of compute nodes to run an application.
+> * Create a job and tasks to process input files.
+> * Monitor task execution.
+> * Retrieve output files.
 
-In this tutorial, you convert MP4 media files in parallel to MP3 format using the [ffmpeg](https://ffmpeg.org/) open-source tool.
+In this tutorial, you convert MP4 media files to MP3 format, in parallel, by using the [ffmpeg](https://ffmpeg.org/) open-source tool.
 
 [!INCLUDE [quickstarts-free-trial-note.md](../../includes/quickstarts-free-trial-note.md)]
 
 ## Prerequisites
 
-* [Python version 3.7+](https://www.python.org/downloads/)
+* [Python version 3.7 or later](https://www.python.org/downloads/)
 
-* [pip](https://pip.pypa.io/en/stable/installing/) package manager
+* [pip package manager](https://pip.pypa.io/en/stable/installation/)
 
-* An Azure Batch account and a linked Azure Storage account. To create these accounts, see the Batch quickstarts using the [Azure portal](quick-create-portal.md) or [Azure CLI](quick-create-cli.md).
+* An Azure Batch account and a linked Azure Storage account. To create these accounts, see the Batch quickstart guides for [Azure portal](quick-create-portal.md) or [Azure CLI](quick-create-cli.md).
 
 ## Sign in to Azure
 
-Sign in to the Azure portal at [https://portal.azure.com](https://portal.azure.com).
+Sign in to the [Azure portal](https://portal.azure.com).
 
 [!INCLUDE [batch-common-credentials](../../includes/batch-common-credentials.md)]
 
-## Download and run the sample
+## Download and run the sample app
 
-### Download the sample
+### Download the sample app
 
 [Download or clone the sample app](https://github.com/Azure-Samples/batch-python-ffmpeg-tutorial) from GitHub. To clone the sample app repo with a Git client, use the following command:
 
 ```
 git clone https://github.com/Azure-Samples/batch-python-ffmpeg-tutorial.git
 ```
 
-Navigate to the directory that contains the file `batch_python_tutorial_ffmpeg.py`.
+Navigate to the directory that contains the file *batch_python_tutorial_ffmpeg.py*.
 
 In your Python environment, install the required packages using `pip`.
 
 ```bash
 pip install -r requirements.txt
 ```
 
-Open the file `config.py`. Update the Batch and storage account credential strings with the values unique to your accounts. For example:
+Use a code editor to open the file *config.py*. Update the Batch and storage account credential strings with the values unique to your accounts. For example:
 
 
 ```Python
-_BATCH_ACCOUNT_NAME = 'mybatchaccount'
+_BATCH_ACCOUNT_NAME = 'yourbatchaccount'
 _BATCH_ACCOUNT_KEY = 'xxxxxxxxxxxxxxxxE+yXrRvJAqT9BlXwwo1CwF+SwAYOxxxxxxxxxxxxxxxx43pXi/gdiATkvbpLRl3x14pcEQ=='
-_BATCH_ACCOUNT_URL = 'https://mybatchaccount.mybatchregion.batch.azure.com'
+_BATCH_ACCOUNT_URL = 'https://yourbatchaccount.yourbatchregion.batch.azure.com'
 _STORAGE_ACCOUNT_NAME = 'mystorageaccount'
 _STORAGE_ACCOUNT_KEY = 'xxxxxxxxxxxxxxxxy4/xxxxxxxxxxxxxxxxfwpbIC5aAWA8wDu+AFXZB827Mt9lybZB1nUcQbQiUrkPtilK5BQ=='
 ```
@@ -97,13 +97,13 @@ Sample end: 11/28/2018 3:29:36 PM
 Elapsed time: 00:09:14.3418742
 ```
 
-Go to your Batch account in the Azure portal to monitor the pool, compute nodes, job, and tasks. For example, to see a heat map of the compute nodes in your pool, click **Pools** > *LinuxFFmpegPool*.
+Go to your Batch account in the Azure portal to monitor the pool, compute nodes, job, and tasks. For example, to see a heat map of the compute nodes in your pool, select **Pools** > **LinuxFFmpegPool**.
 
 When tasks are running, the heat map is similar to the following:
 
-![Pool heat map](./media/tutorial-parallel-python/pool.png)
+:::image type="content" source="./media/tutorial-parallel-python/pool.png" alt-text="Screenshot of Pool heat map.":::
 
-Typical execution time is approximately **5 minutes** when you run the application in its default configuration. Pool creation takes the most time.
+Typical execution time is approximately *5 minutes* when you run the application in its default configuration. Pool creation takes the most time.
 
 [!INCLUDE [batch-common-tutorial-download](../../includes/batch-common-tutorial-download.md)]
 
@@ -134,7 +134,7 @@ batch_client = batch.BatchServiceClient(
 
 ### Upload input files
 
-The app uses the `blob_client` reference create a storage container for the input MP4 files and a container for the task output. Then, it calls the `upload_file_to_container` function to upload MP4 files in the local `InputFiles` directory to the container. The files in storage are defined as Batch [ResourceFile](/python/api/azure-batch/azure.batch.models.resourcefile) objects that Batch can later download to compute nodes.
+The app uses the `blob_client` reference create a storage container for the input MP4 files and a container for the task output. Then, it calls the `upload_file_to_container` function to upload MP4 files in the local *InputFiles* directory to the container. The files in storage are defined as Batch [ResourceFile](/python/api/azure-batch/azure.batch.models.resourcefile) objects that Batch can later download to compute nodes.
 
 ```python
 blob_client.create_container(input_container_name, fail_on_exist=False)
@@ -157,7 +157,7 @@ input_files = [
 
 Next, the sample creates a pool of compute nodes in the Batch account with a call to `create_pool`. This defined function uses the Batch [PoolAddParameter](/python/api/azure-batch/azure.batch.models.pooladdparameter) class to set the number of nodes, VM size, and a pool configuration. Here, a [VirtualMachineConfiguration](/python/api/azure-batch/azure.batch.models.virtualmachineconfiguration) object specifies an [ImageReference](/python/api/azure-batch/azure.batch.models.imagereference) to an Ubuntu Server 18.04 LTS image published in the Azure Marketplace. Batch supports a wide range of VM images in the Azure Marketplace, as well as custom VM images.
 
-The number of nodes and VM size are set using defined constants. Batch supports dedicated nodes and [Spot nodes](batch-spot-vms.md), and you can use either or both in your pools. Dedicated nodes are reserved for your pool. Spot nodes are offered at a reduced price from surplus VM capacity in Azure. Spot nodes become unavailable if Azure does not have enough capacity. The sample by default creates a pool containing only 5 Spot nodes in size *Standard_A1_v2*.
+The number of nodes and VM size are set using defined constants. Batch supports dedicated nodes and [Spot nodes](batch-spot-vms.md), and you can use either or both in your pools. Dedicated nodes are reserved for your pool. Spot nodes are offered at a reduced price from surplus VM capacity in Azure. Spot nodes become unavailable if Azure doesn't have enough capacity. The sample by default creates a pool containing only five Spot nodes in size *Standard_A1_v2*.
 
 In addition to physical node properties, this pool configuration includes a [StartTask](/python/api/azure-batch/azure.batch.models.starttask) object. The StartTask executes on each node as that node joins the pool, and each time a node is restarted. In this example, the StartTask runs Bash shell commands to install the ffmpeg package and dependencies on the nodes.
 
@@ -259,22 +259,18 @@ while datetime.datetime.now() < timeout_expiration:
 
 After it runs the tasks, the app automatically deletes the input storage container it created, and gives you the option to delete the Batch pool and job. The BatchClient's [JobOperations](/python/api/azure-batch/azure.batch.operations.joboperations) and [PoolOperations](/python/api/azure-batch/azure.batch.operations.pooloperations) classes both have delete methods, which are called if you confirm deletion. Although you're not charged for jobs and tasks themselves, you are charged for compute nodes. Thus, we recommend that you allocate pools only as needed. When you delete the pool, all task output on the nodes is deleted. However, the input and output files remain in the storage account.
 
-When no longer needed, delete the resource group, Batch account, and storage account. To do so in the Azure portal, select the resource group for the Batch account and click **Delete resource group**.
+When no longer needed, delete the resource group, Batch account, and storage account. To do so in the Azure portal, select the resource group for the Batch account and choose **Delete resource group**.
 
 ## Next steps
 
 In this tutorial, you learned how to:
 
 > [!div class="checklist"]
-> * Authenticate with Batch and Storage accounts
-> * Upload input files to Storage
-> * Create a pool of compute nodes to run an application
-> * Create a job and tasks to process input files
-> * Monitor task execution
-> * Retrieve output files
-
-For more examples of using the Python API to schedule and process Batch workloads, see the samples on GitHub.
-
-> [!div class="nextstepaction"]
-> [Batch Python samples](https://github.com/Azure/azure-batch-samples/tree/master/Python/Batch)
-
+> * Authenticate with Batch and Storage accounts.
+> * Upload input files to Storage.
+> * Create a pool of compute nodes to run an application.
+> * Create a job and tasks to process input files.
+> * Monitor task execution.
+> * Retrieve output files.
+
+For more examples of using the Python API to schedule and process Batch workloads, see the [Batch Python samples](https://github.com/Azure/azure-batch-samples/tree/master/Python/Batch) on GitHub.
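
The `_BATCH_ACCOUNT_URL` placeholder changed in the config.py hunk follows the shape `https://<account>.<region>.batch.azure.com`. As a rough sketch, and not part of the sample repo, a reader could sanity-check an edited value before running the tutorial script. The helper names `build_batch_account_url` and `looks_like_batch_account_url` are hypothetical, and the check assumes the host suffix shown in the diff.

```python
from urllib.parse import urlparse


def build_batch_account_url(account_name: str, region: str) -> str:
    """Compose a URL in the shape shown in config.py (hypothetical helper)."""
    return f"https://{account_name}.{region}.batch.azure.com"


def looks_like_batch_account_url(url: str, account_name: str) -> bool:
    """Loosely check scheme, account prefix, and host suffix (assumed pattern)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return (
        parsed.scheme == "https"
        and host.startswith(account_name + ".")
        and host.endswith(".batch.azure.com")
    )


# Placeholder values taken from the diff; replace with your own account details.
url = build_batch_account_url("yourbatchaccount", "yourbatchregion")
print(url, looks_like_batch_account_url(url, "yourbatchaccount"))
```

This only checks the string shape. It does not confirm that the account exists or that the key in config.py is valid.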
