
Commit b291a63

Extra edits
1 parent 6eb43bc commit b291a63

File tree

1 file changed: 17 additions, 17 deletions

articles/machine-learning/how-to-train-distributed-gpu.md

Lines changed: 17 additions & 17 deletions
@@ -1,7 +1,7 @@
 ---
 title: Distributed GPU training guide (SDK v2)
 titleSuffix: Azure Machine Learning
-description: Learn best practices for performing distributed training with Azure Machine Learning SDK (v2) supported frameworks, such as MPI, Horovod, DeepSpeed, PyTorch, TensorFlow, and InfiniBand.
+description: Learn best practices for distributed training with supported frameworks, such as MPI, Horovod, DeepSpeed, PyTorch, TensorFlow, and InfiniBand.
 author: rtanase
 ms.author: ratanase
 ms.reviewer: sgilley
@@ -27,7 +27,7 @@ Learn more about using distributed GPU training code in Azure Machine Learning.
 
 ## Prerequisites
 
-Review the [basic concepts of distributed GPU training](concept-distributed-training.md) such as _data parallelism_, _distributed data parallelism_, and _model parallelism_.
+Review the basic concepts of [distributed GPU training](concept-distributed-training.md), such as *data parallelism*, *distributed data parallelism*, and *model parallelism*.
 
 > [!TIP]
 > If you don't know which type of parallelism to use, more than 90% of the time you should use **distributed data parallelism**.
@@ -63,16 +63,16 @@ Make sure your code follows these tips:
 
 ### Environment variables from Open MPI
 
-When running MPI jobs with Open MPI images, the following environment variables for each process launched:
+When running MPI jobs with Open MPI images, you can use the following environment variables for each process launched:
 
-1. `OMPI_COMM_WORLD_RANK`: the rank of the process
-2. `OMPI_COMM_WORLD_SIZE`: the world size
-3. `AZ_BATCH_MASTER_NODE`: the primary address with port, `MASTER_ADDR:MASTER_PORT`
-4. `OMPI_COMM_WORLD_LOCAL_RANK`: the local rank of the process on the node
-5. `OMPI_COMM_WORLD_LOCAL_SIZE`: the number of processes on the node
+1. `OMPI_COMM_WORLD_RANK`: The rank of the process
+2. `OMPI_COMM_WORLD_SIZE`: The world size
+3. `AZ_BATCH_MASTER_NODE`: The primary address with port, `MASTER_ADDR:MASTER_PORT`
+4. `OMPI_COMM_WORLD_LOCAL_RANK`: The local rank of the process on the node
+5. `OMPI_COMM_WORLD_LOCAL_SIZE`: The number of processes on the node
 
 > [!TIP]
-> Despite the name, environment variable `OMPI_COMM_WORLD_NODE_RANK` doesn't correspond to the `NODE_RANK`. To use per-node-launcher, set `process_count_per_node=1` and use `OMPI_COMM_WORLD_RANK` as the `NODE_RANK`.
+> Despite the name, the environment variable `OMPI_COMM_WORLD_NODE_RANK` doesn't correspond to the `NODE_RANK`. To use per-node-launcher, set `process_count_per_node=1` and use `OMPI_COMM_WORLD_RANK` as the `NODE_RANK`.
 
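For orientation, here is a minimal sketch (not part of this commit or the article) of how a training script might consume the Open MPI variables listed above, mapping them onto the `MASTER_ADDR`, `MASTER_PORT`, `RANK`, and `WORLD_SIZE` names that PyTorch's `env://` initialization reads; the variable names come from the list above, while the helper itself is purely illustrative.

```python
# Illustrative helper (hypothetical): translate the Open MPI variables that
# Azure Machine Learning sets for MPI jobs into the names PyTorch expects.
import os


def mpi_env_to_torch_env():
    rank = int(os.environ["OMPI_COMM_WORLD_RANK"])              # global rank
    world_size = int(os.environ["OMPI_COMM_WORLD_SIZE"])        # total number of processes
    local_rank = int(os.environ["OMPI_COMM_WORLD_LOCAL_RANK"])  # rank on this node

    # AZ_BATCH_MASTER_NODE has the form "MASTER_ADDR:MASTER_PORT".
    master_addr, master_port = os.environ["AZ_BATCH_MASTER_NODE"].split(":")

    os.environ["RANK"] = str(rank)
    os.environ["WORLD_SIZE"] = str(world_size)
    os.environ["LOCAL_RANK"] = str(local_rank)
    os.environ["MASTER_ADDR"] = master_addr
    os.environ["MASTER_PORT"] = master_port
    return rank, local_rank, world_size
```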
 ## PyTorch

@@ -93,10 +93,10 @@ The most common communication backends used are `mpi`, `nccl`, and `gloo`. For G
 
 `init_method` tells how each process can discover each other, how they initialize and verify the process group using the communication backend. By default, if `init_method` isn't specified, PyTorch uses the environment variable initialization method (`env://`). `init_method` is the recommended initialization method to use in your training code to run distributed PyTorch on Azure Machine Learning. PyTorch looks for the following environment variables for initialization:
 
-- **`MASTER_ADDR`**: IP address of the machine that hosts the process with rank 0.
-- **`MASTER_PORT`**: A free port on the machine that hosts the process with rank 0.
-- **`WORLD_SIZE`**: The total number of processes. Should be equal to the total number of devices (GPU) used for distributed training.
-- **`RANK`**: The (global) rank of the current process. The possible values are 0 to (world size - 1).
+- **`MASTER_ADDR`**: IP address of the machine that hosts the process with rank 0
+- **`MASTER_PORT`**: A free port on the machine that hosts the process with rank 0
+- **`WORLD_SIZE`**: The total number of processes. Should be equal to the total number of devices (GPU) used for distributed training
+- **`RANK`**: The (global) rank of the current process. The possible values are 0 to (world size - 1)
 
 For more information on process group initialization, see the [PyTorch documentation](https://pytorch.org/docs/stable/distributed.html#torch.distributed.init_process_group).
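As a minimal sketch of the training-script side, assuming the four variables above are already set (the next hunk notes that Azure Machine Learning sets them for PyTorch distribution jobs); `LOCAL_RANK` is an assumption about the launcher, not one of the variables listed above:

```python
# Minimal sketch: initialize the default process group with the env:// method,
# which reads MASTER_ADDR, MASTER_PORT, WORLD_SIZE, and RANK from the environment.
import os

import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl", init_method="env://")

# Pin this process to one GPU; LOCAL_RANK is commonly provided by per-process launchers.
local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)

print(f"rank {dist.get_rank()} of {dist.get_world_size()} initialized")
```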

@@ -119,22 +119,22 @@ Azure Machine Learning sets the `MASTER_ADDR`, `MASTER_PORT`, `WORLD_SIZE`, and
 
 ## DeepSpeed
 
-[DeepSpeed](https://www.deepspeed.ai/tutorials/azure/) is supported as a first-class citizen within Azure Machine Learning to run distributed jobs with near linear scalability in terms of:
+Azure Machine Learning supports [DeepSpeed](https://www.deepspeed.ai/tutorials/azure/) as a first-class citizen to run distributed jobs with near linear scalability in terms of:
 
 * Increase in model size
 * Increase in number of GPUs
 
-`DeepSpeed` can be enabled using either Pytorch distribution or MPI for running distributed training. Azure Machine Learning supports the `DeepSpeed` launcher to launch distributed training as well as autotuning to get optimal `ds` configuration.
+DeepSpeed can be enabled using either Pytorch distribution or MPI for running distributed training. Azure Machine Learning supports the DeepSpeed launcher to launch distributed training as well as autotuning to get optimal `ds` configuration.
 
-You can use a [curated environment](resource-curated-environments.md) for an out of the box environment with the latest state of art technologies including `DeepSpeed`, `ORT`, `MSSCCL`, and `Pytorch` for your DeepSpeed training jobs.
+You can use a [curated environment](resource-curated-environments.md) for an out of the box environment with the latest state of art technologies including DeepSpeed, ORT, MSSCCL, and Pytorch for your DeepSpeed training jobs.
 
 ### DeepSpeed example
 
 * For DeepSpeed training and autotuning examples, see [these folders](https://github.com/Azure/azureml-examples/tree/main/cli/jobs/deepspeed).
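As an illustrative fragment only, not taken from the folders linked above, a training script might hand its model to DeepSpeed roughly as follows; the model, batch, and `ds_config.json` file are placeholders:

```python
# Hypothetical DeepSpeed fragment; the model, data, and ds_config.json are placeholders.
import deepspeed
import torch

model = torch.nn.Linear(128, 10)  # stand-in for a real model

# deepspeed.initialize returns an engine that owns the distributed setup,
# optimizer, and mixed-precision behavior described in the DeepSpeed config.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",  # the "ds" configuration, possibly produced by autotuning
)

# A typical training step routes forward, backward, and step through the engine.
inputs = torch.randn(4, 128).to(model_engine.device)
loss = model_engine(inputs).sum()
model_engine.backward(loss)
model_engine.step()
```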

 ## TensorFlow
 
-If you're using [native distributed TensorFlow](https://www.tensorflow.org/guide/distributed_training) in your training code, such as TensorFlow 2.x's `tf.distribute.Strategy` API, you can launch the distributed job via Azure Machine Learning using `distribution` parameters or the `TensorFlowDistribution` object.
+If you use [native distributed TensorFlow](https://www.tensorflow.org/guide/distributed_training) in your training code, such as TensorFlow 2.x's `tf.distribute.Strategy` API, you can launch the distributed job via Azure Machine Learning using `distribution` parameters or the `TensorFlowDistribution` object.
 
 [!notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/tensorflow/mnist-distributed/tensorflow-mnist-distributed.ipynb?name=job)]
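The notebook reference above carries the job-submission example; as a minimal, hypothetical sketch of the training-script side, `tf.distribute.MultiWorkerMirroredStrategy` discovers the other workers through the standard `TF_CONFIG` environment variable (the model and data below are placeholders):

```python
# Minimal multi-worker sketch; the model and data are placeholders.
import tensorflow as tf

# MultiWorkerMirroredStrategy reads the standard TF_CONFIG environment variable
# to discover the other workers participating in the job.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    # Variables created inside the scope are mirrored across the workers.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

# Placeholder data; replace with the real training dataset.
x = tf.random.normal((256, 784))
y = tf.random.uniform((256,), maxval=10, dtype=tf.int64)
model.fit(tf.data.Dataset.from_tensor_slices((x, y)).batch(32), epochs=1)
```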
