Commit 1904b5e

Update articles/machine-learning/how-to-train-distributed-gpu.md
1 parent 129ee60 commit 1904b5e

File tree

1 file changed: +1 −1 lines changed

articles/machine-learning/how-to-train-distributed-gpu.md

Lines changed: 1 addition & 1 deletion

@@ -258,7 +258,7 @@ run = Experiment(ws, 'experiment_name').submit(run_config)
 For single-node training (including single-node multi-GPU), you can run your code on Azure ML without needing to specify a `distributed_job_config`.
 To run an experiment using multiple nodes with multiple GPUs, there are 2 options:
 
-- Using PyTorch configuration (recommended): Define `PyTorchConfiguration` and specify `communication_backend="Nccl"`, `node_count`, and `process_count` (note that this is the total number of processes, ie, `num_nodes * process_count_per_node`). In Lightning Trainer module, specify both `num_nodes` and `gpus` to be consistent with `PyTorchConfiguration`, ie, `num_nodes = node_count` and `gpus = process_count_per_node`.
+- Using PyTorch configuration (recommended): Define `PyTorchConfiguration` and specify `communication_backend="Nccl"`, `node_count`, and `process_count` (note that this is the total number of processes, i.e., `num_nodes * process_count_per_node`). In the Lightning `Trainer` module, specify both `num_nodes` and `gpus` to be consistent with `PyTorchConfiguration`. For example, set `num_nodes = node_count` and `gpus = process_count_per_node`.
 
 - Using MPI Configuration:
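The recommended option in the diff above can be sketched as follows. This is a minimal illustration, not code from the article: it assumes the Azure ML SDK v1 (`azureml-core`) with an already-configured workspace, and the directory, script, and compute-target names (`./src`, `train.py`, `gpu-cluster`) are hypothetical placeholders.

```python
# Sketch of the "PyTorch configuration (recommended)" option, assuming
# Azure ML SDK v1. Cluster size values are illustrative.
from azureml.core import Experiment, ScriptRunConfig, Workspace
from azureml.core.runconfig import PyTorchConfiguration

node_count = 2               # number of nodes in the cluster
process_count_per_node = 4   # typically one process per GPU on each node

# process_count is the TOTAL number of processes across all nodes,
# i.e. node_count * process_count_per_node.
distr_config = PyTorchConfiguration(
    communication_backend="Nccl",
    node_count=node_count,
    process_count=node_count * process_count_per_node,
)

ws = Workspace.from_config()  # requires a local Azure ML workspace config
run_config = ScriptRunConfig(
    source_directory="./src",            # hypothetical source directory
    script="train.py",                   # hypothetical training script
    compute_target="gpu-cluster",        # hypothetical compute target name
    distributed_job_config=distr_config,
)
run = Experiment(ws, "experiment_name").submit(run_config)
```

Inside `train.py`, the Lightning `Trainer` would then mirror those values, e.g. `pl.Trainer(num_nodes=2, gpus=4)`, keeping `num_nodes = node_count` and `gpus = process_count_per_node` consistent with the `PyTorchConfiguration`.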