Commit b87cf7d

definition tweaks
1 parent a0209f3 commit b87cf7d

File tree

articles/machine-learning/concept-distributed-training.md

1 file changed: +5 −5 lines changed

articles/machine-learning/concept-distributed-training.md

Lines changed: 5 additions & 5 deletions
@@ -1,14 +1,14 @@
 ---
 title: What is distributed training?
 titleSuffix: Azure Machine Learning
-description:
+description: Distributed training refers to the ability to accelerate model training by sharing and parallelizing data loads and training tasks across multiple GPUs.
 services: machine-learning
 ms.service: machine-learning
 author: nibaccam
 ms.author: nibaccam
 ms.subservice: core
 ms.topic: conceptual
-ms.date: 03/24/2020
+ms.date: 03/27/2020
 ---

 # Distributed training with Azure Machine Learning
@@ -36,13 +36,13 @@ There are two main types of distributed training: **data parallelism** and **mod
 
 ### Data parallelism
 
-In data parallelism, the data is divided into partitions, where the number of partitions is equal to the total number of available nodes, in the compute cluster. The model is copied in each of these worker nodes, and each worker operates on its own subset of the data. Keep in mind that the model has to entirely fit on each node, that is each node has to have the capacity to support the model that's being trained.
+In data parallelism, the data is divided into partitions, where the number of partitions is equal to the total number of available nodes in the compute cluster. The model is copied in each of these worker nodes, and each worker operates on its own subset of the data. Keep in mind that each node has to have the capacity to support the model that's being trained, that is, the model has to fit entirely on each node.
 
-Each node independently computes the errors between its predictions for its training samples and the labeled outputs. This means that the worker nodes need to synchronize the model parameters, or gradients, at the end of the batch computation to ensure they are training a consistent model. In turn, each node must communicate all of its changes to the other nodes to update all of the models.
+Each node independently computes the errors between its predictions for its training samples and the labeled outputs. In turn, each node updates its model based on the errors and must communicate all of its changes to the other nodes to update their corresponding models. This means that the worker nodes need to synchronize the model parameters, or gradients, at the end of the batch computation to ensure they are training a consistent model.
 
 ### Model parallelism
 
-In model parallelism, also known as network parallelism, the model is segmented into different parts that can run concurrently, and each one will run on the same data in different nodes. The scalability of this method depends on the degree of task parallelization of the algorithm, and it is more complex to implement than data parallelism.
+In model parallelism, also known as network parallelism, the model is segmented into different parts that can run concurrently in different nodes, and each one runs on the same data. The scalability of this method depends on the degree of task parallelization of the algorithm, and it is more complex to implement than data parallelism.
 
 In model parallelism, worker nodes only need to synchronize the shared parameters, usually once for each forward or backward-propagation step. Also, larger models aren't a concern since each node operates on a subsection of the model on the same training data.
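To make the data-parallel pattern described in the updated paragraphs concrete, here is a minimal sketch using PyTorch's `DistributedDataParallel` (an illustrative choice, not part of this commit; the toy `nn.Linear` model, the `gloo` backend, and the `torchrun` launcher are all assumptions):

```python
# Minimal data-parallel training sketch (illustrative, not from the commit).
# Launch with: torchrun --nproc_per_node=2 train.py
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Each process drives one worker; torchrun sets RANK/WORLD_SIZE env vars.
    dist.init_process_group(backend="gloo")  # use "nccl" on GPU nodes
    rank = dist.get_rank()

    # Every worker holds a full copy of the model (it must fit on each node).
    model = nn.Linear(10, 1)
    ddp_model = DDP(model)
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    # Stand-in for this worker's own partition of the training data.
    inputs = torch.randn(32, 10)
    labels = torch.randn(32, 1)

    optimizer.zero_grad()
    loss = loss_fn(ddp_model(inputs), labels)
    # backward() also all-reduces (averages) gradients across workers,
    # so every replica applies the same update and stays consistent.
    loss.backward()
    optimizer.step()

    if rank == 0:
        print(f"loss: {loss.item():.4f}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Each process holds a full model replica and trains on its own data partition; the all-reduce inside `backward()` is the gradient-synchronization step the paragraph describes.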
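And a comparable sketch of model parallelism, splitting a toy network across two devices so each one holds only part of the model while both see the same batch (again an illustrative PyTorch example, not from the commit; it falls back to a single CPU device when two GPUs aren't present):

```python
# Minimal model-parallel sketch (illustrative): the model is split across
# two devices, and both halves process the same training data.
import torch
import torch.nn as nn

class TwoStageModel(nn.Module):
    def __init__(self, dev0, dev1):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        # First half of the network lives on one device...
        self.stage1 = nn.Sequential(nn.Linear(10, 64), nn.ReLU()).to(dev0)
        # ...second half on another, so neither device needs the whole model.
        self.stage2 = nn.Linear(64, 1).to(dev1)

    def forward(self, x):
        x = self.stage1(x.to(self.dev0))
        # Only the boundary activations cross between devices.
        return self.stage2(x.to(self.dev1))

# Use two GPUs when available; otherwise run both stages on the CPU.
if torch.cuda.device_count() >= 2:
    dev0, dev1 = "cuda:0", "cuda:1"
else:
    dev0 = dev1 = "cpu"

model = TwoStageModel(dev0, dev1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

inputs, labels = torch.randn(32, 10), torch.randn(32, 1)
loss = loss_fn(model(inputs), labels.to(dev1))
loss.backward()   # gradients flow back across the device boundary
optimizer.step()
```

Only the activations at the stage boundary (and their gradients on the backward pass) move between devices, which is why each node can hold just a subsection of the model.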
