---
title: What is distributed training?
titleSuffix: Azure Machine Learning
description: Learn about distributed training and how Azure Machine Learning supports it.
services: machine-learning
ms.service: machine-learning
author: nibaccam
ms.date: 03/27/2020
---
# Distributed training with Azure Machine Learning
In this article, you learn about distributed training and how Azure Machine Learning supports it.
In distributed training, the workload to train a model is split up and shared among multiple mini processors, called worker nodes. These worker nodes work in parallel to speed up model training. Distributed training is well suited for compute-intensive and time-intensive tasks, like [deep learning](concept-deep-learning-vs-machine-learning.md) for training deep neural networks.
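To make the idea concrete, here is a minimal, framework-free sketch of splitting a workload into shards and processing them on parallel workers. The `train_on_shard` function and the thread pool are illustrative stand-ins, not Azure Machine Learning APIs; real worker nodes are separate machines or GPUs.

```python
from concurrent.futures import ThreadPoolExecutor

def train_on_shard(shard):
    # Stand-in for the training work one worker node performs on
    # its share of the data; here it just sums its shard.
    return sum(shard)

data = list(range(100))
num_workers = 4

# Split the workload into one shard per worker node.
shards = [data[i::num_workers] for i in range(num_workers)]

# The workers process their shards in parallel.
with ThreadPoolExecutor(max_workers=num_workers) as pool:
    partial_results = list(pool.map(train_on_shard, shards))

# Combine the per-worker results into a single outcome.
total = sum(partial_results)
```

The speedup comes from each worker handling only `1/num_workers` of the data; the final combining step is what distributed training frameworks implement efficiently across machines.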
## Deep learning and distributed training
There are two main types of distributed training: [data parallelism](#data-parallelism) and [model parallelism](#model-parallelism). For distributed training, the [Azure Machine Learning SDK in Python](https://docs.microsoft.com/python/api/overview/azure/ml/intro?view=azure-ml-py) supports integrations with the popular deep learning frameworks PyTorch and TensorFlow. Both frameworks employ data parallelism for distributed training and leverage [Horovod](https://horovod.readthedocs.io/en/latest/summary_include.html) to optimize compute speeds.
* [Distributed training with PyTorch](how-to-train-pytorch.md#distributed-training)
* [Distributed training with TensorFlow](how-to-train-tensorflow.md#distributed-training)
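The data-parallel, gradient-averaging pattern that Horovod implements with an allreduce can be sketched in plain Python. This is a toy illustration, not Horovod's API: each "worker" computes a gradient on its own data shard for a hypothetical one-parameter model fitting `y = 3x`, and averaging the gradients keeps every model replica in sync.

```python
def local_gradient(w, xs, ys):
    # Gradient of the mean squared error d/dw mean((w*x - y)^2)
    # computed on one worker's shard of the data.
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

# Toy dataset for the target relationship y = 3x.
xs = [float(i) for i in range(1, 9)]
ys = [3.0 * x for x in xs]

# Shard the data across the workers (data parallelism).
num_workers = 4
x_shards = [xs[i::num_workers] for i in range(num_workers)]
y_shards = [ys[i::num_workers] for i in range(num_workers)]

w = 0.0    # every replica starts from the same weights
lr = 0.01
for _ in range(200):
    # Each worker computes a gradient on its own shard...
    grads = [local_gradient(w, xsh, ysh)
             for xsh, ysh in zip(x_shards, y_shards)]
    # ...then an allreduce-style average gives every replica the
    # same gradient, so all model copies stay identical.
    avg_grad = sum(grads) / num_workers
    w -= lr * avg_grad
```

After training, `w` converges to roughly `3.0`. In a real cluster, the averaging step is the communication-heavy part, which is why Horovod optimizes it with ring-allreduce across GPUs.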
For training traditional ML models, see [train models with Azure Machine Learning](concept-train-machine-learning-model.md#python-sdk) for the different ways to train models using the Python SDK.