Using DDP this way has a few advantages over ``torch.multiprocessing.spawn()`` (a short usage sketch follows the list):

1. All processes (including the main process) participate in training and have the updated state of the model and the Trainer.
2. No multiprocessing pickle errors.
3. Easily scales to multi-node training.
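
A minimal sketch of selecting this strategy (``MyModel`` is a stand-in for your own ``LightningModule``):

.. code-block:: python

    import lightning.pytorch as pl

    # The "ddp" strategy re-launches this script once per device and wires up
    # the process group; every process runs the same code end to end.
    trainer = pl.Trainer(accelerator="gpu", devices=4, strategy="ddp")
    trainer.fit(MyModel())  # MyModel is a placeholder LightningModule
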
|

It is NOT possible to use DDP in interactive environments like Jupyter Notebook, Google Colab, Kaggle, etc.
In these situations you should use ``ddp_notebook``.
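
For example, this sketch can run directly in a notebook cell (again assuming a ``LightningModule`` named ``MyModel``):

.. code-block:: python

    import lightning.pytorch as pl

    # "ddp_notebook" forks the interactive process instead of launching new
    # scripts, which is why it works in Jupyter/Colab/Kaggle.
    trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp_notebook")
    trainer.fit(MyModel())  # MyModel is a placeholder LightningModule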

----

Among the native distributed strategies, regular DDP (``strategy="ddp"``) is still recommended as the go-to strategy over Spawn and Fork/Notebook for its speed and stability, but it can only be used with scripts.
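
As a quick reference, each variant is selected with the ``strategy`` flag alone; a hedged sketch of the three choices:

.. code-block:: python

    import lightning.pytorch as pl

    # Script-based launch: fastest and most stable, but scripts only.
    pl.Trainer(accelerator="gpu", devices=4, strategy="ddp")

    # Spawn-based launch: starts workers via torch.multiprocessing.spawn().
    pl.Trainer(accelerator="gpu", devices=4, strategy="ddp_spawn")

    # Fork-based launch: usable from notebooks and other interactive shells.
    pl.Trainer(accelerator="gpu", devices=4, strategy="ddp_notebook")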

----

Comparison of DDP variants and tradeoffs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table:: DDP variants and their tradeoffs
   :widths: 40 20 20 20

DDP can also be used with 1 GPU, but there's no reason to do so other than debugging distributed-related issues.
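
For instance, a sketch of a single-process "distributed" run for debugging:

.. code-block:: python

    import lightning.pytorch as pl

    # Exercises the full DDP code path (process group, model wrapping) with a
    # world size of 1, which helps isolate distributed-only bugs.
    trainer = pl.Trainer(accelerator="gpu", devices=1, strategy="ddp")
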
----

TorchRun (TorchElastic)
-----------------------
Lightning supports the use of TorchRun (previously known as TorchElastic) to enable fault-tolerant and elastic distributed job scheduling.
To use it, specify the DDP strategy and the number of GPUs you want to use in the Trainer.
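
A minimal sketch (the file name ``train.py`` and the process count are assumptions):

.. code-block:: python

    # train.py
    # Launch with, for example: torchrun --nproc_per_node=4 train.py
    import lightning.pytorch as pl

    # TorchRun sets the rendezvous environment variables (RANK, WORLD_SIZE,
    # MASTER_ADDR, ...) that the DDP strategy picks up, so the Trainer is
    # configured exactly as in a normal script.
    trainer = pl.Trainer(accelerator="gpu", devices=4, strategy="ddp")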