Commit cca6d2c

added single gpu train doc
1 parent 73b50ab commit cca6d2c

1 file changed: +13 -0 lines changed


docs/Trainer/Distributed training.md

Lines changed: 13 additions & 0 deletions
@@ -23,6 +23,19 @@ have configuration issues depending on your cluster.

For a deeper understanding of what Lightning is doing, feel free to read [this guide](https://medium.com/@_willfalcon/9-tips-for-training-lightning-fast-neural-networks-in-pytorch-8e63a502f565).

---
#### Distributed and 16-bit precision
Due to an issue with apex and DataParallel (a PyTorch/NVIDIA issue), Lightning does not allow 16-bit and DP training. We tried to get this to work, but it's an issue on their end.

| 1 GPU | 1+ GPUs | DP | DDP | 16-bit | command |
|---|---|---|---|---|---|
| Y | | | | Y | ```Trainer(gpus=[0])``` |
| | Y | Y | | | ```Trainer(gpus=[0, ...])``` |
| | Y | | Y | | ```Trainer(gpus=[0, ...], distributed_backend='ddp')``` |
| | Y | | Y | Y | ```Trainer(gpus=[0, ...], distributed_backend='ddp', use_amp=True)``` |
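For instance, the last table row maps to roughly the following script (a minimal sketch: `MyModel` stands in for your own LightningModule, and the `Trainer` import path may differ across Lightning versions):

```python
# Minimal sketch of multi-GPU 16-bit training, per the last table row.
# Assumptions: this import path matches your installed Lightning version,
# and MyModel is a hypothetical LightningModule you have defined.
from pytorch_lightning import Trainer

model = MyModel()

# DDP across GPUs 0 and 1 with 16-bit (apex AMP) enabled
trainer = Trainer(gpus=[0, 1], distributed_backend='ddp', use_amp=True)
trainer.fit(model)
```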
---
#### CUDA flags
CUDA flags make certain GPUs visible to your script.
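For example (an illustrative sketch, not part of this diff; the device IDs are placeholders), the standard CUDA environment variables restrict which GPUs the script can see:

```python
# Illustrative sketch: expose only GPUs 0 and 1 to this script.
# Set these before any CUDA context is created (typically before
# importing torch), otherwise they have no effect.
import os

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # order devices by PCI bus ID
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"      # make only GPUs 0 and 1 visible
```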
