Commit fdcc09c

Update old device flags (#12471)
1 parent 1c50ff7 commit fdcc09c

14 files changed, with 58 additions and 52 deletions.

docs/source/accelerators/gpu.rst

Lines changed: 5 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -244,8 +244,8 @@ The table below lists examples of possible input formats and how they are interp
244244

245245
.. note::
246246

247-
When specifying number of gpus as an integer ``devices=k``, setting the trainer flag
248-
``auto_select_gpus=True`` will automatically help you find ``k`` gpus that are not
247+
When specifying number of ``devices`` as an integer ``devices=k``, setting the trainer flag
248+
``auto_select_gpus=True`` will automatically help you find ``k`` GPUs that are not
249249
occupied by other processes. This is especially useful when GPUs are configured
250250
to be in "exclusive mode", such that only one process at a time can access them.
251251
For more details see the :doc:`trainer guide <../common/trainer>`.
@@ -295,7 +295,7 @@ For a deeper understanding of what Lightning is doing, feel free to read this
295295
Data Parallel
296296
^^^^^^^^^^^^^
297297
:class:`~torch.nn.DataParallel` (DP) splits a batch across k GPUs.
298-
That is, if you have a batch of 32 and use DP with 2 gpus, each GPU will process 16 samples,
298+
That is, if you have a batch of 32 and use DP with 2 GPUs, each GPU will process 16 samples,
299299
after which the root node will aggregate the results.
300300

301301
.. warning:: DP use is discouraged by PyTorch and Lightning. State is not maintained on the replicas created by the
@@ -749,7 +749,7 @@ Let's say you have a batch size of 7 in your dataloader.
749749
def train_dataloader(self):
750750
return Dataset(..., batch_size=7)
751751

752-
In DDP, DDP_SPAWN, Deepspeed, DDP_SHARDED, or Horovod your effective batch size will be 7 * gpus * num_nodes.
752+
In DDP, DDP_SPAWN, Deepspeed, DDP_SHARDED, or Horovod your effective batch size will be 7 * devices * num_nodes.
753753

754754
.. code-block:: python
755755
@@ -786,7 +786,7 @@ The reason is that the full batch is visible to all GPUs on the node when using
786786

787787
Torch Distributed Elastic
788788
-------------------------
789-
Lightning supports the use of Torch Distributed Elastic to enable fault-tolerant and elastic distributed job scheduling. To use it, specify the 'ddp' or 'ddp2' backend and the number of gpus you want to use in the trainer.
789+
Lightning supports the use of Torch Distributed Elastic to enable fault-tolerant and elastic distributed job scheduling. To use it, specify the 'ddp' or 'ddp2' backend and the number of GPUs you want to use in the trainer.
790790

791791
.. code-block:: python
792792

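The effective batch size statement in the hunk above (`7 * devices * num_nodes`) is plain multiplication, which can be sketched as follows. The helper name `effective_batch_size` is hypothetical and is not part of Lightning; it only mirrors the arithmetic the docs describe for DDP-style strategies.

```python
def effective_batch_size(per_device_batch_size: int, devices: int, num_nodes: int = 1) -> int:
    """Samples consumed per step when every process loads its own batch.

    Hypothetical helper for illustration only; it mirrors the formula
    batch_size * devices * num_nodes quoted in the docs.
    """
    if min(per_device_batch_size, devices, num_nodes) < 1:
        raise ValueError("all arguments must be positive integers")
    return per_device_batch_size * devices * num_nodes


# Dataloader batch size of 7, 2 devices per node, 3 nodes: 7 * 2 * 3
print(effective_batch_size(7, devices=2, num_nodes=3))
```

With a single node the `num_nodes` factor drops out, so a batch size of 7 on 8 devices gives 56 samples per step.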
docs/source/accelerators/ipu.rst

Lines changed: 11 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -34,7 +34,7 @@ Specify the number of IPUs to train with. Note that when training with IPUs, you
3434

3535
.. code-block:: python
3636
37-
trainer = pl.Trainer(ipus=8) # Train using data parallel on 8 IPUs
37+
trainer = pl.Trainer(accelerator="ipu", devices=8) # Train using data parallel on 8 IPUs
3838
3939
IPUs only support specifying a single number to allocate devices, which is handled via the underlying libraries.
4040

@@ -53,7 +53,7 @@ set the precision flag.
5353
import pytorch_lightning as pl
5454
5555
model = MyLightningModule()
56-
trainer = pl.Trainer(ipus=8, precision=16)
56+
trainer = pl.Trainer(accelerator="ipu", devices=8, precision=16)
5757
trainer.fit(model)
5858
5959
You can also use pure 16-bit training, where the weights are also in 16-bit precision.
@@ -65,7 +65,7 @@ You can also use pure 16-bit training, where the weights are also in 16-bit prec
6565
6666
model = MyLightningModule()
6767
model = model.half()
68-
trainer = pl.Trainer(ipus=8, precision=16)
68+
trainer = pl.Trainer(accelerator="ipu", devices=8, precision=16)
6969
trainer.fit(model)
7070
7171
Advanced IPU options
@@ -83,7 +83,7 @@ IPUs provide further optimizations to speed up training. By using the ``IPUStrat
8383
from pytorch_lightning.strategies import IPUStrategy
8484
8585
model = MyLightningModule()
86-
trainer = pl.Trainer(ipus=8, strategy=IPUStrategy(device_iterations=32))
86+
trainer = pl.Trainer(accelerator="ipu", devices=8, strategy=IPUStrategy(device_iterations=32))
8787
trainer.fit(model)
8888
8989
Note that by default we return the last device iteration loss. You can override this by passing in your own ``poptorch.Options`` and setting the AnchorMode as described in the `PopTorch documentation <https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/reference.html#poptorch.Options.anchorMode>`__.
@@ -102,7 +102,9 @@ Note that by default we return the last device iteration loss. You can override
102102
training_opts.anchorMode(poptorch.AnchorMode.All)
103103
training_opts.deviceIterations(32)
104104
105-
trainer = Trainer(ipus=8, strategy=IPUStrategy(inference_opts=inference_opts, training_opts=training_opts))
105+
trainer = Trainer(
106+
accelerator="ipu", devices=8, strategy=IPUStrategy(inference_opts=inference_opts, training_opts=training_opts)
107+
)
106108
trainer.fit(model)
107109
108110
You can also override all options by passing the ``poptorch.Options`` to the plugin. See `PopTorch options documentation <https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/batching.html>`__ for more information.
@@ -124,7 +126,7 @@ Lightning supports dumping all reports to a directory to open using the tool.
124126
from pytorch_lightning.strategies import IPUStrategy
125127
126128
model = MyLightningModule()
127-
trainer = pl.Trainer(ipus=8, strategy=IPUStrategy(autoreport_dir="report_dir/"))
129+
trainer = pl.Trainer(accelerator="ipu", devices=8, strategy=IPUStrategy(autoreport_dir="report_dir/"))
128130
trainer.fit(model)
129131
130132
This will dump all reports to ``report_dir/`` which can then be opened using the Graph Analyser Tool, see `Opening Reports <https://docs.graphcore.ai/projects/graph-analyser-userguide/en/latest/graph-analyser.html#opening-reports>`__.
@@ -142,7 +144,7 @@ Below is an example using the block annotation in a LightningModule.
142144

143145
Currently, when using model parallelism we do not infer the number of IPUs required for you. This is done via the annotations themselves. If you specify 4 different IDs when defining Blocks, this means your model will be split onto 4 different IPUs.
144146

145-
This is also mutually exclusive with the Trainer flag. In other words, if your model is split onto 2 IPUs and you set ``Trainer(ipus=4)`` this will require 8 IPUs in total: data parallelism will be used to replicate the two-IPU model 4 times.
147+
This is also mutually exclusive with the Trainer flag. In other words, if your model is split onto 2 IPUs and you set ``Trainer(accelerator="ipu", devices=4)`` this will require 8 IPUs in total: data parallelism will be used to replicate the two-IPU model 4 times.
146148

147149
When pipelining the model you must also increase the `device_iterations` to ensure full data saturation of the devices data, i.e whilst one device in the model pipeline processes a batch of data, the other device can start on the next batch. For example if the model is split onto 4 IPUs, we require `device_iterations` to be at-least 4.
148150

@@ -174,7 +176,7 @@ Below is an example using the block annotation in a LightningModule.
174176
175177
176178
model = MyLightningModule()
177-
trainer = pl.Trainer(ipus=8, strategy=IPUStrategy(device_iterations=20))
179+
trainer = pl.Trainer(accelerator="ipu", devices=8, strategy=IPUStrategy(device_iterations=20))
178180
trainer.fit(model)
179181
180182
@@ -217,7 +219,7 @@ You can also use the block context manager within the forward function, or any o
217219
218220
219221
model = MyLightningModule()
220-
trainer = pl.Trainer(ipus=8, strategy=IPUStrategy(device_iterations=20))
222+
trainer = pl.Trainer(accelerator="ipu", devices=8, strategy=IPUStrategy(device_iterations=20))
221223
trainer.fit(model)
222224
223225

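The IPU hunks above combine two rules: the total IPU count is the number of IPUs the model is split across times the `devices` value, and `device_iterations` should be at least the number of pipeline stages. A minimal sketch of that arithmetic follows; both function names are hypothetical and neither is part of Lightning or PopTorch.

```python
def total_ipus(ipus_per_replica: int, devices: int) -> int:
    """IPUs needed when a model split over `ipus_per_replica` IPUs
    is replicated `devices` times with data parallelism."""
    return ipus_per_replica * devices


def min_device_iterations(pipeline_stages: int) -> int:
    """Smallest device_iterations that keeps every pipeline stage busy,
    following the 'at least the number of IPUs in the split' rule."""
    return pipeline_stages


# Model split onto 2 IPUs, Trainer(accelerator="ipu", devices=4)
print(total_ipus(ipus_per_replica=2, devices=4))
# Model split onto 4 IPUs
print(min_device_iterations(pipeline_stages=4))
```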
docs/source/accelerators/tpu.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -127,7 +127,7 @@ TPU core training
127127

128128
Lightning supports training on a single TPU core or 8 TPU cores.
129129

130-
The Trainer parameters ``tpu_cores`` defines how many TPU cores to train on (1 or 8) / Single TPU to train on [1].
130+
The Trainer parameters ``devices`` along with ``accelerator="tpu"`` defines how many TPU cores to train on (1 or 8) / Single TPU to train on [1].
131131

132132
For Single TPU training, Just pass the TPU core ID [1-8] in a list.
133133

docs/source/advanced/model_parallel.rst

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -732,7 +732,8 @@ When enabled, it can result in a performance hit and can be disabled in most cas
732732
from pytorch_lightning.strategies import DDPStrategy
733733
734734
trainer = pl.Trainer(
735-
gpus=2,
735+
accelerator="gpu",
736+
devices=2,
736737
strategy=DDPStrategy(find_unused_parameters=False),
737738
)
738739
@@ -741,7 +742,8 @@ When enabled, it can result in a performance hit and can be disabled in most cas
741742
from pytorch_lightning.strategies import DDPSpawnStrategy
742743
743744
trainer = pl.Trainer(
744-
gpus=2,
745+
accelerator="gpu",
746+
devices=2,
745747
strategy=DDPSpawnStrategy(find_unused_parameters=False),
746748
)
747749
@@ -894,7 +896,8 @@ When using Post-localSGD, you must also pass ``model_averaging_period`` to allow
894896
895897
model = MyModel()
896898
trainer = Trainer(
897-
gpus=4,
899+
accelerator="gpu",
900+
devices=4,
898901
strategy=DDPStrategy(
899902
ddp_comm_state=post_localSGD.PostLocalSGDState(
900903
process_group=None,

docs/source/clouds/cloud_training.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -32,7 +32,7 @@ You can launch any Lightning model on Grid using the Grid `CLI <https://pypi.org
3232

3333
.. code-block:: bash
3434
35-
grid run --instance_type v100 --gpus 4 my_model.py --gpus 4 --learning_rate 'uniform(1e-6, 1e-1, 20)' --layers '[2, 4, 8, 16]'
35+
grid run --instance_type v100 --gpus 4 my_model.py --accelerator 'gpu' --devices 4 --learning_rate 'uniform(1e-6, 1e-1, 20)' --layers '[2, 4, 8, 16]'
3636
3737
You can also start runs or interactive sessions from the `Grid platform <https://platform.grid.ai>`_, where you can upload datasets, view artifacts, view the logs, the cost, log into tensorboard, and so much more.
3838

docs/source/common/trainer.rst

Lines changed: 9 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -217,7 +217,7 @@ as well as custom accelerator instances.
217217
# CPU accelerator
218218
trainer = Trainer(accelerator="cpu")
219219
220-
# Training with GPU Accelerator using 2 gpus
220+
# Training with GPU Accelerator using 2 GPUs
221221
trainer = Trainer(devices=2, accelerator="gpu")
222222
223223
# Training with TPU Accelerator using 8 tpu cores
@@ -350,16 +350,16 @@ auto_select_gpus
350350

351351
|
352352
353-
If enabled and `gpus` is an integer, pick available gpus automatically.
353+
If enabled and ``devices`` is an integer, pick available GPUs automatically.
354354
This is especially useful when GPUs are configured to be in "exclusive mode",
355355
such that only one process at a time can access them.
356356

357357
Example::
358358

359-
# no auto selection (picks first 2 gpus on system, may fail if other process is occupying)
359+
# no auto selection (picks first 2 GPUs on system, may fail if other process is occupying)
360360
trainer = Trainer(accelerator="gpu", devices=2, auto_select_gpus=False)
361361

362-
# enable auto selection (will find two available gpus on system)
362+
# enable auto selection (will find two available GPUs on system)
363363
trainer = Trainer(accelerator="gpu", devices=2, auto_select_gpus=True)
364364

365365
# specifies all GPUs regardless of its availability
@@ -696,8 +696,8 @@ See Also:
696696
gpus
697697
^^^^
698698

699-
.. warning:: Setting `Trainer(gpus=x)` is deprecated in v1.6 and will be removed"
700-
in v2.0. Please use `Trainer(accelerator='gpu', devices=x)` instead.
699+
.. warning:: Setting `Trainer(gpus=x)` is deprecated in v1.6 and will be removed
700+
in v2.0. Please use `Trainer(accelerator="gpu", devices=x)` instead.
701701

702702
.. raw:: html
703703

@@ -1189,7 +1189,7 @@ Half precision, or mixed precision, is the combined use of 32 and 16 bit floatin
11891189
trainer = Trainer(precision=32)
11901190

11911191
# 16-bit precision
1192-
trainer = Trainer(precision=16, gpus=1) # works only on CUDA
1192+
trainer = Trainer(precision=16, accelerator="gpu", devices=1) # works only on CUDA
11931193

11941194
# bfloat16 precision
11951195
trainer = Trainer(precision="bf16")
@@ -1214,7 +1214,7 @@ Half precision, or mixed precision, is the combined use of 32 and 16 bit floatin
12141214
:skipif: not _APEX_AVAILABLE or not torch.cuda.is_available()
12151215

12161216
# turn on 16-bit
1217-
trainer = Trainer(amp_backend="apex", amp_level="O2", precision=16, gpus=1)
1217+
trainer = Trainer(amp_backend="apex", amp_level="O2", precision=16, accelerator="gpu", devices=1)
12181218

12191219

12201220
process_position
@@ -1412,7 +1412,7 @@ Supports passing different training strategies with aliases (ddp, ddp_spawn, etc
14121412

14131413
.. code-block:: python
14141414
1415-
# Training with the DistributedDataParallel strategy on 4 gpus
1415+
# Training with the DistributedDataParallel strategy on 4 GPUs
14161416
trainer = Trainer(strategy="ddp", accelerator="gpu", devices=4)
14171417
14181418
# Training with the DDP Spawn strategy using 4 cpu processes

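Every Trainer change in this commit follows one pattern: a device-specific flag (`gpus`, `ipus`, `tpu_cores`, `num_processes`) becomes `accelerator=<name>` plus `devices=<value>`. The sketch below applies that pattern to a plain dictionary of keyword arguments. The helper `migrate_trainer_kwargs` is hypothetical and not a Lightning API, and its strictness (refusing to mix old and new flags) is an assumption made for this example rather than documented Lightning behaviour.

```python
from typing import Any, Dict

# Deprecated Trainer flag -> accelerator name, as substituted throughout this commit.
_LEGACY_FLAGS = {"gpus": "gpu", "ipus": "ipu", "tpu_cores": "tpu", "num_processes": "cpu"}


def migrate_trainer_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with a single legacy device flag rewritten.

    Hypothetical helper: values are passed through unchanged, so special
    cases of the old flags (for example gpus=0) are not interpreted.
    """
    rest = {k: v for k, v in kwargs.items() if k not in _LEGACY_FLAGS}
    legacy = {k: v for k, v in kwargs.items() if k in _LEGACY_FLAGS and v is not None}
    if not legacy:
        return rest
    if len(legacy) > 1:
        raise ValueError(f"more than one legacy device flag set: {sorted(legacy)}")
    if "accelerator" in rest or "devices" in rest:
        raise ValueError("cannot mix a legacy flag with accelerator/devices")
    ((name, value),) = legacy.items()
    return {**rest, "accelerator": _LEGACY_FLAGS[name], "devices": value}


print(migrate_trainer_kwargs({"precision": 16, "gpus": 1}))
print(migrate_trainer_kwargs({"num_processes": 8, "num_nodes": 128}))
```

The returned dictionary can then be unpacked into the constructor, for example `Trainer(**migrate_trainer_kwargs(old_kwargs))`.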
docs/source/guides/speed.rst

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -37,10 +37,10 @@ Lightning supports a variety of plugins to speed up distributed GPU training. Mo
3737
# run on 1 gpu
3838
trainer = Trainer(accelerator="gpu", devices=1)
3939
40-
# train on 8 gpus, using the DDP strategy
40+
# train on 8 GPUs, using the DDP strategy
4141
trainer = Trainer(accelerator="gpu", devices=8, strategy="ddp")
4242
43-
# train on multiple GPUs across nodes (uses 8 gpus in total)
43+
# train on multiple GPUs across nodes (uses 8 GPUs in total)
4444
trainer = Trainer(accelerator="gpu", devices=2, num_nodes=4)
4545
4646
@@ -140,7 +140,7 @@ This is a limitation of Python ``.spawn()`` and PyTorch.
140140
TPU Training
141141
============
142142

143-
You can set the ``tpu_cores`` trainer flag to 1, [7] (specific core) or eight cores.
143+
You can set the ``devices`` trainer argument to 1, [7] (specific core) or eight cores.
144144

145145
.. code-block:: python
146146
@@ -214,7 +214,7 @@ Lightning offers mixed precision training for GPUs and CPUs, as well as bfloat16
214214
:skipif: torch.cuda.device_count() < 4
215215

216216
# 16-bit precision
217-
trainer = Trainer(precision=16, gpus=4)
217+
trainer = Trainer(precision=16, accelerator="gpu", devices=4)
218218

219219

220220
Read more about :ref:`mixed-precision training <amp>`.
@@ -361,7 +361,7 @@ Here is an example of an advanced use case:
361361

362362
.. testcode::
363363

364-
# Scenario for a GAN with gradient accumulation every two batches and optimized for multiple gpus.
364+
# Scenario for a GAN with gradient accumulation every two batches and optimized for multiple GPUs.
365365
class SimpleGAN(LightningModule):
366366
def __init__(self):
367367
super().__init__()

docs/source/starter/introduction.rst

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -390,10 +390,10 @@ CPU
390390
trainer = Trainer()
391391

392392
# train on 8 CPUs
393-
trainer = Trainer(num_processes=8)
393+
trainer = Trainer(accelerator="cpu", devices=8)
394394

395395
# train on 1024 CPUs across 128 machines
396-
trainer = pl.Trainer(num_processes=8, num_nodes=128)
396+
trainer = pl.Trainer(accelerator="cpu", devices=8, num_nodes=128)
397397

398398
GPU
399399
---
@@ -403,10 +403,10 @@ GPU
403403
# train on 1 GPU
404404
trainer = pl.Trainer(accelerator="gpu", devices=1)
405405
406-
# train on multiple GPUs across nodes (32 gpus here)
406+
# train on multiple GPUs across nodes (32 GPUs here)
407407
trainer = pl.Trainer(accelerator="gpu", devices=4, num_nodes=8)
408408
409-
# train on gpu 1, 3, 5 (3 gpus total)
409+
# train on gpu 1, 3, 5 (3 GPUs total)
410410
trainer = pl.Trainer(accelerator="gpu", devices=[1, 3, 5])
411411
412412
# Multi GPU with mixed precision
@@ -437,7 +437,7 @@ IPU
437437
.. code-block:: python
438438
439439
# Train on IPUs
440-
trainer = pl.Trainer(ipus=8)
440+
trainer = pl.Trainer(accelerator="ipu", devices=8)
441441
442442
443443
Checkpointing

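The comments in the hunks above quote device totals such as "32 GPUs", "1024 CPUs", and "3 GPUs total". These follow from `devices` per node times `num_nodes`, where a list of indices counts as its length. A minimal sketch, assuming only the integer and list forms of `devices` (other accepted formats are not handled); `total_devices` is a hypothetical name, not a Lightning function.

```python
from typing import Sequence, Union


def total_devices(devices: Union[int, Sequence[int]], num_nodes: int = 1) -> int:
    """Total number of devices across all nodes.

    An integer is a per-node count; a list such as [1, 3, 5] selects
    specific device indices, so its length is the per-node count.
    """
    per_node = devices if isinstance(devices, int) else len(devices)
    return per_node * num_nodes


print(total_devices(4, num_nodes=8))    # devices=4, num_nodes=8
print(total_devices(8, num_nodes=128))  # devices=8, num_nodes=128
print(total_devices([1, 3, 5]))         # devices=[1, 3, 5]
```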
docs/source/starter/lightning_lite.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -182,7 +182,7 @@ Here is an example while running on 256 GPUs (eight GPUs times 32 nodes).
182182
self.barrier()
183183
184184
185-
Lite(strategy="ddp", gpus=8, num_nodes=32, accelerator="gpu").run()
185+
Lite(strategy="ddp", devices=8, num_nodes=32, accelerator="gpu").run()
186186
187187
188188
If you require custom data or model device placement, you can deactivate

pl_examples/basic_examples/README.md

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -50,7 +50,7 @@ This script shows you the result of the conversion to the `LightningModule` and
5050
python mnist_examples/image_classifier_4_lightning_module.py
5151

5252
# GPUs (any number)
53-
python mnist_examples/image_classifier_4_lightning_module.py --trainer.gpus 2
53+
python mnist_examples/image_classifier_4_lightning_module.py --trainer.accelerator 'gpu' --trainer.devices 2
5454
```
5555

5656
______________________________________________________________________
@@ -64,10 +64,10 @@ This script shows you how to extract the data related components into a `Lightni
6464
python mnist_examples/image_classifier_5_lightning_datamodule.py
6565

6666
# GPUs (any number)
67-
python mnist_examples/image_classifier_5_lightning_datamodule.py --trainer.gpus 2
67+
python mnist_examples/image_classifier_5_lightning_datamodule.py --trainer.accelerator 'gpu' --trainer.devices 2
6868

6969
# Distributed Data Parallel (DDP)
70-
python mnist_examples/image_classifier_5_lightning_datamodule.py --trainer.gpus 2 --trainer.strategy 'ddp'
70+
python mnist_examples/image_classifier_5_lightning_datamodule.py --trainer.accelerator 'gpu' --trainer.devices 2 --trainer.strategy 'ddp'
7171
```
7272

7373
______________________________________________________________________
@@ -81,10 +81,10 @@ This script shows you how to implement a CNN auto-encoder.
8181
python autoencoder.py
8282

8383
# GPUs (any number)
84-
python autoencoder.py --trainer.gpus 2
84+
python autoencoder.py --trainer.accelerator 'gpu' --trainer.devices 2
8585

8686
# Distributed Data Parallel (DDP)
87-
python autoencoder.py --trainer.gpus 2 --trainer.strategy 'ddp'
87+
python autoencoder.py --trainer.accelerator 'gpu' --trainer.devices 2 --trainer.strategy 'ddp'
8888
```
8989

9090
______________________________________________________________________
@@ -99,10 +99,10 @@ A system describes a `LightningModule` which takes a single `torch.nn.Module` wh
9999
python backbone_image_classifier.py
100100

101101
# GPUs (any number)
102-
python backbone_image_classifier.py --trainer.gpus 2
102+
python backbone_image_classifier.py --trainer.accelerator 'gpu' --trainer.devices 2
103103

104104
# Distributed Data Parallel (DDP)
105-
python backbone_image_classifier.py --trainer.gpus 2 --trainer.strategy 'ddp'
105+
python backbone_image_classifier.py --trainer.accelerator 'gpu' --trainer.devices 2 --trainer.strategy 'ddp'
106106
```
107107

108108
______________________________________________________________________
