
Commit 0130273

Trainer: auto default (#16847)
1 parent d486f94 commit 0130273

21 files changed (+336, -192 lines)

docs/source-pytorch/accelerators/gpu_basic.rst

Lines changed: 20 additions & 19 deletions
@@ -14,30 +14,31 @@ A Graphics Processing Unit (GPU), is a specialized hardware accelerator designed

 ----

-Train on 1 GPU
---------------
-
-Make sure you're running on a machine with at least one GPU. There's no need to specify any NVIDIA flags
-as Lightning will do it for you.
-
-.. testcode::
-    :skipif: torch.cuda.device_count() < 1
-
-    trainer = Trainer(accelerator="gpu", devices=1)
-
-----------------
-
 .. _multi_gpu:

-Train on multiple GPUs
-----------------------
+Train on GPUs
+-------------

-To use multiple GPUs, set the number of devices in the Trainer or the index of the GPUs.
+The Trainer will run on all available GPUs by default. Make sure you're running on a machine with at least one GPU.
+There's no need to specify any NVIDIA flags as Lightning will do it for you.

-.. code::
+.. code-block:: python
+
+    # run on as many GPUs as available by default
+    trainer = Trainer(accelerator="auto", devices="auto", strategy="auto")
+    # equivalent to
+    trainer = Trainer()

-    trainer = Trainer(accelerator="gpu", devices=4)
+    # run on one GPU
+    trainer = Trainer(accelerator="gpu", devices=1)
+    # run on multiple GPUs
+    trainer = Trainer(accelerator="gpu", devices=8)
+    # choose the number of devices automatically
+    trainer = Trainer(accelerator="gpu", devices="auto")
+
+.. note::
+    Setting ``accelerator="gpu"`` will also automatically choose the "mps" device on Apple silicon GPUs.
+    If you want to avoid this, you can set ``accelerator="cuda"`` instead.

 Choosing GPU devices
 ^^^^^^^^^^^^^^^^^^^^
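
For illustration, the note added above can be exercised with a short sketch. This code is not part of the diff; it assumes the ``pytorch_lightning`` import path referenced elsewhere in these docs:

    from pytorch_lightning import Trainer

    # on Apple silicon, accelerator="gpu" resolves to the MPS backend
    trainer = Trainer(accelerator="gpu", devices=1)

    # pin CUDA explicitly when MPS is not wanted
    trainer = Trainer(accelerator="cuda", devices=1)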

docs/source-pytorch/accelerators/hpu_basic.rst

Lines changed: 16 additions & 24 deletions
@@ -25,25 +25,30 @@ For more information, check out `Gaudi Architecture <https://docs.habana.ai/en/l

 ----

-Run on 1 Gaudi
---------------
+Run on Gaudi
+------------

 To enable PyTorch Lightning to utilize the HPU accelerator, simply provide ``accelerator="hpu"`` parameter to the Trainer class.

 .. code-block:: python

-    trainer = Trainer(accelerator="hpu", devices=1)
-
-----
+    # run on as many Gaudi devices as available by default
+    trainer = Trainer(accelerator="auto", devices="auto", strategy="auto")
+    # equivalent to
+    trainer = Trainer()

-Run on multiple Gaudis
-----------------------
-The ``devices=8`` and ``accelerator="hpu"`` parameters to the Trainer class enables the Habana accelerator for distributed training with 8 Gaudis.
-It uses :class:`~pytorch_lightning.strategies.hpu_parallel.HPUParallelStrategy` internally which is based on DDP strategy with the addition of Habana's collective communication library (HCCL) to support scale-up within a node and scale-out across multiple nodes.
+    # run on one Gaudi device
+    trainer = Trainer(accelerator="hpu", devices=1)
+    # run on multiple Gaudi devices
+    trainer = Trainer(accelerator="hpu", devices=8)
+    # choose the number of devices automatically
+    trainer = Trainer(accelerator="hpu", devices="auto")

-.. code-block:: python
-
-    trainer = Trainer(devices=8, accelerator="hpu")
+The ``devices>1`` parameter with HPUs enables the Habana accelerator for distributed training.
+It uses :class:`~pytorch_lightning.strategies.hpu_parallel.HPUParallelStrategy` internally which is based on DDP
+strategy with the addition of Habana's collective communication library (HCCL) to support scale-up within a node and
+scale-out across multiple nodes.

 ----

@@ -81,19 +86,6 @@ On Node 2:

 ----

-Select Gaudis automatically
----------------------------
-
-Lightning can automatically detect the number of Gaudi devices to run on. This setting is enabled by default if the devices argument is missing.
-
-.. code-block:: python
-
-    # equivalent
-    trainer = Trainer(accelerator="hpu")
-    trainer = Trainer(accelerator="hpu", devices="auto")
-
-----
-
 How to access HPUs
 ------------------

docs/source-pytorch/accelerators/ipu_basic.rst

Lines changed: 14 additions & 11 deletions
@@ -24,23 +24,26 @@ See the `Graphcore Glossary <https://docs.graphcore.ai/projects/graphcore-glossa

 ----

-Run on 1 IPU
-------------
-To use a single IPU, set the accelerator and devices argument.
+Run on IPU
+----------

-.. code-block:: python
-
-    trainer = pl.Trainer(accelerator="ipu", devices=1)
-
-----
+To enable PyTorch Lightning to utilize the IPU accelerator, simply provide ``accelerator="ipu"`` parameter to the Trainer class.

-Run on multiple IPUs
---------------------
 To use multiple IPUs set the devices to a number that is a power of 2 (i.e: 2, 4, 8, 16, ...)

 .. code-block:: python

-    trainer = pl.Trainer(accelerator="ipu", devices=8)
+    # run on as many IPUs as available by default
+    trainer = Trainer(accelerator="auto", devices="auto", strategy="auto")
+    # equivalent to
+    trainer = Trainer()
+
+    # run on one IPU
+    trainer = Trainer(accelerator="ipu", devices=1)
+    # run on multiple IPUs
+    trainer = Trainer(accelerator="ipu", devices=8)
+    # choose the number of devices automatically
+    trainer = Trainer(accelerator="ipu", devices="auto")

 ----

docs/source-pytorch/accelerators/tpu_basic.rst

Lines changed: 15 additions & 25 deletions
@@ -32,36 +32,26 @@ some subset of those 2048 cores.

 ----

-Run on 1 TPU core
------------------
-Enable the following Trainer arguments to run on 1 TPU.
-
-.. code::
-
-    trainer = Trainer(accelerator="tpu", devices=1)
-
-----
-
-Run on multiple TPU cores
--------------------------
-For multiple TPU cores, change the value of the devices flag.
-
-.. code::
-
-    trainer = Trainer(accelerator="tpu", devices=8)
-
-----
-
-Run on a specific TPU core
---------------------------
+Run on TPU cores
+----------------

-To run on a specific core, specify the index of the TPU core.
+To run on different cores, modify the ``devices`` argument.

 .. code-block:: python

-    trainer = pl.Trainer(accelerator="tpu", devices=[5])
+    # run on as many TPUs as available by default
+    trainer = Trainer(accelerator="auto", devices="auto", strategy="auto")
+    # equivalent to
+    trainer = Trainer()

-This example runs on the 5th core, not on five cores.
+    # run on one TPU core
+    trainer = Trainer(accelerator="tpu", devices=1)
+    # run on multiple TPU cores
+    trainer = Trainer(accelerator="tpu", devices=8)
+    # run on the 5th core
+    trainer = Trainer(accelerator="tpu", devices=[5])
+    # choose the number of cores automatically
+    trainer = Trainer(accelerator="tpu", devices="auto")

 ----

docs/source-pytorch/common/trainer.rst

Lines changed: 2 additions & 2 deletions
@@ -200,7 +200,7 @@ as well as custom accelerator instances.
     # Training with GPU Accelerator using the DistributedDataParallel strategy
     trainer = Trainer(devices=4, accelerator="gpu", strategy="ddp")

-.. note:: The ``"auto"`` option recognizes the machine you are on, and selects the respective ``Accelerator``.
+.. note:: The ``"auto"`` option recognizes the machine you are on, and selects the appropriate ``Accelerator``.

 .. code-block:: python

@@ -417,7 +417,7 @@ Number of devices to train on (``int``), which devices to train on (``list`` or

 .. code-block:: python

-    # If your machine has GPUs, it will use all the available GPUs for training
+    # Use whatever hardware your machine has available
     trainer = Trainer(devices="auto", accelerator="auto")

     # Training with CPU Accelerator using 1 process
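
A small sketch of what the ``"auto"`` values resolve to at runtime; the ``Trainer.accelerator`` and ``Trainer.num_devices`` properties used for inspection are an assumption of this example, not part of the diff:

    from pytorch_lightning import Trainer

    trainer = Trainer(accelerator="auto", devices="auto")
    # inspect what "auto" resolved to on this machine
    print(type(trainer.accelerator).__name__)  # e.g. "CUDAAccelerator" or "CPUAccelerator"
    print(trainer.num_devices)                 # number of devices that will be used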

src/lightning/pytorch/CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -52,6 +52,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

 ### Changed

+
+- The `Trainer` now chooses `accelerator="auto", strategy="auto", devices="auto"` as defaults ([#16847](https://github.com/Lightning-AI/lightning/pull/16847))
+
+
 - "Native" suffix removal ([#16490](https://github.com/Lightning-AI/lightning/pull/16490))
   * `strategy="fsdp_native"` is now `strategy="fsdp"`
   * `strategy="fsdp_native_full_shard_offload"` is now `strategy="fsdp_cpu_offload"`
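
In practice, the changed default means a bare ``Trainer()`` now auto-selects the hardware. A minimal sketch, assuming you would rather keep an explicit single-process CPU setup than rely on auto selection:

    from pytorch_lightning import Trainer

    # new default behaviour after this change
    trainer = Trainer()  # same as Trainer(accelerator="auto", devices="auto", strategy="auto")

    # request an explicit single-process CPU run instead of auto selection
    trainer = Trainer(accelerator="cpu", devices=1)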
