
Commit a8605b4

Update Lightning Lite docs (5/n) (#16291)
* organize
* organize
* organize
* organize
* Fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* accelerator
* distributed launch
* notebooks
* code structure
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* lightning_module
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* x
* update
* conflicts
* fix duplicates
* links.rst
* api folder
* add todo for build errors
* resolve duplicate reference warnings
* address review by eden

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 552ed1e commit a8605b4

21 files changed (+1456 / -590 lines)

docs/source-pytorch/api_references.rst

Lines changed: 0 additions & 10 deletions
@@ -82,16 +82,6 @@ core
     ~optimizer.LightningOptimizer
     ~saving.ModelIO

-lightning_fabric
-----------------
-
-.. currentmodule:: lightning_fabric.fabric
-
-.. autosummary::
-    :toctree: api
-    :nosignatures:
-
-    Fabric

 loggers
 -------
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
:orphan:

.. include:: ../../links.rst

#############
API Reference
#############


Fabric
^^^^^^

.. currentmodule:: lightning_fabric.fabric

.. autosummary::
    :toctree: ../../api
    :nosignatures:
    :template: classtemplate.rst

    Fabric


Accelerators
^^^^^^^^^^^^

.. currentmodule:: lightning_fabric.accelerators

.. autosummary::
    :toctree: ../../api
    :nosignatures:
    :template: classtemplate.rst

    Accelerator
    CPUAccelerator
    CUDAAccelerator
    MPSAccelerator
    TPUAccelerator


Plugins
^^^^^^^

Precision
"""""""""

.. TODO(fabric): include DeepSpeedPrecision

.. currentmodule:: lightning_fabric.plugins.precision

.. autosummary::
    :toctree: ../../api
    :nosignatures:
    :template: classtemplate.rst

    Precision
    DoublePrecision
    MixedPrecision
    TPUPrecision
    TPUBf16Precision
    FSDPPrecision


Environments
""""""""""""

.. currentmodule:: lightning_fabric.plugins.environments

.. autosummary::
    :toctree: ../../api
    :nosignatures:
    :template: classtemplate_noindex.rst

    ~cluster_environment.ClusterEnvironment
    ~kubeflow.KubeflowEnvironment
    ~lightning.LightningEnvironment
    ~lsf.LSFEnvironment
    ~slurm.SLURMEnvironment
    ~torchelastic.TorchElasticEnvironment
    ~xla.XLAEnvironment


IO
""

.. currentmodule:: lightning_fabric.plugins.io

.. autosummary::
    :toctree: ../../api
    :nosignatures:
    :template: classtemplate.rst

    ~checkpoint_io.CheckpointIO
    ~torch_io.TorchCheckpointIO
    ~xla.XLACheckpointIO


Collectives
"""""""""""

.. currentmodule:: lightning_fabric.plugins.collectives

.. autosummary::
    :toctree: ../../api
    :nosignatures:
    :template: classtemplate.rst

    Collective
    TorchCollective
    SingleDeviceCollective


Strategies
^^^^^^^^^^

.. TODO(fabric): include DeepSpeedStrategy, XLAStrategy

.. currentmodule:: lightning_fabric.strategies

.. autosummary::
    :toctree: ../../api
    :nosignatures:
    :template: classtemplate.rst

    Strategy
    DDPStrategy
    DataParallelStrategy
    DDPShardedStrategy
    FSDPStrategy
    ParallelStrategy
    SingleDeviceStrategy
    SingleTPUStrategy
Lines changed: 206 additions & 0 deletions
@@ -0,0 +1,206 @@
:orphan:

################
Fabric Arguments
################


accelerator
===========

Choose one of ``"cpu"``, ``"gpu"``, ``"tpu"``, ``"auto"`` (IPU support is coming soon).

.. code-block:: python

    # CPU accelerator
    fabric = Fabric(accelerator="cpu")

    # Running with GPU Accelerator using 2 GPUs
    fabric = Fabric(devices=2, accelerator="gpu")

    # Running with TPU Accelerator using 8 TPU cores
    fabric = Fabric(devices=8, accelerator="tpu")

    # Running with GPU Accelerator using the DistributedDataParallel strategy
    fabric = Fabric(devices=4, accelerator="gpu", strategy="ddp")

The ``"auto"`` option recognizes the machine you are on and selects the available accelerator.

.. code-block:: python

    # If your machine has GPUs, it will use the GPU Accelerator
    fabric = Fabric(devices=2, accelerator="auto")
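
For illustration, here is a minimal sketch of checking which device ``"auto"`` resolved to. It assumes ``Fabric`` can be imported from the top-level ``lightning.fabric`` package and that the ``device`` property exposes the root device; adjust to your installation.

.. code-block:: python

    from lightning.fabric import Fabric

    # Sketch: inspect what "auto" picked on this machine
    fabric = Fabric(accelerator="auto", devices=1)
    print(fabric.device)  # e.g. cuda:0 on a machine with GPUs, otherwise cpu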

strategy
========

Choose a training strategy: ``"dp"``, ``"ddp"``, ``"ddp_spawn"``, ``"tpu_spawn"``, ``"deepspeed"``, ``"ddp_sharded"``, or ``"ddp_sharded_spawn"``.

.. code-block:: python

    # Running with the DistributedDataParallel strategy on 4 GPUs
    fabric = Fabric(strategy="ddp", accelerator="gpu", devices=4)

    # Running with the DDP Spawn strategy using 4 CPU processes
    fabric = Fabric(strategy="ddp_spawn", accelerator="cpu", devices=4)

Additionally, you can pass in a strategy object if you need to configure additional parameters.

.. code-block:: python

    from lightning.fabric.strategies import DeepSpeedStrategy

    fabric = Fabric(strategy=DeepSpeedStrategy(stage=2), accelerator="gpu", devices=2)

Support for Fully Sharded training strategies is coming soon.

devices
=======

Configure the devices to run on. Can be of type:

- int: the number of devices (e.g., GPUs) to train on
- list of int: which device index (e.g., GPU ID) to train on (0-indexed)
- str: a string representation of one of the above

.. code-block:: python

    # default used by Fabric, i.e., use the CPU
    fabric = Fabric(devices=None)

    # equivalent
    fabric = Fabric(devices=0)

    # int: run on two GPUs
    fabric = Fabric(devices=2, accelerator="gpu")

    # list: run on GPUs 1, 4 (by bus ordering)
    fabric = Fabric(devices=[1, 4], accelerator="gpu")
    fabric = Fabric(devices="1, 4", accelerator="gpu")  # equivalent

    # -1: run on all GPUs
    fabric = Fabric(devices=-1, accelerator="gpu")
    fabric = Fabric(devices="-1", accelerator="gpu")  # equivalent


num_nodes
=========

Number of cluster nodes for distributed operation.

.. code-block:: python

    # Default used by Fabric
    fabric = Fabric(num_nodes=1)

    # Run on 8 nodes
    fabric = Fabric(num_nodes=8)

Learn more about distributed multi-node training on clusters :doc:`here <../../clouds/cluster>`.


precision
=========

Fabric supports double precision (64), full precision (32), or half precision (16) operation (including `bfloat16 <https://pytorch.org/docs/1.10.0/generated/torch.Tensor.bfloat16.html>`_).
Half precision, or mixed precision, combines 32-bit and 16-bit floating-point formats to reduce the memory footprint during model training.
This can result in improved performance, achieving significant speedups on modern GPUs.

.. code-block:: python

    # Default used by the Fabric
    fabric = Fabric(precision=32, devices=1)

    # 16-bit (mixed) precision
    fabric = Fabric(precision=16, devices=1)

    # 16-bit bfloat precision
    fabric = Fabric(precision="bf16", devices=1)

    # 64-bit (double) precision
    fabric = Fabric(precision=64, devices=1)
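
As an illustration of how the setting takes effect, here is a minimal sketch (assuming ``Fabric`` is importable from ``lightning.fabric`` and that ``bf16`` is supported on your hardware). Once the model and optimizer go through ``setup``, the forward pass and ``fabric.backward`` run under the configured precision without further changes to the loop.

.. code-block:: python

    import torch

    from lightning.fabric import Fabric

    fabric = Fabric(precision="bf16", devices=1)

    model = torch.nn.Linear(32, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    model, optimizer = fabric.setup(model, optimizer)

    batch = torch.randn(4, 32, device=fabric.device)
    output = model(batch)  # forward pass runs under the configured precision
    loss = output.sum()
    fabric.backward(loss)  # use this instead of loss.backward(); casting/scaling is handled for you
    optimizer.step()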

plugins
=======

:ref:`Plugins` allow you to connect arbitrary backends, precision libraries, clusters, etc.
To define your own behavior, subclass the relevant class and pass it in. Here's an example linking up your own
:class:`~lightning.fabric.plugins.environments.ClusterEnvironment`.

.. code-block:: python

    from lightning.fabric.plugins.environments import ClusterEnvironment


    class MyCluster(ClusterEnvironment):
        @property
        def main_address(self):
            return your_main_address

        @property
        def main_port(self):
            return your_main_port

        def world_size(self):
            return the_world_size


    fabric = Fabric(plugins=[MyCluster()], ...)
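
The built-in environments listed in the API reference can be passed the same way. A minimal sketch, assuming ``SLURMEnvironment`` is exported from the same package as ``ClusterEnvironment`` (on a SLURM cluster, Fabric would normally detect this environment automatically):

.. code-block:: python

    from lightning.fabric.plugins.environments import SLURMEnvironment

    # Explicitly select the SLURM environment instead of relying on auto-detection
    fabric = Fabric(plugins=[SLURMEnvironment()], accelerator="gpu", devices=2, num_nodes=2)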

callbacks
=========

A callback class is a collection of methods that the training loop can call at a specific point in time, for example, at the end of an epoch.
Add callbacks to Fabric to inject logic into your training loop from an external callback class.

.. code-block:: python

    class MyCallback:
        def on_train_epoch_end(self, results):
            ...

You can then register this callback, or multiple ones, directly in Fabric:

.. code-block:: python

    fabric = Fabric(callbacks=[MyCallback()])

Then, in your training loop, you can call a hook by its name. Any callback objects that have this hook will execute it:

.. code-block:: python

    # Call any hook by name
    fabric.call("on_train_epoch_end", results={...})
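
Putting the pieces together, a minimal sketch of a loop that triggers the hook once per epoch (the ``results`` payload here is purely illustrative):

.. code-block:: python

    from lightning.fabric import Fabric


    class MyCallback:
        def on_train_epoch_end(self, results):
            print(f"epoch finished: {results}")


    fabric = Fabric(callbacks=[MyCallback()])

    for epoch in range(3):
        ...  # run the training steps for this epoch
        fabric.call("on_train_epoch_end", results={"epoch": epoch})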

loggers
=======

Attach one or several loggers/experiment trackers to Fabric for convenient logging of metrics.

.. code-block:: python

    # Default used by Fabric, no loggers are active
    fabric = Fabric(loggers=[])

    # Log to a single logger
    fabric = Fabric(loggers=TensorBoardLogger(...))

    # Or multiple instances
    fabric = Fabric(loggers=[logger1, logger2, ...])

Anywhere in your training loop, you can log metrics to all loggers at once:

.. code-block:: python

    fabric.log("loss", loss)
    fabric.log_dict({"loss": loss, "accuracy": acc})
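
For example, a minimal sketch of logging inside a custom loop, where ``logger1``, ``logger2``, ``num_epochs``, ``train_dataloader``, and ``training_step`` stand in for your own objects and step logic:

.. code-block:: python

    fabric = Fabric(loggers=[logger1, logger2])

    for epoch in range(num_epochs):
        for batch in train_dataloader:
            loss = training_step(batch)  # your own forward/backward logic
            fabric.log("train_loss", loss)  # sent to every attached logger

        fabric.log_dict({"epoch": epoch, "train_loss": loss})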
