
Commit a913db8

Update Lightning Lite docs (3/n) (#16245)
1 parent 0a928e8 commit a913db8

File tree

1 file changed: +21 -20 lines changed

docs/source-pytorch/fabric/fabric.rst

Lines changed: 21 additions & 20 deletions
@@ -169,33 +169,34 @@ Furthermore, you can access the current device from ``fabric.device`` or rely on
 
 ----------
 
+*******************
+Fabric in Notebooks
+*******************
 
-Distributed Training Pitfalls
-=============================
 
-The :class:`~lightning_fabric.fabric.Fabric` provides you with the tools to scale your training, but there are several major challenges ahead of you now:
+Fabric works exactly the same way in notebooks (Jupyter, Google Colab, Kaggle, etc.) if you only run in a single process or a single GPU.
+If you want to use multiprocessing, for example multi-GPU, you can put your code in a function and pass that function to the
+:meth:`~lightning_fabric.fabric.Fabric.launch` method:
 
 
-.. list-table::
-   :widths: 50 50
-   :header-rows: 0
+.. code-block:: python
+
+
+    # Notebook Cell
+    def train(fabric):
+
+        model = ...
+        optimizer = ...
+        model, optimizer = fabric.setup(model, optimizer)
+        ...
+
 
-   * - Processes divergence
-     - This happens when processes execute a different section of the code due to different if/else conditions, race conditions on existing files and so on, resulting in hanging.
-   * - Cross processes reduction
-     - Miscalculated metrics or gradients due to errors in their reduction.
-   * - Large sharded models
-     - Instantiation, materialization and state management of large models.
-   * - Rank 0 only actions
-     - Logging, profiling, and so on.
-   * - Checkpointing / Early stopping / Callbacks / Logging
-     - Ability to customize your training behavior easily and make it stateful.
-   * - Fault-tolerant training
-     - Ability to resume from a failure as if it never happened.
+    # Notebook Cell
+    fabric = Fabric(accelerator="cuda", devices=2)
+    fabric.launch(train)  # Launches the `train` function on two GPUs
 
 
-If you are facing one of those challenges, then you are already meeting the limit of :class:`~lightning_fabric.fabric.Fabric`.
-We recommend you to convert to :doc:`Lightning <../starter/introduction>`, so you never have to worry about those.
+As you can see, this function accepts one argument, the ``Fabric`` object, and it gets launched on as many devices as specified.
 
 
 ----------
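
For reference, here is a minimal self-contained sketch of the two-cell notebook pattern this change documents, assuming the top-level ``from lightning_fabric import Fabric`` import; the toy model, data shapes, and hyperparameters are illustrative placeholders, not part of the committed docs:

    import torch
    from lightning_fabric import Fabric  # assumed top-level re-export of lightning_fabric.fabric.Fabric


    # Notebook Cell 1: everything that must run on each device goes inside this function.
    def train(fabric):
        model = torch.nn.Linear(32, 2)  # placeholder model
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        model, optimizer = fabric.setup(model, optimizer)

        for _ in range(10):
            # fabric.device points at the device assigned to this process
            inputs = torch.randn(8, 32, device=fabric.device)
            targets = torch.randint(0, 2, (8,), device=fabric.device)
            optimizer.zero_grad()
            loss = torch.nn.functional.cross_entropy(model(inputs), targets)
            fabric.backward(loss)  # used in place of loss.backward() under Fabric
            optimizer.step()


    # Notebook Cell 2: launch `train` on two GPUs from within the notebook.
    fabric = Fabric(accelerator="cuda", devices=2)
    fabric.launch(train)

Passing the function to ``launch`` instead of calling it directly is what lets Fabric spawn it once per device, which is the notebook-friendly behavior the new docs section describes.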
