docs/source-pytorch/fabric/fabric.rst (21 additions & 20 deletions)
@@ -169,33 +169,34 @@ Furthermore, you can access the current device from ``fabric.device`` or rely on
 
 ----------
 
+*******************
+Fabric in Notebooks
+*******************
 
-Distributed Training Pitfalls
-=============================
 
-The :class:`~lightning_fabric.fabric.Fabric` provides you with the tools to scale your training, but there are several major challenges ahead of you now:
+Fabric works exactly the same way in notebooks (Jupyter, Google Colab, Kaggle, etc.) if you only run in a single process or a single GPU.
+If you want to use multiprocessing, for example multi-GPU, you can put your code in a function and pass that function to the
-     - This happens when processes execute a different section of the code due to different if/else conditions, race conditions on existing files and so on, resulting in hanging.
-   * - Cross processes reduction
-     - Miscalculated metrics or gradients due to errors in their reduction.
-   * - Large sharded models
-     - Instantiation, materialization and state management of large models.
-   * - Rank 0 only actions
-     - Logging, profiling, and so on.
-   * - Checkpointing / Early stopping / Callbacks / Logging
-     - Ability to customize your training behavior easily and make it stateful.
-   * - Fault-tolerant training
-     - Ability to resume from a failure as if it never happened.
+    # Notebook Cell
+    fabric = Fabric(accelerator="cuda", devices=2)
+    fabric.launch(train)  # Launches the `train` function on two GPUs
 
 
-If you are facing one of those challenges, then you are already meeting the limit of :class:`~lightning_fabric.fabric.Fabric`.
-We recommend you to convert to :doc:`Lightning <../starter/introduction>`, so you never have to worry about those.
+As you can see, this function accepts one argument, the ``Fabric`` object, and it gets launched on as many devices as specified.
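The diff above calls ``fabric.launch(train)`` without showing ``train`` itself. The sketch below is a hypothetical illustration of such a function; the body and the print statement are assumptions, not part of the diff. The only contract the docs state is that the function accepts the ``Fabric`` object as its single argument.

```python
# Hypothetical sketch of the `train` function passed to `fabric.launch(train)`.
# The docs only require that it take the Fabric object as its single
# argument; `launch` then runs it once per device/process.
def train(fabric):
    # Each spawned process receives a Fabric that knows its own device
    # and rank (illustrative body, not from the original diff).
    print(f"running on {fabric.device}, rank {fabric.global_rank}")
    # ... create model and optimizer, wrap them with fabric.setup(...),
    # and run the training loop here ...

# In a notebook cell (requires lightning installed and, for devices=2,
# two CUDA devices):
#
#     from lightning.fabric import Fabric
#     fabric = Fabric(accelerator="cuda", devices=2)
#     fabric.launch(train)  # runs `train(fabric)` in two processes
```

Defining the loop as a function rather than as top-level notebook code is what lets Fabric re-execute it in each spawned worker process.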