docs/source-fabric/advanced/compile.rst
Lines changed: 26 additions & 23 deletions
@@ -220,6 +220,7 @@ On PyTorch 2.2 and later, ``torch.compile`` will detect dynamism automatically a
 Numbers produced with NVIDIA A100 SXM4 40GB, PyTorch 2.2.0, CUDA 12.1.

+
 ----
@@ -255,17 +256,33 @@ Naturally, the tradeoff here is that it will consume a bit more memory.
 You can find a full list of compile options in the `PyTorch documentation <https://pytorch.org/docs/stable/generated/torch.compile.html>`_.
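As a minimal sketch of passing such options, the snippet below uses ``mode`` and ``fullgraph``, which are standard ``torch.compile`` arguments described in the linked documentation; the model itself is a placeholder:

.. code-block:: python

    import torch

    model = torch.nn.Linear(16, 16)  # placeholder model for illustration

    # ``mode="reduce-overhead"`` trades a bit of extra memory for lower
    # per-call overhead; ``fullgraph=True`` raises an error instead of
    # silently falling back to eager execution on graph breaks.
    model = torch.compile(model, mode="reduce-overhead", fullgraph=True)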
+
+----
+
+
+**************************************
+A note about torch.compile in practice
+**************************************
+
+In practice, you will find that ``torch.compile`` often doesn't work well and can even be counter-productive.
+Compilation may fail with cryptic error messages that are impossible to debug without help from the PyTorch team.
+It is also not uncommon that ``torch.compile`` will produce a significantly *slower* model or one with much higher memory usage.
+On top of that, the compilation phase itself can be incredibly slow, taking several minutes to finish.
+For these reasons, we recommend that you don't waste too much time trying to apply ``torch.compile`` during development, and rather evaluate its effectiveness toward the end when you are about to launch long-running, expensive experiments.
+Always compare the speed and memory usage of the compiled model against the original model!
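A minimal sketch of such a comparison is shown below; it assumes a CUDA device, and the ``benchmark`` helper, warmup count, and step count are illustrative, not part of Fabric or PyTorch:

.. code-block:: python

    import time

    import torch


    def benchmark(model, inputs, steps=10):
        # Warm up so one-time compilation cost doesn't skew the timing
        for _ in range(3):
            model(inputs)
        torch.cuda.synchronize()
        torch.cuda.reset_peak_memory_stats()
        start = time.perf_counter()
        for _ in range(steps):
            model(inputs)
        torch.cuda.synchronize()
        seconds_per_step = (time.perf_counter() - start) / steps
        peak_mb = torch.cuda.max_memory_allocated() / 1e6
        return seconds_per_step, peak_mb


    model = torch.nn.Linear(1024, 1024).cuda()  # placeholder model
    inputs = torch.randn(64, 1024, device="cuda")

    eager_time, eager_mem = benchmark(model, inputs)
    compiled_time, compiled_mem = benchmark(torch.compile(model), inputs)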
 As stated earlier, we recommend that you compile the model before calling ``fabric.setup()``.
-However, if you are using DDP or FSDP with Fabric, the compilation won't incorporate the distributed calls inside these wrappers by default.
-In an experimental feature, you can let ``fabric.setup()`` reapply the ``torch.compile`` call after the model gets wrapped in DDP/FSDP internally.
-In the future, this option will become the default.
+In the case of DDP and FSDP, ``fabric.setup()`` will automatically reapply the ``torch.compile`` call after the model gets wrapped in DDP/FSDP internally.
+This will ensure that the compilation can incorporate the distributed calls and optimize them.
+However, should you have issues compiling DDP and FSDP models, you can opt out of this feature:

 .. code-block:: python
@@ -275,25 +292,11 @@ In the future, this option will become the default.
     # Compile the model
     model = torch.compile(model)

-    # Default: `fabric.setup()` will not reapply the compilation over DDP/FSDP
-    model = fabric.setup(model, _reapply_compile=False)
-
-    # Recompile the model over DDP/FSDP (experimental)
+    # Default: `fabric.setup()` will configure compilation over DDP/FSDP for you
     model = fabric.setup(model, _reapply_compile=True)

+    # Turn it off if you see issues with DDP/FSDP
+    model = fabric.setup(model, _reapply_compile=False)
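For context, the full flow with DDP might look like the following minimal sketch; the accelerator, device count, and model are placeholders:

.. code-block:: python

    import torch
    from lightning.fabric import Fabric

    fabric = Fabric(accelerator="cuda", devices=2, strategy="ddp")
    fabric.launch()

    model = torch.nn.Linear(128, 128)  # placeholder model
    model = torch.compile(model)  # compile before setup, as recommended
    model = fabric.setup(model)  # Fabric reapplies compile over the DDP wrapper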
-----
-
-
-**************************************
-A note about torch.compile in practice
-**************************************
-
-In practice, you will find that ``torch.compile`` often doesn't work well and can even be counter-productive.
-Compilation may fail with cryptic error messages that are impossible to debug without help from the PyTorch team.
-It is also not uncommon that ``torch.compile`` will produce a significantly *slower* model or one with much higher memory usage.
-On top of that, the compilation phase itself can be incredibly slow, taking several minutes to finish.
-For these reasons, we recommend that you don't waste too much time trying to apply ``torch.compile`` during development, and rather evaluate its effectiveness toward the end when you are about to launch long-running, expensive experiments.
-Always compare the speed and memory usage of the compiled model against the original model!