Add documentation warning: Don’t use torch.profiler.profile context manager around Trainer methods #20864
base: master
Changes from 7 commits
df928db
82d7717
c1c2f86
c6c8d0e
0edf0d0
c188fcb
bb40bab
da4230d
@@ -4,6 +4,29 @@
Find bottlenecks in your code
#############################

.. warning::

    **Do not wrap** ``Trainer.fit()``, ``Trainer.validate()``, or other Trainer methods
    inside a manual ``torch.profiler.profile`` context manager.
    This will cause unexpected crashes and cryptic errors due to incompatibility between
    PyTorch Profiler's context management and Lightning's internal training loop.
    Instead, always use the ``profiler`` argument in the ``Trainer`` constructor.

Example (correct usage):

.. code-block:: python

    import pytorch_lightning as pl

    trainer = pl.Trainer(
        profiler="pytorch",  # <- This enables built-in profiling safely!
        ...
    )
    trainer.fit(model, train_dataloaders=...)

**References:**
- https://github.com/pytorch/pytorch/issues/88472

Comment on lines +26 to +29
There is really no reason for users to know the exact issue. Even if a user looked at the linked issue, it would not help, because it makes no mention of profiling and would only lead to more confusion.
Suggested change

.. raw:: html

<div class="display-card-container">

@@ -264,6 +264,14 @@ def __init__(
profiler: To profile individual steps during training and assist in identifying bottlenecks.
    Default: ``None``.

    .. note::
        Do **not** use a manual ``torch.profiler.profile`` context manager around
        ``Trainer.fit()``, ``Trainer.validate()``, etc.
        This will lead to internal errors and cryptic crashes due to incompatibility between
        PyTorch Profiler and Lightning's training loop.
        Always use this ``profiler`` argument to enable profiling in Lightning.


Comment on lines +267 to +274
Overall, I think the intention with the Trainer argument docstrings is that they should be fairly short and to the point. None of the other arguments has a note, so let's remove this one.
Suggested change

detect_anomaly: Enable anomaly detection for the autograd engine.
    Default: ``False``.

Let's move the note to this page, which deals specifically with the profiler feature in Lightning, rather than keeping it on the overview page:
https://lightning.ai/docs/pytorch/stable/tuning/profiler_basic.html
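
For anyone following this thread, here is a minimal sketch of the pattern the proposed warning recommends, with the discouraged pattern shown only as a comment. The helper names `trainer_kwargs` and `run_fit` are hypothetical, `max_epochs=1` is an arbitrary illustrative value, and the `pytorch_lightning` import follows the snippet in the diff. Lightning is imported lazily so the helpers can be defined without it installed.

```python
# Discouraged pattern described in the warning (shown as a comment only):
#
#     with torch.profiler.profile() as prof:
#         trainer.fit(model, train_dataloaders=loader)


def trainer_kwargs(enable_profiling: bool) -> dict:
    """Build Trainer keyword arguments (hypothetical helper for illustration)."""
    kwargs = {"max_epochs": 1}  # arbitrary value, chosen only to keep runs short
    if enable_profiling:
        # Let Lightning own the profiler lifecycle via the ``profiler`` argument.
        kwargs["profiler"] = "pytorch"
    return kwargs


def run_fit(model, train_dataloaders, enable_profiling: bool = True):
    """Fit ``model`` with Lightning's built-in profiling (hypothetical helper)."""
    import pytorch_lightning as pl  # lazy import; assumes Lightning is installed

    trainer = pl.Trainer(**trainer_kwargs(enable_profiling))
    trainer.fit(model, train_dataloaders=train_dataloaders)
    return trainer
```

Calling `run_fit(model, loader)` with a user-supplied `LightningModule` and dataloader would then profile the run without any `torch.profiler.profile` block around `Trainer.fit()`.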