docs(fit_loop): Improve documentation and comments #20426
Conversation
- Added comments to clarify the purpose of methods in fit_loop.py.
- Explained the logic behind multiple calls to setup_data().
- Documented handling of validation checks and logging intervals.

These changes aim to enhance code readability and maintainability for future developers.
for more information, see https://pre-commit.ci
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##           master   #20426      +/-   ##
==========================================
- Coverage      89%      81%       -8%
==========================================
  Files         267      264        -3
  Lines       23065    23148       +83
==========================================
- Hits        20574    18740     -1834
- Misses       2491     4408     +1917
```
Thanks @pmatczak2

Adding comments to code is in general a good thing, but comments need to add meaningful value for the reader in order not to be line noise. That is, they should clarify in depth what happens, beyond what the following source code already describes. See comments.
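A toy illustration of the distinction the reviewer is drawing (this is not code from the PR; `should_reload` and its parameters are hypothetical, loosely modeled on the `reload_dataloaders_every_n_epochs` behavior touched by this diff):

```python
def should_reload(current_epoch: int, last_reload_epoch: int, every_n_epochs: int) -> bool:
    # Line-noise comment (restates the code): "check if reload is needed"
    #
    # Useful comment (explains the mechanism): dataloaders are rebuilt
    # only once the configured epoch stride has elapsed since the last
    # rebuild; a stride of 0 disables reloading entirely.
    return every_n_epochs > 0 and (current_epoch - last_reload_epoch) >= every_n_epochs
```

The second comment survives the reviewer's test because it states something the `return` line alone cannot: why the stride exists and what a zero value means.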
```python
else:
    combined_loader = train_dataloader

# Handle overfitting scenarios
```
This is not correct: overfit_batches is a Trainer feature for purposefully overfitting, to spot potential bugs in the user's code.
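As a sketch of what the feature does (a hypothetical helper for illustration, not Lightning's actual `_resolve_overfit_batches`): an `overfit_batches` given as an int selects that many training batches outright, while a float in (0, 1] selects a fraction of the available batches.

```python
from typing import Union

def resolve_overfit_batches(overfit_batches: Union[int, float], total_batches: int) -> int:
    # An int picks an absolute number of batches to overfit on...
    if isinstance(overfit_batches, int):
        return min(overfit_batches, total_batches)
    # ...while a float picks a fraction of the available batches,
    # clamped to at least one batch.
    return max(1, int(total_batches * overfit_batches))
```

Repeatedly training on that tiny fixed subset should drive the loss toward zero; if it does not, something in the user's model or data pipeline is likely broken.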
```python
train_dataloader = _request_dataloader(source)
trainer.strategy.barrier("train_dataloader()")

# Initialize combined loader
```
This needs to be extended: from now on we assume we deal with a `combined_loader`; in the case of a single dataloader, we treat it as a `combined_loader` holding a single dataloader.
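A minimal sketch of the normalization the reviewer describes; `TinyCombinedLoader` is a toy stand-in for illustration only, not Lightning's actual `CombinedLoader`:

```python
class TinyCombinedLoader:
    """Toy stand-in illustrating the idea: a lone dataloader (any
    iterable) is wrapped so downstream code always sees the same shape
    as a collection of dataloaders, giving one code path for both."""

    def __init__(self, iterables):
        # A single iterable becomes a list holding just that iterable.
        self.loaders = iterables if isinstance(iterables, list) else [iterables]

    def __iter__(self):
        # Yield lock-stepped tuples, one element per wrapped loader.
        return iter(zip(*self.loaders))
```

With this shape, the rest of the fit loop never needs an `if single_loader:` branch; the single-loader case simply yields one-element tuples.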
```python
if trainer.overfit_batches > 0:
    _resolve_overfit_batches(combined_loader, mode=RunningStage.TRAINING)

# Process each data loader
```
For the comment to be useful, we need to expand on what "Process" means.
```python
# Store epoch of dataloader reset for reload_dataloaders_every_n_epochs
self._last_train_dl_reload_epoch = trainer.current_epoch

# Validation check interval logic
```
A comment is only useful if it sheds more light on the underlying mechanisms. Adding "Validation check interval logic" on top of `if isinstance(trainer.val_check_interval, int) ...` is pretty redundant. What this logic does exactly is what will make the comment useful for readers; otherwise it's line noise.
```python
trainer.val_check_batch = int(self.max_batches * trainer.val_check_interval)
trainer.val_check_batch = max(1, trainer.val_check_batch)

# Warning for logging intervals
```
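Spelling out what the reviewer is asking for, here as a standalone sketch (a hypothetical function mirroring the two assignments in the snippet above): a fractional `val_check_interval` is converted into an absolute batch count.

```python
def compute_val_check_batch(max_batches: int, val_check_interval: float) -> int:
    # A fractional interval (e.g. 0.25 = validate four times per epoch)
    # is translated into an absolute batch count; the max(1, ...) clamp
    # ensures validation still runs when the epoch is so short that the
    # product truncates to zero.
    return max(1, int(max_batches * val_check_interval))
```

A comment along those lines (why the clamp exists, what the fraction means) is the kind of in-depth explanation the review requests, as opposed to a label that merely restates the code.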
A warning about what, and for what reason?
Closing for now, feel free to reopen at any time.
Description

This pull request introduces enhancements to the `fit_loop.py` file within the PyTorch Lightning framework. The primary focus of the changes is to improve code documentation and clarify the logic surrounding data loading and management in the training loop.

Key Changes

- Enhanced Documentation: Added comments to the `_FitLoop` class to explain the purpose and functionality of key methods, particularly `setup_data()`, `on_run_start()`, and `on_advance_start()`.
- Clarified Logic for Data Loading: Documented `setup_data()`, emphasizing its role in ensuring that data loaders are fresh for each epoch and the conditions under which they are reloaded.
- Improved Readability.
- Specific Method Highlights:
  - `setup_data()`: Documented its purpose in managing training data loaders and handling overfitting scenarios.
  - `on_run_start()`: Clarified its role in setting up validation data loaders and invoking relevant hooks.
  - `on_advance_start()`: Explained the necessity of calling `setup_data()` to prepare for the current epoch.

Impact

These changes aim to enhance the maintainability and clarity of the codebase, making it easier for contributors to understand the data loading logic and the overall training process within PyTorch Lightning.
📚 Documentation preview 📚: https://pytorch-lightning--20426.org.readthedocs.build/en/20426/