@Copilot Copilot AI commented Oct 4, 2025

What does this PR do?

Removes an outdated TODO comment from src/lightning/pytorch/loops/fit_loop.py that suggested moving the max_steps check inside the training loop.

Why is this change needed?

The TODO comment on line 179 suggested:

# TODO: Move track steps inside training loop and move part of these condition inside training loop

However, the current implementation is correct and the TODO is unnecessary because:

  1. global_step tracks steps across epochs: The global_step counter is cumulative across all epochs, not per-epoch, so it makes sense to check it at the fit loop level rather than within the epoch loop.

  2. Fit loop is the right place for this check: The fit_loop.done property evaluates when to leave the entire fit process. Since max_steps determines when the entire training should stop (not just a single epoch), checking it here is the appropriate design.

  3. Moving it would add unnecessary complexity: Relocating this check to the epoch loop would require introducing additional flag variables to communicate back to the fit loop when max_steps is reached, making the code more complex without any benefit.
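To make the reasoning concrete, here is a minimal sketch of the check being discussed. The helper name `_is_max_limit_reached` comes from the quoted snippet, but its body and the `FitLoopSketch` class below are simplified illustrations, not Lightning's actual implementation:

```python
def _is_max_limit_reached(current: int, maximum: int = -1) -> bool:
    """Sketch: True once `current` reaches an enabled limit (-1 disables the limit)."""
    return maximum != -1 and current >= maximum


class FitLoopSketch:
    """Hypothetical stand-in for the fit loop, for illustration only."""

    def __init__(self, max_steps: int = -1):
        self.max_steps = max_steps
        self.global_step = 0  # cumulative across ALL epochs, not reset per epoch

    @property
    def done(self) -> bool:
        # Because global_step is cumulative, one fit-level check suffices.
        return _is_max_limit_reached(self.global_step, self.max_steps)
```

Since `global_step` never resets at epoch boundaries, the `done` property can decide when to leave the entire fit process without any extra signalling from the epoch loop.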

Testing

The existing test suite validates this behavior:

  • test_fit_loop_done_log_messages - Tests the done property including the max_steps check
  • test_min_max_steps_epochs - Tests max_steps behavior across 7 different scenarios

All tests pass successfully with no regressions.

Fixes #<issue_number>

cc @lantiga @justusschock

Original prompt

This section details the original issue you should resolve

<issue_title>Remove an unnecessary TODO in src/lightning/pytorch/loops/fit_loop.py</issue_title>
<issue_description>### Outline & Motivation

On line 178, there's a TODO item:

# TODO: Move track steps inside training loop and move part of these condition inside training loop
stop_steps = _is_max_limit_reached(self.epoch_loop.global_step, self.max_steps)
if stop_steps:
    rank_zero_info(f"`Trainer.fit` stopped: `max_steps={self.max_steps!r}` reached.")
    return True

Since global_step accumulates across epochs, it's reasonable to perform this check here.
If we removed this check, we would need to add a flag variable that is set to True when max_steps is hit in epoch_loop, which doesn't make things better.
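To see why one cumulative counter makes a fit-level check sufficient, here is a toy simulation (the `run_fit` function and its between-epoch check are hypothetical illustrations, not Lightning code):

```python
def run_fit(max_steps: int, batches_per_epoch: int, max_epochs: int) -> tuple[int, int]:
    """Toy fit loop: returns (epochs_started, final_global_step)."""
    global_step = 0
    epochs = 0
    for _ in range(max_epochs):
        epochs += 1
        for _ in range(batches_per_epoch):
            global_step += 1  # cumulative: never reset at epoch boundaries
        # One fit-level check between epochs; no per-epoch flag needed.
        if max_steps != -1 and global_step >= max_steps:
            break
    return epochs, global_step
```

With `max_steps=5` and 4 batches per epoch, the loop stops after the second epoch because the counter carries over; pushing the check into the epoch loop would only add a flag to relay the same decision back out.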

Pitch

Remove this TODO to make the code cleaner.

Additional context

No response

cc @lantiga @justusschock</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #20998


The TODO suggested moving the max_steps check inside the training loop, but the check is
correctly placed in fit_loop, since global_step tracks steps across epochs and the
fit_loop is responsible for deciding when to stop the entire fit process. Moving it
would require additional flag variables, making the code more complex without any benefit.

Co-authored-by: Borda <[email protected]>
@Copilot Copilot AI changed the title [WIP] Remove an unnecessary TODO in src/lightning/pytorch/loops/fit_loop.py Remove unnecessary TODO comment in fit_loop.py Oct 4, 2025
@Copilot Copilot AI requested a review from Borda October 4, 2025 21:12
Copilot finished work on behalf of Borda October 4, 2025 21:12
@github-actions github-actions bot added the pl Generic label for PyTorch Lightning package label Oct 4, 2025