
Conversation

@littlebullGit (Contributor) commented Aug 14, 2025

Title
fix(tuner/lr_finder): apply LR suggestion after checkpoint restore when used as callback [Fixes #21030]
Fixes #16787

Summary

  • When used as a callback, lr_finder.py applied the suggested LR before restoring the checkpoint, so the restore reset the optimizer LR to its original value. As a result, “Learning rate set to …” was logged even though the change never persisted.
  • This PR applies the LR suggestion after the checkpoint restore and updates both the LightningModule LR attribute and the active optimizer param groups, so training proceeds with the suggested LR (see the sketch below).
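
In sketch form, the corrected ordering looks roughly like this (a minimal illustration; `restore_fn` and `lr_attr` are stand-ins, not the exact lr_finder.py internals):

```python
# Minimal sketch of the corrected ordering. `restore_fn` and `lr_attr`
# are illustrative stand-ins, not the exact lr_finder.py internals.
def apply_suggestion_after_restore(trainer, restore_fn, lr_attr, suggestion):
    # Restore the checkpoint first: restoring resets the optimizer
    # param groups, so any LR written before this point would be lost.
    restore_fn()

    # Persist the suggestion on the LightningModule attribute
    # (e.g. model.lr or model.learning_rate)...
    setattr(trainer.lightning_module, lr_attr, suggestion)

    # ...and on the live optimizers, which are what the subsequent
    # training steps actually read.
    for optimizer in trainer.strategy.optimizers:
        for param_group in optimizer.param_groups:
            param_group["lr"] = suggestion
```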

Tests

  • Add a test in test_lr_finder.py asserting that the optimizer LR equals the LR Finder suggestion after the search completes (sketched below).
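
Condensed, the new test boils down to a check along these lines (a sketch, not the merged test verbatim; `TuneModel` is a hypothetical helper built on Lightning's `BoringModel`, and it assumes the callback exposes its results as `optimal_lr`):

```python
import torch
from lightning.pytorch import Trainer
from lightning.pytorch.callbacks import LearningRateFinder
from lightning.pytorch.demos.boring_classes import BoringModel


class TuneModel(BoringModel):
    """Hypothetical helper exposing the LR attribute the finder updates."""

    def __init__(self):
        super().__init__()
        self.learning_rate = 1e-3

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=self.learning_rate)


def test_lr_finder_callback_applies_lr_after_restore(tmp_path):
    lr_finder = LearningRateFinder()
    trainer = Trainer(
        default_root_dir=tmp_path,
        max_epochs=1,
        limit_val_batches=0,
        enable_checkpointing=False,
        logger=False,
        callbacks=[lr_finder],
    )
    trainer.fit(TuneModel())

    suggestion = lr_finder.optimal_lr.suggestion()
    # Before the fix, the checkpoint restore reverted the LR to 1e-3.
    assert trainer.strategy.optimizers[0].param_groups[0]["lr"] == suggestion
```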

Files changed

  • [src/lightning/pytorch/tuner/lr_finder.py]
  • [tests/tests_pytorch/tuner/test_lr_finder.py]

Breaking changes

  • None.

Docs/Changelog

  • No docs updates required.
  • Optionally add a note under “Fixes” in CHANGELOG.

📚 Documentation preview 📚: https://pytorch-lightning--21068.org.readthedocs.build/en/21068/

fix(tuner/lr_finder): apply LR suggestion after checkpoint restore when used as callback

Previously, LearningRateFinder applied the suggested LR before restoring the
checkpoint, so the optimizer LR was reverted by the restore step. This caused
the callback to print “Learning rate set to …” without persisting the change.

Change:
- Move LR application to after checkpoint restore and update both the LM attr
  and active optimizer param groups so the LR persists for training.

Tests:
- Add unit test [test_lr_finder_callback_applies_lr_after_restore] to assert the
  optimizer LR matches the LR Finder suggestion after the search completes.

Fixes Lightning-AI#21030
github-actions bot added the pl label (Generic label for PyTorch Lightning package) on Aug 14, 2025
Borda changed the title from “fix(tuner/lr_finder): apply LR suggestion after checkpoint restore when used as callback” to “Fix LR not being correctly set after using LearningRateFinder callback” on Aug 14, 2025
@SkafteNicki (Collaborator) commented

Added #16787 to the issue description, as it will be fixed by this PR

@littlebullGit (Contributor, Author) commented

Error: The timeout of 60 minutes has triggered but not all required jobs were passing. This job will need to be re-run to merge your PR. If you do not have write access to the repository you can ask Lightning-AI/lai-frameworks to re-run it for you. If you have any other questions, you can reach out to carmocca for help.

The failed “Probot/required-jobs” check seems to be a timeout and has nothing to do with the fix. Can you retrigger the check?

@Borda (Contributor) commented Aug 15, 2025

The failed check of "Probot/required-jobs"

Yes, this is an aggregation check that expects all individual checks to finish within an hour; if something takes longer, it needs to be restarted.

Borda merged commit 3ed9d4e into Lightning-AI:master on Aug 15, 2025
84 checks passed
Borda pushed a commit that referenced this pull request Aug 28, 2025
Fix LR not being correctly set after using LearningRateFinder callback (#21068)

* fix(tuner/lr_finder): apply LR suggestion after checkpoint restore when used as callback

* changelog
* Apply suggestions from code review

---------

Co-authored-by: Nicki Skafte Detlefsen <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

(cherry picked from commit 3ed9d4e)
lantiga pushed a commit that referenced this pull request Aug 29, 2025
Fix LR not being correctly set after using LearningRateFinder callback (#21068)

(cherry picked from commit 3ed9d4e)


Development

Successfully merging this pull request may close these issues.

  • LearningRateFinder: Learning Rate is not set correctly with the callback
  • LearningRateFinder not working with CLI optimizers
