
Resume training from checkpoints #20361

@ArkashJ

📚 Documentation

There's a lot of documentation out there about using the `resume_from_checkpoint` keyword argument of the PyTorch Lightning `Trainer` to resume training, but this is outdated. In the latest version of Lightning, one needs to pass the path to the checkpoint (the `.ckpt` file) itself to the trainer's `fit` function to resume training. Here are some popular incorrect references:

  1. https://stackoverflow.com/questions/71961436/pytorch-lightning-resuming-from-checkpoint-with-new-data
  2. https://lightning.ai/forums/t/how-to-resume-training/432
  3. Resume training from checkpoint with new data #12845
  4. https://www.youtube.com/watch?v=V5KGEzIwAxQ

ChatGPT and Claude also got this wrong.

I wanted this to get visibility because knowing how to resume training from checkpoints is imperative and there's a lot of wrong information out there!
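
For anyone landing here, this is a minimal sketch of the pattern described above, assuming Lightning 2.x, where `Trainer.fit` takes the checkpoint path through its `ckpt_path` keyword and the old `resume_from_checkpoint` Trainer argument is no longer accepted. The helper names `latest_checkpoint` and `fit_with_resume` are hypothetical, not part of Lightning, and picking the newest `.ckpt` file by modification time is just one possible convention.

```python
from pathlib import Path
from typing import Any, Optional


def latest_checkpoint(ckpt_dir: str) -> Optional[Path]:
    """Return the most recently modified .ckpt file in ckpt_dir, or None."""
    candidates = sorted(
        Path(ckpt_dir).glob("*.ckpt"), key=lambda p: p.stat().st_mtime
    )
    return candidates[-1] if candidates else None


def fit_with_resume(trainer: Any, model: Any, ckpt_dir: str) -> Optional[Path]:
    """Call trainer.fit, resuming from the newest checkpoint if one exists.

    `trainer` is assumed to be a lightning.pytorch.Trainer (Lightning 2.x),
    whose fit() accepts the checkpoint file via the `ckpt_path` keyword.
    """
    ckpt = latest_checkpoint(ckpt_dir)
    # With ckpt_path=None training starts from scratch; with a path to a
    # .ckpt file the trainer restores the saved training state first.
    trainer.fit(model, ckpt_path=str(ckpt) if ckpt else None)
    return ckpt
```

In a real script this would be called as something like `fit_with_resume(Trainer(max_epochs=10), model, "checkpoints/")`, where `model` is your `LightningModule` and the directory name is only an example. The key point is that the path goes to `fit`, not to the `Trainer` constructor.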

cc @Borda @awaelchli

Labels: checkpointing (Related to checkpointing), docs (Documentation related)
