Reproduction:
- Start a training run for 10 steps with checkpointing every 5 steps
- Quit the run after 6 steps
- Start a new run with `resume_from_checkpoint`
Training runs as expected, but everything logged before step 5 is lost, because the new run starts a fresh wandb session instead of resuming the session of the initial run.
The same wandb session should be continued so that validation videos and other logged data are preserved.
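
One possible way to get this behavior, as a rough sketch rather than a proposal for the actual trainer code: store the wandb run id next to the checkpoint and pass it back to `wandb.init` through its `id` and `resume` arguments when resuming. The file name `wandb_run_id.json` and the helpers `save_run_id` and `wandb_init_kwargs` are hypothetical names made up for this example; how the project actually lays out its checkpoint directory is not something I have checked.

```python
import json
import uuid
from pathlib import Path

# Hypothetical file name, stored inside the checkpoint directory.
RUN_ID_FILE = "wandb_run_id.json"


def save_run_id(checkpoint_dir, run_id):
    """Write the wandb run id next to the checkpoint so a later run can find it."""
    path = Path(checkpoint_dir)
    path.mkdir(parents=True, exist_ok=True)
    (path / RUN_ID_FILE).write_text(json.dumps({"run_id": run_id}))


def wandb_init_kwargs(checkpoint_dir=None):
    """Build the `id`/`resume` arguments for wandb.init.

    If the checkpoint directory holds a stored run id, reuse it and require
    a resume; otherwise generate a fresh id for a new session.
    """
    if checkpoint_dir is not None:
        id_file = Path(checkpoint_dir) / RUN_ID_FILE
        if id_file.is_file():
            run_id = json.loads(id_file.read_text())["run_id"]
            return {"id": run_id, "resume": "must"}
    return {"id": uuid.uuid4().hex[:8], "resume": "allow"}
```

In the training script this would be used roughly as `run = wandb.init(project="my-project", **wandb_init_kwargs(resume_dir))`, followed by `save_run_id(checkpoint_dir, run.id)` whenever a checkpoint is written (the project name and directory variables are placeholders). With `resume="must"`, wandb raises an error if the stored run no longer exists, which seems preferable to silently starting a new session.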