Open
Labels
enhancement (New feature or request)
Description
Resuming from a checkpoint is very important for recovering from crashes and for extending training, but RL pipelines have a lot of moving parts, and we need to decide which of them to restore. At a minimum, we need everything that regular training needs:
- Checkpoint
- Optimizer State
- LR Schedulers (with option to extend)
- Data step
- Seed (if you can use it)
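
To make the list above concrete, here is a minimal sketch of what a resumable training state could look like. It uses only the standard library; `TrainState`, `save_state`, `load_state`, `batches`, and the `total_steps` scheduler field are hypothetical names for illustration, not an existing API in this repo, and plain dicts stand in for real model and optimizer state.

```python
import json
import os
import random
import tempfile
from dataclasses import asdict, dataclass, field


@dataclass
class TrainState:
    """Hypothetical bundle of the state needed to resume regular training."""
    model: dict = field(default_factory=dict)         # stand-in for model weights
    optimizer: dict = field(default_factory=dict)     # stand-in for optimizer state
    lr_scheduler: dict = field(default_factory=dict)  # e.g. {"total_steps": 100}
    data_step: int = 0                                # batches already consumed
    seed: int = 0                                     # seed of the data stream


def save_state(state, path):
    # Write to a temp file, then rename, so a crash mid-write
    # cannot leave a truncated checkpoint at `path`.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(asdict(state), f)
    os.replace(tmp_path, path)


def load_state(path, extend_total_steps=None):
    with open(path) as f:
        state = TrainState(**json.load(f))
    # "Option to extend": override the schedule length when resuming.
    if extend_total_steps is not None:
        state.lr_scheduler["total_steps"] = extend_total_steps
    return state


def batches(seed, start_step, num_steps):
    """Toy deterministic data stream that skips already-consumed batches."""
    rng = random.Random(seed)
    for step in range(num_steps):
        batch = rng.random()  # always drawn, so the RNG stays in sync
        if step >= start_step:
            yield step, batch
```

With the same `seed` and the saved `data_step`, `batches` yields the same data an uninterrupted run would have seen from that step onward. A real data loader would want a cheaper way to fast-forward than replaying every skipped batch.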
On top of this, there are a lot of other things that could be restored in an RL pipeline:
- Replay buffer data
- State of any tools or stateful services
- Any additional models, such as critics or reward models, that get updated
I think we can start with just the first list.
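
For the RL-specific items, one possible shape (a sketch, not a proposal for the final design) is to let each stateful component expose `state_dict()` / `load_state_dict()` and register it by name, so that components can be added to checkpoints incrementally. `ReplayBuffer`, `collect_state`, and `restore_state` below are hypothetical names used only for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Toy replay buffer whose contents and sampling RNG can be checkpointed."""

    def __init__(self, capacity, seed=0):
        self.items = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, item):
        self.items.append(item)

    def sample(self, k):
        return self.rng.sample(list(self.items), k)

    def state_dict(self):
        # Save the RNG state too, so sampling after resume matches
        # what an uninterrupted run would have drawn.
        return {
            "items": list(self.items),
            "capacity": self.items.maxlen,
            "rng": self.rng.getstate(),
        }

    def load_state_dict(self, state):
        self.items = deque(state["items"], maxlen=state["capacity"])
        self.rng.setstate(state["rng"])


def collect_state(components):
    """Gather state from every registered component, keyed by name."""
    return {name: comp.state_dict() for name, comp in components.items()}


def restore_state(components, saved, strict=False):
    """Restore each component found in `saved`; optionally fail on missing ones."""
    for name, comp in components.items():
        if name in saved:
            comp.load_state_dict(saved[name])
        elif strict:
            raise KeyError(f"no saved state for component {name!r}")
```

With `strict=False`, a checkpoint written before a component existed can still be loaded, which fits starting with the regular-training state and adding the RL-specific pieces later.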
allenwang28