Description
I am currently using your DDPG implementation for real-life training of an inverted pendulum system. Although you already provide save checkpoints for the 4 networks, I've noticed that they are not enough to resume training. After some research, I believe the 4 networks, their optimizers, and the replay buffer all need to be saved to a file on disk, so that the next time training is resumed, all of them can be loaded and training can continue where it left off.
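For reference, here is a minimal sketch (PyTorch) of the kind of combined checkpoint I have in mind. The attribute names (agent.actor, agent.critic, agent.target_actor, agent.target_critic and their .optimizer members) are only my assumptions about how the agent is structured, not your actual API:

```python
import torch

# Sketch only: attribute names below are assumptions, not the repo's real API.
def save_full_checkpoint(agent, path):
    torch.save({
        'actor': agent.actor.state_dict(),
        'critic': agent.critic.state_dict(),
        'target_actor': agent.target_actor.state_dict(),
        'target_critic': agent.target_critic.state_dict(),
        'actor_optimizer': agent.actor.optimizer.state_dict(),
        'critic_optimizer': agent.critic.optimizer.state_dict(),
    }, path)

def load_full_checkpoint(agent, path):
    checkpoint = torch.load(path)
    agent.actor.load_state_dict(checkpoint['actor'])
    agent.critic.load_state_dict(checkpoint['critic'])
    agent.target_actor.load_state_dict(checkpoint['target_actor'])
    agent.target_critic.load_state_dict(checkpoint['target_critic'])
    agent.actor.optimizer.load_state_dict(checkpoint['actor_optimizer'])
    agent.critic.optimizer.load_state_dict(checkpoint['critic_optimizer'])
```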
In my current situation, I ran 1000 episodes of training and saved/loaded the checkpoints of the 4 networks. When I resumed training by loading the checkpoint, the networks retained their performance, but the agent seemed to relearn from scratch, repeating all of the mistakes it had made much earlier in training.
For example, upon loading the checkpoint the pendulum clearly tried to swing up, yet after a few episodes of resumed training it started banging against the extremities for many episodes, as if it had never learnt those lessons.
For the optimizers, I can use state_dict() to extract the parameters for saving, but it appears to be less straightforward for the replay buffer. Could you add that to your code?
Also, should I save only the most recent batch from the memory (I used a batch size of only 64), or the entire 1e6-transition buffer (state, state_, action, reward, terminal)? A rough sketch of what I mean is below.
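If the answer is to save the full memory, this is roughly what I was imagining, assuming the buffer stores transitions in preallocated numpy arrays. The attribute names (state_memory, new_state_memory, action_memory, reward_memory, terminal_memory, mem_cntr) are guesses on my part:

```python
import numpy as np

# Sketch only: buffer attribute names are my assumptions about the memory class.
def save_buffer(buffer, path):
    np.savez_compressed(
        path,
        state=buffer.state_memory,
        state_=buffer.new_state_memory,
        action=buffer.action_memory,
        reward=buffer.reward_memory,
        terminal=buffer.terminal_memory,
        mem_cntr=buffer.mem_cntr,  # how many slots have actually been filled
    )

def load_buffer(buffer, path):
    data = np.load(path)
    buffer.state_memory = data['state']
    buffer.new_state_memory = data['state_']
    buffer.action_memory = data['action']
    buffer.reward_memory = data['reward']
    buffer.terminal_memory = data['terminal']
    buffer.mem_cntr = int(data['mem_cntr'])
```

My understanding is that np.savez_compressed keeps the file size manageable even for a 1e6-transition buffer, but I am not sure what the best practice is here.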
Do you know if there is any other critical information that needs to be saved/loaded to properly resume training?
This is my first time posting an "Issue" on GitHub, so please pardon me if I am not posting it in the usual way.
Thanks for your explanations on YouTube and for sharing your code here!