Description
I am currently using your DDPG implementation for real-life training of an inverted pendulum system. Although you already provide save checkpoints for the 4 networks, I've noticed that they are not enough to resume training. After some research, I believe the 4 networks, their optimizers, and the replay buffer all need to be saved to a file on disk, so that the next time training is resumed, all of them can be loaded and training can continue where it left off.
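For reference, here is a minimal sketch (PyTorch) of the kind of combined checkpoint I have in mind. The attribute names (agent.actor, agent.critic, agent.target_actor, agent.target_critic and their .optimizer members) are only my assumptions about how the agent is structured, not your actual API:

```python
import torch

# Sketch only: attribute names below are assumptions, not the repo's real API.
def save_full_checkpoint(agent, path):
    torch.save({
        'actor': agent.actor.state_dict(),
        'critic': agent.critic.state_dict(),
        'target_actor': agent.target_actor.state_dict(),
        'target_critic': agent.target_critic.state_dict(),
        'actor_optimizer': agent.actor.optimizer.state_dict(),
        'critic_optimizer': agent.critic.optimizer.state_dict(),
    }, path)

def load_full_checkpoint(agent, path):
    checkpoint = torch.load(path)
    agent.actor.load_state_dict(checkpoint['actor'])
    agent.critic.load_state_dict(checkpoint['critic'])
    agent.target_actor.load_state_dict(checkpoint['target_actor'])
    agent.target_critic.load_state_dict(checkpoint['target_critic'])
    agent.actor.optimizer.load_state_dict(checkpoint['actor_optimizer'])
    agent.critic.optimizer.load_state_dict(checkpoint['critic_optimizer'])
```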
In my current situation, I ran 1000 episodes of training and saved/loaded the checkpoints of the 4 networks. When I resumed training by loading the checkpoint, the networks retained their performance, but the agent seemed to relearn from scratch, repeating all of the mistakes it had made much earlier in training.
For example, upon loading the checkpoint the pendulum clearly tried to swing up, yet after a few episodes of resumed training it started banging against the extremities for many episodes, as if it had never learnt those lessons.
For the optimizers, I can use state_dict() to extract the parameters for saving, but it appears to be less straightforward for the replay buffer. Could you add that to your code?
Also, should I save only the most recent batch from the memory (I used a batch size of only 64), or the entire 1e6-transition buffer (state, state_, action, reward, terminal)? A rough sketch of what I mean is below.
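If the answer is to save the full memory, this is roughly what I was imagining, assuming the buffer stores transitions in preallocated numpy arrays. The attribute names (state_memory, new_state_memory, action_memory, reward_memory, terminal_memory, mem_cntr) are guesses on my part:

```python
import numpy as np

# Sketch only: buffer attribute names are my assumptions about the memory class.
def save_buffer(buffer, path):
    np.savez_compressed(
        path,
        state=buffer.state_memory,
        state_=buffer.new_state_memory,
        action=buffer.action_memory,
        reward=buffer.reward_memory,
        terminal=buffer.terminal_memory,
        mem_cntr=buffer.mem_cntr,  # how many slots have actually been filled
    )

def load_buffer(buffer, path):
    data = np.load(path)
    buffer.state_memory = data['state']
    buffer.new_state_memory = data['state_']
    buffer.action_memory = data['action']
    buffer.reward_memory = data['reward']
    buffer.terminal_memory = data['terminal']
    buffer.mem_cntr = int(data['mem_cntr'])
```

My understanding is that np.savez_compressed keeps the file size manageable even for a 1e6-transition buffer, but I am not sure what the best practice is here.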
Do you know if there is any other critical information that needs to be saved/loaded to properly resume training?
This is my first time posting an "Issue" on GitHub, so please pardon me if I am not posting it in the usual way.
Thanks for your explanations on YouTube and for sharing your code here!