Description
Please move:
torch.save(bf.state_dict(), name)
to between the lines:
rewards, average, d, h, loss = run_upside_down(max_episodes=200)
plt.figure(figsize=(15,8))
I just lost the results of a 4-day experimental run due to this error:
Episode: 7000 | Rewards: 32.87 | Mean_100_Rewards: 0.38 | Loss: 0.6333
qt.qpa.screen: QXcbConnection: Could not connect to display :50.0
Could not connect to any X display
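This happened because of a bug in the x2go server that is exposed when the internet connection is interrupted and a plot function is then called. Since the model is only saved after plotting, the failed plot call ended the run before anything was written to disk.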
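For illustration, here is a minimal sketch of the "save first, plot second" ordering requested above. It is not code from the repository: `save_then_plot`, `results`, `path` and `plot_fn` are hypothetical names, and JSON stands in for `torch.save` so the example has no dependencies. The `try/except` only helps when plotting raises a Python exception; a Qt display failure may terminate the process outright, which is why the write has to happen before the plot call.

```python
import json
import os


def save_then_plot(results, path, plot_fn):
    """Write results to disk first, then attempt to plot.

    Returns True if plotting succeeded, False if it raised an exception.
    All names here are hypothetical, chosen for illustration only.
    """
    # Write to a temporary file and rename it into place, so a crash
    # in the middle of the write does not leave a truncated file.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(results, f)
    os.replace(tmp_path, path)

    # Plot only after the results are safely on disk.
    try:
        plot_fn(results)
    except Exception as exc:  # e.g. a backend error on a headless machine
        print(f"Plotting failed, results are safe in {path}: {exc}")
        return False
    return True
```

In the training script, the same idea amounts to calling `torch.save(bf.state_dict(), name)` right after `run_upside_down(...)` returns and before `plt.figure(...)`. For multi-day runs it may also be worth saving periodically during training rather than only at the end.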
PS: I was able to construct a graph (attached) of mean reward as a function of episode number by copying the STDOUT log and parsing out the mean reward. As the log shows, I increased the number of episodes to 7000 for this experiment. At about 200 episodes the reward peaked and then gradually declined to 0. Any idea why this would happen? The "game" I had it play was very simple: track a curve that is the sum of 3 sine waves of varying frequency and amplitude, with a 256 time-step history available to help classify the action.