In dqn/agent.py, at line 59:
if terminal:
    screen, reward, action, terminal = self.env.new_random_game()
This starts a new game when a terminal state is reached. Why don't we need to reset self.history here? Leaving it unchanged affects the next iteration:
# 1. predict
action = self.predict(self.history.get())
# 2. act
screen, reward, terminal = self.env.act(action, is_training=True)
# 3. observe
self.observe(screen, reward, action, terminal)
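For context, here is roughly how I understand the history buffer to work. This is only a minimal sketch under my own assumptions; the actual dqn/history.py may differ in names and shapes.

import numpy as np

class History:
    def __init__(self, history_length=4, screen_height=84, screen_width=84):
        # Holds the last `history_length` screens, oldest first.
        self.history = np.zeros(
            (history_length, screen_height, screen_width), dtype=np.float32)

    def add(self, screen):
        # Drop the oldest frame and append the newest one.
        self.history[:-1] = self.history[1:]
        self.history[-1] = screen

    def get(self):
        # The network's input: the stacked recent frames.
        return self.history

Under this sketch, after new_random_game() the buffer still holds frames from the ended episode until history_length new screens have been added.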
So the action predicted from self.history.get() does not depend on the current game's screens; it is predicted from the screens of the previous game, which has already ended.
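If that reasoning is right, a possible fix would be to refill the history with the new game's first screen on restart, similar to how the history is filled at the start of training. This is only a sketch: I'm assuming env.new_random_game() returns the first screen of the new game, and that the buffer exposes add() and a history_length as in my sketch above.

if terminal:
    screen, reward, action, terminal = self.env.new_random_game()
    # Refill the buffer so the next predict() sees only frames from
    # the current game (assumes self.history_length and History.add()
    # as in the sketch above).
    for _ in range(self.history_length):
        self.history.add(screen)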
Am I missing anything?
Thank you very much.