```python
for self.step in tqdm(range(start_step, self.max_step), ncols=70, initial=start_step):
  if self.step == self.learn_start:
    num_game, self.update_count, ep_reward = 0, 0, 0.
    total_reward, self.total_loss, self.total_q = 0., 0., 0.
    ep_rewards, actions = [], []

  # 1. predict
  action = self.predict(self.history.get())
  # 2. act
  screen, reward, terminal = self.env.act(action, is_training=True)
  # 3. observe
  self.observe(screen, reward, action, terminal)

  if terminal:
    screen, reward, action, terminal = self.env.new_random_game()
    num_game += 1
    ep_rewards.append(ep_reward)
    ep_reward = 0.
```
The `train` function in `agent.py` may not handle game termination properly. When the game terminates, the first screen returned by `new_random_game()` is never added to the history or the replay memory, so `self.history` still holds frames from the finished episode. In the next iteration, `action = self.predict(self.history.get())` therefore predicts from the stale, terminal history rather than from the new game's state.
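One way to address this would be to refill the history with the new game's first screen right after the reset, so the next `predict` call sees the current state. A minimal sketch of one possible fix, assuming the history object exposes an `add()` method and that `self.history_length` is available on the agent (both are assumptions based on the surrounding code, not the repository's confirmed fix):

```python
if terminal:
  screen, reward, action, terminal = self.env.new_random_game()
  num_game += 1
  ep_rewards.append(ep_reward)
  ep_reward = 0.

  # Assumed API: History.add(screen) and self.history_length.
  # Fill the history with the new game's first screen so the next
  # self.predict(self.history.get()) no longer sees the old episode.
  for _ in range(self.history_length):
    self.history.add(screen)
```

Duplicating the initial screen across all history slots mirrors how a fresh episode is typically bootstrapped; the key point is simply that the history must be updated before the next `predict` call.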