Labels: bug
Description
Describe the bug
I have been wrapping a custom game environment for a Dreamer agent. While running in `epsilon_greedy` exploration mode, I noticed that `expl_amount` was not decreasing as expected in `dreamer/agents/dreamer_agent.py`. It appears that `self._itr` is never updated, so the linear decay stays at its initial value.
```python
def exploration(self, action: torch.Tensor) -> torch.Tensor:
    """
    :param action: action to take, shape (1,) (if categorical), or (action dim,) (if continuous)
    :return: action of the same shape passed in, augmented with some noise
    """
    if self._mode in ["train", "sample"]:
        expl_amount = self.train_noise
        if self.expl_decay:  # Linear decay
            expl_amount = expl_amount - self._itr / self.expl_decay
        if self.expl_min:
            expl_amount = max(self.expl_min, expl_amount)
    elif self._mode == "eval":
        expl_amount = self.eval_noise
    else:
        raise NotImplementedError
    if self.expl_type == "additive_gaussian":  # For continuous actions
        noise = torch.randn(*action.shape, device=action.device) * expl_amount
        return torch.clamp(action + noise, -1, 1)
    if self.expl_type == "completely_random":  # For continuous actions
        if expl_amount == 0:
            return action
        else:
            return (
                torch.rand(*action.shape, device=action.device) * 2 - 1
            )  # scale to [-1, 1]
    if self.expl_type == "epsilon_greedy":  # For discrete actions
        action_dim = self.env_model_kwargs["action_shape"][0]
        if np.random.uniform(0, 1) < expl_amount:
            index = torch.randint(
                0, action_dim, action.shape[:-1], device=action.device
            )
            action = torch.zeros_like(action)
            action[..., index] = 1
        return action
    raise NotImplementedError(self.expl_type)
```
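For reference, the linear schedule in the `train`/`sample` branch above can be isolated into a tiny standalone function. The default values here (`train_noise=0.4`, `expl_decay=2000.0`, `expl_min=0.1`) are illustrative, not the repo's actual config; the point is that if `_itr` never changes, the function is always called with the same iteration and epsilon never moves:

```python
def decayed_epsilon(itr, train_noise=0.4, expl_decay=2000.0, expl_min=0.1):
    """Linear decay: epsilon drops by 1/expl_decay per iteration, floored at expl_min."""
    eps = train_noise - itr / expl_decay
    return max(expl_min, eps)
```

With a stuck `itr == 0`, this returns `train_noise` forever, which matches the behavior observed in the bug.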
Proposed fix
To remedy this, I added `self.agent._itr = itr` in `rlpyt/rlpyt/runners/minibatch_rl.py` as shown below:
```python
def train(self):
    """
    Performs startup, then loops by alternating between
    ``sampler.obtain_samples()`` and ``algo.optimize_agent()``, logging
    diagnostics at the specified interval.
    """
    n_itr = self.startup()
    for itr in range(n_itr):
        logger.set_iteration(itr)
        with logger.prefix(f"itr #{itr} "):
            self.agent._itr = itr  # added: keep the agent's iteration counter in sync
            self.agent.sample_mode(itr)  # Might not be this agent sampling.
            samples, traj_infos = self.sampler.obtain_samples(itr)
            self.agent.train_mode(itr)
            opt_info = self.algo.optimize_agent(itr, samples)
            self.store_diagnostics(itr, traj_infos, opt_info)
            if (itr + 1) % self.log_interval_itrs == 0:
                self.log_diagnostics(itr)
    self.shutdown()
```
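An alternative that avoids reaching into the agent's private state from the runner would be to update the counter inside the agent's own `sample_mode`, which the runner already calls with `itr` every iteration. A minimal sketch, where `DreamerAgentPatch` is a hypothetical stand-in for the real agent class, not the repo's code:

```python
class DreamerAgentPatch:
    # Hypothetical stand-in for DreamerAgent, sketching one alternative fix:
    # sync the decay counter whenever the runner switches the agent to sample mode.
    def __init__(self):
        self._itr = 0
        self._mode = None

    def sample_mode(self, itr):
        self._itr = itr  # exploration() reads self._itr for the linear decay
        self._mode = "sample"
```

This keeps the fix local to the agent, at the cost of overriding (and calling through to) the base class's `sample_mode` in the real code.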
Additional context
With this modification, `expl_amount` decreases as expected when running in `epsilon_greedy` mode.
Could you please confirm whether this is the correct way to address the issue? If not, any suggestions or guidance would be greatly appreciated.