
Issue in Adjusting Exploration Amount in Dreamer Agent #98

@takashi-yamanashi

Description

Describe the bug
I have been wrapping a custom game environment for a Dreamer agent. While running in `epsilon_greedy` mode, I noticed that `expl_amount` was not decreasing as expected in `dreamer/agents/dreamer_agent.py`. The cause appears to be that `self._itr` is never updated, so the linear decay term stays at zero.

    def exploration(self, action: torch.Tensor) -> torch.Tensor:
        """
        :param action: action to take, shape (1,) (if categorical), or (action dim,) (if continuous)
        :return: action of the same shape passed in, augmented with some noise
        """
        if self._mode in ["train", "sample"]:
            expl_amount = self.train_noise
            if self.expl_decay:  # Linear decay
                expl_amount = expl_amount - self._itr / self.expl_decay
            if self.expl_min:
                expl_amount = max(self.expl_min, expl_amount)
        elif self._mode == "eval":
            expl_amount = self.eval_noise
        else:
            raise NotImplementedError
            
        if self.expl_type == "additive_gaussian":  # For continuous actions
            noise = torch.randn(*action.shape, device=action.device) * expl_amount
            return torch.clamp(action + noise, -1, 1)
        if self.expl_type == "completely_random":  # For continuous actions
            if expl_amount == 0:
                return action
            else:
                return (
                    torch.rand(*action.shape, device=action.device) * 2 - 1
                )  # scale to [-1, 1]
        if self.expl_type == "epsilon_greedy":  # For discrete actions
            action_dim = self.env_model_kwargs["action_shape"][0]
            if np.random.uniform(0, 1) < expl_amount:
                index = torch.randint(
                    0, action_dim, action.shape[:-1], device=action.device
                )
                action = torch.zeros_like(action)
                action[..., index] = 1
            return action
        raise NotImplementedError(self.expl_type)
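To illustrate why a stale `self._itr` freezes exploration, here is a minimal standalone sketch of the linear decay computed above; the numeric values for `train_noise`, `expl_decay`, and `expl_min` are made up for the example:

```python
train_noise = 0.4    # initial exploration amount (hypothetical value)
expl_decay = 2000.0  # iterations for the amount to decay by 1.0 (hypothetical)
expl_min = 0.1       # floor on the exploration amount (hypothetical)

def expl_amount_at(itr):
    # Same linear decay as in exploration(): subtract itr / expl_decay,
    # then clamp at expl_min.
    return max(expl_min, train_noise - itr / expl_decay)

# With the bug, self._itr stays at 0, so the amount never moves:
print([expl_amount_at(0) for _ in range(3)])  # stays at 0.4 forever

# Once the iteration counter advances, the amount decays down to the floor:
print([expl_amount_at(itr) for itr in (0, 300, 600, 5000)])
# roughly [0.4, 0.25, 0.1, 0.1]
```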

Proposed fix
To remedy this, I added `self.agent._itr = itr` to the training loop in `rlpyt/rlpyt/runners/minibatch_rl.py`, as shown below:

    def train(self):
        """samples
        Performs startup, then loops by alternating between
        ``sampler.obtain_samples()`` and ``algo.optimize_agent()``, logging
        diagnostics at the specified interval.
        """
        n_itr = self.startup()
        for itr in range(n_itr):
            logger.set_iteration(itr)
            with logger.prefix(f"itr #{itr} "):
                self.agent._itr = itr  # added: keep the agent's iteration counter current
                self.agent.sample_mode(itr)  # Might not be this agent sampling.
                samples, traj_infos = self.sampler.obtain_samples(itr)
                self.agent.train_mode(itr)
                opt_info = self.algo.optimize_agent(itr, samples)
                self.store_diagnostics(itr, traj_infos, opt_info)
                if (itr + 1) % self.log_interval_itrs == 0:
                    self.log_diagnostics(itr)
        self.shutdown()
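Modifying the runner works, but since the loop above already passes `itr` into `self.agent.sample_mode(itr)`, an alternative that keeps the change local to the agent would be to record the iteration inside `sample_mode` itself. A sketch of that idea, using a stub base class because the real rlpyt base class is not shown here (the actual base-class signature should be checked against the repo):

```python
class BaseAgent:
    """Stand-in for the rlpyt agent base class (illustrative only)."""

    def sample_mode(self, itr):
        pass  # the real base class switches the model into sampling mode


class DreamerAgent(BaseAgent):
    def __init__(self):
        self._itr = 0  # iteration counter read by exploration()

    def sample_mode(self, itr):
        # Record the iteration before delegating, so exploration()
        # computes its linear decay from an up-to-date self._itr.
        self._itr = itr
        super().sample_mode(itr)
```

With an override like this, the existing `self.agent.sample_mode(itr)` call in the unmodified runner would be enough to keep `_itr` current.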

Additional context
With this modification, I was able to observe `expl_amount` decreasing as expected while running in `epsilon_greedy` mode.

Could you please confirm if this is the correct way to address this issue? If not, any suggestions or guidance would be greatly appreciated.

Labels: bug