Want to know what I'm doing wrong in implementation here #444

dhruvsreenivas · 2022-10-23T15:39:01Z

dhruvsreenivas
Oct 23, 2022

Hi everyone, hope you are doing well! I'm working on a research project with the DeepMind JAX ecosystem (Haiku, Optax), but for some reason, I find that when I train over a dataset, the training loss doesn't go down, as shown in this screenshot.

I'm trying to do something pretty simple: train Random Network Distillation (https://arxiv.org/abs/1810.12894, https://github.com/deepmind/acme/tree/master/acme/agents/jax/rnd) on an offline dataset of D4RL MuJoCo data. I tried a few sanity checks, including training on one random data point for some number of iterations. That loss also doesn't go down: it basically stays at 0.005 for 1000 straight epochs (shown in below screenshots):

Here are some snippets:

RND neural network + trainer code:

class RNDTrainState(NamedTuple):
    params: hk.Params
    target_params: hk.Params
    opt_state: optax.OptState

class MLPRNDModel(hk.Module):
    def __init__(self, cfg):
        super().__init__()
        
        self.encoder = hk.nets.MLP(
            [cfg.hidden_dim, cfg.hidden_dim],
            activation=jax.nn.swish
        )
        self.predictor = RNDPredictor(cfg)
    
    def __call__(self, obs):
        reprs = self.encoder(obs)
        return self.predictor(reprs)
    
class RNDModelTrainer:
    '''RND model trainer.'''
    def __init__(self, cfg):
        self.cfg = cfg
        
        if cfg.task in MUJOCO_ENVS:
            rnd_fn = lambda o: MLPRNDModel(cfg.d4rl)(o)
        else:
            rnd_fn = lambda o: ConvRNDModel(cfg.vd4rl)(o)
        
        self.rnd = hk.without_apply_rng(hk.transform(rnd_fn))
        
        # params
        key = jax.random.PRNGKey(cfg.seed)
        k1, k2 = jax.random.split(key)
        
        rnd_params = self.rnd.init(k1, jnp.zeros((1,) + tuple(cfg.obs_shape)))
        target_params = self.rnd.init(k2, jnp.zeros((1,) + tuple(cfg.obs_shape)))
        
        # optimizer
        self.rnd_opt = optax.adam(cfg.lr)
        rnd_opt_state = self.rnd_opt.init(rnd_params)
        
        self.train_state = RNDTrainState(
            params=rnd_params,
            target_params=target_params,
            opt_state=rnd_opt_state
        )
    
    @functools.partial(jax.jit, static_argnames=('self',))
    def rnd_loss_fn(self, params, target_params, obs):
        output = self.rnd.apply(params, obs)
        target_output = self.rnd.apply(target_params, obs)
        
        # no need to do jax.lax.stop_gradient, as gradient is only taken w.r.t. first param
        return jnp.mean(jnp.square(target_output - output))
    
    @functools.partial(jax.jit, static_argnames=('self',))
    def update(self, obs, step):
        del step
        
        loss_grad_fn = jax.value_and_grad(self.rnd_loss_fn)
        loss, grads = loss_grad_fn(self.train_state.params, self.train_state.target_params, obs)
        
        update, new_opt_state = self.rnd_opt.update(grads, self.train_state.opt_state)
        new_params = optax.apply_updates(self.train_state.params, update)
        
        metrics = {
            'rnd_loss': loss
        }
        
        new_train_state = RNDTrainState(
            params=new_params,
            target_params=self.train_state.target_params,
            opt_state=new_opt_state
        )
        
        return new_train_state, metrics

Training loop code:

def train_rnd(self):
        '''Train RND model offline.'''
        for epoch in trange(1, self.cfg.model_train_epochs + 1):
            epoch_metrics = defaultdict(AverageMeter)
            for batch in self.rnd_dataloader:
                obs, _, _, _, _ = batch
                self.rnd_trainer.train_state, batch_metrics = self.rnd_trainer.update(obs, self.global_step)
                
                for k, v in batch_metrics.items():
                    epoch_metrics[k].update(v, obs.shape[0])
            
            if self.cfg.wandb:
                log_dump = {k: v.value() for k, v in epoch_metrics.items()}
                wandb.log(log_dump)
            
            if self.cfg.save_model and epoch % self.cfg.model_save_every == 0:
                model_path = self.pretrained_rnd_dir / f'rnd_{epoch}.pkl'
                self.rnd_trainer.save(model_path)

def train_one_datapoint(self):
        '''Train on one datapoint for sanity checking. Loss SHOULD converge to 0.'''
        self.rng, subkey = jax.random.split(self.rng)
        rand_datapoint = jax.random.normal(key=subkey, shape=(1,) + tuple(self.cfg.obs_shape), dtype=jnp.float32)
        for epoch in trange(1, self.cfg.model_train_epochs + 1):
            self.rnd_trainer.train_state, metrics = self.rnd_trainer.update(rand_datapoint, self.global_step)
            print(f'metrics for epoch {epoch}:  {metrics["rnd_loss"]}')
            
            if self.cfg.wandb:
                wandb.log(metrics)

where self refers to a workspace with an experiment config cfg where I train and save everything of interest.

As shown, I use the optax.adam optimizer with learning rate 1e-3. This I think is standard (maybe a bit large, but I've swept through a few learning rates both larger and smaller to get the same results).

I'm wondering where I am going wrong in this training approach--I think I have it correct, but there's something that I'm certainly missing that I don't know about. Any help would be greatly appreciated! If you guys have any additional questions, I'll be happy to send you updates either on here or through a video chat. Also let me know if the Optax repo is the right place to send this msg--I don't think this is an issue yet (more on me than on the package) so I'm putting it in the discussions tab.

Regarding package versions, I am using Haiku 0.0.7, Optax 0.1.3, JAX 0.3.16 on CUDA for these experiments. I love the framework by the way!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Want to know what I'm doing wrong in implementation here #444

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 0 comments

Select a reply

Uh oh!

Want to know what I'm doing wrong in implementation here #444

Uh oh!

Uh oh!

dhruvsreenivas Oct 23, 2022

Replies: 0 comments

dhruvsreenivas
Oct 23, 2022