Description
Hi!
Thank you for sharing the code of your amazing paper.
I managed to run your code with the BabyAI environment in the setup where weights are not shared between the value and policy nets.
I am now trying to run the experiment in the share_weight configuration, but I run into an assertion error:
File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/models.py", line 234, in get_value assert self.share_weights, "Must be sharing weights to use get_value"
AssertionError: Must be sharing weights to use get_value
If I manually set the share_weight parameter of the policy model to True, I run into the following error:
Traceback (most recent call last):
File "/Users/tristankarch/Repo/implementation-matters/src/run.py", line 255, in <module>
main(params)
File "/Users/tristankarch/Repo/implementation-matters/src/run.py", line 124, in main
mean_reward = p.train_step()
File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/agent.py", line 473, in train_step
surr_loss, val_loss = self.take_steps(saps)
File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/agent.py", line 436, in take_steps
surr_loss = self.policy_step(*args).mean()
File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/steps.py", line 289, in ppo_step
store, old_vs=batch_old_vs, opt_step=opt_step)
File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/steps.py", line 193, in value_step
val_opt.zero_grad()
AttributeError: 'NoneType' object has no attribute 'zero_grad'
It looks like the value optimizer is None when value_step is called from inside ppo_step.
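To illustrate what I think is happening, here is a minimal, self-contained sketch (the variable names are my own, not the repo's actual code): if the value optimizer is only constructed when weights are not shared, then any later call to val_opt.zero_grad() fails exactly like the traceback above.

```python
import torch

# Hypothetical reproduction of my suspicion, not the repo's code:
# the value optimizer is only built in the non-shared case.
share_weights = True
val_model = torch.nn.Linear(4, 1)

val_opt = None if share_weights else torch.optim.Adam(val_model.parameters(), lr=1e-3)

def value_step(val_opt):
    val_opt.zero_grad()  # AttributeError: 'NoneType' object has no attribute 'zero_grad'

value_step(val_opt)
```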
Moreover, when I look at the value_step function in weight-sharing mode, ppo_step already computes the value loss on minibatches. However, when value_step is called from ppo_step, it seems to split those minibatches again into further minibatches, compute the value loss on the first split only, and then return.
In other words, value_step appears to process the data in the same way regardless of the value of share_weight. This seems a bit strange, since when share_weight is False, value_step operates on a whole trajectory, whereas when share_weight is True, it receives only a minibatch as input.
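For clarity, here is roughly the behaviour I expected from value_step in the two modes. This is purely an illustrative sketch with made-up names and signatures, not a proposed patch against your code:

```python
import torch

def value_step(data, val_model, val_opt, share_weights, num_minibatches=8):
    """Illustrative sketch only; none of these names match the repo's actual API."""
    states, returns = data  # states: (N, obs_dim), returns: (N,)
    if share_weights:
        # Called from ppo_step on a single minibatch: just return the value loss
        # so the caller can combine it with the policy loss and step the shared optimizer.
        return ((val_model(states).squeeze(-1) - returns) ** 2).mean()
    # Called once per iteration on the whole trajectory: split it into minibatches
    # and take separate optimizer steps on the value network.
    for idx in torch.randperm(states.shape[0]).chunk(num_minibatches):
        val_opt.zero_grad()
        loss = ((val_model(states[idx]).squeeze(-1) - returns[idx]) ** 2).mean()
        loss.backward()
        val_opt.step()
```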
Can you confirm that the weight-sharing option is supported and works as intended?
Thank you again for sharing!