Description
Hi!
Thank you for sharing the code of your amazing paper.
I managed to run your code with the BabyAI environment in the setup where weights are not shared between the value and policy nets.
I am now trying to run the experiment in the share_weight configuration, but I run into an assertion error:
File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/models.py", line 234, in get_value assert self.share_weights, "Must be sharing weights to use get_value"
AssertionError: Must be sharing weights to use get_value
If I manually set the share_weight parameter of the policy model to True, I run into the following error:
Traceback (most recent call last):
File "/Users/tristankarch/Repo/implementation-matters/src/run.py", line 255, in <module>
main(params)
File "/Users/tristankarch/Repo/implementation-matters/src/run.py", line 124, in main
mean_reward = p.train_step()
File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/agent.py", line 473, in train_step
surr_loss, val_loss = self.take_steps(saps)
File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/agent.py", line 436, in take_steps
surr_loss = self.policy_step(*args).mean()
File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/steps.py", line 289, in ppo_step
store, old_vs=batch_old_vs, opt_step=opt_step)
File "/Users/tristankarch/Repo/implementation-matters/src/policy_gradients/steps.py", line 193, in value_step
val_opt.zero_grad()
AttributeError: 'NoneType' object has no attribute 'zero_grad'
It looks like the value optimizer is None when value_step is called from inside ppo_step.
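To illustrate what I think is happening, here is a minimal, self-contained sketch (the variable names are my own, not the repo's actual code): if the value optimizer is only constructed when weights are not shared, then any later call to val_opt.zero_grad() fails exactly like the traceback above.

```python
import torch

# Hypothetical reproduction of my suspicion, not the repo's code:
# the value optimizer is only built in the non-shared case.
share_weights = True
val_model = torch.nn.Linear(4, 1)

val_opt = None if share_weights else torch.optim.Adam(val_model.parameters(), lr=1e-3)

def value_step(val_opt):
    val_opt.zero_grad()  # AttributeError: 'NoneType' object has no attribute 'zero_grad'

value_step(val_opt)
```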
Moreover, when I look at the value_step function in weight-sharing mode, ppo_step already computes the value loss on minibatches. However, when value_step is called from ppo_step, it seems to split those minibatches again into further minibatches, compute the value loss on the first split only, and then return.
In other words, value_step appears to process the data in the same way regardless of the value of share_weight. This seems a bit strange, since when share_weight is False, value_step operates on a whole trajectory, whereas when share_weight is True, it receives only a minibatch as input.
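For clarity, here is roughly the behaviour I expected from value_step in the two modes. This is purely an illustrative sketch with made-up names and signatures, not a proposed patch against your code:

```python
import torch

def value_step(data, val_model, val_opt, share_weights, num_minibatches=8):
    """Illustrative sketch only; none of these names match the repo's actual API."""
    states, returns = data  # states: (N, obs_dim), returns: (N,)
    if share_weights:
        # Called from ppo_step on a single minibatch: just return the value loss
        # so the caller can combine it with the policy loss and step the shared optimizer.
        return ((val_model(states).squeeze(-1) - returns) ** 2).mean()
    # Called once per iteration on the whole trajectory: split it into minibatches
    # and take separate optimizer steps on the value network.
    for idx in torch.randperm(states.shape[0]).chunk(num_minibatches):
        val_opt.zero_grad()
        loss = ((val_model(states[idx]).squeeze(-1) - returns[idx]) ** 2).mean()
        loss.backward()
        val_opt.step()
```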
Can you confirm that the weight-sharing option is supported and works as intended?
Thank you again for sharing!