Description
```python
action = T.tanh(actions)*T.tensor(self.max_action).to(self.device)
log_probs = probabilities.log_prob(actions)
# NaN source: `action` is already scaled by max_action, so when max_action > 1,
# 1 - action.pow(2) can be negative and the log returns NaN
log_probs -= T.log(1-action.pow(2) + self.reparam_noise)
log_probs = log_probs.sum(1, keepdim=True)
```
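A minimal reproduction of the NaN, assuming a hypothetical two-dimensional action space with `max_action = 2.0` and a hand-picked pre-squash sample `u` standing in for `actions`. Once `2 * tanh(u)` exceeds 1 in magnitude, the argument of the log is negative:

```python
import torch as T

# Hypothetical setup: 2-D action space with bounds of +/-2
max_action = T.tensor([2.0, 2.0])
reparam_noise = 1e-6

# Hand-picked pre-squash sample (plays the role of `actions` in the issue)
u = T.tensor([[1.5, -0.2]])

# Scaled action, computed as in the original code
action = T.tanh(u) * max_action

# Original correction term: uses the *scaled* action, so action**2 can exceed 1
bad_term = T.log(1 - action.pow(2) + reparam_noise)
print(bad_term)  # first entry is nan because (2 * tanh(1.5))**2 > 1
```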
How can I fix this issue? Are the following modifications correct?
```python
action = T.tanh(actions)*T.tensor(self.max_action).to(self.device)
log_probs = probabilities.log_prob(actions)
# Jacobian term computed from the unscaled tanh output, which lies in (-1, 1)
log_probs -= T.log(1-T.tanh(actions).pow(2) + self.reparam_noise)
log_probs = log_probs.sum(1, keepdim=True)
```
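A self-contained sketch of the same computation, assuming a diagonal Gaussian policy whose mean and standard deviation arrive as `(batch, action_dim)` tensors. The function names `squashed_sample` and `tanh_log_det` and the arguments `mu` and `sigma` are hypothetical, not taken from the original repository. `tanh_log_det` is an optional, algebraically equivalent rewrite of `log(1 - tanh(u)**2)` that avoids the epsilon entirely:

```python
import math

import torch as T
import torch.nn.functional as F
from torch.distributions import Normal


def squashed_sample(mu, sigma, max_action, reparam_noise=1e-6):
    """Sample a tanh-squashed, rescaled action and its log-probability.

    mu, sigma: (batch, action_dim) tensors, e.g. from a policy network.
    max_action: scalar or (action_dim,) tensor of action bounds.
    """
    probabilities = Normal(mu, sigma)
    actions = probabilities.rsample()   # pre-squash sample u (reparameterized)
    squashed = T.tanh(actions)          # lies in (-1, 1)
    action = squashed * max_action      # rescaled to the environment's bounds

    log_probs = probabilities.log_prob(actions)
    # Change-of-variables term for tanh: use the *unscaled* output, so
    # 1 - squashed**2 is in [0, 1] and the log argument stays positive.
    log_probs -= T.log(1 - squashed.pow(2) + reparam_noise)
    log_probs = log_probs.sum(1, keepdim=True)
    return action, log_probs


def tanh_log_det(u):
    """log(1 - tanh(u)**2) via the identity 2 * (log 2 - u - softplus(-2u)).

    Needs no epsilon and stays finite even when tanh(u) rounds to +/-1.
    """
    return 2.0 * (math.log(2.0) - u - F.softplus(-2.0 * u))


T.manual_seed(0)
mu = T.zeros(4, 2)
sigma = T.full((4, 2), 2.0)
max_action = T.tensor([2.0, 2.0])
action, log_probs = squashed_sample(mu, sigma, max_action)
print(action.shape, log_probs.shape)  # torch.Size([4, 2]) torch.Size([4, 1])
```

If preferred, `tanh_log_det(actions)` can replace the `T.log(1 - squashed.pow(2) + reparam_noise)` term. Rescaling by `max_action` would add only a constant `-log(max_action)` per dimension that does not depend on the policy parameters; both snippets in this issue leave it out.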