NaN Issue with SAC Code #7

@SaifAlWahaibi

action = T.tanh(actions)*T.tensor(self.max_action).to(self.device)
log_probs = probabilities.log_prob(actions)
log_probs -= T.log(1-action.pow(2) + self.reparam_noise)  # the argument of the log goes negative here, which produces NaN
log_probs = log_probs.sum(1, keepdim=True)

How can I fix this issue? Are the following modifications correct?

action = T.tanh(actions)*T.tensor(self.max_action).to(self.device)
log_probs = probabilities.log_prob(actions)
log_probs -= T.log(1-T.tanh(actions).pow(2) + self.reparam_noise)
log_probs = log_probs.sum(1, keepdim=True)
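
A possible explanation, offered as a sketch rather than a confirmed diagnosis of this repository: `action` has already been multiplied by `max_action`, so whenever `max_action` is greater than 1, `action.pow(2)` can exceed 1 and the argument of the log turns negative. Applying the correction to the unscaled `T.tanh(actions)`, as in the modification above, keeps the argument within `[reparam_noise, 1 + reparam_noise]`. The version below is self-contained and assumes that `probabilities` is a `Normal` distribution and that `actions` comes from `rsample()`, neither of which is shown in the issue; the names `sample_squashed` and `eps` are hypothetical stand-ins for the method and for `self.reparam_noise`.

```python
import torch
from torch.distributions import Normal


def sample_squashed(mu, sigma, max_action, eps=1e-6):
    """Sample a tanh-squashed, scaled action and its log-probability.

    Hypothetical helper: `eps` plays the role of `self.reparam_noise`.
    """
    dist = Normal(mu, sigma)
    u = dist.rsample()                 # pre-squash sample (reparameterized)
    squashed = torch.tanh(u)           # always within [-1, 1]
    scale = torch.as_tensor(max_action, dtype=mu.dtype, device=mu.device)
    action = squashed * scale          # scaled to the environment's bounds

    log_prob = dist.log_prob(u)
    # Use the unscaled tanh output so that 1 - squashed^2 is never negative.
    log_prob = log_prob - torch.log(1 - squashed.pow(2) + eps)
    log_prob = log_prob.sum(1, keepdim=True)
    return action, log_prob


# Example: a wide distribution and max_action > 1, the case that gave NaN.
torch.manual_seed(0)
mu = torch.zeros(4, 2)
sigma = torch.full((4, 2), 3.0)
action, log_prob = sample_squashed(mu, sigma, max_action=2.0)
```

Strictly speaking, the change of variables for `a = max_action * tanh(u)` also contributes a `log(max_action)` term per action dimension. It is a constant offset that does not depend on the network outputs, so leaving it out, as this sketch and the modification above both do, does not change the gradients.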
