ActorNetwork - sample_normal method log_probs issue #59

In the following line, the code can break if the value of 'self.max_action' is large enough for 'action' to exceed 1 in magnitude. In that case '1-action.pow(2)+self.reparam_noise' is negative, and the logarithm of a negative number is NaN.

`log_probs -= T.log(1-action.pow(2)+self.reparam_noise)`

https://github.com/philtabor/Youtube-Code-Repository/blob/a6006478809f3c00026b6ce921a2d4a23b4b1df9/ReinforcementLearning/PolicyGradient/SAC/networks.py#L130
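
To make the failure concrete, here is a minimal sketch in plain Python. It assumes that `action` is computed as `tanh(u) * max_action`, where `u` is the raw sample from the policy distribution. That is an assumption about the surrounding code, not something quoted above. The function names and the `eps` default are hypothetical. The second function shows one possible fix, not necessarily the one the author would choose: take the log of the actual derivative of the scaled squashing, `max_action * (1 - tanh(u)^2)`, which stays non-negative for any positive `max_action`.

```python
import math


def log_arg_original(u, max_action, eps=1e-6):
    """Argument of the log as written in the issue: 1 - action^2 + eps.

    Assumes action = tanh(u) * max_action (an assumption about the caller).
    """
    action = math.tanh(u) * max_action
    return 1.0 - action ** 2 + eps


def log_arg_rescaled(u, max_action, eps=1e-6):
    """Possible fix: use d(action)/du = max_action * (1 - tanh(u)^2).

    The squared term uses the unscaled tanh output, which lies in [-1, 1],
    so the result is strictly positive whenever max_action > 0.
    """
    squashed = math.tanh(u)
    return max_action * (1.0 - squashed ** 2) + eps


if __name__ == "__main__":
    for max_action in (1.0, 2.0):
        for u in (0.0, 0.5, 2.0):
            original = log_arg_original(u, max_action)
            rescaled = log_arg_rescaled(u, max_action)
            # A negative argument is where T.log would return NaN.
            print(max_action, u, original < 0, rescaled > 0)
```

With `max_action = 1.0` the original expression stays positive, which is presumably why the problem does not show up in environments whose actions are bounded by 1. With `max_action = 2.0` and `u = 2.0` the original argument is negative, while the rescaled one remains positive.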
