1 parent 4a919f8 commit e835c79
ding/rl_utils/ppo.py
@@ -35,7 +35,7 @@ def calculate_kl_div(log_ratio: torch.Tensor, kl_type: str) -> torch.Tensor:
     The implementation is based on John Schulman's blog post "Approximating KL Divergence".
     Reference: http://joschu.net/blog/kl-approx.html
     Arguments:
-        - log_ratio (:obj:`torch.Tensor`): The log-ratio of probabilities, which should be
+        - log_ratio (:obj:`torch.Tensor`): The log-ratio of probabilities, which should be
            log(q/p) = logp_new - logp_pretrained.
        - kl_type (:obj:`str`): The type of KL divergence estimator to use.
            - 'k1': The standard, unbiased but high-variance estimator: `E_q[log(q/p)]`.
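For context, a minimal sketch of what `calculate_kl_div` might look like, following the estimators in Schulman's "Approximating KL Divergence" post. Only `'k1'` and the `log_ratio` convention `log(q/p) = logp_new - logp_pretrained` are stated in the diff above; the `'k2'` and `'k3'` branches are assumptions drawn from the referenced blog post, not the repository's actual implementation.

```python
import torch


def calculate_kl_div(log_ratio: torch.Tensor, kl_type: str) -> torch.Tensor:
    # log_ratio = log(q/p) = logp_new - logp_pretrained, with samples drawn from q.
    # Writing r = p/q, we have log_ratio = -log(r).
    if kl_type == 'k1':
        # Standard, unbiased but high-variance estimator: E_q[log(q/p)].
        return log_ratio
    elif kl_type == 'k2':
        # Low-variance but biased estimator: E_q[(log(q/p))^2 / 2]. (Assumed from the blog post.)
        return 0.5 * log_ratio.pow(2)
    elif kl_type == 'k3':
        # Unbiased, low-variance estimator: E_q[(r - 1) - log(r)]
        # = expm1(-log_ratio) + log_ratio. Always non-negative. (Assumed from the blog post.)
        return torch.expm1(-log_ratio) + log_ratio
    else:
        raise ValueError(f"Unsupported kl_type: {kl_type}")
```

All three estimators have the same expectation only for `'k1'` and `'k3'` (both unbiased for KL[q, p]); `'k3'` trades no bias for much lower variance, which is why it is the usual default in RLHF-style KL penalties.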