3 files changed +3 -3 lines changed
@@ -98,6 +98,6 @@ Peak GPU Memory: 2560.27 MB
  To run the tests:

  ```bash
- pytest -v -s test_log_prob_fn.py
+ pytest -v -s tests/test_log_prob_utils.py
  ```

@@ -15,7 +15,7 @@ def grpo_policy_error(
  ) -> Tuple[namedtuple, namedtuple]:
      """
      Overview:
-         Group Relative Policy Optimization(arxiv: 2402.03300).
+         Group Relative Policy Optimization (GRPO) algorithm, see https://arxiv.org/abs/2402.03300.
      Arguments:
          - data (:obj:`namedtuple`): the grpo input data with fields shown in ``grpo_policy_data``.
          - clip_ratio (:obj:`float`): the ppo clip ratio for the constraint of policy update, defaults to 0.2.
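
Note on the docstring above: a minimal sketch of what "group relative" and the PPO clip ratio typically mean, for readers unfamiliar with GRPO. This is not the body of `grpo_policy_error`; the helper names `group_relative_advantages` and `clipped_surrogate`, the use of the population standard deviation, and the `eps` term are assumptions made for illustration only.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean/std of its own sampling group.

    Hypothetical helper: population std and eps are illustrative choices.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(logp_new, logp_old, advantage, clip_ratio=0.2):
    """PPO-style clipped objective for a single sample (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    # Keep the probability ratio inside [1 - clip_ratio, 1 + clip_ratio].
    clipped = min(max(ratio, 1.0 - clip_ratio), 1.0 + clip_ratio)
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)


# Four responses sampled for the same prompt, with scalar rewards.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
objective = clipped_surrogate(math.log(1.5), 0.0, 1.0)
```

With `clip_ratio=0.2`, a ratio of 1.5 on a positive advantage is capped at 1.2, so the update gains nothing from moving the policy further than the clip range allows.
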
@@ -14,7 +14,7 @@ def rloo_policy_error(
  ) -> Tuple[namedtuple, namedtuple]:
      """
      Overview:
-         REINFORCE Leave-One-Out(arXiv: 2402.14740)
+         REINFORCE Leave-One-Out (RLOO) algorithm, see https://arxiv.org/abs/2402.14740.
      Arguments:
          - data (:obj:`namedtuple`): the rloo input data with fields shown in ``rloo_policy_data``.
          - clip_ratio (:obj:`float`): the ppo clip ratio for the constraint of policy update, defaults to 0.2.
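
Note on the docstring above: a minimal sketch of the leave-one-out baseline that gives RLOO its name, assuming scalar rewards for several responses sampled from the same prompt. The helper name `leave_one_out_advantages` is hypothetical and this is not the body of `rloo_policy_error`.

```python
def leave_one_out_advantages(rewards):
    """Subtract from each reward the mean reward of the *other* samples.

    Hypothetical helper for illustration; requires at least two samples.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("leave-one-out needs at least two samples")
    total = sum(rewards)
    # Baseline for sample i is the average of the remaining k - 1 rewards.
    return [r - (total - r) / (k - 1) for r in rewards]


loo = leave_one_out_advantages([1.0, 0.0, 0.0, 1.0])
```

Because each baseline excludes the sample it is applied to, it does not depend on that sample's own reward, which is the usual motivation for the leave-one-out construction.
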