Commit 6c2ca2f

style(nyz): polish rl_utils style details (ci skip)
1 parent 8f48cb1 commit 6c2ca2f

3 files changed, with 3 additions and 3 deletions.

ding/rl_utils/tests/readme.md renamed to ding/rl_utils/README.md

Lines changed: 1 addition & 1 deletion
@@ -98,6 +98,6 @@ Peak GPU Memory: 2560.27 MB
 To run the tests:
 
 ```bash
-pytest -v -s test_log_prob_fn.py
+pytest -v -s tests/test_log_prob_utils.py
 ```

ding/rl_utils/grpo.py

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ def grpo_policy_error(
 ) -> Tuple[namedtuple, namedtuple]:
     """
     Overview:
-        Group Relative Policy Optimization( arxiv:2402.03300) .
+        Group Relative Policy Optimization (GRPO) algorithm, see https://arxiv.org/abs/2402.03300.
     Arguments:
         - data (:obj:`namedtuple`): the grpo input data with fields shown in ``grpo_policy_data``.
         - clip_ratio (:obj:`float`): the ppo clip ratio for the constraint of policy update, defaults to 0.2.
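
For readers unfamiliar with the algorithm named in this docstring, here is a minimal sketch of the two ideas it refers to: the group-relative advantage from the GRPO paper and the PPO-style clipping controlled by `clip_ratio`. It is not the code in `ding/rl_utils/grpo.py`. The function names `group_relative_advantages` and `clipped_surrogate` are hypothetical, and the use of the population standard deviation and of `eps` are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and std of its own sampling group.

    Assumption: population std is used; a real implementation may use the
    sample std instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_ratio: float = 0.2) -> float:
    """PPO-style clipped objective for one sample (a quantity to maximize).

    `ratio` is pi_new(a|s) / pi_old(a|s) for that sample.
    """
    clipped = max(1.0 - clip_ratio, min(ratio, 1.0 + clip_ratio))
    # Taking the minimum keeps the more pessimistic of the two estimates.
    return min(ratio * advantage, clipped * advantage)
```

With the default `clip_ratio` of 0.2, a probability ratio of 1.5 on a positive advantage is capped at 1.2, so a single update cannot move the policy arbitrarily far from the one that generated the samples.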

ding/rl_utils/rloo.py

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ def rloo_policy_error(
 ) -> Tuple[namedtuple, namedtuple]:
     """
     Overview:
-        REINFORCE Leave-One-Out(arXiv:2402.14740)
+        REINFORCE Leave-One-Out (RLOO) algorithm, see https://arxiv.org/abs/2402.14740.
     Arguments:
         - data (:obj:`namedtuple`): the rloo input data with fields shown in ``rloo_policy_data``.
         - clip_ratio (:obj:`float`): the ppo clip ratio for the constraint of policy update, defaults to 0.2.
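
As a rough illustration of the "leave-one-out" part of the name, the sketch below computes the RLOO baseline: each sample's reward is compared against the mean reward of the other samples drawn for the same prompt. It is not the code in `ding/rl_utils/rloo.py`. The function name `leave_one_out_advantages` is hypothetical, and plain Python lists stand in for whatever tensor type the real function takes.

```python
from typing import List


def leave_one_out_advantages(rewards: List[float]) -> List[float]:
    """Advantage of each sample relative to the mean reward of the others."""
    k = len(rewards)
    if k < 2:
        # With a single sample there are no "other" samples to form a baseline.
        raise ValueError("leave-one-out needs at least two samples per prompt")
    total = sum(rewards)
    # (total - r) / (k - 1) is the mean of the k - 1 rewards excluding r.
    return [r - (total - r) / (k - 1) for r in rewards]
```

Because each baseline excludes the sample it is applied to, it does not depend on that sample's own reward, which is the property the leave-one-out construction is meant to provide.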
