Skip to content

Commit dca45db

Browse files
committed
Add reference to DeepSeekMath in accuracy_reward docstring
1 parent 6648832 commit dca45db

File tree

1 file changed

+3
-1
lines changed

1 file changed

+3
-1
lines changed

trl/rewards/accuracy_rewards.py

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,9 @@
2222

2323
def accuracy_reward(completions: list[list[dict[str, str]]], solution: list[str], **kwargs) -> list[float | None]:
2424
r"""
25-
Reward function that checks if the completion matches the ground truth.
25+
Reward function that checks if the completion matches the ground truth. This function was built based on the
26+
descrition in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language
27+
Models](https://arxiv.org/abs/2402.03300)
2628
- If both gold and prediction are parseable → use math verification.
2729
- If gold is not parseable → return `None` to skip the example.
2830

0 commit comments

Comments
 (0)