Maybe relevant: #379
The R1 report states that "rule-based rewards [...] in math, code, and logical reasoning domains" were used.
I assume the logical reasoning tasks refer to things like [1][2].
(In [2], it was shown that RL on one particular type of logic puzzle was enough to generalize to math, improving performance on AIME/AMC.)
Plus, RL on formatting-correctness tasks [3] might also have been done.
Perhaps collecting such non-math&code RL tasks should be added to the to-do list?
Also worth noting: the report mentions the usage of logical reasoning tasks only in the section on R1, not in the one on R1-Zero.
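For concreteness, a rule-based reward in this setting is just a programmatic verifier: it parses the model's output and returns a scalar, with no learned reward model involved. Here is a minimal sketch; the `<answer>` tag convention and the ±1/0 scoring scheme are illustrative assumptions on my part, not something taken from the R1 report:

```python
import re

def rule_based_reward(response: str, ground_truth: str) -> float:
    """Toy rule-based reward: a format check followed by an exact-match
    correctness check. No learned reward model is involved."""
    # Format reward: the answer must appear inside <answer>...</answer> tags
    # (the tag convention and the penalty value are illustrative assumptions).
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match is None:
        return -1.0  # malformed output is penalized
    # Correctness reward: exact match against the known solution, which works
    # here because logic puzzles have mechanically verifiable answers.
    answer = match.group(1).strip()
    return 1.0 if answer == ground_truth.strip() else 0.0
```

reasoning-gym [1] already packages many procedurally generated task/verifier pairs of this kind, which is what would make collecting the non-math&code RL tasks tractable.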
[1] reasoning-gym: https://github.com/open-thought/reasoning-gym
[2] Xie, Tian, et al. "Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning." arXiv preprint, 2025.
[3] e.g., the IFEval task in Tülu 3: https://huggingface.co/datasets/allenai/RLVR-IFeval?row=0