Maybe relevant: #379
The R1 report states that "rule-based rewards [...] in math, code, and logical reasoning domains" were used.
I assume the logical reasoning tasks refer to things like [1][2].
(In [2], it was shown that RL on one particular type of logic puzzle was enough to generalize to math, improving performance on AIME/AMC.)
Plus, RL on formatting-correctness tasks [3] might also have been done.
Perhaps collecting such non-math&code RL tasks should be added to the to-do list?
Also worth noting: the report mentions the usage of logical reasoning tasks only in the section on R1, not in the one on R1-Zero.
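For concreteness, a rule-based reward in this setting is just a programmatic verifier: it parses the model's output and returns a scalar, with no learned reward model involved. Here is a minimal sketch; the `<answer>` tag convention and the ±1/0 scoring scheme are illustrative assumptions on my part, not something taken from the R1 report:

```python
import re

def rule_based_reward(response: str, ground_truth: str) -> float:
    """Toy rule-based reward: a format check followed by an exact-match
    correctness check. No learned reward model is involved."""
    # Format reward: the answer must appear inside <answer>...</answer> tags
    # (the tag convention and the penalty value are illustrative assumptions).
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match is None:
        return -1.0  # malformed output is penalized
    # Correctness reward: exact match against the known solution, which works
    # here because logic puzzles have mechanically verifiable answers.
    answer = match.group(1).strip()
    return 1.0 if answer == ground_truth.strip() else 0.0
```

reasoning-gym [1] already packages many procedurally generated task/verifier pairs of this kind, which is what would make collecting the non-math&code RL tasks tractable.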
[1] reasoning-gym: https://github.com/open-thought/reasoning-gym
[2] Xie, Tian, et al. "Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning." arXiv preprint, 2025.
[3] e.g., the IFEval task in Tülu 3: https://huggingface.co/datasets/allenai/RLVR-IFeval?row=0