Clarifications on Roadmap #356

zyzhang1130 · 2025-02-18T06:58:58Z


Hi, I am very interested in this project and would like to contribute, but first I would like to clarify a few points about the roadmap:

In Steps 2 and 3 of the "Plan of attack" section of the repo's main README, the reproduction pipeline looks different from the one in the DeepSeek-R1 report. In particular, the report makes no mention of 'RL reasoning data', and I don't understand its purpose, since the RL part is supposed to be data-free (i.e., it relies solely on a rule-based verifier to assign a reward for the outcome).
In Step 3: R1 was trained with two rounds of RL and an SFT stage in between. Why is there only one round of RL after the SFT here?
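
For concreteness, a rule-based outcome reward of the kind referred to above could look roughly like the sketch below. This is only an illustration under stated assumptions: the function names `extract_boxed_answer` and `outcome_reward`, the `\boxed{...}` answer convention, and the exact-match comparison are all hypothetical and are not taken from the repo or from the DeepSeek-R1 report.

```python
import re
from typing import Optional


def extract_boxed_answer(completion: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in the completion, or None.

    Assumption: the model is prompted to put its final answer in \\boxed{}.
    Nested braces are not handled in this sketch.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None


def outcome_reward(completion: str, reference_answer: str) -> float:
    """Hypothetical rule-based reward: 1.0 for an exact match, else 0.0.

    No learned reward model is involved; only the final outcome is checked.
    """
    answer = extract_boxed_answer(completion)
    if answer is None:
        return 0.0  # no parsable final answer
    return 1.0 if answer == reference_answer.strip() else 0.0
```

For example, `outcome_reward(r"so the result is \boxed{42}", "42")` scores the completion purely by comparing the extracted final answer with the reference string.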

I would appreciate it if one of the main contributors could answer my questions. Also, is there a more effective channel for communication (e.g., Discord)? Thanks.

Replies: 0 comments
