Clarifications on Roadmap #356
Unanswered
zyzhang1130 asked this question in Q&A
Replies: 0 comments
Hi, I am very interested in this project and would like to contribute, but first I would like to clarify a few points about the roadmap. In Steps 2 and 3 of the "Plan of attack" section, the reproduction pipeline looks different from the one in the DeepSeek-R1 report. In particular, the report makes no mention of 'RL reasoning data', and I don't understand its purpose, since the RL part is supposed to be data-free (i.e., it relies solely on a rule-based verifier to assign a reward for the outcome).

I would appreciate it if a main contributor could answer my questions. Also, is there a more effective communication channel available (e.g., Discord)? Thanks.