Welcome to open-r1 Discussions! #36
-
Please share your negative results or failure cases so that others don't waste resources repeating the same mistakes.
-
Is this discussion panel only for DeepSeek R1?
-
I want to know how to improve the generality of reward models. It is important for RL!
-
I am interested in two things:
-
Hi folks, I come with a decent understanding of deep learning and transformer models. I have a desktop with an RTX 3060 and an M1 Max with 64 GB. I just tried out both the 7B and 70B versions of DeepSeek R1 with Ollama. Mind-blowing stuff, and it makes me want to learn as much as possible. How can I contribute?
-
Dear all: I am looking to learn and contribute in the process. There are multiple components here (RL, SFT, data, etc.). If I were to pick one area for contribution, it would be around "Distillation". Do let me know how we can go about this.
-
Could you please add a method to export the model to a format supported by Ollama?
-
Awesome work on this project! I didn't know about your project and was working on my own based on the research paper, so it's really exciting to come across your approach. I especially enjoyed the accuracy- and format-based reward functions; the way you've incorporated strict LaTeX parsing and format validation is a solid way to ensure both accuracy and structure. For my project, I took a slightly different approach to reward calculation using cross entropy. It provides a more nuanced signal by factoring in token-level probabilities, which I found works really well for fine-tuning models where precision is key. It's a nice complement to the binary rewards in your method. Thanks for sharing this, it's always inspiring to see the creative ways others solve similar challenges! ❤️
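For anyone curious what that looks like in practice, here is a minimal sketch contrasting a binary accuracy reward with a cross-entropy-style reward, assuming you already have the policy's per-token log-probabilities on a reference solution; the function names and signatures are illustrative, not part of the open-r1 codebase:

```python
import math

def binary_accuracy_reward(completion_answer: str, gold_answer: str) -> float:
    """Sparse 0/1 reward: did the parsed answer match the gold answer?"""
    return 1.0 if completion_answer.strip() == gold_answer.strip() else 0.0

def cross_entropy_reward(token_logprobs: list[float]) -> float:
    """Dense reward from the policy's log-probabilities on a reference solution.

    Returns the geometric-mean token probability in (0, 1], so near-miss
    generations still receive a graded signal instead of a flat 0.
    """
    if not token_logprobs:
        return 0.0
    avg_nll = -sum(token_logprobs) / len(token_logprobs)  # mean cross entropy (nats)
    return math.exp(-avg_nll)
```

For example, `cross_entropy_reward([-0.1, -0.3, -0.2])` is about 0.82, while the binary reward would give 0.0 for any wrong final answer regardless of how close the reasoning was.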
-
Great project
-
Hi, I am a tech lead from Decerno in Göteborg, Sweden 👋 Just wanted to bring up a question regarding distillation of reasoning data from DeepSeek-R1. Correct me if I am wrong, but looking into the DeepSeek-V3 model license, it seems to me that it would make DeepSeek the owner of open-r1, according to these two lines:
Just wanted to highlight that before any distillation process gets started, and sorry if it ruins any plans.
-
First, I am interested in speeding up GRPO training. Can somebody break down the training time and explain why it takes so long, so that the community can speed it up?
-
Hey there, I'm a hobbyist AI tinkerer, currently experimenting with (quantum) pseudocode cognito-constructs (what I call "short-term brains") and thoughtstreams. My idea: to explore a kind of mind-metamorphosis through these constructs, allowing an AI to iteratively transform its reasoning process.

Example Cognito-Construct (Prototype) (not final, just a rough draft of the concept):

The Core Idea
The goal is to inject different prompts at … But here's where it gets really interesting: construct-metamorphosis, where the AI questions itself, transforms into another cognito-construct, and follows a different reasoning path.

Potential Outcomes
By chaining together:
…we might witness emergent, highly dynamic AI behaviors.

Specific Example: Protocol Cycle
Step 1: Initial Construct
Step 2: Prompt & Reasoning
Prompt: "Find the degree for the given field extension Q(sqrt(2), sqrt(3), sqrt(18)) over Q." Reasoning:
Step 3: Self-Prompt
AI questions itself: "Could there be a more optimal approach using field tower arguments?"
Step 4: Construct-Metamorphosis
New Construct:
Step 5: Second Reasoning Process
Prompt: "Analyze the field tower structure in Q(sqrt(2), sqrt(3)) over Q." New Reasoning Path:

This structured cycle illustrates how an AI system could evolve its reasoning through self-inquiry and transformation. This is just my two cents, but I'd love to hear your thoughts! TODO: initially create constructs from prompts; check the example for correctness.
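For reference, the degree in that example can be checked with the tower argument hinted at in Step 3; here is a short worked version (standard field theory, not tied to any particular construct):

$$\sqrt{18} = 3\sqrt{2} \implies \mathbb{Q}(\sqrt{2}, \sqrt{3}, \sqrt{18}) = \mathbb{Q}(\sqrt{2}, \sqrt{3})$$

and since $\sqrt{3} \notin \mathbb{Q}(\sqrt{2})$,

$$[\mathbb{Q}(\sqrt{2}, \sqrt{3}) : \mathbb{Q}] = [\mathbb{Q}(\sqrt{2}, \sqrt{3}) : \mathbb{Q}(\sqrt{2})]\,[\mathbb{Q}(\sqrt{2}) : \mathbb{Q}] = 2 \cdot 2 = 4.$$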
-
Can anyone help me understand the RL part that is done after the cold start with CoT fine-tuning?
-
👋 Welcome!
We’re using Discussions as a place to connect with other members of our community. We hope that you:
- Ask questions you’re wondering about.
- Share ideas.
- Engage with other community members.
- Welcome others and are open-minded. Remember that this is a community we build together 💪.
To get started, comment below with an introduction of yourself and tell us about what you do with this community.