So you guys were able to reproduce DeepSeek's distillation results by training Qwen2.5-7B-Math-Instruct for 3 epochs on the 220k dataset. I tried to reproduce a similar result, but with Qwen2.5-1.5B-Instruct. I ran the command from the Training section of the README essentially unchanged, except that I ran it for 3 epochs. I did not get anywhere close to DeepSeek's Qwen-1.5B distillation results. See the graphs below for the performance of my checkpoints over the three epochs.
What am I doing wrong? I wanted to ask you guys before I spend time fixing the wrong thing, in case someone already knows what I need to change.
Should I be filtering out the incorrect responses from the dataset? Or does the config that I copied from the README already do that?
Does this only work with the 7B model?
Does this only work with the Math-Instruct models and not the Instruct models?
Do I just need to keep going? Do smaller models need to train longer? That curve doesn't look like it's going to come up to meet DeepSeek, but it could...
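To make the filtering question concrete, here is a minimal sketch of what dropping incorrect responses before training could look like. The field names (`generations`, `is_correct`) and the toy records are assumptions for illustration only; they are not taken from the actual dataset schema or from the README config.

```python
from typing import Any


def keep_correct_generations(examples: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only generations flagged as correct; drop examples with none left.

    Assumes each example has parallel lists `generations` and `is_correct`
    (hypothetical field names, not the real dataset columns).
    """
    filtered = []
    for ex in examples:
        # Pair each generation with its correctness flag and keep the correct ones.
        kept = [g for g, ok in zip(ex["generations"], ex["is_correct"]) if ok]
        if kept:
            filtered.append({**ex, "generations": kept, "is_correct": [True] * len(kept)})
    return filtered


# Toy records standing in for dataset rows.
examples = [
    {"problem": "1+1", "generations": ["2", "3"], "is_correct": [True, False]},
    {"problem": "2+2", "generations": ["5"], "is_correct": [False]},
]
filtered = keep_correct_generations(examples)
```

With the toy records, the first problem keeps its one correct generation and the second problem is dropped entirely, since none of its generations is flagged correct.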
Thanks for all your hard work!