-
Hi HF community. I'm fine-tuning Qwen 2.5 1.5B Instruct using accelerate and SFT, with a validation set whose eval loss is computed every so many steps. The completions are answer-only and never repeated, so memorization-driven over-training within the first epoch should be theoretically impossible if completion-only loss were working.

But that's not what I'm seeing during training. I'm seeing some seriously suspect behavior that suggests completion_only_loss may silently not support what I'm trying to do. For one, the training speed is no different from passing a vanilla conversational modeling dataset with just a 'messages' column. What's more damning: even though the Q&A pairs are shuffled, the loss on the training data collapses (<0.05) within the first 5% of the first epoch. The eval loss makes progress over that same first 5%, but then bounces hard by ~10% as the model over-trains on the training set and worsens on eval.

Given the completions are answer-only, there's not much to memorize other than the EOS tokens and role: assistant, which would be common in the eval set as well. Yet eval loss is way off, with the minimum around 0.25 before bouncing off and getting worse. Am I doing anything obviously wrong? Any insight is much appreciated.

reqs:
train loop:
logs:
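For context, completion-only loss is usually implemented by masking the prompt tokens in the labels with PyTorch's cross-entropy ignore index (-100), so only completion tokens contribute to the loss. One sanity check is to pull a batch from the trainer's data collator and confirm the prompt positions in `labels` are -100 while the answer positions are real token ids. A minimal sketch of the masking itself, with made-up token ids and a hypothetical helper:

```python
# Hypothetical illustration of completion-only masking: prompt tokens are
# replaced with the ignore index so cross-entropy only scores the answer.
# Token ids here are toy values, not from any real tokenizer.

IGNORE_INDEX = -100  # PyTorch CrossEntropyLoss default ignore_index

def mask_prompt_tokens(input_ids, prompt_len):
    """Return labels where the prompt portion is ignored by the loss."""
    return [IGNORE_INDEX] * prompt_len + input_ids[prompt_len:]

# Example: 5 prompt tokens followed by a 3-token answer.
input_ids = [101, 2023, 2003, 1037, 3160, 42, 7, 102]
labels = mask_prompt_tokens(input_ids, prompt_len=5)
print(labels)  # [-100, -100, -100, -100, -100, 42, 7, 102]
```

If the labels coming out of your actual collator have no -100 spans over the prompt, the model is being trained on the full sequence, which would explain training loss collapsing on repeated question text.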
-
Thanks for reporting. Do you mind sharing a sample of your dataset so that we can try to reproduce?
-
I believe the issue is on my end. The training regime for this synthetic dataset works much more like I'd expect. I'll retract this discussion for now.
-
For anyone who finds their way here: it seems Liger might break completion-only loss, per this issue in TRL: #3484
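If that is the cause, a workaround sketch is to turn the Liger kernel off while keeping completion-only loss on. The parameter names below assume a recent TRL/transformers version and may differ in yours; this is a config fragment, not a full training script:

```python
from trl import SFTConfig

# Workaround sketch (assumed parameter names): keep completion-only loss
# but disable the Liger kernel, which the linked issue suggests may
# bypass the completion mask.
config = SFTConfig(
    output_dir="out",
    completion_only_loss=True,  # train on answer tokens only
    use_liger_kernel=False,     # avoid the suspected Liger interaction
)
```

A quick A/B of train/eval loss curves with and without `use_liger_kernel` should confirm whether Liger is the culprit in your setup.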