Hey, thanks for the question. I don't think you can run DPO on model+adapter to produce a second adapter. When doing LoRA/QLoRA DPO, the trainer disables the adapters to compute the reference log-probabilities, and this may disable your trained SFT adapter as well. The reference log-probabilities would then come from the original base model instead of base model + SFT adapter, which seems incorrect. Instead, you could merge the SFT adapter into the base model and then run LoRA/QLoRA DPO on that merged base. At the end, you'll still have both adapters.
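The toy example below illustrates the reference-model issue. It assumes, as the answer describes, that the trainer computes reference log-probabilities by disabling all LoRA adapters. This is not TRL or PEFT code. It models a single linear layer as a base weight plus low-rank deltas `B @ A` (LoRA scaling omitted), and the helper names and values are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum a + b."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def effective_weight(base: Matrix, adapters: List[Tuple[Matrix, Matrix]],
                     adapters_enabled: bool) -> Matrix:
    """Weight used in the forward pass: base + B @ A for every LoRA adapter,
    or only the base when adapters are disabled (the assumed reference pass)."""
    w = base
    if adapters_enabled:
        for b, a in adapters:
            w = add(w, matmul(b, a))
    return w


# Hypothetical 2x2 base weight and rank-1 adapters (B is 2x1, A is 1x2).
W = [[1.0, 0.0], [0.0, 1.0]]
sft = ([[1.0], [0.0]], [[0.0, 2.0]])
dpo = ([[0.0], [1.0]], [[0.5, 0.0]])

sft_model = effective_weight(W, [sft], adapters_enabled=True)

# Option 1: stack the SFT and DPO adapters on the unmerged base.
policy_stacked = effective_weight(W, [sft, dpo], adapters_enabled=True)
ref_stacked = effective_weight(W, [sft, dpo], adapters_enabled=False)

# Option 2: merge the SFT adapter into the base, then add only the DPO adapter.
W_merged = add(W, matmul(*sft))
policy_merged = effective_weight(W_merged, [dpo], adapters_enabled=True)
ref_merged = effective_weight(W_merged, [dpo], adapters_enabled=False)

print("SFT model:    ", sft_model)
print("ref (stacked):", ref_stacked)  # the original base, not the SFT model
print("ref (merged): ", ref_merged)   # identical to the SFT model
```

Both options give the same policy weights, but only the merged version uses the SFT model as the DPO reference.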
Hi,
Is it possible to perform DPO training after SFT without merging to create a new base model? I would like to apply SFT to produce a LoRA adapter, followed by a DPO step on the model+adapter to produce a second adapter. Is this possible? Does it make sense?
Thanks!