Training DPO after SFT stage, without adapter merging #2411

vody-am · 2025-03-13T18:29:53Z

vody-am
Mar 13, 2025

Hi,

Is it possible to perform DPO training after SFT without merging the adapter to create a new base model? I would like to apply SFT to produce a LoRA adapter, then run a DPO step on the model+adapter to produce a second adapter. Is this possible? Does it make sense?

Thanks!

NanoCode012 · 2025-03-14T05:11:15Z

NanoCode012
Mar 14, 2025
Maintainer

Hey, thanks for the question.

I don't think you can do DPO on model+adapter to produce a second adapter. When doing LoRA/QLoRA DPO, the trainer disables (ejects) the adapters when calculating the reference log-probabilities, and this may disable your trained SFT adapter as well. The reference probabilities would then be calculated from the original base model instead of base model + SFT adapter, which seems incorrect.
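To illustrate why the choice of reference matters, here is a minimal sketch of the standard per-example DPO loss in plain Python. The log-probability values are made-up numbers for a single hypothetical preference pair, and `dpo_loss` is an illustrative helper, not a function from any training library.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities."""
    # How much more the policy prefers "chosen" over "rejected" than the reference does.
    logits = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * logits)) written as log(1 + exp(-beta * logits)).
    return math.log1p(math.exp(-beta * logits))

# Hypothetical log-probs for one preference pair (made-up values).
base_chosen, base_rejected = -42.0, -40.0  # original base model
sft_chosen, sft_rejected = -30.0, -35.0    # base model + SFT adapter

# At the first DPO step the policy is still base + SFT adapter.
# Intended reference (base + SFT): policy == reference, so the loss is log(2).
loss_ref_sft = dpo_loss(sft_chosen, sft_rejected, sft_chosen, sft_rejected)

# Reference computed with all adapters disabled (bare base model):
# the loss already credits the policy for what SFT learned.
loss_ref_base = dpo_loss(sft_chosen, sft_rejected, base_chosen, base_rejected)

print(loss_ref_sft, loss_ref_base)
```

With the bare base model as reference, the preference margin that SFT already produced is counted as if DPO had produced it, so the starting loss and the implicit KL anchor differ from the intended setup.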

You could merge the SFT adapter into the base model, then do LoRA/QLoRA DPO on that merged base. At the end, you'll still have both adapters.

2 replies

vody-am Mar 14, 2025
Author

OK, thanks. I figured that what I probably need to do is merge the first adapter from SFT into the base model to produce a new base, and then apply DPO on top of that to get a new adapter. Backing up a few steps: ideally, I would like to serve multiple adapters applied to the same base model (e.g. 3 different LoRAs on Llama 3 Instruct) to reduce the number of GPUs I need overall. In an ideal world this would work, but if not, that's fine.

NanoCode012 Mar 14, 2025
Maintainer

I’m aware that you can load multiple adapters onto a model at the same time for inference. Maybe you could look into whether they are applied sequentially? (If so, you could load both the SFT and DPO adapters.)
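As a rough sanity check on the idea of loading both adapters, the toy sketch below uses plain Python lists to show that LoRA updates are additive at the weight level: merging the SFT adapter and then adding a DPO adapter gives the same weights as adding both adapter deltas to the original base. The matrices and the `lora_delta` helper are made up for illustration. The sketch ignores quantization effects (e.g. merging under QLoRA) and says nothing about whether a particular serving stack can keep two adapters active on the same request.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def lora_delta(B, A, alpha, r):
    """Weight update contributed by one LoRA adapter: (alpha / r) * B @ A."""
    scale = alpha / r
    return [[scale * v for v in row] for row in matmul(B, A)]

# Toy 2x2 base weight and two rank-1 adapters (made-up values).
W = [[1.0, 0.0], [0.0, 1.0]]
sft = lora_delta(B=[[0.5], [0.25]], A=[[1.0, 2.0]], alpha=2, r=1)
dpo = lora_delta(B=[[0.125], [0.5]], A=[[2.0, 1.0]], alpha=2, r=1)

# Training-time path: merge SFT into the base, then put the DPO adapter on top.
merged_base = add(W, sft)
merged_then_dpo = add(merged_base, dpo)

# Serving-time path: keep the original base and apply both adapter deltas.
stacked = add(W, add(sft, dpo))

print(merged_then_dpo == stacked)
```

Note that the DPO adapter was trained against the merged base, so in this picture it is only meaningful when the SFT delta is applied alongside it.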
