How to fine tune a PEFT model in stages with LoRA/QLoRA #2774
Replies: 1 comment 2 replies
-
I would generally use two independent LoRA adapters if I plan on using them independently. Just as an example, if I have one LoRA adapter trained on certain content and another trained on a certain style, there could be use cases where I want only one or the other. Moreover, with separate adapters I can choose different configurations for each; for example, one adapter may require a higher rank than the other to train properly.

If the second LoRA adapter only makes sense on top of the first one, and the same config works for both, I would just continue training the first one. You can still keep the first one as a separate checkpoint just in case. Having a single adapter is easier to handle and slightly more efficient, so unless you really need two separate ones, stick with one.

Another approach would be to train the first adapter, merge it into the base model (with `merge_and_unload`), and then train the second adapter on top of the merged model.
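A minimal sketch of the two options discussed above, assuming a Mistral-7B base model and placeholder adapter paths (both are made up for illustration); the relevant PEFT calls are `PeftModel.from_pretrained(..., is_trainable=True)`, `merge_and_unload`, and `get_peft_model`:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel, LoraConfig, get_peft_model

# Option 1: keep training the stage-1 adapter in stage 2.
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
# is_trainable=True reloads the adapter with gradients enabled,
# so stage-2 training updates the same LoRA weights.
model = PeftModel.from_pretrained(base, "path/to/stage1-adapter", is_trainable=True)
# ... run the stage-2 trainer on `model`, then model.save_pretrained(...)

# Option 2: merge the stage-1 adapter into the base weights,
# then attach a fresh adapter (possibly with a different rank) for stage 2.
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
merged = PeftModel.from_pretrained(base, "path/to/stage1-adapter").merge_and_unload()
stage2_config = LoraConfig(
    r=16,                     # rank can differ from the stage-1 adapter
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(merged, stage2_config)
# ... train stage 2; note that the stage-1 weights are now baked into
# `merged` and can no longer be toggled off at inference time.
```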
-
Hi, I'm thinking of replicating Zephyr 7B from the Hugging Face alignment handbook here: https://github.com/huggingface/alignment-handbook/tree/main/recipes/zephyr-7b-gemma (technical report: https://huggingface.co/papers/2310.16944).
The guide says the training has two stages: supervised fine-tuning (SFT) followed by direct preference optimization (DPO).
I'm wondering, if I do this with LoRA or QLoRA instead of the full fine-tuning they've done, does that mean I train the same adapter in the second stage as in the first? Or should I add a new LoRA adapter at each stage? I don't quite understand which makes more sense conceptually, since I've only done PEFT with single-stage fine-tuning and I can't find any reference material for this case. Would appreciate any explanation!
Thank you very much!