Using Peft LoRA for better, simpler UNet fine-tuning? #9102
Replies: 4 comments
-
| @AbraarArique Hi, I've recently been fine-tuning a UNet with LoRA. My baseline is very similar to the code you gave above (get_peft_model), but the results are not good. Did you solve this problem, or do you have a better solution? I would really appreciate it if you could answer. | 
-
| @lyb369 What do you mean by "the effect is not good"? I did some UNet LoRA fine-tuning runs using PEFT and they seemed to work fine. Is there a particular problem you're having? Do note that the code example I gave above is just for the UNet, whereas the HF script also allows fine-tuning the text encoders... | 
-
| Thanks for your reply! I use code like this to add LoRA layers to the UNet. The layers are added successfully, but when I freeze the original UNet parameters and perform gradient descent only on the LoRA layer parameters, the LoRA parameters do not change at all.
config = LoraConfig(r=16, target_modules=[...])
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="unet",
)
fake_score = LoraModel(unet, config, adapter_name="default")
My code used for gradient descent is below:
                with misc.ddp_sync(fake_score_ddp, (round_idx == num_accumulation_rounds - 1)):
                    #Denoised fake images (stop generator gradient) under fake score network, using guidance scale: kappa1=cfg_eval_train
                    noise_fake = sid_sd_denoise(unet=fake_score_ddp,images=images,noise=noise,contexts=contexts,timesteps=timesteps,
                                                     noise_scheduler=noise_scheduler,
                                                     text_encoder=text_encoder, tokenizer=tokenizer, 
                                                     resolution=resolution,dtype=dtype,predict_x0=False,guidance_scale=cfg_train_fake)
                    nan_mask = torch.isnan(noise_fake).flatten(start_dim=1).any(dim=1)
                    if noise_scheduler.config.prediction_type == "v_prediction":
                        target = noise_scheduler.get_velocity(images, noise, timesteps)
                        nan_mask = nan_mask | torch.isnan(target).flatten(start_dim=1).any(dim=1)
                    # Check if there are any NaN values present
                    if nan_mask.any():
                        # Invert the nan_mask to get a mask of samples without NaNs
                        non_nan_mask = ~nan_mask
                        # Filter out samples with NaNs from y_real and y_fake
                        noise_fake = noise_fake[non_nan_mask]
                        noise = noise[non_nan_mask]
                        if noise_scheduler.config.prediction_type == "v_prediction":
                            target = target[non_nan_mask]
                    if noise_scheduler.config.prediction_type == "v_prediction":
                        loss = (noise_fake-target)**2
                        snr = compute_snr(noise_scheduler, timesteps)
                        loss = loss * snr/(snr+1)
                    else:
                        loss = (noise_fake-noise)**2
                    loss=loss.sum().mul(loss_scaling / batch_gpu_total)
                    del images   
                    if len(noise) > 0:
                        loss.backward()
Can you or someone else help me find the error? I would really appreciate it! | 
-
| @lyb369 In my code, I used  If so, then do the added LoRA layers have 
params = list(filter(lambda p: p.requires_grad, unet.parameters())) | 
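One way to narrow down a "LoRA parameters do not change" report is to check, on a tiny model, which parameters have requires_grad set, whether a gradient reaches them, and whether they differ after an optimizer step. The sketch below is only an illustration in plain PyTorch: ToyLoRALinear and changed_after_steps are hypothetical names, not PEFT or diffusers APIs, and the loss is arbitrary. It also shows one effect that can be mistaken for frozen weights: with the usual LoRA initialization (B starts at zero), the gradient of A is exactly zero on the first step, so only B moves at first.

```python
import torch
from torch import nn


class ToyLoRALinear(nn.Module):
    """Hypothetical stand-in for a LoRA-wrapped linear layer (not the PEFT class)."""

    def __init__(self, dim: int = 8, r: int = 2):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.lora_A = nn.Linear(dim, r, bias=False)
        self.lora_B = nn.Linear(r, dim, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # LoRA convention: B starts at zero
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the original weights

    def forward(self, x):
        return self.base(x) + self.lora_B(self.lora_A(x))


def changed_after_steps(model, steps=2):
    """Run a few SGD steps and return {parameter_name: changed?}."""
    params = [p for p in model.parameters() if p.requires_grad]
    assert params, "no trainable parameters: check requires_grad on the LoRA layers"
    optimizer = torch.optim.SGD(params, lr=0.1)
    before = {n: p.detach().clone() for n, p in model.named_parameters()}
    x = torch.randn(4, 8)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = model(x).pow(2).mean()  # arbitrary loss, for illustration only
        loss.backward()
        missing = [n for n, p in model.named_parameters()
                   if p.requires_grad and p.grad is None]
        assert not missing, f"no gradient reached: {missing}"
        optimizer.step()
    return {n: not torch.equal(before[n], p.detach())
            for n, p in model.named_parameters()}


torch.manual_seed(0)
report = changed_after_steps(ToyLoRALinear())
for name, changed in report.items():
    print(f"{name}: changed={changed}")
```

If the same checks pass on a toy model but fail in the real training loop, the usual suspects are an optimizer that was built from a different module object than the one used in the forward pass, or a forward pass that runs under torch.no_grad().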
-
I was looking at the Stable Diffusion XL LoRA fine-tuning script:
It seems that, while adding LoRA to the UNet is simple and intuitive enough, saving and loading the models/checkpoints is quite complicated (uses internal methods, implementations, and heuristics).
So I'm wondering if it's possible to use Hugging Face PEFT's standard API to do this instead? Like:
I tested it, and the PEFT-wrapped UNet seems to work fine for inference.
So my question is: will this work for training as well, or will the PEFT wrapper cause any problems or incompatibilities?
If this is indeed a simpler approach, would it be better to use this in the example training script as well? (I can update it)
For reference, here's how it's done now: