Your learning rate is now too low. For your convenience, here are a few things to keep in mind that additionally affect the learning rate's impact:

  1. Network Rank - Larger networks, all else equal, need a lower learning rate to be stable. This relationship seems to hold at scale, e.g. LoRAs usually need learning rates ~10x higher than the original model.
  2. Network Alpha - This is effectively just a scalar on the learning rate, which means any suggested learning rate from anyone else is meaningless unless they also state their alpha and rank. Your chosen learning rate is effectively multiplied by (alpha/rank) to get your "real" learning rate.
  3. Optimizer - Valid learning…

Answer selected by O-J1
mx (Collaborator) · Jul 19, 2024
O-J1 (Collaborator) · Sep 24, 2024
Category: Q&A
7 participants