How To Do Stable Diffusion LORA Training By Using Web UI On Different Models - Tested SD 1.5, SD 2.1
Full tutorial link > https://www.youtube.com/watch?v=mfaqqL5yOO4
Our Discord: https://discord.gg/HbqgGaZVmr. This is the ultimate guide to LoRA training. If I have been of assistance to you and you would like to show your support for my work, please consider becoming a patron on 🥰 https://www.patreon.com/SECourses
Playlist of Stable Diffusion Tutorials, #Automatic1111 and Google Colab Guides, DreamBooth, Textual Inversion / Embedding, #LoRA, AI Upscaling, Pix2Pix, Img2Img:
https://www.youtube.com/playlist?list=PL_pbwdIyffsmclLl0O144nQRnezKlNdx3
Welcome to the ultimate beginner's guide to training with #StableDiffusion models using Automatic1111 Web UI. In this video, we will walk you through the entire process of setting up and training a Stable Diffusion model, from installing the LoRA extension to preparing your training set and tuning your training parameters. We'll also cover advanced training options and show you how to generate new images using your trained model. By the end of this video, you'll have a solid understanding of how to use Stable Diffusion to train your own custom models and generate high-quality images.
You should watch these two videos prior to this one if you don't have sufficient knowledge about Stable Diffusion or Automatic1111 Web UI:
1 - Easiest Way to Install & Run Stable Diffusion Web UI on PC by Using Open Source Automatic Installer - https://youtu.be/AZg6vzWHOTA
2 - How to Use SD 2.1 & Custom Models on Google Colab for Training with Dreambooth & Image Generation - https://youtu.be/AZg6vzWHOTA
00:00:00 Introduction speech
00:01:07 How to install the LoRA extension to the Stable Diffusion Web UI
00:02:36 Preparation of training set images by properly sized cropping
00:02:54 How to crop images using Paint.NET, an open-source image editing program
00:05:02 What is Low-Rank Adaptation (LoRA)
00:05:35 Starting preparation for training using the DreamBooth tab - LoRA
00:06:50 Explanation of all training parameters, settings, and options
00:08:27 How many training steps equal one epoch
00:09:09 Save checkpoints frequency
00:09:48 Save a preview of training images after certain steps or epochs
00:10:04 What is batch size in training settings
00:11:56 Where to set LoRA training in SD Web UI
00:13:45 Explanation of Concepts tab in training section of SD Web UI
00:14:00 How to set the path for training images
00:14:28 Classification Dataset Directory
00:15:22 Training prompt - how to set what to teach the model
00:15:55 What is Class and Sample Image Prompt in SD training
00:17:57 What the Image Generation settings are and why we need classification image generation in SD training
00:19:40 Starting the training process
00:21:03 How and why to tune your Class Prompt (generating generic training images)
00:22:39 Why we generate regularization generic images by class prompt
00:23:27 Recap of the setting up process for training parameters, options, and settings
00:29:23 How much GPU, CPU, and RAM the class regularization image generation uses
00:29:57 Training process starts after class image generation completed
00:30:04 Displaying the generated class regularization images folder for SD 2.1
00:30:31 The speed of the training process - how many seconds per iteration on an RTX 3060 GPU
00:31:19 Where LoRA training checkpoints (weights) are saved
00:32:36 Where training preview images are saved and our first training preview image
00:33:10 How to decide when to stop training
00:34:09 How to resume training after training has crashed or you close it down
00:36:49 Lifetime vs. session training steps
00:37:54 After 30 epochs, resembling images start to appear in the preview folder
00:38:19 The command line printed messages are incorrect in some cases
00:39:05 Training step speed, a certain number of seconds per iteration (IT)
00:39:44 How I'm picking a checkpoint to generate a full model .ckpt file
00:40:23 How to generate a full model .ckpt file from a LoRA checkpoint .pt file
00:41:17 Generated/saved file name is incorrect, but it is generated from the correct selected .pt file
00:42:01 Doing inference (generating new images) using the text2img tab with our newly trained and generated model
00:42:47 The results of SD 2.1 Version 768 pixel model after training with the LoRA method and teaching a human face
00:44:38 Setting up the training parameters/options for SD version 1.5 this time
00:48:35 Re-generating class regularization images since SD 1.5 uses 512 pixel resolution
00:49:11 Displaying the generated class regularization images folder for SD 1.5
00:50:16 Training of Stable Diffusion 1.5 using the LoRA methodology and teaching a face has been completed and the results are displayed
00:51:09 The inference (text2img) results with SD 1.5 training
00:51:19 You have to do more inference with LoRA since it has less precision than DreamBooth
00:51:39 How to give more attention/emphasis to certain keywords in the SD Web UI
00:52:51 How to generate more than 100 images
00:54:46 How to check PNG info to see used prompts and settings
00:55:24 How to upscale using AI models
00:56:12 Fixing face image quality, especially eyes, with GFPGAN visibility
00:56:32 How to batch post-process
00:57:00 Where batch-generated images are saved
00:00:00 Greetings everyone. Welcome to the most beginner-friendly guide on how to do training on Stable
00:00:06 Diffusion models by using the Automatic1111 web UI. In this tutorial I will train portrait images of
00:00:12 my brother by using Low-Rank Adaptation, also known as the LoRA training method, on the Stable
00:00:18 Diffusion 2.1 768-pixel model. If you do not have prior knowledge, please watch these two
00:00:25 videos on our channel. On our channel, go to the playlist section, and there you will see we
00:00:32 have a Stable Diffusion DreamBooth playlist. In there, first watch "Easiest Way to Install & Run
00:00:41 Stable Diffusion Web UI on PC". This will teach you how to install the web UI on a PC and how to run it.
00:00:50 Then watch "How to Use Stable Diffusion Version 2.1 and Different Models in the Web UI". This will
00:00:57 teach you how to download and install different models and use them with the web UI. After that,
00:01:03 you are ready to watch this tutorial and follow me. To be able to train with LoRA,
00:01:10 you need to go to the Extensions tab here, install the DreamBooth extension, and
00:01:14 check for updates. If you don't know how to install from Available: first go to the Available tab,
00:01:22 click "Load from" and search for DreamBooth. Since I am currently hiding installed extensions,
00:01:30 it is not showing. But when I disable that filter, you see DreamBooth is already installed, but it has
00:01:36 updates. So I am going to update it. For updating, we just click "Apply and restart
00:01:42 UI" and it updates. Now, when we check for updates, you see we have the latest version. And, as I
00:01:48 said, for installation, go to Available. When I uncheck "Installed", it shows DreamBooth
00:01:54 here; click Install, and that's all. After that, and after you restart the application
00:02:00 (you may need to do a full restart for the DreamBooth tab to appear), you will get this tab. Once
00:02:08 you are here and have version 2.1 selected in the models dropdown, you are ready to follow me. You see,
00:02:18 the current Stable Diffusion checkpoint is 2.1. First of all, before starting our training,
00:02:24 we need to prepare our images. Since I am going to use the 768-pixel version, I need to set my images
00:02:33 to 768 pixels. My images are inside this folder; I have not set their resolution yet. So
00:02:44 first I will show you how you can crop them with a free, open-source program: Paint.NET. Let
00:02:51 me show you Paint.NET. This is Paint.NET, and you can download it from its
00:02:58 official website here. It is an open-source, .NET-based program. Alternatively, you can use this
00:03:04 website, which resizes and crops your images for free, but I prefer Paint.NET. I will show how
00:03:10 to crop one of the images. I am going to drag and drop this image into here and click Open.
00:03:16 Here you see there is a rectangle select tool. I click "Fixed Ratio" and set it to one hundred by
00:03:23 one hundred, like this. Then I select the part of the image I want. I press Ctrl+C to copy and
00:03:30 Ctrl+R to resize. I type something smaller and press Enter. Then I press
00:03:38 Ctrl+V and expand the canvas. Then I press Ctrl+R again to resize, this time to exactly 768
00:03:48 pixels. Then I save it with one hundred percent quality. You can use either PNG or JPG images.
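The same square crop done by hand in Paint.NET can also be computed programmatically. A minimal sketch in pure Python; in practice you would pass the resulting box to an image library (for example Pillow's `Image.crop`, then resize to 768x768):

```python
def center_crop_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Largest centered square of an image as (left, top, right, bottom).

    Feed this box to an image library's crop, then resize the result
    to 768x768 for the SD 2.1 768-pixel model (512x512 for SD 1.x).
    """
    side = min(width, height)          # square side = shorter edge
    left = (width - side) // 2         # center horizontally
    top = (height - side) // 2         # center vertically
    return (left, top, left + side, top + side)

# Example: a 1024x768 landscape photo
print(center_crop_box(1024, 768))  # -> (128, 0, 896, 768)
```

This reproduces the 1:1 "Fixed Ratio" selection from the video, just centered instead of hand-placed.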
00:03:58 Alternatively, you can use Birme.net as well. However, you may not trust this website; that is up
00:04:06 to you. I will select two images from here and upload them. Here I am going to select
00:04:12 768 pixels, like this: 768 pixels. You can also set where you want the cut to be, like this. Then
00:04:23 you need to click "Save as zip". It will download a zip like this. You can open it and then
00:04:31 extract the images into your folder, overwriting the existing files. They will be exactly the right size. Let me
00:04:39 show you: 768 pixels. I will open one with Paint.NET, and you see they are 768 pixels. This is
00:04:48 the way you need to prepare your images. OK, all images are cropped to 768 by 768 pixels.
00:04:56 Now we are ready to do training, so go to our Stable Diffusion web UI. You may wonder what
00:05:05 LoRA is. LoRA is low-rank adaptation for faster text-to-image diffusion fine-tuning. It trains both the
00:05:12 UNET and CLIP. It is faster than DreamBooth, and its checkpoints are much smaller than the full
00:05:18 checkpoints of DreamBooth. When you save a checkpoint with DreamBooth, it generates a full .ckpt file.
00:05:27 However, LoRA generates much smaller files, and when you are done,
00:05:32 you can generate the full checkpoint file from them. To do training we go to the DreamBooth tab here,
00:05:38 and we first need to generate our model. I am going to name it after my brother. I need to select the
00:05:45 source checkpoint. I have selected version 2.1, like this. I am using the EMA version,
00:05:51 and I am not going to tick this; it is not necessary. Just click the Create button.
00:06:01 After you click the Create button, you see it downloads the necessary files from the internet,
00:06:08 like this, so you need to wait for this download. If you are not seeing anything in the web UI,
00:06:14 always check the running command line window to see what is happening, like this.
00:06:22 OK, the model has been generated. You see "checkpoint successfully extracted" to models,
00:06:27 dreambooth, my brother, working. We can also check it in our installation folder. Let's go to the C drive;
00:06:33 I have installed it in the Stable Diffusion web UI folder. Go to models and then... no,
00:06:42 not into Stable-diffusion but into the dreambooth folder, and here you see there is a working directory
00:06:47 inside the "my brother" directory, as you can see. OK, let's return to our interface. Here
00:06:54 you see there is LoRA Weight. This defines what percentage of the LoRA weight should be applied to the
00:07:01 UNET when training or creating a checkpoint, and the same applies to Text Weight. Setting this to 1 may
00:07:09 cause overtraining (over-tuning). However, since we are going to generate our own portrait images,
00:07:19 and we are just teaching one face, this is fine for now. You can tick this
00:07:26 "half model" option. It will enable FP16 precision, which results in a smaller checkpoint with minimal loss
00:07:33 of quality. But we don't need this for LoRA, since the checkpoints are already small.
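To put the size difference in perspective, using the figures stated later in the video (roughly 3 MB per LoRA checkpoint versus about 5 GB for a full SD 2.1 .ckpt), a quick back-of-the-envelope calculation:

```python
# Approximate sizes mentioned in the video, not exact measurements.
full_ckpt_mb = 5 * 1024   # ~5 GB full DreamBooth .ckpt (SD 2.1 base)
lora_ckpt_mb = 3          # ~3 MB LoRA checkpoint file

ratio = full_ckpt_mb / lora_ckpt_mb
print(f"a LoRA checkpoint is ~{ratio:.0f}x smaller")  # well over 1000x
```

This is why saving a checkpoint every few epochs is practical with LoRA but quickly fills a disk with DreamBooth.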
00:07:39 When you tick this, checkpoints will be saved to a subdirectory of the selected checkpoints
00:07:44 folder. So I am going to click "Training Wizard (Person)". OK, let's set up our parameters.
00:07:52 This is really important. How many training steps do we want to do for each image? How many
00:07:59 images do I have? Let me check once again: 16 in total. OK, since I am going to compare
00:08:08 checkpoint quality, I am going to set this very high, because I will terminate the training early
00:08:16 once I decide whether I have trained enough. OK, I am setting max training steps to zero,
00:08:23 and "pause after n epochs" to zero (the amount of time to pause between epochs). By the way, one epoch equals 16
00:08:31 steps, because I have 16 images. And I am not going to set any pause between epochs.
00:08:40 "Use lifetime steps/epochs when saving": let's say you have stopped or paused your training
00:08:47 and later continue it. "Use lifetime" means that it will also count your previous
00:08:55 training steps and epochs. However, if you untick this, it will use only this session's
00:09:02 training steps and epochs when saving. So I am just unticking it. This is really important.
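The epoch arithmetic used throughout this tutorial can be sanity-checked with a couple of lines (a minimal sketch, assuming batch size 1 as in this walkthrough):

```python
num_images = 16
batch_size = 1

# One epoch = one pass over all training images.
steps_per_epoch = num_images // batch_size       # 16 steps per epoch

# Saving a checkpoint every 10 epochs, as set later in the video:
save_every_epochs = 10
steps_between_checkpoints = steps_per_epoch * save_every_epochs

print(steps_per_epoch, steps_between_checkpoints)  # 16 160
```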
00:09:10 Save checkpoint frequency: since I didn't tick the epochs option, it will count by n
00:09:18 steps. Since I have 16 images, if I set this to 16, it will save a checkpoint after each epoch. OK,
00:09:30 if that is confusing, you can just tick this and set it to 10, so it will save a
00:09:35 checkpoint every 10 epochs. In that case, since I have 16 images, that is every 160 training
00:09:47 steps. OK, this is fine. I will also save a preview image after each checkpoint
00:09:53 so that I can decide whether that checkpoint is good or not. I will explain this later in the video,
00:10:01 so don't worry; you will understand it. Batch size: how many images to process
00:10:07 at once per training step. We are going to process one image per training step, and we will
00:10:16 do the same for the number of classifier regularization images to generate at once. If you have more than one GPU,
00:10:24 you can increase the batch size to process images in parallel, I suppose. Learning rate and the
00:10:31 other rates: I am not going to touch them, but you can experiment to obtain better learning or
00:10:38 encoder rates. You can also scale the learning rate, but I am just leaving them at their defaults. OK,
00:10:47 image processing: this is important. Since I am using the 768-pixel version, I am setting it
00:10:55 to 768. However, if you use another version, like version 1.5, you need to
00:11:04 use 512 pixels. This resolution depends on your Stable Diffusion model version and type.
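The resolution rule of thumb from the video can be written down as a small lookup. The keys here are illustrative labels, not official checkpoint names:

```python
# Native training resolution per base model family, as described in the video.
# Key names are illustrative, not actual model identifiers.
NATIVE_RESOLUTION = {
    "sd-1.4": 512,
    "sd-1.5": 512,
    "sd-2.x-512": 512,   # 512-based 2.x checkpoints
    "sd-2.1-768": 768,   # the model used in this tutorial
}

print(NATIVE_RESOLUTION["sd-2.1-768"])  # 768
```

Any model fine-tuned from one of these bases inherits its base's native resolution, which is what you enter in the image processing resolution field.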
00:11:13 I am not going to do any cropping since I have already cropped the images. "Apply horizontal flip" means
00:11:21 the images will be flipped as well, which adds more variation to your images. You can enable
00:11:28 this. Do we have a pretrained VAE name or path? No, we don't; we will use the base model's VAE.
00:11:35 If you watch my previous videos, you will learn what a VAE is and how to set one. OK, concepts list:
00:11:42 I am not going to use a concepts list either, since I am just going to train the model on one
00:11:50 portrait subject. The Advanced tab: OK, this is important. This is where we set
00:11:55 our training methodology. We are going to use the LoRA methodology. "Use 8bit Adam":
00:12:03 enable this to save VRAM. If your graphics card does not have much VRAM, or if you
00:12:10 have hit a "not enough VRAM" error while training, you can enable this. I am going to enable it
00:12:16 for now because I'm not sure how much VRAM training is going to take. You can also set the
00:12:24 mixed precision. You probably want this to be FP16, and if you are using xFormers you definitely want it
00:12:32 to be FP16. If you have watched my previous videos, you know that we are already using
00:12:38 xFormers to speed up our inference and training. So I am going to set this. Memory attention:
00:12:45 I am setting this to xFormers. My graphics card is an RTX 3060 and it supports that. OK, "Don't Cache
00:12:55 Latents": when I hover my mouse over it, a tooltip appears and explains
00:13:03 what the checkbox does. When this box is checked, latents will not be cached. When latents
00:13:10 are not cached, you save a bit of VRAM but train slightly slower. So, for lower VRAM usage,
00:13:17 I am checking this. "Train Text Encoder": enabling this will provide better results and editability
00:13:24 but costs more VRAM. Yes, we are enabling this. I am not changing any of the other default parameters,
00:13:33 and I am not changing these parameters either. So we are done with the Parameters tab. By the way,
00:13:41 I will make this an arbitrarily high number because I will stop the training myself.
00:13:47 OK, Concepts: this is really important. This is the part where we set what to teach. OK,
00:13:56 maximum training steps is minus one, so it will never end. Dataset directory: the path to
00:14:04 the directory with input images. To get the directory of my input images, I right-click
00:14:11 one of the images, click Properties, and here you see it shows the location.
00:14:16 I am copying it like this. Alternatively, you can also click the address bar,
00:14:24 select the entire path and copy it. I am pasting it here. Classification dataset directory: OK,
00:14:32 this is the path to the directory with classification (regularization) images. Let's also set a path
00:14:39 for this to understand what it is. However, you shouldn't put it inside the training images folder, because in that
00:14:47 case it would use, I think, all of the images in all of the folders. So let's create a "brother
00:14:57 classification" folder. Let's enter it and copy the path like this. File words: OK,
00:15:08 we are not going to use any file words in this training because we are not training a hypernetwork or,
00:15:16 let me show you what it was called, an embedding. Therefore, you can just leave this empty. But the
00:15:24 prompts: this is where we need to enter a unique prompt to teach the face, or whatever else
00:15:33 we want to teach the model. So I will give it a unique name referring to my brother. Like this,
00:15:43 you can give any unique name. It should be unique enough that it won't appear in the original training
00:15:51 dataset; to be sure, you can also expand it like this. Class prompt: now this is important.
00:15:57 What am I teaching the model? I am teaching the face of a man. So I will say "face of a man".
00:16:06 OK, like this. Classification image negative prompt: you may give a negative prompt to
00:16:16 generate better quality classification images. These images will be generated to improve your
00:16:23 training success, and they will be generated automatically. So let's enter a good negative
00:16:28 prompt here. OK, I have a decent negative prompt I prepared previously; I used ChatGPT to
00:16:36 expand some of the well-known negative prompts. I will put it in the comment section of the video.
00:16:44 So don't worry, you will be able to copy and paste it. I am copying and pasting it here, and
00:16:51 I will explain what it is as well. So, this class prompt and the classification images: what
00:16:58 are they? Sample image prompt: this is important. Why? Because during the training we want to see
00:17:05 how the training is going, and for that we will generate sample images. The sample images
00:17:12 work like this: first we give the instance prompt so that we will be able to see the face of
00:17:22 the person we are teaching. Then you can append some good keywords here to obtain better
00:17:30 results. But if you just want to see how much the model has learned, you can leave this as only
00:17:37 the instance prompt, so you get a better idea of how well the face has been learned by the
00:17:47 model. Sample prompt template: we don't need a prompt template right now. Sample image
00:17:53 negative prompt: you can just copy and paste the negative prompt here as well. Okay, image generation: when
00:18:00 doing training we will generate, let's say, generic images ("face of a man"
00:18:08 images) to make our training more generalized and improve its success rate. So I will
00:18:20 generate 10 times my input count, like this. And these are the other things that you need to
00:18:30 set for generating these template images, or generic images, we may say. Number of samples
00:18:36 to generate: one. Okay, it looks good. You can just enter 10 times your image count; it's up to you.
00:18:43 And we are ready to do training. But I will also show you one other thing in Settings. In Settings,
00:18:53 when you go to the Training section here, you can reduce VRAM usage by ticking this checkbox. Okay,
00:19:03 I think it will probably reduce your training speed. You can also turn on "pin memory for
00:19:10 data loader" to make training slightly faster, though it increases memory usage. So you may play with these
00:19:15 settings to obtain the best possible training speed. I have 12 GB of VRAM, so I may tick the first one,
00:19:28 but I won't tick the other. Yeah, the others are just fine. Okay, let's click the Train button,
00:19:39 since we are ready. When we click the Train button, it will first generate the generic
00:19:49 "face of a man" images. Why "face of a man"? Because we have entered our
00:19:58 class prompt as "face of a man". And where will they be saved? I think they will
00:20:02 be saved here, in the brother classification folder. So it is starting to generate our generic face images to
00:20:13 add more variety to our dataset. You see, it is generating "face of a man" with the
00:20:20 prompt I have given. You can also improve this prompt by adding other styling keywords,
00:20:29 quality keywords, anything you want. If you watch my previous videos on the channel,
00:20:37 in the playlist, you will understand what I mean. This is actually the same as doing text2img inference
00:20:43 from here to generate images. It is doing exactly that, to improve the variety of our
00:20:57 classification set. You see, they are really bad quality right now,
00:21:04 so maybe we should tune our class prompt.
00:21:12 To do that, I will just cancel the training by clicking the Cancel button. You see: "training
00:21:20 cancelled". And you see these are the images it has generated. I will
00:21:27 modify things: I have deleted all of them, and I will modify the class prompt by adding some keywords. To
00:21:39 decide what to enter, I have moved to the text2img tab and typed: portrait photo of a man,
00:21:46 HDR, 8K, sharp; the kind of keywords that you will find on real photos of people. I have entered my
00:21:56 negative prompt as well, and this is the image that was generated. It is pretty good. So I am
00:22:02 returning to my DreamBooth tab, and here I am changing my class prompt like this,
00:22:10 and now I am clicking Train again. Now it will generate 160 class images for training: basically
00:22:23 the generic images that improve my training quality. Let's see what kind of results we are going to
00:22:30 get. We should get results the same as we got in the text2img tab. Yeah, it's a decent
00:22:37 face photo of a male. Why are we doing this? As I said, to increase the variance and variation. When
00:22:51 you have different styles and different variations of photos, it will prevent overtraining and will
00:23:01 force the model to learn the face of the person that you want to teach. OK, so you see, now we are getting
00:23:09 really decent quality face images of male persons, and that will help our model learn better.
00:23:21 OK, meanwhile the training, I mean the image generation, is going on. Let's
00:23:28 quickly recap. First, we generated our model with a unique name like this, and we selected the
00:23:35 source checkpoint. Your source checkpoint can be any model on the internet that you want to teach;
00:23:42 it will work exactly the same as version 2.1. The only thing that may differ: if your model
00:23:51 is based on Stable Diffusion 1.5 or 1.4, those use 512-pixel images. Therefore, the only
00:24:02 thing that you need to change is the image size. Where was it? Let me show you:
00:24:14 here, the image processing resolution. So, if you use a checkpoint model based on 1.5, 1.4 or a 512-pixel-
00:24:24 based 2.x version, then you need to change this to 512 pixels. But if you are using a model based on the 2.1
00:24:35 version, which has a native 768-pixel resolution, then you keep it at 768. Other than that,
00:24:46 we go to Parameters: training steps per image. I have set this to a very big number because
00:24:51 I will stop the training myself at a certain point. I will show you when I stop it and
00:24:58 how I decide to stop training. And this is important: I will save checkpoints and generate preview images every
00:25:07 10 epochs. One epoch happens when it has processed all of the images
00:25:14 in my training folder. I have 16 images in my training folder, which is here. It will also add,
00:25:23 I think, flipped images, so it will be 32 images; we will see that when training starts.
00:25:29 Currently it is still generating the generic images that I requested, like this. OK, so I will be able
00:25:39 to decide whether the model has learned enough, so that I can stop and start using the model. OK,
00:25:48 so these saved previews and saved checkpoints are really important for seeing the progress of training.
00:25:56 The batch size is, I think, related to how many GPUs you have, or whether you have a very strong GPU
00:26:04 that can process two images in parallel at the same time. If it has enough VRAM, you can
00:26:12 increase this. But if your graphics card can only process one image at a time, then you should
00:26:19 leave both of these at one. I didn't change any of the learning rates or other settings; I left
00:26:28 them at their defaults. I have also applied horizontal flip, which randomly decides to flip images horizontally so
00:26:36 that it adds more variation to the learning dataset. I don't have any VAE or concepts list.
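With horizontal flip enabled, the effective dataset size doubles (16 images become 32, as noted above). A minimal sketch of the idea on a toy "image" represented as rows of pixel values:

```python
def hflip(image):
    """Mirror a toy image (a list of pixel rows) left-to-right."""
    return [row[::-1] for row in image]

# 16 identical toy 2x3 "images" standing in for the training set.
images = [[[1, 2, 3],
           [4, 5, 6]]] * 16

# Flip augmentation: original set plus a mirrored copy of each image.
augmented = images + [hflip(im) for im in images]

print(len(images), len(augmented))   # 16 32
print(hflip([[1, 2, 3]]))            # [[3, 2, 1]]
```

In the extension this happens on the fly during training rather than by materializing extra files; the sketch only illustrates the doubling effect.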
00:26:46 I am using LoRA because this way it will use less VRAM than DreamBooth, and the saved files
00:26:57 will be about a thousand times smaller than DreamBooth's, because DreamBooth generates full-size model
00:27:04 files for checkpoints, whereas this generates minimal files. Then, from those files, once we
00:27:12 are satisfied with the training process, we will generate the full model. I am using 8bit Adam to save
00:27:20 VRAM, I am using mixed precision and xFormers memory attention, and I didn't change any other parameters. In
00:27:27 the Concepts tab, I set my dataset directory and the classification dataset directory; I have
00:27:33 already shown you them. We are not using any file words because we are not doing general concept
00:27:40 training; that is not in the scope of this video. I may make another video on training hypernetworks or
00:27:50 textual embeddings. The instance prompt: this is really important, because this keyword is being
00:28:00 taught to our model. So, when I do inference with the new, tuned model, it will
00:28:09 know that this keyword means pictures of my brother's face. Therefore, this is really important.
00:28:18 This is the generic class prompt; I have already explained that. And these are arbitrary
00:28:26 numbers; actually, this is an arbitrary number I entered. I didn't change the other settings.
00:28:33 These only affect the images generated here, nothing else. So this part is
00:28:43 only important for these images. It will generate 160 images in this folder. You see,
00:28:50 it is also generating text files with the same names, saving the description of the input. You
00:29:00 could also modify these descriptions, but I think they are not very important for LoRA
00:29:04 training; they are important for hypernetworks and especially for textual embeddings. Now I will pause the
00:29:12 video until the image generation is done and the training has started. OK, meanwhile,
00:29:19 the class image generation: so far we are almost at 50 percent. It says there are approximately 20 minutes
00:29:27 remaining. It is using 95 percent of my GPU, almost 9 gigabytes of my
00:29:37 GPU memory, and about 20 percent of the CPU. So these are the values it uses for just the
00:29:45 class image generation; let's see how much it will use for training. The class image
00:29:53 generation speed is 14.58 seconds per iteration. OK, the training process has started after generating all
00:30:04 of the images. Let me show you them. Once you have generated these generic images, you don't have to
00:30:11 generate them again: you can stop and restart training and reuse these base generic images.
00:30:19 However, an error occurred, so my web interface is not updating anymore, unfortunately.
00:30:29 But the training is going on, as you can see. Currently it is running at two
00:30:34 seconds per iteration, and it has done 145 iterations so far.
00:30:47 And let's see how much VRAM: oh, you see, my entire VRAM is almost full. It reports
00:30:56 allocated and reserved amounts, but I am seeing full VRAM usage on my graphics card.
00:31:04 After 10 epochs we are supposed to get our first training output to look at. OK, it says,
00:31:15 you see, "LoRA weights successfully saved" to C, Stable Diffusion web UI (which is my folder),
00:31:20 inside models, lora. So let's go there and check it out: in the Stable Diffusion web UI, in models, in lora.
00:31:33 OK, so you see, this is the checkpoint file it has generated, and it is only three megabytes. So
00:31:40 I could generate a checkpoint file for every single epoch. However, if we were using DreamBooth
00:31:48 instead of LoRA, each checkpoint would be at minimum 4, 5 or 6 gigabytes, depending on
00:31:56 the model you used as the initial checkpoint. For version 2.1, it would be at
00:32:07 least five gigabytes, because the base model is five gigabytes. If we were using DreamBooth
00:32:13 instead of LoRA, every checkpoint would be five gigabytes, but now we are only getting a
00:32:20 three-megabyte checkpoint; it is even smaller than 1/1000 of that. OK, so we should also have got the first
00:32:34 image output of our training. Where is it saved, if you are wondering? I think inside
00:32:42 the DreamBooth folder, in "my brother", in samples. Yes, this is the first sample image it
00:32:50 generated after ten epochs. In this folder, as time passes, we are going to see images that
00:32:59 are increasingly similar to my brother's sample images. Let me show you our training dataset images
00:33:09 once again. So, once we get images as close as possible to our training dataset,
00:33:17 we will generate a full checkpoint model file from that
00:33:27 checkpoint file. Which file? Let me open the C folder once again
00:33:37 to explain better: here, in models, in lora. Once we get a good image, we are
00:33:46 going to take the corresponding file (you see the file name ends with 160) in here, and we will
00:33:54 generate a full model checkpoint from it, and then we will be able to generate images of
00:34:01 the person we trained it for. OK, so now I will pause the video until we get some good results.
-
00:34:10 Oh, by the way, it seems like. Yes, yes, the process has stopped. So therefore I have to
-
00:34:17 continue. Probably an error has occurred. Yeah, an error occurred. So what we need to do is:
-
00:34:26 OK, let me show you how to continue from there. So I am refreshing the web interface. This
-
00:34:34 error may occur from time to time. And in here I go back to DreamBooth. And in here you see,
-
00:34:41 let's refresh. We have my brother as a LoRA model. OK, so let's click load params and
-
00:34:53 see if it will load. I hope it loads. OK, it says loading. I may need to restart the application.
-
00:35:04 Yeah, probably I need to restart the application. You see, when you play with the web UI while
-
00:35:10 training, these kind of errors may occur. So I will now restart the application. OK,
-
00:35:16 after I close the command line, you see a connection error has occurred. So let's go back to our Stable
-
00:35:24 Diffusion web UI folder. Click webui-user.bat. OK, I have restarted the web UI. Now refresh
-
00:35:33 and go to DreamBooth and pick the LoRA model. And let's click load params. It says please specify model
-
00:35:41 to load. So you see, the my brother model is now here. After the restart, after I click load params,
-
00:35:49 it loaded the config, and when I click train, it should continue from where it left off. OK, you see,
-
00:35:57 it is getting there. Concept requires 160 images. It has loaded the same images, so it is not
-
00:36:03 regenerating the classification images. It is loading the weights from where it left off, I think.
-
00:36:12 So the number of examples: 16. Number of batches per epoch: 16. Correct. Number of epochs: one
-
00:36:19 million. As we have set, the total optimization steps are 16 million. OK, everything looks correct.
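This arithmetic is easy to double-check. A quick sketch with the numbers shown on screen (batch size 1, so one step processes one image):

```python
# Check the step arithmetic the trainer prints.
# With batch size 1, batches per epoch = number of instance images.
num_images = 16          # instance images in the training folder
batch_size = 1
max_epochs = 1_000_000   # the effectively-unlimited epoch cap set in the UI

batches_per_epoch = num_images // batch_size
total_steps = batches_per_epoch * max_epochs

print(batches_per_epoch)  # 16
print(total_steps)        # 16000000
```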
-
00:36:28 And, yes, it is now continuing from where it left off. This is great. So if an error occurs,
-
00:36:37 this is how you are going to continue your training. So you can further optimize your
-
00:36:44 model from any checkpoint. And in here, if you see, let me zoom in.
-
00:36:50 You see, OK, I did zoom too much. Training step is 23. This is the current session. And, you see,
-
00:36:59 this is the lifetime session. So this is different from what you are setting in here: use lifetime steps/
-
00:37:06 epochs when saving. We didn't check that. So we are only taking into account the current
-
00:37:13 session steps for saving and previewing the checkpoints. But this is the lifetime. OK,
-
00:37:20 now I will pause the video again. OK, so you see, after the second save it now continues
-
00:37:28 training. So sometimes errors may happen, even though they shouldn't. So if an error happens,
-
00:37:36 just restart the application, just as I have shown, and continue training. So the samples are
-
00:37:44 getting produced. I hope it doesn't take too much time to teach my brother's face to the model.
-
00:37:55 OK, it has been only 30 epochs so far and we already got a somewhat
-
00:38:02 similar picture in the third one. You see, this is the thirtieth epoch, after 480 steps
-
00:38:09 in total. And this is my brother. You see, there is a similarity, as you can see. OK,
-
00:38:18 I have noticed another mistake. You see, the command line interface is displaying that the LoRA
-
00:38:25 weights have been saved to my brother, underscore 160 dot PT. However, it is correctly saving in the
-
00:38:33 folder. So this printed message is incorrect, but the saved file names are correct,
-
00:38:40 as you can see. So this is the thirtieth epoch that has been done. Actually the fortieth:
-
00:38:47 yes, since we have 16 images, when you divide the step count by 16, it is 40 epochs. And these are the
-
00:38:54 images generated so far. You see, they start to resemble more and more as the training continues.
-
00:39:00 OK, it has been over 1423 steps so far. So the training speed is about 1.60 seconds per
-
00:39:14 iteration. So far it is going well. We are getting closer to our target image, as you can see here.
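With the step counter and the per-iteration speed you can estimate progress and elapsed time. A quick check with the numbers from the video:

```python
# Estimate epochs completed and elapsed training time from the step counter.
steps = 1423
sec_per_it = 1.60
num_images = 16  # one epoch = 16 steps at batch size 1

epochs = steps // num_images
elapsed_min = steps * sec_per_it / 60

print(epochs)              # 88 epochs completed so far
print(round(elapsed_min))  # about 38 minutes of pure training time
```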
-
00:39:24 OK, it has been over 5600 steps so far, which makes 350 epochs. And for
-
00:39:37 this tutorial I am now going to cancel the training and I will generate a checkpoint
-
00:39:45 based on what looks to me like the best epoch, which is sample 2400. You can
-
00:39:57 continue to do training until you are satisfied with the results. But these results are just
-
00:40:03 a preview of what it has learned. With a good prompt you can obtain much better photos. And
-
00:40:11 it also depends on your data set quality. If you prepare better images than in this example,
-
00:40:18 you can still obtain better results. I think this is a decent one. And let's generate our
-
00:40:26 model checkpoint. So how are we going to generate our model checkpoint to use later?
-
00:40:32 You see there is a LoRA model and now we are going to generate checkpoint from our 2400.
-
00:40:42 I am entering the model name that I want to give here. Let's say, my
-
00:40:48 brother test one, and I am clicking generate ckpt file. OK, it is generating. You can see that it is
-
00:40:58 loading LoRA from the selected checkpoint from here and applying weight. As you can see here:
-
00:41:07 LoRA weight: What percentage of LoRA weight should be applied to the UNET when training
-
00:41:12 or creating checkpoint. Applying the text weight as well. And then it is saving. However,
-
00:41:19 the saved file name is not correct. You see, it has appended the latest training number; however,
-
00:41:28 it has loaded 2400. So I think there is a simple mistake in the web UI. So where is it saved?
-
00:41:39 It is saved inside the Stable Diffusion installation folder, then models, then the Stable Diffusion
-
00:41:46 folder. OK, I am going to rename this to the name that I want. Let's say LoRA.
-
00:41:54 And you see it also has a YAML file, and it has to have the same name. OK, I renamed it with F2.
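If you prefer to script the rename instead of pressing F2, a minimal sketch that keeps the .ckpt and its matching .yaml in sync (the base names here are placeholders, not the exact files from the video):

```python
import os

def rename_checkpoint_pair(folder: str, old_base: str, new_base: str) -> None:
    """Rename model.ckpt and its matching model.yaml together, since the
    web UI only picks up the config file when the two names match."""
    for ext in (".ckpt", ".yaml"):
        src = os.path.join(folder, old_base + ext)
        dst = os.path.join(folder, new_base + ext)
        if os.path.exists(src):
            os.rename(src, dst)
```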
-
00:42:02 OK, now I can do txt2img and generate new images based on our training. How are we going to do
-
00:42:10 that? First click refresh here, and then my new model has appeared here. It is now loading. You
-
00:42:19 can see the loading from this command window here: Loading config and loading other parameters. OK,
-
00:42:27 the model has been loaded. Now what is the prompt that we are going to use? The prompt
-
00:42:35 we are going to use is the prompt we have given in here, which is my brother face. OK, this is our
-
00:42:42 unique keyword. And then we will append the other keywords that we want. Even though our model has
-
00:42:51 learned the face very well, as soon as we add new keywords to improve and obtain different styles of
-
00:43:01 the learned face, it produces totally different images. Unfortunately, no matter how many times
-
00:43:09 I have tried, all my attempts have failed. It always produces different faces, not the face
-
00:43:17 it has learned. If I only give my prompt instance, yes, it produces the face of my brother. But then
-
00:43:26 what is the purpose of training? Because I am not able to modify it, change the style, or produce
-
00:43:33 different styles. Therefore, now I will do another training with SD version 1.5. And let's see the
-
00:43:41 difference between SD version 2.1 and 1.5 when we are doing face training. Since SD version 1.5
-
00:43:51 requires 512 pixel resolution, I am re-cropping the images, as you can see. I am cropping them again.
-
00:44:03 And I have removed some of the very old images. Actually, I only removed two of them. Okay,
-
00:44:10 this is how I am setting up the images for SD version 1.5 training. Okay, yes,
-
00:44:21 like this. Save as a zip, then open the downloaded file and extract the images into the folder.
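The cropping shown here is done by hand; as a scripted alternative sketch, the arithmetic for a largest centered square crop looks like this (the Pillow lines at the bottom are illustrative and assume a hypothetical photo.jpg):

```python
def center_crop_box(width: int, height: int):
    """Largest centered square crop box as (left, top, right, bottom)."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)

# Example: a 768x512 photo crops to the middle 512x512 square.
print(center_crop_box(768, 512))  # (128, 0, 640, 512)

# Illustrative Pillow usage (requires `pip install Pillow`):
# from PIL import Image
# img = Image.open("photo.jpg")
# img.crop(center_crop_box(*img.size)).resize((512, 512)).save("photo_512.png")
```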
-
00:44:33 Okay, here, like this, I will overwrite and the training set is now ready. Okay, for 1.5,
-
00:44:40 I am first changing the model to 1.5. Then I am going to DreamBooth. And in here we are
-
00:44:49 going to generate a new model. Let's say, okay, brother SD 15, like this. And the
-
00:44:58 source checkpoint will be 1.5 because we are starting a new model. Let's generate the model
-
00:45:06 like this: Okay, it is preparing the model file for training.
-
00:45:12 Actually, everything is the same. Then I am clicking training wizard (person). It will
-
00:45:17 set the parameters. Oh, I think I clicked it too early. Yeah, I didn't wait for the process to finish.
-
00:45:26 Okay, now I will set again. Okay, now it is set and the model is also set. All right, you see,
-
00:45:33 the model has arrived here for training. Okay, I am just doing the same things. By the way, we now
-
00:45:41 need to generate new class images for improving the accuracy. Also, this apply horizontal
-
00:45:53 flip option means that at runtime it will sometimes provide horizontally flipped images during
-
00:46:01 training. Okay, it won't generate new images in the folder; it will do that at runtime. Okay,
-
00:46:09 I am selecting LoRA. I am using 8bit adam with FP16, xFormers. Don't Cache Latents.
-
00:46:25 Okay, I am not changing other things because it is already learning well,
-
00:46:29 but we weren't able to generate good images. I think it was due to
-
00:46:34 the version, SD 2.1. Okay, the path for our VAE goes here. And classification: we now need to make
-
00:46:48 new classification images, so I will just make another folder. Okay, like this. Let's enter here.
-
00:46:58 All right, we are leaving this empty because we are only teaching one face. I will give the same
-
00:47:05 name as the model name to the instance prompt. Now the class prompt. It is important to decide the class
-
00:47:14 prompt well. Now I will do a few tests here. Okay, with a simple prompt, such as face photo of a man,
-
00:47:21 8K HDR, smooth, sharp focus and cinematography, we got decent faces. So this will be our class
-
00:47:32 prompt. Okay, let's go back to our DreamBooth training, and the class prompt will be like this:
-
00:47:39 Classification image, negative prompt. So let's also copy and paste it. By the way, don't worry,
-
00:47:45 I will provide these in the comments. Okay, so the sample image prompt will be the same as before. Okay,
-
00:47:55 and should I provide a negative prompt for the samples? Yes, let's also provide it. Okay,
-
00:48:03 how many do we want this time? How many images do we have in the folder? Let's check it out
-
00:48:12 once again. Okay, we have 14, so I will generate just 140. Okay, I'm not touching
-
00:48:23 this. The parameters are set. Everything is looking good. Okay, let's start another training.
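The 140 follows the same ratio as the first run (16 instance images with 160 class images, now 14 with 140), roughly ten class images per instance image. A one-line sketch of that rule of thumb:

```python
def class_image_count(instance_images: int, per_instance: int = 10) -> int:
    """Rule of thumb used in this tutorial: ~10 class images per instance image."""
    return instance_images * per_instance

print(class_image_count(16))  # 160, as in the SD 2.1 run
print(class_image_count(14))  # 140, as in this SD 1.5 run
```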
-
00:48:35 So you see, since we don't have any concept images now, it is going to generate first our class
-
00:48:43 images, as before. But this time we are using 512 pixel resolution. This is really important
-
00:48:50 because our base model is now version 1.5 and it is using 512 pixel as native resolution.
-
00:49:00 Generating class images is now much faster, you see, because the image dimensions are now smaller.
-
00:49:12 Okay, so the classification training set has been completed, the training has started, and so
-
00:49:20 far we are at the 50th training step. You see, it is much faster now than before because we are simply
-
00:49:29 working with 0.44, which means 44% of the image area compared to before. Because before we were working at
-
00:49:41 768 pixels. Now we are working at 512 pixels. Therefore, it is more than two times faster.
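The 0.44 figure is just the pixel-area ratio between the two square resolutions:

```python
# Pixel-area ratio between 512px and 768px square training resolutions.
area_512 = 512 * 512
area_768 = 768 * 768
ratio = area_512 / area_768

print(round(ratio, 2))      # 0.44, each image carries about 44% of the pixels
print(round(1 / ratio, 2))  # 2.25, hence "more than two times faster"
```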
-
00:49:50 New model training checkpoints are also getting saved under the models
-
00:49:57 LoRA folder. As you can see here with the name that I have given.
-
00:50:01 Also, the new folder under DreamBooth has been generated for the new training in here, brother
-
00:50:09 SD15. And in here we can see the samples it is generating: so far, no resemblance at all.
-
00:50:17 Okay, so the training has been completed. I let it run during the night while I was sleeping, so it
-
00:50:23 generated many checkpoints. And now I am going to use this particular checkpoint to generate our
-
00:50:31 .ckpt file from here, the same as previously. I have selected the model,
-
00:50:41 selected the checkpoint, given a name, and then clicked generate ckpt file. Now
-
00:50:51 I am going to load the newly generated ckpt file. To do that, just click refresh.
-
00:50:58 And it should appear. And yes, it has arrived. It is now loading that checkpoint model.
-
00:51:05 It is done. And now we can do our tests. OK, I have generated over 600 images and some of them
-
00:51:15 are really good and really resemble the face we taught it. So the key thing is that you need to
-
00:51:23 generate more images with LoRA because I think it is not as precise as DreamBooth. The prompt I have
-
00:51:31 used is portrait photo of brother SD 15, which is my prompt instance, with weight 1.2. 1.2 weight
-
00:51:40 means that it will give more importance to this keyword. On the official page of Automatic1111
-
00:51:48 Stable Diffusion web UI wiki, on the features page, you can see attention/emphasis, and it explains
-
00:51:55 how you can give more attention to each word. You can use parentheses like this, or you
-
00:52:02 can directly set the importance like this. So it is totally up to you to use either way. So I have
-
00:52:10 given more importance to the prompt instance and I have also written photo of brother SD 15. And
-
00:52:17 then I have used generic keywords to generate images as close as possible to our prompt instance. You see
-
00:52:26 8K HDR, smooth, sharp focus cinematic. I am going to share all these keywords in the comments of the
-
00:52:34 video, and I have also entered a lengthy negative prompt. I have used Euler a as the sampling method
-
00:52:44 with 25 steps and the native resolution for SD 1.5, 512 pixels. So how can you generate more than 100
-
00:52:55 images? Set the batch count to 100, then go to the bottom. Here you will see the script section.
-
00:53:03 By default it is set to none, but you can go to prompts from file or text box and you can just
-
00:53:10 copy and paste your prompt. So it will read each line and will continue generating images
-
00:53:18 until all of the lines are executed. This way, you can generate many more images. Also,
-
00:53:27 there are other options that you can do here. For example, you can do X and Y plots. So you can give
-
00:53:35 X values and Y values and, if you wonder what they are, separate values for X axis, using commas.
-
00:53:42 You can play with these to, for example, generate different-style images by having, let's say,
-
00:53:52 artist names or style names in your X values, while the Y values would be like your regular
-
00:54:00 prompt. Okay, so I have selected a few of the images and now I will show you how to upscale them. The
-
00:54:09 resemblance rate is not as good as DreamBooth, unfortunately. So you can also do DreamBooth
-
00:54:16 training. The only difference in DreamBooth training compared to LoRA is in the advanced setup.
-
00:54:22 You just don't pick this LoRA option and it will do DreamBooth training. And also be careful:
-
00:54:27 when you are doing DreamBooth training, it will generate 4GB or 5GB files at each checkpoint save.
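To see why the save interval matters for DreamBooth, a rough disk-space estimate, assuming the 5 GB checkpoint size mentioned above and a 350-epoch run like the earlier one:

```python
def checkpoint_disk_gb(total_epochs: int, save_every: int, ckpt_gb: float = 5.0) -> float:
    """Approximate disk space consumed by periodic DreamBooth checkpoint saves."""
    return (total_epochs // save_every) * ckpt_gb

print(checkpoint_disk_gb(350, 10))  # 175.0 GB when saving every 10 epochs
print(checkpoint_disk_gb(350, 50))  # 35.0 GB when saving every 50 epochs
```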
-
00:54:35 So you may want to reduce this by increasing the save checkpoint interval, not just every 10, but maybe every
-
00:54:43 50. It totally depends on your hard drive. Okay, so what I am going to do is first
-
00:54:50 let's check out the PNG Info tab in here. In pictures, in brother selected, let's pick
-
00:54:57 one of them. So you see, the Web UI embeds the parameters as meta information. So if you can
-
00:55:05 get the original image that was generated by the Web UI, you can just use PNG Info to extract the
-
00:55:13 parameters from that image. And one other thing I am going to show you is extras. In extras,
-
00:55:20 let's first try a single image. You can upscale it. Okay, the best upscaling algorithm I have
-
00:55:28 found is R-ESRGAN 4x+. I pretty much like this one. And let's upscale to 3X the dimensions. The
-
00:55:39 first time you do it, it may download something in here. Since I have done it previously,
-
00:55:44 it didn't need to download the necessary models. Okay, the upscaling is done. As you can see,
-
00:55:51 now this is the upscaled version. You can also apply GFPGAN visibility. GFPGAN will
-
00:56:00 improve the face of a human. It is another model. Let's do that and see the difference.
-
00:56:10 Okay, it is getting done. And yes, now you see it is more like a real human. This fixes the
-
00:56:21 eyes, making them much better if they are not aligned, if they are not symmetric.
-
00:56:29 So you may want to apply this as well, if you want.
-
00:56:34 And also you can do batch processing. For batch processing, just open the folder, select all
-
00:56:41 with Ctrl-A, and open. They will be loaded like this. Then click generate; it will apply all of the parameters you
-
00:56:48 set here, and all of the images will be processed as a batch. So the
-
00:56:58 results of the batch generation will be saved in a folder. If you want to open that folder,
-
00:57:03 just click this folder image here. You see open images output directory and the batch processing
-
00:57:08 results will appear here. You can just directly copy them, paste them and do whatever you want.
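If you would rather script this batch processing, the web UI also exposes an HTTP API when it is started with the --api command line flag. This is only a sketch: it builds a request body for the extras endpoint, and the local URL and file name in the comments are assumptions about your setup.

```python
import base64

def build_extras_payload(image_path: str, upscaler: str = "R-ESRGAN 4x+",
                         resize: float = 3.0, gfpgan: float = 1.0) -> dict:
    """Build a request body for the /sdapi/v1/extra-single-image endpoint,
    which is available when the web UI is started with the --api flag."""
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return {
        "image": encoded,            # input image, base64-encoded
        "upscaler_1": upscaler,      # same upscaler as in the Extras tab
        "upscaling_resize": resize,  # 3.0 = the 3X upscale used above
        "gfpgan_visibility": gfpgan, # 1.0 = full GFPGAN face restoration
    }

# Illustrative call (needs the requests package and a running web UI;
# 127.0.0.1:7860 is the default local address, yours may differ):
# import requests
# r = requests.post("http://127.0.0.1:7860/sdapi/v1/extra-single-image",
#                   json=build_extras_payload("brother_001.png"))
# upscaled_b64 = r.json()["image"]
```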
-
00:57:17 Okay, this is all for today. Please ask any questions that you might have.
-
00:57:23 To improve the performance, I suggest you use DreamBooth instead of LoRA training. Also,
-
00:57:31 you can improve your data set. Our data set was not that good. You see,
-
00:57:40 the images were captured at almost the same time, with the same poses. So if you add more variety to your data set,
-
00:57:48 you will more likely obtain better results. And also, please like, share and subscribe to our channel
-
00:57:56 if you have enjoyed it. And if you support us on Patreon, we would appreciate it very much. Currently,
-
00:58:02 so far, we have one patron, as you can see. Thank you very much to our beloved patron, by the way.
-
00:58:08 And I am hoping that you will support us as well. Hopefully, more videos, more advanced videos will
-
00:58:15 come for Stable Diffusion. If there is something else you want to learn about Stable Diffusion, let me
-
00:58:22 know in the comments. And hopefully I will make videos about it. Hopefully, see you in another video.
