Zero To Hero Stable Diffusion DreamBooth Tutorial By Using Automatic1111 Web UI Ultra Detailed
Full tutorial link > https://www.youtube.com/watch?v=Bdl-jWR3Ukc
Our Discord: https://discord.gg/HbqgGaZVmr. This is the most advanced tutorial on Stable Diffusion DreamBooth training. If I have been of assistance to you and you would like to show your support for my work, please consider becoming a patron: 🥰 https://www.patreon.com/SECourses
Playlist of Stable Diffusion Tutorials, Automatic1111 and Google Colab Guides, DreamBooth, Textual Inversion / Embedding, LoRA, AI Upscaling, Pix2Pix, Img2Img:
https://www.youtube.com/playlist?list=PL_pbwdIyffsmclLl0O144nQRnezKlNdx3
I am explaining, from scratch to a very advanced level, how to use the #Automatic1111 Web UI and d8ahazard's #DreamBooth extension to teach new subjects, e.g. your face, to a model. Moreover, I am showing how to inject your taught face into a completely new model, e.g. Protogen x3.4, to produce awesome-quality images without wasting too much time on finding correct prompts.
Automatic1111
https://github.com/AUTOMATIC1111/stable-diffusion-webui
How to install Web UI: https://youtu.be/AZg6vzWHOTA
How to use different #StableDiffusion models on the Web UI:
Official SD v1-5-pruned : https://bit.ly/sd15ckpt
How To Do LoRA Training: https://youtu.be/mfaqqL5yOO4
Wiki Ram memory: http://bit.ly/3IqFUeW
Rare tokens: https://bit.ly/SDRareTokens
Rare tokens list: https://bit.ly/SDRareTokensList
Basics wiki: http://bit.ly/3Yy78pn
DreamBooth paper
https://arxiv.org/pdf/2208.12242.pdf
Best caption: https://bit.ly/bestcaption2
00:00:00 Introduction to Grand Master yet most beginner friendly Stable Diffusion Dreambooth tutorial by using Automatic1111 Web UI
00:03:11 How to install DreamBooth extension to the Web UI
00:04:09 How to update installed extensions on the Web UI
00:04:35 Introduction to DreamBooth extension tab
00:04:45 Training model generation for DreamBooth
00:05:34 How to download official SD model files
00:06:21 Training model selection and settings tab of the DreamBooth extension
00:07:36 What are training steps per image (epochs)
00:08:24 Checkpoint saving frequency
00:09:15 What is training batch size in DreamBooth training and how to set them properly
00:10:47 Set gradients to none when zeroing
00:11:24 Gradient checkpoint
00:12:04 Image processing and resolution
00:12:39 Horizontal flip and Center crop
00:12:50 What is Sanity sample prompt and how to utilize it to understand overtraining
00:13:30 Best options to set in Advanced tab of DreamBooth extension
00:14:22 Step Ratio of Text Encoder Training
00:14:49 Concepts tab of the DreamBooth extension
00:15:27 How to crop images from any position with Paint.NET or use Birme.net
00:17:22 Setting training dataset directory
00:17:44 What are classification images
00:18:46 What is Instance prompt
00:19:05 How to and why to pick your instance prompt as a very rare word (very crucial)
00:21:52 Class of the subject
00:22:15 Everything about class prompt
00:22:55 Sample prompt
00:23:30 Class images per instance
00:25:00 Number of samples to generate
00:26:27 Teach multiple concepts in 1 run
00:28:24 Saving tab
00:29:10 How to generate checkpoints during training
00:30:52 Generating class images before starting training
00:33:28 What is batch size in txt2img tab
00:36:09 Start training
00:38:25 First samples/previews of training
00:39:13 Sanity prompt sample
00:39:54 How to understand overtraining with sanity samples
00:40:34 How to properly prepare your training dataset images
00:43:15 Checkpoint saving during training
00:44:30 What is the LR displayed in CMD during training
00:45:38 How to continue / resume training if an error occurs or you cancel it
00:46:41 We started overtraining and how we understood it
00:48:24 How to start generating our subject (face) images from best trained checkpoint
00:50:09 What is prompt strength / attention / emphasis and how to increase it
00:51:17 How to increase image quality with negative prompts
00:51:50 How to get your taught subject with correct prompting
00:52:31 What is CFG and why should we increase it
00:52:54 How to try multiple CFG scale values by using X/Y prompting
00:54:54 Analyzing CFG effect
00:56:03 How to test different artist styles with different CFG scales by using X/Y plot
01:00:47 How to use prompt matrix
01:02:54 Prompts from file or text box to test many different prompts
01:03:57 Generate thousands of images while sleeping
01:04:22 PNG info to learn used prompts, CFG, seed and others
01:07:00 Extras tab to upscale images by using AI models with awesome quality
01:09:54 How to improve eyes and face quality by using GFPGAN
01:11:35 How to continue training from any saved ckpt checkpoint
01:12:06 How to upload your trained model to Google Colab to use
01:14:19 How to teach a new subject to your already trained model
01:15:55 How to use filewords for training
01:21:52 What is fine-tuning and how it is done
01:23:10 Hybrid training
01:24:39 How to understand out of memory error
01:25:39 Lowest GPU VRAM settings
01:27:35 How to batch preprocess images
01:31:47 How to generate very correct descriptions by using GIT large model
01:33:19 How to inject your trained subject into any custom / new model
01:37:36 Where is model hash written and how to compare
-
00:00:02 Greetings everyone.
-
00:00:03 Welcome to the most beginner-friendly and yet the most advanced and up-to-date Stable
-
00:00:07 Diffusion DreamBooth model training tutorial.
-
00:00:09 In this guide video, I am going to use the latest Automatic1111 web UI and the DreamBooth
-
00:00:14 extension.
-
00:00:16 The interface and the features of the DreamBooth plugin have been significantly changed, so
-
00:00:20 all other tutorials are now obsolete.
-
00:00:23 I have been experimenting for over 7 days to find the best settings and the training
-
00:00:27 parameters.
-
00:00:28 Moreover, I tried to learn what each option does and I have explained everything in this
-
00:00:33 video.
-
00:00:34 Before starting, let me provide some quick info.
-
00:00:37 Stable Diffusion is a public text-to-image generative AI model, and the Automatic1111 web
-
00:00:42 UI is a tool developed by the open source community to use Stable Diffusion easily.
-
00:00:47 DreamBooth is an AI algorithm that allows you to teach new subjects or even styles to
-
00:00:52 existing Stable Diffusion models very successfully, such as teaching the face of a person.
-
00:00:57 In this tutorial, I am going to use freshly installed Automatic1111 web UI to teach my
-
00:01:02 face by using Stable Diffusion 1.5 official version.
-
00:01:05 I will also show how you can do the same training on Stable Diffusion version 2.1 as well.
-
00:01:11 Moreover, I will show you how you can inject your trained subject, in this case my face,
-
00:01:16 into any custom model and obtain amazing results.
-
00:01:19 I will demonstrate an example by using the very popular and very high-quality custom
-
00:01:24 model Protogen x3.4.
-
00:01:26 With this injection methodology, you can use any newly released custom model and obtain
-
00:01:32 even better results.
-
00:01:33 You won't even need to retrain your subject for this to work.
-
00:01:37 This method provides such high-quality images that you cannot even obtain them on paid services
-
00:01:42 like Lensa or Midjourney.
-
00:01:44 The Automatic1111 web UI is getting constantly updated, so let me show you the version I
-
00:01:49 am using from official repository.
-
00:01:53 This is the official repository of the Stable Diffusion web UI.
-
00:01:57 It has been recently taken down, but it is now back again.
-
00:02:00 So if you can't find this URL, just check the video, and I will update the description
-
00:02:05 and the comments of the video so you will find the latest link of Automatic1111.
-
00:02:11 So the commit we are using was published 9 hours ago, January 7, 2023.
-
00:02:20 If you don't know how to install Automatic1111 web UI, I have a great tutorial for that.
-
00:02:25 So this is the homepage of our YouTube channel.
-
00:02:27 Go to playlists, and in here you will see the Stable Diffusion DreamBooth playlist, and in this
-
00:02:34 playlist, the easiest ways to install and run Stable Diffusion web UI on PC.
-
00:02:38 I will put the link of this video to the description and also you can watch how to use Stable Diffusion
-
00:02:43 version 2.1 and different models in the web UI.
-
00:02:46 This is also very important.
-
00:02:47 I will also put the link of this video to the description as well.
-
00:02:51 One more thing.
-
00:02:52 This is commonly asked.
-
00:02:54 If you encounter any problem, go to the about page of our channel, and in here you will see
-
00:03:00 our Discord channel link.
-
00:03:01 As you can see, I am currently hovering over it.
-
00:03:03 You can join our Discord channel and ask me any questions that you encounter.
-
00:03:08 So this is our starting screen of Stable Diffusion.
-
00:03:11 And first let's start with installing our extension, DreamBooth.
-
00:03:14 To do that, go to the extensions tab, click available, load from, and in here you will see the DreamBooth
-
00:03:21 extension.
-
00:03:23 When you type DreamBooth, it is listed in here.
-
00:03:26 I am just clicking install and it is getting installed.
-
00:03:29 You should see a message here: OK, it has been installed.
-
00:03:35 We have one error, but it is not a problem.
-
00:03:38 It still works.
-
00:03:39 So you see, we have a message in the CMD window, and it is also installed into the C web UI tutorial
-
00:03:45 extensions folder as the DreamBooth extension.
-
00:03:47 Now we have to restart the CMD window because we are installing for the first time and it is
-
00:03:53 a necessity.
-
00:03:54 Otherwise it won't work.
-
00:03:56 Let's close.
-
00:03:58 Let's restart.
-
00:03:59 OK, restart has been completed.
-
00:04:02 Let's just refresh and then go back to extensions and check for updates every time you start.
-
00:04:09 OK, it has just been updated.
-
00:04:12 So I'm just clicking apply and restart UI.
-
00:04:14 OK, it is done.
-
00:04:16 After the first-time installation,
-
00:04:19 you don't need to restart the CMD window once again.
-
00:04:23 So you see, this is how frequently these things are getting updated.
-
00:04:26 Literally, it has been updated just now, as you can see.
-
00:04:30 So you should always check the latest version.
-
00:04:33 Now we can start our tutorial.
-
00:04:35 We now
-
00:04:36 see the DreamBooth tab in the interface.
-
00:04:38 We click that.
-
00:04:39 This is the interface where we are going to generate our model and train our face or
-
00:04:44 new subject.
-
00:04:46 First of all, we need to generate our model.
-
00:04:51 You can simply enter any name here.
-
00:04:53 It doesn't matter.
-
00:04:54 So I will enter it as web UI, and the identifier prompt of my model will be ohwx.
-
00:05:02 I will explain why it will be ohwx.
-
00:05:05 Then we need to check the source checkpoint.
-
00:05:08 You can also import from Hugging Face, but I don't suggest that; it is not necessary.
-
00:05:12 I am checking version 1.5 Pruned ckpt.
-
00:05:17 The version 1.5 pruned ckpt is available in the official repository of Stable Diffusion 1.5.
-
00:05:24 You can just download it from here.
-
00:05:26 Why are we using the pruned ckpt, not the pruned-emaonly ckpt? Because this is better for training
-
00:05:33 new subjects.
-
00:05:34 When you click here, you can just download it by clicking here.
-
00:05:39 And after you put that into your model folder, it will be also available here, as you can
-
00:05:44 see.
-
00:05:46 OK.
-
00:05:48 Then just click the create model button.
-
00:05:51 OK, you see.
-
00:05:54 We have a message checkpoint successfully extracted to this folder.
-
00:05:59 Where is it?
-
00:06:00 Let me show you.
-
00:06:01 It is inside Web UI Tutorial.
-
00:06:04 And let's go to our models, and inside DreamBooth, inside web UI ohwx, and in here, working.
-
00:06:11 And these are actually weights of the model that we have just composed.
-
00:06:17 Let's continue.
-
00:06:19 Now this model is selected here.
-
00:06:21 This is where we make the selection.
-
00:06:24 After we make this selection, we will train the selected model.
-
00:06:28 Yes.
-
00:06:29 OK.
-
00:06:30 Now let's go to the settings tab in here.
-
00:06:33 First click performance wizard.
-
00:06:35 It will set the parameters according to the VRAM of your GPU.
-
00:06:39 If you have less than 12 GB of GPU VRAM, it is really hard to use DreamBooth.
-
00:06:44 Unfortunately.
-
00:06:45 You can use LoRA, but it is a topic of another video.
-
00:06:48 Actually, it is almost the same as this video, but there are
-
00:06:52 just a few tricks, and I already have a video for LoRA.
-
00:06:56 So after watching this video, if you watch that video, the LoRA video.
-
00:07:00 You can easily apply LoRA to your training.
-
00:07:05 It is in here.
-
00:07:06 You see: how to do Stable Diffusion LoRA training by using the web UI.
-
00:07:08 I will also put the link of this video to the description as well.
-
00:07:13 So training steps per image epochs.
-
00:07:16 First of all, let me explain what is epoch.
-
00:07:19 We will have a training data set, the pictures of the subject that we are going to teach.
-
00:07:26 In this case, I am going to teach myself.
-
00:07:29 I will use 12 images of myself.
-
00:07:33 Therefore, one epoch means 12 steps.
-
00:07:38 So each step is a training step and each epoch is training all of the training images one
-
00:07:44 time.
-
00:07:45 So one epoch means 12 steps in my case, because I have 12 training images.
-
00:07:50 And how many epochs we want?
-
00:07:52 For teaching faces it is usually suggested 150.
-
00:07:57 So when you go to the concepts, just click training with a person.
-
00:08:00 It will set the most appropriate values for person.
-
00:08:05 So you see, now it is set to 150.
-
00:08:07 However, you can set this as much as you want and you can use a certain checkpoints.
-
00:08:13 I will explain that.
-
00:08:14 So I'm just going to make it 300.
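To make the epoch arithmetic concrete, here is a minimal Python sketch of the step math described above, using the numbers from this run (12 training images, 300 epochs, batch size 1); the variable names are illustrative, not the extension's own:

```python
# Step arithmetic for DreamBooth training as described above (illustrative).
num_images = 12    # training images of the subject
epochs = 300       # "training steps per image (epochs)" setting
batch_size = 1     # images processed per training step

steps_per_epoch = num_images // batch_size  # 12 steps = 1 epoch here
total_steps = steps_per_epoch * epochs      # 3600 training steps in total
print(steps_per_epoch, total_steps)         # -> 12 3600
```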
-
00:08:17 And how much time do you want to wait between each epoch? Zero. This is also zero.
-
00:08:22 OK, this is important.
-
00:08:24 How frequently do we want to save our training?
-
00:08:29 You know, if your computer crashes, if you cancel your training, if whatever happens,
-
00:08:34 you will be able to continue from your latest saved model.
-
00:08:40 Therefore, this is important.
-
00:08:42 Also, if you do over training and you want to use previous training checkpoint, you also
-
00:08:48 need to have a save.
-
00:08:49 So I'm going to set this as 10.
-
00:08:50 Be careful: when you are doing the DreamBooth training, it usually takes about 4 to
-
00:08:55 5 gigabytes per save.
-
00:08:58 So if you don't have much hard drive space, you perhaps need to set this to a higher number.
-
00:09:04 This is for saving preview images every epoch, for example, or for whatever number of
-
00:09:08 epochs you want.
-
00:09:09 This doesn't take space, but this will slow you down.
-
00:09:13 So I'm just going to leave this as five.
-
00:09:17 Batch size: Now, this is very important.
-
00:09:19 If you increase batch size, it will speed up your training significantly.
-
00:09:23 However, this will also increase your GPU memory usage significantly as well.
-
00:09:29 If you increase these numbers, you need to increase both of them equally to obtain the
-
00:09:35 best results.
-
00:09:36 So now, for example, it will be almost four times faster.
-
00:09:40 Also, make sure that your training image count is divisible by this number.
-
00:09:45 So two multiplied by two makes four, and you must have a number of training images divisible
-
00:09:54 by four.
-
00:09:55 So it can be four images, eight images, 12 images, 16 images, 20 images, but it shouldn't
-
00:10:02 be 17 images.
-
00:10:03 OK, this is the formula.
-
00:10:06 Let's say you have 16 gigabytes of GPU RAM, then you can make this three by three.
-
00:10:12 And then you should have nine or 18 or 27 or 36 images.
-
00:10:17 That is the formula.
-
00:10:18 I'm just going to leave this one by one for now.
-
00:10:22 Also, another thing is, if you make these two and two, like this, it will be four times faster.
-
00:10:28 Then you need to also increase the learning rate by four times, like this and this.
-
00:10:35 Otherwise it will be very slow.
-
00:10:36 It also requires speeding up the learning rate as well,
-
00:10:40 as much as you increase them.
-
00:10:42 Since I will use one by one, I am just going to leave the default learning rate.
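A minimal sketch of the two rules just described, assuming the two numbers are batch size and gradient accumulation steps, and scaling the learning rate linearly with the effective batch as suggested in the video (the values here are illustrative):

```python
# Batch-size divisibility and learning-rate scaling, as described above.
num_images = 12
batch_size = 2    # images per training step
grad_accum = 2    # gradient accumulation steps
base_lr = 2e-6    # default learning rate at 1 x 1

effective_batch = batch_size * grad_accum  # 2 * 2 = 4
assert num_images % effective_batch == 0, \
    "training image count should be divisible by the effective batch"

scaled_lr = base_lr * effective_batch      # 4x faster -> 4x learning rate = 8e-6
print(effective_batch, scaled_lr)
```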
-
00:10:46 OK, set gradients to none when zeroing.
-
00:10:50 If you select this, it will increase the GPU RAM usage.
-
00:10:55 How can you know that?
-
00:10:57 The DreamBooth extension has wiki pages, and in here they have RAM usage settings.
-
00:11:03 Let me show you.
-
00:11:06 OK in here: settings known to use more VRAM.
-
00:11:10 High batch size, as I just explained.
-
00:11:12 Setting gradients to none when zeroing, which is this setting in here.
-
00:11:18 So when you check this, it will use more VRAM, and the same goes for use EMA.
-
00:11:23 OK.
-
00:11:24 Now let's continue.
-
00:11:25 And I will explain.
-
00:11:26 Gradient checkpoint: This is a technique to reduce memory usage by clearing activations.
-
00:11:31 So it is good to check it out.
-
00:11:35 And then we are just passing over these.
-
00:11:37 These are just kind of more advanced things to play with.
-
00:11:42 After you get used to how to use the DreamBooth, you can just change them, but in the learning
-
00:11:47 stage just leave them as they are.
-
00:11:49 If you set these too high, it will get trained too fast.
-
00:11:53 However, it will also overtrain easily.
-
00:11:56 If you set them too low, then you may never get it trained.
-
00:12:01 So this is kind of an experimental thing where you need to do a lot of experimentation.
-
00:12:05 Image processing and resolution.
-
00:12:07 This is important.
-
00:12:08 When you use a model based on version 1.x, then it is 512 pixels.
-
00:12:18 If you use version 2.1, then there is also a 768-pixel version.
-
00:12:24 So you need to set this according to the version of your base model.
-
00:12:28 OK, the base model, the source checkpoint.
-
00:12:30 We checked here.
-
00:12:32 Since we are using version 1.5, official version.
-
00:12:35 It is 512 pixels.
-
00:12:39 Don't apply horizontal flip.
-
00:12:40 This is not good for faces.
-
00:12:42 Center crop.
-
00:12:44 If your images are not cropped, you should check this out.
-
00:12:46 I will explain how to set your images.
-
00:12:49 Since my images are center cropped, I am not checking this.
-
00:12:53 Sanity sample prompt.
-
00:12:54 OK, this is important.
-
00:12:55 We are going to use this prompt to see the overall training of the model.
-
00:13:02 But how? In terms of overtraining or not.
-
00:13:07 During the training, I will explain.
-
00:13:11 So I am going to enter here photo of ohwx man by Tomer Hanuka.
-
00:13:16 I will explain why I entered this prompt.
-
00:13:20 And by Tomer Hanuka.
-
00:13:22 You will understand it.
-
00:13:23 Miscellaneous: pre-trained VAE name or path.
-
00:13:26 These are advanced things that you don't need currently.
-
00:13:28 OK.
-
00:13:29 OK, advanced stuff.
-
00:13:31 This is important.
-
00:13:33 If you check the use EMA box, then it will improve your training quality.
-
00:13:38 However, it also increases the RAM usage significantly.
-
00:13:41 Use eight bit Adam: This will reduce the RAM usage.
-
00:13:45 BF16:
-
00:13:47 This will also reduce RAM usage.
-
00:13:50 xFormers: This will significantly increase your training speed.
-
00:13:54 Cache latents: This will also reduce the VRAM usage.
-
00:13:59 All of these are actually written in this page.
-
00:14:02 The out-of-memory topic of the wiki.
-
00:14:05 I will put this into the description.
-
00:14:08 So you see these are all decreasing the RAM usage.
-
00:14:11 Actually, it says that cache latents increases it, but as far as I know it is not increasing.
-
00:14:19 But you can test that.
-
00:14:22 So the Step Ratio of Text Encoder Training.
-
00:14:25 This will improve your training quality.
-
00:14:27 However, it will also increase the RAM usage of the graphic card.
-
00:14:31 So if you encounter out of memory error, you should set this zero.
-
00:14:37 But the optimal value for faces is 0.7, for style 0.2.
-
00:14:44 And the other things: you don't need to play with them.
-
00:14:46 They are more advanced stuff.
-
00:14:49 OK, now the concepts.
-
00:14:51 This is the very important part.
-
00:14:54 You can set [filewords], prompts and directories.
-
00:14:59 So first of all we have to set our training data set.
-
00:15:03 Training data set directory.
-
00:15:04 Where is my training data set?
-
00:15:07 It is inside my pictures folder and it is in here Best DB.
-
00:15:12 So all of these images are now 512 by 512 pixels.
-
00:15:19 Let me show their original version.
-
00:15:21 So their original version is here.
-
00:15:26 How did I set them like this?
-
00:15:27 I have used Paint.NET to crop them as I want.
-
00:15:31 For example.
-
00:15:32 Let me show you: Paint.NET is a free tool, by the way.
-
00:15:36 You can find and install it from Google.
-
00:15:40 Just click, like this, and then I am just cropping them with a square.
-
00:15:46 So I click rectangle select, then click here, then in here, fixed ratio, like this. Then
-
00:15:52 you can pick any part of the image you want.
-
00:15:55 Just for example here.
-
00:15:57 Then you can press Ctrl-C, Ctrl-N, and it will paste into a new image.
-
00:16:01 You can save it.
-
00:16:03 Or in here.
-
00:16:04 You can just resize these to a very low resolution like this, with Ctrl-R; it will open the resize
-
00:16:10 dialog like this, then Ctrl-V and expand.
-
00:16:13 You see, now it is cropped.
-
00:16:15 Alternatively, you can use Birme .NET.
-
00:16:17 Birme dot net is a famous site to crop images.
-
00:16:22 It is commonly used in the community.
-
00:16:25 You can just, for example, upload any image there and crop them.
-
00:16:30 For example, let's upload this image.
-
00:16:33 These are currently squared, but if they are not square, it will also automatically let
-
00:16:37 you square them.
-
00:16:38 Let me show: OK, you see, both of these images are not cropped.
-
00:16:42 So you are able to crop them with your mouse like this: set the position, then set the
-
00:16:48 resolution from here: 512, 512.
-
00:16:51 If you use SD version 2.1, then they will be 768 pixels.
-
00:16:55 OK, you can also use auto detect image focal point.
-
00:16:58 Do not resize.
-
00:16:59 And you can click here.
-
00:17:01 If you check do not resize,
-
00:17:03 they won't be resized to this resolution.
-
00:17:06 Then save as zip, and all of them will be saved as a zip.
-
00:17:08 Then you can extract them with the software you have.
-
00:17:12 If you don't have any software like WinRAR, Windows is still able to extract them.
-
00:17:17 All right.
-
00:17:18 If you can't manage it, just join Discord and I will help you, hopefully.
-
00:17:22 So data set directory.
-
00:17:23 When your images are ready, then we will enter the path to them.
-
00:17:27 So this is mine.
-
00:17:28 Let me enter the folder directory.
-
00:17:32 I click here and you see I am able to select the path.
-
00:17:34 I press Ctrl-C to copy it and paste it here (Ctrl-V).
-
00:17:38 So this is the directory where my training images are located.
-
00:17:43 Classification directory: Now, what is classification?
-
00:17:46 Classification images are generic images that we will use so as not to overtrain our model and also
-
00:17:54 to keep the inner sanity of the model,
-
00:17:58 so that the entire model does not end up looking like us.
-
00:18:02 OK.
-
00:18:03 So for this I will just generate a new folder.
-
00:18:06 Yes, I have copy pasted the path.
-
00:18:10 I will set it as web UI tutorial.
-
00:18:13 You can also enter another existing directory.
-
00:18:16 It is fine.
-
00:18:18 Instance token: Now, [filewords] are used to set a different description for each training
-
00:18:25 image.
-
00:18:26 This is very, very advanced and hard to do.
-
00:18:29 So I will explain this in the later parts of the tutorial video.
-
00:18:34 For now I will just skip them.
-
00:18:35 You can also skip to that part in the video, because I will put the sections of the video
-
00:18:42 into the description.
-
00:18:43 Now prompts: This is very important.
-
00:18:46 The instance prompt is used to define the keyword that will activate our new subject
-
00:18:53 that we taught to the model.
-
00:18:56 So in here you have to pick a unique word, but it has to be very specific and rare.
-
00:19:05 Whatever you enter into the model
-
00:19:07 will get turned into tokens.
-
00:19:10 It will be split into tokens.
-
00:19:11 So there is a reddit thread that explains the rare tokens.
-
00:19:15 I will put link of this page to the description and in here the rarity of the tokens are listed.
-
00:19:24 So, for example, you have entered, let's say, mill.
-
00:19:29 It is a single token, but mill probably exists a lot in real life.
-
00:19:34 Therefore, you have to go to the bottom and try to find rare tokens that you can't make
-
00:19:40 sense of.
-
00:19:42 For example, these.
-
00:19:43 Also, note that these tokens might be used in other languages as well.
-
00:19:49 For example, from here: ohwx is a very famous token, because this is a token that almost
-
00:19:57 does not exist anywhere.
-
00:19:59 When I type ohwx to the Google, you see all unrelated things.
-
00:20:06 They look like spam.
-
00:20:07 So this is a good token and, for example, you can also try other tokens here that look
-
00:20:14 weird to you.
-
00:20:15 Maybe this one?
-
00:20:16 Yes, this, OK.
-
00:20:17 I'm not sure if this is a real name or not, so you can verify it, but ohwx works very
-
00:20:27 well and the token you pick is extremely important.
-
00:20:31 Because your training will begin from that token, and you can inject a new token that
-
00:20:37 does not exist in the database; everything you enter will be split into
-
00:20:44 tokens that the model already knows.
-
00:20:46 Even if you generate a new keyword, such as SECourses, the model will not see this as
-
00:20:54 an SECourses.
-
00:20:55 How will it see it?
-
00:20:57 First it will look at S, then SE.
-
00:21:01 So SE does exist, OK.
-
00:21:03 Then it will look sec.
-
00:21:05 So, yes, sec also exists.
-
00:21:10 And then it will look seco.
-
00:21:12 OK, there is no seco, so it will get split into sec.
-
00:21:16 And then it will check the other characters, the remaining characters,
-
00:21:23 so they will all get split; yes, our SECourses will probably become sec, our, ses or something
-
00:21:33 like that.
-
00:21:34 You see, you are understanding,
-
00:21:35 I am hoping.
-
00:21:37 So the keyword you enter will get split into tokens, no matter what you enter.
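If you want to check this yourself, here is a minimal sketch using the Hugging Face transformers library to inspect how the CLIP tokenizer used by SD 1.x splits a word (the word list is just an example):

```python
# Inspect how the CLIP tokenizer splits words into known tokens.
# SD 1.x uses the tokenizer from openai/clip-vit-large-patch14.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
for word in ["mill", "ohwx", "secourses"]:
    print(word, "->", tokenizer.tokenize(word))
# A common word and a rare-but-known token each map to a single token,
# while a made-up word gets split into several smaller known tokens.
```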
-
00:21:44 Therefore, we are picking a single token that is very rare from this list and I have done
-
00:21:51 many tests.
-
00:21:52 So ohwx is working very well and then we need to enter the class of the subject we are going
-
00:21:58 to teach.
-
00:21:59 What am I going to teach?
-
00:22:00 I am going to teach the face of me.
-
00:22:03 So it's the face of man.
-
00:22:04 Therefore, I am just entering man.
-
00:22:07 So this is really important.
-
00:22:09 It will use the underlying knowledge of man in the model to learn my face.
-
00:22:15 Class prompt: now, as I said, this will be used to keep the sanity of our model and prevent
-
00:22:21 overtraining.
-
00:22:22 When you also hover it, it says: read me for more info.
-
00:22:27 I wonder if they have added it to the wiki yet.
-
00:22:31 In the basics perhaps?
-
00:22:33 OK, in the wiki, in the basics they have a small explanation.
-
00:22:38 A class specific prior preservation loss is also introduced to prevent overfitting and
-
00:22:44 encourage the generation of diverse instances of the same class.
-
00:22:49 They have made an example like this.
-
00:22:51 So in class prompt I am going to enter photo of man.
-
00:22:55 OK, you see, these two are the same. And the sample prompt:
-
00:23:00 This will be used to generate preview images during the training so we will be able to
-
00:23:04 see how the training is going on and if it is becoming too overtrained or not.
-
00:23:11 So in here I am going to enter photo of ohwx man.
-
00:23:15 OK, I am not entering any negative prompts and I'm not using any sample prompt template.
-
00:23:22 So these are more, let's say, advanced things that you can also play with after you have
-
00:23:28 learned the basics.
-
00:23:30 And in here, class images per instance.
-
00:23:32 In the community it is usually said to have a minimum of 300 images in total.
-
00:23:39 In the official paper of the DreamBooth.
-
00:23:42 Which is here.
-
00:23:45 I will also put the link of this paper to the description.
-
00:23:48 They have used 200 classification images.
-
00:23:53 I have made some tests but I can't say for sure how much minimum is necessary.
-
00:23:59 So I am just going to follow the community and to reach the 300 images I need to enter,
-
00:24:06 let's easily calculate 300 divided by the number of training images.
-
00:24:12 I have 12, so 25.
-
00:24:14 You can also calculate like this.
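As a tiny sketch, the same calculation in Python (rounding up so you never fall short of the target):

```python
import math

# Class images per instance image, targeting ~300 class images in total.
target_total = 300
num_training_images = 12
print(math.ceil(target_total / num_training_images))  # -> 25
```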
-
00:24:16 So, classification CFG scale.
-
00:24:18 This is the same as the txt2img CFG scale: how much CFG scale do you want to use for
-
00:24:25 generating classification images?
-
00:24:27 By the way, you can also use text2image tab to generate your classification images.
-
00:24:32 Put them into the folder that we set here.
-
00:24:35 Then the extension will not generate any new images.
-
00:24:40 It is up to you.
-
00:24:41 You can use both ways, but if you use this way, it will also generate a text description
-
00:24:47 file with the same name as the image, and it will put the description you have typed here inside it.
-
00:24:54 I will show that in a moment.
-
00:24:56 Classification steps.
-
00:24:58 So this is the number of steps, equal to the one in here:
-
00:25:01 Sampling steps: OK, and number of samples to generate.
-
00:25:05 So this is the number of samples that we want to be generated during the training to see
-
00:25:10 how the training is going on.
-
00:25:12 You can set this to 1, 2, 3, 4, whatever you want.
-
00:25:15 Sample seed -1.
-
00:25:17 It means that every sample image generated will be different, with a random
-
00:25:23 seed. And the samples CFG scale is 7.5.
-
00:25:27 You don't need to change this.
-
00:25:30 These are just the same as in txt2img.
-
00:25:32 You will make sense of them after you get used to txt2img.
-
00:25:36 OK, and now let's return back here: how many images do we want to generate for classification
-
00:25:45 at the same time, in parallel?
-
00:25:47 So I have 12 GB VRAM memory.
-
00:25:50 Therefore, I am able to generate 10 images as a batch, so it will take less time
-
00:25:56 to generate classification images.
-
00:25:58 By the way, you only need to generate classification images one time for each class prompt.
-
00:26:06 So if you don't change photo of man, if you don't change your subject class, then
-
00:26:11 you don't need to generate them once again.
-
00:26:14 So, to show you, I will just set this to five, and you will understand.
-
00:26:20 It will generate images five by five, as batches.
-
00:26:24 OK.
-
00:26:25 And one more thing: you can teach up to three concepts at a time to the model.
-
00:26:34 So the first concept is, let's say, me, and in here I can also teach my wife's picture,
-
00:26:42 for example.
-
00:26:43 It can be like wife DB.
-
00:26:45 So, another folder. And its classification data set can be exactly the same as the other one,
-
00:26:52 or no, it wouldn't be, because it would be related to women,
-
00:26:55 since it will be a woman, not a man.
-
00:26:59 Therefore, let's say woman images, and in here you need to use another keyword for that.
-
00:27:06 So it is important to find a rare keyword from this list.
-
00:27:13 I don't know which ones are very rare, but sks is commonly used as another instance prompt.
-
00:27:21 So it can be like sks woman.
-
00:27:26 And in here it will be photo of woman, and the sample will be photo of sks woman.
-
00:27:38 OK, and the rest is same.
-
00:27:40 And you can also add another concept here.
-
00:27:43 But the only thing that matters is the class of the other subject.
-
00:27:50 If it is a cat or a dog or a tree, whatever you are teaching, set the class and instance prompt
-
00:27:59 so that you can call them differently, and you can use both of them in a single picture.
-
00:28:05 For example, you can generate pictures of your wife and yourself in the same picture,
-
00:28:09 or your dog and yourself in the same picture.
-
00:28:12 But for this tutorial I am not going to teach multiple concepts, so it is up to you to teach
-
00:28:18 or not.
-
00:28:19 I will just teach a single concept.
-
00:28:23 All right.
-
00:28:24 Now we are moving to saving tab.
-
00:28:27 In here you can enter a custom model name for saving checkpoints and LoRA models.
-
00:28:33 You can check the half-model option.
-
00:28:34 They say that it doesn't decrease the quality, but the checkpoints are smaller.
-
00:28:41 I didn't test it so I can't say if it is 100 percent correct or not.
-
00:28:46 So to keep the quality in max, I won't check it.
-
00:28:51 Save checkpoints to sub directory.
-
00:28:53 You should check this checkbox
-
00:28:55 so that the saved checkpoints will be under web UI ohwx.
-
00:29:01 They won't end up in the same directory.
-
00:29:04 Now this is important to set.
-
00:29:06 Generate a ckpt file when saving during training.
-
00:29:10 If you don't check this, then you won't be able to load back and test
-
00:29:17 the model at epoch 20 or 40 or 60.
-
00:29:22 So you should check this out.
-
00:29:24 You can also continue from that point using that as a base model.
-
00:29:27 And you can also load that model and you can do test inference on that.
-
00:29:35 So this is important, but this will increase your hard drive usage.
-
00:29:40 Be careful with that.
-
00:29:41 Generate a ckpt file when training completes.
-
00:29:43 Yes.
-
00:29:44 Generate a ckpt file when training is canceled.
-
00:29:46 I'm not checking this because when I cancel I don't want it to generate a ckpt.
-
00:29:53 After canceling, you can just load the model and click generate ckpt, and it will generate a ckpt
-
00:29:58 file from the last saved weights.
-
00:30:00 Now weights.
-
00:30:02 You see, there is also an option to save separate diffuser snapshots when saving during training.
-
00:30:08 This option will generate weight files, like you see here.
-
00:30:14 So for demonstration purposes I will also select this. From a later point, you can just
-
00:30:21 use them as a new model folder and then continue your training from there.
-
00:30:27 Alternatively, I believe you can generate a new model from your saved ckpt file as a
-
00:30:35 new source checkpoint and you can continue from that saved checkpoint ckpt file.
-
00:30:41 I think both should be the same.
-
00:30:44 OK.
-
00:30:45 After you have done the settings, just click save settings.
-
00:30:48 When you click train, I think it is automatically also saving.
-
00:30:51 Now I will generate the class images before starting training.
-
00:30:56 This will use the settings that I did set in these options.
-
00:31:02 And let's see what kind of class images we are going to get.
-
00:31:05 OK, so you see, it is generating 300 class images for training.
-
00:31:10 Why?
-
00:31:11 Because currently I have no images in here, but, as you can see, it is not working right
-
00:31:18 now.
-
00:31:19 So there is a mistake.
-
00:31:20 Obviously. To solve this mistake, I will just restart the application.
-
00:31:25 OK, restart is completed.
-
00:31:28 Let's refresh.
-
00:31:29 Go back to our extensions tab.
-
00:31:32 Check for updates.
-
00:31:33 If there is an update.
-
00:31:34 Yes, there is a new update during the video.
-
00:31:38 The updates keep coming, so let's just refresh.
-
00:31:41 OK, refreshed.
-
00:31:43 Let's go back to extensions.
-
00:31:44 Check for updates.
-
00:31:45 OK, we are at the latest.
-
00:31:46 Then let's go to DreamBooth, select our model, load settings.
-
00:31:52 Go to the generate.
-
00:31:54 Before generating, I will delete these incorrect images first.
-
00:31:59 Let me do that.
-
00:32:00 Go to the pictures and in here, go to the web UI tutorial.
-
00:32:07 Ctrl-a shift-delete.
-
00:32:08 Yes, all deleted.
-
00:32:11 And just click generate class images.
-
00:32:13 OK, let's see if any error again.
-
00:32:16 OK, OK, I think the error continues.
-
00:32:20 So instead of these methods, I will use txt2image tab to generate images.
-
00:32:27 The only difference between these and using text to image is: let me show you.
-
00:32:33 Meanwhile, just let's restart the application.
-
00:32:37 When you use generate class images like this, it will also generate a text file with the same name as the
-
00:32:44 image, and inside it, it will write photo of man as a description.
-
00:32:51 So this is useful when you do [filewords] training or when you do LoRA training. But
-
00:33:00 for now it is not necessary for us.
-
00:33:04 I just reported this bug also to the developer, so I believe it will get fixed really quickly.
-
00:33:11 OK, so we are going to generate our class images from here.
-
00:33:19 Classification images: photo of man.
-
00:33:21 I'm just typing that, setting the sampling steps count to 40, setting CFG to 7.5.
-
00:33:26 So this batch size means processing multiple images at the same time.
-
00:33:35 It will use more GPU RAM, but it will make it faster.
-
00:33:39 And how many do I need?
-
00:33:40 I need 300.
-
00:33:41 Therefore, I am going to set this as 38, like this, and then just click generate.
-
00:33:50 So now it will generate images.
-
00:33:52 But make sure that the selected model here, you see, is the same as the model that you used
-
00:34:00 to generate your model.
-
00:34:02 So in here, when you select your model for training, it shows the base model source checkpoint.
-
00:34:08 You see Stable Diffusion 1.5 pruned, and currently I am generating images from this same model.
-
00:34:14 So the generated images will be saved in the txt2img folder.
-
00:34:20 Let's open it by clicking here.
-
00:34:22 OK, when I have clicked open folder in here.
-
00:34:27 It didn't open, because it says in the CMD window: txt2img-images does not exist.
-
00:34:34 After you create an image,
-
00:34:35 it will be generated, because, as I said, this is a fresh installation to demonstrate for you.
-
00:34:40 Therefore, all of my settings here are also default.
-
00:34:42 I didn't change any of them.
-
00:34:46 And there is one another thing that I want to mention.
-
00:34:48 In the DreamBooth model selection, you will see that the SD 1.x versions either have
-
00:34:56 EMA or not.
-
00:34:57 So if they have EMA, it will improve your further training, the fine-tuning of the model.
-
00:35:05 So you should pick models that have an EMA version.
-
00:35:09 It only exists in the 1.x versions.
-
00:35:11 I think in SD 2.0
-
00:35:14 and 2.1 there is no released model that has EMA features.
-
00:35:20 OK, the first batch has been completed.
-
00:35:22 Let's open the folder.
-
00:35:23 Now the folder is opened.
-
00:35:26 So these are photo of man.
-
00:35:28 You see, there will be very weird images, bad quality images, but they don't matter
-
00:35:33 much.
-
00:35:34 They are not very important as long as they are generated by our checkpoint model.
-
00:35:40 OK, after all of the images have been generated, just select them all and copy with Ctrl-C, then
-
00:35:48 go back to the folder where you want to get them saved: web UI tutorial.
-
00:35:54 I am just going to copy paste them in the folder.
-
00:35:58 OK, let's return back to our DreamBooth and load settings.
-
00:36:04 So now we have the sufficient amount of classification images.
-
00:36:09 Now we are ready to click start training.
-
00:36:12 OK, when we start training, it will first start by caching them.
-
00:36:19 We will see that.
-
00:36:23 So you see, it says that it has found 300 regularization images.
-
00:36:28 Therefore, it is not going to generate any more images.
-
00:36:32 Currently it is caching them.
-
00:36:35 OK, after the caching has been completed, you will see the training has been started.
-
00:36:42 It is progressing step by step.
-
00:36:45 You see 13, 14.
-
00:36:48 If you get an out-of-memory error, then you need to try further decreasing memory usage.
-
00:36:55 All of the low memory settings and high memory settings are stated in the wiki.
-
00:37:01 I will put this into the description.
-
00:37:03 Also, you are seeing right now.
-
00:37:05 High batch size, set gradients.
-
00:37:08 These will increase your memory usage and these will decrease your memory usage.
-
00:37:13 There is not much else that you can do. And one other thing is that the developers
-
00:37:18 are constantly trying to optimize and improve the extension to reduce memory usage.
-
00:37:26 So therefore, when you watch this video, maybe one month later, your card could
-
00:37:33 perhaps be able to do DreamBooth training.
-
00:37:37 So that's another possibility.
-
00:37:41 And after how many steps we are going to see our first sample images?
-
00:37:45 We can calculate it easily.
-
00:37:47 In the settings tab.
-
00:37:49 We did set it as 10 epochs, and how many training images do we have?
-
00:37:53 We have 12, you see in here.
-
00:37:56 Therefore, after 120 steps we are going to see our first training sample images.
-
00:38:05 Actually, after 120 steps it will save the checkpoint.
-
00:38:12 After 60 steps, because we did set 5 epochs for previews,
-
00:38:15 we are going to see the first sample image. And 60 steps have been completed.
-
00:38:20 So it is generating preview images at the step 60.
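The cadence follows directly from the epoch arithmetic; here is a minimal sketch using this run's numbers (12 images, batch size 1, and the save/preview frequencies set earlier):

```python
# First-checkpoint and first-preview steps for this run.
num_images = 12
save_every_epochs = 10     # checkpoint saving frequency from the settings tab
preview_every_epochs = 5   # preview image frequency

print(save_every_epochs * num_images)     # -> 120 (first saved checkpoint)
print(preview_every_epochs * num_images)  # -> 60  (first preview samples)
```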
-
00:38:24 Ok, the first samples have been generated.
-
00:38:28 Let's open the samples folder.
-
00:38:29 So where they were saved, they were saved under our model.
-
00:38:34 Let me show: Ok, I have so many identical tabs.
-
00:38:41 Ok, inside our installation folder, go to the models and in here go to the DreamBooth
-
00:38:47 and in here you see the same name as our training model name.
-
00:38:50 Enter there.
-
00:38:52 In here you will see samples.
-
00:38:53 When you click here you will see the samples.
-
00:38:56 So the first sample is generated with this sample prompt with ohwx man.
-
00:39:05 So this is our class and this is the unique instance prompt we have set.
-
00:39:09 Ok, so there is another image.
-
00:39:11 You see.
-
00:39:12 This is generated with photo of ohwx man by Tomer Hanuka.
-
00:39:19 Why did I set this and where did I set this?
-
00:39:22 I did set this in here.
-
00:39:24 If you remember, the second prompt, the one you see in here named with -1, is the sanity
-
00:39:35 sample prompt.
-
00:39:38 The number here is the step count at which it was generated, and the other
-
00:39:43 thing is the prompt used to generate it.
-
00:39:47 After we progress in the training, you will understand why we are using this.
-
00:39:54 As long as this image looks like us, with a different style, it means that our model
-
00:40:00 is learning well, and when it becomes exactly like us, not styled like this, that would
-
00:40:07 mean that our model is overtrained and we can't apply styles anymore.
-
00:40:12 Our aim is teaching our shape, but not overtraining it, not
-
00:40:22 disturbing the underlying context, the knowledge of the model, not overriding it completely.
-
00:40:29 So after we progress in the training, we will understand better.
-
00:40:32 Okay, now let me explain to you how to prepare your training dataset images.
-
00:40:40 What is important with the selection of the images?
-
00:40:45 What we want to teach, the subject,
-
00:40:50 is the most important part.
-
00:40:52 I want to teach my face.
-
00:40:54 Therefore, other than my face, everything must be different, or, let's say, should be
-
00:40:59 different in each of the images.
-
00:41:01 So, other than face, what can be different?
-
00:41:03 My clothes and the background can be different.
-
00:41:07 So if you are teaching your face, then other than your face, all of the backgrounds and the
-
00:41:14 clothes should be as different as possible.
-
00:41:17 As you can see in my pictures, I have made sure that all of the backgrounds and the clothes
-
00:41:23 are different or the clothes are not visible.
-
00:41:27 So if you make your clothes different and your backgrounds are different, then the model
-
00:41:34 will learn your face, not your clothes or not the backgrounds.
-
00:41:37 That is what we want.
-
00:41:38 We want to teach our face, not the other things in the pictures.
-
00:41:42 If you use the same clothes, then the model will not be able to tell that this is the face and this is
-
00:41:49 the clothes; the model will learn both of them at the same time, and it will reduce
-
00:41:53 your ability to stylize your face.
-
00:41:57 Therefore, the key point of preparing training images is having different things other than
-
00:42:04 the subject.
-
00:42:05 So if the subject is face, the other things must be different.
-
00:42:09 Also, you should have different angles of photos and different distances of photos.
-
00:42:18 It will make the model learn different angles and different distances to generate different
-
00:42:25 kinds of styles, a greater variety of images.
-
00:42:30 So, about making your images: I can't say my data set is the best available data set.
-
00:42:37 You can expand your data set with more variety of images, more variety of poses, more variety
-
00:42:42 of angles, more variety of lighting.
-
00:42:46 Lighting also matters.
-
00:42:47 It would be better.
-
00:42:49 However, this is a small data set and I think it is working pretty decently.
-
00:42:55 But if you expand this data set, your training data set, with more variety, then it is better.
-
00:43:00 It will learn your face or subject in a more generalized manner, and that way we will
-
00:43:08 be able to produce different kinds of artistic images more easily.
-
00:43:14 Okay, so you see, currently it is compiling a checkpoint ckpt file and you can just load
-
00:43:21 the ckpt file directly and do inference on that checkpoint.
-
00:43:27 It is compiling checkpoint at the step 360, which is epoch 30, and so where are these
-
00:43:34 checkpoint files located?
-
00:43:37 They are located in models, inside our folder, and you see the ckpt file and the
-
00:43:45 yaml file here.
-
00:43:48 If you don't know what yaml files are, just watch my how to use Stable Diffusion 2.1 and
-
00:43:55 different models in the web ui tutorial video.
-
00:43:59 I will put the link as usual, and let's check out our samples so far.
-
00:44:07 So in this image, this is like me, but the other samples are not like us.
-
00:44:13 We just need to do more training.
-
00:44:15 And also on this screen you will see 5.5 or 3.7.
-
00:44:22 So this shows how many iterations are done each second, or how many seconds each iteration takes.
-
00:44:30 However, these values are not displayed very accurately. There is also loss, and this
-
00:44:37 lr is important.
-
00:44:38 This shows your learning rate.
-
00:44:40 So 2e-6, what does that mean?
-
00:44:43 That means that it is a number in scientific notation.
-
00:44:47 When you type 2e-6 into Google and go to the first result, for example, it will
-
00:44:54 show you it is equal to this number, 0.000002.
-
00:44:57 Okay, so this is the number.
-
00:44:58 Actually we did set in our settings, in our learning rate, you see.
-
00:45:04 So this is the equivalent of the scientific e-notation number.
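As a quick check, the same e-notation works directly in Python:

```python
# 2e-6 is just scientific notation for 0.000002.
print(2e-6)               # -> 2e-06
print(2e-6 == 0.000002)   # -> True
```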
-
00:45:10 If you set a changing learning rate schedule from here, you see there are options like polynomial,
-
00:45:17 constant or other things, then you will see different numbers in here, and
-
00:45:25 it also shows the GPU usage.
-
00:45:27 However, this is also not very accurate.
-
00:45:30 It says that 9.5 gigabytes currently is being used.
-
00:45:34 Okay, okay, it has been 72... 82 epochs.
-
00:45:41 Now I will show you how you can continue training if an error occurs.
-
00:45:46 So to illustrate that, I will just crash the application by closing here.
-
00:45:51 When you close from here, it won't save any checkpoint or anything.
-
00:45:55 You see the error: connection error.
-
00:45:58 Then just restart the application and after the restart is done, just refresh your interface.
-
00:46:08 Go to the DreamBooth tab, select the model, click load settings; actually it will be
-
00:46:14 automatically loaded.
-
00:46:15 And then just click train.
-
00:46:17 It will continue from the last checkpoint, which is 80 epochs.
-
00:46:23 Let's wait.
-
00:46:25 Okay, you see?
-
00:46:28 It is continuing from wherever it left off, as you can see here.
-
00:46:34 Also, in the CMD window it shows first resume epoch and first resume step, as you
-
00:46:42 can see here.
-
00:46:43 Okay, we are over 168 epochs and we are already doing a lot of overtraining.
-
00:46:52 How do I know?
-
00:46:54 As I said in the beginning, I have entered a sanity prompt.
-
00:47:03 So the samples numbered with -1 are the sanity prompts.
-
00:47:09 And let's look at the sanity prompts changes.
-
00:47:12 So the sanity prompts started like this.
-
00:47:15 Then in here, you see, the sanity prompt is resembling me, and also here, resembling me,
-
00:47:23 okay, resembling me somehow.
-
00:47:26 And after a certain point, actually after 1368 steps, the sanity prompts become just like
-
00:47:37 me.
-
00:47:38 You see, it is not styled anymore; okay, like this, like this, and this is almost exactly like
-
00:47:45 me, and you see, they are not styled anymore like here.
-
00:47:50 Styling is completely gone here as well.
-
00:47:53 Therefore, now we are sure that we are doing overtraining.
-
00:47:59 So I am just going to stop training with cancel, and I am going to use different checkpoints,
-
00:48:08 test them out to see how they are performing.
-
00:48:11 Now the hard part is coming: the prompting, the proper, correct prompting to obtain
-
00:48:17 good results.
-
00:48:19 So the training has been cancelled.
-
00:48:22 Let's look for the closest one.
-
00:48:25 I am refreshing here and in here, yes, this one looks like the closest one: 1308.
-
00:48:32 Then go to the text2image tab.
-
00:48:35 So how are we going to generate our own image?
-
00:48:39 We are going to use photo of.
-
00:48:41 These two keywords are also associated with us right now, but not as strongly as our instance
-
00:48:47 prompt.
-
00:48:49 Ohwx and man.
-
00:48:50 Also, man is very much associated, okay. So when we type like this and hit the generate
-
00:48:57 button, it will generate our own image.
-
00:49:00 Okay, the image is ready.
-
00:49:02 You see, it is like us and now we need to style it.
-
00:49:06 So let's add Disney style to this and let's see what kind of result we are going to get.
-
00:49:12 Okay, as you can see, we didn't get much styling, so therefore I am going to show
-
00:49:21 you an extension which is named web ui prompt generator.
-
00:49:26 You can install it from the available tab.
-
00:49:29 Just click load, and in here just search for prompt and you will see prompt generator;
-
00:49:34 just click install, and then just apply and restart the UI.
-
00:49:38 After that you will see prompt generator tab here.
-
00:49:41 So let's get some extra additional keywords from prompt generator and let's click generate.
-
00:49:47 Okay, there are a lot of results here, but this one seemed like it could work, so I copied
-
00:49:55 it and pasted it in here, and let's see the result we are going to get.
-
00:50:00 Okay, we got somewhat decent results, but it is still not very much like us.
-
00:50:06 Therefore, we need to increase the prompt strength.
-
00:50:10 So what is prompt strength,
-
00:50:12 prompt attention?
-
00:50:13 This is from the official wiki of the Automatic1111.
-
00:50:17 So if you want to increase attention to a word by a factor of 1.1, you can put the word
-
00:50:24 inside one set of parentheses.
-
00:50:26 If you want to increase the attention even more, by a factor of 1.21, you can just
-
00:50:34 put it like this. Alternatively, you can use an easier way, which will be, let me show
-
00:50:40 you, let me also zoom in, just type like this. Okay, so this will increase the attention.
-
00:50:47 This will force the model to generate an image that is more like us, and it is going to ignore
-
00:50:56 the rest.
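For reference, the attention syntax from the Automatic1111 wiki looks like this in practice (ohwx man is the instance prompt from this training; the weights shown are the ones the wiki documents):

```
photo of ohwx man            baseline, weight 1.0
photo of (ohwx) man          (word) multiplies attention by 1.1
photo of ((ohwx)) man        each extra pair multiplies again: 1.1 x 1.1 = 1.21
photo of (ohwx man:1.5)      (text:number) sets the weight explicitly
photo of [ohwx] man          [word] decreases attention by a factor of 1.1
```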
-
00:50:57 Also, in this prompt there are so many things that would be unrelated to Disney style.
-
00:51:05 So what would be related to Disney style? For example, CGI. And let's also add some
-
00:51:13 other keywords.
-
00:51:15 Okay, here are results.
-
00:51:17 Not very much like us and not very good quality.
-
00:51:21 We need to improve the prompt with adding some negative prompts as well.
-
00:51:28 Okay, here i have added some negative prompts and now you see we have a much better artwork,
-
00:51:35 but still not very much resembling to me.
-
00:51:39 So I am going to try another prompt, also increasing the emphasis of our unique
-
00:51:47 keyword, which is ohwx, and the man.
-
00:51:51 In every prompt you must have ohwx man, probably with some increased strength, to get your
-
00:51:58 own face, and also add photo of.
-
00:52:01 Why?
-
00:52:02 Because during the training we have used class prompt as photo of man.
-
00:52:09 Therefore, now these three keywords are also associated with us, but the most association
-
00:52:15 is coming from ohwx, okay.
-
00:52:18 Okay, so I am going to try with an emphasis of 1.5 and a new prompt like this.
-
00:52:27 Let's see the results.
-
00:52:28 Okay, we got an image that is not very stylized.
-
00:52:32 Therefore, we need to increase CFG.
-
00:52:35 So what is CFG?
-
00:52:36 CFG is the classifier-free guidance scale: how strongly the image should conform to the prompt.
-
00:52:43 Lower values produce more creative results.
-
00:52:45 We want the model to obey our prompt because we are providing a very detailed prompt.
-
00:52:54 Therefore, we need to increase the scale and try it.
-
00:52:59 So I will show you how you can try multiple scale values.
-
00:53:04 Go to the bottom here and go to the x/y plot.
-
00:53:09 So in the x/y plot there are x and y values.
-
00:53:14 Currently we only need the x value.
-
00:53:15 In the x value I am going to select CFG scale, and in here I am just typing seven, eight,
-
00:53:22 nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen. And I want to use the same
-
00:53:29 seed for all of the inputs so that I can see the changes, and I will generate four images
-
00:53:37 in each iteration, in each step.
-
00:53:42 My graphic card is able to process four images.
-
00:53:45 If you don't have much vram, you can't do that.
-
00:53:48 Then you should increase this.
-
00:53:50 Okay, if you check this, keep minus one for seeds.
-
00:53:54 Then each image in the each generation would be different.
-
00:53:58 However, i want to see the difference of CFG effect in a legend.
-
00:54:04 Therefore, i'm keeping it like this and then just click generate.
-
00:54:09 So currently, in the CMD window, actually it is generating four images at each epoch.
-
00:54:19 So you see, in the 20 steps, actually it is processing 80 steps.
-
00:54:23 So four of them is being parallelly processed, since i did set batch size to four.
-
00:54:32 Okay, CFG images.
-
00:54:33 Different CFG images have been generated.
-
00:54:35 I have modified the input because the previous input was not very good.
-
00:54:40 Actually, it turns out that.
-
00:54:42 But it is not important, because when you are working with Stable Diffusion, you have
-
00:54:47 to make, you have to generate a lot of images to find out the good ones that you, you would
-
00:54:55 like to obtain.
-
00:54:57 So let's look at, look at the effect of the CFG.
-
00:55:01 So this is our seed value.
-
00:55:04 If you use this seed value, you will always generate similar images in each generation,
-
00:55:11 as long as you keep the same value, same model.
-
00:55:16 So this is the CFG scale seven.
-
00:55:18 At the CFG scale seven, there is not much resemblance.
-
00:55:23 At the CFG scale eight, a little bit of resemblance.
-
00:55:28 Look at how the images are changing.
-
00:55:31 This is CFG scale nine.
-
00:55:33 There is some resemblance in these two.
-
00:55:36 Okay, and if in the CFG scale 10.
-
00:55:39 Now, this is also some resembles and in here, okay, you see, resembles is increasing and
-
00:55:47 in the CFG scale 14 actually, there is really good resemblance in this image and in this
-
00:55:53 image actually, and so it goes, and after certain CFG scale it becomes, i think the
-
00:56:00 quality starts to be decreasing.
-
00:56:03 So therefore the CFG scale makes difference.
-
00:56:06 Now let's say you want to test different artists' styles with different CFG scales.
-
00:56:14 How can you do that?
-
00:56:17 I am putting here a special placeholder keyword that I am going to replace: replacekw.
-
00:56:23 Okay, then the rest is anything you want, and at the bottom, this time I am going
-
00:56:30 to select Prompt S/R.
-
00:56:33 Prompt S/R works like this: you separate a list of words with commas, and the first word
-
00:56:39 is used as the keyword.
-
00:56:40 The script will search for this word in the prompt and replace it with the others.
-
00:56:45 So this keyword will be replaced with whatever I type here.
-
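As a rough illustration (a sketch of the behavior, not the script's actual source), the Prompt S/R replacement logic can be expressed like this:

```python
# A sketch of the x/y plot's Prompt S/R option: the first comma-separated
# entry is the keyword to search for, and every entry (including the first,
# which keeps the prompt unchanged) produces one prompt variant.
def prompt_sr(prompt: str, values: str) -> list[str]:
    entries = [v.strip() for v in values.split(",")]
    keyword = entries[0]
    return [prompt.replace(keyword, entry) for entry in entries]

# prompt_sr("portrait of ohwx man by replacekw", "replacekw, wlop, artgerm")
# -> the original prompt, one variant "by wlop", one variant "by artgerm"
```
-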
00:56:49 So let's say wlop, and then artgerm, and then whatever other artists you want to test.
-
00:56:57 Okay, I have added two more artists, so we have four artists.
-
00:57:02 Let's also test 4 CFG values: 10, 11, 12 and 13.
-
00:57:10 Perhaps let's start from 11, okay. And if we kept the seed at minus one, then we
-
00:57:23 couldn't compare the effect of the CFG or the style.
-
00:57:27 Therefore, let's keep the same seed. Okay, and you see, there are restore faces, tiling
-
00:57:34 and highres fix; you could also pick them to improve your output, but that would take
-
00:57:40 extra time, and you can do the same in the Extras tab, which I will show.
-
00:57:45 And the batch count is one and the batch size is four.
-
00:57:48 Let's see what kind of results we are going to get.
-
00:57:50 By the way, these other keywords will also heavily affect the artist style.
-
00:57:56 Therefore, if you want to check only the artist style, then you should reduce the number
-
00:58:03 of extra keywords here. Let's see what we are going to get.
-
00:58:08 Okay, I got a runtime error.
-
00:58:11 Why?
-
00:58:12 Because I forgot to put the keyword in here.
-
00:58:16 The first entry has to be the keyword itself.
-
00:58:18 Now I need to run it again.
-
00:58:20 Okay, now the generation has started.
-
00:58:23 You should always check the CMD window and what is happening there.
-
00:58:28 If you get an error, then you should fix it, obviously.
-
00:58:32 Okay, this is the kind of grid that we are going to get.
-
00:58:35 Actually, it is pretty useful.
-
00:58:37 So, you see, along the top is the CFG scale, and on the left we got the art style. By the way, it
-
00:58:45 also produces results with the raw replacekw keyword, which doesn't represent the style, or me, very well.
-
00:58:56 Therefore, perhaps we can remove many of the keywords that take away the style, like this;
-
00:59:07 let me do that.
-
00:59:10 Okay, this time we have more styling, as you can see here: this is the default,
-
00:59:17 this is wlop, this is artgerm.
-
00:59:21 This is Robert S Duncanson and this is Karol Bak.
-
00:59:25 Especially the Karol Bak style is pretty different and distinctive, as you can see.
-
00:59:31 So the key point here, with Stable Diffusion, is that you have to generate a lot of images;
-
00:59:38 some of them will be very, very good, and maybe the majority of them will not be good or
-
00:59:44 useful.
-
00:59:45 This is the nature of AI-based art generation, especially if you are trying to generate art
-
00:59:55 based on a new subject of your own. Also, when we were doing the training here,
-
01:00:04 you could have used more classification images.
-
01:00:08 That can help.
-
01:00:09 I said that the community is using 300 in total, but that is not a hard limit.
-
01:00:18 You can use 200 images per training image, and that may help you to improve your
-
01:00:25 style.
-
01:00:26 Actually, that is also the number used in the official paper, as I said.
-
01:00:30 So it is up to you.
-
01:00:31 You have to experiment.
-
01:00:33 The numbers and the quality you get will also depend entirely on your training dataset.
-
01:00:39 If you have a training dataset with a lot of variety, as I have explained, then your model
-
01:00:47 can learn much better.
-
01:00:49 I will show you another thing here.
-
01:00:50 There is a prompt matrix script that will generate combinations of the prompts.
-
01:00:57 Okay, so when you type your query like this and select prompt matrix, this query will
-
01:01:05 become face photo of (ohwx man:1.3), like this.
-
01:01:10 And then the parts will get combined like this.
-
01:01:15 So this will generate all of the combinations of the written text separated with, let
-
01:01:24 me tell you once again, the vertical pipe character.
-
01:01:28 It will generate all of the combinations of these keywords, like this.
-
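Conceptually, the expansion works like this (a sketch, assuming the usual on/off toggle behavior of the prompt matrix script, which gives 2^n combinations for n pipe-separated sections):

```python
# A sketch of prompt matrix expansion: every section after the base prompt,
# separated by "|", is toggled on or off, giving 2^n prompt combinations.
from itertools import combinations

def prompt_matrix(prompt: str) -> list[str]:
    base, *options = [part.strip() for part in prompt.split("|")]
    variants = []
    for count in range(len(options) + 1):
        for combo in combinations(options, count):
            variants.append(", ".join((base,) + combo))
    return variants

# prompt_matrix("face photo of (ohwx man:1.3) | artgerm | wlop") yields four
# prompts: the base alone, base+artgerm, base+wlop, and base+both artists.
```
-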
01:01:32 Okay, I will show another thing.
-
01:01:34 Let's say you are going to sleep and you want your computer to generate many different styles
-
01:01:40 of images for you during your sleep.
-
01:01:44 I will show you an easy way to do that.
-
01:01:49 So our first prompt is face photo of ohwx, with a weight of, let's say, 1.4.
-
01:01:58 Then let's add certain keywords to get a certain kind of prompt.
-
01:02:06 Okay, I have typed it like this and generated 20 inputs like this, and it has generated
-
01:02:14 a lot of results for me.
-
01:02:15 I am going to copy all of this into a notepad file and paste it; you see they are actually
-
01:02:23 copied as one line each.
-
01:02:26 Then I will generate several more.
-
01:02:28 Okay, I keep copying and pasting the newly generated inputs in there.
-
01:02:35 Okay, now I have 60 lines of inputs like this.
-
01:02:40 I am going to save it.
-
01:02:45 Let's go to the Pictures folder and save it as nightly prompts.
-
01:02:49 Okay, then go back to the txt2img tab, and in here select the prompts from file or textbox script.
-
01:03:00 You can paste all of them here, or you can upload them from here.
-
01:03:05 So instead of the textbox, I will upload them from the text file, and they are all uploaded.
-
01:03:13 I am going to check use random seed for all lines, because I want to get as many
-
01:03:19 different results as possible.
-
01:03:22 Then I set how many images I want to generate for each one.
-
01:03:29 I want to generate, let's say, eight images in parallel, and they will use the
-
01:03:36 CFG value I am going to set here: 14.
-
01:03:39 So with 60 lines and a batch size of 8 images, we are going to get 480 images.
-
01:03:48 But let's say you want to generate 4000 images, or however many you want.
-
01:03:53 If I set the batch count to 20, we are going to get exactly 20 multiplied by 8 multiplied by
-
01:04:02 the number of lines we have:
-
01:04:05 9600 images during the night, with a lot of different inputs and variation, and among them
-
01:04:13 you can pick whatever you want and use it however you want.
-
01:04:17 This is one of the options that you can use.
-
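The arithmetic behind that number, as a quick sanity check:

```python
# Total images = prompt lines x batch count x batch size.
prompt_lines = 60   # lines in the saved prompts file
batch_count  = 20   # batches generated per prompt line
batch_size   = 8    # images generated in parallel per batch
print(prompt_lines * batch_count * batch_size)  # 9600
```
-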
01:04:20 Okay, after I click it, it starts generating images.
-
01:04:25 For example, it generated this one, and if you wonder what this image is, you go to the PNG
-
01:04:31 Info tab, then just drag and drop the image in here, and it will show
-
01:04:39 you all of the parameters it has.
-
01:04:42 So this is the prompt input, this is the negative prompt input it has, and the number
-
01:04:47 of steps used.
-
01:04:48 The sampler used, the CFG scale used, the seed; with this seed you can reproduce this
-
01:04:55 generated image.
-
01:04:57 You can use this seed and change the CFG value to generate other variations of it. There are also
-
01:05:02 the size and the model hash.
-
01:05:04 The model hash, of course, will change, since we are using our custom trained model.
-
01:05:10 Then the batch size and the batch position, and these are also important.
-
01:05:15 To reproduce this image exactly, you need to generate again with a batch size of 8, and the sixth position
-
01:05:23 will be this one,
-
01:05:24 if you use this seed, this CFG value and this sampler.
-
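If you ever want to read those parameters outside the UI, here is a small sketch; it assumes Pillow is installed, the filename is hypothetical, and it relies on the fact that the Web UI embeds the settings as a "parameters" text chunk in each saved PNG.

```python
# A minimal sketch of what the PNG Info tab reads: the generation settings
# embedded as a text chunk in every PNG the Web UI saves.
from PIL import Image

img = Image.open("00001-1234567890.png")  # hypothetical output filename
print(img.info.get("parameters", "no parameters chunk found"))
# Prints prompt, negative prompt, steps, sampler, CFG scale, seed, size,
# model hash, batch size and batch position as one text block.
```
-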
01:05:28 We are getting some decent photos, and I will leave it running during my sleep; tomorrow
-
01:05:34 for me, which is of course just a moment for you,
-
01:05:38 we are going to see what kind of good images we got.
-
01:05:43 Okay, here you see some of the images I generated during my sleep.
-
01:05:49 They are pretty good quality, but they are very similar.
-
01:05:52 Why?
-
01:05:53 Because it appears that the inputs I used to generate them were not very different.
-
01:05:59 However, some of them are really high quality.
-
01:06:02 For example, this image: you see, it has almost perfect eyes and perfect shape.
-
01:06:07 It's a really good quality image.
-
01:06:10 So your training dataset and the keywords, the prompts you use, will one hundred percent
-
01:06:16 affect the outcome you are going to get, and you really need to stylize your prompt
-
01:06:23 according to what you want to get.
-
01:06:25 Now let me show you a few of the prompts used for generating these images.
-
01:06:30 To do that, I am going to PNG Info, okay, and then I will drag and drop.
-
01:06:36 For example, let's first see a 3D-like image.
-
01:06:41 Okay, and you see this used blender, zbrush, autodesk maya, unreal engine, colored, because
-
01:06:51 if you want to generate a 3D-like image, then you need to use these kinds of keywords.
-
01:06:57 Then you can send these to other tabs.
-
01:06:59 For example, let's go to the Extras tab.
-
01:07:02 In the Extras tab I can upscale this image to a bigger size.
-
01:07:08 After my testing I have found that R-ESRGAN 4x+ works best.
-
01:07:14 There is also an anime version.
-
01:07:18 LDSR also works very well, but it requires a lot of GPU memory.
-
01:07:25 So when I click generate for the first time, it is going to download the
-
01:07:31 model that is necessary for R-ESRGAN 4x+.
-
01:07:35 You can see it here, and now we will see the upscaled image.
-
01:07:41 So this is the upscaled image.
-
01:07:43 The upscaled image and the original will not be exactly the same, but let's compare them, okay.
-
01:07:48 Let's view them not zoomed in, okay.
-
01:07:54 So you see, both of these are really similar.
-
01:07:59 There is a little bit of quality loss.
-
01:08:02 Let's also try the anime version.
-
01:08:10 Okay, now we got the anime version.
-
01:08:14 So let's say you want to make your images look like anime; then you can use that.
-
01:08:19 This is extremely useful.
-
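As a side note, the same Extras upscaling can be scripted over HTTP when the Web UI is launched with the --api flag. This is a hedged sketch: the endpoint and field names follow the API documented at /docs in early-2023 builds and may differ in your version, and both filenames are hypothetical.

```python
# A sketch of calling the Extras upscaler through the Web UI's HTTP API.
import base64
import requests

with open("input.png", "rb") as f:            # hypothetical input file
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "image": image_b64,
    "upscaling_resize": 2,                    # 2x upscale
    "upscaler_1": "R-ESRGAN 4x+",             # the upscaler chosen above
    "gfpgan_visibility": 1.0,                 # optionally restore faces too
}
resp = requests.post("http://127.0.0.1:7860/sdapi/v1/extra-single-image",
                     json=payload)
with open("upscaled.png", "wb") as f:         # hypothetical output file
    f.write(base64.b64decode(resp.json()["image"]))
```
-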
01:08:23 You can also upscale an entire folder.
-
01:08:26 For example, I will just select all with ctrl+a, then I will drag and drop them here.
-
01:08:32 All of them are now here.
-
01:08:33 Now I can upscale all of them at once.
-
01:08:37 Let me show you.
-
01:08:38 During the operation you will see they are getting tiled like this to generate bigger
-
01:08:45 images.
-
01:08:47 The results of upscaling in the Extras tab will actually be inside another folder.
-
01:08:52 When I click it, you will see they are arriving here, and all of these images are now upscaled.
-
01:08:58 For example, let's open this one: this is actually a pixar-style image.
-
01:09:04 Okay, this is another pixar-style image, and, for example, this is also another pixar-style
-
01:09:13 image, as you can see.
-
01:09:17 I have trained these on Google Colab, and now I will show you how you can upload your
-
01:09:23 model to Google Colab and generate images there, probably faster than on your own GPU,
-
01:09:31 because the Google Colab GPU is really strong and able to process a lot of images at once in
-
01:09:40 parallel.
-
01:09:41 Okay, you see, all of these are getting upscaled; let's see some of them like this,
-
01:09:52 as you can see, okay.
-
01:10:02 Now I will show another cool thing.
-
01:10:06 Usually you may not get very good looking eyes, or you get some errors in the face, and there
-
01:10:13 is a very good way to improve the eyes and the overall structure of the face.
-
01:10:20 It uses another AI model, so let's try improving this image.
-
01:10:26 Usually my images already had really good eyes,
-
01:10:31 but okay, let's test it.
-
01:10:33 I am not going to upscale; I am just going to use GFPGAN.
-
01:10:38 GFPGAN is a face restoration model that is especially good at improving the eyes.
-
01:10:42 Let's test it.
-
01:10:43 The first time you use it, it will download the necessary model.
-
01:10:48 Okay, now let's compare the result.
-
01:10:50 This is the original image and this is the fixed image.
-
01:10:53 Now let's also apply an upscale, okay.
-
01:10:59 Okay, after applying the upscale and applying GFPGAN, you see it is now looking much better
-
01:11:07 in terms of quality and correctness.
-
01:11:09 This will seriously improve the eyes.
-
01:11:13 Let's open them like this.
-
01:11:14 Okay, let's zoom in.
-
01:11:16 So you see the difference is huge: much better quality and styling.
-
01:11:22 You can apply this to your generated images as a batch as well.
-
01:11:26 Just go to batch process, select the options from here, and it will do everything.
-
01:11:31 You can also try these other options.
-
01:11:33 I didn't find them very useful, actually, and there is also no description for them.
-
01:11:39 Okay, now I will show you how you can continue training from any checkpoint that you
-
01:11:44 saved.
-
01:11:45 Just go to the checkpoint selection and you will see your saved checkpoints here. By the way,
-
01:11:49 to get them saved, you need to check the generate ckpt file when saving option in the Saving
-
01:11:57 tab, and then, if you generate a new model from that checkpoint, you will
-
01:12:03 basically continue training from that exact checkpoint.
-
01:12:08 Now I will show you how you can use these ckpt files directly in Google Colab.
-
01:12:16 If you have watched my previous video about transforming yourself into a stunning AI avatar,
-
01:12:22 that tutorial shows how to do training on Google Colab, and everything needed to
-
01:12:30 use your ckpt file in Google Colab is explained there.
-
01:12:34 It is so, so easy.
-
01:12:35 First we are going to generate a new model from the checkpoint we want.
-
01:12:41 Let's say I want to use step 1380 as the checkpoint.
-
01:12:47 Then I am giving it a name for the Colab image.
-
01:12:52 Okay, and nothing else.
-
01:12:54 Just click create model.
-
01:12:56 Okay, it has generated a new model for the Colab image inside the working
-
01:13:02 directory. You just need to upload this to Google Drive and then give its
-
01:13:10 path.
-
01:13:11 So I will name it my image.
-
01:13:15 Okay.
-
01:13:17 Let's also add our keyword to the name, move the files inside here, and then go
-
01:13:26 to your Drive folder like this, where you are running your DreamBooth or Stable
-
01:13:34 Diffusion, then drag and drop this directory here.
-
01:13:41 It will upload all of the files, as you can see here.
-
01:13:45 Once the upload is completed, all we need to do is change the model path in the inference
-
01:13:52 tab of the Google Colab notebook.
-
01:13:55 This is linked in the description of the tutorial.
-
01:13:59 So you need to change it like this: /content/drive/MyDrive/ and then image ohwx,
-
01:14:06 which is the folder name that I have given, since I am uploading to the main folder
-
01:14:13 of my Google Drive.
-
01:14:15 Then, in Google Colab, you will be able to use your trained ckpt file right away.
-
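For reference, the resulting path in the Colab notebook's inference cell would look something like this (a sketch; the folder name is whatever you called your uploaded directory):

```python
# Hypothetical model path, assuming the folder "image ohwx" was uploaded to
# the root of Google Drive and Drive is mounted at /content/drive.
MODEL_PATH = "/content/drive/MyDrive/image ohwx"
```
-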
01:14:21 So what if
-
01:14:23 you want to teach another face?
-
01:14:27 Just generate a new model like this, and this time, in the concepts tab, set the dataset directory
-
01:14:34 and the classification directory for your new subject.
-
01:14:38 However, be careful with something.
-
01:14:41 Currently, my model is trained with ohwx man as the instance prompt and photo of man as
-
01:14:48 the class prompt.
-
01:14:50 So if I am going to teach another person, a male, then I have to pick another keyword,
-
01:14:56 for example ske or another rare keyword, and it will teach this man into the model
-
01:15:06 as well.
-
01:15:07 So we will be able to use both of them.
-
01:15:09 However, you will probably get mixed results, because the man keyword was already trained on
-
01:15:17 my own images, and when I introduce another man's images they will get mixed.
-
01:15:24 So it could be a problem, but you can try it.
-
01:15:27 Test it, and if you generate a sufficient number of images, then I think you
-
01:15:31 can still obtain good results.
-
01:15:34 However, if you inject another class, like a woman, then it shouldn't be much of a problem
-
01:15:42 and you should be able to teach multiple different subjects easily.
-
01:15:47 Now I will explain more advanced stuff.
-
01:15:50 For example, the directories: the dataset directory.
-
01:15:54 Okay, to be able to use [filewords], you need to have a training dataset named like this,
-
01:16:02 okay.
-
01:16:03 So for each image, you are also going to have a text file with the same name.
-
01:16:08 The extension will be txt, like this, and you need to write the description of that
-
01:16:13 image properly.
-
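Here is a tiny sketch of that layout in case you want to script it; the folder name and the fallback caption are hypothetical placeholders.

```python
# [filewords] layout: every training image gets a sibling .txt file with
# the same name (stem) containing that image's caption.
from pathlib import Path

dataset = Path("train/ohwx_man")              # hypothetical dataset folder
for image in dataset.glob("*.png"):
    caption = image.with_suffix(".txt")
    if not caption.exists():
        # The class word ("man" here) must appear in every caption.
        caption.write_text("photo of man")
```
-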
01:16:16 There is a new AI model for captioning images.
-
01:16:20 It is not implemented in Automatic1111 yet, but it will be.
-
01:16:25 I will put the link to it in the description.
-
01:16:28 You can also run it locally.
-
01:16:31 And if you don't know how to run it locally, then you should watch this video
-
01:16:38 on our channel.
-
01:16:39 In that video, I explain how to run HuggingFace files locally.
-
01:16:45 Okay, I will just use the online demo right now because it is not very busy.
-
01:16:51 So, the first image: I will just drag and drop it here.
-
01:16:56 Sorry about that.
-
01:16:58 Okay, like this, and click submit.
-
01:17:01 It will generate the description for this image.
-
01:17:03 You see, you should use the caption generated by GIT-large.
-
01:17:08 This is the best one:
-
01:17:09 a man with dark hair and glasses is smiling.
-
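If you do want to run GIT-large locally, a hedged sketch with the Hugging Face transformers library might look like the following; the checkpoint name microsoft/git-large-coco is one published GIT-large variant, and the image filename is hypothetical.

```python
# A sketch of local image captioning with a GIT-large checkpoint via
# Hugging Face transformers, as an alternative to the online demo.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

name = "microsoft/git-large-coco"
processor = AutoProcessor.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

image = Image.open("training_image_1.png")    # hypothetical training image
inputs = processor(images=image, return_tensors="pt")
ids = model.generate(pixel_values=inputs.pixel_values, max_length=50)
print(processor.batch_decode(ids, skip_special_tokens=True)[0])
# e.g. "a man with dark hair and glasses is smiling"
```
-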
01:17:13 Okay, so let's just change this text file's
-
01:17:18 description, like this.
-
01:17:20 However, there is one key issue: you have to have the class for this image inside the
-
01:17:27 description.
-
01:17:28 My class is man, and therefore it is there.
-
01:17:31 Okay, let's continue,
-
01:17:32 then.
-
01:17:33 This is another image that we want to caption, so let's submit it.
-
01:17:40 Okay, and then another image description is here.
-
01:17:44 Let's open the description: a cat with long whiskers looking at the camera.
-
01:17:50 And the class here is cat, and it is inside the description as well.
-
01:17:54 Yes, correct, and the rest will work the same way for dog as well.
-
01:17:58 Now for the classification images,
-
01:18:01 you need to do the same.
-
01:18:03 When you generate classification images, you also need to have each classification image and its
-
01:18:08 description.
-
01:18:10 Let's say this is my classification image and it was generated with photo of man.
-
01:18:16 Therefore, I need to create a description file with the same name, like this, and inside it I need
-
01:18:23 to type photo of man.
-
01:18:26 When this tab gets fixed; let me show you, maybe it is already fixed, I am not sure.
-
01:18:33 In here, you see, we have generate class images, and when you use that feature, it should be
-
01:18:40 able to do this for you. Let's actually try it, okay.
-
01:18:44 And, yeah, this doesn't matter, okay.
-
01:18:48 And when we type the class prompt here, photo of man, I think it will generate with it.
-
01:18:56 Let's try it. Okay, it is not working.
-
01:19:01 It gives an error; okay, it's still not working.
-
01:19:06 When this becomes working, then you can generate them easily.
-
01:19:10 Otherwise you need to create the descriptions like this: photo of man, and it will generate images
-
01:19:17 like that, or photo of cat, or photo of dog.
-
01:19:20 So this will be your classification directory with descriptions like this, and this will be
-
01:19:27 your classification directory with naming like this.
-
01:19:29 This way, you can teach multiple subjects in one run, and you can also possibly improve
-
01:19:37 your training quality if you provide better descriptions that define more things.
-
01:19:45 By the way, when writing descriptions, you should specify in them the subject that you want
-
01:19:53 to teach.
-
01:19:54 If you want to teach a face, then you should mostly describe the face.
-
01:19:58 Okay, and one more thing: once you have prepared your folders,
-
01:20:05 here is the way to do it.
-
01:20:08 First of all, we are defining the dataset directory as usual.
-
01:20:14 Okay, let's set it.
-
01:20:16 And let's also set the classification directory like this.
-
01:20:21 And for [filewords], we need to define the instance token.
-
01:20:29 Okay, this will be used to define the subject.
-
01:20:35 It has to be a single word.
-
01:20:37 Therefore, I am entering ohwx. Then there is the class token.
-
01:20:41 This also has to be a single word.
-
01:20:45 By the way, this class token approach won't be very precise, actually.
-
01:20:55 It looks like if you teach multiple different classes, then you may not get very
-
01:21:02 good performance, for example teaching a face, a cat, a dog and a man, because
-
01:21:08 they conflict with the current setup.
-
01:21:12 So using the three concept tabs is better, but let me also explain this approach to you.
-
01:21:17 So this will be man.
-
01:21:18 And in the instance prompt you are just going to type [filewords], and in the class prompt
-
01:21:23 you are also just going to type [filewords]. For the sample prompt, leave it blank to use the instance prompt, or optionally
-
01:21:30 use [filewords] to base sample captions on the instance images.
-
01:21:33 You can also just use [filewords] to see what it is generating.
-
01:21:40 This is called mixed in the Basics page of the DreamBooth extension wiki.
-
01:21:47 So you see, there is DreamBooth regular training, which I have shown in this tutorial.
-
01:21:53 Then there is fine tuning.
-
01:21:55 Fine tuning is the standard approach for big datasets.
-
01:21:58 Only the captions of the images are used,
-
01:22:00 via [filewords]; class images are not used.
-
01:22:02 This results in a model that doesn't need an instance token and reacts to any prompt.
-
01:22:07 So in this case you are training the model overall.
-
01:22:10 What does that mean?
-
01:22:11 That means that, let's say, in your [filewords] you have cars, you have cats, you have dogs,
-
01:22:18 you have men.
-
01:22:19 You are training all of these words.
-
01:22:22 And this is how the custom models you see are usually trained.
-
01:22:28 Let me show an example.
-
01:22:29 For example, Protogen x3.4 is a custom model and it works pretty well.
-
01:22:37 How did they train it?
-
01:22:38 They probably trained it with fine tuning.
-
01:22:41 So with fine tuning, they precisely prepared the descriptions of each training image.
-
01:22:48 They didn't use any classification images, and they changed the overall underlying
-
01:22:53 context, data and knowledge of the model.
-
01:22:56 So when you now use man, it produces quality man images according to their new fine-tuned
-
01:23:04 dataset, or car, or castle, or whatever you are improving your model on.
-
01:23:10 And there is hybrid.
-
01:23:12 Okay, actually I said mixed earlier, but it is called hybrid.
-
01:23:15 Hybrid, for lack of a better term, is achieved by using the instance token in combination with [filewords]
-
01:23:20 as the instance prompt.
-
01:23:21 The trained dataset will be linked to that instance token.
-
01:23:24 This minimizes the bleed but requires the token in every prompt, as you can see here.
-
01:23:29 So you have to use ohwx french bulldog, or ohwx whatever you have taught.
-
01:23:37 Also, you see the class token here is person.
-
01:23:39 So with the hybrid approach with [filewords], if you don't do fine tuning but only teach
-
01:23:45 subjects, the subjects should, I think, be of the same class.
-
01:23:49 They can't be from different classes.
-
01:23:51 So you can teach multiple persons in a single run, maybe 10 persons, just by providing
-
01:23:59 correct [filewords] and their descriptions.
-
01:24:03 So for this person you need to add, let's say, a man personA to the caption.
-
01:24:09 Okay, this will define person A.
-
01:24:11 For person B, you need to add personB, and for person C, you need personC.
-
01:24:16 But you are not going to add the instance token into these descriptions.
-
01:24:24 Okay, you don't need to type the instance token into the [filewords], that is, into the descriptions
-
01:24:30 of the training images or into the descriptions of the classification images.
-
01:24:37 Okay, this is important.
-
01:24:39 Okay, now I will show you how to understand the out of memory error.
-
01:24:46 It is easy.
-
01:24:47 I'm just going to load the settings for our existing dataset.
-
01:24:50 You see, I have an error.
-
01:24:52 So it looks like I had an error in the CMD window.
-
01:24:54 I just need to restart.
-
01:24:56 Okay, I did restart, and in the settings I set use EMA.
-
01:25:03 This actually improves our result quality, but it costs more VRAM.
-
01:25:08 Then I just click train, and let's see how we get the out of memory error.
-
01:25:14 Okay, we got our error.
-
01:25:18 Let me show you how to recognize an out of memory error.
-
01:25:22 You will see RuntimeError: CUDA out of memory.
-
01:25:24 If you are seeing this error, all the other messages are not important.
-
01:25:28 It means that with the current settings you are trying to train with, your graphics
-
01:25:34 card is not sufficient, and you need to reduce the VRAM usage.
-
01:25:38 Now let me show you all of the settings for reducing the VRAM usage.
-
01:25:43 Okay, so for minimal VRAM usage you need to pick LoRA. With LoRA
-
01:25:48 there is just a little bit of difference.
-
01:25:52 It only differs when you do inference and generate new images from the generated LoRA
-
01:26:00 file,
-
01:26:01 and you can learn about that in my LoRA training video. LoRA will significantly reduce
-
01:26:06 VRAM usage.
-
01:26:08 Other than that, always make sure that your batch size and gradient accumulation steps
-
01:26:12 are one, and in the Advanced tab you need to pick use 8bit adam and select
-
01:26:20 bf16 and select xformers.
-
01:26:23 For xformers to work, you need to add --xformers and
-
01:26:31 --no-half to your starting arguments,
-
01:26:33 i.e. the COMMANDLINE_ARGS line in webui-user.bat; these will allow you to use it.
-
01:26:35 Cache latents:
-
01:26:36 this one is
-
01:26:37 actually still not clear.
-
01:26:39 You should try it both checked and unchecked,
-
01:26:42 because some say that it increases VRAM usage and some say that it decreases it. Also, Step Ratio
-
01:26:49 of Text Encoder Training:
-
01:26:50 this should be zero, because text encoder training increases quality but also increases
-
01:26:54 VRAM usage.
-
01:26:56 Other than these, there is not much else that you can do.
-
01:27:02 These are the lowest possible settings.
-
01:27:04 Also, you need to uncheck this checkbox (use EMA) and check this checkbox (gradient checkpointing):
-
01:27:12 when you check the first one it will increase your VRAM usage, but when you check the second one
-
01:27:18 it will reduce your VRAM usage.
-
01:27:21 Actually, these settings are written in the troubleshooting part of the DreamBooth
-
01:27:26 extension wiki, in the OOM section, and there are also overtraining and other topics.
-
01:27:32 The overtraining section is still a work in progress, and I have already shown you how to
-
01:27:37 recognize overtraining.
-
01:27:39 And another cool thing that I am going to show you is preprocessing your images.
-
01:27:45 With image preprocessing you can easily generate descriptions for both your training
-
01:27:52 images and your classification images.
-
01:27:55 Of course they won't be very accurate, so let me show you.
-
01:28:00 I am picking my best db 512 as the source directory, and the destination directory will be the same.
-
01:28:09 In here you can even define a target resolution and change them, but I prefer manually
-
01:28:16 changing them and captioning.
-
01:28:19 So for existing captions, I am just going to select ignore, so it will generate new captions, and
-
01:28:26 I am going to use deepbooru for captioning.
-
01:28:29 You can also create flipped copies, split oversized images, and use auto focal point crop.
-
01:28:35 So let's say you have tens of thousands of images; then these options will be extremely
-
01:28:41 useful for you.
-
01:28:42 However, if you are only going to train your face, then you should manually prepare your
-
01:28:47 training dataset to be the best, and then generate captions for it.
-
01:28:53 I am just going to click preprocess.
-
01:28:54 It shouldn't change the width and height, because they are already 512 pixels, and it is downloading
-
01:29:03 the deepbooru model for captioning.
-
01:29:04 This is another model, just like the one I showed you here.
-
01:29:09 Deepbooru is not as good as the captions generated by GIT-large, but it is still useful, and in
-
01:29:15 a moment we are going to see.
-
01:29:17 Okay, it has thrown an error.
-
01:29:19 It says that the same directory is specified as the source and destination directory.
-
01:29:22 Obviously, this is not allowed.
-
01:29:25 Actually, it's a good thing that they don't allow it.
-
01:29:28 So I'm just going to change it to processed, so that you don't overwrite your original images,
-
01:29:36 and let's click preprocess.
-
01:29:39 Okay, the models are only downloaded once, and all the images are preprocessed.
-
01:29:45 So let's check out the preprocessed images.
-
01:29:48 Okay, you see, the same images, now with descriptions.
-
01:29:52 Let's look at the description.
-
01:29:53 So the description is: 1boy, black hair, facial hair, grey pants, jacket, long sleeves,
-
01:29:58 male focus, pants, realistic, solo, track jacket, track pants.
-
01:30:05 So it's a pretty good description.
-
01:30:07 You can also modify them manually.
-
01:30:10 Let's also preprocess our classification images, so that it generates descriptions
-
01:30:17 for all of the classification images.
-
01:30:18 By the way, this is useful, as I said, when you use [filewords].
-
01:30:22 If you are not using [filewords], then these won't get used.
-
01:30:26 This is also very useful if you use a hypernetwork or embeddings, and I will
-
01:30:32 also hopefully make a video about embeddings.
-
01:30:35 Hypernetworks are not very good, but embeddings are really, really good.
-
01:30:39 Okay, let's preprocess our classification folder.
-
01:30:45 The preprocess feature is in the Train tab.
-
01:30:47 This is a feature of Automatic1111.
-
01:30:50 Okay, and preprocess it.
-
01:30:53 It is also pretty fast.
-
01:30:57 So this will be extremely useful for captioning.
-
01:31:00 And also, if your images are not properly cropped and you have tens of thousands of
-
01:31:07 images, as I said, cropping them would take a huge amount of time.
-
01:31:10 You can just use this instead.
-
01:31:12 As a beginner, you can also use this to make your job easier and see how the results
-
01:31:17 are performing.
-
01:31:18 Let's say you picked hundreds of images of yourself and you don't want to spend the time.
-
01:31:23 Then you can preprocess the images like this, try the training on them,
-
01:31:31 and see the results.
-
01:31:32 If you can get good results, then why not spend more time on them?
-
01:31:37 But if you want to get perfect results, then you need to manually crop your images and
-
01:31:43 write your descriptions.
-
01:31:47 So let's see the preprocessing results now.
-
01:31:49 Every image has a description.
-
01:31:51 Let's look at them.
-
01:31:52 Okay, for example, it defined this man as a girl, which is very incorrect, and also
-
01:31:59 3d, asian, black shirt.
-
01:32:01 This is a completely incorrect description, as you can see.
-
01:32:06 It completely failed.
-
01:32:07 Now let's compare this with GIT-large, which I showed earlier.
-
01:32:12 Okay, I wonder what kind of result we are going to get with GIT-large, so I'm just going
-
01:32:18 to drag and drop.
-
01:32:21 By the way, as I said, I have suggested adding this model to Automatic1111 to get better
-
01:32:27 results, and GIT-large generated: a portrait of a man with a beard.
-
01:32:32 Absolutely, fantastically correct when compared to that trashy description, as you
-
01:32:41 can see.
-
01:32:42 Okay, as a final thing, I suggest you look at the ELI5 training page.
-
01:32:48 It is being updated by experienced people, and, for example, for [filewords],
-
01:32:55 they give an example that the instance token alexa is bad, because the underlying
-
01:33:01 data for alexa is strong and it would be hard to override it.
-
01:33:06 Another example is bad because it gets split by the tokenizer into pieces, like ohwx, great.
-
01:33:12 The class token is also important.
-
01:33:15 I have already experienced these things, but you can also check these pages.
-
01:33:19 I will put the links to these pages in the description.
-
01:33:24 Now I will show you another very cool thing.
-
01:33:26 You see, this Protogen x3.4 is a custom model that has been generated by using multiple
-
01:33:33 models and a lot of training, and, you see, if you train your face or subject into this model,
-
01:33:40 it won't produce good results,
-
01:33:43 because the underlying data has been significantly changed.
-
01:33:48 So how can we inject our face into this model?
-
01:33:53 There is a way to do that, and now I am going to show you.
-
01:33:56 We go to the checkpoint merger, and as the primary model we select our target
-
01:34:05 model, which is Protogen x3.4.
-
01:34:09 The secondary model will be the model that we trained, which will be this
-
01:34:16 one: ohwx 1308.
-
01:34:19 And there is the tertiary model.
-
01:34:21 The tertiary model will be version 1.5.
-
01:34:24 This is the base model of our trained model, and what we are going to do
-
01:34:29 is extract our subject from the base model and apply it to
-
01:34:37 our new target model.
-
01:34:39 Let's give it a name: ohwx protogen 3.4, and set the multiplier to 0.75.
-
01:34:49 This is 75%.
-
01:34:50 You may ask: how did you come up with this value?
-
01:34:53 I asked the community, and according to the community's experience, 75% is a good
-
01:35:00 starting point.
-
01:35:01 You can, of course, try multiple different values.
-
01:35:03 You can try your different checkpoints to see how they perform.
-
01:35:08 Also, click add difference.
-
01:35:10 This will extract our face information from our base model and inject our
-
01:35:16 face information into our new target model without breaking the underlying context and
-
01:35:25 information.
-
01:35:26 We are going to generate a ckpt with add difference; just click run.
-
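Under the hood, add difference computes, for every weight tensor, roughly the following; this is a sketch of the formula, not the Web UI's actual merging code.

```python
# Add-difference merge: result = A + (B - C) * multiplier, where A is the
# primary model (Protogen), B the DreamBooth-trained model, and C the base
# model (SD 1.5). (B - C) isolates what training added on top of the base,
# and that delta gets injected into A at the chosen strength.
import torch

def add_difference(a: torch.Tensor, b: torch.Tensor, c: torch.Tensor,
                   multiplier: float = 0.75) -> torch.Tensor:
    return a + (b - c) * multiplier
```
-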
01:35:31 In the CMD window you will see messages like this, and checkpoint saved; then refresh
-
01:35:36 here and just switch to our new model, which is ohwx protogen.
-
01:35:42 Now we can produce images using the Protogen model and our face, same as usual.
-
01:35:51 Okay, everyone, I have done a few tests and the results are just amazing.
-
01:35:58 You see, these are some of the images that I have selected from the results.
-
01:36:03 And let me show you something.
-
01:36:05 So you see, this is generated by Protogen, and this is my original, real image.
-
01:36:12 And this is the generated image.
-
01:36:14 You see the quality.
-
01:36:15 It is just amazing.
-
01:36:17 And what kind of test did I do?
-
01:36:20 For testing, I used the x/y plot: I entered different CFG values as the x values, and I
-
01:36:29 entered Prompt S/R for the weights.
-
01:36:32 So how did I do that?
-
01:36:33 You see the ohwx man, and then we enter a weight here to give importance
-
01:36:41 to it.
-
01:36:42 So I entered a placeholder keyword here, changeweight, and I used changeweight as the keyword
-
01:36:49 in the Prompt S/R.
-
01:36:50 So the Automatic1111 UI changed the weight for me and tested different
-
01:36:59 weights.
-
01:37:00 Now I can look at the properties of this particular generated image to see what the
-
01:37:09 used values were.
-
01:37:10 Then, based on that, I can generate anything I want.
-
01:37:14 So the weight used was 1.4 and the CFG scale was 8.
-
01:37:20 So by using a weight of 1.4 and CFG scale 8, I can generate much higher quality images.
-
01:37:28 These two parameters will work with my merged model.
-
01:37:35 By the way, I also used something else.
-
01:37:39 You see, there is a model hash, and the hash written here is also displayed here.
-
01:37:46 This 95 means that I generated another checkpoint, but this time I used a 95%
-
01:37:53 multiplier.
-
01:37:54 This worked better for me.
-
01:37:57 So in the beginning you can start with 75, and if you are not getting good images, then
-
01:38:03 you can increase it, make different model merges, and then test them.
-
01:38:11 So this is the way to test and find the well-working parameters for your model,
-
01:38:17 and then use those parameters to generate more stylized images as you want.
-
01:38:22 The results are just simply amazing.
-
01:38:25 You just can't get these results so easily with the default Stable Diffusion model.
-
01:38:30 So you can inject your trained face into any custom model out there and
-
01:38:37 generate beautiful images as you want.
-
01:38:42 So let's also upscale this image.
-
01:38:45 To do that, I am just going to send it to Extras, and I will upscale it with R-ESRGAN
-
01:38:51 4x+.
-
01:38:53 And here is the result: it is just beautiful.
-
01:38:56 Let's also apply GFPGAN to get better face quality.
-
01:39:02 Okay, now, amazing. As you can see, amazing quality, an amazing image.
-
01:39:07 There is only one artifact here, as you can see. So if I generated more such images,
-
01:39:13 I could also get rid of this artifact.
-
01:39:15 I think I have covered pretty much everything.
-
01:39:19 As I said in the beginning, just join our Discord channel
-
01:39:22 from our about page; also, in here you will see the link.
-
01:39:29 Just click the official Discord. Please also share, like and subscribe, and if you support
-
01:39:35 us on our Patreon, it would be greatly appreciated.
-
01:39:39 Currently we have three patrons.
-
01:39:42 I thank them a lot for becoming patrons and supporting our work.
-
01:39:48 You can also join our channel and support us from here, as you can see.
-
01:39:53 I would appreciate every bit of your support.
-
01:39:56 Hopefully see you in another video.
-
01:39:59 Please leave comments and ask questions.
-
01:40:02 Suggest the topics that you want to see as a new tutorial.
-
01:40:07 Thank you very much.
-
01:40:09 Hopefully see you later.
