
Zero To Hero Stable Diffusion DreamBooth Tutorial By Using Automatic1111 Web UI - Ultra Detailed


Our Discord: https://discord.gg/HbqgGaZVmr. This is the most advanced tutorial on Stable Diffusion DreamBooth training. If I have been of assistance to you and you would like to show your support for my work, please consider becoming a patron on 🥰 https://www.patreon.com/SECourses

Playlist of Stable Diffusion Tutorials, Automatic1111 and Google Colab Guides, DreamBooth, Textual Inversion / Embedding, LoRA, AI Upscaling, Pix2Pix, Img2Img:

https://www.youtube.com/playlist?list=PL_pbwdIyffsmclLl0O144nQRnezKlNdx3

I explain, from scratch to a very advanced level, how to use the #Automatic1111 Web UI and the D8ahazard #DreamBooth extension to teach new subjects, e.g. your face, to a model. Moreover, I show how to inject your taught face into a completely new model, e.g. Protogen x3.4, to produce awesome-quality images without wasting too much time on finding the correct prompts.

Automatic1111

https://github.com/AUTOMATIC1111/stable-diffusion-webui

How to install Web UI: https://youtu.be/AZg6vzWHOTA

How to use #StableDiffusion different models on Web UI:

https://youtu.be/aAyvsX-EpG4

Official SD v1-5-pruned: https://bit.ly/sd15ckpt

How To Do LoRA Training: https://youtu.be/mfaqqL5yOO4

Wiki RAM memory: http://bit.ly/3IqFUeW

Rare tokens: https://bit.ly/SDRareTokens

Rare tokens list: https://bit.ly/SDRareTokensList

Basics wiki: http://bit.ly/3Yy78pn

DreamBooth paper

https://arxiv.org/pdf/2208.12242.pdf

Best caption: https://bit.ly/bestcaption2

00:00:00 Introduction to Grand Master yet most beginner friendly Stable Diffusion Dreambooth tutorial by using Automatic1111 Web UI

00:03:11 How to install DreamBooth extension to the Web UI

00:04:09 How to update installed extensions on the Web UI

00:04:35 Introduction to DreamBooth extension tab

00:04:45 Training model generation for DreamBooth

00:05:34 How to download official SD model files

00:06:21 Training model selection and settings tab of the DreamBooth extension

00:07:36 What is "training steps per image (epochs)"

00:08:24 Checkpoint saving frequency

00:09:15 What is training batch size in DreamBooth training and how to set them properly

00:10:47 Set gradients to none when zeroing

00:11:24 Gradient checkpoint

00:12:04 Image processing and resolution

00:12:39 Horizontal flip and Center crop

00:12:50 What is Sanity sample prompt and how to utilize it to understand overtraining

00:13:30 Best options to set in Advanced tab of DreamBooth extension

00:14:22 Step Ratio of Text Encoder Training

00:14:49 Concepts tab of the DreamBooth extension

00:15:27 How to crop images from any position with Paint.NET or use Birme.net

00:17:22 Setting training dataset directory

00:17:44 What are classification images

00:18:46 What is Instance prompt

00:19:05 How to and why to pick your instance prompt as a very rare word (very crucial)

00:21:52 Class of the subject

00:22:15 Everything about class prompt

00:22:55 Sample prompt

00:23:30 Class images per instance

00:25:00 Number of samples to generate

00:26:27 Teach multiple concepts in 1 run

00:28:24 Saving tab

00:29:10 How to generate checkpoints during training

00:30:52 Generating class images before start training

00:33:28 What is batch size in txt2img tab

00:36:09 Start training

00:38:25 First samples/previews of training

00:39:13 Sanity prompt sample

00:39:54 How to understand overtraining with sanity samples

00:40:34 How to properly prepare your training dataset images

00:43:15 Checkpoint saving during training

00:44:30 What is the LR displayed in CMD during training

00:45:38 How to continue / resume training if an error occurs or you cancel it

00:46:41 When we started overtraining and how we understood it

00:48:24 How to start generating our subject (face) images from best trained checkpoint

00:50:09 What is prompt strength / attention / emphasis and how to increase it

00:51:17 How to increase image quality with negative prompts

00:51:50 How to get your taught subject with the correct prompting

00:52:31 What is CFG and why should we increase it

00:52:54 How to try multiple CFG scale values by using X/Y prompting

00:54:54 Analyzing CFG effect

00:56:03 How to test different artist styles with different CFG scales by using X/Y plot

01:00:47 How to use prompt matrix

01:02:54 Prompts from file or text box to test many different prompts

01:03:57 Generate thousands of images while sleeping

01:04:22 PNG info to learn used prompts, CFG, seed and others

01:07:00 Extras tab to upscale images by using AI models with awesome quality

01:09:54 How to improve eyes and face quality by using GFPGAN

01:11:35 How to continue training from any saved ckpt checkpoint

01:12:06 How to upload your trained model to Google Colab to use

01:14:19 How to teach a new subject to your already trained model

01:15:55 How to use filewords for training

01:21:52 What is fine tuning and how it is done

01:23:10 Hybrid training

01:24:39 How to understand out of memory error

01:25:39 Lowest GPU VRAM settings

01:27:35 How to batch preprocess images

01:31:47 How to generate very correct descriptions by using GIT large model

01:33:19 How to inject your trained subject into any custom / new model

01:37:36 Where is model hash written and how to compare

Video Transcription

  • 00:00:02 Greetings everyone.

  • 00:00:03 Welcome to the most beginner-friendly and yet the most advanced and up-to-date Stable

  • 00:00:07 Diffusion DreamBooth model training tutorial.

  • 00:00:09 In this guide video, I am going to use the latest Automatic1111 web UI and the DreamBooth

  • 00:00:14 extension.

  • 00:00:16 The interface and the features of the DreamBooth plugin have been significantly changed, so

  • 00:00:20 all other tutorials are now obsolete.

  • 00:00:23 I have been experimenting for over 7 days to find the best settings and the training

  • 00:00:27 parameters.

  • 00:00:28 Moreover, I tried to learn what each option does and I have explained everything in this

  • 00:00:33 video.

  • 00:00:34 Before starting, let me provide some quick info.

  • 00:00:37 Stable Diffusion is a text-to-image generative public AI model, and the Automatic1111 web

  • 00:00:42 UI is a tool developed by the open source community to use Stable Diffusion easily.

  • 00:00:47 DreamBooth is an AI algorithm that allows you to teach new subjects or even styles to

  • 00:00:52 existing Stable Diffusion models very successfully, such as teaching the face of a person.

  • 00:00:57 In this tutorial, I am going to use a freshly installed Automatic1111 web UI to teach my

  • 00:01:02 face by using Stable Diffusion 1.5 official version.

  • 00:01:05 I will also show how you can do the same training on Stable Diffusion version 2.1 as well.

  • 00:01:11 Moreover, I will show you how you can inject your trained subject, in this case my face,

  • 00:01:16 into any custom model and obtain amazing results.

  • 00:01:19 I will demonstrate an example by using the very popular and very high-quality custom

  • 00:01:24 model Protogen x3.4.

  • 00:01:26 With this injection methodology, you can use any newly released custom model and obtain

  • 00:01:32 even better results.

  • 00:01:33 You won't even need to retrain your subject for this to work.

  • 00:01:37 This method provides such high-quality images that you cannot even obtain them on paid services

  • 00:01:42 like Lensa or Midjourney.

  • 00:01:44 The Automatic1111 web UI is getting constantly updated, so let me show you the version I

  • 00:01:49 am using from the official repository.

  • 00:01:53 This is the official repository of the Stable Diffusion web UI.

  • 00:01:57 It has been recently taken down, but it is now back again.

  • 00:02:00 So if you can't find this URL, just check out the video and I will update the description

  • 00:02:05 of the video and the comment of the video so you will find the latest link of the Automatic1111.

  • 00:02:11 So the commit we are using is published 9 hours ago, January 7, 2023.

  • 00:02:20 If you don't know how to install Automatic1111 web UI, I have a great tutorial for that.

  • 00:02:25 So this is the homepage of our YouTube channel.

  • 00:02:27 Go to playlist and in here you will see Stable Diffusion DreamBooth playlist and in this

  • 00:02:34 playlist, easiest ways to install and run Stable Diffusion web UI on PC.

  • 00:02:38 I will put the link of this video to the description and also you can watch how to use Stable Diffusion

  • 00:02:43 version 2.1 and different models in the web UI.

  • 00:02:46 This is also very important.

  • 00:02:47 I will also put the link of this video to the description as well.

  • 00:02:51 One more thing.

  • 00:02:52 This is commonly asked.

  • 00:02:54 If you encounter any problem, go to about page of our channel and in here you will see

  • 00:03:00 our Discord channel link.

  • 00:03:01 As you can see, I am currently hovering that.

  • 00:03:03 You can join our Discord channel and ask me any questions that you encounter.

  • 00:03:08 So this is our beginning screen of the Stable Diffusion.

  • 00:03:11 And first let's start with installing our extension, DreamBooth.

  • 00:03:14 To do that, go to the Extensions tab, click Available, then Load from, and in here you will see the DreamBooth

  • 00:03:21 extension.

  • 00:03:23 When you type DreamBooth, it is listed in here.

  • 00:03:26 I am just clicking install and it is getting installed.

  • 00:03:29 You should see a message here: OK, it has been installed.

  • 00:03:35 We have one error, but it is not a problem.

  • 00:03:38 It still works.

  • 00:03:39 So you see, we have a message in the CMD window, and it has been installed into the C:\web UI tutorial\extensions

  • 00:03:45 folder as the DreamBooth extension.

  • 00:03:47 Now we have to restart the CMD window, because we are installing for the first time and it is

  • 00:03:53 a necessity.

  • 00:03:54 Otherwise it won't work.

  • 00:03:56 Let's close.

  • 00:03:58 Let's restart.

  • 00:03:59 OK, restart has been completed.

  • 00:04:02 Let's just refresh and then go back to extensions and check for updates every time you start.

  • 00:04:09 OK, it has just been updated.

  • 00:04:12 So I'm just clicking apply and restart UI.

  • 00:04:14 OK, it is done.

  • 00:04:16 After the first-time installation,

  • 00:04:19 you don't need to restart the CMD window once again.

  • 00:04:23 So you see, this is how frequently this stuff gets updated.

  • 00:04:26 Literally, it has been updated just now, as you can see.

  • 00:04:30 So you should always check the latest version.

  • 00:04:33 Now we can start our tutorial.

  • 00:04:35 We are here now.

  • 00:04:36 We see the DreamBooth tab in the interface.

  • 00:04:38 We click that.

  • 00:04:39 This is the interface where we are going to generate our model and train our face or a

  • 00:04:44 new subject.

  • 00:04:46 First of all, we need to generate our model.

  • 00:04:51 You can simply enter any name here.

  • 00:04:53 It doesn't matter.

  • 00:04:54 So I will enter as web UI and the identifier prompt of my model, which will be ohwx.

  • 00:05:02 I will explain why it will be ohwx.

  • 00:05:05 Then we need to check the source checkpoint.

  • 00:05:08 You can also import from Hugging Face, but I don't suggest that; it is not necessary.

  • 00:05:12 I am checking version 1.5 Pruned ckpt.

  • 00:05:17 The version 1.5 pruned ckpt is available in the official repository of Stable Diffusion 1.5.

  • 00:05:24 You can just download it from here.

  • 00:05:26 Why are we using the pruned ckpt, not the pruned-emaonly ckpt? Because it is better for training

  • 00:05:33 new subjects.

  • 00:05:34 You can just download it by clicking here.

  • 00:05:39 And after you put that into your model folder, it will be also available here, as you can

  • 00:05:44 see.

  • 00:05:46 OK.

  • 00:05:48 Then just click the create model button.

  • 00:05:51 OK, you see.

  • 00:05:54 We have a message checkpoint successfully extracted to this folder.

  • 00:05:59 Where is it?

  • 00:06:00 Let me show you.

  • 00:06:01 It is inside Web UI Tutorial.

  • 00:06:04 And let's go to our models, and inside DreamBooth, inside Web UI ohwx, and in here, "working".

  • 00:06:11 And these are actually weights of the model that we have just composed.

  • 00:06:17 Let's continue.

  • 00:06:19 Now this model is selected here.

  • 00:06:21 This is where we make the selection.

  • 00:06:24 After we make this selection, we will train the selected model.

  • 00:06:28 Yes.

  • 00:06:29 OK.

  • 00:06:30 Now let's go to the settings tab in here.

  • 00:06:33 First click performance wizard.

  • 00:06:35 It will set the parameters according to the VRAM of your GPU.

  • 00:06:39 If you have less than 12 GB of GPU VRAM, it is really hard to use DreamBooth.

  • 00:06:44 Unfortunately.

  • 00:06:45 You can use LoRA, but it is a topic of another video.

  • 00:06:48 Actually, it is almost the same as this video, but there are

  • 00:06:52 just a few tricks, and I already have a video for LoRA.

  • 00:06:56 So after watching this video, if you watch that video, the LoRA video,

  • 00:07:00 you can easily apply LoRA to your training.

  • 00:07:05 It is in here.

  • 00:07:06 You see: how to do Stable Diffusion LoRA training.

  • 00:07:08 I will also put the link of this video to the description as well.

  • 00:07:13 So training steps per image epochs.

  • 00:07:16 First of all, let me explain what is epoch.

  • 00:07:19 We will have a training data set, the pictures of the subject that we are going to teach.

  • 00:07:26 In this case, I am going to teach myself.

  • 00:07:29 I will use 12 images of myself.

  • 00:07:33 Therefore, one epoch means 12 steps.

  • 00:07:38 So each step is a training step and each epoch is training all of the training images one

  • 00:07:44 time.

  • 00:07:45 So one epoch means 12 steps in my case, because I have twelve training images.

  • 00:07:50 And how many epochs we want?

  • 00:07:52 For teaching faces, 150 is usually suggested.

  • 00:07:57 So when you go to the concepts, just click training with a person.

  • 00:08:00 It will set the most appropriate values for person.

  • 00:08:05 So you see, now it is set to 150.

  • 00:08:07 However, you can set this as high as you want and use a certain saved checkpoint.

  • 00:08:13 I will explain that.

  • 00:08:14 So I'm just going to make it 300.
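
To make the epoch/step arithmetic concrete, here is a minimal Python sketch using the values from the video (12 training images, 300 epochs); it is only an illustration of the definitions above:

```python
# One epoch = every training image seen once, per the explanation above.
training_images = 12
epochs = 300

steps_per_epoch = training_images        # 12 steps per epoch
total_steps = epochs * steps_per_epoch   # 3600 steps for the whole run
print(steps_per_epoch, total_steps)
```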

  • 00:08:17 And how much time do you want to wait between each epoch? Zero. This is also zero.

  • 00:08:22 OK, this is important.

  • 00:08:24 How frequently do we want to save our training?

  • 00:08:29 You know, if your computer crashes, if you cancel your training, if whatever happens,

  • 00:08:34 you will be able to continue from your latest saved model.

  • 00:08:40 Therefore, this is important.

  • 00:08:42 Also, if you do over training and you want to use previous training checkpoint, you also

  • 00:08:48 need to have a save.

  • 00:08:49 So I'm going to set this as 10.

  • 00:08:50 Be careful: when you are doing DreamBooth training, each save usually takes about 4 to

  • 00:08:55 5 gigabytes.

  • 00:08:58 So if you don't have much hard drive space, you may need to set this to a higher number.
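
A rough sketch of the disk-space math, assuming the 4-5 GB per save mentioned above:

```python
epochs = 300
save_every_n_epochs = 10     # checkpoint saving frequency set in the video
gb_per_checkpoint = 4.5      # roughly 4 to 5 GB per save, per the video

saved_checkpoints = epochs // save_every_n_epochs    # 30 saves over the run
print(saved_checkpoints * gb_per_checkpoint, "GB")   # ~135 GB of disk space
```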

  • 00:09:04 This is for saving preview images every epoch, for example, or every however many

  • 00:09:08 epochs you want.

  • 00:09:09 This doesn't take space, but this will slow you down.

  • 00:09:13 So I'm just going to leave this as five.

  • 00:09:17 Batch size: Now, this is very important.

  • 00:09:19 If you increase batch size, it will speed up your training significantly.

  • 00:09:23 However, this will also increase your GPU memory usage significantly as well.

  • 00:09:29 If you increase these numbers, you need to increase both of them equally to obtain the

  • 00:09:35 best results.

  • 00:09:36 So now, for example, it will be almost four times faster.

  • 00:09:40 Also, make sure that your training image count is divisible by this number.

  • 00:09:45 So two multiplied by two makes four, and your number of training images must be divisible

  • 00:09:54 by four.

  • 00:09:55 So it can be four images, eight images, 12 images, 16 images, 20 images, but it shouldn't

  • 00:10:02 be 17 images.

  • 00:10:03 OK, this is the formula.

  • 00:10:06 Let's say you have 16 gigabytes of GPU RAM, then you can make this three by three.

  • 00:10:12 And then you should have nine or 18 or 27 or 36 images.

  • 00:10:17 That is the formula.

  • 00:10:18 I'm just going to leave this one by one for now.

  • 00:10:22 Also, another thing: if you make this two and two, like this, it will be four times faster.

  • 00:10:28 Then you also need to increase the learning rate by four times, like this and this.

  • 00:10:35 Otherwise training will be very slow.

  • 00:10:36 Increasing these also requires speeding up the learning rate

  • 00:10:40 by as much as you increase them.

  • 00:10:42 Since I will use one by one, I am just going to leave the default learning rate.
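
The batch-size bookkeeping above, as a minimal Python sketch; the linear learning-rate scaling is the heuristic described in the video, not an official rule:

```python
# Heuristics from the video: the dataset size should be divisible by the
# effective batch, and the learning rate is scaled up with it.
base_lr = 2e-6                 # the extension's default learning rate
batch_size = 2
gradient_accumulation = 2
training_images = 12

effective_batch = batch_size * gradient_accumulation   # 2 x 2 = 4
assert training_images % effective_batch == 0          # 12 is divisible by 4
scaled_lr = base_lr * effective_batch                  # 8e-6 for a 2x2 setup
print(effective_batch, scaled_lr)
```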

  • 00:10:46 OK, set gradients to none when zeroing.

  • 00:10:50 If you select this, it will increase the GPU RAM usage.

  • 00:10:55 How can you know that?

  • 00:10:57 The DreamBooth extension has wiki pages, and in here they have RAM usage settings.

  • 00:11:03 Let me show you.

  • 00:11:06 OK in here: settings known to use more VRAM.

  • 00:11:10 High batch size, as I just explained.

  • 00:11:12 Setting gradients to none when zeroing, which is this setting here.

  • 00:11:18 So when you check this, it will use more VRAM; and then there is Use EMA.

  • 00:11:23 OK.

  • 00:11:24 Now let's continue.

  • 00:11:25 And I will explain.

  • 00:11:26 Gradient checkpointing: This is a technique to reduce memory usage by clearing activations.

  • 00:11:31 So it is good to check it out.

  • 00:11:35 And then we are just passing over these.

  • 00:11:37 These are more advanced things to play with.

  • 00:11:42 After you get used to how to use the DreamBooth, you can just change them, but in the learning

  • 00:11:47 stage just leave them as they are.

  • 00:11:49 If you set these too high, it will get trained too fast.

  • 00:11:53 However, it will also overtrain easily.

  • 00:11:56 If you set them too low, then you may never get it trained.

  • 00:12:01 So this is an experimental thing, and you need to do a lot of experimentation.

  • 00:12:05 Image processing and resolution.

  • 00:12:07 This is important.

  • 00:12:08 When you use a model based on version 1.x, it is 512 pixels.

  • 00:12:18 If you use version 2.1, there is also a 768-pixel version.

  • 00:12:24 So you need to set this according to the version of your base model.

  • 00:12:28 OK, the base model, the source checkpoint.

  • 00:12:30 We checked here.

  • 00:12:32 Since we are using version 1.5, official version.

  • 00:12:35 It is 512 pixels.

  • 00:12:39 Don't apply horizontal flip.

  • 00:12:40 This is not good for faces.

  • 00:12:42 Center crop.

  • 00:12:44 If your images are not cropped, you should check this box.

  • 00:12:46 I will explain how to set your images.

  • 00:12:49 Since my images are center cropped, I am not checking this.

  • 00:12:53 Sanity sample prompt.

  • 00:12:54 OK, this is important.

  • 00:12:55 We are going to use this prompt to see the overall training of the model.

  • 00:13:02 But how? In terms of overtraining or not.

  • 00:13:07 During the training, I will explain.

  • 00:13:11 So I am going to enter here photo of ohwx man by Tomer Hanuka.

  • 00:13:16 I will explain why I entered this prompt.

  • 00:13:20 And by Tomer Hanuka.

  • 00:13:22 You will understand it.

  • 00:13:23 Miscellaneous: pretrained VAE path.

  • 00:13:26 These are advanced things that you don't currently need.

  • 00:13:28 OK.

  • 00:13:29 OK, advanced stuff.

  • 00:13:31 This is important.

  • 00:13:33 If you check the Use EMA box, it will improve your training quality.

  • 00:13:38 However, it also increases the RAM usage significantly.

  • 00:13:41 Use eight bit Adam: This will reduce the RAM usage.

  • 00:13:45 BF16:

  • 00:13:47 This will also reduce RAM usage.

  • 00:13:50 xFormers: This will significantly increase your training speed.

  • 00:13:54 Cache Latent: This will also reduce the VRAM usage.

  • 00:13:59 All of these are actually written on this page,

  • 00:14:02 the out-of-memory topic of the wiki.

  • 00:14:05 I will put this into the description.

  • 00:14:08 So you see these are all decreasing the RAM usage.

  • 00:14:11 Actually, it says that Cache Latents increases VRAM usage, but as far as I know it does not.

  • 00:14:19 But you can test that.

  • 00:14:22 So the Step Ratio of Text Encoder Training.

  • 00:14:25 This will improve your training quality.

  • 00:14:27 However, it will also increase the RAM usage of the graphic card.

  • 00:14:31 So if you encounter out of memory error, you should set this zero.

  • 00:14:37 But the optimal value for faces is 0.7, for style 0.2.

  • 00:14:44 And the other things, you don't need to play with them.

  • 00:14:46 They are more advanced stuff.

  • 00:14:49 OK, now the concepts.

  • 00:14:51 This is the very important part.

  • 00:14:54 You can set [filewords], prompts, and directories.

  • 00:14:59 So first of all we have to set our training data set.

  • 00:15:03 Training data set directory.

  • 00:15:04 Where is my training data set?

  • 00:15:07 It is inside my Pictures folder, in here: Best DB.

  • 00:15:12 So all of these images are now 512 by 512 pixels.

  • 00:15:19 Let me show their original version.

  • 00:15:21 So their original version is here.

  • 00:15:26 How did I set them like this?

  • 00:15:27 I have used a Paint .NET to crop them as I want.

  • 00:15:31 For example.

  • 00:15:32 Let me show you: Paint.NET is a free tool, by the way.

  • 00:15:36 You can find it via Google.

  • 00:15:40 Just click, like this, and then I am just cropping them with a square.

  • 00:15:46 So I click Rectangle Select, then click here, then in here, Fixed Ratio, like this. Then

  • 00:15:52 you can pick any part of the image you want.

  • 00:15:55 Just for example here.

  • 00:15:57 Then you can control-C control-N and it will paste into a new place.

  • 00:16:01 You can save it.

  • 00:16:03 Or in here.

  • 00:16:04 You can just resize it to a very low resolution like this, with control-R; it will open the resize

  • 00:16:10 dialog, type like this, then control-V and expand.

  • 00:16:13 You see, now it is cropped.

  • 00:16:15 Alternatively, you can use Birme.

  • 00:16:17 Birme.net is a famous site to crop images.

  • 00:16:22 It is commonly used in the community.

  • 00:16:25 You can just, for example, upload any image there and crop them.

  • 00:16:30 For example, let's upload this image.

  • 00:16:33 These are currently squared, but if they are not square, it will also automatically let

  • 00:16:37 you square them.

  • 00:16:38 Let me show: OK, you see, both of these images are not cropped.

  • 00:16:42 So you are able to crop them with your mouse like this: set the position, then set the

  • 00:16:48 resolution from here: 512, 512.

  • 00:16:51 If you use SD version 2.1, then they will be 768 pixels.

  • 00:16:55 OK, you can also use auto detect image focal point.

  • 00:16:58 Do not resize.

  • 00:16:59 And you can click here.

  • 00:17:01 If you check Do Not Resize,

  • 00:17:03 they won't be resized to this resolution.

  • 00:17:06 Then Save as Zip, and all of them will be saved as a zip.

  • 00:17:08 Then you can extract them with the software you have.

  • 00:17:12 If you don't have any software like WinRAR, Windows is still able to extract them.

  • 00:17:17 All right.

  • 00:17:18 If you can't make them, just join Discord and I will help you, hopefully.
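
If you prefer a scripted alternative to Paint.NET or Birme, the same preparation can be done with a few lines of Python. This is only a minimal sketch assuming Pillow is installed; the folder names are placeholders:

```python
# Center-crop every image in a folder to a square and resize it for training.
from pathlib import Path
from PIL import Image

SRC = Path("raw_photos")   # hypothetical folder with your original photos
DST = Path("Best DB")      # training dataset folder, as named in the video
SIZE = 512                 # use 768 for SD 2.1 768-pixel base models
DST.mkdir(exist_ok=True)

for path in SRC.glob("*.jpg"):
    img = Image.open(path)
    side = min(img.size)                  # largest centered square
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))
    img = img.resize((SIZE, SIZE), Image.LANCZOS)
    img.save(DST / f"{path.stem}.png")
```

Note that a center crop is not always the best crop; the video's point about picking the crop position manually still applies.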

  • 00:17:22 So data set directory.

  • 00:17:23 When your images are ready, we will enter their path.

  • 00:17:27 So this is mine.

  • 00:17:28 Let me enter the folder directory.

  • 00:17:32 I click here and you see I am able to select the path.

  • 00:17:34 I do control-C to copy it, paste it here (ctrl-v).

  • 00:17:38 So this is the directory where my training images are located.

  • 00:17:43 Classification directory: Now, what is classification?

  • 00:17:46 Classification images are generic images that we will use to avoid overtraining our model and also

  • 00:17:54 to keep the inner sanity of the model,

  • 00:17:58 so that the entire model does not end up looking like us.

  • 00:18:02 OK.

  • 00:18:03 So for this I will just generate a new folder.

  • 00:18:06 Yes, I have copy pasted the path.

  • 00:18:10 I will set it as web UI tutorial.

  • 00:18:13 You can also enter another existing directory.

  • 00:18:16 It is fine.

  • 00:18:18 Instance token: Now, [filewords] are used to set a different description for each training

  • 00:18:25 image.

  • 00:18:26 This is very, very advanced and hard to do.

  • 00:18:29 So I will explain this in the later parts of the tutorial video.

  • 00:18:34 For now I will just skip them.

  • 00:18:35 You can also skip to that part in the video, because I will put the sections of the video

  • 00:18:42 into the description.

  • 00:18:43 Now prompts: This is very important.

  • 00:18:46 The instance prompt is used to define the keyword that will activate our new subject

  • 00:18:53 that we taught to the model.

  • 00:18:56 So in here you have to pick a unique word, but it has to be very specific and rare.

  • 00:19:05 Whatever you enter into the model

  • 00:19:07 will get turned into tokens;

  • 00:19:10 it will be split into tokens.

  • 00:19:11 So there is a reddit thread that explains the rare tokens.

  • 00:19:15 I will put the link of this page in the description, and in here the rarity of the tokens is listed.

  • 00:19:24 So, for example, you have entered, let's say, mill.

  • 00:19:29 It is a single token, but "mill" probably exists a lot in real life.

  • 00:19:34 Therefore, you have to go to the bottom and try to find rare tokens that you can't make

  • 00:19:40 sense of.

  • 00:19:42 For example, these.

  • 00:19:43 Also, note that these tokens may be used in other languages as well.

  • 00:19:49 For example, from here: ohwx is a very famous token, because this is a token that almost

  • 00:19:57 does not exist anywhere.

  • 00:19:59 When I type ohwx into Google, you see all unrelated things.

  • 00:20:06 They look like spam.

  • 00:20:07 So this is a good token, and you can also try other tokens here that look

  • 00:20:14 weird to you.

  • 00:20:15 Maybe this one?

  • 00:20:16 Yes, this, OK.

  • 00:20:17 I'm not sure if this is a real name or not, so you can verify it, but ohwx works very

  • 00:20:27 well and the token you pick is extremely important.

  • 00:20:31 Because your training will begin from that token, and you want to inject a new token that

  • 00:20:37 barely exists in the database; everything you enter will become tokens the model knows,

  • 00:20:44 because it will get split into them.

  • 00:20:46 Even if you generate a new keyword, such as SECourses, the model will not see this as

  • 00:20:54 an SECourses.

  • 00:20:55 How will it see it?

  • 00:20:57 First it will look at "S", then "SE".

  • 00:21:01 So "SE" does exist, OK.

  • 00:21:03 Then it will look at "sec".

  • 00:21:05 So, yes, "sec" also exists.

  • 00:21:10 And then it will look at "seco".

  • 00:21:12 OK, there is no "seco", so it will get split at "sec".

  • 00:21:16 And then it will check the remaining characters,

  • 00:21:23 so they will all get split; our "SECourses" will probably become "sec", "our", "ses" or something

  • 00:21:33 like that.

  • 00:21:34 You see, you are understanding,

  • 00:21:35 I am hoping.

  • 00:21:37 So the keyword you enter will get split into tokens, no matter what you enter.

  • 00:21:44 Therefore, we are picking a single token that is very rare from this list and I have done

  • 00:21:51 many tests.
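
If you want to check how a candidate keyword actually tokenizes, you can inspect it with the CLIP tokenizer that SD 1.x uses for its text encoder. A minimal sketch, assuming the Hugging Face transformers package is installed; the word list is just for illustration:

```python
# Inspect how candidate instance tokens split into CLIP subword tokens.
from transformers import CLIPTokenizer

# SD 1.x uses the CLIP ViT-L/14 text encoder's tokenizer.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

for word in ["ohwx", "mill", "SECourses"]:
    print(f"{word!r} -> {tokenizer.tokenize(word)}")
```

A short, rare keyword that splits into as few (and as rarely used) tokens as possible is what the video is recommending.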

  • 00:21:52 So ohwx is working very well and then we need to enter the class of the subject we are going

  • 00:21:58 to teach.

  • 00:21:59 What am I going to teach?

  • 00:22:00 I am going to teach the face of me.

  • 00:22:03 So it's the face of man.

  • 00:22:04 Therefore, I am just entering man.

  • 00:22:07 So this is really important.

  • 00:22:09 It will use the underlying knowledge of man in the model to learn my face.

  • 00:22:15 Class prompt: now, as I said, this will be used to keep the sanity of our model and prevent

  • 00:22:21 overtraining.

  • 00:22:22 When you also hover it, it says: read me for more info.

  • 00:22:27 I wonder if they have added it to the wiki yet.

  • 00:22:31 In the basics perhaps?

  • 00:22:33 OK, in the wiki, in the basics they have a small explanation.

  • 00:22:38 A class specific prior preservation loss is also introduced to prevent overfitting and

  • 00:22:44 encourage the generation of diverse instances of the same class.

  • 00:22:49 They have made an example like this.

  • 00:22:51 So in class prompt I am going to enter photo of man.

  • 00:22:55 OK, you see, these two are the same. And the sample prompt:

  • 00:23:00 This will be used to generate preview images during the training so we will be able to

  • 00:23:04 see how the training is going on and if it is becoming too overtrained or not.

  • 00:23:11 So in here I am going to enter photo of ohwx man.

  • 00:23:15 OK, I am not entering any negative prompts and I'm not using any sample prompt template.

  • 00:23:22 So these are more, let's say, advanced things that you can also play with after you have

  • 00:23:28 learned the basics.

  • 00:23:30 And in here, class images per instance.

  • 00:23:32 In the community it is usually said to have a minimum of 300 images in total.

  • 00:23:39 In the official paper of DreamBooth,

  • 00:23:42 which is here:

  • 00:23:45 I will also put the link of this paper to the description.

  • 00:23:48 They have used 200 classification images.

  • 00:23:53 I have made some tests, but I can't say for sure what the necessary minimum is.

  • 00:23:59 So I am just going to follow the community, and to reach the 300 images, the number I need to enter is,

  • 00:24:06 let's easily calculate, 300 divided by the number of training images.

  • 00:24:12 I have 12, so 25.

  • 00:24:14 You can also calculate like this.
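
The same calculation as a tiny sketch in Python, using the numbers from the video:

```python
target_class_images = 300   # community suggestion; the DreamBooth paper used 200
training_images = 12
print(target_class_images // training_images)   # 25 class images per instance
```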

  • 00:24:16 So, classification CFG scale.

  • 00:24:18 This is the same as txt2img's CFG scale: how much CFG scale do you want to use for

  • 00:24:25 generating classification images?

  • 00:24:27 By the way, you can also use text2image tab to generate your classification images.

  • 00:24:32 Put them into the folder that we set here.

  • 00:24:35 Then the extension will not generate any new images.

  • 00:24:40 It is up to you.

  • 00:24:41 You can use both ways, but if you use this way, it will also generate a text description

  • 00:24:47 file with the same name as the image, and it will put the description you typed here inside it.

  • 00:24:54 That I will show in a moment.

  • 00:24:56 Classification steps.

  • 00:24:58 So this is the number of steps, equal to the sampling steps in here.

  • 00:25:01 OK, and number of samples to generate.

  • 00:25:05 So this is the number of samples that we want to be generated during the training to see

  • 00:25:10 how the training is going on.

  • 00:25:12 You can set this to 1, 2, 3, 4, whatever you want.

  • 00:25:15 Sample seed: -1.

  • 00:25:17 It means that every sample image will be generated with a different, random

  • 00:25:23 seed. And the sample CFG scale is 7.5.

  • 00:25:27 You don't need to change this.

  • 00:25:30 These are just same as the text2image.

  • 00:25:32 You will make sense of it after you get used to text to image.

  • 00:25:36 OK, and now let's return back here: how many images do we want to generate for classification

  • 00:25:45 at the same time, in parallel?

  • 00:25:47 So I have 12 GB VRAM memory.

  • 00:25:50 Therefore, I am able to generate 10 images per batch, so it will take less time

  • 00:25:56 to generate classification images.

  • 00:25:58 By the way, you only need to generate classification images one time for each class prompt.

  • 00:26:06 So if you don't change "photo of man", if you don't change your subject class, then

  • 00:26:11 you don't need to generate them once again.

  • 00:26:14 So for showing you, I will just set this as five and you will understand.

  • 00:26:20 It will generate images, five and five as batches.

  • 00:26:24 OK.

  • 00:26:25 And one more thing: you can teach up to three concepts at a time to the model.

  • 00:26:34 So the first concept is, let's say, me, and in here I can also teach my wife's picture,

  • 00:26:42 for example.

  • 00:26:43 It can be like wife DB.

  • 00:26:45 So another folder. And its classification data set: could it be exactly the same as the other one?

  • 00:26:52 No, it couldn't, because it would need to be related to women,

  • 00:26:55 since it will be a woman, not a man.

  • 00:26:59 Therefore, let's say woman images, and in here you need to use another keyword for that.

  • 00:27:06 So it is important to find a rare keyword from this list.

  • 00:27:13 I don't know which ones are very rare, but "ske" is commonly used as another instance token.

  • 00:27:21 So it can be like "ske woman".

  • 00:27:26 And in here it will be "photo of woman", and the sample will be "photo of ske woman".

  • 00:27:38 OK, and the rest is same.

  • 00:27:40 And you can also add another concept here.

  • 00:27:43 But the only thing that matters is the class of the other subject.

  • 00:27:50 Whether it is a cat or a dog or a tree, whatever you are teaching, set the class and instance prompt

  • 00:27:59 so that you can call them separately, and you can use both of them in a single picture.

  • 00:28:05 For example, you can generate pictures of your wife and yourself in the same picture,

  • 00:28:09 or your dog and yourself in the same picture.

  • 00:28:12 But for this tutorial I am not going to teach multiple concepts, so it is up to you to teach

  • 00:28:18 or not.

  • 00:28:19 I will just teach a single concept.

  • 00:28:23 All right.

  • 00:28:24 Now we are moving to saving tab.

  • 00:28:27 In here you can enter a custom model name for saving checkpoints and LoRA models.

  • 00:28:33 You can check the half-model box.

  • 00:28:34 They say that it doesn't decrease the quality, but the checkpoints are smaller.

  • 00:28:41 I didn't test it so I can't say if it is 100 percent correct or not.

  • 00:28:46 So to keep the quality in max, I won't check it.

  • 00:28:51 Save checkpoints to sub directory.

  • 00:28:53 You should enable this.

  • 00:28:55 You should check this checkbox so that the saves will go under Web UI ohwx.

  • 00:29:01 They won't all end up in the same directory.

  • 00:29:04 Now this is important to set.

  • 00:29:06 Generate a ckpt file when saving during training.

  • 00:29:10 If you don't check this, then you won't be able to load back and test

  • 00:29:17 the model at epoch 20 or 40 or 60.

  • 00:29:22 So you should check this out.

  • 00:29:24 You can also continue from that point using that as a base model.

  • 00:29:27 And you can also load that model and you can do test inference on that.

  • 00:29:35 So this is important, but this will increase your hard drive usage.

  • 00:29:40 Be careful with that.

  • 00:29:41 Generate a ckpt file when training completes.

  • 00:29:43 Yes.

  • 00:29:44 Generate a ckpt file when training is canceled.

  • 00:29:46 I'm not checking this because when I cancel I don't want it to generate a ckpt.

  • 00:29:53 After canceling you can just load the model and click ckpt and it will generate a ckpt

  • 00:29:58 file from the last saved weights.

  • 00:30:00 Now weights.

  • 00:30:02 You see there is also option to save separate diffuser snapshots when saving during training.

  • 00:30:08 This option will generate weight files, like you see here.

  • 00:30:14 So for demonstration purposes I will also select this. From a later point, you can just

  • 00:30:21 turn them into a new model folder and then continue your training from there.

  • 00:30:27 Alternatively, I believe you can generate a new model from your saved ckpt file as a

  • 00:30:35 new source checkpoint and you can continue from that saved checkpoint ckpt file.

  • 00:30:41 I think both should be same.

  • 00:30:44 OK.

  • 00:30:45 After you did settings, just click save settings.

  • 00:30:48 When you click train, I think it is automatically also saving.

  • 00:30:51 Now I will generate the class images before starting training.

  • 00:30:56 This will use the settings that I did set in these options.

  • 00:31:02 And let's see what kind of class images we are going to get.

  • 00:31:05 OK, so you see, it is generating 300 class images for training.

  • 00:31:10 Why?

  • 00:31:11 Because currently I have no images in here. But, as you can see, it is not working right

  • 00:31:18 now.

  • 00:31:19 So there is a mistake, obviously.

  • 00:31:20 To solve this mistake, I will just restart the application.

  • 00:31:25 OK, restart is completed.

  • 00:31:28 Let's refresh.

  • 00:31:29 Go back to our extensions tab.

  • 00:31:32 Check for updates.

  • 00:31:33 If there is an update.

  • 00:31:34 Yes, there is a new update during the video.

  • 00:31:38 The updates are coming, So let's just refresh.

  • 00:31:41 OK, refreshed.

  • 00:31:43 Let's go back to extensions.

  • 00:31:44 Check for updates.

  • 00:31:45 OK, we are at the latest version.

  • 00:31:46 Then let's go to DreamBooth, select our model, load settings.

  • 00:31:52 Go to the generate.

  • 00:31:54 Before generating, I will delete these incorrect images first.

  • 00:31:59 Let me do that.

  • 00:32:00 Go to the pictures and in here, go to the web UI tutorial.

  • 00:32:07 Ctrl-a shift-delete.

  • 00:32:08 Yes, all deleted.

  • 00:32:11 And just click generate class images.

  • 00:32:13 OK, let's see if any error again.

  • 00:32:16 OK, OK, I think error continues.

  • 00:32:20 So instead of these methods, I will use txt2image tab to generate images.

  • 00:32:27 The only difference between these and using text to image is: let me show you.

  • 00:32:33 Meanwhile, just let's restart the application.

  • 00:32:37 When you generate images like this, it will also generate a text file with the same name as the

  • 00:32:44 image, and inside it, it will write "photo of man" as a description.

  • 00:32:51 So this is useful when you do [filewords] training or when you do LoRA training, but

  • 00:33:00 for now it is not necessary for us.

  • 00:33:04 I just reported this bug also to the developer, so I believe it will get fixed really quickly.

  • 00:33:11 OK, so we are going to generate our class images from here.

  • 00:33:19 Classification images: photo of man.

  • 00:33:21 I'm just typing that, setting the sampling steps count to 40, setting CFG to 7.5.

  • 00:33:26 So this batch size means processing multiple images at the same time.

  • 00:33:35 It will use more GPU RAM, but it will make it faster.

  • 00:33:39 And how many I need?

  • 00:33:40 I need 300.

  • 00:33:41 Therefore, I am going to set this as 38, like this, and then just click generate.
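
For the batch-count arithmetic, here is a small sketch; the batch size of 8 is an assumption (the video only states the resulting batch count of 38), so plug in whatever your VRAM allows:

```python
import math

needed = 300        # classification images we want in total
batch_size = 8      # assumed images per batch; depends on your GPU's VRAM

batch_count = math.ceil(needed / batch_size)
print(batch_count, batch_count * batch_size)   # 38 batches -> 304 images
```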

  • 00:33:50 So now it will generate images.

  • 00:33:52 But make sure that the model selected here, you see, is the same as the model that you used

  • 00:34:00 to generate your training model.

  • 00:34:02 So in here, when you select your model for training, it shows the base model source checkpoint.

  • 00:34:08 You see Stable Diffusion 1.5 pruned, and currently I am generating same images from this model.

  • 00:34:14 So the generated images will be saved in text to image folder.

  • 00:34:20 Let's open it by clicking here.

  • 00:34:22 OK, when I have clicked open folder in here.

  • 00:34:27 It didn't open, because the CMD window says the txt2img images folder does not exist.

  • 00:34:34 After you create an image,

  • 00:34:35 it will be created, because, as I said, this is a fresh installation to demonstrate to you.

  • 00:34:40 Therefore, all of my settings here are also default.

  • 00:34:42 I didn't change any of them.

  • 00:34:46 And there is one another thing that I want to mention.

  • 00:34:48 In the DreamBooth model selection you will see that the SD 1.x versions either have

  • 00:34:56 EMA or not.

  • 00:34:57 So if they have EMA, it will improve your further training, the fine-tuning of the model.

  • 00:35:05 So you should pick models that have an EMA version.

  • 00:35:09 It only exists in the 1.x versions.

  • 00:35:11 I think in SD 2.0

  • 00:35:14 and in 2.1 there is no released model that has EMA weights.

  • 00:35:20 OK, the first batch has been completed.

  • 00:35:22 Let's open the folder.

  • 00:35:23 Now the folder is opened.

  • 00:35:26 So these are photo of man.

  • 00:35:28 You see, there will be very weird images, bad quality images, but they don't matter

  • 00:35:33 much.

  • 00:35:34 They are not very important as long as they are generated by our checkpoint model.

  • 00:35:40 OK, after all of the images have been generated, just select them all, copy them with control-C, then

  • 00:35:48 go back to your folder where you want to get them saved web UI tutorial.

  • 00:35:54 I am just going to copy paste them in the folder.

  • 00:35:58 OK, let's return back to our DreamBooth and load settings.

  • 00:36:04 So now we have the sufficient amount of classification images.

  • 00:36:09 Now we are ready to click start training.

  • 00:36:12 OK, when we start training, it will first start by caching them.

  • 00:36:19 We will see that.

  • 00:36:23 So you see, it says that it has found 300 regularization images.

  • 00:36:28 Therefore, it is not going to generate any more images.

  • 00:36:32 Currently it is caching them.

  • 00:36:35 OK, after the caching has been completed, you will see the training has been started.

  • 00:36:42 It is progressing step by step.

  • 00:36:45 You see 13, 14.

  • 00:36:48 If you get out of memory error, then you need to try further decreasing memory usage.

  • 00:36:55 All of the low memory settings and high memory settings are stated in the wiki.

  • 00:37:01 I will put this into the description.

  • 00:37:03 Also, you are seeing right now.

  • 00:37:05 High batch size, set gradients.

  • 00:37:08 These will increase your memory usage and these will decrease your memory usage.

  • 00:37:13 There is not much else that you can do. And another thing is that the developers

  • 00:37:18 are constantly trying to optimize and improve the extension to reduce memory usage.

  • 00:37:26 So when you watch this video, maybe one month later, your card could

  • 00:37:33 perhaps be able to do DreamBooth training.

  • 00:37:37 So that's another possibility.

  • 00:37:41 And after how many steps are we going to see our first sample images?

  • 00:37:45 We can calculate it easily.

  • 00:37:47 In the settings tab.

  • 00:37:49 We set it to 10 epochs, and how many training images do we have?

  • 00:37:53 We have 12, you see, in here.

  • 00:37:56 Therefore, after 120 steps we are going to see our first sample training sample images.

  • 00:38:05 Actually, after 120 steps it will save the checkpoint.

  • 00:38:12 After 60 steps, because we set 5 epochs for previews,

  • 00:38:15 we are going to see the first sample image. And 60 steps have been completed.

  • 00:38:20 So it is generating preview images at the step 60.
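
The step counts above follow directly from the settings; a minimal sketch with the video's values:

```python
training_images = 12
ckpt_every_epochs = 10      # checkpoint saving frequency from the settings tab
preview_every_epochs = 5    # preview-image frequency from the settings tab

print(preview_every_epochs * training_images)   # 60: first preview samples
print(ckpt_every_epochs * training_images)      # 120: first saved checkpoint
```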

  • 00:38:24 Ok, the first samples have been generated.

  • 00:38:28 Let's open the samples folder.

  • 00:38:29 So where they were saved, they were saved under our model.

  • 00:38:34 Let me show. OK, I have so many identical tabs.

  • 00:38:41 Ok, inside our installation folder, go to the models and in here go to the DreamBooth

  • 00:38:47 and in here you see the same name as our training model name.

  • 00:38:50 Enter there.

  • 00:38:52 In here you will see samples.

  • 00:38:53 When you click here you will see the samples.

  • 00:38:56 So the first sample is generated with this sample prompt with ohwx man.

  • 00:39:05 So this is our class and this is the unique instance prompt we have set.

  • 00:39:09 Ok, so there is another image.

  • 00:39:11 You see.

  • 00:39:12 This is generated with photo of ohwx man by Tomer Hanuka.

  • 00:39:19 Why did I set this and where did I set this?

  • 00:39:22 I did set this in here.

  • 00:39:24 If you remember, the second prompt, the one you see in here named with a -1, is the sanity

  • 00:39:35 sample prompt.

  • 00:39:38 The number here is the step count at which it was generated, and the other

  • 00:39:43 thing is the prompt used to generate it.

  • 00:39:47 After we progress in the training, you will understand why we are using this.

  • 00:39:54 As long as this image looks like us, with a different style, it means that our model

  • 00:40:00 is learning well; when it becomes exactly like us, not styled like this, that would

  • 00:40:07 mean that our model is overtrained, and then we can't apply styles anymore.

  • 00:40:12 Our aim is teaching our shape, but not overtraining it, not

  • 00:40:22 disturbing the underlying context, the knowledge of the model, not overriding it completely.

  • 00:40:29 So after we progress in the training, we will understand better.

  • 00:40:32 Okay, now let me explain to you how to prepare your training dataset images.

  • 00:40:40 What is important in the selection of the images?

  • 00:40:45 What we want the model to learn is the subject, and only the subject.

  • 00:40:50 That is the most important part.

  • 00:40:52 I want to teach my face.

  • 00:40:54 Therefore, other than my face, everything must be different, or, let's say, should be

  • 00:40:59 different in each of the images.

  • 00:41:01 So, other than face, what can be different?

  • 00:41:03 My clothes and the background can be different.

  • 00:41:07 So if you are teaching your face other than your face, all of the backgrounds and the

  • 00:41:14 clothes should be different as much as possible.

  • 00:41:17 As you can see in my pictures, I have made sure that all of the backgrounds and the clothes

  • 00:41:23 are different or the clothes are not visible.

  • 00:41:27 So if you make your clothes different and your backgrounds are different, then the model

  • 00:41:34 will learn your face, not your clothes or not the backgrounds.

  • 00:41:37 That is what we want.

  • 00:41:38 We want to teach our face, not the other things in the pictures.

  • 00:41:42 If you use the same clothes, then the model will not be able to tell that this is the face and these are

  • 00:41:49 the clothes; the model will learn both of them at the same time, and it will reduce

  • 00:41:53 your ability to stylize your face.

  • 00:41:57 Therefore, the key point of preparing training images is having different things other than

  • 00:42:04 the subject.

  • 00:42:05 So if the subject is face, the other things must be different.

  • 00:42:09 Also, you should have different angles of photos and different distances of photos.

  • 00:42:18 It will make the model learn different angles and different distances, to generate different

  • 00:42:25 kinds of styles and more variety of images.

  • 00:42:30 So about how you make your images: I can't say my data set is the best available data set.

  • 00:42:37 You can expand your data set with more variety of images, more variety of poses, more variety

  • 00:42:42 of angles, more variety of lighting.

  • 00:42:46 Lighting also matters.

  • 00:42:47 It would be better.

  • 00:42:49 However, this is a small data set and I think it is working pretty decently.

  • 00:42:55 But if you expand this data set, your training data set, with more variety, then it is better.

  • 00:43:00 It will learn your face or subject in a more generalized manner, and that way we will

  • 00:43:08 be able to produce different kinds of artistic images more easily.

  • 00:43:14 Okay, so you see, currently it is compiling a checkpoint ckpt file and you can just load

  • 00:43:21 the ckpt file directly and do inference on that checkpoint.

  • 00:43:27 It is compiling the checkpoint at step 360, which is epoch 30. So where are these

  • 00:43:34 checkpoint files located?

  • 00:43:37 They are located in models, inside our folder, and you see the ckpt file and the

  • 00:43:45 yaml file is here.

  • 00:43:48 If you don't know what yaml files are, just watch my how to use Stable Diffusion 2.1 and

  • 00:43:55 different models in the web ui tutorial video.

  • 00:43:59 I will put the link as usual, and let's check out our so far samples.

  • 00:44:07 So this image is like me, but the other sample prompts are not like us.

  • 00:44:13 We just need to do more training.

  • 00:44:15 And also on this screen you will see 5.5 or 3.7.

  • 00:44:22 This means how many iterations are done each second (or seconds per iteration).

  • 00:44:30 However, these values are not displayed very accurately. There is also loss, and this

  • 00:44:37 lr is important.

  • 00:44:38 This shows your learning rate.

  • 00:44:40 So 2e-6, what does that mean?

  • 00:44:43 That means that it is a number.

  • 00:44:47 When you type 2e-6 into Google and go to the first result, for example, it will

  • 00:44:54 show you that it is equal to this number.

  • 00:44:57 Okay, so this is the number.

  • 00:44:58 Actually, it is what we set in our settings, in our learning rate, you see.

  • 00:45:04 So this is the equivalent of the scientific e-notation number.
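
The e-notation can be verified with one line of Python:

```python
lr = 2e-6                 # the learning rate shown in the CMD window
print(lr == 0.000002)     # True: 2e-6 is just 2 x 10^-6
```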

  • 00:45:10 If you set a changing learning rate scheduler from here (you see there are schedulers like polynomial,

  • 00:45:17 constant, or other things), then you will see different numbers in here. And

  • 00:45:25 it also shows the GPU usage.

  • 00:45:27 However, this is also not very accurate.

  • 00:45:30 It says that 9.5 gigabytes currently is being used.

  • 00:45:34 Okay, okay, it has been 72, 82 epochs.

  • 00:45:41 Now I will show you how you can continue training if an error occurs.

  • 00:45:46 So to illustrate that, I will just crash the application by closing here.

  • 00:45:51 When you close it from here, it won't save any checkpoint or anything.

  • 00:45:55 You see the error: a connection error.

  • 00:45:58 Then just restart the application and after the restart is done, just refresh your interface.

  • 00:46:08 Go to the DreamBooth tab, select the model, click load settings; actually, it will be

  • 00:46:14 loaded automatically.

  • 00:46:15 And then just click train.

  • 00:46:17 It will continue from the last checkpoint, which is 80 epochs.

  • 00:46:23 Let's wait.

  • 00:46:25 Okay, you see,

  • 00:46:28 it is continuing from wherever it left off, as you can see here.

  • 00:46:34 Also, in the CMD window it shows first resume epoch and first resume step,

  • 00:46:42 as you can see here.

  • 00:46:43 Okay, we are over 168 epochs and we are already doing a lot of overtraining.

  • 00:46:52 How do I know?

  • 00:46:54 As I told you in the beginning, I entered a sanity prompt.

  • 00:47:03 So the samples numbered with -1 are the sanity prompts.

  • 00:47:09 And let's look at the sanity prompt changes.

  • 00:47:12 So the sanity prompts started like this.

  • 00:47:15 Then in here you see, the sanity prompt is resembling me and also here resembling me,

  • 00:47:23 okay, resembling me somehow.

  • 00:47:26 And after a certain point, actually after 1368 steps, the sanity prompts became just like

  • 00:47:37 me.

  • 00:47:38 You see, it is not styled anymore, okay: like this, like this, and this is almost exactly like

  • 00:47:45 me, and you see, they are not styled anymore like here.

  • 00:47:50 Styling is completely gone in here.

  • 00:47:53 Therefore, now we are sure that we are doing over training.

  • 00:47:59 So I am just going to stop training with cancel, and I am going to use different checkpoints,

  • 00:48:08 test them out to see how they are performing.

  • 00:48:11 Now the hard part is coming: the prompting, the proper, correct prompting to obtain

  • 00:48:17 good results.

  • 00:48:19 So the training has been cancelled.

  • 00:48:22 Let's look for the closest one.

  • 00:48:25 I am refreshing here and in here, yes, this one looks like the closest one: 1308.

  • 00:48:32 Then go to the text2image tab.

  • 00:48:35 So how are we going to generate our own image?

  • 00:48:39 We are going to use "photo of".

  • 00:48:41 These two keywords are also associated with us right now, but not as strongly as our

  • 00:48:47 instance prompt:

  • 00:48:49 ohwx, and man.

  • 00:48:50 Also man is very much associated, okay, so when we type like this and hit the generate

  • 00:48:57 button, it will generate our own image.

  • 00:49:00 Okay, the image is ready.

  • 00:49:02 You see, it is like us and now we need to style it.

  • 00:49:06 So let's add a style name to the prompt and let's see what kind of result we are going to get.

  • 00:49:12 Okay, as you can see, we didn't get much styling, so therefore I am going to show

  • 00:49:21 you an extension which is named web ui prompt generator.

  • 00:49:26 You can install it from the Available tab.

  • 00:49:29 Just click load, and in here just search for "prompt" and you will see prompt generator;

  • 00:49:34 just click install and then apply and restart the UI.

  • 00:49:38 After that you will see prompt generator tab here.

  • 00:49:41 So let's get some extra additional keywords from prompt generator and let's click generate.

  • 00:49:47 Okay, there are a lot of results here, but this one looked to me like it could work, so I copied

  • 00:49:55 it and pasted it in here, and let's see the result we are going to get.

  • 00:50:00 Okay, we got somewhat decent results, but it is still not very much like us.

  • 00:50:06 Therefore, we need to increase the prompt strength.

  • 00:50:10 So what is prompt strength,

  • 00:50:12 prompt attention?

  • 00:50:13 This is from the official wiki of the Automatic1111.

  • 00:50:17 So if you want to increase attention to a word by a factor of 1.1, you can put the word

  • 00:50:24 inside one set of parentheses.

  • 00:50:26 If you want to increase the attention even more, by a factor of 1.21, you can just

  • 00:50:34 put it inside two sets, like this. Alternatively, you can use an easier way; let me show

  • 00:50:40 you, let me also zoom in: just type the weight after a colon, like this. Okay, so this will increase the attention.

  • 00:50:47 This will force the model to generate an image that is more like us, and it is going to ignore

  • 00:50:56 the rest.

  • 00:50:57 Also, in this prompt there are so many things that would be unrelated to Disney style.

  • 00:51:05 So what would be related to Disney style? For example, CGI. And let's also add some

  • 00:51:13 other keywords.

  • 00:51:15 Okay, here are results.

  • 00:51:17 Not very much like us and not very good quality.

  • 00:51:21 We need to improve the prompt by adding some negative prompts as well.

  • 00:51:28 Okay, here I have added some negative prompts, and now you see we have a much better artwork,

  • 00:51:35 but still not very much resembling me.

  • 00:51:39 So I am going to try another prompt, also increasing the emphasis of our unique

  • 00:51:47 keyword, which is ohwx, and the man.

  • 00:51:51 In every prompt you must have ohwx man, probably with some increased strength, to get your

  • 00:51:58 own face, also adding "photo of".

  • 00:52:01 Why?

  • 00:52:02 Because during the training we used "photo of man" as the class prompt.

  • 00:52:09 Therefore, now these three keywords are also associated with us, but the strongest association

  • 00:52:15 is coming from ohwx, okay.

  • 00:52:18 Okay, so i am going to try with emphasis of 1.5 and a new prompt like this.

  • 00:52:27 Let's see the results.

  • 00:52:28 Okay, we got an image that is not very stylized.

  • 00:52:32 Therefore, we need to increase CFG.

  • 00:52:35 So what is CFG?

  • 00:52:36 CFG is the classifier-free guidance scale: how strongly the image should conform to the prompt.

  • 00:52:43 Lower values produce more creative results.

  • 00:52:45 We want the model to obey our prompt because we are providing a very detailed prompt.

  • 00:52:54 Therefore, we need to increase the scale and try it.

  • 00:52:59 So i will show you how you can try multiple scale values.

  • 00:53:04 Go to the bottom here, to the script section, and select the x/y plot.

  • 00:53:09 So in the x/y plot there are x and y values.

  • 00:53:14 Currently we only need x value.

  • 00:53:15 In the x value i am going to select CFG scale and in here i am just typing seven, eight,

  • 00:53:22 nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, and i want to use the same

  • 00:53:29 seed for all of the inputs so that i can see the changes, and i will generate four images

  • 00:53:37 in each iteration, in each step (see the sketch below).
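
As an illustration, the x/y plot fields for this CFG sweep could look like the following, matching the values typed in the video:

```
Script:     X/Y plot
X type:     CFG Scale
X values:   7, 8, 9, 10, 11, 12, 13, 14, 15, 16
Seed:       a fixed value (not -1), so only the CFG changes between columns
Batch size: 4
```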

  • 00:53:42 My graphic card is able to process four images.

  • 00:53:45 If you don't have much vram, you can't do that.

  • 00:53:48 Then you should reduce this batch size.

  • 00:53:50 Okay, if you instead keep minus one for the seed,

  • 00:53:54 then each image in each generation would be different.

  • 00:53:58 However, i want to see the difference of the CFG effect in the grid legend.

  • 00:54:04 Therefore, i'm keeping it like this and then just click generate.

  • 00:54:09 So currently, in the CMD window, you can see it is actually generating four images at each iteration.

  • 00:54:19 So you see, in the 20 steps, actually it is processing 80 steps.

  • 00:54:23 So four of them are being processed in parallel, since i did set the batch size to four.

  • 00:54:32 Okay, the CFG images.

  • 00:54:33 Images with different CFG values have been generated.

  • 00:54:35 I have modified the input because the previous input was not very good.

  • 00:54:40 Actually, it turns out this one was not perfect either.

  • 00:54:42 But it is not important, because when you are working with Stable Diffusion, you have

  • 00:54:47 to generate a lot of images to find out the good ones that you would

  • 00:54:55 like to obtain.

  • 00:54:57 So let's look at the effect of the CFG.

  • 00:55:01 So this is our seed value.

  • 00:55:04 If you use this seed value, you will always generate similar images in each generation,

  • 00:55:11 as long as you keep the same settings and the same model.

  • 00:55:16 So this is the CFG scale seven.

  • 00:55:18 At the CFG scale seven, there is not much resemblance.

  • 00:55:23 At the CFG scale eight, there is a little bit of resemblance.

  • 00:55:28 Look at how the images are changing.

  • 00:55:31 This is CFG scale nine.

  • 00:55:33 There is some resemblance in these two.

  • 00:55:36 Okay, and at the CFG scale 10,

  • 00:55:39 there is also some resemblance, and in here, okay, you see, the resemblance is increasing, and

  • 00:55:47 at the CFG scale 14 actually, there is really good resemblance in this image and in this

  • 00:55:53 image actually, and so it goes, and after a certain CFG scale, i think the

  • 00:56:00 quality starts to decrease.

  • 00:56:03 So the CFG scale does make a difference.

  • 00:56:06 Now let's say you want to test out different artists' styles with different CFG scales.

  • 00:56:14 How can you do that?

  • 00:56:17 I am putting here a special placeholder keyword that i am going to replace: replacekw.

  • 00:56:23 Okay, then the rest is anything you want, and at the bottom, this time i am going

  • 00:56:30 to select prompt sr.

  • 00:56:33 Okay, so prompt sr works like this: you separate a list of words with commas, and the first word

  • 00:56:39 will be used as a keyword.

  • 00:56:40 The script will search for this word in the prompt and replace it with the others.

  • 00:56:45 So this keyword will be replaced with whatever i type here.

  • 00:56:49 So let's say wlop, and then artgerm, and then whatever other artists you want to test (see the sketch below).
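
To make this concrete, here is a sketch of how the fields could be filled in (the prompt text is illustrative, and the first x value must be the placeholder that appears in the prompt):

```
Prompt:   photo of (ohwx man:1.4), portrait, replacekw style
X type:   Prompt S/R
X values: replacekw, wlop, artgerm, Robert S Duncanson, Karol Bak
Y type:   CFG Scale
Y values: 10, 11, 12, 13
```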

  • 00:56:57 Okay, i have added two more artists, so we have four artists.

  • 00:57:02 Let's also test 4 CFG values: 10, 11, 12 and 13.

  • 00:57:10 Perhaps let's start from 11, okay. And we could keep the seed as minus one, but then we

  • 00:57:23 couldn't compare the CFG or the style.

  • 00:57:27 Therefore, let's keep the same seed, okay, and you see, there are restore faces, tiling

  • 00:57:34 and high res fix, so you could also pick them to improve your output, but that would take

  • 00:57:40 extra time and you can do them in the extras tab which i will show.

  • 00:57:45 And the batch count is one and batch size is four.

  • 00:57:48 Let's see what kind of results we are going to get.

  • 00:57:50 By the way, these other keywords will also heavily affect the artist style.

  • 00:57:56 Therefore, if you want to only check out the artist style, then you should reduce the number

  • 00:58:03 of extra keywords here, and let's see what we are going to get.

  • 00:58:08 Okay, i did get runtime error.

  • 00:58:11 Why?

  • 00:58:12 Because i have forgotten to put this keyword in here.

  • 00:58:16 The first keyword has to be that.

  • 00:58:18 Now i need to run again.

  • 00:58:20 Okay, now the generation started.

  • 00:58:23 You should always check out the CMD window and what is happening there.

  • 00:58:28 If you get an error, then you should fix it, obviously.

  • 00:58:32 Okay, this is the kind of grid that we are going to get.

  • 00:58:35 Actually, it is pretty useful.

  • 00:58:37 So, you see, on the top is the CFG scale and on the left we got the art style. By the way, it

  • 00:58:45 also produces results with the replacekw placeholder itself, which is not much representing the style or me.

  • 00:58:56 Therefore, perhaps we can remove many of the keywords that take away from the style, like this:

  • 00:59:07 let me do that.

  • 00:59:10 Okay, this time we have more kind of styling, as you can see here: this is the default,

  • 00:59:17 this is wlop, this is artgerm.

  • 00:59:21 This is Robert S Duncanson and this is Karol Bak.

  • 00:59:25 Especially Karol Bak style is pretty different and significant, as you can see.

  • 00:59:31 So the key point here, with Stable Diffusion, is that you have to generate a lot of images,

  • 00:59:38 and some of them will be very, very good, and maybe the majority of them will not be good or

  • 00:59:44 useful.

  • 00:59:45 This is the nature of AI based art generation, especially if you are trying to generate art

  • 00:59:55 based on your subject, a new subject. Also, when we were doing the training here,

  • 01:00:04 you can use more classification images.

  • 01:00:08 That can help.

  • 01:00:09 I said that the community is using 300 total, but that is not a hard limit.

  • 01:00:18 You can just use 200 images per training image, and that may help you to improve your

  • 01:00:25 style.

  • 01:00:26 Actually, it is also the number used in the official paper, as i said.

  • 01:00:30 So it is up to you.

  • 01:00:31 You have to do experimenting.

  • 01:00:33 The numbers and the quality you get will also totally depend on your training data set.

  • 01:00:39 If you have a training data set with much variety, as i have explained, then your model

  • 01:00:47 can learn much better.

  • 01:00:49 I will show you another thing here.

  • 01:00:50 There is a prompt matrix script that will generate combinations of the images.

  • 01:00:57 Okay, so when you type your query like this and select prompt matrix, this query will

  • 01:01:05 become face photo of (ohwx man:1.3), like this.

  • 01:01:10 And then the parts will get combined like this.

  • 01:01:15 So this will generate all of the combinations of the written text separated with, let

  • 01:01:24 me tell you once again, the vertical pipe character.

  • 01:01:28 It will generate all of these keyword combinations, like this (see the sketch below).
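
For illustration, a prompt matrix query could look like this (the style keywords are just examples):

```
face photo of (ohwx man:1.3)|disney style|cgi|highly detailed
```

The part before the first pipe is kept in every image, and the script generates one image for each on/off combination of the remaining parts, so 2^3 = 8 images here.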

  • 01:01:32 Okay, i will show another thing.

  • 01:01:34 Let's say you are going to sleep and you want your computer to generate many different styles

  • 01:01:40 of images for you during your sleep.

  • 01:01:44 For that i will show you an easy way to do it.

  • 01:01:49 So our first prompt is face photo of ohwx and, let's say, a weight of 1.4.

  • 01:01:58 Then let's add some certain keywords to get some certain kind of prompt.

  • 01:02:06 Okay, i have typed it like this and generated about 20 inputs like this; it has generated

  • 01:02:14 a lot of results for me.

  • 01:02:15 I am going to copy all of this into a notepad file and paste it, so you see they are actually

  • 01:02:23 copied as one line each.

  • 01:02:26 Then i will generate several more.

  • 01:02:28 Okay, i keep copy-pasting the newly generated inputs there.

  • 01:02:35 Okay, now i have 60 lines of inputs like this.

  • 01:02:40 I am going to save it.

  • 01:02:45 Let's go to the pictures folder and save it as nightly prompts.

  • 01:02:49 Okay, then go back to text2img tab and in here select prompts from file or text box.

  • 01:03:00 You can paste all of them here or you can upload them from here.

  • 01:03:05 So i will upload them from the text file, and they are all uploaded (the file format is sketched below).
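
A sketch of what such a prompt file could look like, one prompt per line (these example prompts are illustrative, not the exact ones generated in the video):

```
face photo of (ohwx man:1.4), cinematic lighting, portrait, 4k
face photo of (ohwx man:1.4), oil painting, renaissance style, detailed
face photo of (ohwx man:1.4), cyberpunk, neon lights, highly detailed
```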

  • 01:03:13 I am going to say use random seed for all lines, because i want to get as many

  • 01:03:19 different results as possible.

  • 01:03:22 And then i set how many images i want to generate for each line.

  • 01:03:29 I want to generate, let's say, eight images in parallel, and they will use the

  • 01:03:36 CFG value i am going to set here: 14.

  • 01:03:39 So with 60 lines and a batch size of 8 images, we are going to get 480 images.

  • 01:03:48 Let's say you want to generate 4000 images, or however many you want.

  • 01:03:53 So if i set this batch count to 20, we are going to get exactly 20 multiplied by 8, multiplied by

  • 01:04:02 the number of lines we have.

  • 01:04:05 That is 9600 images during the night, with a lot of different inputs and variation, and among them

  • 01:04:13 you can pick whatever you want and use it as you want.

  • 01:04:17 This is one of the options that you can use.

  • 01:04:20 Okay, after i click it, it started generating images.

  • 01:04:25 For example, it generated this one, and if you wonder what this image is, you go to the png

  • 01:04:31 info tab, then just take the image, drag and drop it in here, and it will show

  • 01:04:39 you all of the parameters it has.

  • 01:04:42 So this is the prompt input and this is the negative prompt input it has and the number

  • 01:04:47 of steps used.

  • 01:04:48 The sampler used, the CFG scale used, the seed: with this seed you can reproduce this

  • 01:04:55 generated image.

  • 01:04:57 You can use this seed and change the CFG value to generate other variations of it; it also shows

  • 01:05:02 the size and the model hash.

  • 01:05:04 The model hash, of course, will change since, we are using our custom trained model.

  • 01:05:10 The batch size and the batch position, so this is also important.

  • 01:05:15 To get exactly this image again, you need to generate with batch size 8, and the sixth position

  • 01:05:23 will be this one,

  • 01:05:24 if you use this seed and this CFG value and this sampler.

  • 01:05:28 We are getting some decent photos, and i will leave it to run during my sleep, and tomorrow

  • 01:05:34 (of course, in a moment for you)

  • 01:05:38 we are going to see what kind of good images we got.

  • 01:05:43 Okay, here you see, some of the images i have generated during my sleep.

  • 01:05:49 They are pretty good quality, but they are very similar.

  • 01:05:52 Why?

  • 01:05:53 Because it appears that the inputs i have used to generate them were not much different.

  • 01:05:59 However, some of them are really high quality.

  • 01:06:02 For example, this image: you see, it has almost perfect eyes, perfect shape.

  • 01:06:07 It's a really good quality image.

  • 01:06:10 So your training data set and the keywords, the prompts you use, will one hundred percent

  • 01:06:16 affect the outcome that you are going to get, and you really need to stylize your prompt

  • 01:06:23 according to what you want to get.

  • 01:06:25 Now let me show you a few of the prompts used for generating these images.

  • 01:06:30 To do that, i am going to png info, okay, and then i will drag and drop.

  • 01:06:36 For example, let's first see a 3d like image.

  • 01:06:41 Okay, and you see this used blender, zbrush, autodesk maya, unreal engine, colored, because

  • 01:06:51 if you want to generate a 3d like image then you need to use these kinds of keywords.

  • 01:06:57 Then you can send these to other tabs.

  • 01:06:59 For example, let's go to the extras tab.

  • 01:07:02 In extras tab i can upscale this image to get it a bigger size.

  • 01:07:08 After my testing i have found that R-ESRGAN 4x+ works best.

  • 01:07:14 There is also anime version.

  • 01:07:18 Also, LDSR works very well, but it requires a lot of gpu memory.

  • 01:07:25 So when i click generate, when the first time you generate it, it is going to download the

  • 01:07:31 model that is necessary for R-ESRGAN 4x+.

  • 01:07:35 You can see here and now we will see the upscaled image.

  • 01:07:41 So this is the upscaled image.

  • 01:07:43 The upscaled image and the original will not be exactly the same, but let's compare them, okay.

  • 01:07:48 Let's view them not zoomed in, okay.

  • 01:07:54 So you see, both of these are really similar.

  • 01:07:59 There is a little bit of quality loss.

  • 01:08:02 Let's also try with the anime version.

  • 01:08:10 Okay, now we got anime version.

  • 01:08:14 So let's say, you want to make your images like anime, then you can use that.

  • 01:08:19 This is extremely useful.

  • 01:08:23 You can also upscale entire folder.

  • 01:08:26 For example, i will just ctrl a select all, then i will drag and drop them here.

  • 01:08:32 All of them are now here.

  • 01:08:33 Now i can upscale all of them at once.

  • 01:08:37 Let me show.

  • 01:08:38 During the operation you will see they are getting tiled like this to generate bigger

  • 01:08:45 size images.

  • 01:08:47 The results of upscaling in the extras tab will actually be inside another folder.

  • 01:08:52 When i click it, you will see they appear here, and all of these images are now upscaled.

  • 01:08:58 For example, let's open this: this is a pixar style image, actually.

  • 01:09:04 Okay, this is another pixar style image, so, for example, this is also another pixar style

  • 01:09:13 image, as you can see.

  • 01:09:17 I have trained these on Google Colab, and now i will show you how you can upload your

  • 01:09:23 model to Google Colab and generate images there, probably faster than on your gpu,

  • 01:09:31 because the Google Colab gpu is really strong, able to process a lot of images at once in

  • 01:09:40 a parallel way.

  • 01:09:41 Okay, you see, all of these are getting upscaled. Let's see some of them, like this,

  • 01:09:52 as you can see, okay.

  • 01:10:02 Now i will show another cool thing.

  • 01:10:06 Usually you may not get very good looking eyes or some errors in the face, and there

  • 01:10:13 is a very good way to improve the eyes or the overall structure of the face.

  • 01:10:20 It uses another AI model; let's try improving this image.

  • 01:10:26 Usually my images already had really good eyes.

  • 01:10:31 Okay, to test it.

  • 01:10:33 I am just going to not upscale, but i am going to use GFPGAN.

  • 01:10:38 So this GFPGAN is a model to improve the eyes.

  • 01:10:42 Let's test it.

  • 01:10:43 When the first time you use it, it will download the necessary model.

  • 01:10:48 Okay, now let's compare the result.

  • 01:10:50 This is the original image and this is the fixed image.

  • 01:10:53 Now let's also apply an upscale, okay.

  • 01:10:59 Okay, after applying the upscale and applying GFPGAN, you see it is now looking much better

  • 01:11:07 in terms of quality and correctness.

  • 01:11:09 This will seriously improve the eyes.

  • 01:11:13 Let's open them like this.

  • 01:11:14 Okay, let's zoom in.

  • 01:11:16 So you see the difference is huge: much better quality, styling.

  • 01:11:22 You can apply this to your generated images as a batch as well.

  • 01:11:26 Just go to batch process and select the options from here and it will do everything.

  • 01:11:31 You can also try these other options.

  • 01:11:33 I didn't find them very useful actually, and there is also no description for them.

  • 01:11:39 Okay, now i will show you how you can continue training from any checkpoint that you

  • 01:11:44 saved.

  • 01:11:45 Just go to the checkpoint selection and you will see your saved checkpoints here. By the way,

  • 01:11:49 to get them saved, you need to check the option to generate a ckpt file when saving

  • 01:11:57 during training, and then, if you generate a new model from that checkpoint, you will

  • 01:12:03 basically continue training from that certain checkpoint.

  • 01:12:08 Now i will show you how you can use these ckpt files directly in a Google Colab.

  • 01:12:16 If you have watched my previous video about transforming yourself into a stunning ai avatar,

  • 01:12:22 that tutorial shows how to do the training on Google Colab, and everything is explained there to

  • 01:12:30 use your ckpt file in a Google Colab.

  • 01:12:34 It is so, so easy.

  • 01:12:35 First we are going to generate a new model from our wanted checkpoint.

  • 01:12:41 Let's say i want to use step 1308 as the checkpoint.

  • 01:12:47 Then i am giving it a name for the Colab model.

  • 01:12:52 Okay, and nothing else.

  • 01:12:54 Just click create model.

  • 01:12:56 Okay, it has generated a new model for Colab, and from inside the working

  • 01:13:02 directory you just need to upload this into Google Drive and then just give its

  • 01:13:10 path.

  • 01:13:11 So i will name it, say, image.

  • 01:13:15 Okay.

  • 01:13:17 Let's also add our keyword ohwx to the name, and let's move the files inside here, and then go

  • 01:13:26 to your drive folder like this, where you are running your DreamBooth or the Stable

  • 01:13:34 Diffusion, then drag and drop this directory here.

  • 01:13:41 It will upload all of the files, as you can see in here.

  • 01:13:45 Once the upload is completed, all we need to do is change the model path in the inference

  • 01:13:52 tab of the Google Colab notebook.

  • 01:13:55 This is linked in the description of the tutorial.

  • 01:13:59 So you need to change it like this: content, drive, my drive, and in there image

  • 01:14:06 ohwx, which is the folder name that i have given, since i am uploading it to the main folder

  • 01:14:13 of my Google Drive (see the path below).
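
Assuming the image ohwx folder was uploaded to the root of Google Drive and Drive is mounted in the notebook, the resulting model path would look something like:

```
/content/drive/MyDrive/image ohwx
```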

  • 01:14:15 Then, in the Google Colab, you will be able to use your trained ckpt file right away.

  • 01:14:21 So what if

  • 01:14:23 you want to teach another face?

  • 01:14:27 Just generate a new model like this and this time, in the concepts tab, set the dataset directory

  • 01:14:34 and the classification directory for your new subject.

  • 01:14:38 However, be careful with something.

  • 01:14:41 Currently, my model is trained with ohwx man as an instance prompt and photo of man as

  • 01:14:48 class prompt.

  • 01:14:50 So if i am going to teach another person, a male, then i have to pick another keyword,

  • 01:14:56 for example ske or another rare keyword, and it will teach this man into the model

  • 01:15:06 as well.

  • 01:15:07 So we will be able to use both of them.

  • 01:15:09 However, you will probably get mixed results, because the man keyword was already trained on

  • 01:15:17 my own images, and when i introduce another man's images they will get mixed.

  • 01:15:24 So it could be a problem, but you can try it.

  • 01:15:27 Test it, and if you generate a sufficient number of images, then i think you

  • 01:15:31 can still obtain good results.

  • 01:15:34 However, if you inject another class, like a woman, then it shouldn't be much of a problem,

  • 01:15:42 and you should be able to teach multiple different subjects easily.

  • 01:15:47 Now i will explain more advanced stuff.

  • 01:15:50 For example, the directories: the data set directory.

  • 01:15:54 Okay, to be able to use [filewords], you need to have a training data set named like this,

  • 01:16:02 okay.

  • 01:16:03 So for each image, you are also going to have a text file with the same name.

  • 01:16:08 The extension will be txt, like this, and you need to write the description of that

  • 01:16:13 image properly (see the layout sketch below).
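
A sketch of the expected layout (file names and captions are illustrative; note that each caption should contain the class word, man here, but not the rare instance token):

```
training_images/
├── img_0001.png
├── img_0001.txt   <- a man with dark hair and glasses is smiling
├── img_0002.png
└── img_0002.txt   <- a man standing in front of a brick wall
```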

  • 01:16:16 There is a new AI model for captioning images.

  • 01:16:20 This is not implemented in Automatic1111 yet, but it will be.

  • 01:16:25 I will put the link of this into the description.

  • 01:16:28 You can also run this locally.

  • 01:16:31 And if you don't know how to run this locally, then you need to watch this video

  • 01:16:38 on our channel.

  • 01:16:39 In this video, i am explaining how to locally run HuggingFace files.

  • 01:16:45 Okay, and i will just use the online demo right now because it is not very busy.

  • 01:16:51 So, first image, i will just drag and drop here.

  • 01:16:56 Sorry about that.

  • 01:16:58 Okay, like this: and click submit.

  • 01:17:01 It will generate the description for this image.

  • 01:17:03 You see, you should use the caption generated by GIT large.

  • 01:17:08 This is the best one.

  • 01:17:09 A man with dark hair and glasses is smiling.

  • 01:17:13 Okay, so let's just change this text

  • 01:17:18 description, like this.

  • 01:17:20 However, there is one key issue: you have to have your class for this image inside this

  • 01:17:27 description.

  • 01:17:28 So my class is man and therefore it is there.

  • 01:17:31 Okay, let's go.

  • 01:17:32 Then.

  • 01:17:33 This is another image that we want to caption, so let's submit it.

  • 01:17:40 Okay, and then another image description is here.

  • 01:17:44 Let's open the description: a cat with long whiskers looking at the camera.

  • 01:17:50 And this is the class of cat, and it is inside here as well.

  • 01:17:54 Yes, correct, and we will do the rest for the dog as well.

  • 01:17:58 Now for classification images.

  • 01:18:01 You need to do the same.

  • 01:18:03 When you generate classification images, you also need to have each classification image and its

  • 01:18:08 description.

  • 01:18:10 Let's say this is my classification image and it is generated with photo of man.

  • 01:18:16 Therefore, i need to create a text file with the same name, like this, and inside it i need

  • 01:18:23 to type photo of man.

  • 01:18:26 When this tab gets fixed, let me show you, maybe it is already fixed, i am not sure.

  • 01:18:33 In here, you see, we have generate class images, and when you use that feature, it should be

  • 01:18:40 able to do this for us: let's try it, actually, okay.

  • 01:18:44 And let's, yeah, it doesn't matter, okay.

  • 01:18:48 And when we type the class prompt here, photo of man, i think it will generate with it.

  • 01:18:56 Let's try it. Okay, it is not working.

  • 01:19:01 It gives a message; okay, it's still not working.

  • 01:19:06 When this becomes working, then you can easily generate them.

  • 01:19:10 Until then, you need to write the descriptions like this: photo of man, and it will generate images

  • 01:19:17 like that, or photo of cat or photo of dog.

  • 01:19:20 So this will be your classification directory with descriptions like this, and this will be

  • 01:19:27 your classification directory with naming like this.

  • 01:19:29 In this way, you can teach multiple subjects in one run, and you can also possibly improve

  • 01:19:37 your training quality if you provide a better description defining more things.

  • 01:19:45 By the way, in the description you should specify the subject that you want

  • 01:19:53 to teach.

  • 01:19:54 If you want to teach a face, then you should mostly describe the face.

  • 01:19:58 Okay, and one other thing: once you have prepared your folders,

  • 01:20:05 here is the way to do it.

  • 01:20:08 First of all, we are defining the data set directory as usual.

  • 01:20:14 Okay, let's set it.

  • 01:20:16 And let's also set the classification directory like this.

  • 01:20:21 And with [filewords], we need to define the instance token.

  • 01:20:29 Okay, this will be used to define it.

  • 01:20:35 It has to be a single word.

  • 01:20:37 Therefore, i am entering ohwx. Next is the class token.

  • 01:20:41 This will also be a single word.

  • 01:20:45 By the way, it won't be very precise actually if you use it this way, with a class token.

  • 01:20:55 But yeah, it looks like if you teach multiple different classes, then you may not get very

  • 01:21:02 good performance, for example teaching a face, a cat, a dog and a man, because

  • 01:21:08 they are conflicting with the current setup.

  • 01:21:12 So using separate concepts is better, but let me also explain this way to you.

  • 01:21:17 So this will be man.

  • 01:21:18 And in the instance prompt you are just going to type [filewords], and in the class prompt

  • 01:21:23 you are just going to type [filewords] too. For the sample prompt, leave it blank to use the instance prompt, or optionally

  • 01:21:30 use [filewords] to base sample captions on instance images.

  • 01:21:33 You can just use [filewords] there as well to see what it is generating (see the sketch below).
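
Putting this together, a sketch of how the concept fields could be filled in for this [filewords] setup (field labels follow the DreamBooth extension UI and may differ slightly between versions):

```
Instance Token:  ohwx
Class Token:     man
Instance Prompt: [filewords]
Class Prompt:    [filewords]
Sample Prompt:   [filewords]
```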

  • 01:21:40 This is called mixed in the basics page of the DreamBooth extension wiki.

  • 01:21:47 So you see there is DreamBooth regular training that i have shown in this tutorial.

  • 01:21:53 Then there is fine tuning.

  • 01:21:55 Fine tuning is the standard approach for big data sets.

  • 01:21:58 Only the captions of the images, the [filewords], are used;

  • 01:22:00 class images are not used.

  • 01:22:02 This results in a model that doesn't need an instance token and reacts to any prompt.

  • 01:22:07 So in this case you are training the model overall.

  • 01:22:10 What does that mean?

  • 01:22:11 That means that, let's say, in your [filewords] you have cars, you have cats, you have dogs,

  • 01:22:18 you have men.

  • 01:22:19 You are training all of these words.

  • 01:22:22 And this is how the custom models you see are usually trained.

  • 01:22:28 Let me show an example.

  • 01:22:29 So, for example, protogen x3.4 is a custom model and it is working pretty good.

  • 01:22:37 How did they train it?

  • 01:22:38 They probably trained it with fine tuning.

  • 01:22:41 So in fine tuning they have precisely prepared the descriptions of each training image.

  • 01:22:48 They didn't use any classification images, and they have overall changed the underlying

  • 01:22:53 context, data, the knowledge of the model.

  • 01:22:56 So when you now use man, it produces quality man images based on their new fine-tuned

  • 01:23:04 data set, or car or castle or whatever you are improving your model on.

  • 01:23:10 And there is hybrid.

  • 01:23:12 Okay, actually i said mixed, but it is called hybrid.

  • 01:23:15 Hybrid, for lack of a better term, is achieved using the instance token in combination with [filewords]

  • 01:23:20 as the instance prompt.

  • 01:23:21 The trained dataset will be linked to that instance token.

  • 01:23:24 This minimizes the bleed but requires the token in every prompt, as you can see here.

  • 01:23:29 So you have to use, for example, ohwx french bulldog, or ohwx whatever you have taught.

  • 01:23:37 Also you see the class token is person.

  • 01:23:39 So with a hybrid model with [filewords], if you don't do fine tuning but only teach

  • 01:23:45 subjects, the subjects should, i think, be of the same class.

  • 01:23:49 They can't be from different classes.

  • 01:23:51 So you can teach multiple persons in a single run, maybe 10 persons, just by providing

  • 01:23:59 correct [filewords] descriptions for them.

  • 01:24:03 So for this person you need to add, let's say, personA as the instance token for a man.

  • 01:24:09 Okay, this will define person A.

  • 01:24:11 For person B you need to add personB, and for person C you need personC.

  • 01:24:16 But you are not going to add this instance token into the description.

  • 01:24:24 Okay, you don't need to type the instance token into the [filewords], into the descriptions

  • 01:24:30 of the training images or into the descriptions of the classification images.

  • 01:24:37 Okay, this is important.

  • 01:24:39 Okay, now i will show you how to understand the out of memory error.

  • 01:24:46 So it is easy.

  • 01:24:47 I'm just going to load the settings for our existing data set.

  • 01:24:50 You see, i have an error.

  • 01:24:52 So it looks like i had an error in cmd.

  • 01:24:54 I just need to restart.

  • 01:24:56 Okay, i did restart, and in the settings i set use EMA.

  • 01:25:03 This actually improves our result quality, but it costs more vram.

  • 01:25:08 And then i just click train and let's see how we are going to get out of memory error.

  • 01:25:14 Okay, we got our error.

  • 01:25:18 Let me show you how to understand out of memory error.

  • 01:25:22 You will see RuntimeError: CUDA out of memory.

  • 01:25:24 If you are seeing this error, all the other messages are not important.

  • 01:25:28 This means that with the current settings you are trying to train with, your graphics

  • 01:25:34 card is not enough and you need to reduce the vram usage.

  • 01:25:38 Now let me show you all of the settings for reducing the vram usage.

  • 01:25:43 Okay, so for minimal vram usage you need to pick LoRA. With LoRA

  • 01:25:48 there is just a little bit of difference.

  • 01:25:52 It only differs when you try to do inference and generate new images from the generated LoRA

  • 01:26:00 file.

  • 01:26:01 And when you watch this video you will learn that. Okay, LoRA will significantly reduce

  • 01:26:06 vram usage.

  • 01:26:08 Other than that, always make sure that your batch size and gradient accumulation steps

  • 01:26:12 are one, and in the advanced tab you need to pick use 8 bit adam, select

  • 01:26:20 bf16 and select xformers.

  • 01:26:23 So to be able to use xformers, you need to add --xformers and

  • 01:26:31 --no-half to your starting arguments.

  • 01:26:33 These will allow you to use that (see the example below).
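
On Windows, these launch arguments typically go into webui-user.bat, along these lines:

```
rem webui-user.bat
set COMMANDLINE_ARGS=--xformers --no-half
```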

  • 01:26:35 Cache latents.

  • 01:26:36 Actually, this one is

  • 01:26:37 still not clear.

  • 01:26:39 You should try it both checked and unchecked,

  • 01:26:42 because some say that it increases vram usage and some say that it decreases it. Also, Step Ratio

  • 01:26:49 of Text Encoder Training:

  • 01:26:50 this should be zero, because text encoder training increases quality but also increases the

  • 01:26:54 vram usage.

  • 01:26:56 And other than these, there is not much else that you can do.

  • 01:27:02 These are the lowest possible settings.

  • 01:27:04 Also, you need to uncheck this checkbox and you need to check this checkbox.

  • 01:27:12 So when you check the first checkbox it will increase your vram usage, but when you check the second

  • 01:27:18 it will reduce your vram usage.

  • 01:27:21 Actually, these settings are written in the troubleshooting part of the DreamBooth

  • 01:27:26 extension wiki, in the OOM section, and there is also overtraining and other things.

  • 01:27:32 Actually, the overtraining part is still a work in progress, and i have already shown you how to

  • 01:27:37 understand overtraining.

  • 01:27:39 And another cool thing that i am going to show you is preprocessing your images.

  • 01:27:45 So with preprocessing images you can easily generate descriptions for your both training

  • 01:27:52 images and your classification images.

  • 01:27:55 Of course they won't be very accurate, so let me show you.

  • 01:28:00 I am picking my best db 512 as the source directory, and the destination directory will be the same.

  • 01:28:09 So in here you can even define their target resolution and change them, but i prefer manually

  • 01:28:16 changing and captioning them.

  • 01:28:19 So for captioning, i am just going to select ignore, so it will generate new captions and

  • 01:28:26 i am going to use deepbooru for captioning.

  • 01:28:29 You can also generate flipped copies, split oversized images, and use auto focal point crop.

  • 01:28:35 So let's say you have tens of thousands of images, then these options will be extremely

  • 01:28:41 useful for you.

  • 01:28:42 However, if you are only going to train your face, then you should manually prepare your

  • 01:28:47 training data set to be the best. Okay, now i am going to generate captions for them.

  • 01:28:53 I am just going to click preprocess.

  • 01:28:54 It shouldn't change the width and height because they are already 512 pixels and it is downloading

  • 01:29:03 the deepbooru for captioning.

  • 01:29:04 This is another model, just as i have shown you in here.

  • 01:29:09 The deepbooru captions are not as good as the captions generated by GIT large, but they are still useful, and in

  • 01:29:15 a moment we are going to see.

  • 01:29:17 Okay, it has thrown an error.

  • 01:29:19 It says the same directory is specified as the source and destination directory.

  • 01:29:22 Obviously, this is not allowed.

  • 01:29:25 Actually, it's a good thing that they don't allow that.

  • 01:29:28 So i'm just going to change it to processed, so that you don't overwrite your original images,

  • 01:29:36 and just let's click preprocess.

  • 01:29:39 Okay, the models are only downloaded one time, and all images are preprocessed.

  • 01:29:45 So let's check out the preprocessed images.

  • 01:29:48 Okay, you see the same images, now with descriptions.

  • 01:29:52 Let's look at the description.

  • 01:29:53 So the description is: 1boy, black hair, facial hair, gray pants, jacket, long sleeves,

  • 01:29:58 male focus, pants, realistic, solo, track jacket, track pants.

  • 01:30:05 So it's a pretty good description.

  • 01:30:07 You can also manually modify them.

  • 01:30:10 Let's also preprocess our classification images, so that it will generate all of the descriptions

  • 01:30:17 of the classification images.

  • 01:30:18 By the way, this is useful, as i said, when you use [filewords].

  • 01:30:22 If you are not using [filewords], then these won't get used.

  • 01:30:26 This is also useful, very useful, if you use a hyper network or embeddings, and i will

  • 01:30:32 also hopefully make a video about embeddings.

  • 01:30:35 Hyper networks are not very good, but embeddings are really really good.

  • 01:30:39 Okay, let's preprocess our classification folder.

  • 01:30:45 So the preprocess is in train tab.

  • 01:30:47 This is a feature of Automatic1111.

  • 01:30:50 Okay, and preprocess it.

  • 01:30:53 It is also pretty fast.

  • 01:30:57 So this will be extremely useful for captioning.

  • 01:31:00 And also, if your images are not properly cropped and you have tens of thousands of

  • 01:31:07 images, as i said, that would take a huge amount of time to do by hand.

  • 01:31:10 You can just use this instead.

  • 01:31:12 As a beginner you can also use this to make your job easier and see the results, how it

  • 01:31:17 is performing.

  • 01:31:18 Let's say you picked hundreds of images of yourself and you don't want to spend time on them.

  • 01:31:23 Then you can preprocess the images like this, try the training on them

  • 01:31:31 and see the results.

  • 01:31:32 If you can get good results, then why spend more time on them?

  • 01:31:37 But if you want to get perfect results, then you need to manually crop your images and

  • 01:31:43 write your descriptions.

  • 01:31:47 So let's see the preprocessed images now.

  • 01:31:49 Every image has a description.

  • 01:31:51 Let's look at them.

  • 01:31:52 Okay, for example, it defined this man as a girl, which is very incorrect, along with

  • 01:31:59 tags like 3d, asian, black shirt.

  • 01:32:01 Okay, this is a completely incorrect description, as you can see.

  • 01:32:06 It's completely failed.

  • 01:32:07 And now let's compare this with GIT large, which i have shown.

  • 01:32:12 Okay, i wonder what kind of result we are going to get with GIT large, so i'm just going

  • 01:32:18 to drag and drop.

  • 01:32:21 By the way, as i said, i have suggested adding this model to Automatic1111 to get better

  • 01:32:27 results, and GIT large generated: a portrait of a man with a beard.

  • 01:32:32 Yes, absolutely, fantastically correct when compared to this trashy description, as you

  • 01:32:41 can see.

  • 01:32:42 Okay, as a final thing, i suggest you look at the ELI5 training page.

  • 01:32:48 So this is getting updated by experienced persons and, for example, for [filewords],

  • 01:32:55 they give an example: the instance token alexa is bad because the underlying

  • 01:33:01 data for alexa is extensive and it would be hard to override it.

  • 01:33:06 Another token is also bad because the tokenizer splits it into multiple pieces, while ohwx is great.

  • 01:33:12 Class token is also important.

  • 01:33:15 I already experienced them, but you can also check these pages.

  • 01:33:19 I will put the links of these pages into the description.

  • 01:33:24 Now i will show you another very cool thing.

  • 01:33:26 You see, this Protogen x3.4 is a custom model that has been generated by using multiple

  • 01:33:33 models and a lot of training, and you see, if you train your face or subject into this model directly,

  • 01:33:40 it won't produce good results,

  • 01:33:43 because the underlying data have been significantly changed.

  • 01:33:48 So how can we inject our face into this model?

  • 01:33:53 There is a way to do that and now i am going to show you.

  • 01:33:56 We go to the checkpoint merger and in the primary model we are selecting our target

  • 01:34:05 model, which is Protogen x 3.4.

  • 01:34:09 The secondary model will be the model that we trained, which will be this

  • 01:34:16 one: ohwx 1308.

  • 01:34:19 And there is tertiary model.

  • 01:34:21 So the tertiary model will be version 1.5.

  • 01:34:24 This is the base model of our trained model, and what we are going to do

  • 01:34:29 is extract our learned subject from the base model and apply it into

  • 01:34:37 our new target model.

  • 01:34:39 Let's give it a name, ohwx protogen 3.4, okay, and set the multiplier to 0.75.

  • 01:34:49 This is 75%.

  • 01:34:50 You may ask: how did you come up with this value?

  • 01:34:53 I asked the community and, according to the experience of the community, 75% is a good

  • 01:35:00 point.

  • 01:35:01 You can, of course, try multiple different points.

  • 01:35:03 You can try your different checkpoints to see how you perform.

  • 01:35:08 Also, click the add difference option.

  • 01:35:10 So this will extract our face information from our base model and it will inject our

  • 01:35:16 face information into our new target model without breaking the underlying context and

  • 01:35:25 information (see the sketch below).
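
Conceptually, the add difference mode computes result = A + (B - C) * M for every weight, where A is the primary model (Protogen), B is the secondary (our trained model), C is the tertiary (the SD v1.5 base) and M is the multiplier. A minimal Python sketch of that idea, assuming plain state dicts of tensors with matching keys:

```python
# Add difference merge: (b - c) isolates what DreamBooth training added
# on top of the base model, and that difference is then applied to the
# new target model a at strength m.
def add_difference(a, b, c, m=0.75):
    return {key: a[key] + (b[key] - c[key]) * m for key in a}
```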

  • 01:35:26 We are going to generate a ckpt with add difference, so just click run.

  • 01:35:31 In the cmd window you will see messages like this, and checkpoint saved; then refresh

  • 01:35:36 here and just select our new model, which is ohwx protogen.

  • 01:35:42 Now we can produce images by using the protogen model and our face, same as usual.

  • 01:35:51 Okay, everyone, i have done a few tests and the results are just amazing.

  • 01:35:58 So you see, these are some of the images that i have selected from the results.

  • 01:36:03 And let me show you something.

  • 01:36:05 So you see, this is generated by protogen and this is my original, real image.

  • 01:36:12 And this is the generated image.

  • 01:36:14 You see the quality.

  • 01:36:15 It is just amazing.

  • 01:36:17 And what kind of test did i do?

  • 01:36:20 For testing, i have used the x/y plot: i entered different CFG values as the x values and i

  • 01:36:29 entered prompt sr for the weights.

  • 01:36:32 So how did i do it?

  • 01:36:33 You see the ohwx man, and then we are entering a weight here, right, to give an importance

  • 01:36:41 to it.

  • 01:36:42 So i entered a placeholder keyword here, changeweight, and i used that changeweight

  • 01:36:49 in the prompt sr values.

  • 01:36:50 So the Automatic1111 ui changed the weight for me and tested different

  • 01:36:59 weights (see the sketch below).
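
A sketch of how that grid could be configured (changeweight is just a placeholder name, and the first prompt s/r entry must match the text used in the prompt; the values are illustrative):

```
Prompt:   photo of (ohwx man:changeweight), ...
X type:   CFG Scale      X values: 7, 8, 9, 10
Y type:   Prompt S/R     Y values: changeweight, 1.2, 1.3, 1.4, 1.5
```

Prompt s/r does plain string replacement, so replacing changeweight inside (ohwx man:changeweight) effectively sweeps the attention weight.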

  • 01:37:00 Now i can see the properties of this particular generated image to see which

  • 01:37:09 values were used.

  • 01:37:10 Then, based on that, i can generate anything i want.

  • 01:37:14 So the weight used was 1.4 and the cfg scale was 8.

  • 01:37:20 So by using weight 1.4 and cfg scale 8, i can generate higher quality images.

  • 01:37:28 So these two parameters will work with my merged model.

  • 01:37:35 By the way, i also have used something else.

  • 01:37:39 You see, there is a model hash, and the hash written here is also displayed here.

  • 01:37:46 This 95 means that i have generated another checkpoint, but this time i have used 95%

  • 01:37:53 weight.

  • 01:37:54 This worked better for me.

  • 01:37:57 So in the beginning you can start with 75 and if you are not getting good images then

  • 01:38:03 you can increase it, make different model merges and then test them.

  • 01:38:11 So this is the way to test and find out the good working parameters for your model,

  • 01:38:17 and then use those parameters to generate more stylized images as you want.

  • 01:38:22 But the results are just simply amazing.

  • 01:38:25 You can't just get these results so easily on the default Stable Diffusion model.

  • 01:38:30 So you can inject your trained model, your trained face, into any custom model out there and

  • 01:38:37 generate beautiful images as you want.

  • 01:38:42 So let's also upscale this image.

  • 01:38:45 To do that, i am just going to send it to extras and i will upscale it with R-ESRGAN

  • 01:38:51 4x+.

  • 01:38:53 And here the result: it is just beautiful.

  • 01:38:56 Let's also apply GFPGAN to get better face quality.

  • 01:39:02 Okay, now, amazing, as you can see, amazing quality, amazing image.

  • 01:39:07 There is only one artefact here, as you can see. So if i generated more such images,

  • 01:39:13 i could also get rid of this artefact.

  • 01:39:15 I think i have covered pretty much everything.

  • 01:39:19 As i said in the beginning, just join our discord channel

  • 01:39:22 from our about page; also in here you will see the link.

  • 01:39:29 Just click the official discord link. Please also share, like, subscribe, and if you support

  • 01:39:35 us on our patreon, i would greatly appreciate it.

  • 01:39:39 Currently we have three patrons.

  • 01:39:42 I thank them a lot for becoming patrons and supporting our work.

  • 01:39:48 You can also join our channel and support us from here, as you can see.

  • 01:39:53 I would appreciate every bit of your support.

  • 01:39:56 Hopefully see you in another video.

  • 01:39:59 Please leave comments and ask the questions.

  • 01:40:02 Ask about the topics that you want to see as a new tutorial.

  • 01:40:07 Thank you very much.

  • 01:40:09 Hopefully see you later.
