
Ultimate RunPod Tutorial For Stable Diffusion Automatic1111 Data Transfers Extensions CivitAI

FurkanGozukara edited this page Oct 26, 2025 · 1 revision

Ultimate RunPod Tutorial For Stable Diffusion - Automatic1111 - Data Transfers, Extensions, CivitAI



Sign up for RunPod: https://bit.ly/RunPodIO. This is the Grand Master tutorial for running Stable Diffusion via Web UI on RunPod cloud services. If I have been of assistance to you and you would like to show your support for my work, please consider becoming a patron on 🥰 https://www.patreon.com/SECourses

SECourses Discord To Get Full Support ⤵️

https://discord.com/servers/software-engineering-courses-secourses-772774097734074388

RunPod Discord: https://discord.gg/pJ3P2DbUUq

Colab Tutorial 1: https://youtu.be/mnCY8uM7E50

Colab Tutorial 2: https://youtu.be/kIyqAdd_i10

Automatic1111 Command Line: https://bit.ly/StartArguments

Best DreamBooth Tutorial: https://youtu.be/Bdl-jWR3Ukc

DreamBooth second tutorial: https://youtu.be/KwxNcGhHuLY

RunPodCTL GitHub: https://github.com/runpod/runpodctl

Pre-trained models repo link: https://huggingface.co/lllyasviel/ControlNet

Web UI install tutorial on PC: https://youtu.be/AZg6vzWHOTA

How To Use Different Models Automatic1111: https://youtu.be/aAyvsX-EpG4

Textual Inversion Training Tutorial: https://youtu.be/dNOpWt-epdQ

ControlNet Tutorial Video: https://youtu.be/vhqqmkTBMlU

ControlNet extension: http://bit.ly/3IxBYc6

ControlNet Model Files: https://bit.ly/CTRLNETModels

ControlNet Native Script: https://youtu.be/YJebdQ30UZQ

Upgrade xformers Commands: https://bit.ly/UPxformers

Kohya GUI: http://bit.ly/3ICvsB7

Cloud sync: http://bit.ly/40Zf44C

00:00:00 Intro

00:01:32 How to register RunPod.io and charge your credits

00:02:34 How to deploy a pod - start a server for Stable Diffusion 1.5 Automatic1111 Web UI

00:03:30 How to select deployment template for Stable Diffusion Web UI in RunPod

00:04:00 Explanation of temporary disk and persistent volume

00:04:44 Explanation of credit spending per minute for storage usage in RunPod

00:08:10 My Pods section

00:08:30 Connect to the started Pod

00:08:41 Start SD 2.1 Version Web UI Pod

00:09:25 Why pick a lesser used Pod

00:10:53 Bidding system of RunPod.io

00:13:11 Where and how to see scheduled maintenance

00:13:31 Stop Pod vs Terminate (delete) Pod

00:14:24 Where to see logs to debug and understand errors

00:15:08 Connect your Pod via a Jupyter Lab interface

00:15:16 How to change Automatic1111 Web UI command line arguments and restart it

00:17:54 First prompt in RunPod Automatic1111 Web UI

00:18:45 Where to see logs, find error logs, debug them

00:19:35 How to install DreamBooth extension of Automatic1111 Web UI

00:20:58 Where the generated images are saved

00:21:10 How to download generated images

00:21:38 How to update installed extensions

00:21:55 How to notice port error and fix it

00:23:04 How to install runpodctl latest version to transfer files very quickly between Pods and PC

00:23:55 How to download a ckpt file very fast from Hugging Face repo

00:25:10 Start DreamBooth training with best model and settings

00:30:41 How to upload your training dataset images

00:34:15 How to upload thousands of images (big data) from your computer to RunPod via runpodctl

00:34:28 How to install RunPodCTL on your Windows computer

00:35:06 How to send files from your PC to RunPod via runpodctl

00:39:38 Where to find generated checkpoints and sample images during DreamBooth training

00:41:30 How to delete non-empty folder

00:41:51 How xformers breaks training even when not selected, and how to fix it

00:42:29 How to download a folder from RunPod to your PC via runpodctl very quickly

00:43:09 How to add runpodctl to environment path to use from every folder

00:47:25 How to continue/resume DreamBooth training

00:48:20 Test all training checkpoints with x/y plot to find best one

00:52:09 How to set correct command line arguments for SD 2.1

00:52:55 Where to see currently spent credits per hour

00:54:05 How to do DreamBooth training on SD 2.1 - 768 pixel version with best possible settings

00:57:42 How to generate classification images manually very fast

01:00:26 Why SD 1.5 is superior to 2.1

01:04:34 How to download custom models very fast from CivitAI

01:08:45 How to do Textual Inversion training with some optimal settings

01:13:00 Where Textual Inversion training samples and checkpoints are saved

01:14:07 How to use Textual Inversion checkpoints

01:15:55 Move generated SD 2.1 classification images into correct folder

01:19:26 How to install and run ControlNet extension on RunPod IO

01:21:11 How to download your trained model files (ckpt) into your PC very fast via runpodctl

01:25:00 How to upgrade xformers to 0.0.17 for DreamBooth SD 2.1 training

01:26:04 How to expand runtime disk space

01:27:21 Best settings for SD 2.1 with xformers

01:31:30 What is Stable Diffusion fine tuning and how to do fine tuning with DreamBooth

01:39:20 Best settings quick recap for SD 2.1 for 24 GB VRAM

01:40:34 How to install and run Kohya GUI on RunPod

01:44:16 How to enable public Gradio link for Kohya GUI

01:44:52 How to start RunPods without GPU

01:46:53 Cloud syncing your Pod data / content

Thumbnail credits: Freepik, macrovector

Video Transcription

  • 00:00:00 Greetings everyone.

  • 00:00:01 In this video, I am going to show how to use Automatic1111 Web UI for Stable Diffusion

  • 00:00:07 tasks on RunPod.io like you are using it on your computer.

  • 00:00:11 I will cover many topics such as how to upload and download files quickly, how to delete

  • 00:00:17 directories, how to install and run extensions, how to quickly download and use custom models,

  • 00:00:23 how to do DreamBooth training on Stable Diffusion 1.5 or 2.1 versions, how to do fine tuning

  • 00:00:30 via DreamBooth extension, how to do Textual Inversion training.

  • 00:00:34 I will also explain how their pricing system works, how you can use bidding, how you can

  • 00:00:39 transfer files from Pod to Pod or from Computer to Pod and vice versa, how you can install

  • 00:00:45 custom other scripts such as famous Kohya graphical user interface.

  • 00:00:50 I will also demonstrate how you can use new famous ControlNet on RunPod.io.

  • 00:00:56 So why RunPod.io?

  • 00:00:58 Because their system charges you based on per minute and they have great Discord support.

  • 00:01:03 They are also easier to use with the tools they have.

  • 00:01:07 But still, if you are interested in free cloud services for Stable Diffusion, I have two

  • 00:01:12 great tutorials for Google Colab.

  • 00:01:14 The first one is this one and the second one is this one.

  • 00:01:18 And if you don't know how to use Automatic1111 Web UI, if you don't know what is Stable Diffusion,

  • 00:01:23 what is Automatic1111 Web UI, I have great tutorial series for them.

  • 00:01:27 For example, you can begin with watching video and you can check out the other videos in

  • 00:01:31 this playlist.

  • 00:01:32 So let's begin the Grandmaster RunPod.io tutorial by signing up for a new account.

  • 00:01:38 Click the sign up button.

  • 00:01:39 For sign up I will use my Google account.

  • 00:01:42 You can also enter your email and password if you wish.

  • 00:01:45 Choose your account to sign up.

  • 00:01:47 Click "I have read and agree to the RunPod Terms of Service".

  • 00:01:51 Click Continue.

  • 00:01:52 And yes, we are ready to start.

  • 00:01:54 First of all, you need to charge some credits to start using the pods.

  • 00:01:59 Click your balance here, as you can see in the top right menu, and it will show your

  • 00:02:04 available balance.

  • 00:02:06 From here you can pay with a card.

  • 00:02:08 You can change the amount that you want to charge.

  • 00:02:10 To have automatic payments you can add a card.

  • 00:02:13 They also allow you to pay with crypto.

  • 00:02:16 Just click this icon.

  • 00:02:17 They also show recent transactions, recent charges, and everything is very transparent.

  • 00:02:23 OK, now I have logged in my account where I have my credits.

  • 00:02:28 Now we can start using our Pods.

  • 00:02:30 To do that, go to the browse servers tab in here and in here you will see the available

  • 00:02:37 servers.

  • 00:02:38 If you are going to do training, then I suggest you get a server with at least 24 gigabytes

  • 00:02:45 of VRAM.

  • 00:02:46 Because currently the latest officially released xformers is not working very well for training.

  • 00:02:53 They have a nightly version that works well, but for training we won't use xformers.

  • 00:03:00 And if you are not going to use xformers, then you should get a server with at least

  • 00:03:06 24 gigabytes of VRAM.

  • 00:03:07 I find that the RTX A5000 is a very decent GPU at a lower price.

  • 00:03:14 As you can see, it is only 0.32 dollars per hour.

  • 00:03:19 So I am going to deploy RTX A5000 GPU.

  • 00:03:23 When you click the deploy icon, this interface will appear to you.

  • 00:03:28 So in this interface, you should select your template.

  • 00:03:31 There are many templates.

  • 00:03:33 When you type Stable Diffusion, you see there are two very popular templates for Stable

  • 00:03:39 Diffusion.

  • 00:03:40 RunPod Stable Diffusion 1.5 and RunPod Stable Diffusion 2.1.

  • 00:03:43 I will start both of them and I will do training on both simultaneously.

  • 00:03:49 So let's begin with RunPod Stable Diffusion 1.5 as a template.

  • 00:03:54 So it will also download the official 1.5 version when it starts.

  • 00:03:59 In here it shows us the other features.

  • 00:04:01 They are very decent.

  • 00:04:03 The temporary disk is the disk where the operating system will run.

  • 00:04:08 You don't need to increase this.

  • 00:04:10 And the persistent volume.

  • 00:04:11 Now this is really important.

  • 00:04:13 The persistent volume will remain as long as you don't delete your Pod.

  • 00:04:20 So when you close your Pod, it will remain as it is.

  • 00:04:23 It is like your hard drive.

  • 00:04:24 It is persistent.

  • 00:04:26 Everything you have generated or downloaded will remain there.

  • 00:04:30 So set this to a sufficient amount of disk space based on your needs.

  • 00:04:36 I am going to set it as 100, and when you set it, it will increase your per-minute credit spending.

  • 00:04:44 So when you hover your mouse over this icon, it shows that 0.10 per gigabyte per month

  • 00:04:52 for total disk on running Pods, 0.20 per gigabyte per month for volumes on exited Pods.

  • 00:05:00 I know that this may be sounding confusing in the beginning, so I have prepared an example

  • 00:05:06 for you which I will explain step by step.

  • 00:05:09 So we have 105 gigabytes while running.

  • 00:05:13 Why?

  • 00:05:14 Persistent volume is 100 gigabytes and temporary disk is 5 gigabytes.

  • 00:05:17 So while running, we are going to spend like this.

  • 00:05:21 Let's say our Pod did run 75 minutes.

  • 00:05:25 So 105 multiplied by 0.1, which is the price per gigabyte per month.

  • 00:05:33 How many days are there in a month?

  • 00:05:36 30 days.

  • 00:05:37 So we divide it by 30 days.

  • 00:05:40 How many hours are there in a day?

  • 00:05:42 24 hours.

  • 00:05:43 So we divide it by 24 hours.

  • 00:05:46 How many minutes are there in an hour?

  • 00:05:48 There are 60 minutes.

  • 00:05:49 So this is the price per minute of running, and since we are running for 75 minutes, it is

  • 00:05:56 going to take a total of 0.018 dollars from our credit.

  • 00:06:02 You can also copy this.

  • 00:06:04 Open your calculator with typing calculator in your search bar, paste it and hit enter

  • 00:06:09 and you will get the result like this as you can see.

  • 00:06:12 So in the below, I am giving example of Pod when it is not running.

  • 00:06:18 When the Pod is not running, we are going to use 100 gigabytes persistent volume and

  • 00:06:25 let's say our Pod did remain not running for two days.

  • 00:06:30 So when the Pod is not running, the price is 0.20 dollars per gigabyte per month.

  • 00:06:37 So since we have 100 gigabytes, it is 100 multiplied by 0.2; then let's delete this to avoid

  • 00:06:45 more confusion. In a month

  • 00:06:48 we have 30 days and we are going to use two days.

  • 00:06:52 So this will be our spending.

  • 00:06:54 So you can also open the calculator and copy-paste it, hit enter and you will get the price.

  • 00:07:01 So this part is the price of one day offline for your 100-gigabyte Pod.

  • 00:07:09 And since it will be offline for two days, this is the credit that we are going to use.

  • 00:07:13 The very important thing is that these credits will be deducted from your account per minute.

  • 00:07:20 So if you keep using RunPod.io service for 10 minutes, you will be charged for 10 minutes.

  • 00:07:26 So if it remains offline for 10 minutes, then you will be charged for 10 minutes.

  • 00:07:31 It is not like taking your credits per day, per hour, or per month.

  • 00:07:37 It uses your credits every minute.
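The two worked examples above can be reproduced with a few lines of Python. The $0.10 and $0.20 per-gigabyte-per-month rates are the ones quoted in the video; check RunPod's current pricing page before relying on them:

```python
# Storage cost while the Pod is running: $0.10 per GB per month,
# billed per minute, on the full 105 GB (100 GB volume + 5 GB temp disk).
running_gb = 105
running_rate_per_gb_month = 0.10
per_minute = running_gb * running_rate_per_gb_month / 30 / 24 / 60
running_cost = per_minute * 75          # Pod ran for 75 minutes
print(round(running_cost, 3))           # ~0.018 dollars

# Storage cost while the Pod is stopped: $0.20 per GB per month,
# charged only on the 100 GB persistent volume.
stopped_gb = 100
stopped_rate_per_gb_month = 0.20
per_day = stopped_gb * stopped_rate_per_gb_month / 30
stopped_cost = per_day * 2              # Pod stayed offline for two days
print(round(stopped_cost, 3))           # ~1.333 dollars
```

The GPU's hourly price is billed on top of these storage charges while the Pod is running.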

  • 00:07:40 When you hover your mouse over encrypt volume, you will see the message.

  • 00:07:44 Encrypted volumes provide better data security, but will incur a performance penalty and cannot

  • 00:07:49 be resized later.

  • 00:07:50 So unless you need this, don't check this box.

  • 00:07:54 Start Jupyter Notebook.

  • 00:07:55 This will make your life much easier.

  • 00:07:58 And this is the price per hour for our GPU.

  • 00:08:02 So these volume prices will be added to this price as well.

  • 00:08:07 After you clicked deploy button, you will see an interface like this.

  • 00:08:11 You can go to the My Pods section and you will see on demand community cloud is being

  • 00:08:17 prepared.

  • 00:08:18 When I click in here, you see it is showing me the messages of the Pod that is being prepared,

  • 00:08:27 what is happening on the Pod.

  • 00:08:28 And once it becomes ready, we will see connect button in here.

  • 00:08:33 So it is initializing the Pod with the necessary installation and the Pod is now ready and

  • 00:08:39 it is running.

  • 00:08:40 Now I will start SD 2.1 version Pod simultaneously.

  • 00:08:45 To do that I am clicking browse servers, and when you open the browse servers tab, you will

  • 00:08:49 see in the right tab how many credits you are spending right now.

  • 00:08:54 Because currently my other Pod is running, as you can see in My Pods tab.

  • 00:09:00 So let's return back the browse servers and in here there are several options.

  • 00:09:04 So you see there are one GPU Pods, two GPU Pods, large Pods, four GPU or x large Pods,

  • 00:09:12 eight GPUs.

  • 00:09:13 So if you need multiple GPUs, then you can filter them with this.

  • 00:09:16 Also in each Pod, you will see their location, their available upload and download speeds,

  • 00:09:22 their available disks and other things.

  • 00:09:25 Choosing a less used Pod is better because if your previous Pod is fully used, then you

  • 00:09:34 won't be able to get a GPU on that Pod.

  • 00:09:37 So what happens then? To use your existing files, you need to create a new Pod and transfer

  • 00:09:45 your files.

  • 00:09:46 So availability is really important when choosing your Pod.

  • 00:09:50 If you choose a highly preferred Pod, then you will have a harder time getting it later, and it will

  • 00:09:57 make things harder for you.

  • 00:09:59 So based on this fact, you should choose your Pod.

  • 00:10:03 So for the SD 2.1 version, I am going to pick another RTX A5000.

  • 00:10:09 When you click more RTX A5000, it displays other locations as well.

  • 00:10:16 You see the upload and download speeds change and the available space changes.

  • 00:10:22 More available space probably means that it is being used less.

  • 00:10:27 So for the Canada server, it looks like this particular server is not very much preferred.

  • 00:10:33 So there is also Norway server.

  • 00:10:35 You see it has great upload and download speeds.

  • 00:10:38 It has decent hard drive space as well.

  • 00:10:40 So it is probably also not very much used.

  • 00:10:43 However, this is more expensive than the others.

  • 00:10:46 So I think I will go with this Canada server.

  • 00:10:50 Its speeds are also decent.

  • 00:10:52 Click deploy.

  • 00:10:54 There is one more thing as well that I need to explain.

  • 00:10:57 Community cloud.

  • 00:10:58 So what does community cloud mean?

  • 00:11:01 In the community cloud section, you will be able to bid for shared servers.

  • 00:11:06 All of the servers are shared, but the idea is that you bid, and if someone overbids

  • 00:11:11 you, they get your GPU.

  • 00:11:14 So in here you see the prices will be lower.

  • 00:11:17 When I click RTX A5000 select and then I click continue.

  • 00:11:22 So you see currently this is selected.

  • 00:11:24 RunPod Stable Diffusion 1.5.

  • 00:11:27 I can also change it from this template.

  • 00:11:29 Don't forget to change template.

  • 00:11:32 When I click continue, you see now I am getting pricing summary and advanced.

  • 00:11:36 When I click advanced, it will allow me to bid for a spot.

  • 00:11:41 So you see the current bid is 0.198.

  • 00:11:44 When I bid this, I will overbid the other person who has bid less than this.

  • 00:11:51 So I am going to get their GPU if there are no other GPUs available.

  • 00:11:56 So let's say we did bid like this and we started our RunPod.

  • 00:12:00 So if someone else comes and bids 0.2, they will get our GPU.

  • 00:12:06 Then our pod will not have any GPU to do inference or training and our training will be also

  • 00:12:13 halted.

  • 00:12:14 So be careful with this.

  • 00:12:15 If you are not going to do training, if you are only going to do image generation, then

  • 00:12:20 you can go with this option and spend less.

  • 00:12:23 The running disk cost and exited disk cost also change slightly.

  • 00:12:28 You can recalculate the cost.

  • 00:12:30 So this is how you do bidding and this is how you use community cloud servers.

  • 00:12:36 Since I am going to do training, I am going to use on demand server and I am going to

  • 00:12:41 pick on demand server from here.

  • 00:12:44 This Canada server.

  • 00:12:46 Let's check again.

  • 00:12:47 Yes, I am going to use this Canada server because it has the most available disk space.

  • 00:12:52 Therefore, I am assuming that it is being used less than the others.

  • 00:12:57 Click deploy and we have selected RunPod Stable Diffusion 2.1 version.

  • 00:13:01 Let's set our persistent volume as 100 GB and let's also deploy it so it will get deployed.

  • 00:13:08 When I click My Pods, I will see them in here.

  • 00:13:11 OK, when you go to My Pods, it is going to show you whether there will be maintenance or

  • 00:13:18 not.

  • 00:13:19 So you should be careful with this maintenance.

  • 00:13:21 It says that it will start at this local time.

  • 00:13:24 Therefore, I think I will delete this Pod.

  • 00:13:27 So I will just click stop Pod and then I will delete it.

  • 00:13:31 So when you stop your Pod, it will remain as it is.

  • 00:13:34 However, if you click this terminate, then the Pod will be permanently deleted and you

  • 00:13:39 won't be able to recover or access any of your data.

  • 00:13:43 So now it is gone.

  • 00:13:45 Let's go back to the browse servers tab and let's pick another server from here.

  • 00:13:51 Maybe that is why it was being used less.

  • 00:13:54 So I will pick this one.

  • 00:13:56 OK, 2.1 version 100 GB.

  • 00:13:59 Let's deploy.

  • 00:14:00 Let's go to the My Pods and it is being deployed.

  • 00:14:03 The first one we started is running.

  • 00:14:06 The other one is being initialized and this is my per hour using credits right now.

  • 00:14:12 OK, let's connect our first Pod.

  • 00:14:14 To connect our first Pod.

  • 00:14:16 I am clicking My Pods.

  • 00:14:18 Let's refresh so you will see the interface as it is.

  • 00:14:20 OK, I am clicking here.

  • 00:14:22 It will open the interface.

  • 00:14:24 When you click logs, it will show you the logs screen.

  • 00:14:27 This is really important to debug the errors that you might encounter.

  • 00:14:31 So it started with xformers and the workspace 1.5 emaonly CKPT file.

  • 00:14:36 Actually, this is not the best CKPT file for training, so I will download the best one

  • 00:14:43 and it is running on xformers 0.0.16.

  • 00:14:47 This xformers is not compatible with DreamBooth training or Textual Inversion training, unfortunately,

  • 00:14:54 so we won't use xformers during training and the other things are also displayed here.

  • 00:14:59 When you click system logs, it will also show you the system logs.

  • 00:15:02 When you click this refresh icon, it will refresh and when you click this X, it will

  • 00:15:07 close it.

  • 00:15:08 So let's click connect and I will connect it via Jupyter Lab, which will make our life

  • 00:15:14 much easier.

  • 00:15:15 OK, so our Jupyter has started like this.

  • 00:15:19 The first thing that I am going to show you is how to change starting command line arguments.

  • 00:15:25 To change them, I am zooming in for you to see more easily.

  • 00:15:28 That is webui-user.sh.

  • 00:15:33 So this is the file where the command line arguments are provided.

  • 00:15:38 You see it is starting with default port 3000.

  • 00:15:41 It is starting with xformers.

  • 00:15:43 The default CKPT is provided like this and there is a listen and enable insecure access.

  • 00:15:50 So if you wonder what these arguments are doing, there is a wiki page of the Automatic1111

  • 00:15:55 Web UI, and you can search for the commands by copying and pasting them, and it will show

  • 00:16:02 you: launch gradio with 0.0.0.0 as server name, allowing it to respond to network requests.

  • 00:16:07 Actually, I am going to also add share to be able to use it from my browser like this

  • 00:16:14 and enable insecure extension access means that we will be able to install extensions.

  • 00:16:19 Make sure that these commands are already enabled.

  • 00:16:23 Otherwise, you won't be able to install extensions and I think we are ready.

  • 00:16:28 I will also change the port to avoid conflicting with the initial startup.

  • 00:16:34 Just save.

  • 00:16:35 When you save, you will see in the bottom saving completed.

  • 00:16:37 Then go to the running terminals and kernels.

  • 00:16:40 Shut down all of the running terminals and then go back to the file browser.

  • 00:16:46 Make sure that you are inside Stable Diffusion web UI folder.

  • 00:16:50 Then start the terminal.

  • 00:16:51 When you start the terminal, it will start with the folder that you are currently in.

  • 00:16:56 You see it is the same as the folder that we are in and in here we will use relauncher.py.

  • 00:17:03 To do that just type python and I will copy paste the name relauncher.py hit enter and

  • 00:17:10 it will restart our web UI with the newest set command line arguments.

  • 00:17:15 We should be able to see them in here.

  • 00:17:17 Yes, we are seeing --port 3010 and --xformers.

  • 00:17:21 So with this way you can also start multiple instances of web UI.

  • 00:17:27 If you are a professional, then you can do that.

  • 00:17:29 But if you are not, I don't suggest you do that.
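For reference, the arguments line edited in webui-user.sh might end up looking something like this; the port number and exact flag set are illustrative, so adjust them to your own Pod:

```shell
# Illustrative excerpt from webui-user.sh -- adjust the port and flags to your setup
export COMMANDLINE_ARGS="--port 3010 --xformers --listen --share --enable-insecure-extension-access"
```

After saving, shut down the running terminals and restart the Web UI with `python relauncher.py` so the new arguments take effect.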

  • 00:17:32 Now we can access it from this public URL.

  • 00:17:35 This public URL is currently not secured by a password.

  • 00:17:39 You can also add a password in here I think.

  • 00:17:42 Let me show you.

  • 00:17:43 Yes, you can also set a username and password.

  • 00:17:46 However, if you are not giving this URL to anyone, then it should be safe.

  • 00:17:51 As you can see, our interface is started.

  • 00:17:54 Let's start with typing a simple prompt and see what happens.

  • 00:17:58 OK, I have prepared my prompt.

  • 00:18:01 I hit generate and in My Pods now you will see the GPU memory used is being increased.

  • 00:18:07 GPU utilization will also increase as it generates the images and image is already generated.

  • 00:18:14 Let's set the batch size as eight and batch count as one hundred.

  • 00:18:18 And let's see how it is using our GPU.

  • 00:18:21 So let's hit the refresh.

  • 00:18:23 So it is showing like ten seconds ago.

  • 00:18:25 OK, now you see the GPU utilization is one hundred percent.

  • 00:18:29 GPU memory used is still significantly low because it is also using xformers, even though

  • 00:18:37 we are generating images in batches with a batch size of eight.

  • 00:18:42 So each time it will generate eight images.

  • 00:18:45 So where are these files being saved?

  • 00:18:49 And how can I see if any error occurs?

  • 00:18:52 You see in the My Pods, just click the logs and you will see all of the logs here.

  • 00:18:58 This is really important to debug the logs.

  • 00:19:00 And in here in the terminal window, you will see what is happening.

  • 00:19:04 So how can you open the terminal?

  • 00:19:05 To open the terminal, go to the running terminals and kernels.

  • 00:19:08 And let's say I have closed the terminal.

  • 00:19:11 I double click the terminal and it will show me the terminal as here.

  • 00:19:16 As you can see in here.

  • 00:19:17 This is equal to the terminal that we have on our computer when we are running it locally

  • 00:19:23 on our computer.

  • 00:19:24 This is the iterations per second (it/s).

  • 00:19:26 However, since we are generating eight images at a time, it is actually over twenty-four

  • 00:19:31 it per second.

  • 00:19:33 You need to multiply this by eight.
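The batch arithmetic above can be written out explicitly; the 3 it/s reading is an illustrative value, not a figure from the video:

```python
# The progress bar reports iterations per second for one batch, and each
# iteration advances every image in the batch at once, so the effective
# per-image throughput is the displayed rate times the batch size.
displayed_its = 3.0   # illustrative it/s read from the progress bar
batch_size = 8
effective_rate = displayed_its * batch_size
print(effective_rate)  # 24.0
```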

  • 00:19:35 OK, let's hit the interrupt.

  • 00:19:37 Now I will install the DreamBooth extension.

  • 00:19:39 To do that go to the extension tab.

  • 00:19:41 Go to Available, hit Load from.

  • 00:19:44 Search DreamBooth, hit install.

  • 00:19:47 Meanwhile, my 2.1 version Pod is also spending my credit.

  • 00:19:53 So I will just stop it.

  • 00:19:55 So when you click stop Pod you are going to get this message, you should read it and understand

  • 00:20:01 it.

  • 00:20:02 OK, stopped Pod.

  • 00:20:03 Basically, what it says is that all of the things that are not saved in your workspace

  • 00:20:09 will be lost.

  • 00:20:11 So whatever you have in your workspace will be saved.

  • 00:20:15 OK, let's see the status of the installation.

  • 00:20:18 OK, it says that installed into workspace, Stable Diffusion, web ui extensions, SD DreamBooth

  • 00:20:24 extension.

  • 00:20:25 Now I will restart my terminal because when you install DreamBooth for the first time, you really

  • 00:20:30 need to restart the terminal so that it can install the necessary dependencies.

  • 00:20:35 So I am going to do terminal stop, shut down all terminals.

  • 00:20:39 Then I am going to Stable Diffusion Web UI folder and in here I will open a new terminal.

  • 00:20:46 Same as before, I will type Python and relauncher.py and hit enter.

  • 00:20:51 So the Web UI has been restarted and now we got a new link.

  • 00:20:56 Let's copy and paste it.

  • 00:20:58 Meanwhile, while it is being loaded, let's check out the generated images.

  • 00:21:01 So they are saved in the outputs folder, in the text-to-image images folder.

  • 00:21:07 And yes, they are in here.

  • 00:21:09 So how to download them?

  • 00:21:10 You can download them one by one, right click and download.

  • 00:21:13 Then it will download like this.

  • 00:21:16 You can alternatively right click and download current folder as an archive.

  • 00:21:20 It will first make archive and it will download all of the images like this.

  • 00:21:25 It is a decent speed and it has downloaded all of these images.

  • 00:21:31 121 files so far.

  • 00:21:32 OK, the interface has been reloaded and now we are seeing the DreamBooth extension.

  • 00:21:38 When we go to the extension tab, check for updates.

  • 00:21:41 We should see the latest version in here.

  • 00:21:44 Actually, it says that it is behind.

  • 00:21:46 So let's click apply and restart UI.

  • 00:21:49 And once we do that, we get an error.

  • 00:21:53 It is relaunching in two seconds.

  • 00:21:55 OK, when relaunching, we are getting a port error because the previous one crashed.

  • 00:22:01 So what I'm going to do is: I will shut down all of the terminals.

  • 00:22:05 Go back to the file browser.

  • 00:22:07 In the first installation, you may encounter such errors.

  • 00:22:10 Go to the webui-user.sh file and change the port here, and then go to the terminal tab.

  • 00:22:18 Open a new terminal like this and type Python relauncher.py.

  • 00:22:22 it will restart and when restarting now it is showing us the DreamBooth revision and

  • 00:22:28 the SD Web UI revision like this.

  • 00:22:30 I will just start training.

  • 00:22:31 OK, it has been restarted.

  • 00:22:33 Let's open the new URL.

  • 00:22:35 OK, currently it is selected as 1.5 pruned emaonly CKPT and in the DreamBooth tab.

  • 00:22:42 When we are going to generate a new training model, this is only available model.

  • 00:22:48 However, 1.5 pruned CKPT is better than emaonly for training.

  • 00:22:53 Therefore, I am going to download this CKPT file.

  • 00:22:56 So how am I going to download it?

  • 00:22:58 You see there is a download button in here.

  • 00:23:01 I am right clicking and copying link address.

  • 00:23:04 But before doing that, let's start a new terminal.

  • 00:23:06 To do that, I am going to right click new plus icon here.

  • 00:23:10 It will open a new launcher.

  • 00:23:11 Hit terminal.

  • 00:23:13 For fast download I am going to use RunPod CTL.

  • 00:23:17 The RunPod CTL allows us to quickly download or upload files through our Pods to Pods or

  • 00:23:24 from Windows to Pods and vice versa.

  • 00:23:27 There are different versions.

  • 00:23:29 I am going to install the Linux one on my RunPod.

  • 00:23:32 So I am selecting it like this and copying it.

  • 00:23:35 Then in my terminal I am pasting it with control V and I am hitting enter.

  • 00:23:41 It will install the latest runpodctl.

  • 00:23:44 After this command, type runpodctl, hit enter, and you should get a message like this.

  • 00:23:50 That means that it has been successfully installed or it was already installed.

  • 00:23:55 Now how are we going to download this pruned CKPT file?

  • 00:23:57 To download it, first navigate to where you want to download it, which is inside models, inside

  • 00:24:05 Stable Diffusion.

  • 00:24:06 And in here where we want to download our model file, then I am going to click this

  • 00:24:12 plus new launcher, launch a new terminal, and in this new terminal, this is the folder

  • 00:24:18 where we are right now.

  • 00:24:19 Now for downloading, type wget, copy this URL and paste it, hit enter, and it will get downloaded

  • 00:24:29 inside this folder.

  • 00:24:30 By the way, runpodctl is not necessary to download this file, but we will use it to

  • 00:24:37 send data and get data from RunPod to our computer or from computer to RunPod or from

  • 00:24:44 RunPod to RunPod.
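As a sketch, the runpodctl transfer flow works like this; the one-time code below is made up for illustration, and runpodctl prints the real one when you run send:

```shell
# On the machine that has the file (Pod or PC) -- prints a one-time code
runpodctl send dataset.zip

# Example of what it prints (this code is illustrative):
#   runpodctl receive 8338-galileo-collect-fidel

# On the machine that should get the file, paste the printed command
runpodctl receive 8338-galileo-collect-fidel
```

Both sides need runpodctl installed; the code pairs the two machines for a single transfer.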

  • 00:24:45 This wget is a Unix command, and an alternative of it is available on Windows as well.

  • 00:24:52 So with this wget command, you can quickly download files into your RunPod folders like

  • 00:24:59 this.
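
As a concrete sketch, downloading the SD 1.5 pruned checkpoint into the right folder looks like this (the workspace path matches the RunPod template used in the video, and the Hugging Face URL is illustrative; copy the real link from the model page's download button):

```shell
# Move into the folder the Web UI scans for checkpoints
cd /workspace/stable-diffusion-webui/models/Stable-diffusion

# Download with wget; -O names the output file explicitly
# (URL copied from the model card's download button -- illustrative here):
wget "https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned.ckpt" -O v1-5-pruned.ckpt
```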

  • 00:25:00 So you see currently it is downloading with 90 megabytes per second which is pretty decent

  • 00:25:05 speed.

  • 00:25:06 Okay the download has been completed and now the file is located in here.

  • 00:25:10 Then what we are going to do is hit the refresh button here, and now I can see the 1.5 pruned

  • 00:25:18 CKPT as well.

  • 00:25:19 This is the way to download models from Hugging Face or wherever they are hosted,

  • 00:25:24 as long as you can get a direct link. I will show examples.

  • 00:25:28 Don't worry.

  • 00:25:29 So now I will start DreamBooth training with the best possible settings.

  • 00:25:33 First let's switch to 1.5 pruned CKPT.

  • 00:25:36 This is not strictly necessary, but I am not sure it would work as expected otherwise.

  • 00:25:40 So I am making sure I have selected the target model in here as well.

  • 00:25:45 So it has been loaded.

  • 00:25:47 If it doesn't get loaded,

  • 00:25:48 you should check the terminal window.

  • 00:25:50 It is running on here.

  • 00:25:52 It will show what is happening and you can also check the logs window in here.

  • 00:25:57 It will show what is happening.

  • 00:25:59 Okay now let's give a name to our training.

  • 00:26:01 Let's say test SD 15, and check the source checkpoint.

  • 00:26:05 So you see it is not seeing my latest checkpoint.

  • 00:26:07 I am clicking refresh and I am checking the latest checkpoint.

  • 00:26:11 This is very good to teach faces.

  • 00:26:14 The 1.5 pruned CKPT, the 512px model, is selected; hit create model.

  • 00:26:20 I am not changing other parameters because optimal parameters are currently selected.

  • 00:26:26 These are more like experimental options, or options for more advanced users.

  • 00:26:31 And in the terminal you see it is downloading the necessary files right now.

  • 00:26:36 That is why it is waiting.

  • 00:26:37 Okay it says that checkpoint successfully extracted.

  • 00:26:41 So the model has been generated.

  • 00:26:42 However as you can see, the interface is frozen.

  • 00:26:46 Unfortunately this is a problem of Gradio.

  • 00:26:49 So what we are going to do is refresh (reload) this page, and now it says no interface

  • 00:26:54 is running.

  • 00:26:55 It looks like the interface has been terminated unexpectedly.

  • 00:27:01 And what do we see in the terminal in here in the system logs.

  • 00:27:05 Okay it doesn't show anything and it doesn't show anything in here either.

  • 00:27:09 So let's check out our terminals.

  • 00:27:11 Terminal one which is our main terminal and yes it is not showing.

  • 00:27:17 So what can we do.

  • 00:27:19 We need to restart.

  • 00:27:20 To restart I will shut down all terminals and I will follow the same procedure.

  • 00:27:24 Open terminal.

  • 00:27:26 However, currently we are inside models/Stable Diffusion, so it won't work.

  • 00:27:29 We need to move to the parent folder.

  • 00:27:32 To move to the parent folder,

  • 00:27:33 I am closing this terminal and going to the folders tab.

  • 00:27:37 I am navigating like this opening a new terminal.

  • 00:27:40 Run python relauncher.py, and in My Pods the current GPU memory usage is only 11 percent.

  • 00:27:46 So it is good, which means that no other terminal or instance of Web UI is running.

  • 00:27:52 Also there are some warning messages here.

  • 00:27:55 I think we could ignore them.

  • 00:27:57 Okay it has started.

  • 00:27:58 I am opening this URL.

  • 00:28:00 I am going DreamBooth tab and now I will select my model because I already created it and

  • 00:28:06 it is selected.

  • 00:28:07 Let's set up the settings.

  • 00:28:09 Okay, I won't pick this checkbox because it usually causes me problems.

  • 00:28:14 How many steps per image.

  • 00:28:15 I am going to use 12 images and I am going to train up to 200 epochs.

  • 00:28:20 I will save the model every 10 epochs.

  • 00:28:24 Be careful with this, because each save will take about five gigabytes of space, and saving every

  • 00:28:32 10 epochs over 200 epochs means 20 saves.

  • 00:28:35 So it is going to use up all of my hard drive.

  • 00:28:38 So I think I will make this up to 180 or 160.

  • 00:28:42 This should be sufficient.
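
The disk math behind this decision can be sanity-checked quickly (the roughly 5 GB per checkpoint figure is the estimate from above):

```shell
# Saves = total epochs / save frequency; disk = saves x ~5 GB per checkpoint
epochs=200; save_every=10; gb_per_save=5
echo $(( epochs / save_every * gb_per_save ))   # 100 GB -- too much for the volume
epochs=160
echo $(( epochs / save_every * gb_per_save ))   # 80 GB -- more manageable
```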

  • 00:28:44 If you don't know what these parameters are or how I am setting them,

  • 00:28:48 I have an excellent DreamBooth tutorial on my YouTube channel.

  • 00:28:53 You should watch this definitely to learn more about DreamBooth training.

  • 00:28:57 Okay the batch size is one.

  • 00:28:59 Gradient accumulation steps are one. Class batch size determines how many images

  • 00:29:04 at a time are generated for classification images; it is not related to training.

  • 00:29:11 I will set this as 16 because this graphic card has huge VRAM, but if we get error, I

  • 00:29:17 will reduce it.

  • 00:29:18 Set gradients to none when zeroing.

  • 00:29:20 Okay correct.

  • 00:29:21 I am going to use half learning rate.

  • 00:29:24 I am going to use sanity prompt as photo of ohwx man by Tomer Hanuka.

  • 00:29:30 I will explain what these are for.

  • 00:29:33 Actually, I explain what these are for in detail in this tutorial.

  • 00:29:37 This is for checking whether the model is overtrained or not.

  • 00:29:41 And in here I am going to use EMA.

  • 00:29:43 This will improve my training success rate and I have 24 gigabyte VRAM.

  • 00:29:48 I will use 8 bit adam.

  • 00:29:50 I am going to use mixed precision, and I am going to use fp16, because bf16 is not

  • 00:29:57 supported by all graphics cards.

  • 00:29:58 It is supported by newer cards such as the RTX 3000 series.

  • 00:30:03 I am not sure about this card as well.

  • 00:30:05 So fp16 is our safest option for all cards.

  • 00:30:09 I am not going to use xformers.

  • 00:30:11 This is important because the current xformers is not supporting the DreamBooth training

  • 00:30:17 or Textual Inversion training.

  • 00:30:18 You see, it is xformers 0.0.16.

  • 00:30:21 I think it will become compatible with xformers 0.0.17 when it is officially released.

  • 00:30:28 Currently the nightly version supports it as well, as far as I know.

  • 00:30:32 Cache latents.

  • 00:30:33 Yes it will improve speed.

  • 00:30:35 Train UNET.

  • 00:30:36 Okay these are the optimal settings actually, so no need to change them.

  • 00:30:40 And in here concepts.

  • 00:30:41 Okay first we need to upload our training data set.

  • 00:30:44 To do that go to the Stable Diffusion web ui folder or workspace.

  • 00:30:48 Doesn't matter I will upload them to workspace.

  • 00:30:51 In here create new folder training data set.

  • 00:30:55 I have named the folder like this.

  • 00:30:57 Enter inside folder and click upload files.

  • 00:31:00 Select the files from your computer.

  • 00:31:03 Since I don't have many files currently I am going to use this method and you see I

  • 00:31:08 have only nine images which are pretty close shots.

  • 00:31:12 No same background.

  • 00:31:13 No same clothes as you can see.

  • 00:31:16 I fully explain what a good training data set is in this video, and they are getting

  • 00:31:21 uploaded.

  • 00:31:22 We could also use runpodctl.

  • 00:31:24 However, since there aren't many files, I am using this method for this task, and our

  • 00:31:31 training data set is ready.

  • 00:31:33 Okay now we need to give the path of it.

  • 00:31:35 To give the path of it.

  • 00:31:36 Go back to the workspace like this: right click, copy path, paste it like this, and put

  • 00:31:42 a slash (/) at the beginning of it. And where do we want regularization images to be generated?

  • 00:31:48 I am copy pasting like this and I will type classification images.

  • 00:31:53 Okay filewords.

  • 00:31:55 For training faces I am not using filewords.

  • 00:31:57 They are more likely needed when you fine-tune your model with lots of tokens and lots of good

  • 00:32:05 images.

  • 00:32:06 If you wonder how filewords are working, in this short video

  • 00:32:09 I explain how filewords actually work.

  • 00:32:13 So I'm just skipping file words and I am going to prompts.

  • 00:32:17 So our instance prompt will be ohwx man.

  • 00:32:20 Ohwx is our rare token and man is our class.

  • 00:32:24 Class prompt will be photo of man since I am teaching a face of a man.

  • 00:32:29 Sample prompt will be simply photo of ohwx man.

  • 00:32:33 I am not going to set negative prompt or other things.

  • 00:32:36 How many classification/regularization images do we want per training image?

  • 00:32:42 I have nine training images and I want 50 per image.

  • 00:32:46 This is actually a debated topic; there is no precise answer for how many is good.

  • 00:32:52 In the official DreamBooth paper, the authors have used 200 so you can also try with 100

  • 00:32:58 like this as well.
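
For reference, the total number of classification images these settings produce is just the product (numbers from the run above; the DreamBooth paper's 200-per-image figure is shown for comparison):

```shell
# Total classification/regularization images = training images x images per instance
train_images=9; per_image=50
echo $(( train_images * per_image ))   # 450 images with the setting used here
echo $(( train_images * 200 ))         # 1800 if you follow the paper's 200 per image
```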

  • 00:32:59 Okay then go to the saving tab, generate a ckpt file when saving during training.

  • 00:33:04 So we will be able to generate checkpoints for every 10 epochs and then we will be able

  • 00:33:11 to compare them to see which one of the checkpoint is performing best, which one of the checkpoint

  • 00:33:19 has learned our subject best and with this way you can avoid over training.

  • 00:33:26 And once you are ready, click save settings and hit train.

  • 00:33:30 First it will start with generating class images.

  • 00:33:32 In my pod I will see GPU utilization and memory usage.

  • 00:33:35 Okay, it says "Exception training model: No executable batch size found, reached zero."

  • 00:33:42 Why did we get this error? Because we set the classification images batch size pretty big.

  • 00:33:50 So I make it, let's say, six and try again.

  • 00:33:54 And now I am seeing that it is generating six images at a time.

  • 00:33:59 The it/s is pretty low, actually only 12, because we need to multiply this by six, and we are

  • 00:34:07 seeing the images are being generated.

  • 00:34:09 They will be saved in the workspace, in the classification images directory, like this. If you have previously

  • 00:34:18 generated images on your computer, then you can alternatively upload them.

  • 00:34:22 For uploading them I will install runpodctl on my windows.

  • 00:34:28 To do that I am going to run this command on my windows powershell.

  • 00:34:33 Type powershell, right click and hit enter.

  • 00:34:37 Okay, the installation has been completed; runpodctl is now available on my command

  • 00:34:42 prompt. Let's see: type runpodctl, and now I am seeing it.

  • 00:34:48 So I have previously generated 2400 images on my hard drive.

  • 00:34:54 I am going to share this with runpodctl to download them in RunPod.

  • 00:34:59 Alternatively, you can use upload methodology as well.

  • 00:35:03 It also works, but for bigger files, runpodctl is better.

  • 00:35:08 So for sharing the folder type runpodctl send and the folder path like this.

  • 00:35:16 To get the folder path more easily, copy the folder path from here, paste it into Notepad

  • 00:35:22 like this.

  • 00:35:23 Put quotation marks at the beginning and end, and type it in your cmd.

  • 00:35:29 I will show from beginning once again.

  • 00:35:32 Open cmd, type runpodctl send, paste the path like this, and it will prepare the files.

  • 00:35:39 It says that photo of man zip already exists because in another cmd window we used that.

  • 00:35:46 So I need to delete this file.

  • 00:35:49 Okay, this zip file is generated inside local disk c users and my username directory.

  • 00:35:56 I am just going to delete it and I will run the command once again.

  • 00:36:00 It will quickly prepare all of the files, and now the share code is generated. I am copying this,

  • 00:36:07 selecting it ctrl c or select it right click from here and copy, then go back to your Jupyter

  • 00:36:14 Lab where your RunPod is running and in here I will make a new folder like this: ready

  • 00:36:21 class.

  • 00:36:22 I will enter inside ready class folder, then I will open a new terminal like this and I

  • 00:36:28 will paste the command.

  • 00:36:31 You see, it is runpodctl receive plus the code it has generated; hit enter.

  • 00:36:36 It will connect to my computer and it will start downloading all of the files very quickly.

  • 00:36:41 So this is how you can upload files from your computer to the remote RunPod.

  • 00:36:48 The same thing applies to the RunPod to RunPod, so this is all vice versa.

  • 00:36:53 RunPod to computer, RunPod to RunPod, computer to RunPod.

  • 00:36:57 You can send and receive files like this.
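
The whole transfer, in either direction, reduces to a pair of commands (the folder path and the one-time code shown here are made up; runpodctl prints the real receive command when you run send):

```shell
# On the sending machine (Windows cmd or a pod terminal):
runpodctl send "C:\classification_images\photo_of_man"
# runpodctl prints a receive command containing a one-time code, e.g.:
#   runpodctl receive 1234-example-made-up-code

# On the receiving machine, cd into the target folder first, then paste it:
cd /workspace/ready_class
runpodctl receive 1234-example-made-up-code
```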

  • 00:37:00 This, of course, totally depends on my upload speed.

  • 00:37:03 So when I open my task manager I see that it is using all of my available upload speed

  • 00:37:09 like this.

  • 00:37:10 This is pretty useful and convenient.

  • 00:37:13 Instead of generating new classification images each time which uses your GPU time and consumes

  • 00:37:20 your credits, you can prepare them on your computer and then quickly upload them to your

  • 00:37:25 RunPod.

  • 00:37:27 You can also upload them to any hosting, website, or other place that has better upload speed

  • 00:37:33 and download them with the wget command as I have shown to download ckpt file.

  • 00:37:41 RunPodCTL is extremely useful to upload and download files as you can see.

  • 00:37:48 Okay 2400 photo of man.

  • 00:37:51 The classification/regularization images upload has been completed.

  • 00:37:55 Now I see that it is uploaded as a zip here.

  • 00:37:59 I need to extract them... oh!

  • 00:38:01 They have been automatically extracted, as you can see after a refresh.

  • 00:38:05 Now they are here.

  • 00:38:07 So what am I going to do is I will cancel training and I will give this folder.

  • 00:38:13 So I will just skip image generation.

  • 00:38:16 So it has been cancelled.

  • 00:38:17 Let's give the new folder.

  • 00:38:20 In concepts, type the new folder name here and click save settings. Okay, it looks like the

  • 00:38:27 train button has not appeared.

  • 00:38:29 So what we need to do is refresh (reload).

  • 00:38:34 Okay reloaded.

  • 00:38:35 Go to the DreamBooth select the model, hit load settings, verify the settings are properly

  • 00:38:41 loaded.

  • 00:38:42 Okay, this is not being saved so you should uncheck it.

  • 00:38:45 Okay, all settings are looking good and click train.

  • 00:38:49 Now it won't generate any new classification/regularization images because we have already provided them.

  • 00:38:54 We can see that in the terminal window in here.

  • 00:38:58 So you see it is processing the uploaded photo of man images.

  • 00:39:02 Then it is going to cache the classification images with caching latents.

  • 00:39:07 Okay, the training has started.

  • 00:39:09 It has a pretty good speed as you can see.

  • 00:39:13 It is supposed to do 180 epochs in less than 15 minutes.

  • 00:39:17 However, this will take a little bit more time because it will generate ckpt during

  • 00:39:22 the training.

  • 00:39:23 We can also watch the training here.

  • 00:39:25 However, you may get disconnected from gradio interface.

  • 00:39:30 You can just watch the command line interface from here and know the status of the training

  • 00:39:35 if that happens.

  • 00:39:37 Okay, 10 epochs have been completed so it started generating the initial images as you

  • 00:39:42 can see.

  • 00:39:43 It also generated a checkpoint.

  • 00:39:45 Where can we see the checkpoint?

  • 00:39:47 Go to the workspace, go to the Stable Diffusion Web UI, go to the models folder, go to the

  • 00:39:52 Stable Diffusion folder, and in here you will see our training name, go inside that folder

  • 00:39:58 and now we can see the checkpoints being generated.

  • 00:40:01 Then we will test each one of them with x/y plot and see how they are performing.

  • 00:40:07 So if you want to see the generated samples during training, go to the models folder,

  • 00:40:12 go to the DreamBooth folder, go to your training named folder, and in here you will see samples.

  • 00:40:18 So these are the samples being generated during training and when you click the txt file,

  • 00:40:24 you will see which prompt was used to generate this image.

  • 00:40:27 When you double click the image, it will open image like this.

  • 00:40:30 So far it is not like me at the moment.

  • 00:40:34 When you go to the My Pods, you can see the GPU utilization and GPU memory being used.

  • 00:40:39 The GPU memory is almost full because we are using EMA and we are not using xformers.

  • 00:40:45 Because in the settings tab, we checked to use EMA and in the memory attention we didn't

  • 00:40:49 use xformers.

  • 00:40:50 And these two heavily increase the memory usage.

  • 00:40:55 Also, we didn't check the gradient checkpointing.

  • 00:40:58 This also reduces the VRAM usage.

  • 00:41:01 However, if you have a sufficient amount of VRAM, you don't need to check this either.

  • 00:41:05 Okay, even after 130 epochs, it is still not learning even though it shows a good loss

  • 00:41:11 rate.

  • 00:41:12 That means that there is a bug currently with DreamBooth extension.

  • 00:41:16 Therefore, I have cancelled the training.

  • 00:41:18 Now I will delete the folder to free up space.

  • 00:41:22 Right click folder.

  • 00:41:23 Delete it.

  • 00:41:24 It says that the folder is not empty, so it can't be deleted.

  • 00:41:27 However, we can.

  • 00:41:28 Now I will show you how to do it.

  • 00:41:30 Click new, open a new terminal, and type rm -r and the directory name, test sd15.

  • 00:41:39 It will recursively delete all of the files and the folder.

  • 00:41:42 After we refresh it is gone.
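
Here is the same deletion reproduced in a scratch directory (folder and file names are illustrative):

```shell
# Recreate a non-empty folder like the one Jupyter refuses to delete
mkdir -p test_sd15
touch test_sd15/checkpoint_10.ckpt test_sd15/checkpoint_20.ckpt

# rm -r removes the folder and everything inside it recursively
rm -r test_sd15

# Confirm it is gone (prints "deleted" if the folder no longer exists)
[ ! -d test_sd15 ] && echo deleted
```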

  • 00:41:44 Now I will figure out the problem and show you the working settings and setup.

  • 00:41:50 So I have figured out the problem and the problem was exactly as I have guessed it.

  • 00:41:55 It was using xformers even though we didn't select use xformers.

  • 00:42:02 In the settings, we had used memory attention default.

  • 00:42:06 However, it was still using xformers.

  • 00:42:09 So what did I do to fix this problem?

  • 00:42:12 It is simple.

  • 00:42:13 I have opened the webui-user.sh file and I have removed --xformers

  • 00:42:21 from command line arguments.

  • 00:42:22 I have restarted my Web UI.

  • 00:42:25 Then I have composed a new training with the exactly same parameters and it did work very

  • 00:42:31 well.
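
The edit itself is a one-word deletion in webui-user.sh; scripted, it looks like this (the example file content is illustrative of the variable that script typically sets):

```shell
# Simulate a webui-user.sh with xformers enabled (content is illustrative):
printf 'export COMMANDLINE_ARGS="--xformers --share --port 3000"\n' > webui-user.sh

# Remove the --xformers flag so DreamBooth training cannot fall back to it
sed -i 's/--xformers //' webui-user.sh

cat webui-user.sh
# export COMMANDLINE_ARGS="--share --port 3000"
```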

  • 00:42:32 The training has been completed, so let's download the samples and check them out on

  • 00:42:37 our computer.

  • 00:42:38 To download the folder of samples, I will use runpodctl command.

  • 00:42:42 So what I need to do is I will enter the samples folders.

  • 00:42:47 So to do that, go to the models folder, go to the DreamBooth, go to the training folder

  • 00:42:52 name so the samples are located here.

  • 00:42:55 Open a new command terminal, write runpodctl send samples which is the folder name and

  • 00:43:03 it will zip the samples folder and generate a receive command.

  • 00:43:07 Copy it with ctrl c.

  • 00:43:09 First, I need to add the path of runpodctl into my environment.

  • 00:43:15 So currently the runpodctl exe is located inside my user folder.

  • 00:43:21 Go to users, then your username, and I will copy the runpodctl yaml and runpodctl.exe files.

  • 00:43:27 Copy them.

  • 00:43:28 Then I will make a new folder in my C drive named runpod exe.

  • 00:43:33 Paste them here.

  • 00:43:34 Then in the search bar search for environment, it will open, edit environment variables like

  • 00:43:41 here and in here.

  • 00:43:42 I am going to add a path variable for system variables, so go to the path, click edit and

  • 00:43:49 in here click browse, select the folder where you have copy pasted which is inside c drive.

  • 00:43:56 runpod exe; click OK. Now the runpodctl exe is registered in my path.

  • 00:44:02 Click OK, click OK, click OK, and now runpodctl should be available to call from everywhere.

  • 00:44:09 Where I want to download.

  • 00:44:10 I want to download the files inside my pictures, inside test samples.

  • 00:44:16 I type cmd here.

  • 00:44:18 So currently this is where I am.

  • 00:44:20 Now I will copy and paste this command into my cmd window.

  • 00:44:25 And yes, it is running as expected and the files are being copied into my folder.

  • 00:44:33 And then they are automatically extracted with the folder name.

  • 00:44:36 So in here we are able to see the generated sample images.

  • 00:44:40 I can say that after 800 steps it started to resemble me and we have totally trained

  • 00:44:48 it for 160 epochs, 3200 steps, we can see the examples here.

  • 00:44:55 Okay, this is pretty much like me, so with good prompting I think we can get good results.

  • 00:45:03 So let's try all of the checkpoints to see which one is working best.

  • 00:45:07 How are we going to do that?

  • 00:45:09 We are going to do that with text to image tab and in here we are going to use x/y/z

  • 00:45:14 plot.

  • 00:45:15 Okay, it didn't appear.

  • 00:45:16 Let's refresh.

  • 00:45:17 Oh, looks like our instance is closed so I will restart.

  • 00:45:22 So before restarting make sure that you have closed all of the running terminals and I

  • 00:45:27 will also close all of the open tabs.

  • 00:45:29 Okay, all of the tabs and terminals are closed.

  • 00:45:33 Okay, Web UI is restarted.

  • 00:45:35 Let's open it.

  • 00:45:36 Okay, now we can also see the checkpoints in here so you can test particularly one of

  • 00:45:43 them.

  • 00:45:44 But I am going to do xyz plot test.

  • 00:45:47 But before that, let's decide our testing prompt.

  • 00:45:51 So I am going to make my tests on 2200 step checkpoint.

  • 00:45:58 I am going to select it from here.

  • 00:46:00 First, let's see the raw prompt.

  • 00:46:02 Ohwx man.

  • 00:46:03 Okay, this is the raw prompt and it looks pretty decent.

  • 00:46:08 This is the training data set you see.

  • 00:46:11 It looks pretty decent, but it looks like it has some memorization.

  • 00:46:15 Actually, not exactly memorization.

  • 00:46:18 The clothing is similar but not exactly the same.

  • 00:46:20 Okay, while doing testing, my Web UI has been killed.

  • 00:46:24 So I have checked the terminal to see the message.

  • 00:46:28 So you should be careful if some error happens.

  • 00:46:31 Make sure to check the terminal to see what is happening behind the scenes, and

  • 00:46:36 now it is not able to restart.

  • 00:46:39 Therefore, I will close all of the terminals and start with a different port.

  • 00:46:46 To do that, you need to go to the terminals tab, shut down all, and edit the webui-user.sh

  • 00:46:53 file: change the port from here.

  • 00:46:56 Save and restart.

  • 00:46:57 Okay, I wrote a simple prompt like this: photo of (ohwx man:1.2), with emphasis. You can learn about emphasis

  • 00:47:05 from wiki page of Automatic1111.

  • 00:47:09 Just pause the video and read here if you don't know.

  • 00:47:12 And digital painting, artstation, masterpiece.

  • 00:47:14 I don't have any negative prompts.

  • 00:47:17 The picture is not exactly like me.

  • 00:47:19 So now we are ready to do test and see if model is trained enough.

  • 00:47:23 If it is not trained enough, then go to the DreamBooth tab, select the model load settings

  • 00:47:29 and continue training.

  • 00:47:31 It will continue training for the number of steps that you have defined in here.

  • 00:47:36 Okay, I started continue training and it will start from this model revision which means

  • 00:47:42 it will start from 3200 steps and it will continue to do training for number of epochs

  • 00:47:50 that we have defined here.

  • 00:47:52 However, my Gradio has crashed once again, and I am able to see the continuing training from

  • 00:47:59 here.

  • 00:48:00 Now let's test the current checkpoints and see whether they are trained enough or not

  • 00:48:05 and decide upon that to continue training or not.

  • 00:48:08 However, since my Gradio has crashed, I have to restart the terminal because there is no

  • 00:48:14 way to cancel the training right now.

  • 00:48:16 Let me check... yes, there is no other way.

  • 00:48:18 Okay, I did a restart.

  • 00:48:20 So how are we going to test different checkpoints?

  • 00:48:24 Prompt emphasis, and CFG values.

  • 00:48:26 Go to the bottom, pick x/y/z plot and in here you see there are different type of parameters.

  • 00:48:33 So first parameter will be checkpoint name.

  • 00:48:36 When you click this icon it will paste the available checkpoints.

  • 00:48:40 I am going to start picking from 1600 steps which means 80 epochs for me.

  • 00:48:46 It depends on your training dataset size, and I will test the remaining ones as well, like

  • 00:48:52 this.

  • 00:48:53 It is also displaying the calculated hash value.

  • 00:48:56 Okay, as a second thing, I am going to test prompt strength.

  • 00:49:00 To do that, I am going to use prompt s/r.

  • 00:49:02 So I am going to give this any keyword, like prsr.

  • 00:49:07 So the first value here will be prsr.

  • 00:49:10 Then I will type the prompt strengths like 1.1, 1.2, 1.3 let's also try 1.0, 1.4 1.5,

  • 00:49:20 1.6 and 1.7 okay, as a third comparison thing, I am going to test CFG value.

  • 00:49:28 So for CFG values, I am going to test 7, 7.5, 8, 8.5,

  • 00:49:34 9, 9.5 and 10.

  • 00:49:37 If you keep -1 for seeds, then you won't be able to compare them very well.

  • 00:49:42 So do not check this checkbox, so that it uses the same seed for all of the comparisons, and then

  • 00:49:48 when you click generate it will process all of them.
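
Prompt S/R ("search/replace") works by taking the first value in the list as the search string and substituting each later value into the prompt. A simplified sketch of the idea (the real logic lives in Automatic1111's xyz_grid script; the prompt text is the one from this tutorial):

```shell
# The first S/R value ("prsr") is the search string; each later value replaces it
prompt='photo of (ohwx man:prsr), digital painting, artstation, masterpiece'
for v in 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7; do
  echo "$prompt" | sed "s/prsr/$v/"
done
# first line printed: photo of (ohwx man:1.0), digital painting, artstation, masterpiece
```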

  • 00:49:52 You can see the process in the command line interface.

  • 00:49:56 Now meanwhile this is running I will start my 2.1 version RunPod as well.

  • 00:50:02 Okay, it says that there is no available GPU for this RunPod right now so I can start it

  • 00:50:09 without a GPU and transfer my files with runpodctl.

  • 00:50:14 However, I do not have any files on it so I will just delete it because I didn't even

  • 00:50:20 start it yet and I will start a new one.

  • 00:50:22 Okay, I am going to use this one.

  • 00:50:26 Select the template from here.

  • 00:50:27 I will pick the Stable Diffusion 2.1 version. I will start with 100 gigabytes, deploy my

  • 00:50:34 pods, it is being initialized, and my other Pod is currently working with this kind of

  • 00:50:40 it/s.

  • 00:50:41 By the way, xformers is still not enabled right now, so if you enable it, this will

  • 00:50:46 become even faster.

  • 00:50:47 But for training, make sure that you have disabled it.

  • 00:50:50 And images are being generated in here.

  • 00:50:52 We will download all of them and check all of them later.

  • 00:50:56 Okay, 2.1 version is being generated and getting ready.

  • 00:51:01 Okay, 2.1 is now ready.

  • 00:51:03 Just click connect.

  • 00:51:05 Connect to the Jupyter.

  • 00:51:06 Okay, it says that it cannot connect yet so it is probably still not ready.

  • 00:51:10 Let's wait.

  • 00:51:11 Try again.

  • 00:51:13 Okay, let's refresh the page.

  • 00:51:14 Maybe the URL is incorrect.

  • 00:51:17 Yes, after the refresh I think it is fixed or it is just started.

  • 00:51:21 So just be patient a little bit.

  • 00:51:23 It is getting loaded and yes, 2.1 version is started.

  • 00:51:27 It is exactly the same as the previous one.

  • 00:51:31 We are editing the command line arguments here.

  • 00:51:33 I will add --share so I can use it as I want.

  • 00:51:37 And I will also remove xformers because it is preventing training.

  • 00:51:42 I will set the port as 3001.

  • 00:51:45 Save it.

  • 00:51:46 Then there is no open terminals.

  • 00:51:49 Let's open a new launcher, open a terminal, and run python relauncher.py. Our comparison of the SD 1.5 trained

  • 00:51:57 models is continuing.

  • 00:51:58 Okay, 2.1 RunPod is ready.

  • 00:52:02 Let's start it.

  • 00:52:03 Okay, currently selected model is 2.1 version.

  • 00:52:06 Let's test it.

  • 00:52:07 Okay, I have written my prompt; the output resolution is 768 by 768.

  • 00:52:14 Looks like we got a problem.

  • 00:52:17 It says that a tensor with all NaNs was produced in Unet.

  • 00:52:21 So we need to add the --no-half argument to the command line for this graphics card, because otherwise

  • 00:52:28 it won't work.

  • 00:52:29 So let's go back to the RunPod.

  • 00:52:31 Open the webui-user.sh file.

  • 00:52:34 So for SD 2.1 version, make sure that you are using these command line arguments.

  • 00:52:41 These may be necessary for some of the custom models as well.

  • 00:52:44 So check the messages that you see in here.

  • 00:52:47 This message should be available also in the terminal window.

  • 00:52:51 Yes, you can also see the error in here as well.

  • 00:52:55 So I will close the terminal and restart it.

  • 00:52:58 Currently I am spending 0.669 dollars per hour.

  • 00:53:03 Both of my RunPods are running right now.

  • 00:53:06 Okay, it looks like I have mistyped the --precision argument.

  • 00:53:12 So it says that argument precision expected one argument.

  • 00:53:15 I will just fix it quickly.

  • 00:53:17 To fix it, I am opening the file and I am setting --precision full, saving

  • 00:53:24 it and restarting.
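
Put together, the COMMANDLINE_ARGS line for SD 2.1 on this card ends up looking like this (the port number and extra flags mirror the ones used in the video; treat it as a template rather than the only valid set):

```shell
# Inside webui-user.sh -- --no-half and --precision full avoid the
# "A tensor with all NaNs was produced in Unet" error on SD 2.x:
export COMMANDLINE_ARGS="--no-half --precision full --share --port 3001"
```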

  • 00:53:26 Make sure that you only have one active running terminal, otherwise other terminals will also

  • 00:53:32 consume your VRAM.

  • 00:53:35 You can also see the VRAM usage in your My Pods tab, and you can see the logs

  • 00:53:41 from here.

  • 00:53:42 This is really important to debug the errors.

  • 00:53:44 Okay, it is started with these command line arguments exactly like this.

  • 00:53:49 Let's open the Gradio window.

  • 00:53:51 Okay, let's hit generate with our written prompt and it is getting generated.

  • 00:53:57 And we got our tank image.

  • 00:54:00 Now I will install the extension exactly the same as I did before.

  • 00:54:04 Okay, 2.1 version is ready with DreamBooth now.

  • 00:54:07 Go to the DreamBooth tab, make a new model, which I will name test, and select the source checkpoint.

  • 00:54:13 Uncheck 512 model.

  • 00:54:14 Hit create.

  • 00:54:15 The first time you hit create, it downloads the necessary files, same as

  • 00:54:19 before, because this is a new RunPod, so they are not connected.

  • 00:54:23 This is a fresh installation and checkpoint successfully extracted so it is ready.

  • 00:54:29 Okay, we didn't get any error so we can continue.

  • 00:54:32 So for 2.1 version usually you need more epochs so I will set this as 300.

  • 00:54:38 However, now it will also use more space

  • 00:54:42 due to more epochs, so I need to reduce the model save frequency.

  • 00:54:47 I think I will save it for every 20 epochs.

  • 00:54:50 Batch size one, gradient accumulation one, class batch size will be four.

  • 00:54:55 I am not going to set gradient checkpointing.

  • 00:54:58 You can also leave it as default learning rate.

  • 00:55:01 This would make it learn faster, however, it may also not learn very well or it may

  • 00:55:07 get over trained quickly.

  • 00:55:09 So I will make this as one.

  • 00:55:10 But you can also leave it as default.

  • 00:55:13 So the other things are same.

  • 00:55:14 Now with 2.1 version, I don't know if 24 gigabytes will be enough without xformers when we use

  • 00:55:20 EMA so I will test it.

  • 00:55:23 Okay, it says let's make it like this.

  • 00:55:26 Actually, we should click performance wizard so it will set the optimal ones for us.

  • 00:55:32 Okay, okay, I am leaving the settings like this.

  • 00:55:35 Let's also set the memory attention as default and let's see if it will work.

  • 00:55:39 By the way, we also need to re-upload our training images and these training images

  • 00:55:44 have to be 768 pixels because this model is 768 pixels model.

  • 00:55:52 So to upload them I am following just the same steps.

  • 00:55:56 Here are my 768 pixel images.

  • 00:55:59 I'm just going to use drag and drop, but you can use runpodctl as well, as I have already demonstrated.

  • 00:56:06 Okay they are ready.

  • 00:56:07 So I am right clicking copy path, pasting it, and adding a slash to the beginning.

  • 00:56:12 Copy this and let's name it class 768.

  • 00:56:17 All other settings are same.

  • 00:56:19 Ohwx man, photo of man, photo of ohwx man, and I will use only 12 images because I want

  • 00:56:28 training to start quickly, but you should use a bigger number.

  • 00:56:32 I am checking generate ckpt when saving during training, clicking save settings, and hitting train.

  • 00:56:39 So it will start with generating class images.

  • 00:56:42 So for each image we are generating 12.

  • 00:56:44 Okay, we got an error.

  • 00:56:46 Therefore, we need to decrease the class batch size.

  • 00:56:49 Let's hit train again.

  • 00:56:50 Okay, looks like our Gradio is killed, therefore it has to be restarted.

  • 00:56:55 You may get these errors.

  • 00:56:56 Okay, during restart, it is throwing an error because the port is still in use.

  • 00:57:01 So I am going to close the terminal, change the port, and restart it myself manually.

  • 00:57:07 Okay, restart has been completed.

  • 00:57:09 Let's go to the DreamBooth, select model load settings, just quickly verify settings.

  • 00:57:14 I am unchecking this because it is usually problematic.

  • 00:57:17 Class batch size is two and let's hit train.

  • 00:57:21 You can also generate classification images from text to image directly yourself.

  • 00:57:25 Cut the generated images and put them into a new folder.

  • 00:57:28 Okay, we got error once again.

  • 00:57:30 This is a memory error actually.

  • 00:57:32 When we check the command line interface, we can see the memory error.

  • 00:57:37 So looks like our only option is class batch size one.

  • 00:57:40 Let's click train.

  • 00:57:41 Okay, it is working.

  • 00:57:42 However, this will be very slow.

  • 00:57:44 So what am I going to do?

  • 00:57:45 I will enable xformers, manually generate images from text to image, and use them as classification

  • 00:57:51 images, which will save our time significantly.

  • 00:57:55 So follow along with what I am doing.

  • 00:57:57 First, I will just terminate the terminal from here.

  • 00:58:00 I will add --xformers, change the port, and restart python relauncher.py. I will

  • 00:58:08 also clear the text-to-image outputs folder so you can use it directly, so I will just rename it.

  • 00:58:14 It will generate a new folder for me, and the new app is started with xformers.

  • 00:58:19 Let's open it!

  • 00:58:20 So our class prompt is photo of man.

  • 00:58:23 I am typing photo of man.

  • 00:58:25 I am going to set the sampling steps to 30, which is decent enough, and I am leaving

  • 00:58:30 all other options the same, and I will use a batch size of eight. How many images in total

  • 00:58:37 do you need?

  • 00:58:38 Let's say 50 images per training image; since I have nine images, I am going to generate

  • 00:58:44 450 images.

  • 00:58:46 Therefore I need to set the batch count to at least 57, then hit generate, and let's see if we will

  • 00:58:53 get an out of memory error.

  • 00:58:55 And you see from text to image tab we are not getting out of memory error even when

  • 00:59:01 the batch size is eight.
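The image-count arithmetic above can be sanity checked with a quick sketch (the numbers are just this run's: nine training images, 50 classification images per training image, a batch size of eight):

```python
import math

training_images = 9        # instance images used in this run
class_per_instance = 50    # classification images per training image
batch_size = 8             # txt2img batch size that fits in VRAM here

total_needed = training_images * class_per_instance   # 450 images
batch_count = math.ceil(total_needed / batch_size)    # 57 batches to queue
generated = batch_count * batch_size                  # 456 images actually produced

print(total_needed, batch_count, generated)  # 450 57 456
```

Queuing 57 batches of eight slightly overshoots the target, which is harmless; the extras simply sit in the classification folder.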

  • 00:59:03 So it will very quickly generate all of these images for us, much faster than the classification

  • 00:59:11 image generation that happens inside DreamBooth.

  • 00:59:15 If you wonder why it is generating images like this, or why we are using these kinds of

  • 00:59:19 images: in this video I am explaining all of it; we are keeping the underlying contextual

  • 00:59:25 data of the model.

  • 00:59:27 You could also use more beautiful images in your classification training data set.

  • 00:59:32 However, it would break your model's conceptual knowledge, so your model would become more biased

  • 00:59:40 toward the images that you have used.

  • 00:59:42 Also, your face would be biased toward the images that you use.

  • 00:59:46 With this methodology, we are using the underlying contextual knowledge of the model and we are

  • 00:59:53 trying to keep it as much as possible.

  • 00:59:55 However, this is up to you.

  • 00:59:58 So if you use all handsome images, all full colored, professional real images, then your

  • 01:00:06 model would become more biased to them.

  • 01:00:08 This is how custom models are usually made.

  • 01:00:12 They are being cooked to those kind of images.

  • 01:00:15 So whatever you type, you are getting beautiful images because all of the other underlying

  • 01:00:20 conceptual data of the model is lost during the training.

  • 01:00:25 Actually, according to the ControlNet developer, the SD 2.1 version is inferior to SD 1.5 due

  • 01:00:35 to the CLIP model it uses.

  • 01:00:36 You can pause the video right now to read this.

  • 01:00:39 Okay, looks like our 1.5 version experiment has ended.

  • 01:00:44 Let's go to the outputs and in here there are text to image grids and you see there

  • 01:00:51 is a grid file.

  • 01:00:53 35 megabytes.

  • 01:00:54 Let's open it.

  • 01:00:55 Actually I will download this and there is also 228 megabytes.

  • 01:01:00 So for downloading let's use the runpodctl.

  • 01:01:04 I am going to open a new command line in here.

  • 01:01:07 Runpodctl, send text to image grids.

  • 01:01:11 Hit enter and it will generate download link for us.

  • 01:01:15 Go to the folder where you want to download.

  • 01:01:17 I will download it in here: type cmd, then copy paste the link like this.

  • 01:01:22 So it is going to download 265 megabyte grid output.

  • 01:01:27 This is much faster than downloading from the Jupyter notebook.
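The transfer pattern used here, sketched as commands (the folder name is just this run's grids folder, and the one-time code is illustrative; runpodctl prints the real code for you):

```shell
# On the pod, from the folder that contains the data you want to send:
runpodctl send txt2img-grids

# send prints a one-time code and the matching receive command.
# On your own computer, cd into the target folder and paste it, e.g.:
# runpodctl receive <one-time-code-printed-by-send>
```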

  • 01:01:31 Okay, the grid images are downloaded.

  • 01:01:34 And in here this is the newest grid image that is generated.

  • 01:01:39 It is over 200 megabytes and over 35,000 pixels wide, and now we are able to compare different

  • 01:01:47 checkpoints with different prompt emphasis and with different CFG scales.

  • 01:01:53 So this is for CFG scale 7.

  • 01:01:55 These are the checkpoints and these are the prompt emphasis.

  • 01:02:00 Let's find the best one that we like and that is similar to us.

  • 01:02:05 You see these faces are not like me but in here I am seeing faces like me.

  • 01:02:12 So with prompt strength 1.4 in these checkpoints, I am starting to get faces similar to

  • 01:02:19 mine.

  • 01:02:20 I think this one is very similar to me.

  • 01:02:22 So with prompt strength 1.4 for CFG scale 7 and for checkpoint 3000 steps.

  • 01:02:30 Yeah I like it.

  • 01:02:31 So you should also compare for yourself.

  • 01:02:34 And after prompt strength 1.4 the image becomes very very bad.

  • 01:02:39 So let's also look at the other CFG scales and checkpoints.

  • 01:02:44 Okay now I will show you slowly what is happening from CFG scale 10 to 7 and this is the prompt

  • 01:02:51 strength 1.4.

  • 01:02:53 This is how the images are changing.

  • 01:02:55 This would of course depend on your training data set, how it is trained and I can see

  • 01:03:01 that they are not very good at all because we also didn't use any negative prompts.

  • 01:03:08 Our aim here is finding the sweet spot of prompt strength and the checkpoint and the

  • 01:03:16 CFG possibly.

  • 01:03:18 Okay I think this model is still not trained enough.

  • 01:03:22 Because with only 1.4 strength, and at the 3200-step checkpoint, it is giving the best results.

  • 01:03:31 So therefore I will train this model even further with more steps and then do another

  • 01:03:38 experiment.

  • 01:03:39 However, currently we could use 1.4 strength with checkpoint 3200.

  • 01:03:45 I suggest you test --no-half and --precision full training for the SD 1.5 version as well, without

  • 01:03:54 xformers, and compare whether it is learning better or not.

  • 01:04:00 Depending on the graphics card used, this could make a difference, and you can test using

  • 01:04:06 8-bit Adam or not.

  • 01:04:08 You can test mixed precision no versus fp16 and bf16; all of these things could improve

  • 01:04:16 your training success rate.

  • 01:04:18 You should experiment with them and currently I do not have time to test all of them.

  • 01:04:24 I am showing some of the settings that are widely used, but you should also experiment

  • 01:04:30 with them,

  • 01:04:31 with options like this one, or this, or this.

  • 01:04:36 Now I will show you how to download custom models from CivitAI.com and use them on your

  • 01:04:43 RunPod.io pod.

  • 01:04:44 So I am going to show example of Protogen x3.4.

  • 01:04:49 Right click download latest, copy the link, then go to your RunPod.io Jupyter interface,

  • 01:04:57 and in here go to the folder where the model files are stored.

  • 01:05:02 So in this folder, models/Stable-diffusion, where you are supposed to put your model files,

  • 01:05:09 open a new launcher, open a terminal, type wget, paste the link, and hit enter, and it will start

  • 01:05:16 downloading the model file.
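A sketch of that download step. The URL placeholder stands for the "download latest" link copied from CivitAI, and the folder path is the standard Automatic1111 layout on this RunPod template (an assumption on my part); using wget's -O flag names the file up front, which also avoids the renaming steps shown later:

```shell
# Run inside the folder where Automatic1111 looks for checkpoints:
cd /workspace/stable-diffusion-webui/models/Stable-diffusion

# -O saves the file under a proper name instead of a numeric ID:
wget -O protogen_x3.4.safetensors "<copied CivitAI download link>"
```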

  • 01:05:18 So you see it is 5.6 gigabytes, and you see there is no more space left on my hard drive.

  • 01:05:26 What I need to do is delete some of the models.

  • 01:05:31 So I am going to delete some of the training checkpoints.

  • 01:05:34 They are located inside models, inside Stable Diffusion, inside my training folder, and in

  • 01:05:41 here I am going to delete some of them.

  • 01:05:43 You can also delete a whole directory: right click and delete.

  • 01:05:47 You can also select them and hit delete button on your keyboard.

  • 01:05:51 Okay, I think we now have sufficient space, so I will just rerun the command.

  • 01:05:56 So to open back the latest executed command I just hit up arrow and hit enter and now

  • 01:06:02 it will start downloading.

  • 01:06:03 It will be downloaded into the folder where we opened this terminal.

  • 01:06:10 Let's go back to there.

  • 01:06:11 Models Stable Diffusion and now this file is being downloaded with the name of 4048.

  • 01:06:19 Then I will rename it.

  • 01:06:21 Meanwhile, 2.1 version classification regularization images are still being generated.

  • 01:06:26 We can see the process in the terminal of it.

  • 01:06:30 You see it has generated over 160 images so far.

  • 01:06:34 Okay, it is downloading the custom model file with 50 megabytes per second.

  • 01:06:39 You can also upload those files from your computer or you can download from Hugging

  • 01:06:45 Face as I have shown you already.

  • 01:06:48 So this is how you can download files fast on your Pod.

  • 01:06:52 Okay, the file has been downloaded and saved as 4048.

  • 01:06:57 I will rename right click, rename and let's say protogen x34 it is renamed.

  • 01:07:05 Then let's go back to our Stable Diffusion interface.

  • 01:07:08 Click, refresh folder.

  • 01:07:10 It is not appearing because the model file extension is not correct.

  • 01:07:14 Right Click.

  • 01:07:15 And when renaming, add .ckpt to the end of it like this, and then refresh again.

  • 01:07:23 Okay, now we see the model here.

  • 01:07:25 Let's test it.

  • 01:07:26 Okay, it didn't load even though I have selected.

  • 01:07:29 Let's look at the command line interface.

  • 01:07:31 Okay, it says that we should add --disable-safe-unpickle because of the way we downloaded the

  • 01:07:38 file.

  • 01:07:39 So I will add this to the command line arguments and restart like this.
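For reference, this is the flag the error message asks for, added to the Web UI launch arguments (on this RunPod template they are edited into relauncher.py; the rest of your argument line stays as it was):

```shell
--disable-safe-unpickle
```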

  • 01:07:44 Let's also change the port.

  • 01:07:46 Just close all of the terminals.

  • 01:07:47 Okay, restart has been completed with --disable-safe-unpickle.

  • 01:07:51 Let's open the interface.

  • 01:07:53 Okay, let's try with protogen.

  • 01:07:55 Okay, we got an error once again, because what we downloaded is a safetensors file,

  • 01:08:02 not a ckpt.

  • 01:08:03 Therefore, we have to rename it once again with a .safetensors extension like this and

  • 01:08:11 try again.
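If you are ever unsure which format a downloaded model file really is, a small heuristic check can help. This is a sketch based on the safetensors file layout (an 8-byte little-endian header length followed by a JSON header); a ckpt file is a pickled PyTorch checkpoint instead, which is exactly why the safe-unpickle check complains about it:

```python
import struct

def looks_like_safetensors(path: str) -> bool:
    """Heuristic: safetensors files start with an 8-byte little-endian
    header length, followed by a JSON header that begins with '{'."""
    with open(path, "rb") as f:
        raw = f.read(8)
        if len(raw) < 8:
            return False
        header_len = struct.unpack("<Q", raw)[0]
        return header_len > 0 and f.read(1) == b"{"
```

Renaming the file to the extension that matches its actual format is what makes the Web UI load it correctly.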

  • 01:08:12 Let's hit refresh.

  • 01:08:13 Now there is safetensors.

  • 01:08:15 Okay, it is loaded.

  • 01:08:16 Let's test it and protogen is working as expected.

  • 01:08:20 You see: awesome, intricate, fantastic castle, in a forest, and this is what I got.

  • 01:08:25 Let's run again.

  • 01:08:26 And yes, this is definitely protogen.

  • 01:08:28 Let me run it on 1.5 version official as well.

  • 01:08:32 Okay, 1.5 version is loaded and this is the result on 1.5 version official.

  • 01:08:38 So this is how you can use custom models on RunPod.io.

  • 01:08:43 2.1 image generation is still going on.

  • 01:08:45 Now I will show you how to do Textual Inversion training.

  • 01:08:49 To do that, let's go to the train tab.

  • 01:08:52 By the way, before doing that, let's go to the settings, and in here under training there is "Move

  • 01:08:56 VAE and CLIP to RAM when training if possible".

  • 01:09:00 You can pick this option to reduce VRAM usage.

  • 01:09:03 There is also "Turn on pin_memory for DataLoader",

  • 01:09:06 which makes training slightly faster, but it can increase memory usage.

  • 01:09:09 You can pick this one depending on your machine's RAM.

  • 01:09:13 However, since we have 24 gigabytes, I am not going to pick them.

  • 01:09:17 So let's give it the name test; the initialization text is none.

  • 01:09:21 Number of vectors is two.

  • 01:09:24 You can watch my excellent how to do Stable Diffusion Textual Inversion video.

  • 01:09:30 I am explaining in great details in this video and you can learn many of the things related

  • 01:09:37 to the Textual Inversion from this video.

  • 01:09:40 Hit create embedding and it is already created.

  • 01:09:43 Let's go to the train tab, pick the embedding.

  • 01:09:46 We also need to set dataset directory.

  • 01:09:49 So our data set directory is like this.

  • 01:09:52 We don't need classification images for Textual Inversion training.

  • 01:09:56 You can reduce the learning rate or leave it as default.

  • 01:10:00 You can test it.

  • 01:10:01 Okay, we need a prompt template file for Textual Inversion.

  • 01:10:06 When you watch this video, you will understand it better.

  • 01:10:10 So this text file is located inside the Stable Diffusion folder, inside textual inversion templates.

  • 01:10:16 In here, I'm going to edit the none template so that it contains [name].

  • 01:10:20 You need this, otherwise it won't work.

  • 01:10:22 This is the name of the Textual Inversion.

  • 01:10:24 It is basically going to use the unique token that it generates, so I'm going to pick

  • 01:10:30 none from here.

  • 01:10:32 My width and height are 512 pixels.

  • 01:10:35 Max number of steps.

  • 01:10:37 You can leave it like this, because it will generate pretty small files; but since we are already

  • 01:10:42 using a lot of space, I will delete my older DreamBooth checkpoints, inside the Stable Diffusion

  • 01:10:48 Web UI folder, inside models, inside Stable Diffusion, and inside the test2 folder.

  • 01:10:54 Okay, for selecting: click the first file, then go to the very bottom and, while pressing the left

  • 01:11:00 Shift key, click the last one; it will select all of them.

  • 01:11:04 Then, while holding the left Ctrl key, deselect the ones that you don't want to delete,

  • 01:11:10 then right click and hit delete.

  • 01:11:12 It will delete all these files and open a space for me.

  • 01:11:16 Okay, now we are ready.

  • 01:11:17 I want to save checkpoints for every 10 epochs.

  • 01:11:23 How many training images do I have?

  • 01:11:25 I have nine training images.

  • 01:11:26 Therefore, one epoch means nine steps.

  • 01:11:30 Five epochs means 45 steps.

  • 01:11:33 So I am going to save for every five epochs.
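The epoch-to-step arithmetic works out like this (with batch size one, one epoch is one pass over the training images):

```python
def ti_steps(epochs: int, num_images: int) -> int:
    # One Textual Inversion epoch = one pass over the training set,
    # so (with batch size 1) total steps = epochs * number of images.
    return epochs * num_images

print(ti_steps(1, 9))    # 9 steps per epoch with nine images
print(ti_steps(5, 9))    # 45 steps, the save interval chosen here
print(ti_steps(300, 9))  # 2700 steps, i.e. 300 epochs
```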

  • 01:11:36 I don't need this, and I will pick deterministic.

  • 01:11:40 This is the best option, and we are ready.

  • 01:11:43 Just hit train embedding.

  • 01:11:45 Okay, it has started training.

  • 01:11:48 By the way currently xformers is enabled.

  • 01:11:51 Therefore, I will disable it and restart again because there is a bug as I have just shown

  • 01:11:58 and it is preventing good training.

  • 01:12:01 Also in settings this is unchecked.

  • 01:12:04 Use cross attention optimizations but still it could be using it due to a bug.

  • 01:12:10 So best thing is just disabling the xformers and restarting the training.

  • 01:12:16 However, it looks like it is learning right now, I think.

  • 01:12:19 So probably there is no bug for this one, unlike DreamBooth.

  • 01:12:24 The loss rate is also pretty low and it is pretty fast.

  • 01:12:28 Okay, it already started learning my face.

  • 01:12:32 Not very good, but there is a resemblance, as you can see, and it is really fast; the

  • 01:12:38 number of steps it is taking goes really fast.

  • 01:12:41 This is how fast it is, you see.

  • 01:12:44 Training Textual Inversion: epochs, training speed, the iterations per second, and it is learning.

  • 01:12:52 However, which one will be best needs to be checked from the text to image tab with an x/y

  • 01:13:00 plot, and as you can see, it is learning.

  • 01:13:03 So all these samples are being saved inside.

  • 01:13:07 Let's go to the Stable Diffusion Web UI folder; inside here, textual inversion; inside here,

  • 01:13:13 you will see the training date; and inside here, the name of the Textual Inversion training;

  • 01:13:17 inside here, images; and these are the images, named with the epoch number.

  • 01:13:24 You can check them like this, or you can download them and check all of them.

  • 01:13:29 Okay, 2700 steps looks a little bit decent.

  • 01:13:34 It is actually equal to 300 epochs.

  • 01:13:38 Maybe it may get better over time or we may need to use more vector count, but since I

  • 01:13:44 am just trying to explain, I will use this and show you how you can use this checkpoint

  • 01:13:51 in your queries in your text to image tab.

  • 01:13:54 First I will cancel the training.

  • 01:13:55 This one also looks like a decent one.

  • 01:13:59 Hit interrupt: yeah.

  • 01:14:01 3240 also looking decent so it may get even better over time as we do more training, but

  • 01:14:08 I don't have too much time.

  • 01:14:10 Okay, so to be able to use these embeddings first, we need to copy the generated pt file

  • 01:14:16 which is the checkpoint.

  • 01:14:18 To do that, go to the Textual Inversion inside your main folder, go to the date that you

  • 01:14:23 did training, go to the training name, go to the embeddings, and in here you will see

  • 01:14:28 the dot pt files.

  • 01:14:30 Pick the checkpoints that you want to test right, click, copy, then go back to the main

  • 01:14:36 installation folder and in here you will see embeddings folder and paste them there like

  • 01:14:42 this so it is pasted now here.

  • 01:14:44 So to activate this Textual Inversion, we are going to type it like this.

  • 01:14:50 By the way, there is one very important thing when you do training, it will train based

  • 01:14:56 on the model selected here.

  • 01:14:57 Therefore this will be most compatible with this selected model and just hit generate

  • 01:15:04 and you see our face is generated trained subject.

  • 01:15:07 Now we can try stylizing.

  • 01:15:09 Okay, I did a simple test: awesome, intricate, 3d artstation, cinematic lighting, and generated

  • 01:15:16 batch size as eight and these are the generated images.

  • 01:15:20 So with better prompting it should be possible to get better results.

  • 01:15:25 You can do the same training on Protogen or any other custom model as well; just select it

  • 01:15:31 from here, make a new embedding, and do the training.

  • 01:15:35 The Textual Inversion training works pretty decent on custom models as well.

  • 01:15:40 However, custom models are not working very well with DreamBooth training.

  • 01:15:44 Okay, so our image generation for classification data set for SD 2.1 is completed.

  • 01:15:52 Now we will put them into the correct folder so all of the images are now generated inside

  • 01:15:59 this folder.

  • 01:16:00 How am I gonna do that?

  • 01:16:01 I will right click cut, then I will go to the workspace, right, click paste and then

  • 01:16:07 I will rename as class 768 version 2 like this.

  • 01:16:14 Then I will go to the DreamBooth tab, open my test model, load settings, go to the settings,

  • 01:16:21 and in here I will set the concept's classification data set directory to class 768 version 2,

  • 01:16:30 and now I have 50 images per instance.

  • 01:16:34 Okay, everything else is same.

  • 01:16:36 Just save settings and hit train and let's see if we will get out of memory error or

  • 01:16:41 not.

  • 01:16:42 So it is preprocessing class images.

  • 01:16:44 We can see it in the command line interface; okay, so it looks like Gradio, that is,

  • 01:16:51 our web app, has been killed.

  • 01:16:52 Therefore, we need to restart it.

  • 01:16:55 By the way, we also need to disable xformers, otherwise it won't work for training.

  • 01:16:59 So I am disabling xformers, saving, closing all of the terminals and starting a new instance

  • 01:17:06 of the web ui.

  • 01:17:08 Okay, restart is done.

  • 01:17:09 You see these are the command line arguments that I have used to start 2.1 version Web

  • 01:17:16 UI let's open it.

  • 01:17:18 Go to the DreamBooth select model, click load settings.

  • 01:17:22 Just verify settings quickly if they are correct or not.

  • 01:17:25 Okay, all looking good and let's click train to see how it works.

  • 01:17:30 Okay, preprocessing class.

  • 01:17:32 Let's also see the cmd window from here.

  • 01:17:35 Okay, you see it says nothing to generate, because we already have a sufficient number

  • 01:17:40 of classification images in our folder, 456, and we need 450 images.

  • 01:17:47 So it is caching right now.

  • 01:17:49 Okay, after caching it is killed once again and trying to relaunch.

  • 01:17:55 Okay, we got an out of memory error, so we need to enable some more of the memory optimizations,

  • 01:18:03 and I have already unchecked EMA.

  • 01:18:07 Therefore, looks like we need some more optimization.

  • 01:18:10 So I will pick fp16, but we are not using mixed precision so it is probably being ignored.

  • 01:18:17 What else we can do for more optimization?

  • 01:18:21 Gradient checkpointing yes, we can do this and let's save settings, load settings, and

  • 01:18:28 hit train once again.

  • 01:18:30 Okay, looks like I had to refresh load settings.

  • 01:18:34 Hit train okay, yeah, it says that change in precision detected.

  • 01:18:39 Please restart Web UI entirely to use new precision.

  • 01:18:43 All right, so we will restart it.

  • 01:18:46 Okay.

  • 01:18:47 Restart is done.

  • 01:18:48 Let's go to DreamBooth select model load settings and now gradient checkpointing enabled.

  • 01:18:54 Use 8-bit adam fp16, memory attention default, cache latents and let's see if we will get

  • 01:19:02 any error or not.

  • 01:19:03 Okay, training started this time.

  • 01:19:05 I hope we don't get any error during preview generation because it also uses GPU and we

  • 01:19:11 can see our GPU is being used 95 percent already.

  • 01:19:16 You can also see other utilization parameters here volume, container, and this is my other

  • 01:19:21 running pod and this is how much I have spent and how much I am spending.

  • 01:19:26 So now I will show you how to install ControlNet on SD 1.5 version.

  • 01:19:33 If you don't know what is control net and how to install and use it.

  • 01:19:37 I already have a great tutorial on my channel.

  • 01:19:40 So this is the extension that we are going to install.

  • 01:19:42 Copy the extension URL.

  • 01:19:45 You can also find this in the description.

  • 01:19:47 Go to the extension tabs, go to the install from URL, copy paste it, and click install.

  • 01:19:53 Then once it is installed, go to the installed tab, apply and restart UI.

  • 01:19:58 After we clicked it, unfortunately Gradio died again.

  • 01:20:01 So I will relaunch it and since I am not going to do any training, I am enabling xformers

  • 01:20:07 once again because it will speed up my image generation.

  • 01:20:11 Okay, after restart, go to the text to image tab and in the bottom you should see ControlNet

  • 01:20:16 like this.

  • 01:20:17 Now we need to download ControlNet model which is hosted on Hugging Face in here.

  • 01:20:24 Go to the files and versions and just download whichever model you want to use,

  • 01:20:29 because each model file is around five gigabytes.

  • 01:20:32 I'm going to show scribble as an example.

  • 01:20:35 All the others are exactly the same, and when you watch this video you will learn more about

  • 01:20:40 them.

  • 01:20:41 Okay, right-

  • 01:20:43 click the download button, copy the link path, and go to your RunPod.

  • 01:20:46 So these files will be put inside another folder.

  • 01:20:50 Go to the extensions, go to the sd Web UI control net, go to the models.

  • 01:20:55 We are going to put them inside here.

  • 01:20:58 So in here I will open a new launcher, open a terminal, type wget, copy paste the link, and

  • 01:21:05 you see it has started downloading the file from Hugging Face at an incredible speed.

  • 01:21:10 Meanwhile I will show you something else: how you can download your trained models onto your

  • 01:21:15 computer.

  • 01:21:17 So to download your trained DreamBooth model, go to the models, go to the Stable Diffusion,

  • 01:21:23 go to the training and let's say you want to download this ckpt.

  • 01:21:27 You can right click and download.

  • 01:21:30 Or you can use runpodctl as we already shown multiple times.

  • 01:21:34 But let's just show it once again: runpodctl send, then the checkpoint file's full name, not the

  • 01:21:41 directory, and it generates the download command like this; go to the folder where

  • 01:21:47 you want to download.

  • 01:21:48 So let's say I want to download here.

  • 01:21:50 Open cmd, right-

  • 01:21:51 click, paste, and hit enter, and that model file will be downloaded onto your computer

  • 01:21:57 with a great speed like this as you can see.

  • 01:22:00 It is downloading with 70 megabits per second and my maximum internet is 100 megabits per

  • 01:22:06 second.

  • 01:22:07 So this will of course totally depend on how other users are currently using the Pod network.

  • 01:22:14 Okay, meanwhile ControlNet file is downloaded and saved in the folder.

  • 01:22:19 Let's verify it.

  • 01:22:20 Go to the extensions sd web ui control net inside models.

  • 01:22:24 I see the pth file.

  • 01:22:26 Let's go back to the ControlNet and in here.

  • 01:22:29 When we refresh models we should see it.

  • 01:22:33 Yes, it is here, and there is also the pre-processor; then upload the file that you

  • 01:22:40 want to use into this canvas.

  • 01:22:41 I will do a scribble.

  • 01:22:43 I am going to use this file.

  • 01:22:46 Let's set the canvas width and height like this, and also set your target resolution.

  • 01:22:50 I will use the native resolution of the provided image, which is 866 by 684.

  • 01:22:58 Then type your prompt here and you can use the any model from here.

  • 01:23:03 Let's use Protogen model so my prompt is dragon, awesome, intricate, cinematic, artstation.

  • 01:23:08 Let's type some negatives: low, bad, worse.

  • 01:23:11 Hit generate.

  • 01:23:12 Okay, we didn't get the output because we didn't enable the ControlNet.

  • 01:23:16 Don't forget that.

  • 01:23:18 And don't forget to check scribble mode and invert colors; now this is the map it

  • 01:23:25 generated, and this is the output we got.

  • 01:23:28 So you can play with different prompts and different models and generate different images.

  • 01:23:35 It works pretty fast and pretty accurately.

  • 01:23:38 Just watch this video to learn more.

  • 01:23:39 Actually, I have another ControlNet video as well, which is based on the natively released

  • 01:23:45 scripts from the official author.

  • 01:23:47 You can also watch this video to learn even more about ControlNet.

  • 01:23:51 Our SD 2.1 version training is going on.

  • 01:23:55 However, it looks like there are some problems because generated image is not correct.

  • 01:24:00 Okay, I have done a lot of research and looks like there is no way to do SD 2.1 version

  • 01:24:07 768 pixels training with DreamBooth without using xformers.

  • 01:24:15 I wanted to avoid xformers during training because it reduces the quality of the training.

  • 01:24:21 However, 24 gigabytes VRAM is just not enough.

  • 01:24:24 So we need to downgrade the xformers version to 0.0.14. I already have an excellent tutorial

  • 01:24:33 video for that on Windows, so now I will show it on Linux on RunPod.

  • 01:24:40 Alternatively, you can go to the browse servers and in here you can deploy a RunPod with 48

  • 01:24:49 gigabytes VRAM or 40 gigabytes VRAM.

  • 01:24:52 It is up to you, but they cost more.

  • 01:24:55 Therefore, we will just downgrade the xformers version.

  • 01:25:00 Now, follow me very carefully to learn how to downgrade xformers on RunPod.io.

  • 01:25:07 First close all of the running kernels and terminals.

  • 01:25:11 Then inside python 3.10 folder, start a new terminal.

  • 01:25:16 First, we are going to run this command.

  • 01:25:19 pip uninstall torch torchvision.

  • 01:25:22 Paste it and hit yes, and hit yes.

  • 01:25:25 Okay, it is uninstalled.

  • 01:25:26 Then we are going to run pip uninstall torchaudio.

  • 01:25:31 Paste it.

  • 01:25:32 Okay, it is done.

  • 01:25:33 Then we are going to run pip uninstall xformers.

  • 01:25:36 Hit yes and it is done.

  • 01:25:39 Note that

  • 01:25:40 currently I am inside workspace, venv, lib, python3.10.

  • 01:25:44 The folder where you are currently located makes huge difference.

  • 01:25:49 Make sure that you are inside the same folder.

  • 01:25:51 You can also apply this to SD 1.5 version as well.

  • 01:25:56 It is just same thing.

  • 01:25:57 Then we are going to install torch and torchvision.

  • 01:25:59 Just copy this and paste it and hit enter.

  • 01:26:02 Okay, I got error.

  • 01:26:04 It says that there is no space left on the device, because we started with only five

  • 01:26:11 gigabytes of space for the runtime.

  • 01:26:14 Therefore, I will stop the pod like this.

  • 01:26:17 I will edit the disk space.

  • 01:26:21 Click here.

  • 01:26:22 More actions.

  • 01:26:23 Click edit pod, and in here increase the container disk size.

  • 01:26:27 Save it, run it, start it, and reconnect to Jupyter lab.

  • 01:26:32 Enter the same folder, venv, lib, python3.10, open a terminal, and make sure that you

  • 01:26:41 run all of the commands once again to be sure:

  • 01:26:44 pip uninstall torch torchvision, hitting yes if they got installed again, then pip uninstall torchaudio,

  • 01:26:52 then pip uninstall xformers.

  • 01:26:54 Okay, it is done, then we will install this one.

  • 01:26:58 As you can see, I have changed it because this is the one that is working.

  • 01:27:02 Copy paste and hit.

  • 01:27:03 Enter and it is going to install.

  • 01:27:06 So once the full version of 0.0.17 is released, it will work with DreamBooth.

  • 01:27:11 Currently this is a development version as you can see and it is installed.
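Gathered into one place, the uninstall/reinstall sequence above looks roughly like the sketch below. This is a hedged sketch, not the literal commands copied in the video: the exact torch/xformers package specs are assumptions (0.0.17.dev448 is the xformers version mentioned, but the wheel source the video pastes may differ).

```shell
# Run from inside /workspace/venv/lib/python3.10, as in the video.
# Clean out the old builds first:
pip uninstall -y torch torchvision
pip uninstall -y torchaudio
pip uninstall -y xformers

# Reinstall; the specs below are illustrative, not the pasted commands:
pip install torch torchvision
pip install --pre xformers==0.0.17.dev448
```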

  • 01:27:16 Now we are ready to run our web UI as usual and it should support DreamBooth training

  • 01:27:22 with xformers.

  • 01:27:23 Before starting, I am going to edit the command line arguments: --xformers,

  • 01:27:29 and I am going to add back the full precision flags: --no-half, --precision

  • 01:27:37 full, and --no-half-vae.

  • 01:27:41 Save it, run on a different port, shut down all of the terminals, start a new terminal,

  • 01:27:47 relaunch the Web UI like this.
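The relaunch described above uses standard Automatic1111 flags. A minimal sketch of the launch line, assuming you start launch.py directly; the port number is only a made-up example, not taken from the video:

```shell
# Full-precision launch with xformers, as described above.
# --port 3001 is a made-up example; use whichever free port you prefer.
python launch.py --xformers --no-half --precision full --no-half-vae --port 3001
```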

  • 01:27:49 Okay, so our application is now starting with the 0.0.17.dev448 version of xformers, and these

  • 01:27:59 are the torch, torchvision, diffusers, and other versions.

  • 01:28:03 Okay, it is started now.

  • 01:28:04 Time to test whether it is working correctly or not for SD 2.1 DreamBooth training: okay,

  • 01:28:12 I am loading my model, load settings and in here.

  • 01:28:16 Let me show you quickly the latest settings.

  • 01:28:19 So let's make the amount of time to pause between epochs zero.

  • 01:28:23 I will save for every 20 epochs.

  • 01:28:26 I am unchecking gradient checkpointing.

  • 01:28:28 I will keep the learning rate at default.

  • 01:28:30 Actually, let's try it.

  • 01:28:32 Okay, "photo of ohwx man by tomer hanuka" as the sanity prompt, and in the advanced tab: now this

  • 01:28:38 is important.

  • 01:28:39 I will use EMA, and for mixed precision I am going to use fp16.

  • 01:28:44 Some cards also support bf16, but to be safe use fp16.

  • 01:28:49 And when you hover your mouse, it also tells you that it is required when using xformers, and

  • 01:28:55 in here I am going to use xformers.

  • 01:28:56 This is important.

  • 01:28:57 Cache latents: okay, then go to the concepts tab.

  • 01:29:02 They are set.

  • 01:29:03 Everything is looking good and in saving, generate a ckpt file when saving during training

  • 01:29:09 and hit train.

  • 01:29:10 By the way, we should have clicked save settings before, but I think it is automatically saved.

  • 01:29:16 If it doesn't work right away, just click save settings then hit train.

  • 01:29:20 Okay, let's watch the terminal.

  • 01:29:22 I hope that we won't get any more

  • 01:29:24 out-of-memory errors.

  • 01:29:26 Okay, it is killed so I will test one more time.

  • 01:29:30 Refresh the Gradio, DreamBooth select model load settings.

  • 01:29:35 Now this time I will set gradient checkpointing, because it looks necessary.

  • 01:29:40 fp16, use EMA, and yes, everything is the same; let's try again with save settings.

  • 01:29:47 Train: okay, we got another error so this time I won't use EMA.

  • 01:29:51 Refresh the interface DreamBooth, model load settings uncheck gradient checkpointing and

  • 01:29:58 uncheck use EMA.

  • 01:29:59 This is significantly increasing the VRAM usage.

  • 01:30:02 Save settings, hit train. Okay.

  • 01:30:04 Finally, the training has started and now time to wait and see how well it is learning

  • 01:30:10 and training.

  • 01:30:12 The Gradio is still responsive.

  • 01:30:13 That is very good, and it is using this much GPU memory, so you can see how much the GPU memory

  • 01:30:20 usage increases when we check the EMA option.

  • 01:30:27 Meanwhile, SD 2.1 version training continues.

  • 01:30:29 I will explain what fine tuning with DreamBooth is.

  • 01:30:34 Okay, before I show how to do fine tuning:

  • 01:30:37 we got an error during the SD 2.1 version training at 400 steps, which is when

  • 01:30:44 it generates a ckpt from the 20th-epoch checkpoint.

  • 01:30:49 Therefore, I will restart the training with one parameter changed: load settings,

  • 01:30:56 go to the settings tab and enable gradient checkpointing.

  • 01:31:01 The rest is the same, so it should work fine this time, I think.

  • 01:31:07 Save settings, hit train. Okay, this time we didn't get any error.

  • 01:31:11 During the SD 2.1 version training we got a sample; yes, somewhat similar.

  • 01:31:17 This is the first one at the 20th epoch and we got our sanity prompt as well.

  • 01:31:21 This is the loss curve, which is very erratic as you can see, and this is the VRAM usage

  • 01:31:27 like this.

  • 01:31:28 Now I can start showing you fine tuning.

  • 01:31:30 I have opened my 1.5 version RunPod, so what is different about fine tuning?

  • 01:31:38 In fine tuning we are not going to use classification images, and we are going to

  • 01:31:44 use file words.

  • 01:31:45 Fine tuning is basically using a lot of good images with proper captions and not using

  • 01:31:53 any classification images.

  • 01:31:54 The rest is the same.

  • 01:31:56 So every one of the keywords, every one of the tokens in the captions of the images will

  • 01:32:02 be trained and they will become like the images that you use for fine tuning.

  • 01:32:09 First of all, we need to process image files and add captions to them.

  • 01:32:15 So go to the training tab, go to the preprocess images, set the source directory.

  • 01:32:20 I don't have a good dataset for fine tuning (you need a lot of images), so I will

  • 01:32:24 use my own pictures that I used for training, and set a destination like

  • 01:32:32 training captioned, and in here use BLIP for caption, and check whether those images are 512 by

  • 01:32:41 512 pixels.

  • 01:32:42 If you are going to fine tune the SD 2.1 version at 768 pixels, then you need to change this

  • 01:32:49 resolution as well.

  • 01:32:50 You can also crop them with auto focal point crop, but manually cropping and preparing

  • 01:32:57 them is better; then click preprocess.

  • 01:33:00 The first time you run it,

  • 01:33:01 it will download the BLIP model from the internet.

  • 01:33:04 Okay, preprocessing has been completed now.

  • 01:33:07 Training captioned folder is generated.

  • 01:33:10 Now you see there are txt files named the same as the image files.

  • 01:33:16 When you open them, you will see this captioning.
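As a concrete picture of what the preprocessing step produces, the output folder might look like this; the filenames and the caption text here are hypothetical examples, not the actual files from the video:

```
/workspace/training_captioned/
├── 00000-0-photo.png
├── 00000-0-photo.txt   (contains e.g. "a man with a beard wearing a blue shirt")
├── 00001-0-photo.png
└── 00001-0-photo.txt
```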

  • 01:33:19 So what does this mean?

  • 01:33:21 In fine tuning, all of these words, these tokens, will be trained on the image that

  • 01:33:30 has the same name.

  • 01:33:32 So all of these words will be pulled towards this image.

  • 01:33:37 This is what fine tuning is.

  • 01:33:39 Let's say you want to improve castle images, then you should have good castle images and

  • 01:33:44 inside their descriptions, you should have the word castle.

  • 01:33:48 And if you want to associate those pictures with other words such as beautiful, intricate,

  • 01:33:53 high quality, then you should also put them.

  • 01:33:56 So put here whatever words you want to improve in your model, related to the

  • 01:34:03 picture they are associated with. Then, once you have prepared good captions and images

  • 01:34:09 inside your folder, copy the path of the new folder, go back to your DreamBooth tab and

  • 01:34:15 set it up like this: in concepts, set the data directory to workspace training captioned.

  • 01:34:21 Now this is important.

  • 01:34:23 In the prompt just type [filewords] and nothing else.

  • 01:34:28 This means that whenever it is training that particular image, it will load whatever is

  • 01:34:36 written inside here and replace instance prompt with it.

  • 01:34:40 That's it.

  • 01:34:41 So this will be equal to this prompt for this particular image that is going to be trained.

  • 01:34:49 For the class prompt, we are not using any classification images or class prompt at all.

  • 01:34:53 In the sample prompt, you can use the [filewords] to see what kind of images it is generating

  • 01:34:59 and make sure that class images per instance is zero.

  • 01:35:03 Because we don't want to try to preserve the previous context of the model; we want its underlying

  • 01:35:10 context, its latent space, to be improved.

  • 01:35:14 And that's it; everything else is the same.

  • 01:35:16 So for fine tuning you need a lot of good images, good quality images with good captions.

  • 01:35:23 Those captions will be improved.

  • 01:35:25 It will also change the Unet of the model, so it will become overall better and overall

  • 01:35:30 cooked, we can say.

  • 01:35:33 Because if you train it on far fewer images than it was originally trained on, it will lose a lot

  • 01:35:38 of the contextual knowledge it has.

  • 01:35:42 Therefore, these cooked custom models are not good for training your face on, because

  • 01:35:47 they don't have as much information as the 1.5 pruned ckpt has.

  • 01:35:52 For example, this model was trained on 5 billion images, as far as I know.

  • 01:35:59 However, those custom models may be trained on maybe 1,000 images, maybe 10,000 images.

  • 01:36:06 So their Unet has become like those 10,000 images instead of being trained on 5 billion

  • 01:36:14 images.

  • 01:36:15 That is why they are so good, but they have much less knowledge in their underlying

  • 01:36:21 context in their latent space.

  • 01:36:23 So this is basically how fine tuning is done.

  • 01:36:27 If you want to do exactly the same as the official Stable Diffusion training,

  • 01:36:33 you can also disable text encoder training by setting this parameter to zero.

  • 01:36:40 This way, the tokens won't be improved.

  • 01:36:44 Only Unet will be improved.

  • 01:36:46 However, you don't want that for fine tuning.

  • 01:36:49 That is more for when you use hundreds of thousands of images and train your model from scratch.

  • 01:36:57 So you should perhaps keep it at one and train the Unet as well.

  • 01:37:01 So you will train both text encoder and the Unet and improve all of those keywords together.

  • 01:37:09 Hopefully I will make another very technical video about how training works, what is Unet,

  • 01:37:16 what is text encoder, how they are being changed during training, and it will explain a lot

  • 01:37:22 of the questions that are not very well answered in the community.

  • 01:37:27 So stay subscribed.

  • 01:37:29 Turn on notifications so you don't miss it.

  • 01:37:31 So let's check out our 2.1 version training.

  • 01:37:34 Okay, our sanity prompt already looks like it has lost its stylizing ability, and the sample

  • 01:37:41 is also not looking very good.

  • 01:37:43 However, I have seen that it was learning, so let's open the directory.

  • 01:37:47 Okay, inside DreamBooth, inside samples, let's look at each one of the samples.

  • 01:37:53 So this is the 20 epoch.

  • 01:37:54 Yes, it has a resemblance.

  • 01:37:56 It is not very good.

  • 01:37:57 This is the 40 epoch.

  • 01:37:59 Very minor resemblance.

  • 01:38:01 Let's check out the sanity prompt.

  • 01:38:03 The sanity prompt is much better.

  • 01:38:05 So this is somewhat similar to me, but stylized in Tomer Hanuka style.

  • 01:38:10 So the sanity prompt of the 60 epoch is not good at all.

  • 01:38:14 It lost its stylizing.

  • 01:38:17 The sample is also not very related, but this is SD 2.1 so it is harder to train and obtain

  • 01:38:23 good images.

  • 01:38:24 So you see this is the 80 epoch.

  • 01:38:26 This almost looks like me.

  • 01:38:28 Let me show you for comparison.

  • 01:38:31 At 80 epochs, the 2.1 version has started learning my face very well.

  • 01:38:37 Let's check out the sanity prompt.

  • 01:38:39 However, the sanity prompt also lost its ability to stylize, so our learning rate may be too

  • 01:38:46 high.

  • 01:38:47 Perhaps we should try half of it.

  • 01:38:49 Depending on your training dataset, the learning rate may change, and the number of steps and

  • 01:38:55 epochs that you need to train may change.

  • 01:38:58 So it is up to you to do multiple trainings and compare how well they are working with

  • 01:39:05 x/y/z plots as I have shown.

  • 01:39:07 However, the training is working very well.

  • 01:39:10 It is learning the subject very well, so we managed to make it work very well for the SD 2.1

  • 01:39:17 768 version of the model.

  • 01:39:21 Let me show you the parameters once again.

  • 01:39:24 So I will slowly scroll down and you will be able to see all of the settings.

  • 01:39:30 This totally depends on your learning rate and how many training images you

  • 01:39:35 use.

  • 01:39:36 You should also save multiple checkpoints during training and compare them: batch size

  • 01:39:41 one and gradient accumulation one.

  • 01:39:42 If you increase this, it will increase significantly your VRAM usage.

  • 01:39:47 Also, we can't say bigger batch size is better.

  • 01:39:50 It's a debated topic.

  • 01:39:52 Mini batches versus full batches.

  • 01:39:54 These two are checked.

  • 01:39:55 Otherwise, we get a VRAM error on 24 gigabytes.

  • 01:39:58 This is my current learning rate.

  • 01:40:01 This may be fast, so you may try half of it or even lower.

  • 01:40:05 This is the resolution.

  • 01:40:07 This is the sanity prompt to see how well it stylized.

  • 01:40:10 So don't check EMA, because you will get a VRAM error even when using xformers.

  • 01:40:16 Use 8-bit Adam; use fp16 to be sure that it is supported on your graphics card.

  • 01:40:22 Use xformers, cache latents, train Unet, train text encoder and these other things are just

  • 01:40:29 default.

  • 01:40:30 Okay, now I will show you how to install and run Kohya LoRA training (Kohya GUI) on RunPod.

  • 01:40:37 To do that we are going to use the kohya ss linux branch:

  • 01:40:41 the kohya ss linux fork of the official repository of kohya ss.

  • 01:40:48 This is modified to run on Linux.

  • 01:40:51 So first of all, we are going to clone the repository into our RunPod.

  • 01:40:56 So this is my 1.5 RunPod.

  • 01:40:58 I am inside workspace.

  • 01:41:00 I have closed everything.

  • 01:41:02 Open a new terminal, copy paste the git clone command.

  • 01:41:05 It will clone into the kohya ss linux folder; then, to move into it, type cd

  • 01:41:12 ko and press tab.

  • 01:41:14 Hit enter, and now I am inside kohya ss linux.

  • 01:41:18 Then we will generate the virtual environment folder with this command.

  • 01:41:23 Copy it, hit enter inside this folder.

  • 01:41:26 Okay, it is generated.

  • 01:41:28 Let's also move it in here.

  • 01:41:30 Now we will run the next command, which activates that virtual

  • 01:41:36 environment.

  • 01:41:37 It is the source venv command: copy and paste it.

  • 01:41:40 Hit

  • 01:41:41 enter. Now you see venv here.

  • 01:41:44 That means that we are now running inside the newly generated virtual

  • 01:41:50 environment.

  • 01:41:51 Next, we are going to install requirements.

  • 01:41:53 This is only one time necessary.

  • 01:41:56 The requirements file is located inside here and currently we are also inside that folder

  • 01:42:01 so it should work.

  • 01:42:02 The requirements installation may take some time.

  • 01:42:06 These installations will not affect your other installations because everything being installed

  • 01:42:12 here will be only installed inside this folder.

  • 01:42:16 Okay, we got an error that says no space left on the drive, so I will just

  • 01:42:21 close the RunPod with stop pod, and I will increase the container disk size

  • 01:42:26 to 10 gigabytes.

  • 01:42:28 To do that, click here, edit pod and run it once again.

  • 01:42:31 Start then click connect.

  • 01:42:33 Connect to Jupyter lab.

  • 01:42:35 Okay, it is still being launched, so be patient.

  • 01:42:40 Okay, the notebook started once again, so I will just delete the venv folder: rm -r venv.

  • 01:42:49 So I will start from beginning.

  • 01:42:51 python -m venv venv, then the source activate command, then the install command.

  • 01:42:58 It will install the requirements.

  • 01:43:00 Okay, all requirements have been installed.
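Collected into one place, the install steps above look roughly like the sketch below. The repository URL and cloned folder name are placeholders (use the git clone command linked in the video description); this is a hedged sketch, not the literal commands:

```shell
cd /workspace
git clone <kohya-ss-linux-fork-url>   # use the clone URL from the description
cd kohya*                             # the cloned folder (tab completion works too)
python -m venv venv                   # create the virtual environment
source venv/bin/activate              # activate it; the prompt now shows (venv)
pip install -r requirements.txt       # one-time requirements install
```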

  • 01:43:03 As the author notes here, it requires Python 3.10 and doesn't work on 3.11. Since the RunPod

  • 01:43:12 runs 3.10.9 for Stable Diffusion, it is just fine.

  • 01:43:18 Then we will set accelerate config.

  • 01:43:20 I am copying this pasting in here.

  • 01:43:23 We are still inside that venv folder.

  • 01:43:27 So now it will ask us a bunch of questions.

  • 01:43:29 Select "This machine", hit enter; select no distributed training, hit enter; type no to this question,

  • 01:43:36 then type no to this question as well.

  • 01:43:39 And type no to this question as well.

  • 01:43:42 And type all for this question.

  • 01:43:45 And do you wish to use fp16 or bf16?

  • 01:43:49 select fp16.

  • 01:43:51 It will speed up your training and also use less VRAM.

  • 01:43:56 Okay and everything is ready.
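For reference, the answers chosen in the dialog above were, in order, as sketched below; the exact prompt wording varies between accelerate versions, so the comments paraphrase the questions rather than quote them:

```shell
accelerate config
# Compute environment         -> This machine
# Distributed training type   -> No distributed training
# Next three yes/no questions -> NO, NO, NO
# Which GPU(s) to use         -> all
# Mixed precision             -> fp16
```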

  • 01:43:58 We are currently activated with the source command, so we don't need to run this again.

  • 01:44:03 We will just run this command and it should start our GUI.

  • 01:44:08 Okay, it is running on localhost only, so we need to run it with a shared link

  • 01:44:15 to enable a public Gradio link, as we are using in the Web UI.

  • 01:44:19 Open the kohya gui .py file, go to the interface launch call here and add share=True, save

  • 01:44:28 it and start it once again.

  • 01:44:30 So open a new terminal first.

  • 01:44:33 Activate the venv like this and just run this command and now it has given us a Gradio link.
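The share=True edit described above corresponds to Gradio's standard launch option. Below is a minimal self-contained sketch; the greet demo is made up for illustration, and the real call lives inside the Kohya GUI script and will look different:

```python
import gradio as gr

def greet(name):
    # Placeholder function; the real app builds the full Kohya training UI.
    return f"Hello {name}"

demo = gr.Interface(fn=greet, inputs="text", outputs="text")
# share=True is the added argument: instead of serving only on localhost,
# Gradio also creates a temporary public *.gradio.live link.
demo.launch(share=True)
```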

  • 01:44:40 When we run it, the famous Kohya GUI is loaded and ready to do training.

  • 01:44:46 The training with it is another topic and I won't cover it in this tutorial.

  • 01:44:50 Now I will stop my running RunPods, and when I stop them nothing will be lost.

  • 01:44:55 They will just remain as they are.

  • 01:44:57 I can also start them without any GPU.

  • 01:45:00 So from here you can select zero GPUs and start your RunPod to back up or download

  • 01:45:07 your data without using any GPUs.

  • 01:45:09 So when you run them on CPU only, you pay the disk cost plus 0.16 dollars per hour.

  • 01:45:18 So it is still costing something.

  • 01:45:20 I think it is costing half of the original GPU price.

  • 01:45:24 However, sometimes you may not get a GPU.

  • 01:45:27 Sometimes all of the GPUs may be full on RunPod, so you will have to run it without

  • 01:45:33 a GPU.

  • 01:45:35 So this is how you start it without using any GPU and there is also terminate.

  • 01:45:41 When you hit terminate it will delete your RunPod permanently.

  • 01:45:45 I already said this but I am saying it again.

  • 01:45:49 So do not hit the terminate button unless you are 100% sure, because it will delete everything

  • 01:45:55 on this RunPod and until you terminate and delete your RunPod it will continue using

  • 01:46:02 your credits.

  • 01:46:04 Currently I have two RunPods not running and it is using 0.056 dollars per hour.

  • 01:46:14 So this is the cost of keeping these two RunPods on my account.

  • 01:46:20 And when I delete them you will see this will get decreased.

  • 01:46:23 Let's delete first one with terminate pod.

  • 01:46:26 Okay and now this should get decreased.

  • 01:46:29 Let's go to the my pods.

  • 01:46:30 Let's refresh.

  • 01:46:32 Okay now you see currently it is decreasing.

  • 01:46:35 0.028 dollars per hour.

  • 01:46:39 This is charged per minute, by the way, not per hour. I will also delete this and it

  • 01:46:45 will become zero.

  • 01:46:47 And now my credits are remaining as they are until I start another pod.

  • 01:46:52 There is one final thing that I want to show you: the cloud sync button here.

  • 01:46:57 So with cloud sync you can synchronize your data in your server to these cloud services

  • 01:47:04 and there is a great tutorial on the RunPod blog.

  • 01:47:08 I will share this link in the description as well, so you can read here and set up your

  • 01:47:15 cloud storage and set up synchronization with your RunPod, and everything generated in your

  • 01:47:22 RunPod will be synchronized with your cloud.

  • 01:47:25 Also you can use the runpodctl that I have shown multiple times to download your data

  • 01:47:32 or to upload your data.
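As a reminder of the runpodctl flow used earlier in the video, a sketch of the send/receive pair is below; the filename and one-time code are made-up examples:

```shell
# On the pod (sender) - prints a one-time code:
runpodctl send my_backup.zip

# On your local machine (receiver) - paste the printed code:
runpodctl receive 8338-example-made-up-code
```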

  • 01:47:34 It is up to you how you want to use it.

  • 01:47:36 I think I have covered everything that I have mentioned in the beginning.

  • 01:47:41 I hope you have enjoyed.

  • 01:47:42 Please like, subscribe and leave a comment on this tutorial. Also join our Discord

  • 01:47:48 channel and ask any questions that you can't solve.

  • 01:47:52 Also, please support us on Patreon.

  • 01:47:55 It is really important.

  • 01:47:56 The Patreon link and the Discord link will be in the comments and description.

  • 01:48:01 All of the links we have used in this video will be in the description.

  • 01:48:05 You can also find our Patreon page on the About tab of our YouTube channel.

  • 01:48:06 So far we have 26 patrons.

  • 01:48:07 I thank them very much.

  • 01:48:08 I hope you also become a patron.

  • 01:48:09 Hopefully see you in another awesome video!

  • 01:48:10 Thank you so much.
