Ultimate RunPod Tutorial For Stable Diffusion Automatic1111 Data Transfers Extensions CivitAI
Full tutorial link > https://www.youtube.com/watch?v=QN1vdGhjcRc
Sign up RunPod: https://bit.ly/RunPodIO. This is the Grand Master tutorial for running Stable Diffusion via Web UI on RunPod cloud services. If I have been of assistance to you and you would like to show your support for my work, please consider becoming a patron on 🥰 https://www.patreon.com/SECourses
SECourses Discord To Get Full Support
https://discord.com/servers/software-engineering-courses-secourses-772774097734074388
#RunPod discord: https://discord.gg/pJ3P2DbUUq
Colab Tutorial 1: https://youtu.be/mnCY8uM7E50
Colab Tutorial 2: https://youtu.be/kIyqAdd_i10
Automatic1111 Command Line: https://bit.ly/StartArguments
Best DreamBooth Tutorial: https://youtu.be/Bdl-jWR3Ukc
DreamBooth second tutorial: https://youtu.be/KwxNcGhHuLY
RunPodCTL GitHub: https://github.com/runpod/runpodctl
Pre-trained models repo link: https://huggingface.co/lllyasviel/ControlNet
Web UI install tutorial on PC: https://youtu.be/AZg6vzWHOTA
How To Use Different Models Automatic1111: https://youtu.be/aAyvsX-EpG4
Textual Inversion Training Tutorial: https://youtu.be/dNOpWt-epdQ
ControlNet Tutorial Video: https://youtu.be/vhqqmkTBMlU
ControlNet extension: http://bit.ly/3IxBYc6
ControlNet Model Files: https://bit.ly/CTRLNETModels
ControlNet Native Script: https://youtu.be/YJebdQ30UZQ
Upgrade xformers Commands: https://bit.ly/UPxformers
Kohya GUI: http://bit.ly/3ICvsB7
Cloud sync: http://bit.ly/40Zf44C
00:00:00 Intro
00:01:32 How to register on RunPod.io and add credits
00:02:34 How to deploy a pod - start a server for Stable Diffusion 1.5 Automatic1111 Web UI
00:03:30 How to select deployment template for Stable Diffusion Web UI in RunPod
00:04:00 Explanation of temporary disk and persistent volume
00:04:44 Explanation of credit spending per minute for storage usage in RunPod
00:08:10 My Pods section
00:08:30 Connect to the started Pod
00:08:41 Start SD 2.1 Version Web UI Pod
00:09:25 Why pick a lesser used Pod
00:10:53 Bidding system of RunPod.io
00:13:11 Where and how to see scheduled maintenance
00:13:31 Stop Pod vs Terminate (delete) Pod
00:14:24 Where to see logs to debug and understand errors
00:15:08 Connect your Pod via a Jupyter Lab interface
00:15:16 How to change Automatic1111 Web UI command line arguments and restart it
00:17:54 First prompt in RunPod Automatic1111 Web UI
00:18:45 Where to see logs, find error logs, debug them
00:19:35 How to install DreamBooth extension of Automatic1111 Web UI
00:20:58 Where the generated images are saved
00:21:10 How to download generated images
00:21:38 How to update installed extensions
00:21:55 How to notice port error and fix it
00:23:04 How to install runpodctl latest version to transfer files very quickly between Pods and PC
00:23:55 How to download a ckpt file very fast from Hugging Face repo
00:25:10 Start DreamBooth training with best model and settings
00:30:41 How to upload your training dataset images
00:34:15 How to upload thousands of images (big data) from your computer to RunPod via runpodctl
00:34:28 How to install RunPodCTL on your Windows computer
00:35:06 How to send files from your PC to RunPod via runpodctl
00:39:38 Where to find generated checkpoints and sample images during DreamBooth training
00:41:30 How to delete non-empty folder
00:41:51 How xformers breaks training even when not selected, and how to fix it
00:42:29 How to download a folder from RunPod to your PC via runpodctl very quickly
00:43:09 How to add runpodctl to environment path to use from every folder
00:47:25 How to continue/resume DreamBooth training
00:48:20 Test all training checkpoints with x/y plot to find best one
00:52:09 How to set correct command line arguments for SD 2.1
00:52:55 Where to see currently spent credits per hour
00:54:05 How to do DreamBooth training on SD 2.1 - 768 pixel version with best possible settings
00:57:42 How to generate classification images manually very fast
01:00:26 Why SD 1.5 is superior to 2.1
01:04:34 How to download custom models very fast from CivitAI
01:08:45 How to do Textual Inversion training with some optimal settings
01:13:00 Where Textual Inversion training samples and checkpoints are saved
01:14:07 How to use Textual Inversion checkpoints
01:15:55 Move generated SD 2.1 classification images into correct folder
01:19:26 How to install and run ControlNet extension on RunPod IO
01:21:11 How to download your trained model files (ckpt) into your PC very fast via runpodctl
01:25:00 How to upgrade xformers to 0.0.17 for DreamBooth SD 2.1 training
01:26:04 How to expand runtime disk space
01:27:21 Best settings for SD 2.1 with xformers
01:31:30 What is Stable Diffusion fine tuning and how to do fine tuning with DreamBooth
01:39:20 Best settings quick recap for SD 2.1 for 24 GB VRAM
01:40:34 How to install and run Kohya GUI on RunPod
01:44:16 How to enable public Gradio link for Kohya GUI
01:44:52 How to start RunPods without GPU
01:46:53 Cloud syncing your Pod data / content
Thumbnail: Freepik, macrovector
-
00:00:00 Greetings everyone.
-
00:00:01 In this video, I am going to show how to use the Automatic1111 Web UI for Stable Diffusion
-
00:00:07 tasks on RunPod.io as if you were using it on your own computer.
-
00:00:11 I will cover many topics such as how to upload and download files quickly, how to delete
-
00:00:17 directories, how to install and run extensions, how to quickly download and use custom models,
-
00:00:23 how to do DreamBooth training on Stable Diffusion 1.5 or 2.1 versions, how to do fine tuning
-
00:00:30 via DreamBooth extension, how to do Textual Inversion training.
-
00:00:34 I will also explain how their pricing system works, how you can use bidding, how you can
-
00:00:39 transfer files from Pod to Pod or from computer to Pod and vice versa, and how you can install
-
00:00:45 other custom scripts such as the famous Kohya graphical user interface.
-
00:00:50 I will also demonstrate how you can use new famous ControlNet on RunPod.io.
-
00:00:56 So why RunPod.io?
-
00:00:58 Because their system charges you per minute and they have great Discord support.
-
00:01:03 They are also easier to use with the tools they have.
-
00:01:07 But still, if you are interested in free cloud services for Stable Diffusion, I have two
-
00:01:12 great tutorials for Google Colab.
-
00:01:14 The first one is this one and the second one is this one.
-
00:01:18 And if you don't know how to use the Automatic1111 Web UI, or what Stable Diffusion
-
00:01:23 and the Automatic1111 Web UI are, I have a great tutorial series for them.
-
00:01:27 For example, you can begin by watching this video, and you can check out the other videos in
-
00:01:31 this playlist.
-
00:01:32 So let's begin the Grandmaster RunPod.io tutorial by signing up for a new account.
-
00:01:38 Click the sign up button.
-
00:01:39 For sign up I will use my Google account.
-
00:01:42 You can also enter your email and password if you wish.
-
00:01:45 Choose your account to sign up.
-
00:01:47 Click I have read and agreed RunPod Terms and Services.
-
00:01:51 Click Continue.
-
00:01:52 And yes, we are ready to start.
-
00:01:54 First of all, you need to add some credits to start using the Pods.
-
00:01:59 Click your balance from here as you can see in the right top menu, then it will show your
-
00:02:04 available balance.
-
00:02:06 From here you can pay with a card.
-
00:02:08 You can change the amount that you want to charge.
-
00:02:10 To have automatic payments you can add a card.
-
00:02:13 They also allow you to pay with crypto.
-
00:02:16 Just click this icon.
-
00:02:17 They also show recent transactions, recent charges, and everything is very transparent.
-
00:02:23 OK, now I have logged in to my account where I have my credits.
-
00:02:28 Now we can start using our Pods.
-
00:02:30 To do that, go to the browse servers tab in here and in here you will see the available
-
00:02:37 servers.
-
00:02:38 If you are going to do training, then I suggest you get a server with a minimum of 24 gigabytes
-
00:02:45 of VRAM.
-
00:02:46 Because currently the latest officially released xformers does not work very well for training.
-
00:02:53 They have a nightly version that works well, but for training we won't use xformers.
-
00:03:00 And if you are not going to use xformers, then you should get a server with a minimum of 24
-
00:03:06 gigabytes of VRAM.
-
00:03:07 I find that the RTX A5000 is a very decent GPU at a lower price.
-
00:03:14 As you can see, it is only 0.32 dollars per hour.
-
00:03:19 So I am going to deploy RTX A5000 GPU.
-
00:03:23 When you click the deploy icon, this interface will appear to you.
-
00:03:28 So in this interface, you should select your template.
-
00:03:31 There are many templates.
-
00:03:33 When you type Stable Diffusion, you see there are two very popular templates for Stable
-
00:03:39 Diffusion.
-
00:03:40 RunPod Stable Diffusion 1.5 and RunPod Stable Diffusion 2.1.
-
00:03:43 I will start both of them and I will do training on both simultaneously.
-
00:03:49 So let's begin with RunPod Stable Diffusion 1.5 as a template.
-
00:03:54 So it will also download the official 1.5 version when it starts.
-
00:03:59 In here it shows us the other features.
-
00:04:01 They are very decent.
-
00:04:03 The temporary disk is the disk where the operating system will run.
-
00:04:08 You don't need to increase this.
-
00:04:10 And the persistent volume.
-
00:04:11 Now this is really important.
-
00:04:13 The persistent volume will remain as long as you don't delete your Pod.
-
00:04:20 So when you close your Pod, it will remain as it is.
-
00:04:23 It is like your hard drive.
-
00:04:24 It is persistent.
-
00:04:26 Everything you have generated or downloaded will remain there.
-
00:04:30 So this should be a sufficient amount of disk space based on your needs.
-
00:04:36 I am going to set it as 100 and when you set it, it will increase your minute credit spending.
-
00:04:44 So when you hover your mouse over this icon, it shows that 0.10 per gigabyte per month
-
00:04:52 for total disk on running Pods, 0.20 per gigabyte per month for volumes on exited Pods.
-
00:05:00 I know that this may sound confusing in the beginning, so I have prepared an example
-
00:05:06 for you which I will explain step by step.
-
00:05:09 So we have 105 gigabytes while running.
-
00:05:13 Why?
-
00:05:14 Persistent volume is 100 gigabytes and temporary disk is 5 gigabytes.
-
00:05:17 So while running, we are going to spend like this.
-
00:05:21 Let's say our Pod did run 75 minutes.
-
00:05:25 So 105 multiplied by 0.1, which is the price per gigabyte per month.
-
00:05:33 How many days are there in a month?
-
00:05:36 30 days.
-
00:05:37 So we divide by 30 days.
-
00:05:40 How many hours are there in a day?
-
00:05:42 24 hours.
-
00:05:43 So we divide by 24 hours.
-
00:05:46 How many minutes are there in an hour?
-
00:05:48 There are 60 minutes.
-
00:05:49 So this is the per-minute price of running, and since we are running for 75 minutes, it is
-
00:05:56 going to take a total of 0.018 dollars from our credit.
-
00:06:02 You can also copy this.
-
00:06:04 Open your calculator by typing calculator in your search bar, paste it and hit enter
-
00:06:09 and you will get the result like this as you can see.
-
00:06:12 Below, I am giving an example of a Pod when it is not running.
-
00:06:18 When the Pod is not running, we are going to use 100 gigabytes persistent volume and
-
00:06:25 let's say our Pod remained stopped for two days.
-
00:06:30 So when the Pod is not running, the price is 0.20 dollars per gigabyte per month.
-
00:06:37 So since we have 100 gigabytes, it is 100 multiplied by 0.2. Then let's delete this to avoid
-
00:06:45 more confusion. In a month
-
00:06:48 we have 30 days and we are going to use two days.
-
00:06:52 So this will be our spending.
-
00:06:54 So you can also open the calculator and copy-paste it, hit enter and you will get the price.
-
00:07:01 So this part is the price of one day offline for your Pod with 100 gigabytes.
-
00:07:09 And since it will be offline for two days, this is the credit that we are going to use.
-
00:07:13 The very important thing is that these credits will be deducted from your account per minute.
-
00:07:20 So if you keep using RunPod.io service for 10 minutes, you will be charged for 10 minutes.
-
00:07:26 So if it remains offline for 10 minutes, then you will be charged for 10 minutes.
-
00:07:31 It is not taking your credits per day, per hour, or per month.
-
00:07:37 It is using your credits every minute.
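The arithmetic above can be sketched in a few lines; the rates and sizes are the ones quoted in the video, and the helper function name is mine, not a RunPod API:

```python
# Per-minute disk billing on RunPod, using the rates quoted in the video.
# disk_cost() is an illustrative helper, not part of any RunPod tool.

def disk_cost(gb, rate_per_gb_month, minutes):
    """Cost in dollars of `gb` gigabytes billed per minute at a monthly rate."""
    per_minute = gb * rate_per_gb_month / 30 / 24 / 60  # month -> day -> hour -> minute
    return per_minute * minutes

# 105 GB (100 GB persistent volume + 5 GB temporary disk) running for
# 75 minutes at $0.10 per GB per month:
print(round(disk_cost(105, 0.10, 75), 3))            # ~0.018 dollars

# 100 GB persistent volume stopped for 2 days at $0.20 per GB per month:
print(round(disk_cost(100, 0.20, 2 * 24 * 60), 3))   # ~1.333 dollars
```

The same formula reproduces both examples in the video: the per-minute rate is just the monthly rate divided down through days, hours, and minutes.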
-
00:07:40 When you hover your mouse over encrypt volume, you will see the message.
-
00:07:44 Encrypted volumes provide better data security, but will incur a performance penalty and cannot
-
00:07:49 be resized later.
-
00:07:50 So unless you need this, don't check this box.
-
00:07:54 Start Jupyter Notebook.
-
00:07:55 This will make your life much easier.
-
00:07:58 And this is the price per hour for our GPU.
-
00:08:02 So these volume prices will be added to this price as well.
-
00:08:07 After you click the deploy button, you will see an interface like this.
-
00:08:11 You can go to the My Pods section and you will see that the on-demand community cloud Pod is being
-
00:08:17 prepared.
-
00:08:18 When I click in here, you see it is showing me the messages of the Pod that is being prepared,
-
00:08:27 what is happening on the Pod.
-
00:08:28 And once it becomes ready, we will see connect button in here.
-
00:08:33 So it is initializing the Pod with the necessary installation and the Pod is now ready and
-
00:08:39 it is running.
-
00:08:40 Now I will start SD 2.1 version Pod simultaneously.
-
00:08:45 To do that I am clicking browse servers and when you open browse servers tab, you will
-
00:08:49 see in the right tab how many credits you are spending right now.
-
00:08:54 Because currently my other Pod is running, as you can see in My Pods tab.
-
00:09:00 So let's return back the browse servers and in here there are several options.
-
00:09:04 So you see there are one GPU Pods, two GPU Pods, large Pods, four GPU or x large Pods,
-
00:09:12 eight GPUs.
-
00:09:13 So if you need multiple GPUs, then you can filter them with this.
-
00:09:16 Also in each Pod, you will see their location, their available upload and download speeds,
-
00:09:22 their available disks and other things.
-
00:09:25 Choosing a less-used server is better, because if your previous server becomes fully used, then you
-
00:09:34 won't be able to get a GPU on that server.
-
00:09:37 So what happens then? To use your existing files, you need to compose a new Pod and transfer
-
00:09:45 your files.
-
00:09:46 So availability is really important when choosing your Pod.
-
00:09:50 If you choose a highly preferred server, then you will have less chance of getting it, and it will
-
00:09:57 make things harder for you.
-
00:09:59 So based on this fact, you should choose your Pod.
-
00:10:03 So for the SD 2.1 version, I am going to pick another RTX A5000.
-
00:10:09 When you click more RTX A5000, it displays you other locations as well.
-
00:10:16 You see the upload and download speeds change and the available space changes.
-
00:10:22 More available space probably means that it is being used less.
-
00:10:27 So for the Canada server, it looks like this particular server is not very much preferred.
-
00:10:33 So there is also Norway server.
-
00:10:35 You see it has great upload and download speeds.
-
00:10:38 It has decent hard drive space as well.
-
00:10:40 So it is probably also not used very much.
-
00:10:43 However, this one is more expensive than the others.
-
00:10:46 So I think I will go with this Canada server.
-
00:10:50 Its speeds are also decent.
-
00:10:52 Click deploy.
-
00:10:54 There is one more thing as well that I need to explain.
-
00:10:57 Community cloud.
-
00:10:58 So what does community cloud mean?
-
00:11:01 In the community cloud section, you will be able to bid for shared servers.
-
00:11:06 All of the servers are shared, but the idea is that you bid, and if someone overbids
-
00:11:11 you, they get your GPU.
-
00:11:14 So in here you see the prices will be lower.
-
00:11:17 When I click RTX A5000 select and then I click continue.
-
00:11:22 So you see currently this is selected.
-
00:11:24 RunPod Stable Diffusion 1.5.
-
00:11:27 I can also change it from this template.
-
00:11:29 Don't forget to change template.
-
00:11:32 When I click continue, you see now I am getting pricing summary and advanced.
-
00:11:36 When I click advanced, it will allow me to bid for a spot.
-
00:11:41 So you see the current bid is 0.198.
-
00:11:44 When I bid this, I will overbid the other person who has bid less than this.
-
00:11:51 So I am going to get his GPU if there are no available other GPUs.
-
00:11:56 So let's say we placed our bid like this and started our Pod.
-
00:12:00 Then someone else comes and bids 0.2, and they will get our GPU.
-
00:12:06 Then our pod will not have any GPU to do inference or training and our training will be also
-
00:12:13 halted.
-
00:12:14 So be careful with this.
-
00:12:15 If you are not going to do training, if you are only going to do image generation, then
-
00:12:20 you can go with this option and spend less.
-
00:12:23 The running disk cost and exited disk cost also slightly changes.
-
00:12:28 You can recalculate the cost.
-
00:12:30 So this is how you do bidding and this is how you use community cloud servers.
-
00:12:36 Since I am going to do training, I am going to use on demand server and I am going to
-
00:12:41 pick on demand server from here.
-
00:12:44 This Canada server.
-
00:12:46 Let's check again.
-
00:12:47 Yes, I am going to use this Canada server because it has the most available disk space.
-
00:12:52 Therefore, I am assuming that it is being used less than the others.
-
00:12:57 Click deploy and we have selected RunPod Stable Diffusion 2.1 version.
-
00:13:01 Let's set our persistent volume as 100 GB and let's also deploy it so it will get deployed.
-
00:13:08 When I click My Pods, I will see them in here.
-
00:13:11 OK, when you go on My Pods, it is going to show you if there will be a maintenance or
-
00:13:18 not.
-
00:13:19 So you should be careful with this maintenance.
-
00:13:21 It says that it will start at this local time.
-
00:13:24 Therefore, I think I will delete this Pod.
-
00:13:27 So I will just click stop Pod and then I will delete it.
-
00:13:31 So when you stop your Pod, it will remain as it is.
-
00:13:34 However, if you click this terminate, then the Pod will be permanently deleted and you
-
00:13:39 won't be able to recover or access any of your data.
-
00:13:43 So now it is gone.
-
00:13:45 Let's go back to the browse servers tab and let's pick another server from here.
-
00:13:51 Maybe that is why it was less used.
-
00:13:54 So I will pick this one.
-
00:13:56 OK, 2.1 version 100 GB.
-
00:13:59 Let's deploy.
-
00:14:00 Let's go to the My Pods and it is being deployed.
-
00:14:03 The first one we started is running.
-
00:14:06 The other one is being initialized and this is my per hour using credits right now.
-
00:14:12 OK, let's connect our first Pod.
-
00:14:14 To connect our first Pod.
-
00:14:16 I am clicking My Pods.
-
00:14:18 Let's refresh so you will see the interface as it is.
-
00:14:20 OK, I am clicking here.
-
00:14:22 It will open the interface.
-
00:14:24 When you click logs, it will show you the logs screen.
-
00:14:27 This is really important to debug the errors that you might encounter.
-
00:14:31 So it started with xformers and the workspace 1.5 emaonly CKPT file.
-
00:14:36 Actually, this is not the best CKPT file for training, so I will download the best one
-
00:14:43 and it is running on xformers 0.0.16.
-
00:14:47 This xformers is not compatible with DreamBooth training or Textual Inversion training, unfortunately,
-
00:14:54 so we won't use xformers during training and the other things are also displayed here.
-
00:14:59 When you click system logs, it will also show you the system logs.
-
00:15:02 When you click this refresh icon, it will refresh and when you click this X, it will
-
00:15:07 close it.
-
00:15:08 So let's click connect and I will connect it via Jupyter Lab, which will make our life
-
00:15:14 much easier.
-
00:15:15 OK, so our Jupyter has started like this.
-
00:15:19 The first thing that I am going to show you is how to change starting command line arguments.
-
00:15:25 To change them, I am zooming in so you can see more easily.
-
00:15:28 That is webui-user.sh.
-
00:15:33 So this is the file where the command line arguments are provided.
-
00:15:38 You see it is starting with default port 3000.
-
00:15:41 It is starting with xformers.
-
00:15:43 The default CKPT is provided like this and there is a listen and enable insecure access.
-
00:15:50 So if you wonder what these arguments do, there is a wiki page of the Automatic1111
-
00:15:55 web UI and you can search for the commands by copying and pasting them and it will show
-
00:16:02 you: launch gradio with 0.0.0.0 as server name, allowing it to respond to network requests.
-
00:16:07 Actually, I am going to also add share to be able to use it from my browser like this
-
00:16:14 and enable insecure extension access means that we will be able to install extensions.
-
00:16:19 Make sure that these commands are already enabled.
-
00:16:23 Otherwise, you won't be able to install extensions and I think we are ready.
-
00:16:28 I will also change the port so it does not conflict with the initially started instance.
-
00:16:34 Just save.
-
00:16:35 When you save, you will see in the bottom saving completed.
-
00:16:37 Then go to the running terminals and kernels.
-
00:16:40 Shut down all of the running terminals and then go back to the file browser.
-
00:16:46 Make sure that you are inside Stable Diffusion web UI folder.
-
00:16:50 Then start the terminal.
-
00:16:51 When you start the terminal, it will start with the folder that you are currently in.
-
00:16:56 You see it is the same as the folder that we are in and in here we will use relauncher.py.
-
00:17:03 To do that, just type python, and I will copy-paste the name relauncher.py, hit enter, and
-
00:17:10 it will restart our web UI with the newly set command line arguments.
-
00:17:15 We should be able to see them in here.
-
00:17:17 Yes, we are seeing --port 3010 and --xformers.
-
00:17:21 So with this way you can also start multiple instances of web UI.
-
00:17:27 If you are a professional, then you can do that.
-
00:17:29 But if you are not, I don't suggest you to do that.
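The edit-and-restart workflow described above can be sketched as follows. The flag names are real Automatic1111 options; the port value, the workspace path, and the exact contents of the RunPod template's webui-user.sh are assumptions based on the video and the standard Automatic1111 layout:

```shell
# In webui-user.sh, the command line arguments live on a line roughly
# like this (port value is illustrative; pick one that is free):
export COMMANDLINE_ARGS="--port 3010 --xformers --listen --share --enable-insecure-extension-access"

# After saving and shutting down all running terminals, restart the
# Web UI from inside the stable-diffusion-webui folder:
cd /workspace/stable-diffusion-webui
python relauncher.py
```

Keeping --listen and --enable-insecure-extension-access, as the video stresses, is what lets you reach the UI over the network and install extensions.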
-
00:17:32 Now we can access it from this public URL.
-
00:17:35 This public URL is currently not secured by a password.
-
00:17:39 You can also add a password in here I think.
-
00:17:42 Let me show you.
-
00:17:43 Yes, you can also set a username and password.
-
00:17:46 However, if you are not giving this URL to anyone, then it should be safe.
-
00:17:51 As you can see, our interface is started.
-
00:17:54 Let's start with typing a simple prompt and see what happens.
-
00:17:58 OK, I have prepared my prompt.
-
00:18:01 I hit generate and in My Pods now you will see the GPU memory used is being increased.
-
00:18:07 GPU utilization will also increase as it generates the images and image is already generated.
-
00:18:14 Let's set the batch size as eight and batch count as one hundred.
-
00:18:18 And let's see how it is using our GPU.
-
00:18:21 So let's hit the refresh.
-
00:18:23 So it is showing like ten seconds ago.
-
00:18:25 OK, now you see the GPU utilization is one hundred percent.
-
00:18:29 GPU memory used is still significantly low because it is also using xformers, even though
-
00:18:37 we are generating images in batches with a batch size of eight.
-
00:18:42 So each time it will generate eight images.
-
00:18:45 So where are these files being saved?
-
00:18:49 And how can I see if any error occurs?
-
00:18:52 You see in the My Pods, just click the logs and you will see all of the logs here.
-
00:18:58 This is really important to debug the logs.
-
00:19:00 And in here in the terminal window, you will see what is happening.
-
00:19:04 So how can you open the terminal?
-
00:19:05 To open the terminal, go to the running terminals and kernels.
-
00:19:08 And let's say I have closed the terminal.
-
00:19:11 I double click the terminal and it will show me the terminal as here.
-
00:19:16 As you can see in here.
-
00:19:17 This is equal to the terminal that we have on our computer when we are running it locally
-
00:19:23 on our computer.
-
00:19:24 This is the it/s, iterations per second.
-
00:19:26 However, since we are generating eight images at a time, it is effectively over twenty-four
-
00:19:31 it/s.
-
00:19:33 You need to multiply this by eight.
-
00:19:35 OK, let's hit the interrupt.
-
00:19:37 Now I will install the DreamBooth extension.
-
00:19:39 To do that go to the extension tab.
-
00:19:41 Go to the available hit load from.
-
00:19:44 Search DreamBooth, hit install.
-
00:19:47 Meanwhile, my 2.1 version Pod is also spending my credits.
-
00:19:53 So I will just stop it.
-
00:19:55 So when you click stop Pod you are going to get this message, you should read it and understand
-
00:20:01 it.
-
00:20:02 OK, stopped Pod.
-
00:20:03 Basically, what it says is that all of the things that are not saved in your workspace
-
00:20:09 will be lost.
-
00:20:11 So whatever you have in your workspace will be saved.
-
00:20:15 OK, let's see the status of the installation.
-
00:20:18 OK, it says that installed into workspace, Stable Diffusion, web ui extensions, SD DreamBooth
-
00:20:24 extension.
-
00:20:25 Now I will restart my terminal, because when you install DreamBooth for the first time, you really
-
00:20:30 need to restart the terminal so that it can install the necessary dependencies.
-
00:20:35 So I am going to do terminal stop, shut down all terminals.
-
00:20:39 Then I am going to Stable Diffusion Web UI folder and in here I will open a new terminal.
-
00:20:46 Same as before, I will type Python and relauncher.py and hit enter.
-
00:20:51 So the Web UI has been restarted and now we got a new link.
-
00:20:56 Let's copy and paste it.
-
00:20:58 Meanwhile, it is being loaded let's check out the generated images.
-
00:21:01 So they are saved in the outputs folder, in the text-to-image images folder.
-
00:21:07 And yes, they are in here.
-
00:21:09 So how to download them?
-
00:21:10 You can download them one by one, right click and download.
-
00:21:13 Then it will download like this.
-
00:21:16 You can alternatively right click and download current folder as an archive.
-
00:21:20 It will first make archive and it will download all of the images like this.
-
00:21:25 It is a decent speed and it has downloaded all of these images.
-
00:21:31 121 files so far.
-
00:21:32 OK, the interface has been reloaded and now we are seeing the DreamBooth extension.
-
00:21:38 When we go to the extension tab, check for updates.
-
00:21:41 We should see the latest version in here.
-
00:21:44 Actually, it says that it is behind.
-
00:21:46 So let's click apply and restart UI.
-
00:21:49 And once we do that, we get an error.
-
00:21:53 It is relaunching in two seconds.
-
00:21:55 OK, when relaunching, we are getting a port error because the previous one crashed.
-
00:22:01 So what I'm going to do is: I will shut down all of the terminals.
-
00:22:05 Go back to the file browser.
-
00:22:07 In the first installation, you may encounter such errors.
-
00:22:10 Go to the webui-user.sh file and change the port here, and then go to the terminal tab.
-
00:22:18 Open a new terminal like this and type Python relauncher.py.
-
00:22:22 it will restart and when restarting now it is showing us the DreamBooth revision and
-
00:22:28 the SD Web UI revision like this.
-
00:22:30 I will just start training.
-
00:22:31 OK, it has been restarted.
-
00:22:33 Let's open the new URL.
-
00:22:35 OK, currently it is selected as 1.5 pruned emaonly CKPT and in the DreamBooth tab.
-
00:22:42 When we are going to generate a new training model, this is only available model.
-
00:22:48 However, 1.5 pruned CKPT is better than emaonly for training.
-
00:22:53 Therefore, I am going to download this CKPT file.
-
00:22:56 So how am I going to download it?
-
00:22:58 You see there is a download button in here.
-
00:23:01 I am right clicking and copying link address.
-
00:23:04 But before doing that, let's start a new terminal.
-
00:23:06 To do that, I am going to right click new plus icon here.
-
00:23:10 It will open a new launcher.
-
00:23:11 Hit terminal.
-
00:23:13 For fast download I am going to use RunPod CTL.
-
00:23:17 The RunPod CTL allows us to quickly download or upload files through our Pods to Pods or
-
00:23:24 from Windows to Pods and vice versa.
-
00:23:27 There are different versions.
-
00:23:29 I am going to install the Linux one on my RunPod.
-
00:23:32 So I am selecting it like this and copying it.
-
00:23:35 Then in my terminal I am pasting it with control V and I am hitting enter.
-
00:23:41 It will install the latest RunPod CTL.
-
00:23:44 After this command, type runpodctl, hit enter, and you should get a message like this.
-
00:23:50 That means that it has been successfully installed or it was already installed.
-
00:23:55 Now, how are we going to download this pruned CKPT file?
-
00:23:57 To download it, first enter the folder where you want to download it, which is inside models, inside
-
00:24:05 Stable Diffusion.
-
00:24:06 And in here where we want to download our model file, then I am going to click this
-
00:24:12 plus new launcher, launch a new terminal, and in this new terminal, this is the folder
-
00:24:18 where we are right now.
-
00:24:19 Now for downloading, type wget, copy this URL, paste it, hit enter, and it will get downloaded
-
00:24:29 inside this folder.
-
00:24:30 By the way, runpodctl is not necessary to download this file, but we will use it to
-
00:24:37 send data and get data from RunPod to our computer or from computer to RunPod or from
-
00:24:44 RunPod to RunPod.
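The transfer workflow described here can be sketched as follows; the file name and the one-time code are illustrative, and runpodctl prints the actual code to use when you run send:

```shell
# On the machine that has the file (a Pod or your PC):
runpodctl send my_dataset.zip
# It prints a one-time code together with the exact receive command.

# On the other machine, paste the printed command, for example:
runpodctl receive 1234-example-one-time-code
```

Because the code is single-use and typed on both ends, no port forwarding or shared credentials are needed between the two machines.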
-
00:24:45 This wget is a Unix command, and an alternative of it is available on Windows as well.
-
00:24:52 So with this wget command, you can quickly download files into your RunPod folders like
-
00:24:59 this.
-
00:25:00 So you see currently it is downloading with 90 megabytes per second which is pretty decent
-
00:25:05 speed.
-
00:25:06 Okay the download has been completed and now the file is located in here.
-
00:25:10 Then what are we going to do is hit refresh button here and now I can see the 1.5 pruned
-
00:25:18 CKPT as well.
-
00:25:19 This is the way to download models from Hugging Face or wherever they are hosted.
-
00:25:24 If you can get direct link of it I will show examples.
-
00:25:28 Don't worry.
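A hedged sketch of the download step above: the folder layout is the standard Automatic1111 one, and the URL is only an example of a direct Hugging Face link — use whatever link you copied from the model's download button:

```shell
# Change into the folder where the Web UI looks for Stable Diffusion models:
cd /workspace/stable-diffusion-webui/models/Stable-diffusion

# Download the checkpoint directly into that folder
# (replace the URL with the direct link you copied):
wget https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned.ckpt
```

After the download finishes, the refresh button next to the checkpoint dropdown makes the new file appear in the Web UI.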
-
00:25:29 So now I will start DreamBooth training with the best possible settings.
-
00:25:33 First let's switch to 1.5 pruned CKPT.
-
00:25:36 This is not necessary, but I am not sure that it is working as expected.
-
00:25:40 So I am making sure I have selected the target model in here as well.
-
00:25:45 So it has been loaded.
-
00:25:47 If it doesn't get loaded.
-
00:25:48 You should check the terminal window.
-
00:25:50 It is running on here.
-
00:25:52 It will show what is happening and you can also check the logs window in here.
-
00:25:57 It will show what is happening.
-
00:25:59 Okay now let's give a name to our training.
-
00:26:01 Let's say test SD 15 and check the source checkpoint.
-
00:26:05 So you see it is not seeing my latest checkpoint.
-
00:26:07 I am clicking refresh and I am checking the latest checkpoint.
-
00:26:11 This is very good to teach faces.
-
00:26:14 1.5 pruned CKPT the 512x model is selected and hit create model.
-
00:26:20 I am not changing other parameters because optimal parameters are currently selected.
-
00:26:26 These are more experimental options, or options for more advanced users.
-
00:26:31 And in the terminal you see it is downloading the necessary files right now.
-
00:26:36 That is why it is waiting.
-
00:26:37 Okay it says that checkpoint successfully extracted.
-
00:26:41 So the model has been generated.
-
00:26:42 However as you can see, the interface is frozen.
-
00:26:46 Unfortunately this is a problem of Gradio.
-
00:26:49 So what we are going to do is refresh and reload this page, and now it says no interface
-
00:26:54 is running.
-
00:26:55 It looks like the interface has been terminated unexpectedly.
-
00:27:01 And what do we see in the terminal in here in the system logs.
-
00:27:05 Okay it doesn't show anything and it doesn't show anything in here either.
-
00:27:09 So let's check out our terminals.
-
00:27:11 Terminal one which is our main terminal and yes it is not showing.
-
00:27:17 So what can we do?
-
00:27:19 We need to restart.
-
00:27:20 To restart I will shut down all terminals and I will follow the same procedure.
-
00:27:24 Open terminal.
-
00:27:26 However currently we are inside model Stable Diffusion so it won't work.
-
00:27:29 We need to move to the parent folder.
-
00:27:32 To move to the parent folder,
-
00:27:33 I am closing this terminal going to the folders tab.
-
00:27:37 I am navigating like this opening a new terminal.
-
00:27:40 Run python relauncher.py, and in my pod the current GPU memory usage is only 11 percent.
-
00:27:46 So it is good, which means that no other terminal or instance of Web UI is running.
-
00:27:52 Also there are some warning messages here.
-
00:27:55 I think we could ignore them.
-
00:27:57 Okay it has started.
-
00:27:58 I am opening this URL.
-
00:28:00 I am going DreamBooth tab and now I will select my model because I already created it and
-
00:28:06 it is selected.
-
00:28:07 Let's set up the settings.
-
00:28:09 Okay I won't pick this checkbox because it is usually causing me problems.
-
00:28:14 How many steps per image.
-
00:28:15 I am going to use 12 images and I am going to train up to 200 epochs.
-
00:28:20 I will save model for every 10 epoch.
-
00:28:24 Be careful with this, because each save will take about five gigabytes of space, and saving every
-
00:28:32 10 epochs up to 200 epochs means 20 saves.
-
00:28:35 So it is going to take all of my hard drive.
-
00:28:38 So I think I will make this up to 180 or 160.
-
00:28:42 This should be sufficient.
-
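The disk-space math above can be sketched out like this, assuming the video's estimate of roughly 5 GB per saved checkpoint:

```shell
# Disk space consumed by intermediate checkpoint saves during DreamBooth training.
# Assumptions from the video: ~5 GB per saved ckpt, a save every 10 epochs.
GB_PER_SAVE=5
SAVE_EVERY=10

TOTAL_EPOCHS=200
SAVES=$(( TOTAL_EPOCHS / SAVE_EVERY ))   # 20 saves
SPACE=$(( SAVES * GB_PER_SAVE ))         # ~100 GB
echo "200 epochs: $SAVES saves, ~${SPACE} GB"

TOTAL_EPOCHS=160                         # the reduced value chosen in the video
SAVES=$(( TOTAL_EPOCHS / SAVE_EVERY ))   # 16 saves
SPACE=$(( SAVES * GB_PER_SAVE ))         # ~80 GB
echo "160 epochs: $SAVES saves, ~${SPACE} GB"
```

This is why dropping from 200 to 160 epochs matters on a 100 GB volume: it frees roughly 20 GB of save overhead.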
00:28:44 If you don't know what these parameters are or how I am setting them,
-
00:28:48 I have an excellent DreamBooth tutorial on my YouTube channel.
-
00:28:53 You should definitely watch it to learn more about DreamBooth training.
-
00:28:57 Okay the batch size is one.
-
00:28:59 Gradient accumulation steps are one. Class batch size determines how many classification images
-
00:29:04 are generated at a time; it is not related to training.
-
00:29:11 I will set this as 16 because this graphic card has huge VRAM, but if we get error, I
-
00:29:17 will reduce it.
-
00:29:18 Set gradients to none when zeroing.
-
00:29:20 Okay correct.
-
00:29:21 I am going to use half learning rate.
-
00:29:24 I am going to use sanity prompt as photo of ohwx man by Tomer Hanuka.
-
00:29:30 I will explain what these are for.
-
00:29:33 Actually, I am explaining what these are for in this tutorial in detail.
-
00:29:37 This is for checking whether the model is overtrained or not.
-
00:29:41 And in here I am going to use EMA.
-
00:29:43 This will improve my training success rate and I have 24 gigabyte VRAM.
-
00:29:48 I will use 8 bit adam.
-
00:29:50 I am going to use mixed precision and I am going to use fp16 because this bf16 is not
-
00:29:57 supported by all graphic cards.
-
00:29:58 It is supported by the RTX 3000 series and newer.
-
00:30:03 I am not sure about this card as well.
-
00:30:05 So fp16 is our safest option for all cards.
-
00:30:09 I am not going to use xformers.
-
00:30:11 This is important because the current xformers is not supporting the DreamBooth training
-
00:30:17 or Textual Inversion training.
-
00:30:18 You see, it is xformers 0.0.16.
-
00:30:21 I think it will become compatible with xformers 0.0.17 when it is officially released.
-
00:30:28 Currently the nightly version supports it as well, as far as I know.
-
00:30:32 Cache latents.
-
00:30:33 Yes it will improve speed.
-
00:30:35 Train UNET.
-
00:30:36 Okay these are the optimal settings actually, so no need to change them.
-
00:30:40 And in here concepts.
-
00:30:41 Okay first we need to upload our training data set.
-
00:30:44 To do that go to the Stable Diffusion web ui folder or workspace.
-
00:30:48 Doesn't matter I will upload them to workspace.
-
00:30:51 In here create new folder training data set.
-
00:30:55 I have named the folder like this.
-
00:30:57 Enter inside folder and click upload files.
-
00:31:00 Select the files from your computer.
-
00:31:03 Since I don't have many files currently I am going to use this method and you see I
-
00:31:08 have only nine images which are pretty close shots.
-
00:31:12 No same background.
-
00:31:13 No same clothes as you can see.
-
00:31:16 I explain what makes a good training data set in this video, and they are getting
-
00:31:21 uploaded.
-
00:31:22 We could also use runpodctl.
-
00:31:24 However, since there aren't many files, I am using this method for this task, and our
-
00:31:31 training data set is ready.
-
00:31:33 Okay now we need to give the path of it.
-
00:31:35 To give the path of it.
-
00:31:36 Go back to the workspace like this: right click, copy path, paste it like this, and put
-
00:31:42 a slash at the beginning of it, then set where we want regularization images to be generated.
-
00:31:48 I am copy pasting like this and I will type classification images.
-
00:31:53 Okay filewords.
-
00:31:55 For training faces I am not using filewords.
-
00:31:57 They are more likely needed when fine-tuning your model with lots of tokens and lots of good
-
00:32:05 images.
-
00:32:06 If you wonder how filewords work,
-
00:32:09 I explain how they actually work in this short video.
-
00:32:13 So I'm just skipping file words and I am going to prompts.
-
00:32:17 So our instance prompt will be ohwx man.
-
00:32:20 Ohwx is our rare token and man is our class.
-
00:32:24 Class prompt will be photo of man since I am teaching a face of a man.
-
00:32:29 Sample prompt will be simply photo of ohwx man.
-
00:32:33 I am not going to set negative prompt or other things.
-
00:32:36 How many classification/regularization images do we want per training image?
-
00:32:42 I have nine training images and I want 50 per image.
-
00:32:46 This is actually a debated topic; there is no precise answer for how many is best.
-
00:32:52 In the official DreamBooth paper, the authors have used 200 so you can also try with 100
-
00:32:58 like this as well.
-
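The class-image totals being compared work out as follows, for the 9 training images used here at 50, 100, and the DreamBooth paper's 200 per image:

```shell
# Total classification/regularization images needed for a given per-image count.
TRAIN_IMAGES=9
for PER_IMAGE in 50 100 200; do
  TOTAL=$(( TRAIN_IMAGES * PER_IMAGE ))
  echo "$PER_IMAGE per image -> $TOTAL class images"
done
```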
00:32:59 Okay then go to the saving tab, generate a ckpt file when saving during training.
-
00:33:04 So we will be able to generate checkpoints for every 10 epochs and then we will be able
-
00:33:11 to compare them to see which one of the checkpoint is performing best, which one of the checkpoint
-
00:33:19 has learned our subject best and with this way you can avoid over training.
-
00:33:26 And once you are ready, click save settings and hit train.
-
00:33:30 First it will start with generating class images.
-
00:33:32 In my pod I will see GPU utilization and memory usage.
-
00:33:35 Okay it says that exception training model no executable batch size found reached zero.
-
00:33:42 Why we got this error because we did set the classification images batch size pretty big.
-
00:33:50 If you make it like let's say six and try again.
-
00:33:54 And now I am seeing that it is generating six images at a time.
-
00:33:59 The iteration speed looks pretty low, actually, only 12, because we need to multiply it by six, and we are
-
00:34:07 seeing the images are being generated.
-
00:34:09 They will be saved in workspace, classification images directory like this: if you have previously
-
00:34:18 generated images on your computer, then you can alternatively upload them.
-
00:34:22 For uploading them I will install runpodctl on my windows.
-
00:34:28 To do that I am going to run this command on my windows powershell.
-
00:34:33 Type powershell, right click and hit enter.
-
00:34:37 Okay the installation has been completed, the runpodctl is now available on my command
-
00:34:42 prompt: let's see, runpodctl and now I am seeing it.
-
00:34:48 So I have previously generated 2400 images on my hard drive.
-
00:34:54 I am going to share this with runpodctl to download them in RunPod.
-
00:34:59 Alternatively, you can use upload methodology as well.
-
00:35:03 It also works, but for bigger files, runpodctl is better.
-
00:35:08 So for sharing the folder type runpodctl send and the folder path like this.
-
00:35:16 To get the folder path more easily, copy the folder path from here, paste it into the notepad
-
00:35:22 like this.
-
00:35:23 Put quotation marks at the beginning and end, and type it in your cmd.
-
00:35:29 I will show from beginning once again.
-
00:35:32 Open cmd, type runpodctl send, paste the path like this, and it will prepare it.
-
00:35:39 It says that the photo of man zip already exists, because we used it in another cmd window.
-
00:35:46 So I need to delete this file.
-
00:35:49 Okay, this zip file is generated inside local disk c users and my username directory.
-
00:35:56 I am just going to delete it and I will run the command once again.
-
00:36:00 It will quickly prepare all of the files and now share link is generated I am copying this,
-
00:36:07 selecting it ctrl c or select it right click from here and copy, then go back to your Jupyter
-
00:36:14 Lab where your RunPod is running and in here I will make a new folder like this: ready
-
00:36:21 class.
-
00:36:22 I will enter inside ready class folder, then I will open a new terminal like this and I
-
00:36:28 will paste the command.
-
00:36:31 You see runpodctl receive the URL it has generated, hit enter.
-
00:36:36 It will connect to my computer and it will start downloading all of the files very quickly.
-
00:36:41 So this is how you can upload files from your computer to the remote RunPod.
-
00:36:48 The same thing applies to the RunPod to RunPod, so this is all vice versa.
-
00:36:53 RunPod to computer, RunPod to RunPod computer to RunPod.
-
00:36:57 You can send and receive files like this.
-
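To summarize the transfer pattern just demonstrated: a hedged command template, not runnable as-is, since the one-time code is generated fresh for every send and the folder path here is made up for illustration:

```shell
# On the sending side (PC or pod): zip the folder and print a one-time code.
runpodctl send "C:\Users\you\photo of man"

# On the receiving side (pod or PC): paste the printed receive command, e.g.
runpodctl receive <one-time-code>   # downloads into the current folder
```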
00:37:00 This of course totally depends on my upload speed.
-
00:37:03 So when I open my task manager I see that it is using all of my available upload speed
-
00:37:09 like this.
-
00:37:10 This is pretty useful and convenient.
-
00:37:13 Instead of generating new classification images each time which uses your GPU time and consumes
-
00:37:20 your credits, you can prepare them on your computer and then quickly upload them to your
-
00:37:25 RunPod.
-
00:37:27 You can also upload them to any hosting, website, or other places that has better upload speed
-
00:37:33 and download them with the wget command as I have shown to download ckpt file.
-
00:37:41 RunPodCTL is extremely useful to upload and download files as you can see.
-
00:37:48 Okay 2400 photo of man.
-
00:37:51 The classification regularization images upload have been completed.
-
00:37:55 Now I see that it is uploaded as a zip here.
-
00:37:59 I was going to extract them, but oh,
-
00:38:01 it has automatically extracted, as you can see after the refresh.
-
00:38:05 Now they are here.
-
00:38:07 So what am I going to do is I will cancel training and I will give this folder.
-
00:38:13 So I will just skip image generation.
-
00:38:16 So it has been cancelled.
-
00:38:17 Let's give the new folder.
-
00:38:20 In concepts, type the new folder name here and click save settings, and okay, it looks like the
-
00:38:27 train button has not appeared.
-
00:38:29 So what we need to do is refresh and reload.
-
00:38:34 Okay reloaded.
-
00:38:35 Go to the DreamBooth select the model, hit load settings, verify the settings are properly
-
00:38:41 loaded.
-
00:38:42 Okay, this is not being saved so you should uncheck it.
-
00:38:45 Okay, all settings are looking good and click train.
-
00:38:49 Now it won't generate any new classification/regularization images because we already provided them.
-
00:38:54 We can see that in the terminal window in here.
-
00:38:58 So you see it is processing the uploaded photo of man images.
-
00:39:02 Then it is going to cache the classification images with caching latents.
-
00:39:07 Okay, the training has started.
-
00:39:09 It has a pretty good speed as you can see.
-
00:39:13 It is supposed to do 180 epochs in less than 15 minutes.
-
00:39:17 However, this will take a little bit more time because it will generate ckpt during
-
00:39:22 the training.
-
00:39:23 We can also watch the training here.
-
00:39:25 However, you may get disconnected from gradio interface.
-
00:39:30 You can just watch the command line interface from here and know the status of the training
-
00:39:35 if that happens.
-
00:39:37 Okay, 10 epochs have been completed so it started generating the initial images as you
-
00:39:42 can see.
-
00:39:43 It also generated a checkpoint.
-
00:39:45 Where can we see the checkpoint?
-
00:39:47 Go to the workspace, go to the Stable Diffusion Web UI, go to the models folder, go to the
-
00:39:52 Stable Diffusion folder, and in here you will see our training name, go inside that folder
-
00:39:58 and now we can see the checkpoints being generated.
-
00:40:01 Then we will test each one of them with x/y plot and see how they are performing.
-
00:40:07 So if you want to see the generated samples during training, go to the models folder,
-
00:40:12 go to the DreamBooth folder, go to your training named folder, and in here you will see samples.
-
00:40:18 So these are the samples being generated during training and when you click the txt file,
-
00:40:24 you will see which prompt was used to generate this image.
-
00:40:27 When you double click the image, it will open image like this.
-
00:40:30 So far it does not look like me.
-
00:40:34 When you go to the My Pods, you can see the GPU utilization and GPU memory being used.
-
00:40:39 The GPU memory is almost full because we are using EMA and we are not using xformers.
-
00:40:45 Because in the settings tab, we checked to use EMA and in the memory attention we didn't
-
00:40:49 use xformers.
-
00:40:50 These two are heavily increasing the memory usage.
-
00:40:55 Also, we didn't check the gradient checkpointing.
-
00:40:58 This also reduces the VRAM usage.
-
00:41:01 However, if you have a sufficient amount of VRAM, you shouldn't check this either.
-
00:41:05 Okay, even after 130 epochs, it is still not learning even though it shows a good loss
-
00:41:11 rate.
-
00:41:12 That means that there is a bug currently with DreamBooth extension.
-
00:41:16 Therefore, I have cancelled the training.
-
00:41:18 Now I will delete the folder to open a space.
-
00:41:22 Right click folder.
-
00:41:23 Delete it.
-
00:41:24 It says that it is not empty so you can't delete it.
-
00:41:27 However, we can.
-
00:41:28 Now I will show you how to do it.
-
00:41:30 Click new, open a new terminal, and type rm -r and the directory name, test sd15.
-
00:41:39 It will recursively delete all of the files and the folder.
-
00:41:42 After we refresh it is gone.
-
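A minimal, self-contained sketch of that recursive delete, run against a throwaway directory rather than the real model folder:

```shell
# Create a throwaway directory tree, then delete it recursively with rm -r.
WORKDIR=$(mktemp -d)
mkdir -p "$WORKDIR/test_sd15/samples"
touch "$WORKDIR/test_sd15/model.ckpt" "$WORKDIR/test_sd15/samples/0001.png"

rm -r "$WORKDIR/test_sd15"   # JupyterLab refuses non-empty folders; rm -r does not

# Verify it is gone.
[ ! -d "$WORKDIR/test_sd15" ] && echo "deleted"
rmdir "$WORKDIR"
```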
00:41:44 Now I will figure out the problem and show you the working settings and setup.
-
00:41:50 So I have figured out the problem and the problem was exactly as I have guessed it.
-
00:41:55 It was using xformers even though we didn't select use xformers.
-
00:42:02 In the settings, we had used memory attention default.
-
00:42:06 However, it was still using xformers.
-
00:42:09 So what did I do to fix this problem?
-
00:42:12 It is simple.
-
00:42:13 I have opened the webui-user.sh file and I have removed --xformers
-
00:42:21 from the command line arguments.
-
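For reference, the edit amounts to deleting --xformers from the exported arguments in webui-user.sh. A hedged sketch only; the remaining flags are illustrative, not the RunPod template's literal contents:

```shell
# webui-user.sh: --xformers removed so DreamBooth training stops silently using xformers
export COMMANDLINE_ARGS="--port 3000 --listen"
```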
00:42:22 I have restarted my Web UI.
-
00:42:25 Then I have composed a new training with the exactly same parameters and it did work very
-
00:42:31 well.
-
00:42:32 The training has been completed, so let's download the samples and check them out on
-
00:42:37 our computer.
-
00:42:38 To download the folder of samples, I will use runpodctl command.
-
00:42:42 So what I need to do is I will enter the samples folders.
-
00:42:47 So to do that, go to the models folder, go to the DreamBooth, go to the training folder
-
00:42:52 name so the samples are located here.
-
00:42:55 Open a new command terminal, write runpodctl send samples which is the folder name and
-
00:43:03 it will zip the samples folder and generate a receive command.
-
00:43:07 Copy it with ctrl c.
-
00:43:09 First, I need to add the path of runpodctl into my environment.
-
00:43:15 So the currently runpodctl exe is located inside my user folder.
-
00:43:21 Go to the users and your username, and I will copy the runpodctl yaml and runpodctl exe files.
-
00:43:27 Copy them.
-
00:43:28 Then I will make a new folder in my C drive as runpod exe.
-
00:43:33 Paste them here.
-
00:43:34 Then in the search bar search for environment, it will open, edit environment variables like
-
00:43:41 here and in here.
-
00:43:42 I am going to add a path variable for system variables, so go to the path, click edit and
-
00:43:49 in here click browse, select the folder where you have copy pasted which is inside c drive.
-
00:43:56 runpod exe, click ok, and now the runpod exe folder is registered in my path.
-
00:44:02 Click ok, click ok, click ok, and now runpodctl should be available to call from anywhere.
-
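The Windows Environment Variables dialog above does the same thing as extending PATH in a shell. A hedged POSIX sketch of the idea, with a made-up tool name for illustration:

```shell
# Put an executable in a new folder, add that folder to PATH, then call it by name.
BINDIR=$(mktemp -d)
printf '#!/bin/sh\necho hello-from-path\n' > "$BINDIR/mytool"
chmod +x "$BINDIR/mytool"

export PATH="$PATH:$BINDIR"   # what the Environment Variables dialog edits on Windows
OUT=$(mytool)                 # found via PATH, no full path needed
echo "$OUT"
```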
00:44:09 Where I want to download.
-
00:44:10 I want to download the files inside my pictures, inside test samples.
-
00:44:16 I type cmd here.
-
00:44:18 So currently this is where I am.
-
00:44:20 Now I will copy and paste this command into my cmd window.
-
00:44:25 And yes, it is running as expected and the files are being copied into my folder.
-
00:44:33 And then they are automatically extracted with the folder name.
-
00:44:36 So in here we are able to see the generated sample images.
-
00:44:40 I can say that after 800 steps it started to resemble me, and in total we have trained
-
00:44:48 it for 160 epochs, 3200 steps; we can see the examples here.
-
00:44:55 Okay, this is pretty much like me, so with good prompting I think we can get good results.
-
00:45:03 So let's try all of the checkpoints to see which one is working best.
-
00:45:07 How are we going to do that?
-
00:45:09 We are going to do that with text to image tab and in here we are going to use x/y/z
-
00:45:14 plot.
-
00:45:15 Okay, it didn't appear.
-
00:45:16 Let's refresh.
-
00:45:17 Oh, looks like our instance is closed so I will restart.
-
00:45:22 So before restarting make sure that you have closed all of the running terminals and I
-
00:45:27 will also close all of the open tabs.
-
00:45:29 Okay, all of the tabs and terminals are closed.
-
00:45:33 Okay, Web UI is restarted.
-
00:45:35 Let's open it.
-
00:45:36 Okay, now we can also see the checkpoints in here so you can test particularly one of
-
00:45:43 them.
-
00:45:44 But I am going to do xyz plot test.
-
00:45:47 But before that, let's decide our testing prompt.
-
00:45:51 So I am going to make my tests on 2200 step checkpoint.
-
00:45:58 I am going to select it from here.
-
00:46:00 First, let's see the raw prompt.
-
00:46:02 Ohwx man.
-
00:46:03 Okay, this is the raw prompt and it looks pretty decent.
-
00:46:08 This is the training data set you see.
-
00:46:11 It looks pretty decent, but it looks like it has some memorization.
-
00:46:15 Actually, not exactly memorization.
-
00:46:18 The clothing is similar but not exactly the same.
-
00:46:20 Okay, while doing testing, my Web UI has been killed.
-
00:46:24 So I have checked the terminal to see the message.
-
00:46:28 So you should be careful if some error happens.
-
00:46:31 Make sure to check the terminal to see what is happening behind the scenes, and
-
00:46:36 now it is not able to restart.
-
00:46:39 Therefore, I will close all of the terminals and start with a different port.
-
00:46:46 To do that, you need to go to the terminals tab, shut down all and change the webui user.sh
-
00:46:53 file: change the port from here.
-
00:46:56 Save and restart.
-
00:46:57 Okay, I wrote a simple prompt like this: photo of ohwx man with 1.2 emphasis. You can learn about emphasis
-
00:47:05 from the Automatic1111 wiki page.
-
00:47:09 Just pause the video and read here if you don't know.
-
00:47:12 And digital painting, artstation, masterpiece.
-
00:47:14 I don't have any negative prompts.
-
00:47:17 The picture is not exactly like me.
-
00:47:19 So now we are ready to do test and see if model is trained enough.
-
00:47:23 If it is not trained enough, then go to the DreamBooth tab, select the model load settings
-
00:47:29 and continue training.
-
00:47:31 It will continue training for the number of steps that you have defined in here.
-
00:47:36 Okay, I started continue training and it will start from this model revision which means
-
00:47:42 it will start from 3200 steps and it will continue to do training for number of epochs
-
00:47:50 that we have defined here.
-
00:47:52 However, my Gradio has crashed once again, and I am able to see the continuing training from
-
00:47:59 here.
-
00:48:00 Now let's test the current checkpoints and see whether they are trained enough or not
-
00:48:05 and decide upon that to continue training or not.
-
00:48:08 However, since my Gradio has crashed, I have to restart the terminal because there is no
-
00:48:14 way to cancel the training right now.
-
00:48:16 Let me check... yes, there is no way.
-
00:48:18 Okay, I did a restart.
-
00:48:20 So how are we going to test different checkpoints?
-
00:48:24 Prompt emphasis, and CFG values.
-
00:48:26 Go to the bottom, pick x/y/z plot and in here you see there are different type of parameters.
-
00:48:33 So first parameter will be checkpoint name.
-
00:48:36 When you click this icon it will paste the available checkpoints.
-
00:48:40 I am going to start picking from 1600 steps which means 80 epochs for me.
-
00:48:46 It depends on your training dataset size, and I will test the remaining ones as well like
-
00:48:52 this.
-
00:48:53 It is also displaying the calculated hash value.
-
00:48:56 Okay, as a second thing, I am going to test prompt strength.
-
00:49:00 To do that, I am going to use prompt s/r.
-
00:49:02 So I am going to give this any keyword, like prsr.
-
00:49:07 So the first value here will be prsr.
-
00:49:10 Then I will type the prompt strengths like 1.1, 1.2, 1.3, and let's also try 1.0, 1.4, 1.5,
-
00:49:20 1.6, and 1.7. Okay, as a third comparison, I am going to test the CFG value.
-
00:49:28 So for CFG values, I am going to test seven, seven point five, eight, eight point five,
-
00:49:34 nine, nine point five and ten.
-
00:49:37 If you keep -1 for the seed, then you won't be able to compare them very well.
-
00:49:42 So do not check this checkbox; it will then use the same seed for all of the comparisons, and then
-
00:49:48 when you click generate it will process all of them.
-
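The grid size implied by this x/y/z setup is simply the product of the three axes. A sketch with the checkpoint list shortened to three hypothetical names for illustration:

```shell
# x/y/z plot: every checkpoint x prompt-strength x CFG combination is one image.
CHECKPOINTS="ckpt-1600 ckpt-2400 ckpt-3200"   # hypothetical checkpoint names
STRENGTHS="1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7"   # 8 prompt-S/R values from the video
CFGS="7 7.5 8 8.5 9 9.5 10"                   # 7 CFG scales from the video

COUNT=0
for c in $CHECKPOINTS; do
  for s in $STRENGTHS; do
    for g in $CFGS; do
      COUNT=$(( COUNT + 1 ))
    done
  done
done
echo "$COUNT images"   # 3 x 8 x 7 = 168
```

With the full checkpoint list from 1600 to 3200 steps, the grid grows accordingly, which is why the run takes a while.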
00:49:52 You can see the process in the command line interface.
-
00:49:56 Now meanwhile this is running I will start my 2.1 version RunPod as well.
-
00:50:02 Okay, it says that there is no available GPU for this RunPod right now so I can start it
-
00:50:09 without a GPU and transfer my files with runpodctl.
-
00:50:14 However, I do not have any files on it so I will just delete it because I didn't even
-
00:50:20 start it yet and I will start a new one.
-
00:50:22 Okay, I am going to use this one.
-
00:50:26 Select the template from here.
-
00:50:27 I will pick Stable Diffusion 2.1 version: I will start with 100 gigabytes, deploy my
-
00:50:34 pods, it is being initialized, and my other pod is currently working at this kind of
-
00:50:41 it/s.
-
00:50:41 By the way, xformers is still not enabled right now, so if you enable it, this will
-
00:50:46 become even faster.
-
00:50:47 But for training, make sure that you have disabled it.
-
00:50:50 And images are being generated in here.
-
00:50:52 We will download all of them and check all of them later.
-
00:50:56 Okay, 2.1 version is being generated and getting ready.
-
00:51:01 Okay, 2.1 is now ready.
-
00:51:03 Just click connect.
-
00:51:05 Connect to the Jupyter.
-
00:51:06 Okay, it says that it cannot connect yet so it is probably still not ready.
-
00:51:10 Let's wait.
-
00:51:11 Try again.
-
00:51:13 Okay, let's refresh the page.
-
00:51:14 Maybe the URL is incorrect.
-
00:51:17 Yes, after the refresh I think it is fixed or it is just started.
-
00:51:21 So just be patient a little bit.
-
00:51:23 It is getting loaded and yes, 2.1 version is started.
-
00:51:27 It is exactly the same as the previous one.
-
00:51:31 We are editing the command line arguments here.
-
00:51:33 I will add --share so I can use it as I want.
-
00:51:37 And I will also remove xformers because it is preventing training.
-
00:51:42 I will set the port as 3001.
-
00:51:45 Save it.
-
00:51:46 Then there is no open terminals.
-
00:51:49 Let's open a new launcher terminal python relauncher.py Our comparison on SD 1.5 trained
-
00:51:57 models are continuing.
-
00:51:58 Okay, 2.1 RunPod is ready.
-
00:52:02 Let's start it.
-
00:52:03 Okay, currently selected model is 2.1 version.
-
00:52:06 Let's test it.
-
00:52:07 Okay, I have written my prompt; the output resolution is 768 by 768.
-
00:52:14 Looks like we got a problem.
-
00:52:17 It says that a tensor with all NaNs was produced in Unet.
-
00:52:21 So we need to add the --no-half argument to the command line, because otherwise
-
00:52:28 it won't work with this graphics card.
-
00:52:29 So let's go back to the RunPod.
-
00:52:31 Open the webui-user.sh file.
-
00:52:34 So for SD 2.1 version, make sure that you are using these command line arguments.
-
00:52:41 These may be necessary for some of the custom models as well.
-
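As a hedged sketch, the SD 2.1 arguments being referred to would look something like this in webui-user.sh; --no-half and --precision full are the ones named in the video, while --share and the port come from the surrounding steps and may differ in your setup:

```shell
# webui-user.sh fragment for SD 2.x on cards that produce NaNs at half precision
export COMMANDLINE_ARGS="--share --port 3001 --no-half --precision full"
```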
00:52:44 So check the messages that you see in here.
-
00:52:47 This message should be available also in the terminal window.
-
00:52:51 Yes, you can also see the error in here as well.
-
00:52:55 So I will close the terminal and restart it.
-
00:52:58 Currently I am spending 0.669 dollars per hour.
-
00:53:03 Both of my RunPods are running right now.
-
00:53:06 Okay, it looks like I have mistyped the --precision argument.
-
00:53:12 So it says that argument --precision expected one argument.
-
00:53:15 I will just fix it quickly.
-
00:53:17 To fix it, I am opening the file and setting --precision as full, saving
-
00:53:24 it and restarting.
-
00:53:26 Make sure that you only have one active running terminal, otherwise other terminals will also
-
00:53:32 consume your VRAM.
-
00:53:35 You can also see the VRAM memory usage in your My Pods tab and you can see the logs
-
00:53:41 from here.
-
00:53:42 This is really important to debug the errors.
-
00:53:44 Okay, it is started with these command line arguments exactly like this.
-
00:53:49 Let's open the Gradio window.
-
00:53:51 Okay, let's hit generate with our written prompt and it is getting generated.
-
00:53:57 And we got our tank image.
-
00:54:00 Now I will install the extension exactly as I did before.
-
00:54:04 Okay, 2.1 version is ready with DreamBooth now.
-
00:54:07 Go to the DreamBooth tab, make a new model I will name as test select the source checkpoint.
-
00:54:13 Uncheck 512 model.
-
00:54:14 Hit create.
-
00:54:15 The first time you click create, it downloads the necessary files, the same as
-
00:54:19 before because this is a new RunPod so they are not connected.
-
00:54:23 This is a fresh installation and checkpoint successfully extracted so it is ready.
-
00:54:29 Okay, we didn't get any error so we can continue.
-
00:54:32 So for 2.1 version usually you need more epochs so I will set this as 300.
-
00:54:38 However, now it will also use more space.
-
00:54:42 Due to more epochs, I need to reduce the save model frequency.
-
00:54:47 I think I will save it for every 20 epochs.
-
00:54:50 Batch size one, gradient accumulation one, class batch size will be four.
-
00:54:55 I am not going to set gradient checkpointing.
-
00:54:58 You can also leave it as default learning rate.
-
00:55:01 This would make it learn faster, however, it may also not learn very well or it may
-
00:55:07 get over trained quickly.
-
00:55:09 So I will make this as one.
-
00:55:10 But you can also leave it as default.
-
00:55:13 So the other things are same.
-
00:55:14 Now with 2.1 version, I don't know if 24 gigabytes will be enough without xformers when we use
-
00:55:20 EMA so I will test it.
-
00:55:23 Okay, it says let's make it like this.
-
00:55:26 Actually, we should click performance wizard so it will set the optimal ones for us.
-
00:55:32 Okay, okay, I am leaving the settings like this.
-
00:55:35 Let's also set the memory attention as default and let's see if it will work.
-
00:55:39 By the way, we also need to re-upload our training images and these training images
-
00:55:44 have to be 768 pixels because this model is a 768-pixel model.
-
00:55:52 So to upload them I am following just the same things.
-
00:55:56 Here my 768 pixel images.
-
00:55:59 I'm just going to use drag and drop but you can use runpodctl as well as just I have displayed.
-
00:56:06 Okay they are ready.
-
00:56:07 So I am right clicking copy path, paste it, adding a backslash to the beginning.
-
00:56:12 Copy this and let's say class 768.
-
00:56:17 All other settings are same.
-
00:56:19 Ohwx man, photo of man, photo of ohwx man and I will use only 12 images because I want
-
00:56:28 training to start quickly but you should use bigger number.
-
00:56:32 I am checking generate ckpt when saving during training, click save settings, and hit train.
-
00:56:39 So it will start with generating class images.
-
00:56:42 So for each image we are generating 12.
-
00:56:44 Okay, we got an error.
-
00:56:46 Therefore, we need to decrease the class batch size.
-
00:56:49 Let's hit train again.
-
00:56:50 Okay, looks like our Gradio is killed, therefore it has to be restarted.
-
00:56:55 You may get these errors.
-
00:56:56 Okay, during restart, it is throwing an error because the port is still in use.
-
00:57:01 So I am going to close the terminal, change the port and restart myself manually.
-
00:57:07 Okay, restart has been completed.
-
00:57:09 Let's go to the DreamBooth, select model load settings, just quickly verify settings.
-
00:57:14 I am unchecking this because it is usually problematic.
-
00:57:17 Class batch size is two and let's hit train.
-
00:57:21 You can also generate classification images from text to image directly yourself.
-
00:57:25 Cut the generated images and put them into a new folder.
-
00:57:28 Okay, we got error once again.
-
00:57:30 This is a memory error actually.
-
00:57:32 When we check the command line interface, we can see the memory error.
-
00:57:37 So looks like our only option is class batch size one.
-
00:57:40 Let's click train.
-
00:57:41 Okay, it is working.
-
00:57:42 However, this will be very slow.
-
00:57:44 So what am I going to do?
-
00:57:45 I will enable xformers, manually generate from text to image and use them as classification
-
00:57:51 images which will save our time significantly.
-
00:57:55 So follow me how am I doing.
-
00:57:57 First, I will just terminate the terminal from here.
-
00:58:00 I will add dash dash xformers, change the port and restart python relauncher.py I would
-
00:58:08 also clear text to images tab so you can directly use it so I will just rename it.
-
00:58:14 It will generate a new folder for me and new app is started with xformers.
-
00:58:19 Let's open it!
-
00:58:20 So our class prompt is photo of man.
-
00:58:23 I am typing photo of man.
-
00:58:25 I am going to set the sampling steps to 30, which is decent enough, and I am leaving
-
00:58:30 all other options the same, and I will use a batch size of eight. How many images in total
-
00:58:37 do you need?
-
00:58:38 Let's say 50 images per training image; since I have nine images, I am going to generate
-
00:58:44 450 images.
-
00:58:46 Therefore I need to set the batch count to at least 57, then hit generate, and let's see if we will
-
00:58:53 get out of memory error.
-
00:58:55 And you see, from the text-to-image tab, we are not getting an out of memory error even when
-
00:59:01 the batch size is eight.
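The class-image arithmetic above can be sketched like this (numbers from the video: nine training images, 50 class images each, generated in batches of eight from the txt2img tab):

```shell
# Class-image count math: round the batch count up so at least 450 images are produced.
instance_images=9
class_per_instance=50
batch_size=8
needed=$((instance_images * class_per_instance))          # 450 images required
batch_count=$(( (needed + batch_size - 1) / batch_size )) # 57 batches (rounded up)
generated=$((batch_count * batch_size))                   # 456 images actually produced
echo "$needed $batch_count $generated"
```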
-
00:59:03 So it will very quickly generate all of these images for us, much faster than the classification
-
00:59:11 image generation inside the DreamBooth extension.
-
00:59:15 If you wonder why it is generating images like this, or why we are using these kinds of
-
00:59:19 images, I explain all of it in this video: we are keeping the underlying contextual
-
00:59:25 data of the model.
-
00:59:27 You could also use more beautiful images in your classification training data set.
-
00:59:32 However, it would break your model's conceptual meaning, so your model would become more biased
-
00:59:40 toward the images that you have used.
-
00:59:42 Also, the generated face would be biased toward the images that you use.
-
00:59:46 With this methodology, we are using the underlying contextual knowledge of the model and we are
-
00:59:53 trying to keep it as much as possible.
-
00:59:55 However, this is up to you.
-
00:59:58 So if you use all handsome images, all full-color, professional, real images, then your
-
01:00:06 model would become more biased toward them.
-
01:00:08 This is how custom models are usually made.
-
01:00:12 They are cooked toward those kinds of images.
-
01:00:15 So whatever you type, you are getting beautiful images because all of the other underlying
-
01:00:20 conceptual data of the model is lost during the training.
-
01:00:25 Actually, according to the ControlNet developer, the SD 2.1 version is inferior to SD 1.5 due
-
01:00:35 to the CLIP model it uses.
-
01:00:36 You can pause the video right now and read this.
-
01:00:39 Okay, looks like our 1.5 version experiment has ended.
-
01:00:44 Let's go to the outputs and in here there are text to image grids and you see there
-
01:00:51 is a grid file.
-
01:00:53 35 megabytes.
-
01:00:54 Let's open it.
-
01:00:55 Actually I will download this and there is also 228 megabytes.
-
01:01:00 So for downloading let's use the runpodctl.
-
01:01:04 I am going to open a new command line in here.
-
01:01:07 runpodctl send, followed by the text-to-image grids folder.
-
01:01:11 Hit enter and it will generate download link for us.
-
01:01:15 Go to the folder where you want to download.
-
01:01:17 I will download it in here: type cmd, then copy and paste the receive command like this.
-
01:01:22 So it is going to download 265 megabyte grid output.
-
01:01:27 This is much faster than downloading from the Jupyter notebook.
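A minimal sketch of the two-sided transfer, assuming runpodctl is installed on both machines and the send side is run from the A1111 outputs folder (folder name below is the Web UI default; the one-time code is made up):

```shell
# On the pod, inside the outputs folder, you would run:
#   runpodctl send txt2img-grids
# It prints a one-time receive command containing a code, e.g.:
CODE="8338-galileo-collect-fidel"         # hypothetical one-time code
RECEIVE_CMD="runpodctl receive $CODE"     # paste this on your PC, in the target folder
echo "$RECEIVE_CMD"
```

The code is single-use, so each transfer prints a fresh receive command.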
-
01:01:31 Okay, the grid images are downloaded.
-
01:01:34 And in here this is the newest grid image that is generated.
-
01:01:39 It is over 200 megabytes, it is over 35 000 pixels and now we are able to compare different
-
01:01:47 checkpoints with different prompt emphasis and with different CFG scale.
-
01:01:53 So this is for CFG scale 7.
-
01:01:55 These are the checkpoints and these are the prompt emphasis.
-
01:02:00 Let's find the best one that we like and that is similar to us.
-
01:02:05 You see these faces are not like me but in here I am seeing faces like me.
-
01:02:12 So with prompt strength 1.4 in these checkpoints, I am starting to get a face similar to
-
01:02:19 mine.
-
01:02:20 I think this one is very similar to me.
-
01:02:22 So with prompt strength 1.4 for CFG scale 7 and for checkpoint 3000 steps.
-
01:02:30 Yeah I like it.
-
01:02:31 So you should also compare for yourself.
-
01:02:34 And after prompt strength 1.4 the image becomes very very bad.
-
01:02:39 So let's also look at the other CFG scales and checkpoints.
-
01:02:44 Okay now I will show you slowly what is happening from CFG scale 10 to 7 and this is the prompt
-
01:02:51 strength 1.4.
-
01:02:53 This is how the images are changing.
-
01:02:55 This would of course depend on your training data set, how it is trained and I can see
-
01:03:01 that they are not very good at all because we also didn't use any negative prompts.
-
01:03:08 Our aim here is finding the sweet spot of prompt strength, the checkpoint, and possibly
-
01:03:16 the CFG.
-
01:03:18 Okay I think this model is still not trained enough.
-
01:03:22 Because it is performing best only at 1.4 strength and the 3200-step checkpoint.
-
01:03:31 So therefore I will train this model even further with more steps and then do another
-
01:03:38 experiment.
-
01:03:39 However, currently we could use 1.4 strength with checkpoint 3200.
-
01:03:45 I suggest you test --no-half and --precision full training for the SD 1.5 version as well, without
-
01:03:54 xformers, and compare whether it learns better or not.
-
01:04:00 Depending on the graphics card used, this could make a difference, and you can test using
-
01:04:06 8-bit Adam or not.
-
01:04:08 You can test mixed precision off versus fp16 and bf16, so all of these things could improve
-
01:04:16 your training success rate.
-
01:04:18 You should experiment with them and currently I do not have time to test all of them.
-
01:04:24 I am showing some of the settings that are widely used, but you should also experiment
-
01:04:30 with them.
-
01:04:31 Like options like this or like this or like this or like this.
-
01:04:36 Now I will show you how to download custom models from CivitAI.com and use them on your
-
01:04:43 RunPod.io pod.
-
01:04:44 So I am going to show an example with Protogen x3.4.
-
01:04:49 Right click the download latest button, copy the link, go to your RunPod.io Jupyter interface,
-
01:04:57 and in here go to the folder where the model files are placed.
-
01:05:02 So in this folder, models/Stable-diffusion, where you are supposed to put your model files,
-
01:05:09 open a new launcher, open a terminal, type wget, paste the link, and hit enter, and it will start
-
01:05:16 downloading the model file.
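A hedged sketch of that download step; the URL below is a placeholder for the link copied from CivitAI, and giving wget the final filename up front with -O avoids having to rename the file afterwards:

```shell
MODEL_DIR="models/Stable-diffusion"                       # A1111's model folder
MODEL_URL="https://civitai.com/api/download/models/4048"  # placeholder for the copied link
OUT="$MODEL_DIR/protogen_x34.safetensors"                 # final name, chosen up front
# wget -O "$OUT" "$MODEL_URL"    # uncomment inside the pod to actually download
echo "$OUT"
```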
-
01:05:18 So you see, 5.6 gigabytes, and you see there is no more space left on my hard drive.
-
01:05:26 What I need to do is delete some of the models.
-
01:05:31 So I am going to delete some of the training checkpoints.
-
01:05:34 They are located inside models, inside Stable Diffusion, inside my training folder and in
-
01:05:41 here I am going to delete some of them.
-
01:05:43 You can also delete a whole directory: right click and delete.
-
01:05:47 You can also select them and hit delete button on your keyboard.
-
01:05:51 Okay, I think we now have sufficient space, so I will just rerun the command.
-
01:05:56 To bring back the last executed command, I just hit the up arrow and hit enter, and now
-
01:06:02 it will start downloading.
-
01:06:03 Currently it will be downloaded in this folder where we had opened this terminal.
-
01:06:10 Let's go back to there.
-
01:06:11 models/Stable-diffusion, and now this file is being downloaded with the name 4048.
-
01:06:19 Then I will rename it.
-
01:06:21 Meanwhile, 2.1 version classification regularization images are still being generated.
-
01:06:26 We can see the progress in its terminal.
-
01:06:30 You see it has generated over 160 images so far.
-
01:06:34 Okay, it is downloading the custom model file with 50 megabytes per second.
-
01:06:39 You can also upload those files from your computer or you can download from Hugging
-
01:06:45 Face as I have shown you already.
-
01:06:48 So this is how you can download files fast on your Pod.
-
01:06:52 Okay, the file has been downloaded and saved as 4048.
-
01:06:57 I will rename it: right click, rename, and let's say protogen x34; it is renamed.
-
01:07:05 Then let's go back to our Stable Diffusion interface.
-
01:07:08 Click, refresh folder.
-
01:07:10 It is not appearing because the model file extension is not correct.
-
01:07:14 Right click,
-
01:07:15 and when renaming, add .ckpt to the end of it like this, and then refresh again.
-
01:07:23 Okay, now we see the model here.
-
01:07:25 Let's test it.
-
01:07:26 Okay, it didn't load even though I selected it.
-
01:07:29 Let's look at the command line interface.
-
01:07:31 Okay, it says that we should add --disable-safe-unpickle because of how we downloaded
-
01:07:38 the file.
-
01:07:39 So I will add this to the command line arguments and restart like this.
-
01:07:44 Let's also change the port.
-
01:07:46 Just close all of the terminals.
-
01:07:47 Okay, the restart has been completed with --disable-safe-unpickle.
-
01:07:51 Let's open the interface.
-
01:07:53 Okay, let's try with protogen.
-
01:07:55 Okay, we got an error once again, because the downloaded file is actually a safetensors file,
-
01:08:02 not a ckpt.
-
01:08:03 Therefore, we have to rename it once again with a .safetensors extension like this and
-
01:08:11 try again.
-
01:08:12 Let's hit refresh.
-
01:08:13 Now there is safetensors.
-
01:08:15 Okay, it is loaded.
-
01:08:16 Let's test it and protogen is working as expected.
-
01:08:20 You see: awesome, intricate, fantastic, castle, in a forest, and this is what I got.
-
01:08:25 Let's run again.
-
01:08:26 And yes, this is definitely protogen.
-
01:08:28 Let me run it on 1.5 version official as well.
-
01:08:32 Okay, 1.5 version is loaded and this is the result on 1.5 version official.
-
01:08:38 So this is how you can use custom models on RunPod io.
-
01:08:43 2.1 image generation is still going on.
-
01:08:45 Now I will show you how to do Textual Inversion training.
-
01:08:49 To do that, let's go to the train tab.
-
01:08:52 By the way, before doing that, let's go to the settings, and in here under training: move
-
01:08:56 VAE and CLIP to RAM when training if possible.
-
01:09:00 You can pick this option to reduce VRAM usage.
-
01:09:03 You can also turn on pin memory for data loader.
-
01:09:06 Makes training slightly faster, but it can increase memory usage.
-
01:09:09 You can also pick this depending on your machine's RAM memory.
-
01:09:13 However, since we have 24 gigabytes, I am not going to pick them.
-
01:09:17 So let's give the name as test; the initialization text is none.
-
01:09:21 Number of vectors is two.
-
01:09:24 You can watch my excellent how to do Stable Diffusion Textual Inversion video.
-
01:09:30 I explain it in great detail in that video, and you can learn many things related
-
01:09:37 to Textual Inversion from it.
-
01:09:40 Hit create embedding and it is already created.
-
01:09:43 Let's go to the train tab, pick the embedding.
-
01:09:46 We also need to set dataset directory.
-
01:09:49 So our data set directory is like this.
-
01:09:52 We don't need classification images for Textual Inversion training.
-
01:09:56 You can reduce the learning rate or leave it as default.
-
01:10:00 You can test it.
-
01:10:01 Okay, we need a prompt template file for Textual Inversion.
-
01:10:06 When you watch this video, you will understand it better.
-
01:10:10 So this text file is located inside Stable Diffusion, inside Textual Inversion templates.
-
01:10:16 In here, I'm going to edit the none template so it reads [name].
-
01:10:20 You need this otherwise it won't work.
-
01:10:22 This is the name of the Textual Inversion.
-
01:10:24 This will basically use the unique token that it generates, so I'm going to pick
-
01:10:30 none from here.
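The template edit described above can be sketched as shell; the path below is a demo stand-in for the Web UI's textual_inversion_templates folder, and the file is overwritten to contain just the [name] placeholder, which the Web UI substitutes with the embedding's name:

```shell
# Demo directory instead of the real template folder inside the Web UI install.
TEMPLATE_DIR="/tmp/textual_inversion_templates"
mkdir -p "$TEMPLATE_DIR"
echo '[name]' > "$TEMPLATE_DIR/none.txt"   # [name] expands to the embedding's name
cat "$TEMPLATE_DIR/none.txt"
```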
-
01:10:32 My width and height are 512 pixels.
-
01:10:35 Max number of steps.
-
01:10:37 You can leave it as it is, because it will generate pretty small files; but since we are already
-
01:10:42 using a lot of space, I will delete my older DreamBooth checkpoints from Stable Diffusion
-
01:10:48 Web UI, inside models, inside Stable Diffusion, inside the test2 folder.
-
01:10:54 Okay, for selecting: click the first one, then go to the very bottom and, while pressing the
-
01:11:00 left shift key, click the last one; it will select all of them.
-
01:11:04 Then, while holding the left control key, unpick the ones that you don't want to delete,
-
01:11:10 right click, and hit delete.
-
01:11:12 It will delete all these files and open a space for me.
-
01:11:16 Okay, now we are ready.
-
01:11:17 I want to save checkpoints every 10 epochs.
-
01:11:23 How many training images do I have?
-
01:11:25 I have nine training images.
-
01:11:26 Therefore, one epoch means nine steps.
-
01:11:30 Five epochs means 45 steps.
-
01:11:33 So I am going to save every five epochs.
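The epoch-to-step math above, as a one-liner (nine training images at batch size one means nine steps per epoch):

```shell
train_images=9
steps_per_epoch=$train_images      # batch size 1: one step per image
save_every_epochs=5
echo $(( save_every_epochs * steps_per_epoch ))   # 45 steps between saves
```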
-
01:11:36 I don't need this and I will pick deterministic.
-
01:11:40 This is the best option and we are ready.
-
01:11:43 Just hit train embedding.
-
01:11:45 Okay, it has started training.
-
01:11:48 By the way currently xformers is enabled.
-
01:11:51 Therefore, I will disable it and restart again because there is a bug as I have just shown
-
01:11:58 and it is preventing good training.
-
01:12:01 Also in settings this is unchecked.
-
01:12:04 Use cross attention optimizations but still it could be using it due to a bug.
-
01:12:10 So best thing is just disabling the xformers and restarting the training.
-
01:12:16 However, it looks like it is learning right now, I think.
-
01:12:19 So probably there is no bug for this one unlike the DreamBooth.
-
01:12:24 The loss rate is also pretty low and it is pretty fast.
-
01:12:28 Okay, it already started learning my face.
-
01:12:32 Not very good, but there is a resemblance as you can see, and it is really, really fast; the
-
01:12:38 number of steps it is taking goes by really, really fast.
-
01:12:41 This is how fast it is you see.
-
01:12:44 Training Textual Inversion: epochs, training speed, the iterations per second, and it is learning.
-
01:12:52 However, which one will be best needs to be checked from the text-to-image tab with an x/y
-
01:13:00 plot, and as you can see it is learning.
-
01:13:03 So all these samples are being saved inside.
-
01:13:07 Let's go to the Stable Diffusion Web UI folder; inside it, textual inversion; inside that
-
01:13:13 you will see the training date, and inside that the name of the Textual Inversion training;
-
01:13:17 inside that, images, and these are the images named with the epoch number.
-
01:13:24 You can check them like this, or you can download them and check all of them.
-
01:13:29 Okay, 2700 steps looks a little bit decent.
-
01:13:34 It is actually equal to 300 epochs.
-
01:13:38 Maybe it will get better over time, or we may need to use a higher vector count, but since I
-
01:13:44 am just trying to explain, I will use this and show you how you can use this checkpoint
-
01:13:51 in your queries in your text to image tab.
-
01:13:54 First I will cancel the training.
-
01:13:55 This one also looks like a decent one.
-
01:13:59 Hit interrupt: yeah.
-
01:14:01 3240 is also looking decent, so it may get even better over time as we do more training, but
-
01:14:08 I don't have too much time.
-
01:14:10 Okay, so to be able to use these embeddings first, we need to copy the generated pt file
-
01:14:16 which is the checkpoint.
-
01:14:18 To do that, go to the Textual Inversion inside your main folder, go to the date that you
-
01:14:23 did training, go to the training name, go to the embeddings, and in here you will see
-
01:14:28 the .pt files.
-
01:14:30 Pick the checkpoints that you want to test, right click, copy, then go back to the main
-
01:14:36 installation folder, and in here you will see the embeddings folder; paste them there like
-
01:14:42 this, so it is pasted now here.
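The copy step above as a shell sketch; the base path, the date, and the checkpoint name are made up for illustration, and the files are created first so the sketch is self-contained:

```shell
BASE="/tmp/sd-webui-demo"    # stand-in for the Web UI install folder
mkdir -p "$BASE/textual_inversion/2023-02-26/test/embeddings" "$BASE/embeddings"
touch "$BASE/textual_inversion/2023-02-26/test/embeddings/test-2700.pt"   # fake checkpoint
# Copy the chosen checkpoint into the embeddings folder the Web UI reads from:
cp "$BASE/textual_inversion/2023-02-26/test/embeddings/test-2700.pt" "$BASE/embeddings/"
ls "$BASE/embeddings"
```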
-
01:14:44 So to activate this Textual Inversion, we are going to type it like this.
-
01:14:50 By the way, there is one very important thing when you do training, it will train based
-
01:14:56 on the model selected here.
-
01:14:57 Therefore this will be most compatible with this selected model; just hit generate,
-
01:15:04 and you see our trained subject's face is generated.
-
01:15:07 Now we can try stylizing.
-
01:15:09 Okay, I did a simple test: awesome, intricate, 3d artstation, cinematic lighting, and generated
-
01:15:16 with batch size eight, and these are the generated images.
-
01:15:20 So with better prompting it should be possible to get better results.
-
01:15:25 You can do the same training on Protogen or any other custom model as well; just select it
-
01:15:31 from here, make a new embedding, and do the training.
-
01:15:35 The Textual Inversion training works pretty decently on custom models as well.
-
01:15:40 However, custom models are not working very well with DreamBooth training.
-
01:15:44 Okay, so our image generation for classification data set for SD 2.1 is completed.
-
01:15:52 Now we will put them into the correct folder; all of the images are currently inside
-
01:15:59 this folder.
-
01:16:00 How am I gonna do that?
-
01:16:01 I will right click, cut, then I will go to the workspace, right click, paste, and then
-
01:16:07 I will rename it as class 768 version 2, like this.
-
01:16:14 Then I will go to the DreamBooth tab, I will open my test, load settings, go to the settings
-
01:16:21 and in here I will set the concept's classification data set directory to class 768 version 2,
-
01:16:30 and now I have 50 images per instance.
-
01:16:34 Okay, everything else is the same.
-
01:16:36 Just save settings and hit train, and let's see if we will get an out of memory error or
-
01:16:41 not.
-
01:16:42 So it is preprocessing class images.
-
01:16:44 We can see in the command line interface, okay, uh, it looks like Gradio, our
-
01:16:51 web app, has been killed.
-
01:16:52 Therefore, we need to restart it.
-
01:16:55 By the way, we also need to disable xformers, otherwise it won't work for training.
-
01:16:59 So I am disabling xformers, saving, closing all of the terminals and starting a new instance
-
01:17:06 of the web ui.
-
01:17:08 Okay, restart is done.
-
01:17:09 You see, these are the command line arguments that I have used to start the 2.1 version Web
-
01:17:16 UI. Let's open it.
-
01:17:18 Go to the DreamBooth select model, click load settings.
-
01:17:22 Just verify settings quickly if they are correct or not.
-
01:17:25 Okay, all looking good and let's click train to see how it works.
-
01:17:30 Okay, preprocessing class.
-
01:17:32 Let's also see the cmd window from here.
-
01:17:35 Okay, you see it says nothing to generate, because we already have a sufficient number
-
01:17:40 of classification images in our folder: 456, and we need 450 images.
-
01:17:47 So it is caching right now.
-
01:17:49 Okay, after caching it is killed once again and trying to relaunch.
-
01:17:55 Okay, we got an out of memory error, so we need to enable some more memory optimization,
-
01:18:03 and I already unchecked the EMA.
-
01:18:07 Therefore, looks like we need some more optimization.
-
01:18:10 So I will pick fp16, but we are not using mixed precision so it is probably being ignored.
-
01:18:17 What else can we do for more optimization?
-
01:18:21 Gradient checkpointing yes, we can do this and let's save settings, load settings, and
-
01:18:28 hit train once again.
-
01:18:30 Okay, looks like I had to refresh and load settings again.
-
01:18:34 Hit train. Okay, yeah, it says that a change in precision was detected.
-
01:18:39 Please restart the Web UI entirely to use the new precision.
-
01:18:43 All right, so we will restart it.
-
01:18:46 Okay.
-
01:18:47 Restart is done.
-
01:18:48 Let's go to DreamBooth select model load settings and now gradient checkpointing enabled.
-
01:18:54 Use 8-bit Adam, fp16, memory attention default, cache latents, and let's see if we will get
-
01:19:02 any error or not.
-
01:19:03 Okay, training started this time.
-
01:19:05 I hope we don't get any error during preview generation because it also uses GPU and we
-
01:19:11 can see our GPU is already at 95 percent utilization.
-
01:19:16 You can also see other utilization parameters here: volume, container; and this is my other
-
01:19:21 running pod, and this is how much I have spent and how much I am spending.
-
01:19:26 So now I will show you how to install ControlNet on SD 1.5 version.
-
01:19:33 If you don't know what ControlNet is and how to install and use it,
-
01:19:37 I already have a great tutorial on my channel.
-
01:19:40 So this is the extension that we are going to install.
-
01:19:42 Copy the extension URL.
-
01:19:45 You can also find this in the description.
-
01:19:47 Go to the extensions tab, go to install from URL, paste it, and click install.
-
01:19:53 Then once it is installed, go to the installed tab, apply and restart UI.
-
01:19:58 After we clicked it, unfortunately Gradio died again.
-
01:20:01 So I will relaunch it and since I am not going to do any training, I am enabling xformers
-
01:20:07 once again because it will speed up my image generation.
-
01:20:11 Okay, after restart, go to the text to image tab and in the bottom you should see ControlNet
-
01:20:16 like this.
-
01:20:17 Now we need to download a ControlNet model, which is hosted on Hugging Face, in here.
-
01:20:24 Go to the files and versions and just download whichever model you want to use,
-
01:20:29 because each model file is around five gigabytes.
-
01:20:32 I'm going to show scribble as an example.
-
01:20:35 All the others are exactly the same, and when you watch this video you will learn more about
-
01:20:40 them.
-
01:20:41 Okay, right
-
01:20:43 click the download button, copy the link path, and go to your RunPod.
-
01:20:46 So these files will be put inside another folder.
-
01:20:50 Go to extensions, go to sd-webui-controlnet, go to models.
-
01:20:55 We are going to put them inside here.
-
01:20:58 So in here I will open a new launcher, open a terminal, type wget, copy and paste the link, and
-
01:21:05 you see it has started downloading the file from Hugging Face at an incredible speed.
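That download step sketched as shell; the URL points at the scribble model in the lllyasviel/ControlNet repo linked in the description, and the wget line is left commented so nothing five gigabytes large is fetched by accident:

```shell
DEST="extensions/sd-webui-controlnet/models"   # the extension's model folder
URL="https://huggingface.co/lllyasviel/ControlNet/resolve/main/models/control_sd15_scribble.pth"
# wget -P "$DEST" "$URL"    # uncomment inside the pod to download (~5 GB)
echo "$DEST/$(basename "$URL")"                # where the file will land
```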
-
01:21:10 Meanwhile, I will show something else: how you can download your trained models to your
-
01:21:15 computer.
-
01:21:17 So to download your trained DreamBooth model, go to the models, go to the Stable Diffusion,
-
01:21:23 go to the training and let's say you want to download this ckpt.
-
01:21:27 You can right click and download.
-
01:21:30 Or you can use runpodctl, as we have already shown multiple times.
-
01:21:34 But let's show it once again: runpodctl send with the checkpoint file's full name, not the
-
01:21:41 directory, and it generated the download command like this. Go to the folder where
-
01:21:47 you want to download.
-
01:21:48 So let's say I want to download here.
-
01:21:50 Open cmd, right
-
01:21:51 click, paste, and hit enter, and that model file will be downloaded to your computer
-
01:21:57 at a great speed, like this, as you can see.
-
01:22:00 It is downloading with 70 megabits per second and my maximum internet is 100 megabits per
-
01:22:06 second.
-
01:22:07 So this will of course totally depend on how other users are currently using the Pod network.
-
01:22:14 Okay, meanwhile ControlNet file is downloaded and saved in the folder.
-
01:22:19 Let's verify it.
-
01:22:20 Go to extensions, sd-webui-controlnet, inside models.
-
01:22:24 I see the pth file.
-
01:22:26 Let's go back to the ControlNet and in here.
-
01:22:29 When we refresh models we should see it.
-
01:22:33 Yes, it is here, and there is also the preprocessor; then upload the file that you want
-
01:22:40 to use into this canvas.
-
01:22:41 I will do a scribble.
-
01:22:43 I am going to use this file.
-
01:22:46 Let's set the canvas width and height like this; also set your target resolution.
-
01:22:50 I will use the native resolution of the provided image which is 866 and 684.
-
01:22:58 Then type your prompt here, and you can use any model from here.
-
01:23:03 Let's use Protogen model so my prompt is dragon, awesome, intricate, cinematic, artstation.
-
01:23:08 Let's type some negatives: low, bad, worse.
-
01:23:11 Hit generate.
-
01:23:12 Okay, we didn't get the output because we didn't enable the ControlNet.
-
01:23:16 Don't forget that.
-
01:23:18 And don't forget to check scribble mode and invert colors; and now this is the map it
-
01:23:25 generated, and this is the output we got.
-
01:23:28 So you can play with different prompts and different models and generate different images.
-
01:23:35 It works pretty fast and pretty accurately.
-
01:23:38 Just watch this video to learn more.
-
01:23:39 Actually, I have another ControlNet video as well, which is based on the natively released
-
01:23:45 scripts from the official author.
-
01:23:47 You can also watch this video to learn even more about ControlNet.
-
01:23:51 Our SD 2.1 version training is going on.
-
01:23:55 However, it looks like there are some problems because generated image is not correct.
-
01:24:00 Okay, I have done a lot of research, and it looks like there is no way to do SD 2.1 version
-
01:24:07 768 pixels training with DreamBooth without using xformers.
-
01:24:15 I wanted to avoid xformers during training because it reduces the quality of the training.
-
01:24:21 However, 24 gigabytes VRAM is just not enough.
-
01:24:24 So we need to downgrade the xformers version to 0.0.14. I already have an excellent tutorial
-
01:24:33 video for that for the Windows installation, so now I will show it on Linux on RunPod.
-
01:24:40 Alternatively, you can go to browse servers, and in here you can deploy a pod with 48
-
01:24:49 gigabytes of VRAM or 40 gigabytes of VRAM.
-
01:24:52 It is up to you, but they cost more.
-
01:24:55 Therefore, we will just downgrade the xformers version.
-
01:25:00 Now, follow me very carefully to learn how to downgrade xformers on RunPod io.
-
01:25:07 First close all of the running kernels and terminals.
-
01:25:11 Then, inside the python3.10 folder, start a new terminal.
-
01:25:16 First, we are going to run this command.
-
01:25:19 pip uninstall torch torchvision.
-
01:25:22 Paste it and hit yes and hit yes.
-
01:25:25 Okay, it is uninstalled.
-
01:25:26 Then we are going to run pip uninstall torchaudio.
-
01:25:31 Paste it.
-
01:25:32 Okay, it is done.
-
01:25:33 Then we are going to use pip uninstall xformers.
-
01:25:36 Hit yes and it is done.
-
01:25:39 You know,
-
01:25:40 currently I am inside workspace/venv/lib/python3.10.
-
01:25:44 The folder where you are currently located makes a huge difference.
-
01:25:49 Make sure that you are inside the same folder.
-
01:25:51 You can also apply this to SD 1.5 version as well.
-
01:25:56 It is just same thing.
-
01:25:57 Then we are going to install torch and torchvision.
-
01:25:59 Just copy this and paste it and hit enter.
-
01:26:02 Okay, I got error.
-
01:26:04 It says that there is no space left on the device, because we started with only five
-
01:26:11 gigabytes of space for the runtime.
-
01:26:14 Therefore, I will stop the pod like this.
-
01:26:17 I will edit the disk space from here.
-
01:26:21 Click here.
-
01:26:22 More actions.
-
01:26:23 Click edit pod, and in here increase the container disk size.
-
01:26:27 Save it, run it, start it, and reconnect to Jupyter lab.
-
01:26:32 Enter the same folder, venv/lib/python3.10, open a terminal, and make sure that you
-
01:26:41 run all of the commands once again to be sure.
-
01:26:44 pip uninstall torch torchvision, hitting yes if they got installed once again; then pip uninstall torchaudio,
-
01:26:52 then pip uninstall xformers.
-
01:26:54 Okay, it is done, then we will install this one.
-
01:26:58 As you can see, I have changed it because this is the one that is working.
-
01:27:02 Copy, paste, and hit
-
01:27:03 enter, and it is going to install.
-
01:27:06 So once the full version of 0.0.17 is released, it will work with DreamBooth.
-
01:27:11 Currently this is a development version as you can see and it is installed.
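The whole downgrade sequence above, consolidated; the uninstall commands are stored and printed rather than executed so you can review them first, and the install line stays commented because the exact dev wheel is the one shown on screen in the video:

```shell
# Run from inside workspace/venv/lib/python3.10, as in the video.
CMDS="pip uninstall -y torch torchvision
pip uninstall -y torchaudio
pip uninstall -y xformers"
echo "$CMDS"
# After the uninstalls, reinstall torch/torchvision, then the xformers dev build
# mentioned in the video, e.g.: pip install xformers==0.0.17.dev448
```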
-
01:27:16 Now we are ready to run our web UI as usual and it should support DreamBooth training
-
01:27:22 with xformers.
-
01:27:23 Before starting, I am going to edit the command line arguments: --xformers,
-
01:27:29 and I am going to add back full precision: --no-half, --precision
-
01:27:37 full, and --no-half-vae.
-
01:27:41 Save it, run on a different port, shut down all of the terminals, start a new terminal,
-
01:27:47 relaunch the Web UI like this.
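The flag set described above, gathered in one place; the port number is arbitrary, and how the flags are passed depends on the template's relauncher script:

```shell
ARGS="--xformers --no-half --precision full --no-half-vae --port 3001"
# e.g. python launch.py $ARGS   (or put them in the relauncher's args variable)
echo "$ARGS"
```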
-
01:27:49 Okay, so our application is now starting with xformers version 0.0.17.dev448, and these
-
01:27:59 are the torch, torchvision, diffusers, and other versions.
-
01:28:03 Okay, it is started now.
-
01:28:04 Time to test whether it is working correctly or not for SD 2.1 DreamBooth training: okay,
-
01:28:12 I am loading my model, load settings and in here.
-
01:28:16 Let me show you quickly the latest settings.
-
01:28:19 So let's make the amount of time to pause between epochs zero.
-
01:28:23 I will save for every 20 epochs.
-
01:28:26 I am unchecking gradient checkpointing.
-
01:28:28 I will make learning rate as default.
-
01:28:30 Actually, let's try it.
-
01:28:32 Okay, photo of ohwx man by tomer hanuka as the sanity prompt, and in the advanced tab, now this
-
01:28:38 is important.
-
01:28:39 I will use EMA, and for mixed precision, I am going to use fp16.
-
01:28:44 Some cards also support bf16, but to be sure, use fp16.
-
01:28:49 And when you hover your mouse, it also tells you that it is required when using xformers, and
-
01:28:55 in here I am going to use xformers.
-
01:28:56 This is important.
-
01:28:57 Cache latents: okay, then go to the concepts tab.
-
01:29:02 They are set.
-
01:29:03 Everything is looking good and in saving, generate a ckpt file when saving during training
-
01:29:09 and hit train.
-
01:29:10 By the way, we should have clicked save settings before, but I think it is automatically saved.
-
01:29:16 If it doesn't work right away, just click save settings then hit train.
-
01:29:20 Okay, let's watch the terminal.
-
01:29:22 I hope that we won't get any more,
-
01:29:24 uh, out of memory errors.
-
01:29:26 Okay, it is killed so I will test one more time.
-
01:29:30 Refresh the Gradio, DreamBooth select model load settings.
-
01:29:35 Now this time I will set gradient checkpointing, because it looks necessary.
-
01:29:40 fp16, use EMA, and yes, everything is the same; let's try again with save settings.
-
01:29:47 Train: okay, we got another error so this time I won't use EMA.
-
01:29:51 Refresh the interface DreamBooth, model load settings uncheck gradient checkpointing and
-
01:29:58 uncheck use EMA.
-
01:29:59 This is significantly increasing the VRAM usage.
-
01:30:02 Save settings, hit train, okay.
-
01:30:04 Finally, the training has started and now time to wait and see how well it is learning
-
01:30:10 and training.
-
01:30:12 The Gradio is still responsive.
-
01:30:13 That is very good, and it is using this much GPU memory, so you see how much GPU memory
-
01:30:20 usage the EMA option adds when we check it.
-
01:30:27 Meanwhile, SD 2.1 version training continues.
-
01:30:29 I will explain what fine tuning with DreamBooth is.
-
01:30:34 Okay, before I show how to do fine tuning.
-
01:30:37 We got an error during the SD 2.1 version training at 400 steps, which means when
-
01:30:44 it was generating a ckpt from the 20th epoch checkpoint.
-
01:30:49 Therefore, I will restart the training with one parameter changed: load settings,
-
01:30:56 go to the settings tab, and enable gradient checkpointing.
-
01:31:01 The rest is the same, like this, so it should just work fine this time, I think.
-
01:31:07 Save settings, hit train; okay, this time we didn't get any error.
-
01:31:11 During the SD 2.1 version training, we got a sample: yes, somewhat similar.
-
01:31:17 This is the first one at the 20th epoch and we got our sanity prompt as well.
-
01:31:21 This is the loss rate which is very erratic as you can see and this is the VRAM usage
-
01:31:27 like this.
-
01:31:28 Now I can start showing you fine tuning.
-
01:31:30 I have opened my 1.5 version RunPod. So what is different about fine tuning?
-
01:31:38 In the fine tuning we are not going to use classification images and we are going to
-
01:31:44 use file words.
-
01:31:45 Fine tuning is basically using a lot of good images with proper captions and not using
-
01:31:53 any classification images.
-
01:31:54 The rest is same.
-
01:31:56 So every one of the keywords, every one of the tokens in the captions of the images will
-
01:32:02 be trained and they will become like the images that you use for fine tuning.
-
01:32:09 First of all, we need to process image files and add captions to them.
-
01:32:15 So go to the training tab, go to the preprocess images, set the source directory.
-
01:32:20 I don't have a good data set for fine tuning; for a good data set you need a lot
-
01:32:24 of images, so I will use my own pictures that I used for training, and set a destination like
-
01:32:32 training captioned, and in here use BLIP for caption. Change these resolutions if those images are not 512 by
-
01:32:41 512 pixels.
-
01:32:42 If you are going to fine tune the SD 2.1 version with 768 pixels, then you need to change these
-
01:32:49 resolutions as well.
-
01:32:50 You can also crop them with auto focal point crop, but manually cropping and preparing
-
01:32:57 them is better; then click preprocess.
-
01:33:00 The first time you run it,
-
01:33:01 it will download the BLIP model from the internet.
-
01:33:04 Okay, preprocessing has been completed now.
-
01:33:07 Training captioned folder is generated.
-
01:33:10 Now you see there are txt files named the same as the image files.
-
01:33:16 When you open them, you will see this captioning.
-
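As a rough sketch of what this preprocessing output looks like (the folder name and extensions below are my assumptions, not taken from the actual DreamBooth code), each image ends up with a same-named .txt caption file, and a training script can pair them like this:

```python
# Sketch: pair each training image with its same-named BLIP caption file.
# Folder contents and extensions are assumptions for illustration.
from pathlib import Path

def load_caption_pairs(folder: str) -> list[tuple[Path, str]]:
    """Return (image_path, caption) pairs for every image with a .txt sibling."""
    pairs = []
    for img in sorted(Path(folder).glob("*")):
        if img.suffix.lower() not in {".png", ".jpg", ".jpeg"}:
            continue  # skip the caption files themselves
        txt = img.with_suffix(".txt")
        if txt.exists():
            pairs.append((img, txt.read_text().strip()))
    return pairs
```

Images without a caption sibling are simply skipped in this sketch; the real trainer may handle them differently.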
01:33:19 So what does this mean?
-
01:33:21 In fine tuning, all of these words, these tokens, will be improved by the image that
-
01:33:30 has the same name.
-
01:33:32 So all of these words will be improved towards this image.
-
01:33:37 This is what fine tuning is.
-
01:33:39 Let's say you want to improve castle images, then you should have good castle images and
-
01:33:44 inside their description, you should have the word castle.
-
01:33:48 And if you want to associate those pictures with other words such as beautiful, intricate,
-
01:33:53 high quality, then you should also put them.
-
01:33:56 So put here whatever words you want to improve in your model, related to the
-
01:34:03 picture they are associated with, and then once you have prepared good captions and images
-
01:34:09 inside your folder, copy the path of the new folder, go back to your DreamBooth tab and
-
01:34:15 make this setup like this: in concepts, workspace, training captioned as the data directory.
-
01:34:21 Now this is important.
-
01:34:23 In the prompt just type [filewords] and nothing else.
-
01:34:28 This means that whenever it is training that particular image, it will load whatever is
-
01:34:36 written inside here and replace the instance prompt with it.
-
01:34:40 That's it.
-
01:34:41 So this will be equal to this prompt for this particular image that is going to train.
-
01:34:49 For class prompt, we are not using any classification images or class prompt.
-
01:34:53 In the sample prompt, you can use the [filewords] to see what kind of images it is generating
-
01:34:59 and make sure that class images per instance is zero.
-
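A minimal sketch of what the [filewords] placeholder does, as described above (the function and its name are hypothetical, not DreamBooth's actual code): for each training image, the placeholder is replaced by the contents of the image's same-named caption file.

```python
# Hypothetical sketch of [filewords] substitution: the instance prompt
# "[filewords]" is replaced, per image, by that image's .txt caption.
from pathlib import Path

def resolve_prompt(instance_prompt: str, image_path: str) -> str:
    caption_file = Path(image_path).with_suffix(".txt")
    if "[filewords]" in instance_prompt and caption_file.exists():
        return instance_prompt.replace("[filewords]", caption_file.read_text().strip())
    return instance_prompt
```

So if me.txt next to me.png contains "a man wearing glasses", the effective prompt for that image becomes exactly that caption.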
01:35:03 Because we don't want to try to keep the previous context of the model; we want its underlying
-
01:35:10 context, its latent space, to be improved.
-
01:35:14 And that's it, everything else is the same.
-
01:35:16 So for fine tuning you need a lot of good images, good quality images with good captions.
-
01:35:23 Those captions will be improved.
-
01:35:25 It will also improve the Unet of the model, so it will become overall better and overall
-
01:35:30 'cooked', we can say.
-
01:35:33 Because if you show it a smaller number of images than it was trained on, it will lose a lot
-
01:35:38 of the contextual knowledge it has.
-
01:35:42 Therefore, these cooked custom models are not good to train your faces on them because
-
01:35:47 they don't have as much information as the 1.5 pruned ckpt has.
-
01:35:52 For example, this model was trained on 5 billion images, as far as I know.
-
01:35:59 However, those custom models may be trained on 1,000 images, maybe 10,000 images.
-
01:36:06 So their Unet has become like those 10,000 images instead of being trained on 5 billion
-
01:36:14 images.
-
01:36:15 That is why they are so good, but they have much less knowledge in their underlying
-
01:36:21 context in their latent space.
-
01:36:23 So this is basically how fine tuning is done.
-
01:36:27 If you want it to be exactly the same as the official Stable Diffusion training,
-
01:36:33 you can also remove text encoder training by setting this parameter to zero.
-
01:36:40 This way, the tokens won't be improved.
-
01:36:44 Only Unet will be improved.
-
01:36:46 However, you don't want that for fine tuning.
-
01:36:49 This is more like using hundreds of thousands of images and training your model from scratch.
-
01:36:57 So you should keep it at perhaps one and train the Unet as well.
-
01:37:01 So you will train both text encoder and the Unet and improve all of those keywords together.
-
01:37:09 Hopefully I will make another very technical video about how training works, what is Unet,
-
01:37:16 what is text encoder, how they are being changed during training, and it will explain a lot
-
01:37:22 of the questions that are not very well answered in the community.
-
01:37:27 So stay subscribed.
-
01:37:29 Turn on notifications so you don't miss it.
-
01:37:31 So let's check out our 2.1 version training.
-
01:37:34 Okay, our sanity prompt looks like it has already lost its stylizing ability, and the sample
-
01:37:41 is also not looking very good.
-
01:37:43 However, I have seen that it was learning, so let's open the directory.
-
01:37:47 Okay, inside DreamBooth, inside samples, let's look at each one of the samples.
-
01:37:53 So this is the 20 epoch.
-
01:37:54 Yes, it has a resemblance.
-
01:37:56 It is not very good.
-
01:37:57 This is the 40 epoch.
-
01:37:59 Very minor resemblance.
-
01:38:01 Let's check out the sanity prompt.
-
01:38:03 The sanity prompt is much better.
-
01:38:05 So this is somewhat similar to me, but stylized in Tomer Hanuka style.
-
01:38:10 So the sanity prompt of the 60 epoch is not good at all.
-
01:38:14 It lost its stylizing.
-
01:38:17 The sample is also not very related, but this is SD 2.1 so it is harder to train and obtain
-
01:38:23 good images.
-
01:38:24 So you see this is the 80 epoch.
-
01:38:26 This is almost like me.
-
01:38:28 Let me show you for comparison.
-
01:38:31 At 80 epochs, the 2.1 version has started learning my face very well.
-
01:38:37 Let's check out the sanity prompt.
-
01:38:39 However, the sanity prompt also lost its ability to stylize, so our learning rate may be too
-
01:38:46 high.
-
01:38:47 Perhaps we should try half of it.
-
01:38:49 Based on your training data set, the learning rate may change, and the number of steps and
-
01:38:55 epochs that you need for training may change.
-
01:38:58 So it is up to you to do multiple trainings and compare how well they are working with
-
01:39:05 x/y/z plots as I have shown.
-
01:39:07 However, the training is working very well.
-
01:39:10 It is learning the subject very well, so we managed to make it work very well for the SD 2.1
-
01:39:17 version 768 model.
-
01:39:21 Let me show you the parameters once again.
-
01:39:24 So I will slowly scroll down and you will be able to see all of the settings.
-
01:39:30 This totally depends on your learning rate and how many training images you
-
01:39:35 use.
-
01:39:36 You should also save multiple checkpoints during training and compare them: batch size
-
01:39:41 one and gradient accumulation one.
-
01:39:42 If you increase this, it will significantly increase your VRAM usage.
-
01:39:47 Also, we can't say bigger batch size is better.
-
01:39:50 It's a debated topic.
-
01:39:52 Mini batches versus full batches.
-
01:39:54 These two are checked.
-
01:39:55 Otherwise, we get a VRAM error on 24 gigabytes.
-
01:39:58 This is my current learning rate.
-
01:40:01 This may be fast, so you may try half of it or even lower.
-
01:40:05 This is the resolution.
-
01:40:07 This is the sanity prompt to see how well it stylized.
-
01:40:10 So don't check EMA, because you will get a VRAM error even when using xformers.
-
01:40:16 Use 8-bit adam; use fp16 after making sure that it is supported on your graphics card.
-
01:40:22 Use xformers, cache latents, train Unet, train text encoder and these other things are just
-
01:40:29 default.
-
01:40:30 Okay, now I will show you how to install and run Kohya LoRA training, the Kohya GUI, on RunPod.
-
01:40:37 To do that we are going to use the kohya ss linux branch,
-
01:40:41 a fork of the official kohya ss repository.
-
01:40:48 This is modified to run on Linux.
-
01:40:51 So first of all, we are going to clone the repository into our RunPod.
-
01:40:56 So this is my 1.5 RunPod.
-
01:40:58 I am inside workspace.
-
01:41:00 I have closed everything.
-
01:41:02 Open a new terminal, copy paste the git clone command.
-
01:41:05 It will clone into the kohya ss linux folder. Then, to move into kohya ss linux, type cd
-
01:41:12 ko and press tab to autocomplete.
-
01:41:14 Hit enter and now I am inside kohya ss linux.
-
01:41:18 Then we will generate virtual environment folder with this command.
-
01:41:23 Copy it, hit enter inside this folder.
-
01:41:26 Okay, it is generated.
-
01:41:28 Let's also move it in here.
-
01:41:30 Now we will run the next command, which is for activating and entering that virtual
-
01:41:36 environment folder:
-
01:41:37 the source venv command. Copy paste it.
-
01:41:40 Hit
-
01:41:41 enter, and now you see venv here.
-
01:41:44 That means that we are currently running in the newly generated virtual environment
-
01:41:50 folder.
-
01:41:51 Next, we are going to install requirements.
-
01:41:53 This is only necessary one time.
-
01:41:56 The requirements file is located inside here and currently we are also inside that folder
-
01:42:01 so it should work.
-
01:42:02 The requirements installation may take some time.
-
01:42:06 These installations will not affect your other installations because everything being installed
-
01:42:12 here will be only installed inside this folder.
-
01:42:16 Okay, we got an error that says no space left on the drive, so I will just
-
01:42:21 close the RunPod with stop pod, and I will increase the container disk size
-
01:42:26 to 10 gigabytes.
-
01:42:28 To do that, click here, edit pod and run it once again.
-
01:42:31 Start then click connect.
-
01:42:33 Connect to Jupyter lab.
-
01:42:35 Okay, it is still being launched, so be patient.
-
01:42:40 Okay, the notebook started once again, so I will just delete the venv folder: rm -r venv.
-
01:42:49 So I will start from beginning.
-
01:42:51 python -m venv venv, then the source command to activate, then the install command.
-
01:42:58 It will install the requirements.
-
01:43:00 Okay, all requirements have been installed.
-
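The shell steps used here (python -m venv venv, source to activate, then install) can be sketched with Python's standard venv module; the paths below assume a Linux layout, and note that "source venv/bin/activate" only prepends the venv's bin directory to PATH, so a script can instead call the venv's own interpreter by absolute path:

```python
# Sketch of the venv setup described above, using the stdlib venv module.
# Assumes a Linux layout (venv/bin, not Windows' venv\Scripts).
import os
import tempfile
import venv

target = os.path.join(tempfile.mkdtemp(), "venv")
venv.create(target, with_pip=True)  # same effect as: python -m venv venv

# "source venv/bin/activate" just puts this directory first on PATH;
# scripts can invoke the venv's interpreter/pip directly instead:
python_path = os.path.join(target, "bin", "python")
pip_path = os.path.join(target, "bin", "pip")
# e.g. subprocess.run([pip_path, "install", "-r", "requirements.txt"])
```

This is why everything installed this way stays inside the folder: pip writes into the venv's own site-packages, leaving other installations untouched.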
01:43:03 As the author noted here, it requires Python 3.10 and it doesn't work on 3.11; since the RunPod
-
01:43:12 runs on 3.10.9 for Stable Diffusion, it is just fine.
-
01:43:18 Then we will set accelerate config.
-
01:43:20 I am copying this pasting in here.
-
01:43:23 We are still inside that venv folder.
-
01:43:27 So now it will ask us a bunch of questions.
-
01:43:29 Select this machine, hit enter; select no distributed training, hit enter; type no to this question,
-
01:43:36 then type no to this question as well.
-
01:43:39 And type no to this question as well.
-
01:43:42 And type all for this question.
-
01:43:45 And do you wish to use fp16 or bf16?
-
01:43:49 select fp16.
-
01:43:51 It will speed up your training and also use less VRAM.
-
01:43:56 Okay and everything is ready.
-
01:43:58 We are already activated with the source command, so we don't need to run this again.
-
01:44:03 We will just run this command and it should start our GUI.
-
01:44:08 Okay, it is running on localhost, so we need to run it with a shared link
-
01:44:15 to enable a public Gradio link, as we use in the Web UI.
-
01:44:19 Open the kohyagui.py, go to the interface launch call here and add this: comma, share=True. Save
-
01:44:28 it and start it once again.
-
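What this manual edit amounts to can be sketched as a small text patch (the helper function below is hypothetical, written only to illustrate the change; the real edit is done by hand in the file):

```python
# Hypothetical helper: add share=True to the first .launch() call in a
# source string, mirroring the manual edit made in the video.
def enable_share(source: str) -> str:
    if "share=True" in source:
        return source  # already patched, leave it alone
    return source.replace(".launch(", ".launch(share=True, ", 1)

patched = enable_share("interface.launch(server_port=7860)")
# patched is now "interface.launch(share=True, server_port=7860)"
```

With share=True, Gradio creates the temporary public link instead of only serving on localhost, which is why the GUI then becomes reachable from outside the pod.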
01:44:30 So new terminal open first.
-
01:44:33 Activate the venv like this and just run this command and now it has given us a Gradio link.
-
01:44:40 When we run it, the famous Kohya GUI is loaded and ready to do training.
-
01:44:46 The training with it is another topic and I won't cover it in this video.
-
01:44:50 Now I will stop my running RunPods, and when I stop them nothing will happen.
-
01:44:55 They will just remain as they are.
-
01:44:57 I can also start them without any GPU.
-
01:45:00 So from here you can select zero, and you can start your RunPod to back up your data or download
-
01:45:07 your data without using any GPUs.
-
01:45:09 So when you run them on CPU, the disk cost plus CPU usage comes to 0.16 dollars per hour.
-
01:45:18 So it is still costing something.
-
01:45:20 I think it is costing half of the original GPU price.
-
01:45:24 However, sometimes you may not get a GPU.
-
01:45:27 Sometimes all of the GPUs may be full on RunPod, so you will have to run it without
-
01:45:33 a GPU.
-
01:45:35 So this is how you start it without using any GPU and there is also terminate.
-
01:45:41 When you hit terminate it will delete your RunPod permanently.
-
01:45:45 I already said this but I am saying it again.
-
01:45:49 So do not hit the terminate button unless you are 100% sure, because it will delete everything
-
01:45:55 on this RunPod and until you terminate and delete your RunPod it will continue using
-
01:46:02 your credits.
-
01:46:04 Currently I have two RunPods not running and it is using 0.056 dollars per hour.
-
01:46:14 So this is the cost of keeping these two RunPods on my account.
-
01:46:20 And when I delete them you will see this will get decreased.
-
01:46:23 Let's delete first one with terminate pod.
-
01:46:26 Okay and now this should get decreased.
-
01:46:29 Let's go to the my pods.
-
01:46:30 Let's refresh.
-
01:46:32 Okay now you see currently it is decreasing.
-
01:46:35 0.028 dollars per hour.
-
01:46:39 This is charged per minute by the way, not per hour and I will also delete this and it
-
01:46:45 will become zero.
-
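Since billing is per minute, the hourly rates shown here can be converted like this (the rates are the ones mentioned in the video; the rounding in the comments is mine):

```python
# Per-minute cost from the hourly rates mentioned above
# (RunPod bills per minute, not per hour).
def per_minute(hourly_rate: float) -> float:
    return hourly_rate / 60

two_stopped_pods = per_minute(0.056)  # two stopped pods: ~$0.00093 per minute
one_stopped_pod = per_minute(0.028)   # after terminating one: ~$0.00047 per minute

# Keeping one stopped pod around for a full day:
daily = 0.028 * 24  # about $0.67 per day
```

So even stopped pods keep charging for their storage until you terminate them, which is the point being made here.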
01:46:47 And now my credits are remaining as they are until I start another pod.
-
01:46:52 There is one final thing that I want to show you: the cloud sync button here.
-
01:46:57 So with cloud sync you can synchronize your data in your server to these cloud services
-
01:47:04 and there is a great tutorial on the RunPod blog.
-
01:47:08 I will share this link into the description as well so you can read here and set up your
-
01:47:15 cloud storage and set up synchronization with your RunPod, and everything generated in your
-
01:47:22 RunPod will be synchronized with your cloud.
-
01:47:25 Also you can use the runpodctl that I have shown multiple times to download your data
-
01:47:32 or to upload your data.
-
01:47:34 It is up to you how you want to use it.
-
01:47:36 I think I have covered everything that I have mentioned in the beginning.
-
01:47:41 I hope you have enjoyed.
-
01:47:42 Please like, subscribe and leave a comment on this tutorial. Also join our Discord
-
01:47:48 channel and ask about any problems that you can't solve.
-
01:47:52 Also, please support us on Patreon.
-
01:47:55 It is really important.
-
01:47:56 The Patreon link and the Discord link will be in the comments and description.
-
01:48:01 All of the links we have used in this video will be in the description.
-
01:48:05 You can also find our Patreon page on the About tab of our YouTube channel.
-
01:48:06 We have so far 26 patrons.
-
01:48:07 I thank them very much.
-
01:48:08 I hope you also become a patron.
-
01:48:09 Hopefully see you in another awesome video!
-
01:48:10 Thank you so much.
