Blazing Fast & Ultra Cheap FLUX LoRA Training on Massed Compute & RunPod Tutorial - No GPU Required!

Unlock the power of FLUX LoRA training, even if you're short on GPUs or looking to boost speed and scale! This comprehensive guide takes you from novice to expert, showing you how to use Kohya GUI for creating top-notch FLUX LoRAs in the cloud. We'll cover everything: maximizing quality, optimizing speed, and finding the best deals. With our exclusive Massed Compute discount, you can rent 4x RTX A6000 GPUs for just $1.25 per hour, supercharging your training process. Learn how to leverage RunPod for both cost-effective computing and permanent storage. We'll also dive into lightning-fast uploads of your training checkpoints to Hugging Face, seamless downloads, and integrating LoRAs with popular tools like SwarmUI and Forge Web UI. Get ready to master the art of efficient, high-quality AI model training!

🔗 Full Instructions and Links: Written Post (the one used in the tutorial) ⤵️

▶️ https://www.patreon.com/posts/click-to-open-post-used-in-tutorial-110879657

00:00:00 Introduction to FLUX Training on Cloud Services (Massed Compute and RunPod)

00:00:45 Overview of Platform Differences and Why Massed Compute is Preferred for FLUX Training

00:02:01 Using FLUX, Kohya GUI, and 4x GPUs for Fast Training

00:03:08 Exploring Massed Compute Coupons and Discounts: How to Save on GPU Costs

00:05:35 Detailed Setup for Training FLUX on Massed Compute: Account Creation, Billing, and Deploying Instances

00:06:59 Deploying Multiple GPUs on Massed Compute for Faster Training

00:08:53 Setting Up ThinLinc Client for File Transfers Between Local Machine and Cloud

00:09:04 Troubleshooting ThinLinc File Transfer Issues on Massed Compute

00:09:25 Preparing to Install Kohya GUI and Download Necessary Models on Massed Compute

00:10:02 Upgrading to the Latest Version of Kohya for FLUX Training

00:11:02 Downloading FLUX Training Models and Preparing the Dataset

00:11:53 Checking VRAM Usage with nvitop: Real-Time Monitoring During FLUX Training

00:13:33 Speed Optimization Tips: Disabling T5 Attention Mask for Faster Training

00:17:44 Understanding the Trade-offs: Applying T5 Attention Mask vs. Training Speed

00:18:40 Setting Up Multi-GPU Training for FLUX on Massed Compute

00:18:52 Adjusting Epochs and Learning Rate for Multi-GPU Training

00:22:24 Achieving Near-Linear Speed Gain with 4x GPUs on Massed Compute

00:24:34 Uploading FLUX LoRAs to Hugging Face for Easy Access and Sharing

00:24:56 Using SwarmUI on Your Local Machine via Cloudflare for Image Generation

00:26:04 Moving Models to the Correct Folders in SwarmUI for FLUX Image Generation

00:27:07 Setting Up and Running Grid Generation to Compare Different Checkpoints

00:30:43 Downloading and Managing LoRAs and Models on Hugging Face

00:33:35 Generating Images with FLUX on SwarmUI and Finding the Best Checkpoints

00:38:22 Advanced Configurations in SwarmUI for Optimized Image Generation

00:39:25 How to Use Forge Web UI with FLUX Models on Massed Compute

00:39:33 Setting Up and Configuring Forge Web UI for FLUX on Massed Compute

00:40:03 Moving Models and LoRAs to Forge Web UI for Image Generation

00:41:15 Generating Images with LoRAs on Forge Web UI

00:44:38 Transition to RunPod: Setting Up FLUX Training and Using SwarmUI/Forge Web UI

00:45:13 RunPod Network Volume Storage: Setup and Integration with FLUX Training

00:45:49 Differences Between Massed Compute and RunPod: Speed, Cost, and Hardware

00:47:19 Deploying Instances on RunPod and Setting Up JupyterLab

00:48:05 Installing Kohya GUI and Downloading Models for FLUX Training on RunPod

00:48:48 Preparing Datasets and Starting FLUX Training on RunPod

00:51:55 Monitoring VRAM and Training Speed on RunPod’s A40 GPUs

00:56:42 Optimizing Training Speed by Disabling T5 Attention Mask on RunPod

00:58:20 Comparing GPU Performance Across Platforms: A6000 vs A40 in FLUX Training

00:58:38 Setting Up Multi-GPU Training on RunPod for Faster FLUX Training

00:58:58 Adjusting Learning Rate and Epochs for Multi-GPU Training on RunPod

01:03:41 Achieving Near-Linear Speed Gain with Multi-GPU FLUX Training on RunPod

01:05:46 Completing FLUX Training on RunPod and Preparing Models for Use

01:05:52 Managing Multiple Checkpoints: Best Practices for FLUX Training

01:06:04 Using SwarmUI on RunPod for Image Generation with FLUX LoRAs

01:08:18 Setting Up Multiple Backends on SwarmUI for Multi-GPU Image Generation

01:10:50 Generating Images and Comparing Checkpoints on SwarmUI on RunPod

01:11:55 Uploading FLUX LoRAs to Hugging Face from RunPod for Easy Access

01:12:08 Advanced Download Techniques: Using Hugging Face CLI for Batch Downloads

01:15:16 Fast Download and Upload of Models and LoRAs on Hugging Face

01:17:14 Using Forge Web UI on RunPod for Image Generation with FLUX LoRAs

01:18:01 Troubleshooting Installation Issues with Forge Web UI on RunPod

01:23:25 Generating Images on Forge Web UI with FLUX Models and LoRAs

01:24:20 Conclusion and Upcoming Research on Fine-Tuning FLUX with CLIP Large Models

Video Transcription

  • 00:00:00 Greetings, everyone. Today I am going to show  you how you can train FLUX and use FLUX on  

  • 00:00:07 cloud services if you don't have a powerful  GPU or if you want to speed up your training.  

  • 00:00:12 With Massed Compute and also RunPod, you will  be able to use the Kohya GUI and train amazing  

  • 00:00:19 FLUX models in under 1 hour for only $1.25 per hour using 4x GPUs. 4x GPUs are

  • 00:00:28 not mandatory. You can also use 1x GPU, but I  will show you how you can properly use multiple  

  • 00:00:36 GPUs to speed up your training. Not only that, I  will show how you can start SwarmUI in RunPod or  

  • 00:00:41 in Massed Compute and use it on your computer,  generate images very fast, do grid generation,  

  • 00:00:47 and compare your checkpoints very fast to decide  the best checkpoint, both on Massed Compute and  

  • 00:00:55 RunPod. So I am going to show everything on both  platforms. I will show how to rent multiple GPUs  

  • 00:01:00 and do training on multiple GPUs or on a single  GPU. But this is not all. I am also going to show  

  • 00:01:06 you how to upload and download your checkpoints,  your training models very fast to Hugging Face,  

  • 00:01:14 uploading these 12GB of LoRA files to Hugging Face took only 2 minutes with my amazing scripts.

  • 00:01:21 Downloading them doesn't take much time either. So if you want to learn how to train

  • 00:01:26 FLUX and use FLUX privately on cloud providers,  this is the tutorial that you need. Moreover,  

  • 00:01:32 I will show how to install and use Forge Web UI's  latest version as well. So either by using the  

  • 00:01:39 amazing SwarmUI or by using the Forge UI, you will  be able to use your generated LoRA checkpoints  

  • 00:01:47 very fast and very efficiently on both RunPod and  Massed Compute platforms. But please, before  

  • 00:01:54 watching this tutorial, make sure you have watched  the main FLUX LoRA training Windows tutorial  

  • 00:02:01 because I have covered all of the details there.  There will be fewer details in this tutorial. So  

  • 00:02:07 make sure to watch that one, then watch this  one to learn everything perfectly. As usual,  

  • 00:02:12 I have prepared a very detailed post with instructions where you will find all of the information and

  • 00:02:19 the links that you need. I will begin by showing  how to train and use on Massed Compute. However,  

  • 00:02:27 there is one requirement, both for Massed  Compute and for RunPod, which is watching  

  • 00:02:32 this Windows tutorial, because I am not going  to repeat everything that I have shown in this  

  • 00:02:37 tutorial. This tutorial has 74 video chapters.  It is prepared very well. So please watch the  

  • 00:02:44 Windows tutorial to learn how to use Kohya in  general, then watch this tutorial to learn how to  

  • 00:02:49 train and use FLUX on cloud services. So our latest  configuration and the installers are shared in  

  • 00:02:57 version 21. When you are watching this tutorial,  it may be a higher version. Usually, I will put it  

  • 00:03:03 at the very top and also in the attachments. Click  this link to download it, extract it anywhere you  

  • 00:03:08 want. You can extract it even into your downloads.  Let's extract it here, enter inside the extracted  

  • 00:03:14 folder, and you will see Massed Compute and RunPod  instructions. I will begin with Massed Compute,  

  • 00:03:19 as I said, then next will be RunPod. So if you  are interested in RunPod, you can just look at the  

  • 00:03:25 description of the video and jump to the RunPod  section. However, I prefer Massed Compute because  

  • 00:03:30 of the several things that it has. So it is up  to you to use either of them. So we will open  

  • 00:03:34 the Massed Compute FLUX instructions TXT file. All  the steps that we are going to need are documented  

  • 00:03:41 here. First of all, you need to have a Massed  Compute account. If you use this link to register,  

  • 00:03:45 I appreciate that. Let's use this link. Since I  already have registered, it is already logged in.  

  • 00:03:51 Register and log in. Then you need to set up some  billing. If you get some errors during this stage,  

  • 00:03:57 you can click here and chat with the support,  but it is so straightforward. Probably you won't  

  • 00:04:03 need it. It also supports crypto payment as well.  Then we go to the deploy here, and we are going to  

  • 00:04:09 deploy our cloud machine. So everything will run  on a cloud, and it will not use our computer. We  

  • 00:04:15 are going to rent any number of GPUs that we want.  In this tutorial, I am also going to show you  

  • 00:04:21 multiple GPU training to speed up the training. So  I am going to rent 4 GPUs, and then I'm going to  

  • 00:04:27 select creators. This is super important. Select  SECourses. This is our special image where Kohya,  

  • 00:04:34 SwarmUI, Forge Web UI, and a lot of things are  installed. We have a special coupon. You see  

  • 00:04:39 currently it is $2.5 per hour, but I am going  to enter our coupon, and it will become $1.25  

  • 00:04:47 per hour for an amazing system, which has 192GB  RAM and 1024GB storage, because we are renting 4  

  • 00:04:58 GPUs. You don't have to rent 4 GPUs. You can also  rent 1 GPU and train on that. When you rent 1 GPU,  

  • 00:05:03 it becomes 31 cents per hour for RTX A6000 GPU.  This GPU has 48GB VRAM. This is just an amazing  

  • 00:05:11 price. This is also not a spot instance, so it is  permanently assigned to you until you terminate  

  • 00:05:17 the machine. But since I'm going to show you how  to do training on 4 GPUs at the same time to speed  

  • 00:05:22 up training, I am going to rent 4 GPUs. Everything  is the same. When you rent 1 GPU, 2 GPUs,  

  • 00:05:28 4 GPUs, or 8 GPUs, it doesn't matter. Everything  is the same. Just the configuration changes,  

  • 00:05:32 which I am going to explain. So after that,  click deploy. You see currently I also have  

  • 00:05:36 another instance running with 8 GPUs. The coupon  will not work with 8 GPUs. This is a special given  

  • 00:05:42 coupon for me by Massed Compute, but our coupon  is valid up to 4 GPUs at the same time. So you  

  • 00:05:48 can also rent 2x, 3x, or 4 GPUs running at the  same time with the same price. Just wait until  

  • 00:05:55 initialization is completed. For connecting to  the remote machine I am going to use ThinLinc  

  • 00:06:01 client. Click here, download and install it.  It is just so straightforward. Then open the  

  • 00:06:06 ThinLinc client like this. Before starting to use  it, click options and go to the local devices,  

  • 00:06:13 uncheck all, click drives, then details, and add a folder on your computer that will be shared. You can

  • 00:06:20 set it to read and write, or read-only, or not  exported. I am setting it to read and write so  

  • 00:06:24 I can transfer files. This synchronization doesn't  work well for big files. So if you have big files,  

  • 00:06:30 don't use this. Use your cloud storage like  OneDrive, Hugging Face, or Google Drive,  

  • 00:06:36 but for small files like transferring the  scripts, installers, or your training images,  

  • 00:06:42 if they are not very big, it works very well.  And don't worry, I am going to show you how  

  • 00:06:46 you can save on the cloud, on Hugging Face, your  generated model checkpoints so that you can later  

  • 00:06:52 use them very easily. By the way, one thing about  the ThinLinc client is that it has Windows, Mac,  

  • 00:06:59 and Linux versions. So install according to your operating system. Don't forget that. The machine

  • 00:07:05 has started. You see the status is running. So we  are going to connect. Click here. So it is copied,  

  • 00:07:11 copy-paste it here. You see, you don't type HTTP or  the port. This is just it. You use the Ubuntu as  

  • 00:07:18 Ubuntu. This is important. And copy the password, and just paste it and connect. There is also an "end

  • 00:07:24 existing session" option. When you check it, it will close all of the applications on the server.

  • 00:07:29 So be careful and continue. Machine starting.  Click start. Don't wait. And the machine has  

  • 00:07:34 started. You will see several things here. You  will notice, for example, you can see that we  

  • 00:07:39 have 881GB free hard drive. We have 189GB RAM and  currently using only 4% of the CPU. You can also  

  • 00:07:49 right-click here. This is terminal. New window.  This is really important to understand. Then  

  • 00:07:54 type nvitop like this, and you can see the GPU  status. You should see as many GPUs as you have  

  • 00:08:01 started. I have currently 4 GPUs. So this machine  is currently started and working very well. You  

  • 00:08:06 will notice that we have run updaters for SwarmUI,  for OneTrainer, for Kohya, for SD Forge, and for  

  • 00:08:12 Automatic1111 Web UI. Then we also have Pinokio AI  installed here. We have JupyterLab installed here,  

  • 00:08:18 and the starting buttons for these applications  are also located here. So these are for updates,  

  • 00:08:24 and these are for starting. So how are we going  to move our files here to use them? First of all,  

  • 00:08:30 I am going to copy the downloaded zip file and  move it back into my synchronization folder,  

  • 00:08:35 which is Massed Compute here. I will paste it  here. Then I will extract it. Right-click and  

  • 00:08:41 extract this zip file so you can extract it on  any machine without needing any third party. Then  

  • 00:08:47 enter inside home, and in here you will see thin  drives. This is the synchronization drive with  

  • 00:08:53 your computer. You can also log in to your Patreon  account and download the zip file on this machine  

  • 00:08:58 as well. Just wait a little bit. It will fetch the  file names. As I said, for transferring big files,  

  • 00:09:04 this is not good, but for transferring small  files, yes, it works. And we have the zip file  

  • 00:09:09 here. Kohya GUI FLUX installer. Please copy  this into your downloads folder or desktop.  

  • 00:09:15 Doesn't matter. Don't use anything inside the  synchronization drive. Otherwise, you will get  

  • 00:09:20 permission-related errors, and you will see the  copying status here. You see it is copying from  

  • 00:09:25 my computer to the downloads folder. Just wait for  this copy operation to be completed. As you copy  

  • 00:09:32 more files, it will take longer, and this also  depends on your network speed, of course. Okay.  

  • 00:09:38 You see it copied all the files to downloads. Then let's move to the downloads folder. Let's

  • 00:09:43 enter inside the folder. First of all, we are  going to upgrade Kohya to the latest version,  

  • 00:09:48 but we didn't use the upgrader icon here, which  is you see Kohya update. Why? Because currently,  

  • 00:09:56 the FLUX training is not available in the main  branch. Therefore, we are going to switch to the  

  • 00:10:02 correct branch and use it. Therefore, I have Massed_Compute_Kohya_FLUX_Instructions.txt,

  • 00:10:07 which we had opened. So open it inside the Massed  Compute and copy this command. Just copy it,  

  • 00:10:14 right-click, copy or Ctrl+C and start a new  terminal, new window, and paste it. You see it  

  • 00:10:20 gave me an error. Why? Because this terminal is not in the correct folder. So what you need to

  • 00:10:25 do is go back to the folder where you have copied  files, home, downloads here, and in here, click  

  • 00:10:31 this three dots icon and start a new terminal. So it will start the terminal in the correct folder,

  • 00:10:37 right-click and paste, and hit enter. And this  time, it will work. So this is going to upgrade  

  • 00:10:42 my Kohya to the latest version with the correct libraries and the correct branch for the FLUX

  • 00:10:48 training. But if you are going to train SD 1.5  or SDXL, you can just use the run update Kohya  

  • 00:10:54 and start using it. So while this is running, let's also download the necessary FLUX training models.
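
The exact command is in the instructions TXT file. Purely as an illustration of what such a download step does, here is a minimal sketch with the huggingface_hub library; the repo and file names below are the typical public sources for these files, assumed rather than taken from my script, and FLUX.1-dev is a gated repo, so it needs an accepted license and a token:

```python
from huggingface_hub import hf_hub_download

# Text encoders (a public repo that hosts these exact files).
for name in ("clip_l.safetensors", "t5xxl_fp16.safetensors"):
    hf_hub_download("comfyanonymous/flux_text_encoders", name, local_dir="Downloads")

# Base model and VAE (gated repo: accept the license, then pass a token).
for name in ("flux1-dev.safetensors", "ae.safetensors"):
    hf_hub_download("black-forest-labs/FLUX.1-dev", name,
                    local_dir="Downloads", token="hf_...")  # placeholder token
```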

  • 00:11:02 To do that, in the instructions, we have Massed  Compute download models command here. So copy  

  • 00:11:08 this command, go back to the folder and start  a new terminal here and paste it. This will  

  • 00:11:14 download the necessary models into your downloads folder. If you copy something from your computer,

  • 00:11:19 sometimes it may require several times copy-paste  because there is a problem with ThinLinc client.  

  • 00:11:25 It sometimes may not copy the thing that you copied on your computer. So pay attention to

  • 00:11:30 that. But when you copy something inside the  Massed Compute, it always works. So this will  

  • 00:11:35 download the necessary models into the downloads  folder. Here you see it started downloading. And  

  • 00:11:40 meanwhile, the other script is installing and  upgrading Kohya to the latest version. So at this  

  • 00:11:47 point, just patiently wait for Kohya to start  and the download to be completed. Alright, so  

  • 00:11:53 the files are downloaded, and also the Kohya has  started. You can see running on local URL. Also,  

  • 00:12:00 it is automatically opened because I set it to do so. I will first start with a single GPU because

  • 00:12:06 many of you may like to train on a single GPU.  Then I will show how to train on multiple GPUs.  

  • 00:12:12 So on our Patreon post, we have a configuration for every setup. The very best one is ranked 1,

  • 00:12:18 obviously, so I'm going to start training with  it. To start training with it, go to the LoRA  

  • 00:12:23 tab. This is super important. Don't load into  the DreamBooth tab, otherwise your config will  

  • 00:12:27 get corrupted. Configuration. Then click this icon  to load it. This is running on a remote machine,  

  • 00:12:33 not mine. You can notice the ThinLinc client here.  So once you click here, it will let you pick the  

  • 00:12:39 item. So go to the above folder. Since we copied  into the downloads, let's enter inside downloads,  

  • 00:12:45 enter inside the folder, and we have the best  configurations here. I'm going to load it. So it  

  • 00:12:51 has loaded everything for me. This is by default  set for Massed Compute. You see FLUX 1 dev,  

  • 00:12:57 safetensors. This already exists there. The output name and everything. Everything is the same  

  • 00:13:03 as on Windows. If you have watched the tutorial,  as I said, you will know by now. So I will also  

  • 00:13:08 quickly prepare my dataset to show you as well.  My training dataset is here. I will copy it into  

  • 00:13:14 the Massed Compute drive. Since it is not big, it  will work very well. You see, this is the dataset,  

  • 00:13:20 11 megabytes. Then it will appear in here. Let's  refresh this folder. You can hit F5 to refresh  

  • 00:13:27 it. Wait for a new file to be updated. Then you  see my dataset has arrived here. So I'm going to  

  • 00:13:33 set my dataset: ohwx man. I will use 1 repeat. Then set the output folder where you want to save it. Let's click here. You

  • 00:13:41 can save it anywhere, but not in the thindrives folder; this is important. Let's save it into downloads.

  • 00:13:46 And let's say FLUX train like this. Prepare  dataset. I explain in detail what these are doing  

  • 00:13:53 in the Windows tutorial. So watch it and copy  info to respective fields. However, I am going  

  • 00:13:58 to make my model checkpoints output directly to  the SwarmUI LoRA folder so I will be able to use  

  • 00:14:05 them. You can also use Forge Web UI. I will show  that too. So you see the output directory for the  

  • 00:14:11 training model. I am going to click here, go to  the apps, and inside here you see Stable SwarmUI.  

  • 00:14:16 This is the latest SwarmUI, not Stable SwarmUI. Go to the models, select LoRA, and that's it. So they

  • 00:14:22 will be saved inside here. Let's delete the logs.  I don't need them. And we don't use regularization  

  • 00:14:28 images, and we are ready. So you can save  your config, and before starting training,  

  • 00:14:34 you can click print training command to see  whether there are any errors or not. And it says  

  • 00:14:39 that yes, the training images are failing to load for some reason. Let's see. Maybe we didn't copy properly.

  • 00:14:45 Dataset preparation. Parameters. No, it should  be somewhere around here. Yes. The image folder is  

  • 00:14:51 supposed to be here. Maybe there was some error  when preparing. Yes. Probably it failed to read  

  • 00:14:59 my ThinLinc drive because I didn't copy it. So  if you encounter that error, don't get confused.  

  • 00:15:06 So what we need to do is first move our training  files to the downloads folder, then prepare the  

  • 00:15:12 training data. So let's move it to the downloads  folder. I'm not going to delete this part of  

  • 00:15:17 the tutorial because you may also encounter this  error. Okay. So we are going to reset, re-prepare.  

  • 00:15:23 To re-prepare it, oh, I didn't give the downloads  folder first, so that was my mistake. Maybe it  

  • 00:15:30 will work with ThinLinc drive too. Okay. Let's  try from the ThinLinc drive first, then we can  

  • 00:15:35 try from... actually, let's go with the safe place, so go to the downloads folder. Yeah. These are my training

  • 00:15:40 images. Prepare training data. After that verify  it from here. Yes. It says done copying for the  

  • 00:15:46 respective fields. Okay. It is set. Then I'm going  to set the output folder again. So from here,  

  • 00:15:52 let's go to the apps, SwarmUI, models, and LoRA  and delete the logs and save again and click the  

  • 00:16:00 print training command. And yes, it shows the  setup. So first I will show as a single GPU,  

  • 00:16:08 then as a multiple GPU, as I said, let's just  click start training. The configurations may get  

  • 00:16:13 updated when you're watching this. It may become  better because I'm currently searching for better.  

  • 00:16:18 By the way, it shows that I have 11 images, which  is wrong. Why? Because we didn't wait for copying  

  • 00:16:26 files to the downloads. And when I was preparing  the dataset, it wasn't full. We can see that. Yes.  

  • 00:16:32 Now all files are here. So I'm going to manually  move them. So copy them and go to the downloads  

  • 00:16:38 FLUX train, image, ohwx man; you see it is lacking files. So I will just paste them. So make sure that all of the

  • 00:16:45 files are fully copied. Otherwise, it will be also  corrupted and you will get an error. Always wait  

  • 00:16:52 for full copy. Yes. Now it should work. So let's  just click start training again. I'm not going  

  • 00:16:57 to delete any of these parts because these parts  are likely the parts that you may also encounter  

  • 00:17:03 problems. And so you will know what is the reason  for the problem. And it is getting ready. Okay.  

  • 00:17:09 Now let's just wait for the training to start,  and let's return back to nvitop, where we will  

  • 00:17:16 monitor the VRAM usage. So it is loading the model  right now. So the training has started. Wait until  

  • 00:17:22 you get like 100 steps to see the final speed  because in the beginning, it is not displaying  

  • 00:17:29 the accurate speed because it is displaying average speed. So I wait like 100 steps to get the full  

  • 00:17:35 speed of the training. Okay. It has been 50 steps, and it has gone as low as 8.5 seconds/it. If you find

  • 00:17:44 this is still very slow, what you can do is stop  training and disable apply T5 attention mask. This  

  • 00:17:51 will speed up training hugely with the trade-off  of some quality degradation. Alternatively,  

  • 00:17:57 what else can you do? In the Massed Compute deploy screen, you can select a more powerful GPU like the L40S. This

  • 00:18:05 is almost equal to, maybe a little bit more  powerful than RTX 4090. So go with L40S GPU,  

  • 00:18:13 and you will get much better speed compared  to this one. However, it will cost you more,  

  • 00:18:18 and we don't have a coupon for that. Okay. Without  applying T5 attention mask, you see the speed is  

  • 00:18:24 hugely improved now, 4.43 seconds per it. It is  almost double speed, and with this way, it will  

  • 00:18:31 take less than 3.5 hours duration for 3000 steps,  which is amazing. So you can disable this with a  

  • 00:18:40 little bit of trade-off of quality and get a huge  speed, or you can enable it and just wait. And now  

  • 00:18:46 it is time to start training with 4 GPUs at the  same time. So I will stop the training. We have  

  • 00:18:52 a configuration for 4x GPU training, so you can  just use it if you want. Just load it and use it.  

  • 00:18:59 It is inside the configurations. Let me show you.  You see, 4x GPU batch size 1, and 4x GPU

  • 00:19:05 batch size 2. Batch size 2 slightly improves the  training speed, but quality will be lower. But for  

  • 00:19:12 people who want to learn how to set up themselves,  I'm going to show that. So what we need to do is  

  • 00:19:17 we are going to set, in the accelerate settings, the number of processes. All right,

  • 00:19:22 this is a cut into the flow of the video, because I just figured out something: when you are setting

  • 00:19:30 multiple GPU training, make sure that the number  of processes equals the number of GPUs you have.  

  • 00:19:38 When you set it that way, you are going to get  almost exactly the same number of epochs. You see,  

  • 00:19:44 currently, I am training for 60 epochs on  4 GPUs. Therefore, a total of 240 epochs,  

  • 00:19:52 and I am getting 240 epochs. Currently, I am doing  training for a client, and I have figured it out.  

  • 00:19:58 There is not much speed difference. However,  what is the benefit of this? With this way,  

  • 00:20:04 you can set the save every epochs accurately. So  I am going to save every 20 epochs a checkpoint,  

  • 00:20:12 and it will work as expected. So set this number  of processes equal to the number of GPUs you have.  

  • 00:20:20 If you are training on 8 GPUs, set it to 8.  If you are training on 6, set it to 6. If you  

  • 00:20:23 are training on 4, set it to 4. So this is the  logic. This is mandatory for multi-GPU and set  

  • 00:20:31 the GPU ID. So we have 4 GPUs, therefore 0, 1,  2, 3. So I'm going to train on all of the 4 GPUs  

  • 00:20:39 from the accelerate part. You don't need to set anything else there. So what else changes when you set

  • 00:20:44 the number of GPUs there? You need to reduce your epochs. So you need to divide 200 by 4, and

  • 00:20:52 it becomes 50. It is still the same. You can save  every n epochs like this, like 25 or like 20,  

  • 00:20:59 whatever you wish. And we will compare them later.  I will show that, don't worry. And one other thing  

  • 00:21:04 changes, which is the learning rate. You need to calculate the new learning rate.

  • 00:21:11 How? So there is not a single formula for that, but a commonly suggested one is:

  • 00:21:18 new LR = old LR × number of GPUs × batch size / 2. So what does

  • 00:21:27 this mean? I'm going to show you in a moment. This is one of the suggested ways. So our new

  • 00:21:32 LR will become: our initial learning rate was this, so it will be multiplied by 4, multiplied by 1

  • 00:21:39 because the batch size is 1, and divided by 2, becoming this (see the sketch below). There is not an exact formula,

  • 00:21:46 as I said. You can just load the configuration  file, but you can also use this. If we had 8 GPUs,  

  • 00:21:51 it would become this, or if the batch size were  2, it would become like this. You see, this is the  

  • 00:21:57 logic. So whatever the learning rate in my best configuration is, you can set your new learning rate

  • 00:22:03 like this. So let's change the learning rate to  the new value here. And also we have a learning  

  • 00:22:10 rate here. We can also use the configuration  directly, and you can set a new name. Let's  

  • 00:22:16 say like this, 4x GPU train. Okay. This will be  the output name. Change to this and save. And let's  

  • 00:22:24 see the new speed. By the way, if you apply this,  it will become slower again. So it is up to you.  

  • 00:22:30 If you don't want to get quality loss, you can  apply it, but if you need speed, you cannot apply  

  • 00:22:35 it. So let's see the speed without applying it.  This will slightly reduce the quality but hugely  

  • 00:22:41 improve the speed. It is totally up to you.  It is a trade-off. If you want the best speed,  

  • 00:22:45 don't apply it. If you don't want the best speed,  but the best quality, apply it. Okay. So what do  

  • 00:22:50 we see now on the screen? When you pay attention,  you will see that it is doing 750 steps instead of  

  • 00:22:57 3000 steps because it divided the task into all 4  GPUs. Therefore, now at one step, we are actually  

  • 00:23:06 doing 4 steps. So you see the speed is 4.85  seconds per it. This speed gain is almost linear.  

  • 00:23:15 We almost got a speed-up of 4x. This is amazing.  With SDXL, and last time I tested, this wasn't  

  • 00:23:22 the case. But with FLUX, we are almost getting a  linear speed increase. This is just mind-blowing.  

  • 00:23:28 So you can just boot up 8 GPUs and you will get  8 times the speed with a minimal amount of loss  

  • 00:23:35 of quality. It will be almost the same quality.  I will let this training be completed. It will  

  • 00:23:41 take a total of like 1 hour to train 3000 steps on  FLUX AI. This is just amazing. And it will cost me  

  • 00:23:48 how much money? It will cost me only $1.25 per  hour for training. So this is just amazing. This  

  • 00:23:57 is the most affordable, best quality training  right now with a very high-speed training. So  

  • 00:24:03 instead of the other services, you can use Massed  Compute, our coupon, and train very fast with the  

  • 00:24:09 maximum possible quality. My configurations  will get hopefully updated to better versions.  

  • 00:24:14 I am testing the impact of training the text  encoder clip large training. So the quality  

  • 00:24:21 will likely get better. After this training has  been completed, I will also show how to use it on  

  • 00:24:28 SwarmUI and on Forge Web UI in Massed Compute. So  let's just wait now. Alright. So the training has  

  • 00:24:34 been completed. Now I will show how you can use  these generated LoRAs on Massed Compute and also  

  • 00:24:42 upload to Hugging Face to download later anywhere  and use anywhere, like on your computer or in any  

  • 00:24:49 other cloud service provider. I am going to use  SwarmUI, and we already have SwarmUI in our image.  

  • 00:24:56 So first run this to update it to the latest  version. You see it is updating. As you see,  

  • 00:25:01 SwarmUI started with the most updated version.  However, I am going to access it from my computer  

  • 00:25:08 browser to have more fluent usage, instead of using it inside the ThinLinc client. You can

  • 00:25:14 also use it inside the ThinLinc client, but it is  better to use it on my computer. So in our post,  

  • 00:25:20 as you scroll down, you will see how to use it  on SwarmUI. So you can watch the main tutorial.  

  • 00:25:27 I suggest that. Also, I have SwarmUI cloud  tutorial. I suggest that. So you should watch  

  • 00:25:31 these tutorials to fully learn how to use it.  However, I will show how to use it quickly,  

  • 00:25:36 but I want to use it on my computer. So copy this  command. This is going to install Cloudflared,  

  • 00:25:42 and we are going to access it from Cloudflared.  So close this terminal, start a new terminal,  

  • 00:25:47 right-click, and new window, paste the copied  command. Okay. Looks like it didn't copy.  

  • 00:25:53 Sometimes this may happen. So right-click,  new window, return back here, copy again.  

  • 00:25:59 Sometimes this happens with the ThinLinc client,  paste it, hit enter. It will install the necessary  

  • 00:26:04 package. Okay. It is installed. Then copy this.  This is going to generate a public URL that I can  

  • 00:26:10 use on my computer. Paste it. Okay. It didn't  copy again. I hate when this happens. However,  

  • 00:26:16 there is no solution as far as I know.  Okay. Paste again. Okay. This time it works,  

  • 00:26:21 and you see it started on localhost and also  on a public URL. So open the public URL.  

  • 00:26:28 It will load. Okay. Then copy this link. Go  back to your browser. Okay. You see currently  

  • 00:26:34 it is showing an error because the Adblock Plus is  preventing it. So I will just refresh. And yes,  

  • 00:26:41 now I can use the SwarmUI that is running inside  Massed Compute on my computer. So I prefer 30  

  • 00:26:48 steps. I have shown it in the Windows tutorial  like this. Since this is a big GPU, I'm going  

  • 00:26:53 to also change the precision. So I enabled the  advanced options in the sampling. I select the  

  • 00:27:00 UniPC and then I select my base model. So you see  the models are not here. So let's move each one of  

  • 00:27:07 the files to the correct folder. So cut this, go to home, apps, inside the Stable SwarmUI, inside

  • 00:27:15 models inside unet. This is where we put the dev  model. Then inside clip, we are going to move  

  • 00:27:23 the T5 XXL, which is... Let's go to the downloads  again. And clip is this one. Let's move, cut it,  

  • 00:27:32 move to the... Move back to clip, paste it. Yes,  this may be a little bit of a task. I know you can  

  • 00:27:37 also use the downloader that I have. Then we are  going to move the VAE file, cut it, go to the  

  • 00:27:42 home apps inside the SwarmUI. This is a one-time  thing that you need to do. And you learn again,  

  • 00:27:49 models, VAE; it goes in here as ae.safetensors. Then go to the downloads. It will take just a minute.

  • 00:27:57 Cut. This is the T5 XXL. Go back to home, go back  to apps inside stable SwarmUI inside models. And  

  • 00:28:05 this goes into the clip here and paste it. I could  paste both clip large and T5 XXL at the same time.  
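
For reference, here is the whole one-time placement as a minimal Python sketch; the folder names are assumptions based on the on-screen navigation (apps → StableSwarmUI → Models), and the file names are the usual FLUX ones:

```python
import shutil
from pathlib import Path

downloads = Path.home() / "Downloads"                       # where the models were downloaded
models = Path.home() / "apps" / "StableSwarmUI" / "Models"  # SwarmUI model root (assumed path)

# Each file goes into its matching SwarmUI subfolder.
placement = {
    "flux1-dev.safetensors":  models / "unet",  # FLUX dev base model
    "clip_l.safetensors":     models / "clip",  # CLIP-Large text encoder
    "t5xxl_fp16.safetensors": models / "clip",  # T5-XXL text encoder
    "ae.safetensors":         models / "vae",   # FLUX VAE
}

for filename, target_dir in placement.items():
    target_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(downloads / filename), str(target_dir / filename))
```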

  • 00:28:14 Then return back to your SwarmUI, refresh the models, and you see FLUX dev appeared. And now I

  • 00:28:20 can set also FLUX guidance scale. I prefer 4. And  in the advanced sampling, you can change this to  

  • 00:28:28 16-bit because this is a big GPU. So currently it  will generate an image with a single GPU. However,  

  • 00:28:34 if you have rented multiple GPUs, go to the  server, go to the backends. We are going to  

  • 00:28:39 add several backends. Also, edit this and add  --fast. It is making it faster for newer GPUs.  

  • 00:28:46 So how are we going to edit? ComfyUI self-starting  edit, copy this paste here, copy this paste here,  

  • 00:28:53 set the GPU ID 1, save. Let's add another one  and another one. So copy this paste, paste,  

  • 00:29:00 copy this paste, paste, set the GPU ID 2, save,  and set the GPU ID 3, save. So it is going to  

  • 00:29:08 let me use all of the GPUs with a queue system.  First of all, let's generate an image with the  

  • 00:29:14 FLUX dev, and I have amazing prompts to test the  checkpoints. So the test prompts are inside here.  

  • 00:29:21 Let's open the test prompts. I have eyeglasses, so  I am preparing the eyeglasses, for example, here.  

  • 00:29:27 And let's use this one, and let's copy-paste it  here, and let's generate an image. So currently,  

  • 00:29:34 it will not apply my LoRA, but I want to see  the model generation, and it is going to use  

  • 00:29:40 segmentation. So what does this segmentation mean?  It will auto-mask the face of the generated image,  

  • 00:29:46 then with 0.7 denoise, it will inpaint it. So this  is how you can use the SwarmUI that is running on  

  • 00:29:54 Massed Compute on your computer. Actually, I  have shown this in the main Windows tutorial.  

  • 00:30:00 So after this, all I need to do is apply the LoRA  from here, and which checkpoint I should apply.  

  • 00:30:07 Let's refresh this. You see there are LoRAs, and  it saved once every 20 epochs. However, I see  

  • 00:30:15 that it only trained up to 100 epochs for some  reason. Let's return back to... Oh, by the way,  

  • 00:30:22 the T5 XXL model needs a certain naming, therefore  it re-downloaded it, and the name has to be like  

  • 00:30:29 this. Therefore, it has re-downloaded it. So  we need to rename this to this file name. Yeah,  

  • 00:30:35 that is an error we had. You can also do that. So  the LoRAs we have are 5. Let's check out the logs,  

  • 00:30:43 the reason for this. Okay, so it trained up to  94 epochs. It was supposed to fully train it,  

  • 00:30:50 but for some reason, only 94 epochs. Okay, this  is the image that it generates without our LoRA.  

  • 00:30:58 Let's use the 80 epoch LoRA. This should be a pretty good one, and I suggest using the FP16 T5 XXL

  • 00:31:06 instead of the FP8, which it downloads by default.  I explain all of this in the main tutorials,  

  • 00:31:13 so you really should watch them, and you can go to  the server and logs to see what is happening. So  

  • 00:31:19 it is generating the image right now with 1.25  it per second, then it will inpaint the face,  

  • 00:31:25 and the image is generated with amazing quality.  So how to find the best checkpoint. So go to the  

  • 00:31:32 tools, go to the grid generator, and in the first  tab select LoRA. Search for LoRAs, fill all,  

  • 00:31:39 delete this "none" one, and then in the second,  we are going to use a prompt like this, and I  

  • 00:31:45 am going to use test prompts without eyeglasses.  This is formatted. There are no eyeglasses here,  

  • 00:31:50 so you can use this. I have the eyeglasses, so I am going to use the grid-formatted eyeglasses prompts. For grid

  • 00:31:56 formatting, this is the separator. Just copy this, paste here, and give a name like test 1 here,

  • 00:32:03 save the grid config like test 1, and generate  grid. This time it will generate images on all of  

  • 00:32:10 the GPUs at the same time, so it will be much  faster. Let's make this run, and meanwhile,  

  • 00:32:16 I will show you how you can upload your models to  Hugging Face. So, for uploading models to Hugging  

  • 00:32:23 Face, I have an amazing tutorial here. So go  to this link in the attachments, you will see  

  • 00:32:28 the version 6. This is the newest update. I just  updated it. Move it back into your Massed Compute  

  • 00:32:36 synchronization folder, wherever you have it. It  is here. Go back to your Massed Compute ThinLinc  

  • 00:32:42 folder from... Let's go to the new window. Let's go to the thindrives, Massed Compute,

  • 00:32:48 and let's move the file into the downloads  here. Then Ctrl+Alt+D to minimize everything.  

  • 00:32:56 Start the run JupyterLab interface. You need  to have a Hugging Face account to upload there.  

  • 00:33:02 I already have a Hugging Face account. You can  follow me here too. It is free. Hugging Face is  

  • 00:33:08 just amazing. I congratulate them. I thank them.  They are amazing. Let's go to the access tokens.  

  • 00:33:14 I'm going to generate a new temporary token. Let's name it "delete later", make it "write", and create the token.

  • 00:33:21 Copy the token. This is important. Then you see  the JupyterLab interface started in the ThinLinc  

  • 00:33:28 client. So in this interface, go to the downloads  and double-click this notebook file. First of all,  

  • 00:33:35 we need to install. This is mandatory. Just  click this cell. It will install everything to  

  • 00:33:40 the latest version. Wait until this cell execution  ends. After that, copy-paste your token here like  

  • 00:33:48 this. Play this cell once. This is just one time  necessary, and you can set the upload folder and  

  • 00:33:55 upload everything, which I'm going to show right  now. So let's go to our page. Let's click here,  

  • 00:34:01 new model, make a model, then give any name.  Let's say video tutorial Massed Compute, any  

  • 00:34:09 name. You can make it private so no one else will  be able to access it. Then copy. This is the path,  

  • 00:34:15 and I am going to use this one. You see, very fast  new upload. There is also a single file upload and  

  • 00:34:22 other ones. This will upload everything very fast  to the repository. Okay, after we set the target  

  • 00:34:28 repository, make sure that it is the model repo type; this is important. I updated the notebook file to

  • 00:34:33 default to model (it was dataset before), and verified the local folder path; a sketch of what these cells do appears further below. Just click the play icon,

  • 00:34:41 and it will start massive upload with massive  speed. It is just amazing. We will see that  

  • 00:34:47 it will be completed within like 1 minute  or 2 minutes for 12GB of files. Let's just  

  • 00:34:54 wait. We will also be able to see the progress  here. It runs the upload in multi-threading,  

  • 00:35:00 and it is just mind-blowingly fast compared to  the previous upload strategies that we have. I  

  • 00:35:07 just updated this file today to be perfect. So  the upload has been completed. It took like 2  

  • 00:35:13 minutes. You can see the logs here. Then when I  open my repository, I will be able to see it. But  

  • 00:35:21 this is in the ThinLinc client, so I can't see  it. I need to open it on my computer. And when  

  • 00:35:27 I open it and check the files, yes, all the  LoRAs arrived here. It took like 1 minute or  

  • 00:35:34 mostly 2 minutes to upload all the files. So how  can you download them again in another instance  

  • 00:35:41 of Massed Compute or on your computer? On your  computer, you can just click this to download to  

  • 00:35:46 your computer. But let's say you started another  Massed Compute instance and you want to download  

  • 00:35:52 all of them. So for downloading all of them  very fast, again, you install the requirements,  

  • 00:35:57 set your Hugging Face token, and in this cell, we  have an amazing download script. So first, let's  

  • 00:36:03 copy the path again from here and just delete this  part like this. Paste it. Make sure that it is  

  • 00:36:11 accurately copy-pasted, and it is like this. Then  wherever you want to download, let's download it  

  • 00:36:16 into home/ubuntu/apps/models/stable_diffusion. You  can download it to any folder and just click play,  

  • 00:36:22 and it is going to download everything into  there. We can see it. You see it started  

  • 00:36:28 multi-download. It is really, really fast.  We will see it completed in a few minutes.  

  • 00:36:33 And the download completed. It didn't update these  messages, but once you see the download completed,  

  • 00:36:39 it means that it is completed. We can also  verify that. So where did we download them? So  

  • 00:36:47 home, apps, the Stable Diffusion web UI folder, inside models, inside stable_diffusion.

  • 00:36:52 Yes, all the files are downloaded. This is how you  can upload and download very fast by using Hugging  

  • 00:36:59 Face with my specially made Jupyter notebook file.  Let's return back to our tools, grid generator,  

  • 00:37:07 load grid config, and load config from here. It  has already been completed. Let's open the grid  

  • 00:37:13 so all the images will appear here. Currently, we  will see the comparison of all the checkpoints. It  

  • 00:37:23 is taking some time to load on my computer because  I have a limited internet connection. Also,  

  • 00:37:28 I can say auto-scale to see everything in the  viewport from here, you see, and the images  

  • 00:37:33 are getting loaded. This is the 20 checkpoint. It  is under-trained. I can see that clearly. This is  

  • 00:37:39 decent. This is the 40 checkpoint. This is the 80  checkpoint, which is really, really good. So with  

  • 00:37:45 this way, you can compare the checkpoints  and decide which checkpoint is the best  

  • 00:37:50 one. Probably 80 will be the best, maybe the  last one. The last checkpoint may be better,  

  • 00:37:56 so all I need to do is just wait for generation  to be completed. Probably not completed. Let's go  

  • 00:38:02 to the server logs, go to the debug, and we can  see... Yes, it is still generating. We can see  

  • 00:38:08 the progress here. Okay, it says that it is very fast, and we are inpainting faces as well. Okay. Yes,

  • 00:38:15 this 80 epoch is really good, so it is up to you  to decide which epoch you want. I am working on a  

  • 00:38:22 better workflow, better configuration. Hopefully,  I will update the configurations once I have them  

  • 00:38:29 next week. I am going to fully research fine-tuning, and fine-tuning will hopefully be many times

  • 00:38:36 better. If you also use a better dataset, you are going to get better results than me,

  • 00:38:41 especially with the expressions. This model was  trained within 1 hour, actually less than 1 hour.  

  • 00:38:48 So the grid generation has been completed. It  generated 195 images, each one was 48 steps and it  

  • 00:38:58 took only around 22 minutes. If your grid doesn't  show everything just refresh the page and it will  

  • 00:39:06 show everything. Then decide the best checkpoint  that you want: 20, 40, 60, 80 and the last one.  

  • 00:39:14 So it is up to you, it is personal to decide which  one you like most and you can also generate more  

  • 00:39:19 frequent checkpoints and decide the very best one.  As a last step, I am going to show you how you can  

  • 00:39:25 use the Forge Web UI on the Massed Compute. So we  already have a Forge and Forge updater. First run  

  • 00:39:33 SD Forge update so you will get the very latest  version of the Forge. So it started updating  

  • 00:39:39 everything. Then it will start the Forge both  locally and also on the Gradio live share. We  

  • 00:39:45 are going to use the Gradio live share. So this is the latest Forge. You see currently my model is

  • 00:39:51 not available yet, so I will go to the apps where I have the models. You can cut or copy-paste,

  • 00:39:57 it doesn't matter, both work. So inside the unet we have the FLUX model. Let's copy it or let's  

  • 00:40:03 just move it to the Forge Web UI. It's inside  apps, inside the sd web Forge web ui, inside  

  • 00:40:11 models and we put the model inside here, you see.  Then we need to put the LoRAs here. Actually we  

  • 00:40:19 need to put all the models here first. So let's go  to the apps and stable SwarmUI models and inside  

  • 00:40:28 clip we have clip large and T5 fp16. So let's  copy both and it allows me to copy selection from  

  • 00:40:36 here. Let's go to the apps and let's go to the web  Forge models and stable diffusion and select and  

  • 00:40:46 it will copy there. It's pretty fast. And as a last  thing we need to copy the VAE file. I also have an  

  • 00:40:52 automatic downloader for the models if you want to  just download but it will take time to redownload.  

  • 00:40:59 Inside VAE, right click and copy or move to. Let's use move to, it is easier: from SwarmUI to apps, Forge Web

  • 00:41:07 UI, models, and VAE, and that's it. And let's just copy... actually, this just moved there. Then click

  • 00:41:15 refresh icon here and FLUX dev appeared. We are  going to select VAE and we need to select the FLUX  

  • 00:41:22 from here. So the other things will also appear.  Click here actually we need to re-refresh VAE text  

  • 00:41:27 encoder. Okay I think I was remembering the text  encoder path inaccurately. So let's move to apps  

  • 00:41:36 Forge Web UI inside models. Yeah text encoder has  a separate folder. So inside stable diffusion I  

  • 00:41:43 will move the text encoder. So clip large and  the T5 right click and move to. So just models  

  • 00:41:50 and text encoder. Okay, select, then let's refresh, and then yes. So select all these 3 and you don't

  • 00:41:58 need to do anything else. Go back to here. First let's generate an image, then we will apply our

  • 00:42:04 LoRA. By the way this is running locally so let's  connect from the Gradio live share, it will be  

  • 00:42:10 easier to use. So let's open the Gradio, copy the  link, move back to my own browser. So the Forge  

  • 00:42:17 Web UI is loading. Okay so we have the prompts.  Let's go to the folder we had. Let's go to the  

  • 00:42:25 prompts and let's open our prompts here. For  example let's copy this one. I'm going to remove  

  • 00:42:32 segment because there is no segmentation here.  Okay let's use this one. I don't change anything  

  • 00:42:38 else. I just make the Distilled CFG Scale 4 and  generate. Now Forge is not as good as SwarmUI. It  

  • 00:42:46 is good with some quantized models but if you are  using on a high VRAM machine it is not as fast as  

  • 00:42:53 the SwarmUI if you ask my opinion, especially when  you use LoRA. With default generation it is fast,  

  • 00:43:00 not as bad. By the way we should also close the  SwarmUI but I didn't close it and it doesn't have  

  • 00:43:08 automatic queue system for multiple generations.  Okay so it is here and the image generated.  

  • 00:43:15 Actually let's go back to the SwarmUI and instead  of closing its CMD I will just disable the back  

  • 00:43:21 end. So this will free up the RAM. Okay now how  we are going to use the LoRA. Go to the LoRA  

  • 00:43:27 and refresh. Currently we don't have any LoRA so  we need to move the LoRA files as well. So let's  

  • 00:43:33 go back to the apps SwarmUI inside the models  inside LoRA just select everything right click and  

  • 00:43:42 move to. This move to is very useful. Go to the  Forge models and LoRA and select. So it will move  

  • 00:43:51 every file immediately because it is move like cut  and paste. Refresh the folder and LoRAs appeared.  

  • 00:43:57 For example, let's use this LoRA and let's go back to generation and generate. Yes, it patched the

  • 00:44:03 LoRA correctly, and now it is generating the image. So this is how you can use the Forge

  • 00:44:09 Web UI with your trained LoRAs. The Forge web UI  is like Automatic1111 web UI. I assume that you  

  • 00:44:18 already know it, and the image is generated. Where is it generated? It is generated on our computer, and

  • 00:44:25 yes it is here. Currently it is not face inpainted  but you can use the extensions and everything. So  

  • 00:44:32 this is it. I hope you have enjoyed. Now I am  going to move to the RunPod tutorial part. Okay  

  • 00:44:38 now I will start showing how to do the same  training on RunPod. I am assuming that you  

  • 00:44:44 might have skipped the Massed Compute part. So we  download the zip file. If you haven't downloaded  

  • 00:44:50 yet it will be also in the attachments. Please  also read this post very carefully always and  

  • 00:44:55 watch the Windows tutorial. Don't skip it. Enter  inside the zip file extraction. You can extract  

  • 00:45:01 it with WinRAR or just Windows itself. Just right  click and extract and you will see RunPod install  

  • 00:45:07 instructions. This is very important. Just double  click it. It will give you all the instructions.  

  • 00:45:13 Please register with my link if you haven't  registered yet. It is here. Then login. I assume  

  • 00:45:18 that you have registered. Sign up is free. Then  you need to set up your billing at your billing  

  • 00:45:22 information. Then go to the Pods. Okay in here go  to the deploy. You can use either community cloud  

  • 00:45:28 or secure cloud. You can also use network volume  storage. I have a full tutorial for network volume  

  • 00:45:33 storage as well. If you are wondering about it, you can watch it. The network volume storage link will

  • 00:45:38 be here. I'm going to update the zip file and the  instructions txt file so you can just double click  

  • 00:45:43 and watch it. Then the selections here matter. You can pick any GPU that you want. We have a

  • 00:45:49 configuration for each GPU but my suggestion  for you would be like this. You can rent 4x A40  

  • 00:45:57 GPUs. It is a very cheap one, you see, and it has 48 gigabytes of VRAM. So it is a pretty decent price.

  • 00:46:04 It is not as good as Massed Compute but it is  decent. And let's also see its training speed.  

  • 00:46:09 So I'm going to rent 4 GPUs. You don't have to rent 4 GPUs. You can even rent one RTX 3090 or one A4000

  • 00:46:18 and you can train but I don't suggest them. Pick  at least 40 gigabyte to get the maximum quality,  

  • 00:46:25 not the maximum speed but the maximum quality. And the template selection matters. I train on the PyTorch 2.1

  • 00:46:32 template. If you train on other templates it may  not work. I cannot guarantee that. So select this  

  • 00:46:39 template to not have any issues. How do you select it? Click here, change the template type to PyTorch, and

  • 00:46:44 select the 2.1 version from here. You see it is  CUDA 11.8 and then edit template. This is also  

  • 00:46:50 super important. Edit the template and add port 7801. This is for SwarmUI. Make the volume disk bigger

  • 00:46:57 because we are going to save checkpoints and  download models like at least 200 gigabytes to  

  • 00:47:02 not have any issues. And you can set the container  disk to 30 gigabytes to not have any issues as  

  • 00:47:07 well. I'm going to show both SwarmUI and Forge  Web UI and how to use it after training with Kohya  

  • 00:47:12 GUI. So set the overrides and we are ready. Then  click deploy on demand. After that go to the my  

  • 00:47:19 pods. So we have started 4x A40 GPUs, as I said. You can rent more GPUs, or more powerful GPUs.

  • 00:47:27 You can also rent 1 GPU. All of them would  work however the speed will change according  

  • 00:47:32 to the GPU that you have. This is an affordable  configuration. It is only $1.4 per hour. It was  

  • 00:47:40 $1.25 on Massed Compute, and I wonder about the speed difference between the two GPUs. So we are going

  • 00:47:46 to see that. Okay wait until the connect button  appears and it appeared. Click connect and click  

  • 00:47:51 the JupyterLab port 8888. Wait for this interface  to load. If it doesn't load refresh this page  

  • 00:47:57 and click connect button again and the JupyterLab  interface loaded. So go here and click this icon.  

  • 00:48:05 Go to your extracted folder and upload everything. Okay, just select everything. They are not big files,

  • 00:48:11 so you will be able to upload all of them very  quickly like this. Then in here find the RunPod  

  • 00:48:17 install instructions. First of all, we are going to install the proper latest version of Kohya GUI. Copy this

  • 00:48:24 part. Copy it Ctrl+C. Open a new terminal like  here and paste it and hit enter and just wait.  
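
The exact install commands live in the attached instructions file, but as a rough sketch of the shape of this step (the branch name and setup script below are assumptions based on the public kohya_ss repository, not the Patreon script itself):

```bash
cd /workspace
# clone Kohya GUI together with its sd-scripts submodule
git clone --recursive https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
git checkout sd3-flux.1   # assumed FLUX-capable branch at the time of recording
./setup-runpod.sh         # assumed setup script; the instructions file may differ
```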

  • 00:48:30 Then, while the first part of Kohya is installing, you can go here. You see the models that we

  • 00:48:37 are going to use. Copy them, open a new terminal, and paste them. So while Kohya is installing,

  • 00:48:43 it will also download the necessary models, to save you time.
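
For orientation, the model downloads are plain wget calls along these lines. The text-encoder URLs below are public files; FLUX.1-dev is a gated repository, so the actual links in the instructions file may point elsewhere or require a token:

```bash
cd /workspace
# text encoders (public repo)
wget -c https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors &
wget -c https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp16.safetensors &
# base model and VAE (gated repo; may need an Authorization header with an HF token)
wget -c https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/flux1-dev.safetensors &
wget -c https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/ae.safetensors &
wait   # -c lets an interrupted download resume instead of leaving a corrupt file
```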

  • 00:48:48 Meanwhile, you can also upload your training images. So click here to upload them. I suggest uploading them as a zip

  • 00:48:54 file. So I will just right-click my images here and zip them like this. Then click here. You cannot

  • 00:49:00 upload folders from there. You can also use  RunPodCTL. I also have a tutorial for that.  

  • 00:49:05 Go to the folder where your files are. Here they are. Just upload it. At the bottom of the screen

  • 00:49:10 you will see it uploading, so you need to wait here for it to be uploaded. Uploading from here is slower than

  • 00:49:16 RunPod CTL, or you can upload files to Hugging Face and download them directly from there

  • 00:49:20 with wget. The download speed of RunPod is very poor compared to Massed Compute.

  • 00:49:26 That is another reason why I pick Massed Compute over RunPod. You see, it is

  • 00:49:31 downloading at only about 15 megabytes per second; it was about 150 megabytes per second on Massed Compute. The

  • 00:49:37 installation speed is also slow. Kohya was already installed, and we just upgraded it, on

  • 00:49:42 Massed Compute. We also need to install SwarmUI and Forge Web UI on RunPod, whereas they are both already

  • 00:49:48 installed on Massed Compute. So since this is a very slow pod, I'm going to just terminate it.

  • 00:49:54 For all of the downloads, what I'm going to do first is refresh here. We need to delete the already downloaded

  • 00:50:00 files. Open a new terminal and use rm -r. (You could also just wait for the downloads to fully finish.) Flux1, yes. You need to delete these older

  • 00:50:08 files because they are now corrupted: if you don't wait for a proper, complete download, they end up corrupted. Okay,

  • 00:50:13 so let's see if we have other ones. Okay, there's T5 too. Do we have any other one? A download has

  • 00:50:20 started. So if your files get corrupted, you need to delete them like this and re-download them.
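
A minimal sketch of that cleanup, assuming the file names used above:

```bash
cd /workspace
# partially-downloaded files are unusable; remove them before re-downloading
rm -f flux1-dev.safetensors t5xxl_fp16.safetensors
```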

  • 00:50:25 Okay, so what I'm going to do is this one; this is a good alternative: start each download separately. Copy this.

  • 00:50:32 Open a new terminal and start it. Then let's also  copy this. Open a new terminal and start it. Make  

  • 00:50:39 sure that all downloads fully complete. Copy this, start a new terminal, and download

  • 00:50:45 it. Okay, the final one is here. Copy this, open a new terminal, and start it. So it is going to

  • 00:50:53 download every file separately, and this way we get better speed. Also, you see, when you start

  • 00:50:59 a download a second time, it may get a speed boost. I don't know why this happens, but this time it is

  • 00:51:05 60 megabytes per second instead of 15. Okay, nice. So we are downloading at a decent

  • 00:51:13 speed now, and the files are downloaded. Kohya is being installed right now. My images are also

  • 00:51:20 uploaded, you see, here. I will right-click and select Extract Archive. So they will be

  • 00:51:25 extracted into here like this, and we are going to use the dataset preparation feature of Kohya to

  • 00:51:31 prepare our dataset. The installation on RunPod is really taking a long time. It is still downloading the model,

  • 00:51:37 and the installation is at this part. I still have to wait for it to fully finish. Then we are going to execute the

  • 00:51:43 second part. Okay, the first part of the Kohya installation has been completed. It took more

  • 00:51:49 than 20 minutes on this pod. It may be faster  on some other pods. So now we are going to run  

  • 00:51:55 this second command. This is super important.  Don't forget that. You need to run it on a new  

  • 00:52:01 terminal. It is going to terminate the running instance, update the libraries, and start again. So

  • 00:52:08 this is mandatory; don't forget it. Why am I doing two steps? Because I am upgrading the scripts. I am

  • 00:52:14 adding new stuff, therefore this is mandatory. Once FLUX support gets

  • 00:52:22 merged into the main (master) branch, I am going to update my scripts. You don't need to

  • 00:52:27 worry about them. Just use the scripts as I show.  So now it is starting the Kohya GUI. You see we  

  • 00:52:33 have 4 GPUs. How did I know the first part was completed? You see, we had "Running on local URL."

  • 00:52:41 Once you see this, you will know that the first part has been completed. The installation of the

  • 00:52:47 first part is done, and the second part is now getting completed. Kohya is starting. Usually the

  • 00:52:53 hard drives on RunPod are very slow for me. That is another reason why I picked Massed

  • 00:52:59 Compute. Unless you rent a very powerful pod, the hard drives will be slower. But the downside

  • 00:53:04 of Massed Compute is that you don't have permanent storage there, while on RunPod you have

  • 00:53:09 that. That is the main advantage of RunPod. So Kohya started. We can access it via the Gradio live link.

  • 00:53:16 You could also access it by setting port 7861 here, because it starts on this port by default, but

  • 00:53:25 accessing it from the Gradio live link is also perfectly fine and safe. Okay, so the interface has started.

  • 00:53:32 It is the same as using it on Windows, but I will show you how to set it up on RunPod. First of

  • 00:53:37 all, this is LoRA training, therefore we are going to use the LoRA tab. If you load the config

  • 00:53:42 in the DreamBooth tab, it will corrupt it. So go to the LoRA tab and select the config according to

  • 00:53:48 your GPU. So let's see if the configs are uploaded. No, because it doesn't upload folders. So click

  • 00:53:55 here, go back to your downloads folder, and we have the best configurations here. I am going to start

  • 00:54:00 with the rank 1 file, which is the best one. You see, rank 1. So how are we going to give its path?

  • 00:54:06 Right-click and Copy Path, then go to the configuration field and add a leading slash. Always add a leading slash ("/") at the

  • 00:54:13 beginning of paths in RunPod, like this. So this is the path; then click this icon. It will load everything. If

  • 00:54:20 it doesn't load, click this icon to refresh. So you see, everything is loaded. Now what we need

  • 00:54:26 to change first is the model path. So our model is here, you see,

  • 00:54:32 this one. Right-click, Copy Path, add the leading slash here, paste, and we are set. We also need to set

  • 00:54:38 our training images which I am going to show right  now. So go to the dataset preparation. Type your  

  • 00:54:44 instance prompt and class prompt. I explain  everything in the Windows tutorial. We use  

  • 00:54:49 a repeat count of 1 because we don't use regularization images. Where did I put my training images? They are

  • 00:54:55 inside here. You can open each one of them to verify they uploaded correctly. Then right-

  • 00:54:59 click, Copy Path, and put it here like this. You see, I added the leading slash. It is 1 repeat. The destination

  • 00:55:05 where I want my prepared training data: let's say a workspace train folder, like this, and click

  • 00:55:13 Prepare Training Data. Check the CMD window and see "done creating". Then click Copy Info to

  • 00:55:19 Respective Fields, and we are set for this part. Which file name do you want to give? Let's say test

  • 00:55:25 one. So the output name will be test one. Then we also need to set the other file paths, which

  • 00:55:32 I will show you. VAE path: this is the path, so Copy Path, add the leading slash, and

  • 00:55:39 paste it. You can also do this: I copy this and paste it like this, you see. Copy this, paste it

  • 00:55:45 like this because I have downloaded the files with  the same names and everything is set. Currently  

  • 00:55:49 apply T5 attention mask is selected. This improves  quality but reduces speed. So let's see the single  

  • 00:55:55 GPU speed first, because you may be training with a single GPU. Let's save and click Start Training.

  • 00:56:02 You can also rent an RTX 4090 and use a lower-VRAM configuration like rank 3, and it will be faster

  • 00:56:11 than training on the A40. The quality difference is minimal between rank 1 and rank 2: we are training

  • 00:56:17 in 16-bit; with the other ranks we are training in 8-bit. With the very low ones, starting from rank

  • 00:56:24 5, we are training a single layer, so the quality is lower, but the difference is not very big.

  • 00:56:31 I explain everything in detail in the Patreon post, so read the post very carefully. So you

  • 00:56:37 see it is loading the model files. It is going to  start training. To monitor the VRAM usage I will  

  • 00:56:42 open a new terminal and run pip install nvitop, like this. Then I will type nvitop, like

  • 00:56:50 this, to start it. It has started, and we can see the VRAM usage of each GPU right now.
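
For reference, the monitoring step is just these two commands; nvitop is a real-time GPU monitor from PyPI:

```bash
pip install nvitop   # one-time install
nvitop               # interactive per-GPU VRAM and utilization view
```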

  • 00:56:56 Training is starting on a single GPU, the first one, right now, and we can monitor the status of the

  • 00:57:03 training here. You can verify whether the folders and the captions are accurate. I explained

  • 00:57:10 all of this in detail in the Windows tutorial; that is why you should watch it. The initial model

  • 00:57:15 loading on RunPod is also always slower. You see, this is how fast it loads. It is going to load

  • 00:57:21 about 28 gigabytes, and this is the speed: very, very slow. That is why I also prefer Massed Compute, but

  • 00:57:28 it is up to you. You can rent a much more powerful  pod on RunPod and get much better speeds. Okay so  

  • 00:57:34 the training has started. Initially the speed that  it displays will not be very accurate. Wait until  

  • 00:57:43 at least 100 steps to get a more accurate per-step speed. Currently it is 10.30 seconds

  • 00:57:52 per iteration. So let's just wait a little bit to see the accurate speed. Okay, it went down to about 10

  • 00:57:59 seconds per iteration, and it is still very slow. So how can you speed it up? You can stop training and

  • 00:58:06 disable apply T5 attention mask. This will hugely speed up the training with a little bit of quality

  • 00:58:13 loss, so it is a trade-off. Let's see the new speed. With apply T5 attention mask off, we

  • 00:58:20 are getting over a 100% speed-up. It is now 4.85 seconds per iteration. It is slower than the RTX A6000 on

  • 00:58:32 Massed Compute, but this is a decent speed. Can you speed it up further? Yes, that is what we are

  • 00:58:38 going to do now with multi-GPU training. You can directly load the 4x GPU batch size 1 or

  • 00:58:46 batch size 2 config. I suggest batch size 1 because it gives better quality. However, for those

  • 00:58:52 who want to set it up themselves, I am going to show that right now. So stop training. Go to the

  • 00:58:58 accelerate tab here and set the number of processes to 2. Alright, this is a cut in the flow of the

  • 00:59:05 video, because I just figured out something. When you are setting up multi-GPU training, make sure

  • 00:59:13 that the number of processes equals the number of GPUs you have. When you set it that way, you are

  • 00:59:20 going to get almost exactly the same number of epochs. You see, currently I am training for 60 epochs on

  • 00:59:28 4 GPUs, therefore 240 epochs in total, and I am getting 240 epochs. Currently I am doing a training for a

  • 00:59:37 client, and I have figured out there is not much speed difference. However, what is the benefit of

  • 00:59:44 this? This way you can set save every N epochs accurately. So I am going to save a checkpoint every 20

  • 00:59:51 epochs, and it will work as expected. So set the number of processes equal to the

  • 00:59:59 number of GPUs you have. If you are training on 8 GPUs, set it to 8; if you are training on 6, set it to 6;

  • 01:00:03 if you are training on 4, set it to 4. That is the logic. Set multi GPU and set the GPU IDs to 0, 1, 2, 3, like this, and that's it.
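
Under the hood, these GUI fields map onto accelerate launch arguments. A rough sketch of the equivalent CLI call, with an assumed script invocation and config path (the GUI builds the real command for you):

```bash
# one training process per GPU: 4 processes for GPUs 0-3
accelerate launch --multi_gpu --num_processes=4 --gpu_ids=0,1,2,3 \
  flux_train_network.py --config_file /workspace/my_flux_lora.toml
```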

  • 01:00:14 Now we are ready to use multi-GPU training. However, there are two things that you

  • 01:00:20 need to change. The first is that you need to divide the epoch count by

  • 01:00:25 the number of GPUs, so it is going to be 50, and it is automatically going to handle everything

  • 01:00:31 for us. Let's save every 20 epochs; this doesn't change with the number of GPUs. It will still save

  • 01:00:38 based on the number of epochs. And then the learning rate: you need to set a new learning rate as

  • 01:00:43 you increase the number of GPUs or the batch size. There isn't an exact formula, but the suggested

  • 01:00:49 formula is: new learning rate equals the old learning rate multiplied by the number of GPUs, multiplied by the batch size, divided by

  • 01:00:56 2. So our new learning rate becomes like this:

  • 01:01:01 the old learning rate multiplied by 4, multiplied by 1 because we are using batch size 1, divided by 2,

  • 01:01:07 and this is the new learning rate. Some people also multiply directly without dividing by 2, or use a

  • 01:01:14 square root, and as I said there is no exact formula, but dividing by 2 is commonly used. So

  • 01:01:20 this is the new learning rate. Why? Because we have 4 GPUs: multiply by 4 and divide by 2.
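
Written out, the rule of thumb is:

$$\mathrm{LR}_{\text{new}} = \mathrm{LR}_{\text{old}} \times \frac{N_{\text{GPUs}} \times \text{batch size}}{2} = \mathrm{LR}_{\text{old}} \times \frac{4 \times 1}{2} = 2 \times \mathrm{LR}_{\text{old}}$$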

  • 01:01:26 Let's name this configuration RunPod train 4x GPU. You should

  • 01:01:34 always save your configuration like this. Save it and let's start the training. I am still not

  • 01:01:39 going to apply the T5 attention mask, to see the speed and compare it with Massed Compute, but I

  • 01:01:46 can already say that the A40 GPU on RunPod is slower than the A6000 on Massed Compute, and it is also more

  • 01:01:54 expensive. However, as I said, you can always rent more powerful GPUs; for example, you can rent 4x L40S

  • 01:02:02 GPUs and it will train in about 30 minutes, maybe faster, with the maximum possible quality. So it is

  • 01:02:09 up to you how many GPUs, and which GPU, to rent. You can also rent 4x 4090s. In that case you need to use

  • 01:02:18 a lower-VRAM configuration. Which one do you need to use? Rank 4 or rank 3, to see the speed,

  • 01:02:25 and you can still use multiple of them at the same time with exactly the same settings; just the base

  • 01:02:30 configuration changes. What is the change in the base configuration? With the high-VRAM

  • 01:02:35 configuration we train in 16-bit, so the quality loss is minimal; with the low-VRAM one we are training

  • 01:02:42 in 8-bit. When doing multi-GPU training, you will see the total optimization steps

  • 01:02:48 displayed as 750 instead of 3000. Why? Because it divides the number of steps equally

  • 01:02:56 across the GPUs, therefore it displays 750 steps. Everything will work exactly the same. This will

  • 01:03:04 be almost equal to training with batch size 4, but this time we gain a linear speed increase. When

  • 01:03:11 you increase the batch size on a single GPU, you don't get such a speed increase; actually, I tested it,

  • 01:03:15 and batch size 2 increases the speed only a little, nothing like using two GPUs. Currently the

  • 01:03:22 speed is 5.25 seconds per iteration, and it is getting better. You may think that it is the same as before, but now,

  • 01:03:29 you see, each time one step is done we are actually training 4 images, the equivalent of 4 of the previous steps.

  • 01:03:35 So you need to divide this number by 4 to get the effective per-image speed.
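
As a quick sanity check on those numbers:

$$\frac{3000 \text{ steps}}{4 \text{ GPUs}} = 750 \text{ displayed steps}, \qquad \frac{5.25 \text{ s/it}}{4} \approx 1.31 \text{ s per effective image}$$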

  • 01:03:41 It is just amazing: we got an almost 100% linear increase, so our speed increased about 4 times compared to

  • 01:03:50 before. With SDXL there wasn't such a speed increase, but with FLUX training on multiple

  • 01:03:55 GPUs we are getting an almost perfectly linear increase based on the number of GPUs. Previously

  • 01:04:03 you had to use SXM machines to get a linear speed increase with multi-GPU, but with FLUX

  • 01:04:09 you don't need such a configuration, which is good because SXM machines are extremely expensive. SXM is the

  • 01:04:15 interconnect between GPUs. With the PCI Express link in these GPUs we are still getting an almost linear increase with

  • 01:04:22 no performance loss. You see, we are almost back to the previous speed, but this time the effective batch size is

  • 01:04:28 4. So we are training 4 images at a time, and it is going to take about 1 hour 2 minutes to complete

  • 01:04:35 this training. It is just amazing. It looks like the training speed stabilized at 4.95 seconds per iteration.

  • 01:04:42 So now I will wait for the training to finish, then we will continue. And I see that it doesn't use

  • 01:04:49 all my GPUs. This is weird. Yeah, probably nvitop is broken. Yes, it doesn't get updated. So let's

  • 01:04:55 start nvitop in a new terminal, and yes, nvitop looks broken. It doesn't display all of the GPUs,

  • 01:05:04 because this is impossible. Can we see the usage here, in Pods? Okay, it doesn't show. This is

  • 01:05:10 weird. However, it shows we are training 750 steps, so it has to be using them. Let's also look at the logs. It

  • 01:05:18 should have loaded the model 4 times. Yes, I can see that it loaded 4 times. So it is working, but the status of

  • 01:05:25 the GPUs is not accurate. This was accurate on Massed Compute, but here it isn't.

  • 01:05:30 So don't trust it. Trust the values that you see  here and we are going to see the results at the  

  • 01:05:35 end. So the training has been completed. 750 steps  are completed and it took 62 minutes to train on 4  

  • 01:05:46 A40 GPUs with one of the very best configurations. Now, how can you use the LoRAs? You can download them to

  • 01:05:52 your computer and use them there, or you can use them on RunPod as well. I will show both. So

  • 01:05:57 to use them on RunPod, you can watch this SwarmUI cloud tutorial, it is amazing, or you can also use Forge

  • 01:06:04 Web UI. Either one works, and I am going to show both. When we go to the SwarmUI

  • 01:06:10 cloud tutorial, we have a link there, this link. When you watch the tutorial you will know it.

  • 01:06:15 So at this link there is a RunPod installer. You should watch the tutorial if you don't know how

  • 01:06:21 to do it. So I will just download this installer file. I will show it quickly: first I will install

  • 01:06:27 and run it very quickly. I am not going to repeat everything in that tutorial. Let's just copy-paste.

  • 01:06:32 It failed because I didn't upload the file. So let's upload the file and name it

  • 01:06:39 correctly. Okay, let's install it. The installation completes exactly as

  • 01:06:45 shown in the SwarmUI cloud tutorial for FLUX. Okay, so the installation has been completed and

  • 01:06:51 SwarmUI started on RunPod. To use our LoRAs, first of all we need to move the files into the

  • 01:06:59 correct folders. I will move the VAE file first: cut it, move into SwarmUI, into

  • 01:07:07 models, into VAE, and paste it there. Then let's move to the workspace. We need to move CLIP large

  • 01:07:14 and the T5 text encoder. Cut them. By the way, we need to rename the text encoder to the correct name,

  • 01:07:21 or SwarmUI will re-download it. What is the correct name for SwarmUI? I don't know it from

  • 01:07:27 memory, but I will look on my computer, and this is the correct name. So I will just rename

  • 01:07:34 it to this name, and yes. I just noticed that we put these two into the wrong folder;

  • 01:07:40 they go into clip, not clip_vision. So let's just paste them, and the VAE is in the correct folder.

  • 01:07:48 Okay, as a last step we are going to move the main FLUX.1-dev safetensors file. Cut it, move

  • 01:07:53 into SwarmUI, into models, and put it into the unet folder. If you don't see the unet folder, you need

  • 01:07:59 to create it yourself. How can you create it? Click here, create a new folder, and name

  • 01:08:05 it unet. Then let's return to the models folder. Click refresh, and it should appear here.
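
Summarized as shell commands, the moves look roughly like this; the folder casing and file names are assumptions based on SwarmUI's default layout:

```bash
cd /workspace
mv ae.safetensors           SwarmUI/Models/VAE/
mv clip_l.safetensors       SwarmUI/Models/clip/   # clip, not clip_vision
mv t5xxl_fp16.safetensors   SwarmUI/Models/clip/
mkdir -p SwarmUI/Models/unet                       # create the unet folder if missing
mv flux1-dev.safetensors    SwarmUI/Models/unet/
```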

  • 01:08:11 Then let's generate a single image first to check the model; then we will generate multiple

  • 01:08:18 images to compare checkpoints. Moreover, since we have 4 GPUs running right now, we can add more backends,

  • 01:08:26 which I am going to do right now. So click here to add more backends. I have shown all of

  • 01:08:32 this in the main cloud tutorials for SwarmUI; this is just extra. So GPU ID 1, GPU ID 2,

  • 01:08:41 and GPU ID 3. Moreover, you can add a new argument, --fast. This will improve the speed significantly

  • 01:08:49 on newer GPUs. Then save. Once you save them, it will restart the backends. Okay, let's return

  • 01:08:56 to Generate. Of course, since we set --fast, it is restarting. Okay, let's just wait for the backends

  • 01:09:03 to load. We can always go to the logs, set them to debug, and watch. Yes, it is now going to load

  • 01:09:10 everything. Yeah it is starting on each backend  right now. We can see that it is just starting  

  • 01:09:17 yes. Then let's hit Generate again. So it is 1 current generation, 1 queued, 2 waiting on model

  • 01:09:23 load. Why do I do this? Because I am verifying that the models and everything are set in the correct

  • 01:09:30 place. Then I will use my LoRA to generate, but my LoRAs are not visible here yet, because I also need

  • 01:09:35 to move them. So let's go back to the workspace. Where are our LoRAs? They are inside the train folder,

  • 01:09:42 inside model. You see, my LoRAs are here. I am just going to select everything, cut them, and

  • 01:09:49 move back into the workspace, into SwarmUI, into models, into the LoRA folder, and paste

  • 01:09:56 them here. The image is being generated, almost ready. It is also doing inpainting, because we have

  • 01:10:03 segment face with 0.7, a 70% denoise inpaint, with "photo of OHWX man". This is equivalent to using the

  • 01:10:11 after detailer (ADetailer) extension on Automatic1111 Web UI. Okay, this is the base image. Then let's

  • 01:10:17 refresh the LoRAs. The LoRAs appeared. For example, let's use this 80-epoch LoRA. Let's generate.

  • 01:10:24 Okay, now it is loading the LoRA and it is going to generate. We can always check the server

  • 01:10:30 logs. Yes, it loaded the LoRA, it is generating the image right now, and we can already see a preview. It

  • 01:10:37 is inpainting the face right now. Inpainting the face is optional; however, I find it improves the face

  • 01:10:43 quality. By the way, this GPU is slower than the Massed Compute RTX A6000 GPU. And we got an image.

  • 01:10:50 It is really, really good. So how can you find the best checkpoint? To find the best checkpoint,

  • 01:10:55 we are going to use Tools > Grid Generator. In here, first select the LoRA axis; fill in all the LoRAs

  • 01:11:02 like this and delete the (none) LoRA entry. Then I am going to use multiple prompts, because we already

  • 01:11:08 have prompts. Return to the downloads folder; inside test prompts we already have prompts

  • 01:11:14 for SwarmUI. I have eyeglasses, so I'm going to use these prompts. Okay, like this, and you see

  • 01:11:20 the prompt separator is this, these two characters, and everything is set. Let's also give

  • 01:11:27 our grid a name like test1. Let's also save the grid config as test one and hit Generate.

  • 01:11:34 Now this is going to queue the generations on all 4 GPUs, and it will generate them in about 20 minutes,

  • 01:11:41 not about 4 hours, because we are using 4 GPUs, even though we are doing 30 + 18, so 48 steps, for

  • 01:11:49 each image. It will be done in about 20 minutes, not 4 hours. We will see. Meanwhile,

  • 01:11:55 let's also upload all the LoRAs to Hugging Face, so we can download them to our computer and

  • 01:12:02 use them later, anytime we want. To upload models to Hugging Face, I already have a

  • 01:12:08 tutorial here, and I already have a notebook file. Go to this link. You see how to save and download your

  • 01:12:14 models, and at this link you will see Hugging Face upload version 6. It was just updated

  • 01:12:20 today. Click this link to download it. Then return to your workspace and upload the downloaded file

  • 01:12:28 here. Double-click and open it. Now, first of all, we need to install the dependencies with this

  • 01:12:34 cell. Just run it one time. Then you need to get your Hugging Face token. To get your Hugging Face

  • 01:12:41 token, go to Hugging Face. You also need to create a model repository. So first create

  • 01:12:47 a new model repo; everything will be saved there. Let's say test RunPod video. You can make

  • 01:12:53 it public or private; I'm going to make it private. Copy the model path here, and we are going to use the

  • 01:12:59 very fast new upload feature. Just paste it there. Then go to Settings, then Access Tokens.

  • 01:13:06 You need to register an account. It is free, don't worry, and they don't charge you anything. They

  • 01:13:11 are just amazing. Select "write". Give it a name, test delete 2, like this, and create the token. Copy

  • 01:13:19 it; this is important. Go back here and paste your token. Run this cell one time. It will set

  • 01:13:25 your Hugging Face token, and now we are ready. You also need to set the LoRA path. Our LoRA path,

  • 01:13:31 let's find it: it is currently inside SwarmUI, inside models, inside LoRAs. So right-click,

  • 01:13:38 Copy Path, delete this part, and paste it. You see, it always starts with a leading slash. The repo

  • 01:13:44 type is model; this is important. Whatever repo type you just created, you need to use it.

  • 01:13:49 Then just click the play icon and it will start uploading.
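
If you prefer a terminal over the notebook, huggingface_hub ships a CLI that does the same job; the repo name and folder below are placeholders:

```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli login   # paste your "write" token once
# upload the whole LoRA folder into your (private) model repo
huggingface-cli upload YourUser/test-runpod-video /workspace/SwarmUI/Models/Lora --repo-type model
```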

  • 01:13:55 You see, it is going to upload 12.3 gigabytes. Depending on your pod's speed, it may be completed in 2 minutes; actually, on Massed

  • 01:14:02 Compute it was only about 2 minutes, or maybe 10 or 20 minutes, but this is the fastest way of uploading

  • 01:14:09 models to Hugging Face. It arrived very recently, so I am keeping everything very

  • 01:14:14 up to date. At the same time, it is generating the grid right now. You see, the estimate is 1 hour,

  • 01:14:20 but it will get better; we have already generated about 20 images. We can always see the generation

  • 01:14:26 speed in the debug logs. You see, 1.13 iterations per second, which is a really, really good speed by the way. The upload

  • 01:14:33 is slow on RunPod, though; it was way faster on Massed Compute. Okay, it hashed the files first, then it

  • 01:14:39 will start the upload. The uploaded files will appear here, which is our repository. This is a

  • 01:14:44 model repo; it matters whether it is a dataset or a model. Okay, it is saying processed. Wow, that was fast. So

  • 01:14:52 it uploaded everything. Let's refresh the files, and they all appeared here. So it took about 3 minutes

  • 01:14:59 to upload everything, and we have uploaded all of our models. This is just mind-blowingly fast

  • 01:15:05 uploading. Thank you so much, Hugging Face, you are amazing. So we have saved everything in

  • 01:15:10 the cloud, forever until we delete it, and we can download it anytime we wish. How do you download?

  • 01:15:16 For downloading, I also have a new cell, this snapshot download one. You just enter your repo

  • 01:15:23 path here and the folder path where you want to download. For example, let's download into

  • 01:15:28 the workspace, workspace test 2, like this, and let's run this cell. This will download everything. Okay, it

  • 01:15:37 says that there is no directory workspace. Oh, I need to put this in here. I'm going

  • 01:15:43 to update this script, so it will be fixed when you are using it. Okay, let's just run it, and yes, it

  • 01:15:49 has started downloading. Don't worry, on RunPod we get this error because we are using the proxy,

  • 01:15:55 but in here all the files will appear after a while. This is a super fast download. It has

  • 01:16:02 resume capability, and the upload has resume capability as well. I fixed that error, updated the file to

  • 01:16:09 version 7, and I can already see the LoRA files are downloaded. So this is a huge, huge

  • 01:16:16 download speed. This is how you can save your models and then download and use them later.
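
The CLI equivalent of that snapshot download, again with placeholder names, is a single resumable command:

```bash
# downloads the entire repo into the target folder; resumes if interrupted
huggingface-cli download YourUser/test-runpod-video --repo-type model --local-dir /workspace/test2
```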

  • 01:16:22 The grid generation has been completed. We click here to open it. If not all of the images are loaded, refresh the page. I am going to use

  • 01:16:29 auto scale images to viewport width and now all  you need to do is check each checkpoint and decide  

  • 01:16:37 which one is working best. There is no easier way  unfortunately, so it is a personal thing. You need  

  • 01:16:43 to check every checkpoint and decide which one works best. Then you can use that checkpoint

  • 01:16:49 to generate images as you wish. I am still working on better workflows and better configurations,

  • 01:16:55 so hopefully the results will be even better by the time you are watching this tutorial. I will update the

  • 01:17:02 configuration files. Currently I am researching training the CLIP large text encoder, so we

  • 01:17:08 will hopefully see a better workflow soon. As a  final step, I will show how you can use the Forge  

  • 01:17:14 Web UI on this RunPod machine. For using Forge Web UI on RunPod, I have an automatic installer. It

  • 01:17:20 is here, you see, under this section of the post. Let's go there, and in the attachments you will

  • 01:17:27 find the Forge installer. This may be a higher version when you're watching, so click this link to

  • 01:17:32 download it. Then go to the workspace and create a new folder, Forge install, like this. Enter

  • 01:17:40 it, upload the zip file, then right-click and extract the archive. This way you will not get confused

  • 01:17:48 by the new files; it will be a clean folder. Then you need to use the RunPod instructions.txt file. You

  • 01:17:55 can also extract it on your computer and upload it. So for installing the Forge Web UI, we are going

  • 01:18:01 to run this command. Open a new terminal and copy-paste it. It is going to install Forge Web UI

  • 01:18:07 into the stable-diffusion-webui-forge directory under this folder. If you install it under this folder,

  • 01:18:14 you need to delete this "cd workspace" part; don't forget that. If you install it into your workspace,

  • 01:18:20 then you don't need to delete it. So we are going to use it like this: we just deleted the

  • 01:18:27 first "cd workspace" part and made it like this. So if you install into your workspace,

  • 01:18:34 you can keep it; if you install not into your workspace but into a separate folder,

  • 01:18:38 you keep it like this. Just wait for the installation to be completed.

  • 01:18:44 Okay, so the Forge Web UI installation has been completed. To start it, I will now use this. As I said, be careful where you

  • 01:18:50 installed it, and run this command inside that folder. If you installed into the workspace, it is fine,

  • 01:18:56 but if you didn't install into the workspace, it will fail. Yes, we are currently failing because

  • 01:19:02 of this. So I have to change it like this, so it will move directly into this folder. Open

  • 01:19:07 a new terminal inside this folder and just copy-paste it like this. Pay attention to the paths.

  • 01:19:13 You will understand them as time passes, and it will help you in the long run. You

  • 01:19:19 can always message me on Patreon or on the Discord server and I will help you. So now we just need to wait for it to

  • 01:19:25 start, and we also need to move the files. So let's move the files while it is starting. Currently our

  • 01:19:32 files are in here, models, unet. So let's move this file into the Forge Web UI. It will go

  • 01:19:40 inside Forge install, webui forge, inside models, inside Stable-diffusion. Put it here. Let's go

  • 01:19:48 back to SwarmUI and models, and we have the LoRAs. Let's move our LoRAs. Okay, click the first file,

  • 01:19:57 then, while holding Shift, select all like this. Cut, move back to the workspace, into

  • 01:20:04 Forge install, into models, inside LoRAs. The LoRA folder is not created yet, so let's copy them later.

  • 01:20:12 Go to SwarmUI, models, VAE. We can just cut or copy. Go back to Forge install, stable-diffusion-webui-

  • 01:20:21 forge, models, VAE, and paste it. Go back to SwarmUI, inside models, inside clip. This is important. Move

  • 01:20:31 both of them to Forge install, stable-diffusion-webui-forge, models, and they go inside text_encoder.

  • 01:20:40 Paste them here. Now we just need to copy the LoRAs. Let's just wait for the application to start.
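
As shell commands, with the install folder name and Forge's default model layout as assumptions:

```bash
cd /workspace
FORGE=Forge_install/stable-diffusion-webui-forge    # assumed extract location
mv SwarmUI/Models/unet/flux1-dev.safetensors    "$FORGE/models/Stable-diffusion/"
mv SwarmUI/Models/VAE/ae.safetensors            "$FORGE/models/VAE/"
mv SwarmUI/Models/clip/clip_l.safetensors       "$FORGE/models/text_encoder/"
mv SwarmUI/Models/clip/t5xxl_fp16.safetensors   "$FORGE/models/text_encoder/"
```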

  • 01:20:46 Okay, I know why it has failed: because we didn't install it into the workspace. My installer script has

  • 01:20:54 failed. We can see the script here, it was here. So we need to copy it and modify it. How

  • 01:21:01 are we going to modify it? We are just going to change this workspace part to Forge install, like this. Okay,

  • 01:21:09 this will fix it. So let's open a terminal. You should install into the workspace and not into Forge install;

  • 01:21:16 otherwise you need to do all of this. Now let's remove this share part and start a new terminal. Okay,

  • 01:21:24 I had prepared the scripts to install into the workspace. Once we installed into a subfolder,

  • 01:21:29 it caused a lot of issues, so it is better to install into the workspace; that is the best way. Now it's

  • 01:21:34 starting, but I will not delete these parts of the video, because you may always encounter some

  • 01:21:40 issues, and you are learning how to fix them. This is helpful in the long run, and it will help you

  • 01:21:48 better understand the concepts of what we are doing. So now we will get a Gradio live share. The second

  • 01:21:54 start should be way faster than the first start. Even the second start on RunPod

  • 01:21:59 takes too long; on Massed Compute it is almost instant. Okay, so this time we got a Gradio live link.

  • 01:22:05 Let's open it. We should also move our LoRAs, so let's go back to SwarmUI, models, LoRA. Let's

  • 01:22:14 select all, cut, move back to the workspace, into Forge install, webui forge, models, LoRA, and paste

  • 01:22:23 them here. We got everything, and we got the first web page. So let's refresh and select all these 3

  • 01:22:31 and FLUX and text-to-image. Okay, let's use some of our test prompts, for example this one. I will

  • 01:22:37 first generate without my LoRA, then I will generate with my LoRA. Okay, so I will use 1024

  • 01:22:44 by 1024, and let's generate. Don't forget to select all these 3 and the checkpoint itself. As I said,

  • 01:22:51 if you install your Forge Web UI directly into the workspace, you will not have any of the issues that

  • 01:22:57 I had. However, I have shown them so you learn  more stuff. Okay, now it is going to generate an  

  • 01:23:04 image. First it is loading. We can see the VRAM usage somewhere around here. Where is it? Here. Okay,

  • 01:23:12 now it is loading the model. Okay, we got the first image generated. Then we are going to

  • 01:23:18 apply our LoRA. Let's go there, refresh, and the LoRAs should appear. Yes, for example let's use this one

  • 01:23:25 and generate. The first time you generate with a LoRA, Forge Web UI patches it, and for patching it

  • 01:23:32 uses a significant amount of VRAM. This doesn't happen in SwarmUI; I didn't see it in the logs.

  • 01:23:41 This is a disadvantage of Forge Web UI. I also like SwarmUI better for using FLUX,

  • 01:23:48 but if you want to use Forge Web UI, this is how you use it. So the patching has been done, and then we

  • 01:23:56 are going to see the generated image. Okay, we got  it. Currently we are not doing any face inpainting  

  • 01:24:03 so the face quality is not at an optimal level, but now you know how to use the Forge Web UI. It is

  • 01:24:09 like Automatic1111 Web UI. There are also other features, and I need to make a dedicated tutorial

  • 01:24:13 for Forge Web UI. So I will end the tutorial here. I hope you have enjoyed it. Please stay

  • 01:24:20 subscribed, because I am going to fully research fine-tuning of FLUX, and I bet it will be many

  • 01:24:27 times better than LoRA training on FLUX. We are going to see it, hopefully. Moreover, I am

  • 01:24:33 working on finding optimal training parameters for the CLIP large model when training FLUX LoRAs. So

  • 01:24:41 hopefully we will get better results compared to what we get now. See you later.
