
Qwen Image Models Training - 0 to Hero Level Tutorial - LoRA & Fine Tuning - Base & Edit Model


This is a comprehensive, step-by-step tutorial on how to train Qwen Image models. It covers both LoRA training and full Fine-Tuning / DreamBooth training, on both the Qwen Image base model and the Qwen Image Edit Plus 2509 model. This tutorial is the product of 21 days of full R&D, costing over $800 in cloud services to find the best configurations for training. Furthermore, we have developed an amazing, ultra-easy-to-use Gradio app for the legendary Kohya Musubi Tuner trainer. You will be able to train locally on your Windows computer on GPUs with as little as 6 GB of VRAM for both LoRA and Fine-Tuning.

The post used in the tutorial to download the zip file: https://www.patreon.com/posts/qwen-trainer-app-137551634

Requirements tutorial: https://youtu.be/DrhUHnYfwC0

SwarmUI tutorial: https://youtu.be/c3gEoAyL2IE

Video Chapters

00:00:00 Introduction & Tutorial Goals

00:00:59 Showcase: Realistic vs. Style Training (GTA 5 Example)

00:01:26 Showcase: High-Quality Product Training

00:01:40 Showcase: Qwen Image Edit Model Capabilities

00:01:57 Effort & Cost Behind The Tutorial

00:02:19 Introducing The Custom Training Application & Presets

00:03:09 Power of Qwen Models: High-Quality Results from a Small Dataset

00:03:58 Detailed Tutorial Outline & Chapter Flow

00:04:36 Part 4: Dataset Preparation (Critical Section)

00:05:05 Part 5: Monitoring Training & Performance

00:05:23 Part 6: Generating High-Quality Images with Presets

00:05:44 Part 7: Specialized Training Scenarios

00:06:07 Why You Should Watch The Entire Tutorial

00:07:15 Part 1 Begins: Finding Resources & Downloading The Zip File

00:07:50 Mandatory Prerequisites (Python, CUDA, FFmpeg)

00:08:30 Core Application Installation on Windows

00:09:47 Part 2: Downloading The Qwen Training Models

00:10:28 Features of The Custom Downloader (Fast & Resumable)

00:11:24 Verifying Model Downloads & Hash Check

00:12:41 Part 3 Begins: Starting The Application & UI Overview

00:13:16 Crucial First Step: Selecting & Loading a Training Preset

00:13:43 Understanding The Preset Structure (LoRA/Fine-Tune, Epochs, Tiers)

00:15:01 System & VRAM Preparation: Checking Your Free VRAM

00:16:07 How to Minimize VRAM Usage Before Training

00:17:06 Setting Checkpoint Save Path & Frequency

00:19:05 Saving Your Custom Configuration File

00:19:52 Part 4 Begins: Dataset Preparation Introduction

00:20:10 Using The Ultimate Batch Image Processing Tool

00:20:53 Stage 1: Auto-Cropping & Subject Focusing

00:23:37 Stage 2: Resizing Images to Final Training Resolution

00:25:49 Critical: Dataset Quality Guidelines & Best Practices

00:27:19 The Importance of Variety (Clothing, Backgrounds, Angles)

00:29:10 New Tool: Internal Image Pre-Processing Preview

00:31:21 Using The Debug Mode to See Each Processed Image

00:32:21 How to Structure The Dataset Folder For Training

00:34:31 Pointing The Trainer to Your Dataset Folder

00:35:19 Captioning Strategy: Why a Single Trigger Word is Best

00:36:30 Optional: Using The Built-in Detailed Image Captioner

00:39:56 Finalizing Model Paths & Settings

00:40:34 Setting The Base Model, VAE, and Text Encoder Paths

00:41:59 Training Settings: How Many Epochs Should You Use?

00:43:45 Part 5 Begins: Starting & Monitoring The Training

00:46:41 Performance Optimization: How to Improve Training Speed

00:48:35 Tip: Overclocking with MSI Afterburner

00:49:25 Part 6 Begins: Testing & Finding The Best Checkpoint

00:51:35 Using The Grid Generator to Compare Checkpoints

00:55:33 Analyzing The Comparison Grid to Find The Best Checkpoint

00:57:21 How to Resume an Incomplete LoRA Training

00:59:02 Generating Images with Your Best LoRA

01:00:21 Workflow: Generate Low-Res Previews First, Then Upscale

01:01:26 The Power of Upscaling: Before and After

01:02:08 Fixing Faces with Automatic Segmentation Inpainting

01:04:28 Manual Inpainting for Maximum Control

01:06:31 Batch Generating Images with Wildcards

01:08:49 How to Write Excellent Prompts with Google AI Studio (Gemini)

01:10:04 Quality Comparison: Tier 1 (BF16) vs Tier 2 (FP8 Scaled)

01:12:10 Part 7 Begins: Fine-Tuning (DreamBooth) Explained

01:13:36 Converting 40GB Fine-Tuned Models to FP8 Scaled

01:15:15 Testing Fine-Tuned Checkpoints

01:16:27 Training on The Qwen Image Edit Model

01:17:39 Using The Trained Edit Model for Prompt-Based Editing

01:24:22 Advanced: Teaching The Edit Model New Commands (Control Images)

01:27:01 Performance Impact of Training with Control Images

01:31:41 How to Resume an Incomplete Fine-Tuning Training

01:33:08 Recap: How to Use Your Trained Models

01:35:36 Using Fine-Tuned Models in SwarmUI

01:37:16 Specialized Scenario: Style Training

01:38:20 Style Dataset Guidelines: Consistency & No Repeating Elements

01:40:25 Generating Prompts for Your Trained Style with Gemini

01:44:45 Generating Images with Your Trained Style Model

01:46:41 Specialized Scenario: Product Training

01:47:34 Product Dataset Guidelines: Proportions & Detail Shots

01:48:56 Generating Prompts for Your Trained Product with Gemini

01:50:52 Conclusion & Community Links (Discord, GitHub, Reddit)

Video Transcription

  • 00:00:00 Greetings everyone, welcome to the most  comprehensive yet easy-to-follow Qwen  

  • 00:00:06 models training tutorial. In this tutorial, I am  going to show you from scratch to the grandmaster  

  • 00:00:14 level how to train Qwen models on your local  Windows computer. After watching this tutorial,  

  • 00:00:22 you will be able to train your models locally  on your Windows computer and generate amazing  

  • 00:00:30 images. I am going to show both LoRA training  and also fine-tuning training. Furthermore,  

  • 00:00:36 I will show Qwen base model training and Qwen  Edit Plus model training. This tutorial is  

  • 00:00:45 extremely comprehensive, so therefore, check out  the tutorial description to see the chapters.  

  • 00:00:52 Moreover, in a moment, I will show you the  layout of the tutorial, so keep watching.

  • 00:00:59 In this tutorial, I am not going  to show only realistic images,  

  • 00:01:03 but I am going to show you style training as  well. For example, I have trained GTA 5 style,  

  • 00:01:11 shared it on CivitAI and also the style dataset,  so I will explain how to train your style and  

  • 00:01:20 generate excellent images with your trained  style. Furthermore, I will show you how to  

  • 00:01:26 train a product like this one and generate  amazing product images with highest quality,  

  • 00:01:33 with small text or the logos, and keep  consistency and accuracy of the products.

  • 00:01:40 Moreover, after you trained the Qwen image  edit model, even without control images,  

  • 00:01:46 you will be able to make prompt-based  editing. For example, I say that replace  

  • 00:01:52 head of this man and it generates this  image. I will show all of that. You will  

  • 00:01:57 see it. For preparing this tutorial, I  have worked over 20 days and spent over  

  • 00:02:04 $600 for research and development. You see, on a single day I have spent $110

  • 00:02:13 on RunPod. When we also include MassedCompute,  I have spent over $700 or $800 for research.

  • 00:02:19 Moreover, I have prepared an application so easy  to use with pre-made configurations. LoRA training  

  • 00:02:27 configurations already, as you can see they  are all split into each GPU tier. Fine-tuning  

  • 00:02:35 configurations already, they are also split into GPU tiers. This application is fully developed by me;

  • 00:02:42 it uses the famous Kohya Musubi Tuner, so it is easy to use. You just load the configuration

  • 00:02:49 and set up a few things and you are ready to  go. I will explain everything. Furthermore, we  

  • 00:02:54 have one-click installers for this application for  Windows, RunPod, and MassedCompute, including the  

  • 00:03:02 base models download. This application supports Wan 2.1 and Wan 2.2 model training as well. Also,

  • 00:03:09 this model is extremely powerful. If you paid  attention to the images that I have shown,  

  • 00:03:14 you will see that it is able to do a lot of  emotions very accurately. It is able to do  

  • 00:03:20 very hard prompts, very hard complex prompts  very accurately. And I didn't even use a very  

  • 00:03:27 powerful training images dataset. I just used 28  medium quality images. However, with only small  

  • 00:03:36 and medium quality dataset, I am able to get  amazing, mind-blowing quality images like these  

  • 00:03:43 ones. You see all of them are highest quality,  really good, both realistic, and it can do style  

  • 00:03:50 images already very well. So this Qwen model is  extremely powerful and my new favorite model.

  • 00:03:58 So let me also show you the flow of the  tutorial as well. So the rest of the tutorial  

  • 00:04:04 flow will be like this. Part 1, initial setup and  installation, introduction and finding resources,  

  • 00:04:11 mandatory prerequisites, the requirement tutorial,  core application installation. Then Part 2 will  

  • 00:04:17 be the downloading training models. Part 3 will  be starting and navigating the user interface,  

  • 00:04:23 the Gradio application that I have developed,  loading and training configuration presets,  

  • 00:04:28 system and VRAM preparation, detailed training  parameters setup. Part 4 will be dataset  

  • 00:04:36 preparation. This is super critical if you are  first-time training, this part will be super  

  • 00:04:42 useful for you. Using the ultimate batch image  processing tool, this is another tool that I have  

  • 00:04:47 developed, dataset quality and guidelines, this  is super important. New tool that I have added  

  • 00:04:53 using the internal image pre-processing, dataset  structuring for the trainer, this is important,  

  • 00:04:59 captioning your dataset and the impact of  it, finalizing model paths and settings.

  • 00:05:05 In the Part 5, we are going to see monitoring  training and performance optimizations,  

  • 00:05:11 testing and finding best checkpoint, resuming  incomplete trainings, either it is a LoRA or  

  • 00:05:18 fine-tuning. Then in the Part 6, I will  show generating high-quality images. I  

  • 00:05:23 have prepared amazing presets so that with  one click you will be able to generate  

  • 00:05:28 highest quality images with your trained Qwen  models, but we are supporting so many models,  

  • 00:05:33 not just Qwen. Image generation workflow in  SwarmUI, fixing some of the images, inpainting,  

  • 00:05:39 this is also extremely useful, you will love  it. Part 7, specialized training scenarios,  

  • 00:05:44 fine-tuning difference versus LoRA, training on  Qwen image edit model. If you are interested in  

  • 00:05:50 Qwen image edit model training, by teaching  model new commands like replace clothing,  

  • 00:05:56 replace hair color, or colorize this  sketch or line art, style training,  

  • 00:06:02 what is the difference, product training, what is  the difference, and the Part 8 is the conclusion.

  • 00:06:07 So I really recommend you to watch this tutorial  from beginning to the end without skipping any  

  • 00:06:12 part. This tutorial will also help you  significantly in your future trainings,  

  • 00:06:19 whether it is Qwen or Wan 2.2. Hopefully  after this tutorial, I will work on Wan 2.2  

  • 00:06:24 training. Therefore, this tutorial will help  you significantly in the future as well. And  

  • 00:06:30 I am saying that this is a tutorial, however,  this is literally a full course. So therefore,  

  • 00:06:36 try to learn everything I have explained  in this tutorial and improve your skills,  

  • 00:06:41 your knowledge, and utilize this knowledge  in your professional life. This tutorial,  

  • 00:06:47 I can say that it is a really big deal, like a full course. I have spent a huge amount of time and you

  • 00:06:54 will love this tutorial, you will enjoy this tutorial, and you will learn so much

  • 00:06:59 information from this tutorial. This tutorial is  a product of experience of two years working on  

  • 00:07:07 these generative AI models, training them, doing  research, doing experimentation. So let's begin.

  • 00:07:15 So as usual, I have prepared an amazing post where  you will find all of the necessary information,  

  • 00:07:22 the zip file, instructions. Slowly scroll down.  Download the latest zip file from here. Also,  

  • 00:07:30 it is in the attachments section. Do not start  installation right away. Keep scrolling down.  

  • 00:07:36 I recommend you to read everything. Find the Qwen  image tutorial video instructions, this section,  

  • 00:07:43 and from here we will follow. The very first thing that you need to do is to follow the requirements

  • 00:07:50 tutorial. This is mandatory and super important.  When you open this tutorial, you will get to this  

  • 00:07:56 video. This video shows you everything about the  requirements. What are they? Python, CUDA, FFmpeg,  

  • 00:08:04 and other stuff. So please follow this tutorial  with its updated instructions. You see the link  

  • 00:08:10 is here. This is a fully public tutorial.  All of the links are updated. You see it is  

  • 00:08:17 fully updated 3 September 2025. After you watch  this tutorial, apply the steps here, you will  

  • 00:08:24 be ready to run all of the AI applications  that I develop or other developers develop.
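
For reference, here is a minimal sketch of how you could sanity-check these prerequisites from Python before installing. It only checks that the tools are reachable on PATH, not their versions; follow the requirements tutorial for the authoritative list and versions.

```python
import shutil

# Quick sanity check that the prerequisites covered by the requirements
# tutorial (Python, FFmpeg, the CUDA toolkit, etc.) are reachable on PATH.
for tool in ("python", "ffmpeg", "nvcc"):
    path = shutil.which(tool)
    print(f"{tool:6s} -> {path or 'NOT FOUND'}")

# If PyTorch happens to be installed already, confirm it can see the GPU.
try:
    import torch
    print("torch", torch.__version__, "CUDA available:", torch.cuda.is_available())
except ImportError:
    print("torch not installed globally (the installer sets up its own venv)")
```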

  • 00:08:30 After you have followed the requirements  tutorial, return back to our main post and  

  • 00:08:35 now we will start installation. So move the  downloaded zip file into the disk where you  

  • 00:08:41 want to install. I am going to install into my  Q drive. I will right-click and I will extract  

  • 00:08:47 here. You can use your Windows extractor. After  extraction, enter inside the extracted folder,  

  • 00:08:53 do not forget that. Then all you  need to do is just double-click  

  • 00:08:57 windows_install_and_update.bat file and  run. Do not run anything as administrator,  

  • 00:09:04 it will break it. So run everything with  double-click or select and hit enter. You  

  • 00:09:09 see that it will generate a virtual environment  and install all the libraries inside it. So this  

  • 00:09:15 will not affect anything else on your computer. All of my applications are installed into secure

  • 00:09:23 and isolated virtual environment folders. Just  wait for installation to be completed. Okay,  

  • 00:09:28 so the installation has been completed. You can  scroll up and see if there are any errors. If  

  • 00:09:33 there are any errors, select everything like this,  control C, save into a text file and message me  

  • 00:09:41 the text file. You can message me from email, from  Patreon, from Discord, anywhere. Then close this.

  • 00:09:47 Now we need to download Qwen training  models. To download the models, double-click  

  • 00:09:52 windows_download_training_models, run. It will  install necessary requirements, then it will ask  

  • 00:09:58 you which model you want to download. So you can  download Qwen base model or you can download Qwen  

  • 00:10:04 image edit plus model. I will download both  of them because I will show you both of them.  

  • 00:10:09 So let's download the option one. The option one  will download these following models. The option  

  • 00:10:14 two will download the newer model. It will not  download twice. They will be downloaded into  

  • 00:10:20 training/models/Qwen folder. You will see here.  As you have noticed, there are 16 parts because  

  • 00:10:28 this downloader is extremely robust. It downloads  with 16 different simultaneous connections, so it  

  • 00:10:37 utilizes your entire internet speed. Moreover,  it is fully resumable and it is fully robust.  

  • 00:10:44 For example, I can close this, run the downloader  again. Okay, let's run it. Then I will select the  

  • 00:10:49 option one again, and it will fully resume  wherever it is left. You see it is resuming  

  • 00:10:55 back from wherever it is left. As you can see,  it is downloading with 1 gigabit per second on my  

  • 00:11:01 personal computer. This is an amazing speed. This  is maximum speed that my internet connection has.

  • 00:11:08 Once the model is fully downloaded, it will merge the split parts into a single file, then it will

  • 00:11:16 verify its hash value to ensure that it has been  downloaded accurately. We will see in a moment.  

  • 00:11:24 Yes, it is merged, then it is verifying the hash  value so that your downloaded models will never  

  • 00:11:31 be corrupted or have any issues. Then it will move  to the next download like this. And it is moving  

  • 00:11:38 to the next download. When you next time resume  or start the downloader, it will just skip the  

  • 00:11:43 already downloaded files and start with the next  file. This is a downloader that I have developed,  

  • 00:11:49 and I am using this downloader in all my applications. So it is always very fast,

  • 00:11:55 robust, and accurate. This downloader works  with slow internet connections and also with  

  • 00:12:00 very high internet connections. This is  the best downloader you will ever find.
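
As an illustration of the merge-and-verify step, this is roughly how a file can be checked against a published SHA-256 hash. The file name and expected hash below are placeholders; the bundled downloader performs this verification for you automatically.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 16 * 1024 * 1024) -> str:
    """Stream the file in chunks so even very large checkpoints fit in little RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Placeholder path and hash: substitute the real model file and the hash
# published alongside it.
model_file = Path("training/models/Qwen/qwen_image_bf16.safetensors")
expected = "0000000000000000000000000000000000000000000000000000000000000000"

actual = sha256_of(model_file)
print("OK" if actual == expected else f"MISMATCH: {actual}")
```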

  • 00:12:05 Once the first downloads have been completed,  start the windows downloader again and download  

  • 00:12:10 the Qwen image edit model as well if you want  it. And at the end of the downloads, you will  

  • 00:12:16 see that all the files have been downloaded like  this. If you already have the files, you can also  

  • 00:12:22 move them or you can also use them. However, I  recommend you to use the windows downloader to  

  • 00:12:28 download the accurate versions of the models. You see these are the models that we are going to

  • 00:12:33 use. The BF16 versions of the models are mandatory. FP8 or GGUF versions will not work.

  • 00:12:41 Then we are ready to start the application.  Moreover, if you want to update the application  

  • 00:12:46 before starting, let's say you are going to use  it afterwards, just double-click and start the  

  • 00:12:52 windows_install_and_update file again and it will  update it to latest version. So let's start the  

  • 00:12:57 application, windows_start_app.bat file, run.  It will automatically open the interface like  

  • 00:13:02 this. Always follow CMD windows as well to see if  there are any errors or not or what is happening.  

  • 00:13:09 So this is our application interface. I will  explain everything, don't worry. First of all,  

  • 00:13:16 begin with selecting your preset. This is super  important. So make sure that you are in the Qwen  

  • 00:13:21 image training tab. We also support Wan model training, and hopefully it will be the next

  • 00:13:26 tutorial after this. I am going to work on that  as well. So make sure you are at this tab. Also,  

  • 00:13:32 whenever you are going to load a new config,  refresh page and then load. So for loading  

  • 00:13:37 the config, click this folder icon, go back to  your installation folder. This is where I have  

  • 00:13:43 installed. Enter inside Qwen-Training-Configs  and from here you are going to choose whatever  

  • 00:13:49 you want to train. I am going to show LoRA  training first, then I will show DreamBooth,  

  • 00:13:54 but both of them are exactly same. So let's  enter inside LoRA training, and based on your  

  • 00:14:00 GPU or how much you can wait, select the epochs.  So 200 epochs is the best quality, 100 epochs is  

  • 00:14:10 a little bit lower quality, and 50 epochs is lower quality. Why? Because with more epochs,

  • 00:14:16 we are actually using a lower learning rate and  we are doing more steps. Therefore, we are able  

  • 00:14:22 to train more details. So higher epochs, lower  learning rate is better. And now you will see  

  • 00:14:28 the tier 1 and tier 2 and tier 3, 4, 5, 6 configs.  You may be wondering what are the differences. To  

  • 00:14:37 learn the differences, enter inside the folder and  you will see LoRA_Configs_Explanation.jpg files,  

  • 00:14:43 and when you open it, it will tell you what each configuration is and what their

  • 00:14:48 differences are. So based on your GPU, you are going to select the configuration. Therefore,

  • 00:14:54 I am going to use 200 epoch, and I'm going to use  tier 2 30,000 megabyte toml file. Double-click on  

  • 00:15:01 the toml file, it will open the file from here,  then click this icon to load it, and you see it is  

  • 00:15:08 saying configuration loaded successfully. Why did  I pick this configuration file? Type CMD and open  

  • 00:15:16 a CMD window, then type nvidia-smi. This will show  you your GPU list like this. So I have RTX 5090,  

  • 00:15:25 it has 32 GB of VRAM, but how much free VRAM I  have matters. So to learn that, open a CMD window,  

  • 00:15:34 type pip install nvitop like this, it will install  the nvitop very quickly, then type nvitop. And it  

  • 00:15:43 will show your GPUs' VRAM usages. Currently, I  am using 3.5 GB of VRAM on my GPU. But I need  

  • 00:15:52 30 GB of free VRAM for this configuration. Don't  worry, I will show what you can do. Therefore,  

  • 00:15:59 I should restart my PC and minimize my VRAM  usage. Moreover, you can open Task Manager,  

  • 00:16:07 go to Startup apps, and in here you can disable  all the starting apps except the necessary ones,  

  • 00:16:14 and after that restart, it will minimize  your VRAM usage as well. So I should get  

  • 00:16:19 this VRAM usage under 2 GB before I  start training. Okay, let's continue.
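
If you prefer to check your free VRAM programmatically instead of watching nvitop, a small sketch using the NVML Python bindings (pip install nvidia-ml-py) could look like this; the 30 GB threshold matches the Tier 2 config chosen in the video.

```python
# pip install nvidia-ml-py
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(handle)
    if isinstance(name, bytes):  # older bindings return bytes
        name = name.decode()
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    free_gb = mem.free / (1024 ** 3)
    total_gb = mem.total / (1024 ** 3)
    # The Tier 2 LoRA config used in the video expects roughly 30 GB free.
    verdict = "enough for Tier 2" if free_gb >= 30 else "free more VRAM first"
    print(f"GPU {i} {name}: {free_gb:.1f} / {total_gb:.1f} GB free -> {verdict}")
pynvml.nvmlShutdown()
```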

  • 00:16:25 You can click this open all panels and it  will open all of the panels or you can just  

  • 00:16:30 hide all the panels. So let's begin with first  option, accelerate launch settings. This option  

  • 00:16:35 is extremely useful when you do multiple GPU  training, but if you don't have multiple GPUs,  

  • 00:16:41 you don't need to set anything here. Multiple GPU training on Windows does not work very well.

  • 00:16:47 Hopefully, I will show that on cloud tutorial  on MassedCompute and RunPod. But if you have  

  • 00:16:52 multiple GPUs like me, you see I have two GPUs, I  can set my GPU ID to 1 and the training will run  

  • 00:16:59 on my second GPU. However, I'm not going to use my  second GPU, I'm going to use my first GPU. Okay,  

  • 00:17:06 the second tab, click it. Now, this is super  important. Where you are going to save your  

  • 00:17:12 checkpoints. So click this folder icon or you  can directly copy paste the folder path here.  

  • 00:17:17 I will show directly copy paste. I am going to  save my models inside my SwarmUI installation,  

  • 00:17:24 inside models, inside diffusion_models, inside  lora, because this is going to be a LoRA  

  • 00:17:31 training. So copy this path and paste it. Or now  I will show with select folder. Click this icon,  

  • 00:17:38 find wherever you want to save. Okay, let's  go to SwarmUI installation, inside models,  

  • 00:17:44 inside lora folder, then click select folder. And  it will select the folder. Both works. Then how  

  • 00:17:50 frequently you want to save? Each saved checkpoint  of LoRA will be 2.3 GB. Currently, this setup is  

  • 00:17:58 saving eight different checkpoints. How? Because  you see it is going to save every N epochs. So  

  • 00:18:05 after every 25 epochs, it will save a checkpoint.  And you may be wondering what is epoch? One epoch  

  • 00:18:12 means that all of your images are trained one  time. I will explain that as we progress. So you  

  • 00:18:19 can keep this as a 25 epoch or you can reduce  this number to get more frequent checkpoints,  

  • 00:18:25 or you can make it higher to get lesser frequent  checkpoints. 25 is decent because after the  

  • 00:18:31 training, we will compare checkpoints and see  which one of the checkpoint is the best one.  

  • 00:18:37 Checkpoint means that the snapshot of the model  during that moment. The output name. Output name  

  • 00:18:45 means the name with which you are going to save your LoRA files. So I am going to name my LoRAs

  • 00:18:53 like this: Qwen-Image-Lora-Tutorial. Okay, you  don't need to change anything else in here. These  

  • 00:18:59 are all set. Then you can move to the next part,  but before moving that, I recommend you to save  

  • 00:19:05 your configuration to be able to load it later.  Where should we save? You can save this right  

  • 00:19:12 away from here. It will overwrite the base config,  or the better way is, for example, let's save it  

  • 00:19:18 inside here, like this. So I am going to save  it into this folder, and click save. Actually,  

  • 00:19:25 let's save it into our new folder to not have  any issues. Inside here, and save. Okay, then  

  • 00:19:33 click save. Yes, it's saved. Don't forget to click  save to save. You see it shows that configuration  

  • 00:19:39 saved. It is inside my new installation folder,  and I can see that tier 2 30,000 megabyte toml.
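
To put the save-frequency settings above into numbers (200 epochs, save every 25 epochs, roughly 2.3 GB per LoRA checkpoint), a quick back-of-the-envelope calculation:

```python
max_epochs = 200          # total training epochs in the chosen preset
save_every_n_epochs = 25  # "save every N epochs" setting
lora_size_gb = 2.3        # approximate size of one saved LoRA checkpoint

num_checkpoints = max_epochs // save_every_n_epochs   # -> 8 checkpoints
disk_needed_gb = num_checkpoints * lora_size_gb       # -> ~18.4 GB

print(f"{num_checkpoints} checkpoints, ~{disk_needed_gb:.1f} GB of disk space")
```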

  • 00:19:45 Let's move with Qwen image training data  set. Now, the dataset part is extremely  

  • 00:19:52 important. Pay attention to this part. If  you are first time going to make a training,  

  • 00:19:57 preparation of the dataset matters hugely. You  need to have your images accurately prepared. To  

  • 00:20:04 automatically prepare your images, I recommend to  use Ultimate Batch Image Processing app. You see  

  • 00:20:10 it is under accelerate tools section. So let's go  to this link. I recommend you to check out these  

  • 00:20:17 screenshots, read this post. Let's scroll down  and let's download the latest version. Then let's  

  • 00:20:23 move it into our Q drive, right-click, extract  here, enter inside it. First of all, we need to  

  • 00:20:29 install. This is a pretty fast installation.  This application is very lightweight, but it  

  • 00:20:35 has so many features. Okay, the installation has  been completed. Scroll up to see if there are any  

  • 00:20:41 errors or not, then close this. Then let's start  the application. windows_start_application,  

  • 00:20:46 run. Why is this application important? Because it will allow you to batch preprocess your training

  • 00:20:53 images. You can of course manually preprocess your images, but this makes it much easier and more

  • 00:21:00 accurate. So I have some sample images to demonstrate the power of this tool. I

  • 00:21:06 will copy this path and enter as an input folder.  Then as an output folder, let's output them into  

  • 00:21:14 my other folder as Pre-process Stage 1. Then the  aspect ratio. If you are going to generate images  

  • 00:21:23 with 16x9 always, you can make your aspect ratio  accordingly. However, if you are not sure which  

  • 00:21:31 aspect ratio you are going to use, I recommend you to use a square aspect ratio of 1328×1328

  • 00:21:39 pixels. This is the base resolution of the Qwen  image model or Qwen image edit model. This works  

  • 00:21:45 best and with this aspect ratio and resolution,  you can still generate any aspect ratio. All the  

  • 00:21:51 images I have shown you in the beginning of the tutorial were trained with 1328×1328.

  • 00:21:58 Then there are several options. You can select  the classes from here to zoom them in. This is  

  • 00:22:04 extremely useful when you are training a person  because you want to zoom in the person. What  

  • 00:22:10 I mean by that? You see in these images, there  are a lot of extra spaces that can be zoomed in.  

  • 00:22:18 For example, in this image, I can zoom in myself  a lot. So you can choose this or there is a better  

  • 00:22:25 one which is based on SAM2. This takes anything  as a prompt. Let's say person. You can set your  

  • 00:22:32 batch size, GPU IDs, these are all advanced stuff  if you are going to process a lot of images. So  

  • 00:22:39 default is good. Let's start processing. What this  is going to do is it is going to zoom in the class  

  • 00:22:46 I have given without cropping any part of the  class. So this will not make these images exactly  

  • 00:22:52 as this resolution or this aspect ratio. It will  try to match this aspect ratio without cropping  

  • 00:22:59 any part of the subject. So let's see what kind of  images we are getting. We are saving them inside  

  • 00:23:04 here. You see it has generated this subfolder.  This is important because in the second stage,  

  • 00:23:11 we are going to use this to make them exactly  same resolution. When I enter inside this folder,  

  • 00:23:19 you can see that it has zoomed in the person. So  this is how it works. And when it is zooming in,  

  • 00:23:25 it will not crop any parts of the image. And also  when zooming in, it will try to match the aspect  

  • 00:23:32 ratio that you have given like this. Okay, the  first stage has been completed. Now the second  

  • 00:23:37 stage is resizing them into the exact resolution.  This will crop the subject if it is necessary,  

  • 00:23:44 like cropping the body parts to match the exact  resolution. So this takes the parent folder,  

  • 00:23:50 not this folder. This is not the folder, but this  is the folder that I need to give. And I need to  

  • 00:23:56 change the resolution that I want. So this will  look a subfolder named it as exactly like this.  

  • 00:24:02 You can have multiple resolutions actually. For  example, in the image cropper, I can add here  

  • 00:24:07 another resolution. Let's say 16:9. So this is the  resolution of 16:9 for Qwen image model. Let's add  

  • 00:24:14 it like 1744×992. Let's start processing. It will process this new resolution as well.

  • 00:24:23 And I am going to see a folder generated here in  a minute when it is processed. Okay, it is started  

  • 00:24:30 processing. Now it will try to match this aspect  ratio. It may not match it exactly. Why? Because  

  • 00:24:36 it is not going to crop any body parts. So you see  this image cannot match that aspect ratio. This is  

  • 00:24:43 not a suitable image for that. This is almost  still square. However, in the second tab, when  

  • 00:24:48 I go to image resizer, when I type it, you see I  have given the parent folder. Let's wait for this  

  • 00:24:55 one to finish. Okay, it is almost finished. By the  way, if you use this YOLO, it is faster than SAM2.  

  • 00:25:02 So just delete this and select your class from  here. It supports so many classes to focus on  

  • 00:25:08 them. Okay, it is done. Now, I am going to make  the output folder as final images, like this,  

  • 00:25:15 and I will click resize images. You can also make  resize without cropping, so it will make padding  

  • 00:25:21 expansion. So let's resize images. I recommend  cropping, it is better. Then let's go back to  

  • 00:25:28 our folder, final images. Okay. In here, you will  see that it has cropped the body parts, resized  

  • 00:25:35 it into the exact resolution like this. And these  are the square images. They are much more accurate  

  • 00:25:42 than the other ones. Now I have my images ready.  However, this is not a very good collection of  

  • 00:25:49 images. It is another thing that you need to be  careful of. I have used these images to train  

  • 00:25:55 the models that I have shown you in the beginning  of the tutorial. So when we analyze these images,  

  • 00:26:01 what do you see? I have full body pose like this.  I have half body pose. I have very close shot.  

  • 00:26:08 And when you have images, what matters is that they should have good lighting, good focus. These two

  • 00:26:16 are extremely important. It should be very clear.  All of these images are captured with my cheap  

  • 00:26:22 phone, so they are not taken with a professional  camera. For example, when we look at this image,  

  • 00:26:28 you see it is not even a very good quality. Also,  these are some old images. I didn't update my  

  • 00:26:33 dataset yet, but using medium quality images, and  I am showing you how much you can obtain with a  

  • 00:26:40 medium quality. If you use a higher quality, then  you will get even better results than I did get.  

  • 00:26:46 Why are these images medium quality? I mean, let me show you this image. You see this image is not

  • 00:26:52 even a very high quality. This is how it looks.  And this is a real image. This is a raw image.  

  • 00:26:58 And when we look at the AI generated image, as  you can see, it is even higher quality than my  

  • 00:27:04 raw image. And therefore, you should add highest  possible quality images into your training dataset  

  • 00:27:12 to get the maximum quality images. What else  is important? You should try to have different  

  • 00:27:19 clothings, so it will not memorize your clothing.  This is super important. Try to have different  

  • 00:27:24 clothings, different times, different backgrounds,  all of these will help. Whatever you repeat in  

  • 00:27:30 your training dataset, the model will memorize  them. You don't want that. You want only yourself  

  • 00:27:37 or the subject if you are training a style, the  style or an object, the object to be repeated,  

  • 00:27:43 nothing else. I will explain them in the style  and the item training, the product training part.  

  • 00:27:49 And one another thing is that you should add the  emotions that you want. If you want smiling, you  

  • 00:27:55 should add it. If you want laughing, you should  add it. So whatever the emotion you have will make  

  • 00:28:02 100% quality difference in your outputs. Try to  have all the emotions you want. But this is not  

  • 00:28:10 all. Also, try to have all the angles you want.  If you want to generate images that looks down,  

  • 00:28:17 you should have an image that has a look down  like this, or from this angle, this angle,  

  • 00:28:23 whatever angle. So do not add the angles and poses  that you don't want to see after training, and  

  • 00:28:30 add the poses and the angles you want to generate  after training. So if we summarize again, have the  

  • 00:28:38 emotions, have the poses, have the angles, have  different backgrounds, have different clothings,  

  • 00:28:45 have the highest possible quality, lighting, and focus. Do not have blurry backgrounds,

  • 00:28:52 do not have fuzzy backgrounds, they will impact  your output quality. So in the AI world, whatever  

  • 00:28:58 you give, you get it. And with this medium quality  dataset, I am able to generate amazing images.  

  • 00:29:04 If I increase the number of images, the variety  in these images, I can get even better quality.
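
For reference, here is a minimal Pillow sketch of roughly what the second pre-processing stage does: resize and crop each image to the exact 1328×1328 training resolution. The real batch tool is smarter, since it keeps the detected subject in frame instead of blindly center-cropping, and the folder names below are placeholders.

```python
from pathlib import Path
from PIL import Image, ImageOps

TARGET = (1328, 1328)  # base training resolution for Qwen Image

def resize_and_crop(src: Path, dst: Path, size=TARGET) -> None:
    img = Image.open(src)
    img = ImageOps.exif_transpose(img)            # honour EXIF orientation
    img = ImageOps.fit(img, size, Image.LANCZOS)  # resize, then center-crop to exact size
    img.convert("RGB").save(dst, quality=95)

src_dir = Path("Pre-process Stage 1")             # placeholder folder names
out_dir = Path("final images")
out_dir.mkdir(exist_ok=True)
for f in src_dir.glob("*.*"):
    resize_and_crop(f, out_dir / f"{f.stem}.jpg")
```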

  • 00:29:10 Another extremely useful tab we have is Image Pre-processing. The aim of this tab is to let

  • 00:29:17 you see the exact version of your training images dataset during the training. This tab is extremely

  • 00:29:25 useful especially if you want to do training with  bucketing, with multiple aspect ratio resolutions.  

  • 00:29:31 So let's say I have a dataset like this and  I want to do training with multiple aspect  

  • 00:29:36 ratios. Remember, for multiple aspect ratios  in the Qwen image training dataset, you have  

  • 00:29:41 to enable bucketing. If you want to find the  parameter fast, open all panels, control F, type  

  • 00:29:48 the name like bucket, and you can find it very  easily. So let's say you have enabled bucketing,  

  • 00:29:54 and you are going to process your images to  see their final version which the Kohya SS  

  • 00:30:00 GUI tuner processes them. So put your input images  folder here, define an output like this one, sub,  

  • 00:30:07 and enable bucketing, then from the architecture,  select the architecture. This matters because  

  • 00:30:13 based on this, the Kohya does bucketing.  So I'm going to select Qwen image. You can  

  • 00:30:19 also enable fix EXIF orientation. Currently, this is broken in Kohya: if your image has an orientation problem,

  • 00:30:24 Kohya won't fix it. So let's process images, and it is processed, it shows how many were processed,

  • 00:30:30 the resolutions, the buckets. Now when I open  this subfolder where I have processed them,  

  • 00:30:36 this is how Kohya is going to use my images. You  see these images have inaccurate orientation. So  

  • 00:30:43 it won't be proper training. And furthermore,  some of the images have padding. Let me show  

  • 00:30:49 you one of them. Okay, I couldn't find any  example, but in some images, you may see them,  

  • 00:30:55 they have padding like this to fit into the correct bucket. This is how you can preprocess your images

  • 00:31:03 and see the bucket distribution. This is using  the Kohya implementation itself, so this is 100%  

  • 00:31:10 accurate. This is extremely useful. You can also  change your target resolution to see how they are  

  • 00:31:15 processed actually during the training and you can  see the actual images. One another feature we have  

  • 00:31:21 is in the caching. In the caching section, you  can enable debug mode. If you enable debug mode,  

  • 00:31:28 it will show you each image. However, it won't actually train; this is just for debugging. So

  • 00:31:34 you can also enable debug mode image, and when  you run the training this way, it will show you  

  • 00:31:40 every image one by one. Let me demonstrate you  like this one. So it will pop up the image and  

  • 00:31:45 you will see each processed image in your training  dataset. We had only one, so we have seen only one  

  • 00:31:52 image from here. So you can also use this debug  mode. It has console, video, image to see how  

  • 00:31:58 they are actually used during the training. This  can be extremely useful to understand how they  

  • 00:32:03 were actually trained. I really recommend you  to use this image pre-processing. You can also  

  • 00:32:09 fix exif orientation and use the pre-processed  dataset as your final dataset. So this screen  

  • 00:32:16 is extremely important to understand  your images dataset, how it is composed.
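
Here is a rough sketch of the bucketing idea this preview exposes: each image is assigned to the bucket resolution whose aspect ratio is closest, then resized to fit it. The two bucket resolutions listed are just the ones mentioned in the video; the trainer computes the full bucket list itself, and the file name is a placeholder.

```python
from PIL import Image, ImageOps

# Illustrative bucket resolutions around the 1328x1328 base: the square and
# the 16:9 resolution mentioned in the video. The trainer builds the real list.
BUCKETS = [(1328, 1328), (1744, 992)]

def pick_bucket(width: int, height: int) -> tuple[int, int]:
    ratio = width / height
    # Choose the bucket whose aspect ratio is closest to the image's.
    return min(BUCKETS, key=lambda b: abs(b[0] / b[1] - ratio))

img = ImageOps.exif_transpose(Image.open("example.jpg"))  # placeholder file
bucket = pick_bucket(*img.size)
print(f"{img.size} -> bucket {bucket}")
```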

  • 00:32:21 Okay, now we have our images ready. How we  are going to structure them? I am going to  

  • 00:32:26 generate a folder here and I will call it as  training_images_dataset. And I am not going to  

  • 00:32:33 put all the images inside here. I am going to make a subfolder, this is mandatory, starting with 1,

  • 00:32:39 and I am going to use ohwx as the name. Then I will paste all the images inside it. This 1 means that it

  • 00:32:47 is repeating. Repeating means that how many  times these images will be repeated in every  

  • 00:32:52 epoch. You don't need to try to understand  this. The repeating is important when you  

  • 00:32:58 have different subsets of images, and when you  are training a single concept, single subject,  

  • 00:33:04 you don't need different subsets of images. It  is used to balance unbalanced datasets. And with  

  • 00:33:12 Qwen or with Flux or Wan, we are only able to  train a single subject at a time at the moment.  

  • 00:33:19 So currently, we make all repeating 1. However, in the future, if we become able to train multiple concepts,

  • 00:33:26 multiple persons, subjects, styles at the same  time, to balance between different datasets,  

  • 00:33:33 we can have different repeating. What I  mean by that, let me show you. For example,  

  • 00:33:38 the other folder is BBK. And this folder has  only half amount of images. So let's delete this,  

  • 00:33:46 delete this. Yes. So you see this folder has 14  images, the other folder has 28 images. So in  

  • 00:33:55 every epoch, this folder's images will be repeated two times. So each image will be trained twice,

  • 00:34:02 and each image in this folder will be trained  once. This is the logic of training to balance  

  • 00:34:08 unbalanced datasets during training, but we don't  need it right now. Just make it as 1. And you see  

  • 00:34:16 this is ohwx. Why? Because I am going to generate  captions with just ohwx. I'm not going to write  

  • 00:34:24 detailed captions, and I will explain why. So copy  this path or from here, click this icon and select  

  • 00:34:31 the training_images_dataset folder and select  folder. So make sure to select the parent folder,  

  • 00:34:39 not the subfolder, because it will look for the  subfolder like this. Then set your resolution  

  • 00:34:45 and height. It is trained with best this one, but  if you want to train with a different resolution,  

  • 00:34:51 with a different aspect ratio, you can set it. The  batch size is 1, this is the best quality. I don't  

  • 00:34:57 recommend higher batch sizes. It is only necessary  when you need speed or when you are going to do a  

  • 00:35:03 massive training, but when you are training  a person or a subject, go with batch size 1,  

  • 00:35:09 it is the best quality. Also, learning rates  are set for batch size 1. When you increase the  

  • 00:35:14 batch size, you need to set a new learning  rate. Create missing captions. Currently,  

  • 00:35:19 I don't have any captions in my folder, so they  will be created. It is going to use the folder  

  • 00:35:24 name as a captioning strategy. Then there is  control directory, I will explain that in the  

  • 00:35:30 Qwen image edit model training part. You don't  need to set anything else in here. All you need to  

  • 00:35:36 do is generate dataset configuration, and it will  generate the dataset configuration automatically.  

  • 00:35:43 This is formatted for the Kohya. You can open  this file and see what kind of dataset it has  

  • 00:35:51 generated. This is the config we are going to  give to Kohya automatically. And when I return  

  • 00:35:56 back inside my folder, you will see that it has  generated caption files with the same name as my  

  • 00:36:04 images. I recommend to train with only ohwx as  a trigger word and do not have detailed captions  

  • 00:36:11 because it reduces the accuracy of the training.  You need detailed captions when you are doing a  

  • 00:36:16 very big training like thousands of images or when  you are training multiple concepts which doesn't  

  • 00:36:23 work right now. They bleed each other. But if you  insist on using captions, we have image captioning  

  • 00:36:30 here. This is using the Qwen 2.5 VL, which is the  text encoder used by the model itself. So how does  

  • 00:36:39 it work? First, you need to select the model path.  Click this icon, go back to downloaded models,  

  • 00:36:45 which is here, select this one, okay. You can use  FP8 precision if you have a GPU lower than 24 GB,  

  • 00:36:55 but I have it. Then you can drag and drop any  image file to here. For example, let's see what  

  • 00:37:02 kind of captions it generates for this. By the  way, don't forget to close your Ultimate Image  

  • 00:37:07 Processing CMD window after it is done. Okay, you  see it has generated this caption. So I can use  

  • 00:37:13 this, I can modify this. Let's try another one  with our training images. For example, let's use  

  • 00:37:19 this image and generate caption. Okay, so this is  another caption. You can give custom command to  

  • 00:37:26 it. For example, this is a default prompt it  takes, you can modify this. Or you can batch  

  • 00:37:31 process with caption prefix or caption suffix. It  supports everything. You can also replace words  

  • 00:37:38 like it generates with a individual. You can make  this as a cheerful ohwx, or it may generate with  

  • 00:37:45 a man word. So you can replace man with ohwx man,  person with ohwx person. This supports everything  

  • 00:37:53 as a captioning. This is a really powerful  captioner. Alternatively, you can use Joy Caption  

  • 00:37:59 application we have as well. It is here, you see  this link. So you can install Joy Caption and use  

  • 00:38:04 it to generate captions as well. This is also  one of the most famous captioning model, image  

  • 00:38:09 captioning model. It is also amazing. So this  is captioning. Let me also demonstrate you batch  

  • 00:38:15 captioning. So let's delete the existing captions,  like this. Select this folder. I'm not going to  

  • 00:38:22 give output folder so they will be automatically  saved there. We can also replace words like man,  

  • 00:38:27 ohwx man, it will replace the man word with  it. You can also add caption prefix like ohwx,  

  • 00:38:34 it supports everything. You can also auto-unload,  this is important, so it won't take your VRAM  

  • 00:38:40 space. And then we just need to click start batch  captioning. It supports copy images, scan folders,  

  • 00:38:47 overwrite existing captions, or output format as  a JSON. Also, there are some other parameters you  

  • 00:38:53 can play here to see which one is working best  for your captioning. It supports everything.  

  • 00:38:58 You can follow the progress from the CMD window. So it is currently generating captions, 10 of 28. It

  • 00:39:06 is pretty fast. And we can see the captions are  getting generated here. When we open the caption,  

  • 00:39:11 you see it added this, also replaced man with  ohwx man. So it supports everything. However,  

  • 00:39:17 I recommend to have only ohwx as a caption. I  compared it with different captioning strategies,  

  • 00:39:26 detailed caption or ultra detailed caption, and  just the trigger word, ohwx is working best. You  

  • 00:39:33 can use any trigger word. And the logic of the  trigger word is a very random keyword. So it  

  • 00:39:39 should be random. It should be a rare word, and  it should be a single word. Use something like  

  • 00:39:44 that as a trigger word, and that's it. Okay,  so the captioning has been completed, but I  

  • 00:39:49 will return back to my dataset preparation and I  will delete all these generated captions, and I  

  • 00:39:56 will click the generate dataset configuration  and I will save my config and I will proceed.
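
Here is a small sketch of the resulting dataset layout and the single-trigger-word caption files: one .txt file per image containing only ohwx. The 1_ohwx folder name (repeat count plus trigger word), the .txt extension, and the source folder are assumptions based on what is shown on screen; the app generates all of this for you.

```python
from pathlib import Path
import shutil

dataset_root = Path("training_images_dataset")
subset = dataset_root / "1_ohwx"   # "1" = repeats per epoch, "ohwx" = trigger word (assumed naming)
subset.mkdir(parents=True, exist_ok=True)

source_images = Path("final images")   # placeholder source folder
trigger_word = "ohwx"

for img in sorted(source_images.glob("*.jpg")):
    shutil.copy2(img, subset / img.name)
    # One caption file per image, containing only the trigger word.
    (subset / img.name).with_suffix(".txt").write_text(trigger_word, encoding="utf-8")

print(f"Prepared {len(list(subset.glob('*.jpg')))} images in {subset}")
```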

  • 00:40:03 And the next section is Qwen Image Model Settings.  Do not change LoRA to DreamBooth or DreamBooth to  

  • 00:40:09 LoRA because the configurations are automatically  set properly. Always use the base configuration  

  • 00:40:15 from the configs folder. So here, I'm not going to  make any changes. However, if you want to use Qwen  

  • 00:40:23 image edit model, which I will show after training  started as a next step, you can enable this, but  

  • 00:40:28 currently we don't need it. You can train on Qwen  image base model. Okay, the next thing that you  

  • 00:40:34 need to set is the base model checkpoint. So click  this, go back to your training models downloaded  

  • 00:40:40 folder, select the model. So this is the base  model, you see. Then you need to set the VAE.  

  • 00:40:46 Click this, select the VAE, this one. Then select  the text encoder, and it is this one. So we did  

  • 00:40:54 set the folders accurately. Don't change anything  else. Don't change any of these unless you get out  

  • 00:41:03 of VRAM, which can happen if you are using too  much VRAM. So since I am already using like 6 GB  

  • 00:41:10 of VRAM, I can make this like 25. I recommend you  to try to reduce this maybe like 1 or maybe like  

  • 00:41:18 2 and see your speed. If you are getting very slow  speeds, try to increase it slowly. So this depends  

  • 00:41:26 on your computer. I am trying to set them as much  as accurately. Probably you shouldn't change this  

  • 00:41:32 at all, but if you get extremely slow speeds, that  means that it is using shared VRAM. Therefore,  

  • 00:41:39 increase the block swap. Block swap means that it  is going to use your RAM memory for swapping and  

  • 00:41:46 try to fit the trained part of the model into your  GPU. Since I'm using more VRAM than recommended,  

  • 00:41:54 let's make this like 30. My training  speed will get slower, or maybe like 25,  

  • 00:41:59 we can see. Don't change any other settings. And  the next thing that you need to change is inside  

  • 00:42:06 training settings. What you can change here? You  can change the maximum number of epochs. People  

  • 00:42:12 are asking me how many epochs they should do. If  you have below 50 training images or even 100,  

  • 00:42:19 but it depends how much you can wait for training  to be finished, use 200 epochs. Then compare each  

  • 00:42:27 checkpoint and see which one is generating  the best. But let's say you have 100 images,  

  • 00:42:32 then you can reduce this to like 150. Let's  say you have 200 images, then you can reduce  

  • 00:42:38 this to like 100. However, 200 epochs is really  good below 50 training images. And as you have  

  • 00:42:46 more training images with highest quality, with  variety like different backgrounds, clothings,  

  • 00:42:52 angles, poses, it is better quality. So try to  increase the number of images that you have,  

  • 00:42:58 the training images with keeping the quality,  then you can reduce these training epochs.  

  • 00:43:04 As I said, it depends on your GPU, how much you  can wait, what is your computer, your GPU speed,  

  • 00:43:10 but 200 epoch is recommended if you have below 50  images. So I will leave it as a 200 epoch. Don't  

  • 00:43:18 change anything else in here. You can generate  samples during the training, but I don't recommend  

  • 00:43:24 it. It will slow down your training significantly.  Generate samples, the comparison after training,  

  • 00:43:30 which I will show. And in the advanced settings,  you can provide the extra parameters that you  

  • 00:43:37 might have. Currently, we don't need any extra  parameters, and we are all set. Now I will save my  

  • 00:43:45 configuration and I will click start training.  First, it will generate cache files for my  

  • 00:43:53 training images, so it will first load the Qwen VL  model, the text encoder, it will generate encoded  

  • 00:44:00 caches, you can see the progress here, then it  will deload model and start the training. Okay,  

  • 00:44:05 it is going to load the model. I'm using a lot  of VRAM right now. You should restart your PC,  

  • 00:44:12 minimize your VRAM usage. And this loading speed  totally depends on your hard drive speed and also  

  • 00:44:19 your CPU speed because currently, when loading the model, we are converting the model into FP8

  • 00:44:26 scaled on the fly. Why are we doing that? Because currently on Windows, as you use more block swapping,

  • 00:44:33 it is way slower compared to Linux. Kohya is aware of this and is working on it. Let me

  • 00:44:41 show you. So you see he's trying to eliminate the  speed difference between Linux and the Windows  

  • 00:44:49 based on this issue. Let me also show you the  issue that I have generated after doing a lot  

  • 00:44:54 of test and experimentation. Currently, because  of the Windows system, it takes three times more  

  • 00:45:03 duration to swap between RAM and GPU. And as  we use more block swapping, it becomes slower  

  • 00:45:11 than Linux. And if we don't use FP8 scaled,  it becomes even slower because it takes twice  

  • 00:45:18 the amount of RAM or VRAM. So the model takes twice the space on our system. And you will see

  • 00:45:26 that the training has started. You should try to get the maximum amount of wattage usage on your GPU. Currently,

  • 00:45:32 it is lower than what I expect, so I might be  using some shared VRAM. So I may reduce block  

  • 00:45:40 swap and compare again. Furthermore, you should  wait more steps because as you do more steps,  

  • 00:45:45 it will get faster. So wait until like 100 steps  to see the duration that is going to take. If you  

  • 00:45:53 say that it is too long for you, what you need  to do is selecting faster configuration from  

  • 00:46:00 the configs. What do I mean by that? Select the 100 epoch or 50 epoch config. These use higher learning

  • 00:46:08 rates and do fewer steps. Therefore, for example, if I use 50 epochs, it will take one

  • 00:46:16 quarter of the time. So it will be four times faster, and the quality is very similar to the 200 epoch,

  • 00:46:22 but 200 epoch is the best quality. But it is up  to you whether you want faster training or not,  

  • 00:46:28 choose your configuration accordingly. Make sure that you are using a minimal amount of VRAM, do

  • 00:46:35 not do other things while training, and wait for training to be finished.
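
To put the epoch discussion into numbers, the usual step arithmetic for this kind of trainer, using the dataset from the video (28 images, 1 repeat, batch size 1), is shown below; treat it as an estimate, since other settings such as gradient accumulation would change it.

```python
num_images = 28        # images in the single training subfolder
repeats = 1            # folder repeat count
batch_size = 1         # recommended for best quality
epochs_options = [50, 100, 200]

for epochs in epochs_options:
    steps_per_epoch = (num_images * repeats) // batch_size
    total_steps = steps_per_epoch * epochs
    print(f"{epochs:3d} epochs -> {total_steps} training steps")

# 200 epochs -> 5600 steps; 50 epochs -> 1400 steps, i.e. 4x fewer, hence roughly 4x faster.
```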

  • 00:46:41 So can we improve the speed? Yes, as you can see,  I am able to push speed further. How? First of  

  • 00:46:50 all, if you have dual GPUs, connect your monitors  to your weaker GPU. This will make a huge impact  

  • 00:46:57 of the idle GPU usage and with that way you can  push your block swapping lower. For example,  

  • 00:47:04 currently I am just doing seven block swaps on  RTX 5090 and I am training highest quality FP8  

  • 00:47:12 scaled LoRA model. Furthermore, there is a newer  feature we have added. This has been added while  

  • 00:47:19 I was editing the tutorial. You will find it  as use pinned memory for block swapping. This  

  • 00:47:25 is a new feature. It is not merged into the main  repository yet. However, when you are watching,  

  • 00:47:31 hopefully it will be already merged. You can  see the pull request here. I am back and forth  

  • 00:47:37 communicating with Kohya to improve the speed on  Windows devices. We are figuring out new stuff,  

  • 00:47:44 we are trying to make it perfect. Hopefully  when you are watching this tutorial,  

  • 00:47:48 when you are following this tutorial, it will  be implemented and it will be working better  

  • 00:47:52 than right now. You should enable this. This will increase the RAM usage, so if you get out-of-RAM or

  • 00:47:58 out-of-VRAM errors, then you can disable it. This is using more system RAM, not the GPU RAM,

  • 00:48:06 not the GPU memory. So when I say RAM,  it is the system RAM. When I say VRAM,  

  • 00:48:11 it is the GPU memory. For this feature to fully  work, open graphics, you see graphics settings,  

  • 00:48:17 then in here, go to advanced graphics settings,  and in here, uncheck this hardware-accelerated  

  • 00:48:24 GPU scheduling and restart your PC. This should  help you to improve your training speed even  

  • 00:48:29 further. And there is one another thing that  you can even push your speed further. You can  

  • 00:48:35 use MSI Afterburner to increase your GPU clock  speed. This should work fairly well because we  

  • 00:48:42 are still not using the GPU fully because we are  spending a lot of time with the block swapping. So  

  • 00:48:47 how can I make the increase? It depends on your GPU, but on RTX 5090, I can increase

  • 00:48:53 the core speed by 320 and I can increase the memory speed by like 1000 and it should work

  • 00:49:00 fairly well. I can just apply. You can see the  actual speeds of the core and the memory here  

  • 00:49:07 and this should increase your training speed even  further. So these are the tricks that we have  

  • 00:49:12 right now to improve. And hopefully when this new  feature becomes more mature and fully implemented,  

  • 00:49:19 it will work way faster on Windows and  it will get close speed to the Linux.

  • 00:49:25 So I have trained previously exactly with these  settings. So let's see them how to test them and  

  • 00:49:32 then we will proceed. So once the training has  been finished, you will get exactly like this if  

  • 00:49:38 you did setup like me, the checkpoints, the LoRA  checkpoints. Now we are ready to use them. So I  

  • 00:49:44 am going to use SwarmUI with the ComfyUI backend.  If you don't know how to install and use SwarmUI  

  • 00:49:51 with the ComfyUI backend, we have an excellent  tutorial. You see it is right under the Qwen image  

  • 00:49:56 tutorial video instructions. The link is here.  You need to watch this to learn how to use it.  

  • 00:50:02 Let's open it. So this is a very recent tutorial  that I have made like a few days ago. It is like  

  • 00:50:08 26 minutes, not very long. Watch it to learn how to install ComfyUI and SwarmUI. You need to

  • 00:50:17 set it up to be able to use it like I do. So this is a fresh SwarmUI install. First of all, I'm going to

  • 00:50:23 update my SwarmUI. I recommend that and start the  SwarmUI after it. Okay, it is going to start. Yes,  

  • 00:50:29 it has started. I recommend to get the latest  zip file and set the presets. So let's install  

  • 00:50:35 the presets. These are all shown in the tutorial.  Then let's refresh the presets. Okay, our presets  

  • 00:50:41 arrived. The presets are extremely important  because I did update presets and I have made  

  • 00:50:48 them with the best quality for either stylized  generation or realistic generation. So let's sort  

  • 00:50:54 by name. Then for realistic generation, I am going  to use Qwen-Image-Realism-Tier-2. This is a very  

  • 00:51:03 fast one. Direct apply. When you direct apply,  you should see that it has selected this LoRA,  

  • 00:51:09 this base model. When you watch the tutorial,  you will learn how all of these are downloaded,  

  • 00:51:16 installed, and set up. I recommend to follow  that first. Okay. So then let's actually reset  

  • 00:51:22 params to default and then direct apply.  Okay, we are all set. The first thing that  

  • 00:51:27 you need to do is compare your checkpoints to  find out which checkpoint is performing best.  

  • 00:51:35 And how did I do that? Go to tools, select grid  generator, select prompt. Then in this prompt,  

  • 00:51:43 you need to use some prompts. I have pre-made  prompts, but you can write your own prompts as  

  • 00:51:48 well for comparing. So the prompts are  inside Qwen-Training-Tutorial-Prompts,  

  • 00:51:54 and you will see all the prompts that I used.  I'm going to use the prompts for grid find best  

  • 00:52:01 checkpoint prompts myself. Copy it entirely,  paste it into here. Now with these prompts,  

  • 00:52:08 there is one significant difference. You  see that I have written the LoRA name,  

  • 00:52:14 the fast LoRA name at the end of each prompt. And  each prompt is separated with this character. This  

  • 00:52:21 is the format of the SwarmUI. Why do I need to  define it here? Because I'm going to compare LoRA  

  • 00:52:28 checkpoints and I need this fast LoRA, you see it  is also set here, to be able to accurately get my  

  • 00:52:36 images with a low number of steps. Otherwise, you won't get quality outputs. The next step is I

  • 00:52:43 am going to select LoRAs from here. If your LoRAs don't appear here, go to the LoRAs tab and refresh

  • 00:52:50 so it sees them, or restart. Then, depending on how many epochs you did, you should start from the

  • 00:52:57 half epoch, like 100, and it will be selected,  like 125, click and select, like 150, like 175,  

  • 00:53:06 the final one is this one. So I'm going to compare  these checkpoints and decide which checkpoint I'm  

  • 00:53:14 going to use. You see as a base model, I am using  Qwen image FP8 scaled model because it uses half  

  • 00:53:21 VRAM. This model is huge. If you use BF16, it uses too much RAM and VRAM. Therefore,

  • 00:53:29 I recommend using this one on your Windows computer. Then set a grid name for your test

  • 00:53:36 grid, and click generate grid. Then the SwarmUI  will use the ComfyUI backend and start generating.  

  • 00:53:44 Let's see the first generated image. First of all,  it will load the model. You can see from the logs,  

  • 00:53:50 debug menu, what is happening. You can also follow  the CMD window. This web API is not important or  

  • 00:53:57 this error is also not important. You can ignore  both of them. Okay, I can see the logs. Yes, it is  

  • 00:54:04 starting. We should see the preview around here.  You see it says that there are 61 generations,  

  • 00:54:11 they are queued. Okay, it is loading. You  can watch the nvitop window as well what is  

  • 00:54:17 happening. It is loading the model, it will move  the model into VRAM. Okay. So you see the first  

  • 00:54:23 thumbnail has started to appear. This will also upscale images by 2x. This brings a huge amount

  • 00:54:32 of quality. However, it will take much more time.  If you don't want to wait that much, you can just  

  • 00:54:37 disable this and generate your grid that way. So  it will be way faster. However, if you want the  

  • 00:54:44 highest quality comparison, you shouldn't disable  this. With this preset, it will do four steps for  

  • 00:54:51 base image generation, then it will do four steps of upscaling. Into which resolution? Into 2536 by

  • 00:55:00 2536, because we are doubling the resolution that we set here. We can see the speed here. These are

  • 00:55:07 the speeds. The upscaling takes about 4x the time. You can see it is like 8 seconds per iteration, but we

  • 00:55:14 are doing only eight steps in total. And this will bring the highest quality. Currently, it is probably

  • 00:55:21 testing the first LoRA, which is 100 epoch. This  will be probably under-trained. Okay, let's see.  

  • 00:55:27 Yes, the first image has been generated. I can  say that it is under-trained, not there yet.  

  • 00:55:33 Then to see the entire grid, I will click this  and it will load the entire grid like this. So I  

  • 00:55:40 have done this previously. Let me show you that.  I will close this running SwarmUI and go back to  

  • 00:55:46 my previous installation. Let's start the SwarmUI.  Okay, let's go to tools and grid generator. Let's  

  • 00:55:53 load the grid config and I have the grid somewhere  around here. Yes, LoRA checkpoint test, improved,  

  • 00:56:00 load grid config. Then let's open the grid. Okay.  So this shows all the tests. I am going to change  

  • 00:56:06 how I view it from LoRAs to prompt. So now, you  see the first tested LoRA is here, 75 epoch,  

  • 00:56:15 and the quality is not great. As I scroll to the  right, you see this is 125 epoch. As I scroll to  

  • 00:56:23 the right, this is 175 epoch. It is much better. This is really good quality. This is exactly the

  • 00:56:32 config I used just a moment ago. And this is the  final epoch. This is the best one in my opinion.  

  • 00:56:38 As I scroll down, I can see the other images.  So scroll between each image and decide which  

  • 00:56:47 checkpoint is working best for your case. So this  is totally subjective. You need to decide which  

  • 00:56:53 checkpoint is looking best. However, I can see  that 75, 100, 125, even 150 is not very good. They  

  • 00:57:02 are under-trained. And I can see that now it gets  better as I do more training. If you decide to do  

  • 00:57:10 more training, let's say the final epoch is still under-trained; it

  • 00:57:15 does not yet look like your character or style or whatever you are training. How can you resume training? How can you

  • 00:57:21 continue training? With LoRA training, to resume  your training, go to LoRA settings and you see  

  • 00:57:27 there is network weights LoRA weight. So you need  to give the path of your final LoRA checkpoint  

  • 00:57:34 here. What do I mean by that? Currently my LoRA is here. So this is the folder of my LoRA. Let's say

  • 00:57:40 I will continue from this LoRA: copy this path and paste it, then put a backslash and copy

  • 00:57:48 the entire file name. So this is a full path to  my LoRA. Now when I start training, it will start  

  • 00:57:57 from this LoRA and it will continue training from  this checkpoint. However, there is one thing that  

  • 00:58:04 you need to fix. The trainer will still see itself as starting from the first epoch. Therefore, let's say I want

  • 00:58:11 to do 250 epochs in total, and my last checkpoint is 200 epochs, then I type 50 here. So it will

  • 00:58:20 do 50 more epochs, and the newly saved files will actually be at 250 epochs. I recommend you change

  • 00:58:29 the output folder, otherwise it will overwrite your older LoRAs, because it will save them in

  • 00:58:37 the same way as before. It really doesn't know that it is starting from 200 epochs; it

  • 00:58:45 thinks it is starting from the first epoch. So make sure to change your output directory if you

  • 00:58:51 are going to resume training, if you are going  to do more epochs with your training. And after  

  • 00:58:56 analyzing this grid, you pick your best checkpoint and generate images with it. How can you do it?

  • 00:59:02 Let's refresh. Okay, then let's reset params to  default, let's go to presets, select our preset,  

  • 00:59:10 direct apply. Then select your checkpoint. The  checkpoint that you decided as best. Let's say  

  • 00:59:17 I decided last checkpoint as best, so I click  it. You see now lightning LoRA and my trained  

  • 00:59:23 LoRA are selected. You can change the impact, the  weight of your LoRA from here. Let's say if it is  

  • 00:59:31 too much overfit, you can reduce your LoRA weight  or if it is underfit, you can also increase your  

  • 00:59:37 weight from here. I don't recommend changing the other LoRA's weight; it is already set correctly. Then

  • 00:59:43 type your prompt and generate. So I have some demo  prompts for example here. I can use any of them  

  • 00:59:51 or I can use all of them. So let's make several  examples. For example, let's use this one. Paste  

  • 00:59:58 it here. If you paid attention to my prompts, you  will see that they are constructed for realism.  

  • 01:00:06 They include phrases that make the model behave more realistically, like Canon 15-35 mm, the

  • 01:00:15 lens and such. And I will show how I made it. So  then I will click generate, but I want to show you  

  • 01:00:21 one thing. I will first disable the upscale and  I will generate four random images. Okay, let's  

  • 01:00:29 generate. This should be fairly fast when there  is no upscale, it is really fast. And I'm also  

  • 01:00:34 going to change the resolution. So let's cancel  it. Let's make the aspect ratio as 16:9. Okay,  

  • 01:00:42 let's generate. Okay, for example, this image,  it takes only like 14-15 seconds. Why? Because  

  • 01:00:49 I'm recording a tutorial right now. Also, I made  it to reserve VRAM, so it is not the best speed,  

  • 01:00:56 but it is decent. Okay, then let's say I like this  image. I will click reuse parameters. Then I will  

  • 01:01:04 apply the upscale. So this is a specific upscale.  Direct apply. Then the upscale is applied. So pay  

  • 01:01:13 attention to these values. And if your base  model gets changed, if you do fine-tuning, it  

  • 01:01:19 will get changed, repick your base model. However,  currently it is same. Then I will click generate.  

  • 01:01:26 Actually, I need to make this one. Yes. So we will  see the difference between the base generation  

  • 01:01:32 and the upscaled generation. And I am not doing  any face inpainting. If necessary, you can do  

  • 01:01:39 face inpainting as well. I will show an example of that. You can always go to server logs, debug, and

  • 01:01:46 watch where SwarmUI currently is and what it is doing. Okay, now let's compare the

  • 01:01:53 difference. This is the base image and this is  the upscaled. You can see how much details and  

  • 01:01:59 realism it adds. This image may not be perfect, so we may need face inpainting, which I will show, but

  • 01:02:08 this is it. You see, like this to this. Let's also  apply a face inpainting. To apply automatic face  

  • 01:02:14 inpainting, at the end of the prompt, I will type  segment:face and I will type my face prompt, which  

  • 01:02:22 is photograph of ohwx man. Then go to segment  refiner and you see there is segment steps. This  

  • 01:02:30 is important. I am going to make this seven. Why? Because when I make this seven with the 60% image

  • 01:02:40 inpainting creativity, which I think is the default, let's see. Yeah, as far as I know it is 60%. It will do four

  • 01:02:47 steps. And this is necessary because we are using  lightning LoRA. So I have made this segment step  

  • 01:02:53 seven and the rest is default. Let's generate.  This is one option of doing that or you can edit  

  • 01:03:00 the image and inpaint face. I will show that too  after this. Okay, you see first it is inpainting  

  • 01:03:06 the face. I think after that it will upscale. Oh  wait, it used the last generated image then it  

  • 01:03:13 just did the face inpainting. Nice. Okay, I can see that this is a perfect face. I can play

  • 01:03:20 with the parameters. So the default parameters are 0.6 and 0.5. I don't remember exactly what they

  • 01:03:29 were. So to check, let's go to the SwarmUI GitHub. In here there is documents. Then in

  • 01:03:34 the documents, let's search for segment. Okay,  you see there is documents, features, prompt,  

  • 01:03:41 syntax. I go into features and I go to prompt  syntax MD file, then search here segment and  

  • 01:03:51 let's see if it does tell us the variables. Okay,  it explains the variables here. It says that the  

  • 01:03:59 first parameter is the creativity, the other  one is the threshold. So I'm going to increase  

  • 01:04:03 the creativity to like 70% like this. Let's see.  And you can also increase the number of steps it  

  • 01:04:12 does. It can also increase your quality. Okay,  let's see what happens. And yes, this is it. So  

  • 01:04:19 you can inpaint face to make it perfect. How about  inpainting this first image? So to do it, select  

  • 01:04:28 that image, click edit image, and in here, you see  it did set the resolution like this, init image  

  • 01:04:35 and the upscale, yeah, it is not enabled. Okay, I  need to turn off the refine upscale, then I need  

  • 01:04:43 to mask the face. Okay, here. Let's change the  mask radius. Yeah, this needs a total remaking,  

  • 01:04:51 but let's mask the face. Okay, like this. Yes. Then I am going to use this prompt, and still it

  • 01:05:00 will use the steps from here. I'm not sure. Let's  generate and see what happens. This should only  

  • 01:05:08 inpaint the face. We can see how many steps it  is making. Okay. By the way, the resolution is  

  • 01:05:14 massive, so I don't know how it will do. Okay, it  did only two steps. This is wrong. We need to make  

  • 01:05:20 at least four steps. Yes. Therefore, I'm going to  increase my steps count to like seven. Okay, let's  

  • 01:05:26 try again. Let's see how many steps it is making. Why did it do two steps? The reason is that we have

  • 01:05:33 init image creativity at 60%. So 60% multiplied by four steps gives two steps; 60% multiplied by

  • 01:05:42 seven steps means it is going to do four steps. Yes, I can see it is doing four steps. Okay, it is using

  • 01:05:48 the same amount of time as the upscaling. The  advantage of this way is that I can change the  

  • 01:05:54 seed now and I can generate multiple times until  I get the very best one. Yes. Now it is like this.  

  • 01:06:02 If you are not satisfied with it, what you can  do is you can play with the parameters here. You  

  • 01:06:08 can make this 65%, you can make the mask blur  like eight, generate, and decide which one is  

  • 01:06:16 best. This is the way of doing that. You can  change the seed, make it random. So this way,  

  • 01:06:22 you can mask the face or fix any part of the image as many times as needed until you get satisfying results.

  • 01:06:31 But usually the generated images are already of the highest quality and you don't need it. You just need to

  • 01:06:37 write good prompts, which I am going to show in a  minute. It's upcoming. You can increase the number  

  • 01:06:43 of generations so it will do multiple times image  generation, the face inpainting, and you can pick  

  • 01:06:50 the best one. For example, let's generate four  times with random seed and pick the best one.  

  • 01:06:55 Okay, now it is going to queue. Yes, four images  queued. So I can see which will be the best one.
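
To summarize the automatic face inpainting used above: the segment tag and the face prompt are appended to the end of the normal prompt. The example below is only an illustration (the main prompt text is made up, and the exact tag format should be checked against the SwarmUI prompt syntax documentation referenced earlier); the two numeric parameters are the creativity and the threshold mentioned above.

```
photo of ohwx man walking in a city street, Canon 15-35 mm lens <segment:face,0.6,0.5> photograph of ohwx man
```

The segment step count of 7 exists because the refiner only runs the creativity fraction of the configured steps: 0.6 x 4 gives about 2 steps, which is too few for the Lightning LoRA, while 0.6 x 7 gives about 4 steps, matching the four-step preset.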

  • 01:07:02 Okay, so with different seeds, we have different  results and you can pick the best one with this  

  • 01:07:09 strategy. So to continue, I will reset params  to default, then I will refresh, then from the  

  • 01:07:15 preset, let's reselect our preset, select back our  best LoRA checkpoint like this. And let's say you  

  • 01:07:24 want to generate hundreds of images with different prompts. Select your resolution, decide whether

  • 01:07:29 you want to upscale or not. You can upscale later.  So let's turn it off. Go to wildcards. In here,  

  • 01:07:36 create a wildcard, name it like whatever you  want, and type each prompt here as a new line,  

  • 01:07:42 with a new line. So I have got some demo prompts  I have generated here. So let's copy all of them,  

  • 01:07:50 paste and save. Then click it; for each generation it will pick one of the prompts randomly

  • 01:07:58 and insert it here. Now let's generate 10 images. Okay, and generate. This way,

  • 01:08:03 you can generate hundreds of images with different  prompts, then pick the best one and upscale it,  

  • 01:08:11 inpaint it, work on it. This is a really  good way of batch generating images and  

  • 01:08:17 picking the best image. As you are seeing  live right now, it is really fast to generate  

  • 01:08:23 if you don't upscale, because these presets use only four steps for base generation.

  • 01:08:29 I did huge research to find out these presets,  and you can see that even without upscale,  

  • 01:08:36 the quality is decent. But when we latent upscale  it, it becomes the next level. So this is the  

  • 01:08:42 way of finding good images. And how to write  these prompts? So for writing these prompts,  

  • 01:08:49 I am using Google AI Studio. Let's go to Google AI  Studio, Google AI Studio from here. Then in this  

  • 01:08:55 screen, select the Gemini 2.5. Hopefully Gemini  3 is coming. Then in our example prompts, you  

  • 01:09:03 will see that there is Gemini generate realistic  character. Open it, modify this with your needs,  

  • 01:09:11 then copy and paste it into Gemini. Then I make  the temperature lower so it will obey my command  

  • 01:09:20 prompt more and generate. This way, I have  generated the realistic prompts. So read this,  

  • 01:09:27 modify it as you wish, and you can generate  random prompts with this preset way. You can  

  • 01:09:35 test them and pick the good prompts. Then you can  pick the good image and upscale it. This is the  

  • 01:09:41 way of generating amazing quality images. I have  prepared Gemini prompts for stylized character or  

  • 01:09:48 for trained product item or for trained style.  All of them exist to generate random prompts.
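
As a small recap of the wildcard workflow described earlier (the wildcard name and the prompts below are made-up examples, and the wildcard tag should be verified against SwarmUI's prompt syntax documentation): put one complete prompt per line in the wildcard, then reference it from the main prompt box so every queued generation draws a random line.

```
# Contents of a wildcard named "ohwx_realism" (one prompt per line):
photograph of ohwx man reading a newspaper in a cafe, Canon 15-35 mm lens, natural window light
photograph of ohwx man hiking a mountain trail at golden hour, detailed skin texture
photograph of ohwx man in a dark suit giving a talk at a conference, shallow depth of field

# Main prompt box (SwarmUI inserts this tag when you click the wildcard):
<wildcard:ohwx_realism>
```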

  • 01:09:54 Okay, as a next question, you may be asking what  is the difference between tier 1 LoRA and tier 2  

  • 01:10:04 LoRA? So you may be wondering what is the actual  difference between tier 1 and tier 2. As I have  

  • 01:10:10 explained, tier 2 uses FP8 scaled, tier 1 uses BF16, not FP8 scaled. And the other tiers use

  • 01:10:20 lower network rank or lower resolution to reduce  the VRAM usage. So in my test, let me open it,  

  • 01:10:28 I also have tested the quality difference. You  see there is FP8 scaled version BF16 quality  

  • 01:10:34 difference. Let's open the grid and let's make  it as prompt. Okay, here. So the first one is FP8  

  • 01:10:44 scaled, the second one is the BF16. And the third  one is a LoRA trained on the Qwen image edit plus  

  • 01:10:53 model. I need to apply it to the Qwen image edit  as a base model. So this is its actual output. You  

  • 01:11:00 can use the Qwen image trained model on Qwen  image edit plus model or vice versa. However,  

  • 01:11:07 the maximum quality is obtained when you use it on the model it was trained on. And the quality difference

  • 01:11:13 is minimal. I think these are just the seed  differences. However, the actual change appears  

  • 01:11:21 when you apply the Qwen image trained LoRA on  Qwen image edit model like this, but all of  

  • 01:11:27 them are working. So we lose very little, if any, quality between the FP8 and the BF16, between

  • 01:11:36 the tier 2 and tier 1. You see this is tier 2,  this is tier 1, or this is tier 2, this is tier  

  • 01:11:42 1. Almost same quality. These are just the random  noise differences, tier 2, tier 1. So you can use  

  • 01:11:50 either of them, tier 2, tier 1. Almost same, you  see. There is no big quality difference. Moreover,  

  • 01:11:57 you can train on Qwen image edit model as well. It  works as you can see, and there is an advantage of  

  • 01:12:04 Qwen image edit model which I will show you  in a moment as we progress in the tutorial.

  • 01:12:10 So now, as a next step, how do you do fine-tuning? Is there any difference? The only difference with

  • 01:12:18 fine-tuning is that you select a fine-tuning configuration. The rest is exactly the same as

  • 01:12:26 the LoRA training. So from the training  configs, select the fine tuning. Again,  

  • 01:12:32 select the number of epochs. By the way, the fine  tuning is slower than LoRA right now on Windows  

  • 01:12:38 especially. On Linux, they are almost same speed.  And select the tier. If you have paid attention,  

  • 01:12:44 all are tier 1 in fine-tuning, because fine-tuning is more optimized, so we don't

  • 01:12:51 sacrifice any quality. But the speed gets slower,  especially on Windows, it is really slow compared  

  • 01:12:58 to the Linux. So select the VRAM according to  your GPU and load with this icon, and that's  

  • 01:13:05 it. The rest is exactly the same, absolutely nothing different. It just sets the appropriate

  • 01:13:12 training parameters for DreamBooth. However, there is one important thing: these

  • 01:13:20 model checkpoints will be 40 GB. Therefore, by default, I am only saving a checkpoint once every

  • 01:13:29 40 epochs. So you will get five checkpoints, 200 GB in total. And after training,

  • 01:13:36 what you need to do is convert them into FP8 scaled. How does it work? Let me demonstrate it to

  • 01:13:43 you. So let's say I have a full checkpoint in this  folder. Copy this folder path, enter as an input  

  • 01:13:51 folder. You can set output folder, not mandatory.  We are going to use tensor-wise. This is scaled.  

  • 01:13:58 This is not default FP8 generation. This is  tensor-wise made by the ComfyUI and the Musubi.  

  • 01:14:07 Musubi also has block-wise, which is higher quality, but ComfyUI does not support it yet.

  • 01:14:13 I made an issue thread and the ComfyUI developer replied to me that with Torch version 2.10,

  • 01:14:21 it is hopefully coming. Currently, we are going to use tensor-wise. You can also delete the original

  • 01:14:27 files after conversion, but don't do it the first time. So click start conversion. It will

  • 01:14:34 convert it into FP8 scaled with tensor-wise.  This is really high quality and it is almost  

  • 01:14:42 same quality. After you do this, you will see it. So you see it is saving the converted

  • 01:14:49 model. Yes. And it is going to take half the space, 20 GB, and it will run on your GPU much more easily.

  • 01:14:58 This is almost the same quality as BF16. I have tested it, because this is a scaled conversion.
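
For context on what the tensor-wise FP8 scaled conversion does conceptually, here is a minimal PyTorch sketch. It is only an illustration of the idea, not the converter that ships with the app: each BF16 weight tensor gets one scale so its largest value maps onto the FP8 range, the tensor is stored as FP8, and the scale is kept so inference can multiply it back.

```python
import torch

# Minimal sketch of tensor-wise FP8 "scaled" quantization using PyTorch's
# float8_e4m3fn dtype. Illustration only, not the actual conversion tool.

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # about 448 for e4m3fn

def quantize_tensorwise(weight_bf16: torch.Tensor):
    """Return (fp8_weight, scale) such that fp8_weight * scale approximates weight_bf16."""
    absmax = weight_bf16.abs().max().clamp(min=1e-12)
    scale = (absmax / FP8_MAX).to(torch.float32)  # one scale per tensor
    fp8 = (weight_bf16.float() / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return fp8, scale

def dequantize(fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return fp8.to(torch.float32) * scale

# Example: a fake 4096x4096 BF16 weight shrinks to half the bytes, while the
# per-tensor scale keeps the dynamic range usable.
w = torch.randn(4096, 4096, dtype=torch.bfloat16)
q, s = quantize_tensorwise(w)
err = (dequantize(q, s).to(torch.bfloat16) - w).abs().mean()
print(q.dtype, s.item(), err.item())
```

Storing one scale per tensor is what makes this a "scaled" conversion; that scale is why the FP8 file keeps quality very close to the BF16 original while halving the size.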

  • 01:15:06 So what is different when you are testing the grid  of the fine-tuned models? This time, we don't need  

  • 01:15:15 to select LoRA. So reset params to default and  let's refresh the models here, and let's go to  

  • 01:15:22 preset, apply our preset, direct apply, go to  tools, grid, let's select the prompt. This is  

  • 01:15:30 for finding the best checkpoint. Tutorial prompts  are here. So the grid test prompt is here. So copy  

  • 01:15:39 them, paste them into prompt. As a next parameter,  we select model and same strategy. Let's refresh  

  • 01:15:45 models, go back to tools and type your epochs like 100, 125, 150, 175. Okay, that is not the correct

  • 01:15:56 one. 175 and the last checkpoint. So that's it. So it will generate the grid, and exactly the same

  • 01:16:04 as LoRA, you will compare it and then all you need  to do is select your best checkpoint. For example,  

  • 01:16:12 it is 200, but make sure that you have converted  them into FP8 scaled. Otherwise, it will use  

  • 01:16:20 a lot of RAM memory, it will do a lot of block  swapping, so it will be slower on consumer GPUs.

  • 01:16:27 Okay, as a next step, the Qwen image edit model. This is also exactly the same as

  • 01:16:34 LoRA and fine-tuning. First of all, decide  whether you want to do LoRA or fine-tuning,  

  • 01:16:40 doesn't matter. Let's give an example with the  LoRA since it is lighter weight. So let's load our  

  • 01:16:46 config. Then what is different? The difference  comes from the training dataset. Currently,  

  • 01:16:53 we can generate images with Qwen image edit model  with just text. Therefore, you don't need to use  

  • 01:17:00 edit images. You can use just your base images to train, and it will use the same

  • 01:17:07 amount of VRAM, the same amount of RAM, and it will be the same speed. So what is different? This time,

  • 01:17:14 you enable this Qwen image edit model checkbox  and you select the different checkpoint. Which  

  • 01:17:23 checkpoint? You select the Qwen image edit plus  checkpoint and that's it. Now you will be training  

  • 01:17:31 on the Qwen image edit plus model. What advantage does it have? It supports command-based actions. For

  • 01:17:39 example, let me demonstrate you with this one.  So I can upload an image here. Let's upload a  

  • 01:17:47 prompt image. I am going to use this image as an  upload. Then to get accurate size, I have shown  

  • 01:17:54 all of these in the other tutorials. Let's upload  it here and let's say use closest aspect ratio. So  

  • 01:18:01 it will set the resolution according to your input image, then uncheck this. I also recommend still upscaling,

  • 01:18:09 and type your command prompt. This is what the Qwen image edit plus model is for. So you see this

  • 01:18:16 command is replace his face with ohwx man, and hit  generate. By the way, you see that this base model  

  • 01:18:25 is BF16, not FP8 scaled. Therefore, it will  be slower than FP8 scaled. However, it will  

  • 01:18:33 still work. Why? Because this is using ComfyUI as the backend, it will do automatic block

  • 01:18:40 swapping and it will work, it will just work slower. The model loading and the inference are slower because

  • 01:18:46 of the block swapping. And one more thing is that, okay, I just noticed that I don't have

  • 01:18:52 the correct model selected right now. Okay, the model is here: the Qwen image edit model trained without

  • 01:18:58 control images. So same as training the Qwen base  model. I will first convert it into FP8 so it will  

  • 01:19:07 be faster. Copy the folder and batch process.  This convert tool also skips already FP8 models.  

  • 01:19:16 So it is converting the new model. It is also  properly applying metadata as well. Currently,  

  • 01:19:24 it supports Qwen base and Qwen image edit models.  Okay, you see it is converted. Let's put it into  

  • 01:19:31 diffusion models. This is a full fine-tune. Then  let's go back to our model list. Okay, here. Now  

  • 01:19:36 the correct model is selected; hit generate. So now we are going to generate. By the way, ignore this

  • 01:19:42 image. This is from the previous generation. It  is going to apply this input image and convert  

  • 01:19:49 it into new image with this prompt. Actually, let  me make another one so you will see. For example,  

  • 01:19:57 this one, and this has a different aspect ratio.  So to get the accurate aspect ratio, I will use  

  • 01:20:04 the same strategy. Closest aspect ratio. Okay. So  let's cancel the current one. Let's generate a few  

  • 01:20:10 images and pick the best one. Then we can upscale.  Okay, image prompting is automatically selected.  

  • 01:20:16 Let's generate four images. Okay. The upscale  helps here as well. And you can of course do the  

  • 01:20:25 face inpainting as well. This is a Qwen image edit  model trained without control images. Don't worry,  

  • 01:20:32 I am also going to show you how to train  Qwen image edit model with control images  

  • 01:20:40 and prompts like this, like replace his face. So  you will be able to teach the model new prompts,  

  • 01:20:48 new instructions. It is actually so easy. Okay, we  are getting some results. For example, this one,  

  • 01:20:55 this one, this one. Based on whichever the one  you like, then we are going to upscale it. The  

  • 01:21:02 upscaling will improve the quality significantly.  And remember, this model was trained without the  

  • 01:21:10 control images. Okay, for example, let's say  this one. So I will say reuse parameters,  

  • 01:21:17 so it will set the seed accurately. Then I  will enable the upscale. So I will do 60%,  

  • 01:21:25 2x. We are using the 4x Real Web Photo, and I  will make the step count 7. Okay, and generate.  

  • 01:21:34 Let's see after upscaling what we will get.  By the way, some of the images are horrible,  

  • 01:21:39 but after upscaling, we should get pretty good quality. And remember,

  • 01:21:48 this is a prompt that it knows. Furthermore, you may need to generate more seeds to get a

  • 01:21:54 more accurate one. For example, in the history,  I can show you that this was another generation  

  • 01:22:01 that I have made, and you see it worked perfectly. Moreover, since we upscale, we add more details

  • 01:22:08 compared to the original image. Let me show you  the original image. So this is the original image.  

  • 01:22:13 You can see the original image details, and this  is the regenerated image. We added more details  

  • 01:22:21 to the original image as well. When we compare  it, you can see that our generated image has  

  • 01:22:27 some more details. And yes, this is the result. I mean, not every upscale will be perfect or

  • 01:22:34 every seed will be perfect. You just need to... oh, I just noticed something. Currently, we are

  • 01:22:40 not using the correct LoRA. That is why we got these results. So, always, always apply the preset

  • 01:22:48 so you don't make a mistake like me. So I will just say direct apply, and let's turn off the refine, and

  • 01:22:54 let's generate five images. Okay. Now I will pick  a better one. So lightning LoRA is super important  

  • 01:23:02 because we are doing just four steps and without  lightning LoRA, it will not work. Oh, by the way,  

  • 01:23:08 base model changed when I applied the preset.  So you can also edit the preset and set your  

  • 01:23:14 base model. You can duplicate it. I will also show you how to duplicate it. So I will say duplicate preset.

  • 01:23:21 I will edit the preset. Then in the bottom,  display advanced and display normally hidden,  

  • 01:23:28 and I will change the base model into my model,  my trained model, which is here. Then save. Then  

  • 01:23:36 when I apply the preset, it will accurately select  my model. This is the way of duplicating presets,  

  • 01:23:42 editing them. Then let's generate five images.  And let's remove this from batch view. Okay,  

  • 01:23:48 let's delete. You will see how much better it works now. I'm not going to cut this part of

  • 01:23:54 the video so that you can learn why it happened.  These are some errors I had. Yes, you see much  

  • 01:24:02 better. Now that we apply the accurate LoRA, it  works much better. And this is the logic. Now when  

  • 01:24:09 I upscale it, it will become perfect. Okay, every image is accurate. So without the LoRA, you get noise,

  • 01:24:15 you get horrible images, but with the correct preset, you get accurate images.

  • 01:24:22 So how can you do real control-image training, like teaching a new command action

  • 01:24:33 and its result to the Qwen image edit model? It is so, so easy. Let's open our last configuration, this one.

  • 01:24:42 Let's open all panels. Then let's go to Qwen image  training dataset section. So this was my dataset.  

  • 01:24:51 Now I am going to also auto-generate black control images, set your control image width and height like

  • 01:24:59 this with your resolution, and generate the dataset. Then what you need to do is properly replace the

  • 01:25:09 control images. So let's go back to our training  images dataset folder. This was our folder. Okay.  

  • 01:25:16 So these images wouldn't work for this task. What kind of images do you need? I will show you. When you

  • 01:25:23 extracted the zip file, when you enter inside  the Qwen training configs, you will see that we  

  • 01:25:30 have Qwen image edit model example dataset. And  this is the example dataset. Let's copy paste  

  • 01:25:38 it into here and analyze it. So now edit images  are provided like this. You see dataset_image_0,  

  • 01:25:46 dataset_image_1. Why have I named them like this? Because my input image, actually the final image

  • 01:25:54 that I expect is named as dataset_image. And  this is the caption. So in this caption, you  

  • 01:26:02 give the command, make him wear the headphones.  So this way, you have to prepare your final image,  

  • 01:26:09 input images, and the prompt. Let's say this is final_image_A. Okay. Then you need to name

  • 01:26:16 the prompt file final_image_A. Then you need to rename the control images like final_image_A0, final_image_A1. You can

  • 01:26:25 provide up to three control images. So you can have another image named with index

  • 01:26:31 two. So you can provide up to three images. Then  you can train it. When you train this way, it will  

  • 01:26:38 learn this command to generate this final image  when you provide these input images. However,  

  • 01:26:47 there is one tricky issue. When you train Qwen  image edit model with control images like this,  

  • 01:26:54 what happens is that it will become slower and it will use more VRAM. Therefore, it is

  • 01:27:01 super important to keep this in mind. You need to increase the block swap count. For example, let's

  • 01:27:10 make a demonstration. I will close my SwarmUI  and let's save. Then what I need to do is I need  

  • 01:27:18 to enable Qwen image edit model. Then I need to  increase the block swap. Let's make it like 35.  

  • 01:27:27 I'm not sure how much will be sufficient because  I have two control images and they are not even  

  • 01:27:33 the correct sizes. They are not all 1328. We can see the generated dataset toml file that it is

  • 01:27:41 going to use. So you see it says that it is going  to use Qwen image edit control resolution 1328,  

  • 01:27:48 1328, and the general resolution, the directory  of the edit images. So it is all automatically set  

  • 01:27:56 for you. What I need to do is I need to make these  images all 1328, 1328. Actually, let's make it as  

  • 01:28:02 a demo. So I will resize these to 1328. Okay. Then  I will resize this to 1328 as well. How am I going  

  • 01:28:11 to do that? So first resize this to 1328 by 1328, and we can add padding like this. And then,

  • 01:28:21 yes, that's it. So all my control images and my output image are now at the correct resolution.

  • 01:28:27 Then when I click start training, let's watch  what happens. Okay, it says that you don't have  

  • 01:28:35 the okay, I got an error. Why? Because I didn't  click load. So I need to click load. Then okay, I  

  • 01:28:43 have overwritten the previous files because I had  forgotten to click load and I hit save. Therefore,  

  • 01:28:51 I need to reset the parameters. Okay, this one is  true. This one is also true. Okay, now I need to  

  • 01:28:59 select the model file from here. Okay, edit plus,  select it. I will enable this. Okay, these are all  

  • 01:29:08 true. Let's also verify this toml is valid one.  Yes. Okay, now I need to click save. I also need  

  • 01:29:16 to set the swap count to like 35. I'm not sure  which one is best because depending on your number  

  • 01:29:24 of control images, this changes. Now it will  recache because I changed the dataset. Therefore,  

  • 01:29:31 I need to recache. So it is doing the recaching like this. When it caches, it combines these

  • 01:29:39 two control images and the one target image into a single cached safetensors file. So it still generates one cache file, but this

  • 01:29:48 one contains all those three images. And you see  it is doing the text encoder caching as well. Now  

  • 01:29:55 we will start the training. However, how much VRAM  it will use, I'm not sure. Okay, you see it has  

  • 01:30:01 filled my VRAM. So let's stop. Let's go to swap  and let's make this 40 and click start training  

  • 01:30:09 again. You should also save your configuration  like this to be sure. Okay, let's see what  

  • 01:30:16 happens now. You can also read the logs on the  CMD. It shows found one matching control images  

  • 01:30:23 for arbitrary images, one images have two control  images. You should verify your logs from here too.  

  • 01:30:30 Okay, this time it is not using the full VRAM. Therefore, this many block swaps was sufficient.

  • 01:30:37 Now I can reduce the block swap count and see the speed. However, as you use more control images, it will

  • 01:30:45 become slower. But this is a professional thing  mostly. So you can rent a cloud machine and do the  

  • 01:30:51 training there with a more powerful GPU like RTX  6000 Pro. Hopefully, I will make a cloud tutorial  

  • 01:30:59 as well after this, so you will see how easy it is  to train there. Still, this tutorial is mandatory.  

  • 01:31:06 Okay, you see the first step has been passed. It  is really, really slow. And I need to wait more to  

  • 01:31:14 see its actual speed, but currently I'm not at my  max performance. I am recording video. I need to  

  • 01:31:20 restart, close all the running applications and  such. But this is the way of training an actual  

  • 01:31:28 Qwen image edit model with a specific task, with a specific command you want, like replacing clothing,

  • 01:31:35 changing hair, or whatever you want to do as a command; you can teach it to the model.
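
To recap the control-image dataset convention described above in one place, here is a sketch of the pattern. The base name, the image extension, the caption file extension, and whether the index uses an underscore are all things to double-check against the example dataset bundled in the zip; treat this layout as illustrative only.

```
edit_training_dataset/
├── final_image_A.jpg     # the target image the model should learn to produce
├── final_image_A.txt     # the instruction caption, e.g. "make him wear the headphones"
├── final_image_A0.jpg    # control image 1 (the input the command is applied to)
├── final_image_A1.jpg    # control image 2 (optional)
└── final_image_A2.jpg    # control image 3 (optional; up to three control images)
```

Remember the point made above: every control image you add increases VRAM use and slows training, so the block swap count usually has to go up, and keeping the control images at the training resolution (1328 by 1328 in the example) avoids surprises.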

  • 01:31:41 So how can you resume your fine-tuning training? Let's refresh our configuration. Normally,

  • 01:31:47 we give the base model, either the Qwen image base model or the Qwen image edit plus base model. So

  • 01:31:54 to continue your fine-tuning training, we are  going to give our checkpoint. For example,  

  • 01:32:00 you see my checkpoints. This is 125  epoch, this is 175 epoch. Let's say my  

  • 01:32:08 last checkpoint was 100 epoch. So I select  that model, and when I start training now,  

  • 01:32:15 it will continue from this checkpoint. My configuration, my workflow, is made in such a way

  • 01:32:23 that this is equal to training from the start to 200, that is, doing 100 more epochs to reach the 200 epochs.

  • 01:32:32 So it will be totally the same whether you continue from your last checkpoint or you go from 0 to 200

  • 01:32:39 epochs at once. This is the logic of continuing  the fine-tuning. Now I need to reduce my training  

  • 01:32:47 epoch count from 200 to 100, because whether you use a LoRA or a fine-tune checkpoint,

  • 01:32:55 it will not know where the training left off. So you need to calculate the difference

  • 01:33:02 and do the extra epochs this way. This is the way of continuing your fine-tuning training.
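
Because the trainer restarts its epoch counter at 1 whenever it resumes from a checkpoint, for both LoRA and fine-tuning, the bookkeeping is simple subtraction. A tiny sketch with the numbers from the LoRA example above:

```python
# Resuming training: the trainer counts epochs from 1 again, so subtract what the
# loaded checkpoint already covers. The numbers mirror the examples in this tutorial.

target_total_epochs = 250   # where you want the training to end up
checkpoint_epochs = 200     # epochs already baked into the checkpoint you load
epochs_to_run = target_total_epochs - checkpoint_epochs
print(epochs_to_run)        # 50 -> enter this as the epoch count for the resumed run

# LoRA resume: point the "network weights" field at the last LoRA file and change
# the output folder, so the newly saved files do not overwrite the earlier epochs.
# Fine-tuning resume: select the last full checkpoint as the model to train instead.
```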

  • 01:33:08 Before I show you the style training and also the  product training, let's make a recap of how to  

  • 01:33:15 use our trained LoRAs and fine-tuned models. So  for LoRAs, you put your LoRAs into SwarmUI into  

  • 01:33:23 models into LoRA folder like this. For fine-tuned  models, first convert them into FP8 scaled. I  

  • 01:33:31 recommend that. It is not mandatory, but make sure  to convert so they will work faster. Then put them  

  • 01:33:38 into SwarmUI/models/diffusion_models folder like  this. You see my files are here. Then let's start  

  • 01:33:45 our SwarmUI as usual, windows_start_swarmui.  Then Quick Tools, reset params to default,  

  • 01:33:52 presets, apply our preset. This is the preset  that we use. You see, Qwen-Image-UHD-Tier-2,  

  • 01:34:00 direct apply. If you are going to use a LoRA,  you just need to go to your LoRA tab, select  

  • 01:34:07 your LoRA, whichever the one you want to use. For  example, this LoRA, make sure that no unnecessary  

  • 01:34:13 LoRAs are selected, and this Lightning 4-step  LoRA is selected. The preset may get updated,  

  • 01:34:19 so this selected LoRA may get changed when you  are watching this tutorial because there are  

  • 01:34:24 always some newer LoRAs, some newer methods that are faster. So just additionally select your

  • 01:34:31 LoRA. Then type your prompt. For example, let's  use this prompt and hit generate. You see that the  

  • 01:34:38 preset selects the Qwen image FP8 scaled model as the base model, because when you are using a LoRA,

  • 01:34:45 you need to use the base model that you trained it on. You can use it with other base models as well,

  • 01:34:50 as long as they are Qwen models. However, it  will work best with the base model that it was  

  • 01:34:56 trained on. This is the logic of LoRAs. And we  are getting our image generated. To test faster,  

  • 01:35:02 I recommend turning off upscale, generating images, then on the ones that you like, you can apply the

  • 01:35:10 upscale as well, so that you won't be waiting unnecessarily for the upscale part to finish.

  • 01:35:17 If you don't like the preview image, you can  always cancel and try with a new different  

  • 01:35:23 seed. As long as the seed is -1, it will generate  a different image. And we got our image generated.

  • 01:35:31 So how do I use my fine-tuned model? And  you may be wondering why you should train  

  • 01:35:36 fine-tune because fine-tuned models are higher  quality than LoRAs. That is the reason. They  

  • 01:35:42 are able to generalize better, they can do more  poses, more emotions better, not much different,  

  • 01:35:49 very close to the LoRA, but still better. So  let's refresh this page, reset params to default,  

  • 01:35:56 presets, let's apply our preset, direct apply,  type our prompt. And now you need to select  

  • 01:36:02 your fine-tuned model instead of the base selected  model. So I'm going to select my fine-tuned model,  

  • 01:36:09 which is here. You see my Qwen fine-tuned model  FP8 converted by me. And that's it. Then you need  

  • 01:36:16 to select your aspect ratio, the resolution  whichever you want. For example, this one and  

  • 01:36:21 generate. We also already have seen how to do  face inpainting, how to fix face. The logic is  

  • 01:36:27 same. You can also fix other parts, either with  inpainting or with segmentation. It should work,  

  • 01:36:34 the logic never changes, but how you apply it  changes, and it comes with experience and using  

  • 01:36:40 the program, doing more generations. And this is  the generation of fine-tuned model. If you ask my  

  • 01:36:46 opinion, of course fine-tuned model is better,  but with LoRA you can generate more images and  

  • 01:36:51 get the perfect image, or you can do inpainting, face inpainting, and fix things manually. It depends on your

  • 01:36:59 case. If you are using this professionally, then  I recommend to either wait for fine-tuning to be  

  • 01:37:05 finished or use cloud services like MassedCompute  or RunPod. We already have the installer scripts,  

  • 01:37:11 and hopefully I will make another tutorial to show  that, but you can already train on them as well.

  • 01:37:16 So now let's talk about style training. What  changes? With style training, everything is  

  • 01:37:23 exactly the same. So what is changing? What changes is the dataset. So I have already attached the GTA 5

  • 01:37:31 style dataset in our post. You see, remember, Qwen  image tutorial video instructions. Let's download  

  • 01:37:37 the style dataset and I also shared the result  model in this CivitAI link, so you can download  

  • 01:37:44 and use it already. The FP8 scaled version is  shared. You see it is 19 GB of file. So far,  

  • 01:37:53 the comments are good, and you can use this model  and generate yourself. Okay, let's look at the  

  • 01:37:59 used style dataset. So let's move this into our  folder. You can move anywhere. Let's extract it,  

  • 01:38:06 and let's analyze it. So the style dataset, again, was only trained with a trigger word,

  • 01:38:13 not detailed captions, just ohwx. I didn't use anything else. And this was the dataset.

  • 01:38:20 When you analyze this dataset, you will see some  of the key things. The first thing is that it is  

  • 01:38:26 extremely consistent. This is mandatory for  training a style. Consistency of the style.  

  • 01:38:32 The second thing is that no character repeats, no scene repeats, and no object or item repeats. This

  • 01:38:41 is super important. So you should try to avoid repetition. For example, repeating a person

  • 01:38:48 will cause the model to memorize, and the same goes for an item like this helicopter: you shouldn't repeat items,

  • 01:38:55 you shouldn't repeat objects, you shouldn't repeat  persons, places, buildings, nothing should repeat.  

  • 01:39:01 But you may be saying that, okay, these two scenes are very similar. It is true, because there wasn't

  • 01:39:08 a sufficient amount of data to train with. Therefore, I cropped some

  • 01:39:15 of the images and made multiple images out of them. So this image is actually, let me open it,

  • 01:39:21 so this image is actually cropped from this big  image. But you see, I tried to not repeat the same  

  • 01:39:29 objects as much as possible. I tried to avoid it.  So this is the way of preparing a style dataset,  

  • 01:39:36 consistency, not repeating objects, items,  persons, characters, whatever you can think  

  • 01:39:43 of. Only the style should repeat. Only the style should be consistent. Everything else should be different

  • 01:39:51 in every image. With style training, the more images you have, the better results you will get. This

  • 01:39:58 is really, really important. Try to collect more  images for style training. And when you train,  

  • 01:40:05 you will see how high the quality gets. I don't recommend having detailed captions. Just use

  • 01:40:11 ohwx. This works best for Qwen, and recently for Flux I am using the same strategy,

  • 01:40:18 and also for the one which is coming, it will probably be the same. I haven't tested it yet, but probably. So

  • 01:40:25 how am I able to generate amazing quality images while just using ohwx during training? I mean,

  • 01:40:34 let's look at some of the images again, like this  one or like this one. The logic is the detailed  

  • 01:40:41 prompting. So for very detailed prompting, I am  using this strategy. Let's open the Google AI  

  • 01:40:48 Studio. As usual, Google AI Studio from here,  and then upload your style images. This is the  

  • 01:40:57 lazy way of doing that. You can of course manually  also test it, but I prefer this lazy way because  

  • 01:41:03 it makes it easier. So the dataset images are  here. So just select all of them or like 20 of  

  • 01:41:10 them. It is up to you. And drag and drop them into this section. Then, this is super important: make

  • 01:41:18 the media resolution highest possible. Currently  medium is highest possible. This will make the  

  • 01:41:24 model process these images with higher quality  and higher accuracy. Then set the temperature like  

  • 01:41:31 50%. And what prompt, what command, do you need to use to get proper captions, proper prompts? It is all

  • 01:41:40 shared inside the Qwen training tutorial prompts.  So to generate example prompts, I'm going to use  

  • 01:41:48 Gemini generate trained style prompts. You can  read this and change it according to your needs,  

  • 01:41:55 then copy paste it here. So with this prompt, it  is going to give me 100 unique prompts to generate  

  • 01:42:03 in SwarmUI or in ComfyUI, whichever one you are using. This will ensure that the generated

  • 01:42:10 prompts include elements that will make the model generate images according to my trained

  • 01:42:17 style. It will improve its consistency, its  accuracy. Even though I trained with just ohwx,  

  • 01:42:24 this will work. Why? Because these models, Flux or  the Qwen, encode your training images. So whether  

  • 01:42:31 you caption them or not, they are still internally captioned during the training. It

  • 01:42:38 is a very technical thing, but you can still  say that the model knows your image content.  

  • 01:42:44 So it still flows information into those captions,  whether you use detailed captions or not. Then hit  

  • 01:42:52 generate icon. So now it will generate me example  prompts. Analyze the generated prompts and you  

  • 01:42:58 will understand the logic. It will give you idea  how you should prompt your style after training.  

  • 01:43:05 This will significantly improve the accuracy of  your generated images with your style. And this  

  • 01:43:13 applies to all style trainings. Believe me, you  will be able to generate amazing stylized images,  

  • 01:43:20 amazing images in your style after you do this.  Another use case of style training could be that  

  • 01:43:26 you might have a line art image, then you can use the command "turn it into my style" together with the final image. So you can

  • 01:43:33 train Qwen edit model with this strategy and you  can have a model that can convert your line art  

  • 01:43:40 images into your style painted, into your style  colored images. We have already seen the logic  

  • 01:43:48 of the Qwen image edit model training, so check  that part again if you don't know, but this is  

  • 01:43:54 the way of training a style, the logic of training  a style. You can see that these are all amazing,  

  • 01:44:00 these are all extremely consistent with the dataset, and it is an extremely versatile model,

  • 01:44:06 not overfit. It can still generate pretty much everything or anything, and this is exactly the

  • 01:44:13 way that I have trained. I am still using the same configuration. The configuration doesn't change

  • 01:44:18 for style or for product or for person, it doesn't change. What changes is the dataset, how the dataset

  • 01:44:25 is prepared, and how many epochs you do. If you have more images, you can do fewer epochs,

  • 01:44:32 but with style, I recommend doing more epochs because a style takes longer to learn. And you can just

  • 01:44:38 download this model from CivitAI and generate  images right away yourself if you wish as well.

  • 01:44:45 So how are you going to generate images  with your trained style? Let's refresh.  

  • 01:44:49 Let's make reset params to default. Go  to presets and for style generation,  

  • 01:44:55 we have two presets. Qwen-Images-Stylized-UHD  or Qwen-Images-Stylized-UHD-Tier-1. The tier  

  • 01:45:02 1 is better, but it takes more time, it takes  more steps. So let's make an example with the  

  • 01:45:09 tier 2. This will be a quick example. I have  selected it. Then I need to select my trained  

  • 01:45:14 model. Currently I have full trained model, not  a LoRA for style. It is here. I have selected  

  • 01:45:20 it. Let's change the aspect ratio. Then let's  use one of the generated prompt. For example,  

  • 01:45:27 let's use this one and turn off refine upscale.  Let's generate eight images. Then we can pick  

  • 01:45:33 the best one and upscale it. Okay, I have got two  images generated. For example, let's upscale this  

  • 01:45:39 particular one. The seed is here. I will set  the seed and I will just enable refine upscale  

  • 01:45:46 and generate. So this was the base generated image  without any upscale and let's see the result after  

  • 01:45:52 upscaling. So it is upscaling right now. If you  instead use tier 1, it will do more steps during  

  • 01:46:00 the upscale and it improves the quality. So if  you are looking for maximum quality, you can use  

  • 01:46:06 Qwen-Image-Stylized-UHD-Tier-1 configuration.  These configs may get updated over time,  

  • 01:46:13 so make sure to read Patreon post changes  and the newest presets descriptions. Okay,  

  • 01:46:18 the upscale completed. I had forgotten I set images to eight, so it was generating another one. So yes,

  • 01:46:24 this is the upscaled version. Let's compare  it with the base version. So this was the  

  • 01:46:28 base version and this is the upscaled version.  And this upscale was very, very fast because  

  • 01:46:34 it was only four steps. However, you can do more  steps to get even better, higher quality details.

  • 01:46:41 Okay, what about product training? The product  training dataset preparation is different than  

  • 01:46:48 both style and character. And let me explain the logic of product training. So I have

  • 01:46:55 prepared a product dataset like this one. Probably I used this one, not the most accurate one.

  • 01:47:02 And because I used this dataset, what happened is that in some cases the perfume

  • 01:47:10 size is not very accurate, because you see all of these images are extremely close shots. So the

  • 01:47:17 AI didn't learn its proportions properly. I also  had another dataset which I was planning to use,  

  • 01:47:26 this one, which included shots where a person was holding it like this. So you should have a mix of

  • 01:47:34 product images. Some of them should be very close, so it learns the details. Some of them should be

  • 01:47:41 more distant, so it will learn the proportions. This is important. Imagine the way you want

  • 01:47:47 to generate the product images after training, and include such images so that it can

  • 01:47:54 learn its proportions. You see there is a glass behind the perfume bottle. So the model will

  • 01:48:01 understand the proportion of the product relative to a glass. Moreover,

  • 01:48:08 you can see how powerful this training is. You  see this icon was perfectly learned by the AI,  

  • 01:48:17 like this one. So Qwen is extremely powerful  when it comes to learning details or learning the  

  • 01:48:24 detailed small text. Unlike Flux, this model is much more powerful at learning text, at learning

  • 01:48:32 the text on small products, and it can generate  amazing quality images like this one. It is up  

  • 01:48:40 to your imagination after training. And again,  I just used the ohwx as a caption. I didn't use  

  • 01:48:48 detailed captions, and there is another strategy  to generate the product prompts for inference.  

  • 01:48:56 So again, we upload our product images into the  Gemini. So select a few of them, like these ones.  

  • 01:49:03 You can select more, of course. Selecting more  will help the Gemini to understand better. Then  

  • 01:49:09 in the Qwen training tutorial prompts, you will  see Gemini generate trained product item prompts.  

  • 01:49:15 So you can modify this as the way you want and  then paste it here and hit enter. So this way,  

  • 01:49:23 it will generate me example prompts and it will  also describe the text on the product. So you see,  

  • 01:49:31 during the inference, we describe whatever we want  with details to improve its accuracy, to improve  

  • 01:49:38 its consistency. During training, we just used a  single activation token, a rare word, a rare token  

  • 01:49:46 like ohwx, but during the inference, we give a  very detailed description, a very detailed prompt  

  • 01:49:54 to match perfectly with whatever we have trained,  especially if the product is a very rare product,  

  • 01:50:01 this will help even more significantly. When you train a character, this is not as necessary, because the

  • 01:50:07 character knowledge, the person knowledge, of the models is massive compared to your

  • 01:50:13 specific products or your specific styles. And  you will see that it has generated some example  

  • 01:50:20 prompts. You see it defines double C logo  and the text on the product. These two will  

  • 01:50:26 help significantly to generate product images  accurately after training. And then you will  

  • 01:50:32 be able to generate amazing quality images like  these ones that you can use for advertisement,  

  • 01:50:38 for demo. I mean, you can even see that it has  this pipe accurately as well. This is a very,  

  • 01:50:46 very small detail. However, it is able to do  that. So this is the way of training a product.

  • 01:50:52 Thank you so much for watching. I recommend you to  join our Discord channel. You can always message  

  • 01:50:58 me from there. You will see the Discord channel  link at the top. I recommend you to go to our  

  • 01:51:03 GitHub. You will see a lot of information  here, fork it, star it, watch it. You can  

  • 01:51:09 also sponsor me from here. When you go to our  wiki, you will see all of our tutorials. You  

  • 01:51:15 see we have hundreds of tutorials. You can search  the tutorials from here with control F. Also on  

  • 01:51:20 the main page, you will see some sorted way of  tutorials. Let me show you. As you scroll down,  

  • 01:51:27 you will see starting from one to the  latest ones going this way. Moreover,  

  • 01:51:32 we have Reddit. I recommend to join our Reddit.  We are getting bigger and bigger, more visitors,  

  • 01:51:39 more people. Let's see some of the stats.  You see 300k visits we have. We have members,  

  • 01:51:45 they are increasing. And you can follow me on my  LinkedIn. This is my real LinkedIn profile. You  

  • 01:51:52 can follow me here. Furthermore, don't forget to subscribe to our channel and also turn on the bell

  • 01:51:57 notifications. You can see our videos from here. You can search our videos. We are getting

  • 01:52:03 hopefully bigger and bigger. I am also giving  private lectures. Let's say you want to learn  

  • 01:52:09 one to one, you can message me. I am giving  private lectures to both individuals or the  

  • 01:52:14 companies. Moreover, I am giving consultation  to companies as well. So you can always message  

  • 01:52:20 me with replying to the video or from Discord  or from LinkedIn, all of them should work. So  

  • 01:52:26 thank you so much for watching. Hopefully,  see you in another amazing tutorial video.
