
Double Your Stable Diffusion Inference Speed with RTX Acceleration TensorRT: A Comprehensive Guide

FurkanGozukara edited this page Oct 23, 2025 · 1 revision



Stable Diffusion Gets A Major Boost With RTX Acceleration. One of the most common ways to use Stable Diffusion, the popular generative AI tool that allows users to produce images from simple text descriptions, is through the Stable Diffusion Web UI by Automatic1111. In today’s Game Ready Driver, NVIDIA added TensorRT acceleration for Stable Diffusion Web UI, which boosts GeForce RTX performance by up to 2X. In this tutorial video I will show you everything about this new speed-up, from extension installation to TensorRT SD UNET generation.

#TensorRT #StableDiffusion #NVIDIA

Automatic Installer Of Tutorial ⤵️

https://www.patreon.com/posts/automatic-for-ui-86307255

Tutorial GitHub Readme File ⤵️

https://github.com/FurkanGozukara/Stable-Diffusion/blob/main/Tutorials/Tutorial-Achieving-Significant-Stable-Diffusion-Speed-Improvement-With-RTX-Acceleration.md

00:00:00 Introduction to how to utilize RTX Acceleration / TensorRT for 2x inference speed

00:02:15 How to do a fresh installation of Automatic1111 SD Web UI

00:03:32 How to enable quick SD VAE and SD UNET selections from settings of Automatic1111 SD Web UI

00:04:38 How to install TensorRT extension to hugely speed up Stable Diffusion image generation

00:06:35 How to start / run Automatic1111 SD Web UI

00:07:19 How to install TensorRT extension manually via URL install

00:07:58 How to install TensorRT extension via git clone method

00:08:57 How to download and upgrade cuDNN files

00:11:23 Speed test of SD 1.5 model without TensorRT

00:11:56 How to generate a TensorRT for a model

00:12:47 Explanation of min, optimal, max settings when generating a TensorRT model

00:14:00 Where the ONNX file is exported

00:15:48 How to set command line arguments to not get any errors during TensorRT generation

00:16:55 How to get maximum performance when generating and using TensorRT

00:17:41 How to start using generated TensorRT for almost double speed

00:18:08 How to switch to dev branch of Automatic1111 SD Web UI for SDXL TensorRT usage

00:20:33 The comparison of image difference between TensorRT on and off

00:20:45 Speed test of TensorRT with multiple resolutions

00:21:32 Generating a TensorRT for Stable Diffusion XL (SDXL)

00:23:24 How to verify you have switched to dev branch of Automatic1111 Web UI to make SDXL TensorRT work

00:24:32 Generating images with SDXL TensorRT

00:25:00 How to generate TensorRT for your DreamBooth trained model

00:25:49 How to install the After Detailer (ADetailer) extension and an explanation of what it does

00:27:23 Starting generation of TensorRT for SDXL

00:28:06 Batch size vs batch count difference

00:29:00 How to train amazing SDXL DreamBooth model

00:29:10 How to get amazing prompt list for DreamBooth models and use them

00:30:25 The dataset I used for DreamBooth training myself and why it is deliberately low quality

00:30:46 How to generate TensorRT for LoRA models

00:33:30 Where and how to see TensorRT profiles you have for each model

00:36:57 Generating LoRA TensorRT for SD 1.5 and testing it

00:39:54 How to fix the bug where TensorRT LoRA is not effective

Video Transcription

  • 00:00:00 Greetings everyone.

  • 00:00:01 NVIDIA has released their newest driver along with an amazing extension made for Stable

  • 00:00:07 Diffusion Automatic1111 Web UI interface.

  • 00:00:11 So what is it?

  • 00:00:12 It is RTX Acceleration with TensorRT.

  • 00:00:15 When we go to this announcement page, you will see that there is up to 2 times speed up with

  • 00:00:22 GeForce RTX 4090.

  • 00:00:24 In this tutorial, I will show you how to install this amazing extension and how to use it step

  • 00:00:32 by step.

  • 00:00:33 Why you should follow this video?

  • 00:00:35 If you want to obtain up to 70% speed improvements, then you should follow.

  • 00:00:41 Let me show you some of the comparisons.

  • 00:00:43 For example, when used on a Stable Diffusion 1.5 based model at 512x512, you see we are

  • 00:00:52 going from 19.30 it per second to 30.87 it per second with an RTX 3090 TI.

  • 00:01:02 It uses a little bit more VRAM, however it is totally worth it.

  • 00:01:07 There is a huge speed improvement from 7.94 to 13.75 it per second and for SDXL the speed

  • 00:01:17 is even more amazing from 3.61 it per second to 6.04 it per second and these are obtained

  • 00:01:28 with relaxed (dynamic) TensorRT, not static.

  • 00:01:33 You will understand what I mean when you watch the entire tutorial.

  • 00:01:37 With making static resolution TensorRT models you can get over 70% improvements as you are

  • 00:01:45 seeing here.
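
To put the quoted it/s figures into perspective, the percentage gains work out as follows — a quick helper computed from the numbers mentioned above, just for illustration:

```python
# Percentage throughput improvement from the it/s figures quoted above.
def pct_gain(before_its: float, after_its: float) -> int:
    """Return the rounded percentage throughput increase."""
    return round((after_its - before_its) / before_its * 100)

# SD 1.5 at 512x512 on an RTX 3090 TI: 19.30 -> 30.87 it/s
print(pct_gain(19.30, 30.87))  # -> 60
# SD 1.5 at a larger resolution: 7.94 -> 13.75 it/s
print(pct_gain(7.94, 13.75))   # -> 73
# SDXL: 3.61 -> 6.04 it/s
print(pct_gain(3.61, 6.04))    # -> 67
```

The 60–73% range matches the "over 70% with static shapes" claim: the biggest gains come from static-resolution engines.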

  • 00:01:46 So watch this tutorial do not miss it.

  • 00:01:48 Watch every part of this tutorial.

  • 00:01:50 So for this tutorial as well I have prepared an amazing GitHub readme file.

  • 00:01:55 The link of this file will be in the description of the video.

  • 00:01:58 Please do not forget to star my repository, follow me on these platforms.

  • 00:02:04 It is helping me hugely and don't forget to subscribe to our channel.

  • 00:02:08 So I will begin with installing a fresh installation of Automatic1111 Web UI.

  • 00:02:14 If you don't know how to install Automatic1111 Web UI please watch this amazing tutorial

  • 00:02:20 where I have shown how to install Python, Git and Automatic1111 Web UI.

  • 00:02:26 You need to have Python and Git to follow this tutorial.

  • 00:02:30 I will use my automatic installer.

  • 00:02:32 The link is here.

  • 00:02:33 I will also show how to install everything, how to install RTX acceleration with TensorRT

  • 00:02:40 step by step as well.

  • 00:02:42 So this tutorial will cover both for my Patreon supporters and for my non-Patreon supporters.

  • 00:02:49 I will show every step don't you worry about that.

  • 00:02:51 So all the files are here.

  • 00:02:53 Let's download all_files.zip file from here or you can download from the attachments.

  • 00:02:59 This is not mandatory but for fresh installation I will use it because it is easier for me

  • 00:03:05 to do.

  • 00:03:06 So let's say fresh_1 let's enter inside it.

  • 00:03:09 Extract here.

  • 00:03:10 You can skip this step if you already have a fresh installation of Automatic1111 Web

  • 00:03:15 UI.

  • 00:03:16 So let's begin the automatic installation.

  • 00:03:17 This will do everything for us automatically.

  • 00:03:20 As I said if you don't know how to install Auto1111 automatically follow this excellent

  • 00:03:25 tutorial.

  • 00:03:26 I have shown step by step how to do it.

  • 00:03:28 So the automatic installation has been completed and the Web UI started.

  • 00:03:32 First of all go to the settings and in here find the User Interface which is here.

  • 00:03:38 Click there.

  • 00:03:39 Here go to the info quick settings list.

  • 00:03:41 Search for SD and select SD_UNET and search for VAE and select SD_VAE.

  • 00:03:49 After that apply settings and reload UI.

  • 00:03:52 So after the reload you will have SD UNET and SD VAE options here.

  • 00:03:58 My automatic installer automatically downloaded Realistic Vision version 5.1, SDXL base 1.0,

  • 00:04:04 SDXL refiner and SD 1.5 base model.

  • 00:04:09 It has downloaded best VAE for SDXL FP16 version and best VAE for SD 1.5 based models.

  • 00:04:19 I also made another installation, another fresh installation fresh_2.

  • 00:04:24 So on fresh_1 I will show my Patreon script automatic installation and on fresh_2 I will

  • 00:04:31 show manual installation.

  • 00:04:33 The next step is installing the extension.

  • 00:04:37 Which extension?

  • 00:04:38 The Stable Diffusion Web UI TensorRT extension.

  • 00:04:40 The link is on the Github readme file.

  • 00:04:43 So this is the extension that is developed by NVIDIA.

  • 00:04:46 Can you imagine that?

  • 00:04:48 NVIDIA is officially developing an extension for Stable Diffusion Automatic1111 Web UI

  • 00:04:55 Github repository.

  • 00:04:56 That is amazing.

  • 00:04:57 That is amazing.

  • 00:04:58 This is where AMD is lacking.

  • 00:05:01 This is where...

  • 00:05:02 Actually Intel also started to support it.

  • 00:05:04 So this is where AMD is lacking.

  • 00:05:07 We are going to install this extension.

  • 00:05:09 Let's install it automatically first then let's install it manually.

  • 00:05:14 So in my fresh_1 folder I have automatic installer for this extension.

  • 00:05:19 All you need to do is double click install_TensorRT.bat file.

  • 00:05:26 It will install the extension into the correct folder.

  • 00:05:29 It will also download and install the latest cuDNN file for you fully automatically.

  • 00:05:37 Don't you worry, we will also download it from here manually.

  • 00:05:40 So even if you are not a Patreon supporter, you will know how to install it.

  • 00:05:45 Moreover, I suggest you to install latest Game Ready driver, which is released by NVIDIA

  • 00:05:52 very recently.

  • 00:05:53 The driver link is also here.

  • 00:05:55 You see NVIDIA GeForce drivers.

  • 00:05:57 I also compared it 536 to 545, which is the latest driver.

  • 00:06:03 I didn't see any speed changes with my RTX 3090 TI GPU.

  • 00:06:10 Actually, I had also recorded that part, but since there were no speed changes, I am not

  • 00:06:15 going to include that part into this video.

  • 00:06:18 But for TensorRT, I suggest you to install latest Game Ready Drivers from here.

  • 00:06:25 So the automatic extension installation is completed.

  • 00:06:28 It also downloaded and replaced latest cuDNN files as you are seeing right now.

  • 00:06:33 All we need to do is now start the Automatic Web UI one more time.

  • 00:06:39 Moreover, these are the only command line arguments I am using.

  • 00:06:42 I suggest you to test this extension on a fresh installation first.

  • 00:06:47 Once you make it working, then you can use it on your existing installation.

  • 00:06:51 So now it will install the extension and its dependencies.

  • 00:06:55 There are still some issues with the extension.

  • 00:06:58 It is still not fully stable yet.

  • 00:07:00 It is getting developed by active developers.

  • 00:07:04 I am also very active here, opening issues, doing pull requests, and other things.

  • 00:07:09 So just wait while it is getting installed for the first time.

  • 00:07:14 Meanwhile, let's also do the installation on the manual installation.

  • 00:07:18 So let's start the manual installation.

  • 00:07:20 If you are not my Patreon supporter, this is how you are going to install.

  • 00:07:24 Let's start the Web UI.

  • 00:07:25 So this one is now installing the dependencies of the extension.

  • 00:07:30 Okay, it is loaded.

  • 00:07:31 Let's also make the same changes from here.

  • 00:07:34 User interface, VAE.

  • 00:07:35 This is my manual installation and SD UNET.

  • 00:07:40 Apply settings, reload UI.

  • 00:07:42 So how are you going to install this extension manually?

  • 00:07:45 Copy the URL.

  • 00:07:46 You can also copy the URL from here, right-click and copy link address.

  • 00:07:50 Go to extensions.

  • 00:07:52 And in here go to install from URL, paste it there, install.

  • 00:07:56 Alternatively, you could clone it manually into the extension folder.

  • 00:08:01 How is it done?

  • 00:08:03 It is so easily done.

  • 00:08:04 Go to the extensions folder.

  • 00:08:06 And in here open a CMD like this.

  • 00:08:09 You see while you are inside extensions folder, type git clone, paste the link and hit enter.

  • 00:08:14 And it will clone the extension into the extension folder, then restart the Automatic Web UI.

  • 00:08:21 But you can also alternatively use install from URL.

  • 00:08:25 If you use install from URL, it will install dependencies as well.

  • 00:08:29 You will also get this error even when you have installed the latest cuDNN file.

  • 00:08:35 This is a mistake of the extension.

  • 00:08:38 Still not fixed but it will be fixed.

  • 00:08:40 You can ignore this.

  • 00:08:41 So click OK to all of the errors and extension is installed.

  • 00:08:46 Now it is ready.

  • 00:08:47 Let's close this.

  • 00:08:48 This is how you install extension.

  • 00:08:50 So as a second step for my automatic installation you don't need it.

  • 00:08:54 But for manual installation you should upgrade your cuDNN to latest version.

  • 00:09:00 The link is here.

  • 00:09:01 There are actually two links.

  • 00:09:02 Let me show you.

  • 00:09:03 This is the official website for NVIDIA cuDNN.

  • 00:09:05 And in here you are supposed to download cuDNN version 8.9.4, which was released on 8 August,

  • 00:09:15 for CUDA 11.x.

  • 00:09:17 This will ask you to register and log in NVIDIA developer site.

  • 00:09:21 It is free.

  • 00:09:22 Just register and log in.

  • 00:09:24 Then download the local installer for Windows.

  • 00:09:27 My Patreon supporters don't need this step but if you are not my Patreon supporter do

  • 00:09:32 not skip this step.

  • 00:09:33 This is important.

  • 00:09:34 Then what you're going to do is you are going to enter inside your Web UI installation,

  • 00:09:40 enter inside the venv (virtual environment) folder, then the Lib folder, then site-packages,

  • 00:09:46 and in there go to nvidia, then cudnn, then bin.

  • 00:09:53 So this is the path where you will put your downloaded cuDNN files.

  • 00:09:59 It will upgrade your cuDNN files to the version that you are going to install.

  • 00:10:04 So it is inside installed folder here.

  • 00:10:07 I copy paste it here, extract to cuDNN Windows.

  • 00:10:11 I am using WinRAR for extract but Windows already has that.

  • 00:10:15 So enter inside the downloaded archive and go to the bin folder.

  • 00:10:18 This.

  • 00:10:19 Cut them and go back to binary here and paste them.

  • 00:10:23 It will ask you to replace, replace all of them.

  • 00:10:26 And how can you be sure you are using the correct version?

  • 00:10:29 For example, click one of them, click properties.

  • 00:10:31 And when you go to details, you will see the product version.

  • 00:10:36 This is showing the cuDNN version.
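
As a sketch of the destination path described above, it can be built with pathlib. The Web UI root used here is hypothetical, and the venv\Lib\site-packages\nvidia\cudnn\bin layout is the one the pip-installed cuDNN package uses on Windows:

```python
from pathlib import Path

def cudnn_bin_dir(webui_root: str) -> Path:
    """Folder where the downloaded cuDNN DLLs get pasted: the bin folder of
    the pip-installed nvidia cudnn package inside the venv."""
    return (Path(webui_root) / "venv" / "Lib" / "site-packages"
            / "nvidia" / "cudnn" / "bin")

# Hypothetical install location, just for illustration:
print(cudnn_bin_dir("fresh_1/stable-diffusion-webui"))
```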

  • 00:10:39 This is automatically done with my automatic installer.

  • 00:10:43 Okay, so now what?

  • 00:10:45 Let's start our Web UI one more time.

  • 00:10:47 And now we will generate TensorRT versions of the models that we are going to use.

  • 00:10:53 Each time we are going to get this error, even though we are using the latest cuDNN

  • 00:10:57 file.

  • 00:10:58 Okay, so let's say you are wanting to use Realistic Vision version 5.1 with the TensorRT.

  • 00:11:06 Let's see the speed without TensorRT.

  • 00:11:09 So this is the best VAE file.

  • 00:11:11 Okay, as a prompt let's go something with simple photo of an amazing expensive luxury

  • 00:11:19 sports car.

  • 00:11:20 Okay it is not important.

  • 00:11:21 I am preferring DPM++ 2M SDE Karras.

  • 00:11:26 I find that this is working best so let's see the speed with 512 and 512.

  • 00:11:32 Okay it is 16.31.

  • 00:11:34 Let's generate another one.

  • 00:11:36 16.5.

  • 00:11:38 I am also recording a video right now so it is not the best speed but this is the speed

  • 00:11:43 for 512 and let's go with 768 to 768.

  • 00:11:47 Let's generate.

  • 00:11:48 Okay this is the it per second.

  • 00:11:51 It is like let me show you.

  • 00:11:53 It is like 7 it per second.

  • 00:11:56 So let's generate its TensorRT.

  • 00:11:59 Go to this tab.

  • 00:12:00 This is what you need to do for each model and each LoRA you have to generate a TensorRT

  • 00:12:06 version.

  • 00:12:07 I suggest you to generate batch size and make it dynamic.

  • 00:12:13 What I mean: So select batch size option here.

  • 00:12:15 Let's say 768.

  • 00:12:17 Click advanced settings and do not use static shapes if you want to generate multiple different

  • 00:12:22 resolutions and which batch size you want to generate.

  • 00:12:26 I prefer batch size 1 currently.

  • 00:12:28 But let's say min batch size 1, optimal batch size 2, if you are going to generate images

  • 00:12:34 with batch size 4, then make the optimal 4 and maximum batch size.

  • 00:12:38 Let's make it 4.

  • 00:12:39 These are important.

  • 00:12:40 If you don't set these and try to generate images later, you will get an error.

  • 00:12:46 So what is the minimum height I would like to generate?

  • 00:12:47 It is 512.

  • 00:12:50 Optimal.

  • 00:12:51 So you may be wondering, what does optimal mean?

  • 00:12:53 And max meaning?

  • 00:12:54 Min means the minimum height your TensorRT will support.

  • 00:12:59 If you are going to generate images below that, you will get errors.

  • 00:13:04 Optimal means that it will be most optimal, it will do the best speed with that resolution.

  • 00:13:10 So if we're going to mostly generate with 768, then it should be 768.

  • 00:13:14 Or 512 then it should be 512.

  • 00:13:18 So what would be the maximum resolution that I am going to generate with this model?

  • 00:13:21 Let's say 1536.

  • 00:13:23 This is the double size of this one.

  • 00:13:26 Let's select same for width too.

  • 00:13:28 So we are setting for each dimension height and width.

  • 00:13:33 So min prompt token.

  • 00:13:34 This is also important.

  • 00:13:36 What is the prompt count that you are usually using?

  • 00:13:39 Let's make this 75 minimum.

  • 00:13:42 Let's make it 150 optimal and maximum 225.
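
The min/optimal/max idea can be sketched as a simple range check. This is a hypothetical profile structure for illustration, not the extension's actual schema — the point is that requests outside the built ranges are the ones that error:

```python
# Hypothetical sketch of a dynamic TensorRT profile (not the extension's real schema).
profile = {
    "height": (512, 768, 1536),   # (min, optimal, max)
    "width":  (512, 768, 1536),
    "batch":  (1, 2, 4),
    "tokens": (75, 150, 225),
}

def fits(profile, height, width, batch=1, tokens=75):
    """True if the request is inside the ranges the engine was built for."""
    request = {"height": height, "width": width, "batch": batch, "tokens": tokens}
    return all(profile[k][0] <= v <= profile[k][2] for k, v in request.items())

print(fits(profile, 768, 768))    # -> True: inside the dynamic range
print(fits(profile, 2048, 2048))  # -> False: above max, generation would error
```

A static-shape engine is the degenerate case where min, optimal, and max are all equal: fastest, but it supports only that one configuration.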

  • 00:13:47 And if you already have an existing TensorRT, then you should click force rebuild.

  • 00:13:53 When you first time export engine, it will look for ONNX file.

  • 00:13:59 You see it says no ONNX file is found, exporting ONNX file.

  • 00:14:05 And where is this exported?

  • 00:14:07 This will be exported into an ONNX temp folder first, then it will be inside models, inside

  • 00:14:14 Unet-onnx; we will see the model file there.

  • 00:14:18 This file generation may use a lot of VRAM.

  • 00:14:23 Unfortunately, as far as I know, each GPU line requires different rebuild.

  • 00:14:28 For example, I have RTX 3090.

  • 00:14:31 So all of the RTX 3000 series can use my generated UNET files.

  • 00:14:36 I don't know if they can use ONNX files, but they can use UNET files.

  • 00:14:41 So if you are my Patreon supporter and if you have 3000 series, I can generate compiled

  • 00:14:48 UNET files for you and upload them to the Patreon for you and you can use them.

  • 00:14:53 Message me on Patreon for that.

  • 00:14:54 If you don't have a sufficient VRAM, I think it still should work, but it will be slower

  • 00:15:00 than what it is on my computer.

  • 00:15:03 So you see Realistic Vision version 5.1 ONNX file is generated.

  • 00:15:08 By the way, this file is not necessary.

  • 00:15:11 You can delete it after your UNET file, which will be inside Unet-trt, is generated.

  • 00:15:18 So the TensorRT file will be here.

  • 00:15:21 Each time when you generate a new TensorRT file, it will update this model.json file.

  • 00:15:27 And it will append it here.

  • 00:15:28 So this is really important.

  • 00:15:30 This file.

  • 00:15:31 Without this model.json file, you won't be able to use them.
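
To give a feel for the registry behavior described above, here is a purely hypothetical sketch of appending an engine entry per model; the extension's real model.json schema may differ:

```python
import json

# Purely hypothetical sketch of a per-model TensorRT engine registry;
# the extension's actual model.json schema may differ.
registry = {}

def append_profile(registry, model_name, engine_file, min_hw, opt_hw, max_hw):
    """Append a generated engine entry under its model, mirroring how the
    extension updates model.json after every export."""
    registry.setdefault(model_name, []).append({
        "filepath": engine_file,
        "min_hw": min_hw, "opt_hw": opt_hw, "max_hw": max_hw,
    })

append_profile(registry, "realisticVision_v51", "rv51_dyn.trt",
               (512, 512), (768, 768), (1536, 1536))
print(json.dumps(registry, indent=2))
```

This is why the file matters: without the registry entry, the Web UI has no way to map a checkpoint to its compiled engines, even if the .trt files are on disk.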

  • 00:15:35 So now it is generating the TensorRT file.

  • 00:15:39 Once this operation is completed, we will be able to use it.

  • 00:15:42 This part is actually much faster than ONNX file generation.

  • 00:15:46 And it is also using much less VRAM, as you are seeing right now.

  • 00:15:51 By the way, it is super important that if you set your command line arguments as --medvram

  • 00:15:57 or --lowvram you will get an error in this part.

  • 00:16:04 Let me show what I mean.

  • 00:16:05 So this is our file where we set the command line arguments.

  • 00:16:09 So if you have --medvram or --lowvram remove them when you are generating your TensorRT

  • 00:16:18 files, these will cause errors.

  • 00:16:20 Moreover, with TensorRT use xFormers.

  • 00:16:23 I suggest you to use xFormers.

  • 00:16:25 So these are all the command line arguments I have.

  • 00:16:28 Another thing is that if you have custom extensions, they could conflict with this

  • 00:16:35 process, or with your TensorRT.

  • 00:16:38 So if you get an error, do a fresh installation of Automatic1111 Web UI, then generate your

  • 00:16:44 TensorRT files.

  • 00:16:45 As I said, this is still in development, but it is hugely improving the speed.

  • 00:16:51 We will see that in a moment, once the file has been generated.

  • 00:16:55 And if you want the maximum performance instead of these custom sizes, you can make them static

  • 00:17:01 shapes, and it will be faster.

  • 00:17:03 You see it, but it will only support this height and this width and this number of batch

  • 00:17:10 size and optimal prompt token.

  • 00:17:12 So do not forget that.

  • 00:17:13 If you need dynamic ranges, then set them like this as a dynamic range, follow the output

  • 00:17:19 here and also on the CMD window.

  • 00:17:22 Okay, it is still working.

  • 00:17:24 And the engine is about to be generated.

  • 00:17:28 And it is generated and saved to the disk.

  • 00:17:30 So when I now refresh this SD UNET, you see, it is here.

  • 00:17:34 When this is set to automatic, it will automatically use the UNET.

  • 00:17:39 So what was our speed?

  • 00:17:41 Our speed was for 768 resolution 6.91 it per second, if you remember.

  • 00:17:48 So let's load the same seed.

  • 00:17:51 Also, let's see the seed difference.

  • 00:17:53 So this is the latest seed and everything is same.

  • 00:17:55 Let's generate and let's see the speed.

  • 00:17:58 So it is going to load the UNET.

  • 00:18:00 Okay it says that no valid profile found.

  • 00:18:03 We need to switch to the development branch of Automatic1111 Web UI.

  • 00:18:07 What does this mean?

  • 00:18:08 The Automatic1111 Web UI has several branches.

  • 00:18:11 So when you click branches you will see master and you will see test FP16 and development

  • 00:18:17 branch.

  • 00:18:18 So the development branch was last updated 5 days ago and there is also test FP16.

  • 00:18:22 For now I will go to the development branch.

  • 00:18:25 Let's switch to the development branch.

  • 00:18:26 How you are going to do that?

  • 00:18:28 If you are my Patreon supporter, this download_TensorRT_Enable_SDXL.bat file can switch to the development branch automatically.

  • 00:18:36 So let's hit 5.

  • 00:18:37 And we are now switching to development branch.

  • 00:18:40 Alternatively open a CMD window inside that folder.

  • 00:18:43 Inside your Stable Diffusion Web UI folder type git checkout dev and it will switch to

  • 00:18:49 dev.

  • 00:18:50 If you want to return to the original git checkout master and you will return back to

  • 00:18:54 master.

  • 00:18:55 So let's go back to the development branch and let's restart the Web UI.

  • 00:19:01 Once this development branch is merged with the master branch you won't need these changes

  • 00:19:07 but for now we need to do this.

  • 00:19:09 This is also mandatory for SDXL TensorRT.

  • 00:19:14 So let's load the latest values from here and let's try again.

  • 00:19:19 Okay this is weird.

  • 00:19:20 Okay maybe we should try one more time.

  • 00:19:23 So I will go with 768.

  • 00:19:26 Let's make the optimal height 512.

  • 00:19:29 Optimal width 512.

  • 00:19:30 Actually let's make it like this.

  • 00:19:32 512 768 1536.

  • 00:19:33 This is interesting.

  • 00:19:39 I wonder if it is caused by something else.

  • 00:19:41 Maybe it was because of the prompt token count.

  • 00:19:44 So I will not change it.

  • 00:19:46 And okay, it says that export supports.

  • 00:19:50 Maybe it is not able to support over 1024.

  • 00:19:54 Yeah let's go with 1024 like this.

  • 00:19:57 And when we click export engine.

  • 00:20:02 Okay it started to export again.

  • 00:20:03 You see it will not generate the ONNX file one more time.

  • 00:20:08 It will use the existing ONNX file and it started rebuilding.

  • 00:20:13 Okay it is generated; even though I have sufficient VRAM it shows some error messages, but I think

  • 00:20:20 they shouldn't cause any problems.

  • 00:20:22 So let's refresh the here and maybe let's make it automatic and try again.

  • 00:20:27 Okay this time it looks like working and we got the image.

  • 00:20:31 Let's see.

  • 00:20:32 Yes it looks like the correct image.

  • 00:20:35 Let's compare them.

  • 00:20:36 Almost same.

  • 00:20:37 Not exactly same but almost same as you are seeing right now.

  • 00:20:40 Very good.

  • 00:20:42 Okay.

  • 00:20:43 So let's see the speed let's generate one more time.

  • 00:20:45 Wow the speed is amazing.

  • 00:20:47 You see from like 7 it to 12 it we got about 70% increase.

  • 00:20:54 Let's try another resolution.

  • 00:20:55 Last time it probably failed because the resolution was too big.

  • 00:20:59 Okay, this is another custom.

  • 00:21:01 Wow working.

  • 00:21:02 You see the resolution and the speed it is amazing 512 to 1024.

  • 00:21:08 Let's try 1024 1024.

  • 00:21:10 Okay, this is another one and you see the speed.

  • 00:21:14 It is almost 6 it per second.

  • 00:21:17 All right what if you are my Patreon supporter.

  • 00:21:19 As I said you can use this download_TensorRT_Enable_SDXL.bat file and you can download all of the pre-compiled

  • 00:21:29 models for this.

  • 00:21:31 If you request me more I will do that for you hopefully.

  • 00:21:34 So let's also generate a TensorRT for SDXL too.

  • 00:21:39 First of all make sure that you have selected the target model from here.

  • 00:21:44 Let's select it SDXL base 1.0 version.

  • 00:21:47 It is getting loaded.

  • 00:21:49 Okay and let's select the best VAE from here.

  • 00:21:53 I don't know if VAE selection is necessary probably not but to be sure and let's generate

  • 00:21:58 the same image with SDXL base version.

  • 00:22:02 Let's look at the speed.

  • 00:22:03 The speed is 3.14, 3.13 it per second.

  • 00:22:08 I am recording a video right now also and the image is getting generated.

  • 00:22:13 Okay here the image.

  • 00:22:15 Let's go to the TensorRT and let's select this batch size.

  • 00:22:20 I will make it 1 more time like this let's say 768 1024 1280.

  • 00:22:27 Let's make it 1536.

  • 00:22:28 I hope this works.

  • 00:22:31 Okay let's try one more time.

  • 00:22:32 Let's make this 512 actually and 512.

  • 00:22:36 Okay 1024 1536 okay like this.

  • 00:22:41 By the way use the arrow keys here instead of entering them if you are going to enter

  • 00:22:45 something custom.

  • 00:22:46 I'm not going to change prompt and the batch sizes but prompt changing is also important

  • 00:22:51 if you are going to use too big prompt and export engine.

  • 00:22:55 Since there is ONNX file it is going to first export it like the last time then we will

  • 00:23:02 be able to generate it.

  • 00:23:03 By the way it looks like downloading something from the Internet.

  • 00:23:08 It is interesting.

  • 00:23:09 Yeah.

  • 00:23:10 Oh not this one is downloading.

  • 00:23:11 The download is from the other CMD window we started which is downloading the pre-compiled

  • 00:23:18 TensorRT files.

  • 00:23:19 My bad.

  • 00:23:20 Yeah.

  • 00:23:21 So it is generating the ONNX file for the SDXL base version.

  • 00:23:25 For SDXL to work, for SDXL TensorRT to work you really need to switch to the development

  • 00:23:31 version of the SD Web UI.

  • 00:23:33 And when you start your Automatic Web UI you will see there is an additional info here

  • 00:23:39 which is the development branch version.

  • 00:23:42 That is really important.

  • 00:23:43 Okay.

  • 00:23:44 ONNX generation has been completed.

  • 00:23:46 It automatically started generating TensorRT file as well and now it is generating the

  • 00:23:52 TensorRT file for SDXL base version.

  • 00:23:56 Let's look at the ONNX.

  • 00:23:57 It is here, 5 gigabytes, but the real file that will be used will be in the Unet-trt folder.

  • 00:24:05 By the way you can have multiple TensorRT for each model and if you select automatic

  • 00:24:11 option here.

  • 00:24:12 According to your settings it will automatically select the appropriate TensorRT configuration

  • 00:24:18 for you.

  • 00:24:19 So theoretically for each dimension you can generate different TensorRT configurations

  • 00:24:26 and let the application pick the best one according to that.
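
The automatic selection can be thought of as a closest-fit search. This sketch assumes the valid profile whose optimal resolution is nearest the request wins; the extension's real heuristic may differ:

```python
# Sketch of automatic profile selection among several engines for one model.
# Assumption: pick the valid profile whose optimal resolution is closest to
# the requested one (the extension's actual heuristic may differ).
profiles = [
    {"name": "static_1024", "min": 1024, "opt": 1024, "max": 1024},
    {"name": "dyn_512_1536", "min": 512, "opt": 768, "max": 1536},
]

def pick(profiles, size):
    """Return the name of the best-fitting profile for a square resolution."""
    valid = [p for p in profiles if p["min"] <= size <= p["max"]]
    if not valid:
        return None  # no engine supports this size
    return min(valid, key=lambda p: abs(p["opt"] - size))["name"]

print(pick(profiles, 1024))  # -> static_1024 (exact optimal match, fastest)
print(pick(profiles, 640))   # -> dyn_512_1536 (only profile that covers it)
```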

  • 00:24:30 Let's just wait for TensorRT for SDXL base version to be generated.

  • 00:24:35 So the TensorRT file is generated.

  • 00:24:38 Let's refresh the SD UNET.

  • 00:24:40 Now it is here and let's generate another image right now and see the speed difference.

  • 00:24:46 Last time it was like 3.1 it per second.

  • 00:24:50 Wow you see now it is over 5 it per second.

  • 00:24:54 The speed increase is huge, definitely huge and we got our car.

  • 00:24:59 Let's generate another time and it is working amazing.

  • 00:25:02 So can you generate TensorRT file for your custom made DreamBooth training models or

  • 00:25:10 LoRAs?

  • 00:25:11 Yes.

  • 00:25:12 Now I will show you that.

  • 00:25:13 I have an amazing checkpoint.

  • 00:25:15 I trained it myself. Let's load it.

  • 00:25:18 Let's select some prompts.

  • 00:25:20 I share all of my amazing prompts on Patreon and I am preparing the very best SDXL DreamBooth

  • 00:25:29 training tutorial right now.

  • 00:25:30 I am still working on it so I am also generating some samples.

  • 00:25:34 Let's select some of the prompts here and see the speed and TensorRT result.

  • 00:25:41 Okay.

  • 00:25:42 So for example let's go with this one.

  • 00:25:45 Okay here.

  • 00:25:46 The file is loaded.

  • 00:25:47 This is also using after detailer as well.

  • 00:25:50 Oh the after detailer extension is not installed.

  • 00:25:53 So let's go to here.

  • 00:25:55 Available, load from, after detailer.

  • 00:25:58 Let's install it quickly.

  • 00:25:59 With after detailer you can automatically inpaint your face after each generation and

  • 00:26:07 have an improved face in the final output.

  • 00:26:10 What it does is: it automatically masks your face and inpaints it with the parameters,

  • 00:26:16 prompt and settings you give.

  • 00:26:17 It is not different.

  • 00:26:19 By the way TensorRT also supports inpainting and it is amazing because of that.

  • 00:26:24 Okay for after detailer to be fully installed let's apply and restart UI.

  • 00:26:31 You see these were the after detailer necessary packages.

  • 00:26:35 Okay I think everything is installed.

  • 00:26:38 Yeah every time we get this error it is very annoying.

  • 00:26:40 I hope the developers fix it as soon as possible.

  • 00:26:43 All right let's select the model one more time.

  • 00:26:47 Actually I will try another prompt so let's go with this prompt.

  • 00:26:51 Let's send to the text to image tab.

  • 00:26:53 With SDXL you don't even need negative prompts.

  • 00:26:56 It is amazing because of that.

  • 00:26:58 Let's make this random, 40 steps.

  • 00:27:00 Okay and after detailer here.

  • 00:27:04 All right.

  • 00:27:05 Okay everything is set.

  • 00:27:06 Let's generate one image and see the speed.

  • 00:27:09 So the speed is around 3.15 iterations per second, as expected.

  • 00:27:14 It will also do face inpainting with after detailer.

  • 00:27:18 It will also take some time.

  • 00:27:19 You see it is inpainting face.

  • 00:27:22 The face inpainting speed is also the same, and we got the image.

  • 00:27:26 Now let's generate the TensorRT version.

  • 00:27:28 So this model is selected and let's go with 1024, for batch size 1 static, export engine.

  • 00:27:37 And since this is a different base model, it is going to generate the ONNX file one more time.

  • 00:27:42 We have to generate a different ONNX and TensorRT engine for each model.

  • 00:27:48 I hope NVIDIA overcomes this limitation and makes the speed-up work on the fly.

  • 00:27:55 But this is better than nothing.

  • 00:27:56 And it is still extremely useful.
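
The per-model requirement can be sketched as a simple path convention. This is a hypothetical sketch for illustration: the folder names `Unet-onnx` and `Unet-trt`, the `.trt` suffix, and the checkpoint filename are all assumptions, not taken from the extension's source.

```python
from pathlib import Path

# Hypothetical sketch: the TensorRT extension exports one ONNX file and builds
# one engine per base checkpoint, so switching checkpoints needs a re-export.
# Folder names ("Unet-onnx", "Unet-trt") are assumptions for illustration.
def engine_paths(webui_root: str, checkpoint_file: str):
    root = Path(webui_root)
    stem = Path(checkpoint_file).stem  # strip e.g. ".safetensors"
    onnx_path = root / "models" / "Unet-onnx" / f"{stem}.onnx"
    engine_path = root / "models" / "Unet-trt" / f"{stem}.trt"
    return onnx_path, engine_path

# Example with a hypothetical custom checkpoint name
onnx_path, engine_path = engine_paths("stable-diffusion-webui",
                                      "my_custom_sdxl.safetensors")
```

Each new checkpoint therefore costs one ONNX export and one engine build before the speed-up applies.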

  • 00:27:58 Okay.

  • 00:27:59 So the TensorRT file has been generated for my custom model.

  • 00:28:04 Let's refresh here and now it is here.

  • 00:28:06 Let's generate 9 images.

  • 00:28:09 Batch count but not batch size.

  • 00:28:11 Batch count 9 will generate images one by one.

  • 00:28:15 Batch size means it will generate images in parallel, but we didn't set this up for parallel generation.
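
The batch count vs. batch size distinction can be sketched as a toy model (this is not the Web UI's actual code, just an illustration of how the two settings combine):

```python
# Simplified model of the two settings:
# batch count = how many sequential generation runs,
# batch size  = how many images each run produces in parallel (costs more VRAM).
def generation_plan(batch_count: int, batch_size: int):
    # One list entry per sequential run; the value is the number of
    # images that run produces in parallel.
    return [batch_size] * batch_count

plan = generation_plan(9, 1)   # the setting used here: 9 runs of 1 image each
total_images = sum(plan)       # 9 images, generated one by one
```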

  • 00:28:22 Okay, the iteration speed is superb.

  • 00:28:23 You see, 6 iterations per second on an RTX 3090.

  • 00:28:27 It was around 3 iterations per second previously.

  • 00:28:30 It is almost double the speed.
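
"Almost double" checks out as simple arithmetic on the numbers from this run:

```python
# Numbers measured in the video; a quick sanity check of the speed-up.
baseline_its = 3.15   # iterations/second without TensorRT
tensorrt_its = 6.0    # iterations/second with the TensorRT engine (RTX 3090)
steps = 40            # sampling steps used for each image

speedup = tensorrt_its / baseline_its    # ~1.9x
baseline_seconds = steps / baseline_its  # ~12.7 s of sampling per image
tensorrt_seconds = steps / tensorrt_its  # ~6.7 s of sampling per image
```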

  • 00:28:32 This is amazing.

  • 00:28:33 This is superb.

  • 00:28:35 And you see, face inpainting is still very fast as well.

  • 00:28:40 So it is working amazingly.

  • 00:28:41 Let's see the results.

  • 00:28:43 Okay the results are generated.

  • 00:28:45 Let's look one by one.

  • 00:28:47 For example.

  • 00:28:48 Okay.

  • 00:28:49 Let's.

  • 00:28:50 This is fully zoomed in.

  • 00:28:51 That is why it is not very high quality.

  • 00:28:52 But okay.

  • 00:28:53 Let's try another prompt but the results are really good.

  • 00:28:56 And since it is fast, we will get much better images just by trying more.

  • 00:29:01 If you wonder how I trained this model, the configuration file and a quick tutorial are

  • 00:29:07 shared here, and I also have an amazing prompt list.

  • 00:29:11 So from this prompt list let's look at the prompts.

  • 00:29:14 It is also available as a text file.

  • 00:29:17 Let's open the PDF file and in here there are unique prompts and the generated images

  • 00:29:23 as you are seeing right now.

  • 00:29:24 So we can use any of them.

  • 00:29:26 Let's try this one.

  • 00:29:28 Okay you can also download image like this.

  • 00:29:31 Save as.

  • 00:29:32 Let's download to the downloads and let's move to the PNG info.

  • 00:29:37 Let's go to download.

  • 00:29:39 It should appear in a moment.

  • 00:29:41 Oh this is not the correct one.

  • 00:29:42 Let's go here.

  • 00:29:44 Okay let's select it.

  • 00:29:45 Okay, it has appeared.

  • 00:29:47 Let's send it to the text-to-image tab, and everything is set.

  • 00:29:50 Okay let's remove this.

  • 00:29:53 Okay.

  • 00:29:54 Let's generate 9 images with this prompt and see the result.

  • 00:29:57 You see the images are getting generated very fast and face inpainting is improving the

  • 00:30:02 face.

  • 00:30:03 We got the results and amazing quality as you are seeing right now.

  • 00:30:07 Another thing is that this model was trained on very poor quality images.

  • 00:30:13 Why?

  • 00:30:14 With purpose.

  • 00:30:15 Because I am preparing an amazing full tutorial for SDXL DreamBooth and I am deliberately using

  • 00:30:22 a low-quality dataset.

  • 00:30:24 Let me show you my data set.

  • 00:30:25 So this is my data set.

  • 00:30:27 From this data set we get these images.

  • 00:30:29 Why am I using a poor dataset? Because you will be able to get amazing results even with

  • 00:30:36 a not-so-good dataset.

  • 00:30:37 When you use a better data set you will get even better results.

  • 00:30:40 And now let's see how we can combine this with LoRAs.

  • 00:30:45 I am going to download SDXL Pixel Art XL LoRA from CivitAI.

  • 00:30:51 Okay it is downloaded.

  • 00:30:52 Let's put it into our models/Lora folder so we can use it here.
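
Dropping the downloaded file into place is a plain file move. A sketch of that step, where the filename `pixel-art-xl.safetensors` and the download location are hypothetical examples:

```python
import shutil
from pathlib import Path

# Sketch: move a downloaded CivitAI LoRA into the Web UI's LoRA folder.
# The source filename below is a hypothetical example.
downloads = Path.home() / "Downloads"
lora_dir = Path("stable-diffusion-webui") / "models" / "Lora"
lora_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing

src = downloads / "pixel-art-xl.safetensors"
if src.exists():  # guard so the sketch is safe to run as-is
    shutil.move(str(src), str(lora_dir / src.name))
```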

  • 00:30:57 Okay, then let's select the SDXL base version from here and I'm going to use this prompt.

  • 00:31:04 Let's copy it and let's copy the negative prompt as well and let's make this automatic.

  • 00:31:09 And let's also add the LoRA.

  • 00:31:11 When you refresh you should see your LoRA here.

  • 00:31:14 You will see your LoRA only if the SDXL model is selected here.

  • 00:31:19 Because if you have selected a non-SDXL model, the SDXL LoRA will not appear here.

  • 00:31:25 Vice versa is also true.

  • 00:31:27 So if you have selected an SD 1.5-based model here, only SD 1.5-based LoRAs will appear

  • 00:31:35 here, and let's generate.

  • 00:31:37 Probably we will not get a speed boost or the LoRA will not work.

  • 00:31:41 Let's see.

  • 00:31:42 Okay, we got the speed boost.

  • 00:31:44 Let's.

  • 00:31:45 Yeah we need to also change this to resolution.

  • 00:31:48 All right let's try again.

  • 00:31:49 Yeah, the speed boost is effective, but I think the LoRA is not effective.

  • 00:31:54 Because we didn't generate the LoRA TensorRT yet, so let's fix the seed.

  • 00:31:59 Let's go to TensorRT.

  • 00:32:00 How are we going to generate a LoRA engine?

  • 00:32:02 We go to the TensorRT LoRA tab refresh and yeah the LoRA is not appearing so we need

  • 00:32:08 to restart the Web UI.

  • 00:32:10 This is a bug that is not fixed yet so I am going to restart the Web UI.

  • 00:32:16 Okay Web UI has been restarted.

  • 00:32:19 So let's load the latest values from here.

  • 00:32:23 Let's come look to LoRA.

  • 00:32:24 Yes, the LoRA is looking correct.

  • 00:32:26 Pixel art.

  • 00:32:27 All right.

  • 00:32:28 And let's try one more time.

  • 00:32:30 Then I will generate the TensorRT file.

  • 00:32:33 Okay.

  • 00:32:34 Interesting.

  • 00:32:35 Did it generate the LoRA?

  • 00:32:37 No.

  • 00:32:39 But it is as if it worked.

  • 00:32:41 It is like half speed.

  • 00:32:43 Interesting.

  • 00:32:44 Very interesting.

  • 00:32:45 I think it still used the LoRA but it didn't improve the LoRA speed.

  • 00:32:50 Weird.

  • 00:32:51 Okay let's go to TensorRT.

  • 00:32:52 TensorRT LoRA.

  • 00:32:54 Select the LoRA and convert to TensorRT.

  • 00:32:57 TensorRT LoRA generation should be faster.

  • 00:33:02 Even for a LoRA, it is generating an ONNX file.

  • 00:33:06 Let's also look at the generated ONNX file which should be inside models, inside here.

  • 00:33:14 And do we see the file?

  • 00:33:16 Not yet.

  • 00:33:17 Okay let's just wait.

  • 00:33:19 I think this is because the default preset was selected and it is regenerating a default

  • 00:33:25 preset, LoRA for us.

  • 00:33:28 So we probably need to change it.

  • 00:33:31 And it shows in the bottom here the available profiles we have.

  • 00:33:35 You see, we have a profile 0 for SDXL base, which has minimum, optimal and maximum heights.

  • 00:33:42 Probably we should have selected it, but I don't know if there is a way to select it,

  • 00:33:46 since it is a custom profile.

  • 00:33:48 When we selected this, I wonder if it was showing the last one.

  • 00:33:53 Anyway, this is why probably it will generate another model.

  • 00:33:58 So if you ever wonder what your profile consists of click refresh here.

  • 00:34:04 Oh, since it is generating, the refresh will take time.

  • 00:34:07 I should have waited.

  • 00:34:09 So when it is generating, the refresh will also wait, but we can see the profile already.

  • 00:34:14 So the profile 0 for Realistic Vision version 5.1.

  • 00:34:17 This is the minimum, optimal, and maximum height; the minimum, optimal, and maximum

  • 00:34:24 batch size; and the text length.

  • 00:34:26 So the engines work based on these profiles.
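
Conceptually, a profile pins the shape ranges the engine was built for. A hypothetical sketch of that idea, where the field names and values are illustrative and not the extension's actual schema:

```python
# Illustrative profile: each dimension gets a min/opt/max range the engine
# was built for. Field names and values are made up for this sketch.
profile = {
    "height":      {"min": 512, "opt": 512, "max": 768},
    "width":       {"min": 512, "opt": 512, "max": 768},
    "batch_size":  {"min": 1,   "opt": 1,   "max": 4},
    "text_length": {"min": 77,  "opt": 77,  "max": 154},
}

def fits(profile: dict, **dims) -> bool:
    # True if every given dimension is inside the profile's min..max range
    return all(profile[k]["min"] <= v <= profile[k]["max"]
               for k, v in dims.items())

ok = fits(profile, height=512, batch_size=1)   # within range
too_big = fits(profile, batch_size=8)          # outside the built range
```

This is why a request outside the built ranges (e.g. a larger batch size) needs a different or rebuilt engine.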

  • 00:34:29 Oh, now I see the profile 1 for SDXL base 1.0.

  • 00:34:32 This is a new one that is being generated for our LoRA.

  • 00:34:36 As you are seeing right now, it is getting generated right now.

  • 00:34:39 I had an error, so I restarted the Web UI, and I also selected force rebuild this time.

  • 00:34:46 So if you get an error, select also force rebuild and try again.

  • 00:34:50 Make sure that the correct models are selected here at the top.

  • 00:34:54 Okay, I found the error reason.

  • 00:34:57 First of all, you need to generate your TensorRT profile, and then select that profile from

  • 00:35:04 here.

  • 00:35:05 And then you need to generate your LoRA, otherwise it will not work.

  • 00:35:09 So what we need to do is first generate a base engine from here.

  • 00:35:16 And then we need to generate the TensorRT LoRA.

  • 00:35:19 So let's make a default preset engine for the SDXL base model.

  • 00:35:25 And then we will be able to generate the TensorRT LoRA, since I have an ONNX for SDXL base.
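
So, as far as I can tell from this session, the working order boils down to the following (the step descriptions are paraphrased, not the extension's exact UI labels):

```python
# Order that worked in this session (paraphrased steps, not exact UI labels).
workflow = [
    "export a TensorRT engine for the base checkpoint (TensorRT tab)",
    "select that engine's profile",
    "convert the LoRA on the TensorRT LoRA tab (it reuses the base ONNX)",
]
for step_number, step in enumerate(workflow, start=1):
    print(f"{step_number}. {step}")
```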

  • 00:35:31 It is just going to generate a TensorRT default preset profile here.

  • 00:35:37 All right, so the TensorRT generation has been completed with default preset.

  • 00:35:43 And we can see the presets here.

  • 00:35:44 When I click refresh here you see now we have 2 presets for the SDXL base version.

  • 00:35:50 Let's go to the LoRA and refresh.

  • 00:35:53 Okay it is not visible one more time.

  • 00:35:56 So we need to restart the UI one more time.

  • 00:35:59 Let's do it.

  • 00:36:00 I will restart the Web UI.

  • 00:36:03 I'm not going to delete the parts where I had errors.

  • 00:36:07 Why?

  • 00:36:08 Because so that if you encounter similar errors you can solve them yourself.

  • 00:36:13 Okay, we are going to get this error, this annoying error.

  • 00:36:16 I hope it gets fixed very soon.

  • 00:36:19 Okay, it is loaded.

  • 00:36:20 Let's go to TensorRT.

  • 00:36:22 We have selected the default preset.

  • 00:36:25 Let's refresh.

  • 00:36:26 LoRA is selected and convert TensorRT.

  • 00:36:30 It says no TensorRT engine found building.

  • 00:36:33 Now.

  • 00:36:34 Let's see.

  • 00:36:35 Loading bytes.

  • 00:36:36 Let's look at the messages.

  • 00:36:37 It says loading TensorRT.

  • 00:36:39 Yes we are still having issue for some reason.

  • 00:36:43 I tested this on SD 1.5.

  • 00:36:46 It was working.

  • 00:36:47 I wonder what is the reason for SDXL LoRA is not working.

  • 00:36:51 I will report this to the developers so it may get fixed when you are watching this video.

  • 00:36:58 Okay, instead let's try a LoRA TensorRT with an SD 1.5-based model.

  • 00:37:03 So I'm going to download this LoRA.

  • 00:37:06 I will also report this error as I said.

  • 00:37:09 Maybe this could be related to Automatic1111 SD Web UI as well.

  • 00:37:13 Okay let's move this into the LoRA folder.

  • 00:37:16 Models, LoRA.

  • 00:37:17 All right.

  • 00:37:19 So now when I refresh my LoRAs it shouldn't appear.

  • 00:37:23 Yes because I need to select SD 1.5 based version.

  • 00:37:28 Let's select Realistic Vision because we already have a TensorRT profile for that.

  • 00:37:33 Let's also select the correct VAE file.

  • 00:37:36 Let's go to the TensorRT.

  • 00:37:37 Let's refresh.

  • 00:37:38 Still not showing.

  • 00:37:39 Refresh here.

  • 00:37:40 Yes the LoRA appeared as you are seeing right now.

  • 00:37:43 Let's refresh.

  • 00:37:44 Yeah we really need to restart one more time.

  • 00:37:47 Okay I'm going to restart right now so when you add a new LoRA you need to restart the

  • 00:37:52 Web UI.

  • 00:37:53 If we get an error, I will return to the main branch of the Automatic1111 Web UI.

  • 00:37:58 That could be the reason and we already have a profile for Realistic Vision.

  • 00:38:03 Okay, okay Web UI started.

  • 00:38:06 Let's go to TensorRT.

  • 00:38:07 Let's refresh.

  • 00:38:09 Yes now we can see the LoRA.

  • 00:38:11 Convert to TensorRT.

  • 00:38:14 It says it is generating.

  • 00:38:16 Exporting ONNX file.

  • 00:38:18 Okay let's look at the export.

  • 00:38:21 Okay this worked very fast.

  • 00:38:23 You see, it exported the Xrs 2.0 Realistic Vision version 5.1 ONNX weights.

  • 00:38:32 So probably it is not working with SDXL yet but it is working with SD 1.5 based models.

  • 00:38:38 Let's try an example here.

  • 00:38:40 Photo of a car.

  • 00:38:43 Okay, let's try this, and first I will not add the TensorRT LoRA.

  • 00:38:48 Let's generate an image so it loaded the UNET from Realistic Vision.

  • 00:38:52 Okay what do I see?

  • 00:38:55 Yes it is working great.

  • 00:38:57 The speed is amazing.

  • 00:38:58 Okay then let's add the LoRA.

  • 00:39:01 Okay I have added the LoRA.

  • 00:39:03 Let's generate again and let's generate again.

  • 00:39:06 Yes the speed is amazing.

  • 00:39:08 Does it use the LoRA TensorRT?

  • 00:39:11 I'm not sure.

  • 00:39:12 Can we select it from here?

  • 00:39:14 Yes.

  • 00:39:15 Let's select it from here and try again.

  • 00:39:17 Okay.

  • 00:39:18 Okay.

  • 00:39:19 Yes.

  • 00:39:20 Now it is working, I think.

  • 00:39:21 Yes you see the output difference.

  • 00:39:23 I think we need to manually select LoRA TensorRT from here.

  • 00:39:28 Now the output is really different.

  • 00:39:31 Let's.

  • 00:39:32 Let me show you the difference.

  • 00:39:33 So I will make this seed.

  • 00:39:34 I will remove the LoRA and I will return back to Realistic Vision TensorRT.

  • 00:39:40 Let's try again and you see completely different picture different style.

  • 00:39:45 So the LoRA is working and LoRA TensorRT is working as well as you are seeing right now.

  • 00:39:53 So this is how you use LoRAs.

  • 00:39:55 Actually, it changed again.

  • 00:39:57 I think there is a bug.

  • 00:39:58 So it was working, but it is not working right now.

  • 00:40:01 Let's see what could be the reason.

  • 00:40:04 Let's add again, TensorRT is selected.

  • 00:40:06 Because this is in development.

  • 00:40:09 So sometimes, maybe we need to restart.

  • 00:40:11 Oh, I see after I did refresh, now it is effective.

  • 00:40:15 Yeah, there are still some bugs.

  • 00:40:16 I think they will all get fixed in the near future.

  • 00:40:20 But as you are seeing right now, it is working, it is super fast.

  • 00:40:24 The iterations per second are super fast, even though I am recording a video right now.

  • 00:40:29 So this is it.

  • 00:40:30 I hope you have enjoyed it; please subscribe to our channel.

  • 00:40:33 When you go to the Stable Diffusion repository

  • 00:40:35 and click here, you will see the option of starring our repository.

  • 00:40:39 This is super important.

  • 00:40:40 We have over 1k stars.

  • 00:40:42 I appreciate all of you.

  • 00:40:44 Please star it.

  • 00:40:45 Please also fork it.

  • 00:40:46 And please also watch it.

  • 00:40:47 If you also sponsor me, I would appreciate that very much.

  • 00:40:50 Currently I have zero sponsors on Github, but you can also support me on Patreon.

  • 00:40:55 Moreover, I have all the links here.

  • 00:40:57 Patreon.

  • 00:40:58 Buy me a coffee.

  • 00:40:59 Medium.

  • 00:41:00 I am sharing amazing articles on Medium, CivitAI, DeviantArt.

  • 00:41:03 Follow us on YouTube as well, subscribe to our channel, and follow me on LinkedIn.

  • 00:41:07 You can also purchase my Udemy course.

  • 00:41:09 And you can follow me on Twitter.

  • 00:41:10 I am very active on Twitter.

  • 00:41:12 If you have any questions, please ask me through Youtube.

  • 00:41:16 Or you can open an issue here.

  • 00:41:18 I am replying to all of them.

  • 00:41:20 I am super active currently.

  • 00:41:21 My full-time income and work is AI and you guys.

  • 00:41:21 I left the university this month; hopefully I will start at another university

  • 00:41:30 very soon.

  • 00:41:31 It will be remote education, but currently I am dedicated to you 24/7.

  • 00:41:37 I am also doing consultation.

  • 00:41:38 I am also doing side projects.

  • 00:41:40 I am also doing model training.

  • 00:41:42 Helping clients.

  • 00:41:43 I am open to every kind of collaboration, and the links are here.

  • 00:41:46 I will update this file if necessary.

  • 00:41:47 Thank you so much.

  • 00:41:48 Hopefully see you in another amazing tutorial.
