NVFP4 With CUDA 13 Full Tutorial, 100%+ Speed Gain + Quality Comparison & New Cheap Cloud SimplePod

NVFP4 models have finally arrived in ComfyUI, and therefore SwarmUI, with CUDA 13. NVFP4 models are literally 100%+ faster with minimal impact on quality. I have done a grid quality comparison to show you the difference between the NVFP4 versions of FLUX 2, Z Image Turbo and FLUX 1. To make CUDA 13 work, I have compiled Flash Attention, Sage Attention & xFormers for both Windows and Linux with all of the CUDA archs, supporting virtually every GPU from the GTX 1650 series through the RTX 2000, 3000, 4000 and 5000 series and more.

In this full tutorial, I will show you how to upgrade your ComfyUI, and therefore SwarmUI, to the latest CUDA 13 with the latest libraries and Torch 2.9.1. Moreover, our compiled libraries such as Sage Attention work with all models on all GPUs without generating black images or videos, including Qwen Image and Wan 2.2 models. Hopefully LTX 2 presets and a tutorial are coming soon too. Finally, I introduce a new private cloud GPU platform called SimplePod, similar to RunPod. It has all the same features as RunPod but is much faster and cheaper.

📂 Resources & Links:

ComfyUI Installers: [ https://www.patreon.com/posts/ComfyUI-Installers-105023709 ]

SimplePod: [ https://simplepod.ai/ref?user=secourses ]

SwarmUI Installer, Model Auto Downloader and Presets: [ https://www.patreon.com/posts/SwarmUI-Install-Download-Models-Presets-114517862 ]

How to Use SwarmUI Presets & Workflows in ComfyUI + Custom Model Paths Setup for ComfyUI & SwarmUI Tutorial: [ https://youtu.be/EqFilBM3i7s ]

SECourses Discord Channel for 24/7 Support: [ https://discord.com/invite/software-engineering-courses-secourses-772774097734074388 ]

NVIDIA NVFP4 Blog Post: [ https://developer.nvidia.com/blog/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference/ ]

⏱️ Video Chapters:

00:00:00 New ComfyUI installer (CUDA 13, Torch 2.9.1, Triton + attention libs)

00:00:19 NVFP4 speedup claims vs real tests; why CUDA 13 enables new models

00:00:34 Prebuilt FlashAttention/SageAttention/xFormers for many GPUs (Windows + Linux)

00:01:00 Quality roadmap: FLUX2 Dev, Z Image Turbo, FLUX Dev (BF16/FP8/GGUF/NVFP4)

00:01:23 Downloader adds NVFP4: FLUX2 Dev, FLUX Dev (Kontext/Dev), Z Image Turbo

00:01:51 SimplePod AI intro: RunPod-style pods, cheaper rates, permanent storage

00:02:36 Musubi Tuner FP8 Scaled: quality myths vs GGUF + why scaled matters

00:03:10 Quantization & precision (FP32/BF16/FP8/GGUF) + Qwen3 low-VRAM encoders

00:03:34 ComfyUI v73 zip: CUDA 13 included; update NVIDIA drivers only (v72 deprecated)

00:04:13 Update steps: overwrite zip, delete venv, run install/update .bat

00:05:02 Python: 3.10 recommended (supports 3.10-3.13); fresh vs update

00:06:02 New installer flow: uv speed, standalone use, backend libs detected

00:07:12 Stability flags: --cache-none vs --disable-smart-memory (OOM/stuck fixes)

00:07:54 SwarmUI presets: 32 presets supported; drag/drop + auto model downloader

00:08:25 Update SwarmUI model-downloader zip (extract + overwrite)

00:08:49 Download bundles/models (Z Image Turbo Core + NVFP4 options)

00:09:25 Update/launch SwarmUI; point to updated ComfyUI backend + set args

00:10:32 Live gen test: Z Image Turbo BF16 @1536x1536

00:11:29 Switch to NVFP4: VRAM cache behavior; 1024x1024

00:12:36 FLUX2 Dev quality: FP8 Scaled vs NVFP4 side-by-side comparisons

00:13:33 Speed chart: FLUX2 NVFP4 at ~193% of FP8 Scaled speed (about 2x)

00:14:10 Z Image Turbo quality: BF16 vs NVFP4 vs FP8 Scaled (quant method)

00:15:25 FLUX Dev: FP8 Scaled approx GGUF Q8; NVFP4 currently shows degradation

00:16:45 What precision means + model size examples (FP32/BF16/FP8 Scaled/NVFP4)

00:18:07 Practical recommendations: BF16 best; avoid FP16; raw FP8 vs FP8 Scaled

00:19:43 GGUF explained: block quant, slower runtime; use only when RAM is too low

00:21:36 Precision hierarchy recap + when to pick FP8 mixed/scaled over GGUF

00:21:58 SimplePod setup: register, add credits, open template link

00:22:31 Template config + RunPod price comparison (disk, ports, GPU selection)

00:24:02 Persistent volume: create + mount to /workspace

00:25:11 Launch RTX Pro 6000 pod; SimplePod vs RunPod pricing differences

00:26:29 Temp vs persistent disk: deleting instance wipes temp data - backup!

00:26:55 JupyterLab: upload zips, apt install zip, unzip ComfyUI in workspace

00:27:48 Run install script; unzip SwarmUI; start the model downloader

00:29:02 Downloader path for ComfyUI + folder structure; download Z Image Turbo bundle

00:30:08 Start ComfyUI; confirm CUDA 13 + Torch 2.9.1; connect via port 3000 Direct

00:31:08 Preset demo: Z Image Turbo Quality 1; fix VAE path; monitor VRAM

00:33:18 File Browser Direct: download outputs/models fast; upload files back

00:34:41 Restart server; install/start SwarmUI; open Cloudflared URL

00:36:26 SwarmUI backend: /workspace/ComfyUI/main.py + args; import presets

00:37:27 Download FLUX2 Core + NVFP4; share model paths between SwarmUI & ComfyUI

00:39:27 FLUX2 NVFP4 generation @2048x2048; VRAM usage + step speed

00:40:43 Cloud GPU pitfall: diagnosing a power-capped GPU

00:41:28 Resume: re-run template w/ volume; reconnect fast

00:45:02 Wrap-up: SimplePod pros (direct/secure, cheaper storage)

Video Transcription

  • 00:00:00 Greetings everyone. Today I am going to introduce our newest ComfyUI installer,

  • 00:00:04 which installs ComfyUI with the latest CUDA 13 and Torch 2.9.1, along with the latest Triton,

  • 00:00:13 SageAttention, FlashAttention, xFormers, InsightFace and DeepSpeed libraries. If you

  • 00:00:19 remember, NVIDIA published this chart showing extreme speed-ups with NVFP4,

  • 00:00:27 and now with CUDA 13 and the latest ComfyUI we are actually able to use these models. I have tested

  • 00:00:34 them. We are not getting quite that much speed, but we are still gaining significant speed-ups. I

  • 00:00:40 have compiled the latest libraries for all of the GPUs out there. You can see all the GPUs

  • 00:00:46 that our ComfyUI installer supports for FlashAttention, SageAttention and xFormers, which

  • 00:00:53 you need to run the models, on both Windows and Linux. I have compiled them for both.

  • 00:01:00 But this is not all. I have also compared the actual quality difference. This is the FLUX 2 Dev

  • 00:01:06 model, this is the Z Image Turbo model and this is the FLUX Dev model. I have compared the BF16,

  • 00:01:14 FP8 Scaled, GGUF Q8 and NVFP4 models' quality. So today I will show all of them. I have also

  • 00:01:23 added 4 very famous models to our model downloader: the FLUX 2 Dev NVFP4 model,

  • 00:01:30 the FLUX Dev Kontext NVFP4 model, the FLUX Dev NVFP4 model and the Z Image Turbo NVFP4 model. There is

  • 00:01:39 no NVFP8 yet, but I am hopefully expecting it soon. So when you download the models you will see them

  • 00:01:46 like this in your SwarmUI, or you will be able to use them in your ComfyUI as well.

  • 00:01:51 Additionally, I will introduce a new platform called SimplePod AI. This is

  • 00:01:58 like RunPod, but the prices are much better. For example, the RTX 5090 starts from 44 cents,

  • 00:02:07 while on RunPod it starts from 89 cents. This one also has a permanent storage system

  • 00:02:14 like RunPod, but at half of RunPod's price. It is also faster than

  • 00:02:19 RunPod. So I will show everything about SimplePod AI. Our installation scripts

  • 00:02:24 and our applications work right away on SimplePod AI, just as on RunPod.

  • 00:02:29 Moreover, with our SECourses Musubi Tuner application I have generated a quantized FP8 Scaled

  • 00:02:36 version of the FLUX Dev model. Why? Because I wanted to compare its quality: there

  • 00:02:44 is a lot of misinformation claiming that even GGUF Q6 is better than FP8 Scaled, but it is not true.

  • 00:02:52 If you have a properly quantized scaled model, it is equal to GGUF Q8 and even very close to BF16.

  • 00:03:02 So to clear up this issue I will also talk about quantization and explain what FP32,

  • 00:03:10 FP16, BF16, FP8 and GGUF are. Furthermore, we have 2 new text encoders for the Z Image Turbo models,

  • 00:03:19 which are Qwen 3 4-billion-parameter FP8 mixed and Qwen 3 4-billion-parameter FP4 mixed. These

  • 00:03:28 are very good text encoders for very low VRAM GPUs, so they will bring even further speed-ups.

  • 00:03:34 So I have updated the ComfyUI installation post with all the newer information. You need to

  • 00:03:41 download the latest ComfyUI zip file. The link is here, and it will also be in the description

  • 00:03:45 of the video. Download the latest version 73. We are no longer updating the CUDA 12.9 and Torch

  • 00:03:52 2.8 build, but I am still keeping it as the deprecated version 72 if you need it. I really

  • 00:03:58 recommend you read all the latest changes in the 10 January version 73 update. You don't need to have

  • 00:04:06 CUDA 13 installed on your system. You only need to have updated NVIDIA drivers. Don't forget that.

  • 00:04:13 So to update your existing ComfyUI, all you need to do is this: move the zip file into

  • 00:04:19 your previous installation, right click, and extract all the files into the same folder. You

  • 00:04:26 need to see an overwrite prompt. This is important: overwrite all the files. Then you need to

  • 00:04:31 enter the ComfyUI folder and delete your venv folder, the virtual environment folder. This is mandatory.

  • 00:04:39 Make sure that your ComfyUI is not running, otherwise you will not be able to delete it. So I am

  • 00:04:44 going to close my running instances so that I can delete the virtual environment

  • 00:04:49 folder. Once your virtual environment folder is deleted, all you need to do is double click the

  • 00:04:54 windows install or update ComfyUI.bat file. I recommend using Python 3.10. I am testing

  • 00:05:02 with it and I am using it; you need to have it installed yourself. So select option

  • 00:05:07 1 and hit enter. But we support all Python versions: 3.10, 3.11, 3.12 and 3.13.

  • 00:05:13 If you want to make a fresh installation, extract it into any drive; then again all you need to

  • 00:05:20 do is run the windows install or update ComfyUI.bat file and it will start a fresh installation, the same

  • 00:05:27 as updating after deleting the virtual environment. This update requires deleting the virtual environment.

  • 00:05:33 You won't need to do this again unless we change the CUDA or Torch version, so normally you don't need to

  • 00:05:40 delete your virtual environment. But you can do that; there is no harm in it. The virtual environment

  • 00:05:44 is 100% isolated, so it will not cause any data loss. Again, you can make a fresh installation, which I

  • 00:05:52 recommend: test it, then you can move your main installation to the newer version. The installation

  • 00:05:57 is super fast. It will take a few minutes depending on your computer and network speed.

  • 00:06:02 You see it is almost done on my computer, because we are now using uv for installing packages.

  • 00:06:08 Our ComfyUI installer is standalone, so you can use it on its own, but I prefer to

  • 00:06:15 use it with SwarmUI, which I will show. After installation you can run windows run GPU and it will

  • 00:06:21 start your latest version of ComfyUI. Let's see what features we are getting. This is my updated

  • 00:06:29 folder, not a fresh installation folder. You can see that it now found the ComfyUI Kitchen Backend. It

  • 00:06:35 supports everything: available true, available true, available true. It is disabling the ComfyUI

  • 00:06:41 Kitchen Backend Triton because it is using the eager and CUDA versions. This is automatic, but we have

  • 00:06:47 all 3 of them reporting available true, available true and available true. You see it supports dequantize

  • 00:06:53 NF4, dequantize per tensor, everything. My installer supports everything. It

  • 00:06:59 uses SageAttention by default. You see it has these backend quantizations as well.

  • 00:07:05 So if you want to add new arguments to your ComfyUI installation, edit this windows run

  • 00:07:12 GPU file with any text editor and change what it says there. For example, people have been getting stuck or hitting out-of-

  • 00:07:20 memory errors recently, so you have 2 options. You can use --cache-none. This unloads every

  • 00:07:28 model after it is executed, whether it is the text encoder or the diffusion model, so this will

  • 00:07:33 clear your RAM and VRAM 100%. Alternatively you can use --disable-smart-memory. This reverts

  • 00:07:41 to the older memory management, which uses less GPU memory. It is a little bit slower,

  • 00:07:47 but it will prevent you from getting stuck or hitting out-of-memory errors; a sketch of the edited launch line is shown below. So this is all about the ComfyUI

  • 00:07:54 installation. You can use it as usual. And we now support all 32 presets of SwarmUI. Hopefully

  • 00:08:03 I will make an LTX 2 preset as well, so just wait for me to do that. All you need to do is drag

  • 00:08:10 and drop the preset and it will work right away like this, with the auto model downloader.
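
For reference, a minimal sketch of what the edited launch line in the windows run GPU .bat might look like; the real file activates the virtual environment first, and this is illustrative, but the flags themselves are standard ComfyUI arguments (pick one of the two memory flags):

```bat
REM Illustrative launch line; keep whatever else your .bat already does
python main.py --use-sage-attention --disable-smart-memory

REM or, to fully unload every model after each execution:
python main.py --use-sage-attention --cache-none
```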

  • 00:08:17 Okay, so how do we use the newer NVFP4 models? We are going to download the latest SwarmUI model

  • 00:08:25 downloader zip file. This also contains the installer for SwarmUI, if you remember. Move it into your

  • 00:08:31 previous SwarmUI installation, or you can make a fresh installation; it is the same. So again, it

  • 00:08:38 is super important that you extract and overwrite all the files: extract here, overwrite all files. This

  • 00:08:44 is important. If you don't overwrite, you won't see the newer models. Then let's first download the

  • 00:08:49 newer models, so I am going to run the windows start download models app.bat file. I am assuming

  • 00:08:55 that you have downloaded the bundles previously. If you didn't download them previously, you need

  • 00:09:00 to download them to be able to use the newer models. For example, you can download bundles like this

  • 00:09:06 Z Image Turbo Core Bundle: just click download and it will download all the necessary models.

  • 00:09:12 Then you can download the newer NVFP4 models. These models bring a speed-up on the RTX 5000 series. So

  • 00:09:19 just click download and it will download all the models. Then you need to update your SwarmUI, so I

  • 00:09:25 will just run windows update SwarmUI and it will start it. If you are installing for the first time,

  • 00:09:30 just use windows install SwarmUI. Okay, it is updating. The update completed and it has started.

  • 00:09:37 The first thing you need to do is update your backend and point it to the new or

  • 00:09:43 updated ComfyUI backend, like this. I am using --use-sage-attention. You can also use other

  • 00:09:50 attentions or, as I just explained, you can use --disable-smart-memory or --cache-none.

  • 00:09:58 If you are getting stuck or getting out-of-VRAM errors, I recommend trying --disable-smart-memory

  • 00:10:04 first. You can compare which one works better for you, or you can use --cache-none. But

  • 00:10:09 --cache-none makes it load the models from the hard drive at every generation. --disable-smart-

  • 00:10:16 memory doesn't do that; however, --cache-none uses even less RAM. If you have very limited

  • 00:10:21 RAM you should use it. Once you have done this, you are ready to start. Again,

  • 00:10:26 you should update your presets if you haven't yet. Then Quick Tools, reset params to default.

  • 00:10:32 Let's make a demonstration with Z Image Turbo. This is 1536 by 1536 pixels: super fast

  • 00:10:42 car. First with the Z Image Turbo BF16 model. Let's generate 10 images, then I will show

  • 00:10:49 the Z Image Turbo NVFP4 model, so you will see the speed difference live while I am recording

  • 00:10:57 the video. Okay, the generations have started and this is the first one. You are watching it live.

  • 00:11:04 Look at the speed. This is the second generation, so let's see the speed: the second generation

  • 00:11:11 took 6.31 seconds. This is the third generation; it took 6.38 seconds. This is the fourth generation; it

  • 00:11:20 took 6.33 seconds. So it is super fast. But what about the NVFP4 model? I will just cancel this,

  • 00:11:29 select the NVFP4 model and hit generate. Let's see the speed. This will blow your

  • 00:11:36 mind. And let's also watch the VRAM usage. It is using 14 gigabytes of VRAM because it cached the

  • 00:11:43 text encoder as well, since I have the VRAM to spare. Don't worry, you can run this on GPUs with as little as 6 gigabytes.

  • 00:11:49 And look at the speed. I mean, these are almost instant, and these are 1536 by 1536 pixel images,

  • 00:11:56 not 1024. So it is taking 3 seconds to generate 2.25-megapixel images. These images are really

  • 00:12:05 bigger than 1024, and you see the speed. Let's try 1024 to show you the speed: 1024

  • 00:12:14 by 1024, and let's hit generate. I mean, look at the speed. They are like instant. You see? It

  • 00:12:20 is taking 1.19 seconds, 1.2 seconds. Look at the speed. The NVFP4 model is just amazing. But what

  • 00:12:28 about quality? So I have tested the quality, and let's see it. This is the FLUX 2 Dev model.

  • 00:12:36 Which preset did I use? For this model I used the FLUX 2 Quality 1 preset. You see this

  • 00:12:43 one. This is the highest quality preset. The left ones are FP8 mixed scaled and the right

  • 00:12:50 ones are NVFP4. So you see: left, right. Pretty good, pretty close. Left, right.

  • 00:12:58 The man changed, but very good quality. Left, right. This is very, very close. You see? Left,

  • 00:13:05 right. Almost the same. Actually NVFP4 renders the animal's fur better, if you ask my opinion. Left,

  • 00:13:12 right. Both of them are great. Left, right. Both of them are excellent. Left,

  • 00:13:18 right. Both of them are excellent. So with the FLUX 2 Dev model we don't lose any quality.

  • 00:13:25 But how much speed do we gain? When we look at our actual speed chart, compared to FP8 Scaled

  • 00:13:33 the NVFP4 runs at 193% of the speed. You see? So we are gaining roughly 100% speed: from 8.34 seconds per iteration down to 4.31

  • 00:13:49 seconds per iteration. So we get about a 100% speed gain, NVFP4 being 1.93x as fast, and there is no quality difference.

  • 00:13:58 We don't get as much speed gain as NVIDIA claims. I don't know how

  • 00:14:04 they made that chart, but we are gaining a massive amount of speed nevertheless.

  • 00:14:10 So the second comparison is the Z Image Turbo model, from BF16 to NVFP4, and on the right we have

  • 00:14:19 the Z Image Turbo FP8 Scaled. So let's see the quality difference. This is BF16, this is NVFP4

  • 00:14:27 and this is FP8 Scaled. You see the FP8 Scaled is almost the same as BF16: very high quality. This

  • 00:14:34 is BF16, this is NVFP4 and this is FP8 Scaled. You see FP8 Scaled is almost the same as BF16. Why?

  • 00:14:44 Because I am using the quantization methodology included in our SECourses Musubi Tuner. This is a very

  • 00:14:53 high quality quantization. We also automatically install the necessary nodes for you, so you don't spend

  • 00:14:59 any time making them work, and I am quantizing these models for you, but you can also do it yourself

  • 00:15:05 if you want to quantize any specific model to FP8 Scaled. So another example: this is BF16,

  • 00:15:11 this is NVFP4 and this is FP8 Scaled. Another example: this is BF16, this is NVFP4 and this

  • 00:15:20 is FP8 Scaled. All of them are amazing, if you ask my opinion, and these are just random samples.

  • 00:15:25 What about the FLUX Dev model? FLUX Dev is not very realistic as a base, as you know. So this is BF16,

  • 00:15:32 this is NVFP4, this is the FP8 Scaled which I made, and this is GGUF Q8.

  • 00:15:39 You see the GGUF Q8 and FP8 Scaled are almost the same quality. This is BF16, this is NVFP4,

  • 00:15:47 this is FP8 Scaled and this is GGUF Q8. Our FP8 Scaled is almost the same quality as BF16,

  • 00:15:56 and almost the same quality as GGUF Q8. So some people claim that GGUF is much

  • 00:16:02 better. No: FP8 Scaled has almost the same quality as GGUF Q8. However, FP8 Scaled is much faster.

  • 00:16:11 With the FLUX Dev model I think NVFP4 suffered some quality degradation. You see, this is BF16, this is NVFP4, this

  • 00:16:20 is FP8 Scaled and this is GGUF Q8. This is BF16, this is NVFP4. Yes, I can see some noise, some

  • 00:16:28 quality degradation. This is FP8 Scaled and this is GGUF Q8. So for FLUX Dev, NVFP4 is low quality at

  • 00:16:37 the moment; they need to improve it. However, for Z Image Turbo and FLUX 2 it is perfectly usable.

  • 00:16:45 So let's talk about precision. You always hear about precision, but what is it? AI models

  • 00:16:51 are made of billions of parameters, and these parameters are just numbers, like this one:

  • 00:17:00 3.15121435. So this is an FP32 weight, a single parameter weight in a model with billions of parameters.

  • 00:17:10 Normally, training is usually done at FP32 or BF16. However, we don't use the models

  • 00:17:17 at FP32, because they are massive at FP32. What do I mean by that? You see that the BF16

  • 00:17:25 of Z Image Turbo is 11.46 gigabytes. At FP32 it would be about twice that, around 22.9 gigabytes. FP8

  • 00:17:35 Scaled is 5.73 gigabytes and the NVFP4 model is 4.20 gigabytes. This is not half of the 5.73

  • 00:17:45 because NVFP4 is a little bit different; you can read about it in the NVIDIA developer blog if you

  • 00:17:52 are interested, but this is the size difference. So FP32 is the highest quality,

  • 00:17:58 highest precision, and it is not needed for these newer big models. BF16 is what I recommend as the

  • 00:18:07 highest quality option, because it is both fast and very high quality. You won't notice

  • 00:18:12 the difference between FP32 and BF16. FP16 I don't recommend: if there is a BF16 of the

  • 00:18:19 model, use it, and if there isn't, I usually make a BF16 of the model myself from the

  • 00:18:27 FP32 (a sketch of that conversion is below). FP16 is not as good as BF16 in any of the generative AI models that I have tested.
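
A minimal sketch of such an FP32 to BF16 conversion for a safetensors checkpoint; the filenames are illustrative and this is a generic conversion, not the exact script used in the video:

```python
import torch
from safetensors.torch import load_file, save_file

# Load the FP32 checkpoint, cast floating-point tensors to BF16, save it back.
state = load_file("model_fp32.safetensors")          # illustrative filename
state = {k: (v.to(torch.bfloat16) if v.is_floating_point() else v)
         for k, v in state.items()}
save_file(state, "model_bf16.safetensors")           # roughly half the file size
```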

  • 00:18:35 FP8 E4M3 is the default FP8. Now, this is low quality because it loses the ability

  • 00:18:42 to properly represent the base model weights; it is just a default conversion. I mean, look at this:

  • 00:18:50 normally the value is 3.15. It is represented as 3.15 in BF16 and 3.15 in FP16, but when it comes

  • 00:19:00 to FP8 it becomes 3.25. You see, this is a major precision error. However, when we do FP8 Scaled

  • 00:19:10 precision, especially with this new quantization method, it is done in a way that makes it

  • 00:19:18 much more representative of the original model. So if you see a raw FP8 E4M3 model, you can know that it is

  • 00:19:27 very low quality compared to BF16, or compared to GGUF Q8 or GGUF Q6. That is low quality.

  • 00:19:35 FP4 is very primitive: you see it lost all the precision and became 3.0. This is a major error. (The sketch below reproduces this rounding.)
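
Here is a minimal PyTorch sketch of the rounding described above, plus a generic per-tensor "scaled" FP8 scheme. It needs PyTorch 2.1+ for the float8 dtypes, and the scaled part is an illustrative simplification, not the exact Musubi Tuner implementation:

```python
import torch

# A single weight, as in the example above
w = torch.tensor(3.15, dtype=torch.float32)
print(w.to(torch.bfloat16).item())               # ~3.15625: BF16 stays very close
print(w.to(torch.float16).item())                # ~3.15039: FP16 too
print(w.to(torch.float8_e4m3fn).float().item())  # 3.25: raw FP8 E4M3 rounds coarsely

# A raw FP8 cast also flushes tiny weights to zero.
small = torch.tensor([3.15, 0.0008], dtype=torch.float32)
print(small.to(torch.float8_e4m3fn).float())     # tensor([3.2500, 0.0000])

# Per-tensor scaling maps the largest weight onto FP8's max normal value
# (448 for float8_e4m3fn), so the whole tensor lands in FP8's usable range.
scale = small.abs().max() / 448.0
restored = (small / scale).to(torch.float8_e4m3fn).float() * scale
print(restored)                                  # ~tensor([3.1500, 0.0008])
```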

  • 00:19:43 GGUF works differently from FP8 scaling, because GGUF is block based, and you see

  • 00:19:52 the block average error is 0.01 for GGUF Q8. That is why it is very high precision. With GGUF Q4 it

  • 00:20:01 becomes 0.15 on average, so you lose a significant amount of quality. Moreover, GGUF models are

  • 00:20:08 slower to run: at minimum you will get something like 20% lower speed, and it depends on the model and the

  • 00:20:16 GPU; sometimes it will be even more. So I recommend you not to use GGUF models. When should you

  • 00:20:23 use them? Let's say FP8 Scaled does not fit into your RAM. Not VRAM: RAM.

  • 00:20:31 Then you have to use a model with even lower VRAM and RAM requirements, as with the FLUX

  • 00:20:38 2 model: for FLUX 2 we have a low RAM preset which uses GGUF Q4, because it is half the size of the FP8 Scaled model.

  • 00:20:47 So don't use GGUF unless your RAM is not enough. Why? Because ComfyUI by default

  • 00:20:55 does block swapping, also called VRAM streaming. Therefore, as long as you have a

  • 00:21:01 sufficient amount of RAM, I recommend you always use a properly made FP8 Scaled. Sometimes you

  • 00:21:10 will see FP8 mixed. Mixed means that some of the parameters are quantized and some of them are not,

  • 00:21:17 i.e. kept as BF16 or FP32. These models are also good, so prefer them over GGUF models, because even

  • 00:21:26 if you do block swapping with your model, it will be faster than the GGUF model. Test it and

  • 00:21:32 you will see, and it will be better quality. So these are the precisions of a model. You

  • 00:21:36 lose precision like this: FP32 is the maximum, BF16 like this, FP16 like this, FP8 like this, but

  • 00:21:44 don't compare that with FP8 scaled or FP8 mixed precision. FP4 E5M2 like this, and GGUF like this. (A block quantization sketch follows below.)
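
To make the block idea concrete, here is a minimal sketch of block-based integer quantization in the spirit of GGUF Q8/Q4. It is an illustrative simplification, not llama.cpp's exact Q8_0 or Q4_0 layout:

```python
import torch

def block_quantize(w: torch.Tensor, block_size: int = 32, bits: int = 8) -> torch.Tensor:
    """Quantize each block of weights to signed ints with one shared scale, then dequantize."""
    qmax = 2 ** (bits - 1) - 1                              # 127 for Q8, 7 for Q4
    blocks = w.reshape(-1, block_size)
    scales = blocks.abs().max(dim=1, keepdim=True).values / qmax
    q = torch.round(blocks / scales).clamp(-qmax, qmax)     # what would be stored, with scales
    return (q * scales).reshape(w.shape)                    # reconstruction at load/run time

w = torch.randn(4096)
for bits in (8, 4):
    err = (w - block_quantize(w, bits=bits)).abs().mean().item()
    print(f"Q{bits} mean absolute error: {err:.4f}")        # Q8 tiny, Q4 much larger
```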

  • 00:21:52 Finally, in this tutorial I will introduce a new cloud service, SimplePod,

  • 00:21:58 which is like RunPod. What is the difference between SimplePod and RunPod? It works

  • 00:22:03 exactly the same as RunPod, so any of my RunPod tutorials will work with SimplePod as well.

  • 00:22:08 Please use this link to register; I appreciate it. After registering, go to your dashboard,

  • 00:22:15 go to billing and add some credits. It shows every spending as well.

  • 00:22:21 Then use this template link, click edit and use, and decide how much system disk you want. So

  • 00:22:31 I will show it first with the base system disk, then with a permanent storage disk. So I am

  • 00:22:38 going to set this to 100 gigabytes. This will be temporary; it will get deleted. It is going

  • 00:22:44 to use port 3000 for ComfyUI; you can also use it for SwarmUI. I will show both of them. Save and use.

  • 00:22:52 Then you need to pick your GPU. So what advantages does this platform have? It is much faster than RunPod

  • 00:23:00 and it is much cheaper. You see, this RTX 5090 is only 44 cents. Let's check it on RunPod.

  • 00:23:08 This txt file also contains the RunPod links, so let's use our link to open RunPod; I appreciate it. Then

  • 00:23:14 let's sign in. And for RunPod please use this new template; you need to use this one. Okay, let's double

  • 00:23:22 click it. It has selected the template correctly. And you see the RTX 5090 is 89 cents, so there is

  • 00:23:30 a massive difference: RunPod costs twice as much, and SimplePod is faster in both network and disk speed.

  • 00:23:39 Okay, I am going to rent this one. The template is set. If you want to change it again, just hit X,

  • 00:23:45 click edit and edit the template. You can add other ports. You can add your persistent

  • 00:23:51 storage, which I will show. Okay, save and use. Okay, I am ready, so I will click run.

  • 00:23:56 So this is my temporary-disk-based template: when I delete it, everything will be deleted.

  • 00:24:02 Let's also make a permanent storage. You see there is storage here, so I click this. I click

  • 00:24:07 add a new persistent volume. You see, currently this is the location. Let's name it tutorial, and

  • 00:24:14 let's make it 200 gigabytes. You can make it as big as you want, and you see this is only

  • 00:24:19 6 dollars per month. However, on RunPod, let's look at the network storage cost. I am going to pick Europe

  • 00:24:27 like this, and when I set this to 200 gigabytes it is 14 dollars per month.

  • 00:24:35 That is more than twice as expensive as SimplePod. Okay, let's save. To be able to

  • 00:24:41 use this template now, I will return to the documentation and open the template link again,

  • 00:24:48 and here click edit and use, and you need to select your persistent volume. You see, tutorial

  • 00:24:55 appeared. And for the mount point you need to set /workspace. Don't forget. Hopefully we will make

  • 00:25:00 this automatic so you won't need to set it, but currently this is mandatory.

  • 00:25:05 So it is going to use my persistent volume tutorial. Save and use. Now I need to pick my GPU.

  • 00:25:11 Let's look at the RTX Pro 6000. You see, this is only 72 cents per hour, and these are not spot

  • 00:25:20 prices. These are private server prices; they are not like community cloud, they are like secure

  • 00:25:26 cloud on RunPod. So this is only 72 cents. I am going to select this and run. Now you see this

  • 00:25:32 is my second running instance. When I go to my servers I see both of them. The first one is here,

  • 00:25:38 the second one is here. You can see that the second one has a 200-gigabyte volume disk;

  • 00:25:42 the first one doesn't, so it is using a temporary disk. Let's see the price on RunPod.

  • 00:25:47 On RunPod the RTX Pro 6000 price is 1.84 dollars, so it is almost 3 times as expensive

  • 00:25:58 as my RTX Pro 6000 on SimplePod. That is why I now recommend using SimplePod. And all of

  • 00:26:06 the RunPod scripts will work on SimplePod; you just need to follow these steps exactly the same way.

  • 00:26:13 Okay, the first server started. SimplePod has some advantages which I will show. So let's do the

  • 00:26:20 installation on the RTX Pro 6000, because after this point both of them are the same, with just one difference:

  • 00:26:29 when you delete your instance, this one loses all its data. So make sure to back up your

  • 00:26:36 data; I will just delete it. There is no stop button, so make sure you are using permanent storage

  • 00:26:42 if you want to keep your data. So I will just delete my instance. My second instance is

  • 00:26:48 here now, the one with permanent storage. You see it shows my volume, and I have mounted it as workspace.

  • 00:26:55 To connect to it we are going to use Jupyter. You see there is Jupyter, so click secure and it will

  • 00:27:00 open the JupyterLab interface as usual. So let's install ComfyUI here. All I need to do is

  • 00:27:07 drag and drop my zip file into here. When you right click there is no extract option yet; I am

  • 00:27:15 talking with the developer and hopefully they will add it. So to extract it we will open

  • 00:27:20 a new terminal, apt install zip, and click yes. You can also extract locally and drag and drop all the files

  • 00:27:28 into your folder instead. Then I run unzip on the ComfyUI zip file: I hit tab and it completed

  • 00:27:36 the name, then refresh. So everything is here. Hopefully we will have right-click unzip soon. (The terminal steps are sketched below.)
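
A sketch of those terminal steps; the archive name is illustrative (tab-complete the real one), and note that on Debian/Ubuntu images zip and unzip are separate packages, so you may need both:

```bash
apt install -y zip unzip
unzip ComfyUI_v73.zip        # illustrative archive name
```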

  • 00:27:42 Then I need to follow the RunPod SimplePod instructions for the installation. Copy this,

  • 00:27:48 open a terminal, paste and hit enter if you don't want to install the additional options.

  • 00:27:54 Remember, we set workspace as the mount point for the storage. This is important. So the installation will be

  • 00:28:01 really fast. You can follow it here. Let's see the speed live. Meanwhile, we can also download

  • 00:28:09 models or install SwarmUI if you want to use SwarmUI. Okay, it is installing. Look at the speed;

  • 00:28:15 it is amazing. We upgraded our installers. They now work perfectly on RunPod,

  • 00:28:21 MassCompute and SimplePod, all of them. So meanwhile let's upload our SwarmUI zip

  • 00:28:28 file here. Okay, it is uploaded. Now I need to unzip this too, so I will open a new terminal and

  • 00:28:35 unzip this one. Okay, it is unzipped. Let's refresh. Now let's start the model downloader.

  • 00:28:44 Let's make a demonstration with both ComfyUI and SwarmUI. So let's go to the RunPod model download

  • 00:28:50 instructions, copy this like this, open a terminal and paste. This will start the downloader.

  • 00:28:57 Okay, the downloader started. Let's open the link. The ComfyUI installation is almost completed, so I will

  • 00:29:02 download the Z Image Turbo model into ComfyUI first as a demonstration. So I need to enter my

  • 00:29:09 ComfyUI folder, select the models folder and copy its path. Then I paste it here

  • 00:29:17 with a slash (/) at the beginning. Don't forget this leading slash, otherwise it

  • 00:29:22 will not work. Then I select the ComfyUI folder structure. This is important. Then I select

  • 00:29:27 the Z Image Turbo Core Bundle like this. It is 20 gigabytes, so it will be really fast to

  • 00:29:32 download. Let's follow the download here. The download speed is almost 180 to 200 megabytes

  • 00:29:39 per second. It will download. The ComfyUI installation is almost done; it is installing the

  • 00:29:45 necessary libraries. But the speed is amazing compared to RunPod, because I compared them.

  • 00:29:50 If you are a Linux user, all you need to do is use the MassCompute install sh file.

  • 00:29:56 You can look at the MassCompute instructions, and when you execute this command it will install and

  • 00:30:02 work on your Linux machine. If you own a Linux machine, don't worry about that. Okay, ComfyUI has

  • 00:30:08 been installed on SimplePod. So how do we start it? Go to the bottom and select this command. Open a

  • 00:30:15 new terminal and paste it, and it will start ComfyUI. Meanwhile the model downloads are

  • 00:30:21 almost done. So you can look at the logs and you will see that it supports everything,

  • 00:30:27 every library. You see CUDA 13 with PyTorch version 2.9.1, and this is an NVIDIA RTX Pro

  • 00:30:35 6000 Blackwell Workstation Edition. Okay, it started locally. How do we connect to it? We

  • 00:30:42 are going to connect through port 3000. So go back to your SimplePod interface and

  • 00:30:48 you will see that port 3000 became available. So click direct. This direct connection works much faster

  • 00:30:55 than on RunPod. It is as if it were running on your own computer. Okay, it has started.

  • 00:31:00 Since I have downloaded the Z Image Turbo model bundle, it is almost done. I am going

  • 00:31:08 to generate some Z images. We have the presets, you see, inside this ComfyUI version 73 presets folder.

  • 00:31:15 Let's look at Z Image Turbo. Let's use the Z Image Turbo Quality 1. This one has an upscale step.

  • 00:31:21 Okay, it is loaded. Let's check whether all the models have downloaded. Almost done, not yet.

  • 00:31:26 And every file I create here will be permanently stored. When I go to storage I can see that I

  • 00:31:32 am using 25.9 gigabytes of disk space. Next time you run it, you will still run the install

  • 00:31:40 command; however, it will be much faster this time because we previously installed everything. You

  • 00:31:46 can even run the run command directly, but if you get errors, run the install command again.

  • 00:31:51 Okay, all the files downloaded, so let's refresh this. This is running on SimplePod, not on

  • 00:31:57 my computer. So now everything is auto-set. Let's generate 5 images from here and run.

  • 00:32:05 Okay, it didn't see the VAE, because these VAEs are sometimes downloaded for SwarmUI. You see

  • 00:32:11 there is a backslash problem; this is a preset saving problem. You just need to click this.

  • 00:32:17 Okay, it is fixed; run. You may get an error only with the VAE, because SwarmUI by default puts the VAE into a

  • 00:32:24 subfolder, but other than that it is ready to use right away. And it started generating

  • 00:32:30 already. Let's check nvidia-smi. So: pip install nvidia-smi, then nvidia-smi. We are using 21 gigabytes

  • 00:32:39 of VRAM. This workflow generates at 1536 by 1536 and upscales to 1920 pixels. So it will be

  • 00:32:47 really high resolution, high quality. You can see that it is doing the upscaling as well, with the best

  • 00:32:53 upscaler model. So we are going to get amazing quality images, and you can see the speed is

  • 00:32:59 just amazing. And we got the first image. It is saved in the outputs folder. You can also right

  • 00:33:05 click and save the image. All of the generated images and videos will be inside the output folder.

  • 00:33:11 So how can you download them? You can download them fast on SimplePod. Go to my servers, your

  • 00:33:18 interface, and go to file browser direct. I recommend this; it is super fast. This is where you can

  • 00:33:25 upload and download files directly to the server, and it is ultra fast. When I go to the workspace,

  • 00:33:32 this is where my files are. Double click, and it is inside ComfyUI, inside output. So when I

  • 00:33:39 select this folder I can click download. It will zip and download it instantly. It will be really

  • 00:33:45 fast. Just click keep. It is done. If it were a model training, it would be inside models. Let's

  • 00:33:51 download one of the models to see the speed. So: diffusion models, and Z Image Turbo. This is about

  • 00:33:57 12 gigabytes. I click the download icon, I click keep, and it starts downloading. You see, the download

  • 00:34:03 speed is amazing. You can directly download your trained models, your generated images and videos

  • 00:34:08 by using SimplePod's direct method. You can also upload files here; it is just the same.

  • 00:34:15 Click this upload icon in the selected folder. You can upload both folders and files. So let's

  • 00:34:21 upload a file, like the output zip file, and it will be uploaded instantly. So this is super fast with

  • 00:34:29 SimplePod. You can also use secure connections if you want, or direct. Direct is faster.

  • 00:34:34 So how do you use SwarmUI on SimplePod? I will show that now. If you click this icon it will restart

  • 00:34:41 your server. So it is restarting. It will flush my GPU and close the running ComfyUI.

  • 00:34:48 That is fine, since we have already installed it. As a next step I will show the installation of SwarmUI,

  • 00:34:54 and I will test it with FLUX 2 because this GPU is a beast, an RTX Pro 6000. Let's see the speed. The

  • 00:35:01 restart is happening, probably done. Yes, you see the VRAM usage has dropped. Now I will connect from

  • 00:35:07 Jupyter once again. So open the RunPod SwarmUI install instructions; it is the same for

  • 00:35:14 SimplePod. I will rename the file. So copy, open a new terminal, paste it. You need to start the

  • 00:35:22 installation first; then you can start the download, but make sure the installation has started and the SwarmUI folder

  • 00:35:29 has been cloned, otherwise it will not work. So once you see the SwarmUI folder, you can

  • 00:35:34 begin the model downloads at the same time. But let's just wait for the installation, because SwarmUI

  • 00:35:40 is ultra fast to install. Okay, it is almost done. We will see the

  • 00:35:45 Cloudflared URL here. Yes, it has started. Okay, localhost started. Now we are waiting for

  • 00:35:52 Cloudflared. Yes, here. Open the Cloudflared URL. Okay, one more time. If it doesn't open

  • 00:35:57 immediately, wait for a while; keep refreshing, keep clicking, and it will start like this.

  • 00:36:03 Click agree. Customize settings. Select your template. Next. Just install. Next. This is

  • 00:36:10 set to none because we are going to use our installed ComfyUI backend. This is mandatory; don't

  • 00:36:15 forget it. Don't use ComfyUI local; if you use ComfyUI local it will not work. Next. I am not

  • 00:36:20 going to download anything. Next. And yes, I am sure, install. The installation is done. Now I

  • 00:36:26 need to set my backend. So go to backends, add ComfyUI self-starting like this. Okay. Refresh

  • 00:36:32 so it fixes itself. Okay. Then pick your ComfyUI folder. Right click, copy path. Click this edit.

  • 00:36:39 Append a slash and main.py, so the path reads /workspace/ComfyUI/main.py. Add any extra

  • 00:36:48 arguments that you want, e.g. --use-sage-attention. This is the same as in the Windows installation,

  • 00:36:54 if you remember. Save. It will start on GPU ID 0; if you have multiple GPUs you

  • 00:36:59 can duplicate this multiple times and change the GPU ID each time. Now it will start the

  • 00:37:04 ComfyUI backend and it will become ready (a sketch of these backend values is below). We need to import our presets. Go to presets,

  • 00:37:12 import, choose from file and select the latest SwarmUI amazing preset. You can

  • 00:37:17 tick overwrite if you want, then import. It is all imported; click refresh.
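
For reference, the backend values end up looking roughly like this; the field labels are approximate (check your own SwarmUI Backends tab), while the path and argument are the ones used above:

```
Backend type: ComfyUI Self-Starting
  Start script: /workspace/ComfyUI/main.py
  Extra args:   --use-sage-attention
  GPU ID:       0
```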

  • 00:37:21 Now we need to download the model, so I will start the model downloader one more time: RunPod model

  • 00:37:27 download instructions. Since we restarted the server, open a new terminal. It is exactly the same as on

  • 00:37:32 RunPod, but I am also showing it on SimplePod because I recommend you

  • 00:37:37 start using SimplePod if you want cheaper prices. Okay, let's open the Gradio live link.

  • 00:37:43 Then I am going to start downloading the FLUX 2 bundle. By default our

  • 00:37:49 downloader automatically detects SwarmUI, you see? You don't need to enter a custom path; you

  • 00:37:55 only need to enter one if you are downloading into ComfyUI. So: FLUX 2 Core Bundle here, and download

  • 00:38:00 all models. You can follow the progress here. It will start and it will get faster. Remember,

  • 00:38:06 in the tutorial video how to use SwarmUI presets and workflows in ComfyUI, we

  • 00:38:12 explained how to use SwarmUI's models inside ComfyUI by using the extra_model_paths.yaml file,

  • 00:38:21 so you don't need duplicate models. You can keep all of them inside SwarmUI and

  • 00:38:25 use them in ComfyUI as well, or vice versa: keep them in ComfyUI and use them in SwarmUI. That

  • 00:38:32 tutorial video explains it perfectly; the link will be in the description of the video. (A sketch of such a yaml entry is below.)
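
For reference, a minimal sketch of an extra_model_paths.yaml entry that points ComfyUI at a SwarmUI models folder. The key names follow ComfyUI's bundled extra_model_paths.yaml.example, while the SwarmUI paths here are illustrative, so adjust them to your own install:

```yaml
swarmui:
    base_path: /workspace/SwarmUI/Models    # illustrative SwarmUI models root
    checkpoints: Stable-Diffusion           # subfolder names depend on your setup
    diffusion_models: diffusion_models
    vae: VAE
    loras: Lora
```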

  • 00:38:36 Remember, we are spending time setting up and downloading models right now, but all of it is

  • 00:38:41 being saved inside our permanent storage. So next time I start, everything will be immediately ready, which

  • 00:38:46 I will show once everything is done: how to start again later. Okay, so the FLUX 2 bundle download

  • 00:38:54 has been completed, but let's also download the NVFP4 of the FLUX 2 model, so click download. Remember,

  • 00:39:01 if you don't want to download a specific model of a bundle, you can just click the download buttons

  • 00:39:06 here. It will queue them and download them. This way you can skip any

  • 00:39:12 model download. So now it is downloading the NVFP4 version. After that we will be ready. Okay, the NVFP4

  • 00:39:20 also downloaded. Let's go back to our running Cloudflared SwarmUI. Go to models and refresh so it

  • 00:39:20 also downloaded. Let's go back to our running  Cloudflared SwarmUI. Go to models refresh so it  

  • 00:39:27 will see all the models. Then in the preset let's  select the FLUX High Quality Preset 1 because we  

  • 00:39:34 have the best GPU. Then Quick Tools reset params  to default and let's select the FLUX 2 Quality 1  

  • 00:39:41 because we have the best GPU. You see the default  resolution is 2048 to 2048. You can change it if  

  • 00:39:48 you want. Let's generate 16:9 image. Let's type  something simple. Super expensive sports car.  

  • 00:39:57 And let's generate 3 images. And I am going to  select the NVFP4 model and generate. Let's also  

  • 00:40:05 open nvidia-smi to see the VRAM usage. Okay it is  loading the model. You see without block swapping  

  • 00:40:12 and VRAM streaming it is using 43 gigabytes of  VRAM with NVFP4 model. It is also keeping probably  

  • 00:40:20 the text encoder in the VRAM as well. Maybe it  is in the RAM memory. It depends on how ComfyUI  

  • 00:40:26 is handling that. But our generation started.  Let's see the step speed. So I see 9.61 second  

  • 00:40:35 per IT. Yeah this is definitely interesting.  Its speed is not as fast as on my RTX 5090.  

  • 00:40:43 Okay, I found the reason. The reason is that this GPU is power capped at the moment. I will report

  • 00:40:52 this to the SimplePod developer. If there are such GPUs they need to discard them; they should

  • 00:41:01 not allow them to be used, because this GPU is currently power capped. This can happen on RunPod

  • 00:41:08 as well, so pay attention to your GPU's power cap. If they are capped like this, this one is capped at 250

  • 00:41:15 watts; that is why it is much slower. Yes, so this is why we are not getting the good speed. (A quick way to check is sketched below.)
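
A quick way to check for a power cap; these are standard nvidia-smi queries, available on any machine with the NVIDIA driver installed:

```bash
nvidia-smi --query-gpu=name,power.draw,power.limit,power.max_limit --format=csv
# or dump the full power section:
nvidia-smi -q -d POWER
```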

  • 00:41:21 Okay, now I will show you how to resume your work afterwards. So I will delete my

  • 00:41:28 instance, but all of my data will be kept inside my permanent storage. Confirm.

  • 00:41:35 Then again you need to pick the template, so return to the RunPod SimplePod instructions.

  • 00:41:41 Double click the template again. Click edit and use. Select the persistent volume from

  • 00:41:48 here, set the mount point to /workspace, and save and use. Then you will see the

  • 00:41:56 template selected here like before. And pick your GPU. Let's pick this one to see whether

  • 00:42:02 it will also be power capped or not. Okay, run. So now it will be very fast to resume

  • 00:42:09 our installed work. This is also the same on RunPod: when you are using permanent storage,

  • 00:42:14 everything works the same way there as well. So let's just wait for it to start our machine. It is

  • 00:42:20 starting. The console appeared. As it starts, the links will appear. Just wait until the Jupyter

  • 00:42:26 Lab link appears. Okay, Jupyter direct appeared, so let's open it. If you get connection warnings like

  • 00:42:33 this, just click continue. This can also happen on your PC if you use direct, because the direct link is

  • 00:42:40 HTTP, not secure HTTPS. So it depends on whether you want to use it or not. I prefer to use it.

  • 00:42:47 Then I will use the RunPod SwarmUI install instructions. I will rename it, by the way.

  • 00:42:52 I shouldn't need to reinstall, but if you need to for some reason, you can do it if you get errors.

  • 00:42:58 So it should start right away, because SwarmUI also automatically updates ComfyUI when it starts.

  • 00:43:04 So you see, I am always using the same command, whether I am resuming or installing for the first

  • 00:43:10 time. With other applications you would likewise just run the running part, not the installation part.

  • 00:43:16 I skipped the ComfyUI installation entirely. Okay, Cloudflared started. I sometimes need to wait about

  • 00:43:22 1 minute for it to become available. Just keep hitting F5 to refresh the page.

  • 00:43:28 Okay, it is starting. Nice. You see, the backend is automatically here. I can add another

  • 00:43:34 backend, because this one has 2 GPUs. Let's also open nvidia-smi to verify: nvidia-smi. Okay, it

  • 00:43:41 is not here; I need to pip install nvidia-smi, then nvidia-smi. And let's see the wattage, whether

  • 00:43:47 it is capped or not here. So I will do Quick Tools, reset params to default, presets Quality 1, select

  • 00:43:54 the NVFP4 model, type super fast car, and let's select this aspect ratio. Let's generate 3 images and let's

  • 00:44:04 see whether this one is also power capped or not. Hopefully they will remove such GPUs from

  • 00:44:11 their GPU pool. Okay, it is loading the model. You see, model loading is very fast on

  • 00:44:16 SimplePod compared to RunPod. On RunPod I sometimes wait 10 minutes just for model loading; their

  • 00:44:21 disk speed is very slow. And SimplePod is replying to our every request, so if you have any questions or

  • 00:44:28 any problems with them, just join our Discord channel and message me there. If you type SECourses

  • 00:44:34 Discord you will find our link like this. It will also be in the description of the video.

  • 00:44:39 Just join our channel, type in any channel, and you can mention me and I will hopefully reply.

  • 00:44:46 Okay, the generation started. Yeah, this GPU is also power capped for some reason. So this is

  • 00:44:53 how you resume and continue your work. This is how SimplePod works. So this is how you use ComfyUI,

  • 00:45:02 SwarmUI or any of my installers on SimplePod. They have a lot of advantages, like direct connections

  • 00:45:11 and secure connections, both of which work with any ports, and a permanent storage system that is

  • 00:45:17 much cheaper and also faster. So if you have any questions, always ask me. Hopefully see you later.

  • 00:45:23 You can ask me on Patreon, by email, on YouTube, wherever you want. On Discord too.
