NVFP4 With CUDA 13 Full Tutorial, 100%+ Speed Gain + Quality Comparison & New Cheap Cloud SimplePod

NVFP4 models have finally arrived in ComfyUI, and therefore SwarmUI, with CUDA 13. NVFP4 models are literally 100%+ faster with minimal impact on quality. I have done a grid quality comparison to show you the difference between the NVFP4 versions of FLUX 2, Z Image Turbo and FLUX 1. To make CUDA 13 work, I have compiled Flash Attention, Sage Attention & xFormers for both Windows and Linux with all of the CUDA archs, supporting virtually every GPU from the GTX 1650 series through the RTX 2000, 3000, 4000 and 5000 series and more.

In this full tutorial, I will show you how to upgrade your ComfyUI, and therefore SwarmUI, to the latest CUDA 13 with the latest libraries and Torch 2.9.1. Moreover, our compiled libraries such as Sage Attention work with all models on all GPUs without generating black images or videos, including Qwen Image and Wan 2.2 models. Hopefully LTX 2 presets and a tutorial are coming soon too. Finally, I introduce a new private cloud GPU platform called SimplePod, similar to RunPod. It has all the same features as RunPod but is much faster and cheaper.

📂 Resources & Links:

ComfyUI Installers: [ https://www.patreon.com/posts/ComfyUI-Installers-105023709 ]

SimplePod: [ https://simplepod.ai/ref?user=secourses ]

SwarmUI Installer, Model Auto Downloader and Presets: [ https://www.patreon.com/posts/SwarmUI-Install-Download-Models-Presets-114517862 ]

How to Use SwarmUI Presets & Workflows in ComfyUI + Custom Model Paths Setup for ComfyUI & SwarmUI Tutorial: [ https://youtu.be/EqFilBM3i7s ]

SECourses Discord Channel for 24/7 Support: [ https://discord.com/invite/software-engineering-courses-secourses-772774097734074388 ]

NVIDIA NVFP4 Blog Post: [ https://developer.nvidia.com/blog/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference/ ]

⏱️ Video Chapters:

00:00:00 New ComfyUI installer (CUDA 13, Torch 2.9.1, Triton + attention libs)

00:00:19 NVFP4 speedup claims vs real tests; why CUDA 13 enables new models

00:00:34 Prebuilt FlashAttention/SageAttention/xFormers for many GPUs (Windows + Linux)

00:01:00 Quality roadmap: FLUX2 Dev, Z Image Turbo, FLUX Dev (BF16/FP8/GGUF/NVFP4)

00:01:23 Downloader adds NVFP4: FLUX2 Dev, FLUX Dev (Kontext/Dev), Z Image Turbo

00:01:51 SimplePod AI intro: RunPod-style pods, cheaper rates, permanent storage

00:02:36 Musubi Tuner FP8 Scaled: quality myths vs GGUF + why scaled matters

00:03:10 Quantization & precision (FP32/BF16/FP8/GGUF) + Qwen3 low-VRAM encoders

00:03:34 ComfyUI v73 zip: CUDA 13 included; update NVIDIA drivers only (v72 deprecated)

00:04:13 Update steps: overwrite zip, delete venv, run install/update .bat

00:05:02 Python: 3.10 recommended (supports 3.10-3.13); fresh vs update

00:06:02 New installer flow: uv speed, standalone use, backend libs detected

00:07:12 Stability flags: --cache-none vs --disable-smart-memory (OOM/stuck fixes)

00:07:54 SwarmUI presets: 32 presets supported; drag/drop + auto model downloader

00:08:25 Update SwarmUI model-downloader zip (extract + overwrite)

00:08:49 Download bundles/models (Z Image Turbo Core + NVFP4 options)

00:09:25 Update/launch SwarmUI; point to updated ComfyUI backend + set args

00:10:32 Live gen test: Z Image Turbo BF16 @1536x1536

00:11:29 Switch to NVFP4: VRAM cache behavior; 1024x1024

00:12:36 FLUX2 Dev quality: FP8 Scaled vs NVFP4 side-by-side comparisons

00:13:33 Speed chart: FLUX2 NVFP4 at ~193% of FP8 Scaled speed (about 2x)

00:14:10 Z Image Turbo quality: BF16 vs NVFP4 vs FP8 Scaled (quant method)

00:15:25 FLUX Dev: FP8 Scaled approx GGUF Q8; NVFP4 currently shows degradation

00:16:45 What precision means + model size examples (FP32/BF16/FP8 Scaled/NVFP4)

00:18:07 Practical recommendations: BF16 best; avoid FP16; raw FP8 vs FP8 Scaled

00:19:43 GGUF explained: block quant, slower runtime; use only when RAM is too low

00:21:36 Precision hierarchy recap + when to pick FP8 mixed/scaled over GGUF

00:21:58 SimplePod setup: register, add credits, open template link

00:22:31 Template config + RunPod price comparison (disk, ports, GPU selection)

00:24:02 Persistent volume: create + mount to /workspace

00:25:11 Launch RTX Pro 6000 pod; SimplePod vs RunPod pricing differences

00:26:29 Temp vs persistent disk: deleting instance wipes temp data - backup!

00:26:55 JupyterLab: upload zips, apt install zip, unzip ComfyUI in workspace

00:27:48 Run install script; unzip SwarmUI; start the model downloader

00:29:02 Downloader path for ComfyUI + folder structure; download Z Image Turbo bundle

00:30:08 Start ComfyUI; confirm CUDA 13 + Torch 2.9.1; connect via port 3000 Direct

00:31:08 Preset demo: Z Image Turbo Quality 1; fix VAE path; monitor VRAM

00:33:18 File Browser Direct: download outputs/models fast; upload files back

00:34:41 Restart server; install/start SwarmUI; open Cloudflared URL

00:36:26 SwarmUI backend: /workspace/ComfyUI/main.py + args; import presets

00:37:27 Download FLUX2 Core + NVFP4; share model paths between SwarmUI & ComfyUI

00:39:27 FLUX2 NVFP4 generation @2048x2048; VRAM usage + step speed

00:40:43 Cloud GPU pitfall: diagnosing a power-capped GPU

00:41:28 Resume: re-run template w/ volume; reconnect fast

00:45:02 Wrap-up: SimplePod pros (direct/secure, cheaper storage)

Video Transcription

  • 00:00:00 Greetings everyone. Today I am going to introduce our newest ComfyUI installer,

  • 00:00:04 which installs ComfyUI with the latest CUDA 13 and Torch 2.9.1, along with the latest Triton,

  • 00:00:13 SageAttention, FlashAttention, xFormers, InsightFace and DeepSpeed libraries. If you

  • 00:00:19 remember, NVIDIA published this chart showing extreme speed-ups with NVFP4,

  • 00:00:27 and now with CUDA 13 and the latest ComfyUI we are actually able to use these models. I have tested

  • 00:00:34 them. We are not getting quite that much speed, but we are still gaining significant speed-ups. I

  • 00:00:40 have compiled the latest libraries for all of the GPUs out there. You can see all the GPUs

  • 00:00:46 that our ComfyUI installer supports for FlashAttention, SageAttention and xFormers, which

  • 00:00:53 you need to run the models, on both Windows and Linux. I have compiled them for both.

  • 00:01:00 But this is not all. I have also compared the actual quality difference. This is the FLUX 2 Dev

  • 00:01:06 model, this is the Z Image Turbo model and this is the FLUX Dev model. I have compared the BF16,

  • 00:01:14 FP8 Scaled, GGUF Q8 and NVFP4 models' quality. So today I will show all of them. I have also

  • 00:01:23 added 4 very famous models to our model downloader: the FLUX 2 Dev NVFP4 model,

  • 00:01:30 the FLUX Dev Kontext NVFP4 model, the FLUX Dev NVFP4 model and the Z Image Turbo NVFP4 model. There is

  • 00:01:39 no NVFP8 yet, but I am hopefully expecting it soon. So when you download the models you will see them

  • 00:01:46 like this in your SwarmUI, or you will be able to use them in your ComfyUI as well.

  • 00:01:51 Additionally, I will introduce a new platform called SimplePod AI. This is

  • 00:01:58 like RunPod, but the prices are much better. For example, the RTX 5090 starts from 44 cents,

  • 00:02:07 while on RunPod it starts from 89 cents. This one also has a permanent storage system

  • 00:02:14 like RunPod, but at half of RunPod's price. It is also faster than

  • 00:02:19 RunPod. So I will show everything about SimplePod AI. Our installation scripts

  • 00:02:24 and our applications work right away on SimplePod AI, just as on RunPod.

  • 00:02:29 Moreover, with our SECourses Musubi Tuner application I have generated a quantized FP8 Scaled

  • 00:02:36 version of the FLUX Dev model. Why? Because I wanted to compare its quality: there

  • 00:02:44 is a lot of misinformation claiming that even GGUF Q6 is better than FP8 Scaled, but it is not true.

  • 00:02:52 If you have a properly quantized scaled model, it is equal to GGUF Q8 and even very close to BF16.

  • 00:03:02 So to clear up this issue I will also talk about quantization and explain what FP32,

  • 00:03:10 FP16, BF16, FP8 and GGUF are. Furthermore, we have 2 new text encoders for the Z Image Turbo models,

  • 00:03:19 which are Qwen 3 4-billion-parameter FP8 mixed and Qwen 3 4-billion-parameter FP4 mixed. These

  • 00:03:28 are very good text encoders for very low VRAM GPUs, so they will bring even further speed-ups.

  • 00:03:34 So I have updated the ComfyUI installation post with all the newer information. You need to

  • 00:03:41 download the latest ComfyUI zip file. The link is here, and it will also be in the description

  • 00:03:45 of the video. Download the latest version 73. We are no longer updating the CUDA 12.9 and Torch

  • 00:03:52 2.8 build, but I am still keeping it as the deprecated version 72 if you need it. I really

  • 00:03:58 recommend you read all the latest changes in the 10 January version 73 update. You don't need to have

  • 00:04:06 CUDA 13 installed on your system. You only need to have updated NVIDIA drivers. Don't forget that.

  • 00:04:13 So to update your existing ComfyUI, all you need to do is this: move the zip file into

  • 00:04:19 your previous installation, right click, and extract all the files into the same folder. You

  • 00:04:26 need to see an overwrite prompt. This is important: overwrite all the files. Then you need to

  • 00:04:31 enter the ComfyUI folder and delete your venv folder, the virtual environment folder. This is mandatory.

  • 00:04:39 Make sure that your ComfyUI is not running, otherwise you will not be able to delete it. So I am

  • 00:04:44 going to close my running instances so that I can delete the virtual environment

  • 00:04:49 folder. Once your virtual environment folder is deleted, all you need to do is double click the

  • 00:04:54 windows install or update ComfyUI.bat file. I recommend using Python 3.10. I am testing

  • 00:05:02 with it and I am using it; you need to have it installed yourself. So select option

  • 00:05:07 1 and hit enter. But we support all Python versions: 3.10, 3.11, 3.12 and 3.13.

  • 00:05:13 If you want to make a fresh installation, extract it into any drive; then again all you need to

  • 00:05:20 do is run the windows install or update ComfyUI.bat file and it will start a fresh installation, the same

  • 00:05:27 as updating after deleting the virtual environment. This update requires deleting the virtual environment.

  • 00:05:33 You won't need to do this again unless we change the CUDA or Torch version, so normally you don't need to

  • 00:05:40 delete your virtual environment. But you can do that; there is no harm in it. The virtual environment

  • 00:05:44 is 100% isolated, so it will not cause any data loss. Again, you can make a fresh installation, which I

  • 00:05:52 recommend: test it, then you can move your main installation to the newer version. The installation

  • 00:05:57 is super fast. It will take a few minutes depending on your computer and network speed.

  • 00:06:02 You see it is almost done on my computer, because we are now using uv for installing packages.

  • 00:06:08 Our ComfyUI installer is standalone, so you can use it on its own, but I prefer to

  • 00:06:15 use it with SwarmUI, which I will show. After installation you can run windows run GPU and it will

  • 00:06:21 start your latest version of ComfyUI. Let's see what features we are getting. This is my updated

  • 00:06:29 folder, not a fresh installation folder. You can see that it now found the ComfyUI Kitchen Backend. It

  • 00:06:35 supports everything: available true, available true, available true. It is disabling the ComfyUI

  • 00:06:41 Kitchen Backend Triton because it is using the eager and CUDA versions. This is automatic, but we have

  • 00:06:47 all 3 of them reporting available true, available true and available true. You see it supports dequantize

  • 00:06:53 NF4, dequantize per tensor, everything. My installer supports everything. It

  • 00:06:59 uses SageAttention by default. You see it has these backend quantizations as well.

  • 00:07:05 So if you want to add new arguments to your ComfyUI installation, edit this windows run

  • 00:07:12 GPU file with any text editor and change what it says there. For example, people have been getting stuck or hitting out-of-

  • 00:07:20 memory errors recently, so you have 2 options. You can use --cache-none. This unloads every

  • 00:07:28 model after it is executed, whether it is the text encoder or the diffusion model, so this will

  • 00:07:33 clear your RAM and VRAM 100%. Alternatively you can use --disable-smart-memory. This reverts

  • 00:07:41 to the older memory management, which uses less GPU memory. It is a little bit slower,

  • 00:07:47 but it will prevent you from getting stuck or hitting out-of-memory errors; a sketch of the edited launch line is shown below. So this is all about the ComfyUI

  • 00:07:54 installation. You can use it as usual. And we now support all 32 presets of SwarmUI. Hopefully

  • 00:08:03 I will make an LTX 2 preset as well, so just wait for me to do that. All you need to do is drag

  • 00:08:10 and drop the preset and it will work right away like this, with the auto model downloader.
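
For reference, a minimal sketch of what the edited launch line in the windows run GPU .bat might look like; the real file activates the virtual environment first, and this is illustrative, but the flags themselves are standard ComfyUI arguments (pick one of the two memory flags):

```bat
REM Illustrative launch line; keep whatever else your .bat already does
python main.py --use-sage-attention --disable-smart-memory

REM or, to fully unload every model after each execution:
python main.py --use-sage-attention --cache-none
```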

  • 00:08:17 Okay, so how do we use the newer NVFP4 models? We are going to download the latest SwarmUI model

  • 00:08:25 downloader zip file. This also contains the installer for SwarmUI, if you remember. Move it into your

  • 00:08:31 previous SwarmUI installation, or you can make a fresh installation; it is the same. So again, it

  • 00:08:38 is super important that you extract and overwrite all the files: extract here, overwrite all files. This

  • 00:08:44 is important. If you don't overwrite, you won't see the newer models. Then let's first download the

  • 00:08:49 newer models, so I am going to run the windows start download models app.bat file. I am assuming

  • 00:08:55 that you have downloaded the bundles previously. If you didn't download them previously, you need

  • 00:09:00 to download them to be able to use the newer models. For example, you can download bundles like this

  • 00:09:06 Z Image Turbo Core Bundle: just click download and it will download all the necessary models.

  • 00:09:12 Then you can download the newer NVFP4 models. These models bring a speed-up on the RTX 5000 series. So

  • 00:09:19 just click download and it will download all the models. Then you need to update your SwarmUI, so I

  • 00:09:25 will just run windows update SwarmUI and it will start it. If you are installing for the first time,

  • 00:09:30 just use windows install SwarmUI. Okay, it is updating. The update completed and it has started.

  • 00:09:37 The first thing you need to do is update your backend and point it to the new or

  • 00:09:43 updated ComfyUI backend, like this. I am using --use-sage-attention. You can also use other

  • 00:09:50 attentions or, as I just explained, you can use --disable-smart-memory or --cache-none.

  • 00:09:58 If you are getting stuck or getting out-of-VRAM errors, I recommend trying --disable-smart-memory

  • 00:10:04 first. You can compare which one works better for you, or you can use --cache-none. But

  • 00:10:09 --cache-none makes it load the models from the hard drive at every generation. --disable-smart-

  • 00:10:16 memory doesn't do that; however, --cache-none uses even less RAM. If you have very limited

  • 00:10:21 RAM you should use it. Once you have done this, you are ready to start. Again,

  • 00:10:26 you should update your presets if you haven't yet. Then Quick Tools, reset params to default.

  • 00:10:32 Let's make a demonstration with Z Image Turbo. This is 1536 by 1536 pixels: super fast

  • 00:10:42 car. First with the Z Image Turbo BF16 model. Let's generate 10 images, then I will show

  • 00:10:49 the Z Image Turbo NVFP4 model, so you will see the speed difference live while I am recording

  • 00:10:57 the video. Okay, the generations have started and this is the first one. You are watching it live.

  • 00:11:04 Look at the speed. This is the second generation, so let's see the speed: the second generation

  • 00:11:11 took 6.31 seconds. This is the third generation; it took 6.38 seconds. This is the fourth generation; it

  • 00:11:20 took 6.33 seconds. So it is super fast. But what about the NVFP4 model? I will just cancel this,

  • 00:11:29 select the NVFP4 model and hit generate. Let's see the speed. This will blow your

  • 00:11:36 mind. And let's also watch the VRAM usage. It is using 14 gigabytes of VRAM because it cached the

  • 00:11:43 text encoder as well, since I have the VRAM to spare. Don't worry, you can run this on GPUs with as little as 6 gigabytes.

  • 00:11:49 And look at the speed. I mean, these are almost instant, and these are 1536 by 1536 pixel images,

  • 00:11:56 not 1024. So it is taking 3 seconds to generate 2.25-megapixel images. These images are really

  • 00:12:05 bigger than 1024, and you see the speed. Let's try 1024 to show you the speed: 1024

  • 00:12:14 by 1024, and let's hit generate. I mean, look at the speed. They are like instant. You see? It

  • 00:12:20 is taking 1.19 seconds, 1.2 seconds. Look at the speed. The NVFP4 model is just amazing. But what

  • 00:12:28 about quality? So I have tested the quality, and let's see it. This is the FLUX 2 Dev model.

  • 00:12:36 Which preset did I use? For this model I used the FLUX 2 Quality 1 preset. You see this

  • 00:12:43 one. This is the highest quality preset. The left ones are FP8 mixed scaled and the right

  • 00:12:50 ones are NVFP4. So you see: left, right. Pretty good, pretty close. Left, right.

  • 00:12:58 The man changed, but very good quality. Left, right. This is very, very close. You see? Left,

  • 00:13:05 right. Almost the same. Actually NVFP4 renders the animal's fur better, if you ask my opinion. Left,

  • 00:13:12 right. Both of them are great. Left, right. Both of them are excellent. Left,

  • 00:13:18 right. Both of them are excellent. So with the FLUX 2 Dev model we don't lose any quality.

  • 00:13:25 But how much speed do we gain? When we look at our actual speed chart, compared to FP8 Scaled

  • 00:13:33 the NVFP4 runs at 193% of the speed. You see? So we are gaining roughly 100% speed: from 8.34 seconds per iteration down to 4.31

  • 00:13:49 seconds per iteration. So we get about a 100% speed gain, NVFP4 being 1.93x as fast, and there is no quality difference.

  • 00:13:58 We don't get as much speed gain as NVIDIA claims. I don't know how

  • 00:14:04 they made that chart, but we are gaining a massive amount of speed nevertheless.

  • 00:14:10 So the second comparison is the Z Image Turbo model, from BF16 to NVFP4, and on the right we have

  • 00:14:19 the Z Image Turbo FP8 Scaled. So let's see the quality difference. This is BF16, this is NVFP4

  • 00:14:27 and this is FP8 Scaled. You see the FP8 Scaled is almost the same as BF16: very high quality. This

  • 00:14:34 is BF16, this is NVFP4 and this is FP8 Scaled. You see FP8 Scaled is almost the same as BF16. Why?

  • 00:14:44 Because I am using the quantization methodology included in our SECourses Musubi Tuner. This is a very

  • 00:14:53 high quality quantization. We also automatically install the necessary nodes for you, so you don't spend

  • 00:14:59 any time making them work, and I am quantizing these models for you, but you can also do it yourself

  • 00:15:05 if you want to quantize any specific model to FP8 Scaled. So another example: this is BF16,

  • 00:15:11 this is NVFP4 and this is FP8 Scaled. Another example: this is BF16, this is NVFP4 and this

  • 00:15:20 is FP8 Scaled. All of them are amazing, if you ask my opinion, and these are just random samples.

  • 00:15:25 What about the FLUX Dev model? FLUX Dev is not very realistic as a base, as you know. So this is BF16,

  • 00:15:32 this is NVFP4, this is the FP8 Scaled which I made, and this is GGUF Q8.

  • 00:15:39 You see the GGUF Q8 and FP8 Scaled are almost the same quality. This is BF16, this is NVFP4,

  • 00:15:47 this is FP8 Scaled and this is GGUF Q8. Our FP8 Scaled is almost the same quality as BF16,

  • 00:15:56 and almost the same quality as GGUF Q8. So some people claim that GGUF is much

  • 00:16:02 better. No: FP8 Scaled has almost the same quality as GGUF Q8. However, FP8 Scaled is much faster.

  • 00:16:11 With the FLUX Dev model I think NVFP4 suffered some quality degradation. You see, this is BF16, this is NVFP4, this

  • 00:16:20 is FP8 Scaled and this is GGUF Q8. This is BF16, this is NVFP4. Yes, I can see some noise, some

  • 00:16:28 quality degradation. This is FP8 Scaled and this is GGUF Q8. So for FLUX Dev, NVFP4 is low quality at

  • 00:16:37 the moment; they need to improve it. However, for Z Image Turbo and FLUX 2 it is perfectly usable.

  • 00:16:45 So let's talk about precision. You always hear about precision, but what is it? AI models

  • 00:16:51 are made of billions of parameters, and these parameters are just numbers, like this one:

  • 00:17:00 3.15121435. So this is an FP32 weight, a single parameter weight in a model with billions of parameters.

  • 00:17:10 Normally, training is usually done at FP32 or BF16. However, we don't use the models

  • 00:17:17 at FP32, because they are massive at FP32. What do I mean by that? You see that the BF16

  • 00:17:25 of Z Image Turbo is 11.46 gigabytes. At FP32 it would be about twice that, around 22.9 gigabytes. FP8

  • 00:17:35 Scaled is 5.73 gigabytes and the NVFP4 model is 4.20 gigabytes. This is not half of the 5.73

  • 00:17:45 because NVFP4 is a little bit different; you can read about it in the NVIDIA developer blog if you

  • 00:17:52 are interested, but this is the size difference. So FP32 is the highest quality,

  • 00:17:58 highest precision, and it is not needed for these newer big models. BF16 is what I recommend as the

  • 00:18:07 highest quality option, because it is both fast and very high quality. You won't notice

  • 00:18:12 the difference between FP32 and BF16. FP16 I don't recommend: if there is a BF16 of the

  • 00:18:19 model, use it, and if there isn't, I usually make a BF16 of the model myself from the

  • 00:18:27 FP32 (a sketch of that conversion is below). FP16 is not as good as BF16 in any of the generative AI models that I have tested.
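
A minimal sketch of such an FP32 to BF16 conversion for a safetensors checkpoint; the filenames are illustrative and this is a generic conversion, not the exact script used in the video:

```python
import torch
from safetensors.torch import load_file, save_file

# Load the FP32 checkpoint, cast floating-point tensors to BF16, save it back.
state = load_file("model_fp32.safetensors")          # illustrative filename
state = {k: (v.to(torch.bfloat16) if v.is_floating_point() else v)
         for k, v in state.items()}
save_file(state, "model_bf16.safetensors")           # roughly half the file size
```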

  • 00:18:35 FP8 E4M3 is the default FP8. Now, this is low quality because it loses the ability

  • 00:18:42 to properly represent the base model weights; it is just a default conversion. I mean, look at this:

  • 00:18:50 normally the value is 3.15. It is represented as 3.15 in BF16 and 3.15 in FP16, but when it comes

  • 00:19:00 to FP8 it becomes 3.25. You see, this is a major precision error. However, when we do FP8 Scaled

  • 00:19:10 precision, especially with this new quantization method, it is done in a way that makes it

  • 00:19:18 much more representative of the original model. So if you see a raw FP8 E4M3 model, you can know that it is

  • 00:19:27 very low quality compared to BF16, or compared to GGUF Q8 or GGUF Q6. That is low quality.

  • 00:19:35 FP4 is very primitive: you see it lost all the precision and became 3.0. This is a major error. (The sketch below reproduces this rounding.)
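
Here is a minimal PyTorch sketch of the rounding described above, plus a generic per-tensor "scaled" FP8 scheme. It needs PyTorch 2.1+ for the float8 dtypes, and the scaled part is an illustrative simplification, not the exact Musubi Tuner implementation:

```python
import torch

# A single weight, as in the example above
w = torch.tensor(3.15, dtype=torch.float32)
print(w.to(torch.bfloat16).item())               # ~3.15625: BF16 stays very close
print(w.to(torch.float16).item())                # ~3.15039: FP16 too
print(w.to(torch.float8_e4m3fn).float().item())  # 3.25: raw FP8 E4M3 rounds coarsely

# A raw FP8 cast also flushes tiny weights to zero.
small = torch.tensor([3.15, 0.0008], dtype=torch.float32)
print(small.to(torch.float8_e4m3fn).float())     # tensor([3.2500, 0.0000])

# Per-tensor scaling maps the largest weight onto FP8's max normal value
# (448 for float8_e4m3fn), so the whole tensor lands in FP8's usable range.
scale = small.abs().max() / 448.0
restored = (small / scale).to(torch.float8_e4m3fn).float() * scale
print(restored)                                  # ~tensor([3.1500, 0.0008])
```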

  • 00:19:43 GGUF works differently from FP8 scaling, because GGUF is block based, and you see

  • 00:19:52 the block average error is 0.01 for GGUF Q8. That is why it is very high precision. With GGUF Q4 it

  • 00:20:01 becomes 0.15 on average, so you lose a significant amount of quality. Moreover, GGUF models are

  • 00:20:08 slower to run: at minimum you will get something like 20% lower speed, and it depends on the model and the

  • 00:20:16 GPU; sometimes it will be even more. So I recommend you not to use GGUF models. When should you

  • 00:20:23 use them? Let's say FP8 Scaled does not fit into your RAM. Not VRAM: RAM.

  • 00:20:31 Then you have to use a model with even lower VRAM and RAM requirements, as with the FLUX

  • 00:20:38 2 model: for FLUX 2 we have a low RAM preset which uses GGUF Q4, because it is half the size of the FP8 Scaled model.

  • 00:20:47 So don't use GGUF unless your RAM is not enough. Why? Because ComfyUI by default

  • 00:20:55 does block swapping, also called VRAM streaming. Therefore, as long as you have a

  • 00:21:01 sufficient amount of RAM, I recommend you always use a properly made FP8 Scaled. Sometimes you

  • 00:21:10 will see FP8 mixed. Mixed means that some of the parameters are quantized and some of them are not,

  • 00:21:17 i.e. kept as BF16 or FP32. These models are also good, so prefer them over GGUF models, because even

  • 00:21:26 if you do block swapping with your model, it will be faster than the GGUF model. Test it and

  • 00:21:32 you will see, and it will be better quality. So these are the precisions of a model. You

  • 00:21:36 lose precision like this: FP32 is the maximum, BF16 like this, FP16 like this, FP8 like this, but

  • 00:21:44 don't compare that with FP8 scaled or FP8 mixed precision. FP4 E5M2 like this, and GGUF like this. (A block quantization sketch follows below.)
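
To make the block idea concrete, here is a minimal sketch of block-based integer quantization in the spirit of GGUF Q8/Q4. It is an illustrative simplification, not llama.cpp's exact Q8_0 or Q4_0 layout:

```python
import torch

def block_quantize(w: torch.Tensor, block_size: int = 32, bits: int = 8) -> torch.Tensor:
    """Quantize each block of weights to signed ints with one shared scale, then dequantize."""
    qmax = 2 ** (bits - 1) - 1                              # 127 for Q8, 7 for Q4
    blocks = w.reshape(-1, block_size)
    scales = blocks.abs().max(dim=1, keepdim=True).values / qmax
    q = torch.round(blocks / scales).clamp(-qmax, qmax)     # what would be stored, with scales
    return (q * scales).reshape(w.shape)                    # reconstruction at load/run time

w = torch.randn(4096)
for bits in (8, 4):
    err = (w - block_quantize(w, bits=bits)).abs().mean().item()
    print(f"Q{bits} mean absolute error: {err:.4f}")        # Q8 tiny, Q4 much larger
```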

  • 00:21:52 Finally, in this tutorial I will introduce a new cloud service, SimplePod,

  • 00:21:58 which is like RunPod. What is the difference between SimplePod and RunPod? It works

  • 00:22:03 exactly the same as RunPod, so any of my RunPod tutorials will work with SimplePod as well.

  • 00:22:08 Please use this link to register; I appreciate it. After registering, go to your dashboard,

  • 00:22:15 go to billing and add some credits. It shows every spending as well.

  • 00:22:21 Then use this template link, click edit and use, and decide how much system disk you want. So

  • 00:22:31 I will show it first with the base system disk, then with a permanent storage disk. So I am

  • 00:22:38 going to set this to 100 gigabytes. This will be temporary; it will get deleted. It is going

  • 00:22:44 to use port 3000 for ComfyUI; you can also use it for SwarmUI. I will show both of them. Save and use.

  • 00:22:52 Then you need to pick your GPU. So what advantages does this platform have? It is much faster than RunPod

  • 00:23:00 and it is much cheaper. You see, this RTX 5090 is only 44 cents. Let's check it on RunPod.

  • 00:23:08 This txt file also contains the RunPod links, so let's use our link to open RunPod; I appreciate it. Then

  • 00:23:14 let's sign in. And for RunPod please use this new template; you need to use this one. Okay, let's double

  • 00:23:22 click it. It has selected the template correctly. And you see the RTX 5090 is 89 cents, so there is

  • 00:23:30 a massive difference: RunPod costs twice as much, and SimplePod is faster in both network and disk speed.

  • 00:23:39 Okay, I am going to rent this one. The template is set. If you want to change it again, just hit X,

  • 00:23:45 click edit and edit the template. You can add other ports. You can add your persistent

  • 00:23:51 storage, which I will show. Okay, save and use. Okay, I am ready, so I will click run.

  • 00:23:56 So this is my temporary-disk-based template: when I delete it, everything will be deleted.

  • 00:24:02 Let's also make a permanent storage. You see there is storage here, so I click this. I click

  • 00:24:07 add a new persistent volume. You see, currently this is the location. Let's name it tutorial, and

  • 00:24:14 let's make it 200 gigabytes. You can make it as big as you want, and you see this is only

  • 00:24:19 6 dollars per month. However, on RunPod, let's look at the network storage cost. I am going to pick Europe

  • 00:24:27 like this, and when I set this to 200 gigabytes it is 14 dollars per month.

  • 00:24:35 That is more than twice as expensive as SimplePod. Okay, let's save. To be able to

  • 00:24:41 use this template now, I will return to the documentation and open the template link again,

  • 00:24:48 and here click edit and use, and you need to select your persistent volume. You see, tutorial

  • 00:24:55 appeared. And for the mount point you need to set /workspace. Don't forget. Hopefully we will make

  • 00:25:00 this automatic so you won't need to set it, but currently this is mandatory.

  • 00:25:05 So it is going to use my persistent volume tutorial. Save and use. Now I need to pick my GPU.

  • 00:25:11 Let's look at the RTX Pro 6000. You see, this is only 72 cents per hour, and these are not spot

  • 00:25:20 prices. These are private server prices; they are not like community cloud, they are like secure

  • 00:25:26 cloud on RunPod. So this is only 72 cents. I am going to select this and run. Now you see this

  • 00:25:32 is my second running instance. When I go to my servers I see both of them. The first one is here,

  • 00:25:38 the second one is here. You can see that the second one has a 200-gigabyte volume disk;

  • 00:25:42 the first one doesn't, so it is using a temporary disk. Let's see the price on RunPod.

  • 00:25:47 On RunPod the RTX Pro 6000 price is 1.84 dollars, so it is almost 3 times as expensive

  • 00:25:58 as my RTX Pro 6000 on SimplePod. That is why I now recommend using SimplePod. And all of

  • 00:26:06 the RunPod scripts will work on SimplePod; you just need to follow these steps exactly the same way.

  • 00:26:13 Okay, the first server started. SimplePod has some advantages which I will show. So let's do the

  • 00:26:20 installation on the RTX Pro 6000, because after this point both of them are the same, with just one difference:

  • 00:26:29 when you delete your instance, this one loses all its data. So make sure to back up your

  • 00:26:36 data; I will just delete it. There is no stop button, so make sure you are using permanent storage

  • 00:26:42 if you want to keep your data. So I will just delete my instance. My second instance is

  • 00:26:48 here now, the one with permanent storage. You see it shows my volume, and I have mounted it as workspace.

  • 00:26:55 To connect to it we are going to use Jupyter. You see there is Jupyter, so click secure and it will

  • 00:27:00 open the JupyterLab interface as usual. So let's install ComfyUI here. All I need to do is

  • 00:27:07 drag and drop my zip file into here. When you right click there is no extract option yet; I am

  • 00:27:15 talking with the developer and hopefully they will add it. So to extract it we will open

  • 00:27:20 a new terminal, apt install zip, and click yes. You can also extract locally and drag and drop all the files

  • 00:27:28 into your folder instead. Then I run unzip on the ComfyUI zip file: I hit tab and it completed

  • 00:27:36 the name, then refresh. So everything is here. Hopefully we will have right-click unzip soon. (The terminal steps are sketched below.)
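
A sketch of those terminal steps; the archive name is illustrative (tab-complete the real one), and note that on Debian/Ubuntu images zip and unzip are separate packages, so you may need both:

```bash
apt install -y zip unzip
unzip ComfyUI_v73.zip        # illustrative archive name
```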

  • 00:27:42 Then I need to follow the RunPod SimplePod instructions for the installation. Copy this,

  • 00:27:48 open a terminal, paste and hit enter if you don't want to install the additional options.

  • 00:27:54 Remember, we set workspace as the mount point for the storage. This is important. So the installation will be

  • 00:28:01 really fast. You can follow it here. Let's see the speed live. Meanwhile, we can also download

  • 00:28:09 models or install SwarmUI if you want to use SwarmUI. Okay, it is installing. Look at the speed;

  • 00:28:15 it is amazing. We upgraded our installers. They now work perfectly on RunPod,

  • 00:28:21 MassCompute and SimplePod, all of them. So meanwhile let's upload our SwarmUI zip

  • 00:28:28 file here. Okay, it is uploaded. Now I need to unzip this too, so I will open a new terminal and

  • 00:28:35 unzip this one. Okay, it is unzipped. Let's refresh. Now let's start the model downloader.

  • 00:28:44 Let's make a demonstration with both ComfyUI and SwarmUI. So let's go to the RunPod model download

  • 00:28:50 instructions, copy this like this, open a terminal and paste. This will start the downloader.

  • 00:28:57 Okay, the downloader started. Let's open the link. The ComfyUI installation is almost completed, so I will

  • 00:29:02 download the Z Image Turbo model into ComfyUI first as a demonstration. So I need to enter my

  • 00:29:09 ComfyUI folder, select the models folder and copy its path. Then I paste it here

  • 00:29:17 with a slash (/) at the beginning. Don't forget this leading slash, otherwise it

  • 00:29:22 will not work. Then I select the ComfyUI folder structure. This is important. Then I select

  • 00:29:27 the Z Image Turbo Core Bundle like this. It is 20 gigabytes, so it will be really fast to

  • 00:29:32 download. Let's follow the download here. The download speed is almost 180 to 200 megabytes

  • 00:29:39 per second. It will download. The ComfyUI installation is almost done; it is installing the

  • 00:29:45 necessary libraries. But the speed is amazing compared to RunPod, because I compared them.

  • 00:29:50 If you are a Linux user, all you need to do is use the MassCompute install sh file.

  • 00:29:56 You can look at the MassCompute instructions, and when you execute this command it will install and

  • 00:30:02 work on your Linux machine. If you own a Linux machine, don't worry about that. Okay, ComfyUI has

  • 00:30:08 been installed on SimplePod. So how do we start it? Go to the bottom and select this command. Open a

  • 00:30:15 new terminal and paste it, and it will start ComfyUI. Meanwhile the model downloads are

  • 00:30:21 almost done. So you can look at the logs and you will see that it supports everything,

  • 00:30:27 every library. You see CUDA 13 with PyTorch version 2.9.1, and this is an NVIDIA RTX Pro

  • 00:30:35 6000 Blackwell Workstation Edition. Okay, it started locally. How do we connect to it? We

  • 00:30:42 are going to connect through port 3000. So go back to your SimplePod interface and

  • 00:30:48 you will see that port 3000 became available. So click direct. This direct connection works much faster

  • 00:30:55 than on RunPod. It is as if it were running on your own computer. Okay, it has started.

  • 00:31:00 Since I have downloaded the Z Image Turbo model bundle, it is almost done. I am going

  • 00:31:08 to generate some Z images. We have the presets, you see, inside this ComfyUI version 73 presets folder.

  • 00:31:15 Let's look at Z Image Turbo. Let's use the Z Image Turbo Quality 1. This one has an upscale step.

  • 00:31:21 Okay, it is loaded. Let's check whether all the models have downloaded. Almost done, not yet.

  • 00:31:26 And every file I create here will be permanently stored. When I go to storage I can see that I

  • 00:31:32 am using 25.9 gigabytes of disk space. Next time you run it, you will still run the install

  • 00:31:40 command; however, it will be much faster this time because we previously installed everything. You

  • 00:31:46 can even run the run command directly, but if you get errors, run the install command again.

  • 00:31:51 Okay, all the files downloaded, so let's refresh this. This is running on SimplePod, not on

  • 00:31:57 my computer. So now everything is auto-set. Let's generate 5 images from here and run.

  • 00:32:05 Okay, it didn't see the VAE, because these VAEs are sometimes downloaded for SwarmUI. You see

  • 00:32:11 there is a backslash problem; this is a preset saving problem. You just need to click this.

  • 00:32:17 Okay, it is fixed; run. You may get an error only with the VAE, because SwarmUI by default puts the VAE into a

  • 00:32:24 subfolder, but other than that it is ready to use right away. And it started generating

  • 00:32:30 already. Let's check nvidia-smi. So: pip install nvidia-smi, then nvidia-smi. We are using 21 gigabytes

  • 00:32:39 of VRAM. This workflow generates at 1536 by 1536 and upscales to 1920 pixels. So it will be

  • 00:32:47 really high resolution, high quality. You can see that it is doing the upscaling as well, with the best

  • 00:32:53 upscaler model. So we are going to get amazing quality images, and you can see the speed is

  • 00:32:59 just amazing. And we got the first image. It is saved in the outputs folder. You can also right

  • 00:33:05 click and save the image. All of the generated images and videos will be inside the output folder.

  • 00:33:11 So how can you download them? You can download them fast on SimplePod. Go to my servers, your

  • 00:33:18 interface, and go to file browser direct. I recommend this; it is super fast. This is where you can

  • 00:33:25 upload and download files directly to the server, and it is ultra fast. When I go to the workspace,

  • 00:33:32 this is where my files are. Double click, and it is inside ComfyUI, inside output. So when I

  • 00:33:39 select this folder I can click download. It will zip and download it instantly. It will be really

  • 00:33:45 fast. Just click keep. It is done. If it were a model training, it would be inside models. Let's

  • 00:33:51 download one of the models to see the speed. So: diffusion models, and Z Image Turbo. This is about

  • 00:33:57 12 gigabytes. I click the download icon, I click keep, and it starts downloading. You see, the download

  • 00:34:03 speed is amazing. You can directly download your trained models, your generated images and videos

  • 00:34:08 by using SimplePod's direct method. You can also upload files here; it is just the same.

  • 00:34:15 Click this upload icon in the selected folder. You can upload both folders and files. So let's

  • 00:34:21 upload a file, like the output zip file, and it will be uploaded instantly. So this is super fast with

  • 00:34:29 SimplePod. You can also use secure connections if you want, or direct. Direct is faster.

  • 00:34:34 So how do you use SwarmUI on SimplePod? I will show that now. If you click this icon it will restart

  • 00:34:41 your server. So it is restarting. It will flush my GPU and close the running ComfyUI.

  • 00:34:48 That is fine, since we have already installed it. As a next step I will show the installation of SwarmUI,

  • 00:34:54 and I will test it with FLUX 2 because this GPU is a beast, an RTX Pro 6000. Let's see the speed. The

  • 00:35:01 restart is happening, probably done. Yes, you see the VRAM usage has dropped. Now I will connect from

  • 00:35:07 Jupyter once again. So open the RunPod SwarmUI install instructions; it is the same for

  • 00:35:14 SimplePod. I will rename the file. So copy, open a new terminal, paste it. You need to start the

  • 00:35:22 installation first; then you can start the download, but make sure the installation has started and the SwarmUI folder

  • 00:35:29 has been cloned, otherwise it will not work. So once you see the SwarmUI folder, you can

  • 00:35:34 begin the model downloads at the same time. But let's just wait for the installation, because SwarmUI

  • 00:35:40 is ultra fast to install. Okay, it is almost done. We will see the

  • 00:35:45 Cloudflared URL here. Yes, it has started. Okay, localhost started. Now we are waiting for

  • 00:35:52 Cloudflared. Yes, here. Open the Cloudflared URL. Okay, one more time. If it doesn't open

  • 00:35:57 immediately, wait for a while; keep refreshing, keep clicking, and it will start like this.

  • 00:36:03 Click agree. Customize settings. Select your template. Next. Just install. Next. This is

  • 00:36:10 set to none because we are going to use our installed ComfyUI backend. This is mandatory; don't

  • 00:36:15 forget it. Don't use ComfyUI local; if you use ComfyUI local it will not work. Next. I am not

  • 00:36:20 going to download anything. Next. And yes, I am sure, install. The installation is done. Now I

  • 00:36:26 need to set my backend. So go to backends, add ComfyUI self-starting like this. Okay. Refresh

  • 00:36:32 so it fixes itself. Okay. Then pick your ComfyUI folder. Right click, copy path. Click this edit.

  • 00:36:39 Append a slash and main.py, so the path reads /workspace/ComfyUI/main.py. Add any extra

  • 00:36:48 arguments that you want, e.g. --use-sage-attention. This is the same as in the Windows installation,

  • 00:36:54 if you remember. Save. It will start on GPU ID 0; if you have multiple GPUs you

  • 00:36:59 can duplicate this multiple times and change the GPU ID each time. Now it will start the

  • 00:37:04 ComfyUI backend and it will become ready (a sketch of these backend values is below). We need to import our presets. Go to presets,

  • 00:37:12 import, choose from file and select the latest SwarmUI amazing preset. You can

  • 00:37:17 tick overwrite if you want, then import. It is all imported; click refresh.
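
For reference, the backend values end up looking roughly like this; the field labels are approximate (check your own SwarmUI Backends tab), while the path and argument are the ones used above:

```
Backend type: ComfyUI Self-Starting
  Start script: /workspace/ComfyUI/main.py
  Extra args:   --use-sage-attention
  GPU ID:       0
```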

  • 00:37:21 Now we need to download the model, so I will start the model downloader one more time: RunPod model

  • 00:37:27 download instructions. Since we restarted the server, open a new terminal. It is exactly the same as on

  • 00:37:32 RunPod, but I am also showing it on SimplePod because I recommend you

  • 00:37:37 start using SimplePod if you want cheaper prices. Okay, let's open the Gradio live link.

  • 00:37:43 Then I am going to start downloading the FLUX 2 bundle. By default our

  • 00:37:49 downloader automatically detects SwarmUI, you see? You don't need to enter a custom path; you

  • 00:37:55 only need to enter one if you are downloading into ComfyUI. So: FLUX 2 Core Bundle here, and download

  • 00:38:00 all models. You can follow the progress here. It will start and it will get faster. Remember,

  • 00:38:06 in the tutorial video how to use SwarmUI presets and workflows in ComfyUI, we

  • 00:38:12 explained how to use SwarmUI's models inside ComfyUI by using the extra_model_paths.yaml file,

  • 00:38:21 so you don't need duplicate models. You can keep all of them inside SwarmUI and

  • 00:38:25 use them in ComfyUI as well, or vice versa: keep them in ComfyUI and use them in SwarmUI. That

  • 00:38:32 tutorial video explains it perfectly; the link will be in the description of the video. (A sketch of such a yaml entry is below.)
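
For reference, a minimal sketch of an extra_model_paths.yaml entry that points ComfyUI at a SwarmUI models folder. The key names follow ComfyUI's bundled extra_model_paths.yaml.example, while the SwarmUI paths here are illustrative, so adjust them to your own install:

```yaml
swarmui:
    base_path: /workspace/SwarmUI/Models    # illustrative SwarmUI models root
    checkpoints: Stable-Diffusion           # subfolder names depend on your setup
    diffusion_models: diffusion_models
    vae: VAE
    loras: Lora
```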

  • 00:38:36 Remember, we are spending time setting up and downloading models right now, but all of it is

  • 00:38:41 being saved inside our permanent storage. So next time I start, everything will be immediately ready, which

  • 00:38:46 I will show once everything is done: how to start again later. Okay, so the FLUX 2 bundle download

  • 00:38:54 has been completed, but let's also download the NVFP4 of the FLUX 2 model, so click download. Remember,

  • 00:39:01 if you don't want to download a specific model of a bundle, you can just click the download buttons

  • 00:39:06 here. It will queue them and download them. This way you can skip any

  • 00:39:12 model download. So now it is downloading the NVFP4 version. After that we will be ready. Okay, the NVFP4

  • 00:39:20 also downloaded. Let's go back to our running Cloudflared SwarmUI. Go to models and refresh so it

  • 00:39:20 also downloaded. Let's go back to our running  Cloudflared SwarmUI. Go to models refresh so it  

  • 00:39:27 will see all the models. Then in the preset let's  select the FLUX High Quality Preset 1 because we  

  • 00:39:34 have the best GPU. Then Quick Tools reset params  to default and let's select the FLUX 2 Quality 1  

  • 00:39:41 because we have the best GPU. You see the default  resolution is 2048 to 2048. You can change it if  

  • 00:39:48 you want. Let's generate 16:9 image. Let's type  something simple. Super expensive sports car.  

  • 00:39:57 And let's generate 3 images. And I am going to  select the NVFP4 model and generate. Let's also  

  • 00:40:05 open nvidia-smi to see the VRAM usage. Okay it is  loading the model. You see without block swapping  

  • 00:40:12 and VRAM streaming it is using 43 gigabytes of  VRAM with NVFP4 model. It is also keeping probably  

  • 00:40:20 the text encoder in the VRAM as well. Maybe it  is in the RAM memory. It depends on how ComfyUI  

  • 00:40:26 is handling that. But our generation started.  Let's see the step speed. So I see 9.61 second  

  • 00:40:35 per IT. Yeah this is definitely interesting.  Its speed is not as fast as on my RTX 5090.  

  • 00:40:43 Okay, I found the reason. The reason is that this GPU is power capped at the moment. I will report

  • 00:40:52 this to the SimplePod developer. If there are such GPUs they need to discard them; they should

  • 00:41:01 not allow them to be used, because this GPU is currently power capped. This can happen on RunPod

  • 00:41:08 as well, so pay attention to your GPU's power cap. If they are capped like this, this one is capped at 250

  • 00:41:15 watts; that is why it is much slower. Yes, so this is why we are not getting the good speed. (A quick way to check is sketched below.)
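
A quick way to check for a power cap; these are standard nvidia-smi queries, available on any machine with the NVIDIA driver installed:

```bash
nvidia-smi --query-gpu=name,power.draw,power.limit,power.max_limit --format=csv
# or dump the full power section:
nvidia-smi -q -d POWER
```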

  • 00:41:21 Okay, now I will show you how to resume your work afterwards. So I will delete my

  • 00:41:28 instance, but all of my data will be kept inside my permanent storage. Confirm.

  • 00:41:35 Then again you need to pick the template, so return to the RunPod SimplePod instructions.

  • 00:41:41 Double click the template again. Click edit and use. Select the persistent volume from

  • 00:41:48 here, set the mount point to /workspace, and save and use. Then you will see the

  • 00:41:56 template selected here like before. And pick your GPU. Let's pick this one to see whether

  • 00:42:02 it will also be power capped or not. Okay, run. So now it will be very fast to resume

  • 00:42:09 our installed work. This is also the same on RunPod: when you are using permanent storage,

  • 00:42:14 everything works the same way there as well. So let's just wait for it to start our machine. It is

  • 00:42:20 starting. The console appeared. As it starts, the links will appear. Just wait until the Jupyter

  • 00:42:26 Lab link appears. Okay, Jupyter direct appeared, so let's open it. If you get connection warnings like

  • 00:42:33 this, just click continue. This can also happen on your PC if you use direct, because the direct link is

  • 00:42:40 HTTP, not secure HTTPS. So it depends on whether you want to use it or not. I prefer to use it.

  • 00:42:47 Then I will use the RunPod SwarmUI install instructions. I will rename it, by the way.

  • 00:42:52 I shouldn't need to reinstall, but if you need to for some reason, you can do it if you get errors.

  • 00:42:58 So it should start right away, because SwarmUI also automatically updates ComfyUI when it starts.

  • 00:43:04 So you see, I am always using the same command, whether I am resuming or installing for the first

  • 00:43:10 time. With other applications you would likewise just run the running part, not the installation part.

  • 00:43:16 I skipped the ComfyUI installation entirely. Okay, Cloudflared started. I sometimes need to wait about

  • 00:43:22 1 minute for it to become available. Just keep hitting F5 to refresh the page.

  • 00:43:28 Okay, it is starting. Nice. You see, the backend is automatically here. I can add another

  • 00:43:34 backend, because this one has 2 GPUs. Let's also open nvidia-smi to verify: nvidia-smi. Okay, it

  • 00:43:41 is not here; I need to pip install nvidia-smi, then nvidia-smi. And let's see the wattage, whether

  • 00:43:47 it is capped or not here. So I will do Quick Tools, reset params to default, presets Quality 1, select

  • 00:43:54 the NVFP4 model, type super fast car, and let's select this aspect ratio. Let's generate 3 images and let's

  • 00:44:04 see whether this one is also power capped or not. Hopefully they will remove such GPUs from

  • 00:44:11 their GPU pool. Okay, it is loading the model. You see, model loading is very fast on

  • 00:44:16 SimplePod compared to RunPod. On RunPod I sometimes wait 10 minutes just for model loading; their

  • 00:44:21 disk speed is very slow. And SimplePod is replying to our every request, so if you have any questions or

  • 00:44:28 any problems with them, just join our Discord channel and message me there. If you type SECourses

  • 00:44:34 Discord you will find our link like this. It will also be in the description of the video.

  • 00:44:39 Just join our channel, type in any channel, and you can mention me and I will hopefully reply.

  • 00:44:46 Okay, the generation started. Yeah, this GPU is also power capped for some reason. So this is

  • 00:44:53 how you resume and continue your work. This is how SimplePod works. So this is how you use ComfyUI,

  • 00:45:02 SwarmUI or any of my installers on SimplePod. They have a lot of advantages, like direct connections

  • 00:45:11 and secure connections, both of which work with any ports, and a permanent storage system that is

  • 00:45:17 much cheaper and also faster. So if you have any questions, always ask me. Hopefully see you later.

  • 00:45:23 You can ask me on Patreon, by email, on YouTube, wherever you want. On Discord too.
