NVFP4 With CUDA 13 Full Tutorial: 100% Speed Gain, Quality Comparison, and New Cheap Cloud SimplePod
Full tutorial link > https://www.youtube.com/watch?v=yOj9PYq3XYM
NVFP4 models have finally arrived in ComfyUI, and thus SwarmUI, with CUDA 13. NVFP4 models are literally 100%+ faster with minimal impact on quality. I have done grid quality comparisons to show you the difference between the NVFP4 versions of FLUX 2, Z Image Turbo and FLUX 1. To make CUDA 13 work, I have compiled Flash Attention, Sage Attention & xFormers for both Windows and Linux with all of the CUDA archs, to support literally all GPUs: the GTX 1650 series, the RTX 2000, 3000, 4000 and 5000 series, and more.
In this full tutorial, I will show you how to upgrade your ComfyUI, and thus SwarmUI, to use the latest CUDA 13 with the latest libraries and Torch 2.9.1. Moreover, our compiled libraries such as Sage Attention work with all models on all GPUs, including Qwen Image and Wan 2.2 models, without generating black images or videos. Hopefully LTX 2 presets and a tutorial are coming soon too. Finally, I introduce a new private cloud GPU platform called SimplePod, similar to RunPod. This platform has all the features of RunPod but is much faster and cheaper.
📂 Resources & Links:
ComfyUI Installers: [ https://www.patreon.com/posts/ComfyUI-Installers-105023709 ]
SimplePod: [ https://simplepod.ai/ref?user=secourses ]
SwarmUI Installer, Model Auto Downloader and Presets: [ https://www.patreon.com/posts/SwarmUI-Install-Download-Models-Presets-114517862 ]
How to Use SwarmUI Presets & Workflows in ComfyUI + Custom Model Paths Setup for ComfyUI & SwarmUI Tutorial: [ https://youtu.be/EqFilBM3i7s ]
SECourses Discord Channel for 24/7 Support: [ https://discord.com/invite/software-engineering-courses-secourses-772774097734074388 ]
NVIDIA NVFP4 Blog Post: [ https://developer.nvidia.com/blog/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference/ ]
⏱️ Video Chapters:
00:00:00 New ComfyUI installer (CUDA 13, Torch 2.9.1, Triton + attention libs)
00:00:19 NVFP4 speedup claims vs real tests; why CUDA 13 enables new models
00:00:34 Prebuilt FlashAttention/SageAttention/xFormers for many GPUs (Windows + Linux)
00:01:00 Quality roadmap: FLUX2 Dev, Z Image Turbo, FLUX Dev (BF16/FP8/GGUF/NVFP4)
00:01:23 Downloader adds NVFP4: FLUX2 Dev, FLUX Dev (Context/Dev), Z Image Turbo
00:01:51 SimplePod AI intro: RunPod-style pods, cheaper rates, permanent storage
00:02:36 Musubi Tuner FP8 Scaled: quality myths vs GGUF + why scaled matters
00:03:10 Quantization & precision (FP32/BF16/FP8/GGUF) + Qwen3 low-VRAM encoders
00:03:34 ComfyUI v73 zip: CUDA 13 included; update NVIDIA drivers only (v72 deprecated)
00:04:13 Update steps: overwrite zip, delete venv, run install/update .bat
00:05:02 Python: 3.10 recommended (supports 3.10-3.13); fresh vs update
00:06:02 New installer flow: uv speed, standalone use, backend libs detected
00:07:12 Stability flags: --cache-none vs --disable-smart-memory (OOM/stuck fixes)
00:07:54 SwarmUI presets: 32 presets supported; drag/drop + auto model downloader
00:08:25 Update SwarmUI model-downloader zip (extract + overwrite)
00:08:49 Download bundles/models (Z Image Turbo Core + NVFP4 options)
00:09:25 Update/launch SwarmUI; point to updated ComfyUI backend + set args
00:10:32 Live gen test: Z Image Turbo BF16 @1536x1536
00:11:29 Switch to NVFP4: VRAM cache behavior; 1024x1024
00:12:36 FLUX2 Dev quality: FP8 Scaled vs NVFP4 side-by-side comparisons
00:13:33 Speed chart: FLUX2 NVFP4 runs at ~193% of FP8 Scaled speed (about 1.9x)
00:14:10 Z Image Turbo quality: BF16 vs NVFP4 vs FP8 Scaled (quant method)
00:15:25 FLUX Dev: FP8 Scaled approx GGUF Q8; NVFP4 currently shows degradation
00:16:45 What precision means + model size examples (FP32/BF16/FP8 Scaled/NVFP4)
00:18:07 Practical recommendations: BF16 best; avoid FP16; raw FP8 vs FP8 Scaled
00:19:43 GGUF explained: block quant, slower runtime; use only when RAM is too low
00:21:36 Precision hierarchy recap + when to pick FP8 mixed/scaled over GGUF
00:21:58 SimplePod setup: register, add credits, open template link
00:22:31 Template config + RunPod price comparison (disk, ports, GPU selection)
00:24:02 Persistent volume: create + mount to /workspace
00:25:11 Launch RTX Pro 6000 pod; SimplePod vs RunPod pricing differences
00:26:29 Temp vs persistent disk: deleting instance wipes temp data - backup!
00:26:55 JupyterLab: upload zips, apt install zip, unzip ComfyUI in workspace
00:27:48 Run install script; unzip SwarmUI; start the model downloader
00:29:02 Downloader path for ComfyUI + folder structure; download Z Image Turbo bundle
00:30:08 Start ComfyUI; confirm CUDA 13 + Torch 2.9.1; connect via port 3000 Direct
00:31:08 Preset demo: Z Image Turbo Quality 1; fix VAE path; monitor VRAM
00:33:18 File Browser Direct: download outputs/models fast; upload files back
00:34:41 Restart server; install/start SwarmUI; open Cloudflared URL
00:36:26 SwarmUI backend: /workspace/ComfyUI/main.py + args; import presets
00:37:27 Download FLUX2 Core + NVFP4; share model paths between SwarmUI & ComfyUI
00:39:27 FLUX2 NVFP4 generation @2048x2048; VRAM usage + step speed
00:40:43 Cloud GPU pitfall: diagnosing a power-capped GPU
00:41:28 Resume: re-run template w/ volume; reconnect fast
00:45:02 Wrap-up: SimplePod pros (direct/secure, cheaper storage)
00:00:00 Greetings everyone. Today I am going to introduce our newest ComfyUI installer,
00:00:04 which installs ComfyUI with the latest CUDA 13 and Torch 2.9.1, along with the latest Triton,
00:00:13 SageAttention, FlashAttention, xFormers, InsightFace and DeepSpeed libraries. If you
00:00:19 remember, NVIDIA published this chart showing extreme speedups with NVFP4,
00:00:27 and now, with CUDA 13 and the latest ComfyUI, we are actually able to use these models. I have tested
00:00:34 them. We are not getting quite that much speed, but we are still gaining significant speedups. I
00:00:40 have compiled the latest libraries for all of the GPUs out there. You can see all the supported
00:00:46 GPUs that our ComfyUI installer covers for FlashAttention, SageAttention and xFormers, which
00:00:53 you need to run the models. I have compiled them for both Windows and Linux.
00:01:00 But this is not all. I have also compared the actual quality difference. This is the FLUX 2 Dev
00:01:06 model, this is the Z Image Turbo model and this is the FLUX Dev model. I have compared the quality of the BF16,
00:01:14 FP8 Scaled, GGUF Q8 and NVFP4 versions, so today I will show all of them. I have also
00:01:23 added 4 very popular models to our model downloader: the FLUX 2 Dev NVFP4 model,
00:01:30 the FLUX Dev Context NVFP4 model, the FLUX Dev NVFP4 model and the Z Image Turbo NVFP4 model. There is
00:01:39 no NVFP8 yet, but I am hopefully expecting it soon. When you download the models you will see them
00:01:46 like this in your SwarmUI, and you will be able to use them in your ComfyUI as well.
00:01:51 Additionally, I will introduce a new platform called SimplePod AI. This is
00:01:58 like RunPod, but the prices are much better. For example, the RTX 5090 starts from 44 cents;
00:02:07 on RunPod it starts from 89 cents. SimplePod also has a permanent storage system
00:02:14 like RunPod, but at half the price, and it is also faster than
00:02:19 RunPod. So I will show everything about SimplePod AI. Our installation scripts
00:02:24 and applications work right away on SimplePod AI, just as on RunPod.
00:02:29 Moreover, with our SECourses Musubi Tuner application I have generated a quantized FP8 Scaled
00:02:36 version of the FLUX Dev model. Why? Because I wanted to compare its quality. There
00:02:44 is a lot of misinformation claiming that even GGUF Q6 is better than FP8 Scaled, but that is not true.
00:02:52 A properly quantized scaled model is equal to GGUF Q8, and even very close to BF16.
00:03:02 So to clear up this issue I will also talk about quantization and explain what FP32,
00:03:10 FP16, BF16, FP8 and GGUF are. Furthermore, we have 2 new text encoders for the Z Image Turbo models:
00:03:19 Qwen 3 4 billion parameters FP8 mixed and Qwen 3 4 billion parameters FP4 mixed. These
00:03:28 are very good text encoders for very low VRAM GPUs, so they will bring even further speedups.
00:03:34 I have updated the ComfyUI installation post with all of the newer information. You need to
00:03:41 download the latest ComfyUI zip file. The link is here, and it will also be in the description
00:03:45 of the video. Download the latest version 73. We are not going to update the CUDA 12.9 and Torch
00:03:52 2.8 build anymore, but I am still keeping it as deprecated version 72 if you need it. I really
00:03:58 recommend you read all of the latest changes in the 10 January version 73 update. You don't need to have
00:04:06 CUDA 13 installed on your system. You only need updated NVIDIA drivers. Don't forget that.
00:04:13 To update your existing ComfyUI, all you need to do is this: move the zip file into
00:04:19 your previous installation folder, right click and extract all the files in the same folder. You
00:04:26 need to see an overwrite prompt. This is important: overwrite all the files. Then you need to
00:04:31 enter the ComfyUI folder and delete your venv (virtual environment) folder. This is mandatory.
00:04:39 Make sure that your ComfyUI is not running, otherwise you will not be able to delete it. So I am
00:04:44 going to close my running instances so that I can delete the virtual environment
00:04:49 folder. Once your virtual environment folder is deleted, all you need to do is double click
00:04:54 the windows install or update ComfyUI.bat file. I recommend using Python 3.10; I am testing
00:05:02 with it and using it. You need to have it installed yourself. So select option
00:05:07 1 and hit enter. But we support all Python versions from 3.10 through 3.13.
00:05:13 If you want to make a fresh installation, extract the zip into any drive; then again all you need to
00:05:20 do is run the windows install or update ComfyUI.bat file and it will start a fresh installation, the same
00:05:27 as updating after deleting the virtual environment. This particular update does require deleting the virtual environment.
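The manual update steps above (extract the zip over the old install, then delete the venv) can also be scripted. This is a minimal sketch, not part of the installer; the folder layout and the `update_comfyui` helper name are assumptions you should adapt to your own install:

```python
import shutil
import zipfile
from pathlib import Path

def update_comfyui(zip_path: str, install_dir: str) -> None:
    """Mirror the manual update: extract the release zip over the old
    install (overwriting files), then delete the venv folder so the
    install/update .bat rebuilds it for CUDA 13 / Torch 2.9.1."""
    root = Path(install_dir)
    with zipfile.ZipFile(zip_path) as z:
        z.extractall(root)               # overwrites existing files in place
    venv = root / "ComfyUI" / "venv"     # assumed layout: ComfyUI/venv
    if venv.exists():
        shutil.rmtree(venv)              # mandatory when CUDA/Torch changed
```

Make sure ComfyUI is closed before running something like this; as noted above, the venv cannot be deleted while it is in use.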
00:05:33 You don't need to do this unless we change the CUDA or Torch version again, so normally you don't need to
00:05:40 delete your virtual environment. But you can do it; there is no harm in it. The virtual environment
00:05:44 is 100% isolated and deleting it will not cause any data loss. Again, you can make a fresh installation, which I
00:05:52 recommend, test it, and then move your main installation to the newer version. The installation
00:05:57 is super fast. It will take a few minutes depending on your computer and network speed.
00:06:02 You can see it is almost done on my computer, because we are now using uv for installing packages.
00:06:08 Our ComfyUI installer is standalone, so you can use it on its own, but I prefer to
00:06:15 use it with SwarmUI, which I will show. After installation you can run the windows run GPU file and it will
00:06:21 start your latest version of ComfyUI. Let's see what features we are getting. This is my updated
00:06:29 folder, not a fresh installation. You can see that it now found the ComfyUI Kitchen Backend. It
00:06:35 supports everything: available true, available true, available true. It is disabling the ComfyUI
00:06:41 Kitchen Backend Triton because it is using the eager and CUDA versions. This is automatic, but we have
00:06:47 all 3 of them showing available true. You can see it supports dequantize
00:06:53 NF4, dequantize per tensor, everything. My installer supports everything. It
00:06:59 uses SageAttention by default. You can see it has these backend quantizations as well.
00:07:05 If you want to add new arguments to your ComfyUI installation, edit this windows run
00:07:12 GPU file with any text editor and change what is set there. For example, people have been getting stuck or hitting out
00:07:20 of memory errors recently, so you have 2 options. You can use --cache-none. This unloads every
00:07:28 model after it is executed, whether it is the text encoder or the diffusion model. This will
00:07:33 clear your RAM and VRAM 100%. Alternatively, you can use --disable-smart-memory. This reverts
00:07:41 to the older memory management. It uses less GPU memory and is a little bit slower,
00:07:47 but it will prevent you from getting stuck or hitting out of memory errors. So that is all about the ComfyUI
00:07:54 installation. You can use it as usual. And now we support all 32 presets of SwarmUI. Hopefully
00:08:03 I will make an LTX 2 preset as well, so just wait for me to do that. All you need to do is drag
00:08:10 and drop the preset and it will work right away, like this, with the auto model downloader.
00:08:17 Okay, so how do we use the newer NVFP4 models? We are going to download the latest SwarmUI model
00:08:25 downloader zip file. This also contains the SwarmUI installer, if you remember. Move it into your
00:08:31 previous SwarmUI installation, or you can make a fresh installation; it is the same. Again, it
00:08:38 is super important that you extract and overwrite all the files: Extract here, overwrite all files. This
00:08:44 is important. If you don't overwrite, you won't see the newer models. Then, first, let's download the
00:08:49 newer models, so I am going to run the windows start download models app.bat file. I am assuming
00:08:55 that you have downloaded the bundles previously. If you didn't, you need
00:09:00 to download them to be able to use the newer models. For example, you can download bundles like this
00:09:06 Z Image Turbo Core Bundle. Just click download and it will download all the necessary models.
00:09:12 Then you can download the newer NVFP4 models. These models bring a speedup for the RTX 5000 series. So
00:09:19 just click download and it will download all the models. Then you need to update your SwarmUI, so I
00:09:25 will just run windows update SwarmUI and it will start it. If you are installing for the first time,
00:09:30 just use windows install SwarmUI. Okay, it is updating. The update completed and SwarmUI has started.
00:09:37 The first thing you need to do is update your backend and point it to the new or
00:09:43 updated ComfyUI backend, like this. I am using --use-sage-attention. You can also use other
00:09:50 attentions, or, as I just explained, you can use disable smart memory or cache none.
00:09:58 If you are getting stuck or getting out of VRAM errors, I recommend trying disable smart memory
00:10:04 first. You can compare which one works better for you, or you can use cache none. But
00:10:09 cache none makes it load models from the hard drive at every generation. Disable smart
00:10:16 memory doesn't do that; however, cache none uses even less RAM. If you have very limited
00:10:21 RAM you should use it. Once you have done this, you are ready to start. Again,
00:10:26 you should update your presets if you haven't yet. Then use Quick Tools, reset params to default.
00:10:32 Let's make a demonstration with Z Image Turbo. This is 1536 by 1536 pixels: "Super fast
00:10:42 car". First with the Z Image Turbo BF16 model. Let's generate 10 images, then I will show
00:10:49 the Z Image Turbo NVFP4 model, so you will see the speed difference live while I am recording
00:10:57 the video. Okay, the generations started, so this is the first generation. You are watching it live.
00:11:04 Look at the speed. This is the second generation. Let's see the speed: the second generation
00:11:11 took 6.31 seconds. This is the third generation; it took 6.38 seconds. This is the fourth generation; it
00:11:20 took 6.33 seconds. So it is super fast. But what about the NVFP4 model? I will just cancel this,
00:11:29 select the NVFP4 model and hit generate. Let's see the speed. This will blow your
00:11:36 mind. And let's also look at the VRAM usage. It is using 14 gigabytes of VRAM because it cached the
00:11:43 text encoder as well, since I have the VRAM. Don't worry, you can run this on GPUs with as little as 6 gigabytes.
00:11:49 And look at the speed. I mean, these are almost instant, and these are 1536 by 1536 pixel images,
00:11:56 not 1024. It is taking 3 seconds to generate images with 2.25x the pixels of 1024x1024. These images are really
00:12:05 bigger than 1024, and you see the speed. Let's try 1024 to show you the speed. 1024 by
00:12:14 1024, and let's hit generate. I mean, look at the speed. They are like instant. You see? It
00:12:20 is taking 1.19 seconds, 1.2 seconds. Look at the speed. The NVFP4 model is just amazing. But what
00:12:28 about quality? I have tested the quality, so let's see it. This is the FLUX 2 Dev model.
00:12:36 Which preset did I use? For this model I used the FLUX 2 Quality 1 preset, you see this
00:12:43 one. This is the highest quality preset. The left images are FP8 mixed scaled and the right
00:12:50 ones are NVFP4. So you see: left, right. Pretty good, pretty close. Left, right.
00:12:58 The man changed, but very good quality. Left, right. This is very, very close. You see? Left,
00:13:05 right. Almost the same. Actually, NVFP4 renders the animal's fur better, if you ask my opinion. Left,
00:13:12 right. Both of them are great. Left, right. Both of them are excellent. Left,
00:13:18 right. Both of them are excellent. So with the FLUX 2 Dev model we don't lose any quality.
00:13:25 But how much speed do we gain? When we look at our actual speed chart, compared to FP8 Scaled
00:13:33 the NVFP4 runs at about 193% of the speed. You see? So we are gaining roughly 100% speed: from 8.34 seconds
00:13:49 per iteration down to 4.31 seconds per iteration. So we get about a 100% speed gain, and there is no quality difference.
00:13:58 We don't get as much speed gain as NVIDIA claims. I don't know how
00:14:04 they made that chart, but we are gaining a massive amount of speed nevertheless.
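The speed figures above reduce to a simple ratio; here is a quick sanity check of the seconds-per-iteration numbers from the chart:

```python
# Seconds per iteration from the speed chart: FP8 Scaled vs NVFP4.
old, new = 8.34, 4.31
speedup = old / new
print(f"{speedup:.2f}x as fast, i.e. {100 * (speedup - 1):.0f}% faster")
# -> 1.94x as fast, i.e. 94% faster
```

So the chart's "193%" is relative speed (NVFP4 runs at ~193% of the FP8 Scaled rate), which is the same thing as the roughly 100% gain mentioned in the video.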
00:14:10 The second comparison is the Z Image Turbo model, from BF16 to NVFP4, and on the right we have
00:14:19 the Z Image Turbo FP8 Scaled. Let's see the quality difference. This is BF16, this is NVFP4
00:14:27 and this is FP8 Scaled. You see the FP8 Scaled is almost the same as BF16. Very high quality. This
00:14:34 is BF16, this is NVFP4 and this is FP8 Scaled. You see FP8 Scaled is almost the same as BF16. Why?
00:14:44 Because I am using the quantization methodology included in our SECourses Musubi Tuner. This is a very
00:14:53 high quality quantization. We also automatically install the necessary nodes for you, so you don't spend
00:14:59 any time making them work, and I am quantizing these models for you. But you can also do it yourself
00:15:05 if you want to quantize any specific model into FP8 Scaled. Another example: this is BF16,
00:15:11 this is NVFP4 and this is FP8 Scaled. Another example: this is BF16, this is NVFP4 and this
00:15:20 is FP8 Scaled. All of them are amazing, if you ask my opinion, and these are just random samples.
00:15:25 What about the FLUX Dev model? FLUX Dev is not very realistic as a base, as you know. So this is BF16,
00:15:32 this is NVFP4, this is the FP8 Scaled which I made, and this is GGUF Q8.
00:15:39 You see the GGUF Q8 and FP8 Scaled are almost the same quality. This is BF16, this is NVFP4,
00:15:47 this is FP8 Scaled and this is GGUF Q8. Our FP8 Scaled is almost the same quality as BF16;
00:15:56 this is FP8 Scaled, and it is almost the same quality as GGUF Q8. Some people claim that GGUF is much
00:16:02 better. No. FP8 Scaled has almost the same quality as GGUF Q8. However, FP8 Scaled is much faster.
00:16:11 With the FLUX Dev model I think NVFP4 suffered some quality degradation. You see: this is BF16, this is NVFP4, this
00:16:20 is FP8 Scaled and this is GGUF Q8. This is BF16, this is NVFP4. Yes, I can see some noise, some
00:16:28 quality degradation. This is FP8 Scaled and this is GGUF Q8. So for FLUX Dev, NVFP4 is low quality at
00:16:37 the moment; they need to upgrade it. However, for Z Image Turbo and FLUX 2 the NVFP4 models are perfectly usable.
00:16:45 So let's talk about precision. You always hear about precision, but what is it? AI models
00:16:51 are made of billions of parameters, and these parameters are just numbers, like this one: 3.15121435.
00:17:00 This is an FP32 weight: a single parameter weight in a model with billions of
00:17:10 parameters. Training is usually done at FP32 or BF16. However, we don't use the models
00:17:17 in FP32, because they are massive in FP32. What do I mean by that? You can see that the BF16
00:17:25 version of Z Image Turbo is 11.46 gigabytes. In FP32 it would be 20.9 gigabytes. The FP8
00:17:35 Scaled version is 5.73 gigabytes and the NVFP4 model is 4.20 gigabytes. This is not half of 5.73,
00:17:45 because NVFP4 is a little bit different. You can read about it in the NVIDIA developer blog if you
00:17:52 are interested, but this is the size difference. So FP32 is the highest quality,
00:17:58 highest precision, and not needed for these newer big models. BF16 is what I recommend as the
00:18:07 highest quality option, because it is both fast and very high quality. You won't notice
00:18:12 the difference between FP32 and BF16. FP16 I don't recommend, so if there is a BF16 of a
00:18:19 model, use it, and if there isn't, I usually make a BF16 of the model myself from
00:18:27 FP32. FP16 is not as good as BF16 in any of the generative AI models that I have tested.
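The checkpoint sizes quoted above follow almost directly from bytes per parameter. A rough calculation, using the 11.46 GB BF16 file to estimate the parameter count (an approximation; real files also carry scale tensors and metadata, and `checkpoint_gb` is a hypothetical helper, not anything from the installer):

```python
def checkpoint_gb(n_params: float, bits_per_weight: float) -> float:
    # Rough checkpoint size: parameters x bits per weight; ignores the
    # extra scale tensors and metadata that real files also contain.
    return n_params * bits_per_weight / 8 / 1e9

n_params = 11.46e9 / 2   # BF16 stores 2 bytes/param -> ~5.73B parameters
print(f"FP8:   {checkpoint_gb(n_params, 8):.2f} GB")  # matches the 5.73 GB file
print(f"4-bit: {checkpoint_gb(n_params, 4):.2f} GB")  # bare 4-bit baseline; the
# real NVFP4 file is 4.20 GB because NVFP4 also stores per-block scales
```

This also explains why the NVFP4 file is "not half" of the FP8 one: the per-block scale data adds overhead on top of the bare 4-bit weights.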
00:18:35 FP8 E4M3 is the default FP8. This is low quality because it loses the ability
00:18:42 to properly represent the base model weights. It is just a default conversion. I mean, look at this:
00:18:50 normally the value is 3.15. It is represented as 3.15 in BF16 and 3.15 in FP16, but when it comes
00:19:00 to FP8 it becomes 3.25. You see, this is a major precision error. However, when we do FP8 Scaled
00:19:10 precision, especially with this new quantization method, it is done in a way that stays
00:19:18 much more representative of the original model. So if you see raw FP8 E4M3, you can know that it is
00:19:27 very low quality compared to BF16 or compared to GGUF Q8 or GGUF Q6. That is low quality.
00:19:35 FP4 is very primitive. You see, it lost all the precision and became 3.0. This is a major error.
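The 3.15 example above can be reproduced with a toy rounding function. This is an illustration only: it quantizes just the mantissa and ignores the exponent range, subnormals and overflow of the real formats, so it is not an actual FP8/FP4 converter:

```python
import math

def round_mantissa(x: float, mant_bits: int) -> float:
    # Keep `mant_bits` fractional mantissa bits (toy model of the format).
    if x == 0:
        return 0.0
    sign = math.copysign(1.0, x)
    e = math.floor(math.log2(abs(x)))
    step = 2 ** mant_bits
    m = round(abs(x) / 2**e * step) / step   # mantissa rounded within [1, 2)
    return sign * m * 2**e

print(round_mantissa(3.15, 10))  # FP16, 10 bits: 3.150390625 (~3.15)
print(round_mantissa(3.15, 7))   # BF16,  7 bits: 3.15625 (~3.16)
print(round_mantissa(3.15, 3))   # FP8 E4M3, 3 bits: 3.25
print(round_mantissa(3.15, 1))   # FP4, 1 bit: 3.0
```

With only 3 mantissa bits, 3.15 lands on 3.25, and with 1 bit it collapses to 3.0, exactly the errors described above. Scaled FP8 mitigates this by storing weights together with scale factors chosen to keep the quantized values representative of the originals.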
00:19:43 GGUF works differently compared to FP8 scaling, because GGUF is block based, and you can see
00:19:52 the block average error is 0.01 for GGUF Q8. That is why it is very high precision. With GGUF Q4 it
00:20:01 becomes 0.15 on average, so you lose a significant amount of quality. Moreover, GGUF models are
00:20:08 slower to run; at minimum you will get around 20% slower speed. It depends on the model and the
00:20:16 GPU; sometimes it will be even more. So I recommend that you not use GGUF models. When
00:20:23 should you use them? Let's say FP8 Scaled does not fit into your RAM. Not VRAM: RAM.
00:20:31 Then you have to use a model that requires even less VRAM and RAM, as with the FLUX 2 model. For FLUX
00:20:38 2 we have a low RAM preset which uses GGUF Q4, because it is half the size of the FP8 Scaled model.
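The block-based idea can be sketched in a few lines. This is an illustrative absmax block quantizer, not the actual GGUF format (the real Q4/Q8 variants use more elaborate layouts), but it shows why more bits per weight give a much smaller per-block error:

```python
import random

def block_quantize(weights, bits, block_size=32):
    # Per-block absmax quantization: each block stores one float scale
    # plus `bits`-bit signed integers (illustrative, not real GGUF).
    qmax = 2 ** (bits - 1) - 1
    out = []
    for i in range(0, len(weights), block_size):
        blk = weights[i:i + block_size]
        scale = max(max(abs(v) for v in blk) / qmax, 1e-12)
        out.extend(round(v / scale) * scale for v in blk)
    return out

random.seed(0)
w = [random.gauss(0.0, 1.0) for _ in range(4096)]
for bits in (8, 4):
    q = block_quantize(w, bits)
    err = sum(abs(a - b) for a, b in zip(w, q)) / len(w)
    print(f"Q{bits}: mean abs error {err:.4f}")
```

The 8-bit error comes out roughly an order of magnitude smaller than the 4-bit error, mirroring the trend of the 0.01 vs 0.15 block-error figures mentioned above.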
00:20:47 So don't use GGUF unless your RAM is not enough. Why? Because ComfyUI by default
00:20:55 does block swapping, also called VRAM streaming. Therefore, as long as you have a
00:21:01 sufficient amount of RAM, I recommend you always use a properly made FP8 Scaled model. Sometimes you
00:21:10 will see FP8 mixed. Mixed means that some of the parameters are quantized and some of them are not,
00:21:17 i.e. they are kept as BF16 or FP32. These models are also good, so prefer them over GGUF models, because even
00:21:26 if you do block swapping with your model it will be faster than the GGUF model. Test it and
00:21:32 you will see, and it will be better quality. So these are the precisions of the models. You
00:21:36 lose precision like this: FP32 is the maximum, then BF16, then FP16, then FP8, but
00:21:44 don't compare that with FP8 scaled or FP8 mixed precision. Then FP4, and the GGUF variants are like this.
00:21:52 Finally, in this tutorial I will introduce a new cloud service, SimplePod,
00:21:58 which is like RunPod. What is the difference between SimplePod and RunPod? It works
00:22:03 exactly the same as RunPod, so any of my RunPod tutorials will work with SimplePod as well.
00:22:08 Please use this link to register; I appreciate that. After registering, go to your dashboard,
00:22:15 go to billing and add some credits. It shows every spending item as well.
00:22:21 Then use this template link, click edit and use, and decide how much system disk you want.
00:22:31 I will show first with the base system disk, then with a permanent storage disk. I am
00:22:38 going to set this to 100 gigabytes. This will be temporary; it will get deleted. It is going
00:22:44 to use port 3000 for ComfyUI; you can also use it for SwarmUI. I will show both of them. Save and use.
00:22:52 Then you need to pick your GPU. So what advantages does this platform have? It is much faster than RunPod
00:23:00 and it is much cheaper. You see, this RTX 5090 is only 44 cents. Let's check it on RunPod.
00:23:08 This txt file also contains the RunPod link, so let's use our link to open RunPod; I appreciate that. Then
00:23:14 let's sign in. For RunPod please use this new template; you need to use this one. Okay, let's double
00:23:22 click it. It has selected the template correctly. And you see the RTX 5090 is 89 cents. So there is
00:23:30 a massive difference: RunPod is over 100% more expensive, and SimplePod is faster in both network and disk.
00:23:39 Okay, I am going to rent this one. The template is set. If you want to change it again, just click X,
00:23:45 then click edit and edit the template. You can add other ports. You can add your persistent
00:23:51 storage, which I will show. Okay, save and use. I am ready, so I will click run.
00:23:56 So this is my temporary-disk based template. When I delete it, everything will be deleted.
00:24:02 Let's also set up permanent storage. You see there is storage here, so I click it. I click
00:24:07 add a new persistent volume. You see, this is currently the location. Let's name it tutorial, and
00:24:14 let's make it 200 gigabytes. You can make it as big as you want, and you see this is only
00:24:19 6 dollars per month. However, on RunPod, let's look at the network storage cost. I am going to pick Europe
00:24:27 like this, and when I set this to 200 gigabytes it is 14 dollars per month.
00:24:35 That is more than twice as expensive as SimplePod. Okay, let's save. To be able to
00:24:41 use this template now, I will return to the documentation and open the template link again,
00:24:48 and here click edit and use, and here you need to select your persistent volume. You see, tutorial
00:24:55 appeared. And for the mount point you need to set workspace. Don't forget. Hopefully we will make
00:25:00 this automatic so you won't need to set workspace yourself, but currently this is mandatory.
00:25:05 So it is going to use my persistent volume, tutorial. Save and use. Now I need to pick my GPU.
00:25:11 Let's look at the RTX Pro 6000. You see, this is only 72 cents per hour, and these are not spot
00:25:20 prices. These are private server prices, not like community cloud; they are like the secure
00:25:26 cloud on RunPod. So this is only 72 cents. I am going to select it and run. Now you see this
00:25:32 is my second running instance. When I go to my servers I will see both of them. The first one is here,
00:25:38 the second one is here. You can see that the second one has a 200 gigabyte volume disk.
00:25:42 The first one doesn't, so it is using a temporary disk. Let's see the price on RunPod.
00:25:47 On RunPod the RTX Pro 6000 price is 1.84 dollars, so it is about 2.5 times as expensive
00:25:58 as my RTX Pro 6000 on SimplePod. That is why I now recommend using SimplePod. And all of
00:26:06 the RunPod scripts will work on SimplePod; you just need to follow these steps exactly the same way.
00:26:13 Okay, the first server started. SimplePod has some advantages which I will show. Let's do the
00:26:20 installation on the RTX Pro 6000, because after this point both of them are the same, with one difference:
00:26:29 when you delete your instance, this one loses all of its data. So make sure to back up your
00:26:36 data; I will just delete it. There is no stop button, so make sure you are using permanent storage
00:26:42 if you want to keep your data. So I will just delete my instance. My second instance is
00:26:48 here, which has permanent storage. You see, it shows my volume and I have set it as workspace.
00:26:55 To connect to it we are going to use Jupyter. You see there is Jupyter, so click secure and it will
00:27:00 open the JupyterLab interface as usual. Let's install ComfyUI here. All I need to do is
00:27:07 drag and drop my zip file into here. When you right click, there is no extract option yet; I am
00:27:15 talking with the developers, and hopefully they will add it. So to extract it we will open
00:27:20 a new terminal, apt install zip, and click yes. You can also extract the files locally and drag and drop
00:27:28 them into your folder. Then I will run unzip on the ComfyUI zip file. I hit tab and it completed
00:27:36 the name, then refresh. Everything is here. Hopefully we will have right click and unzip soon.
00:27:42 Then I need to follow the RunPod/SimplePod instructions for installation. Copy this,
00:27:48 open a terminal, paste it and hit enter if you don't want to install the additional options.
00:27:54 Remember, we set workspace as the mount point for the storage. This is important. The installation will be
00:28:01 really fast. You can follow it here. Let's watch the speed live. Meanwhile, we can also download
00:28:09 models or install SwarmUI if you want to use it. Okay, it is installing. Look at the speed;
00:28:15 it is amazing. We upgraded our installers, and they now work perfectly on RunPod,
00:28:21 MassCompute and SimplePod, all of them. Meanwhile, let's upload our SwarmUI zip
00:28:28 file here. Okay, it is uploaded. Now I need to unzip it. I will open a new terminal and
00:28:35 unzip this one. Okay, it is unzipped. Let's refresh. Now let's start the model downloader.
00:28:44 Let's make a demonstration with both ComfyUI and SwarmUI. Go to the RunPod model download
00:28:50 instructions, copy this, open a terminal and paste. This will start the downloader.
00:28:57 Okay, the downloader started. Let's open the link. The ComfyUI installation is almost complete, so I will
00:29:02 download the Z Image Turbo model as a demonstration into ComfyUI first. I need to enter my
00:29:09 ComfyUI folder, select the models folder and copy its path like this. Then I will paste it here
00:29:17 with a slash (/) added at the beginning. Don't forget this leading slash, otherwise it
00:29:22 will not work. Then I will select the ComfyUI folder structure. This is important. Then I will select
00:29:27 the Z Image Turbo Core Bundle like this. It is 20 gigabytes, so it will be really fast to
00:29:32 download. Let's follow the download here. The download speed is almost 180 to 200 megabytes
00:29:39 per second. It will download. The ComfyUI installation is almost done; it is installing the
00:29:45 necessary libraries. But the speed is amazing compared to RunPod, because I compared them.
-
00:29:50 If you are a Linux user, all you need to do is use the MassCompute install sh file.
00:29:56 Look at the MassCompute instructions; when you execute that command, it will install and
00:30:02 work on your Linux machine, so don't worry if you own one. Okay, ComfyUI has
00:30:08 been installed on SimplePod. How do we start it? Go to the bottom and select this command, open a
00:30:15 new terminal, paste it, and it will start ComfyUI. Meanwhile, the model downloads are
00:30:21 almost done. If you look at the logs, you will see that everything is supported,
00:30:27 every library. You can see CUDA 13 with PyTorch version 2.9.1, and this is an NVIDIA RTX Pro
00:30:35 6000 Blackwell Workstation Edition. Okay, it started locally. How do we connect to it? We
00:30:42 are going to connect via port 3000. Go back to your SimplePod interface and
00:30:48 you will see that port 3000 became available. Click direct. This direct connection works much faster
00:30:55 than RunPod's; it is as if it were running on your own computer. Okay, it has started.
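If you want to double-check what the startup logs reported (CUDA 13 with PyTorch 2.9.1 in my case), here is a minimal sketch you can paste into a Jupyter terminal; it assumes `python3` is on PATH and degrades gracefully if Torch is not installed:

```shell
# Minimal sketch: print the Torch build and its CUDA toolkit version.
# Falls back to a message if torch is absent from this environment.
if python3 -c "import torch" 2>/dev/null; then
  python3 -c "import torch; print('torch', torch.__version__, 'cuda', torch.version.cuda)"
else
  echo "torch not installed in this environment"
fi
```

The `cuda` value printed is the toolkit Torch was built against, which is what matters for the compiled Flash/Sage Attention wheels.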
00:31:00 Since I have downloaded the Z Image Turbo model bundle, it is almost done, and I am going
00:31:08 to generate some Z images. We have the presets inside this ComfyUI; this is version 73 of the presets.
00:31:15 Let's look at Z Image Turbo and use the Z Image Turbo Quality 1 preset. This one has an upscale step.
00:31:21 Okay, it is loaded. Let's check whether all the models have downloaded. Almost, but not yet.
00:31:26 Every file I create here will be permanently stored. When I go to storage, I can see that I
00:31:32 am using 25.9 gigabytes of disk space. Next time you run it, you still run the install
00:31:40 command; however, it will be much faster, because everything was installed previously. You
00:31:46 can even run the run command directly, but if you get errors, run the install command again.
00:31:51 Okay, all the files have downloaded, so let's refresh. This is running on SimplePod, not on
00:31:57 my computer. Now everything is set automatically. Let's generate 5 images from here and run.
00:32:05 Okay, it didn't see the VAE, because these VAEs are sometimes downloaded for SwarmUI. You see
00:32:11 there is a backslash problem; this is a preset-saving issue. You just need to click this.
00:32:17 Okay, it is fixed; run. You may get an error only with the VAE, because SwarmUI by default puts the VAE into a
00:32:24 subfolder, but other than that it is ready to use right away. Generation has started
00:32:30 already. Let's check nvidia-smi: pip install nvidia-smi, then nvidia-smi. We are using 21 gigabytes
00:32:39 of VRAM. This workflow generates at 1536 by 1536 and upscales to 1920 pixels, so it will be
00:32:47 really high resolution and high quality. You can see it is doing the upscaling with the best
00:32:53 upscaler model, so we are going to get amazing-quality images, and you can see the speed is
00:32:59 just amazing. We got the first image. It is saved in the outputs folder; you can also right
00:33:05 click and save the image. All generated images and videos will be inside the output folder.
00:33:11 How can you download them? You can download them fast on SimplePod. Go to My Servers in your
00:33:18 interface, then to the file browser, direct. I recommend this; it is super fast. This is where you can
00:33:25 upload and download files directly to the server, and it is ultra fast. When I go to the workspace,
00:33:32 this is where my files are. Double-click, and it is inside ComfyUI, inside output. When I
00:33:39 select this folder, I can click download. It will zip and download it instantly, and it will be really
00:33:45 fast. Just click keep. It is done. If this were a model training, it would be inside models. Let's
00:33:51 download one of the models to see the speed: diffusion models, then Z Image Turbo. This is about
00:33:57 12 gigabytes. I click the download icon, click keep, and the download starts. You can see the
00:34:03 download speed is amazing. You can directly download your trained models, generated images and videos
00:34:08 by using SimplePod's direct method. You can also upload files here; it works the same way.
00:34:15 Click the upload icon for the selected folder. You can upload both folders and files. Let's
00:34:21 upload a file, like the output zip file, and it will be uploaded instantly. This is super fast with
00:34:29 SimplePod. You can also use secure connections if you want, or direct. Direct is faster.
00:34:34 So how do you use SwarmUI on SimplePod? I will show that now. If you click this icon, it will restart
00:34:41 your server. It is restarting; it will flush the GPU and close the running ComfyUI.
00:34:48 That is fine, because we have already installed it. As the next step, I will show the installation of SwarmUI,
00:34:54 and I will test it with FLUX 2 because this GPU, an RTX Pro 6000, is a beast. Let's see the speed. The
00:35:01 restart is happening and is probably done. Yes, you can see the VRAM usage has dropped. Now I will connect from
00:35:07 Jupyter once again. Open the RunPod SwarmUI install instructions; they are the same for
00:35:14 SimplePod. I will rename the file. Copy, open a new terminal, and paste. You need to start the
00:35:22 installation first; then you can start the downloads, but make sure the installation has started and the SwarmUI folder
00:35:29 has been cloned, otherwise it will not work. Once you see the SwarmUI folder, you can
00:35:34 begin the model downloads at the same time. But let's just wait for the installation, because
00:35:40 SwarmUI is ultra fast to install. Okay, it is almost done. We will see the
00:35:45 Cloudflared URL here. Yes, it has started. Okay, localhost started. Now we are waiting for
00:35:52 Cloudflared. Yes, here it is. Open the Cloudflared URL. Okay, one more time. If it doesn't start
00:35:57 immediately, wait a while; keep refreshing and clicking, and it will start like this.
00:36:03 Click agree. Customize settings. Select your template. Next. Just install. Next. Set this to
00:36:10 none, because we are going to use our already-installed ComfyUI as the backend. This is mandatory; don't
00:36:15 forget it. Don't use ComfyUI local; if you do, it will not work. Next. I am not
00:36:20 going to download anything. Next. And yes, I am sure; install. The installation is done. Now I
00:36:26 need to set up my backend. Go to backends and add ComfyUI self-starting like this. Okay, refresh
00:36:32 so it will fix itself. Then pick your ComfyUI folder, right-click, copy path, and click this edit button.
00:36:39 Add a forward slash and main.py, so the path looks like /workspace/ComfyUI/main.py. Add any extra
00:36:48 arguments you want, such as --use-sage-attention. This is the same as in the Windows installation,
00:36:54 if you remember. Save. It will start on GPU ID 0. If you have multiple GPUs, you
00:36:59 can duplicate this entry multiple times and change the GPU ID each time. Now it will start the
00:37:04 ComfyUI backend, and it will become ready. We need to import our presets. Go to presets,
00:37:12 import, choose from file, and select the latest SwarmUI amazing presets file. You can
00:37:17 click overwrite if you want, then import. Everything is imported; click refresh.
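To make the per-GPU backend settings concrete, here is a hedged sketch of what one self-starting backend per GPU roughly corresponds to on the command line. The ports and the exact shape of the command are my own illustrative assumptions, not taken from SwarmUI's code; only the path and the --use-sage-attention argument come from the setup above.

```shell
# Hypothetical sketch: one ComfyUI backend per GPU ID, pinned via
# CUDA_VISIBLE_DEVICES. Ports 8188/8189 are illustrative assumptions.
backend_cmd() {
  gpu="$1"; port="$2"
  echo "CUDA_VISIBLE_DEVICES=$gpu python3 /workspace/ComfyUI/main.py --use-sage-attention --port $port"
}
backend_cmd 0 8188   # first backend, GPU ID 0
backend_cmd 1 8189   # duplicated backend, GPU ID 1
```

In SwarmUI you get the same effect by duplicating the backend entry and changing only the GPU ID field, as described above.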
00:37:21 Now we need to download the model, so I will start the model downloader one more time: RunPod model
00:37:27 download instructions. Since we restarted the server, open a new terminal. It is exactly the same as on
00:37:32 RunPod, but I am also showing it on SimplePod because I recommend you
00:37:37 start using SimplePod if you want cheaper prices. Okay, let's open the Gradio live link.
00:37:43 Then I am going to start downloading the FLUX 2 bundle. By default, our
00:37:49 downloader automatically detects SwarmUI. You see? You don't need to enter a custom path; you
00:37:55 only need one if you are downloading into ComfyUI. So, FLUX 2 Core Bundle here, and download
00:38:00 all models. You can follow the progress here; it will start and then get faster. Remember
00:38:06 the tutorial video on how to use SwarmUI presets and workflows in ComfyUI. There we
00:38:12 explained how to use SwarmUI's models inside ComfyUI via the extra_model_paths.yaml file,
00:38:21 so you don't need duplicate models. You can keep all of them inside SwarmUI and
00:38:25 use them in ComfyUI as well, or vice versa: keep them in ComfyUI and use them in SwarmUI. That
00:38:32 tutorial video explains it perfectly; the link will be in the description of this video.
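As a rough illustration of what such an extra_model_paths.yaml can look like: the base path and subfolder names below assume a default SwarmUI layout under /workspace/SwarmUI/Models and are assumptions on my part; follow the linked tutorial for the exact mapping.

```shell
# Hedged sketch: point ComfyUI at SwarmUI's model folders so the same
# files serve both UIs. Folder names assume a default SwarmUI install.
cat > /tmp/extra_model_paths.yaml <<'EOF'
swarmui:
    base_path: /workspace/SwarmUI/Models
    checkpoints: Stable-Diffusion
    vae: VAE
    loras: Lora
EOF
# Place the file next to main.py (e.g. /workspace/ComfyUI/extra_model_paths.yaml)
# and restart ComfyUI so it picks the extra paths up.
```

ComfyUI reads this file at startup and adds each listed subfolder to the matching model category.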
00:38:36 Remember, we are spending time setting up and downloading models right now, but all of them are
00:38:41 being saved inside our permanent storage. So next time I start, they will be immediately ready; I
00:38:46 will show how to start again later, once everything is done. Okay, the FLUX 2 bundle download
00:38:54 has completed, but let's also download the NVFP4 version of the FLUX 2 model, so click download. Remember,
00:39:01 if you don't want to download a specific model from a bundle, you can just click the individual download buttons
00:39:06 here. It will queue them and download them, so this way you can skip any
00:39:12 model download. Now it is downloading the NVFP4 version; after that, we will be ready. Okay, NVFP4 has
00:39:20 also downloaded. Let's go back to our running Cloudflared SwarmUI. Go to models and refresh so it
00:39:27 will see all the models. Then, in the presets, let's select the FLUX High Quality Preset 1. Then
00:39:34 Quick Tools, reset params to default, and let's select the FLUX 2 Quality 1 preset,
00:39:41 because we have the best GPU. The default resolution is 2048 by 2048; you can change it if
00:39:48 you want. Let's generate a 16:9 image and type something simple: super expensive sports car.
00:39:57 Let's generate 3 images. I am going to select the NVFP4 model and generate. Let's also
00:40:05 open nvidia-smi to watch the VRAM usage. Okay, it is loading the model. Without block swapping
00:40:12 and VRAM streaming, it is using 43 gigabytes of VRAM with the NVFP4 model. It is probably also keeping
00:40:20 the text encoder in VRAM, or maybe in system RAM; it depends on how ComfyUI
00:40:26 handles that. Our generation has started. Let's check the step speed: I see 9.61 seconds
00:40:35 per iteration. This is definitely interesting; it is not as fast as on my RTX 5090.
00:40:43 Okay, I found the reason: this GPU currently has a power cap. I will report
00:40:52 this to the SimplePod developer. If there are such GPUs, they need to be discarded; they should
00:41:01 not be allowed to be used, because this GPU is currently power capped. This can happen on RunPod
00:41:08 as well, so pay attention to your GPU's power cap. This one is capped at 250 watts,
00:41:15 and that is why it is much slower. So this is why we are not getting good speed.
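To spot a power cap yourself before renting longer, you can compare the enforced power limit against the board's default limit with standard nvidia-smi query fields; a minimal sketch, guarded so it degrades when nvidia-smi is not on PATH:

```shell
# A limit far below the default limit means the GPU is power capped
# (e.g. 250 W enforced on a board whose default is much higher).
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,power.limit,power.default_limit --format=csv
else
  echo "nvidia-smi not available on this machine"
fi
```

This works the same on SimplePod and RunPod, so it is worth running right after an instance starts.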
00:41:21 Okay, now I will show you how to resume your work afterwards. I will delete my
00:41:28 instance, but all of my data will be kept inside my permanent storage. Confirm.
00:41:35 Then you need to pick the template again, so return to the RunPod SimplePod instructions.
00:41:41 Double-click the template again, click edit and use, and select the persistent volume
00:41:48 here. Set the storage mount point to workspace, then save and use. You will see the
00:41:56 template selected here like before. Then pick your GPU; let's pick this one to see whether
00:42:02 it will also be power capped or not. Okay, run. Now it will be very fast to resume
00:42:09 our installed work. This is the same on RunPod: when you are using permanent storage,
00:42:14 everything works the same there as well. Let's just wait for it to start our machine. It is
00:42:20 starting; the console appeared. As it starts, the links will appear. Just wait until the Jupyter
00:42:26 Lab link appears. Okay, Jupyter direct appeared, so let's open it. If you get connection warnings like
00:42:33 this, just click continue. This can also happen on your PC if you use direct, because the direct link is
00:42:40 HTTP, not secure HTTPS. So it depends on whether you want to use it; I prefer to.
00:42:47 Then I will use the RunPod SwarmUI install instructions. I will rename the file, by the way.
00:42:52 I shouldn't need to reinstall, but if you get errors, you can reinstall if needed.
00:42:58 It should start right away, because SwarmUI also automatically updates ComfyUI when it starts.
00:43:04 You see, I always use the same command, whether I am resuming or installing for the first
00:43:10 time. With other applications, you likewise just run the run part, not the installation part;
00:43:16 I skipped the ComfyUI installation entirely. Okay, Cloudflared started. Sometimes I need to wait about
00:43:22 1 minute for it to become available; just keep hitting F5 to refresh the page.
00:43:28 Okay, it is starting. Nice. You see, the backend is automatically here. I could add another
00:43:34 backend, because this machine has 2 GPUs. Let's also open nvidia-smi to verify. nvidia-smi. Okay, it
00:43:41 is not here, so I need to pip install nvidia-smi, then nvidia-smi. Let's check the wattage to see whether
00:43:47 it is capped or not. I will do Quick Tools, reset params to default, presets, Quality 1, select
00:43:54 the NVFP4 model, type super fast car, and select this aspect ratio. Let's generate 3 images and
00:44:04 see whether this one is also power capped. Hopefully they will remove such GPUs from
00:44:11 their GPU pool. Okay, it is loading the model. You can see that model loading is very fast on
00:44:16 SimplePod compared to RunPod; on RunPod I sometimes wait 10 minutes just for model loading, because their
00:44:21 disk speed is very slow. The SimplePod team also replies to our every request, so if you have any questions
00:44:28 or problems with them, just join our Discord channel and message me there. If you search for SECourses
00:44:34 Discord, you will see our link like this; it will also be in the description of the video.
00:44:39 Just join our channel, type in any channel, mention me, and I will hopefully reply.
00:44:46 Okay, the generation started. Yeah, this GPU is also power capped for some reason. So this is
00:44:53 how you resume and continue your work, and this is how SimplePod works. This is how you use ComfyUI,
00:45:02 SwarmUI, or any of my installers on SimplePod. It has a lot of advantages, like direct and
00:45:11 secure connections, both working with any port, and a permanent storage system that is
00:45:17 much cheaper and also faster. So if you have any questions, always ask me. Hopefully see you later.
00:45:23 You can ask me on Patreon, by email, on YouTube, wherever you want, and on Discord too.
