How To Use Mochi 1 Open Source Video Generation Model On Your Windows PC, RunPod and Massed Compute
Full tutorial link > https://www.youtube.com/watch?v=iqBV7bCbDJY
Mochi 1 from Genmo is the newest state-of-the-art open source video generation model that you can use for free on your computer. This model is a breakthrough like the very first Stable Diffusion model, but this time for video generation. In this tutorial, I am going to show you how to use the Genmo Mochi 1 video generation model on your computer, on Windows, locally, with the most advanced and very easy to use SwarmUI. SwarmUI is as fast as ComfyUI but also as easy to use as the Automatic1111 Stable Diffusion web UI. Moreover, if you don't have a powerful GPU to run this model locally, I am going to show you how to use this model on the best cloud providers, RunPod and Massed Compute.
🔗 Public Open Access Article Used in Video
Amazing Ultra Important Tutorials with Chapters and Manually Written Subtitles / Captions
Stable Diffusion 3.5 Large How To Use Tutorial With Best Configuration and Comparison With FLUX DEV : https://youtu.be/-zOKhoO9a5s
FLUX Full Fine-Tuning / DreamBooth Tutorial That Shows A Lot Info Regarding SwarmUI Latest : https://youtu.be/FvpWy1x5etM
Full FLUX Tutorial - FLUX Beats Midjourney for Real : https://youtu.be/bupRePUOA18
Main Windows SwarmUI Tutorial (Watch To Learn How to Use)
How to install and use SwarmUI. You have to watch this to learn how to use it
Has 70 chapters and manually fixed captions : https://youtu.be/HKX8_F1Er_w
Cloud Tutorial (Massed Compute - RunPod - Kaggle)
If you don't have a powerful GPU, or you want to use a more powerful GPU, this is the tutorial you need
48 GB A6000 GPU is only 31 cents per hour on Massed Compute with our special coupon : https://youtu.be/XFUZof6Skkw
Free Kaggle Account Notebook for GPU-Poor
Installs latest version of SwarmUI on a free Kaggle account
Works with Dual T4 GPU at the same time
Supports SD 1.5, SDXL, SD3, FLUX, Stable Cascade and more :
Download from here : https://www.patreon.com/posts/106650931
00:00:00 Introduction to the tutorial
00:01:44 How to download, install and use Mochi 1 on Windows
00:03:59 How to update SwarmUI to the latest version to be able to use Mochi 1
00:04:17 How to start SwarmUI on Windows
00:04:27 How to set which GPU SwarmUI uses for generation
00:04:55 How to generate a video with Mochi 1, what are the best configurations
00:06:45 Where I have shared all the prompts I used to generate intro demo AI videos
00:07:30 Where to see step speed of your video generation and what are the speeds of RTX 3060 and RTX 3090
00:08:04 How to activate my primary GPU while also generating on my secondary GPU
00:08:25 Why queue system may not immediately start using your multiple GPUs and how to fix
00:09:45 How to solve out of memory error by enabling VAE tiling
00:10:02 Which parameters are best for VAE tile size and VAE tile overlap
00:10:53 How to use Mochi 1 and SwarmUI on Massed Compute cloud service - you don't need a GPU for this
00:11:13 How to apply our SECourses coupon to get a real 50% discount on the RTX A6000 GPU
00:11:37 How to connect initialized Massed Compute and start using Mochi 1
00:12:23 How to update SwarmUI to latest version on Massed Compute
00:12:51 How to start SwarmUI with public share to access from computer directly and use in computer browser
00:14:10 How to install and use Mochi 1 on RunPod with SwarmUI
00:16:45 How to monitor back-end loading of SwarmUI on RunPod
00:17:35 How to properly terminate your RunPod pod and Massed Compute instance to not lose any money
Repo : https://huggingface.co/genmo/mochi-1-preview
Model Architecture
Mochi 1 represents a significant advancement in open-source video generation, featuring a 10 billion parameter diffusion model built on our novel Asymmetric Diffusion Transformer (AsymmDiT) architecture. Trained entirely from scratch, it is the largest video generative model ever openly released. And best of all, it’s a simple, hackable architecture.
Alongside Mochi, we are open-sourcing our video VAE. Our VAE causally compresses videos to a 128x smaller size, with an 8x8 spatial and a 6x temporal compression to a 12-channel latent space.
An AsymmDiT efficiently processes user prompts alongside compressed video tokens by streamlining text processing and focusing neural network capacity on visual reasoning. AsymmDiT jointly attends to text and visual tokens with multi-modal self-attention and learns separate MLP layers for each modality, similar to Stable Diffusion 3. However, our visual stream has nearly 4 times as many parameters as the text stream via a larger hidden dimension. To unify the modalities in self-attention, we use non-square QKV and output projection layers. This asymmetric design reduces inference memory requirements. Many modern diffusion models use multiple pretrained language models to represent user prompts. In contrast, Mochi 1 simply encodes prompts with a single T5-XXL language model.
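Based on the compression factors quoted above (8x8 spatial, 6x temporal, 12 latent channels), the implied latent tensor shape can be sketched as below. This is my illustration, not code from the Mochi repo; the exact frame-count rounding is an assumption based on the 6k+1 frame convention discussed later in the tutorial.

```python
def mochi_latent_shape(frames: int, height: int, width: int):
    """Rough latent-tensor shape implied by the quoted compression factors.

    Assumption: frame counts follow the "multiple of 6 plus 1" convention,
    so the temporal length maps to (frames - 1) // 6 + 1. Spatial dims are
    divided by 8; the latent has 12 channels.
    """
    assert (frames - 1) % 6 == 0, "frame count should be a multiple of 6 plus 1"
    assert height % 8 == 0 and width % 8 == 0, "spatial dims should divide by 8"
    return (12, (frames - 1) // 6 + 1, height // 8, width // 8)

# e.g. a 49-frame 480x848 video:
print(mochi_latent_shape(49, 480, 848))  # (12, 9, 60, 106)
```

Multiplying out the per-pixel compression this way makes it clear why VRAM stays flat while only decode time grows with frame count.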
00:00:00 Greetings everyone. Today I am going to show you how to use the newest state-of-the-art
00:00:06 text-to-video model, Mochi 1 Preview. And I am going to use our newest favorite: the very easy,
00:00:14 very convenient, and very advanced SwarmUI. If you have been following me recently,
00:00:20 I am using SwarmUI for FLUX, for Stable Diffusion 3.5, and the newest models like Mochi 1, because
00:00:28 it has so many features. It uses ComfyUI as a backend, so it is super optimized,
00:00:36 super fast, and it gives us all the features that we need: wildcards, grid generation,
00:00:44 presets, image history, and anything else you need. And since it uses ComfyUI
00:00:50 as a backend, it is the best UI for
00:00:54 video generation models as well, and it is starting to support more
00:00:59 video models. Therefore, I really recommend you learn how to use SwarmUI. So today
00:01:05 I will show you how to use this amazing new Mochi 1 Preview in SwarmUI. This will be a public
00:01:13 tutorial; you will be able to access all the resources and links. Moreover, I will show how to
00:01:20 use it on a Windows computer with as little as a 12 GB GPU, and I will show how to use it on RunPod and Massed
00:01:28 Compute. So don't worry: even if you don't have a powerful GPU, you will be able to use this amazing model at amazing
00:01:36 prices on RunPod and on Massed Compute.
00:01:39 I don't know if you have a lower than 12 GB, GPU, it would work or not. First of all,
-
00:01:45 go to this page. This is a public post. The link will be in the description of the video.
-
00:01:49 This is the main
-
00:01:51 post that I use for SwarmUI tutorials.
-
00:01:53 And as I said, this is a public post. You can access this post from even private
-
00:01:58 window. You don't need to login or register or be subscriber of me. You will see that we have
-
00:02:04 these following tutorials. You can watch this latest one to learn how to install SwarmUI.
-
00:02:11 But I really recommend you to watch this cloud and main SwarmUI tutorials if you don't know
-
00:02:17 how to use SwarmUI. And in the very bottom of the post in the attachments, you will see the install
-
00:02:22 linux.sh file to install on RunPod. And you will see the windows installer.bat file. So as I said,
-
00:02:30 watch the tutorials to learn how to install SwarmUI. So how we are going to install and
-
00:02:36 use Genmo Mochi 1 preview video model. In this tutorial, I am going to use FP8 scaled
-
00:02:42 unified model. So click here and you will go to the ComfyUI Org Mochi preview repackaged.
-
00:02:49 Download the file from here and put it into your SwarmUI models Stable diffusion folder. That's it.
-
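If you are unsure where the downloaded file should go, a small sketch like this can build and check the target path. The `Models/Stable-Diffusion` folder name is my assumption of SwarmUI's default layout, and the model filename is hypothetical; adjust both to what your install actually shows. The sketch uses a temporary directory so it is safe to run as-is.

```python
import tempfile
from pathlib import Path

# Simulated layout; replace with your real SwarmUI install and download
# locations. "Models/Stable-Diffusion" is an assumed SwarmUI folder name,
# and the filename below is hypothetical.
root = Path(tempfile.mkdtemp())
swarmui_root = root / "SwarmUI"
downloaded = root / "Downloads" / "mochi_preview_fp8_scaled.safetensors"
downloaded.parent.mkdir(parents=True)
downloaded.write_bytes(b"...")  # stand-in for the real multi-GB download

target_dir = swarmui_root / "Models" / "Stable-Diffusion"
target_dir.mkdir(parents=True, exist_ok=True)
final_path = target_dir / downloaded.name
downloaded.rename(final_path)  # move the model into place

print(final_path.exists())  # True
```

After moving the file, refreshing the models list in SwarmUI (as shown in the video) should make it appear.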
00:02:55 However, you can also use our SwarmUI unified downloader. Go to here and we have the version
-
00:03:00 11. Click here to download. Extract the files into your SwarmUI installed folder. Like this.
-
00:03:07 Okay. Move the files into the main parent folder like this. You see my SwarmUI has been installed
-
00:03:13 here. So this is the main parent folder. This is the SwarmUI folder. And this is the folder where
-
00:03:18 you need to put the installer files. Then double click windows download model.bat file. More info
-
00:03:24 run anyway. And it will give you options to download all these models. If you remember,
-
00:03:29 I had introduced you FP8 FLUX development model. But the scaled version which works better and
-
00:03:35 really fast compared to the GGUF models. This model has been fixed and improved. Now it works
-
00:03:42 on RTX 4000 series as well like RTX 4090. I recommend you to check it out. How you gonna
-
00:03:49 download the Genmo Mochi model? You see the option 12. So just type 12 and hit enter and it will
-
00:03:55 start downloading the necessary model. Since I have previously downloaded it, it is ready.
-
00:04:00 Then enter inside your SwarmUI folder. Now this is very important. You need to first update your
-
00:04:05 SwarmUI to the very latest version. So double click update windows.bat file. It will update
-
00:04:11 the SwarmUI. Make sure that there are no errors and it is done. Then we double click the launch
-
00:04:18 windows.bat file and it will start. So the SwarmUI has been been started on my
-
00:04:24 local computer. So in the server in the back ends currently this is set for my the second
-
00:04:29 GPU. What I mean by that when I type nvidia-smi you can see my GPUs RTX 3090 TI and RTX 3060.
-
00:04:38 Let's also open the nvitop like this and you can see I am currently using 7000 megabytes. But it
-
00:04:45 is fine because this model is using very little amount of VRAM during the generation and in the
-
00:04:51 end. When it is reconstructing video it is using tiling. So go to the generate options and go to
-
00:04:57 the models. In here select the downloaded Mochi 1 preview FP8 scaled model. This is an all-in-one
-
00:05:05 model. It contains all the files that you need and click here display advanced options. Then
-
00:05:10 what are the recommended values? I recommend 40 steps however it may take a time depending on
-
00:05:17 your GPU. 20 is also fine. I recommend CFG scale 6. And this is the most crucial part. How many
-
00:05:24 frames you want to generate. The examples I have shown in the beginning are 121 frames.
-
00:05:31 So you may be wondering why this is 25 frame not 24. Because this is how the model works. It has
-
00:05:38 to be multiplication of the 6 plus 1. So you can increment this by 6 like this to 31 or 37 or 43.
-
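The two rules in this section — valid frame counts of the form "multiple of 6 plus 1", and playback duration being frames divided by fps — can be sketched as small helpers. The function names are mine, not part of SwarmUI, and the ~24 fps baseline in the last line is only what the tutorial's own numbers imply (49 frames described as roughly a 2-second video).

```python
def is_valid_frame_count(frames: int) -> bool:
    """Mochi frame counts must be a multiple of 6 plus 1 (25, 31, 37, ...)."""
    return frames >= 7 and (frames - 1) % 6 == 0

def video_seconds(frames: int, fps: int) -> float:
    """Playback duration: total frames divided by the chosen playback fps."""
    return frames / fps

print([n for n in range(25, 50) if is_valid_frame_count(n)])  # [25, 31, 37, 43, 49]
print(video_seconds(49, 8))   # 6.125 -> the roughly 6-second video from the tutorial
print(video_seconds(49, 24))  # ~2 seconds at the tutorial's implied native rate
```

Lowering the output fps stretches the same generated frames over more wall-clock seconds; it does not generate more motion.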
00:05:50 And as you increase the number of frames you can increase the number of frames you can increase the
-
00:05:52 number of frames it will increase your generation time. Not the VRAM usage but the generation time
-
00:05:58 will almost linearly increase. The fps, you can decide the fps of the video. If you set this as
-
00:06:05 8 you will get a 3 times longer video than your generated video. What I mean for example if I
-
00:06:13 generate 49 frames with the 8 fps I am going to get 6 second video. It is actually 2 second video
-
00:06:21 but it will be 6 second because of the frame rate. So let's make an example video generation and see
-
00:06:28 the speed. But keep in mind that currently I am recording I am using a lot of GPU power. You see
-
00:06:35 I am using 10% GPU utilization right now and using this amount of memory. So it will be slower than
-
00:06:43 you generate without recording anything. And I have shared all the prompts that I used to
-
00:06:47 generate the videos in the beginning. So they are here click this GitHub gist file and you will see
-
00:06:53 the prompts that I have used. So let's pick this prompt as an example and put it here. And click
-
00:06:59 the generate. Now as you increase the number of video frames at the end when it is decoding with
-
00:07:05 the VAE you may get out of VRAM error. You see currently this is still using my GPU ID 1. It
-
00:07:12 is generating this 49 frames video on my second GPU. And this is 3 times slower than my
-
00:07:21 RTX 3090 GPU. However it is working. So don't worry even if you have low VRAM GPU you will
-
00:07:28 get the result eventually. And what is the speed of this generation on this GPU if you wonder. Go
-
00:07:35 to the server. Go to the logs. Go to the debug. And you will see the step speed here. As I said
-
00:07:41 as you increase the number of frames the duration will linearly increase. So currently this is 26
-
00:07:49 seconds / it on RTX 3060 for 49 frames. So this will take around 500 seconds. Because it is 20
-
00:08:00 steps. 20 multiplied with 26. And meanwhile it is generating with my secondary GPU. I am going to
-
00:08:07 use my first GPU as well. So I will make this GPU ID 0 and save like this you see. Then I am going
-
00:08:15 to make another generation. Generate. Now this should start using my second GPU. Let's see if it
-
00:08:23 is going to start or not. Since it has a queue system it may not start. Therefore I may be needed to
-
00:08:30 order it to do 10 generation like this. Okay now it should start. And you see it is now generating
-
00:08:37 video on my both of the GPUs and the speed you see here. So on my RTX 3090 it is 9.39 seconds right
-
00:08:46 now. It would be faster if I was not doing anything else on my GPU. So it would
-
00:08:52 take let's say 9 seconds maybe 8 seconds with 20. It would take only like 3 minutes, 4 minutes,
-
00:09:00 3 minutes on my GPU. So if you have 4090 GPU it would take less than 1 minute for 20 steps and 49
-
00:09:10 frames. It is really fast. And it is really easy and convenient to use with the SwarmUI. And if you
-
00:09:16 increase the frame count don't worry. It will just almost linearly increase. The VRAM usage
-
00:09:22 will stay same. So you can also generate very long videos like 10 second video. The generation with
-
00:09:28 my RTX 3090 Ti has been completed. I clicked it here and interrupted all sessions. There is one
-
00:09:35 another thing that you can change which is the output file. However webp is also working fine.
-
00:09:41 You can try each one of them and see which one of them is working best for your case. So what
-
00:09:46 if. If you get. Out of memory error like this. Which you shouldn't get on Windows because of
-
00:09:52 the shared VRAM. How to fix it? The fix is very easy. Go to the advanced sampling here and set the
-
00:10:00 sample size. How? VAE tile size as equal as 160 and VAE overlap tile overlap as 96. And with this way
-
00:10:12 you can generate very long videos even 10 second video maybe even longer than 10 second video. I
-
00:10:17 have tested this. So it should work even on the lower end GPUs. And how did I come up with these
-
00:10:25 values. The developer of the SwarmUI has done some testing and he recommends VAE tile size as
-
00:10:33 160 and VAE tile overlap as 96. Also make sure that you set this on cloud machines as well if
-
00:10:43 you are going to generate longer videos. As I am going to show in the upcoming part of the video.
-
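As a rough sanity check on those tiling values, here is a generic sketch of how overlapped tiling covers one axis of the decode. The `tile - overlap` stride is the standard pattern for overlap-blended VAE decoding; I have not verified that SwarmUI's implementation matches it exactly, so treat this as an illustration of why 160/96 yields many small, memory-friendly decode passes.

```python
import math

def tile_count(extent: int, tile: int, overlap: int) -> int:
    """Number of tiles needed to cover `extent` pixels with overlapped tiles.

    Each new tile advances by (tile - overlap), the usual stride for
    overlap-blended tiled decoding. Generic illustration, not SwarmUI code.
    """
    if extent <= tile:
        return 1
    stride = tile - overlap
    return math.ceil((extent - tile) / stride) + 1

# With the recommended tile size 160 and overlap 96 (stride 64):
print(tile_count(480, 160, 96))  # 6 tiles along a 480-pixel axis
print(tile_count(848, 160, 96))  # 12 tiles along an 848-pixel axis
```

Smaller tiles mean more decode passes (slower) but a much lower peak VRAM footprint, which is why these settings matter for long videos.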
00:10:48 So now I am going to show how to use this amazing model with SwarmUI on RunPod and Massed Compute.
-
00:10:54 If you have not registered Massed Compute before please use this link to register. I appreciate it.
-
00:10:59 You see there is Massed Compute instructions here. You should always read here. After registration
-
00:11:04 setting up your billing. Go to the deploy here. And I recommend to use RTX A6000 GPU,
-
00:11:11 Creator, SECourses. These are all as usual as before. Verify. And deploy. That's it. You see
-
00:11:19 this is only 31 cents per hour for this amazing, very powerful GPU. 48 GB of RAM, 48 GB of VRAM,
-
00:11:27 256 GB of storage but this is many many times faster than RunPod. For both downloading and for
-
00:11:34 both disk speeds. So as usual I will connect with our ThinLinc client. Then I will use the unified
-
00:11:43 downloader through my synchronizing drive. So drag and drop into download folder from
-
00:11:50 my synchronizing drive. And right click and extract here. I will use it but you can also
-
00:11:56 manually download. So the instructions are here. Let's just open and copy this. And start a new
-
00:12:02 terminal. Paste. And all I need to do is just select option 12. You can also download other
-
00:12:08 models from here. And the download has started. Currently Hugging Face Limited single connection
-
00:12:14 downloads to 40 MB. So it is going to take like 6 minutes. It is fine.
-
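The 6-minute estimate follows directly from the capped transfer rate. Here is the back-of-the-envelope; the ~14 GB model size is my assumption for illustration, since the exact file size is not stated in the video.

```python
def download_minutes(size_gb: float, rate_mb_per_s: float) -> float:
    """Estimated download time in minutes at a fixed transfer rate."""
    return size_gb * 1024 / rate_mb_per_s / 60

# Hugging Face capping a single connection at ~40 MB/s, assumed ~14 GB file:
print(round(download_minutes(14, 40), 1))  # ~6 minutes
```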
00:12:20 Meanwhile let's also start SwarmUI. So first of all you need to update your SwarmUI. This
-
00:12:26 is the update icon here. Double click it. It will update the SwarmUI to the latest version. It is
-
00:12:33 pre-installed and the update will take only like 1 minute. Maybe lesser. Let's see. And the update
-
00:12:40 has been completed and the SwarmUI has started. So I can use it inside the ThinLinc client.
-
00:12:47 Like this. With these pre-downloaded models. However, if you want to use it faster on your PC,
-
00:12:54 it is way easier. Just close this and you will see that there is Run Cloudflare SwarmUI here.
-
00:13:01 You see here. Double click it. It will start the SwarmUI with a public link that you can use on
-
00:13:08 your computer. It is here. You see just open link. This is using the Cloudflared. Just refresh until
-
00:13:15 it arrives. And yes, now I will copy this link and open a page in my computer and you see I am using
-
00:13:22 the SwarmUI from my Windows computer very fast and this is running on the Massed Compute. So even
-
00:13:29 if you don't have any GPU, it will not make any difference because this is running on the Massed
-
00:13:34 Compute cloud computing. Now all I need to do is just wait for the Mochi model to be downloaded.
-
00:13:40 So the download has been completed. It took like 7 minutes. However, it could be way faster because
-
00:13:48 of that I just opened an issue on Hugging Face to make my scripts download way faster hopefully.
-
00:13:55 So now I need to refresh the models folder. Go to models. Click refresh icon and you will see
-
00:14:01 the all-in-one Mochi 1 preview. Click it. And then click display advanced options as in the
-
00:14:07 Windows tutorial part. And the rest is exactly the same. And now it is time for me to show you how to
-
00:14:14 use the use this model on the RunPod. So go to the RunPod
-
00:14:18 instructions. Follow the older tutorials if you don't know. Please use this link to register. I
-
00:14:24 appreciate that so that I can get more credits and I can test them. I am not getting paid
-
00:14:29 any money from them. I am using the credits to make more tutorials to you. Go to deploy. I don't
-
00:14:35 recommend community cloud recently because it is way slow. I am using US Texas 3. This is my
-
00:14:40 favorite. And I am going to show with RTX 4090. My newest template is, let me show you which one.
-
00:14:47 It is Torch 2.2.0. This is the best newest one. And set the volume disk as you wish. Let's set it
-
00:14:55 like 50 gigabytes it is sufficient. This is really important if you remember this is the port that
-
00:15:00 we need and set overrides. I am not going to show entire thing because it is always same with the
-
00:15:04 previous videos. Nothing has changed. Download the install linux.sh file. Click connect. Connect
-
00:15:10 to Jupyterlab. I feel like this pod is so slow because it took a lot of time for to
-
00:15:16 get prepared. So I will rent another pod. If this happens, I usually rent another pod to save my
-
00:15:23 time. So when you run multiple pods, you can calculate their starting speed and decide which
-
00:15:29 one of them is working better. And this one has better statistics here. So I will delete
-
00:15:34 the other one. Unfortunately, it is very very non-stable with the RunPod. Sometimes you
-
00:15:40 may get a good pod but it is very rare. But mostly your pods will be very bad. Recently.
-
00:15:47 Unfortunately. With Massed Compute, it is never the case. Okay connect to Jupyterlab.
-
00:15:52 This one initialized faster. Click this arrow. Select install and SwarmUI model file. Follow
-
00:15:59 the instructions here for install. Okay it will start installation. Meanwhile let's also extract
-
00:16:05 here archive and refresh and then open the RunPod instructions. You can also manually download.
-
00:16:11 By the way, this is not. I don't recommend anymore. Open a terminal. Copy paste and select the model.
-
00:16:17 So while installing we will also download the model to speed up our initialization part.
-
00:16:23 Everything is same as before. I agree, customize, modern dark, this, ComfyUI local. I'm not going to
-
00:16:30 download anything so it will be fast and you see meanwhile installing I am also downloading the
-
00:16:35 model and making everything ready. I am doing the multiple things at the same time. So the SwarmUI
-
00:16:41 installation has been completed. However, model is still getting downloaded. Moreover you need
-
00:16:46 to wait until backends are loaded on the server. To see what's happening. Go to logs. Go to debug and
-
00:16:52 watch here. On the Massed Compute we already have installed. So you don't wait anything. You see it
-
00:16:58 is trying to fix all the libraries at the moment. Okay the model has been downloaded. You can also
-
00:17:04 download manually with wget as in my another tutorial that I have shown. And we are still
-
00:17:10 waiting backends to be loaded. However, if this fails forever, if you see repeating messages here,
-
00:17:17 you need to restart your pod. Okay. So it was able to load the backend as you are seeing
-
00:17:23 right now. It took a lot of time to install. Let's go to the generate tab, click models,
-
00:17:28 refresh and select the model and it is same as on the Windows, display advanced options text to
-
00:17:35 video. So if you want to terminate and not spend your money with RunPod close from here. Stop pod.
-
00:17:43 This will still use your money, but it will be costing less and you can resume again without
-
00:17:50 installing again. And when you terminated, you will lose everything. With the Massed Compute,
-
00:17:55 even if you stop instance, you will still be fully charged. Therefore, you need to delete
-
00:18:00 your instance and delete everything permanently. I hope you have enjoyed. Please follow us on
-
00:18:07 Patreon. Please join our discord channel. We have over 9000 members. When you click here,
-
00:18:12 you will see it. Please also go to our GitHub repository and please Star our repository.
-
00:18:19 Fork it. Watch it. You see, we have a lot of followers, stars, and if you sponsor me,
-
00:18:24 I appreciate that. And our discord server has over 9000 members. So come and ask and chat
-
00:18:31 with me and chat with everyone. Hopefully see you in another amazing tutorial video.
