---
title: "Wan2.2 Video Generation ComfyUI Official Native Workflow Example"
description: "Official usage guide for Alibaba Cloud Tongyi Wanxiang 2.2 video generation model in ComfyUI"
sidebarTitle: Wan2.2
---

import UpdateReminder from '/snippets/tutorials/update-reminder.mdx'

Wan 2.2 is a new generation of multimodal generative models from Alibaba's Tongyi Wanxiang (Wan) team. The model adopts an innovative MoE (Mixture of Experts) architecture that splits the denoising process between two experts: a high-noise model that shapes the early, noisy timesteps and a low-noise model that refines the later ones. Dividing the work by denoising timestep lets each expert specialize, producing higher-quality video.
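
To make the expert split concrete, here is a minimal conceptual sketch of timestep-gated expert selection. It illustrates the idea only, not Wan's actual implementation; the `boundary` value and the `denoise_step` interface are assumptions.

```python
# Conceptual MoE denoising: a high-noise expert handles the early (noisy)
# timesteps, a low-noise expert refines the later ones. The boundary value
# and the denoise_step interface are hypothetical, for illustration only.
def moe_denoise(latent, timesteps, high_noise_expert, low_noise_expert, boundary=0.9):
    for t in timesteps:  # t normalized to [0, 1], descending from pure noise
        expert = high_noise_expert if t >= boundary else low_noise_expert
        latent = expert.denoise_step(latent, t)
    return latent
```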

Wan 2.2 has three core strengths: cinematic-level aesthetic control, which builds in professional film-industry standards for lighting, color, and composition; large-scale complex motion, which reproduces intricate movement smoothly and controllably; and precise semantic compliance, which handles complex scenes and multi-object generation while staying faithful to the user's creative intent.

The model supports multiple generation modes, including text-to-video and image-to-video, making it suitable for content creation, artistic work, education and training, and similar applications.

## Model Highlights

- **Cinematic-level Aesthetic Control**: Professional camera language, with multi-dimensional visual control over lighting, color, and composition
- **Large-scale Complex Motion**: Reproduces complex motion smoothly, with improved controllability and naturalness
- **Precise Semantic Compliance**: Understands complex scenes and multi-object layouts, following creative intent more faithfully
- **Efficient Compression Technology**: The 5B version uses a high-compression-ratio VAE, optimizing memory use and supporting mixed training

## Wan2.2 Open Source Model Versions

The Wan2.2 series is released under the Apache 2.0 open source license and supports commercial use. Apache 2.0 allows you to freely use, modify, and distribute these models, including for commercial purposes, as long as you retain the original copyright notice and license text.

| Model Type | Model Name | Parameters | Main Function | Model Repository |
|------------|------------|------------|---------------|-----------------|
| Hybrid Model | Wan2.2-TI2V-5B | 5B | Hybrid version supporting both text-to-video and image-to-video; a single model covers both core tasks | 🤗 [Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B) |
| Image-to-Video | Wan2.2-I2V-A14B | 14B | Converts static images into videos while maintaining content consistency and smooth motion | 🤗 [Wan2.2-I2V-A14B](https://huggingface.co/Wan-AI/Wan2.2-I2V-A14B) |
| Text-to-Video | Wan2.2-T2V-A14B | 14B | Generates high-quality video from text descriptions, with cinematic aesthetic control and precise semantic compliance | 🤗 [Wan2.2-T2V-A14B](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B) |

This tutorial uses the repackaged models from [🤗 Comfy-Org/Wan_2.2_ComfyUI_Repackaged](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged).

<UpdateReminder/>

## Wan2.2 TI2V 5B Hybrid Version Workflow Example

### 1. Download the Workflow File

Update ComfyUI to the latest version, then open `Workflow` -> `Browse Templates` -> `Video` and load the "Wan2.2 5B video generation" template.

### 2. Manually Download Models
| 44 | + |
| 45 | +**Diffusion Model** |
| 46 | +- [wan2.2_ti2v_5B_fp16.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors) |
| 47 | + |
| 48 | +**VAE** |
| 49 | +- [wan2.2_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/vae/wan2.2_vae.safetensors) |
| 50 | + |
| 51 | +**Text Encoder** |
| 52 | +- [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors) |

Save these files to the following locations in your ComfyUI installation:

```
ComfyUI/
├───📂 models/
│   ├───📂 diffusion_models/
│   │   └─── wan2.2_ti2v_5B_fp16.safetensors
│   ├───📂 text_encoders/
│   │   └─── umt5_xxl_fp8_e4m3fn_scaled.safetensors
│   └───📂 vae/
│       └─── wan2.2_vae.safetensors
```
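
If you prefer to script the downloads, the sketch below uses the `huggingface_hub` Python package (`pip install huggingface_hub`) to fetch the three files and copy them into the layout above. The `COMFYUI_DIR` path is an assumption; point it at your own installation.

```python
# Minimal download helper for the Wan2.2 5B files. Assumes huggingface_hub
# is installed and that COMFYUI_DIR points at your ComfyUI installation.
import shutil
from pathlib import Path
from huggingface_hub import hf_hub_download

COMFYUI_DIR = Path("ComfyUI")  # assumption: adjust to your install path

FILES = [
    ("Comfy-Org/Wan_2.2_ComfyUI_Repackaged",
     "split_files/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors", "diffusion_models"),
    ("Comfy-Org/Wan_2.2_ComfyUI_Repackaged",
     "split_files/vae/wan2.2_vae.safetensors", "vae"),
    ("Comfy-Org/Wan_2.1_ComfyUI_repackaged",
     "split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors", "text_encoders"),
]

for repo_id, filename, subdir in FILES:
    cached = hf_hub_download(repo_id=repo_id, filename=filename)  # goes to the HF cache
    dest = COMFYUI_DIR / "models" / subdir / Path(filename).name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(cached, dest)
    print(f"placed {dest}")
```

The same pattern covers the 14B downloads in the sections below; swap in the corresponding `split_files/...` paths.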

### 3. Follow the Workflow Steps

1. Ensure the `Load Diffusion Model` node loads `wan2.2_ti2v_5B_fp16.safetensors`.
2. Ensure the `Load CLIP` node loads `umt5_xxl_fp8_e4m3fn_scaled.safetensors`.
3. Ensure the `Load VAE` node loads `wan2.2_vae.safetensors`.
4. (Optional) For image-to-video generation, use the shortcut `Ctrl+B` to enable the `Load image` node, then upload an image.
5. (Optional) In the `Wan22ImageToVideoLatent` node, adjust the output size and the total number of video frames (`length`).
6. (Optional) To modify the positive and negative prompts, edit the corresponding `CLIP Text Encode` nodes.
7. Click the `Run` button, or use the shortcut `Ctrl(cmd) + Enter` to generate the video.
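
Besides the `Run` button, a running ComfyUI instance can also be driven over its HTTP API, which is handy for batch generation. A minimal sketch, assuming the default local server at `127.0.0.1:8188` and a workflow exported in API format (`Workflow` -> `Export (API)`) as `wan22_5b.json`:

```python
# Queue an API-format workflow JSON against a locally running ComfyUI.
import json
import urllib.request

with open("wan22_5b.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)  # assumption: exported via Workflow -> Export (API)

payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",  # default ComfyUI address and port
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))  # response includes the queued prompt_id
```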

## Wan2.2 14B T2V Text-to-Video Workflow Example

### 1. Download the Workflow File

Update ComfyUI to the latest version, then open `Workflow` -> `Browse Templates` -> `Video` and load the "Wan2.2 14B T2V" template.

### 2. Manually Download Models

**Diffusion Models**
- [wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors)
- [wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors)

**VAE**
- [wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors)

**Text Encoder**
- [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)

Save these files to the following locations in your ComfyUI installation:

```
ComfyUI/
├───📂 models/
│   ├───📂 diffusion_models/
│   │   ├─── wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors
│   │   └─── wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors
│   ├───📂 text_encoders/
│   │   └─── umt5_xxl_fp8_e4m3fn_scaled.safetensors
│   └───📂 vae/
│       └─── wan_2.1_vae.safetensors
```

### 3. Follow the Workflow Steps

1. Ensure the first `Load Diffusion Model` node loads `wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors`.
2. Ensure the second `Load Diffusion Model` node loads `wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors`.
3. Ensure the `Load CLIP` node loads `umt5_xxl_fp8_e4m3fn_scaled.safetensors`.
4. Ensure the `Load VAE` node loads `wan_2.1_vae.safetensors`.
5. (Optional) In the `EmptyHunyuanLatentVideo` node, adjust the output size and the total number of video frames (`length`).
6. (Optional) To modify the positive and negative prompts, edit the corresponding `CLIP Text Encode` nodes.
7. Click the `Run` button, or use the shortcut `Ctrl(cmd) + Enter` to generate the video.
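
The two `Load Diffusion Model` nodes implement the MoE split described earlier: the template chains two samplers so the high-noise expert denoises the first part of the schedule, then hands its still-noisy latent to the low-noise expert. A conceptual sketch of that handoff follows; the step counts, switch point, and function signature are illustrative assumptions, not the template's exact values.

```python
# Conceptual two-stage sampling mirroring the template's paired samplers:
# stage 1 runs the high-noise model and stops early with leftover noise,
# stage 2 finishes with the low-noise model without re-noising the latent.
TOTAL_STEPS, SWITCH_AT = 20, 10  # illustrative values

def sample_two_stage(latent, high_model, low_model, sample):
    latent = sample(high_model, latent, start_step=0, end_step=SWITCH_AT,
                    add_noise=True, return_with_leftover_noise=True)
    return sample(low_model, latent, start_step=SWITCH_AT, end_step=TOTAL_STEPS,
                  add_noise=False, return_with_leftover_noise=False)
```

In workflows built on `KSampler (Advanced)` nodes, these options correspond to the `add_noise`, `start_at_step`, `end_at_step`, and `return_with_leftover_noise` widgets.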

## Wan2.2 14B I2V Image-to-Video Workflow Example

### 1. Download the Workflow File

Update ComfyUI to the latest version, then open `Workflow` -> `Browse Templates` -> `Video` and load the "Wan2.2 14B I2V" template.

### 2. Manually Download Models

**Diffusion Models**
- [wan2.2_i2v_high_noise_14B_fp16.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_i2v_high_noise_14B_fp16.safetensors)
- [wan2.2_i2v_low_noise_14B_fp16.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_i2v_low_noise_14B_fp16.safetensors)

**VAE**
- [wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors)

**Text Encoder**
- [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)

Save these files to the following locations in your ComfyUI installation:

```
ComfyUI/
├───📂 models/
│   ├───📂 diffusion_models/
│   │   ├─── wan2.2_i2v_low_noise_14B_fp16.safetensors
│   │   └─── wan2.2_i2v_high_noise_14B_fp16.safetensors
│   ├───📂 text_encoders/
│   │   └─── umt5_xxl_fp8_e4m3fn_scaled.safetensors
│   └───📂 vae/
│       └─── wan_2.1_vae.safetensors
```

### 3. Follow the Workflow Steps

1. Ensure the first `Load Diffusion Model` node loads `wan2.2_i2v_high_noise_14B_fp16.safetensors`.
2. Ensure the second `Load Diffusion Model` node loads `wan2.2_i2v_low_noise_14B_fp16.safetensors`.
3. Ensure the `Load CLIP` node loads `umt5_xxl_fp8_e4m3fn_scaled.safetensors`.
4. Ensure the `Load VAE` node loads `wan_2.1_vae.safetensors`.
5. In the `Load Image` node, upload the image to use as the initial frame; a pre-processing sketch follows this list.
6. To modify the positive and negative prompts, edit the corresponding `CLIP Text Encode` nodes.
7. (Optional) In the image-to-video latent node (`WanImageToVideo` in this template), adjust the output size and the total number of video frames (`length`).
8. Click the `Run` button, or use the shortcut `Ctrl(cmd) + Enter` to generate the video.
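
The initial frame becomes the video's starting latent, so its aspect ratio strongly influences the output. Below is a minimal Pillow sketch for pre-scaling the frame before uploading it in the `Load Image` node; the target resolution and file names are assumptions, so match them to the size set in your workflow.

```python
# Optional pre-processing: convert the initial frame to RGB and scale it to
# the working resolution. TARGET_W/TARGET_H are assumptions; use the same
# width/height you configure in the workflow's latent node.
from PIL import Image

TARGET_W, TARGET_H = 1280, 704  # assumed 720p-class size

img = Image.open("first_frame.png").convert("RGB")
img = img.resize((TARGET_W, TARGET_H), Image.LANCZOS)
img.save("first_frame_resized.png")
```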