| 1 | +--- |
| 2 | +title: Wan2.2-S2V Audio-Driven Video Generation ComfyUI Native Workflow Example |
| 3 | +description: This is a native workflow example for Wan2.2-S2V audio-driven video generation in ComfyUI. |
| 4 | +sidebarTitle: "Wan2.2 S2V" |
| 5 | +--- |
| 6 | + |
| 7 | +import UpdateReminder from '/snippets/tutorials/update-reminder.mdx' |
| 8 | + |
| 9 | +We're excited to announce that Wan2.2-S2V, the advanced audio-driven video generation model, is now natively supported in ComfyUI! This powerful AI model can transform static images and audio inputs into dynamic video content, supporting dialogue, singing, performance, and various creative content needs. |
| 10 | + |
| 11 | +**Model Highlights** |
| 12 | +- **Audio-Driven Video Generation**: Transforms static images and audio into synchronized videos |
| 13 | +- **Cinematic-Grade Quality**: Generates film-quality videos with natural expressions and movements |
| 14 | +- **Minute-Level Generation**: Supports long-form video creation |
| 15 | +- **Multi-Format Support**: Works with full-body and half-body characters |
| 16 | +- **Enhanced Motion Control**: Generates actions and environments from text instructions |
| 17 | + |
| 18 | +- Wan2.2 S2V Code: [GitHub](https://github.com/aigc-apps/VideoX-Fun) |
| 19 | +- Wan2.2 S2V Model: [Hugging Face](https://huggingface.co/Wan-AI/Wan2.2-S2V-14B) |
| 20 | + |
| 21 | + |
| 22 | +## Wan2.2 S2V ComfyUI Native Workflow |
| 23 | + |
| 24 | +<UpdateReminder/> |
| 25 | + |
| 26 | +### 1. Download Workflow File |
| 27 | + |
| 28 | +Download the following workflow file and drag it into ComfyUI to load the workflow. |
| 29 | + |
| 30 | +<video |
| 31 | + controls |
| 32 | + className="w-full aspect-video" |
| 33 | + src="https://raw.githubusercontent.com/Comfy-Org/example_workflows/refs/heads/main/video/wan/wan2.2_s2v/wan2.2-s2v.mp4" |
| 34 | +></video> |
| 35 | + |
| 36 | +<a className="prose" target='_blank' href="https://raw.githubusercontent.com/Comfy-Org/workflow_templates/refs/heads/main/templates/video_wan2_2_14B_s2v.json" style={{ display: 'inline-block', backgroundColor: '#0078D6', color: '#ffffff', padding: '10px 20px', borderRadius: '8px', borderColor: "transparent", textDecoration: 'none', fontWeight: 'bold'}}> |
| 37 | + <p className="prose" style={{ margin: 0, fontSize: "0.8rem" }}>Download JSON Workflow</p> |
| 38 | +</a> |
| 39 | + |
| 40 | +Download the following image and audio as input: |
| 41 | + |
| 42 | + |
| 43 | + |
| 44 | +<a className="prose" target='_blank' href="https://raw.githubusercontent.com/Comfy-Org/example_workflows/refs/heads/main/video/wan/wan2.2_s2v/input_audio.MP3" style={{ display: 'inline-block', backgroundColor: '#0078D6', color: '#ffffff', padding: '10px 20px', borderRadius: '8px', borderColor: "transparent", textDecoration: 'none', fontWeight: 'bold'}}> |
| 45 | + <p className="prose" style={{ margin: 0, fontSize: "0.8rem" }}>Download Input Audio</p> |
| 46 | +</a> |
| 47 | + |
| 48 | +### 2. Model Links |
| 49 | + |
| 50 | +You can find the models in [our repo](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged). The text encoder linked below is hosted in the Wan 2.1 repackaged repo. |
| 51 | + |
| 52 | +**diffusion_models** |
| 53 | +- [wan2.2_s2v_14B_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_s2v_14B_fp8_scaled.safetensors) |
| 54 | +- [wan2.2_s2v_14B_bf16.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_s2v_14B_bf16.safetensors) |
| 55 | + |
| 56 | +**audio_encoders** |
| 57 | +- [wav2vec2_large_english_fp16.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/audio_encoders/wav2vec2_large_english_fp16.safetensors) |
| 58 | + |
| 59 | +**vae** |
| 60 | +- [wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors) |
| 61 | + |
| 62 | +**text_encoders** |
| 63 | +- [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors) |
| 64 | + |
| 65 | + |
| 66 | +``` |
| 67 | +ComfyUI/ |
| 68 | +├───📂 models/ |
| 69 | +│   ├───📂 diffusion_models/ |
| 70 | +│   │   ├─── wan2.2_s2v_14B_fp8_scaled.safetensors |
| 71 | +│   │   └─── wan2.2_s2v_14B_bf16.safetensors |
| 72 | +│   ├───📂 text_encoders/ |
| 73 | +│   │   └─── umt5_xxl_fp8_e4m3fn_scaled.safetensors |
| 74 | +│   ├───📂 audio_encoders/ # Create this folder if it does not exist |
| 75 | +│   │   └─── wav2vec2_large_english_fp16.safetensors |
| 76 | +│   └───📂 vae/ |
| 77 | +│       └─── wan_2.1_vae.safetensors |
| 78 | +``` |
| 79 | + |
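The folder layout above can be checked with a short script before loading the workflow. This is a minimal sketch, not part of ComfyUI: the function name `missing_models` and the `"ComfyUI"` install path are assumptions for illustration, and only the fp8 diffusion model is checked because that is the one the template uses.

```python
from pathlib import Path

# Expected files, taken from the folder layout above (relative to ComfyUI/models).
# The bf16 diffusion model is optional, so it is not listed here.
EXPECTED = {
    "diffusion_models": ["wan2.2_s2v_14B_fp8_scaled.safetensors"],
    "text_encoders": ["umt5_xxl_fp8_e4m3fn_scaled.safetensors"],
    "audio_encoders": ["wav2vec2_large_english_fp16.safetensors"],
    "vae": ["wan_2.1_vae.safetensors"],
}


def missing_models(comfyui_root):
    """Return the relative paths of expected model files that are not on disk."""
    models_dir = Path(comfyui_root) / "models"
    missing = []
    for folder, names in EXPECTED.items():
        for name in names:
            if not (models_dir / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    return missing


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location; point it at your own folder.
    for path in missing_models("ComfyUI"):
        print("missing:", path)
```

An empty result means every file the template needs is in the expected folder.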
| 80 | + |
| 81 | +### 3. Workflow Instructions |
| 82 | + |
| 83 | + |
| 84 | + |
| 85 | +#### 3.1 About Lightning LoRA |
| 86 | + |
| 87 | +#### 3.2 About fp8_scaled and bf16 Models |
| 88 | + |
| 89 | +You can find both models [here](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/diffusion_models): |
| 90 | + |
| 91 | +- [wan2.2_s2v_14B_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_s2v_14B_fp8_scaled.safetensors) |
| 92 | +- [wan2.2_s2v_14B_bf16.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_s2v_14B_bf16.safetensors) |
| 93 | + |
| 94 | +This template uses `wan2.2_s2v_14B_fp8_scaled.safetensors`, which requires less VRAM. To reduce quality degradation, you can switch to `wan2.2_s2v_14B_bf16.safetensors` at the cost of higher VRAM use. |
| 95 | + |
| 96 | +#### 3.3 Step-by-Step Operation Instructions |
| 97 | + |
| 98 | +**Step 1: Load Models** |
| 99 | +1. **Load Diffusion Model**: Load `wan2.2_s2v_14B_fp8_scaled.safetensors` or `wan2.2_s2v_14B_bf16.safetensors` |
| 100 | + - The provided workflow uses `wan2.2_s2v_14B_fp8_scaled.safetensors`, which requires less VRAM |
| 101 | + - To reduce quality degradation, you can try `wan2.2_s2v_14B_bf16.safetensors` instead |
| 102 | +2. **Load CLIP**: Load `umt5_xxl_fp8_e4m3fn_scaled.safetensors` |
| 103 | +3. **Load VAE**: Load `wan_2.1_vae.safetensors` |
| 104 | +4. **AudioEncoderLoader**: Load `wav2vec2_large_english_fp16.safetensors` |
| 105 | +5. **LoraLoaderModelOnly**: Load `wan2.2_t2v_lightx2v_4steps_lora_v1.1_high_noise.safetensors` (Lightning LoRA) |
| 106 | + - We tested all Wan2.2 Lightning LoRAs. None of them was trained specifically for Wan2.2 S2V, so many of this LoRA's keys don't match the model, but we include it because it significantly reduces generation time. We will continue to optimize this template |
| 107 | + - Using it causes a significant loss of motion dynamics and quality |
| 108 | + - If the output quality is too poor, try the original 20-step workflow without the LoRA |
| 109 | +6. **LoadAudio**: Upload our provided audio file or your own audio |
| 110 | +7. **Load Image**: Upload reference image |
| 111 | +8. **Batch sizes**: Set according to the number of Video S2V Extend subgraph nodes you add |
| 112 | + - Each Video S2V Extend subgraph adds 77 frames to the final output |
| 113 | + - The batch size is the total number of sampling iterations. For example, if you added 2 Video S2V Extend subgraphs, the batch size should be 3 |
| 114 | + - **Chunk Length**: Keep the default value of 77 |
| 115 | + |
| 116 | +9. **Sampler Settings**: Choose different settings based on whether you use Lightning LoRA |
| 117 | + - With 4-step Lightning LoRA: steps: 4, cfg: 1.0 |
| 118 | + - Without 4-step Lightning LoRA: steps: 20, cfg: 6.0 |
| 119 | +10. **Size Settings**: Set the output video dimensions |
| 120 | +11. **Video S2V Extend**: Video extension subgraph nodes. The default is 77 frames per sampling pass, and this is a 16 fps model, so each extension generates 77 / 16 = 4.8125 seconds of video |
| 121 | + - Match the number of video extension subgraph nodes to the length of the input audio. For example, if the input audio is 14 s long, the total number of frames needed is 14 × 16 = 224. Each video extension is 77 frames, so you need 224 / 77 ≈ 2.91, rounded up to 3 video extension subgraph nodes |
| 122 | +12. Use Ctrl-Enter or click the Run button to execute the workflow |
| 123 | + |
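The frame arithmetic in steps 8 and 11 can be sketched as a few helper functions. This is an illustrative sketch only: the function names are hypothetical, and the constants are the 16 fps rate and 77-frame chunk length stated in the steps. The text does not say whether the initial sampling pass counts toward the number computed from the audio length, so treat `chunks_for_audio` as the estimate described in step 11 and adjust it for your own workflow.

```python
import math

FPS = 16           # the steps above state this is a 16 fps model
CHUNK_LENGTH = 77  # default number of frames per sampling pass


def seconds_per_chunk(chunk_length=CHUNK_LENGTH, fps=FPS):
    """Length in seconds of the video produced by one 77-frame chunk."""
    return chunk_length / fps


def chunks_for_audio(audio_seconds, chunk_length=CHUNK_LENGTH, fps=FPS):
    """Number of 77-frame chunks needed to cover the audio, rounded up (step 11)."""
    total_frames = math.ceil(audio_seconds * fps)
    return math.ceil(total_frames / chunk_length)


def batch_size_for(extend_nodes):
    """Batch size for a given number of Video S2V Extend nodes (step 8)."""
    return extend_nodes + 1


print(seconds_per_chunk())   # 4.8125
print(chunks_for_audio(14))  # 3
print(batch_size_for(2))     # 3
```

The three printed values reproduce the worked examples from the steps: 77 / 16 seconds per chunk, 3 chunks for 14 seconds of audio, and a batch size of 3 for 2 extension nodes.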