diff --git a/docs.json b/docs.json index 9bddd0fc5..5722cb48a 100644 --- a/docs.json +++ b/docs.json @@ -151,6 +151,7 @@ "group": "Wan Video", "pages": [ "tutorials/video/wan/wan2_2", + "tutorials/video/wan/wan2-2-fun-inp", { "group": "Wan2.1", "pages": [ @@ -698,6 +699,7 @@ "group": "万相视频", "pages": [ "zh-CN/tutorials/video/wan/wan2_2", + "zh-CN/tutorials/video/wan/wan2-2-fun-inp", { "group": "Wan2.1", "pages": [ diff --git a/images/tutorial/image/qwen/image_qwen_image-guide.jpg b/images/tutorial/image/qwen/image_qwen_image-guide.jpg index 4a303f2fd..0005ce43c 100644 Binary files a/images/tutorial/image/qwen/image_qwen_image-guide.jpg and b/images/tutorial/image/qwen/image_qwen_image-guide.jpg differ diff --git a/images/tutorial/video/wan/wan2_2/wan_2.2_14b_fun_inp.jpg b/images/tutorial/video/wan/wan2_2/wan_2.2_14b_fun_inp.jpg new file mode 100644 index 000000000..c9632a19d Binary files /dev/null and b/images/tutorial/video/wan/wan2_2/wan_2.2_14b_fun_inp.jpg differ diff --git a/tutorials/image/qwen/qwen-image.mdx b/tutorials/image/qwen/qwen-image.mdx index 4f32f02fc..2169d428e 100644 --- a/tutorials/image/qwen/qwen-image.mdx +++ b/tutorials/image/qwen/qwen-image.mdx @@ -22,17 +22,21 @@ import UpdateReminder from '/snippets/tutorials/update-reminder.mdx' -**VRAM usage reference** -Tested with **RTX 4090D 24GB** - Model Version: Qwen-Image_fp8 -- VRAM: 86% -- Generation time: 94s for the first time, 71s for the second time +There are three different models used in the workflow attached to this document: +1. Qwen-Image original model fp8_e4m3fn +2. 8-step accelerated version: Qwen-Image original model fp8_e4m3fn with lightx2v 8-step LoRA +3. Distilled version: Qwen-Image distilled model fp8_e4m3fn -**Model Version: Qwen-Image_bf16** -- VRAM: 96% -- Generation time: 295s for the first time, 131s for the second time +**VRAM Usage Reference** +GPU: RTX4090D 24GB + +| Model Used | VRAM Usage | First Generation | Second Generation | +| --------------------------------------- | ---------- | --------------- | ---------------- | +| fp8_e4m3fn | 86% | ≈ 94s | ≈ 71s | +| fp8_e4m3fn with lightx2v 8-step LoRA | 86% | ≈ 55s | ≈ 34s | +| Distilled fp8_e4m3fn | 86% | ≈ 69s | ≈ 36s | ### 1. Workflow File @@ -59,23 +63,27 @@ Distilled version All models are available at [Huggingface](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/tree/main) and [Modelscope](https://modelscope.cn/models/Comfy-Org/Qwen-Image_ComfyUI/files) -**Diffusion Model** +**Diffusion model** + +- [qwen_image_fp8_e4m3fn.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/diffusion_models/qwen_image_fp8_e4m3fn.safetensors) -[qwen_image_fp8_e4m3fn.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/diffusion_models/qwen_image_fp8_e4m3fn.safetensors) +Qwen_image_distill -The following models are unofficial distilled versions that require only 15 steps. 
-[Distilled Versions](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/tree/main/non_official/diffusion_models) -- [qwen_image_distill_full_bf16.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/non_official/diffusion_models/qwen_image_distill_full_bf16.safetensors) 40.9 GB -- [qwen_image_distill_full_fp8.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/non_official/diffusion_models/qwen_image_distill_full_fp8_e4m3fn.safetensors) 20.4 GB +- [qwen_image_distill_full_fp8_e4m3fn.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/non_official/diffusion_models/qwen_image_distill_full_fp8_e4m3fn.safetensors) +- [qwen_image_distill_full_bf16.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/non_official/diffusion_models/qwen_image_distill_full_bf16.safetensors) - - The original author of the distilled version recommends using 15 steps with cfg 1.0. -- According to tests, this distilled version also performs well at 10 steps with cfg 1.0. You can choose euler or res_multistep according to your desired image type. +- The original author of the distilled version recommends using 15 steps with cfg 1.0. +- According to tests, this distilled version also performs well at 10 steps with cfg 1.0. You can choose either euler or res_multistep based on the type of image you want. -**Text Encoder** +**LoRA** + +- [Qwen-Image-Lightning-8steps-V1.0.safetensors](https://huggingface.co/lightx2v/Qwen-Image-Lightning/resolve/main/Qwen-Image-Lightning-8steps-V1.0.safetensors) + +**Text encoder** -[qwen_2.5_vl_7b_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors) +- [qwen_2.5_vl_7b_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors) **VAE** @@ -87,19 +95,29 @@ The following models are unofficial distilled versions that require only 15 step 📂 ComfyUI/ ├── 📂 models/ │ ├── 📂 diffusion_models/ -│ │ └── qwen_image_fp8_e4m3fn.safetensors +│ │ ├── qwen_image_fp8_e4m3fn.safetensors +│ │ └── qwen_image_distill_full_fp8_e4m3fn.safetensors ## 蒸馏版 +│ ├── 📂 loras/ +│ │ └── Qwen-Image-Lightning-8steps-V1.0.safetensors ## 8步加速 LoRA 模型 │ ├── 📂 vae/ │ │ └── qwen_image_vae.safetensors │ └── 📂 text_encoders/ │ └── qwen_2.5_vl_7b_fp8_scaled.safetensors ``` + ### 3. Complete the Workflow Step by Step ![Step Guide](/images/tutorial/image/qwen/image_qwen_image-guide.jpg) -1. Load `qwen_image_fp8_e4m3fn.safetensors` in the `Load Diffusion Model` node -2. Load `qwen_2.5_vl_7b_fp8_scaled.safetensors` in the `Load CLIP` node -3. Load `qwen_image_vae.safetensors` in the `Load VAE` node -4. Set image dimensions in the `EmptySD3LatentImage` node -5. Enter your prompts in the `CLIP Text Encoder` (supports English, Chinese, Korean, Japanese, Italian, etc.) -6. Click Queue or press `Ctrl+Enter` to run \ No newline at end of file +1. Make sure the `Load Diffusion Model` node has loaded `qwen_image_fp8_e4m3fn.safetensors` +2. Make sure the `Load CLIP` node has loaded `qwen_2.5_vl_7b_fp8_scaled.safetensors` +3. Make sure the `Load VAE` node has loaded `qwen_image_vae.safetensors` +4. Make sure the `EmptySD3LatentImage` node is set with the correct image dimensions +5. Set your prompt in the `CLIP Text Encoder` node; currently, it supports at least English, Chinese, Korean, Japanese, Italian, etc. +6. 
If you want to enable the 8-step acceleration LoRA by lightx2v, select the node and use `Ctrl + B` to enable it, and modify the Ksampler settings as described in step 8 +7. Click the `Queue` button, or use the shortcut `Ctrl(cmd) + Enter` to run the workflow +8. For different model versions and workflows, adjust the KSampler parameters accordingly + + + The distilled model and the 8-step acceleration LoRA by lightx2v do not seem to be compatible for simultaneous use. You can experiment with different combinations to verify if they can be used together. + \ No newline at end of file diff --git a/tutorials/video/wan/wan2-2-fun-inp.mdx b/tutorials/video/wan/wan2-2-fun-inp.mdx new file mode 100644 index 000000000..caadcf1ab --- /dev/null +++ b/tutorials/video/wan/wan2-2-fun-inp.mdx @@ -0,0 +1,114 @@ +--- +title: "ComfyUI Wan2.2 Fun Inp Start-End Frame Video Generation Example" +description: "This article introduces how to use ComfyUI to complete the Wan2.2 Fun Inp start-end frame video generation example" +sidebarTitle: "Wan2.2 Fun Inp" +--- + +import UpdateReminder from '/snippets/tutorials/update-reminder.mdx' + +**Wan2.2-Fun-Inp** is a start-end frame controlled video generation model launched by Alibaba PAI team. It supports inputting **start and end frame images** to generate intermediate transition videos, providing creators with greater creative control. The model is released under the **Apache 2.0 license** and supports commercial use. + +**Key Features**: +- **Start-End Frame Control**: Supports inputting start and end frame images to generate intermediate transition videos, enhancing video coherence and creative freedom +- **High-Quality Video Generation**: Based on the Wan2.2 architecture, outputs film-level quality videos +- **Multi-Resolution Support**: Supports generating videos at 512×512, 768×768, 1024×1024 and other resolutions to suit different scenarios + +**Model Version**: +- **14B High-Performance Version**: Model size exceeds 32GB, with better results but requires high VRAM + +Below are the relevant model weights and code repositories: + +- [🤗Wan2.2-Fun-Inp-14B](https://huggingface.co/alibaba-pai/Wan2.2-Fun-A14B-InP) +- Code repository: [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun) + + + +## Wan2.2 Fun Inp Start-End Frame Video Generation Workflow Example + +This workflow provides two versions: +1. A version using [Wan2.2-Lightning](https://huggingface.co/lightx2v/Wan2.2-Lightning) 4-step LoRA from lightx2v for accelerated video generation +2. A fp8_scaled version without acceleration LoRA + +Below are the test results using an RTX4090D 24GB VRAM GPU + +| Model Type | Resolution | VRAM Usage | First Generation Time | Second Generation Time | +| ------------------------ | ---------- | ---------- | -------------------- | --------------------- | +| fp8_scaled | 640×640 | 83% | ≈ 524s | ≈ 520s | +| fp8_scaled + 4-step LoRA | 640×640 | 89% | ≈ 138s | ≈ 79s | + +Since the acceleration with LoRA is significant, the provided workflows enable the accelerated LoRA version by default. If you want to enable the other workflow, select it and use **Ctrl+B** to activate. + +### 1. Download Workflow File + +Please update your ComfyUI to the latest version, and find "**Wan2.2 Fun Inp**" under the menu `Workflow` -> `Browse Templates` -> `Video` to load the workflow. + +Or, after updating ComfyUI to the latest version, download the workflow below and drag it into ComfyUI to load. + + + + +

Download JSON Workflow
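If you prefer to fetch the model files from a script instead of clicking each link under "2. Manually Download Models" below, here is a minimal sketch. It assumes `huggingface_hub` is installed (`pip install huggingface_hub`) and that your ComfyUI install lives at `./ComfyUI`; the repository names and file paths are copied from the download links below, everything else is illustrative.

```python
# Hedged sketch: fetch the Wan2.2 Fun Inp model files and place them in the ComfyUI model folders.
# Repo IDs and in-repo file paths come from the links in "2. Manually Download Models";
# COMFYUI_DIR is an assumption -- point it at your own install.
import shutil
from pathlib import Path

from huggingface_hub import hf_hub_download

COMFYUI_DIR = Path("ComfyUI")

FILES = [
    # (repo_id, path inside the repo, target subfolder under ComfyUI/models/)
    ("Comfy-Org/Wan_2.2_ComfyUI_Repackaged",
     "split_files/diffusion_models/wan2.2_fun_inpaint_high_noise_14B_fp8_scaled.safetensors",
     "diffusion_models"),
    ("Comfy-Org/Wan_2.2_ComfyUI_Repackaged",
     "split_files/diffusion_models/wan2.2_fun_inpaint_low_noise_14B_fp8_scaled.safetensors",
     "diffusion_models"),
    ("Comfy-Org/Wan_2.2_ComfyUI_Repackaged",
     "split_files/loras/wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors",
     "loras"),
    ("Comfy-Org/Wan_2.2_ComfyUI_Repackaged",
     "split_files/loras/wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors",
     "loras"),
    ("Comfy-Org/Wan_2.2_ComfyUI_Repackaged",
     "split_files/vae/wan_2.1_vae.safetensors",
     "vae"),
    ("Comfy-Org/Wan_2.1_ComfyUI_repackaged",
     "split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
     "text_encoders"),
]

for repo_id, filename, subfolder in FILES:
    cached_path = hf_hub_download(repo_id=repo_id, filename=filename)  # downloads to the HF cache
    target = COMFYUI_DIR / "models" / subfolder / Path(filename).name
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(cached_path, target)
    print(f"placed {target}")
```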
+ +Use the following materials as the start and end frames + +![Wan2.2 Fun Control ComfyUI Workflow Start Frame Material](https://raw.githubusercontent.com/Comfy-Org/example_workflows/refs/heads/main/video/wan/wan2.2_fun_inp/start_image.png) +![Wan2.2 Fun Control ComfyUI Workflow End Frame Material](https://raw.githubusercontent.com/Comfy-Org/example_workflows/refs/heads/main/video/wan/wan2.2_fun_inp/end_image.png) + +### 2. Manually Download Models + +**Diffusion Model** +- [wan2.2_fun_inpaint_high_noise_14B_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_fun_inpaint_high_noise_14B_fp8_scaled.safetensors) +- [wan2.2_fun_inpaint_low_noise_14B_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_fun_inpaint_low_noise_14B_fp8_scaled.safetensors) + +**Lightning LoRA (Optional, for acceleration)** +- [wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/loras/wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors) +- [wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/loras/wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors) + +**VAE** +- [wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors) + +**Text Encoder** +- [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors) + +``` +ComfyUI/ +├───📂 models/ +│ ├───📂 diffusion_models/ +│ │ ├─── wan2.2_fun_inpaint_high_noise_14B_fp8_scaled.safetensors +│ │ └─── wan2.2_fun_inpaint_low_noise_14B_fp8_scaled.safetensors +│ ├───📂 loras/ +│ │ ├─── wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors +│ │ └─── wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors +│ ├───📂 text_encoders/ +│ │ └─── umt5_xxl_fp8_e4m3fn_scaled.safetensors +│ └───📂 vae/ +│ └── wan_2.1_vae.safetensors +``` + +### 3. Step-by-Step Workflow Guide + +![Workflow Step Image](/images/tutorial/video/wan/wan2_2/wan_2.2_14b_fun_inp.jpg) + + + This workflow uses LoRA. Please make sure the corresponding Diffusion model and LoRA are matched. + + +1. **High noise** model and **LoRA** loading + - Ensure the `Load Diffusion Model` node loads the `wan2.2_fun_inpaint_high_noise_14B_fp8_scaled.safetensors` model + - Ensure the `LoraLoaderModelOnly` node loads the `wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors` +2. **Low noise** model and **LoRA** loading + - Ensure the `Load Diffusion Model` node loads the `wan2.2_fun_inpaint_low_noise_14B_fp8_scaled.safetensors` model + - Ensure the `LoraLoaderModelOnly` node loads the `wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors` +3. Ensure the `Load CLIP` node loads the `umt5_xxl_fp8_e4m3fn_scaled.safetensors` model +4. Ensure the `Load VAE` node loads the `wan_2.1_vae.safetensors` model +5. Upload the start and end frame images as materials +6. Enter your prompt in the Prompt group +7. Adjust the size and video length in the `WanFunInpaintToVideo` node + - Adjust the `width` and `height` parameters. The default is `640`. We set a smaller size, but you can modify it as needed. + - Adjust the `length`, which is the total number of frames. The current workflow fps is 16. 
For example, if you want to generate a 5-second video, you should set it to 5*16 = 80. +8. Click the `Run` button, or use the shortcut `Ctrl(cmd) + Enter` to execute video generation diff --git a/zh-CN/tutorials/image/qwen/qwen-image.mdx b/zh-CN/tutorials/image/qwen/qwen-image.mdx index 39bdc4d87..dd197d4fa 100644 --- a/zh-CN/tutorials/image/qwen/qwen-image.mdx +++ b/zh-CN/tutorials/image/qwen/qwen-image.mdx @@ -22,16 +22,20 @@ import UpdateReminder from '/snippets/zh/tutorials/update-reminder.mdx' -**显存使用参考** -使用 **RTX 4090D 24GB** 测试 -**模型版本: Qwen-Image_fp8** -- VRAM: 86% -- 生成时间: 首次 94 秒,第二次 71 秒 +在本篇文档所附工作流中使用的不同模型有三种 +1. Qwen-Image 原版模型 fp8_e4m3fn +2. 8步加速版: Qwen-Image 原版模型 fp8_e4m3fn 使用 lightx2v 8步 LoRA, +3. 蒸馏版:Qwen-Image 蒸馏版模型 fp8_e4m3fn + +**显存使用参考** +GPU: RTX4090D 24GB -**模型版本: Qwen-Image_bf16** -- VRAM: 96% -- 生成时间: 首次 295 秒,第二次 131 秒 +| 使用模型 | VRAM Usage | 首次生成 | 第二次生成 | +| --------------------------------- | ---------- | -------- | ---------- | +| fp8_e4m3fn | 86% | ≈ 94s | ≈ 71s | +| fp8_e4m3fn 使用 lightx2v 8步 LoRA | 86% | ≈ 55s | ≈ 34s | +| 蒸馏版 fp8_e4m3fn | 86% | ≈ 69s | ≈ 36s | ### 1. 工作流文件 @@ -48,7 +52,7 @@ import UpdateReminder from '/snippets/zh/tutorials/update-reminder.mdx' ### 2. 模型下载 -**ComfyUI 提供的版本** +**你可以在 ComfyOrg 仓库找到的版本** - Qwen-Image_bf16 (40.9 GB) - Qwen-Image_fp8 (20.4 GB) - 蒸馏版本 (非官方,仅需 15 步) @@ -56,39 +60,48 @@ import UpdateReminder from '/snippets/zh/tutorials/update-reminder.mdx' 所有模型均可在 [Huggingface](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/tree/main) 或者 [魔搭](https://modelscope.cn/models/Comfy-Org/Qwen-Image_ComfyUI/files) 找到 -**Diffusion Model** +**Diffusion model** -[qwen_image_fp8_e4m3fn.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/diffusion_models/qwen_image_fp8_e4m3fn.safetensors) +- [qwen_image_fp8_e4m3fn.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/diffusion_models/qwen_image_fp8_e4m3fn.safetensors) -下面的模型为非官方仅需 15 步的蒸馏版本 -[蒸馏版本](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/tree/main/non_official/diffusion_models) -- [qwen_image_distill_full_bf16.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/non_official/diffusion_models/qwen_image_distill_full_bf16.safetensors) 40.9 GB -- [qwen_image_distill_full_fp8.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/non_official/diffusion_models/qwen_image_distill_full_fp8_e4m3fn.safetensors) 20.4 GB +Qwen_image_distill + +- [qwen_image_distill_full_fp8_e4m3fn.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/non_official/diffusion_models/qwen_image_distill_full_fp8_e4m3fn.safetensors) +- [qwen_image_distill_full_bf16.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/non_official/diffusion_models/qwen_image_distill_full_bf16.safetensors) - 蒸馏版本原始作者建议在 15 步 cfg 1.0 - 经测试该蒸馏版本在 10 步 cfg 1.0 下表现良好,根据你想要的图像类型选择 euler 或 res_multistep -**Text Encoder** +**LoRA** + +- [Qwen-Image-Lightning-8steps-V1.0.safetensors](https://huggingface.co/lightx2v/Qwen-Image-Lightning/resolve/main/Qwen-Image-Lightning-8steps-V1.0.safetensors) -[qwen_2.5_vl_7b_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors) +**Text encoder** + +- [qwen_2.5_vl_7b_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors) **VAE** 
-[qwen_image_vae.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/vae/qwen_image_vae.safetensors) +- [qwen_image_vae.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/vae/qwen_image_vae.safetensors) +模型保存位置 ``` 📂 ComfyUI/ ├── 📂 models/ │ ├── 📂 diffusion_models/ -│ │ └── qwen_image_fp8_e4m3fn.safetensors +│ │ ├── qwen_image_fp8_e4m3fn.safetensors +│ │ └── qwen_image_distill_full_fp8_e4m3fn.safetensors ## 蒸馏版 +│ ├── 📂 loras/ +│ │ └── Qwen-Image-Lightning-8steps-V1.0.safetensors ## 8步加速 LoRA 模型 │ ├── 📂 vae/ │ │ └── qwen_image_vae.safetensors │ └── 📂 text_encoders/ │ └── qwen_2.5_vl_7b_fp8_scaled.safetensors ``` + ### 3. 按步骤完成工作流 ![步骤图](/images/tutorial/image/qwen/image_qwen_image-guide.jpg) @@ -98,4 +111,10 @@ import UpdateReminder from '/snippets/zh/tutorials/update-reminder.mdx' 3. 确保 `Load VAE`节点中加载了`qwen_image_vae.safetensors` 4. 确保 `EmptySD3LatentImage`节点中设置好了图片的尺寸 5. 在`CLIP Text Encoder`节点中设置好提示词,目前经过测试目前至少支持:英语、中文、韩语、日语、意大利语等 -6. 点击 `Queue` 按钮,或者使用快捷键 `Ctrl(cmd) + Enter(回车)` 来运行工作流 \ No newline at end of file +6. 如果需要启用 lightx2v 的 8 步加速 LoRA ,请选中后用 `Ctrl + B` 启用该节点,并按 序号`8` 处的设置参数修改 Ksampler 的设置设置 +7. 点击 `Queue` 按钮,或者使用快捷键 `Ctrl(cmd) + Enter(回车)` 来运行工作流 +8. 对于不同版本的模型和工作流的对应 KSampler 的参数设置 + + + 蒸馏版模型和 lightx2v 的 8 步加速 LoRA 似乎不能同时使用,你可以测试具体的组合参数来验证组合使用的方式是否可行 + \ No newline at end of file diff --git a/zh-CN/tutorials/video/wan/wan2-2-fun-inp.mdx b/zh-CN/tutorials/video/wan/wan2-2-fun-inp.mdx new file mode 100644 index 000000000..28f6f1cb6 --- /dev/null +++ b/zh-CN/tutorials/video/wan/wan2-2-fun-inp.mdx @@ -0,0 +1,114 @@ +--- +title: "ComfyUI Wan2.2 Fun Inp 首尾帧视频生成示例" +description: "本文介绍了如何在 ComfyUI 中完成 Wan2.2 Fun Inp 首尾帧视频生成示例" +sidebarTitle: "Wan2.2 Fun Inp" +--- + +import UpdateReminder from '/snippets/zh/tutorials/update-reminder.mdx' + +**Wan2.2-Fun-Inp** 是 Alibaba pai团队推出的首尾帧控制视频生成模型,支持输入**首帧和尾帧图像**,生成中间过渡视频,为创作者带来更强的创意控制力。该模型采用 **Apache 2.0 许可协议**发布,支持商业使用。 + +**核心功能**: +- **首尾帧控制**:支持输入首帧和尾帧图像,生成中间过渡视频,提升视频连贯性与创意自由度 +- **高质量视频生成**:基于 Wan2.2 架构,输出影视级质量视频 +- **多分辨率支持**:支持生成512×512、768×768、1024×1024等分辨率的视频,适配不同场景需求 + +**模型版本**: +- **14B 高性能版**:模型体积达 32GB+,效果更优但需高显存支持 + +下面是相关模型权重和代码仓库: + +- [🤗Wan2.2-Fun-Inp-14B](https://huggingface.co/alibaba-pai/Wan2.2-Fun-A14B-InP) +- 代码仓库:[VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun) + + + +## Wan2.2 Fun Inp 首尾帧视频生成工作流示例 + +这里提供的工作流包含了两个版本的 +1. 使用了 lightx2v 的 [Wan2.2-Lightning](https://huggingface.co/lightx2v/Wan2.2-Lightning) 4 步 LoRA 来实现视频生成提速的版本 +2. 没有使用加速 LoRA 的 fp8_scaled 版本 + +下面是使用 RTX4090D 24GB 显存 GPU 测试的结果 + +| 模型类型 | 分辨率 | 显存占用 | 首次生成时长 | 第二次生成时长 | +| ------------------------ | ------- | -------- | ------------ | -------------- | +| fp8_scaled | 640×640 | 83% | ≈ 524秒 | ≈ 520秒 | +| fp8_scaled + 4步LoRA加速 | 640×640 | 89% | ≈ 138秒 | ≈ 79秒 | + +由于使用了加速 LoRA 后提速较为明显,在提供的两组工作流中,我们默认启用了使用了加速 LoRA 版本,如果你需要启用另一组的工作流,框选后使用 **Ctrl+B** 即可启用 + +### 1. 工作流文件下载 + +请更新你的 ComfyUI 到最新版本,并通过菜单 `工作流` -> `浏览模板` -> `视频` 找到 "**Wan2.2 Fun Inp**" 以加载工作流 + +或者更新你的 ComfyUI 到最新版本后,下载下面的工作流并拖入 ComfyUI 以加载工作流 + + + + +

下载 JSON 格式工作流
+ +使用下面的素材作为首尾帧 + +![Wan2.2 Fun Control ComfyUI 工作流起始帧素材](https://raw.githubusercontent.com/Comfy-Org/example_workflows/refs/heads/main/video/wan/wan2.2_fun_inp/start_image.png) +![Wan2.2 Fun Control ComfyUI 工作流起始帧素材](https://raw.githubusercontent.com/Comfy-Org/example_workflows/refs/heads/main/video/wan/wan2.2_fun_inp/end_image.png) + +### 2. 手动下载模型 + +**Diffusion Model** +- [wan2.2_fun_inpaint_high_noise_14B_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_fun_inpaint_high_noise_14B_fp8_scaled.safetensors) +- [wan2.2_fun_inpaint_low_noise_14B_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_fun_inpaint_low_noise_14B_fp8_scaled.safetensors) + +**Lightning LoRA (可选,用于加速)** +- [wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/loras/wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors) +- [wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/loras/wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors) + +**VAE** +- [wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors) + +**Text Encoder** +- [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors) + +``` +ComfyUI/ +├───📂 models/ +│ ├───📂 diffusion_models/ +│ │ ├─── wan2.2_fun_inpaint_high_noise_14B_fp8_scaled.safetensors +│ │ └─── wan2.2_fun_inpaint_low_noise_14B_fp8_scaled.safetensors +│ ├───📂 loras/ +│ │ ├─── wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors +│ │ └─── wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors +│ ├───📂 text_encoders/ +│ │ └─── umt5_xxl_fp8_e4m3fn_scaled.safetensors +│ └───📂 vae/ +│ └── wan_2.1_vae.safetensors +``` + +### 3. 按步骤完成工作流 + +![步骤图](/images/tutorial/video/wan/wan2_2/wan_2.2_14b_fun_inp.jpg) + + + 这个工作流是使用了 LoRA 的工作流,请确保对应的 Diffusion model 和 LoRA 是一致的 + + +1. **High noise** 模型及 **LoRA** 加载 + - 确保 `Load Diffusion Model` 节点加载了 `wan2.2_fun_inpaint_high_noise_14B_fp8_scaled.safetensors` 模型 + - 确保 `LoraLoaderModelOnly` 节点加载了 `wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors` +2. **Low noise** 模型及 **LoRA** 加载 + - 确保 `Load Diffusion Model` 节点加载了 `wan2.2_fun_inpaint_low_noise_14B_fp8_scaled.safetensors` 模型 + - 确保 `LoraLoaderModelOnly` 节点加载了 `wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors` +3. 确保 `Load CLIP` 节点加载了 `umt5_xxl_fp8_e4m3fn_scaled.safetensors` 模型 +4. 确保 `Load VAE` 节点加载了 `wan_2.1_vae.safetensors` 模型 +5. 首尾帧图片上传,分别上传首尾帧图片素材 +6. 在 Prompt 组中输入提示词 +7. `WanFunInpaintToVideo` 节点尺寸和视频长度调整 + - 调整 `width` 和 `height` 的尺寸,默认为 `640`, 我们设置了较小的尺寸你可以按需进行修改 + - 调整 `length`, 这里为视频总帧数,当前工作流 fps 为 16, 假设你需要生成一个 5 秒的视频,那么你应该设置 5*16 = 80 +8. 点击 `Run` 按钮,或者使用快捷键 `Ctrl(cmd) + Enter(回车)` 来执行视频生成
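
如果你不确定 `length` 应该填多少,下面是一个按期望时长推算总帧数的简单示例(假设 fps 保持当前工作流默认的 16,仅作演示):

```python
# 根据期望的视频时长计算 WanFunInpaintToVideo 节点的 length(总帧数)
fps = 16              # 当前工作流的帧率
duration_seconds = 5  # 期望生成的视频时长(秒)
length = fps * duration_seconds
print(length)         # 输出 80,对应 5 秒视频
```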