2 changes: 2 additions & 0 deletions docs.json
@@ -151,6 +151,7 @@
"group": "Wan Video",
"pages": [
"tutorials/video/wan/wan2_2",
"tutorials/video/wan/wan2-2-fun-inp",
{
"group": "Wan2.1",
"pages": [
@@ -698,6 +699,7 @@
"group": "万相视频",
"pages": [
"zh-CN/tutorials/video/wan/wan2_2",
"zh-CN/tutorials/video/wan/wan2-2-fun-inp",
{
"group": "Wan2.1",
"pages": [
Binary file modified images/tutorial/image/qwen/image_qwen_image-guide.jpg
68 changes: 43 additions & 25 deletions tutorials/image/qwen/qwen-image.mdx
@@ -22,17 +22,21 @@ import UpdateReminder from '/snippets/tutorials/update-reminder.mdx'

<UpdateReminder />

**VRAM usage reference**

Tested with **RTX 4090D 24GB**

Model Version: Qwen-Image_fp8
- VRAM: 86%
- Generation time: 94s for the first time, 71s for the second time
The workflow attached to this document uses three different model configurations:
1. Original version: Qwen-Image original model fp8_e4m3fn
2. 8-step accelerated version: Qwen-Image original model fp8_e4m3fn with the lightx2v 8-step LoRA
3. Distilled version: Qwen-Image distilled model fp8_e4m3fn

**Model Version: Qwen-Image_bf16**
- VRAM: 96%
- Generation time: 295s for the first time, 131s for the second time
**VRAM Usage Reference**
GPU: RTX 4090D 24GB

| Model Used | VRAM Usage | First Generation | Second Generation |
| --------------------------------------- | ---------- | --------------- | ---------------- |
| fp8_e4m3fn | 86% | ≈ 94s | ≈ 71s |
| fp8_e4m3fn with lightx2v 8-step LoRA | 86% | ≈ 55s | ≈ 34s |
| Distilled fp8_e4m3fn | 86% | ≈ 69s | ≈ 36s |


### 1. Workflow File
@@ -59,23 +63,27 @@ Distilled version

All models are available at [Huggingface](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/tree/main) and [Modelscope](https://modelscope.cn/models/Comfy-Org/Qwen-Image_ComfyUI/files)

**Diffusion Model**
**Diffusion model**

- [qwen_image_fp8_e4m3fn.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/diffusion_models/qwen_image_fp8_e4m3fn.safetensors)

[qwen_image_fp8_e4m3fn.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/diffusion_models/qwen_image_fp8_e4m3fn.safetensors)
Qwen_image_distill

The following models are unofficial distilled versions that require only 15 steps.
[Distilled Versions](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/tree/main/non_official/diffusion_models)
- [qwen_image_distill_full_bf16.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/non_official/diffusion_models/qwen_image_distill_full_bf16.safetensors) 40.9 GB
- [qwen_image_distill_full_fp8.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/non_official/diffusion_models/qwen_image_distill_full_fp8_e4m3fn.safetensors) 20.4 GB
- [qwen_image_distill_full_fp8_e4m3fn.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/non_official/diffusion_models/qwen_image_distill_full_fp8_e4m3fn.safetensors)
- [qwen_image_distill_full_bf16.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/non_official/diffusion_models/qwen_image_distill_full_bf16.safetensors)

<Note>
- The original author of the distilled version recommends using 15 steps with cfg 1.0.
- According to tests, this distilled version also performs well at 10 steps with cfg 1.0. You can choose euler or res_multistep according to your desired image type.
- The original author of the distilled version recommends using 15 steps with cfg 1.0.
- According to tests, this distilled version also performs well at 10 steps with cfg 1.0. You can choose either euler or res_multistep based on the type of image you want.
</Note>

**Text Encoder**
**LoRA**

- [Qwen-Image-Lightning-8steps-V1.0.safetensors](https://huggingface.co/lightx2v/Qwen-Image-Lightning/resolve/main/Qwen-Image-Lightning-8steps-V1.0.safetensors)

**Text encoder**

[qwen_2.5_vl_7b_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors)
- [qwen_2.5_vl_7b_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors)

**VAE**

@@ -87,19 +95,29 @@ The following models are unofficial distilled versions that require only 15 steps.
📂 ComfyUI/
├── 📂 models/
│ ├── 📂 diffusion_models/
│ │ └── qwen_image_fp8_e4m3fn.safetensors
│ │ ├── qwen_image_fp8_e4m3fn.safetensors
│   │   └── qwen_image_distill_full_fp8_e4m3fn.safetensors ## distilled version
│ ├── 📂 loras/
│   │   └── Qwen-Image-Lightning-8steps-V1.0.safetensors ## 8-step acceleration LoRA
│ ├── 📂 vae/
│ │ └── qwen_image_vae.safetensors
│ └── 📂 text_encoders/
│ └── qwen_2.5_vl_7b_fp8_scaled.safetensors
```
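If you want to double-check that every file landed in the right folder, a short script like the sketch below can do it; it assumes a default ComfyUI directory layout, and the distilled model and LoRA entries are only needed for the workflow variants that use them.

```python
from pathlib import Path

# Adjust this path to your ComfyUI installation.
COMFYUI_ROOT = Path("ComfyUI")

EXPECTED_FILES = [
    "models/diffusion_models/qwen_image_fp8_e4m3fn.safetensors",
    "models/diffusion_models/qwen_image_distill_full_fp8_e4m3fn.safetensors",  # distilled version (optional)
    "models/loras/Qwen-Image-Lightning-8steps-V1.0.safetensors",               # 8-step LoRA (optional)
    "models/vae/qwen_image_vae.safetensors",
    "models/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors",
]

for rel_path in EXPECTED_FILES:
    status = "found" if (COMFYUI_ROOT / rel_path).is_file() else "MISSING"
    print(f"{status:7} {rel_path}")
```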

### 3. Complete the Workflow Step by Step

![Step Guide](/images/tutorial/image/qwen/image_qwen_image-guide.jpg)

1. Load `qwen_image_fp8_e4m3fn.safetensors` in the `Load Diffusion Model` node
2. Load `qwen_2.5_vl_7b_fp8_scaled.safetensors` in the `Load CLIP` node
3. Load `qwen_image_vae.safetensors` in the `Load VAE` node
4. Set image dimensions in the `EmptySD3LatentImage` node
5. Enter your prompts in the `CLIP Text Encoder` (supports English, Chinese, Korean, Japanese, Italian, etc.)
6. Click Queue or press `Ctrl+Enter` to run
1. Make sure the `Load Diffusion Model` node has loaded `qwen_image_fp8_e4m3fn.safetensors`
2. Make sure the `Load CLIP` node has loaded `qwen_2.5_vl_7b_fp8_scaled.safetensors`
3. Make sure the `Load VAE` node has loaded `qwen_image_vae.safetensors`
4. Make sure the `EmptySD3LatentImage` node is set with the correct image dimensions
5. Set your prompt in the `CLIP Text Encoder` node; it currently supports English, Chinese, Korean, Japanese, Italian, and other languages
6. To enable the lightx2v 8-step acceleration LoRA, select the LoRA node and press `Ctrl + B` to enable it, then adjust the KSampler settings as described in step 8
7. Click the `Queue` button, or use the shortcut `Ctrl(cmd) + Enter` to run the workflow
8. Adjust the KSampler parameters to match the model version and workflow you are using (a sample configuration is sketched below)
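The template itself does not pin these values down in prose, so the snippet below is only a rough sketch of plausible KSampler settings: the distilled values follow the note above (10–15 steps at cfg 1.0), the 8-step count comes from the LoRA name, and the baseline steps/cfg for the unaccelerated model are assumptions rather than values taken from the template. Always cross-check against the values shipped in the actual workflow.

```python
# Hypothetical KSampler presets for the three Qwen-Image setups in this guide.
# Distilled values follow the note above; the baseline for the original model
# and the cfg for the 8-step LoRA are assumptions, not template values.
KSAMPLER_PRESETS = {
    "qwen_image_fp8_e4m3fn": {
        "steps": 20, "cfg": 2.5, "sampler": "euler",          # assumed baseline
    },
    "qwen_image_fp8_e4m3fn + lightx2v 8-step LoRA": {
        "steps": 8, "cfg": 1.0, "sampler": "euler",           # 8 steps per the LoRA name
    },
    "qwen_image_distill_full_fp8_e4m3fn": {
        "steps": 10, "cfg": 1.0, "sampler": "res_multistep",  # 10-15 steps per the note above
    },
}

def preset_for(setup: str) -> dict:
    """Look up the suggested KSampler settings for a given model setup."""
    return KSAMPLER_PRESETS[setup]
```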

<Note>
The distilled model and the 8-step acceleration LoRA by lightx2v do not seem to be compatible for simultaneous use. You can experiment with different combinations to verify if they can be used together.
</Note>
114 changes: 114 additions & 0 deletions tutorials/video/wan/wan2-2-fun-inp.mdx
@@ -0,0 +1,114 @@
---
title: "ComfyUI Wan2.2 Fun Inp Start-End Frame Video Generation Example"
description: "This article introduces how to use ComfyUI to complete the Wan2.2 Fun Inp start-end frame video generation example"
sidebarTitle: "Wan2.2 Fun Inp"
---

import UpdateReminder from '/snippets/tutorials/update-reminder.mdx'

**Wan2.2-Fun-Inp** is a start-end frame controlled video generation model launched by Alibaba PAI team. It supports inputting **start and end frame images** to generate intermediate transition videos, providing creators with greater creative control. The model is released under the **Apache 2.0 license** and supports commercial use.

**Key Features**:
- **Start-End Frame Control**: Supports inputting start and end frame images to generate intermediate transition videos, enhancing video coherence and creative freedom
- **High-Quality Video Generation**: Based on the Wan2.2 architecture, outputs film-level quality videos
- **Multi-Resolution Support**: Supports generating videos at 512×512, 768×768, 1024×1024 and other resolutions to suit different scenarios

**Model Version**:
- **14B High-Performance Version**: Model size exceeds 32GB; it delivers better results but requires more VRAM

Below are the relevant model weights and code repositories:

- [🤗Wan2.2-Fun-Inp-14B](https://huggingface.co/alibaba-pai/Wan2.2-Fun-A14B-InP)
- Code repository: [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun)

<UpdateReminder/>

## Wan2.2 Fun Inp Start-End Frame Video Generation Workflow Example

This workflow provides two versions:
1. A version using the [Wan2.2-Lightning](https://huggingface.co/lightx2v/Wan2.2-Lightning) 4-step LoRA from lightx2v for accelerated video generation
2. An fp8_scaled version without the acceleration LoRA

Below are the test results using an RTX 4090D GPU with 24GB of VRAM:

| Model Type | Resolution | VRAM Usage | First Generation Time | Second Generation Time |
| ------------------------ | ---------- | ---------- | -------------------- | --------------------- |
| fp8_scaled | 640×640 | 83% | ≈ 524s | ≈ 520s |
| fp8_scaled + 4-step LoRA | 640×640 | 89% | ≈ 138s | ≈ 79s |

Since the LoRA acceleration is significant, the provided workflow enables the accelerated LoRA version by default. If you want to use the non-accelerated version instead, select it and press **Ctrl+B** to activate it.

### 1. Download Workflow File

Please update your ComfyUI to the latest version, and find "**Wan2.2 Fun Inp**" under the menu `Workflow` -> `Browse Templates` -> `Video` to load the workflow.

Or, after updating ComfyUI to the latest version, download the workflow below and drag it into ComfyUI to load.

<video
controls
className="w-full aspect-video"
src="https://raw.githubusercontent.com/Comfy-Org/example_workflows/refs/heads/main/video/wan/wan2.2_fun_inp/wan2.2_14B_fun_inp.mp4"
></video>

<a className="prose" target='_blank' href="https://raw.githubusercontent.com/Comfy-Org/workflow_templates/refs/heads/main/templates/video_wan2_2_14B_fun_inpaint.json" style={{ display: 'inline-block', backgroundColor: '#0078D6', color: '#ffffff', padding: '10px 20px', borderRadius: '8px', borderColor: "transparent", textDecoration: 'none', fontWeight: 'bold'}}>
<p className="prose" style={{ margin: 0, fontSize: "0.8rem" }}>Download JSON Workflow</p>
</a>

Use the following images as the start and end frames:

![Wan2.2 Fun Inp ComfyUI Workflow Start Frame Material](https://raw.githubusercontent.com/Comfy-Org/example_workflows/refs/heads/main/video/wan/wan2.2_fun_inp/start_image.png)
![Wan2.2 Fun Inp ComfyUI Workflow End Frame Material](https://raw.githubusercontent.com/Comfy-Org/example_workflows/refs/heads/main/video/wan/wan2.2_fun_inp/end_image.png)

### 2. Manually Download Models

**Diffusion Model**
- [wan2.2_fun_inpaint_high_noise_14B_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_fun_inpaint_high_noise_14B_fp8_scaled.safetensors)
- [wan2.2_fun_inpaint_low_noise_14B_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_fun_inpaint_low_noise_14B_fp8_scaled.safetensors)

**Lightning LoRA (Optional, for acceleration)**
- [wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/loras/wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors)
- [wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/loras/wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors)

**VAE**
- [wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors)

**Text Encoder**
- [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)

```
ComfyUI/
├───📂 models/
│ ├───📂 diffusion_models/
│ │ ├─── wan2.2_fun_inpaint_high_noise_14B_fp8_scaled.safetensors
│ │ └─── wan2.2_fun_inpaint_low_noise_14B_fp8_scaled.safetensors
│ ├───📂 loras/
│ │ ├─── wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors
│ │ └─── wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors
│ ├───📂 text_encoders/
│ │ └─── umt5_xxl_fp8_e4m3fn_scaled.safetensors
│ └───📂 vae/
│   └─── wan_2.1_vae.safetensors
```

### 3. Step-by-Step Workflow Guide

![Workflow Step Image](/images/tutorial/video/wan/wan2_2/wan_2.2_14b_fun_inp.jpg)

<Note>
This workflow uses LoRAs. Make sure each diffusion model is paired with its matching LoRA (high noise with high noise, low noise with low noise).
</Note>

1. **High noise** model and **LoRA** loading
- Ensure the `Load Diffusion Model` node loads the `wan2.2_fun_inpaint_high_noise_14B_fp8_scaled.safetensors` model
- Ensure the `LoraLoaderModelOnly` node loads the `wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors`
2. **Low noise** model and **LoRA** loading
- Ensure the `Load Diffusion Model` node loads the `wan2.2_fun_inpaint_low_noise_14B_fp8_scaled.safetensors` model
- Ensure the `LoraLoaderModelOnly` node loads the `wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors`
3. Ensure the `Load CLIP` node loads the `umt5_xxl_fp8_e4m3fn_scaled.safetensors` model
4. Ensure the `Load VAE` node loads the `wan_2.1_vae.safetensors` model
5. Upload the start and end frame images as materials
6. Enter your prompt in the Prompt group
7. Adjust the size and video length in the `WanFunInpaintToVideo` node
- Adjust the `width` and `height` parameters. The default is `640`, a relatively small size chosen to keep generation fast; modify it as needed.
- Adjust the `length`, which is the total number of frames. The workflow runs at 16 fps, so for a 5-second video set it to 5 × 16 = 80 (see the small helper sketched after this list).
8. Click the `Run` button, or use the shortcut `Ctrl(cmd) + Enter` to execute video generation
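As a quick sanity check for the `length` value in step 7, the frame count is just the clip duration multiplied by the frame rate; the small helper below assumes the 16 fps used by this template.

```python
def wan_length(seconds: float, fps: int = 16) -> int:
    """Total frame count for the WanFunInpaintToVideo `length` input."""
    return int(seconds * fps)

# Example from the guide: a 5-second clip at 16 fps -> 80 frames.
print(wan_length(5))  # 80
```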
59 changes: 39 additions & 20 deletions zh-CN/tutorials/image/qwen/qwen-image.mdx
@@ -22,16 +22,20 @@ import UpdateReminder from '/snippets/zh/tutorials/update-reminder.mdx'

<UpdateReminder />

**显存使用参考**
使用 **RTX 4090D 24GB** 测试

**模型版本: Qwen-Image_fp8**
- VRAM: 86%
- 生成时间: 首次 94 秒,第二次 71 秒
本篇文档所附的工作流中使用了三种不同的模型:
1. 原版:Qwen-Image 原版模型 fp8_e4m3fn
2. 8步加速版:Qwen-Image 原版模型 fp8_e4m3fn 搭配 lightx2v 8步 LoRA
3. 蒸馏版:Qwen-Image 蒸馏版模型 fp8_e4m3fn

**显存使用参考**
GPU: RTX 4090D 24GB

**模型版本: Qwen-Image_bf16**
- VRAM: 96%
- 生成时间: 首次 295 秒,第二次 131 秒
| 使用模型 | 显存占用 | 首次生成 | 第二次生成 |
| --------------------------------- | ---------- | -------- | ---------- |
| fp8_e4m3fn | 86% | ≈ 94s | ≈ 71s |
| fp8_e4m3fn 使用 lightx2v 8步 LoRA | 86% | ≈ 55s | ≈ 34s |
| 蒸馏版 fp8_e4m3fn | 86% | ≈ 69s | ≈ 36s |

### 1. 工作流文件

@@ -48,47 +52,56 @@
</a>
### 2. 模型下载

**ComfyUI 提供的版本**
**你可以在 ComfyOrg 仓库找到的版本**
- Qwen-Image_bf16 (40.9 GB)
- Qwen-Image_fp8 (20.4 GB)
- 蒸馏版本 (非官方,仅需 15 步)


所有模型均可在 [Huggingface](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/tree/main) 或者 [魔搭](https://modelscope.cn/models/Comfy-Org/Qwen-Image_ComfyUI/files) 找到

**Diffusion Model**
**Diffusion model**

[qwen_image_fp8_e4m3fn.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/diffusion_models/qwen_image_fp8_e4m3fn.safetensors)
- [qwen_image_fp8_e4m3fn.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/diffusion_models/qwen_image_fp8_e4m3fn.safetensors)

下面的模型为非官方仅需 15 步的蒸馏版本
[蒸馏版本](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/tree/main/non_official/diffusion_models)
- [qwen_image_distill_full_bf16.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/non_official/diffusion_models/qwen_image_distill_full_bf16.safetensors) 40.9 GB
- [qwen_image_distill_full_fp8.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/non_official/diffusion_models/qwen_image_distill_full_fp8_e4m3fn.safetensors) 20.4 GB
Qwen_image_distill

- [qwen_image_distill_full_fp8_e4m3fn.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/non_official/diffusion_models/qwen_image_distill_full_fp8_e4m3fn.safetensors)
- [qwen_image_distill_full_bf16.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/non_official/diffusion_models/qwen_image_distill_full_bf16.safetensors)

<Note>
- 蒸馏版本原始作者建议在 15 步 cfg 1.0
- 经测试该蒸馏版本在 10 步 cfg 1.0 下表现良好,根据你想要的图像类型选择 euler 或 res_multistep
</Note>

**Text Encoder**
**LoRA**

- [Qwen-Image-Lightning-8steps-V1.0.safetensors](https://huggingface.co/lightx2v/Qwen-Image-Lightning/resolve/main/Qwen-Image-Lightning-8steps-V1.0.safetensors)

[qwen_2.5_vl_7b_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors)
**Text encoder**

- [qwen_2.5_vl_7b_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors)

**VAE**

[qwen_image_vae.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/vae/qwen_image_vae.safetensors)
- [qwen_image_vae.safetensors](https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/vae/qwen_image_vae.safetensors)

模型保存位置

```
📂 ComfyUI/
├── 📂 models/
│ ├── 📂 diffusion_models/
│ │ └── qwen_image_fp8_e4m3fn.safetensors
│ │ ├── qwen_image_fp8_e4m3fn.safetensors
│ │ └── qwen_image_distill_full_fp8_e4m3fn.safetensors ## 蒸馏版
│ ├── 📂 loras/
│ │ └── Qwen-Image-Lightning-8steps-V1.0.safetensors ## 8步加速 LoRA 模型
│ ├── 📂 vae/
│ │ └── qwen_image_vae.safetensors
│ └── 📂 text_encoders/
│ └── qwen_2.5_vl_7b_fp8_scaled.safetensors
```

### 3. 按步骤完成工作流

![步骤图](/images/tutorial/image/qwen/image_qwen_image-guide.jpg)
@@ -98,4 +111,10 @@ import UpdateReminder from '/snippets/zh/tutorials/update-reminder.mdx'
3. 确保 `Load VAE`节点中加载了`qwen_image_vae.safetensors`
4. 确保 `EmptySD3LatentImage`节点中设置好了图片的尺寸
5. 在`CLIP Text Encoder`节点中设置好提示词,经过测试,目前至少支持:英语、中文、韩语、日语、意大利语等
6. 点击 `Queue` 按钮,或者使用快捷键 `Ctrl(cmd) + Enter(回车)` 来运行工作流
6. 如果需要启用 lightx2v 的 8 步加速 LoRA,请选中该节点后用 `Ctrl + B` 启用,并按序号 `8` 处的说明修改 KSampler 的参数设置
7. 点击 `Queue` 按钮,或者使用快捷键 `Ctrl(cmd) + Enter(回车)` 来运行工作流
8. 对于不同版本的模型和工作流,请对应调整 KSampler 的参数设置

<Note>
蒸馏版模型和 lightx2v 的 8 步加速 LoRA 似乎不能同时使用,你可以测试具体的组合参数来验证组合使用的方式是否可行
</Note>