diff --git a/changelog/index.mdx b/changelog/index.mdx index c257f703f..f2b612fde 100644 --- a/changelog/index.mdx +++ b/changelog/index.mdx @@ -10,28 +10,28 @@ icon: "clock-rotate-left" This release brings significant user experience improvements and cutting-edge model support that enhance workflow creation and performance across diverse AI applications: -## User Interface Enhancements +### User Interface Enhancements - **Recently Used Items API**: New API for tracking recently used items in the interface, streamlining workflow creation by providing quick access to frequently used nodes and components - **Improved Workflow Navigation**: Enhanced user experience with better organization of commonly accessed elements, reducing time spent searching for nodes -## Advanced Model Integration +### Advanced Model Integration - **Qwen Vision Model Support**: Initial support for Qwen image models with comprehensive configuration options including default shift settings and flexible latent size handling - **Optimized Image Processing**: Enhanced Qwen model integration allows for more versatile image analysis and generation workflows, expanding AI capabilities for vision tasks -## Revolutionary Video Generation +### Revolutionary Video Generation - **Veo3 Video Generation**: Added powerful Veo3 video generation node with integrated audio support, enabling creators to produce high-quality video content with synchronized audio directly in ComfyUI workflows - **Audio-Visual Synthesis**: Breakthrough capability combining video and audio generation in a single node, perfect for content creators and multimedia professionals -## Performance & Stability Improvements +### Performance & Stability Improvements - **Enhanced Memory Management**: Optimized conditional (cond) VRAM usage through improved casting and device transfer operations, reducing memory overhead during complex generation tasks - **Device Consistency**: Comprehensive fixes ensuring all conditioning data and context remain on correct devices, preventing crashes and improving workflow reliability - **ControlNet Stability**: Resolved critical ControlNet compatibility issues, restoring full functionality for precise image control workflows -## Developer & System Enhancements +### Developer & System Enhancements - **Robust Error Handling**: Added intelligent warnings and crash prevention when conditioning devices don't match, improving workflow debugging and stability - **Template Updates**: Multiple template version updates (0.1.47, 0.1.48, 0.1.51) maintaining compatibility with latest development standards and ensuring smooth node integration -## Workflow Benefits +### Workflow Benefits - **Faster Iteration**: Recently used items API enables quicker workflow assembly and modification - **Enhanced Creativity**: Qwen vision models open new possibilities for image understanding and manipulation workflows - **Professional Video Production**: Veo3 integration transforms ComfyUI into a comprehensive multimedia creation platform @@ -48,23 +48,23 @@ This release significantly expands ComfyUI's capabilities, particularly for mult This release introduces significant backend improvements and performance optimizations that enhance workflow execution and node development capabilities: -## ComfyAPI Core Framework +### ComfyAPI Core Framework - **ComfyAPI Core v0.0.2**: Major update to the core API framework, providing improved stability and extensibility for custom node development and third-party integrations - **Partial Execution Support**: New backend 
support for partial workflow execution, enabling more efficient processing of complex multi-stage workflows by allowing selective node execution -## Video Processing Improvements +### Video Processing Improvements - **WAN Camera Memory Optimization**: Enhanced memory management for WAN-based camera workflows, reducing VRAM usage during video processing operations - **WanFirstLastFrameToVideo Fix**: Resolved critical issue preventing proper video generation when clip vision components are not available, improving workflow reliability -## Performance & Model Optimizations +### Performance & Model Optimizations - **VAE Nonlinearity Enhancement**: Replaced manual activation functions with optimized torch.silu in VAE operations, providing better performance and numerical stability for image encoding/decoding - **WAN VAE Optimizations**: Additional fine-tuning optimizations for WAN VAE operations, improving processing speed and memory efficiency in video generation workflows -## Node Schema Evolution +### Node Schema Evolution - **V3 Node Schema Definition**: Initial implementation of next-generation node schema system, laying the groundwork for enhanced node type definitions and improved workflow validation - **Template Updates**: Multiple template version updates (0.1.44, 0.1.45) ensuring compatibility with latest node development standards and best practices -## Workflow Development Benefits +### Workflow Development Benefits - **Enhanced Video Workflows**: Improved stability and performance for video generation pipelines, particularly those using WAN-based models - **Better Memory Management**: Optimized memory usage patterns enable more complex workflows on systems with limited VRAM - **Improved API Reliability**: Core API enhancements provide more stable foundation for custom node development and workflow automation @@ -80,15 +80,15 @@ These foundational improvements strengthen ComfyUI's core architecture while del This release focuses on critical memory optimizations for large model workflows, particularly improving performance with WAN 2.2 models and enhancing VRAM management for high-end systems: -## WAN 2.2 Model Optimizations +### WAN 2.2 Model Optimizations - **Reduced Memory Footprint**: Eliminated unnecessary memory clones in WAN 2.2 VAE operations, significantly reducing memory usage during image encoding/decoding workflows - **5B I2V Model Support**: Major memory optimization for WAN 2.2 5B image-to-video models, making these large-scale models more accessible for creators with limited VRAM -## Enhanced VRAM Management +### Enhanced VRAM Management - **Windows Large Card Support**: Added extra reserved VRAM allocation for high-end graphics cards on Windows, preventing system instability during intensive generation workflows - **Better Memory Allocation**: Improved memory management for users working with multiple large models simultaneously -## Workflow Performance Benefits +### Workflow Performance Benefits - **Faster VAE Processing**: WAN 2.2 VAE operations now run more efficiently with reduced memory overhead, enabling smoother image generation pipelines - **Stable Large Model Inference**: Enhanced stability when working with billion-parameter models, crucial for professional AI art creation and research workflows - **Improved Batch Processing**: Memory optimizations enable better handling of batch operations with large models @@ -103,35 +103,35 @@ These targeted optimizations make ComfyUI more reliable for professional workflo This release focuses on expanding hardware 
support and enhancing audio processing capabilities for workflow creators: -## Audio Processing Enhancements +### Audio Processing Enhancements - **PyAV Audio Backend**: Replaced torchaudio.load with PyAV for more reliable audio processing in video workflows, improving compatibility and performance - **Better Audio Integration**: Enhanced audio handling for multimedia generation workflows, particularly beneficial for video content creators -## Expanded Hardware Support +### Expanded Hardware Support - **Iluvatar CoreX Support**: Added native support for Iluvatar CoreX accelerators, expanding hardware options for AI inference - **Intel XPU Optimization**: Comprehensive XPU support improvements including async offload capabilities and device-specific optimizations - **AMD ROCm Enhancements**: Enabled PyTorch attention by default for gfx1201 on Torch 2.8, improving performance on AMD hardware - **CUDA Memory Management**: Fixed CUDA malloc to only activate on CUDA-enabled PyTorch installations, preventing conflicts on other platforms -## Sampling Algorithm Improvements +### Sampling Algorithm Improvements - **Euler CFG++ Enhancement**: Separated denoised and noise estimation processes in Euler CFG++ sampler for improved numerical precision and quality - **WAN Model Support**: Added comprehensive support for WAN (Wavelet-based Attention Network) models including ATI support and WAN 2.2 compatibility -## Advanced Training Features +### Advanced Training Features - **Enhanced Training Nodes**: Added algorithm support, gradient accumulation, and optional gradient checkpointing to training workflows - **Improved Training Flexibility**: Better memory management and performance optimization for custom model training -## Node & Workflow Enhancements +### Node & Workflow Enhancements - **Moonvalley V2V Node**: Added Moonvalley Marey V2V node with enhanced input validation for video-to-video workflows - **Negative Prompt Updates**: Improved negative prompt handling for Moonvalley nodes, providing better control over generation outputs - **History API Enhancement**: Added map_function parameter to get_history API for more flexible workflow history management -## API & System Improvements +### API & System Improvements - **Frontend Version Tracking**: Added required_frontend_version parameter in /system_stats API response for better version compatibility - **Device Information**: Enhanced XPU device name printing for better hardware identification and debugging - **Template Updates**: Multiple template updates (0.1.40, 0.1.41) ensuring compatibility with latest node development standards -## Developer Experience +### Developer Experience - **Documentation Updates**: Enhanced README with HiDream E1.1 examples and updated model integration guides - **Line Ending Fixes**: Improved cross-platform compatibility by standardizing line endings in workflows - **Code Cleanup**: Removed deprecated code and optimized various components for better maintainability @@ -145,36 +145,36 @@ These improvements make ComfyUI more accessible across different hardware platfo This release introduces significant enhancements to sampling algorithms, training capabilities, and node functionality for AI researchers and workflow creators: -## New Sampling & Generation Features +### New Sampling & Generation Features - **SA-Solver Sampler**: New reconstructed SA-Solver sampling algorithm providing enhanced numerical stability and quality for complex generation workflows - **Experimental CFGNorm Node**: Advanced classifier-free 
guidance normalization for improved control over generation quality and style consistency - **Nested Dual CFG Support**: Added nested style configuration to DualCFGGuider node, offering more sophisticated guidance control patterns - **SamplingPercentToSigma Node**: New utility node for precise sigma calculation from sampling percentages, improving workflow flexibility -## Enhanced Training Capabilities +### Enhanced Training Capabilities - **Multi Image-Caption Dataset Support**: LoRA training node now handles multiple image-caption datasets simultaneously, streamlining training workflows - **Better Training Loop Implementation**: Optimized training algorithms for improved convergence and stability during model fine-tuning - **Enhanced Error Detection**: Added model detection error hints for LoRA operations, providing clearer feedback when issues occur -## Platform & Performance Improvements +### Platform & Performance Improvements - **Async Node Support**: Full support for asynchronous node functions with earlier execution optimization, improving workflow performance for I/O intensive operations - **Chroma Flexibility**: Un-hardcoded patch_size parameter in Chroma, allowing better adaptation to different model configurations - **LTXV VAE Decoder**: Switched to improved default padding mode for better image quality with LTXV models - **Safetensors Memory Management**: Added workaround for mmap issues, improving reliability when loading large model files -## API & Integration Enhancements +### API & Integration Enhancements - **Custom Prompt IDs**: API now allows specifying prompt IDs for better workflow tracking and management - **Kling API Optimization**: Increased polling timeout to prevent user timeouts during video generation workflows - **History Token Cleanup**: Removed sensitive tokens from history items for improved security - **Python 3.9 Compatibility**: Fixed compatibility issues ensuring broader platform support -## Bug Fixes & Stability +### Bug Fixes & Stability - **MaskComposite Fixes**: Resolved errors when destination masks have 2 dimensions, improving mask workflow reliability - **Fresca Input/Output**: Corrected input and output handling for Fresca model workflows - **Reference Bug Fixes**: Resolved incorrect reference bugs in Gemini node implementations - **Line Ending Standardization**: Automated detection and removal of Windows line endings for cross-platform consistency -## Developer Experience +### Developer Experience - **Warning Systems**: Added torch import mistake warnings to catch common configuration issues - **Template Updates**: Multiple template version updates (0.1.36, 0.1.37, 0.1.39) for improved custom node development - **Documentation**: Enhanced fast_fp16_accumulation documentation in portable configurations @@ -188,37 +188,37 @@ These improvements make ComfyUI more robust for production workflows while intro This release delivers significant improvements to sampling algorithms and model control systems, particularly benefiting advanced AI researchers and workflow creators: -## New Sampling Capabilities +### New Sampling Capabilities - **TCFG Node**: Enhanced classifier-free guidance control for more nuanced generation control in your workflows - **ER-SDE Sampler**: Migrated from VE to VP algorithm with new sampler node, providing better numerical stability for complex generation tasks - **Skip Layer Guidance (SLG)**: Alternative implementation for precise layer-level control during inference, perfect for advanced model steering workflows -## Enhanced 
Development Tools +### Enhanced Development Tools - **Custom Node Management**: New `--whitelist-custom-nodes` argument pairs with `--disable-all-custom-nodes` for precise development control - **Performance Optimizations**: Dual CFG node now optimizes automatically when CFG is 1.0, reducing computational overhead - **GitHub Actions Integration**: Automated release webhook notifications keep developers informed of new updates -## Image Processing Improvements +### Image Processing Improvements - **New Transform Nodes**: Added ImageRotate and ImageFlip nodes for enhanced image manipulation workflows - **ImageColorToMask Fix**: Corrected mask value returns for more accurate color-based masking operations - **3D Model Support**: Upload 3D models to custom subfolders for better organization in complex projects -## Guidance & Conditioning Enhancements +### Guidance & Conditioning Enhancements - **PerpNeg Guider**: Updated with improved pre and post-CFG handling plus performance optimizations - **Latent Conditioning Fix**: Resolved issues with conditioning at index > 0 for multi-step workflows - **Denoising Steps**: Added denoising step support to several samplers for cleaner outputs -## Platform Stability +### Platform Stability - **PyTorch Compatibility**: Fixed contiguous memory issues with PyTorch nightly builds - **FP8 Fallback**: Automatic fallback to regular operations when FP8 operations encounter exceptions - **Audio Processing**: Removed deprecated torchaudio.save function dependencies with warning fixes -## Model Integration +### Model Integration - **Moonvalley Nodes**: Added native support for Moonvalley model workflows - **Scheduler Reordering**: Simple scheduler now defaults first for better user experience - **Template Updates**: Multiple template version updates (0.1.31-0.1.35) for improved custom node development -## Security & Safety +### Security & Safety - **Safe Loading**: Added warnings when loading files unsafely, with documentation noting that checkpoint files are loaded safely by default - **File Validation**: Enhanced checkpoint loading safety measures for secure workflow execution @@ -274,27 +274,27 @@ This release significantly expands ComfyUI's model ecosystem support while deliv This release brings powerful new workflow utilities and performance optimizations for ComfyUI creators: -## New Workflow Tools +### New Workflow Tools - **ImageStitch Node**: Concatenate multiple images seamlessly in your workflows - perfect for creating comparison grids or composite outputs - **GetImageSize Node**: Extract image dimensions with batch processing support, essential for dynamic sizing workflows - **Regex Replace Node**: Advanced text manipulation capabilities for prompt engineering and string processing workflows -## Enhanced Model Compatibility +### Enhanced Model Compatibility - **Improved Tensor Handling**: Streamlined list processing makes complex multi-model workflows more reliable - **BFL API Optimization**: Refined support for Kontext [pro] and [max] models with cleaner node interfaces - **Performance Boost**: Fused multiply-add operations in chroma processing for faster generation times -## Developer Experience Improvements +### Developer Experience Improvements - **Custom Node Support**: Added pyproject.toml support for better custom node dependency management - **Help Menu Integration**: New help system in the Node Library sidebar for faster node discovery - **API Documentation**: Enhanced API nodes documentation for workflow automation -## Frontend & UI 
Enhancements +### Frontend & UI Enhancements - **Frontend Updated to v1.21.7**: Multiple stability fixes and performance improvements - **Custom API Base Support**: Better subpath handling for custom deployment configurations - **Security Hardening**: XSS vulnerability fixes for safer workflow sharing -## Bug Fixes & Stability +### Bug Fixes & Stability - **Pillow Compatibility**: Updated deprecated API calls to maintain compatibility with latest image processing libraries - **ROCm Support**: Improved version detection for AMD GPU users - **Template Updates**: Enhanced project templates for custom node development diff --git a/docs.json b/docs.json index 5722cb48a..26ee6e99d 100644 --- a/docs.json +++ b/docs.json @@ -152,6 +152,7 @@ "pages": [ "tutorials/video/wan/wan2_2", "tutorials/video/wan/wan2-2-fun-inp", + "tutorials/video/wan/wan2-2-fun-control", { "group": "Wan2.1", "pages": [ @@ -700,6 +701,7 @@ "pages": [ "zh-CN/tutorials/video/wan/wan2_2", "zh-CN/tutorials/video/wan/wan2-2-fun-inp", + "zh-CN/tutorials/video/wan/wan2-2-fun-control", { "group": "Wan2.1", "pages": [ diff --git a/images/tutorial/video/wan/wan2_2/wan_2.2_14b_fun_control.jpg b/images/tutorial/video/wan/wan2_2/wan_2.2_14b_fun_control.jpg new file mode 100644 index 000000000..b22d373bb Binary files /dev/null and b/images/tutorial/video/wan/wan2_2/wan_2.2_14b_fun_control.jpg differ diff --git a/tutorials/video/wan/wan2-2-fun-control.mdx b/tutorials/video/wan/wan2-2-fun-control.mdx new file mode 100644 index 000000000..cbe9ee7f5 --- /dev/null +++ b/tutorials/video/wan/wan2-2-fun-control.mdx @@ -0,0 +1,123 @@ +--- +title: "ComfyUI Wan2.2 Fun Control Video Generation Example" +description: "This article introduces how to use ComfyUI to complete the Wan2.2 Fun Control video generation using control videos" +sidebarTitle: "Wan2.2 Fun Control" +--- + +import UpdateReminder from '/snippets/tutorials/update-reminder.mdx' + +**Wan2.2-Fun-Control** is a next-generation video generation and control model launched by Alibaba PAI team. Through innovative Control Codes mechanism combined with deep learning and multi-modal conditional inputs, it can generate high-quality videos that comply with preset control conditions. The model is released under the **Apache 2.0 license** and supports commercial use. + +**Key Features**: +- **Multi-modal Control**: Supports multiple control conditions including **Canny (line art)**, **Depth**, **OpenPose (human pose)**, **MLSD (geometric edges)**, and **trajectory control** +- **High-Quality Video Generation**: Based on the Wan2.2 architecture, outputs film-level quality videos +- **Multi-language Support**: Supports multi-language prompts including Chinese and English + +Below are the relevant model weights and code repositories: + +- [🤗Wan2.2-Fun-A14B-Control](https://huggingface.co/alibaba-pai/Wan2.2-Fun-A14B-Control) +- Code repository: [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun) + + + +## Wan2.2 Fun Control Video Generation Workflow Example + +This workflow provides two versions: +1. A version using [Wan2.2-Lightning](https://huggingface.co/lightx2v/Wan2.2-Lightning) 4-step LoRA from lightx2v: may cause some loss in video dynamics but offers faster speed +2. 
A fp8_scaled version without the acceleration LoRA
+
+Below are the test results using an RTX4090D 24GB VRAM GPU at 640×640 resolution with 81 frames
+
+| Model Type | VRAM Usage | First Generation Time | Second Generation Time |
+| ------------------------ | ---------- | -------------------- | --------------------- |
+| fp8_scaled | 83% | ≈ 524s | ≈ 520s |
+| fp8_scaled + 4-step LoRA | 89% | ≈ 138s | ≈ 79s |
+
+The 4-step LoRA gives first-time users a much better experience, at the cost of some loss in video dynamics, so we have enabled the accelerated LoRA version by default. If you want to enable the other workflow, select it and use **Ctrl+B** to activate it.
+
+### 1. Download Workflow and Materials
+
+Download the video below or the JSON file and drag it into ComfyUI to load the workflow.
+
+Download JSON Workflow
+
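+If you prefer to fetch the model files from a terminal, the sketch below mirrors the download list in section 2. It is an optional convenience rather than part of the original workflow, and assumes the `huggingface_hub` Python package is installed and that you run it from the directory containing `ComfyUI/`:
+
+```python
+import shutil
+from pathlib import Path
+
+from huggingface_hub import hf_hub_download
+
+# (repo id, file path inside the repo, target ComfyUI folder), copied from section 2 below
+FILES = [
+    ("Comfy-Org/Wan_2.2_ComfyUI_Repackaged",
+     "split_files/diffusion_models/wan2.2_fun_control_high_noise_14B_fp8_scaled.safetensors",
+     "ComfyUI/models/diffusion_models"),
+    ("Comfy-Org/Wan_2.2_ComfyUI_Repackaged",
+     "split_files/diffusion_models/wan2.2_fun_control_low_noise_14B_fp8_scaled.safetensors",
+     "ComfyUI/models/diffusion_models"),
+    ("Comfy-Org/Wan_2.2_ComfyUI_Repackaged",
+     "split_files/loras/wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors",
+     "ComfyUI/models/loras"),
+    ("Comfy-Org/Wan_2.2_ComfyUI_Repackaged",
+     "split_files/loras/wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors",
+     "ComfyUI/models/loras"),
+    ("Comfy-Org/Wan_2.2_ComfyUI_Repackaged",
+     "split_files/vae/wan_2.1_vae.safetensors",
+     "ComfyUI/models/vae"),
+    ("Comfy-Org/Wan_2.1_ComfyUI_repackaged",
+     "split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
+     "ComfyUI/models/text_encoders"),
+]
+
+for repo_id, filename, target in FILES:
+    cached = hf_hub_download(repo_id=repo_id, filename=filename)  # downloads into the local HF cache
+    Path(target).mkdir(parents=True, exist_ok=True)
+    shutil.copy(cached, Path(target) / Path(filename).name)  # place the file where ComfyUI looks for it
+```
+
+The two LoRA entries can be dropped if you only plan to use the non-accelerated workflow.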
+ +Please download the following images and videos as input materials. + +![Input start image](https://raw.githubusercontent.com/Comfy-Org/example_workflows/refs/heads/main/video/wan/wan2.2_fun_control/input.jpg) + + + +> We use a preprocessed video here. + +### 2. Models + +You can find the models below at [Wan_2.2_ComfyUI_Repackaged](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged) + +**Diffusion Model** +- [wan2.2_fun_control_high_noise_14B_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_fun_control_high_noise_14B_fp8_scaled.safetensors) +- [wan2.2_fun_control_low_noise_14B_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_fun_control_low_noise_14B_fp8_scaled.safetensors) + +**Wan2.2-Lightning LoRA (Optional, for acceleration)** +- [wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/loras/wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors) +- [wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/loras/wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors) + +**VAE** +- [wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors) + +**Text Encoder** +- [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors) + +``` +ComfyUI/ +├───📂 models/ +│ ├───📂 diffusion_models/ +│ │ ├─── wan2.2_fun_control_low_noise_14B_fp8_scaled.safetensors +│ │ └─── wan2.2_fun_control_high_noise_14B_fp8_scaled.safetensors +│ ├───📂 loras/ +│ │ ├─── wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors +│ │ └─── wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors +│ ├───📂 text_encoders/ +│ │ └─── umt5_xxl_fp8_e4m3fn_scaled.safetensors +│ └───📂 vae/ +│ └── wan_2.1_vae.safetensors +``` + + +### 3. Workflow Guide + +![Wan2.2 Fun Control Workflow Steps](/images/tutorial/video/wan/wan2_2/wan_2.2_14b_fun_control.jpg) + + + This workflow uses LoRA. Please ensure the corresponding Diffusion model and LoRA are matched - high noise and low noise models and LoRAs need to be used correspondingly. + + +1. **High noise** model and **LoRA** loading + - Ensure the `Load Diffusion Model` node loads the `wan2.2_fun_control_high_noise_14B_fp8_scaled.safetensors` model + - Ensure the `LoraLoaderModelOnly` node loads the `wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors` +2. **Low noise** model and **LoRA** loading + - Ensure the `Load Diffusion Model` node loads the `wan2.2_fun_control_low_noise_14B_fp8_scaled.safetensors` model + - Ensure the `LoraLoaderModelOnly` node loads the `wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors` +3. Ensure the `Load CLIP` node loads the `umt5_xxl_fp8_e4m3fn_scaled.safetensors` model +4. Ensure the `Load VAE` node loads the `wan_2.1_vae.safetensors` model +5. Upload the start frame in the `Load Image` node +6. In the second `Load video` node, load the pose control video. The provided video has been preprocessed and can be used directly +7. Since we provide a preprocessed pose video, the corresponding video image preprocessing node needs to be disabled. You can select it and use `Ctrl + B` to disable it +8. 
Modify the prompt; both Chinese and English are supported
+9. In `Wan22FunControlToVideo`, modify the video dimensions. The default is set to 640×640 resolution to avoid excessive processing time for users with low VRAM
+10. Click the `Run` button, or use the shortcut `Ctrl(cmd) + Enter` to execute video generation
+
+### Additional Notes
+
+ComfyUI's built-in nodes include only a Canny preprocessor, so you can use tools such as [ComfyUI-comfyui_controlnet_aux](https://github.com/Fannovel16/comfyui_controlnet_aux) for other types of image preprocessing
\ No newline at end of file
diff --git a/tutorials/video/wan/wan2-2-fun-inp.mdx b/tutorials/video/wan/wan2-2-fun-inp.mdx
index caadcf1ab..75b602ba7 100644
--- a/tutorials/video/wan/wan2-2-fun-inp.mdx
+++ b/tutorials/video/wan/wan2-2-fun-inp.mdx
@@ -29,14 +29,14 @@ This workflow provides two versions:
 1. A version using [Wan2.2-Lightning](https://huggingface.co/lightx2v/Wan2.2-Lightning) 4-step LoRA from lightx2v for accelerated video generation
 2. A fp8_scaled version without acceleration LoRA
 
-Below are the test results using an RTX4090D 24GB VRAM GPU
+Below are the test results using an RTX4090D 24GB VRAM GPU at 640×640 resolution with 81 frames
 
-| Model Type | Resolution | VRAM Usage | First Generation Time | Second Generation Time |
-| ------------------------ | ---------- | ---------- | -------------------- | --------------------- |
-| fp8_scaled | 640×640 | 83% | ≈ 524s | ≈ 520s |
-| fp8_scaled + 4-step LoRA | 640×640 | 89% | ≈ 138s | ≈ 79s |
+| Model Type | VRAM Usage | First Generation Time | Second Generation Time |
+| ------------------------ | ---------- | -------------------- | --------------------- |
+| fp8_scaled | 83% | ≈ 524s | ≈ 520s |
+| fp8_scaled + 4-step LoRA | 89% | ≈ 138s | ≈ 79s |
 
-Since the acceleration with LoRA is significant, the provided workflows enable the accelerated LoRA version by default. If you want to enable the other workflow, select it and use **Ctrl+B** to activate.
+Since the acceleration with LoRA is significant, even though some video dynamics are lost, the provided workflows enable the accelerated LoRA version by default. If you want to enable the other workflow, select it and use **Ctrl+B** to activate.
 
 ### 1. Download Workflow File
 
@@ -59,7 +59,7 @@ Use the following materials as the start and end frames
 ![Wan2.2 Fun Control ComfyUI Workflow Start Frame Material](https://raw.githubusercontent.com/Comfy-Org/example_workflows/refs/heads/main/video/wan/wan2.2_fun_inp/start_image.png)
 ![Wan2.2 Fun Control ComfyUI Workflow End Frame Material](https://raw.githubusercontent.com/Comfy-Org/example_workflows/refs/heads/main/video/wan/wan2.2_fun_inp/end_image.png)
 
-### 2. Manually Download Models
+### 2. Models
 
 **Diffusion Model**
 - [wan2.2_fun_inpaint_high_noise_14B_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_fun_inpaint_high_noise_14B_fp8_scaled.safetensors)
@@ -90,7 +90,7 @@ ComfyUI/
 │ └── wan_2.1_vae.safetensors
 ```
 
-### 3. Step-by-Step Workflow Guide
+### 3. 
Workflow Guide ![Workflow Step Image](/images/tutorial/video/wan/wan2_2/wan_2.2_14b_fun_inp.jpg) diff --git a/zh-CN/changelog/index.mdx b/zh-CN/changelog/index.mdx index cd9e80be7..0ce729475 100644 --- a/zh-CN/changelog/index.mdx +++ b/zh-CN/changelog/index.mdx @@ -10,28 +10,28 @@ icon: "clock-rotate-left" 此版本带来了重大的用户体验改进和前沿的模型支持,提升了工作流创建和在各种AI应用中的性能表现: -## 用户界面增强 +### 用户界面增强 - **最近使用项目 API**:新增用于跟踪界面中最近使用项目的 API,通过提供对常用节点和组件的快速访问来简化工作流创建 - **改进的工作流导航**:通过更好地组织常用元素来增强用户体验,减少搜索节点所花费的时间 -## 先进模型集成 +### 先进模型集成 - **Qwen 视觉模型支持**:初步支持 Qwen 图像模型,包含全面的配置选项,包括默认偏移设置和灵活的潜在空间尺寸处理 - **优化的图像处理**:增强的 Qwen 模型集成允许更多样化的图像分析和生成工作流,扩展了视觉任务的AI能力 -## 革命性的视频生成 +### 革命性的视频生成 - **Veo3 视频生成**:添加了强大的 Veo3 视频生成节点,集成了音频支持,使创作者能够直接在 ComfyUI 工作流中制作具有同步音频的高质量视频内容 - **音视频合成**:突破性功能,在单个节点中结合视频和音频生成,非常适合内容创作者和多媒体专业人士 -## 性能与稳定性改进 +### 性能与稳定性改进 - **增强的内存管理**:通过改进的类型转换和设备传输操作优化条件 (cond) VRAM 使用,减少复杂生成任务期间的内存开销 - **设备一致性**:全面修复确保所有条件数据和上下文保持在正确的设备上,防止崩溃并提高工作流可靠性 - **ControlNet 稳定性**:解决了关键的 ControlNet 兼容性问题,恢复了精确图像控制工作流的完整功能 -## 开发者与系统增强 +### 开发者与系统增强 - **强大的错误处理**:当条件设备不匹配时添加智能警告和崩溃预防,改进工作流调试和稳定性 - **模板更新**:多个模板版本更新 (0.1.47, 0.1.48, 0.1.51),保持与最新开发标准的兼容性,确保平滑的节点集成 -## 工作流优势 +### 工作流优势 - **更快的迭代**:最近使用项目 API 实现更快的工作流组装和修改 - **增强的创造力**:Qwen 视觉模型为图像理解和操作工作流开辟了新的可能性 - **专业视频制作**:Veo3 集成将 ComfyUI 转变为综合的多媒体创作平台 @@ -47,23 +47,23 @@ icon: "clock-rotate-left" 此版本引入了重要的后端改进和性能优化,增强了工作流执行和节点开发能力: -## ComfyAPI 核心框架 +### ComfyAPI 核心框架 - **ComfyAPI Core v0.0.2**:核心 API 框架的重大更新,为自定义节点开发和第三方集成提供了更好的稳定性和可扩展性 - **部分执行支持**:新的后端支持部分工作流执行,通过允许选择性节点执行,实现复杂多阶段工作流的更高效处理 -## 视频处理改进 +### 视频处理改进 - **WAN Camera 内存优化**:增强了基于 WAN 的相机工作流的内存管理,减少视频处理操作期间的 VRAM 使用 - **WanFirstLastFrameToVideo 修复**:解决了当 clip vision 组件不可用时阻止正确生成视频的关键问题,提高了工作流可靠性 -## 性能与模型优化 +### 性能与模型优化 - **VAE 非线性增强**:在 VAE 操作中用优化的 torch.silu 替换手动激活函数,为图像编码/解码提供更好的性能和数值稳定性 - **WAN VAE 优化**:对 WAN VAE 操作进行额外的微调优化,提高视频生成工作流中的处理速度和内存效率 -## 节点架构演进 +### 节点架构演进 - **V3 节点架构定义**:下一代节点架构系统的初始实现,为增强节点类型定义和改进工作流验证奠定基础 - **模板更新**:多个模板版本更新(0.1.44、0.1.45),确保与最新节点开发标准和最佳实践的兼容性 -## 工作流开发优势 +### 工作流开发优势 - **增强的视频工作流**:改善了视频生成管道的稳定性和性能,特别是那些使用基于 WAN 模型的工作流 - **更好的内存管理**:优化的内存使用模式使有限 VRAM 系统能够运行更复杂的工作流 - **改进的 API 可靠性**:核心 API 增强为自定义节点开发和工作流自动化提供了更稳定的基础 @@ -78,15 +78,15 @@ icon: "clock-rotate-left" 本次发布专注于大模型工作流的关键内存优化,特别是改进了 WAN 2.2 模型的性能,并增强了高端显卡的 VRAM 管理: -## WAN 2.2 模型优化 +### WAN 2.2 模型优化 - **减少内存占用**:去除了 WAN 2.2 VAE 操作中不必要的内存克隆,显著减少了图像编码/解码工作流中的内存使用 - **5B I2V 模型支持**:对 WAN 2.2 5B 图像到视频模型进行内存优化,使这些模型对 VRAM 有限的创作者更加易用 -## 增强的 VRAM 管理 +### 增强的 VRAM 管理 - **Windows 大显卡支持**:为 Windows 上的高端显卡增加了额外的保留 VRAM 分配,防止在密集生成工作流中出现系统不稳定 - **更好的内存分配**:改进了同时使用多个大模型的用户的内存管理 -## 工作流性能优势 +### 工作流性能优势 - **更快的 VAE 处理**:WAN 2.2 VAE 操作现在运行更高效,内存开销更少,实现更流畅的图像生成管道 - **稳定的大模型推理**:增强了处理数十亿参数模型时的稳定性 - **改进的批处理**:内存优化使大模型的批操作处理能力更强 @@ -100,35 +100,35 @@ icon: "clock-rotate-left" 此版本专注于扩展硬件支持并增强工作流创建者的音频处理能力: -## 音频处理增强 +### 音频处理增强 - **PyAV 音频后端**:用 PyAV 替换 torchaudio.load,在视频工作流中提供更可靠的音频处理,改善兼容性和性能 - **更好的音频集成**:增强多媒体生成工作流的音频处理,特别有利于视频内容创作者 -## 扩展的硬件支持 +### 扩展的硬件支持 - **Iluvatar CoreX 支持**:添加对 Iluvatar CoreX 加速器的原生支持,为 AI 推理扩展硬件选项 - **Intel XPU 优化**:全面的 XPU 支持改进,包括异步卸载功能和设备特定优化 - **AMD ROCm 增强**:在 Torch 2.8 上为 gfx1201 默认启用 PyTorch attention,提升 AMD 硬件性能 - **CUDA 内存管理**:修复 CUDA malloc 仅在启用 CUDA 的 PyTorch 安装上激活,防止在其他平台上发生冲突 -## 采样算法改进 +### 采样算法改进 - **Euler CFG++ 增强**:在 Euler CFG++ 采样器中分离去噪和噪声估计过程,改善数值精度和质量 - **WAN 模型支持**:添加对 WAN(基于小波的注意力网络)模型的全面支持,包括 ATI 支持和 WAN 2.2 兼容性 -## 高级训练功能 +### 高级训练功能 - **增强的训练节点**:向训练工作流添加算法支持、梯度累积和可选梯度检查点 - **改进的训练灵活性**:为自定义模型训练提供更好的内存管理和性能优化 -## 节点和工作流增强 +### 节点和工作流增强 - **Moonvalley V2V 节点**:添加 Moonvalley Marey V2V 
节点,为视频到视频工作流提供增强的输入验证 - **负面提示词更新**:改进 Moonvalley 节点的负面提示词处理,提供对生成输出的更好控制 - **历史 API 增强**:向 get_history API 添加 map_function 参数,实现更灵活的工作流历史管理 -## API 和系统改进 +### API 和系统改进 - **前端版本跟踪**:在 /system_stats API 响应中添加 required_frontend_version 参数,改善版本兼容性 - **设备信息**:增强 XPU 设备名称打印,改善硬件识别和调试 - **模板更新**:多个模板更新(0.1.40、0.1.41),确保与最新节点开发标准的兼容性 -## 开发者体验 +### 开发者体验 - **文档更新**:使用 HiDream E1.1 示例增强 README,并更新模型集成指南 - **行结束符修复**:通过标准化工作流中的行结束符改善跨平台兼容性 - **代码清理**:移除已弃用的代码并优化各种组件以提高可维护性 @@ -142,36 +142,36 @@ icon: "clock-rotate-left" 本版本为AI研究人员和工作流程创建者引入了采样算法、训练功能和节点功能的重大增强: -## 新的采样和生成功能 +### 新的采样和生成功能 - **SA-Solver采样器**:新的重构SA-Solver采样算法,为复杂生成工作流提供增强的数值稳定性和质量 - **实验性CFGNorm节点**:高级无分类器引导标准化,用于改进生成质量和风格一致性的控制 - **嵌套双CFG支持**:为DualCFGGuider节点添加嵌套风格配置,提供更复杂的引导控制模式 - **SamplingPercentToSigma节点**:用于从采样百分比精确计算sigma的新实用节点,提高工作流程灵活性 -## 增强的训练功能 +### 增强的训练功能 - **多图像-描述数据集支持**:LoRA训练节点现在可以同时处理多个图像-描述数据集,简化训练工作流程 - **更好的训练循环实现**:优化的训练算法,在模型微调过程中改善收敛性和稳定性 - **增强的错误检测**:为LoRA操作添加模型检测错误提示,在出现问题时提供更清晰的反馈 -## 平台和性能改进 +### 平台和性能改进 - **异步节点支持**:完全支持异步节点函数,优化早期执行,改善I/O密集型操作的工作流程性能 - **Chroma灵活性**:在Chroma中取消硬编码的patch_size参数,允许更好地适应不同的模型配置 - **LTXV VAE解码器**:切换到改进的默认填充模式,提高LTXV模型的图像质量 - **Safetensors内存管理**:为mmap问题添加解决方案,提高加载大型模型文件时的可靠性 -## API和集成增强 +### API和集成增强 - **自定义提示ID**:API现在允许指定提示ID,以便更好地跟踪和管理工作流程 - **Kling API优化**:增加轮询超时时间,防止视频生成工作流程中的用户超时 - **历史令牌清理**:从历史项目中删除敏感令牌以提高安全性 - **Python 3.9兼容性**:修复兼容性问题,确保更广泛的平台支持 -## 错误修复和稳定性 +### 错误修复和稳定性 - **MaskComposite修复**:解决目标蒙版具有2个维度时的错误,提高蒙版工作流程可靠性 - **Fresca输入/输出**:修正Fresca模型工作流程的输入和输出处理 - **引用错误修复**:解决Gemini节点实现中的错误引用问题 - **行结束标准化**:自动检测和删除Windows行结束符,确保跨平台一致性 -## 开发者体验 +### 开发者体验 - **警告系统**:添加torch导入错误警告,以捕获常见配置问题 - **模板更新**:多个模板版本更新(0.1.36、0.1.37、0.1.39),改进自定义节点开发 - **文档**:增强便携式配置中fast_fp16_accumulation的文档 @@ -185,37 +185,37 @@ icon: "clock-rotate-left" 此版本在采样算法和模型控制系统方面提供了重大改进,特别有利于高级AI研究人员和工作流创建者: -## 新采样功能 +### 新采样功能 - **TCFG节点**:增强的分类器无关引导控制,为您的工作流提供更细致的生成控制 - **ER-SDE采样器**:从VE迁移到VP算法,配备新的采样器节点,为复杂生成任务提供更好的数值稳定性 - **跳层引导(SLG)**:用于推理期间精确层级控制的替代实现,完美适用于高级模型导向工作流 -## 增强的开发工具 +### 增强的开发工具 - **自定义节点管理**:新的`--whitelist-custom-nodes`参数与`--disable-all-custom-nodes`配对,提供精确的开发控制 - **性能优化**:双CFG节点现在在CFG为1.0时自动优化,减少计算开销 - **GitHub Actions集成**:自动化发布webhook通知让开发者及时了解新更新 -## 图像处理改进 +### 图像处理改进 - **新变换节点**:添加了ImageRotate和ImageFlip节点,增强图像操作工作流 - **ImageColorToMask修复**:修正了掩码值返回,提供更准确的基于颜色的掩码操作 - **3D模型支持**:上传3D模型到自定义子文件夹,为复杂项目提供更好的组织 -## 引导和条件增强 +### 引导和条件增强 - **PerpNeg引导器**:更新了改进的前后CFG处理以及性能优化 - **潜在条件修复**:解决了多步骤工作流中索引 > 0 的条件问题 - **去噪步骤**:为多个采样器添加去噪步骤支持,获得更清洁的输出 -## 平台稳定性 +### 平台稳定性 - **PyTorch兼容性**:修复了PyTorch nightly构建的连续内存问题 - **FP8回退**:当FP8操作遇到异常时自动回退到常规操作 - **音频处理**:移除了已弃用的torchaudio.save函数依赖并修复警告 -## 模型集成 +### 模型集成 - **Moonvalley节点**:为Moonvalley模型工作流添加原生支持 - **调度器重新排序**:简单调度器现在默认优先,提供更好的用户体验 - **模板更新**:多个模板版本更新(0.1.31-0.1.35),改进自定义节点开发 -## 安全性和安全保护 +### 安全性和安全保护 - **安全加载**:在不安全加载文件时添加警告,文档说明检查点文件默认安全加载 - **文件验证**:增强检查点加载安全措施,确保工作流安全执行 @@ -274,27 +274,27 @@ icon: "clock-rotate-left" 本次发布为 ComfyUI 创作者带来了强大的新工作流实用工具和性能优化: -## 新的工作流工具 +### 新的工作流工具 - **ImageStitch 节点**:在工作流中无缝拼接多个图像 - 非常适合创建对比网格或复合输出 - **GetImageSize 节点**:提取图像尺寸并支持批处理,对于动态调整大小的工作流至关重要 - **Regex Replace 节点**:高级文本处理功能,适用于提示词工程和字符串处理工作流 -## 增强的模型兼容性 +### 增强的模型兼容性 - **改进的张量处理**:简化的列表处理使复杂的多模型工作流更加可靠 - **BFL API 优化**:完善了对 Kontext [pro] 和 [max] 模型的支持,提供更清晰的节点界面 - **性能提升**:在色度处理中使用融合乘加运算,加快生成速度 -## 开发者体验改进 +### 开发者体验改进 - **自定义节点支持**:添加 pyproject.toml 支持,改善自定义节点依赖管理 - **帮助菜单集成**:在节点库侧边栏中新增帮助系统,加快节点发现速度 - **API 文档**:增强 API 节点文档,支持工作流自动化 -## 前端和 UI 增强 +### 前端和 UI 增强 - **前端更新至 v1.21.7**:多项稳定性修复和性能改进 - **自定义 API 基础支持**:改进了自定义部署配置的子路径处理 - 
**安全加固**:修复 XSS 漏洞,确保工作流分享更安全
 
-## 错误修复和稳定性
+### 错误修复和稳定性
 - **Pillow 兼容性**:更新了已弃用的 API 调用,保持与最新图像处理库的兼容性
 - **ROCm 支持**:改进了 AMD GPU 用户的版本检测
 - **模板更新**:增强了自定义节点开发的项目模板
diff --git a/zh-CN/tutorials/video/wan/wan2-2-fun-control.mdx b/zh-CN/tutorials/video/wan/wan2-2-fun-control.mdx
new file mode 100644
index 000000000..bd6a00531
--- /dev/null
+++ b/zh-CN/tutorials/video/wan/wan2-2-fun-control.mdx
@@ -0,0 +1,126 @@
+---
+title: "ComfyUI Wan2.2 Fun Control 视频控制生成示例"
+description: "本文介绍如何在 ComfyUI 中使用 Wan2.2 Fun Control,通过控制视频完成视频生成"
+sidebarTitle: "Wan2.2 Fun Control"
+---
+
+import UpdateReminder from '/snippets/zh/tutorials/update-reminder.mdx'
+
+**Wan2.2-Fun-Control** 是 Alibaba PAI 团队推出的新一代视频生成与控制模型,通过引入创新性的控制代码(Control Codes)机制,结合深度学习和多模态条件输入,能够生成高质量且符合预设控制条件的视频。该模型采用 **Apache 2.0 许可协议**发布,支持商业使用。
+
+**核心功能**:
+- **多模态控制**:支持多种控制条件,包括 **Canny(线稿)**、**Depth(深度)**、**OpenPose(人体姿势)**、**MLSD(几何边缘)** 等,同时支持使用 **轨迹控制**
+- **高质量视频生成**:基于 Wan2.2 架构,输出影视级质量视频
+- **多语言支持**:支持中英文等多语言提示词输入
+
+下面是相关模型权重和代码仓库:
+
+- [🤗Wan2.2-Fun-A14B-Control](https://huggingface.co/alibaba-pai/Wan2.2-Fun-A14B-Control)
+- 代码仓库:[VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun)
+
+<UpdateReminder/>
+
+## Wan2.2 Fun Control 视频控制生成工作流示例
+
+这里提供的工作流包含了两个版本:
+1. 使用了 lightx2v 的 [Wan2.2-Lightning](https://huggingface.co/lightx2v/Wan2.2-Lightning) 4 步 LoRA 的版本:生成的视频动态可能有所损失,但速度更快
+2. 没有使用加速 LoRA 的 fp8_scaled 版本
+
+下面是使用 RTX4090D 24GB 显存 GPU,在 640×640 分辨率、81 帧长度下测试的用时对比
+
+| 模型类型 | 分辨率 | 显存占用 | 首次生成时长 | 第二次生成时长 |
+| ------------------------ | ------- | -------- | ------------ | -------------- |
+| fp8_scaled | 640×640 | 83% | ≈ 524秒 | ≈ 520秒 |
+| fp8_scaled + 4步LoRA加速 | 640×640 | 89% | ≈ 138秒 | ≈ 79秒 |
+
+由于 4 步 LoRA 版本对初次使用工作流的用户体验较好,尽管生成的视频动态可能有所损失,我们仍默认启用了加速 LoRA 版本。如果你需要启用另一组工作流,框选后使用 **Ctrl+B** 即可启用
+
+
+### 1. 工作流及素材下载
+
+下载下面的视频或者 JSON 文件并拖入 ComfyUI 中以加载对应的工作流
+
+下载 JSON 格式工作流
+
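+如果你更习惯在命令行中下载模型,可以参考下面的示意脚本,它与第 2 节列出的模型清单一一对应。该脚本并非教程原有内容,这里假设你已安装 `huggingface_hub`,并在包含 `ComfyUI/` 的目录下运行:
+
+```python
+import shutil
+from pathlib import Path
+
+from huggingface_hub import hf_hub_download
+
+# (仓库 ID, 仓库内文件路径, ComfyUI 目标文件夹),与第 2 节的模型清单一致
+FILES = [
+    ("Comfy-Org/Wan_2.2_ComfyUI_Repackaged",
+     "split_files/diffusion_models/wan2.2_fun_control_high_noise_14B_fp8_scaled.safetensors",
+     "ComfyUI/models/diffusion_models"),
+    ("Comfy-Org/Wan_2.2_ComfyUI_Repackaged",
+     "split_files/diffusion_models/wan2.2_fun_control_low_noise_14B_fp8_scaled.safetensors",
+     "ComfyUI/models/diffusion_models"),
+    ("Comfy-Org/Wan_2.2_ComfyUI_Repackaged",
+     "split_files/loras/wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors",
+     "ComfyUI/models/loras"),
+    ("Comfy-Org/Wan_2.2_ComfyUI_Repackaged",
+     "split_files/loras/wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors",
+     "ComfyUI/models/loras"),
+    ("Comfy-Org/Wan_2.2_ComfyUI_Repackaged",
+     "split_files/vae/wan_2.1_vae.safetensors",
+     "ComfyUI/models/vae"),
+    ("Comfy-Org/Wan_2.1_ComfyUI_repackaged",
+     "split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
+     "ComfyUI/models/text_encoders"),
+]
+
+for repo_id, filename, target in FILES:
+    cached = hf_hub_download(repo_id=repo_id, filename=filename)  # 先下载到 HF 本地缓存
+    Path(target).mkdir(parents=True, exist_ok=True)
+    shutil.copy(cached, Path(target) / Path(filename).name)  # 再复制到 ComfyUI 对应目录
+```
+
+如果只使用未加速的工作流,可以去掉其中两个 LoRA 条目。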
+
+请下载下面的图片及视频,我们将作为输入。
+
+![输入起始图片](https://raw.githubusercontent.com/Comfy-Org/example_workflows/refs/heads/main/video/wan/wan2.2_fun_control/input.jpg)
+
+> 这里我们使用了经过预处理的视频,可以直接用于控制视频生成
+
+### 2. 手动下载模型
+
+下面的模型你可以在 [Wan_2.2_ComfyUI_Repackaged](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged) 找到
+
+**Diffusion Model**
+- [wan2.2_fun_control_high_noise_14B_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_fun_control_high_noise_14B_fp8_scaled.safetensors)
+- [wan2.2_fun_control_low_noise_14B_fp8_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/diffusion_models/wan2.2_fun_control_low_noise_14B_fp8_scaled.safetensors)
+
+**Wan2.2-Lightning LoRA(可选,用于加速)**
+- [wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/loras/wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors)
+- [wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/loras/wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors)
+
+**VAE**
+- [wan_2.1_vae.safetensors](https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors)
+
+**Text Encoder**
+- [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)
+
+文件保存位置
+
+```
+ComfyUI/
+├───📂 models/
+│   ├───📂 diffusion_models/
+│   │   ├─── wan2.2_fun_control_low_noise_14B_fp8_scaled.safetensors
+│   │   └─── wan2.2_fun_control_high_noise_14B_fp8_scaled.safetensors
+│   ├───📂 loras/
+│   │   ├─── wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors
+│   │   └─── wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors
+│   ├───📂 text_encoders/
+│   │   └─── umt5_xxl_fp8_e4m3fn_scaled.safetensors
+│   └───📂 vae/
+│       └── wan_2.1_vae.safetensors
+```
+
+
+### 3. 按步骤完成工作流
+
+![Wan2.2 Fun Control 工作流步骤](/images/tutorial/video/wan/wan2_2/wan_2.2_14b_fun_control.jpg)
+
+  这个工作流使用了 LoRA,请确保对应的 Diffusion model 和 LoRA 相互匹配:high noise 和 low noise 的模型与 LoRA 需要对应使用
+
+1. **High noise** 模型及 **LoRA** 加载
+   - 确保 `Load Diffusion Model` 节点加载了 `wan2.2_fun_control_high_noise_14B_fp8_scaled.safetensors` 模型
+   - 确保 `LoraLoaderModelOnly` 节点加载了 `wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors`
+2. **Low noise** 模型及 **LoRA** 加载
+   - 确保 `Load Diffusion Model` 节点加载了 `wan2.2_fun_control_low_noise_14B_fp8_scaled.safetensors` 模型
+   - 确保 `LoraLoaderModelOnly` 节点加载了 `wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors`
+3. 确保 `Load CLIP` 节点加载了 `umt5_xxl_fp8_e4m3fn_scaled.safetensors` 模型
+4. 确保 `Load VAE` 节点加载了 `wan_2.1_vae.safetensors` 模型
+5. 在 `Load Image` 节点上传起始帧
+6. 在第二个 `Load video` 节点加载 pose 控制视频,提供的视频已经过预处理,可以直接使用
+7. 由于我们提供的视频是预处理过的 pose 视频,所以对应的视频图像预处理节点需要禁用,你可以选中后使用 `Ctrl + B` 来禁用
+8. 修改 Prompt,中英文均可
+9. 在 `Wan22FunControlToVideo` 节点中修改视频尺寸,默认设置为 640×640 分辨率,以避免低显存用户使用该工作流时耗时过长
+10. 
点击 `Run` 按钮,或者使用快捷键 `Ctrl(cmd) + Enter(回车)` 来执行视频生成
+
+### 补充说明
+
+由于 ComfyUI 自带的预处理节点只有 Canny 预处理器,你可以使用类似 [ComfyUI-comfyui_controlnet_aux](https://github.com/Fannovel16/comfyui_controlnet_aux) 的工具来实现其它类型的图像预处理
\ No newline at end of file
diff --git a/zh-CN/tutorials/video/wan/wan2-2-fun-inp.mdx b/zh-CN/tutorials/video/wan/wan2-2-fun-inp.mdx
index 28f6f1cb6..1742f4a94 100644
--- a/zh-CN/tutorials/video/wan/wan2-2-fun-inp.mdx
+++ b/zh-CN/tutorials/video/wan/wan2-2-fun-inp.mdx
@@ -25,25 +25,21 @@ import UpdateReminder from '/snippets/zh/tutorials/update-reminder.mdx'
 
 ## Wan2.2 Fun Inp 首尾帧视频生成工作流示例
 
-这里提供的工作流包含了两个版本的
-1. 使用了 lightx2v 的 [Wan2.2-Lightning](https://huggingface.co/lightx2v/Wan2.2-Lightning) 4 步 LoRA 来实现视频生成提速的版本
-2. 没有使用加速 LoRA 的 fp8_scaled 版本
+这里提供的工作流包含了两个版本:
+1. 使用了 lightx2v 的 [Wan2.2-Lightning](https://huggingface.co/lightx2v/Wan2.2-Lightning) 4 步 LoRA 的版本:生成的视频动态可能有所损失,但速度更快
+2. 没有使用加速 LoRA 的 fp8_scaled 版本
 
-下面是使用 RTX4090D 24GB 显存 GPU 测试的结果
+下面是使用 RTX4090D 24GB 显存 GPU,在 640×640 分辨率、81 帧长度下测试的用时对比
 
 | 模型类型 | 分辨率 | 显存占用 | 首次生成时长 | 第二次生成时长 |
 | ------------------------ | ------- | -------- | ------------ | -------------- |
 | fp8_scaled | 640×640 | 83% | ≈ 524秒 | ≈ 520秒 |
 | fp8_scaled + 4步LoRA加速 | 640×640 | 89% | ≈ 138秒 | ≈ 79秒 |
 
-由于使用了加速 LoRA 后提速较为明显,在提供的两组工作流中,我们默认启用了使用了加速 LoRA 版本,如果你需要启用另一组的工作流,框选后使用 **Ctrl+B** 即可启用
+由于使用加速 LoRA 后提速较为明显,虽然动态有所损失,但对低显存用户较为友好,所以在提供的两组工作流中,我们默认启用了加速 LoRA 版本,如果你需要启用另一组的工作流,框选后使用 **Ctrl+B** 即可启用
 
 ### 1. 工作流文件下载
 
-请更新你的 ComfyUI 到最新版本,并通过菜单 `工作流` -> `浏览模板` -> `视频` 找到 "**Wan2.2 Fun Inp**" 以加载工作流
-
-或者更新你的 ComfyUI 到最新版本后,下载下面的工作流并拖入 ComfyUI 以加载工作流
-