Official release of [SeedVR2](https://github.com/ByteDance-Seed/SeedVR) for ComfyUI that enables high-quality video and image upscaling.
Can run as **Multi-GPU standalone CLI** too, see [🖥️ Run as Standalone](#-run-as-standalone-cli) section.
[](https://youtu.be/MBtWYXq_r60)
## 📋 Quick Access
- [🆙 Future Work](#-future-work)
- [🚀 Release Notes](#-release-notes)
- [🎯 Features](#-features)
- [🔧 Requirements](#-requirements)
- [📦 Installation](#-installation)
- [🙏 Credits](#-credits)
- [📜 License](#-license)
## 🆙 Future Work
We're actively working on improvements and new features. To stay informed:
- **📌 Track Active Development**: Visit [Issues](https://github.com/numz/ComfyUI-SeedVR2_VideoUpscaler/issues) to see active development, report bugs, and request new features
- **💬 Join the Community**: Learn from others, share your workflows, and get help in the [Discussions](https://github.com/numz/ComfyUI-SeedVR2_VideoUpscaler/discussions)
- **🔮 Next Model Survey**: We're looking for community input on the next open-source super-powerful generic restoration model. Share your suggestions in [Issue #164](https://github.com/numz/ComfyUI-SeedVR2_VideoUpscaler/issues/164)
## 🚀 Release Notes
**2025.12.12 - Version 2.5.20**
- **⚡ Expanded attention backends** - Full support for Flash Attention 2 (Ampere+), Flash Attention 3 (Hopper+), SageAttention 2, and SageAttention 3 (Blackwell/RTX 50xx), with automatic fallback chains to PyTorch SDPA when unavailable *(based on PR by [@naxci1](https://github.com/naxci1) - thank you!)*
- **🍎 macOS/Apple Silicon compatibility** - Replaced MPS autocast with explicit dtype conversion throughout the VAE and DiT pipelines, resolving hangs and crashes on M-series Macs. BlockSwap now auto-disables with a warning (unified memory makes it meaningless)
- **🛡️ Flash Attention graceful fallback** - Added compatibility shims for corrupted or partially installed flash_attn/xformers DLLs, preventing startup crashes
- **🛡️ AMD ROCm: bitsandbytes conflict fix** - Prevent kernel registration errors when diffusers attempts to re-import broken bitsandbytes installations
- **📦 ComfyUI Manager: macOS classifier fix** - Removed the NVIDIA CUDA classifier that caused false "GPU not supported" warnings on macOS
- **📚 Documentation updates** - Updated the README with attention backend details, BlockSwap macOS notes, and clarified model caching descriptions
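The automatic fallback chain can be pictured as a simple preference list that is probed at startup (a minimal sketch only; `BACKEND_PREFERENCE` and `pick_attention_backend` are hypothetical names, and the node's real detection logic may differ):

```python
import importlib

# Preference order, fastest first; PyTorch SDPA is the guaranteed fallback.
# Names here are illustrative, not the node's actual identifiers.
BACKEND_PREFERENCE = ["flash_attn", "sageattention", "sdpa"]

def _importable(name):
    try:
        importlib.import_module(name)
        return True
    except Exception:  # ImportError, or a corrupted/partial DLL install
        return False

def pick_attention_backend(available=None):
    """Return the first backend in the chain that is usable.

    `available` can be injected for testing; by default real imports are probed.
    """
    for name in BACKEND_PREFERENCE[:-1]:
        if available is not None:
            if name in available:
                return name
        elif _importable(name):
            return name
    return "sdpa"  # always available with PyTorch 2.0+

print(pick_attention_backend(available={"sageattention"}))  # sageattention
print(pick_attention_backend(available=set()))              # sdpa
```

Catching a broad `Exception` rather than just `ImportError` mirrors the "graceful fallback" bullet above: a corrupted flash_attn/xformers DLL can raise more exotic errors at import time.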
**2025.12.10 - Version 2.5.19**
**2025.07.03**
- 🛠️ Can run in **standalone mode** with **Multi GPU**; see [🖥️ Run as Standalone](#run-as-standalone-cli)
**2025.06.30**
### Performance Features
- **torch.compile Integration**: Optional 20-40% DiT speedup and 15-25% VAE speedup with PyTorch 2.0+ compilation
- **Multi-GPU CLI**: Distribute workload across multiple GPUs with automatic temporal overlap blending
- **Model Caching**: Keep models loaded between generations for single-GPU directory processing or multi-GPU streaming
- **Flexible Attention Backends**: Choose between PyTorch SDPA (stable, always available), Flash Attention 2/3, or SageAttention 2/3 for faster computation on supported hardware
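Temporal overlap blending can be illustrated as a linear cross-fade between the frame ranges that neighboring GPUs both process (a conceptual sketch with scalar "frames"; the node's actual blending code may differ):

```python
def blend_chunks(chunk_a, chunk_b, overlap):
    """Cross-fade the last `overlap` frames of chunk_a with the first
    `overlap` frames of chunk_b, weights ramping linearly toward chunk_b."""
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)
        blended.append(chunk_a[-overlap + i] * (1 - w) + chunk_b[i] * w)
    return chunk_a[:-overlap] + blended + chunk_b[overlap:]

# Two 4-frame chunks whose 2-frame overlap fades from 1.0 toward 3.0:
frames = blend_chunks([1.0] * 4, [3.0] * 4, overlap=2)
print([round(f, 2) for f in frames])  # [1.0, 1.0, 1.67, 2.33, 3.0, 3.0]
```

The ramped weights avoid a visible seam where one GPU's chunk hands off to the next.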
### Quality Control
- **Advanced Color Correction**: Five methods including LAB (recommended for highest fidelity), wavelet, wavelet adaptive, HSV, and AdaIN
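As a rough intuition for the AdaIN method: per channel, the upscaled output's values are renormalized to the source image's mean and standard deviation. A simplified single-channel sketch (not the node's implementation):

```python
def adain_match(upscaled, reference):
    """Shift/scale one channel of `upscaled` to match `reference`'s
    mean and standard deviation (AdaIN-style statistics transfer)."""
    def mean_std(xs):
        m = sum(xs) / len(xs)
        sd = (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
        return m, sd

    mu_u, sd_u = mean_std(upscaled)
    mu_r, sd_r = mean_std(reference)
    scale = sd_r / sd_u if sd_u > 0 else 1.0
    return [(x - mu_u) * scale + mu_r for x in upscaled]

# A channel that drifted too dark gets pulled onto the reference statistics:
corrected = adain_match([0.2, 0.4, 0.6], [0.5, 0.7, 0.9])
print([round(v, 2) for v in corrected])  # [0.5, 0.7, 0.9]
```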
- **Python**: 3.12+ (Python 3.12 and 3.13 tested and recommended)
- **PyTorch**: 2.0+ for torch.compile support (optional but recommended)
- **Triton**: Required for torch.compile with the inductor backend (optional)
- **Flash Attention / SageAttention**: Flash Attention 2 (Ampere+), Flash Attention 3 (Hopper+), SageAttention 2, or SageAttention 3 (Blackwell) provide faster attention computation on supported hardware (optional, falls back to PyTorch SDPA)
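A quick way to check which of the optional accelerators above are present in your Python environment (the node performs its own detection at load time; this is just a convenience probe):

```python
import importlib.util
import sys

def installed(module_name):
    """True if the module can be found without actually importing it."""
    return importlib.util.find_spec(module_name) is not None

report = {
    "python_3_12_plus": sys.version_info >= (3, 12),
    "torch": installed("torch"),
    "triton": installed("triton"),
    "flash_attn": installed("flash_attn"),
    "sageattention": installed("sageattention"),
}
for name, ok in report.items():
    print(f"{name}: {'yes' if ok else 'no'}")
```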
## 📦 Installation
Configure the DiT (Diffusion Transformer) model for video upscaling.
- Requires offload_device to be set and different from device
- **torch_compile_args**: Connect to SeedVR2 Torch Compile Settings node for 20-40% speedup
**BlockSwap Explained:**
BlockSwap enables running large models on GPUs with limited VRAM by dynamically swapping transformer blocks between GPU and CPU memory during inference.

> **Note:** BlockSwap is not available on macOS. Apple Silicon Macs use a unified memory architecture where the GPU and CPU share the same memory pool, making BlockSwap meaningless. The option will be automatically disabled with a warning if requested on macOS.

Here's how it works:
- **What it does**: Keeps only the currently-needed transformer blocks on the GPU, while storing the rest on CPU or another device
- **When to use it**: When you get OOM (Out of Memory) errors during the upscaling phase
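The swap loop can be pictured with a toy simulation (pure-Python stand-in objects; the real node moves actual transformer blocks between devices with `.to(device)`):

```python
class ToyBlock:
    """Stand-in for a transformer block that tracks which device holds it."""
    def __init__(self, device):
        self.device = device
    def to(self, device):
        self.device = device
        return self
    def forward(self, x):
        assert self.device == "gpu", "a block must be on the GPU to execute"
        return x + 1  # placeholder for the real computation

def run_with_blockswap(blocks, x, blocks_to_swap, offload_device="cpu"):
    for i, blk in enumerate(blocks):
        swapped = i < blocks_to_swap
        if swapped:
            blk.to("gpu")            # fetch just-in-time
        x = blk.forward(x)
        if swapped:
            blk.to(offload_device)   # release VRAM immediately after use
    return x

# First 16 of 32 blocks live on the CPU; VRAM only ever holds one of them
# at a time on top of the resident blocks.
blocks = [ToyBlock("cpu" if i < 16 else "gpu") for i in range(32)]
print(run_with_blockswap(blocks, 0, blocks_to_swap=16))  # 32
print(blocks[0].device)  # cpu
```

The trade-off is the transfer time of each fetched block, which is why more swapped blocks means lower VRAM use but slower inference.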
- `--dit_offload_device`: Device to offload DiT model: 'none' (keep on GPU), 'cpu', or 'cuda:X' (default: none)
- `--vae_offload_device`: Device to offload VAE model: 'none', 'cpu', or 'cuda:X' (default: none)
- `--blocks_to_swap`: Number of transformer blocks to swap (0=disabled, 3B: 0-32, 7B: 0-36). Requires dit_offload_device (default: 0). Not available on macOS.
- `--swap_io_components`: Offload I/O components for additional VRAM savings. Requires dit_offload_device. Not available on macOS.
**VAE Tiling:**
- `--vae_encode_tiled`: Enable VAE encode tiling to reduce VRAM during encoding
- `--compile_dynamo_recompile_limit`: Max recompilation attempts before fallback (default: 128)
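Tiling splits each frame into overlapping patches that are encoded one at a time. The start offsets can be computed with arithmetic like the following (a sketch; `tile_starts` is a hypothetical helper, and the overlap must stay smaller than the tile size, as the CLI also enforces):

```python
def tile_starts(length, tile, overlap):
    """Start offsets of overlapping tiles covering `length` pixels."""
    assert overlap < tile, "overlap must be smaller than the tile size"
    if tile >= length:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the edge
    return starts

# A 1024-pixel edge with 512-pixel tiles and 64 pixels of overlap:
print(tile_starts(1024, 512, 64))  # [0, 448, 512]
```

The overlap region is what lets adjacent tiles be blended without visible seams, at the cost of encoding some pixels twice.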
**Model Caching (batch processing):**
- `--cache_dit`: Keep DiT model in memory between generations. Works with single-GPU directory processing or multi-GPU streaming (`--chunk_size`). Requires `--dit_offload_device`
- `--cache_vae`: Keep VAE model in memory between generations. Works with single-GPU directory processing or multi-GPU streaming (`--chunk_size`). Requires `--vae_offload_device`
**Multi-GPU:**
- `--cuda_device`: CUDA device id(s). Single id (e.g., '0') or comma-separated list '0,1' for multi-GPU
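The interplay between the offload, BlockSwap, and caching flags can be sketched with a minimal argparse parser (illustrative only; the real CLI defines many more options and its own validation):

```python
import argparse

def build_parser():
    """Tiny subset of the documented flags, for illustration only."""
    p = argparse.ArgumentParser()
    p.add_argument("--dit_offload_device", default="none")
    p.add_argument("--blocks_to_swap", type=int, default=0)
    p.add_argument("--swap_io_components", action="store_true")
    p.add_argument("--cache_dit", action="store_true")
    return p

def validate(args):
    # Documented rule: BlockSwap and DiT caching both need an offload device.
    needs_offload = (
        args.blocks_to_swap > 0 or args.swap_io_components or args.cache_dit
    )
    if needs_offload and args.dit_offload_device == "none":
        raise ValueError(
            "blocks_to_swap/swap_io_components/cache_dit require --dit_offload_device"
        )
    return args

args = validate(build_parser().parse_args(
    ["--blocks_to_swap", "16", "--dit_offload_device", "cpu"]
))
print(args.blocks_to_swap, args.dit_offload_device)  # 16 cpu
```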
For detailed contribution guidelines, see [CONTRIBUTING.md](CONTRIBUTING.md).
This ComfyUI implementation is a collaborative project by **[NumZ](https://github.com/numz)** and **[AInVFX](https://www.youtube.com/@AInVFX)** (Adrien Toupet), based on the original [SeedVR2](https://github.com/ByteDance-Seed/SeedVR) by ByteDance Seed Team.
Special thanks to our community contributors including [naxci1](https://github.com/naxci1), [benjaminherb](https://github.com/benjaminherb), [cmeka](https://github.com/cmeka), [FurkanGozukara](https://github.com/FurkanGozukara), [JohnAlcatraz](https://github.com/JohnAlcatraz), [lihaoyun6](https://github.com/lihaoyun6), [Luchuanzhao](https://github.com/Luchuanzhao), [Luke2642](https://github.com/Luke2642), [proxyid](https://github.com/proxyid), [q5sys](https://github.com/q5sys), and many others for their improvements, bug fixes, and testing.
```python
debug.log(
    f"VAE decode tile overlap ({args.vae_decode_tile_overlap}) must be smaller "
    f"than tile size ({args.vae_decode_tile_size})",
    level="ERROR", category="vae", force=True,
)
sys.exit(1)

# Validate BlockSwap configuration - either blocks_to_swap or swap_io_components requires dit_offload_device
```