# v1.2.3 - SageAttention Support (#94)

Enemyx-net announced in Announcements
## ⚡ SageAttention Integration
## 🚀 Performance Improvements

### Speed Comparison
## ✨ How It Works
SageAttention computes attention with quantized (INT8/FP8) matrix operations instead of full-precision ones, trading a small amount of numerical precision for faster kernels.
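As a rough sketch of the idea (not the actual SageAttention kernel, which uses fused CUDA code and finer-grained per-block scaling), quantized attention scores can be illustrated in PyTorch; the function names here are made up for illustration:

```python
import torch

def quantize_int8(x: torch.Tensor):
    # Symmetric per-tensor quantization: map floats onto int8 with one scale.
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def int8_attention_scores(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    # Quantize Q and K, compute the score matrix on the low-precision
    # values, then rescale back to float. Real kernels keep the matmul
    # in integer arithmetic; float is used here only for simplicity.
    q_q, q_scale = quantize_int8(q)
    k_q, k_scale = quantize_int8(k)
    scores = (q_q.float() @ k_q.float().T) * (q_scale * k_scale)
    return scores / (q.shape[-1] ** 0.5)
```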
## ⚙️ Requirements

For SageAttention you need an NVIDIA GPU with CUDA (see Notes below) and the upstream `sageattention` package installed.
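A minimal availability check along these lines, assuming the package is importable as `sageattention` (the node's actual detection logic may differ), would be:

```python
import torch

def sage_available() -> bool:
    # SageAttention requires an NVIDIA GPU with CUDA (see Notes below).
    if not torch.cuda.is_available():
        return False
    try:
        import sageattention  # noqa: F401  # assumed upstream package name
    except ImportError:
        return False
    return True
```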
## 🎛️ Usage

Select "sage" from the `attention_type` dropdown. The available options are:
- `auto`: Let transformers choose (default)
- `eager`: Standard implementation
- `sdpa`: Scaled Dot Product Attention
- `flash_attention_2`: Flash Attention 2
- `sage`: SageAttention (NEW) - quantized for speed
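For orientation, the first four values mirror the `attn_implementation` argument that Hugging Face transformers accepts when loading a model; the sketch below is a hypothetical mapping, not the node's actual loader, and `load_model` is a made-up name:

```python
from transformers import AutoModelForCausalLM

# "eager", "sdpa", and "flash_attention_2" are valid transformers
# attn_implementation values; "auto" lets transformers pick, and "sage"
# would be patched in by the node after loading (not shown here).
TRANSFORMERS_BACKENDS = {"eager", "sdpa", "flash_attention_2"}

def load_model(model_id: str, attention_type: str = "auto"):
    kwargs = {}
    if attention_type in TRANSFORMERS_BACKENDS:
        kwargs["attn_implementation"] = attention_type
    return AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
```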
## 💾 Installation
Install via ComfyUI Manager or manually:
```bash
git clone https://github.com/Enemyx-net/VibeVoice-ComfyUI
```
## 📋 Notes
- SageAttention is only available for NVIDIA GPUs with CUDA
- Falls back to standard attention if requirements aren't met (see the sketch after this list)
- Ideal for production environments prioritizing speed
- Compatible with all model variants, including 4-bit quantized models
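The documented fallback can be sketched by reusing the hypothetical `sage_available()` check from the Requirements section; the choice of `"sdpa"` as the fallback target is an assumption, not the node's confirmed behavior:

```python
def resolve_attention(requested: str) -> str:
    # Use "sage" only when its requirements are met; otherwise fall back
    # to a standard implementation, as described in the note above.
    if requested == "sage" and not sage_available():
        return "sdpa"  # assumed fallback; the node's actual choice may differ
    return requested
```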