cd ComfyUI/custom_nodes/
git clone https://github.com/yourusername/ComfyUI-Qwen3-TTS
cd ComfyUI-Qwen3-TTS
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
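To confirm the optional flash-attn dependency actually landed in the Python environment that ComfyUI uses, a quick check like the following works (the pip package `flash-attn` imports as the module `flash_attn`):

```python
# Sanity check: is a package importable in this environment?
import importlib.util

def is_installed(package: str) -> bool:
    """Return True if `package` can be imported here."""
    return importlib.util.find_spec(package) is not None

print("flash_attn installed:", is_installed("flash_attn"))
```

If this prints `False`, Flash Attention will silently be unavailable; disable it in the Model Loader or install it as shown above.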
Add these nodes:
- Qwen3 TTS Model Loader
- Qwen3 TTS Custom Voice
- Preview Audio (built-in)
- Save Audio (built-in)
Connect them:
Model Loader → Custom Voice → Preview Audio → Save Audio
Configure:
- Model Loader: Select Qwen3-TTS-12Hz-1.7B-CustomVoice
- Custom Voice:
  - Text: "Hello! This is my first AI voice generation."
  - Speaker: "Vivian"
  - Language: "English"
Queue Prompt!
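The same graph can also be queued programmatically through ComfyUI's `/prompt` HTTP endpoint as API-format JSON. The sketch below builds that payload; the node class names `Qwen3TTSModelLoader` and `Qwen3TTSCustomVoice` are assumptions, so verify them against the installed node pack before relying on this.

```python
import json

# Hypothetical node class names -- confirm against the node pack's definitions.
workflow = {
    "1": {"class_type": "Qwen3TTSModelLoader",
          "inputs": {"model_name": "Qwen3-TTS-12Hz-1.7B-CustomVoice"}},
    "2": {"class_type": "Qwen3TTSCustomVoice",
          "inputs": {"model": ["1", 0],  # link: node "1", output slot 0
                     "text": "Hello! This is my first AI voice generation.",
                     "speaker": "Vivian",
                     "language": "English"}},
    "3": {"class_type": "SaveAudio",
          "inputs": {"audio": ["2", 0],
                     "filename_prefix": "qwen3_tts"}},
}
payload = json.dumps({"prompt": workflow})
```

POSTing `payload` to `http://127.0.0.1:8188/prompt` queues the generation exactly as the Queue Prompt button does.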
Use Case: Generate narration in multiple languages with consistent voice
Nodes:
Model Loader (CustomVoice)
→ Custom Voice #1 (English, Ryan) → Preview
→ Custom Voice #2 (Chinese, Dylan) → Preview
→ Custom Voice #3 (Japanese, Ono_Anna) → Preview
Tips:
- Use the same model instance for all generations (faster)
- Each speaker has a native language for best quality
- Can use any speaker with any language
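The "same model instance" tip above amounts to loading once and looping over per-language jobs. A minimal sketch with a stand-in generate function (the real call happens inside the Custom Voice node; the example texts are illustrative):

```python
def generate(model, text, speaker, language):
    """Stand-in for the Custom Voice node; a real call would run TTS."""
    return f"[{language}/{speaker}] {text}"

model = object()  # loaded once, reused for every line (the 'faster' tip)
lines = [
    ("English",  "Ryan",     "Welcome to the tour."),
    ("Chinese",  "Dylan",    "欢迎参观。"),
    ("Japanese", "Ono_Anna", "ようこそ。"),
]
outputs = [generate(model, text, speaker, lang) for lang, speaker, text in lines]
```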
Use Case: Clone your voice or any voice from an audio file
Nodes:
Model Loader (Base)
→ Load Audio
→ Voice Clone (File)
→ Preview Audio
Setup:
- Model Loader: Qwen3-TTS-12Hz-1.7B-Base
- Load Audio: Choose your reference audio (3+ seconds)
- Voice Clone:
- Text: What you want to say
- ref_audio_path: Path to your audio file
- ref_text: Exact transcript of the audio
- Language: Target language
Example:
- ref_audio_path: /path/to/my_voice.wav
- ref_text: "This is my natural speaking voice."
- text: "Now I can say anything in this voice!"
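Since cloning needs 3+ seconds of reference audio, it is worth checking the file's duration before queuing. The sketch below uses Python's standard `wave` module; it synthesizes a 4-second WAV in memory as a stand-in, where a real check would open your reference file from disk.

```python
import io
import math
import struct
import wave

def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Duration of a PCM WAV file in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getnframes() / w.getframerate()

# Build a 4-second, 16 kHz, 16-bit mono sine wave as a stand-in
# for a real reference file such as /path/to/my_voice.wav.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"".join(
        struct.pack("<h", int(10000 * math.sin(2 * math.pi * 220 * i / 16000)))
        for i in range(4 * 16000)))

dur = wav_duration_seconds(buf.getvalue())
assert dur >= 3.0, "reference audio should be at least 3 seconds"
```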
Use Case: Create unique character voices for storytelling/games
Nodes:
Model Loader (VoiceDesign)
→ Voice Design
→ Preview Audio
Character Examples:
Grumpy Old Wizard:
Text: "Back in my day, we didn't have fancy magic wands!"
Description: "Elderly male voice, gruff and gravelly, annoyed tone with slight wheeze"
Language: English
Cheerful Shopkeeper:
Text: "欢迎光临!今天有特价哦!" ("Welcome! We have specials today!")
Description: "中年男性,热情洋溢,略带地方口音,语速稍快" ("Middle-aged male, warm and enthusiastic, slight regional accent, slightly fast pace")
Language: Chinese
Mysterious Villain:
Text: "You fools, you've walked right into my trap."
Description: "Deep male voice, smooth and sinister, speaking slowly with dramatic pauses"
Language: English
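For a game or story project, it helps to keep character descriptions in one place so every line of dialogue reuses the same design. A small sketch using the descriptions above (the `design_inputs` helper and its field names are illustrative, not the node pack's API):

```python
# Reusable voice-design presets built from the character examples above.
CHARACTERS = {
    "wizard": {
        "description": ("Elderly male voice, gruff and gravelly, "
                        "annoyed tone with slight wheeze"),
        "language": "English",
    },
    "villain": {
        "description": ("Deep male voice, smooth and sinister, "
                        "speaking slowly with dramatic pauses"),
        "language": "English",
    },
}

def design_inputs(name: str, text: str) -> dict:
    """Inputs for a Voice Design node, assembled from a named preset."""
    return {"text": text, **CHARACTERS[name]}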
Use Case: Generate multiple lines with the same cloned voice efficiently
Nodes:
Model Loader (Base)
→ Load Audio
→ Create Clone Prompt
→ Clone with Prompt #1 (Line 1)
→ Clone with Prompt #2 (Line 2)
→ Clone with Prompt #3 (Line 3)
Why This Is Better:
- Creates the voice embedding ONCE
- Reuses it for multiple generations
- 3-5x faster than cloning separately
- Perfect for audiobooks, podcasts, tutorials
Example Text List:
Welcome to this tutorial.
Today we'll learn about AI voice synthesis.
Let's get started!
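The "create once, reuse many times" pattern behind Create Clone Prompt can be sketched with `functools.lru_cache`: the expensive embedding step runs once, and every subsequent line reuses the cached result (the function body is a stand-in, not the real node code).

```python
from functools import lru_cache

calls = {"n": 0}

@lru_cache(maxsize=None)
def create_clone_prompt(ref_audio_path: str, ref_text: str):
    """Stand-in for the Create Clone Prompt node: expensive, runs once."""
    calls["n"] += 1
    return ("voice-embedding", ref_audio_path)

lines = [
    "Welcome to this tutorial.",
    "Today we'll learn about AI voice synthesis.",
    "Let's get started!",
]
for line in lines:
    prompt = create_clone_prompt("/path/to/my_voice.wav",
                                 "This is my natural speaking voice.")
    # ...pass (prompt, line) to a Clone with Prompt node here...

assert calls["n"] == 1  # embedding computed once, reused for all three lines
```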
Use Case: Same voice, different emotions
Nodes:
Model Loader (CustomVoice)
→ Custom Voice #1 → Preview (Happy)
→ Custom Voice #2 → Preview (Sad)
→ Custom Voice #3 → Preview (Angry)
Use the same text for all three: "I can't believe what just happened!"
Different Instructions:
- Happy: "Very excited and joyful"
- Sad: "Disappointed and melancholic"
- Angry: "Furious and intense"
- Model Caching: Load model once, use multiple times
- Flash Attention: Always enable if available
- bfloat16: Best speed/quality balance
- Batch Generation: Use for multiple similar generations
- Clone Prompts: Reuse for same voice
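The model-caching tip boils down to a keyed cache in front of the loader: load each checkpoint at most once per session. A minimal sketch with a stubbed loader:

```python
_model_cache: dict = {}

def load_model_cached(name: str, loader):
    """Load `name` once; later calls return the cached instance."""
    if name not in _model_cache:
        _model_cache[name] = loader(name)
    return _model_cache[name]

loads = []
fake_loader = lambda n: loads.append(n) or f"model:{n}"  # stub, records calls
a = load_model_cached("Qwen3-TTS-12Hz-1.7B-Base", fake_loader)
b = load_model_cached("Qwen3-TTS-12Hz-1.7B-Base", fake_loader)
assert a is b and len(loads) == 1  # second call hit the cache
```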
- Clean Audio: For cloning, use clear audio with minimal background noise
- Accurate Transcripts: Especially important for cloning
- Native Languages: Use speaker's native language when possible
- Specific Descriptions: More detail = better voice design
- Punctuation: Affects rhythm and pauses
❌ Using VoiceDesign model for CustomVoice (won't work)
❌ Forgetting ref_text when cloning
❌ Using dirty/noisy reference audio
❌ Expecting real-time on CPU (use GPU!)
❌ Not caching models (loads every time)
Chinese:
- Use Dylan/Eric for regional dialects
- Punctuation matters: 。!?
- Works great with poetry/formal text
English:
- Ryan = energetic, Aiden = casual
- Great with complex sentences
- Handles slang well
Japanese:
- Ono_Anna optimized for Japanese phonetics
- Handles kanji/kana naturally
- Good for anime-style voices
Korean:
- Sohee native pronunciation
- Handles honorifics correctly
Problem: "qwen-tts not installed"
pip install qwen-tts
Problem: "Out of memory"
- Use 0.6B model instead of 1.7B
- Lower dtype to float16
- Close other programs
- Reduce batch size
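The "use 0.6B instead of 1.7B" advice follows directly from a back-of-the-envelope VRAM estimate: weight memory scales with parameter count times bytes per parameter (activations and caches need more on top).

```python
def approx_weight_gib(params_billion: float, bytes_per_param: int) -> float:
    """Rough VRAM for model weights alone, in GiB."""
    return params_billion * 1e9 * bytes_per_param / 2**30

# 2 bytes per parameter for bfloat16/float16
print(round(approx_weight_gib(1.7, 2), 2))  # ~3.17 GiB
print(round(approx_weight_gib(0.6, 2), 2))  # ~1.12 GiB
```

Note that bfloat16 and float16 are both 2 bytes per parameter, so switching between them does not shrink the weights; dropping to the 0.6B model does.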
Problem: "Model download slow"
# China users:
export HF_ENDPOINT=https://hf-mirror.com
Problem: "Voice quality poor"
- Check reference audio quality (cloning)
- Verify transcript accuracy
- Try bfloat16 instead of float16
- Use 1.7B instead of 0.6B
Problem: "Flash Attention error"
- Disable in Model Loader settings
- Or install:
pip install flash-attn --no-build-isolation
- Check out README.md for full documentation
- Experiment with different voices and languages
- Try voice design with creative descriptions
- Share your workflows with the community!
- GitHub Issues: Report bugs and request features
- Discord: Join the ComfyUI community
- Documentation: Qwen3-TTS Official Repo
Happy voice generating! 🎙️