
Qwen3-TTS ComfyUI Quick Start Guide

5-Minute Setup

1. Installation

cd ComfyUI/custom_nodes/
git clone https://github.com/yourusername/ComfyUI-Qwen3-TTS
cd ComfyUI-Qwen3-TTS
pip install -r requirements.txt

2. Optional: Install Flash Attention (2x faster)

pip install flash-attn --no-build-isolation

3. Restart ComfyUI

Your First Workflow

Simple TTS Generation (30 seconds)

  1. Add these nodes:

    • Qwen3 TTS Model Loader
    • Qwen3 TTS Custom Voice
    • Preview Audio (built-in)
    • Save Audio (built-in)
  2. Connect them:

    Model Loader → Custom Voice → Preview Audio → Save Audio
    
  3. Configure:

    • Model Loader: Select Qwen3-TTS-12Hz-1.7B-CustomVoice
    • Custom Voice:
      • Text: "Hello! This is my first AI voice generation."
      • Speaker: "Vivian"
      • Language: "English"
  4. Queue Prompt!

Common Workflows

Workflow 1: Multilingual Narration

Use Case: Generate narration in multiple languages with consistent voice

Nodes:

Model Loader (CustomVoice) 
  → Custom Voice #1 (English, Ryan) → Preview
  → Custom Voice #2 (Chinese, Dylan) → Preview
  → Custom Voice #3 (Japanese, Ono_Anna) → Preview

Tips:

  • Use the same model instance for all generations (faster)
  • Each speaker has a native language that gives the best quality
  • Any speaker can be used with any language

Workflow 2: Voice Cloning from File

Use Case: Clone your voice or any voice from an audio file

Nodes:

Model Loader (Base)
  → Load Audio
  → Voice Clone (File)
  → Preview Audio

Setup:

  1. Model Loader: Qwen3-TTS-12Hz-1.7B-Base
  2. Load Audio: Choose your reference audio (3+ seconds)
  3. Voice Clone:
    • Text: What you want to say
    • ref_audio_path: Path to your audio file
    • ref_text: Exact transcript of the audio
    • Language: Target language

Example:

  • ref_audio_path: /path/to/my_voice.wav
  • ref_text: "This is my natural speaking voice."
  • text: "Now I can say anything in this voice!"
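The node inputs above can be sanity-checked before you queue the prompt. Below is a minimal sketch; the `CloneRequest` class and its `validate` helper are illustrative, not part of the extension, but the field names mirror the Voice Clone (File) inputs listed above:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class CloneRequest:
    """Illustrative container mirroring the Voice Clone (File) node inputs."""
    text: str
    ref_audio_path: str
    ref_text: str
    language: str = "English"

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the request looks sane."""
        problems = []
        if not self.text.strip():
            problems.append("text is empty")
        if not self.ref_text.strip():
            # The most common cloning mistake: a missing transcript.
            problems.append("ref_text is empty -- cloning needs the exact transcript")
        if Path(self.ref_audio_path).suffix.lower() not in {".wav", ".mp3", ".flac"}:
            problems.append(f"unexpected audio extension: {self.ref_audio_path}")
        return problems

req = CloneRequest(
    text="Now I can say anything in this voice!",
    ref_audio_path="/path/to/my_voice.wav",
    ref_text="This is my natural speaking voice.",
)
print(req.validate())  # []
```

Catching an empty `ref_text` here is worth it: forgetting the transcript is listed under Common Mistakes below.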

Workflow 3: Custom Character Voice Design

Use Case: Create unique character voices for storytelling/games

Nodes:

Model Loader (VoiceDesign)
  → Voice Design
  → Preview Audio

Character Examples:

Grumpy Old Wizard:

Text: "Back in my day, we didn't have fancy magic wands!"
Description: "Elderly male voice, gruff and gravelly, annoyed tone with slight wheeze"
Language: English

Cheerful Shopkeeper:

Text: "欢迎光临!今天有特价哦!" ("Welcome! We have a special offer today!")
Description: "中年男性,热情洋溢,略带地方口音,语速稍快" ("Middle-aged male, warm and enthusiastic, slight regional accent, slightly fast pace")
Language: Chinese

Mysterious Villain:

Text: "You fools, you've walked right into my trap."
Description: "Deep male voice, smooth and sinister, speaking slowly with dramatic pauses"
Language: English
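Detailed descriptions like these follow a consistent pattern: age and gender, timbre, tone, then extra quirks. A small sketch of that pattern (the `voice_description` helper is purely illustrative, not a node in this extension):

```python
def voice_description(age: str, gender: str, timbre: str, tone: str, extras: str = "") -> str:
    """Illustrative helper: assemble a detailed Voice Design description.
    More specific detail generally yields a better-matched voice."""
    parts = [f"{age} {gender} voice", timbre, tone]
    if extras:
        parts.append(extras)
    return ", ".join(parts)

print(voice_description("Elderly", "male", "gruff and gravelly",
                        "annoyed tone", "slight wheeze"))
# Elderly male voice, gruff and gravelly, annoyed tone, slight wheeze
```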

Workflow 4: Batch Voice Cloning (Efficient)

Use Case: Generate multiple lines with the same cloned voice efficiently

Nodes:

Model Loader (Base)
  → Load Audio
  → Create Clone Prompt
    → Clone with Prompt #1 (Line 1)
    → Clone with Prompt #2 (Line 2)
    → Clone with Prompt #3 (Line 3)

Why This Is Better:

  • Creates the voice embedding ONCE
  • Reuses it for multiple generations
  • 3-5x faster than cloning separately
  • Perfect for audiobooks, podcasts, tutorials

Example Text List:

Welcome to this tutorial.
Today we'll learn about AI voice synthesis.
Let's get started!
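The extract-once, reuse-many pattern behind this workflow can be sketched outside ComfyUI. `build_clone_prompt` and `synthesize` below are hypothetical stand-ins for the Create Clone Prompt and Clone with Prompt nodes; the point is that the expensive embedding step runs only once:

```python
def build_clone_prompt(ref_audio_path: str, ref_text: str) -> dict:
    """Stand-in for 'Create Clone Prompt': the expensive voice-embedding step."""
    # A real implementation would run the reference audio through the model here.
    return {"ref_audio": ref_audio_path, "ref_text": ref_text, "embedding": "..."}

def synthesize(prompt: dict, text: str) -> str:
    """Stand-in for 'Clone with Prompt': cheap per-line generation."""
    return f"<audio for {text!r} in voice {prompt['ref_audio']}>"

lines = [
    "Welcome to this tutorial.",
    "Today we'll learn about AI voice synthesis.",
    "Let's get started!",
]

prompt = build_clone_prompt("/path/to/my_voice.wav",
                            "This is my natural speaking voice.")  # built once
clips = [synthesize(prompt, line) for line in lines]  # reused for every line
print(len(clips))  # 3
```

For an audiobook with hundreds of lines, the savings from skipping repeated embedding extraction compound accordingly.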

Workflow 5: Emotion Control

Use Case: Same voice, different emotions

Nodes:

Model Loader (CustomVoice)
  → Custom Voice #1 → Preview (Happy)
  → Custom Voice #2 → Preview (Sad)
  → Custom Voice #3 → Preview (Angry)

Set up all three with the same text: "I can't believe what just happened!"

Different Instructions:

  • Happy: "Very excited and joyful"
  • Sad: "Disappointed and melancholic"
  • Angry: "Furious and intense"

Pro Tips

Speed Optimization

  1. Model Caching: Load model once, use multiple times
  2. Flash Attention: Always enable if available
  3. bfloat16: Best speed/quality balance
  4. Batch Generation: Use for multiple similar generations
  5. Clone Prompts: Reuse for same voice
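The model-caching tip above amounts to memoizing the loader on its parameters. A minimal sketch (the `load_model` function is a hypothetical stand-in for the Model Loader node, which handles caching for you):

```python
from functools import lru_cache

@lru_cache(maxsize=2)
def load_model(name: str, dtype: str = "bfloat16"):
    """Hypothetical loader: cached so repeated queue runs reuse one instance."""
    print(f"loading {name} ({dtype})")  # printed only on a cache miss
    return object()  # placeholder for the loaded model

a = load_model("Qwen3-TTS-12Hz-1.7B-CustomVoice")
b = load_model("Qwen3-TTS-12Hz-1.7B-CustomVoice")
print(a is b)  # True: the second call hits the cache
```

The same keying idea explains why changing the dtype or model name triggers a fresh load.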

Quality Optimization

  1. Clean Audio: For cloning, use clear audio with minimal background noise
  2. Accurate Transcripts: Especially important for cloning
  3. Native Languages: Use speaker's native language when possible
  4. Specific Descriptions: More detail = better voice design
  5. Punctuation: Affects rhythm and pauses

Common Mistakes to Avoid

❌ Using the VoiceDesign model with Custom Voice nodes (won't work)
❌ Forgetting ref_text when cloning
❌ Using noisy reference audio
❌ Expecting real-time generation on CPU (use a GPU!)
❌ Not caching models (reloads on every run)

Language-Specific Tips

Chinese

  • Use Dylan/Eric for regional dialects
  • Punctuation matters: 。!?
  • Works great with poetry/formal text

English

  • Ryan = energetic, Aiden = casual
  • Great with complex sentences
  • Handles slang well

Japanese

  • Ono_Anna optimized for Japanese phonetics
  • Handles kanji/kana naturally
  • Good for anime-style voices

Korean

  • Sohee provides native pronunciation
  • Handles honorifics correctly

Troubleshooting Quick Fixes

Problem: "qwen-tts not installed"

pip install qwen-tts

Problem: "Out of memory"

  • Use 0.6B model instead of 1.7B
  • Lower dtype to float16
  • Close other programs
  • Reduce batch size

Problem: "Model download slow"

# China users:
export HF_ENDPOINT=https://hf-mirror.com

Problem: "Voice quality poor"

  • Check reference audio quality (cloning)
  • Verify transcript accuracy
  • Try bfloat16 instead of float16
  • Use 1.7B instead of 0.6B

Problem: "Flash Attention error"

  • Disable in Model Loader settings
  • Or install: pip install flash-attn --no-build-isolation

Next Steps

  1. Check out README.md for full documentation
  2. Experiment with different voices and languages
  3. Try voice design with creative descriptions
  4. Share your workflows with the community!

Need Help?

  • GitHub Issues: Report bugs and request features
  • Discord: Join the ComfyUI community
  • Documentation: Qwen3-TTS Official Repo

Happy voice generating! 🎙️