Fast and efficient Text-to-Speech nodes for ComfyUI based on Qwen3-TTS.

## Features
- 🎙️ Custom Voice Generation: 9 premium voices with emotion/style control
- 🎨 Voice Design: Create unique voices from text descriptions
- 👥 Voice Cloning: Clone any voice from 3+ seconds of audio
- 🌍 Multilingual: Supports 10 languages (Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian)
- ⚡ Fast: Optimized with model caching and flash attention
- 📦 Batch Processing: Generate multiple audio clips efficiently

## Installation

### Via ComfyUI Manager
- Open ComfyUI Manager
- Search for "Qwen3-TTS"
- Click Install
- Restart ComfyUI
### Manual Installation

1. Navigate to your ComfyUI custom nodes directory:

   ```bash
   cd ComfyUI/custom_nodes/
   ```

2. Clone this repository:

   ```bash
   git clone https://github.com/yourusername/ComfyUI-Qwen3-TTS
   ```

3. Install dependencies:

   ```bash
   cd ComfyUI-Qwen3-TTS
   pip install qwen-tts soundfile torch
   ```

4. (Optional) Install FlashAttention for better performance:

   ```bash
   pip install flash-attn --no-build-isolation
   ```

5. Restart ComfyUI
## Nodes

### Model Loader

Loads and caches Qwen3-TTS models.

Inputs:
- `model_name`: Choose from available models (1.7B or 0.6B variants)
- `device`: GPU or CPU selection
- `dtype`: Precision (bfloat16, float16, float32)
- `use_flash_attn`: Enable FlashAttention 2 for speed

Outputs:
- `QWEN_TTS_MODEL`: Loaded model instance
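The caching behavior can be pictured as a dictionary keyed by the full load configuration: requesting the same model, device, and dtype twice returns the cached instance instead of reloading. A minimal sketch (the cache key layout and `loader` callable are illustrative assumptions, not the node's actual code):

```python
# Cache keyed by the full configuration; a different device or dtype
# triggers a fresh load.
_MODEL_CACHE = {}

def load_model(model_name, loader, device="cuda:0", dtype="bfloat16"):
    """Return a cached model when the same configuration was loaded before.

    `loader` stands in for the real (expensive) Qwen3-TTS load call;
    this is an illustrative sketch, not the extension's implementation.
    """
    key = (model_name, device, dtype)
    if key not in _MODEL_CACHE:
        _MODEL_CACHE[key] = loader(model_name, device, dtype)
    return _MODEL_CACHE[key]
```

This is why the tips below recommend loading once and reusing the Model Loader output across nodes.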
### Custom Voice

Generate speech using one of 9 premium voices.
Available Speakers:
- Vivian: Bright, slightly edgy young female (Chinese)
- Serena: Warm, gentle young female (Chinese)
- Uncle_Fu: Seasoned male with low, mellow timbre (Chinese)
- Dylan: Youthful Beijing male (Chinese - Beijing Dialect)
- Eric: Lively Chengdu male (Chinese - Sichuan Dialect)
- Ryan: Dynamic male with strong rhythm (English)
- Aiden: Sunny American male (English)
- Ono_Anna: Playful Japanese female (Japanese)
- Sohee: Warm Korean female (Korean)
Inputs:
- `model`: Qwen3-TTS model
- `text`: Text to synthesize
- `speaker`: Voice selection
- `language`: Target language (or "Auto")
- `instruct` (optional): Style/emotion instruction (e.g., "Very happy", "Angry tone")
### Voice Design

Create custom voices from text descriptions.
Inputs:
- `model`: Qwen3-TTS VoiceDesign model
- `text`: Text to synthesize
- `voice_description`: Describe the desired voice (e.g., "A deep, authoritative male voice with British accent")
- `language`: Target language
Example Descriptions:
- "A warm, gentle young female voice with clear pronunciation"
- "Deep masculine voice with slight rasp, confident and commanding"
- "Cheerful teenage girl voice, energetic and playful"
- "Elderly male voice, wise and calm, with slight tremor"
### Voice Clone

Clone a voice from reference audio.
Inputs:
- `model`: Qwen3-TTS Base model
- `text`: Text to synthesize
- `ref_audio`: Reference audio (ComfyUI AUDIO format)
- `ref_text`: Transcript of reference audio
- `language`: Target language
- `x_vector_only` (optional): Use only speaker embedding (faster but lower quality)
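ComfyUI's AUDIO type is a dict carrying a waveform plus a sample rate. If you want to export a reference clip for inspection outside ComfyUI, a dependency-free sketch follows; note that ComfyUI actually passes the waveform as a torch tensor, while this sketch assumes plain nested `[channel][sample]` float lists to stay self-contained:

```python
import struct
import wave

def comfy_audio_to_wav(audio, path):
    """Write an AUDIO-style dict to a 16-bit PCM WAV file.

    Assumed layout (an approximation of ComfyUI's AUDIO type):
    {"waveform": [channels][samples] floats in [-1, 1], "sample_rate": int}
    """
    waveform = audio["waveform"]
    n_channels = len(waveform)
    n_samples = len(waveform[0])
    with wave.open(path, "wb") as wf:
        wf.setnchannels(n_channels)
        wf.setsampwidth(2)  # 16-bit PCM
        wf.setframerate(audio["sample_rate"])
        frames = bytearray()
        for i in range(n_samples):  # interleave channels frame by frame
            for ch in range(n_channels):
                sample = max(-1.0, min(1.0, waveform[ch][i]))
                frames += struct.pack("<h", int(sample * 32767))
        wf.writeframes(bytes(frames))
```

Clean, clipped-free reference audio matters for cloning quality, so exporting and listening to the clip you feed in is a reasonable sanity check.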
### Voice Clone (from Path)

Clone a voice from a file path or URL.
Inputs:
- `ref_audio_path`: Local file path or URL
- Other inputs are the same as Voice Clone
Example URL:
https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone.wav
### Create Clone Prompt

Create a reusable clone prompt for batch generation (optimization).
Inputs:
- `model`: Qwen3-TTS Base model
- `ref_audio`: Reference audio
- `ref_text`: Transcript
Outputs:
QWEN_CLONE_PROMPT: Reusable prompt for faster generation
### Clone with Prompt

Generate speech using a pre-created clone prompt.
Inputs:
- `model`: Qwen3-TTS Base model
- `text`: Text to synthesize
- `clone_prompt`: From "Create Clone Prompt" node
- `language`: Target language
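The point of splitting cloning into two nodes is that the expensive reference-audio analysis runs once, while generation runs once per text. A schematic sketch of that division of labor (class and function names are illustrative, not the extension's API):

```python
class ClonePrompt:
    """Holds the result of the one-time reference-audio analysis."""

    def __init__(self, ref_audio, ref_text, encoder):
        # In the real node this is the costly step: deriving a speaker
        # representation from the reference clip and its transcript.
        self.speaker = encoder(ref_audio, ref_text)

def clone_with_prompt(prompt, text, synthesize):
    """Reuse the cached speaker representation for each new text."""
    return synthesize(prompt.speaker, text)
```

With N texts, the encoder runs once instead of N times, which is where the batch-generation speedup comes from.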
### Batch

Generate multiple audio clips from a list of texts.
Inputs:
- `model`: Qwen3-TTS model
- `text_list`: Multiple texts separated by delimiter
- `separator`: Text delimiter (default: "\n")
- `speaker` (optional): For custom voice mode
- `language`: Target language
- `clone_prompt` (optional): For voice cloning mode
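Before generation, the `text_list` input is split on the separator. A plausible splitting rule is sketched below; trimming whitespace and dropping empty entries are assumptions about the node's behavior, not documented guarantees:

```python
def split_text_list(text_list, separator="\n"):
    """Split the Batch node's text_list input into individual prompts.

    Assumption: surrounding whitespace is trimmed and empty entries
    (e.g., from trailing newlines) are dropped.
    """
    return [part.strip() for part in text_list.split(separator) if part.strip()]
```

Under this rule, a multiline string with a trailing newline still yields exactly one prompt per non-empty line.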
## Example Workflows

- Model Loader → Custom Voice → Preview Audio → Save Audio
- Model Loader (VoiceDesign) → Voice Design → Preview Audio
- Load Audio → Model Loader (Base) → Voice Clone → Preview Audio
- Load Audio → Model Loader (Base) → Create Clone Prompt → Clone with Prompt (×3) → Batch → Save
## Performance Tips

- Use model caching: The Model Loader caches models, so load once and reuse.
- Enable FlashAttention: Set `use_flash_attn=True` for a 2-3x speed improvement.
- Use bfloat16: Best balance of speed and quality.
- Batch processing: Use the Batch node or Clone with Prompt for multiple generations.
- GPU selection: Use `cuda:0` for best performance.
## Models

Models auto-download on first use from Hugging Face. You can also pre-download them:

```bash
# Via ModelScope (faster in China)
pip install modelscope
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice --local_dir ./models/Qwen3-TTS-12Hz-1.7B-CustomVoice

# Via Hugging Face
pip install "huggingface_hub[cli]"
huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice --local-dir ./models/Qwen3-TTS-12Hz-1.7B-CustomVoice
```

Available models:
- `Qwen3-TTS-12Hz-1.7B-CustomVoice` (9 premium voices)
- `Qwen3-TTS-12Hz-1.7B-VoiceDesign` (voice design)
- `Qwen3-TTS-12Hz-1.7B-Base` (voice cloning)
- `Qwen3-TTS-12Hz-0.6B-CustomVoice` (smaller, faster)
- `Qwen3-TTS-12Hz-0.6B-Base` (smaller, faster)
## Requirements

- Python 3.8+
- PyTorch 2.0+
- qwen-tts
- soundfile
- CUDA-capable GPU recommended (CPU works but slower)
## Troubleshooting

Model download fails:
- Set the environment variable `export HF_ENDPOINT=https://hf-mirror.com` (China users)
- Or manually download models as shown above

Flash Attention not available:
- Install with `pip install flash-attn --no-build-isolation`
- Or disable it in the Model Loader settings
Out of memory:
- Use 0.6B model instead of 1.7B
- Use float16 or reduce batch size
- Close other applications
Poor voice quality:
- Use bfloat16 instead of float16
- Ensure reference audio is clean (for cloning)
- Provide accurate transcript for reference audio
## License

Apache 2.0 (same as Qwen3-TTS)

## Credits

Based on Qwen3-TTS by the Alibaba Cloud Qwen Team.

## Support

- Report issues: GitHub Issues
- Qwen3-TTS docs: Official Repository