Fast and efficient Text-to-Speech nodes for ComfyUI based on Qwen3-TTS.

## Features
- 🎙️ Custom Voice Generation: 9 premium voices with emotion/style control
- 🎨 Voice Design: Create unique voices from text descriptions
- 👥 Voice Cloning: Clone any voice from 3+ seconds of audio
- 🌍 Multilingual: Supports 10 languages (Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian)
- ⚡ Fast: Optimized with model caching and flash attention
- 📦 Batch Processing: Generate multiple audio clips efficiently

## Installation

### Via ComfyUI Manager
- Open ComfyUI Manager
- Search for "Qwen3-TTS"
- Click Install
- Restart ComfyUI
### Manual Installation

1. Navigate to your ComfyUI custom nodes directory:

   ```bash
   cd ComfyUI/custom_nodes/
   ```

2. Clone this repository:

   ```bash
   git clone https://github.com/yourusername/ComfyUI-Qwen3-TTS
   ```

3. Install dependencies:

   ```bash
   cd ComfyUI-Qwen3-TTS
   pip install qwen-tts soundfile torch
   ```

4. (Optional) Install FlashAttention for better performance:

   ```bash
   pip install flash-attn --no-build-isolation
   ```

5. Restart ComfyUI
## Nodes

### Model Loader

Loads and caches Qwen3-TTS models.

Inputs:
- `model_name`: Choose from available models (1.7B or 0.6B variants)
- `device`: GPU or CPU selection
- `dtype`: Precision (bfloat16, float16, float32)
- `use_flash_attn`: Enable FlashAttention 2 for speed

Outputs:
- `QWEN_TTS_MODEL`: Loaded model instance
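The caching behavior can be pictured as a dictionary keyed by the full load configuration: requesting the same model, device, and dtype twice returns the cached instance instead of reloading. A minimal sketch (the cache key layout and `loader` callable are illustrative assumptions, not the node's actual code):

```python
# Cache keyed by the full configuration; a different device or dtype
# triggers a fresh load.
_MODEL_CACHE = {}

def load_model(model_name, loader, device="cuda:0", dtype="bfloat16"):
    """Return a cached model when the same configuration was loaded before.

    `loader` stands in for the real (expensive) Qwen3-TTS load call;
    this is an illustrative sketch, not the extension's implementation.
    """
    key = (model_name, device, dtype)
    if key not in _MODEL_CACHE:
        _MODEL_CACHE[key] = loader(model_name, device, dtype)
    return _MODEL_CACHE[key]
```

This is why the tips below recommend loading once and reusing the Model Loader output across nodes.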
### Custom Voice

Generate speech using one of 9 premium voices.
Available Speakers:
- Vivian: Bright, slightly edgy young female (Chinese)
- Serena: Warm, gentle young female (Chinese)
- Uncle_Fu: Seasoned male with low, mellow timbre (Chinese)
- Dylan: Youthful Beijing male (Chinese - Beijing Dialect)
- Eric: Lively Chengdu male (Chinese - Sichuan Dialect)
- Ryan: Dynamic male with strong rhythm (English)
- Aiden: Sunny American male (English)
- Ono_Anna: Playful Japanese female (Japanese)
- Sohee: Warm Korean female (Korean)
Inputs:
- `model`: Qwen3-TTS model
- `text`: Text to synthesize
- `speaker`: Voice selection
- `language`: Target language (or "Auto")
- `instruct` (optional): Style/emotion instruction (e.g., "Very happy", "Angry tone")
### Voice Design

Create custom voices from text descriptions.
Inputs:
- `model`: Qwen3-TTS VoiceDesign model
- `text`: Text to synthesize
- `voice_description`: Describe the desired voice (e.g., "A deep, authoritative male voice with British accent")
- `language`: Target language
Example Descriptions:
- "A warm, gentle young female voice with clear pronunciation"
- "Deep masculine voice with slight rasp, confident and commanding"
- "Cheerful teenage girl voice, energetic and playful"
- "Elderly male voice, wise and calm, with slight tremor"
### Voice Clone

Clone a voice from reference audio.
Inputs:
- `model`: Qwen3-TTS Base model
- `text`: Text to synthesize
- `ref_audio`: Reference audio (ComfyUI AUDIO format)
- `ref_text`: Transcript of reference audio
- `language`: Target language
- `x_vector_only` (optional): Use only speaker embedding (faster but lower quality)
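ComfyUI's AUDIO type is a dict carrying a waveform plus a sample rate. If you want to export a reference clip for inspection outside ComfyUI, a dependency-free sketch follows; note that ComfyUI actually passes the waveform as a torch tensor, while this sketch assumes plain nested `[channel][sample]` float lists to stay self-contained:

```python
import struct
import wave

def comfy_audio_to_wav(audio, path):
    """Write an AUDIO-style dict to a 16-bit PCM WAV file.

    Assumed layout (an approximation of ComfyUI's AUDIO type):
    {"waveform": [channels][samples] floats in [-1, 1], "sample_rate": int}
    """
    waveform = audio["waveform"]
    n_channels = len(waveform)
    n_samples = len(waveform[0])
    with wave.open(path, "wb") as wf:
        wf.setnchannels(n_channels)
        wf.setsampwidth(2)  # 16-bit PCM
        wf.setframerate(audio["sample_rate"])
        frames = bytearray()
        for i in range(n_samples):  # interleave channels frame by frame
            for ch in range(n_channels):
                sample = max(-1.0, min(1.0, waveform[ch][i]))
                frames += struct.pack("<h", int(sample * 32767))
        wf.writeframes(bytes(frames))
```

Clean, clipped-free reference audio matters for cloning quality, so exporting and listening to the clip you feed in is a reasonable sanity check.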
### Voice Clone (from Path)

Clone a voice from a file path or URL.
Inputs:
- `ref_audio_path`: Local file path or URL
- Other inputs are the same as Voice Clone
Example URL:
https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone.wav
### Create Clone Prompt

Create a reusable clone prompt for batch generation (optimization).
Inputs:
- `model`: Qwen3-TTS Base model
- `ref_audio`: Reference audio
- `ref_text`: Transcript
Outputs:
QWEN_CLONE_PROMPT: Reusable prompt for faster generation
### Clone with Prompt

Generate speech using a pre-created clone prompt.
Inputs:
- `model`: Qwen3-TTS Base model
- `text`: Text to synthesize
- `clone_prompt`: From "Create Clone Prompt" node
- `language`: Target language
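The point of splitting cloning into two nodes is that the expensive reference-audio analysis runs once, while generation runs once per text. A schematic sketch of that division of labor (class and function names are illustrative, not the extension's API):

```python
class ClonePrompt:
    """Holds the result of the one-time reference-audio analysis."""

    def __init__(self, ref_audio, ref_text, encoder):
        # In the real node this is the costly step: deriving a speaker
        # representation from the reference clip and its transcript.
        self.speaker = encoder(ref_audio, ref_text)

def clone_with_prompt(prompt, text, synthesize):
    """Reuse the cached speaker representation for each new text."""
    return synthesize(prompt.speaker, text)
```

With N texts, the encoder runs once instead of N times, which is where the batch-generation speedup comes from.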
### Batch

Generate multiple audio clips from a list of texts.
Inputs:
- `model`: Qwen3-TTS model
- `text_list`: Multiple texts separated by delimiter
- `separator`: Text delimiter (default: "\n")
- `speaker` (optional): For custom voice mode
- `language`: Target language
- `clone_prompt` (optional): For voice cloning mode
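Before generation, the `text_list` input is split on the separator. A plausible splitting rule is sketched below; trimming whitespace and dropping empty entries are assumptions about the node's behavior, not documented guarantees:

```python
def split_text_list(text_list, separator="\n"):
    """Split the Batch node's text_list input into individual prompts.

    Assumption: surrounding whitespace is trimmed and empty entries
    (e.g., from trailing newlines) are dropped.
    """
    return [part.strip() for part in text_list.split(separator) if part.strip()]
```

Under this rule, a multiline string with a trailing newline still yields exactly one prompt per non-empty line.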
## Example Workflows

- Model Loader → Custom Voice → Preview Audio → Save Audio
- Model Loader (VoiceDesign) → Voice Design → Preview Audio
- Load Audio → Model Loader (Base) → Voice Clone → Preview Audio
- Load Audio → Model Loader (Base) → Create Clone Prompt → Clone with Prompt (×3) → Batch → Save
## Performance Tips

- Use model caching: The Model Loader caches models, so load once and reuse.
- Enable FlashAttention: Set `use_flash_attn=True` for a 2-3x speed improvement.
- Use bfloat16: Best balance of speed and quality.
- Batch processing: Use the Batch node or Clone with Prompt for multiple generations.
- GPU selection: Use `cuda:0` for best performance.
## Models

Models auto-download on first use from Hugging Face. You can also pre-download them:

```bash
# Via ModelScope (faster in China)
pip install modelscope
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice --local_dir ./models/Qwen3-TTS-12Hz-1.7B-CustomVoice

# Via Hugging Face
pip install "huggingface_hub[cli]"
huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice --local-dir ./models/Qwen3-TTS-12Hz-1.7B-CustomVoice
```

Available models:
- `Qwen3-TTS-12Hz-1.7B-CustomVoice` (9 premium voices)
- `Qwen3-TTS-12Hz-1.7B-VoiceDesign` (voice design)
- `Qwen3-TTS-12Hz-1.7B-Base` (voice cloning)
- `Qwen3-TTS-12Hz-0.6B-CustomVoice` (smaller, faster)
- `Qwen3-TTS-12Hz-0.6B-Base` (smaller, faster)
## Requirements

- Python 3.8+
- PyTorch 2.0+
- qwen-tts
- soundfile
- CUDA-capable GPU recommended (CPU works but slower)
## Troubleshooting

Model download fails:
- Set the environment variable `export HF_ENDPOINT=https://hf-mirror.com` (China users)
- Or manually download models as shown above

Flash Attention not available:
- Install with `pip install flash-attn --no-build-isolation`
- Or disable it in the Model Loader settings
Out of memory:
- Use 0.6B model instead of 1.7B
- Use float16 or reduce batch size
- Close other applications
Poor voice quality:
- Use bfloat16 instead of float16
- Ensure reference audio is clean (for cloning)
- Provide accurate transcript for reference audio
## License

Apache 2.0 (same as Qwen3-TTS)

## Credits

Based on Qwen3-TTS by the Alibaba Cloud Qwen Team.

## Support

- Report issues: GitHub Issues
- Qwen3-TTS docs: Official Repository