PGCRT/ComfyUI-QWEN3_TTS

This is a test wrapper; the code was generated by Claude.

Qwen3-TTS ComfyUI Nodes

Fast and efficient Text-to-Speech nodes for ComfyUI based on Qwen3-TTS.

Features

  • 🎙️ Custom Voice Generation: 9 premium voices with emotion/style control
  • 🎨 Voice Design: Create unique voices from text descriptions
  • 👥 Voice Cloning: Clone any voice from 3+ seconds of audio
  • 🌍 Multilingual: Supports 10 languages (Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian)
  • ⚡ Fast: Optimized with model caching and flash attention
  • 📦 Batch Processing: Generate multiple audio clips efficiently

Installation

Method 1: Via ComfyUI Manager (Recommended)

  1. Open ComfyUI Manager
  2. Search for "Qwen3-TTS"
  3. Click Install
  4. Restart ComfyUI

Method 2: Manual Installation

  1. Navigate to your ComfyUI custom nodes directory:
     cd ComfyUI/custom_nodes/
  2. Clone this repository:
     git clone https://github.com/yourusername/ComfyUI-Qwen3-TTS
  3. Install dependencies:
     cd ComfyUI-Qwen3-TTS
     pip install qwen-tts soundfile torch
  4. (Optional) Install FlashAttention for better performance:
     pip install flash-attn --no-build-isolation
  5. Restart ComfyUI

Nodes

1. Qwen3 TTS Model Loader

Loads and caches Qwen3-TTS models.

Inputs:

  • model_name: Choose from available models (1.7B or 0.6B variants)
  • device: GPU or CPU selection
  • dtype: Precision (bfloat16, float16, float32)
  • use_flash_attn: Enable FlashAttention 2 for speed

Outputs:

  • QWEN_TTS_MODEL: Loaded model instance
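
The caching behaviour described above can be pictured with a small sketch. This is not the repository's code: `load_cached`, `_MODEL_CACHE`, and the `loader` callback are hypothetical names, and the assumption is that a model is reused only when all four loader inputs match.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical cache: maps a full loader configuration to a loaded model.
_MODEL_CACHE: Dict[Tuple[str, str, str, bool], Any] = {}


def load_cached(
    model_name: str,
    device: str,
    dtype: str,
    use_flash_attn: bool,
    loader: Callable[[str, str, str, bool], Any],
) -> Any:
    """Return the cached model for this configuration, loading it on first use."""
    key = (model_name, device, dtype, use_flash_attn)
    if key not in _MODEL_CACHE:
        # `loader` stands in for the real (expensive) model construction.
        _MODEL_CACHE[key] = loader(model_name, device, dtype, use_flash_attn)
    return _MODEL_CACHE[key]
```

Keying on the whole configuration means that switching dtype or device triggers a fresh load instead of silently returning a model built with different settings.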

2. Qwen3 TTS Custom Voice

Generate speech using one of 9 premium voices.

Available Speakers:

  • Vivian: Bright, slightly edgy young female (Chinese)
  • Serena: Warm, gentle young female (Chinese)
  • Uncle_Fu: Seasoned male with low, mellow timbre (Chinese)
  • Dylan: Youthful Beijing male (Chinese - Beijing Dialect)
  • Eric: Lively Chengdu male (Chinese - Sichuan Dialect)
  • Ryan: Dynamic male with strong rhythm (English)
  • Aiden: Sunny American male (English)
  • Ono_Anna: Playful Japanese female (Japanese)
  • Sohee: Warm Korean female (Korean)

Inputs:

  • model: Qwen3-TTS model
  • text: Text to synthesize
  • speaker: Voice selection
  • language: Target language (or "Auto")
  • instruct (optional): Style/emotion instruction (e.g., "Very happy", "Angry tone")
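
For readers new to ComfyUI custom nodes, the inputs listed above would typically be declared through an `INPUT_TYPES` classmethod. The following is a minimal sketch only: the class name, the `build_request` helper, the category string, and the exact language labels are assumptions, not the repository's actual definitions.

```python
SPEAKERS = [
    "Vivian", "Serena", "Uncle_Fu", "Dylan", "Eric",
    "Ryan", "Aiden", "Ono_Anna", "Sohee",
]
# "Auto" plus the 10 supported languages; the exact labels are an assumption.
LANGUAGES = [
    "Auto", "Chinese", "English", "Japanese", "Korean", "German",
    "French", "Russian", "Portuguese", "Spanish", "Italian",
]


class Qwen3TTSCustomVoiceSketch:
    """Illustrative node declaration; not the repository's actual class."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "model": ("QWEN_TTS_MODEL",),
                "text": ("STRING", {"multiline": True, "default": ""}),
                "speaker": (SPEAKERS,),
                "language": (LANGUAGES, {"default": "Auto"}),
            },
            "optional": {
                "instruct": ("STRING", {"default": ""}),
            },
        }

    RETURN_TYPES = ("AUDIO",)
    FUNCTION = "generate"
    CATEGORY = "audio/Qwen3-TTS"  # hypothetical menu category

    @staticmethod
    def build_request(text, speaker, language="Auto", instruct=""):
        """Validate the inputs and collect them into one dictionary."""
        if speaker not in SPEAKERS:
            raise ValueError(f"Unknown speaker: {speaker}")
        if language not in LANGUAGES:
            raise ValueError(f"Unsupported language: {language}")
        if not text.strip():
            raise ValueError("text must not be empty")
        request = {"text": text, "speaker": speaker, "language": language}
        if instruct.strip():
            # Only forward the style instruction when one was given.
            request["instruct"] = instruct.strip()
        return request
```

A real node would also implement the `generate` method named in `FUNCTION` and register the class in `NODE_CLASS_MAPPINGS`; both are omitted here because the synthesis call is outside the scope of this sketch.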

3. Qwen3 TTS Voice Design

Create custom voices from text descriptions.

Inputs:

  • model: Qwen3-TTS VoiceDesign model
  • text: Text to synthesize
  • voice_description: Describe the desired voice (e.g., "A deep, authoritative male voice with British accent")
  • language: Target language

Example Descriptions:

  • "A warm, gentle young female voice with clear pronunciation"
  • "Deep masculine voice with slight rasp, confident and commanding"
  • "Cheerful teenage girl voice, energetic and playful"
  • "Elderly male voice, wise and calm, with slight tremor"

4. Qwen3 TTS Voice Clone

Clone a voice from reference audio.

Inputs:

  • model: Qwen3-TTS Base model
  • text: Text to synthesize
  • ref_audio: Reference audio (ComfyUI AUDIO format)
  • ref_text: Transcript of reference audio
  • language: Target language
  • x_vector_only (optional): Use only speaker embedding (faster but lower quality)

5. Qwen3 TTS Voice Clone (File)

Clone a voice from a local file path or a URL.

Inputs:

  • ref_audio_path: Local file path or URL
  • Other inputs same as Voice Clone

Example URLs:

https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-TTS-Repo/clone.wav
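
Since `ref_audio_path` accepts either a local path or a URL, a node has to tell the two apart before loading. One possible way to do that is sketched below; the function names are hypothetical and the rule "only http and https count as URLs" is an assumption of this example.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_ref_audio(ref_audio_path: str) -> str:
    """Return "url" for http(s) references and "file" for anything else."""
    scheme = urlparse(ref_audio_path).scheme.lower()
    return "url" if scheme in ("http", "https") else "file"


def check_ref_audio(ref_audio_path: str) -> str:
    """Classify the reference and fail early if a local file is missing."""
    kind = classify_ref_audio(ref_audio_path)
    if kind == "file" and not Path(ref_audio_path).expanduser().is_file():
        raise FileNotFoundError(f"Reference audio not found: {ref_audio_path}")
    return kind
```

Checking the scheme explicitly keeps Windows paths such as `C:\audio\voice.wav` from being mistaken for URLs, because their drive letter parses as a one-letter scheme.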

6. Qwen3 TTS Create Clone Prompt

Create a reusable clone prompt so that repeated generations with the same reference voice run faster.

Inputs:

  • model: Qwen3-TTS Base model
  • ref_audio: Reference audio
  • ref_text: Transcript

Outputs:

  • QWEN_CLONE_PROMPT: Reusable prompt for faster generation

7. Qwen3 TTS Clone with Prompt

Generate speech using a pre-created clone prompt.

Inputs:

  • model: Qwen3-TTS Base model
  • text: Text to synthesize
  • clone_prompt: From "Create Clone Prompt" node
  • language: Target language
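
The point of splitting cloning into "Create Clone Prompt" and "Clone with Prompt" is that the reference audio is processed once and the result is reused. A minimal sketch of that pattern follows; `create_prompt` and `generate` are hypothetical callables standing in for the two nodes, not functions from qwen-tts.

```python
from typing import Any, Callable, List


def clone_many(
    texts: List[str],
    create_prompt: Callable[[], Any],
    generate: Callable[[str, Any], Any],
) -> List[Any]:
    """Build the clone prompt once, then reuse it for every text."""
    prompt = create_prompt()  # the one-time step that reads the reference audio
    return [generate(text, prompt) for text in texts]
```

With three texts, `create_prompt` runs once and `generate` runs three times, which mirrors wiring one "Create Clone Prompt" node into several "Clone with Prompt" nodes.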

8. Qwen3 TTS Batch Generate

Generate multiple audio clips from a list of texts.

Inputs:

  • model: Qwen3-TTS model
  • text_list: Multiple texts separated by delimiter
  • separator: Text delimiter (default: "\n")
  • speaker (optional): For custom voice mode
  • language: Target language
  • clone_prompt (optional): For voice cloning mode
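
How `text_list` and `separator` interact can be illustrated with a short sketch. The function name is hypothetical, and two behaviours are assumptions of this example: blank entries are dropped, and a separator typed literally as backslash plus `n` is treated as a newline.

```python
from typing import List


def split_text_list(text_list: str, separator: str = "\n") -> List[str]:
    """Split on `separator`, dropping empty entries and surrounding whitespace."""
    if not separator:
        raise ValueError("separator must not be empty")
    # A UI text field may deliver the two characters backslash + n instead of
    # a real newline; treat that as a newline (an assumption of this sketch).
    if separator == "\\n":
        separator = "\n"
    return [part.strip() for part in text_list.split(separator) if part.strip()]
```

Each resulting string would then be synthesized as its own clip, either with the chosen speaker or with the supplied clone prompt.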

Workflow Examples

Basic Custom Voice Generation

Model Loader → Custom Voice → Preview Audio → Save Audio

Voice Design

Model Loader (VoiceDesign) → Voice Design → Preview Audio

Voice Cloning

Load Audio → Model Loader (Base) → Voice Clone → Preview Audio

Optimized Batch Cloning

Load Audio → Model Loader (Base) → Create Clone Prompt → Clone with Prompt (×3) → Batch → Save

Performance Tips

  1. Use Model Caching: The Model Loader caches models, so load once and reuse
  2. Enable FlashAttention: Set use_flash_attn=True for 2-3x speed improvement
  3. Use bfloat16: Best balance of speed and quality
  4. Batch Processing: Use the Batch node or Clone with Prompt for multiple generations
  5. GPU Selection: Use cuda:0 for best performance
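
Tips 3 and 5 can be combined into a simple default-selection rule. The sketch below is an assumption about sensible defaults, not behaviour of the nodes: it prefers `cuda:0` with bfloat16, falls back to float16 on GPUs without bfloat16 support, and uses float32 on CPU. The function name and its boolean inputs are hypothetical.

```python
from typing import Dict


def pick_settings(cuda_available: bool, bf16_supported: bool) -> Dict[str, str]:
    """Suggest Model Loader settings from two hardware capability flags."""
    if not cuda_available:
        # Assumption: full precision is the safest choice on CPU.
        return {"device": "cpu", "dtype": "float32"}
    dtype = "bfloat16" if bf16_supported else "float16"
    return {"device": "cuda:0", "dtype": dtype}
```

With PyTorch installed, the two flags could come from `torch.cuda.is_available()` and `torch.cuda.is_bf16_supported()`.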

Models

Models auto-download on first use from Hugging Face. You can also pre-download:

# Via ModelScope (faster in China)
pip install modelscope
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice --local_dir ./models/Qwen3-TTS-12Hz-1.7B-CustomVoice

# Via Hugging Face
pip install "huggingface_hub[cli]"
huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice --local-dir ./models/Qwen3-TTS-12Hz-1.7B-CustomVoice

Available Models:

  • Qwen3-TTS-12Hz-1.7B-CustomVoice (9 premium voices)
  • Qwen3-TTS-12Hz-1.7B-VoiceDesign (voice design)
  • Qwen3-TTS-12Hz-1.7B-Base (voice cloning)
  • Qwen3-TTS-12Hz-0.6B-CustomVoice (smaller, faster)
  • Qwen3-TTS-12Hz-0.6B-Base (smaller, faster)
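
Because each task needs a specific model variant, a lookup table can help avoid loading the wrong one. The sketch below is illustrative: the task labels and `resolve_model` are hypothetical, the `./models` layout follows the pre-download commands above, and the `Qwen/` prefix for every variant is assumed from the one repository ID shown there.

```python
from pathlib import Path

# (task, size) -> model name, taken from the list of available models.
MODELS = {
    ("custom_voice", "1.7B"): "Qwen3-TTS-12Hz-1.7B-CustomVoice",
    ("voice_design", "1.7B"): "Qwen3-TTS-12Hz-1.7B-VoiceDesign",
    ("voice_clone", "1.7B"): "Qwen3-TTS-12Hz-1.7B-Base",
    ("custom_voice", "0.6B"): "Qwen3-TTS-12Hz-0.6B-CustomVoice",
    ("voice_clone", "0.6B"): "Qwen3-TTS-12Hz-0.6B-Base",
}


def resolve_model(task: str, size: str = "1.7B", models_dir: str = "./models") -> str:
    """Prefer a pre-downloaded local copy, else return the assumed Hub ID."""
    try:
        name = MODELS[(task, size)]
    except KeyError:
        raise ValueError(f"No {size} model available for task '{task}'") from None
    local = Path(models_dir) / name
    return str(local) if local.is_dir() else f"Qwen/{name}"
```

Note that the table has no 0.6B entry for voice design, so `resolve_model("voice_design", "0.6B")` raises a `ValueError` instead of guessing.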

Requirements

  • Python 3.8+
  • PyTorch 2.0+
  • qwen-tts
  • soundfile
  • CUDA-capable GPU recommended (CPU works but is slower)
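
A quick way to confirm the requirements before starting ComfyUI is to probe for them from Python. This is a sketch under one assumption: that the `qwen-tts` package is imported as `qwen_tts`. The function name is hypothetical.

```python
import importlib.util
import sys
from typing import List, Sequence


def check_requirements(
    modules: Sequence[str] = ("torch", "soundfile", "qwen_tts"),
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    for name in modules:
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing module: {name}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

This only checks that the modules can be found; it does not verify the PyTorch version or whether a CUDA device is usable.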

Troubleshooting

Model download fails:

  • For users in China, point downloads at a mirror: export HF_ENDPOINT=https://hf-mirror.com
  • Or manually download models as shown above
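
The same mirror setting can be applied from Python when editing the shell environment is inconvenient. A minimal sketch, with a hypothetical helper name; it should run before `huggingface_hub` is imported, since the endpoint is generally read when that library loads.

```python
import os


def use_hf_mirror(endpoint: str = "https://hf-mirror.com") -> str:
    """Set HF_ENDPOINT unless the user already chose one; return the value in effect."""
    return os.environ.setdefault("HF_ENDPOINT", endpoint)
```

Using `setdefault` keeps any endpoint the user has already exported instead of overwriting it.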

Flash Attention not available:

  • Install with: pip install flash-attn --no-build-isolation
  • Or disable in Model Loader settings
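
A loader can also fall back automatically when FlashAttention is requested but not installed. The sketch below assumes the model is loaded through a Transformers-style `attn_implementation` option, where `"flash_attention_2"` and `"sdpa"` are recognised values; whether these nodes pass that option through is an assumption, and the function name is hypothetical.

```python
import importlib.util


def pick_attn_implementation(use_flash_attn: bool) -> str:
    """Choose FlashAttention 2 only if requested and the package is installed."""
    if use_flash_attn and importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    # PyTorch's built-in scaled dot-product attention as the fallback.
    return "sdpa"
```

This avoids a hard failure at load time on machines where `flash-attn` could not be built.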

Out of memory:

  • Use 0.6B model instead of 1.7B
  • Use a 16-bit dtype (bfloat16 or float16) instead of float32, or reduce the batch size
  • Close other applications

Poor voice quality:

  • Use bfloat16 instead of float16
  • Ensure reference audio is clean (for cloning)
  • Provide accurate transcript for reference audio

License

Apache 2.0 (same as Qwen3-TTS)

Credits

Based on Qwen3-TTS by Alibaba Cloud Qwen Team.
