| title | emoji | colorFrom | colorTo | sdk | sdk_version | app_file | pinned | license | suggested_hardware |
|---|---|---|---|---|---|---|---|---|---|
| Qwen3-TTS Demo | 🎙️ | blue | purple | gradio | 5.33.0 | app.py | false | apache-2.0 | zero-a10g |
Text-to-Speech model based on Qwen3 architecture with voice cloning and voice design capabilities.
A pre-built Docker image is available on DockerHub at starlento/qwen3-tts:latest.
Prerequisites:
- Docker and Docker Compose installed
- NVIDIA GPU with CUDA support
- NVIDIA Container Toolkit installed
Steps:
- Start the container:

      docker-compose up -d

The application will be available at http://localhost:7860.
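The container can take a while to become reachable, for example while models download on first start. The following is a minimal readiness check, not part of the project: it polls the URL given above and assumes the Gradio page at the root path answers with HTTP 200 once the app is up. The function name `wait_for_app` and its default timeout values are illustrative choices.

```python
import time
import urllib.error
import urllib.request


def wait_for_app(url: str = "http://localhost:7860",
                 timeout: float = 120.0,
                 interval: float = 2.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass.

    Returns True as soon as the server responds, False if the deadline
    is reached first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Typical while the container is still starting:
            # connection refused, connection reset, or a read timeout.
            pass
        time.sleep(interval)
    return False
```

Calling `wait_for_app()` after `docker-compose up -d` returns `True` once the page responds; a `False` result only means the page did not answer within the timeout, so checking the container logs is the next step.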
Available Apps:
- `app_custom_voice.py` - Custom voice synthesis (default)
- `app_voice_clone.py` - Voice cloning
- `app_voice_design.py` - Voice design
To use a different app, modify the `command` line in docker-compose.yaml:

    command: python app_voice_clone.py --server-name 0.0.0.0 --server-port 7860

For a local setup with UV instead of Docker, the prerequisites are:
- Python 3.10
- NVIDIA GPU with CUDA 12.9 support
- UV package manager (will be installed automatically if not present)
Steps:
- Run the UV setup script:

      chmod +x setup_uv_env.sh
      ./setup_uv_env.sh

The script will:
- Install UV if not already installed
- Create a virtual environment with Python 3.10
- Install PyTorch 2.8.0 with CUDA 12.9 support
- Install all required dependencies
- Activate the environment:

      source .venv/bin/activate

- Run the application:

      # Default app
      python app.py
      # Or choose a specific app
      python app_custom_voice.py
      python app_voice_clone.py
      python app_voice_design.py

Note: Flash Attention is NOT installed by default in either setup method. If you need flash-attn for optimized attention mechanisms, you'll need to install it manually.
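Because flash-attn is optional, it can help to check what the active environment actually provides before launching an app. The sketch below is illustrative and not part of the repository. It reports the installed PyTorch version, the CUDA version PyTorch was built against, and whether the `flash_attn` package is importable. The helper `pick_attn_implementation` returns the strings `"flash_attention_2"` and `"sdpa"`, which are values accepted by the `attn_implementation` argument in Hugging Face Transformers. Whether the Qwen3-TTS apps expose that argument is an assumption.

```python
import importlib.util


def describe_environment() -> dict:
    """Report which GPU-related components are importable right now."""
    report = {
        "torch": None,            # installed PyTorch version, if any
        "cuda_build": None,       # CUDA version PyTorch was built against
        "cuda_available": False,  # whether a usable GPU is visible
        "flash_attn": importlib.util.find_spec("flash_attn") is not None,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        report["cuda_build"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
    return report


def pick_attn_implementation(report: dict) -> str:
    """Prefer FlashAttention only when the package and a GPU are both present.

    Assumption: the model is loaded through an API that accepts the
    Transformers-style `attn_implementation` argument.
    """
    if report.get("flash_attn") and report.get("cuda_available"):
        return "flash_attention_2"
    return "sdpa"  # PyTorch's built-in scaled dot-product attention


if __name__ == "__main__":
    env = describe_environment()
    print(env)
    print("attention backend:", pick_attn_implementation(env))
```

With either setup method as described above, `flash_attn` should be reported as `False` until it is installed manually, so the fallback branch is the one that applies by default.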
Models are downloaded to:
- Docker: `/models` directory (mapped to `~/models` on host)
- Host: Default HuggingFace cache (`~/.cache/huggingface`)
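To see where downloads will land on a given machine, the default cache location can be resolved the same way the Hugging Face Hub library documents it: `HF_HUB_CACHE` takes precedence, otherwise the `hub` folder under `HF_HOME`, which itself defaults to `~/.cache/huggingface`. The sketch below is a standalone illustration with a hypothetical function name, `hf_hub_cache`. How the Docker image points downloads at `/models` is not stated here, so the `HF_HOME=/models` case in the usage note is only an assumption.

```python
import os
from pathlib import Path


def hf_hub_cache(env=None) -> Path:
    """Resolve the Hugging Face Hub cache directory from environment variables.

    `env` is any mapping of variable names to values; it defaults to the
    current process environment.
    """
    env = os.environ if env is None else env

    # An explicit hub cache path wins over everything else.
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()

    # Otherwise the cache lives in the "hub" folder under HF_HOME,
    # which defaults to <XDG_CACHE_HOME or ~/.cache>/huggingface.
    cache_root = env.get("XDG_CACHE_HOME") or "~/.cache"
    hf_home = env.get("HF_HOME") or os.path.join(cache_root, "huggingface")
    return Path(hf_home).expanduser() / "hub"
```

With no variables set, `hf_hub_cache({})` resolves to the `hub` folder inside `~/.cache/huggingface`. If a container were started with `HF_HOME=/models`, the same function would return `/models/hub`.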
- DockerHub: `starlento/qwen3-tts:latest`
- Base: Python 3.10-slim
- Includes: PyTorch 2.8.0 with CUDA 12.9 support
- Size: ~8GB