**⚡ Lightning-Fast Text-to-Speech Inference & Service Framework**
**LightTTS** is a lightweight, high-performance text-to-speech (TTS) inference and service framework written in Python. It supports the **CosyVoice2** and **CosyVoice3** models, built upon the [CosyVoice](https://github.com/FunAudioLLM/CosyVoice) architecture and the [LightLLM](https://github.com/ModelTC/lightllm) framework, with optimizations for fast, scalable, and service-ready TTS deployment.
---
### Installation
**(Option 1 Recommended) Run with Docker**
```bash
# The easiest way to install LightTTS is by using the official image. You can directly pull and run the official image
docker pull lighttts/light-tts:latest
# Or you can manually build the image
docker build -t light-tts:latest .
# Run the image
docker run -it --gpus all -p 8080:8080 --shm-size 4g -v your_local_path:/data/ light-tts:latest /bin/bash
```
(We have already installed the ttsfrd package in the docker image. If you are using the docker image, you can skip this installation.)
For better text normalization performance, you can optionally install the ttsfrd package and unzip its resources. This step is not required — if skipped, the system will fall back to WeTextProcessing by default.
📝 This setup instruction is based on the original guide from the [CosyVoice repository](https://github.com/FunAudioLLM/CosyVoice).
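The optional-dependency behavior described above can be sketched as follows (an illustration of the fallback pattern, not LightTTS's actual frontend code):

```python
# Illustrative fallback: prefer ttsfrd when it is installed, else use WeTextProcessing
try:
    import ttsfrd  # optional package for better text normalization
    frontend = "ttsfrd"
except ImportError:
    frontend = "WeTextProcessing"  # default fallback
```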
### Start the Model Service
**Note:** It is recommended to enable the `load_trt` parameter for acceleration. The default flow precision is fp16 for CosyVoice2 and fp32 for CosyVoice3.
**For CosyVoice2:**
A minimal launch sketch (the `light_tts.server.api_server` entry point and the model path are assumptions; the full argument list is defined in `light_tts/server/api_cli.py`):

```bash
# Hypothetical launch command; verify the module path and flags against light_tts/server/api_cli.py
python -m light_tts.server.api_server --model_dir /data/CosyVoice2-0.5B --port 8080
```
The default values are usually the fastest and generally do not need to be adjusted. If you need to customize them, please refer to the following parameter descriptions:
- `load_trt`: Whether to load the flow_decoder in TensorRT mode (default: True).
- `data_type`: The data type for LLM inference (default: float16).
- `load_jit`: Whether to load the flow_encoder in JIT mode (default: False).
- `max_total_token_num`: LLM arg, total token count the GPU and model can support = `max_batch * (input_len + output_len)` (default: 64 * 1024)
- `max_req_total_len`: LLM arg, maximum value for `req_input_len + req_output_len` (default: 32768, matches `max_position_embeddings`).
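The token-budget relation above can be checked with simple arithmetic (the per-request lengths below are illustrative, not recommended values):

```python
# Capacity math for the LLM token budget (illustrative numbers only)
max_total_token_num = 64 * 1024          # default from above
input_len, output_len = 512, 1536        # hypothetical per-request lengths
req_total_len = input_len + output_len   # must not exceed max_req_total_len (32768)

# largest batch size the token budget admits at these lengths
max_batch = max_total_token_num // req_total_len
```

With these lengths the token budget covers up to 32 concurrent requests.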
Once the service is running, you can interact with it through the HTTP API. We support three modes: **non-streaming**, **streaming**, and **bi-streaming**.
- **Non-streaming and Streaming**: See `test/test_zero_shot.py` for examples; it prints metrics such as RTF (Real-Time Factor) and TTFT (Time To First Token).
- **Bi-streaming**: Uses the WebSocket interface. See usage examples in `test/test_bistream.py`.
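The RTF and TTFT figures printed by the test scripts follow the standard definitions; a minimal sketch with made-up timings:

```python
# Standard streaming-TTS metrics (values are invented for illustration)
audio_duration = 10.0   # seconds of speech produced
synthesis_time = 2.5    # wall-clock seconds spent generating it
first_chunk_at = 0.3    # seconds until the first audio chunk arrived

rtf = synthesis_time / audio_duration  # Real-Time Factor; < 1.0 means faster than real time
ttft = first_chunk_at                  # Time To First Token (first audio chunk) in streaming mode
```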
## 📊 Performance Benchmarks
We have conducted performance benchmarks on different GPU configurations to demonstrate the throughput and latency characteristics of LightTTS in streaming mode.
- **total_cost_time**: Total benchmark duration in seconds
- **qps**: Queries Per Second
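QPS follows directly from the totals above; a one-line sketch with invented numbers:

```python
# qps = completed requests / total_cost_time (illustrative values)
num_requests = 200
total_cost_time = 50.0  # seconds, as reported by the benchmark
qps = num_requests / total_cost_time
```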
## License
This repository is released under the [Apache-2.0](LICENSE) license.
### Third-Party Code Attribution
This project includes code from [CosyVoice](https://github.com/FunAudioLLM/CosyVoice) (Copyright Alibaba, Inc. and its affiliates), which is also licensed under Apache-2.0. The CosyVoice code is located in the `cosyvoice/` directory and has been integrated and modified as part of LightTTS. See the [NOTICE](NOTICE) file for complete attribution details.