Commit 7eb3ab0

Merge pull request #1806 from lym0302/r1.0
[server] update streaming demos readme
2 parents 118c742 + 93467d5 commit 7eb3ab0

2 files changed: +293 -26 lines changed

demos/streaming_tts_server/README.md

Lines changed: 146 additions & 13 deletions
@@ -16,7 +16,7 @@ You can choose one way from medium and hard to install paddlespeech.
1616

1717
### 2. Prepare config File
1818
The configuration file can be found in `conf/tts_online_application.yaml`.
19-
- `protocol` indicates the network protocol used by the streaming TTS service. Currently, both http and websocket are supported.
19+
- `protocol` indicates the network protocol used by the streaming TTS service. Currently, both **http and websocket** are supported.
2020
- `engine_list` indicates the speech engine that will be included in the service to be started, in the format of `<speech task>_<engine type>`.
2121
- This demo mainly introduces the streaming speech synthesis service, so the speech task should be set to `tts`.
2222
- The engine type supports two forms: **online** and **online-onnx**. `online` is an engine that runs dynamic-graph inference in Python; `online-onnx` is an engine that runs inference with onnxruntime and is the faster of the two.
@@ -31,12 +31,12 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
3131
- Inference speed: mb_melgan > hifigan; Audio quality: mb_melgan < hifigan
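
The `<speech task>_<engine type>` naming used by `engine_list` can be split mechanically. The following is a minimal sketch, not PaddleSpeech code: the helper `split_engine_name` is hypothetical, and it accepts only the two engine types named in this README.

```python
from typing import Tuple

# Hypothetical helper, not part of PaddleSpeech: splits an engine_list entry
# of the form "<speech task>_<engine type>" into its two parts.
KNOWN_ENGINE_TYPES = ("online", "online-onnx")  # the two forms named in the README


def split_engine_name(entry: str) -> Tuple[str, str]:
    # Split on the first underscore only; the engine type may contain a hyphen.
    task, sep, engine_type = entry.partition("_")
    if not sep or not task:
        raise ValueError(f"expected '<speech task>_<engine type>', got {entry!r}")
    if engine_type not in KNOWN_ENGINE_TYPES:
        raise ValueError(f"unknown engine type {engine_type!r}")
    return task, engine_type


print(split_engine_name("tts_online"))
print(split_engine_name("tts_online-onnx"))
```

For this demo the task part should always come out as `tts`.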
3232

3333

34-
35-
### 3. Server Usage
34+
### 3. Streaming speech synthesis server and client using http protocol
35+
#### 3.1 Server Usage
3636
- Command Line (Recommended)
3737

38+
Start the service (the configuration file uses http by default):
3839
```bash
39-
# start the service
4040
paddlespeech_server start --config_file ./conf/tts_online_application.yaml
4141
```
4242

@@ -76,7 +76,7 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
7676
log_file="./log/paddlespeech.log")
7777
```
7878

79-
Output:
79+
Output:
8080
```bash
8181
[2022-04-24 21:00:16,934] [ INFO] - The first response time of the 0 warm up: 1.268730878829956 s
8282
[2022-04-24 21:00:17,046] [ INFO] - The first response time of the 1 warm up: 0.11168622970581055 s
@@ -94,17 +94,15 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
9494

9595
```
9696

97-
98-
### 4. Streaming TTS client Usage
97+
#### 3.2 Streaming TTS client Usage
9998
- Command Line (Recommended)
10099

101-
```bash
102-
# Access http streaming TTS service
103-
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
100+
Access http streaming TTS service:
104101

105-
# Access websocket streaming TTS service
106-
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
102+
```bash
103+
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
107104
```
105+
108106
Usage:
109107

110108
```bash
@@ -122,7 +120,6 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
122120
- `sample_rate`: Sampling rate, choices: [0, 8000, 16000], the default is the same as the model. Default: 0
123121
- `output`: Output wave file path. Default: None, which means the audio is not saved locally.
124122
- `play`: Whether to play the audio while it is being synthesized. Default: False, which means no playback. **Audio playback requires the pyaudio library**.
125-
126123

127124
Output:
128125
```bash
@@ -165,8 +162,144 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
165162
[2022-04-24 21:11:16,802] [ INFO] - 音频时长:3.825 s
166163
[2022-04-24 21:11:16,802] [ INFO] - RTF: 0.7846773683635238
167164
[2022-04-24 21:11:16,837] [ INFO] - 音频保存至:./output.wav
165+
```
166+
167+
168+
### 4. Streaming speech synthesis server and client using websocket protocol
169+
#### 4.1 Server Usage
170+
- Command Line (Recommended)
171+
First, modify the configuration file `conf/tts_online_application.yaml` and **set `protocol` to `websocket`**.
172+
Start the service:
173+
```bash
174+
paddlespeech_server start --config_file ./conf/tts_online_application.yaml
175+
```
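
Switching the protocol can also be scripted instead of edited by hand. The sketch below rewrites the `protocol` value in the YAML text with a regular expression. It assumes `protocol` is a non-indented, single-line key; the function `set_protocol` and the sample text are hypothetical and are not taken from the actual configuration file.

```python
import re


def set_protocol(yaml_text: str, protocol: str) -> str:
    """Return yaml_text with the value of the top-level `protocol` key replaced."""
    if protocol not in ("http", "websocket"):
        raise ValueError(f"unsupported protocol: {protocol!r}")
    # Match a non-indented `protocol:` line and its value; a trailing comment is kept.
    pattern = re.compile(r"^(protocol:[ \t]*)([^\s#]+)", flags=re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}'{protocol}'", yaml_text)
    if count == 0:
        raise ValueError("no top-level `protocol` key found")
    return new_text


# Made-up sample text; the real file contains more keys.
sample = "protocol: 'http'  # http or websocket\nengine_list: ['tts_online']\n"
print(set_protocol(sample, "websocket"))
```

To apply it, read `conf/tts_online_application.yaml`, pass its text through `set_protocol`, and write the result back before starting the server.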
176+
177+
Usage:
178+
179+
```bash
180+
paddlespeech_server start --help
181+
```
182+
Arguments:
183+
- `config_file`: YAML file of the app. Default: ./conf/tts_online_application.yaml
184+
- `log_file`: log file. Default: ./log/paddlespeech.log
185+
186+
Output:
187+
```bash
188+
[2022-04-27 10:18:09,107] [ INFO] - The first response time of the 0 warm up: 1.1551103591918945 s
189+
[2022-04-27 10:18:09,219] [ INFO] - The first response time of the 1 warm up: 0.11204338073730469 s
190+
[2022-04-27 10:18:09,324] [ INFO] - The first response time of the 2 warm up: 0.1051797866821289 s
191+
[2022-04-27 10:18:09,325] [ INFO] - **********************************************************************
192+
INFO: Started server process [17600]
193+
[2022-04-27 10:18:09] [INFO] [server.py:75] Started server process [17600]
194+
INFO: Waiting for application startup.
195+
[2022-04-27 10:18:09] [INFO] [on.py:45] Waiting for application startup.
196+
INFO: Application startup complete.
197+
[2022-04-27 10:18:09] [INFO] [on.py:59] Application startup complete.
198+
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
199+
[2022-04-27 10:18:09] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
168200
169201
170202
```
171203

204+
- Python API
205+
```python
206+
from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
207+
208+
server_executor = ServerExecutor()
209+
server_executor(
210+
config_file="./conf/tts_online_application.yaml",
211+
log_file="./log/paddlespeech.log")
212+
```
213+
214+
Output:
215+
```bash
216+
[2022-04-27 10:20:16,660] [ INFO] - The first response time of the 0 warm up: 1.0945196151733398 s
217+
[2022-04-27 10:20:16,773] [ INFO] - The first response time of the 1 warm up: 0.11222052574157715 s
218+
[2022-04-27 10:20:16,878] [ INFO] - The first response time of the 2 warm up: 0.10494542121887207 s
219+
[2022-04-27 10:20:16,878] [ INFO] - **********************************************************************
220+
INFO: Started server process [23466]
221+
[2022-04-27 10:20:16] [INFO] [server.py:75] Started server process [23466]
222+
INFO: Waiting for application startup.
223+
[2022-04-27 10:20:16] [INFO] [on.py:45] Waiting for application startup.
224+
INFO: Application startup complete.
225+
[2022-04-27 10:20:16] [INFO] [on.py:59] Application startup complete.
226+
INFO: Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
227+
[2022-04-27 10:20:16] [INFO] [server.py:211] Uvicorn running on http://127.0.0.1:8092 (Press CTRL+C to quit)
228+
229+
```
230+
231+
#### 4.2 Streaming TTS client Usage
232+
- Command Line (Recommended)
233+
234+
Access websocket streaming TTS service:
235+
236+
```bash
237+
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol websocket --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
238+
```
239+
240+
Usage:
241+
242+
```bash
243+
paddlespeech_client tts_online --help
244+
```
245+
246+
Arguments:
247+
- `server_ip`: server ip. Default: 127.0.0.1
248+
- `port`: server port. Default: 8092
249+
- `protocol`: Service protocol, choices: [http, websocket], default: http.
250+
- `input` (required): Input text to synthesize.
251+
- `spk_id`: Speaker id for multi-speaker text to speech. Default: 0
252+
- `speed`: Audio speed, the value should be set between 0 and 3. Default: 1.0
253+
- `volume`: Audio volume, the value should be set between 0 and 3. Default: 1.0
254+
- `sample_rate`: Sampling rate, choices: [0, 8000, 16000], the default is the same as the model. Default: 0
255+
- `output`: Output wave file path. Default: None, which means the audio is not saved locally.
256+
- `play`: Whether to play the audio while it is being synthesized. Default: False, which means no playback. **Audio playback requires the pyaudio library**.
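
The documented choices and ranges can be checked before a request is sent. This is a minimal sketch and not part of `paddlespeech_client`: `check_tts_args` is a hypothetical helper, and reading "between 0 and 3" as `0 < value <= 3` is an assumption, since the text does not say whether the bounds are inclusive.

```python
ALLOWED_PROTOCOLS = ("http", "websocket")
ALLOWED_SAMPLE_RATES = (0, 8000, 16000)  # 0 means "same as the model"


def check_tts_args(protocol="http", speed=1.0, volume=1.0, sample_rate=0):
    """Return a list of problems; an empty list means the arguments look acceptable."""
    problems = []
    if protocol not in ALLOWED_PROTOCOLS:
        problems.append(f"protocol must be one of {ALLOWED_PROTOCOLS}, got {protocol!r}")
    # Assumption: "between 0 and 3" is read as 0 < value <= 3.
    for name, value in (("speed", speed), ("volume", volume)):
        if not 0 < value <= 3:
            problems.append(f"{name} must be between 0 and 3, got {value}")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        problems.append(f"sample_rate must be one of {ALLOWED_SAMPLE_RATES}, got {sample_rate}")
    return problems


print(check_tts_args())                                 # defaults from the list above
print(check_tts_args(protocol="websocket", speed=5.0))  # speed is out of range
```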
257+
258+
259+
Output:
260+
```bash
261+
[2022-04-27 10:21:04,262] [ INFO] - tts websocket client start
262+
[2022-04-27 10:21:04,496] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
263+
[2022-04-27 10:21:04,496] [ INFO] - 首包响应:0.2124948501586914 s
264+
[2022-04-27 10:21:07,483] [ INFO] - 尾包响应:3.199106454849243 s
265+
[2022-04-27 10:21:07,484] [ INFO] - 音频时长:3.825 s
266+
[2022-04-27 10:21:07,484] [ INFO] - RTF: 0.8363677006141812
267+
[2022-04-27 10:21:07,516] [ INFO] - 音频保存至:output.wav
268+
269+
```
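
In the client log, 首包响应 is the first-packet response time, 尾包响应 is the last-packet response time, and 音频时长 is the audio duration. The logged values are consistent with RTF being the last-packet response time divided by the audio duration. That reading is inferred from the numbers above, not from the client source, and the function name below is hypothetical.

```python
def real_time_factor(last_packet_response_s: float, audio_duration_s: float) -> float:
    """RTF as it appears to be computed in the log: synthesis time / audio duration."""
    if audio_duration_s <= 0:
        raise ValueError("audio duration must be positive")
    return last_packet_response_s / audio_duration_s


# Values copied from the websocket client log above.
rtf = real_time_factor(3.199106454849243, 3.825)
print(f"RTF: {rtf:.4f}")  # prints "RTF: 0.8364"
```

An RTF below 1 means the audio was synthesized faster than it takes to play back.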
270+
271+
- Python API
272+
```python
273+
from paddlespeech.server.bin.paddlespeech_client import TTSOnlineClientExecutor
274+
import json
275+
276+
executor = TTSOnlineClientExecutor()
277+
executor(
278+
input="您好,欢迎使用百度飞桨语音合成服务。",
279+
server_ip="127.0.0.1",
280+
port=8092,
281+
protocol="websocket",
282+
spk_id=0,
283+
speed=1.0,
284+
volume=1.0,
285+
sample_rate=0,
286+
output="./output.wav",
287+
play=False)
288+
289+
```
290+
291+
Output:
292+
```bash
293+
[2022-04-27 10:22:48,852] [ INFO] - tts websocket client start
294+
[2022-04-27 10:22:49,080] [ INFO] - 句子:您好,欢迎使用百度飞桨语音合成服务。
295+
[2022-04-27 10:22:49,080] [ INFO] - 首包响应:0.21017956733703613 s
296+
[2022-04-27 10:22:52,100] [ INFO] - 尾包响应:3.2304444313049316 s
297+
[2022-04-27 10:22:52,101] [ INFO] - 音频时长:3.825 s
298+
[2022-04-27 10:22:52,101] [ INFO] - RTF: 0.8445606356352762
299+
[2022-04-27 10:22:52,134] [ INFO] - 音频保存至:./output.wav
300+
301+
```
302+
303+
304+
172305
