PaddlePaddle
diff --git a/demos/speech_server/README.md b/demos/speech_server/README.md
Lines changed: 29 additions & 29 deletions
diff --git a/demos/speech_server/README_cn.md b/demos/speech_server/README_cn.md
Lines changed: 31 additions & 40 deletions
diff --git a/demos/streaming_asr_server/README.md b/demos/streaming_asr_server/README.md
Lines changed: 146 additions & 150 deletions
diff --git a/demos/streaming_asr_server/README_cn.md b/demos/streaming_asr_server/README_cn.md
Lines changed: 175 additions & 184 deletions
diff --git a/demos/streaming_tts_server/README.md b/demos/streaming_tts_server/README.md
Lines changed: 41 additions & 56 deletions
diff --git a/demos/streaming_tts_server/README_cn.md b/demos/streaming_tts_server/README_cn.md
Lines changed: 35 additions & 40 deletions
@@ -9,7 +9,6 @@ For service interface definition, please check:
 - [PaddleSpeech Server RESTful API](https://github.com/PaddlePaddle/PaddleSpeech/wiki/PaddleSpeech-Server-RESTful-API)
 - [PaddleSpeech Streaming Server WebSocket API](https://github.com/PaddlePaddle/PaddleSpeech/wiki/PaddleSpeech-Server-WebSocket-API)
 
-
 ## Usage
 ### 1. Installation
 See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
@@ -34,11 +33,10 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
     - Both hifigan and mb_melgan support streaming voc inference.
     - With mb_melgan as the voc model, voc_pad=14 makes the streaming synthesized audio consistent with the non-streaming synthesized audio. voc_pad can be lowered to 7 with no audible abnormality; below 7, the synthesized audio sounds abnormal.
     - With hifigan as the voc model, voc_pad=19 makes the streaming synthesized audio consistent with the non-streaming synthesized audio; with voc_pad=14, the synthesized audio has no audible abnormality.
+    - Pad calculation method of streaming vocoder in PaddleSpeech: [AIStudio tutorial](https://aistudio.baidu.com/aistudio/projectdetail/4151335)
 - Inference speed: mb_melgan > hifigan; Audio quality: mb_melgan < hifigan
 - **Note:** If the service starts normally inside a container but clients cannot reach its IP, try replacing the `host` address in the configuration file with the local IP address.
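
The voc_pad notes above can be captured in a small lookup, which may help when reviewing a configuration. This is a minimal sketch: the numbers are copied from the notes, while the table layout and the `describe_voc_pad` helper are hypothetical and not part of PaddleSpeech. Values the notes do not mention are reported as not covered rather than guessed.

```python
# Values copied from the voc_pad notes above. The table layout and the helper
# name are illustrative assumptions, not PaddleSpeech code.
VOC_PAD_NOTES = {
    # voc model: (pad consistent with non-streaming audio,
    #             smallest pad reported to sound normal)
    "mb_melgan": (14, 7),
    "hifigan": (19, 14),
}


def describe_voc_pad(voc: str, voc_pad: int) -> str:
    """Summarize what the notes say about a voc_pad value for a voc model."""
    consistent_pad, normal_pad = VOC_PAD_NOTES[voc]
    if voc_pad == consistent_pad:
        return "consistent with non-streaming audio"
    if voc == "mb_melgan":
        # The notes give 7 as the minimum for mb_melgan and describe
        # smaller values as sounding abnormal.
        if voc_pad < normal_pad:
            return "audio sounds abnormal"
        if voc_pad < consistent_pad:
            return "no audible abnormality"
    elif voc_pad == normal_pad:
        # For hifigan the notes only mention 14 and 19 explicitly.
        return "no audible abnormality"
    return "not covered by the notes"
```

For example, `describe_voc_pad("hifigan", 10)` returns `"not covered by the notes"`, because the notes say nothing about hifigan below 14.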
 
-
-
 ### 3. Streaming speech synthesis server and client using http protocol
 #### 3.1 Server Usage
 - Command Line (Recommended)
@@ -58,7 +56,7 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
   - `log_file`: log file. Default: ./log/paddlespeech.log
 
   Output:
-  ```bash
+  ```text
   [2022-04-24 20:05:27,887] [    INFO] - The first response time of the 0 warm up: 1.0123658180236816 s
   [2022-04-24 20:05:28,038] [    INFO] - The first response time of the 1 warm up: 0.15108466148376465 s
   [2022-04-24 20:05:28,191] [    INFO] - The first response time of the 2 warm up: 0.15317344665527344 s
@@ -84,8 +82,8 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
       log_file="./log/paddlespeech.log")
   ```
 
- Output:
-  ```bash
+  Output:
+  ```text
   [2022-04-24 21:00:16,934] [    INFO] - The first response time of the 0 warm up: 1.268730878829956 s
   [2022-04-24 21:00:17,046] [    INFO] - The first response time of the 1 warm up: 0.11168622970581055 s
   [2022-04-24 21:00:17,151] [    INFO] - The first response time of the 2 warm up: 0.10413002967834473 s
@@ -98,8 +96,6 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
   [2022-04-24 21:00:17] [INFO] [on.py:59] Application startup complete.
   INFO:     Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
   [2022-04-24 21:00:17] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
-
-
   ```
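
Once the log shows Uvicorn running, a plain TCP check can confirm that the port is reachable from the client machine, which is useful when diagnosing the unreachable-IP case mentioned in the configuration notes. This is a minimal sketch using only the Python standard library; the `is_reachable` helper is hypothetical, and port 8092 is taken from the sample log rather than from any guaranteed default.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call (assumes the server from the log above is listening locally):
# is_reachable("127.0.0.1", 8092)
```

A successful TCP connection only shows that something is listening on the port; it does not verify that the TTS service itself responds correctly.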
 
 #### 3.2 Streaming TTS client Usage
@@ -130,7 +126,7 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
     - The code currently supports only single-speaker models, so `spk_id` has no effect. Streaming TTS does not support changing the sample rate, speed, or volume.
 
     Output:
-    ```bash
+    ```text
     [2022-04-24 21:08:18,559] [    INFO] - tts http client start
     [2022-04-24 21:08:21,702] [    INFO] - 句子：您好，欢迎使用百度飞桨语音合成服务。
     [2022-04-24 21:08:21,703] [    INFO] - 首包响应：0.18863153457641602 s
@@ -159,7 +155,7 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
   ```
 
   Output:
-  ```bash
+  ```text
   [2022-04-24 21:11:13,798] [    INFO] - tts http client start
   [2022-04-24 21:11:16,800] [    INFO] - 句子：您好，欢迎使用百度飞桨语音合成服务。
   [2022-04-24 21:11:16,801] [    INFO] - 首包响应：0.18234872817993164 s
@@ -169,7 +165,6 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
   [2022-04-24 21:11:16,837] [    INFO] - 音频保存至：./output.wav
   ```
 
- 
 ### 4. Streaming speech synthesis server and client using websocket protocol
 #### 4.1 Server Usage
 - Command Line (Recommended)
@@ -189,21 +184,19 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
   - `log_file`: log file. Default: ./log/paddlespeech.log
 
   Output:
-  ```bash
-    [2022-04-27 10:18:09,107] [    INFO] - The first response time of the 0 warm up: 1.1551103591918945 s
-    [2022-04-27 10:18:09,219] [    INFO] - The first response time of the 1 warm up: 0.11204338073730469 s
-    [2022-04-27 10:18:09,324] [    INFO] - The first response time of the 2 warm up: 0.1051797866821289 s
-    [2022-04-27 10:18:09,325] [    INFO] - **********************************************************************
-    INFO:     Started server process [17600]
-    [2022-04-27 10:18:09] [INFO] [server.py:75] Started server process [17600]
-    INFO:     Waiting for application startup.
-    [2022-04-27 10:18:09] [INFO] [on.py:45] Waiting for application startup.
-    INFO:     Application startup complete.
-    [2022-04-27 10:18:09] [INFO] [on.py:59] Application startup complete.
-    INFO:     Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
-    [2022-04-27 10:18:09] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
-
-
+  ```text
+  [2022-04-27 10:18:09,107] [    INFO] - The first response time of the 0 warm up: 1.1551103591918945 s
+  [2022-04-27 10:18:09,219] [    INFO] - The first response time of the 1 warm up: 0.11204338073730469 s
+  [2022-04-27 10:18:09,324] [    INFO] - The first response time of the 2 warm up: 0.1051797866821289 s
+  [2022-04-27 10:18:09,325] [    INFO] - **********************************************************************
+  INFO:     Started server process [17600]
+  [2022-04-27 10:18:09] [INFO] [server.py:75] Started server process [17600]
+  INFO:     Waiting for application startup.
+  [2022-04-27 10:18:09] [INFO] [on.py:45] Waiting for application startup.
+  INFO:     Application startup complete.
+  [2022-04-27 10:18:09] [INFO] [on.py:59] Application startup complete.
+  INFO:     Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
+  [2022-04-27 10:18:09] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
   ```
 
 - Python API
@@ -217,20 +210,19 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
   ```
 
   Output:
-  ```bash
-    [2022-04-27 10:20:16,660] [    INFO] - The first response time of the 0 warm up: 1.0945196151733398 s
-    [2022-04-27 10:20:16,773] [    INFO] - The first response time of the 1 warm up: 0.11222052574157715 s
-    [2022-04-27 10:20:16,878] [    INFO] - The first response time of the 2 warm up: 0.10494542121887207 s
-    [2022-04-27 10:20:16,878] [    INFO] - **********************************************************************
-    INFO:     Started server process [23466]
-    [2022-04-27 10:20:16] [INFO] [server.py:75] Started server process [23466]
-    INFO:     Waiting for application startup.
-    [2022-04-27 10:20:16] [INFO] [on.py:45] Waiting for application startup.
-    INFO:     Application startup complete.
-    [2022-04-27 10:20:16] [INFO] [on.py:59] Application startup complete.
-    INFO:     Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
-    [2022-04-27 10:20:16] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
-
+  ```text
+  [2022-04-27 10:20:16,660] [    INFO] - The first response time of the 0 warm up: 1.0945196151733398 s
+  [2022-04-27 10:20:16,773] [    INFO] - The first response time of the 1 warm up: 0.11222052574157715 s
+  [2022-04-27 10:20:16,878] [    INFO] - The first response time of the 2 warm up: 0.10494542121887207 s
+  [2022-04-27 10:20:16,878] [    INFO] - **********************************************************************
+  INFO:     Started server process [23466]
+  [2022-04-27 10:20:16] [INFO] [server.py:75] Started server process [23466]
+  INFO:     Waiting for application startup.
+  [2022-04-27 10:20:16] [INFO] [on.py:45] Waiting for application startup.
+  INFO:     Application startup complete.
+  [2022-04-27 10:20:16] [INFO] [on.py:59] Application startup complete.
+  INFO:     Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
+  [2022-04-27 10:20:16] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
   ```
 
 #### 4.2 Streaming TTS client Usage
@@ -263,15 +255,14 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
 
 
     Output:
-    ```bash
+    ```text
     [2022-04-27 10:21:04,262] [    INFO] - tts websocket client start
     [2022-04-27 10:21:04,496] [    INFO] - 句子：您好，欢迎使用百度飞桨语音合成服务。
     [2022-04-27 10:21:04,496] [    INFO] - 首包响应：0.2124948501586914 s
     [2022-04-27 10:21:07,483] [    INFO] - 尾包响应：3.199106454849243 s
     [2022-04-27 10:21:07,484] [    INFO] - 音频时长：3.825 s
     [2022-04-27 10:21:07,484] [    INFO] - RTF: 0.8363677006141812
     [2022-04-27 10:21:07,516] [    INFO] - 音频保存至：output.wav
-
     ```
 
 - Python API
@@ -288,21 +279,15 @@ The configuration file can be found in `conf/tts_online_application.yaml`.
       spk_id=0,
       output="./output.wav",
       play=False)
-
   ```
 
   Output:
-  ```bash
-    [2022-04-27 10:22:48,852] [    INFO] - tts websocket client start
-    [2022-04-27 10:22:49,080] [    INFO] - 句子：您好，欢迎使用百度飞桨语音合成服务。
-    [2022-04-27 10:22:49,080] [    INFO] - 首包响应：0.21017956733703613 s
-    [2022-04-27 10:22:52,100] [    INFO] - 尾包响应：3.2304444313049316 s
-    [2022-04-27 10:22:52,101] [    INFO] - 音频时长：3.825 s
-    [2022-04-27 10:22:52,101] [    INFO] - RTF: 0.8445606356352762
-    [2022-04-27 10:22:52,134] [    INFO] - 音频保存至：./output.wav
-
+  ```text
+  [2022-04-27 10:22:48,852] [    INFO] - tts websocket client start
+  [2022-04-27 10:22:49,080] [    INFO] - 句子：您好，欢迎使用百度飞桨语音合成服务。
+  [2022-04-27 10:22:49,080] [    INFO] - 首包响应：0.21017956733703613 s
+  [2022-04-27 10:22:52,100] [    INFO] - 尾包响应：3.2304444313049316 s
+  [2022-04-27 10:22:52,101] [    INFO] - 音频时长：3.825 s
+  [2022-04-27 10:22:52,101] [    INFO] - RTF: 0.8445606356352762
+  [2022-04-27 10:22:52,134] [    INFO] - 音频保存至：./output.wav
   ```
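
In the client logs above, the reported RTF matches the last-packet response time (`尾包响应`) divided by the audio duration (`音频时长`). The sketch below recomputes it from the log text. The relation is inferred from the logged numbers, not confirmed against the client source, and the helper names are hypothetical.

```python
import re


def parse_seconds(log_text: str, label: str) -> float:
    """Extract the value in seconds that follows a label in the client log."""
    # The client log uses a full-width colon; a half-width one is accepted too.
    match = re.search(re.escape(label) + r"[:：]\s*([0-9.]+)\s*s", log_text)
    if match is None:
        raise ValueError(f"label not found in log: {label}")
    return float(match.group(1))


def rtf_from_log(log_text: str) -> float:
    """Recompute RTF as last-packet response time divided by audio duration."""
    last_packet = parse_seconds(log_text, "尾包响应")
    duration = parse_seconds(log_text, "音频时长")
    return last_packet / duration


# Two lines copied from the sample output above.
sample_log = """
[2022-04-27 10:22:52,100] [    INFO] - 尾包响应：3.2304444313049316 s
[2022-04-27 10:22:52,101] [    INFO] - 音频时长：3.825 s
"""
rtf = rtf_from_log(sample_log)
```

An RTF below 1 means the audio was synthesized faster than its playback duration.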
-
-
-
-  
@@ -18,7 +18,6 @@
 
 **If you install in simple mode, you need to prepare the yaml file yourself; refer to the yaml files in the conf directory.**
 
-
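 ### 2. Prepare the configuration file
 The configuration file can be found in `conf/tts_online_application.yaml`.
 - `protocol` specifies the network protocol used by the streaming TTS service; **http and websocket** are currently supported.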
@@ -33,6 +32,7 @@
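     - Both hifigan and mb_melgan support streaming voc inference.
     - With mb_melgan as the voc model, voc_pad=14 makes the streaming synthesized audio consistent with the non-streaming synthesized audio. voc_pad can be lowered to 7 with no audible abnormality; below 7, the synthesized audio sounds abnormal.
     - With hifigan as the voc model, voc_pad=19 makes the streaming synthesized audio consistent with the non-streaming synthesized audio; with voc_pad=14, the synthesized audio has no audible abnormality.
+    - Pad calculation method of the streaming vocoder in PaddleSpeech: [AIStudio tutorial](https://aistudio.baidu.com/aistudio/projectdetail/4151335)
 - Inference speed: mb_melgan > hifigan; Audio quality: mb_melgan < hifigan
 - **Note:** If the service starts normally inside a container but clients cannot reach its IP, try replacing the `host` address in the configuration file with the local IP address.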
 
@@ -56,7 +56,7 @@
   - `log_file`: log file. Default: ./log/paddlespeech.log
 
   Output:
-  ```bash
+  ```text
   [2022-04-24 20:05:27,887] [    INFO] - The first response time of the 0 warm up: 1.0123658180236816 s
   [2022-04-24 20:05:28,038] [    INFO] - The first response time of the 1 warm up: 0.15108466148376465 s
   [2022-04-24 20:05:28,191] [    INFO] - The first response time of the 2 warm up: 0.15317344665527344 s
@@ -81,8 +81,8 @@
       log_file="./log/paddlespeech.log")
   ```
 
-  Output：
-  ```bash
+  Output:
+  ```text
   [2022-04-24 21:00:16,934] [    INFO] - The first response time of the 0 warm up: 1.268730878829956 s
   [2022-04-24 21:00:17,046] [    INFO] - The first response time of the 1 warm up: 0.11168622970581055 s
   [2022-04-24 21:00:17,151] [    INFO] - The first response time of the 2 warm up: 0.10413002967834473 s
@@ -126,7 +126,7 @@
 
 
     Output:
-    ```bash
+    ```text
     [2022-04-24 21:08:18,559] [    INFO] - tts http client start
     [2022-04-24 21:08:21,702] [    INFO] - 句子：您好，欢迎使用百度飞桨语音合成服务。
     [2022-04-24 21:08:21,703] [    INFO] - 首包响应：0.18863153457641602 s
@@ -184,20 +184,19 @@
   - `log_file`: log file. Default: ./log/paddlespeech.log
 
   Output:
-  ```bash
-    [2022-04-27 10:18:09,107] [    INFO] - The first response time of the 0 warm up: 1.1551103591918945 s
-    [2022-04-27 10:18:09,219] [    INFO] - The first response time of the 1 warm up: 0.11204338073730469 s
-    [2022-04-27 10:18:09,324] [    INFO] - The first response time of the 2 warm up: 0.1051797866821289 s
-    [2022-04-27 10:18:09,325] [    INFO] - **********************************************************************
-    INFO:     Started server process [17600]
-    [2022-04-27 10:18:09] [INFO] [server.py:75] Started server process [17600]
-    INFO:     Waiting for application startup.
-    [2022-04-27 10:18:09] [INFO] [on.py:45] Waiting for application startup.
-    INFO:     Application startup complete.
-    [2022-04-27 10:18:09] [INFO] [on.py:59] Application startup complete.
-    INFO:     Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
-    [2022-04-27 10:18:09] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
-
+  ```text
+  [2022-04-27 10:18:09,107] [    INFO] - The first response time of the 0 warm up: 1.1551103591918945 s
+  [2022-04-27 10:18:09,219] [    INFO] - The first response time of the 1 warm up: 0.11204338073730469 s
+  [2022-04-27 10:18:09,324] [    INFO] - The first response time of the 2 warm up: 0.1051797866821289 s
+  [2022-04-27 10:18:09,325] [    INFO] - **********************************************************************
+  INFO:     Started server process [17600]
+  [2022-04-27 10:18:09] [INFO] [server.py:75] Started server process [17600]
+  INFO:     Waiting for application startup.
+  [2022-04-27 10:18:09] [INFO] [on.py:45] Waiting for application startup.
+  INFO:     Application startup complete.
+  [2022-04-27 10:18:09] [INFO] [on.py:59] Application startup complete.
+  INFO:     Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
+  [2022-04-27 10:18:09] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
   ```
 
 - Python API
@@ -210,21 +209,20 @@
       log_file="./log/paddlespeech.log")
   ```
 
-  Output：
-  ```bash
-    [2022-04-27 10:20:16,660] [    INFO] - The first response time of the 0 warm up: 1.0945196151733398 s
-    [2022-04-27 10:20:16,773] [    INFO] - The first response time of the 1 warm up: 0.11222052574157715 s
-    [2022-04-27 10:20:16,878] [    INFO] - The first response time of the 2 warm up: 0.10494542121887207 s
-    [2022-04-27 10:20:16,878] [    INFO] - **********************************************************************
-    INFO:     Started server process [23466]
-    [2022-04-27 10:20:16] [INFO] [server.py:75] Started server process [23466]
-    INFO:     Waiting for application startup.
-    [2022-04-27 10:20:16] [INFO] [on.py:45] Waiting for application startup.
-    INFO:     Application startup complete.
-    [2022-04-27 10:20:16] [INFO] [on.py:59] Application startup complete.
-    INFO:     Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
-    [2022-04-27 10:20:16] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
-
+  Output:
+  ```text
+  [2022-04-27 10:20:16,660] [    INFO] - The first response time of the 0 warm up: 1.0945196151733398 s
+  [2022-04-27 10:20:16,773] [    INFO] - The first response time of the 1 warm up: 0.11222052574157715 s
+  [2022-04-27 10:20:16,878] [    INFO] - The first response time of the 2 warm up: 0.10494542121887207 s
+  [2022-04-27 10:20:16,878] [    INFO] - **********************************************************************
+  INFO:     Started server process [23466]
+  [2022-04-27 10:20:16] [INFO] [server.py:75] Started server process [23466]
+  INFO:     Waiting for application startup.
+  [2022-04-27 10:20:16] [INFO] [on.py:45] Waiting for application startup.
+  INFO:     Application startup complete.
+  [2022-04-27 10:20:16] [INFO] [on.py:59] Application startup complete.
+  INFO:     Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
+  [2022-04-27 10:20:16] [INFO] [server.py:211] Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
   ```
 
 #### 4.2 Client Usage
@@ -256,15 +254,14 @@
 
 
     Output:
-    ```bash
+    ```text
     [2022-04-27 10:21:04,262] [    INFO] - tts websocket client start
     [2022-04-27 10:21:04,496] [    INFO] - 句子：您好，欢迎使用百度飞桨语音合成服务。
     [2022-04-27 10:21:04,496] [    INFO] - 首包响应：0.2124948501586914 s
     [2022-04-27 10:21:07,483] [    INFO] - 尾包响应：3.199106454849243 s
     [2022-04-27 10:21:07,484] [    INFO] - 音频时长：3.825 s
     [2022-04-27 10:21:07,484] [    INFO] - RTF: 0.8363677006141812
     [2022-04-27 10:21:07,516] [    INFO] - 音频保存至：output.wav
-
     ```
 
 - Python API
@@ -281,17 +278,15 @@
       spk_id=0,
       output="./output.wav",
       play=False)
-
   ```
 
   Output:
-  ```bash
+  ```text
     [2022-04-27 10:22:48,852] [    INFO] - tts websocket client start
     [2022-04-27 10:22:49,080] [    INFO] - 句子：您好，欢迎使用百度飞桨语音合成服务。
     [2022-04-27 10:22:49,080] [    INFO] - 首包响应：0.21017956733703613 s
     [2022-04-27 10:22:52,100] [    INFO] - 尾包响应：3.2304444313049316 s
     [2022-04-27 10:22:52,101] [    INFO] - 音频时长：3.825 s
     [2022-04-27 10:22:52,101] [    INFO] - RTF: 0.8445606356352762
     [2022-04-27 10:22:52,134] [    INFO] - 音频保存至：./output.wav
-
-  ```
+  ```