PaddlePaddle
diff --git a/README.md b/README.md
Lines changed: 3 additions & 3 deletions
diff --git a/README_cn.md b/README_cn.md
Lines changed: 167 additions & 29 deletions
diff --git a/demos/README.md b/demos/README.md
Lines changed: 1 addition & 0 deletions
diff --git a/demos/README_cn.md b/demos/README_cn.md
Lines changed: 3 additions & 2 deletions
diff --git a/demos/custom_streaming_asr/setup_docker.sh b/demos/custom_streaming_asr/setup_docker.sh
old mode 100644
new mode 100755
diff --git a/demos/keyword_spotting/run.sh b/demos/keyword_spotting/run.sh
old mode 100644
new mode 100755
diff --git a/demos/speaker_verification/run.sh b/demos/speaker_verification/run.sh
old mode 100644
new mode 100755
diff --git a/demos/speech_recognition/run.sh b/demos/speech_recognition/run.sh
old mode 100644
new mode 100755
Lines changed: 17 additions & 1 deletion
diff --git a/demos/speech_server/README.md b/demos/speech_server/README.md
Lines changed: 4 additions & 1 deletion
diff --git a/demos/speech_server/README_cn.md b/demos/speech_server/README_cn.md
Lines changed: 9 additions & 2 deletions
@@ -25,7 +25,7 @@
   | <a href="#documents"> Documents </a>
   | <a href="#model-list"> Models List </a>
   | <a href="https://aistudio.baidu.com/aistudio/education/group/info/25130"> AIStudio Courses </a>
-  | <a href="https://arxiv.org/abs/2205.12007"> NAACL2022 Paper </a>
+  | <a href="https://arxiv.org/abs/2205.12007"> NAACL2022 Best Demo Award Paper </a>
   | <a href="https://gitee.com/paddlepaddle/PaddleSpeech"> Gitee </a>
 </h4>
 </div>
@@ -34,7 +34,7 @@
 
 **PaddleSpeech** is an open-source toolkit on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for a variety of critical tasks in speech and audio, with the state-of-art and influential models. 
 
-**PaddleSpeech** won the [NAACL2022 Best Demo Award](https://2022.naacl.org/blog/best-demo-award/).
+**PaddleSpeech** won the [NAACL2022 Best Demo Award](https://2022.naacl.org/blog/best-demo-award/). Please check out our paper on [arXiv](https://arxiv.org/abs/2205.12007).
 
 ##### Speech Recognition
 
@@ -179,7 +179,7 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
 
 ## Installation
 
-We strongly recommend our users to install PaddleSpeech in **Linux** with *python>=3.7*.
+We strongly recommend our users to install PaddleSpeech in **Linux** with *python>=3.7* and *paddlepaddle>=2.3.1*.
 Up to now, **Linux** supports CLI for the all our tasks, **Mac OSX** and **Windows** only supports PaddleSpeech CLI for Audio Classification, Speech-to-Text and Text-to-Speech. To install `PaddleSpeech`, please see [installation](./docs/source/install.md).
 
 
 
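A note on the version floor in the hunk above: the updated text asks for *python>=3.7* and *paddlepaddle>=2.3.1*. The sketch below shows one way to compare a dotted version string against such a minimum. The helper names `parse_version` and `meets_minimum` are hypothetical and are not part of PaddleSpeech or PaddlePaddle, and the parsing is deliberately simple: it keeps only the leading numeric components.

```python
import re
import sys


def parse_version(text):
    """Turn a dotted version string such as "2.3.1" into a tuple of ints.

    Parsing stops at the first component that does not start with a digit,
    so "2.3.1.post101" becomes (2, 3, 1).
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"[0-9]+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True when `installed` is at least `minimum` (hypothetical helper)."""
    return parse_version(installed) >= parse_version(minimum)


# The running interpreter against the python>=3.7 requirement.
current_python = "{}.{}".format(*sys.version_info[:2])
print(meets_minimum(current_python, "3.7"))

# A paddlepaddle version string against the 2.3.1 floor.
print(meets_minimum("2.3.1", "2.3.1"))  # True
```

In practice the `installed` argument could be the version string that `pip show paddlepaddle` reports.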
@@ -20,7 +20,8 @@
 </p>
 <div align="center">  
 <h4>
-    <a href="#快速开始"> Quick Start </a>
+  <a href="#安装"> Installation </a>
+  | <a href="#快速开始"> Quick Start </a>
   | <a href="#快速使用服务"> Quick Start Server </a>
   | <a href="#快速使用流式服务"> Quick Start Streaming Server </a>
   | <a href="#教程文档"> Documents </a>
@@ -36,8 +37,10 @@
 
 **PaddleSpeech** is an open-source speech model library built on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle). It supports development of a variety of key speech and audio tasks and includes many state-of-the-art and influential deep-learning models. Some typical application examples follow:
 
-**PaddleSpeech** won the [NAACL2022 Best Demo Award](https://2022.naacl.org/blog/best-demo-award/).
+**PaddleSpeech** won the [NAACL2022 Best Demo Award](https://2022.naacl.org/blog/best-demo-award/). See the paper on [arXiv](https://arxiv.org/abs/2205.12007).
 
+### Demos
+
 ##### Speech Recognition
 
 <div align = "center">
@@ -154,7 +157,7 @@
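 This project adopts an easy-to-use, efficient, flexible, and scalable implementation that aims to better support industrial applications and academic research. It covers training, inference, and testing modules as well as the deployment process. Its main features are:
 - 📦 **Ease of use**: low barrier to installation; get started quickly with the [CLI](#quick-start).
 - 🏆 **Aligned with SoTA**: provides fast, lightweight models that draw on cutting-edge techniques.
-- 🏆 **流式ASR和TTS系统** (streaming ASR and TTS systems): production-grade end-to-end streaming recognition and streaming synthesis systems.
+- 🏆 **流式 ASR 和 TTS 系统** (streaming ASR and TTS systems): production-grade end-to-end streaming recognition and streaming synthesis systems.
 - 💯 **Rule-based Chinese frontend**: our frontend includes text normalization and grapheme-to-phoneme conversion (G2P). In addition, we use custom linguistic rules to adapt to Chinese contexts.
 - **Support for mainstream industrial and academic features**:
   - 🛎️ Typical audio tasks: this toolkit provides implementations of audio tasks such as audio classification, speech translation, automatic speech recognition, text-to-speech, speech synthesis, speaker verification, and KWS.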
@@ -182,61 +185,195 @@
 <img src="https://user-images.githubusercontent.com/23690325/169763015-cbd8e28d-602c-4723-810d-dbc6da49441e.jpg"  width = "200"  />
 </div>
 
+<a name="安装"></a>
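 ## Installation
 
 We strongly recommend installing PaddleSpeech on **Linux** with *python* *3.7* or later.
-So far, **Linux** supports four features: audio classification, speech recognition, speech synthesis, and speech translation. **Mac OSX and Windows** do not yet support speech translation. For installation details, see the [installation guide](./docs/source/install_cn.md).
+
+### Dependencies
++ gcc >= 4.8.5
++ paddlepaddle >= 2.3.1
++ python >= 3.7
++ linux (recommended), mac, windows
+
+PaddleSpeech depends on paddlepaddle. To install it, see the [paddlepaddle website](https://www.paddlepaddle.org.cn/) and choose the build that matches your machine. The CPU version is shown here as an example; install another version if it suits your machine better.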
+
+```shell
+pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+```
+
+There are two quick ways to install PaddleSpeech: with pip, or by building from source (recommended).
+
+### Install with pip
+```shell
+pip install pytest-runner
+pip install paddlespeech
+```
+
+### Build from source
+```shell
+git clone https://github.com/PaddlePaddle/PaddleSpeech.git
+cd PaddleSpeech
+pip install pytest-runner
+pip install .
+```
+
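+For more on installation issues, such as conda environments, the system libraries that librosa depends on, gcc environment problems, and kaldi installation, see the [installation guide](docs/source/install_cn.md). If you run into problems during installation, you can leave a comment on [#2150](https://github.com/PaddlePaddle/PaddleSpeech/issues/2150) or search it for related issues.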
 
 <a name="快速开始"></a>
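 ## Quick Start
 
-After installation, developers can get started quickly from the command line. Change `--input` to try your own audio or text.
+After installation, developers can get started quickly from the command line or from Python. In command-line mode, change `--input` to try your own audio or text. Audio in 16k wav format is supported.
+
+You can also try it out quickly in `aistudio` 👉🏻[PaddleSpeech API Demo ](https://aistudio.baidu.com/aistudio/projectdetail/4281335?shared=1).
 
-**Audio Classification**
+Download the sample test audio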
 ```shell
-paddlespeech cls --input input.wav
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
 ```
-**Speaker Verification**
+
+### Speech Recognition
+<details><summary>&emsp;(click to expand) Open-source Chinese speech recognition</summary>
+
+Try it from the command line
+
 ```shell
-paddlespeech vector --task spk --input input_16k.wav
+paddlespeech asr --lang zh --input zh.wav
+```
+
+Predict with the Python API
+
+```python
+>>> from paddlespeech.cli.asr.infer import ASRExecutor
+>>> asr = ASRExecutor()
+>>> result = asr(audio_file="zh.wav")
+>>> print(result)
+我认为跑步最重要的就是给我带来了身体健康
 ```
-**Speech Recognition**
+</details>
+
+### Speech Synthesis
+
+<details><summary>&emsp;Open-source Chinese speech synthesis</summary>
+
+Outputs wav audio at a 24k sample rate
+
+
+Try it from the command line
+
 ```shell
-paddlespeech asr --lang zh --input input_16k.wav
+paddlespeech tts --input "你好，欢迎使用百度飞桨深度学习框架！" --output output.wav
+```
+
+Predict with the Python API
+
+```python
+>>> from paddlespeech.cli.tts.infer import TTSExecutor
+>>> tts = TTSExecutor()
+>>> tts(text="今天天气十分不错。", output="output.wav")
 ```
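-**Speech Translation** (English to Chinese)
+- A web demo for speech synthesis has been integrated into [Huggingface Spaces](https://huggingface.co/spaces). See: [TTS Demo](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS)
+
+</details>
+
+### Audio Classification
+
+<details><summary>&emsp;An open-domain audio classification tool for many scenarios</summary>
+
+An audio classification model based on the 527 classes of the AudioSet dataset
+
+Try it from the command line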
+
 ```shell
-paddlespeech st --input input_16k.wav
+paddlespeech cls --input zh.wav
 ```
-**Speech Synthesis**
+
+Predict with the Python API
+
+```python
+>>> from paddlespeech.cli.cls.infer import CLSExecutor
+>>> cls = CLSExecutor()
+>>> result = cls(audio_file="zh.wav")
+>>> print(result)
+Speech 0.9027186632156372
+```
+
+</details>
+
+### Speaker Embedding Extraction
+
+<details><summary>&emsp;Production-grade speaker embedding extraction tool</summary>
+
+Try it from the command line
+
 ```shell
-paddlespeech tts --input "你好，欢迎使用百度飞桨深度学习框架！" --output output.wav
+paddlespeech vector --task spk --input zh.wav
 ```
-- A web demo for speech synthesis has been integrated into [Huggingface Spaces](https://huggingface.co/spaces). See: [TTS Demo](https://huggingface.co/spaces/akhaliq/paddlespeech)
 
-**Text Post-processing**
- - Punctuation restoration
-   ```bash
-   paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
-   ```
+Predict with the Python API
 
-**Batch Processing**
+```python
+>>> from paddlespeech.cli.vector import VectorExecutor
+>>> vec = VectorExecutor()
+>>> result = vec(audio_file="zh.wav")
+>>> print(result) # 187-dimensional vector
+[ -0.19083306   9.474295   -14.122263    -2.0916545    0.04848729
+   4.9295826    1.4780062    0.3733844   10.695862     3.2697146
+  -4.48199     -0.6617882   -9.170393   -11.1568775   -1.2358263 ...]
 ```
-echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts
+
+</details>
+
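+### Punctuation Restoration
+
+<details><summary>&emsp;Restore punctuation in text with one command; can be used together with ASR models</summary>
+
+Try it from the command line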
+
+```shell
+paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
+```
+
+Predict with the Python API
+
+```python
+>>> from paddlespeech.cli.text.infer import TextExecutor
+>>> text_punc = TextExecutor()
+>>> result = text_punc(text="今天的天气真不错啊你下午有空吗我想约你一起去吃饭")
+今天的天气真不错啊！你下午有空吗？我想约你一起去吃饭。
 ```
 
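-**Shell Pipeline**
-ASR + Punc:
+</details>
+
+### Speech Translation
+
+<details><summary>&emsp;End-to-end English-to-Chinese speech translation tool</summary>
+
+Uses precompiled kaldi tools, so it can only be tried on Ubuntu
+
+Try it from the command line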
+
+```shell
+paddlespeech st --input en.wav
 ```
-paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
+
+Predict with the Python API
+
+```python
+>>> from paddlespeech.cli.st.infer import STExecutor
+>>> st = STExecutor()
+>>> result = st(audio_file="en.wav")
+['我 在 这栋 建筑 的 古老 门上 敲门 。']
 ```
 
-For more command-line commands, see [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos)
-> Note: if you need to train or fine-tune a model, see [Speech Recognition](./docs/source/asr/quick_start.md) and [Speech Synthesis](./docs/source/tts/quick_start.md).
+</details>
+
+
 
 <a name="快速使用服务"></a>
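 ## Quick Start Server
-After installation, developers can quickly use the services from the command line.
+After installation, developers can start three services (speech recognition, speech synthesis, and audio classification) with a single command.
 
 **Start the server**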
 ```shell
@@ -614,6 +751,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块：文本前端、声
 
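 The speech synthesis module was originally called [Parakeet](https://github.com/PaddlePaddle/Parakeet) and has now been merged into this repository. If you are interested in academic research on this task, see the [TTS research overview](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/docs/source/tts#overview). In addition, the [model introduction](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/tts/models_introduction.md) is a good guide to the speech synthesis pipeline.
 
+
 ## ⭐ Examples
 - **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): Use PaddleSpeech's speech synthesis module to generate the voice of a virtual human.**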
 
 
@@ -12,6 +12,7 @@ This directory contains many speech applications in multiple scenarios.
 * speech recognition - recognize text of an audio file 
 * speech server - Server for Speech Task, e.g. ASR,TTS,CLS
 * streaming asr server - receive audio stream from websocket, and recognize to transcript.
+* streaming tts server - receive text over http or websocket, and stream back the synthesized audio data.
 * speech translation - end to end speech translation  
 * story talker - book reader based on OCR and TTS  
 * style_fs2 - multi style control for FastSpeech2 model  
 
@@ -10,8 +10,9 @@
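 * Metaverse - 2D augmented reality based on speech synthesis.
 * Punctuation restoration - usually a text post-processing task for speech recognition; adds the appropriate punctuation to unpunctuated plain text.
 * Speech recognition - recognize the spoken text in an audio clip.
-* Speech server - offline speech services, including ASR, TTS, CLS, etc
-* Streaming speech recognition server - recognize the text in a streamed audio data input
+* Speech server - offline speech services, including ASR, TTS, CLS, etc.
+* Streaming speech recognition server - recognize the text in a streamed audio data input.
+* Streaming speech synthesis server - generate a synthesized audio data stream, in streaming fashion, from the text to synthesize.
 * Speech translation - recognize the language in the audio in real time and translate it into the target language at the same time.
 * Story talker - a talking storybook based on OCR and speech synthesis.
 * Personalized speech synthesis - personalized speech synthesis based on the FastSpeech2 model.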
 
@@ -1,10 +1,26 @@
 #!/bin/bash
 
-wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
+wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
 
 # asr
 paddlespeech asr --input ./zh.wav
 
 
 # asr + punc
 paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
+
+
+# asr help
+paddlespeech asr --help
+
+
+# english asr
+paddlespeech asr --lang en --model transformer_librispeech --input ./en.wav
+
+# model stats
+paddlespeech stats --task asr
+
+
+# paddlespeech help
+paddlespeech --help
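The quick-start text in this PR states that the CLI supports 16k wav audio, and the script above feeds `zh.wav` and `en.wav` to `paddlespeech asr`. The following is a minimal sketch of how one might check a file before passing it to `--input`, using only Python's standard-library `wave` module. It assumes "16k" means a 16 000 Hz sample rate; the helper names `wav_sample_rate` and `is_16k_wav` are hypothetical and not part of PaddleSpeech.

```python
import wave

EXPECTED_RATE_HZ = 16000  # assumption: "16k" in the README means 16 000 Hz


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_16k_wav(path):
    """Report whether `path` is a WAV file sampled at 16 kHz."""
    try:
        return wav_sample_rate(path) == EXPECTED_RATE_HZ
    except (wave.Error, EOFError):
        # The standard-library reader could not parse this file as WAV.
        return False
```

After downloading the sample audio, `is_16k_wav("zh.wav")` would report whether that file matches the assumed rate. A missing file still raises `FileNotFoundError`, which this sketch leaves to the caller.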
@@ -14,7 +14,10 @@ For service interface definition, please check:
 see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
 
 It is recommended to use **paddlepaddle 2.3.1** or above.
-You can choose one way from meduim and hard to install paddlespeech.
+
+You can install paddlespeech in one of three ways: easy, medium, or hard.
+
+**If you install in easy mode, you need to prepare the yaml file yourself; you can refer to the yaml file in the conf directory.**
 
 ### 2. Prepare config File
 The configuration file can be found in `conf/application.yaml` .
 
@@ -3,8 +3,10 @@
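 # Speech Server
 
 ## Introduction
+
 This demo shows how to start an offline speech service and access it. It can be done with single `paddlespeech_server` and `paddlespeech_client` commands or with a few lines of python code.
 
+
 For the service interface definition, see: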
 - [PaddleSpeech Server RESTful API](https://github.com/PaddlePaddle/PaddleSpeech/wiki/PaddleSpeech-Server-RESTful-API)
 
@@ -13,12 +15,17 @@
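 See the [installation guide](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
 
 We recommend **paddlepaddle 2.3.1** or later.
-You can install PaddleSpeech in one of two ways: medium or hard.
+
+You can install PaddleSpeech in one of several ways: easy, medium, or hard.
+
+**If you install in easy mode, you need to prepare the yaml file yourself; you can refer to the yaml file in the conf directory.**
 
 ### 2. Prepare the config file
 The configuration file can be found in `conf/application.yaml`.
-In it, `engine_list`specifies the speech engines that the service to be started will include, in the format <speech task>_<engine type>.
+In it, `engine_list` specifies the speech engines that the service to be started will include, in the format <speech task>_<engine type>.
+
 The speech tasks currently integrated into the service are: asr (speech recognition), tts (speech synthesis), cls (audio classification), vector (speaker verification), and text (text processing).
+
 Two engine types are currently supported: python and inference (Paddle Inference)
 **Note:** if the service starts normally inside a container but the client cannot reach the ip, try changing the `host` address in the configuration file to the local ip address.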
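
The `engine_list` naming format described in this hunk (a speech task, an underscore, then an engine type) can be illustrated with a short sketch. It splits an entry such as `asr_python` into its two parts and rejects names outside the tasks and engine types listed in the README text. The function name `split_engine_name` and both constant tuples are hypothetical; the actual server may accept other values or parse entries differently.

```python
# Values taken from the README text in this hunk; the real server
# configuration may allow more (assumption).
KNOWN_TASKS = ("asr", "tts", "cls", "vector", "text")
KNOWN_ENGINE_TYPES = ("python", "inference")


def split_engine_name(name):
    """Split "<task>_<engine type>" into a (task, engine_type) pair.

    Raises ValueError when the name does not follow the documented format.
    """
    task, separator, engine_type = name.partition("_")
    if not separator:
        raise ValueError("expected <task>_<engine type>, got %r" % name)
    if task not in KNOWN_TASKS:
        raise ValueError("unknown speech task %r in %r" % (task, name))
    if engine_type not in KNOWN_ENGINE_TYPES:
        raise ValueError("unknown engine type %r in %r" % (engine_type, name))
    return task, engine_type


print(split_engine_name("asr_python"))     # ('asr', 'python')
print(split_engine_name("tts_inference"))  # ('tts', 'inference')
```

Splitting on the first underscore keeps the sketch simple, since none of the listed task names contains an underscore.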