Commit a36cfd3

Merge pull request #1818 from Honei/r1.0
[R1.0]update the paddlespeech_client asr_online cli
2 parents 7eb3ab0 + 0d2cc0c commit a36cfd3

5 files changed: +124 -39 lines changed

demos/streaming_asr_server/README.md

Lines changed: 10 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -31,7 +31,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
3131
- Command Line (Recommended)
3232

3333
```bash
34-
# start the service
34+
# in PaddleSpeech/demos/streaming_asr_server start the service
3535
paddlespeech_server start --config_file ./conf/ws_conformer_application.yaml
3636
```
3737

@@ -111,6 +111,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
111111
112112
- Python API
113113
```python
114+
# in PaddleSpeech/demos/streaming_asr_server directory
114115
from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
115116

116117
server_executor = ServerExecutor()
@@ -186,10 +187,11 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
186187
187188
188189
### 4. ASR Client Usage
190+
189191
**Note:** The response time will be slightly longer when using the client for the first time
190192
- Command Line (Recommended)
191193
```
192-
paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav --protocol websocket
194+
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
193195
```
194196
195197
Usage:
@@ -204,6 +206,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
204206
- `sample_rate`: Audio ampling rate, default: 16000.
205207
- `lang`: Language. Default: "zh_cn".
206208
- `audio_format`: Audio format. Default: "wav".
209+
- `punc.server_ip`: punctuation server ip. Default: None.
210+
- `punc.server_port`: punctuation server port. Default: None.
207211
208212
Output:
209213
```bash
@@ -275,18 +279,16 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
275279
276280
- Python API
277281
```python
278-
from paddlespeech.server.bin.paddlespeech_client import ASRClientExecutor
279-
import json
282+
from paddlespeech.server.bin.paddlespeech_client import ASROnlineClientExecutor
280283

281-
asrclient_executor = ASRClientExecutor()
284+
asrclient_executor = ASROnlineClientExecutor()
282285
res = asrclient_executor(
283286
input="./zh.wav",
284287
server_ip="127.0.0.1",
285288
port=8090,
286289
sample_rate=16000,
287290
lang="zh_cn",
288-
audio_format="wav",
289-
protocol="websocket")
291+
audio_format="wav")
290292
print(res)
291293
```
292294
@@ -353,5 +355,4 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
353355
[2022-04-21 15:59:08,016] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
354356
[2022-04-21 15:59:08,024] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
355357
[2022-04-21 15:59:12,883] [ INFO] - final receive msg={'status': 'ok', 'signal': 'finished', 'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
356-
[2022-04-21 15:59:12,884] [ INFO] - 我认为跑步最重要的就是给我带来了身体健康
357-
```
358+
```
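
For scripts that shell out to the client, the switch from `asr --protocol websocket` to `asr_online` shown in this README diff can be sketched as follows. `build_asr_online_cmd` is a hypothetical helper and not part of PaddleSpeech; it uses only the flags that appear in the README command above.

```python
from typing import List


def build_asr_online_cmd(input_path: str,
                         server_ip: str = "127.0.0.1",
                         port: int = 8090) -> List[str]:
    """Return the argv for the streaming ASR client (hypothetical helper).

    The `--protocol` flag is intentionally absent: the README diff drops it
    in favour of the dedicated `asr_online` subcommand.
    """
    return [
        "paddlespeech_client", "asr_online",
        "--server_ip", server_ip,
        "--port", str(port),
        "--input", input_path,
    ]


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run(...) to execute it.
    print(" ".join(build_asr_online_cmd("./zh.wav")))
```

Building the command as a list rather than a single string avoids shell quoting problems when the audio path contains spaces.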

demos/streaming_asr_server/README_cn.md

Lines changed: 21 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -5,19 +5,26 @@
55
## 介绍
66
这个demo是一个启动流式语音服务和访问服务的实现。 它可以通过使用`paddlespeech_server``paddlespeech_client`的单个命令或 python 的几行代码来实现。
77

8-
流式语音识别服务只支持 `weboscket` 协议,不支持 `http` 协议。
8+
**流式语音识别服务只支持 `weboscket` 协议,不支持 `http` 协议。**
99

1010
## 使用方法
1111
### 1. 安装
12-
请看 [安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
12+
安装 PaddleSpeech 的详细过程请看 [安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md)
1313

1414
推荐使用 **paddlepaddle 2.2.1** 或以上版本。
15-
你可以从medium,hard 二中方式中选择一种方式安装 PaddleSpeech。
15+
你可以从medium,hard 两种方式中选择一种方式安装 PaddleSpeech。
1616

1717

1818
### 2. 准备配置文件
19-
配置文件可参见 `conf/ws_application.yaml``conf/ws_conformer_application.yaml`
20-
目前服务集成的模型有: DeepSpeech2和conformer模型。
19+
20+
流式ASR的服务启动脚本和服务测试脚本存放在 `PaddleSpeech/demos/streaming_asr_server` 目录。
21+
下载好 `PaddleSpeech` 之后,进入到 `PaddleSpeech/demos/streaming_asr_server` 目录。
22+
配置文件可参见该目录下 `conf/ws_application.yaml``conf/ws_conformer_application.yaml`
23+
24+
目前服务集成的模型有: DeepSpeech2和 conformer模型,对应的配置文件如下:
25+
* DeepSpeech: `conf/ws_application.yaml`
26+
* conformer: `conf/ws_conformer_application.yaml`
27+
2128

2229

2330
这个 ASR client 的输入应该是一个 WAV 文件(`.wav`),并且采样率必须与模型的采样率相同。
@@ -31,7 +38,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
3138
- 命令行 (推荐使用)
3239

3340
```bash
34-
# 启动服务
41+
# 在 PaddleSpeech/demos/streaming_asr_server 目录启动服务
3542
paddlespeech_server start --config_file ./conf/ws_conformer_application.yaml
3643
```
3744

@@ -111,6 +118,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
111118
112119
- Python API
113120
```python
121+
# 在 PaddleSpeech/demos/streaming_asr_server 目录
114122
from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
115123

116124
server_executor = ServerExecutor()
@@ -185,11 +193,11 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
185193
```
186194
187195
### 4. ASR 客户端使用方法
196+
188197
**注意:** 初次使用客户端时响应时间会略长
189198
- 命令行 (推荐使用)
190199
```
191-
paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav --protocol websocket
192-
200+
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
193201
```
194202
195203
使用帮助:
@@ -205,6 +213,8 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
205213
- `sample_rate`: 音频采样率,默认值:16000。
206214
- `lang`: 模型语言,默认值:zh_cn。
207215
- `audio_format`: 音频格式,默认值:wav。
216+
- `punc.server_ip` 标点预测服务的ip。默认是None。
217+
- `punc.server_port` 标点预测服务的端口port。默认是None。
208218
209219
输出:
210220
@@ -276,18 +286,16 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
276286
277287
- Python API
278288
```python
279-
from paddlespeech.server.bin.paddlespeech_client import ASRClientExecutor
280-
import json
289+
from paddlespeech.server.bin.paddlespeech_client import ASROnlineClientExecutor
281290

282-
asrclient_executor = ASRClientExecutor()
291+
asrclient_executor = ASROnlineClientExecutor()
283292
res = asrclient_executor(
284293
input="./zh.wav",
285294
server_ip="127.0.0.1",
286295
port=8090,
287296
sample_rate=16000,
288297
lang="zh_cn",
289-
audio_format="wav",
290-
protocol="websocket")
298+
audio_format="wav")
291299
print(res)
292300
```
293301
@@ -354,5 +362,4 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
354362
[2022-04-21 15:59:08,016] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
355363
[2022-04-21 15:59:08,024] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
356364
[2022-04-21 15:59:12,883] [ INFO] - final receive msg={'status': 'ok', 'signal': 'finished', 'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
357-
[2022-04-21 15:59:12,884] [ INFO] - 我认为跑步最重要的就是给我带来了身体健康
358365
```

examples/voxceleb/sv0/README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -146,6 +146,6 @@ tar -xvf sv0_ecapa_tdnn_voxceleb12_ckpt_0_2_0.tar.gz
146146
source path.sh
147147
# If you have processed the data and get the manifest file, you can skip the following 2 steps
148148
149-
CUDA_VISIBLE_DEVICES= ./local/test.sh ./data sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_2 conf/ecapa_tdnn.yaml
149+
CUDA_VISIBLE_DEVICES= bash ./local/test.sh ./data sv0_ecapa_tdnn_voxceleb12_ckpt_0_1_2/model/ conf/ecapa_tdnn.yaml
150150
```
151151
The performance of the released models are shown in [this](./RESULTS.md)

examples/voxceleb/sv0/local/test.sh

Lines changed: 17 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -33,10 +33,26 @@ dir=$1
3333
exp_dir=$2
3434
conf_path=$3
3535

36+
# get the gpu nums for training
37+
ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
38+
echo "using $ngpu gpus..."
39+
40+
# setting training device
41+
device="cpu"
42+
if ${use_gpu}; then
43+
device="gpu"
44+
fi
45+
if [ $ngpu -le 0 ]; then
46+
echo "no gpu, training in cpu mode"
47+
device='cpu'
48+
use_gpu=false
49+
fi
50+
3651
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
3752
# test the model and compute the eer metrics
3853
python3 ${BIN_DIR}/test.py \
3954
--data-dir ${dir} \
4055
--load-checkpoint ${exp_dir} \
41-
--config ${conf_path}
56+
--config ${conf_path} \
57+
--device ${device}
4258
fi
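
The device selection added to `test.sh` can be mirrored in Python to make its edge cases explicit. This is a minimal sketch, not code from the repository: `count_visible_gpus` and `select_device` are hypothetical names, and the logic assumes the same rules as the shell hunk above (count comma-separated entries of `CUDA_VISIBLE_DEVICES`, fall back to CPU when the count is zero).

```python
import os
from typing import Optional


def count_visible_gpus(cuda_visible_devices: Optional[str]) -> int:
    """Count comma-separated entries, like `awk -F "," '{print NF}'`.

    An unset or empty variable yields 0, which matches awk's NF on an
    empty line. A plain str.split(",") would return 1 for "".
    """
    value = (cuda_visible_devices or "").strip()
    if not value:
        return 0
    return len(value.split(","))


def select_device(use_gpu: bool, ngpu: int) -> str:
    """Return "gpu" only when GPU use is requested and a device is visible."""
    if use_gpu and ngpu > 0:
        return "gpu"
    return "cpu"


if __name__ == "__main__":
    ngpu = count_visible_gpus(os.environ.get("CUDA_VISIBLE_DEVICES"))
    print(f"using {ngpu} gpus, device={select_device(True, ngpu)}")
```

With `CUDA_VISIBLE_DEVICES=` set to the empty string, as in the README command for this example, the count is 0 and the device falls back to `cpu`.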

paddlespeech/server/bin/paddlespeech_client.py

Lines changed: 75 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -35,7 +35,7 @@
3535

3636
__all__ = [
3737
'TTSClientExecutor', 'TTSOnlineClientExecutor', 'ASRClientExecutor',
38-
'CLSClientExecutor'
38+
'ASROnlineClientExecutor', 'CLSClientExecutor'
3939
]
4040

4141

@@ -370,25 +370,15 @@ def __call__(self,
370370
str: The ASR results
371371
"""
372372
# we use the asr server to recognize the audio text content
373+
# and paddlespeech_client asr only support http protocol
374+
protocol = "http"
373375
if protocol.lower() == "http":
374376
from paddlespeech.server.utils.audio_handler import ASRHttpHandler
375377
logger.info("asr http client start")
376378
handler = ASRHttpHandler(server_ip=server_ip, port=port)
377379
res = handler.run(input, audio_format, sample_rate, lang)
378380
res = res['result']['transcription']
379381
logger.info("asr http client finished")
380-
381-
elif protocol.lower() == "websocket":
382-
logger.info("asr websocket client start")
383-
handler = ASRWsAudioHandler(
384-
server_ip,
385-
port,
386-
punc_server_ip=punc_server_ip,
387-
punc_server_port=punc_server_port)
388-
loop = asyncio.get_event_loop()
389-
res = loop.run_until_complete(handler.run(input))
390-
res = res['result']
391-
logger.info("asr websocket client finished")
392382
else:
393383
logger.error(f"Sorry, we have not support protocol: {protocol},"
394384
"please use http or websocket protocol")
@@ -397,6 +387,77 @@ def __call__(self,
397387
return res
398388

399389

390+
@cli_client_register(
391+
name='paddlespeech_client.asr_online',
392+
description='visit asr online service')
393+
class ASROnlineClientExecutor(BaseExecutor):
394+
def __init__(self):
395+
super(ASROnlineClientExecutor, self).__init__()
396+
self.parser = argparse.ArgumentParser(
397+
prog='paddlespeech_client.asr_online', add_help=True)
398+
self.parser.add_argument(
399+
'--server_ip', type=str, default='127.0.0.1', help='server ip')
400+
self.parser.add_argument(
401+
'--port', type=int, default=8091, help='server port')
402+
self.parser.add_argument(
403+
'--input',
404+
type=str,
405+
default=None,
406+
help='Audio file to be recognized',
407+
required=True)
408+
self.parser.add_argument(
409+
'--sample_rate', type=int, default=16000, help='audio sample rate')
410+
self.parser.add_argument(
411+
'--lang', type=str, default="zh_cn", help='language')
412+
self.parser.add_argument(
413+
'--audio_format', type=str, default="wav", help='audio format')
414+
415+
def execute(self, argv: List[str]) -> bool:
416+
args = self.parser.parse_args(argv)
417+
input_ = args.input
418+
server_ip = args.server_ip
419+
port = args.port
420+
sample_rate = args.sample_rate
421+
lang = args.lang
422+
audio_format = args.audio_format
423+
try:
424+
time_start = time.time()
425+
res = self(
426+
input=input_,
427+
server_ip=server_ip,
428+
port=port,
429+
sample_rate=sample_rate,
430+
lang=lang,
431+
audio_format=audio_format)
432+
time_end = time.time()
433+
logger.info(res)
434+
logger.info("Response time %f s." % (time_end - time_start))
435+
return True
436+
except Exception as e:
437+
logger.error("Failed to speech recognition.")
438+
logger.error(e)
439+
return False
440+
441+
@stats_wrapper
442+
def __call__(self,
443+
input: str,
444+
server_ip: str="127.0.0.1",
445+
port: int=8091,
446+
sample_rate: int=16000,
447+
lang: str="zh_cn",
448+
audio_format: str="wav"):
449+
"""
450+
Python API to call an executor.
451+
"""
452+
logger.info("asr websocket client start")
453+
handler = ASRWsAudioHandler(server_ip, port)
454+
loop = asyncio.get_event_loop()
455+
res = loop.run_until_complete(handler.run(input))
456+
logger.info("asr websocket client finished")
457+
458+
return res['result']
459+
460+
400461
@cli_client_register(
401462
name='paddlespeech_client.cls', description='visit cls service')
402463
class CLSClientExecutor(BaseExecutor):
@@ -521,4 +582,4 @@ def __call__(self, input: str, server_ip: str="127.0.0.1", port: int=8090):
521582
res = requests.post(url=url, data=json.dumps(request))
522583
response_dict = res.json()
523584
punc_text = response_dict["result"]["punc_text"]
524-
return punc_text
585+
return punc_text
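
The shape of the `ASROnlineClientExecutor` added in this file (an argparse front end that drives an async WebSocket handler to completion) can be tried without a running server using the sketch below. The option names and defaults are copied from the diff. `FakeWsHandler`, its `ws://` URL string, and `recognize` are hypothetical stand-ins for illustration, and `asyncio.run` is swapped in for the `get_event_loop().run_until_complete(...)` call used in the diff.

```python
import argparse
import asyncio
from typing import Dict, List


def build_parser() -> argparse.ArgumentParser:
    """Options and defaults copied from the asr_online parser in the diff."""
    parser = argparse.ArgumentParser(prog="paddlespeech_client.asr_online")
    parser.add_argument("--server_ip", type=str, default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8091)
    parser.add_argument("--input", type=str, required=True)
    parser.add_argument("--sample_rate", type=int, default=16000)
    parser.add_argument("--lang", type=str, default="zh_cn")
    parser.add_argument("--audio_format", type=str, default="wav")
    return parser


class FakeWsHandler:
    """Hypothetical stand-in for ASRWsAudioHandler; sends no network traffic."""

    def __init__(self, server_ip: str, port: int) -> None:
        # Illustrative address only; the real endpoint path is not shown here.
        self.url = f"ws://{server_ip}:{port}"

    async def run(self, wav_path: str) -> Dict[str, str]:
        await asyncio.sleep(0)  # yield to the event loop once
        return {"result": f"transcript of {wav_path} via {self.url}"}


def recognize(argv: List[str]) -> str:
    """Parse CLI-style arguments and run the async handler to completion."""
    args = build_parser().parse_args(argv)
    handler = FakeWsHandler(args.server_ip, args.port)
    # asyncio.run creates a fresh event loop and closes it afterwards.
    res = asyncio.run(handler.run(args.input))
    return res["result"]


if __name__ == "__main__":
    print(recognize(["--input", "./zh.wav", "--port", "8090"]))
```

Note that the parser default port is 8091, while the README examples pass `--port 8090` explicitly.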
