Commit ba57969

heliqi and ZeyuChen authored
ERNIE 3.0 serving deploy update (#2146)
* optimized code
* optimized code
* optimized code2

Co-authored-by: Zeyu Chen <[email protected]>
1 parent 9f00755 commit ba57969

6 files changed: +118 -44 lines changed

model_zoo/ernie-3.0/deploy/serving/README.md

Lines changed: 96 additions & 26 deletions
@@ -1,16 +1,26 @@
-# ERNIE-3.0 Serving Deployment
+# Serving Deployment with Paddle Serving
+
+This document describes how to use [Paddle Serving](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md) to deploy pipeline online services for the ERNIE 3.0 news classification and named entity recognition models.
+
+## Contents
+- [Environment Setup](#environment-setup)
+- [Model Conversion](#model-conversion)
+- [Model Deployment](#model-deployment)

 ## Environment Setup
+Both the [PaddleNLP runtime environment]() and the Paddle Serving runtime environment are required.

-### Install Paddle Serving
+### Install Paddle Serving
 Install with the commands below. For more wheel packages, see the [Serving documentation](https://github.com/PaddlePaddle/Serving/blob/develop/doc/Latest_Packages_CN.md).
 ```
+# Install the client and serving app, used to send requests to the service
 pip install paddle_serving_app paddle_serving_client

+# Install serving, used to start the service
 # CPU server
 pip install paddle_serving_server

-# GPU server; check your environment before choosing which command to run:
+# GPU server; choose the command that matches your local environment:
 # CUDA10.2 + Cudnn7 + TensorRT6
 pip install paddle-serving-server-gpu==0.8.3.post102 -i https://pypi.tuna.tsinghua.edu.cn/simple
 # CUDA10.1 + TensorRT6
@@ -22,41 +32,67 @@ pip install paddle-serving-server-gpu==0.8.3.post112 -i https://pypi.tuna.tsingh
 The Tsinghua mirror is enabled by default to speed up downloads in China. If you use an HTTP proxy, you can turn it off by dropping (-i https://pypi.tuna.tsinghua.edu.cn/simple).


-### Install the Paddle Library
-For more Paddle download and installation options, see the [Paddle documentation](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/index_cn.html)
+### Install the FasterTokenizer Text Processing Library (Optional)
+If the deployment environment is Linux, installing faster_tokenizer is recommended: it speeds up text processing and further improves service performance. Windows is not supported yet; support is planned for the next release.
 ```
-# For a CPU environment, run
-pip3 install paddlepaddle
-
-# GPU CUDA environment (CUDA10.2 by default)
-pip3 install paddlepaddle-gpu
+pip install faster_tokenizer
 ```

-## Prepare the Model and Data
-Download the [Erine-3.0 model](TODO)

-### Convert the Model
-A deployment model downloaded from the link, or a static-graph inference model exported from training (containing `xx.pdmodel` and `xx.pdiparams`), must be converted to a serving model
+## Model Conversion
+
+When deploying with Paddle Serving, the saved inference model must first be converted into a format that Serving can deploy.
+
+Download the ERNIE 3.0 news classification and named entity recognition models:
+
+```bash
+# Download and unzip the news classification model
+wget https://paddlenlp.bj.bcebos.com/models/transformers/ernie_3.0/tnews_pruned_infer_model.zip
+unzip tnews_pruned_infer_model.zip
+# Download and unzip the named entity recognition model
+wget https://paddlenlp.bj.bcebos.com/models/transformers/ernie_3.0/msra_ner_pruned_infer_model.zip
+unzip msra_ner_pruned_infer_model.zip
 ```
+
+Use the installed paddle_serving_client to convert the inference models to the serving format.
+
+```bash
 # Fill in the model path according to your setup
-python -m paddle_serving_client.convert --dirname models/erinie-3.0 --model_filename infer.pdmodel --params_filename infer.pdiparams
+# Convert the news classification model
+python -m paddle_serving_client.convert --dirname tnews_pruned_infer_model --model_filename float32.pdmodel --params_filename float32.pdiparams
+
+# Convert the named entity recognition model
+python -m paddle_serving_client.convert --dirname msra_ner_pruned_infer_model --model_filename float32.pdmodel --params_filename float32.pdiparams

-# Use this instruction to see what each parameter means
+# Use this command to see what each parameter means
 python -m paddle_serving_client.convert --help
 ```
 After a successful conversion, the directory looks like this:
 ```
 serving_server
-├── infer.pdiparams
-├── infer.pdmodel
+├── float32.pdiparams
+├── float32.pdmodel
 ├── serving_server_conf.prototxt
 └── serving_server_conf.stream.prototxt
 ```

+## Model Deployment
+
+The serving directory contains the code for starting the pipeline services and sending prediction requests:
+
+```
+seq_cls_config.yml       # Server configuration file for the news classification task
+seq_cls_rpc_client.py    # Script that sends pipeline prediction requests for the news classification task
+seq_cls_service.py       # Script that starts the server for the news classification task
+
+token_cls_config.yml     # Server configuration file for the named entity recognition task
+token_cls_rpc_client.py  # Script that sends pipeline prediction requests for the named entity recognition task
+token_cls_service.py     # Script that starts the server for the named entity recognition task
+```
+

-## Deploy the Model as a Service
 ### Modify the Configuration File
-The `xx_config.yml` file in this directory explains every parameter; adjust the settings as needed. For example:
+The `seq_cls_config.yml` and `token_cls_config.yml` files in this directory explain every parameter; adjust the settings as needed. For example:
 ```
 # Change the model directory to the downloaded model directory or your own:
 model_config: no_task_emb/serving_server => model_config: erine-3.0-tiny/serving_server
@@ -70,10 +106,25 @@ device_type: 1 => device_type: 0

 ### Classification Task
 #### Start the Service
-After modifying the configuration file, run the following instruction to start the service:
+After modifying the configuration file, run the following command to start the service:
 ```
 python seq_cls_service.py
 ```
+The output looks like this:
+```
+[DAG] Succ init
+[PipelineServicer] succ init
+--- Running analysis [ir_graph_build_pass]
+......
+--- Running analysis [ir_graph_to_program_pass]
+I0515 05:36:48.316895 62364 analysis_predictor.cc:714] ======= optimize end =======
+I0515 05:36:48.320442 62364 naive_executor.cc:98] --- skip [feed], feed -> token_type_ids
+I0515 05:36:48.320463 62364 naive_executor.cc:98] --- skip [feed], feed -> input_ids
+I0515 05:36:48.321842 62364 naive_executor.cc:98] --- skip [linear_113.tmp_1], fetch -> fetch
+[2022-05-15 05:36:49,316] [ INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load 'ernie-3.0-medium-zh'.
+[2022-05-15 05:36:49,317] [ INFO] - Already cached /vdb1/home/heliqi/.paddlenlp/models/ernie-3.0-medium-zh/ernie_3.0_medium_zh_vocab.txt
+[OP Object] init success
+```

 #### Test with the Client
 Note: disable any proxy when sending client requests, and change the IP address in the init_client function to that of the machine running the service.
@@ -82,32 +133,51 @@ python seq_cls_rpc_client.py
 ```
 The output looks like this:
 ```
-{'label': array([6, 2]), 'confidence': array([4.9473147, 5.7493963], dtype=float32)}
-acc: 0.5745
+{'label': array([6, 2]), 'confidence': array([0.5543532, 0.9495907], dtype=float32)}acc: 0.5745
 ```

 ### Named Entity Recognition Task
 #### Start the Service
-After modifying the configuration file, run the following instruction to start the service:
+After modifying the configuration file, run the following command to start the service:
 ```
 python token_cls_service.py
 ```
+The output looks like this:
+```
+[DAG] Succ init
+[PipelineServicer] succ init
+--- Running analysis [ir_graph_build_pass]
+......
+--- Running analysis [ir_graph_to_program_pass]
+I0515 05:36:48.316895 62364 analysis_predictor.cc:714] ======= optimize end =======
+I0515 05:36:48.320442 62364 naive_executor.cc:98] --- skip [feed], feed -> token_type_ids
+I0515 05:36:48.320463 62364 naive_executor.cc:98] --- skip [feed], feed -> input_ids
+I0515 05:36:48.321842 62364 naive_executor.cc:98] --- skip [linear_113.tmp_1], fetch -> fetch
+[2022-05-15 05:36:49,316] [ INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load 'ernie-3.0-medium-zh'.
+[2022-05-15 05:36:49,317] [ INFO] - Already cached /vdb1/home/heliqi/.paddlenlp/models/ernie-3.0-medium-zh/ernie_3.0_medium_zh_vocab.txt
+[OP Object] init success
+```

 #### Test with the Client
 Note: disable any proxy when sending client requests, and change the IP address in the init_client function to that of the machine running the service.
 ```
-python seq_cls_rpc_client.py
+python token_cls_rpc_client.py
 ```
 The output looks like this:
 ```
-input data: 古老的文明,使我们引以为豪,彼此钦佩
+input data: 北京的涮肉,重庆的火锅,成都的小吃都是极具特色的美食
 The model detects all entities:
+entity: 北京 label: LOC pos: [0, 1]
+entity: 重庆 label: LOC pos: [6, 7]
+entity: 成都 label: LOC pos: [12, 13]
 -----------------------------
 input data: 原产玛雅故国的玉米,早已成为华夏大地主要粮食作物之一。
 The model detects all entities:
 entity: 玛雅 label: LOC pos: [2, 3]
 entity: 华夏 label: LOC pos: [14, 15]
 -----------------------------
+PipelineClient::predict pack_data time:1652593013.713769
+PipelineClient::predict before time:1652593013.7141528
 input data: ['从', '首', '都', '利', '隆', '圭', '乘', '车', '向', '湖', '边', '小', '镇', '萨', '利', '马', '进', '发', '时', ',', '不', '到', '1', '0', '0', '公', '里', '的', '道', '路', '上', '坑', '坑', '洼', '洼', ',', '又', '逢', '阵', '雨', '迷', '蒙', ',', '令', '人', '不', '时', '发', '出', '路', '难', '行', '的', '慨', '叹', '。']
 The model detects all entities:
 entity: 利隆圭 label: LOC pos: [3, 5]

model_zoo/ernie-3.0/deploy/serving/seq_cls_config.yml

Lines changed: 2 additions & 3 deletions
@@ -35,11 +35,10 @@ op:
 client_type: local_predictor

 # Model path
-model_config: /vdb1/home/heliqi/tnews_0421/original_fp32_serving_server
-# model_config: no_task_emb/serving_server
+model_config: serving_server

 # Fetch result list; must match the alias_name of fetch_var in client_config
-fetch_list: ["linear_75.tmp_1"]
+fetch_list: ["linear_113.tmp_1"]

 # device_type, 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
 device_type: 1
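
Because `fetch_list` has to match the `alias_name` of a `fetch_var` in the converted model's config, it may help to read that name programmatically instead of copying it by hand. The following is a minimal sketch, not part of this commit: the helper `fetch_alias_names` is hypothetical, and the `SAMPLE_CONF` text only illustrates the assumed layout of a `serving_server_conf.prototxt` file (field values other than the fetch name are made up).

```python
import re

# Illustrative config text. The layout is an assumption about what
# serving_server_conf.prototxt looks like; only the fetch name
# "linear_113.tmp_1" comes from the commit itself.
SAMPLE_CONF = '''
feed_var {
  name: "input_ids"
  alias_name: "input_ids"
  is_lod_tensor: false
  feed_type: 0
  shape: -1
}
fetch_var {
  name: "linear_113.tmp_1"
  alias_name: "linear_113.tmp_1"
  is_lod_tensor: false
  fetch_type: 1
  shape: 15
}
'''


def fetch_alias_names(conf_text):
    """Return the alias_name of every fetch_var block in the config text."""
    names = []
    # Blocks are not nested, so a non-greedy match up to the first "}" is enough.
    for block in re.findall(r"fetch_var\s*\{(.*?)\}", conf_text, flags=re.DOTALL):
        match = re.search(r'alias_name:\s*"([^"]+)"', block)
        if match:
            names.append(match.group(1))
    return names


print(fetch_alias_names(SAMPLE_CONF))
```

In practice the text would be read from the `serving_server_conf.prototxt` file produced by the conversion step, and the returned names used for `fetch_list` in the YAML config and `fetch_names` in the service script.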

model_zoo/ernie-3.0/deploy/serving/seq_cls_service.py

Lines changed: 8 additions & 3 deletions
@@ -25,6 +25,9 @@ class ErnieSeqClsOp(Op):
     def init_op(self):
         from paddlenlp.transformers import AutoTokenizer
         self.tokenizer = AutoTokenizer.from_pretrained("ernie-3.0-medium-zh")
+        # Output nodes may differ from model to model
+        # You can see the output node name in the conf.prototxt file of serving_server
+        self.fetch_names = ["linear_113.tmp_1", ]

     def preprocess(self, input_dicts, data_id, log_id):
         # convert input format
@@ -64,11 +67,13 @@ def postprocess(self, input_dicts, fetch_dict, data_id, log_id):
                           It is handled in the same way as exception.
             prod_errinfo: "" default
         """
-        result = fetch_dict["linear_75.tmp_1"]
-        # np.argpartition
+        result = fetch_dict[self.fetch_names[0]]
+        max_value = np.max(result, axis=1, keepdims=True)
+        exp_data = np.exp(result - max_value)
+        probs = exp_data / np.sum(exp_data, axis=1, keepdims=True)
         out_dict = {
             "label": result.argmax(axis=-1),
-            "confidence": result.max(axis=-1)
+            "confidence": probs.max(axis=-1)
         }
         return out_dict, None, ""
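
The new `postprocess` replaces the raw maximum logit with a softmax probability, which is why the README's sample confidence values changed from numbers like 4.94 to values in (0, 1]. The sketch below restates that computation in plain Python, without NumPy, so that it runs standalone; the function names `softmax` and `label_and_confidence` are illustrative and do not appear in the commit.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    # Subtracting the row maximum keeps exp() from overflowing on large logits.
    max_value = max(row)
    exp_data = [math.exp(x - max_value) for x in row]
    total = sum(exp_data)
    return [x / total for x in exp_data]


def label_and_confidence(logits):
    """Return the argmax label and its softmax probability for each row."""
    labels, confidences = [], []
    for row in logits:
        probs = softmax(row)
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(best)
        confidences.append(probs[best])
    return {"label": labels, "confidence": confidences}


# The second row would overflow a naive exp(); the shifted version does not.
batch = [[1.0, 2.0, 3.0], [1000.0, 1000.0, 1000.0]]
result = label_and_confidence(batch)
print(result["label"], result["confidence"])
```

Since softmax preserves the ordering of the logits, taking the label from the raw logits (as the commit does with `result.argmax`) and the confidence from the probabilities gives consistent results.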

model_zoo/ernie-3.0/deploy/serving/token_cls_config.yml

Lines changed: 2 additions & 3 deletions
@@ -36,11 +36,10 @@ op:
 client_type: local_predictor

 # Model path
-model_config: /vdb1/home/heliqi/ner_0506/original_model/serving_server
-# model_config: no_task_emb/serving_server
+model_config: serving_server

 # Fetch result list; must match the alias_name of fetch_var in client_config
-fetch_list: ["linear_75.tmp_1"]
+fetch_list: ["linear_113.tmp_1"]

 # device_type, 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
 device_type: 1

model_zoo/ernie-3.0/deploy/serving/token_cls_rpc_client.py

Lines changed: 1 addition & 2 deletions
@@ -97,7 +97,7 @@ def init_client():

 def test_demo(client):
     text1 = [
-        "古老的文明,使我们引以为豪,彼此钦佩。",
+        "北京的涮肉,重庆的火锅,成都的小吃都是极具特色的美食。",
         "原产玛雅故国的玉米,早已成为华夏大地主要粮食作物之一。",
     ]
     ret = client.predict(feed_dict={"tokens": text1})
@@ -118,4 +118,3 @@ def test_demo(client):
 if __name__ == "__main__":
     client = init_client()
     test_demo(client)
-    # test_ner_dataset(client)

model_zoo/ernie-3.0/deploy/serving/token_cls_service.py

Lines changed: 9 additions & 7 deletions
@@ -26,9 +26,13 @@ class ErnieTokenClsOp(Op):
     def init_op(self):
         from paddlenlp.transformers import AutoTokenizer
         self.tokenizer = AutoTokenizer.from_pretrained("ernie-3.0-medium-zh")
-        self.labele_names = [
+        # The label names of NER models trained by different data sets may be different
+        self.label_names = [
             'O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC'
         ]
+        # Output nodes may differ from model to model
+        # You can see the output node name in the conf.prototxt file of serving_server
+        self.fetch_names = ["linear_113.tmp_1", ]

     def get_input_data(self, input_dicts):
         (_, input_dict), = input_dicts.items()
@@ -69,7 +73,6 @@ def preprocess(self, input_dicts, data_id, log_id):
             is_split_into_words=is_split_into_words)

         input_ids = data["input_ids"]
-        # print("input shape:", len(input_ids), len(input_ids[0]))
         token_type_ids = data["token_type_ids"]
         return {
             "input_ids": np.array(
@@ -93,17 +96,16 @@ def postprocess(self, input_dicts, fetch_dict, data_id, log_id):
             prod_errinfo: "" default
         """
         input_data = self.get_input_data(input_dicts)
-        result = fetch_dict["linear_75.tmp_1"]
+        result = fetch_dict[self.fetch_names[0]]
         tokens_label = result.argmax(axis=-1).tolist()
         # Extract the entities for each sequence in the batch
         value = []
         for batch, token_label in enumerate(tokens_label):
-            # print("label:", token_label)
             start = -1
             label_name = ""
             items = []
             for i, label in enumerate(token_label):
-                if label == 0 and start >= 0:
+                if self.label_names[label] == "O" and start >= 0:
                     entity = input_data[batch][start:i - 1]
                     if isinstance(entity, list):
                         entity = "".join(entity)
@@ -113,9 +115,9 @@ def postprocess(self, input_dicts, fetch_dict, data_id, log_id):
                         "label": label_name,
                     })
                     start = -1
-                elif label in [1, 3, 5]:
+                elif "B-" in self.label_names[label]:
                     start = i - 1
-                    label_name = self.labele_names[label][2:]
+                    label_name = self.label_names[label][2:]
             if start >= 0:
                 items.append({
                     "pos": [start, len(token_label) - 1],
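
The `postprocess` change above swaps hard-coded label ids for lookups in `label_names`, so the span extraction keeps working when a model uses a different label order. The following is a simplified, standalone sketch of that BIO decoding idea, not the commit's code: `decode_entities` is a hypothetical helper, it assumes one label id per token with no special tokens (the real service offsets indices by one, presumably for the leading special token), and it closes an open entity on any non-`I-` label rather than only on `O`.

```python
LABEL_NAMES = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]


def decode_entities(tokens, label_ids, label_names=LABEL_NAMES):
    """Group BIO-tagged tokens into entities with inclusive [start, end] positions."""
    entities = []
    start = -1
    label_name = ""
    for i, label_id in enumerate(label_ids):
        name = label_names[label_id]
        if name.startswith("I-") and start >= 0:
            # Still inside the open entity (the type is not re-checked here).
            continue
        if start >= 0:
            # Any other label closes the open entity at the previous token.
            entities.append({
                "entity": "".join(tokens[start:i]),
                "label": label_name,
                "pos": [start, i - 1],
            })
            start = -1
        if name.startswith("B-"):
            start = i
            label_name = name[2:]
    if start >= 0:
        # The sequence ended while an entity was still open.
        entities.append({
            "entity": "".join(tokens[start:]),
            "label": label_name,
            "pos": [start, len(label_ids) - 1],
        })
    return entities


tokens = list("北京的涮肉,重庆的火锅")
label_ids = [5, 6, 0, 0, 0, 0, 5, 6, 0, 0, 0]
for item in decode_entities(tokens, label_ids):
    print("entity:", item["entity"], "label:", item["label"], "pos:", item["pos"])
```

Matching on the `B-` and `I-` prefixes, rather than on numeric ids, is what lets the same loop serve models trained on datasets with different label sets.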
