
Commit c985432

Add text classification serving example (#499)
* add serving scripts
* add serving deploy scripts
* rm deploy/serving/README.md
* update docs
* update docs
* update serving usage docs
1 parent 53f181d commit c985432

File tree

3 files changed: +318 additions, 0 deletions


examples/text_classification/pretrained_models/README.md

Lines changed: 97 additions & 0 deletions
@@ -55,6 +55,9 @@ pretrained_models/
├── deploy                               # deployment
│   ├── python
│   │   └── predict.py                   # Python inference deployment example
│   └── serving
│       ├── client.py                    # client-side prediction script
│       └── export_servable_model.py     # exports the Serving model and its configuration
├── export_model.py                      # script to export dynamic-graph parameters as static-graph parameters
├── predict.py                           # prediction script
├── README.md                            # usage instructions
@@ -176,3 +179,97 @@ Data: 这个宾馆比较陈旧了,特价的房间也很一般。总体来说
Data: 怀着十分激动的心情放映，可是看着看着发现，在放映完毕后，出现一集米老鼠的动画片 Label: negative
Data: 作为老的四星酒店，房间依然很整洁，相当不错。机场接机服务很好，可以在车上办理入住手续，节省时间。 Label: positive
```

## Inference deployment with the Paddle Serving API

**NOTE:**

Service deployment with Paddle Serving requires exporting the model parameters saved from the dynamic graph as static-graph inference model parameter files. See the **导出模型** (export model) section mentioned above for how to export the model.

Inference model parameter files:

| File | Description |
|-------------------------------|----------------------------------------|
| static_graph_params.pdiparams | model weights file, loaded at inference time |
| static_graph_params.pdmodel   | model structure file, loaded at inference time |

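For reference, a minimal export sketch is shown below. The flag names used here (`--params_path`, `--output_path`) and the checkpoint path are assumptions for illustration only; follow the 导出模型 (export model) section above for the exact command.

```shell
# Hypothetical flags and checkpoint path -- see the export-model section of this README for the real usage.
python export_model.py \
    --params_path ./checkpoint/model_state.pdparams \
    --output_path ./static_graph_params
```
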
### Installing dependencies

* Server-side dependencies:

```shell
pip install paddle-serving-app paddle-serving-client paddle-serving-server==0.5.0
```

If the server can run inference on a GPU, install the GPU version of the server package instead, choosing the build that matches the server's CUDA and TensorRT versions: [Serving readme](https://github.com/PaddlePaddle/Serving/tree/v0.5.0)

```shell
pip install paddle-serving-app paddle-serving-client paddle-serving-server-gpu==0.5.0
```

* Client-side dependencies:

```shell
pip install paddle-serving-app paddle-serving-client
```

Running both the server and the client inside **docker** containers is recommended to avoid system library issues; see the [Serving readme](https://github.com/PaddlePaddle/Serving/tree/v0.5.0) for the commands that start the docker images.

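As a sketch, the docker workflow usually looks like the following; `<serving-image>` is a placeholder, so substitute the image recommended in the Serving readme for your CUDA/TensorRT setup.

```shell
# <serving-image> is a placeholder -- pick the actual image from the Serving readme linked above.
docker pull <serving-image>
docker run -it --name serving_server -p 8090:8090 <serving-image> /bin/bash
```
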
### Exporting the Serving model and configuration

To deploy with Serving, the static-graph inference model must be exported as model parameters and configuration files that Serving can load. Run:

```shell
python -u deploy/serving/export_servable_model.py \
    --inference_model_dir ./ \
    --model_file static_graph_params.pdmodel \
    --params_file static_graph_params.pdiparams
```

Configurable parameters:
* `inference_model_dir`: the directory containing the inference model, assumed here to be the current directory.
* `model_file`: the model structure file to load for inference.
* `params_file`: the model weights file to load for inference.

After the command finishes, two directories are created in the current directory: serving_server and serving_client. The serving_server directory contains the model and configuration needed on the server side and should be copied into the server-side container; the serving_client directory contains the configuration needed on the client side and should be copied into the client-side container.

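A minimal sketch of that copy step, assuming the containers are named `serving_server_container` and `serving_client_container` (hypothetical names; adjust them to your own docker setup):

```shell
# Container names below are hypothetical -- replace them with your own.
docker cp serving_server serving_server_container:/workspace/serving_server
docker cp serving_client serving_client_container:/workspace/serving_client
```
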
### Starting the server

In the server-side container, start the server:

```shell
python -m paddle_serving_server.serve \
    --model ./serving_server \
    --port 8090
```

Where:
* `model`: the directory containing the model and configuration that the server loads.
* `port`: the service port the server exposes, 8090 here.

If the server side can run inference on a GPU, the GPU id used by the server can be set when starting it:

```shell
python -m paddle_serving_server_gpu.serve \
    --model ./serving_server \
    --port 8090 \
    --gpu_id 0
```
* `gpu_id`: the server uses GPU 0.

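To confirm the server came up, an optional sanity check is to verify that the port is being listened on (this assumes net-tools is available in the container):

```shell
# Optional check that something is listening on the serving port (8090 here).
netstat -nltp | grep 8090
```
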
### Sending inference requests from the client

In the client-side container, use the serving_client directory obtained above to start the client and send RPC inference requests. The prediction results are the same as when running inference with the Paddle Inference API.

### Sending an inference request with input data from the command line

```shell
python deploy/serving/client.py \
    --client_config_file ./serving_client/serving_client_conf.prototxt \
    --server_ip_port 127.0.0.1:8090 \
    --max_seq_length 128
```

The parameters are:
- `client_config_file`: the configuration file the client loads.
- `server_ip_port`: the server's IP address and port; change them to match the actual deployment.
- `max_seq_length`: the maximum input sentence length; longer inputs are truncated.
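
Once the server is reachable, the client prints one `Data: ... Label: ...` line per input sentence; for the built-in example inputs the output matches the sample shown earlier in this README (illustrative):

```
Data: 这个宾馆比较陈旧了，特价的房间也很一般。总体来说一般 	 Label: negative
Data: 怀着十分激动的心情放映，可是看着看着发现，在放映完毕后，出现一集米老鼠的动画片 	 Label: negative
Data: 作为老的四星酒店，房间依然很整洁，相当不错。机场接机服务很好，可以在车上办理入住手续，节省时间。 	 Label: positive
```
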
examples/text_classification/pretrained_models/deploy/serving/client.py

Lines changed: 171 additions & 0 deletions
@@ -0,0 +1,171 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import time
import numpy as np
import os

import paddle
from paddlenlp.data import Stack, Tuple, Pad
from paddlenlp.transformers import ErnieTinyTokenizer
from paddle_serving_client import Client
from scipy.special import softmax

parser = argparse.ArgumentParser()
parser.add_argument(
    "--client_config_file",
    type=str,
    default="./serving_client/serving_client_conf.prototxt",
    help="Client prototxt config file.")
parser.add_argument(
    "--server_ip_port",
    type=str,
    default="127.0.0.1:8090",
    help="The ip address and port of the server.")
parser.add_argument(
    "--batch_size",
    type=int,
    default=1,
    help="Batch size per GPU/CPU for prediction.")
parser.add_argument(
    "--max_seq_length",
    type=int,
    default=128,
    help="The maximum total input sequence length after tokenization. Sequences longer than this will be truncated, sequences shorter will be padded."
)
args = parser.parse_args()


def convert_example(example,
                    tokenizer,
                    label_list,
                    max_seq_length=512,
                    is_test=False):
    """
    Builds model inputs from a sequence or a pair of sequences for sequence classification tasks
    by concatenating and adding special tokens. It also creates a mask from the two sequences passed
    to be used in a sequence-pair classification task.

    A BERT sequence has the following format:

    - single sequence: ``[CLS] X [SEP]``
    - pair of sequences: ``[CLS] A [SEP] B [SEP]``

    A BERT sequence pair mask has the following format:
    ::
        0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
        | first sequence    | second sequence |

    If there is only one sequence, only the first portion of the mask (0's) is returned.

    Args:
        example(obj:`list[str]`): List of input data, containing the text and, if present, the label.
        tokenizer(obj:`PretrainedTokenizer`): This tokenizer inherits from :class:`~paddlenlp.transformers.PretrainedTokenizer`
            which contains most of the methods. Users should refer to the superclass for more information regarding methods.
        label_list(obj:`list[str]`): All the labels that the data has.
        max_seq_len(obj:`int`): The maximum total input sequence length after tokenization.
            Sequences longer than this will be truncated, sequences shorter will be padded.
        is_test(obj:`bool`, defaults to `False`): Whether the example contains a label or not.

    Returns:
        input_ids(obj:`list[int]`): The list of token ids.
        token_type_ids(obj:`list[int]`): List of sequence pair mask.
        label(obj:`numpy.array`, data type of int64, optional): The input label if not is_test.
    """
    if is_test:
        text = example
    else:
        # When the example carries a label, it is expected as (text, label).
        text, label = example
    encoded_inputs = tokenizer(text=text, max_seq_len=max_seq_length)
    input_ids = encoded_inputs["input_ids"]
    token_type_ids = encoded_inputs["token_type_ids"]

    if not is_test:
        # create label maps
        label_map = {}
        for (i, l) in enumerate(label_list):
            label_map[l] = i

        label = label_map[label]
        label = np.array([label], dtype="int64")
        return input_ids, token_type_ids, label
    else:
        return input_ids, token_type_ids


def predict(data, label_map, batch_size):
    """
    Sends the input texts to the serving server and returns the predicted labels.

    Args:
        data (list[str]): each string is a sentence to classify.
        label_map (dict[int, str]): mapping from class index to label name.
        batch_size (int): number of examples sent to the server per request.
    Returns:
        results (list[str]): the predicted label for each input sentence.
    """

    # initialize client
    client = Client()
    client.load_client_config(args.client_config_file)
    client.connect([args.server_ip_port])

    # TODO: It may be better to do the text tokenization on the serving end rather than on the client end.
    tokenizer = ErnieTinyTokenizer.from_pretrained("ernie-tiny")
    examples = []
    for text in data:
        input_ids, token_type_ids = convert_example(
            text,
            tokenizer,
            label_list=label_map.values(),
            max_seq_length=args.max_seq_length,
            is_test=True)
        examples.append((input_ids, token_type_ids))

    batchify_fn = lambda samples, fn=Tuple(
        Pad(axis=0, pad_val=tokenizer.pad_token_id, dtype='int64'),  # input ids
        Pad(axis=0, pad_val=tokenizer.pad_token_id, dtype='int64'),  # token type ids
    ): fn(samples)

    # Separates data into batches.
    batches = [
        examples[idx:idx + batch_size]
        for idx in range(0, len(examples), batch_size)
    ]

    results = []
    for batch in batches:
        input_ids, token_type_ids = batchify_fn(batch)
        fetch_map = client.predict(
            feed={"input_ids": input_ids,
                  "token_type_ids": token_type_ids},
            fetch=["save_infer_model/scale_0.tmp_1"],
            batch=True)
        output_data = np.array(fetch_map["save_infer_model/scale_0.tmp_1"])
        probs = softmax(output_data, axis=1)
        idx = np.argmax(probs, axis=1)
        idx = idx.tolist()
        labels = [label_map[i] for i in idx]
        results.extend(labels)

    return results


if __name__ == '__main__':
    paddle.enable_static()
    data = [
        '这个宾馆比较陈旧了，特价的房间也很一般。总体来说一般',
        '怀着十分激动的心情放映，可是看着看着发现，在放映完毕后，出现一集米老鼠的动画片',
        '作为老的四星酒店，房间依然很整洁，相当不错。机场接机服务很好，可以在车上办理入住手续，节省时间。',
    ]
    label_map = {0: 'negative', 1: 'positive'}
    results = predict(data, label_map, args.batch_size)
    for idx, text in enumerate(data):
        print('Data: {} \t Label: {}'.format(text, results[idx]))
examples/text_classification/pretrained_models/deploy/serving/export_servable_model.py

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import paddle
import paddle_serving_client.io as serving_io


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--inference_model_dir",
        type=str,
        default="./",
        help="The directory of the inference model.")
    parser.add_argument(
        "--model_file",
        type=str,
        default='./static_graph_params.pdmodel',
        help="The inference model file name.")
    parser.add_argument(
        "--params_file",
        type=str,
        default='./static_graph_params.pdiparams',
        help="The input inference parameters file name.")
    return parser.parse_args()


if __name__ == '__main__':
    paddle.enable_static()
    args = parse_args()
    feed_names, fetch_names = serving_io.inference_model_to_serving(
        dirname=args.inference_model_dir,
        serving_server="serving_server",
        serving_client="serving_client",
        model_filename=args.model_file,
        params_filename=args.params_file)
    print("model feed_names : %s" % feed_names)
    print("model fetch_names : %s" % fetch_names)
