Commit 6213573

Authored by w5688414, ZeyuChen, and tianxin
Update neural search readme and Add Paddle Serving Support (#1558)
* add recall inference similarity
* update examples
* update readme
* update dir name
* update neural search readme
* update milvus readme
* update domain adaptive pretraining readme
* fix the mistakes
* update readme
* add recall Paddle Serving Support
* update readme
* update readme and format the code
* reformat the files
* move the files
* reformat the code
* remove redundant code

Co-authored-by: Zeyu Chen <[email protected]>
Co-authored-by: tianxin <[email protected]>
1 parent 2a620ba commit 6213573

File tree

100 files changed: +1240 −866 lines. Large commits have some content hidden by default; only a subset of the changed files is shown below.

applications/neural_search/recall/in_batch_negative/README.md

Lines changed: 77 additions & 7 deletions
@@ -91,15 +91,20 @@ Recall@K is the recall rate over the top-K predictions (top-k meaning, of the final score-ranked…
     |—— export_model.py  # convert the dynamic graph model to a static graph
     |—— scripts
         |—— export_model.sh  # script for converting the dynamic graph model to a static graph
         |—— predict.sh  # prediction (bash version)
         |—— evaluate.sh  # evaluation (bash version)
         |—— run_build_index.sh  # index building (bash version)
         |—— train_batch_neg.sh  # training (bash version)
+        |—— export_to_serving.sh  # bash script for converting Paddle Inference models to Serving format
     |—— deploy
         |—— python
             |—— predict.py  # Paddle Inference prediction
             |—— deploy.sh  # Paddle Inference deployment script
+            |—— rpc_client.py  # Paddle Serving client side
+            |—— web_service.py  # Paddle Serving server side
+            |—— config_nlp.yml  # Paddle Serving configuration file
     |—— inference.py  # extract embeddings with the dynamic graph model
+    |—— export_to_serving.py  # convert the static graph model to Serving format

 ```

@@ -237,7 +242,7 @@ c. Get the query embedding and retrieve similar results

 d. Evaluation

-Compute the evaluation metric Recall@K (K = 1, 5, 10, 20, 50) from the evaluation set `same_semantic.tsv` and the recall results in `recall_result`.
+Compute the evaluation metric Recall@K (K = 1, 5, 10, 20, 50) from the evaluation set `dev.csv` and the recall results in `recall_result`.

 Run the following command to build the ANN index, perform recall, and produce the recall result file `recall_result`:

@@ -267,7 +272,7 @@ python -u -m paddle.distributed.launch --gpus "3" --log_dir "recall_log/" \
 * `hnsw_ef`: HNSW algorithm parameter; the default is fine
 * `output_emb_size`: dimension of the text embedding output by the top Transformer layer
 * `recall_num`: number of similar texts recalled for each text
-* `similar_text_pair`: evaluation set made up of similar text pairs, semantic_similar_pair.tsv
+* `similar_text_pair`: evaluation set made up of similar text pairs
 * `corpus_file`: recall corpus data

 You can also use the following bash script:
@@ -447,6 +452,71 @@ sh deploy.sh
 [0.959269642829895, 0.04725276678800583]
 ```

+### Paddle Serving Deployment
+
+For detailed Paddle Serving documentation, see [Pipeline_Design](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Python_Pipeline/Pipeline_Design_CN.md) and [Serving_Design](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Serving_Design_CN.md). First, convert the static graph model into the Serving format:
+
+```
+python export_to_serving.py \
+    --dirname "output" \
+    --model_filename "inference.get_pooled_embedding.pdmodel" \
+    --params_filename "inference.get_pooled_embedding.pdiparams" \
+    --server_path "./serving_server" \
+    --client_path "./serving_client" \
+    --fetch_alias_names "output_embedding"
+```
+
+Parameter descriptions:
+
+* `dirname`: path of the model files to be converted; both the Program structure file and the parameter files are stored in this directory.
+* `model_filename`: name of the file storing the Inference Program structure of the model to be converted. If set to None, `__model__` is used as the default file name.
+* `params_filename`: name of the file storing all model parameters. It must be specified if and only if all parameters are saved in a single binary file; if the parameters are stored in separate files, set it to None.
+* `server_path`: output path for the converted server-side model and configuration files. Defaults to serving_server.
+* `client_path`: output path for the converted client-side configuration files. Defaults to serving_client.
+* `fetch_alias_names`: aliases for the model outputs; outputs such as output_embedding can be renamed. Not set by default.
+* `feed_alias_names`: aliases for the model inputs; inputs such as input_ids can be renamed. Not set by default.
+
+You can also run the following bash script:
+
+```
+sh scripts/export_to_serving.sh
+```
+
+Then start the server:
+
+```
+python web_service.py
+```
+
+Next, start the client to call the server. First, edit the samples to be predicted in rpc_client.py:
+
+```
+list_data = [
+    "国有企业引入非国有资本对创新绩效的影响——基于制造业国有上市公司的经验证据",
+    "试论翻译过程中的文化差异与语言空缺翻译过程,文化差异,语言空缺,文化对比"
+]
+```
+
+Then run:
+
+```
+python rpc_client.py
+```
+
+The model output is:
+
+```
+{'0': '国有企业引入非国有资本对创新绩效的影响——基于制造业国有上市公司的经验证据', '1': '试论翻译过程中的文化差异与语言空缺翻译过程,文化差异,语言空缺,文化对比'}
+PipelineClient::predict pack_data time:1641450851.3752182
+PipelineClient::predict before time:1641450851.375738
+['output_embedding']
+(2, 256)
+[[ 0.07830612 -0.14036864  0.03433796 -0.14967982 -0.03386067  0.06630666
+   0.01357943  0.03531194  0.02411093  0.02000859  0.05724002 -0.08119463
+......
+```
+
+The client sent two texts and received two embedding vectors in return.
+
 ## Reference

 [1] Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih, Dense Passage Retrieval for Open-Domain Question Answering, Preprint 2020.
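
The body of `export_to_serving.py` is among the files hidden in this commit view. A minimal sketch of what such a conversion script can look like, assuming it wraps `paddle_serving_client.io.inference_model_to_serving` and that the installed paddle-serving-client accepts the feed/fetch alias keyword arguments documented in the README above:

```python
# Hypothetical sketch of export_to_serving.py, mirroring the flags documented
# above; verify the conversion API against your paddle-serving-client version.
import argparse

import paddle_serving_client.io as serving_io

parser = argparse.ArgumentParser()
parser.add_argument("--dirname", type=str, required=True)
parser.add_argument("--model_filename", type=str, default=None)
parser.add_argument("--params_filename", type=str, default=None)
parser.add_argument("--server_path", type=str, default="./serving_server")
parser.add_argument("--client_path", type=str, default="./serving_client")
parser.add_argument("--feed_alias_names", type=str, default=None)
parser.add_argument("--fetch_alias_names", type=str, default=None)
args = parser.parse_args()

# Convert the static graph model into paired server/client Serving configs.
feed_names, fetch_names = serving_io.inference_model_to_serving(
    dirname=args.dirname,
    serving_server=args.server_path,
    serving_client=args.client_path,
    model_filename=args.model_filename,
    params_filename=args.params_filename,
    feed_alias_names=args.feed_alias_names,
    fetch_alias_names=args.fetch_alias_names)
print("model feed_names : %s" % (feed_names, ))
print("model fetch_names : %s" % (fetch_names, ))
```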

applications/neural_search/recall/in_batch_negative/base_model.py

Lines changed: 4 additions & 4 deletions
@@ -28,8 +28,8 @@ def __init__(self, pretrained_model, dropout=None, output_emb_size=None):
         self.ptm = pretrained_model
         self.dropout = nn.Dropout(dropout if dropout is not None else 0.1)

-        # if output_emb_size is not None, then add Linear layer to reduce embedding_size,
-        # we recommend set output_emb_size = 256 considering the trade-off beteween
+        # if output_emb_size is not None, then add Linear layer to reduce embedding_size,
+        # we recommend set output_emb_size = 256 considering the trade-off between
         # recall performance and efficiency

         self.output_emb_size = output_emb_size
@@ -105,8 +105,8 @@ def __init__(self, pretrained_model, dropout=None, output_emb_size=None):
         self.ptm = pretrained_model
         self.dropout = nn.Dropout(dropout if dropout is not None else 0.1)

-        # if output_emb_size is not None, then add Linear layer to reduce embedding_size,
-        # we recommend set output_emb_size = 256 considering the trade-off beteween
+        # if output_emb_size is not None, then add Linear layer to reduce embedding_size,
+        # we recommend set output_emb_size = 256 considering the trade-off between
         # recall performance and efficiency

         self.output_emb_size = output_emb_size
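
The comment restored above describes an optional projection head. A hedged sketch of the pattern, with the 768 hidden size assumed for ernie-1.0 and the class name chosen for illustration:

```python
import paddle.nn as nn


class SemanticIndexSketch(nn.Layer):
    """Illustrative encoder wrapper; not the file's actual class."""

    def __init__(self, pretrained_model, dropout=None, output_emb_size=None):
        super().__init__()
        self.ptm = pretrained_model
        self.dropout = nn.Dropout(dropout if dropout is not None else 0.1)
        self.output_emb_size = output_emb_size
        if output_emb_size is not None:
            # Project the 768-dim pooled output down to output_emb_size
            # (256 recommended) so the ANN index stores smaller vectors.
            self.emb_reduce_linear = nn.Linear(768, output_emb_size)

    def get_pooled_embedding(self, input_ids, token_type_ids=None):
        _, cls_embedding = self.ptm(input_ids, token_type_ids)
        cls_embedding = self.dropout(cls_embedding)
        if self.output_emb_size is not None:
            cls_embedding = self.emb_reduce_linear(cls_embedding)
        return cls_embedding
```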

applications/neural_search/recall/in_batch_negative/data.py

Lines changed: 1 addition & 3 deletions
@@ -13,9 +13,7 @@
 # limitations under the License.

 import os
-
 import paddle
-
 from paddlenlp.utils.log import logger


@@ -47,7 +45,7 @@ def convert_example(example,
                     pad_to_max_seq_len=False):
     """
     Builds model inputs from a sequence.
-
+
     A BERT sequence has the following format:

     - single sequence: ``[CLS] X [SEP]``
applications/neural_search/recall/in_batch_negative/deploy/python/config_nlp.yml

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+# worker_num: maximum concurrency. When build_dag_each_worker=True, the framework creates worker_num processes, each building its own gRPC server and DAG.
+# When build_dag_each_worker=False, the framework sets max_workers=worker_num for the gRPC thread pool of the main thread.
+worker_num: 20
+# build_dag_each_worker: False, the framework builds one DAG inside the process; True, the framework builds multiple independent DAGs per process.
+build_dag_each_worker: false
+
+dag:
+    # op resource type: True for the thread model, False for the process model
+    is_thread_op: False
+    # profiling: True generates Timeline performance data, at some cost to performance; False disables it
+    tracer:
+        interval_s: 10
+# HTTP port. rpc_port and http_port must not both be empty. When rpc_port is available and http_port is empty, no http_port is generated automatically.
+http_port: 18082
+# RPC port. When rpc_port is empty and http_port is not, rpc_port is automatically set to http_port + 1.
+rpc_port: 8080
+op:
+    ernie:
+        # concurrency: thread-level concurrency when is_thread_op=True, otherwise process-level concurrency
+        concurrency: 1
+        # when the op config has no server_endpoints, the local service config is read from local_service_conf
+        local_service_conf:
+            # client type: brpc, grpc, or local_predictor. local_predictor runs prediction in-process without starting a Serving service.
+            client_type: local_predictor
+            # device_type: 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
+            device_type: 1
+            # device IDs: "" or unset means CPU prediction; "0" or "0,1,2" means GPU prediction on the listed cards
+            devices: '2'
+            # list of results to fetch, keyed by the alias_name of fetch_var in client_config; if unset, all outputs are returned
+            fetch_list: ['output_embedding']
+            # model path
+            model_config: ../../serving_server/
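
With `http_port: 18082` set above, the pipeline is also reachable over HTTP in addition to the RPC client shown elsewhere in this commit; a hedged check, assuming Paddle Serving's usual `/<service_name>/prediction` pipeline route for the `ernie` service:

```python
# Query the pipeline's HTTP endpoint directly; the route follows the service
# name ("ernie") and the key/value payload convention of pipeline serving.
import json

import requests

url = "http://127.0.0.1:18082/ernie/prediction"
data = {"key": ["0"], "value": ["国有企业引入非国有资本对创新绩效的影响"]}
resp = requests.post(url, data=json.dumps(data))
print(resp.json())
```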

applications/neural_search/recall/in_batch_negative/deploy/python/predict.py

Lines changed: 18 additions & 12 deletions
@@ -230,11 +230,13 @@ def predict(self, data, tokenizer):
             self.autolog.times.start()

         examples = []
-        for idx,text in enumerate(data):
-            input_ids, segment_ids = convert_example(
-                {idx:text[0]}, tokenizer)
-            title_ids,title_segment_ids=convert_example({idx:text[1]},tokenizer)
-            examples.append((input_ids, segment_ids,title_ids,title_segment_ids))
+        for idx, text in enumerate(data):
+            input_ids, segment_ids = convert_example({idx: text[0]}, tokenizer)
+            title_ids, title_segment_ids = convert_example({
+                idx: text[1]
+            }, tokenizer)
+            examples.append(
+                (input_ids, segment_ids, title_ids, title_segment_ids))

         batchify_fn = lambda samples, fn=Tuple(
             Pad(axis=0, pad_val=tokenizer.pad_token_id),  # input
@@ -246,7 +248,8 @@ def predict(self, data, tokenizer):
         if args.benchmark:
             self.autolog.times.stamp()

-        query_ids, query_segment_ids,title_ids, title_segment_ids = batchify_fn(examples)
+        query_ids, query_segment_ids, title_ids, title_segment_ids = batchify_fn(
+            examples)
         self.input_handles[0].copy_from_cpu(query_ids)
         self.input_handles[1].copy_from_cpu(query_segment_ids)
         self.predictor.run()
@@ -259,10 +262,13 @@ def predict(self, data, tokenizer):

         if args.benchmark:
             self.autolog.times.stamp()
-
+
         if args.benchmark:
             self.autolog.times.end(stamp=True)
-        result=[float(1 - spatial.distance.cosine(arr1, arr2)) for arr1, arr2 in zip(query_logits, title_logits)]
+        result = [
+            float(1 - spatial.distance.cosine(arr1, arr2))
+            for arr1, arr2 in zip(query_logits, title_logits)
+        ]
         return result
@@ -277,10 +283,10 @@ def predict(self, data, tokenizer):
     tokenizer = ppnlp.transformers.ErnieTokenizer.from_pretrained('ernie-1.0')
     id2corpus = {0: '国有企业引入非国有资本对创新绩效的影响——基于制造业国有上市公司的经验证据'}
     corpus_list = [{idx: text} for idx, text in id2corpus.items()]
-    res=predictor.extract_embedding(corpus_list, tokenizer)
+    res = predictor.extract_embedding(corpus_list, tokenizer)
     print(res.shape)
     print(res)
-    corpus_list=[['中西方语言与文化的差异','中西方文化差异以及语言体现中西方文化,差异,语言体现'],
-                ['中西方语言与文化的差异','飞桨致力于让深度学习技术的创新与应用更简单']]
-    res=predictor.predict(corpus_list,tokenizer)
+    corpus_list = [['中西方语言与文化的差异', '中西方文化差异以及语言体现中西方文化,差异,语言体现'],
+                   ['中西方语言与文化的差异', '飞桨致力于让深度学习技术的创新与应用更简单']]
+    res = predictor.predict(corpus_list, tokenizer)
     print(res)
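
The reformatted `result` expression in the third hunk computes cosine similarity pair by pair through SciPy. For reference, an equivalent vectorized form in plain NumPy (a sketch, not part of the commit):

```python
import numpy as np


def batch_cosine_similarity(query_embs, title_embs):
    """Row-wise cosine similarity between two (batch, dim) float arrays."""
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    t = title_embs / np.linalg.norm(title_embs, axis=1, keepdims=True)
    return (q * t).sum(axis=1).tolist()
```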
applications/neural_search/recall/in_batch_negative/deploy/python/rpc_client.py

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from paddle_serving_server.pipeline import PipelineClient
+import numpy as np
+
+client = PipelineClient()
+client.connect(['127.0.0.1:8080'])
+
+list_data = [
+    "国有企业引入非国有资本对创新绩效的影响——基于制造业国有上市公司的经验证据",
+    "试论翻译过程中的文化差异与语言空缺翻译过程,文化差异,语言空缺,文化对比"
+]
+feed = {}
+for i, item in enumerate(list_data):
+    feed[str(i)] = item
+
+print(feed)
+ret = client.predict(feed_dict=feed)
+# print(ret)
+result = np.array(eval(ret.value[0]))
+print(ret.key)
+print(result.shape)
+print(result)
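
One note on the client above: `ret.value[0]` carries the embedding matrix as a stringified Python list, which the script parses back with `eval`. Assuming the same string format, `ast.literal_eval` is a safer drop-in:

```python
import ast

import numpy as np

# literal_eval accepts only Python literals, so a malformed or hostile
# payload raises an error instead of executing arbitrary code as eval can.
result = np.array(ast.literal_eval(ret.value[0]))
```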
applications/neural_search/recall/in_batch_negative/deploy/python/web_service.py

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import logging
+import numpy as np
+import sys
+
+from paddle_serving_server.web_service import WebService, Op
+
+_LOGGER = logging.getLogger()
+
+
+def convert_example(example,
+                    tokenizer,
+                    max_seq_length=512,
+                    pad_to_max_seq_len=False):
+    result = []
+    for text in example:
+        encoded_inputs = tokenizer(
+            text=text,
+            max_seq_len=max_seq_length,
+            pad_to_max_seq_len=pad_to_max_seq_len)
+        input_ids = encoded_inputs["input_ids"]
+        token_type_ids = encoded_inputs["token_type_ids"]
+        result += [input_ids, token_type_ids]
+    return result
+
+
+class ErnieOp(Op):
+    def init_op(self):
+        import paddlenlp as ppnlp
+        self.tokenizer = ppnlp.transformers.ErnieTokenizer.from_pretrained(
+            'ernie-1.0')
+
+    def preprocess(self, input_dicts, data_id, log_id):
+        from paddlenlp.data import Stack, Tuple, Pad
+
+        (_, input_dict), = input_dicts.items()
+        print("input dict", input_dict)
+        batch_size = len(input_dict.keys())
+        examples = []
+        for i in range(batch_size):
+            input_ids, segment_ids = convert_example([input_dict[str(i)]],
+                                                     self.tokenizer)
+            examples.append((input_ids, segment_ids))
+        batchify_fn = lambda samples, fn=Tuple(
+            Pad(axis=0, pad_val=self.tokenizer.pad_token_id),  # input
+            Pad(axis=0, pad_val=self.tokenizer.pad_token_id),  # segment
+        ): fn(samples)
+        input_ids, segment_ids = batchify_fn(examples)
+        feed_dict = {}
+        feed_dict['input_ids'] = input_ids
+        feed_dict['token_type_ids'] = segment_ids
+        return feed_dict, False, None, ""
+
+    def postprocess(self, input_dicts, fetch_dict, data_id, log_id):
+        new_dict = {}
+        new_dict["output_embedding"] = str(fetch_dict["output_embedding"]
+                                           .tolist())
+        return new_dict, None, ""
+
+
+class ErnieService(WebService):
+    def get_pipeline_response(self, read_op):
+        ernie_op = ErnieOp(name="ernie", input_ops=[read_op])
+        return ernie_op
+
+
+ernie_service = ErnieService(name="ernie")
+ernie_service.prepare_pipeline_config("config_nlp.yml")
+ernie_service.run_service()
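
The `convert_example` helper in this file can be sanity-checked without starting the service; a small hedged usage sketch (the sample text is arbitrary, and `convert_example` is assumed to be importable or pasted alongside):

```python
import paddlenlp as ppnlp

# Tokenize one text the way ErnieOp.preprocess does and inspect the
# [input_ids, token_type_ids] pair that convert_example returns.
tokenizer = ppnlp.transformers.ErnieTokenizer.from_pretrained('ernie-1.0')
input_ids, token_type_ids = convert_example(["中西方语言与文化的差异"], tokenizer)
print(input_ids)
print(token_type_ids)
```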

applications/neural_search/recall/in_batch_negative/evaluate.py

Lines changed: 6 additions & 3 deletions
@@ -22,9 +22,12 @@

 # yapf: disable
 parser = argparse.ArgumentParser()
-parser.add_argument("--similar_text_pair", type=str, default='', help="The full path of similat pair file")
-parser.add_argument("--recall_result_file", type=str, default='', help="The full path of recall result file")
-parser.add_argument("--recall_num", type=int, default=10, help="Most similair number of doc recalled from corpus per query")
+parser.add_argument("--similar_text_pair", type=str,
+                    default='', help="The full path of the similar pair file")
+parser.add_argument("--recall_result_file", type=str,
+                    default='', help="The full path of the recall result file")
+parser.add_argument("--recall_num", type=int, default=10,
+                    help="Number of most similar docs recalled from the corpus per query")


 args = parser.parse_args()
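
For orientation, the metric behind these flags is standard Recall@K: a query scores a hit when its labeled similar text appears among the top-K recalled candidates. A minimal sketch, with file parsing omitted and dictionary inputs assumed:

```python
def recall_at_k(similar_pairs, recalled, k):
    """similar_pairs: {query: positive_text}; recalled: {query: ranked texts}."""
    hits = sum(positive in recalled.get(query, [])[:k]
               for query, positive in similar_pairs.items())
    return hits / len(similar_pairs)
```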

applications/neural_search/recall/in_batch_negative/export_model.py

Lines changed: 4 additions & 2 deletions
@@ -26,8 +26,10 @@

 # yapf: disable
 parser = argparse.ArgumentParser()
-parser.add_argument("--params_path", type=str, required=True, default='./checkpoint/model_900/model_state.pdparams', help="The path to model parameters to be loaded.")
-parser.add_argument("--output_path", type=str, default='./output', help="The path of model parameter in static graph to be saved.")
+parser.add_argument("--params_path", type=str, required=True,
+                    default='./checkpoint/model_900/model_state.pdparams', help="The path to model parameters to be loaded.")
+parser.add_argument("--output_path", type=str, default='./output',
+                    help="The path of model parameter in static graph to be saved.")
 args = parser.parse_args()
 # yapf: enable
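
For context, the dynamic-to-static conversion that `export_model.py` drives centers on `paddle.jit.to_static` with variable-shape `InputSpec`s; a hedged sketch, where the trained `model` is assumed to be loaded from `--params_path` and the input specs are inferred from the exported file names seen earlier (`inference.get_pooled_embedding.pdmodel`):

```python
import os

import paddle

model.eval()
# Trace the encoder with batch size and sequence length left dynamic.
static_model = paddle.jit.to_static(
    model,
    input_spec=[
        paddle.static.InputSpec(shape=[None, None], dtype="int64"),  # input_ids
        paddle.static.InputSpec(shape=[None, None], dtype="int64"),  # token_type_ids
    ])
paddle.jit.save(static_model, os.path.join(args.output_path, "inference"))
```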
