PaddlePaddle
diff --git a/README.md b/README.md
Lines changed: 2 additions & 2 deletions
diff --git a/README_en.md b/README_en.md
Lines changed: 2 additions & 2 deletions
diff --git a/docs/data_prepare/dataset_list.md b/docs/data_prepare/dataset_list.md
Lines changed: 6 additions & 0 deletions
diff --git a/docs/data_prepare/dataset_load.rst b/docs/data_prepare/dataset_load.rst
Lines changed: 2 additions & 0 deletions
diff --git a/examples/dialogue/plato-2/interaction.py b/examples/dialogue/plato-2/interaction.py
Lines changed: 1 addition & 0 deletions
diff --git a/examples/dialogue/plato-2/utils/tokenization.py b/examples/dialogue/plato-2/utils/tokenization.py
Lines changed: 1 addition & 1 deletion
diff --git a/examples/dialogue/unified_transformer/README.md b/examples/dialogue/unified_transformer/README.md
Lines changed: 2 additions & 4 deletions
diff --git a/examples/dialogue/unified_transformer/interaction.py b/examples/dialogue/unified_transformer/interaction.py
Lines changed: 1 addition & 0 deletions
diff --git a/examples/few_shot/p-tuning/predict.py b/examples/few_shot/p-tuning/predict.py
Lines changed: 1 addition & 1 deletion
diff --git a/examples/information_extraction/waybill_ie/README.md b/examples/information_extraction/waybill_ie/README.md
Lines changed: 12 additions & 1 deletion
@@ -53,7 +53,7 @@ pip install --upgrade paddlenlp
 
 ### Transformer API: A Powerful Ecosystem Foundation of Pre-trained Models
 
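-Covers **15** network architectures and **67** sets of pre-trained model parameters, including Baidu's own pre-trained models such as the ERNIE series, PLATO, and SKEP, as well as mainstream Chinese pre-trained models from across the industry. Developers are welcome to contribute pre-trained models! 🤗 
+Covers **15** network architectures and **67** sets of pre-trained model parameters, including Baidu's own pre-trained models such as the ERNIE series, PLATO, and SKEP, as well as mainstream Chinese pre-trained models from across the industry. Developers are welcome to contribute pre-trained models! 🤗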
 
 ```python
 from paddlenlp.transformers import *
@@ -78,7 +78,7 @@ text = tokenizer('自然语言处理')
 
 # Semantic representation
 model = ErnieModel.from_pretrained('ernie-1.0')
-pooled_output, sequence_output = model(input_ids=paddle.to_tensor([text['input_ids']]))
+sequence_output, pooled_output = model(input_ids=paddle.to_tensor([text['input_ids']]))
 # Text classification & sentence-pair matching
 model = ErnieForSequenceClassification.from_pretrained('ernie-1.0')
 # Sequence labeling
 
@@ -20,7 +20,7 @@ English | [简体中文](./README.md)
 
 ## Introduction
 
-**PaddleNLP** is a powerful NLP library with **Awesome** pre-trained Transformer models and easy-to-use interface, supporting wide-range of NLP tasks from research to industrial applications. 
+**PaddleNLP** is a powerful NLP library with **Awesome** pre-trained Transformer models and easy-to-use interface, supporting wide-range of NLP tasks from research to industrial applications.
 
 
 * **Easy-to-Use API**
@@ -76,7 +76,7 @@ text = tokenizer('natural language understanding')
 
 # Semantic Representation
 model = ErnieModel.from_pretrained('ernie-1.0')
-pooled_output, sequence_output = model(input_ids=paddle.to_tensor([text['input_ids']]))
+sequence_output, pooled_output = model(input_ids=paddle.to_tensor([text['input_ids']]))
 # Text Classification and Matching
 model = ErnieForSequenceClassification.from_pretrained('ernie-1.0')
 # Sequence Labeling
 
@@ -11,6 +11,9 @@ PaddleNLP提供了以下数据集的快速读取API，实际使用时请根据
 |  [DuReader-robust](https://aistudio.baidu.com/aistudio/competition/detail/49) | Qianyan dataset: reading comprehension with answers extracted from the source passage |`paddlenlp.datasets.load_dataset('dureader_robust')` |
 |  [CMRC2018](http://hfl-rc.com/cmrc2018/) | Evaluation dataset of the 2nd "iFLYTEK Cup" Chinese machine reading comprehension challenge |`paddlenlp.datasets.load_dataset('cmrc2018')` |
 |  [DRCD](https://github.com/DRCKnowledgeTeam/DRCD) | Delta Reading Comprehension Dataset |`paddlenlp.datasets.load_dataset('drcd')` |
+|  [TriviaQA](http://nlp.cs.washington.edu/triviaqa/) | University of Washington question answering dataset |`paddlenlp.datasets.load_dataset('triviaqa')` |
+|  [C3](https://dataset.org/c3/) | Multiple-choice reading comprehension (single correct answer) |`paddlenlp.datasets.load_dataset('c3')` |
+
 
 ## Text Classification
 
@@ -48,6 +51,9 @@ PaddleNLP提供了以下数据集的快速读取API，实际使用时请根据
 | [THUCNews](https://github.com/gaussic/text-classification-cnn-rnn#%E6%95%B0%E6%8D%AE%E9%9B%86) |  THUCNews Chinese news category classification | `paddlenlp.datasets.load_dataset('thucnews')` |
 | [HYP](https://pan.webis.de/semeval19/semeval19-web/) | English political news sentiment classification corpus  | `paddlenlp.datasets.load_dataset('hyp')` |
 
+## Text Matching
+| [CAIL2019-SCM](https://github.com/china-ai-law-challenge/CAIL2019/tree/master/scm) | Similar legal case matching  | `paddlenlp.datasets.load_dataset('cail2019_scm')` |
+
 ## Sequence Labeling
 
 |  Dataset   | Description | Usage |
 
@@ -65,3 +65,5 @@
 
         >>> from paddlenlp.datasets import load_dataset
         >>> train_ds, test_ds = load_dataset("glue", "cola", splits=["train", "test"], data_files=["my_train_file.csv", "my_test_file.csv"])
+
+    **Note that datasets have no default loading option; at least one of** :attr:`splits` **and** :attr:`data_files` **must be specified.**
@@ -76,6 +76,7 @@ def interact(args):
                 example, is_infer=True)
             data = plato_reader._pad_batch_records([record], is_infer=True)
             inputs = gen_inputs(data, args.latent_type_size)
+            inputs['tgt_ids'] = inputs['tgt_ids'].astype('int64')
             pred = model(inputs)[0]
             bot_response = pred["response"]
             print(
 
@@ -86,7 +86,7 @@ def convert_to_unicode(text):
 def load_vocab(vocab_file):
     """Loads a vocabulary file into a dictionary."""
     vocab = collections.OrderedDict()
-    fin = open(vocab_file)
+    fin = open(vocab_file, 'r', encoding="UTF-8")
     for num, line in enumerate(fin):
         items = convert_to_unicode(line.rstrip()).split("\t")
         if len(items) > 2:
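
The encoding change in this hunk can be sketched in isolation. The block below is a minimal, self-contained approximation of `load_vocab`, not the repository's actual function: the loop body after the `len(items) > 2` check is not shown in the hunk, so the handling of the optional id column is an assumption, and the `demo` helper with its sample vocabulary is hypothetical. It also swaps the bare `open` for a `with` block, which the patch itself does not do.

```python
import collections
import os
import tempfile


def load_vocab(vocab_file):
    """Load a tab-separated vocabulary file into an ordered token -> id dict."""
    vocab = collections.OrderedDict()
    # An explicit encoding avoids relying on the platform default
    # (for example a legacy code page on Windows). The context manager
    # also closes the file, unlike a bare open() call.
    with open(vocab_file, "r", encoding="utf-8") as fin:
        for num, line in enumerate(fin):
            items = line.rstrip().split("\t")
            if len(items) > 2:
                break
            token = items[0].strip()
            # Assumption: an optional second column holds the id;
            # otherwise the zero-based line number is used.
            vocab[token] = int(items[1]) if len(items) == 2 else num
    return vocab


def demo():
    """Hypothetical helper: write a small UTF-8 vocab file and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vocab.txt")
        with open(path, "w", encoding="utf-8") as fout:
            fout.write("[PAD]\t0\n自然\t1\n语言\t2\n")
        return load_vocab(path)


if __name__ == "__main__":
    print(demo())
```

Without the `encoding` argument, the same file may raise `UnicodeDecodeError` or decode to the wrong characters on systems whose default encoding is not UTF-8, which is presumably the failure this hunk addresses.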
 
@@ -48,7 +48,7 @@ train_ds, dev_ds, test1_ds, test2_ds = load_dataset('duconv', splits=('train', '
 
 ### Model Training
 
-运行如下命令即可在练集上进行finetune，并在验证集上进行验证
+运行如下命令即可在训练集上进行finetune，并在验证集上进行验证
 
 ```shell
 # Launch on GPU. The `--gpus` argument specifies the GPU card IDs used for training, either a single card or multiple cards
@@ -81,7 +81,6 @@ python -m paddle.distributed.launch --gpus '0' --log_dir ./log finetune.py \
    |---------------------------------|
    | unified_transformer-12L-cn      |
    | unified_transformer-12L-cn-luge |
-   | plato-mini |
 
 - `save_dir` specifies the path where the model is saved.
 - `logging_steps` specifies the interval between log outputs.
@@ -143,7 +142,6 @@ python infer.py \
    |---------------------------------|
    | unified_transformer-12L-cn      |
    | unified_transformer-12L-cn-luge |
-   | plato-mini |
 
 - `output_path` specifies the path where prediction results are saved.
 - `logging_steps` specifies the interval between log outputs.
@@ -202,7 +200,7 @@ python interaction.py \
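 - `top_k` applies to the "sampling" decoding strategy: tokens are sorted by probability in descending order, and each generated token is sampled only from the first `top_k` of them.
 - `device` specifies the device to use.
 
-**NOTE:** Enter "[EXIT]" to quit the interactive program, and enter "[NEXT]" to start a new conversation.
+**NOTE:** Enter "[EXIT]" to quit the interactive program, and enter "[NEXT]" to start a new conversation. Note that using backspace causes an error.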
 
 ## Reference
 
 
@@ -48,6 +48,7 @@ def interaction(args, model, tokenizer):
                 add_start_token_as_response=True,
                 return_tensors=True,
                 is_split_into_words=False)
+            inputs['input_ids'] = inputs['input_ids'].astype('int64')
             ids, scores = model.generate(
                 input_ids=inputs['input_ids'],
                 token_type_ids=inputs['token_type_ids'],
 
@@ -340,7 +340,7 @@ def write_chid(task_name, output_file, pred_labels):
                                         args.task_name + ".json")
 
     label_norm_dict = None
-    with open(label_normalize_json) as f:
+    with open(label_normalize_json, encoding='utf-8') as f:
         label_norm_dict = json.load(f)
 
     convert_example_fn = convert_example if args.task_name != "chid" else convert_chid_example
 
@@ -11,7 +11,7 @@
 Run the following command to download and extract the sample dataset:
 
 ```bash
-python download.py --data_dir ./  
+python download.py --data_dir ./waybill_ie
 ```
 
 A data sample is shown below:
@@ -51,6 +51,17 @@ python run_bigru_crf.py
 export CUDA_VISIBLE_DEVICES=0
 python run_ernie.py
 ```
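+##### Model Export
+After training in dynamic-graph mode, the dynamic-graph parameters can also be exported as static-graph parameters; see export_model.py for the code. The static-graph parameters are saved to the path given by output_path. To run it:
+
+`python export_model.py --params_path ernie_ckpt/model_80.pdparams --output_path=./output`
+
+Here `params_path` is the path to the parameters saved by dynamic-graph training, and `output_path` is the path where the static-graph parameters are exported.
+
+The exported model can then be used for deployment; deploy/python/predict.py provides an example of deployment and prediction in Python. To run it: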
+
+`python deploy/python/predict.py --model_dir ./output`
+
 
 #### Launch ERNIE + CRF Training