@@ -678,6 +678,31 @@ my_ner = Taskflow("ner", mode="accurate", task_path="./custom_task_path/")
```
</div></details>

+ ## Model Algorithms
+
+ <details><summary>Model algorithm details</summary><div>
+
+ <table>
+ <tr><td>Task<td>Model<td>Model details<td>Training set
+ <tr><td rowspan="3">Chinese word segmentation<td>Default mode: BiGRU+CRF<td><a href="https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/lexical_analysis">Training details</a><td>Baidu in-house dataset of nearly 22 million sentences covering a variety of scenarios
+ <tr><td>Fast mode: Jieba<td>-<td>-
+ <tr><td>Accurate mode: WordTag<td><a href="https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/text_to_knowledge/ernie-ctm">Training details</a><td>Baidu in-house dataset; word-class taxonomy built on TermTree
+ <tr><td>Part-of-speech tagging<td>BiGRU+CRF<td><a href="https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/lexical_analysis">Training details</a><td>Baidu in-house dataset of 22 million sentences covering a variety of scenarios
+ <tr><td rowspan="2">Named entity recognition<td>Accurate mode: WordTag<td><a href="https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/text_to_knowledge/ernie-ctm">Training details</a><td>Baidu in-house dataset; word-class taxonomy built on TermTree
+ <tr><td>Fast mode: BiGRU+CRF<td><a href="https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/lexical_analysis">Training details</a><td>Baidu in-house dataset of 22 million sentences covering a variety of scenarios
+ <tr><td>Dependency parsing<td>DDParser<td><a href="https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/dependency_parsing/ddparser">Training details</a><td>Baidu in-house dataset, the DuCTB 1.0 Chinese dependency treebank
+ <tr><td rowspan="2">Jieyu (解语) knowledge tagging<td>Word-class knowledge tagging: WordTag<td><a href="https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/text_to_knowledge/ernie-ctm">Training details</a><td>Baidu in-house dataset; word-class taxonomy built on TermTree
+ <tr><td>Noun-phrase tagging: NPTag<td><a href="https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/text_to_knowledge/nptag">Training details</a><td>Baidu in-house dataset
+ <tr><td>Text correction<td>ERNIE-CSC<td><a href="https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/text_correction/ernie-csc">Training details</a><td>Simplified Chinese SIGHAN dataset and the Chinese correction dataset generated by <a href="https://github.com/wdimmy/Automatic-Corpus-Generation/blob/master/corpus/train.sgml">Automatic Corpus Generation</a>
+ <tr><td>Text similarity<td>SimBERT<td>-<td>22 million similar-sentence pairs collected from Baidu Zhidao
+ <tr><td rowspan="2">Sentiment analysis<td>BiLSTM<td>-<td>Baidu in-house dataset
+ <tr><td>SKEP<td><a href="https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/sentiment_analysis/skep">Training details</a><td>Baidu in-house dataset
+ <tr><td>Generative question answering<td>CPM<td>-<td>Chinese data on the order of 100 GB
+ <tr><td>Poetry generation<td>CPM<td>-<td>Chinese data on the order of 100 GB
+ <tr><td>Open-domain dialogue<td>PLATO-Mini<td>-<td>Billion-scale Chinese dialogue data
+ </table>
+
+ </div></details>

## FAQ
@@ -699,6 +724,22 @@ ner = Taskflow("ner", home_path="/workspace")
</div></details>

+ <details><summary><b>Q:</b> How can I speed up Taskflow prediction?</summary><div>
+
+ **A:** Adjust `batch_size` to suit your hardware and pass inputs in batches to raise average throughput. Example:
+ ```python
+ from paddlenlp import Taskflow
+
+ # The accurate-mode model is large; tune batch_size to your machine and feed samples in batches.
+ seg_accurate = Taskflow("word_segmentation", mode="accurate", batch_size=32)
+
+ # Batched input: a list of sentences, which predicts faster than one sentence per call.
+ texts = ["热梅茶是一道以梅子为主要原料制作的茶饮", "《孤女》是2010年九州出版社出版的小说,作者是余兼羽"]
+ seg_accurate(texts)
+ ```
+ Segmenting in batches this way can substantially speed up prediction.
+
+ </div></details>
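
The effect of batching described in the answer on prediction speed can be measured rather than assumed. The following is a minimal sketch, not part of PaddleNLP: `chunked` and `sentences_per_second` are hypothetical helper names, and `predict` stands in for any callable that accepts a list of sentences, such as a Taskflow object.

```python
import time
from typing import Callable, List, Sequence


def chunked(items: Sequence[str], size: int) -> List[List[str]]:
    """Split items into consecutive lists of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def sentences_per_second(
    predict: Callable[[List[str]], object],
    texts: Sequence[str],
    size: int,
) -> float:
    """Call `predict` on `texts` in chunks of `size` and return the throughput."""
    start = time.perf_counter()
    for batch in chunked(texts, size):
        predict(batch)  # hypothetical stand-in for a list-accepting predictor
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from the clock on very fast calls.
    return len(texts) / elapsed if elapsed > 0 else float("inf")
```

Calling `sentences_per_second` once with `size=1` and once with a larger `size` on the same predictor and texts gives a rough comparison of one-sentence-per-call against batched input. The figures depend on the device and model, so the best batch size should be found by trying several values.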
<details><summary><b>Q:</b> Will support for more tasks be added in the future?</summary><div>