
Commit 40f45e2

Merge branch 'develop' into ner_spo_debug
2 parents 5126f82 + 7c47b62 commit 40f45e2

126 files changed: +14461 −1813 lines


README.md

Lines changed: 8 additions & 17 deletions
@@ -23,13 +23,19 @@ PaddleNLP is the natural language processing development library for PaddlePaddle, featuring **easy-to-use text-domain APIs**
 
 - **Easy-to-Use Text-Domain APIs**
 - Provides rich industrial-grade built-in task capabilities through [Taskflow](./docs/model_zoo/taskflow.md) plus full-pipeline text-domain APIs: the [Dataset API](https://paddlenlp.readthedocs.io/zh/latest/data_prepare/dataset_list.html) for loading a wide range of Chinese datasets, the [Data API](https://paddlenlp.readthedocs.io/zh/latest/source/paddlenlp.data.html) for flexible and efficient data preprocessing, and the [Transformers API](./docs/model_zoo/transformers.rst) with 100+ pretrained models, greatly improving the efficiency of NLP task modeling.
-
 - **Application Examples for Multiple Scenarios**
 - Covers NLP [application examples](#多场景的应用示例) ranging from academic to industrial use, spanning core NLP techniques, NLP system applications, and related extensions. All examples are built on the new API system of PaddlePaddle core framework 2.0, offering developers best practices for text processing with PaddlePaddle.
-
 - **High-Performance Distributed Training**
 - Builds on the leading automatic mixed precision strategy of the PaddlePaddle core framework together with the distributed Fleet API to support a 4D hybrid parallelism strategy, enabling efficient training of large-scale pretrained models.
 
+## Community
+
+Scan the QR code below with WeChat to join the official discussion group and exchange ideas with developers from all industries. We look forward to your joining ⬇️
+
+<div align="center">
+<img src="https://user-images.githubusercontent.com/11793384/157790710-cfad5c8a-0edd-49d7-9711-eb1c683a687c.png" width="188" height="188" />
+</div>
+
 ## Installation
 
 ### Environment Requirements
@@ -306,21 +312,6 @@ PaddleNLP provides multi-granularity, multi-scenario NLP application examples for the dynamic graph mode
 
 For more tutorials, see [PaddleNLP on AI Studio](https://aistudio.baidu.com/aistudio/personalcenter/thirdview/574995)
 
-## Community Contribution and Technical Exchange
-
-### Special Interest Groups
-
-- You are welcome to join the PaddleNLP SIG community and contribute high-quality model implementations, public datasets, tutorials, case studies, and more.
-
-### WeChat
-
-- Join the PaddleNLP technical exchange group now and discuss NLP together! ⬇️
-
-<div align="center">
-<img src="https://user-images.githubusercontent.com/11793384/156540272-353d3d80-f2ec-410d-b863-b51f2d156a72.jpg" width="230" height="300" />
-</div>
-
-
 
 ## ChangeLog
 
README_en.md

Lines changed: 18 additions & 17 deletions
@@ -32,6 +32,24 @@ English | [简体中文](./README.md)
 * **High Performance Distributed Training**
 - We provide an industrial level training pipeline for super large-scale Transformer model based on **Auto Mixed Precision** and Fleet distributed training API by PaddlePaddle, which can support customized model pre-training efficiently.
 
+## Community
+
+### Special Interest Group (SIG)
+
+Welcome to join [PaddleNLP SIG](https://iwenjuan.baidu.com/?code=bkypg8) for contributions, e.g. datasets, models, and toolkit components.
+
+### Slack
+
+To connect with other users and contributors, welcome to join our [Slack channel](https://paddlenlp.slack.com/).
+
+### WeChat
+
+Scan the QR code below with WeChat ⬇️ to join the official technical exchange group. We look forward to your participation.
+
+<div align="center">
+<img src="https://user-images.githubusercontent.com/11793384/157790710-cfad5c8a-0edd-49d7-9711-eb1c683a687c.png" width="188" height="188" />
+</div>
+
 ## Installation
 
 ### Prerequisites
@@ -204,23 +222,6 @@ Please refer to our official AI Studio account for more interactive tutorials: [
 
 * [Use TCN Model to predict COVID-19 confirmed cases](https://aistudio.baidu.com/aistudio/projectdetail/1290873)
 
-## Community
-
-### Special Interest Group (SIG)
-
-Welcome to join [PaddleNLP SIG](https://iwenjuan.baidu.com/?code=bkypg8) for contribution, eg. Dataset, Models and Toolkit.
-
-### Slack
-To connect with other users and contributors, welcome to join our [Slack channel](https://paddlenlp.slack.com/).
-
-### WeChat
-Scan the QR code below with your Wechat⬇️. You can access to official technical exchange group. Look forward to your participation.
-
-<div align="center">
-<img src="https://user-images.githubusercontent.com/11793384/156540669-c9453a1a-3ed1-4434-a68e-73b9e2f5f771.jpg" width="210" height="200" />
-</div>
-
-
 ## ChangeLog
 
 For more details about our release, please refer to [ChangeLog](./docs/changelog.md)

community/nosaydomore/deepset_roberta_base_squad2/README.md renamed to community/nosaydomore/deepset-roberta-base-squad2/README.md

Lines changed: 2 additions & 2 deletions
@@ -50,7 +50,7 @@ def decode(start, end, topk, max_answer_len, undesired_tokens):
 
     return starts, ends, scores
 
-tokenizer = RobertaTokenizer.from_pretrained('deepset_roberta_base_squad2')
+tokenizer = RobertaTokenizer.from_pretrained('nosaydomore/deepset-roberta-base-squad2')
 questions = ['Where do I live?']
 contexts = ['My name is Sarah and I live in London']
 
@@ -77,7 +77,7 @@ offset_mapping = token[0]['offset_mapping']
 
 input_ids = paddle.to_tensor(input_ids, dtype='int64').unsqueeze(0)
 
-model = RobertaForQuestionAnswering.from_pretrained(path)
+model = RobertaForQuestionAnswering.from_pretrained("nosaydomore/deepset-roberta-base-squad2")
 model.eval()
 start, end = model(input_ids=input_ids)
 start_ = start[0].numpy()
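For quick reference, the renamed checkpoint can be exercised end to end roughly as follows. This is a condensed sketch rather than the README's exact snippet: it replaces the top-k `decode()` helper with a greedy argmax, and it assumes the hyphenated community name resolves through `from_pretrained` via the community config updated below, with `paddlenlp.transformers` as the import path and `(start_logits, end_logits)` as the model output (consistent with the diff above).

```python
# Hedged sketch only: assumes the renamed community weights
# "nosaydomore/deepset-roberta-base-squad2" download cleanly via from_pretrained.
import paddle
from paddlenlp.transformers import RobertaForQuestionAnswering, RobertaTokenizer

name = "nosaydomore/deepset-roberta-base-squad2"
tokenizer = RobertaTokenizer.from_pretrained(name)
model = RobertaForQuestionAnswering.from_pretrained(name)
model.eval()

question = "Where do I live?"
context = "My name is Sarah and I live in London"
encoded = tokenizer(question, context)  # dict containing 'input_ids'
input_ids = paddle.to_tensor(encoded["input_ids"], dtype="int64").unsqueeze(0)

with paddle.no_grad():
    start_logits, end_logits = model(input_ids=input_ids)

# Greedy span selection instead of the README's top-k decode().
start = int(paddle.argmax(start_logits[0]))
end = int(paddle.argmax(end_logits[0]))
answer_tokens = tokenizer.convert_ids_to_tokens(encoded["input_ids"][start:end + 1])
print(answer_tokens)
```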
Lines changed: 5 additions & 5 deletions
@@ -1,7 +1,7 @@
 {
-    "model_config_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/deepset_roberta_base_squad2/model_config.json",
-    "model_state": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/deepset_roberta_base_squad2/model_state.pdparams",
-    "tokenizer_config_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/deepset_roberta_base_squad2/tokenizer_config.json",
-    "vocab_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/deepset_roberta_base_squad2/vocab.json",
-    "merges_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/deepset_roberta_base_squad2/merges.txt"
+    "model_config_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/deepset-roberta-base-squad2/model_config.json",
+    "model_state": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/deepset-roberta-base-squad2/model_state.pdparams",
+    "tokenizer_config_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/deepset-roberta-base-squad2/tokenizer_config.json",
+    "vocab_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/deepset-roberta-base-squad2/vocab.json",
+    "merges_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/deepset-roberta-base-squad2/merges.txt"
 }
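Since the rename only swaps underscores for hyphens in the hosted directory segment, the updated URLs can be sanity-checked directly. The helper below is hypothetical and not part of the commit; it assumes the bj.bcebos.com objects are publicly reachable and that the `requests` package is installed.

```python
# Hypothetical sanity check (not part of this commit): verify that each hosted
# file listed in the community config above is reachable at its renamed path.
import requests

BASE = "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/deepset-roberta-base-squad2"
renamed_files = [
    "model_config.json",
    "model_state.pdparams",
    "tokenizer_config.json",
    "vocab.json",
    "merges.txt",
]

for filename in renamed_files:
    resp = requests.head(f"{BASE}/{filename}", allow_redirects=True, timeout=10)
    print(filename, resp.status_code)
```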

community/nosaydomore/roberta_en_base/README.md renamed to community/nosaydomore/roberta-en-base/README.md

Lines changed: 3 additions & 3 deletions
@@ -1,4 +1,4 @@
-## roberta-base
+## nosaydomore/roberta-en-base
 Weight source:
 https://huggingface.co/roberta-base
 
@@ -13,8 +13,8 @@ import paddle
 import os
 import numpy as np
 
-model = RobertaForMaskedLM.from_pretrained('roberta-base')
-tokenizer = RobertaBPETokenizer.from_pretrained('roberta-base')
+model = RobertaForMaskedLM.from_pretrained('nosaydomore/roberta-en-base')
+tokenizer = RobertaBPETokenizer.from_pretrained('nosaydomore/roberta-en-base')
 text = ["The man worked as a", "."] #"The man worked as a <mask>."
 tokens_list = []
 for i in range(2):
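The hunk cuts off where the README starts assembling its `<mask>` input, so here is a compressed sketch of the same fill-mask flow against the renamed weights. It is not the README's exact code: it assumes `paddlenlp.transformers` import paths, that the tokenizer adds its begin/end special tokens by default and exposes `mask_token` / `convert_tokens_to_ids`, and that `RobertaForMaskedLM` returns per-token prediction logits.

```python
# Hedged sketch: a condensed version of the README's fill-mask demo, assuming the
# renamed community weights load via from_pretrained like any other model name.
import paddle
from paddlenlp.transformers import RobertaBPETokenizer, RobertaForMaskedLM

name = "nosaydomore/roberta-en-base"
tokenizer = RobertaBPETokenizer.from_pretrained(name)
model = RobertaForMaskedLM.from_pretrained(name)
model.eval()

# Splice a <mask> id between the two text pieces to form "The man worked as a <mask>."
left = tokenizer("The man worked as a")["input_ids"][:-1]  # drop trailing end-of-sequence token
right = tokenizer(".")["input_ids"][1:]                     # drop leading begin-of-sequence token
mask_id = tokenizer.convert_tokens_to_ids(tokenizer.mask_token)
input_ids = paddle.to_tensor([left + [mask_id] + right], dtype="int64")

with paddle.no_grad():
    logits = model(input_ids)  # assumed shape: [batch, seq_len, vocab_size]

mask_pos = len(left)  # index of the spliced <mask> token
predicted_id = int(paddle.argmax(logits[0, mask_pos]))
print(tokenizer.convert_ids_to_tokens([predicted_id]))
```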
Lines changed: 5 additions & 5 deletions
@@ -1,7 +1,7 @@
 {
-    "model_config_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/roberta_en_base/model_config.json",
-    "model_state": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/roberta_en_base/model_state.pdparams",
-    "tokenizer_config_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/roberta_en_base/tokenizer_config.json",
-    "vocab_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/roberta_en_base/vocab.json",
-    "merges_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/roberta_en_base/merges.txt"
+    "model_config_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/roberta-en-base/model_config.json",
+    "model_state": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/roberta-en-base/model_state.pdparams",
+    "tokenizer_config_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/roberta-en-base/tokenizer_config.json",
+    "vocab_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/roberta-en-base/vocab.json",
+    "merges_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/roberta-en-base/merges.txt"
 }

community/nosaydomore/roberta_en_large/README.md renamed to community/nosaydomore/roberta-en-large/README.md

Lines changed: 3 additions & 3 deletions
@@ -1,4 +1,4 @@
-## roberta-large
+## nosaydomore/roberta-en-large
 Weight source:
 https://huggingface.co/roberta-large
 
@@ -13,8 +13,8 @@ import paddle
 import os
 import numpy as np
 
-model = RobertaForMaskedLM.from_pretrained('roberta-large')
-tokenizer = RobertaBPETokenizer.from_pretrained('roberta-large')
+model = RobertaForMaskedLM.from_pretrained('nosaydomore/roberta-en-large')
+tokenizer = RobertaBPETokenizer.from_pretrained('nosaydomore/roberta-en-large')
 text = ["The man worked as a", "."] #"The man worked as a <mask>."
 tokens_list = []
 for i in range(2):
Lines changed: 5 additions & 5 deletions
@@ -1,7 +1,7 @@
 {
-    "model_config_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/roberta_en_large/model_config.json",
-    "model_state": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/roberta_en_large/model_state.pdparams",
-    "tokenizer_config_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/roberta_en_large/tokenizer_config.json",
-    "vocab_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/roberta_en_large/vocab.json",
-    "merges_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/roberta_en_large/merges.txt"
+    "model_config_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/roberta-en-large/model_config.json",
+    "model_state": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/roberta-en-large/model_state.pdparams",
+    "tokenizer_config_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/roberta-en-large/tokenizer_config.json",
+    "vocab_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/roberta-en-large/vocab.json",
+    "merges_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/roberta-en-large/merges.txt"
 }

community/nosaydomore/sshleifer_tiny_distilroberta_base/README.md renamed to community/nosaydomore/sshleifer-tiny-distilroberta-base/README.md

Lines changed: 3 additions & 3 deletions
@@ -1,4 +1,4 @@
-## sshleifer/tiny-distilroberta-base
+## nosaydomore/sshleifer-tiny-distilroberta-base
 Weight source:
 https://huggingface.co/sshleifer/tiny-distilroberta-base
 
@@ -13,8 +13,8 @@ import paddle
 import os
 import numpy as np
 
-model = RobertaForMaskedLM.from_pretrained('sshleifer/tiny-distilroberta-base')
-tokenizer = RobertaBPETokenizer.from_pretrained('sshleifer/tiny-distilroberta-base')
+model = RobertaForMaskedLM.from_pretrained('nosaydomore/sshleifei-tiny-distilroberta-base')
+tokenizer = RobertaBPETokenizer.from_pretrained('nosaydomore/sshleifei-tiny-distilroberta-base')
 text = ["The man worked as a", "."] #"The man worked as a <mask>."
 tokens_list = []
 for i in range(2):
Lines changed: 5 additions & 5 deletions
@@ -1,7 +1,7 @@
 {
-    "model_config_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/sshleifer_tiny_distilroberta_base/model_config.json",
-    "model_state": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/sshleifer_tiny_distilroberta_base/model_state.pdparams",
-    "tokenizer_config_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/sshleifer_tiny_distilroberta_base/tokenizer_config.json",
-    "vocab_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/sshleifer_tiny_distilroberta_base/vocab.json",
-    "merges_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/sshleifer_tiny_distilroberta_base/merges.txt"
+    "model_config_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/sshleifer-tiny-distilroberta-base/model_config.json",
+    "model_state": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/sshleifer-tiny-distilroberta-base/model_state.pdparams",
+    "tokenizer_config_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/sshleifer-tiny-distilroberta-base/tokenizer_config.json",
+    "vocab_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/sshleifer-tiny-distilroberta-base/vocab.json",
+    "merges_file": "https://bj.bcebos.com/paddlenlp/models/transformers/community/nosaydomore/sshleifer-tiny-distilroberta-base/merges.txt"
 }
