dureader checklist 2021

百度2021年语言与智能技术竞赛机器阅读理解

—— 基于transformers的pytorch baseline

比赛介绍

Link: https://aistudio.baidu.com/aistudio/competition/detail/66?isFromLuge=true

中国计算机学会、中国中文信息学会和百度公司已经于2018至2020年间连续联合举办了三届机器阅读理解评测，极大地推动了中文机器阅读理解技术的发展。然而，当前的机器阅读理解数据集大多都只采用单一的指标来评测模型的好坏，缺乏对模型语言理解能力的细粒度、多维度评测，导致模型的具体缺陷很难被发现和改进。为了解决这个问题，我们建立了细粒度的、多维度的评测数据集，从词汇理解、短语理解、语义角色理解、逻辑推理等多个维度检测模型的不足之处，从而推动阅读理解评测进入“精细化“时代。该数据集中的样本均来自于实际的应用场景，难度大，考察点丰富，覆盖了真实应用中诸多难以解决的问题。

China Computer Federation(CCF), Chinese Information Processing Society of China(CIPS), and Baidu Inc. have jointly held the shared task of machine reading comprehension (MRC) from 2018 to 2020, which greatly promoted the development of Chinese machine reading comprehension technology. However, most existing MRC datasets focus on the overall performance of an MRC model, lacking systematic and fine-grained evaluation. This leads to a consequence that the vulnerabilities of the current MRC models are not well-studied. Therefore, we hold the shared task of machine reading comprehension in " 2021 Language and Intelligence Challenge ", that focus on challenging the MRC models from multiple aspects, including understanding of vocabulary, phrase, semantic role, reasoning and so on.

依赖库

transformers==4.5.1
torch==1.8.1+cu111

示例

train以及evaluate过程都包含在main.py里面

train

from model_pytorch import Model
from training_args import training_args

train_dev_data_dir = "./dataset"

model = Model()

train_examples = model.processor.get_train_examples(train_dev_data_dir)
dev_examples = model.processor.get_dev_examples(train_dev_data_dir)

train_features = model.prepare_training_features(train_examples)
dev_features = model.prepare_validation_features(dev_examples)

dev_data = {"examples": dev_examples, "features": dev_features}

model.train(train_features, dev_data, args=training_args)

evaluate

dev_data_dir = "./test1"

model = Model(path='./outputs')

dev_examples = model.processor.get_dev_examples(dev_data_dir)

dev_features = model.prepare_validation_features(dev_examples)

dev_data = {"examples": dev_examples, "features": dev_features}

model.evaluate(dev_data, args=training_args, prefix='test1')

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
early_stopping.py		early_stopping.py
evaluate.py		evaluate.py
main.py		main.py
model_pytorch.py		model_pytorch.py
progress_bar.py		progress_bar.py
squad.py		squad.py
training_args.py		training_args.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

dureader checklist 2021

百度2021年语言与智能技术竞赛机器阅读理解

比赛介绍

依赖库

示例

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

dureader checklist 2021

百度2021年语言与智能技术竞赛机器阅读理解

比赛介绍

依赖库

示例

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages