
saber.load_dataset() should be able to pull from pubannotation. #146

@JohnGiorgi

Description


saber.load_dataset() should be able to pull datasets from pubannotation.org given a project's URL.

E.g.

saber.load_dataset('http://pubannotation.org/projects/AGAC_training/annotations.tgz')

should download the dataset to ~/saber/datasets, convert it to the CoNLL 2003 format, and load it into a Dataset object. If the same URL is supplied again later, load_dataset() should load the cached copy in ~/saber/datasets instead of downloading it again.
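A minimal sketch of the download, cache, and convert steps, assuming the archive behind such a URL unpacks into PubAnnotation-style JSON files (each with a `text` field and a `denotations` list of `span`/`obj` entries). The names `fetch_dataset`, `cache_name`, `to_conll`, and `convert_dir` are hypothetical helpers, not part of Saber's API, and the output is a simplified two-column token/BIO-tag layout rather than the full four-column CoNLL 2003 format.

```python
import hashlib
import json
import re
import shutil
import tarfile
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# Cache location taken from the issue; all helper names below are hypothetical.
DEFAULT_CACHE = Path.home() / "saber" / "datasets"

# Naive tokenizer (an assumption): runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def cache_name(url: str) -> str:
    """Use the project name from .../projects/<name>/..., else a short URL hash."""
    parts = urlparse(url).path.strip("/").split("/")
    if "projects" in parts:
        idx = parts.index("projects")
        if idx + 1 < len(parts):
            return parts[idx + 1]
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]


def fetch_dataset(url: str, cache_dir: Path = DEFAULT_CACHE) -> Path:
    """Download and unpack a .tgz once; later calls return the cached directory."""
    target = Path(cache_dir) / cache_name(url)
    if target.is_dir():
        return target  # cache hit: no network access
    target.parent.mkdir(parents=True, exist_ok=True)
    # Work in a temp dir beside the cache so a failed download leaves no entry.
    with tempfile.TemporaryDirectory(dir=target.parent) as tmp:
        archive = Path(tmp) / "annotations.tgz"
        with urllib.request.urlopen(url) as resp, open(archive, "wb") as out:
            shutil.copyfileobj(resp, out)
        extracted = Path(tmp) / "extracted"
        extracted.mkdir()
        with tarfile.open(archive, "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                tar.extractall(extracted, filter="data")  # reject unsafe member paths
            else:
                tar.extractall(extracted)
        extracted.rename(target)  # same filesystem, so this is a cheap move
    return target


def to_conll(doc: dict) -> str:
    """Turn one PubAnnotation-style document into token<TAB>BIO-tag lines.

    Tokens not fully inside a denotation span are tagged O; with overlapping
    spans, the first span in (begin, end) order wins.
    """
    spans = sorted(
        (d["span"]["begin"], d["span"]["end"], d["obj"])
        for d in doc.get("denotations", [])
    )
    lines = []
    for match in TOKEN_RE.finditer(doc["text"]):
        start, end = match.span()
        tag = "O"
        for begin, stop, label in spans:
            if begin <= start and end <= stop:
                tag = ("B-" if start == begin else "I-") + label
                break
        lines.append(f"{match.group()}\t{tag}")
    return "\n".join(lines) + "\n"


def convert_dir(src: Path) -> str:
    """Convert every *.json file under src, with a blank line between documents."""
    docs = sorted(Path(src).rglob("*.json"))
    return "\n".join(to_conll(json.loads(p.read_text(encoding="utf-8"))) for p in docs)
```

For example, `convert_dir(fetch_dataset("http://pubannotation.org/projects/AGAC_training/annotations.tgz"))` would download once and reuse ~/saber/datasets/AGAC_training afterwards. Handing the text to Saber's Dataset object is left out because its constructor is not described here. A cache hit is just the existence of the project directory, which is why extraction happens in a temporary directory first: an interrupted download never looks like a cached dataset. Sentence splitting is also omitted; each document becomes one block.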

Since pubannotation.org hosts most of the popular BioNLP datasets, this would nearly eliminate the need to maintain datasets locally.
