Pipeline for benchmarking NEL (named entity linking) approaches, including candidate generation and entity disambiguation.
The `project.yml` defines the data assets required by the project, as well as the available commands and workflows. For details, see the Weasel documentation.
The following commands are defined by the project. They can be executed using `weasel run [name]`. Commands are only re-run if their inputs have changed. See the examples after the table below.
| Command | Description |
| --- | --- |
| `download_mewsli9` | Download the Mewsli-9 dataset. |
| `download_model` | Download a model with pretrained vectors and NER component. |
| `wikid_clone` | Clone wikid to prepare the Wiki database and KnowledgeBase. |
| `preprocess` | Preprocess and clean corpus data. |
| `wikid_download_assets` | Download Wikipedia dumps. This can take a long time if you're not using the filtered dumps! |
| `wikid_parse` | Parse Wikipedia dumps. This can take a long time if you're not using the filtered dumps! |
| `wikid_create_kb` | Create the knowledge base and write it to file. |
| `parse_corpus` | Parse the corpus to generate the entity and annotation lookups used for corpora compilation. |
| `compile_corpora` | Compile corpora, separated into train/dev/test sets. |
| `train` | Train a new Entity Linking component. Pass `--vars.gpu_id GPU_ID` to train with a GPU. Training with some datasets may take a long time! |
| `evaluate` | Evaluate on the test set. |
| `compare_evaluations` | Compare the available set of evaluation runs. |
| `delete_wiki_db` | Delete the SQLite database generated in step `wikid_parse` with data parsed from the Wikidata and Wikipedia dumps. |
| `clean` | Remove intermediate files for the specified dataset and language (excluding Wiki resources and the database). |
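For example (the GPU id is illustrative):

```bash
# Run a single step by name:
weasel run download_mewsli9

# Override project variables on the command line, e.g. to train on GPU 0:
weasel run train --vars.gpu_id 0
```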
The following workflows are defined by the project. They can be executed using `weasel run [name]` and will run the specified commands in order. Commands are only re-run if their inputs have changed. See the example after the table below.
| Workflow | Steps |
| --- | --- |
| `all` | `download_mewsli9` → `download_model` → `wikid_clone` → `preprocess` → `wikid_download_assets` → `wikid_parse` → `wikid_create_kb` → `parse_corpus` → `compile_corpora` → `train` → `evaluate` → `compare_evaluations` |
| `training` | `train` → `evaluate` |
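For example:

```bash
# Run the full pipeline, from data download to evaluation comparison:
weasel run all

# Re-run only training and evaluation (steps with unchanged inputs are skipped):
weasel run training
```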
Notes:

- Warning: Parts of this project are currently not platform-agnostic and run only on Linux. Making the entire project work cross-platform is on our todo list.
- `svn` is required for downloading the Mewsli-9 dataset.
- The project configuration specifies a complete dump of the English Wikidata and Wikipedia as well as filtered versions. By default, only the filtered versions - containing only articles and entities mentioning "New York" or "Boston" - are downloaded and processed. If you'd like to work with the complete dumps (see the sketch after this list), make sure to...
  - ...fetch assets with extra (`spacy project assets --extra`).
  - ...set `vars.use_filtered_dumps: ""` in `project.yml`.
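Concretely, switching to the complete dumps looks like this (the YAML lines are shown as comments for orientation; adjust them in your local `project.yml`):

```bash
# Fetch all assets, including those marked as "extra" (the complete dumps):
spacy project assets --extra

# Then disable dump filtering in project.yml before the download/parsing steps:
#   vars:
#     use_filtered_dumps: ""
```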