evaluation
The evaluation.py script can be run against the gold standard
merged data. To do so, follow these steps:
1) First, make sure you've run the download_dataset.sh bash script in
the ../dataset/ directory to obtain the four original datasets.
2) Second, create the merged data in the TSV format expected by the
evaluation.py script by running:
$ python3 make_merged_data_ready_for_evaluation.py
This will create the following TSV files, one per subtask and
split:
merged_data_subtask1_train_ready_for_evaluation.tsv
merged_data_subtask2_train_ready_for_evaluation.tsv
merged_data_subtask1_test_ready_for_evaluation.tsv
merged_data_subtask2_test_ready_for_evaluation.tsv
Notice that:
- subtask1 corresponds to NER data.
- subtask2 corresponds to NEL data.
3) Finally, evaluate NER and NEL by comparing
- the four original gold standard datasets with
- the merged gold standard dataset,
using one of the following commands:
$ python3 evaluation.py train merged_data_subtask1_train_ready_for_evaluation.tsv
$ python3 evaluation.py train merged_data_subtask2_train_ready_for_evaluation.tsv
$ python3 evaluation.py test merged_data_subtask1_test_ready_for_evaluation.tsv
$ python3 evaluation.py test merged_data_subtask2_test_ready_for_evaluation.tsv
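The four commands above differ only in the split and the subtask, so they can be generated in a loop. The following is a minimal sketch of such a wrapper, not part of the repository: the names build_commands and run_all and the --run flag are hypothetical, and the sketch assumes it is started from the directory that contains evaluation.py and the four TSV files.

```python
import subprocess
import sys

SPLITS = ("train", "test")
SUBTASKS = ("subtask1", "subtask2")  # subtask1 = NER, subtask2 = NEL


def build_commands(python="python3"):
    """Return the four evaluation.py invocations as argument lists.

    Hypothetical helper: it only reproduces the command lines shown in
    this README and does not inspect evaluation.py itself.
    """
    commands = []
    for split in SPLITS:
        for subtask in SUBTASKS:
            tsv = f"merged_data_{subtask}_{split}_ready_for_evaluation.tsv"
            commands.append([python, "evaluation.py", split, tsv])
    return commands


def run_all(dry_run=True):
    """Print each command; execute it as well unless dry_run is set."""
    for cmd in build_commands():
        print("$", " ".join(cmd))
        if dry_run:
            continue
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # Stop at the first failing evaluation run.
            sys.exit(result.returncode)


if __name__ == "__main__":
    # Dry run by default; pass --run (hypothetical flag) to execute.
    run_all(dry_run="--run" not in sys.argv[1:])
```

Run without arguments, the sketch only prints the four command lines; with --run it executes them in order and stops at the first failure.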
Note that the evaluation result is expected to be a micro-averaged
F1-score of 1.0. This also serves as a sanity check that the merged
data is equivalent to the four original datasets.
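To illustrate why identical data yields exactly 1.0, here is a minimal sketch of a micro-averaged F1 computation. It is not the code in evaluation.py: the function name micro_f1 and the annotation tuples (document id, start, end, label) are assumptions made for this example. Micro-averaging pools true positives, false positives and false negatives over all datasets before computing precision and recall.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over several datasets.

    gold, predicted: dict mapping a dataset name to a set of annotations.
    The annotation format is hypothetical; any hashable tuple works.
    """
    tp = fp = fn = 0
    for name in set(gold) | set(predicted):
        g = gold.get(name, set())
        p = predicted.get(name, set())
        tp += len(g & p)   # annotations present in both
        fp += len(p - g)   # predicted but not in gold
        fn += len(g - p)   # in gold but not predicted
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up annotations: (document id, start offset, end offset, label).
gold = {
    "dataset_a": {("doc1", 0, 5, "PER"), ("doc1", 10, 16, "LOC")},
    "dataset_b": {("doc7", 3, 9, "ORG")},
}

# Identical gold and predicted data give no false positives or negatives.
print(micro_f1(gold, gold))
```

When both sides contain the same annotations, precision and recall are both 1, so the score is 1.0. Any annotation lost or altered during merging would push it below 1.0.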
---
Abbreviations:
NER: named entity recognition
NEL: named entity linking
TSV: tab-separated values