
Commit 714072d

committed
Add a short explanation of how to download and process the ANG NER dataset posted here: https://github.com/dmetola/Old_English-OEDT/tree/main stanfordnlp/stanza-train#19 (comment)
1 parent 8c69363 commit 714072d

File tree: 1 file changed, +14 -0 lines


stanza/utils/datasets/ner/prepare_ner_dataset.py

Lines changed: 14 additions & 0 deletions
@@ -469,6 +469,15 @@
     https://github.com/UniversalDependencies/UD_Hebrew-IAHLTknesset
     - change to the dev branch in that repo
     python3 stanza/utils/datasets/ner/prepare_ner_dataset.py he_iahlt
+
+ang_ewt is an Old English dataset available here:
+  https://github.com/dmetola/Old_English-OEDT/tree/main
+  More information, including a citation, will be added there
+  - install in $NERBASE:
+    mkdir $NERBASE/ang
+    cd $NERBASE/ang
+    git clone [email protected]:dmetola/Old_English-OEDT.git
+  - python3 stanza/utils/datasets/ner/prepare_ner_dataset.py ang_ewt
 """
 
 import glob
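The shell steps in the docstring above (mkdir, cd, git clone) can also be scripted. A minimal Python sketch; the `download_oedt` helper is illustrative and not part of stanza, and it clones over https instead of the ssh URL shown in the docstring so no key setup is needed:

```python
import os
import subprocess

def download_oedt(nerbase):
    """Fetch the Old_English-OEDT repo into $NERBASE/ang,
    mirroring the docstring steps (mkdir, cd, git clone)."""
    ang_dir = os.path.join(nerbase, "ang")
    os.makedirs(ang_dir, exist_ok=True)          # mkdir $NERBASE/ang
    repo_dir = os.path.join(ang_dir, "Old_English-OEDT")
    if not os.path.exists(repo_dir):
        # clone only if the checkout is not already present
        subprocess.run(
            ["git", "clone", "https://github.com/dmetola/Old_English-OEDT.git"],
            cwd=ang_dir, check=True)
    return repo_dir
```

After this, `python3 stanza/utils/datasets/ner/prepare_ner_dataset.py ang_ewt` picks the checkout up from `$NERBASE/ang/Old_English-OEDT`.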
@@ -1471,8 +1480,13 @@ def process_he_iahlt(paths, short_name):
     base_output_path = paths["NER_DATA_DIR"]
     convert_he_iahlt.convert_iahlt(udbase, base_output_path, "he_iahlt")
 
+def process_ang_ewt(paths, short_name):
+    assert short_name == 'ang_ewt'
+    base_input_path = os.path.join(paths["NERBASE"], "ang", "Old_English-OEDT")
+    convert_bio_to_json(base_input_path, paths["NER_DATA_DIR"], short_name)
 
 DATASET_MAPPING = {
+    "ang_ewt": process_ang_ewt,
     "ar_aqmar": process_ar_aqmar,
     "bn_daffodil": process_bn_daffodil,
     "da_ddt": process_da_ddt,

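The new `process_ang_ewt` handler delegates to stanza's existing `convert_bio_to_json` helper. As a rough sketch of the kind of transformation that step performs — assuming tab-separated `token<TAB>tag` BIO lines with blank lines between sentences, and a list-of-sentences output where each token is a `{"text": ..., "ner": ...}` dict; function names here are illustrative, not stanza's actual API:

```python
import json

def bio_to_json(bio_lines):
    """Group BIO-tagged lines into sentences of token dicts."""
    sentences, current = [], []
    for line in bio_lines:
        line = line.strip()
        if not line:
            # blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.split("\t")
        current.append({"text": token, "ner": tag})
    if current:
        sentences.append(current)
    return sentences

def write_json(sentences, path):
    """Serialize the converted sentences, keeping non-ASCII
    Old English characters readable in the output file."""
    with open(path, "w", encoding="utf-8") as fout:
        json.dump(sentences, fout, ensure_ascii=False, indent=2)
```

This mirrors the dataset flow: BIO files from the cloned repo in, JSON files in `NER_DATA_DIR` out.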