This repository contains the implementation of the system described in the paper "Low-Resource Name Tagging Learned with Weakly Labeled Data".
python = 2.7
torch = 0.4.1
We collect weakly labeled data from wiki in Mongolian (mn). We select the highest-quality sentences as the validation and test sets.
files/ : input files; unzip files.zip to obtain them.
train.txt local_train.txt nofuzzy_train.txt valid.txt test.txt : examples for training, validation and test.
entity_dict word_idf_dict mnalphabet : pre-generated files for training.
word_embedding : word embedding file.
code/ : implementation of our model.
log/ : log files of training and evaluation.
cd code/
python main.py
We output six metrics: precision, precision type, recall, recall type, f1 and f1 type. The plain metrics (precision, recall, f1) consider only the boundary of each prediction, while the "type" variants consider both boundary and type.
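The distinction between the two metric families can be illustrated with a minimal sketch. It is not the evaluation code in code/; it assumes mentions are represented as (start, end, type) tuples and are scored by exact match, and the function name span_prf is hypothetical.

```python
def span_prf(gold, pred):
    """Score predicted mentions against gold mentions.

    Assumption: each mention is a (start, end, type) tuple and a
    prediction counts as correct only on an exact match.
    """
    gold, pred = set(gold), set(pred)

    def prf(g, p):
        # True positives are the items present in both sets.
        tp = len(g & p)
        precision = tp / float(len(p)) if p else 0.0
        recall = tp / float(len(g)) if g else 0.0
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        return precision, recall, f1

    # Boundary-only: drop the type before comparing.
    boundary = prf({(s, e) for s, e, _ in gold},
                   {(s, e) for s, e, _ in pred})
    # Boundary and type: compare the full tuples.
    typed = prf(gold, pred)
    return {"boundary": boundary, "typed": typed}


gold = [(0, 2, "PER"), (5, 6, "LOC")]
pred = [(0, 2, "PER"), (5, 6, "ORG"), (8, 9, "LOC")]
scores = span_prf(gold, pred)
```

In this example the second prediction has the right boundary but the wrong type, so it counts toward the boundary-only scores and not toward the "type" scores.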
This research is supported by the National Research Foundation, Singapore under its International Research Centres in Singapore Funding Initiative. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore.