
Boundary Smoothing for Named Entity Recognition

This page describes the materials and code for the paper "Boundary Smoothing for Named Entity Recognition".

Setup

Installation

Set up an environment and install the dependencies and eznlp according to the README.

Download and process datasets

Download pretrained language models

Download the pretrained language models from the Hugging Face Hub and save them to assets/transformers. Note that Hugging Face model repositories store their weight files via Git LFS, so git-lfs must be installed before cloning.

git clone https://huggingface.co/google-bert/bert-base-uncased  assets/transformers/bert-base-uncased
git clone https://huggingface.co/google-bert/bert-base-cased    assets/transformers/bert-base-cased
git clone https://huggingface.co/google-bert/bert-large-uncased assets/transformers/bert-large-uncased
git clone https://huggingface.co/google-bert/bert-large-cased   assets/transformers/bert-large-cased
git clone https://huggingface.co/FacebookAI/roberta-base        assets/transformers/roberta-base
git clone https://huggingface.co/FacebookAI/roberta-large       assets/transformers/roberta-large
git clone https://huggingface.co/hfl/chinese-bert-wwm-ext       assets/transformers/hfl/chinese-bert-wwm-ext
git clone https://huggingface.co/hfl/chinese-macbert-base       assets/transformers/hfl/chinese-macbert-base
git clone https://huggingface.co/hfl/chinese-macbert-large      assets/transformers/hfl/chinese-macbert-large
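After cloning, a quick sanity check can confirm that every expected model directory is in place. This is a convenience sketch, not part of the repository; the directory list simply mirrors the clone targets above.

```python
from pathlib import Path

# Expected model directories under assets/transformers, matching the
# destinations of the git clone commands above.
EXPECTED = [
    "bert-base-uncased", "bert-base-cased",
    "bert-large-uncased", "bert-large-cased",
    "roberta-base", "roberta-large",
    "hfl/chinese-bert-wwm-ext",
    "hfl/chinese-macbert-base",
    "hfl/chinese-macbert-large",
]

def missing_models(root="assets/transformers"):
    """Return the expected model directories that do not exist under root."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).is_dir()]

if __name__ == "__main__":
    for name in missing_models():
        print(f"missing: assets/transformers/{name}")
```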

Running the Code

For English datasets:

$ python scripts/entity_recognition.py @scripts/options/with_bert.opt \
    --num_epochs 50 \
    --batch_size 48 \
    --num_grad_acc_steps 1 \
    --dataset {conll2003 | conll2012 | ace2004 | ace2005} \
    --ck_decoder boundary_selection \
    --sb_epsilon {0.0 | 0.1 | 0.2 | 0.3} \
    --sb_size {1 | 2} \
    --bert_arch {RoBERTa_base | RoBERTa_large | BERT_base | BERT_large} \
    --use_interm2 \
    [options]

For Chinese datasets:

$ python scripts/entity_recognition.py @scripts/options/with_bert.opt \
    --num_epochs 50 \
    --batch_size 48 \
    --num_grad_acc_steps 1 \
    --dataset {ontonotesv4_zh | SIGHAN2006 | WeiboNER | ResumeNER} \
    --ck_decoder boundary_selection \
    --sb_epsilon {0.0 | 0.1 | 0.2 | 0.3} \
    --sb_size {1 | 2} \
    --bert_arch {BERT_base_wwm | MacBERT_base | MacBERT_large} \
    --use_interm2 \
    [options]
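The --sb_epsilon and --sb_size flags control boundary smoothing: a fraction sb_epsilon of a gold span's probability mass is spread over spans whose boundaries lie close to the gold boundaries, within a neighborhood of size sb_size. The following is only a toy sketch of that idea, not the repository's implementation; the function name and the exact neighborhood rule (Manhattan distance on the two boundary offsets) are assumptions for illustration.

```python
import numpy as np

def boundary_smooth(seq_len, gold_span, sb_epsilon=0.1, sb_size=1):
    """Toy sketch: build a (seq_len x seq_len) span-target matrix where
    entry (s, e) is the probability assigned to the span [s, e].
    The gold span keeps 1 - sb_epsilon; sb_epsilon is split uniformly
    over valid neighboring spans within boundary distance sb_size."""
    target = np.zeros((seq_len, seq_len))
    s, e = gold_span
    neighbors = []
    for ds in range(-sb_size, sb_size + 1):
        for de in range(-sb_size, sb_size + 1):
            dist = abs(ds) + abs(de)  # assumed distance on (start, end) offsets
            if dist == 0 or dist > sb_size:
                continue
            ns, ne = s + ds, e + de
            if 0 <= ns <= ne < seq_len:  # keep only valid spans
                neighbors.append((ns, ne))
    target[s, e] = 1.0 - sb_epsilon if neighbors else 1.0
    for ns, ne in neighbors:
        target[ns, ne] += sb_epsilon / len(neighbors)
    return target
```

With sb_epsilon = 0.0 this reduces to an ordinary one-hot span target, which is why 0.0 appears among the flag's choices above.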

For more details on the available options, run:

$ python scripts/entity_recognition.py --help
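The @scripts/options/with_bert.opt argument suggests argparse's fromfile_prefix_chars feature, where an argument prefixed with @ names a file whose lines are read as additional command-line arguments. A toy demonstration of that mechanism (not the script's actual parser):

```python
import argparse
import os
import tempfile

# A parser configured so that "@file" arguments are expanded from a file,
# one argument per line (argparse's default line-splitting behavior).
parser = argparse.ArgumentParser(fromfile_prefix_chars='@')
parser.add_argument('--num_epochs', type=int)
parser.add_argument('--bert_arch')

# Write a small options file in the same spirit as with_bert.opt.
with tempfile.NamedTemporaryFile('w', suffix='.opt', delete=False) as f:
    f.write('--bert_arch\nBERT_base\n')
    opt_path = f.name

# Options from the file combine with those given directly on the command line.
args = parser.parse_args([f'@{opt_path}', '--num_epochs', '50'])
os.unlink(opt_path)
```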