Code for the paper "An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models", accepted to TACL 2020. Part of the code is from "here".
- Python 3.6
- MXNet 1.6.0 (e.g., for CUDA 10.0: pip install mxnet-cu100)
- GluonNLP 0.9.0
Examples of finetuning BERT on a single dataset:
make train-bert exp=mnli_seed/bert task=MNLI test-split=dev_matched bs=32 gpu=0 \
nepochs=3 seed=2 lr=0.00002
make train-bert exp=mnli_seed/bert task=QQP test-split=dev bs=32 gpu=0 \
nepochs=3 seed=2 lr=0.00002
- exp: the directory to save models
- task: which dataset to load
- test-split: which split to use for validation
- bs: batch size
- gpu: which GPU to use
- nepochs: the number of finetuning epochs
- seed: random seed number
- lr: learning rate
Examples of finetuning different pretrained language models:
make train-bert exp=mnli_seed/bert task=MNLI test-split=dev_matched bs=32 \
gpu=0 nepochs=10 seed=2 lr=0.00002
make train-bert exp=qqp_seed/roberta task=QQP test-split=dev gpu=3 \
nepochs=10 model_type_a=roberta model_name=openwebtext_ccnews_stories_books_cased \
bs=32 seed=2 lr=0.00002
make train-bert exp=mnli_seed/robertal task=MNLI test-split=dev_matched \
gpu=0 nepochs=10 model_type_a=robertal model_name=openwebtext_ccnews_stories_books_cased \
seed=2 lr=0.00002
- model_type_a: which pretrained language model is used: 'bert': BERT; 'bertl': BERT LARGE; 'roberta': RoBERTa; 'robertal': RoBERTa LARGE
- model_name: the dataset used for language model pretraining: 'book_corpus_wiki_en_uncased' for BERT, 'openwebtext_ccnews_stories_books_cased' for RoBERTa
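For reference, these flags plausibly correspond to GluonNLP 0.9 model-zoo names as in the sketch below; the MODEL_ZOO mapping is an assumption for illustration, not necessarily how this repo loads models.

```python
import gluonnlp as nlp

# Hypothetical mapping from model_type_a values to GluonNLP model-zoo names;
# the repo's own loading code may differ.
MODEL_ZOO = {
    'bert': 'bert_12_768_12',          # BERT
    'bertl': 'bert_24_1024_16',        # BERT LARGE
    'roberta': 'roberta_12_768_12',    # RoBERTa
    'robertal': 'roberta_24_1024_16',  # RoBERTa LARGE
}

# model_name selects the pretraining corpus for the chosen architecture.
model, vocab = nlp.model.get_model(
    MODEL_ZOO['robertal'],
    dataset_name='openwebtext_ccnews_stories_books_cased',
    pretrained=True)
```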
Examples of multi-task finetuning with an auxiliary dataset:
make train-Mbert exp=mnli_seed_m/ber task=MNLI a-task=QQP test-split=dev_matched \
model_type_a=bert gpu=0 nepochs=10 seed=2 learningS=1 lr=0.00002
make train-Mbert exp=mnli_seed_m/ber task=MNLI a-task=QQP test-split=dev_matched \
model_type_a=roberta model_name=openwebtext_ccnews_stories_books_cased \
gpu=0 nepochs=10 seed=2 learningS=1 lr=0.00002
make train-Mbert exp=mnli_seed_m/robertal task=MNLI a-task=PAWSall train-split=mnli_snli_train \
a-train-split=paws_qqp test-split=dev_matched bs=4 accm=8 model_type_a=robertal \
model_name=openwebtext_ccnews_stories_books_cased gpu=2 nepochs=5 \
seed=2 learningS=0 lr=0.00002
- task: the target dataset
- a-task: the auxiliary dataset
- learningS: training scheme: 0 for gradient accumulation, 1 for traditional MTL training
- accm: the number of steps for gradient accumulation
- train-split: which split to use for training
- a-train-split: which split to use for training on the auxiliary dataset
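To make the learningS=0 mode concrete, below is a minimal, self-contained sketch of gradient accumulation in Gluon; the toy model, data, and loop are illustrative assumptions, not the repo's actual training code.

```python
import mxnet as mx
from mxnet import autograd, gluon

# Toy sketch of gradient accumulation (learningS=0): run accm small batches
# per optimizer update. The model and data are stand-ins, not the repo's.
accm, bs = 8, 4                                   # cf. accm=8 bs=4 above
net = gluon.nn.Dense(3)                           # stand-in classifier head
net.initialize()
trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': 2e-5})
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

params = net.collect_params()
params.setattr('grad_req', 'add')                 # accumulate, don't overwrite

for i in range(4 * accm):                         # toy training loop
    data = mx.nd.random.normal(shape=(bs, 16))
    label = mx.nd.array([0, 1, 2, 0])             # fake labels for bs=4
    with autograd.record():
        loss = loss_fn(net(data), label)
    loss.backward()
    if (i + 1) % accm == 0:
        trainer.step(accm * bs)                   # one update per accm batches
        params.zero_grad()                        # reset accumulated gradients
```

Normalizing trainer.step by accm * bs keeps the update scale comparable to a single large batch of that effective size.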
The following are examples of evaluating trained models on a specific task:
make test test-split=test from=[path to model] test_model=[model] task=SNLI
make test test-split=lexical_overlap from=[path to model] test_model=[model] task=MNLI-hans
make test test-split=dev from=[path to model] test_model=[model] task=PAWS
- test-split: which split to evaluate
- from: the directory where the trained model is saved
- test_model: the saved model file name
- task: which dataset to evaluate
In the file dataset.py, you can implement your own dataset class (see the examples in that file, and the hypothetical sketch below).
Then add your dataset class in the file task.py. Now you can set parameters for training on your task.
For example, if the dataset is called XXX,
make train-Mbert exp=mnli_seed_m/ber task=MNLI a-task=XXX test-split=dev_matched \
model_type_a=bert gpu=0 nepochs=10 seed=2 learningS=1 lr=0.00002
The above example finetunes BERT on MNLI and XXX, with early stopping on the MNLI dev_matched split.
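As a starting point, a new class in dataset.py might look like the following sketch. It assumes TSV files and GluonNLP's TSVDataset as the base class; the repo's actual base classes and column layout may differ, so follow the existing examples in dataset.py.

```python
import gluonnlp as nlp

# dataset.py -- hypothetical dataset class for a corpus called XXX; the
# file layout and column indices are assumptions for illustration only.
class XXXDataset(nlp.data.TSVDataset):
    """Sentence-pair classification data stored as TSV files."""

    def __init__(self, segment='train', root='data/XXX'):
        path = '%s/%s.tsv' % (root, segment)
        # Keep the (sentence1, sentence2, label) columns; skip the header row.
        super(XXXDataset, self).__init__(
            path, field_indices=[0, 1, 2], num_discard_samples=1)
```

After registering the class in task.py, the make command above picks it up through a-task=XXX.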
If you use this code, please cite:
@article{tu20tacl,
  title   = {An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models},
  author  = {Lifu Tu and Garima Lalwani and Spandana Gella and He He},
  journal = {Transactions of the Association for Computational Linguistics},
  year    = {2020},
  url     = {https://arxiv.org/abs/2007.06778}
}