The evaluation harness from EleutherAI is integrated as a submodule. We use a fork hosted on Hugging Face's GitHub. To initialize the submodule, run:
git submodule init
git submodule update

Make sure the requirements in lm-evaluation-harness are installed:
cd lm-evaluation-harness
pip install -r requirements.txt

To launch an evaluation, run the following from the repository root:
python lm-evaluation-harness/main.py \
--model gpt2 \
--model_args pretrained=gpt2-xl \
--tasks cola,mrpc,rte,qnli,qqp,sst,boolq,cb,copa,multirc,record,wic,wsc,coqa,drop,lambada,lambada_cloze,piqa,pubmedqa,sciq \
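--provide_description \
--num_fewshot 3 \
--batch_size 2 \
--output_path eval-gpt2-xl

Here, --provide_description tells the harness to provide the task description, and --num_fewshot sets the number of priming pairs. These explanations are kept out of the command itself because a comment placed after a trailing backslash breaks the line continuation.

Please note: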
- As of now, lm-evaluation-harness supports only a single GPU.
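
On a machine with several GPUs, one way to respect the single-GPU limitation is to restrict which device the process can see. The sketch below is only an illustration: it assumes a CUDA setup, where the standard CUDA_VISIBLE_DEVICES environment variable hides all other devices from the process. The GPU_ID, TASKS, and EVAL_CMD variable names and the shortened task list are hypothetical and chosen for readability. The script prints the command instead of running it, so it can be checked first.

```shell
# Hypothetical settings: GPU index 0 and a shortened task list for illustration.
GPU_ID=0
TASKS="cola,mrpc,rte"

# Same flags as the launch command above, kept in one variable for reuse.
EVAL_CMD="python lm-evaluation-harness/main.py --model gpt2 --model_args pretrained=gpt2-xl --tasks $TASKS --provide_description --num_fewshot 3 --batch_size 2 --output_path eval-gpt2-xl"

# Print the full command; CUDA_VISIBLE_DEVICES limits the process to one GPU.
echo "CUDA_VISIBLE_DEVICES=$GPU_ID $EVAL_CMD"
```

To actually launch the evaluation, copy the printed line into the shell from the repository root.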