Commit 127ecf9
Update on "add option to run mmlu with 5 shots"
This PR makes the following changes:
- add a `--num_fewshot` option, which is required for running the MMLU task with 5 shots
- set the default value of `--limit` to None so that all examples are actually run
- update `eval_llama` to call `simple_evaluate`, a wrapper around `evaluate` that does some extra work for us, such as getting the task dict
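As a rough illustration of the first two changes, the sketch below shows how a few-shot option and a `--limit` option defaulting to None might be declared with `argparse` and collected into keyword arguments. It is not the actual `eval_llama` code: `build_eval_parser`, `to_eval_kwargs`, and the default task list are hypothetical names and values chosen for this example. The assumption is that a dict like this would be forwarded to `simple_evaluate` along with the model wrapper.

```python
import argparse
from typing import Any, Dict


def build_eval_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options described in this PR."""
    parser = argparse.ArgumentParser(description="Illustrative eval CLI sketch")
    # Default task list is an assumption made for this sketch only.
    parser.add_argument("--tasks", nargs="+", default=["wikitext"])
    # Number of in-context examples per prompt; None leaves the task default.
    parser.add_argument("--num_fewshot", "-f", type=int, default=None)
    # None means "no cap": every example in the task is evaluated.
    parser.add_argument("--limit", type=int, default=None)
    return parser


def to_eval_kwargs(args: argparse.Namespace) -> Dict[str, Any]:
    """Collect the options that would be passed on to the evaluation call."""
    return {
        "tasks": args.tasks,
        "num_fewshot": args.num_fewshot,
        "limit": args.limit,
    }


if __name__ == "__main__":
    parsed = build_eval_parser().parse_args(["--tasks", "mmlu", "-f", "5"])
    print(to_eval_kwargs(parsed))
```

With `--limit` omitted, the `limit` entry stays None, which is the behaviour that lets a full MMLU or WikiText run cover every example.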
Test Plan:
- Make sure WikiText perplexity for llama 3.2 1B stays the same before and after the change.
Before, run eval_llama for llama 3.2 1B with limit set to None:
```
wikitext: {'word_perplexity,none': 12.78246428138387, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.610432252171856, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.6874479705552373, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
```
After, run eval_llama for llama 3.2 1B:
```
wikitext: {'word_perplexity,none': 12.78246428138387, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.610432252171856, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.6874479705552373, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
```
- Make sure that lm_eval (v0.4.2, the version used by eval_llama) and eval_llama report similar numbers for Llama 3.2 1B and 3B BF16 on the MMLU task with 5 shots.
Example command for lm_eval:
```
lm_eval --model hf \
--model_args pretrained=meta-llama/Llama-3.2-1B-Instruct \
--tasks mmlu \
--device cuda \
-f 5 \
--batch_size auto
```
Example command for eval_llama:
```
python -m examples.models.llama2.eval_llama \
-c /home/lunwenh/models/1B_Instruct/consolidated.00.pth \
-p /home/lunwenh/models/1B_Instruct/params.json \
-t /home/lunwenh/models/1B_Instruct/tokenizer.model \
-kv \
-d bf16 \
--tasks mmlu \
-f 5 \
--max_seq_length 2048
```
Differential Revision: [D64215268](https://our.internmc.facebook.com/intern/diff/D64215268)
[ghstack-poisoned]
1 parent 42aaf6a commit 127ecf9
1 file changed: 4 additions, 1 deletion (original line 250 replaced by new lines 250-253)