Commit e95aa9d
add option to run mmlu with 5 shots (#6146)
Summary:
Pull Request resolved: #6146
This PR makes the following changes:
- add a `--num_fewshot` option, which is required for running the MMLU task with 5 shots
- set the default value of `--limit` to `None` so that all examples are actually run
- update `eval_llama` to call `simple_evaluate`, a wrapper around `evaluate` that does some extra work for us, such as building the task dict
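The first two changes can be sketched as a plain `argparse` fragment. This is illustrative only: the flag names match the PR description, but the actual `eval_llama` parser wiring is not shown here.

```python
import argparse

# Hypothetical sketch of the CLI options described above; the flag names
# follow the PR description, the parser itself is illustrative.
def build_parser():
    parser = argparse.ArgumentParser(description="eval_llama options (sketch)")
    parser.add_argument(
        "-f", "--num_fewshot", type=int, default=None,
        help="Number of few-shot examples, e.g. 5 for MMLU 5-shot.",
    )
    parser.add_argument(
        "--limit", type=int, default=None,
        help="Max number of examples per task; None runs all examples.",
    )
    return parser

args = build_parser().parse_args(["-f", "5"])
print(args.num_fewshot, args.limit)  # 5 None
```

With `--limit` defaulting to `None`, omitting the flag evaluates the full task rather than a truncated subset.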
Test Plan:
- Make sure WikiText perplexity for Llama 3.2 1B stays the same before and after the change.
Before the change, run eval_llama for Llama 3.2 1B with `--limit` set to `None`:
```
wikitext: {'word_perplexity,none': 12.78246428138387, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.610432252171856, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.6874479705552373, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
```
After the change, run eval_llama for Llama 3.2 1B:
```
wikitext: {'word_perplexity,none': 12.78246428138387, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.610432252171856, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.6874479705552373, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
```
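The before/after check above can be made mechanical by comparing the metric dicts directly. The values below are copied verbatim from the two outputs; the comparison script itself is just a convenience sketch.

```python
# Compare the wikitext metrics printed before and after the change.
# The numeric values are copied from the two runs shown above.
before = {
    "word_perplexity,none": 12.78246428138387,
    "byte_perplexity,none": 1.610432252171856,
    "bits_per_byte,none": 0.6874479705552373,
}
after = {
    "word_perplexity,none": 12.78246428138387,
    "byte_perplexity,none": 1.610432252171856,
    "bits_per_byte,none": 0.6874479705552373,
}
for key in before:
    # Allow a tiny tolerance in case of float-printing differences.
    assert abs(before[key] - after[key]) < 1e-9, key
print("metrics unchanged")
```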
- Make sure that lm_eval (v0.4.2, the version used by eval_llama) and eval_llama report similar numbers for Llama 3.2 1B and 3B BF16 on the MMLU task with 5 shots.
Example command for lm_eval:
```
lm_eval --model hf \
--model_args pretrained=meta-llama/Llama-3.2-1B-Instruct \
--tasks mmlu \
--device cuda \
-f 5 \
--batch_size auto
```
Example command for eval_llama:
```
python -m examples.models.llama2.eval_llama \
-c /home/lunwenh/models/1B_Instruct/consolidated.00.pth \
-p /home/lunwenh/models/1B_Instruct/params.json \
-t /home/lunwenh/models/1B_Instruct/tokenizer.model \
-kv \
-d bf16 \
--tasks mmlu \
-f 5 \
--max_seq_length 2048
```
imported-using-ghimport
Reviewed By: mergennachin
Differential Revision: D64215268
Pulled By: helunwencser
fbshipit-source-id: 606dd279201c4165cf8d218da50cef1457288ed61
parent: 61c501c
3 files changed: +22 −50 lines (under examples/models/llama2 and evaluate)