
Commit a523d27

[README] Add instructions for 📏 RULER benchmarks
1 parent f74c46a commit a523d27

File tree

1 file changed: +28 −6 lines


README.md

Lines changed: 28 additions & 6 deletions
````diff
@@ -411,13 +411,35 @@ Running the command above will provide the task results reported in the GLA paper
 To perform data-parallel evaluation (where each GPU loads a separate full copy of the model), we leverage the accelerate launcher as follows:
 ```sh
 $ PATH='fla-hub/gla-1.3B-100B'
-$ accelerate launch -m evals.harness --model hf \
-    --model_args pretrained=$PATH,dtype=bfloat16 \
+$ accelerate launch -m evals.harness --model hf \
+    --model_args pretrained=$PATH,dtype=bfloat16,trust_remote_code=True \
     --tasks wikitext,lambada_openai,piqa,hellaswag,winogrande,arc_easy,arc_challenge,boolq,sciq,copa,openbookqa \
-    --batch_size 64 \
-    --num_fewshot 0 \
-    --device cuda \
-    --show_config
+    --batch_size 64 \
+    --num_fewshot 0 \
+    --device cuda \
+    --show_config \
+    --trust_remote_code
+```
+
+4. 📏 RULER Benchmark suite
+
+The RULER benchmarks are commonly used for evaluating model performance on long-context tasks.
+You can evaluate `fla` models on RULER directly using `lm-evaluation-harness`.
+
+First, install the necessary dependencies for RULER:
+```sh
+pip install lm_eval["ruler"]
+```
+Then, run evaluation by (e.g., 32k contexts):
+```sh
+accelerate launch -m lm_eval \
+    --output_path $OUTPUT \
+    --tasks niah_single_1,niah_single_2,niah_single_3,niah_multikey_1,niah_multikey_2,niah_multikey_3,niah_multiquery,niah_multivalue,ruler_vt,ruler_cwe,ruler_fwe,ruler_qa_hotpot,ruler_qa_squad \
+    --model_args pretrained=$PATH,dtype=bfloat16,max_length=32768,trust_remote_code=True \
+    --metadata='{"max_seq_lengths":[4096,8192,16384,32768]}' \
+    --batch_size 2 \
+    --show_config \
+    --trust_remote_code
 ```
 
 If a GPU can't load a full copy of the model, please refer to [this link](https://github.com/EleutherAI/lm-evaluation-harness?tab=readme-ov-file#multi-gpu-evaluation-with-hugging-face-accelerate) for FSDP settings.
````
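One pitfall in the snippet above: assigning the checkpoint to a shell variable named `PATH` shadows the shell's executable search path, so later commands in the same shell session may fail to resolve. A minimal dry-run sketch, assuming a hypothetical `MODEL` variable name and only a subset of the task list for brevity, that assembles the same invocation and prints it instead of executing it:

```shell
# Dry-run sketch (assumption: the variable is renamed from PATH to MODEL
# so the shell's executable search path is not shadowed).
MODEL='fla-hub/gla-1.3B-100B'

# Build the harness invocation as a single string so it can be inspected
# before a long run; only two example tasks are listed here.
CMD="accelerate launch -m evals.harness --model hf \
--model_args pretrained=${MODEL},dtype=bfloat16,trust_remote_code=True \
--tasks wikitext,lambada_openai \
--batch_size 64 --num_fewshot 0 --device cuda --show_config --trust_remote_code"

# Print rather than execute; drop the echo wrapper to actually launch.
echo "$CMD"
```

Because the string is only echoed, this can be checked on a machine without GPUs before submitting the real job.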
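The `--metadata` flag in the RULER command takes a JSON object, which is easy to get wrong with shell quoting. A small sketch, assuming `python3` is on the path, that validates the string up front rather than discovering the problem mid-run:

```shell
# Sketch: sanity-check the RULER metadata JSON before launching.
# The sequence lengths mirror the 32k example above.
METADATA='{"max_seq_lengths":[4096,8192,16384,32768]}'

# json.load exits non-zero with a traceback on malformed input,
# so a stray quote or bracket is caught immediately.
echo "$METADATA" | python3 -c 'import json,sys; json.load(sys.stdin)' || exit 1

echo "metadata ok"
# The validated string is then passed as --metadata="$METADATA".
```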
