Commit f4d50d5

Update tools/benchmarks/llm_eval_harness/meta_eval_reproduce/README.md
Co-authored-by: Hamid Shojanazeri <[email protected]>
1 parent 1450068 commit f4d50d5

File tree

  • tools/benchmarks/llm_eval_harness/meta_eval_reproduce

1 file changed: 1 addition & 1 deletion

tools/benchmarks/llm_eval_harness/meta_eval_reproduce/README.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 
 # Reproducing Meta 3.1 Evaluation Metrics Using LM-Evaluation-Harness
 
-As Meta Llama models gain popularity, evaluating these models has become increasingly important. We have released all the evaluation details for Meta-Llama 3.1 models as datasets in the [3.1 evals Hugging Face collection](https://huggingface.co/collections/meta-llama/llama-31-evals-66a2c5a14c2093e58298ac7f). This tutorial demonstrates how to reproduce metrics similar to our reported numbers using the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/main) library and our prompts from the 3.1 evals datasets on selected tasks.
+As Meta Llama models gain popularity, evaluating these models has become increasingly important. We have released all the evaluation details for Meta-Llama 3.1 models as datasets in the [3.1 evals Hugging Face collection](https://huggingface.co/collections/meta-llama/llama-31-evals-66a2c5a14c2093e58298ac7f). This recipe demonstrates how to closely reproduce the Llama 3.1 reported benchmark numbers using the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/main) library and our prompts from the 3.1 evals datasets on selected tasks.
 
 ## Disclaimer
 
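
For context on the edited sentence: the recipe's end goal is to run Meta's 3.1 evals prompts through lm-evaluation-harness. As a rough illustration only, a generic harness run via its Python API is sketched below; the checkpoint, task name, and settings are illustrative placeholders, not this recipe's own configuration.

```python
# Rough sketch only: a generic lm-evaluation-harness run via its Python API.
# The checkpoint, task, and settings are illustrative placeholders, not the
# meta_eval_reproduce recipe's own configuration.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=meta-llama/Meta-Llama-3.1-8B-Instruct,dtype=bfloat16",
    tasks=["mmlu"],   # stock harness task; the recipe supplies its own task configs
    num_fewshot=5,
    batch_size=8,
)
print(results["results"])
```

The recipe itself differs from this default flow mainly in that it evaluates with the prompts released in the 3.1 evals datasets rather than the harness's stock prompts.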
