
Commit 2501f51

Updates to accommodate OpenLLM leaderboard v2 tasks and change Meta Llama 3.1 to Llama 3.1 (meta-llama#639)
2 parents: cf83593 + 8176c35

26 files changed (+144, −434 lines)

.github/scripts/spellcheck_conf/wordlist.txt

Lines changed: 7 additions & 0 deletions
@@ -1451,6 +1451,13 @@ openhathi
 sarvam
 subtask
 acc
+BigBench
+IFEval
+MuSR
+Multistep
+multistep
+algorithmically
+asymptote
 Triaging
 matplotlib
 remediations

tools/benchmarks/README.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # Benchmarks
 
 * inference - a folder contains benchmark scripts that apply a throughput analysis for Llama models inference on various backends including on-prem, cloud and on-device.
-* llm_eval_harness - a folder contains a tool to evaluate fine-tuned Llama models including quantized models focusing on quality.
+* llm_eval_harness - a folder that introduces `lm-evaluation-harness`, a tool to evaluate Llama models, including quantized models, with a focus on quality. We also include a recipe that calculates Llama 3.1 evaluation metrics using `lm-evaluation-harness`, and instructions for calculating HuggingFace Open LLM Leaderboard v2 metrics.
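For context, a typical `lm-evaluation-harness` run against the Open LLM Leaderboard v2 task groups might look like the sketch below. The model id and the `leaderboard_*` task names are assumptions, not taken from this commit; verify them against `lm_eval --tasks list` for your installed version. The script only prints the command, since actually running it needs the model weights and a GPU.

```shell
#!/bin/sh
# Sketch of an lm-evaluation-harness invocation for Leaderboard v2 tasks.
# MODEL and TASKS below are illustrative assumptions.
MODEL="meta-llama/Llama-3.1-8B-Instruct"
TASKS="leaderboard_ifeval,leaderboard_bbh,leaderboard_musr"

# Build the command and print it instead of executing it.
CMD="lm_eval --model hf --model_args pretrained=${MODEL} --tasks ${TASKS} --batch_size auto"
echo "${CMD}"
```

Swapping `--tasks` lets the same command cover the other v2 groups (e.g. `leaderboard_gpqa`, `leaderboard_math_hard`, `leaderboard_mmlu_pro`), assuming those task names exist in your harness version.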

tools/benchmarks/llm_eval_harness/README.md

Lines changed: 86 additions & 58 deletions
Large diffs are not rendered by default.

tools/benchmarks/llm_eval_harness/eval.py

Lines changed: 0 additions & 229 deletions
This file was deleted.
