
Commit ce38464

make the leaderboard dependencies into an optional target under instructlab-eval[leaderboard]
Signed-off-by: Oleg Silkin <[email protected]>
1 parent: aa573d9

5 files changed (+27 −4 lines)

README.md

Lines changed: 10 additions & 0 deletions
@@ -29,6 +29,16 @@ the phase. At the end of each phase, we evaluate all the checkpoints in order to
 Once training is complete, and we have picked the best checkpoint from the output of the final phase, we can run full-scale evaluation suite which runs MT-Bench, MMLU,
 MT-Bench Branch and MMLU Branch.
 
+### Leaderboard Evaluation
+
+For cases when you want to run the full Open LLM Leaderboard v2 evaluation suite, we provide an optional dependency package for the leaderboard tasks. This includes additional benchmarks like GPQA, IFEVAL, BBH, MMLU-PRO, MUSR, and MATH-HARD.
+
+To install the optional leaderboard dependencies, use:
+
+```bash
+pip install instructlab-eval[leaderboard]
+```
+
 ## Methods of Evaluation
 
 Below are more in-depth explanations of the suite of benchmarks we are using as methods for evaluation of models.
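
A practical note on the install command introduced above: in shells such as zsh, square brackets are glob characters, so the extra is typically quoted as `pip install "instructlab-eval[leaderboard]"`; from a source checkout, `pip install -e ".[leaderboard]"` pulls in the same optional set.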

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -54,6 +54,7 @@ package-dir = {"" = "src"}
 
 [tool.setuptools.dynamic]
 dependencies = {file = ["requirements.txt"]}
+optional-dependencies = {leaderboard = {file = ["requirements-leaderboard.txt"]}}
 
 [tool.setuptools.packages.find]
 where = ["src"]
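
Because the extra is declared dynamically through a requirements file rather than as an inline list, it can be handy to confirm the wiring after an install. Below is a hypothetical check (not part of the commit), assuming the distribution name `instructlab-eval` used in the README:

```python
# Hypothetical verification snippet: list the requirements that sit behind the
# new "leaderboard" extra of an installed instructlab-eval distribution.
from importlib.metadata import requires

reqs = requires("instructlab-eval") or []
# Requirements contributed by an extra carry an `extra == "leaderboard"` marker;
# this substring check is a rough filter, not a full marker parser.
print("\n".join(r for r in reqs if "leaderboard" in r))
```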

requirements.txt

Lines changed: 2 additions & 3 deletions
@@ -8,8 +8,7 @@ transformers
 accelerate
 pandas
 pandas-stubs
-# All optional dependencies like this can be found in lm-eval:
-# https://github.com/EleutherAI/lm-evaluation-harness/blob/main/pyproject.toml
-lm-eval[math,ifeval,sentencepiece,vllm]>=0.4.4
+# Base lm-eval dependency
+lm-eval>=0.4.4
 httpx
 ragas>=0.2.11
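
The lm-eval extras dropped here (math, ifeval, sentencepiece, vllm) are what made the base install heavy; presumably they reappear behind the new leaderboard target via requirements-leaderboard.txt, which is not among the five files changed in this commit and is therefore assumed to already exist in the tree.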

scripts/test_leaderboard.py

Lines changed: 11 additions & 1 deletion
@@ -1,9 +1,19 @@
+#!/usr/bin/env python
+# SPDX-License-Identifier: Apache-2.0
+
+# NOTE: This script requires the leaderboard optional dependencies.
+# Install with: pip install instructlab-eval[leaderboard]
+
 # First Party
+import json
 from instructlab.eval.leaderboard import LeaderboardV2Evaluator
 
 if __name__ == "__main__":
     evaluator = LeaderboardV2Evaluator(
-        model_path="ibm-granite/granite-3.1-8b-instruct",
+        model_path="ibm-granite/granite-3.1-8b-base",
+        eval_config={
+            "apply_chat_template": False,
+        },
     )
     results = evaluator.run()
     print("got results from leaderboard v2")

src/instructlab/eval/leaderboard.py

Lines changed: 3 additions & 0 deletions
@@ -541,6 +541,9 @@ def calculate_overall_leaderboard_score(results: t.Dict[str, ParsedScores]) -> f
 class LeaderboardV2Evaluator(Evaluator):
     """
     Evaluator for Open Leaderboard v2.
+
+    NOTE: This evaluator requires the optional leaderboard dependencies.
+    Install with: pip install instructlab-eval[leaderboard]
     """
 
     name = "leaderboard_v2"
