CHANGELOG.md
4 lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+## 0.4.2
+
+* Adds the ability to provide a custom system prompt to the MMLU-based evaluators. When a system prompt is provided, LM-eval applies the chat template under the hood, else it will pass the model a barebones prompt.
+
 ## 0.4
 
 * Added ability to specify a custom http client to MT-Bench
src/instructlab/eval/mmlu.py
30 lines changed: 23 additions & 7 deletions
@@ -102,6 +102,7 @@ class AbstractMMLUEvaluator(Evaluator):
 few_shots number of examples
 batch_size batch size for evaluation. Valid values are a positive integer or 'auto' to select the largest batch size that will fit in memory, or 'auto:N' to reselect the largest batch size N times'.
 device PyTorch device (e.g. "cpu" or "cuda:0") for running models
+system_prompt system prompt to be used when applying the chat template
@@ -213,12 +219,13 @@ class MMLUEvaluator(AbstractMMLUEvaluator):
 Evaluator for Massive Multitask Language Understanding (MMLU)
 
 Attributes:
-model_path absolute path to or name of a huggingface model
-tasks list of tasks for MMLU to test the model with
-model_dtype dtype of model when served
-few_shots number of examples
-batch_size batch size for evaluation. Valid values are a positive integer or 'auto' to select the largest batch size that will fit in memory, or 'auto:N' to reselect the largest batch size N times'.
-device PyTorch device (e.g. "cpu" or "cuda:0") for running models
+model_path absolute path to or name of a huggingface model
+tasks list of tasks for MMLU to test the model with
+model_dtype dtype of model when served
+few_shots number of examples
+batch_size batch size for evaluation. Valid values are a positive integer or 'auto' to select the largest batch size that will fit in memory, or 'auto:N' to reselect the largest batch size N times'.
+device PyTorch device (e.g. "cpu" or "cuda:0") for running models
+system_prompt system prompt to be used when applying the chat template
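
The CHANGELOG entry says the chat template is applied only when a system prompt is provided. A minimal sketch of that toggle follows. The helper name `build_eval_kwargs` is hypothetical and is not taken from this diff. The keys `system_instruction` and `apply_chat_template` are assumed to match the keyword arguments accepted by the LM-eval harness, and the `is not None` check is an assumption about how "provided" is decided.

```python
from typing import Any, Dict, Optional


def build_eval_kwargs(system_prompt: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: derive harness keyword arguments from an optional system prompt."""
    # Assumption: the chat template is applied only when a system prompt is given;
    # otherwise the model receives the plain, untemplated prompt.
    apply_chat_template = system_prompt is not None
    return {
        # Assumed harness argument names; not confirmed by the diff above.
        "system_instruction": system_prompt,
        "apply_chat_template": apply_chat_template,
    }


# One call with a system prompt, one without.
with_prompt = build_eval_kwargs("You are a helpful assistant.")
without_prompt = build_eval_kwargs()
print(with_prompt["apply_chat_template"])
print(without_prompt["apply_chat_template"])
```

Deriving the flag from the presence of the prompt keeps the two settings from drifting apart, which may be why the evaluators expose only `system_prompt` rather than a separate switch.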