Commit aeb4b79

[NVFP4] Add lm-eval test case (#1689)
Summary
- Enable an NVFP4 weekly lm-eval test

vLLM:
```bash
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.6899|±  |0.0127|
|     |       |strict-match    |     5|exact_match|↑  |0.6384|±  |0.0132|
```

Us:
```
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.7036|±  |0.0126|
|     |       |strict-match    |     5|exact_match|↑  |0.6573|±  |0.0131|
```

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
1 parent 2de9769 commit aeb4b79

2 files changed: +11 -1 lines changed

tests/lmeval/configs/w4a4_nvfp4.yaml

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+cadence: "weekly"
+model: meta-llama/Llama-3.1-8B-Instruct
+scheme: NVFP4
+dataset_id: HuggingFaceH4/ultrachat_200k
+dataset_split: train_sft
+num_calibration_samples: 20
+lmeval:
+  metrics:
+    exact_match,flexible-extract: 0.70
+    exact_match,strict-match: 0.65
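
The thresholds in this config can be compared with the scores quoted in the commit message. The following is a minimal sketch of such a check, not the logic in `test_lmeval.py`: the helper `check_metrics` and the 5% relative tolerance `REL_TOL` are hypothetical and chosen only for illustration.

```python
import math

# Thresholds copied from tests/lmeval/configs/w4a4_nvfp4.yaml in this commit.
expected = {
    "exact_match,flexible-extract": 0.70,
    "exact_match,strict-match": 0.65,
}

# Scores reported under "Us" in the commit message.
measured = {
    "exact_match,flexible-extract": 0.7036,
    "exact_match,strict-match": 0.6573,
}

# Hypothetical tolerance: an assumption for this sketch, not taken from the repo.
REL_TOL = 0.05


def check_metrics(expected, measured, rel_tol=REL_TOL):
    """Return {metric_name: True/False} for whether each measured score
    is within rel_tol (relative) of its expected value."""
    return {
        name: math.isclose(measured[name], target, rel_tol=rel_tol)
        for name, target in expected.items()
    }


results = check_metrics(expected, measured)
for name, ok in results.items():
    print(f"{name}: {'ok' if ok else 'out of tolerance'}")
```

Under this assumed tolerance, both reported scores fall within range of the configured values; a stricter or one-sided check would be an equally plausible design.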

tests/lmeval/test_lmeval.py

Lines changed: 1 addition & 1 deletion
@@ -90,7 +90,7 @@ def set_up(self, test_data_file: str):
         logger.info("========== RUNNING ==============")
         logger.info(self.scheme)
 
-        self.num_calibration_samples = 512
+        self.num_calibration_samples = eval_config.get("num_calibration_samples", 512)
         self.max_seq_length = 2048
 
     def test_lm_eval(self, test_data_file: str):
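
The one-line change above makes the calibration sample count configurable while keeping 512 as the fallback. A small sketch of that behavior, using plain dictionaries in place of the parsed YAML; the function `resolve_num_calibration_samples` and the `legacy_config` entry are hypothetical and exist only for this illustration.

```python
# Fallback value taken from the line this commit replaces.
DEFAULT_NUM_CALIBRATION_SAMPLES = 512


def resolve_num_calibration_samples(eval_config: dict) -> int:
    """Mirror the changed line: use the config value if the key is present,
    otherwise fall back to the previous hard-coded default."""
    return eval_config.get(
        "num_calibration_samples", DEFAULT_NUM_CALIBRATION_SAMPLES
    )


# Subset of the new w4a4_nvfp4.yaml, written as a dict.
nvfp4_config = {
    "cadence": "weekly",
    "scheme": "NVFP4",
    "num_calibration_samples": 20,
}

# Hypothetical config that does not set the key, so the default applies.
legacy_config = {"cadence": "weekly", "scheme": "W4A16"}

print(resolve_num_calibration_samples(nvfp4_config))
print(resolve_num_calibration_samples(legacy_config))
```

Because `dict.get` only falls back when the key is absent, a config that sets the key explicitly to null would yield `None` rather than 512.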