Commit b1c86a8

chtruong814 and yuki-97 authored
cp: docs: fix frontpage outdated eval docs (738) into r0.3.0 (#756)
Signed-off-by: Yuki Huang <[email protected]>
Signed-off-by: NeMo Bot <[email protected]>
Co-authored-by: yuki <[email protected]>
1 parent 5572ead commit b1c86a8

2 files changed: +6 -6 lines changed

README.md

Lines changed: 2 additions & 2 deletions
@@ -375,12 +375,12 @@ Run evaluation script with custom settings:
 # Example: Evaluation of DeepScaleR-1.5B-Preview on MATH-500 using 8 GPUs
 # Pass@1 accuracy averaged over 16 samples for each problem
 uv run python examples/run_eval.py \
+--config examples/configs/evals/math_eval.yaml \
 generation.model_name=agentica-org/DeepScaleR-1.5B-Preview \
 generation.temperature=0.6 \
 generation.top_p=0.95 \
 generation.vllm_cfg.max_model_len=32768 \
-data.dataset_name=HuggingFaceH4/MATH-500 \
-data.dataset_key=test \
+data.dataset_name=math500 \
 eval.num_tests_per_prompt=16 \
 cluster.gpus_per_node=8
 ```

docs/guides/eval.md

Lines changed: 4 additions & 4 deletions
@@ -65,8 +65,8 @@ uv run python examples/run_eval.py \
 generation.model_name=agentica-org/DeepScaleR-1.5B-Preview \
 generation.temperature=0.6 \
 generation.top_p=0.95 \
-generation.vllm_cfg.max_model_len=32768 \
-data.dataset_name="math500" \
+generation.vllm_cfg.max_model_len=32768 \
+data.dataset_name=math500 \
 eval.num_tests_per_prompt=16 \
 cluster.gpus_per_node=8
 ```
@@ -78,10 +78,10 @@ When you complete the evaluation, you will receive a summary similar to the foll
 
 ```
 ============================================================
-model_name='Qwen2.5-Math-1.5B-Instruct' dataset_name='aime_2024'
+model_name='Qwen2.5-Math-1.5B-Instruct' dataset_name='aime2024'
 max_new_tokens=2048 temperature=0.0 top_p=1.0 top_k=-1
 
-metric='pass@1' num_tests_per_prompt=1
+metric='pass@k' pass_k_value=1 num_tests_per_prompt=1
 
 score=0.1000 (3.0/30)
 ============================================================
