fix: improve local eval config and doc #1528
Signed-off-by: Yuki Huang <yukih@nvidia.com>
📝 Walkthrough

Updates local evaluation documentation and configuration example files. Replaces the previous sample CLI invocation with new parameters for dataset name and key mappings (`data.dataset_name`, `data.problem_key`, `data.solution_key`). Adds vllm generation configuration, `system_prompt_file`, data split type, and file format options to the example YAML configuration.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~5 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (4 passed)
Actionable comments posted: 0
🧹 Nitpick comments (2)
examples/configs/evals/local_eval.yaml (2)
13-19: Clarify dataset customization expectations in comments.

The inline comments (lines 13–14) are helpful for local vs. HuggingFace paths, but the default `dataset_name` uses a hardcoded Azure blob URL. Consider adding a note that:

- The URL example is for MATH-500 evaluation and users should replace it with their own dataset path or HuggingFace dataset identifier
- The `split`, `file_format`, and key mappings (`problem_key`, `solution_key`) should be adjusted to match their dataset structure

This will help users understand they need to override these values for custom datasets.

```diff
  # You can also use custom datasets from a local dataset or HuggingFace.
  # e.g., /path/to/local/dataset or hf_org/hf_dataset_name (HuggingFace)
+ # For custom datasets, also update problem_key, solution_key, split, and file_format as needed.
```
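To make the suggestion concrete, here is a hedged sketch of how a fully customized `data` section could look (the key names come from this PR's configuration; the path and column values are illustrative placeholders, not the shipped defaults):

```yaml
data:
  # Local path or hf_org/hf_dataset_name (HuggingFace) instead of the default URL
  dataset_name: /path/to/local/dataset.csv
  # Column names holding the prompt and the reference answer in your dataset
  problem_key: question
  solution_key: answer
  # Adjust to match how your dataset is split and stored
  split: test
  file_format: csv
```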
12-12: Document system_prompt_file setting.

`system_prompt_file: null` is now explicitly set rather than inherited. Add an inline comment explaining that null disables custom system prompts (allowing the model's native template to be used), which aligns with the guidance in eval.md (line 36).

```diff
- system_prompt_file: null
+ system_prompt_file: null  # null uses model's native chat template; set to file path for custom system prompts
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `docs/guides/eval.md` (1 hunks)
- `examples/configs/evals/local_eval.yaml` (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-19T03:00:58.662Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:85-101
Timestamp: 2025-09-19T03:00:58.662Z
Learning: In distillation and GRPO configurations, max_new_tokens is intentionally set to the full context window (max_total_sequence_length) for consistency across the codebase. Overflow cases when prompt + generation tokens exceed max_model_len are handled by safeguards implemented in vllm_worker.py.
Applied to files:
examples/configs/evals/local_eval.yaml
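The convention captured in this learning can be sketched as a hedged YAML fragment (the section and key nesting here are assumptions for illustration; the point is only that the two values are intentionally equal):

```yaml
policy:
  max_total_sequence_length: 4096   # full context window
generation:
  # Intentionally set to the full context window for consistency;
  # prompt + generation overflow is guarded in vllm_worker.py.
  max_new_tokens: 4096
```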
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Lint check
- GitHub Check: Post automodel integration comment / Comment on PR
- GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (3)
examples/configs/evals/local_eval.yaml (1)
7-8: Verify max_model_len is intentionally restrictive or should match model context.

Setting `max_model_len: 2048` hardcodes a limit significantly lower than Qwen2.5-7B-Instruct's actual context window. Per learnings from similar configs (PR 1006), overflow handling is implemented in vllm_worker.py when generation exceeds max_model_len. Verify whether this is intentionally set as a safeguard for evaluation or whether it should align with the model's native context (typically ~32k for this model family). If intentional, add an inline comment explaining the rationale.

docs/guides/eval.md (2)
67-72: Good placement and clarity of local dataset example.

The new example clearly demonstrates how to override dataset parameters for local evaluation, and the comment on line 67 is concise and helpful. The example aligns well with the updated `local_eval.yaml` configuration and follows the established CLI override pattern used throughout the guide.

Consider adding a note that `split` and `file_format` may also need customization for non-CSV datasets or different data splits, though the current example covers the essential parameters.
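As a minimal illustration of what these key mappings accomplish when loading evaluation records (a sketch, not the repo's actual loader; `extract_example` and its defaults are hypothetical):

```python
def extract_example(
    record: dict,
    problem_key: str = "problem",
    solution_key: str = "solution",
) -> tuple[str, str]:
    """Pull the prompt and reference answer out of one dataset row,
    using configurable column names (cf. data.problem_key / data.solution_key)."""
    return record[problem_key], record[solution_key]


# A dataset whose columns do not match the defaults just remaps the keys:
row = {"question": "What is 2 + 2?", "answer": "4"}
problem, solution = extract_example(row, problem_key="question", solution_key="answer")
```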
36-36: Verify documentation consistency on system_prompt_file.

Line 36 mentions setting `data.system_prompt_file=null` to use native chat templates. The updated `local_eval.yaml` (line 12) now explicitly defaults to `system_prompt_file: null`. Confirm this is intentional alignment and consider whether a cross-reference between this section and the config file would help users understand the relationship.
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
Sets options in `examples/configs/evals/local_eval.yaml` directly instead of inheriting them from `examples/configs/evals/eval.yaml` for easy use.

Closes #1512.
Summary by CodeRabbit

- Documentation
- Chores