
fix: improve local eval config and doc#1528

Merged
terrykong merged 1 commit into main from yukih/improve-local-eval
Nov 17, 2025

Conversation


@yuki-97 yuki-97 commented Nov 17, 2025

  1. Add some important settings directly to examples/configs/evals/local_eval.yaml, instead of inheriting them from examples/configs/evals/eval.yaml, for ease of use.
  2. Update the local eval dataset usage in the docs.

Closes #1512.

Summary by CodeRabbit

  • Documentation

    • Updated local dataset evaluation guide with instructions for configuring custom dataset keys (e.g., "Question" and "Answer") in CLI commands.
  • Chores

    • Enhanced evaluation configuration template with additional setup options including system prompt file configuration and dataset format specifications.

Signed-off-by: Yuki Huang <yukih@nvidia.com>
@yuki-97 yuki-97 requested review from a team as code owners November 17, 2025 06:37
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Nov 17, 2025

coderabbitai bot commented Nov 17, 2025

📝 Walkthrough


Updates the local evaluation documentation and example configuration files. Replaces the previous sample CLI invocation with new parameters for the dataset name and key mappings (data.dataset_name, data.problem_key, data.solution_key). Adds a vLLM generation configuration, system_prompt_file, a data split type, and file-format options to the example YAML configuration.
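Based on the parameters named in the walkthrough, the updated invocation might look like the sketch below. The entry-point script path and the dotted override syntax are assumptions; only the data.* keys come from this PR.

```bash
# Sketch of the documented local-eval invocation (script path and override
# syntax are assumptions; the data.* keys are from the PR walkthrough).
uv run python examples/run_eval.py \
  --config examples/configs/evals/local_eval.yaml \
  data.dataset_name=/path/to/local/dataset.csv \
  data.problem_key=Question \
  data.solution_key=Answer
```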

Changes

Cohort / File(s): Summary
  • Documentation update (docs/guides/eval.md): Replaces the CSV-prefetched example with a new local dataset configuration snippet. Updates guidance text to specify "Question" and "Answer" keys. Expands the CLI invocation to include dataset_name and key parameters.
  • Configuration file enhancement (examples/configs/evals/local_eval.yaml): Adds vllm_cfg with max_model_len: 2048 under generation. Introduces system_prompt_file: null under data. Adds split: "train" and file_format: "csv" fields. Includes explanatory comments about local and HuggingFace dataset usage.
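Putting those additions together, the relevant portion of examples/configs/evals/local_eval.yaml likely resembles the sketch below. The nesting and comments are reconstructed from the summary, not copied from the merged file, so treat it as illustrative.

```yaml
# Sketch of the new local_eval.yaml fields (reconstructed, not verbatim)
generation:
  vllm_cfg:
    max_model_len: 2048

data:
  system_prompt_file: null   # null keeps the model's native chat template
  # You can also use custom datasets from a local path or HuggingFace,
  # e.g., /path/to/local/dataset or hf_org/hf_dataset_name
  split: "train"
  file_format: "csv"
```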

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

  • Documentation update is straightforward and involves replacing an example snippet
  • Configuration file additions are simple field additions with clear purposes
  • No logic changes or complex interactions to verify
  • May verify that new configuration keys (vllm_cfg, system_prompt_file, split, file_format) are supported by the evaluation framework

Pre-merge checks and finishing touches

✅ Passed checks (4 passed)
  • Title check (✅ Passed): The title 'fix: improve local eval config and doc' directly addresses the main changes: improvements to the local eval configuration and documentation updates.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping the docstring coverage check.
  • Test Results For Major Changes (✅ Passed): Changes are limited to documentation and configuration clarifications in example files, not major changes requiring test results.
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
examples/configs/evals/local_eval.yaml (2)

13-19: Clarify dataset customization expectations in comments.

The inline comments (lines 13–14) are helpful for local vs. HuggingFace paths, but the default dataset_name uses a hardcoded Azure blob URL. Consider adding a note that:

  • The URL example is for MATH-500 evaluation and users should replace it with their own dataset path or HuggingFace dataset identifier
  • The split, file_format, and key mappings (problem_key, solution_key) should be adjusted to match their dataset structure

This will help users understand they need to override these values for custom datasets.

  # You can also use custom datasets from a local dataset or HuggingFace.
  # e.g., /path/to/local/dataset or hf_org/hf_dataset_name (HuggingFace)
+ # For custom datasets, also update problem_key, solution_key, split, and file_format as needed.

12-12: Document system_prompt_file setting.

system_prompt_file: null is now explicitly set rather than inherited. Add an inline comment explaining that null disables custom system prompts (allowing the model's native template to be used), which aligns with the guidance in eval.md (line 36).

- system_prompt_file: null
+ system_prompt_file: null  # null uses model's native chat template; set to file path for custom system prompts
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6fc917f and 0c4bba4.

📒 Files selected for processing (2)
  • docs/guides/eval.md (1 hunks)
  • examples/configs/evals/local_eval.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-19T03:00:58.662Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:85-101
Timestamp: 2025-09-19T03:00:58.662Z
Learning: In distillation and GRPO configurations, max_new_tokens is intentionally set to the full context window (max_total_sequence_length) for consistency across the codebase. Overflow cases when prompt + generation tokens exceed max_model_len are handled by safeguards implemented in vllm_worker.py.

Applied to files:

  • examples/configs/evals/local_eval.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Lint check
  • GitHub Check: Post automodel integration comment / Comment on PR
  • GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (3)
examples/configs/evals/local_eval.yaml (1)

7-8: Verify max_model_len is intentionally restrictive or should match model context.

Setting max_model_len: 2048 hardcodes a limit significantly lower than Qwen2.5-7B-Instruct's actual context window. Per learnings from similar configs (PR 1006), overflow handling is implemented in vllm_worker.py when generation exceeds max_model_len. Verify whether this is intentionally set as a safeguard for evaluation or whether it should align with the model's native context (typically ~32k for this model family). If intentional, add an inline comment explaining the rationale.
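One possible wording for the suggested inline comment, assuming the cap is intentional (the rationale below is illustrative, not confirmed by the PR):

```yaml
generation:
  vllm_cfg:
    # Intentionally capped for evaluation; raise toward the model's native
    # context window (about 32k for Qwen2.5-7B-Instruct) for longer prompts.
    max_model_len: 2048
```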

docs/guides/eval.md (2)

67-72: Good placement and clarity of local dataset example.

The new example clearly demonstrates how to override dataset parameters for local evaluation, and the comment on line 67 is concise and helpful. The example aligns well with the updated local_eval.yaml configuration and follows the established CLI override pattern used throughout the guide.

Consider adding a note that split and file_format may also need customization for non-CSV datasets or different data splits, though the current example covers the essential parameters.
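A sketch of the kind of note suggested here, showing split and file_format overrides for a non-CSV dataset. The entry-point script path and override syntax are assumptions based on the guide's CLI pattern, not taken from the guide verbatim:

```bash
# Hypothetical overrides for a JSON dataset with a different split;
# script path and syntax are assumptions for illustration only.
uv run python examples/run_eval.py \
  --config examples/configs/evals/local_eval.yaml \
  data.dataset_name=/path/to/dataset.json \
  data.file_format=json \
  data.split=test
```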


36-36: Verify documentation consistency on system_prompt_file.

Line 36 mentions setting data.system_prompt_file=null to use native chat templates. The updated local_eval.yaml (line 12) now explicitly defaults to system_prompt_file: null. Confirm this is intentional alignment and consider whether a cross-reference between this section and the config file would help users understand the relationship.

@terrykong terrykong merged commit 45f5ce6 into main Nov 17, 2025
45 of 46 checks passed
@terrykong terrykong deleted the yukih/improve-local-eval branch November 17, 2025 20:08
PrinsYin pushed a commit to PrinsYin/RL that referenced this pull request Nov 30, 2025
Signed-off-by: Yuki Huang <yukih@nvidia.com>
DeL-TaiseiOzaki pushed a commit to DeL-TaiseiOzaki/RL that referenced this pull request Jan 8, 2026
Signed-off-by: Yuki Huang <yukih@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>

Labels

CI:L1 (Run doctests, unit tests, and functional tests)
documentation (Improvements or additions to documentation)


Development

Successfully merging this pull request may close these issues.

Confusing local_eval.yaml example: unclear dataset path usage and no system prompt support

2 participants