fix: improve local eval config and doc #1528
Signed-off-by: Yuki Huang <yukih@nvidia.com>
📝 Walkthrough

Updates local evaluation documentation and configuration example files. Replaces the previous sample CLI invocation with new parameters for dataset name and key mappings (`data.dataset_name`, `data.problem_key`, `data.solution_key`). Adds vllm generation configuration, `system_prompt_file`, data split type, and file format options to the example YAML configuration.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~5 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (4 passed)
Actionable comments posted: 0
🧹 Nitpick comments (2)
examples/configs/evals/local_eval.yaml (2)
13-19: Clarify dataset customization expectations in comments.

The inline comments (lines 13–14) are helpful for local vs. HuggingFace paths, but the default `dataset_name` uses a hardcoded Azure blob URL. Consider adding a note that:

- The URL example is for MATH-500 evaluation and users should replace it with their own dataset path or HuggingFace dataset identifier
- The `split`, `file_format`, and key mappings (`problem_key`, `solution_key`) should be adjusted to match their dataset structure

This will help users understand they need to override these values for custom datasets.

```diff
  # You can also use custom datasets from a local dataset or HuggingFace.
  # e.g., /path/to/local/dataset or hf_org/hf_dataset_name (HuggingFace)
+ # For custom datasets, also update problem_key, solution_key, split, and file_format as needed.
```
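To make the suggestion concrete, here is a hedged sketch of how a fully customized `data` section could look (the key names come from this PR's configuration; the path and column values are illustrative placeholders, not the shipped defaults):

```yaml
data:
  # Local path or hf_org/hf_dataset_name (HuggingFace) instead of the default URL
  dataset_name: /path/to/local/dataset.csv
  # Column names holding the prompt and the reference answer in your dataset
  problem_key: question
  solution_key: answer
  # Adjust to match how your dataset is split and stored
  split: test
  file_format: csv
```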
12-12: Document system_prompt_file setting.

`system_prompt_file: null` is now explicitly set rather than inherited. Add an inline comment explaining that null disables custom system prompts (allowing the model's native template to be used), which aligns with the guidance in eval.md (line 36).

```diff
- system_prompt_file: null
+ system_prompt_file: null  # null uses model's native chat template; set to file path for custom system prompts
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `docs/guides/eval.md` (1 hunks)
- `examples/configs/evals/local_eval.yaml` (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-19T03:00:58.662Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:85-101
Timestamp: 2025-09-19T03:00:58.662Z
Learning: In distillation and GRPO configurations, max_new_tokens is intentionally set to the full context window (max_total_sequence_length) for consistency across the codebase. Overflow cases when prompt + generation tokens exceed max_model_len are handled by safeguards implemented in vllm_worker.py.
Applied to files:
examples/configs/evals/local_eval.yaml
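The convention captured in this learning can be sketched as a hedged YAML fragment (the section and key nesting here are assumptions for illustration; the point is only that the two values are intentionally equal):

```yaml
policy:
  max_total_sequence_length: 4096   # full context window
generation:
  # Intentionally set to the full context window for consistency;
  # prompt + generation overflow is guarded in vllm_worker.py.
  max_new_tokens: 4096
```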
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Lint check
- GitHub Check: Post automodel integration comment / Comment on PR
- GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (3)
examples/configs/evals/local_eval.yaml (1)
7-8: Verify max_model_len is intentionally restrictive or should match model context.

Setting `max_model_len: 2048` hardcodes a limit significantly lower than Qwen2.5-7B-Instruct's actual context window. Per learnings from similar configs (PR 1006), overflow handling is implemented in vllm_worker.py when generation exceeds max_model_len. Verify whether this is intentionally set as a safeguard for evaluation or whether it should align with the model's native context (typically ~32k for this model family). If intentional, add an inline comment explaining the rationale.

docs/guides/eval.md (2)
67-72: Good placement and clarity of local dataset example.

The new example clearly demonstrates how to override dataset parameters for local evaluation, and the comment on line 67 is concise and helpful. The example aligns well with the updated `local_eval.yaml` configuration and follows the established CLI override pattern used throughout the guide.

Consider adding a note that `split` and `file_format` may also need customization for non-CSV datasets or different data splits, though the current example covers the essential parameters.
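As a minimal illustration of what these key mappings accomplish when loading evaluation records (a sketch, not the repo's actual loader; `extract_example` and its defaults are hypothetical):

```python
def extract_example(
    record: dict,
    problem_key: str = "problem",
    solution_key: str = "solution",
) -> tuple[str, str]:
    """Pull the prompt and reference answer out of one dataset row,
    using configurable column names (cf. data.problem_key / data.solution_key)."""
    return record[problem_key], record[solution_key]


# A dataset whose columns do not match the defaults just remaps the keys:
row = {"question": "What is 2 + 2?", "answer": "4"}
problem, solution = extract_example(row, problem_key="question", solution_key="answer")
```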
36-36: Verify documentation consistency on system_prompt_file.

Line 36 mentions setting `data.system_prompt_file=null` to use native chat templates. The updated `local_eval.yaml` (line 12) now explicitly defaults to `system_prompt_file: null`. Confirm this is intentional alignment and consider whether a cross-reference between this section and the config file would help users understand the relationship.
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
Sets options in `examples/configs/evals/local_eval.yaml` directly instead of inheriting them from `examples/configs/evals/eval.yaml` for easy use.

Closes #1512.
Summary by CodeRabbit

- Documentation
- Chores