@bzantium bzantium commented Dec 31, 2025

Fixes: #1148

Summary by CodeRabbit

  • Documentation

    • Updated evaluation guides and tutorials to reference current model configuration.
  • Chores

    • Updated default judge model reference in evaluation pipeline configurations across HLE and SimpleQA modules.

Signed-off-by: bzantium <ryumin93@gmail.com>
greptile-apps bot commented Dec 31, 2025

Greptile Summary

This PR fixes an incorrect OpenAI model identifier used for the LLM-as-a-judge feature in evaluation pipelines. The model name o3-mini-20250131 was not a valid OpenAI model identifier, causing evaluation failures. The correct identifier is o3-mini-2025-01-31.
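The mistake described above is a compact date suffix (`20250131`) where the provider expects a hyphenated one (`2025-01-31`). A minimal sketch of a check that could catch this class of typo is shown below. The helper name `suggest_model_id` and both regular expressions are hypothetical and not part of Nemo-Skills; the sketch assumes dated snapshot identifiers end in a `YYYY-MM-DD` suffix.

```python
import re

# Assumption: a valid dated snapshot ends in "-YYYY-MM-DD".
# A compact "-YYYYMMDD" suffix is treated as a likely typo.
_COMPACT_DATE = re.compile(r"^(?P<base>[a-z0-9.\-]+?)-(?P<date>\d{8})$")


def suggest_model_id(model: str) -> str:
    """Return `model` unchanged, or a hyphenated-date variant if the
    identifier ends in a compact eight-digit date suffix."""
    match = _COMPACT_DATE.match(model)
    if match is None:
        return model
    date = match.group("date")
    return f"{match.group('base')}-{date[:4]}-{date[4:6]}-{date[6:]}"


print(suggest_model_id("o3-mini-20250131"))
```

The function only rewrites the suffix; it does not confirm that the resulting identifier is actually served by the API.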

Confidence Score: 5/5

  • This PR is a straightforward string replacement bug fix with zero risk of introducing regressions
  • The change is a simple and consistent string replacement of an incorrect model identifier across all affected files. No logic changes, no new code paths, and the fix directly addresses a documented issue where evaluations were failing.
  • No files require special attention

Important Files Changed

| Filename | Overview |
|---|---|
| docs/evaluation/natural-math.md | Updated example model name from o3-mini-20250131 to o3-mini-2025-01-31 in JUDGE_PIPELINE_ARGS documentation |
| docs/tutorials/posts/llama-nemotron-super-v1.5-evals.md | Updated judge model name to o3-mini-2025-01-31 in documentation text and command examples for HLE evaluation (reasoning on/off) |
| docs/tutorials/posts/nemotron-nano-v2-evals.md | Updated judge model name to o3-mini-2025-01-31 in documentation text and command example for HLE evaluation |
| nemo_skills/dataset/hle/__init__.py | Fixed incorrect model identifier in JUDGE_PIPELINE_ARGS from o3-mini-20250131 to o3-mini-2025-01-31 |
| nemo_skills/dataset/simpleqa/__init__.py | Fixed incorrect model identifier in JUDGE_PIPELINE_ARGS from o3-mini-20250131 to o3-mini-2025-01-31 |

Sequence Diagram

sequenceDiagram
    participant User
    participant NemoSkills as Nemo-Skills CLI
    participant Config as Dataset Config<br/>(hle/__init__.py)
    participant OpenAI as OpenAI API

    User->>NemoSkills: ns eval --benchmarks=hle
    NemoSkills->>Config: Load JUDGE_PIPELINE_ARGS
    Config-->>NemoSkills: model: "o3-mini-2025-01-31"
    NemoSkills->>OpenAI: POST /v1/chat/completions<br/>model=o3-mini-2025-01-31
    OpenAI-->>NemoSkills: Judge response
    NemoSkills-->>User: Evaluation results

greptile-apps bot commented Dec 31, 2025

Greptile found no issues!

coderabbitai bot commented Dec 31, 2025

Walkthrough

This pull request corrects a model identifier across documentation and configuration files. The judge model name is updated from "o3-mini-20250131" to "o3-mini-2025-01-31" (the correct OpenAI model name) in five files. No functional logic or control flow changes are introduced.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation files: docs/evaluation/natural-math.md, docs/tutorials/posts/llama-nemotron-super-v1.5-evals.md, docs/tutorials/posts/nemotron-nano-v2-evals.md | Updated JUDGE_PIPELINE_ARGS model reference from "o3-mini-20250131" to "o3-mini-2025-01-31" in example text and command snippets |
| Dataset configuration files: nemo_skills/dataset/hle/__init__.py, nemo_skills/dataset/simpleqa/__init__.py | Updated JUDGE_PIPELINE_ARGS constant dictionary "model" field value from "o3-mini-20250131" to "o3-mini-2025-01-31" |
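
Because the same judge model is set in two separate dataset modules, a small consistency check can help keep them from drifting apart. The sketch below uses stand-in dictionaries rather than importing the real modules. Only the "model" key and its value are confirmed by this PR; the variable names and the helper `judge_models_agree` are hypothetical.

```python
# Stand-ins for JUDGE_PIPELINE_ARGS in the hle and simpleqa modules.
# Assumption: any other keys the real dictionaries hold are omitted here.
HLE_JUDGE_PIPELINE_ARGS = {"model": "o3-mini-2025-01-31"}
SIMPLEQA_JUDGE_PIPELINE_ARGS = {"model": "o3-mini-2025-01-31"}


def judge_models_agree(*configs: dict) -> bool:
    """Return True when every config names the same judge model."""
    return len({config["model"] for config in configs}) == 1


print(judge_models_agree(HLE_JUDGE_PIPELINE_ARGS, SIMPLEQA_JUDGE_PIPELINE_ARGS))
```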

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Suggested reviewers

  • ekmb

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: updating the model name from o3-mini-20250131 to o3-mini-2025-01-31, matching the file modifications. |
| Linked Issues check | ✅ Passed | The PR successfully replaces all occurrences of the unavailable o3-mini-20250131 model with the correct o3-mini-2025-01-31 identifier across documentation and code. |
| Out of Scope Changes check | ✅ Passed | All changes are in-scope: the PR updates only model name references in documentation and code constants, directly addressing the issue requirement. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 91edc30 and 9bc4a9c.

📒 Files selected for processing (5)
  • docs/evaluation/natural-math.md
  • docs/tutorials/posts/llama-nemotron-super-v1.5-evals.md
  • docs/tutorials/posts/nemotron-nano-v2-evals.md
  • nemo_skills/dataset/hle/__init__.py
  • nemo_skills/dataset/simpleqa/__init__.py
🔇 Additional comments (6)
docs/tutorials/posts/nemotron-nano-v2-evals.md (1)

191-211: Documentation updates are consistent.

The model identifier has been correctly updated in both the descriptive text and the command example.

nemo_skills/dataset/simpleqa/__init__.py (1)

23-29: Model identifier update is correct.

The update is consistent with the changes in nemo_skills/dataset/hle/__init__.py, ensuring uniform configuration across datasets.

docs/evaluation/natural-math.md (1)

106-112: Documentation example updated correctly.

The example configuration now reflects the correct model identifier, consistent with the actual code changes.

docs/tutorials/posts/llama-nemotron-super-v1.5-evals.md (2)

161-180: Reasoning-on HLE evaluation section updated correctly.

The model identifier has been updated in both the descriptive text and the command example.


437-443: Reasoning-off HLE evaluation section updated correctly.

The model identifier is consistent with the reasoning-on section updates.

nemo_skills/dataset/hle/__init__.py (1)

23-29: Model identifier update is correct and complete.

The update from "o3-mini-20250131" to "o3-mini-2025-01-31" has been properly applied. The old identifier does not appear anywhere in the codebase, and the new identifier is consistently used across all dataset configurations (hle and simpleqa modules). The comment and configuration are aligned.
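
The claim that the old identifier no longer appears anywhere can be re-checked locally. Below is a minimal sketch that walks a checkout and reports any remaining occurrences. The function name `find_stale_identifier` and the choice to scan only `.py` and `.md` files are assumptions made for illustration, not part of the repository's tooling.

```python
from pathlib import Path


def find_stale_identifier(root, needle="o3-mini-20250131", suffixes=(".py", ".md")):
    """Return (path, line_number) pairs for every line under `root` that
    still contains `needle`. Only files with the given suffixes are read."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            # Skip files that cannot be read as UTF-8 text.
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno))
    return hits
```

Calling `find_stale_identifier(".")` from the repository root should return an empty list once the replacement is complete.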


@Kipok Kipok requested a review from wedu-nvidia January 6, 2026 21:12
Kipok commented Jan 6, 2026

@wedu-nvidia could you please test this and if works approve / merge?

@wedu-nvidia
> @wedu-nvidia could you please test this and if works approve / merge?

Ok, sure.

@wedu-nvidia wedu-nvidia left a comment

Tested and verified, it looks good to me.

@Kipok Kipok merged commit 4a13de0 into NVIDIA-NeMo:main Jan 7, 2026
5 of 6 checks passed
blahblahasdf pushed a commit to blahblahasdf/Skills that referenced this pull request Jan 8, 2026
Signed-off-by: bzantium <ryumin93@gmail.com>
Co-authored-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: dlord <dlord@nvidia.com>
hsiehjackson pushed a commit that referenced this pull request Jan 13, 2026
Signed-off-by: bzantium <ryumin93@gmail.com>
Co-authored-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
Development

Successfully merging this pull request may close these issues.

o3-mini-20250131 is not available, need to use o3-mini-2025-01-31

3 participants