o3-mini-20250131 -> o3-mini-2025-01-31 #1149

bzantium · 2025-12-31T00:02:17Z

Fixes: #1148

Summary by CodeRabbit

Documentation
- Updated evaluation guides and tutorials to reference current model configuration.
Chores
- Updated default judge model reference in evaluation pipeline configurations across HLE and SimpleQA modules.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Signed-off-by: bzantium <ryumin93@gmail.com>

greptile-apps · 2025-12-31T00:03:52Z

Greptile Summary

This PR fixes an incorrect OpenAI model identifier used for the LLM-as-a-judge feature in evaluation pipelines. The model name o3-mini-20250131 was not a valid OpenAI model identifier, causing evaluation failures. The correct identifier is o3-mini-2025-01-31.

Fixed model identifier in JUDGE_PIPELINE_ARGS for HLE and SimpleQA dataset configurations
Updated documentation examples in evaluation guides and tutorials to use the correct model name
Resolves issue o3-mini-20250131 is not available, need to use o3-mini-2025-01-31 #1148 where evaluations were failing due to invalid model reference

Confidence Score: 5/5

This PR is a straightforward string replacement bug fix with zero risk of introducing regressions
The change is a simple and consistent string replacement of an incorrect model identifier across all affected files. No logic changes, no new code paths, and the fix directly addresses a documented issue where evaluations were failing.
No files require special attention

Important Files Changed

Filename	Overview
docs/evaluation/natural-math.md	Updated example model name from `o3-mini-20250131` to `o3-mini-2025-01-31` in JUDGE_PIPELINE_ARGS documentation
docs/tutorials/posts/llama-nemotron-super-v1.5-evals.md	Updated judge model name to `o3-mini-2025-01-31` in documentation text and command examples for HLE evaluation (reasoning on/off)
docs/tutorials/posts/nemotron-nano-v2-evals.md	Updated judge model name to `o3-mini-2025-01-31` in documentation text and command example for HLE evaluation
nemo_skills/dataset/hle/init.py	Fixed incorrect model identifier in JUDGE_PIPELINE_ARGS from `o3-mini-20250131` to `o3-mini-2025-01-31`
nemo_skills/dataset/simpleqa/init.py	Fixed incorrect model identifier in JUDGE_PIPELINE_ARGS from `o3-mini-20250131` to `o3-mini-2025-01-31`

Sequence Diagram

sequenceDiagram
    participant User
    participant NemoSkills as Nemo-Skills CLI
    participant Config as Dataset Config<br/>(hle/__init__.py)
    participant OpenAI as OpenAI API

    User->>NemoSkills: ns eval --benchmarks=hle
    NemoSkills->>Config: Load JUDGE_PIPELINE_ARGS
    Config-->>NemoSkills: model: "o3-mini-2025-01-31"
    NemoSkills->>OpenAI: POST /v1/chat/completions<br/>model=o3-mini-2025-01-31
    OpenAI-->>NemoSkills: Judge response
    NemoSkills-->>User: Evaluation results

greptile-apps · 2025-12-31T00:03:52Z

Greptile found no issues!

From now on, if a review finishes and we haven't found any issues, we will not post anything, but you can confirm that we reviewed your changes in the status check section.

_{This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".}

coderabbitai · 2025-12-31T00:04:56Z

📝 Walkthrough

Walkthrough

This pull request corrects a model identifier across documentation and configuration files. The judge model name is updated from "o3-mini-20250131" to "o3-mini-2025-01-31" (the correct OpenAI model name) in five files. No functional logic or control flow changes are introduced.

Changes

Cohort / File(s)	Summary
Documentation files `docs/evaluation/natural-math.md`, `docs/tutorials/posts/llama-nemotron-super-v1.5-evals.md`, `docs/tutorials/posts/nemotron-nano-v2-evals.md`	Updated JUDGE_PIPELINE_ARGS model reference from "o3-mini-20250131" to "o3-mini-2025-01-31" in example text and command snippets
Dataset configuration files `nemo_skills/dataset/hle/__init__.py`, `nemo_skills/dataset/simpleqa/__init__.py`	Updated JUDGE_PIPELINE_ARGS constant dictionary "model" field value from "o3-mini-20250131" to "o3-mini-2025-01-31"

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Suggested reviewers

ekmb

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: updating the model name from o3-mini-20250131 to o3-mini-2025-01-31, matching the file modifications.
Linked Issues check	✅ Passed	The PR successfully replaces all occurrences of the unavailable o3-mini-20250131 model with the correct o3-mini-2025-01-31 identifier across documentation and code.
Out of Scope Changes check	✅ Passed	All changes are in-scope: the PR updates only model name references in documentation and code constants, directly addressing the issue requirement.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✨ Finishing touches

📝 Generate docstrings

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 91edc30 and 9bc4a9c.

📒 Files selected for processing (5)

docs/evaluation/natural-math.md
docs/tutorials/posts/llama-nemotron-super-v1.5-evals.md
docs/tutorials/posts/nemotron-nano-v2-evals.md
nemo_skills/dataset/hle/__init__.py
nemo_skills/dataset/simpleqa/__init__.py

🔇 Additional comments (6)

docs/tutorials/posts/nemotron-nano-v2-evals.md (1)

191-211: Documentation updates are consistent.

The model identifier has been correctly updated in both the descriptive text and the command example.

nemo_skills/dataset/simpleqa/__init__.py (1)

23-29: Model identifier update is correct.

The update is consistent with the changes in nemo_skills/dataset/hle/__init__.py, ensuring uniform configuration across datasets.

docs/evaluation/natural-math.md (1)

106-112: Documentation example updated correctly.

The example configuration now reflects the correct model identifier, consistent with the actual code changes.

docs/tutorials/posts/llama-nemotron-super-v1.5-evals.md (2)

161-180: Reasoning-on HLE evaluation section updated correctly.

The model identifier has been updated in both the descriptive text and the command example.

437-443: Reasoning-off HLE evaluation section updated correctly.

The model identifier is consistent with the reasoning-on section updates.

nemo_skills/dataset/hle/__init__.py (1)

23-29: Model identifier update is correct and complete.

The update from "o3-mini-20250131" to "o3-mini-2025-01-31" has been properly applied. The old identifier does not appear anywhere in the codebase, and the new identifier is consistently used across all dataset configurations (hle and simpleqa modules). The comment and configuration are aligned.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Kipok · 2026-01-06T21:13:38Z

@wedu-nvidia could you please test this and if works approve / merge?

wedu-nvidia · 2026-01-06T21:18:26Z

@wedu-nvidia could you please test this and if works approve / merge?

Ok, sure.

wedu-nvidia

Tested and verified, it looks good to me.

Signed-off-by: bzantium <ryumin93@gmail.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: dlord <dlord@nvidia.com>

Signed-off-by: bzantium <ryumin93@gmail.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

o3-mini-20250131 -> o3-mini-2025-01-31

9bc4a9c

Signed-off-by: bzantium <ryumin93@gmail.com>

Kipok requested a review from wedu-nvidia January 6, 2026 21:12

wedu-nvidia approved these changes Jan 6, 2026

View reviewed changes

Merge branch 'main' into feature/NVIDIA-NeMo#1148

de077ed

Kipok merged commit 4a13de0 into NVIDIA-NeMo:main Jan 7, 2026
5 of 6 checks passed

greptile-apps bot mentioned this pull request Jan 13, 2026

Add Arena-Hard-v2 benchmark support #1152

Open

3 tasks

hsiehjackson pushed a commit that referenced this pull request Jan 13, 2026

o3-mini-20250131 -> o3-mini-2025-01-31 (#1149)

86195df

Signed-off-by: bzantium <ryumin93@gmail.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

o3-mini-20250131 -> o3-mini-2025-01-31 #1149

o3-mini-20250131 -> o3-mini-2025-01-31 #1149

Uh oh!

bzantium commented Dec 31, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

greptile-apps bot commented Dec 31, 2025 •

edited

Loading

Uh oh!

greptile-apps bot commented Dec 31, 2025

Uh oh!

coderabbitai bot commented Dec 31, 2025

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Uh oh!

Kipok commented Jan 6, 2026

Uh oh!

wedu-nvidia commented Jan 6, 2026

Uh oh!

wedu-nvidia left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

o3-mini-20250131 -> o3-mini-2025-01-31 #1149

o3-mini-20250131 -> o3-mini-2025-01-31 #1149

Uh oh!

Conversation

bzantium commented Dec 31, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

greptile-apps bot commented Dec 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot commented Dec 31, 2025

Greptile found no issues!

Uh oh!

coderabbitai bot commented Dec 31, 2025

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

Kipok commented Jan 6, 2026

Uh oh!

wedu-nvidia commented Jan 6, 2026

Uh oh!

wedu-nvidia left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

bzantium commented Dec 31, 2025 •

edited by coderabbitai bot

Loading

greptile-apps bot commented Dec 31, 2025 •

edited

Loading