
Conversation

@Kipok Kipok (Collaborator) commented Jan 28, 2026

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced configuration format compatibility for improved robustness
    • OpenAI model now validates request parameters and rejects extra_body configuration


Kipok added 3 commits January 27, 2026 15:08
Signed-off-by: Igor Gitman <[email protected]>
@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment

@greptile-apps greptile-apps bot commented Jan 28, 2026

Additional Comments (1)

nemo_skills/inference/model/openai.py
extra_body validation missing in _build_completion_request_params

The _build_chat_request_params method validates that extra_body is not provided (lines 111-112), but _build_completion_request_params lacks this validation. If extra_body is passed through **kwargs, it is silently included in the returned dict without any check.

    def _build_completion_request_params(self, **kwargs) -> dict:
        kwargs = copy.deepcopy(kwargs)
        assert kwargs.pop("tools", None) is None, "tools are not supported by completion requests."
        assert kwargs.pop("reasoning_effort", None) is None, (
            "reasoning_effort is not supported by completion requests."
        )
        assert kwargs.pop("top_k", -1) == -1, "`top_k` is not supported by OpenAI API, please set it to -1."
        assert kwargs.pop("min_p", 0.0) == 0.0, "`min_p` is not supported by OpenAI API, please set it to 0.0."
        assert kwargs.pop("repetition_penalty", 1.0) == 1.0, (
            "`repetition_penalty` is not supported by OpenAI API, please set it to 1.0."
        )
        if kwargs.get("extra_body"):
            raise ValueError("`extra_body` is not supported by OpenAI API")
        kwargs.pop("extra_body", None)
        if "tokens_to_generate" in kwargs:
            tokens_to_generate = kwargs.pop("tokens_to_generate")
            kwargs["max_tokens"] = tokens_to_generate
        if "random_seed" in kwargs:
            kwargs["seed"] = kwargs.pop("random_seed")
        if "stop_phrases" in kwargs:
            kwargs["stop"] = kwargs.pop("stop_phrases")
        return dict(kwargs)
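
For illustration, here is how the builder above behaves at a hypothetical call site; the `model` variable and the argument values are assumptions for this sketch, not code from the PR:

    # Hypothetical call site; `model` is an assumed OpenAI model wrapper instance.
    params = model._build_completion_request_params(
        tokens_to_generate=256, random_seed=0, stop_phrases=["</s>"]
    )
    # params == {"max_tokens": 256, "seed": 0, "stop": ["</s>"]}

    model._build_completion_request_params(extra_body={"top_k": 5})
    # raises ValueError: `extra_body` is not supported by OpenAI API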

@coderabbitai coderabbitai bot commented Jan 28, 2026

📝 Walkthrough

The changes enhance inference.extra_body handling across two modules: one improves normalization to properly handle Hydra DictConfig objects using OmegaConf conversion, while the other adds explicit validation to reject extra_body usage in OpenAI chat requests.

Changes

nemo_skills/inference/generate.py (Extra Body Handling)
Added imports for DictConfig and OmegaConf. Enhanced GenerationTask.__init__ to conditionally convert inference.extra_body: if it is a DictConfig, use OmegaConf.to_container(..., resolve=True) for proper Hydra configuration handling; otherwise, convert to dict as before.

nemo_skills/inference/model/openai.py (Extra Body Handling)
Added validation in _build_chat_request_params to reject extra_body by raising a ValueError when provided, enforcing stricter constraints on OpenAI chat request parameters.
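
A minimal sketch of the normalization described for generate.py, written as a standalone helper under the assumption that the real change lives inline in GenerationTask.__init__:

    # Sketch only: mirrors the DictConfig handling summarized above,
    # not the literal code in generate.py.
    from omegaconf import DictConfig, OmegaConf

    def normalize_extra_body(extra_body):
        if isinstance(extra_body, DictConfig):
            # Resolve interpolations and convert to plain dict/list containers
            # so the result is JSON serializable.
            return OmegaConf.to_container(extra_body, resolve=True)
        return dict(extra_body)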

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped - CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title accurately describes the main change: fixing how dictionary parameters in extra_body are handled by properly converting DictConfig to standard containers.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@nemo_skills/inference/model/openai.py`:
- Around line 111-112: The current check raising ValueError when extra_body is
set blocks legitimate uses (e.g., callers passing
extra_body["reasoning_effort"]) — remove the blanket rejection and instead merge
extra_body into the request payload in the OpenAI client function (keep the
existing reasoning_effort parameter in the signature as authoritative), so that:
1) if both reasoning_effort param and extra_body["reasoning_effort"] are
present, prefer the explicit reasoning_effort argument; 2) otherwise include
keys from extra_body into the outgoing body to support forward-compatible
escape-hatch fields; reference the symbols extra_body and reasoning_effort in
your changes and adjust callers (prover.py/autoformalize.py) only if you choose
to prefer refactoring them to use the reasoning_effort param.

Comment on lines +111 to +112
if extra_body:
    raise ValueError("`extra_body` is not supported by OpenAI API")

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the openai.py file
fd "openai.py" | head -5



🏁 Script executed:

# Read the context around lines 111-112
cat -n nemo_skills/inference/model/openai.py | sed -n '100,125p'



🏁 Script executed:

# Search for any other mentions of extra_body in the codebase
rg "extra_body" -n



🌐 Web query:

OpenAI Chat Completions API extra_body parameter support additional parameters

💡 Result:

extra_body is not an official /v1/chat/completions API field. It’s a client/SDK “escape hatch” (request option) that lets you inject additional JSON properties into the HTTP request body beyond the SDK’s typed parameters. The OpenAI Python SDK explicitly documents extra_query, extra_body, and extra_headers as the way to send “extra/undocumented” params. (github.com)

Whether those extra fields “work” depends on what the server endpoint accepts:

  • The OpenAI Chat Completions API supports the documented request body fields shown in the API reference; arbitrary additional top-level fields are not documented as supported. (platform.openai.com)
  • extra_body is therefore mainly useful for:
    • forward compatibility (passing a newly introduced parameter before the SDK adds it), or
    • OpenAI-compatible proxies/providers (Azure-compatible layers, LiteLLM, vLLM, gateways) that intentionally accept extra fields.

Example (Python SDK):

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role":"user","content":"Hi"}],
    extra_body={"my_proxy_flag": True}
)

(Adding extra fields is supported by the SDK; acceptance is up to the endpoint.) (github.com)


🏁 Script executed:

# Let's verify reasoning_effort support in openai.py and check if it's actually handled
cat -n nemo_skills/inference/model/openai.py | sed -n '123,145p'



🏁 Script executed:

# Check if prover.py and autoformalize.py will actually hit the error
cat -n nemo_skills/inference/prover.py | sed -n '155,175p'



🌐 Web query:

OpenAI o1 o3 models reasoning_effort parameter API support

💡 Result:

  • Two different parameter names are in use (depends on API):

    • Responses API (/v1/responses): use reasoning: { effort: ... } (docs call it “gpt-5 and o‑series models only”). (platform.openai.com)
    • Chat Completions API (/v1/chat/completions) (and some other endpoints): OpenAI has also documented reasoning_effort as the knob for reasoning effort. (platform.openai.com)
  • Model support (o1 vs o3):

    • o1: OpenAI announced a reasoning_effort API parameter for o1 models, and the API changelog notes it was added for o1 models. (openai.com)
    • o3: In the Responses API, OpenAI shows model="o3" used with reasoning={ "effort": "high" } (plus background mode). (openai.com)
  • Allowed values / defaults:

    • The API reference for the reasoning config lists supported effort values as none, minimal, low, medium, high, xhigh, with default = medium (but exact supported values can vary by model family/version). (platform.openai.com)
    • OpenAI’s release notes specifically call out o3‑mini as having three effort settings: low/medium/high. (help.openai.com)
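
For concreteness, a hedged example of both spellings with the OpenAI Python SDK; model names and effort values are illustrative, and acceptance depends on the model and account:

    from openai import OpenAI

    client = OpenAI()

    # Chat Completions API: reasoning_effort is a top-level parameter.
    chat = client.chat.completions.create(
        model="o3-mini",
        messages=[{"role": "user", "content": "Hi"}],
        reasoning_effort="high",
    )

    # Responses API: effort is nested under a reasoning object.
    resp = client.responses.create(
        model="o3",
        input="Hi",
        reasoning={"effort": "high"},
    )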


This validation will break existing code that uses extra_body.

The blanket rejection at lines 111–112 is incompatible with actual codebase usage. prover.py (lines 160, 164, 172) and autoformalize.py (lines 244, 248, 256) both pass extra_body["reasoning_effort"], which will fail with this ValueError. Additionally, reasoning_effort is a documented OpenAI Chat Completions API parameter (officially supported for o1/o3 models), and extra_body is an official OpenAI Python SDK escape hatch for forward compatibility and OpenAI-compatible proxies.

Either allow extra_body for these legitimate use cases, or refactor callers to use the dedicated reasoning_effort parameter already present in the function signature (line 100).
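
A minimal sketch of the suggested merge semantics, using a hypothetical helper name rather than the actual patch:

    # Hypothetical helper: merge escape-hatch fields into the outgoing payload
    # while letting the explicit reasoning_effort argument win on conflict.
    def merge_request_params(params: dict, extra_body: dict | None, reasoning_effort: str | None) -> dict:
        merged = dict(extra_body or {})
        merged.update(params)  # typed params override escape-hatch keys
        if reasoning_effort is not None:
            merged["reasoning_effort"] = reasoning_effort  # explicit argument is authoritative
        return merged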

🧰 Tools
🪛 Ruff (0.14.14)

112-112: Avoid specifying long messages outside the exception class (TRY003)


@gwarmstrong gwarmstrong (Collaborator) left a comment

@Kipok is there a good way to cover this behavior with a unit test?

@Kipok Kipok (Collaborator, Author) commented Jan 29, 2026

Updated one of the GPU tests to include a parameter that fails on main with:

litellm.exceptions.InternalServerError: litellm.InternalServerError: InternalServerError: OpenAIException - Object of type DictConfig is not JSON serializable

Probably not worth writing a whole separate test just for this. Unfortunately, the model has to be served with vLLM locally, so we can't use API models in the CPU pipeline since they don't support extra_body.
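
A small repro of that failure, assuming only omegaconf and the standard json module (the payload key is illustrative):

    import json

    from omegaconf import OmegaConf

    extra_body = OmegaConf.create({"chat_template_kwargs": {"enable_thinking": False}})

    try:
        json.dumps({"extra_body": extra_body})
    except TypeError as err:
        print(err)  # Object of type DictConfig is not JSON serializable

    # The conversion applied in this PR makes the payload serializable.
    print(json.dumps({"extra_body": OmegaConf.to_container(extra_body, resolve=True)}))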

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment

Comment on lines 62 to 143
# @pytest.mark.gpu
# def test_vllm_generate_greedy_chunked():
#     model_path = require_env_var("NEMO_SKILLS_TEST_HF_MODEL")
#     model_type = require_env_var("NEMO_SKILLS_TEST_MODEL_TYPE")

#     output_dir = f"/tmp/nemo-skills-tests/{model_type}/vllm-generate-greedy-chunked/generation"
#     docker_rm([output_dir])

#     cmd = (
#         f"ns generate "
#         f" --cluster test-local --config_dir {Path(__file__).absolute().parent} "
#         f" --model {model_path} "
#         f" --output_dir {output_dir} "
#         f" --server_type vllm "
#         f" --server_gpus 1 "
#         f" --server_nodes 1 "
#         f" --server_args '--enforce-eager' "
#         f" --num_chunks 2 "
#         f" --input_file=/nemo_run/code/nemo_skills/dataset/gsm8k/test.jsonl "
#         f" ++prompt_config=generic/math "
#         f" ++max_samples=10 "
#         f" ++skip_filled=False "
#     )
#     subprocess.run(cmd, shell=True, check=True)

#     # no evaluation by default - checking just the number of lines and that there is no symbolic_correct key
#     with open(f"{output_dir}/output.jsonl") as fin:
#         lines = fin.readlines()
#         assert len(lines) == 20  # because max_samples is the number of samples per chunk
#         for line in lines:
#             data = json.loads(line)
#             assert "symbolic_correct" not in data
#             assert "generation" in data
#     assert os.path.exists(f"{output_dir}/output.jsonl.done")


# @pytest.mark.gpu
# def test_vllm_generate_seeds():
#     model_path = require_env_var("NEMO_SKILLS_TEST_HF_MODEL")
#     model_type = require_env_var("NEMO_SKILLS_TEST_MODEL_TYPE")

#     output_dir = f"/tmp/nemo-skills-tests/{model_type}/vllm-generate-seeds/generation"
#     docker_rm([output_dir])

#     num_seeds = 3
#     cmd = (
#         f"ns generate "
#         f" --cluster test-local --config_dir {Path(__file__).absolute().parent} "
#         f" --model {model_path} "
#         f" --server_type vllm "
#         f" --output_dir {output_dir} "
#         f" --server_gpus 1 "
#         f" --server_nodes 1 "
#         f" --server_args '--enforce-eager' "
#         f" --num_random_seeds {num_seeds} "
#         f" --with_sandbox "
#         f" --input_file=/nemo_run/code/nemo_skills/dataset/gsm8k/test.jsonl "
#         f" ++eval_type=math "
#         f" ++prompt_config=generic/math "
#         f" ++max_samples=10 "
#         f" ++skip_filled=False "
#     )
#     subprocess.run(cmd, shell=True, check=True)

#     # checking that all 3 files are created
#     for seed in range(num_seeds):
#         with open(f"{output_dir}/output-rs{seed}.jsonl") as fin:
#             lines = fin.readlines()
#             assert len(lines) == 10
#             for line in lines:
#                 data = json.loads(line)
#                 assert "symbolic_correct" in data
#                 assert "generation" in data
#         assert os.path.exists(f"{output_dir}/output-rs{seed}.jsonl.done")

#     # running compute_metrics to check that results are expected
#     metrics = ComputeMetrics(benchmark="gsm8k").compute_metrics([f"{output_dir}/output-rs*.jsonl"])["_all_"][
#         "majority@3"
#     ]
#     # rough check, since exact accuracy varies depending on gpu type
#     assert metrics["symbolic_correct"] >= 50
#     assert metrics["num_entries"] == 10

Two tests (test_vllm_generate_greedy_chunked and test_vllm_generate_seeds) are commented out without explanation. According to CONTRIBUTING.md, commented code should be removed. Either remove these tests completely or create a separate issue/PR to address why they need to be disabled.


@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment

from utils import require_env_var

-from nemo_skills.evaluation.metrics import ComputeMetrics
+from nemo_skills.evaluation.metrics.compute_metrics import ComputeMetrics

The import change is unnecessary: ComputeMetrics is already exported from nemo_skills.evaluation.metrics.__init__.py.

Suggested change:
-from nemo_skills.evaluation.metrics.compute_metrics import ComputeMetrics
+from nemo_skills.evaluation.metrics import ComputeMetrics

@gwarmstrong gwarmstrong enabled auto-merge (squash) January 29, 2026 16:57

@gwarmstrong gwarmstrong merged commit 3935fa5 into main Jan 29, 2026
5 checks passed
@gwarmstrong gwarmstrong deleted the igitman/fix-extra-body-dict branch January 29, 2026 17:13