Fix adding dictionary parameters in extra body #1195
Conversation
Signed-off-by: Igor Gitman <[email protected]>
Signed-off-by: Igor Gitman <[email protected]>
Signed-off-by: Igor Gitman <[email protected]>
1 file reviewed, 1 comment
Additional Comments (1)
📝 Walkthrough
The changes enhance
Changes
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
if extra_body:
    raise ValueError("`extra_body` is not supported by OpenAI API")
🧩 Analysis chain
🏁 Script executed:
# First, let's find and examine the openai.py file
fd "openai.py" | head -5Repository: NVIDIA-NeMo/Skills
Length of output: 99
🏁 Script executed:
# Read the context around lines 111-112
cat -n nemo_skills/inference/model/openai.py | sed -n '100,125p'
Repository: NVIDIA-NeMo/Skills
Length of output: 1234
🏁 Script executed:
# Search for any other mentions of extra_body in the codebase
rg "extra_body" -nRepository: NVIDIA-NeMo/Skills
Length of output: 8181
🌐 Web query:
OpenAI Chat Completions API extra_body parameter support additional parameters
💡 Result:
extra_body is not an official /v1/chat/completions API field. It’s a client/SDK “escape hatch” (request option) that lets you inject additional JSON properties into the HTTP request body beyond the SDK’s typed parameters. The OpenAI Python SDK explicitly documents extra_query, extra_body, and extra_headers as the way to send “extra/undocumented” params. (github.com)
Whether those extra fields “work” depends on what the server endpoint accepts:
- The OpenAI Chat Completions API supports the documented request body fields shown in the API reference; arbitrary additional top-level fields are not documented as supported. (platform.openai.com)
- extra_body is therefore mainly useful for:
  - forward compatibility (passing a newly introduced parameter before the SDK adds it), or
  - OpenAI-compatible proxies/providers (Azure-compatible layers, LiteLLM, vLLM, gateways) that intentionally accept extra fields.
Example (Python SDK):
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hi"}],
    extra_body={"my_proxy_flag": True},
)
(Adding extra fields is supported by the SDK; acceptance is up to the endpoint.) (github.com)
Citations:
- 1: https://github.com/openai/openai-python
- 2: https://platform.openai.com/docs/api-reference/chat/object
- 3: https://github.com/openai/openai-python
🏁 Script executed:
# Let's verify reasoning_effort support in openai.py and check if it's actually handled
cat -n nemo_skills/inference/model/openai.py | sed -n '123,145p'
Repository: NVIDIA-NeMo/Skills
Length of output: 1408
🏁 Script executed:
# Check if prover.py and autoformalize.py will actually hit the error
cat -n nemo_skills/inference/prover.py | sed -n '155,175p'
Repository: NVIDIA-NeMo/Skills
Length of output: 1421
🌐 Web query:
OpenAI o1 o3 models reasoning_effort parameter API support
💡 Result:
- Two different parameter names are in use (depends on API):
  - Responses API (/v1/responses): use reasoning: { effort: ... } (docs call it “gpt-5 and o‑series models only”). (platform.openai.com)
  - Chat Completions API (/v1/chat/completions) (and some other endpoints): OpenAI has also documented reasoning_effort as the knob for reasoning effort. (platform.openai.com)
- Model support (o1 vs o3):
  - o1: OpenAI announced a reasoning_effort API parameter for o1 models, and the API changelog notes it was added for o1 models. (openai.com)
  - o3: In the Responses API, OpenAI shows model="o3" used with reasoning={ "effort": "high" } (plus background mode). (openai.com)
- Allowed values / defaults:
  - The API reference for the reasoning config lists supported effort values as none, minimal, low, medium, high, xhigh, with default = medium (but exact supported values can vary by model family/version). (platform.openai.com)
  - OpenAI's release notes specifically call out o3‑mini as having three effort settings: low/medium/high. (help.openai.com)
Citations:
- 1: https://platform.openai.com/docs/api-reference/responses/get?utm_source=openai
- 2: https://platform.openai.com/docs/api-reference/fine-tuning/chat-input?utm_source=openai
- 3: https://openai.com/index/o1-and-new-tools-for-developers/?utm_source=openai
- 4: https://openai.com/index/new-tools-and-features-in-the-responses-api/?utm_source=openai
- 5: https://platform.openai.com/docs/api-reference/responses/get?utm_source=openai
- 6: https://help.openai.com/en/articles/9624314-mo?utm_source=openai
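To make the two call styles above concrete, here is a minimal sketch assuming a recent openai Python SDK, an OPENAI_API_KEY in the environment, and access to o-series models (model names are taken from the query results; whether a given account accepts these values is up to the endpoint):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Chat Completions style: reasoning_effort is a top-level request field.
chat = client.chat.completions.create(
    model="o3-mini",
    messages=[{"role": "user", "content": "Hi"}],
    reasoning_effort="low",
)

# Responses API style: effort is nested under a reasoning config object.
resp = client.responses.create(
    model="o3",
    input="Hi",
    reasoning={"effort": "high"},
)
```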
This validation will break existing code that uses extra_body.
The blanket rejection at lines 111–112 is incompatible with actual codebase usage. prover.py (lines 160, 164, 172) and autoformalize.py (lines 244, 248, 256) both pass extra_body["reasoning_effort"], which will fail with this ValueError. Additionally, reasoning_effort is a documented OpenAI Chat Completions API parameter (officially supported for o1/o3 models), and extra_body is an official OpenAI Python SDK escape hatch for forward compatibility and OpenAI-compatible proxies.
Either allow extra_body for these legitimate use cases, or refactor callers to use the dedicated reasoning_effort parameter already present in the function signature (line 100).
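A minimal sketch of the suggested merge behavior, assuming the request body is assembled into a plain dict; the function and variable names here are illustrative, not the actual ones in openai.py:

```python
# Illustrative sketch only: `build_request` and its locals are hypothetical,
# not the real helpers in nemo_skills/inference/model/openai.py.
def build_request(messages, reasoning_effort=None, extra_body=None, **kwargs):
    request = dict(kwargs, messages=messages)
    if extra_body:
        # Escape hatch: forward caller-provided fields into the outgoing body.
        request.update(extra_body)
    if reasoning_effort is not None:
        # The explicit argument wins over extra_body["reasoning_effort"].
        request["reasoning_effort"] = reasoning_effort
    return request
```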
🧰 Tools
🪛 Ruff (0.14.14)
112-112: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In `@nemo_skills/inference/model/openai.py` around lines 111 - 112, The current
check raising ValueError when extra_body is set blocks legitimate uses (e.g.,
callers passing extra_body["reasoning_effort"]) — remove the blanket rejection
and instead merge extra_body into the request payload in the OpenAI client
function (keep the existing reasoning_effort parameter in the signature as
authoritative), so that: 1) if both reasoning_effort param and
extra_body["reasoning_effort"] are present, prefer the explicit reasoning_effort
argument; 2) otherwise include keys from extra_body into the outgoing body to
support forward-compatible escape-hatch fields; reference the symbols extra_body
and reasoning_effort in your changes and adjust callers
(prover.py/autoformalize.py) only if you choose to prefer refactoring them to
use the reasoning_effort param.
gwarmstrong left a comment
@Kipok is there a good way to cover this behavior with a unit test?
Signed-off-by: Igor Gitman <[email protected]>
updated one of the gpu tests to include a parameter that fails on main. Probably not worth writing a whole separate test just for this. And unfortunately the model has to be served with vllm locally, so we can't use api models in the cpu pipeline as they don't support extra body
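For reference, one possible shape for a server-free unit test, sketched against the hypothetical build_request helper above (the real module may expose the merge differently):

```python
# Hypothetical test sketch; `build_request` is the illustrative helper from the
# earlier sketch, not an existing function in nemo_skills.
def test_extra_body_merge_precedence():
    request = build_request(
        messages=[{"role": "user", "content": "Hi"}],
        reasoning_effort="high",
        extra_body={"reasoning_effort": "low", "custom_flag": True},
    )
    # Explicit argument wins over the extra_body entry.
    assert request["reasoning_effort"] == "high"
    # Other extra_body fields are forwarded unchanged.
    assert request["custom_flag"] is True
```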
1 file reviewed, 1 comment
tests/gpu-tests/test_generate.py
Outdated
# @pytest.mark.gpu
# def test_vllm_generate_greedy_chunked():
#     model_path = require_env_var("NEMO_SKILLS_TEST_HF_MODEL")
#     model_type = require_env_var("NEMO_SKILLS_TEST_MODEL_TYPE")

#     output_dir = f"/tmp/nemo-skills-tests/{model_type}/vllm-generate-greedy-chunked/generation"
#     docker_rm([output_dir])

#     cmd = (
#         f"ns generate "
#         f" --cluster test-local --config_dir {Path(__file__).absolute().parent} "
#         f" --model {model_path} "
#         f" --output_dir {output_dir} "
#         f" --server_type vllm "
#         f" --server_gpus 1 "
#         f" --server_nodes 1 "
#         f" --server_args '--enforce-eager' "
#         f" --num_chunks 2 "
#         f" --input_file=/nemo_run/code/nemo_skills/dataset/gsm8k/test.jsonl "
#         f" ++prompt_config=generic/math "
#         f" ++max_samples=10 "
#         f" ++skip_filled=False "
#     )
#     subprocess.run(cmd, shell=True, check=True)

#     # no evaluation by default - checking just the number of lines and that there is no symbolic_correct key
#     with open(f"{output_dir}/output.jsonl") as fin:
#         lines = fin.readlines()
#     assert len(lines) == 20  # because max_samples is the number of samples per chunk
#     for line in lines:
#         data = json.loads(line)
#         assert "symbolic_correct" not in data
#         assert "generation" in data
#     assert os.path.exists(f"{output_dir}/output.jsonl.done")


# @pytest.mark.gpu
# def test_vllm_generate_seeds():
#     model_path = require_env_var("NEMO_SKILLS_TEST_HF_MODEL")
#     model_type = require_env_var("NEMO_SKILLS_TEST_MODEL_TYPE")

#     output_dir = f"/tmp/nemo-skills-tests/{model_type}/vllm-generate-seeds/generation"
#     docker_rm([output_dir])

#     num_seeds = 3
#     cmd = (
#         f"ns generate "
#         f" --cluster test-local --config_dir {Path(__file__).absolute().parent} "
#         f" --model {model_path} "
#         f" --server_type vllm "
#         f" --output_dir {output_dir} "
#         f" --server_gpus 1 "
#         f" --server_nodes 1 "
#         f" --server_args '--enforce-eager' "
#         f" --num_random_seeds {num_seeds} "
#         f" --with_sandbox "
#         f" --input_file=/nemo_run/code/nemo_skills/dataset/gsm8k/test.jsonl "
#         f" ++eval_type=math "
#         f" ++prompt_config=generic/math "
#         f" ++max_samples=10 "
#         f" ++skip_filled=False "
#     )
#     subprocess.run(cmd, shell=True, check=True)

#     # checking that all 3 files are created
#     for seed in range(num_seeds):
#         with open(f"{output_dir}/output-rs{seed}.jsonl") as fin:
#             lines = fin.readlines()
#         assert len(lines) == 10
#         for line in lines:
#             data = json.loads(line)
#             assert "symbolic_correct" in data
#             assert "generation" in data
#         assert os.path.exists(f"{output_dir}/output-rs{seed}.jsonl.done")

#     # running compute_metrics to check that results are expected
#     metrics = ComputeMetrics(benchmark="gsm8k").compute_metrics([f"{output_dir}/output-rs*.jsonl"])["_all_"][
#         "majority@3"
#     ]
#     # rough check, since exact accuracy varies depending on gpu type
#     assert metrics["symbolic_correct"] >= 50
#     assert metrics["num_entries"] == 10
Two tests (test_vllm_generate_greedy_chunked and test_vllm_generate_seeds) are commented out without explanation. According to CONTRIBUTING.md, commented code should be removed. Either remove these tests completely or create a separate issue/PR to address why they need to be disabled.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Signed-off-by: Igor Gitman <[email protected]>
…IA/NeMo-Skills into igitman/fix-extra-body-dict
1 file reviewed, 1 comment
from utils import require_env_var

from nemo_skills.evaluation.metrics import ComputeMetrics
from nemo_skills.evaluation.metrics.compute_metrics import ComputeMetrics
import change is unnecessary - ComputeMetrics is already exported from nemo_skills.evaluation.metrics.__init__.py
Suggested change:
- from nemo_skills.evaluation.metrics.compute_metrics import ComputeMetrics
+ from nemo_skills.evaluation.metrics import ComputeMetrics
1 file reviewed, 1 comment
from utils import require_env_var

from nemo_skills.evaluation.metrics import ComputeMetrics
from nemo_skills.evaluation.metrics.compute_metrics import ComputeMetrics
ComputeMetrics is already exported from nemo_skills.evaluation.metrics.__init__.py - import can be simplified
Suggested change:
- from nemo_skills.evaluation.metrics.compute_metrics import ComputeMetrics
+ from nemo_skills.evaluation.metrics import ComputeMetrics
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
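For context, the re-export pattern this comment refers to would look roughly like the following in nemo_skills/evaluation/metrics/__init__.py; this is a sketch of the pattern, not the file's verbatim contents:

```python
# nemo_skills/evaluation/metrics/__init__.py (sketch of the re-export pattern)
from nemo_skills.evaluation.metrics.compute_metrics import ComputeMetrics

__all__ = ["ComputeMetrics"]
```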
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.