Conversation
📝 Walkthrough

This PR introduces a comprehensive NeMo Gym integration for GRPO RL training with argument validation, data producer orchestration, reward functions, and server lifecycle management. It also fixes GRPO builder field filtering for async configs, improves LoRA synchronization and entropy handling in async trainers, and adds OpenAI-compatible API endpoints to vLLM serving.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~90 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
NanoCode012 left a comment
Is this work an addition on top of async GRPO?
I'm not so clear on the architecture. Is it a replacement for vLLM generation?
Also, how do the uv packages interact with axolotl if we were to use pip?
Force-pushed 4eaaf8a to d8da708
📖 Documentation Preview: https://69c02f066e224299aeaf1509--resonant-treacle-0fd729.netlify.app (deployed on Netlify from commit 64746db)
Actionable comments posted: 10
🧹 Nitpick comments (3)
src/axolotl/integrations/nemo_gym/README.md (1)

231-252: Consider adding language specifiers to fenced code blocks for diagram sections.

The ASCII diagrams at lines 231-235 and 238-252 could use `text` as the language identifier to satisfy markdown linters, though this is a minor stylistic preference.

♻️ Optional fix

````diff
-```
+```text
 axolotl train → GRPO Trainer generates completions
````

````diff
-```
+```text
 ┌─────────────┐   ┌──────────────┐   ┌──────────────────┐
````

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/axolotl/integrations/nemo_gym/README.md` around lines 231-252, update the two fenced code blocks that render the ASCII diagrams (the block containing "axolotl train → GRPO Trainer generates completions" and the multi-turn diagram block beginning with the box diagram) to include a language specifier by changing the opening backticks to ```text so markdown linters recognize them as plain text; keep the diagram content unchanged and only add the "text" tag to the existing triple-backtick delimiters.

src/axolotl/core/trainers/grpo/async_trainer.py (1)
985-988: Consider logging the suppressed exception for debuggability.

The static analysis correctly flags the bare `try`/`except`/`pass` pattern. While a `reset_prefix_cache()` failure may not be critical, silently swallowing all exceptions makes debugging harder when issues do occur.

♻️ Proposed fix to log the exception

```diff
 # Reset prefix cache after adapter update
 try:
     vllm_client.reset_prefix_cache()
-except Exception:
-    pass  # Not critical
+except Exception as exc:
+    logger.debug("reset_prefix_cache failed (non-critical): %s", exc)
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/axolotl/core/trainers/grpo/async_trainer.py` around lines 985-988, the bare except/pass around vllm_client.reset_prefix_cache() in async_trainer.py hides errors; change it to catch Exception as e and log the failure (use the module/class logger, e.g., self.logger or the existing logger in AsyncTrainer): call logger.exception or logger.warning with the exception details and a short context message referencing reset_prefix_cache() so failures are visible for debugging.

src/axolotl/integrations/nemo_gym/dataset.py (1)
80-81: Specify explicit UTF-8 encoding for cross-platform compatibility.

Opening files without explicit encoding uses the system default, which may cause issues on Windows systems or with non-ASCII content.

♻️ Proposed fix

```diff
-with open(path) as f:
+with open(path, encoding="utf-8") as f:
     lines = f.readlines()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/axolotl/integrations/nemo_gym/dataset.py` around lines 80 - 81, The file read uses bare open(path) which relies on system default encoding; update the call to explicitly set UTF-8 (e.g. change with open(path) as f: to with open(path, encoding="utf-8") as f:) to ensure cross-platform handling of non-ASCII content; optionally include errors="replace" if you want to tolerate malformed bytes. Ensure you modify the open call that produces lines = f.readlines() in src/axolotl/integrations/nemo_gym/dataset.py.
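To make the portability point concrete, here is a small self-contained sketch (the `read_lines_utf8` helper is hypothetical, not code from this PR) that round-trips a non-ASCII line through a temp file with the encoding pinned:

```python
import tempfile
from pathlib import Path

def read_lines_utf8(path: str) -> list[str]:
    # Pinning the encoding makes behavior identical on Linux, macOS, and
    # Windows, where the locale default may be e.g. cp1252.
    with open(path, encoding="utf-8") as f:
        return f.readlines()

# Round-trip a non-ASCII dataset line through a temporary file.
with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "dataset.jsonl"
    p.write_text('{"prompt": "café"}\n', encoding="utf-8")
    lines = read_lines_utf8(str(p))
```

Under a cp1252 locale, the same read without `encoding="utf-8"` would decode the two UTF-8 bytes of "é" as two mojibake characters instead of one.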
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 66d9db69-40e4-426d-8f72-9160a41eba31
📒 Files selected for processing (15)

- src/axolotl/core/builders/rl.py
- src/axolotl/core/trainers/grpo/async_trainer.py
- src/axolotl/integrations/nemo_gym/README.md
- src/axolotl/integrations/nemo_gym/__init__.py
- src/axolotl/integrations/nemo_gym/args.py
- src/axolotl/integrations/nemo_gym/data_producer.py
- src/axolotl/integrations/nemo_gym/dataset.py
- src/axolotl/integrations/nemo_gym/examples/nemo_gym_multi_env.yaml
- src/axolotl/integrations/nemo_gym/examples/nemo_gym_multi_turn.yaml
- src/axolotl/integrations/nemo_gym/examples/nemo_gym_sudoku.yaml
- src/axolotl/integrations/nemo_gym/multi_turn.py
- src/axolotl/integrations/nemo_gym/plugin.py
- src/axolotl/integrations/nemo_gym/rewards.py
- src/axolotl/integrations/nemo_gym/server.py
- src/axolotl/scripts/vllm_serve_lora.py
```python
nemo_gym_enabled: bool | None = Field(
    default=True,
    json_schema_extra={
        "description": "Enable NeMo Gym integration for environment-based RL rewards."
    },
)
nemo_gym_dir: str | None = Field(
    default=None,
    json_schema_extra={
        "description": (
            "Path to the NeMo Gym repository clone. "
            "If not set and nemo_gym_auto_clone is True, clones to ~/Gym."
        )
    },
)
nemo_gym_auto_clone: bool | None = Field(
    default=None,
    json_schema_extra={
        "description": (
            "Automatically clone the NeMo Gym repository if not present. "
            "Defaults to True when nemo_gym_enabled is set."
        )
    },
)
```
Resolve the effective defaults before validating this config.
With mode="before", omitting nemo_gym_enabled skips this validator, but the model still finalizes to nemo_gym_enabled=True later. The same nullable-default pattern leaves flags like nemo_gym_auto_clone as None, which downstream code treats as false, so NeMo Gym can come up enabled without its required settings and without the documented auto-clone behavior.
One way to make the defaults consistent

```diff
 class NemoGymArgs(BaseModel):
@@
-    nemo_gym_enabled: bool | None = Field(
-        default=True,
+    nemo_gym_enabled: bool = Field(
+        default=False,
@@
-    nemo_gym_auto_clone: bool | None = Field(
-        default=None,
+    nemo_gym_auto_clone: bool = Field(
+        default=True,
@@
-    nemo_gym_auto_start: bool | None = Field(
+    nemo_gym_auto_start: bool = Field(
         default=True,
@@
-    @model_validator(mode="before")
-    @classmethod
-    def check_nemo_gym_config(cls, data):
-        if data.get("nemo_gym_enabled"):
-            if not data.get("nemo_gym_config_paths") and data.get(
-                "nemo_gym_auto_start", True
-            ):
+    @model_validator(mode="after")
+    def check_nemo_gym_config(self):
+        if self.nemo_gym_enabled:
+            if not self.nemo_gym_config_paths and self.nemo_gym_auto_start:
                 raise ValueError(
                     "nemo_gym_config_paths is required when nemo_gym_enabled=True "
                     "and nemo_gym_auto_start is not False."
                 )
-            if not data.get("nemo_gym_datasets"):
+            if not self.nemo_gym_datasets:
                 raise ValueError(
                     "nemo_gym_datasets is required when nemo_gym_enabled=True. "
                     "Provide at least one dataset with 'path' and 'server_name'."
                 )
-        return data
+        return self
```

Also applies to: 120-136
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/axolotl/integrations/nemo_gym/args.py` around lines 17 - 40, The
validator runs too early/late leaving nullable fields unresolved; add a root
validator that runs before field validation to compute effective defaults:
create a `@root_validator`(pre=True) method (e.g., resolve_nemo_gym_defaults) in
the same Pydantic model that checks incoming values and sets
nemo_gym_enabled=True if None, and sets nemo_gym_auto_clone=True when
nemo_gym_auto_clone is None && effective nemo_gym_enabled is True (and ensure
nemo_gym_dir stays None if unset). Apply the same root_validator pattern to the
other similar block (the fields around 120-136) so the effective defaults are
resolved consistently before validation.
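The resolve-then-validate ordering the prompt describes can be sketched with plain dicts as a stand-in for the Pydantic model (field names match the config; the helper functions and abbreviated error messages are illustrative, not the PR's code):

```python
def resolve_defaults(data: dict) -> dict:
    """Compute effective defaults before any validation runs."""
    data = dict(data)
    if data.get("nemo_gym_enabled") is None:
        data["nemo_gym_enabled"] = True
    # auto_clone defaults to True only when the integration is enabled.
    if data.get("nemo_gym_auto_clone") is None and data["nemo_gym_enabled"]:
        data["nemo_gym_auto_clone"] = True
    data.setdefault("nemo_gym_auto_start", True)
    return data

def validate(data: dict) -> dict:
    """Validate against the *resolved* values, never the raw input."""
    data = resolve_defaults(data)
    if data["nemo_gym_enabled"]:
        if not data.get("nemo_gym_config_paths") and data["nemo_gym_auto_start"]:
            raise ValueError("nemo_gym_config_paths is required ...")
        if not data.get("nemo_gym_datasets"):
            raise ValueError("nemo_gym_datasets is required ...")
    return data

cfg = validate({
    "nemo_gym_config_paths": ["cfg.yaml"],
    "nemo_gym_datasets": [{"path": "d.jsonl", "server_name": "s"}],
})
```

Because `resolve_defaults` runs first, an omitted `nemo_gym_enabled` still triggers the required-field checks instead of slipping past a `mode="before"` validator.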
```python
# Find the full dataset item
item = self._dataset_lookup.get(prompt_text, {}).get("verify_extra", {})
if not item:
    item = {
        "responses_create_params": {
            "input": [{"role": "user", "content": prompt_text}]
        }
    }
dataset_items.append(item)
```
Preserve agent_ref when building the agent request payload.
This path keeps only verify_extra, but _call_agents() picks the /run endpoint from item["agent_ref"]["name"]. In multi-environment datasets that strips away the routing metadata and causes samples to fall back to the first agent server.
Suggested fix
```diff
-item = self._dataset_lookup.get(prompt_text, {}).get("verify_extra", {})
-if not item:
-    item = {
-        "responses_create_params": {
-            "input": [{"role": "user", "content": prompt_text}]
-        }
-    }
+row = self._dataset_lookup.get(prompt_text, {})
+item = dict(row.get("verify_extra", {}))
+if "agent_ref" in row:
+    item["agent_ref"] = row["agent_ref"]
+if not item:
+    item = {
+        "responses_create_params": {
+            "input": [{"role": "user", "content": prompt_text}]
+        }
+    }
 dataset_items.append(item)
```

📝 Committable suggestion
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/axolotl/integrations/nemo_gym/data_producer.py` around lines 99 - 107,
The code is dropping item["agent_ref"] by taking only verify_extra for a prompt,
causing _call_agents() to lose routing metadata; change the logic in
data_producer.py so you first fetch the full dataset entry (e.g., full_item =
self._dataset_lookup.get(prompt_text, {})), then set item =
full_item.get("verify_extra", {}) and if full_item contains "agent_ref" attach
it to item (item["agent_ref"] = full_item["agent_ref"]); when using the fallback
default payload for responses_create_params also ensure you preserve or include
agent_ref from full_item so _call_agents() can correctly choose the /run
endpoint.
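A stand-alone sketch of the suggested payload construction (the `build_item` helper and the sample lookup are hypothetical; the real code lives inside `NemoGymDataProducer`): keep `verify_extra` as the request body but re-attach `agent_ref` so routing metadata survives.

```python
def build_item(dataset_lookup: dict, prompt_text: str) -> dict:
    row = dataset_lookup.get(prompt_text, {})
    # Copy verify_extra so we never mutate the dataset row itself.
    item = dict(row.get("verify_extra", {}))
    if "agent_ref" in row:
        item["agent_ref"] = row["agent_ref"]  # preserve /run routing metadata
    if not item:
        # Fallback payload for prompts missing from the lookup.
        item = {
            "responses_create_params": {
                "input": [{"role": "user", "content": prompt_text}]
            }
        }
    return item

lookup = {
    "solve 2+2": {
        "verify_extra": {"answer": "4"},
        "agent_ref": {"name": "math_agent"},
    }
}
routed = build_item(lookup, "solve 2+2")
fallback = build_item(lookup, "unknown prompt")
```

With the original code, `routed` would contain only `{"answer": "4"}` and the sample would be dispatched to whichever agent server happens to be first.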
```python
for prompt_str in prompts:
    # Prompts from TRL are chat-templated strings. Find the dataset item
    # by matching against dataset_lookup keys (raw user message text).
    item = None
    for key, val in dataset_lookup.items():
        if isinstance(key, str) and key in prompt_str:
            item = val
            break

    if item is None:
        item = {
            "responses_create_params": {
                "input": [{"role": "user", "content": prompt_str}]
            }
        }

    for _ in range(num_generations):
        expanded_items.append(item.get("verify_extra", item))
        expanded_prompt_indices.append(prompt_str)
```
Avoid fuzzy prompt matching and keep agent_ref in the dispatched payload.
if key in prompt_str can select the wrong dataset row when prompts share a system or user prefix, and item.get("verify_extra", item) then strips the agent_ref that _call_agents() uses for routing. In multi-environment runs that silently sends requests to the wrong agent.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/axolotl/integrations/nemo_gym/multi_turn.py` around lines 66 - 84, The
loop uses fuzzy substring matching and then strips agent routing info by
appending item.get("verify_extra", item); change the matching to exact/full
match (replace "if isinstance(key, str) and key in prompt_str" with equality or
full-text comparison such as "if isinstance(key, str) and prompt_str == key") to
avoid selecting wrong dataset rows, and when building expanded_items preserve
agent routing by appending the original item (or merging item["agent_ref"] into
the verify_extra payload) instead of blindly using item.get("verify_extra",
item) so that agent_ref stays in the dispatched payload used by _call_agents().
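A tiny repro of the failure mode (the lookup keys and prompt string below are hypothetical): when two dataset keys share a prefix, substring matching returns whichever one iterates first, while exact matching on the extracted user message selects the right row.

```python
# Two dataset rows whose user messages share a prefix.
dataset_lookup = {
    "Compute 12 + 3": {"id": "short"},
    "Compute 12 + 34": {"id": "long"},
}
# Chat-templated prompt whose user turn is the *longer* key.
prompt_str = "<system>You are helpful.</system> Compute 12 + 34"

def match_substring(lookup, prompt):
    # Fuzzy matching: first key contained in the prompt wins.
    for key, val in lookup.items():
        if isinstance(key, str) and key in prompt:
            return val
    return None

def match_exact(lookup, user_text):
    # Exact matching against the extracted user message
    # (the extraction step itself is elided here for brevity).
    return lookup.get(user_text)

fuzzy = match_substring(dataset_lookup, prompt_str)
exact = match_exact(dataset_lookup, "Compute 12 + 34")
```

The substring path silently picks the `"short"` row even though the prompt came from the `"long"` one.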
```python
self._global_config = get_server_configs(head_port=head_port)
wait_for_resource_servers(self._global_config)
```
Pass the configured timeout into wait_for_resource_servers().
start_servers() honors nemo_gym_server_timeout, but this follow-up wait still uses the helper’s 180s default. Slow boots will fail after 3 minutes even when the config explicitly allows longer.
Suggested fix
```diff
 self._global_config = get_server_configs(head_port=head_port)
-wait_for_resource_servers(self._global_config)
+wait_for_resource_servers(self._global_config, timeout=server_timeout)
```

📝 Committable suggestion
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/axolotl/integrations/nemo_gym/plugin.py` around lines 121 - 123,
start_servers() reads nemo_gym_server_timeout into the global config but the
subsequent call to wait_for_resource_servers(self._global_config) still uses the
helper's 180s default; update the call to pass the configured timeout from the
global config (e.g. extract nemo_gym_server_timeout from the object returned by
get_server_configs or self._global_config and call
wait_for_resource_servers(self._global_config, timeout=nemo_gym_server_timeout)
or the equivalent parameter name expected by wait_for_resource_servers) so the
explicit server timeout is honored.
```python
self._dataset_lookup = {}
for i in range(len(dataset)):
    row = dataset[i]
    prompt_text = row["prompt"][0]["content"]
    self._dataset_lookup[prompt_text] = row
LOG.info(f"Built dataset lookup with {len(self._dataset_lookup)} entries")
```
Use the same lookup key that the producer searches for.
These maps are keyed by row["prompt"][0]["content"], but NemoGymDataProducer.produce() looks them up with the last prompt message. Any prompt shaped like [system, user] will miss the lookup, drop its verify_extra/agent_ref, and then fall back to the wrong agent or resource server.
Also applies to: 175-179
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/axolotl/integrations/nemo_gym/plugin.py` around lines 149 - 154, The
dataset lookup currently keys entries by row["prompt"][0]["content"], but
NemoGymDataProducer.produce() looks up using the last prompt message, causing
misses for multi-message prompts and loss of verify_extra/agent_ref; update the
keying in the __init__ (the _dataset_lookup population loop that uses dataset
and row) to use the last prompt message content (e.g.,
row["prompt"][-1]["content"] or the same extraction logic
NemoGymDataProducer.produce() uses) instead of prompt[0], and make the identical
change for the second occurrence (the block around lines 175-179) so both lookup
tables use the same key format as produce().
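The key mismatch can be shown with a minimal `[system, user]` row (the sample row is hypothetical): keying by the first message makes the user-text lookup miss, keying by the last message makes it hit.

```python
rows = [
    {
        "prompt": [
            {"role": "system", "content": "You are a solver."},
            {"role": "user", "content": "Solve the sudoku."},
        ],
        "verify_extra": {"puzzle_id": 7},
    },
]

# Current code: keyed by the first message (the system prompt here).
lookup_first = {r["prompt"][0]["content"]: r for r in rows}
# Suggested fix: keyed by the last message, matching what produce() searches.
lookup_last = {r["prompt"][-1]["content"]: r for r in rows}

user_text = "Solve the sudoku."  # what produce() looks up with
```

A miss in `lookup_first` means the row's `verify_extra`/`agent_ref` are dropped and the fallback payload is used instead.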
```python
nemo_producer = NemoGymDataProducer(
    config=inner.config,
    prompt_dataset=inner._dataset,
    num_generations=inner._num_generations,
    generation_batch_size=inner._generation_batch_size,
    train_batch_size=inner._train_batch_size,
    steps_per_generation=inner._steps_per_generation,
    shuffle_dataset=inner._shuffle_dataset,
    seed=inner._seed,
    agent_servers=self._agent_servers,
    dataset_lookup=self._dataset_lookup or {},
    request_timeout=float(cfg.nemo_gym_verify_timeout or 10800),
)
```
Do not reuse the /verify timeout for multi-turn /run calls.
nemo_gym_verify_timeout defaults to 30s, but agent /run requests can legitimately take much longer than that. Wiring it into NemoGymDataProducer.request_timeout will abort long rollouts even though the producer’s own default is 10800s.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/axolotl/integrations/nemo_gym/plugin.py` around lines 325 - 337, The code
incorrectly wires cfg.nemo_gym_verify_timeout into the
NemoGymDataProducer.request_timeout causing multi-turn /run calls to be aborted
by the short verify timeout; change the constructor call in plugin.py where
NemoGymDataProducer is instantiated so it does not use nemo_gym_verify_timeout
for request_timeout—either remove the request_timeout argument to let
NemoGymDataProducer use its 10800s default, or use a dedicated config key (e.g.,
cfg.nemo_gym_run_timeout) with fallback to 10800; update the argument passed to
request_timeout accordingly.
```python
def _get_verify_urls(head_port: int = 11000) -> dict[str, str]:
    """Discover verify endpoints from the NeMo Gym head server.
```
Make the public reward hook honor the configured NeMo Gym settings.
reward_nemo_gym_verify() is documented as a ready-to-use entrypoint, but it always discovers endpoints from port 11000, posts "model": "axolotl", and uses a hard-coded 30s timeout. Any run that overrides nemo_gym_head_port, nemo_gym_model_name, or nemo_gym_verify_timeout will silently talk to the wrong backend or return 0.0 on slow verifies.
Also applies to: 103-167
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/axolotl/integrations/nemo_gym/rewards.py` around lines 66 - 67, The
public hook reward_nemo_gym_verify and helper _get_verify_urls currently
hard-code the head port (11000), post payload ("model": "axolotl"), and a 30s
timeout; update both functions to read and use the configured values
(nemo_gym_head_port, nemo_gym_model_name, nemo_gym_verify_timeout) passed into
or available to reward_nemo_gym_verify so discovery and POST use the configured
head port, the configured model name in the JSON body, and the configured
timeout for HTTP requests instead of the hard-coded defaults; ensure
_get_verify_urls accepts/use the provided head_port and reward_nemo_gym_verify
passes the configured timeout into the requests call and uses
nemo_gym_model_name for payload construction.
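One way to honor the configured values, sketched as a stub: the real hook POSTs to the verify endpoints, while this stand-in only demonstrates the configuration plumbing via `functools.partial` (the function body, the localhost URL, and the `cfg` dict are placeholders, not the PR's implementation).

```python
from functools import partial

def reward_nemo_gym_verify(completions, *, head_port=11000,
                           model_name="axolotl", timeout=30.0):
    # Stub: return what the HTTP layer would use, so we can check that the
    # configured values actually reach the call site.
    return {
        "url": f"http://localhost:{head_port}/verify",
        "payload": {"model": model_name},
        "timeout": timeout,
    }

cfg = {
    "nemo_gym_head_port": 12345,
    "nemo_gym_model_name": "my-model",
    "nemo_gym_verify_timeout": 120.0,
}
# Bind the config once; the trainer then calls reward_fn(completions) as usual.
reward_fn = partial(
    reward_nemo_gym_verify,
    head_port=cfg["nemo_gym_head_port"],
    model_name=cfg["nemo_gym_model_name"],
    timeout=cfg["nemo_gym_verify_timeout"],
)
result = reward_fn(["some completion"])
```

Binding via `partial` keeps the public hook signature that the trainer expects while removing the hard-coded port, model name, and timeout.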
```python
config_str = ",".join(config_paths)
_ng_log_file = open(os.path.join(gym_dir, "ng_run.log"), "w")  # noqa: SIM115
_ng_process = subprocess.Popen(  # nosec
    [
        "bash",
        "-c",
        f'source .venv/bin/activate && ng_run "+config_paths=[{config_str}]"',
    ],
```
Build the ng_run command without bash -c.
config_paths comes from config and is spliced into a shell string here. That makes paths with spaces or quotes brittle, and shell metacharacters become executable input. Calling the venv’s ng_run binary directly avoids both problems.
Suggested fix
```diff
 config_str = ",".join(config_paths)
 _ng_log_file = open(os.path.join(gym_dir, "ng_run.log"), "w")  # noqa: SIM115
+ng_run = os.path.join(gym_dir, ".venv", "bin", "ng_run")
 _ng_process = subprocess.Popen(  # nosec
-    [
-        "bash",
-        "-c",
-        f'source .venv/bin/activate && ng_run "+config_paths=[{config_str}]"',
-    ],
+    [ng_run, f"+config_paths=[{config_str}]"],
     cwd=gym_dir,
     stdout=_ng_log_file,
     stderr=subprocess.STDOUT,
 )
```

🧰 Tools
🪛 Ruff (0.15.6)

[error] 103-103: subprocess call: check for execution of untrusted input (S603)
[error] 104-108: Starting a process with a partial executable path (S607)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/axolotl/integrations/nemo_gym/server.py` around lines 101 - 108, The
current code builds a shell command using "bash -c" with a spliced config_paths
string (vulnerable to spaces/quotes and shell injection); instead construct a
subprocess argument list and invoke the venv's ng_run binary directly so no
shell interpretation occurs. Locate where _ng_process is created and replace the
["bash", "-c", ...] invocation by computing the ng_run executable from the venv
under gym_dir (the same venv used earlier), then call subprocess.Popen with
shell=False and an argument list that passes config paths as separate arguments
(or a single safely-constructed argument) rather than interpolating them into a
shell string; keep _ng_log_file for stdout/stderr and preserve existing
nosec/permission handling.
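The difference between the two invocation styles can be demonstrated without NeMo Gym at all (a child Python process stands in for the `ng_run` binary here): with `shell=False` and an argument list, a path containing spaces arrives as one argv element, whereas the same string on an unquoted shell line would be word-split.

```python
import shlex
import subprocess
import sys

config_paths = ["configs/my env.yaml", "configs/env two.yaml"]
arg = "+config_paths=[" + ",".join(config_paths) + "]"

# shell=False with an argument list: the whole Hydra-style override is
# delivered to the child as a single argv element, spaces included.
out = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", arg],
    capture_output=True, text=True, check=True,
)
received = out.stdout.rstrip("\n")

# The same string spliced into an unquoted shell line would be split on
# whitespace; shlex shows how a POSIX shell would tokenize it.
tokens = shlex.split("ng_run " + arg)
```

The list form also means a malicious value like `]; rm -rf ~` stays an inert string argument instead of executable shell input.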
```python
def get_server_base_url(global_config: dict, server_name: str) -> str:
    """Get the base URL for a given resource server."""
    try:
        srv_cfg = global_config[server_name]["resources_servers"][server_name]
        return f"http://{srv_cfg['host']}:{srv_cfg['port']}"
    except (KeyError, TypeError) as exc:
        raise ValueError(
            f"Could not find resource server config for '{server_name}' in NeMo Gym. "
            f"Available servers: {list(global_config.keys())}"
        ) from exc


def get_verify_endpoint(global_config: dict, server_name: str) -> str:
    """Get the /verify endpoint URL for a given resource server."""
    return f"{get_server_base_url(global_config, server_name)}/verify"
```
Normalize bind-all and loopback hosts for resource servers too.
get_server_base_url() and wait_for_resource_servers() use the raw host from NeMo Gym config. If a resource server advertises 0.0.0.0 or localhost, the generated /verify URL can be unroutable and the readiness probe can time out even though the service is up. This should mirror the host-rewrite logic already used in get_agent_servers().
Also applies to: 211-215
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/axolotl/integrations/nemo_gym/server.py` around lines 186 - 200, Update
get_server_base_url (and thus get_verify_endpoint) to normalize bind-all and
loopback addresses the same way get_agent_servers does: if srv_cfg['host'] is
"0.0.0.0" replace with the external host (e.g., global_config["host"] or the
agent/cluster-facing host used elsewhere) and if it is "localhost" or
"127.0.0.1" map it to the machine-facing hostname/IP used for reachability;
ensure wait_for_resource_servers uses this normalized URL for its readiness
probes. Locate get_server_base_url and get_verify_endpoint and apply the same
host-rewrite logic/utility used by get_agent_servers (or extract that
normalization into a helper and reuse it) so /verify URLs are routable when
services advertise 0.0.0.0 or localhost.
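A sketch of the extracted helper the prompt suggests (signatures are hypothetical and simplified; the real `get_server_base_url` takes the full NeMo Gym `global_config`, and whether loopback should be rewritten depends on whether callers sit on another machine):

```python
def normalize_host(host: str, external_host: str) -> str:
    """Map bind-all and loopback addresses to an externally reachable host."""
    if host in ("0.0.0.0", "localhost", "127.0.0.1"):
        return external_host
    return host

def get_server_base_url(srv_cfg: dict, external_host: str) -> str:
    host = normalize_host(srv_cfg["host"], external_host)
    return f"http://{host}:{srv_cfg['port']}"

# A server that binds 0.0.0.0 advertises an unroutable host; rewrite it.
url = get_server_base_url({"host": "0.0.0.0", "port": 8080}, "10.0.0.5")
```

Routing readiness probes and `/verify` URLs through the same helper keeps the two code paths from drifting apart.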
| "usage": { | ||
| "prompt_tokens": len(all_outputs[0].prompt_token_ids) | ||
| if all_outputs | ||
| else 0, | ||
| "completion_tokens": sum( | ||
| len(out.token_ids) for o in all_outputs for out in o.outputs | ||
| ), | ||
| "total_tokens": 0, | ||
| }, |
Populate usage.total_tokens from the actual counts.
Line 527 hardcodes total_tokens to 0, so any downstream accounting that reads the usage block will undercount every response.
Suggested fix
```diff
 return {
     "id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
     "object": "chat.completion",
     "model": script_args.model,
     "choices": choices,
     "usage": {
         "prompt_tokens": len(all_outputs[0].prompt_token_ids)
         if all_outputs
         else 0,
         "completion_tokens": sum(
             len(out.token_ids) for o in all_outputs for out in o.outputs
         ),
-        "total_tokens": 0,
+        "total_tokens": (
+            (len(all_outputs[0].prompt_token_ids) if all_outputs else 0)
+            + sum(len(out.token_ids) for o in all_outputs for out in o.outputs)
+        ),
     },
 }
```

📝 Committable suggestion
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/axolotl/scripts/vllm_serve_lora.py` around lines 520 - 528, The
usage.total_tokens is hardcoded to 0; compute it from the actual counts instead.
Inside the usage dict in vllm_serve_lora.py (the block that builds "usage"),
assign prompt_tokens = len(all_outputs[0].prompt_token_ids) if all_outputs else
0, compute completion_tokens by summing len(out.token_ids) for o in all_outputs
for out in o.outputs, and set total_tokens = prompt_tokens + completion_tokens
so the "usage" object reflects the real totals.
Codecov Report: ❌ Patch coverage is
Force-pushed f717e46 to 64746db
Summary by CodeRabbit
Release Notes
New Features
- New OpenAI-compatible API endpoints (/v1/models, /v1/chat/completions) added to the vLLM server

Bug Fixes
Documentation