refactor: refactor env and data processor & add nemotron super 49b recipes by yuki-97 · Pull Request #1506 · NVIDIA-NeMo/RL

yuki-97 · 2025-11-11T07:53:55Z

Follow-up to #1472. Thanks @nv-mmanohara for adding this!

Add GRPO support for HelpSteer3 on LlamaNemotron 49B.
Add SFT support for tulu3 on LlamaNemotron 49B.
Add CodeJaccard environment.
Refactor env and data processor.
Introduce run_grpo.py. run_grpo_math.py and run_grpo_rm.py will be cleaned up in a subsequent PR (see "[Refactor] Clear run_grpo_math.py and run_grpo_rm.py", #1572).
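For readers unfamiliar with the metric behind the CodeJaccard environment's name, here is a minimal sketch of Jaccard similarity between two code strings. The function name `jaccard_similarity`, the regex-based tokenization, and the convention for two empty inputs are assumptions made for illustration. This page does not describe how the environment itself tokenizes or scores responses.

```python
import re


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity |A & B| / |A | B| over the sets of word tokens in a and b."""
    # Assumption: tokens are runs of word characters; the real environment may differ.
    tokens_a = set(re.findall(r"\w+", a))
    tokens_b = set(re.findall(r"\w+", b))
    if not tokens_a and not tokens_b:
        # Assumed convention: two empty inputs count as identical.
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# {def, add, a, b} vs {def, sub, a, b}: 3 shared tokens out of 5 distinct tokens.
print(jaccard_similarity("def add(a, b)", "def sub(a, b)"))  # 0.6
```

Because the score is set-based, it ignores token order and repetition, so it is only a coarse signal of how close two snippets are.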

Test Result

grpo math before and after refactor

nemotron 49B

Known Issue

nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 cannot load from Hugging Face (#1571).
The GRPO Nemotron HelpSteer3 recipe has a very high logprob error (#1570).

Design explanation

Purpose of `task_name = data.task_name if hasattr(data, "task_name") else task_spec.task_name`

(Answer to #1506 (review))
The related documentation has been added to [docs/guides/grpo.md](docs/guides/grpo.md).
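As a minimal sketch of what the expression does: a `task_name` carried by the data takes precedence, and the task spec's name is the fallback. The `Datum` and `TaskSpec` classes and the `resolve_task_name` helper below are hypothetical stand-ins for illustration, not the actual NeMo RL types.

```python
from dataclasses import dataclass


@dataclass
class TaskSpec:
    """Hypothetical stand-in for the task spec; only task_name matters here."""
    task_name: str


class Datum:
    """Hypothetical data object that may or may not carry a task_name attribute."""

    def __init__(self, task_name=None):
        # Only set the attribute when a name is given, so hasattr() can be False.
        if task_name is not None:
            self.task_name = task_name


def resolve_task_name(data, task_spec):
    # Same expression as in the PR: prefer the name on the data, else use the spec's.
    return data.task_name if hasattr(data, "task_name") else task_spec.task_name


spec = TaskSpec(task_name="math")
print(resolve_task_name(Datum("dapo_math"), spec))  # dapo_math
print(resolve_task_name(Datum(), spec))             # math
```

For plain attributes like these, `getattr(data, "task_name", task_spec.task_name)` gives the same result in a single call.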

1.1 Enhanced Understandability

In the original run_grpo_math.py, the environment was hard-coded. The script supported only a single math environment, and the task_name of every dataset it used was set to "math".
In this scenario, task_name, task_data_processors, and env were bound strictly one-to-one. For example, the task_name of openmathinstruct2 was hard-coded as "math", the task_data_processors entry for the math task was bound to math_hf_data_processor, and the environment was bound to math_env.
Under this setup, a dataset could be paired with only one processor and one environment. We could interpret the task of run_grpo_math.py as "math" and the task of run_grpo_rm.py as "reward model".
The new run_grpo.py abstracts this away: the environment is no longer hard-coded but is specified via configuration. This makes the binding between datasets, environments, and processors more flexible. For instance, openmathinstruct2 can use either the math environment or the reward model environment.
In this flexible setup, forcing task_name to "math" for all environments would cause confusion.
Our current design is dataset-centric: the dataset name serves as the task_name, and the task corresponding to each dataset can specify its own environment and processor.
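The dataset-centric binding can be sketched roughly as follows. The config keys (`dataset_name`, `env_name`, `processor`), the `TaskBinding` record, and the `build_bindings` helper are hypothetical names chosen for illustration. The actual configuration schema is described in docs/guides/grpo.md.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskBinding:
    """Hypothetical record of which environment and processor a task uses."""
    env_name: str
    processor: str


def build_bindings(dataset_configs):
    """Use each dataset's name as its task_name and record its own env and processor."""
    return {
        cfg["dataset_name"]: TaskBinding(cfg["env_name"], cfg["processor"])
        for cfg in dataset_configs
    }


# The same dataset can be pointed at either environment purely through configuration.
with_math_env = build_bindings(
    [{"dataset_name": "openmathinstruct2", "env_name": "math",
      "processor": "math_hf_data_processor"}]
)
with_rm_env = build_bindings(
    [{"dataset_name": "openmathinstruct2", "env_name": "reward_model",
      "processor": "math_hf_data_processor"}]
)
print(with_math_env["openmathinstruct2"].env_name)  # math
print(with_rm_env["openmathinstruct2"].env_name)    # reward_model
```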

1.2 Compatibility with Future Multi-Dataset and Multi-Environment Support

Consider a multi-dataset scenario where we use two datasets: openmathinstruct2 and dapo_math. Both are math-related datasets.
Suppose we want openmathinstruct2 (see openmathinstruct2.py#L38, that is, RL/nemo_rl/data/datasets/response_datasets/openmathinstruct2.py, line 38 in 859a89a: `"task_name": "math",`) to use the math environment, and dapo_math (see dapo_math.py#L37, that is, RL/nemo_rl/data/datasets/response_datasets/dapo_math.py, line 37 in 859a89a: `"task_name": "math",`) to use the reward model environment. We could theoretically specify the environment for each task in task_to_env (see run_grpo_math.py#L123, that is, RL/examples/run_grpo_math.py, line 123 in 859a89a: `task_to_env["math"] = math_env`).
However, because both datasets hard-code "task_name": "math", they share a single "math" key in task_to_env, so this multi-environment configuration cannot be expressed.
In the current design, each dataset can have its own task_name, which allows different datasets to use different environments.
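The key collision described above can be illustrated with a small sketch. The `build_task_to_env` helper is hypothetical, and plain strings stand in for the environment objects.

```python
def build_task_to_env(datasets):
    """Build a task_name -> env mapping from (task_name, env) pairs.

    Later pairs overwrite earlier ones that share the same task_name.
    """
    task_to_env = {}
    for task_name, env in datasets:
        task_to_env[task_name] = env
    return task_to_env


# Old scheme: both datasets hard-code task_name "math", so the entries collide.
old = build_task_to_env([("math", "math_env"), ("math", "reward_model_env")])

# New scheme: the dataset name is the task_name, so each dataset keeps its own env.
new = build_task_to_env(
    [("openmathinstruct2", "math_env"), ("dapo_math", "reward_model_env")]
)

print(old)  # {'math': 'reward_model_env'}
print(new)  # {'openmathinstruct2': 'math_env', 'dapo_math': 'reward_model_env'}
```

With a shared "math" key, only one environment survives in the mapping. With per-dataset keys, both datasets can be routed independently.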

Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: ruit <ruit@nvidia.com>

… processors. Added raw_dataset.py and path.py for improved dataset processing. Updated project-includes in pyrefly.toml and modified grpo.md to reflect new task-dataset mapping. Cleaned up unused code and configurations in various YAML files. Signed-off-by: ruit <ruit@nvidia.com>

…or handling - Introduced documentation for the new Code Jaccard Environment, detailing its functionality, usage, and configuration. - Updated RawDataset class to provide a default processor if none is specified in the data configuration. - Enhanced test coverage for the helpsteer3 data processor to ensure correct functionality and output. Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: ruit <ruit@nvidia.com>

- Updated CLEVRCoGenTDataset, OpenAIFormatDataset, and SquadDataset to inherit from the RawDataset class for improved dataset handling. - Added necessary imports for RawDataset in the respective files. Signed-off-by: ruit <ruit@nvidia.com>

…up for vlm grpo - Added `env_name` to `vlm_grpo_3B_megatron.yaml` and `vlm_grpo_3B.yaml` for environment specification. - Modified `setup_data` function in `run_vlm_grpo.py` to use `env_name` for environment configuration, enhancing flexibility in dataset processing. Signed-off-by: ruit <ruit@nvidia.com>


github-actions · 2025-12-12T07:45:06Z

⚠️ File Consistency Check

Check based on commit: cf6e02a (PR #1506 from yukih/pr-1472)

⚠️ Parallel Plans Synchronization Warning

The file nemo_rl/models/dtensor/parallelize.py was modified in this PR, but neither 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/optimized_tp_plans.py nor 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/parallelizer.py was updated.

Why this matters:
These files contain similar parallel plan implementations that should be kept synchronized to ensure consistency across the codebase.

Action required:

Please review if the changes in nemo_rl/models/dtensor/parallelize.py should also be applied to 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/optimized_tp_plans.py or 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/parallelizer.py
Update the appropriate related file(s) if necessary to maintain functional consistency
Request access to the NVIDIA-NeMo/Automodel repository, create a PR against the nemo-rl-submodule branch, and update the Automodel submodule in the nemo-rl index
Add @ffrujeri as a reviewer of this PR if you have any questions about the consistency requirements
If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

Modified: nemo_rl/models/dtensor/parallelize.py
Not modified: 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/optimized_tp_plans.py
Not modified: 3rdparty/Automodel-workspace/Automodel/nemo_automodel/components/distributed/parallelizer.py

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

nemo_rl/models/policy/workers/dtensor_policy_worker.py
nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.

_This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning._

github-actions bot added the documentation Improvements or additions to documentation label Nov 11, 2025

yuki-97 force-pushed the yukih/pr-1472 branch from 75f3d5c to 5ebbc73 Compare November 11, 2025 07:54

yuki-97 added the CI:L1 Run doctests, unit tests, and functional tests label Nov 11, 2025

yuki-97 temporarily deployed to nemo-ci November 11, 2025 07:56 — with GitHub Actions Inactive

yuki-97 force-pushed the yukih/pr-1472 branch 2 times, most recently from c9335d4 to a872ed6 Compare November 11, 2025 09:27

yuki-97 removed the CI:L1 Run doctests, unit tests, and functional tests label Nov 11, 2025

RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Nov 16, 2025

RayenTian temporarily deployed to nemo-ci November 16, 2025 03:31 — with GitHub Actions Inactive

RayenTian removed the CI:L1 Run doctests, unit tests, and functional tests label Nov 16, 2025

RayenTian had a problem deploying to nemo-ci November 16, 2025 03:35 — with GitHub Actions Error

RayenTian force-pushed the yukih/pr-1472 branch 2 times, most recently from b7fedb9 to 9078e33 Compare November 16, 2025 03:37

RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Nov 16, 2025

RayenTian temporarily deployed to nemo-ci November 16, 2025 03:38 — with GitHub Actions Inactive

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 16, 2025

RayenTian temporarily deployed to nemo-ci November 16, 2025 08:50 — with GitHub Actions Inactive

RayenTian force-pushed the yukih/pr-1472 branch from c0bfaa6 to ab0ac80 Compare November 17, 2025 08:44

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 17, 2025

RayenTian temporarily deployed to nemo-ci November 17, 2025 08:58 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci November 17, 2025 08:59 — with GitHub Actions Inactive

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 17, 2025

RayenTian temporarily deployed to nemo-ci November 17, 2025 14:21 — with GitHub Actions Inactive

nv-mmanohara and others added 24 commits December 11, 2025 23:44

Resolving comments

200a25b

Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: ruit <ruit@nvidia.com>

lint

894d9e9

Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: ruit <ruit@nvidia.com>

refactor yaml

a82eeef

Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: ruit <ruit@nvidia.com>

update custom parallel plan doc

620fa54

Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: ruit <ruit@nvidia.com>

revert logger.py

29c3655

Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: ruit <ruit@nvidia.com>

unify run_grpo with multiple env

32d1726

Signed-off-by: ruit <ruit@nvidia.com>

remove useless code

f45f55d

Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: ruit <ruit@nvidia.com>

Remove unused base model parallel plan from custom parallel configuration

6aafd29

Signed-off-by: ruit <ruit@nvidia.com>

fix doc

d8cb985

Signed-off-by: ruit <ruit@nvidia.com>

Update nightly compute test to allow for up to 1300 GPU hours

48ebad7

Signed-off-by: ruit <ruit@nvidia.com>

address comments

c342278

Signed-off-by: ruit <ruit@nvidia.com>

waive nemotron 49B because of issue #1571

64fc14e

Signed-off-by: ruit <ruit@nvidia.com>

Update loss_multiplier in helpsteer3 data processor to zero for truncated sequences

a4f2f03

Signed-off-by: ruit <ruit@nvidia.com>

remove examples in environments documentation

5dc34bb

Signed-off-by: ruit <ruit@nvidia.com>

fix comment

4878deb

Signed-off-by: ruit <ruit@nvidia.com>

Remove metrics.json file from functional distillation tests to streamline project structure.

119e86a

Signed-off-by: ruit <ruit@nvidia.com>

remove path.py

f1d37cd

Signed-off-by: ruit <ruit@nvidia.com>

update env name check

09b8090

Signed-off-by: ruit <ruit@nvidia.com>

docs: minor revisions (#1626)

db22883

Signed-off-by: Lawrence Lane <llane@nvidia.com>

fix doc

cf6e02a

Signed-off-by: ruit <ruit@nvidia.com>

terrykong approved these changes Dec 13, 2025

RayenTian mentioned this pull request Dec 16, 2025

chore: fix grpo functional test metric #1643

Merged

yuki-97 mentioned this pull request Feb 4, 2026

Mmanohara/merge grpo helpsteer cp tp #1472

Open

4 tasks

This was referenced Feb 4, 2026

feat: unify nemogym dataset #1807

Merged

feat: improve dataset #1893

Merged

terrykong merged 27 commits into main from yukih/pr-1472

6 participants
