refactor: refactor env and data processor & add nemotron super 49b recipes#1506
Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: ruit <ruit@nvidia.com>
… processors. Added raw_dataset.py and path.py for improved dataset processing. Updated project-includes in pyrefly.toml and modified grpo.md to reflect new task-dataset mapping. Cleaned up unused code and configurations in various YAML files. Signed-off-by: ruit <ruit@nvidia.com>
…or handling
- Introduced documentation for the new Code Jaccard Environment, detailing its functionality, usage, and configuration.
- Updated RawDataset class to provide a default processor if none is specified in the data configuration.
- Enhanced test coverage for the helpsteer3 data processor to ensure correct functionality and output.
Signed-off-by: ruit <ruit@nvidia.com>
- Updated `CLEVRCoGenTDataset`, `OpenAIFormatDataset`, and `SquadDataset` to inherit from the `RawDataset` class for improved dataset handling.
- Added the necessary imports for `RawDataset` in the respective files.
Signed-off-by: ruit <ruit@nvidia.com>
…up for vlm grpo
- Added `env_name` to `vlm_grpo_3B_megatron.yaml` and `vlm_grpo_3B.yaml` for environment specification.
- Modified the `setup_data` function in `run_vlm_grpo.py` to use `env_name` for environment configuration, enhancing flexibility in dataset processing.
Signed-off-by: ruit <ruit@nvidia.com>
…tion Signed-off-by: ruit <ruit@nvidia.com>
…ated sequences Signed-off-by: ruit <ruit@nvidia.com>
…line project structure. Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
Follow up of #1472. Thanks @nv-mmanohara for adding this!
`run_grpo.py`, will [Refactor] Clear `run_grpo_math.py` and `run_grpo_rm.py` (#1572) in a subsequent PR.

Test Result
grpo math before and after refactor
nemotron 49B
Known Issue
Design explanation
Purpose of `task_name = data.task_name if hasattr(data, "task_name") else task_spec.task_name` (answer to #1506 (review))
The related documentation has been added to [docs/guides/grpo.md](docs/guides/grpo.md).
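The fallback expression above can be sketched as a small helper. This is a minimal illustration, assuming `data` is a dataset object that may or may not define its own `task_name`; `TaskSpec` and `resolve_task_name` are hypothetical names used only for this sketch, not the NeMo RL API.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any


@dataclass
class TaskSpec:
    # Hypothetical stand-in for the task spec; only task_name is modeled.
    task_name: str


def resolve_task_name(data: Any, task_spec: TaskSpec) -> str:
    # Prefer the task_name defined by the dataset itself; otherwise fall back
    # to the default task_name carried by the task spec.
    return data.task_name if hasattr(data, "task_name") else task_spec.task_name


default_spec = TaskSpec(task_name="math")
dataset_with_name = SimpleNamespace(task_name="reward_model")  # hypothetical
dataset_without_name = SimpleNamespace()  # defines no task_name

print(resolve_task_name(dataset_with_name, default_spec))
print(resolve_task_name(dataset_without_name, default_spec))
```

The design keeps existing datasets working unchanged (they inherit the spec's default), while any dataset that declares its own `task_name` can be routed separately.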
1.1 Enhanced Understandability
- `run_grpo_math.py`, the environment was hard-coded in the code. This file only supported one math environment, and the `task_name` of all datasets used was uniformly set to `"math"`.
- `task_name`, `task_data_processors`, and `env` were in a strict one-to-one binding. For example, the `task_name` of `openmathinstruct2` was hard-coded as `"math"`, the `task_data_processors` entry for the math task was bound to `math_hf_data_processor`, and the environment was bound to `math_env`.
- `run_grpo_math.py` as "math", and the task of `run_grpo_rm.py` as "reward model".
- `run_grpo.py`: the environment is no longer hard-coded but specified via configuration. This makes the binding between datasets, environments, and processors more flexible. For instance, `openmathinstruct2` can use either the math environment or the reward model environment.
- `task_name` to `"math"` for all environments would cause confusion.
- `task_name`, and the task corresponding to the dataset can specify its own environment and processor.

1.2 Compatibility with Future Multi-Dataset and Multi-Environment Support
- `openmathinstruct2` and `dapo_math`. Both are math-related datasets.
- `openmathinstruct2` (see: [openmathinstruct2.py#L38](RL/nemo_rl/data/datasets/response_datasets/openmathinstruct2.py), line 38 in 859a89a)
- `dapo_math` (see: [dapo_math.py#L37](RL/nemo_rl/data/datasets/response_datasets/dapo_math.py), line 37 in 859a89a)
- `task_to_env` (see: [run_grpo_math.py#L123](RL/examples/run_grpo_math.py), line 123 in 859a89a)
- `task_name` for both datasets is hard-coded as `"task_name": "math"` in the code, this multi-environment configuration cannot be implemented.
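The limitation can be illustrated with a minimal routing sketch, assuming each dataset entry carries a `task_name` that is looked up in a `task_to_env` mapping and a processor mapping. `DatasetEntry`, `route`, the task names `math_a`/`math_b`, and `reward_model_env` are hypothetical; only `openmathinstruct2`, `dapo_math`, `math_env`, and `math_hf_data_processor` come from the text above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DatasetEntry:
    # Hypothetical: a dataset name plus the task_name it is routed by.
    name: str
    task_name: str


def route(
    datasets: List[DatasetEntry],
    task_to_env: Dict[str, str],
    task_to_processor: Dict[str, str],
) -> Dict[str, Tuple[str, str]]:
    """Map each dataset to (processor, environment) via its own task_name."""
    return {
        d.name: (task_to_processor[d.task_name], task_to_env[d.task_name])
        for d in datasets
    }


# Before: both datasets share the hard-coded task_name "math", so a single
# key decides the environment for both of them.
before = route(
    [DatasetEntry("openmathinstruct2", "math"), DatasetEntry("dapo_math", "math")],
    task_to_env={"math": "math_env"},
    task_to_processor={"math": "math_hf_data_processor"},
)

# After: hypothetical per-dataset task names let each dataset pick its own
# environment ("reward_model_env" is a placeholder name).
after = route(
    [DatasetEntry("openmathinstruct2", "math_a"), DatasetEntry("dapo_math", "math_b")],
    task_to_env={"math_a": "math_env", "math_b": "reward_model_env"},
    task_to_processor={
        "math_a": "math_hf_data_processor",
        "math_b": "math_hf_data_processor",
    },
)

print(before)
print(after)
```

With one shared key, both datasets necessarily resolve to the same environment; giving each dataset its own `task_name` is what allows the configuration to route them differently while still reusing the same processor.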