
feat: support qwen-omni grpo training recipe #2073

Open
yuekaizhang wants to merge 3 commits into NVIDIA-NeMo:main from yuekaizhang:qwen_omni

Conversation


@yuekaizhang yuekaizhang commented Mar 6, 2026

Conditional PR: NVIDIA-NeMo/Megatron-Bridge#2634, NVIDIA-NeMo/Megatron-Bridge#2342

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • New Features
    • Added audio support for multimodal environments and data processing pipelines.
    • Introduced AISHELL dataset for automatic speech recognition training.
    • Introduced AVQA dataset for audio question-answering fine-tuning.
    • Added example configurations for audio GRPO and audio language model training with Megatron backend.
    • Enhanced multimodal content handling to process audio alongside images and videos.

Signed-off-by: root <zhangyuekai@foxmail.com>
@yuekaizhang yuekaizhang requested review from a team as code owners March 6, 2026 04:41

copy-pr-bot bot commented Mar 6, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Mar 6, 2026

📝 Walkthrough

Adds audio training support by introducing AISHELL and AVQA dataset wrappers with audio preprocessing, audio-enabled configuration files for GRPO and SFT training, and extends multimodal data handling across collation, processing, and generation pipelines to support audio modality alongside images and text.

Changes

Cohort / File(s) Summary
Audio Dataset Implementations
nemo_rl/data/datasets/response_datasets/aishell.py, nemo_rl/data/datasets/response_datasets/avqa.py, nemo_rl/data/datasets/response_datasets/__init__.py
Adds AishellDataset and AVQADataset classes with audio resampling, question parsing, and OpenAI-style message formatting. Registers both datasets in DATASET_REGISTRY and exports them via __all__.
Audio Training Configurations
examples/configs/audio_grpo_3B_megatron.yaml, examples/configs/sft_audio_lm_megatron.yaml, examples/configs/sft_openmathinstruct2.yaml
Introduces comprehensive GRPO and SFT configuration files for audio-based training with Megatron backend (Qwen2.5Omni and Qwen2-Audio), plus minor processor specification update to OpenMathInstruct config.
Audio Data Pipeline
nemo_rl/data/collate_fn.py, nemo_rl/data/processors.py, nemo_rl/data/multimodal_utils.py
Extends collation and processing logic to collect and forward vllm_audios; adds audio content handling in vlm_hf_data_processor alongside images/text; includes processor.model_input_names in multimodal key aggregation.
Audio in Generation & Rollouts
nemo_rl/experience/rollouts.py, nemo_rl/models/generation/vllm/utils.py
Propagates vllm_audios through rollout generation and generalizes vLLM multimodal data handling to support both images and audios in a unified multi_modal_data dictionary.
Infrastructure & Utilities
nemo_rl/environments/utils.py, nemo_rl/models/megatron/setup.py, nemo_rl/utils/logger.py, examples/prompts/avqa_cot.txt
Registers "avqa" environment in ENV_REGISTRY; adds VLM wrapper unwrapping for thinker module access in MoE router setup; improves numpy array serialization in JSONL logging; adds empty AVQA prompt template file.
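
The "numpy array serialization in JSONL logging" item can be pictured with a small sketch. The helper below is hypothetical (the actual nemo_rl/utils/logger.py change may differ); it shows the general shape of making numpy values safe for json.dumps:

```python
import json

import numpy as np


def to_jsonable(obj):
    """Recursively convert numpy arrays/scalars into plain Python types.

    Hypothetical helper illustrating the JSONL-logging fix described
    above; not the actual nemo_rl/utils/logger.py implementation.
    """
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, np.generic):  # covers np.float32, np.int64, ...
        return obj.item()
    if isinstance(obj, dict):
        return {k: to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(v) for v in obj]
    return obj


record = {"reward": np.float32(0.5), "token_ids": np.array([1, 2, 3])}
# A plain json.dumps(record) would raise TypeError on the numpy values.
line = json.dumps(to_jsonable(record))
```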

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Possibly related PRs

  • PR #2016 — Modifies the same multimodal data-loading and vLLM audio handling codepaths (processors, multimodal_utils, vLLM generation).
  • PR #1649 — Refactors dataset registry and loader interfaces in response_datasets, directly affected by new dataset registrations in this PR.
  • PR #1334 — Both modify vLLM integration code for multimodal handling (generation/vllm modules).
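
The "unified multi_modal_data dictionary" mentioned in the walkthrough has roughly this shape. The sketch is a hypothetical helper, not the actual nemo_rl/models/generation/vllm/utils.py code; the key names follow vLLM's multi-modal input convention:

```python
def build_multi_modal_data(images=None, audios=None):
    # Sketch of a unified vLLM multi_modal_data dict that carries
    # whichever modalities are present; hypothetical helper, not the
    # actual nemo_rl implementation.
    multi_modal_data = {}
    if images:
        multi_modal_data["image"] = images
    if audios:
        multi_modal_data["audio"] = audios  # e.g. (waveform, sample_rate) tuples
    return multi_modal_data
```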

Suggested labels

CI:L1

Suggested reviewers

  • yuki-97
  • terrykong
  • cuichenx
🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 57.14%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Test Results For Major Changes (⚠️ Warning): PR contains major changes (~666 lines) with new datasets and audio processing, but lacks experiment results, logs, and documentation despite being marked WIP with incomplete TODOs. Resolution: complete comprehensive testing of new datasets and training recipes, document test results in the PR description, attach experiment logs as planned, and fix identified bugs before merging.
✅ Passed checks (2 passed)
  • Title check (✅ Passed): The title accurately reflects the main purpose of the pull request: adding support for the Qwen-Omni GRPO training recipe with new audio datasets, configurations, and processors.
  • Description Check (✅ Passed): Check skipped because CodeRabbit’s high-level summary is enabled.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (3)
examples/prompts/avqa_cot.txt (1)

1-1: Clarify the intent of the empty prompt template.

The file contains only {} which provides no prompt formatting. If this is intentional (e.g., AVQA dataset already contains formatted prompts), consider adding a comment explaining this. If it's a placeholder, the TODO in the PR checklist should track completing it.

📝 Proposed documentation
-{}
+{
+  // Empty template: AVQA dataset messages are pre-formatted.
+  // The user message content is passed through without additional prompt wrapping.
+}

Or if JSON comments aren't supported, create a companion README or use the prompt file itself:

-{}
+{question}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/prompts/avqa_cot.txt` at line 1, The file
examples/prompts/avqa_cot.txt currently contains only "{}", which is ambiguous;
update the file to clarify intent by either replacing "{}" with the intended
prompt template for AVQA chain-of-thought (or a clear placeholder template) or
add a top-line comment explaining that "{}" is intentional because prompts are
provided externally by the AVQA dataset and link to the dataset/source; if this
is a temporary placeholder, add a TODO with an issue/PR reference in the file
(or create a companion README) to indicate who will complete the template and
when.
nemo_rl/models/megatron/setup.py (1)

696-700: Consider adding thinker unwrapping to MoEFloat16Module.re_enable_float32_expert_bias() for consistency.

The freeze_moe_router function now unwraps models with a thinker attribute (line 696-697) before accessing language_model. However, MoEFloat16Module.re_enable_float32_expert_bias() (lines 1051-1054) only checks for language_model:

# Line 1051-1054
if hasattr(module, "language_model"):
    module = module.language_model

If this wrapper is used with Qwen2.5-Omni models, it may fail to properly access the decoder layers.

♻️ Proposed fix for consistency
 def re_enable_float32_expert_bias(self) -> None:
     ...
     module = self.module
+    # Handle VLM models where thinker wraps the language model
+    if hasattr(module, "thinker"):
+        module = module.thinker
     # Handle VLM models where language model is nested
     if hasattr(module, "language_model"):
         module = module.language_model
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_rl/models/megatron/setup.py` around lines 696 - 700, The method
MoEFloat16Module.re_enable_float32_expert_bias currently only unwraps modules
via the language_model attribute but freeze_moe_router also unwraps a thinker
wrapper first; update re_enable_float32_expert_bias to mirror that logic by
checking hasattr(module, "thinker") and setting module = module.thinker before
the existing hasattr(module, "language_model") unwrap so it reliably reaches
module.decoder.layers for wrapped models (e.g., Qwen2.5-Omni).
nemo_rl/data/datasets/response_datasets/avqa.py (1)

103-107: Verify that list rendering for choices is intentional.

_parse_question returns choices as a list (e.g., ["3", "One", "4", "2"]), and DEFAULT_TEMPLATE.format(choices=choices) will render it as "['3', 'One', '4', '2']" in the prompt. This might produce awkward prompts like:

"How many animals...? Please choose from: ['3', 'One', '4', '2']."

Consider formatting choices explicitly:

Suggested fix
+        choices_str = ", ".join(choices) if choices else ""
-        prompt_text = DEFAULT_TEMPLATE.format(question=question, choices=choices)
+        prompt_text = DEFAULT_TEMPLATE.format(question=question, choices=choices_str)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_rl/data/datasets/response_datasets/avqa.py` around lines 103 - 107, The
prompt currently inserts the raw list returned by _parse_question into
DEFAULT_TEMPLATE, producing Python-list style output (e.g., "['3','One',...]");
before formatting the template convert choices into a human-friendly string
(e.g., choices_str = ", ".join(choices) or another desired separator/labeling)
and use that string when building prompt_text (i.e., pass choices=choices_str to
DEFAULT_TEMPLATE.format), keeping the rest of the logic (question replacement
and prompt_text creation) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/configs/audio_grpo_3B_megatron.yaml`:
- Around line 63-66: The config hardcodes a local path for policy.model_name
which is user-specific; update policy.model_name to a HuggingFace model
identifier or a clear placeholder (e.g., "qwen/qwen-2.5-omni" or
"<HF_MODEL_ID>") and ensure tokenizer.name references the same identifier
(tokenizer.name: ${policy.model_name}) so others can run the example without the
local filesystem path.
- Line 140: The YAML sets converter_type: Qwen2_5OmniForConditionalGeneration
which is unsupported by Megatron-Bridge; update the converter_type entry to a
supported converter (e.g., Qwen2, Qwen2.5, Qwen2.5-VL, or a Qwen3 variant) or
remove the converter_type line and wire in a custom bridge implementation if
Omni (audio/video/speech) support is required; look for the converter_type key
in the file and replace Qwen2_5OmniForConditionalGeneration with the appropriate
supported converter name or add a note to implement a custom Megatron-Bridge
converter for Omni models.

In `@examples/configs/sft_audio_lm_megatron.yaml`:
- Around line 24-26: The config's policy.model_name is set to a user-local path
(/workspace_yuekai/HF/Qwen2-Audio-7B); replace it with a reproducible
HuggingFace model identifier or a clear placeholder (e.g., "Qwen2-Audio-7B" or
"<HF_MODEL_ID>") so other users can run the example, and ensure the
corresponding tokenizer field under policy is set to a matching tokenizer ID or
placeholder as well.

In `@nemo_rl/data/datasets/response_datasets/aishell.py`:
- Line 42: The load_dataset invocation in the constructor incorrectly hardcodes
split="test" and passes the validated split as a positional arg, causing the
user-provided split to be ignored; update the load_dataset call referenced by
self.dataset to use the split variable (e.g., pass split as the keyword
split=split or as the single positional split) and remove the hardcoded
split="test" so the requested split parameter is honored.
- Line 33: vlm_hf_data_processor is missing a handler for task_name "aishell",
causing a ValueError; update the dispatcher in vlm_hf_data_processor (in
nemo_rl/data/processors.py) to add a branch for task_name == "aishell" that
mirrors the AVQA pass-through behavior (i.e., return the input examples/records
unchanged or call the same helper used by AVQA), referencing the task_name
"aishell" string and the vlm_hf_data_processor function name so the aishell
dataset in nemo_rl/data/datasets/response_datasets/aishell.py is processed
without error.

In `@nemo_rl/data/datasets/response_datasets/avqa.py`:
- Line 84: Replace the hardcoded path passed to load_dataset with a configurable
parameter: accept a data_path (or dataset_id) from the constructor kwargs or
config, default to a public HuggingFace dataset identifier if not provided, and
use that value when calling load_dataset to set self.dataset; update the
constructor signature and any callers to forward data_path and ensure the code
uses load_dataset(data_path_or_id, split=split) instead of the
developer-specific "/workspace_yuekai/HF/avqa-processed".
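
Taken together, the aishell fixes above (honor the split argument, add a dispatcher branch) reduce to a pattern like the following sketch. Function and variable names here are assumptions for illustration, not the actual processors.py / aishell.py code:

```python
def load_split(load_dataset_fn, path: str, split: str):
    # Pass the validated split through instead of hardcoding split="test".
    return load_dataset_fn(path, split=split)


def process_example(task_name: str, example: dict) -> dict:
    # Hypothetical stand-in for the task_name dispatch inside
    # vlm_hf_data_processor: "aishell" mirrors the existing AVQA
    # pass-through instead of raising ValueError.
    if task_name in ("avqa", "aishell"):
        return example  # messages are already formatted by the dataset class
    raise ValueError(f"Unsupported task_name: {task_name}")
```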

---
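
The two config-path comments above amount to a change along these lines. The model identifier below is a placeholder, and the exact schema should be checked against the repo's other example configs:

```yaml
policy:
  model_name: "Qwen/Qwen2.5-Omni-3B"   # HF id or <HF_MODEL_ID>, not a user-local path
tokenizer:
  name: ${policy.model_name}           # keep the tokenizer in sync with the model
```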

Nitpick comments:
In `@examples/prompts/avqa_cot.txt`:
- Line 1: The file examples/prompts/avqa_cot.txt currently contains only "{}",
which is ambiguous; update the file to clarify intent by either replacing "{}"
with the intended prompt template for AVQA chain-of-thought (or a clear
placeholder template) or add a top-line comment explaining that "{}" is
intentional because prompts are provided externally by the AVQA dataset and link
to the dataset/source; if this is a temporary placeholder, add a TODO with an
issue/PR reference in the file (or create a companion README) to indicate who
will complete the template and when.

In `@nemo_rl/data/datasets/response_datasets/avqa.py`:
- Around line 103-107: The prompt currently inserts the raw list returned by
_parse_question into DEFAULT_TEMPLATE, producing Python-list style output (e.g.,
"['3','One',...]"); before formatting the template convert choices into a
human-friendly string (e.g., choices_str = ", ".join(choices) or another desired
separator/labeling) and use that string when building prompt_text (i.e., pass
choices=choices_str to DEFAULT_TEMPLATE.format), keeping the rest of the logic
(question replacement and prompt_text creation) unchanged.

In `@nemo_rl/models/megatron/setup.py`:
- Around line 696-700: The method MoEFloat16Module.re_enable_float32_expert_bias
currently only unwraps modules via the language_model attribute but
freeze_moe_router also unwraps a thinker wrapper first; update
re_enable_float32_expert_bias to mirror that logic by checking hasattr(module,
"thinker") and setting module = module.thinker before the existing
hasattr(module, "language_model") unwrap so it reliably reaches
module.decoder.layers for wrapped models (e.g., Qwen2.5-Omni).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b678d307-04c5-4bc1-8c8e-4d1cd5f8e056

📥 Commits

Reviewing files that changed from the base of the PR and between c4f8e1c and ad1c0b6.

📒 Files selected for processing (15)
  • examples/configs/audio_grpo_3B_megatron.yaml
  • examples/configs/sft_audio_lm_megatron.yaml
  • examples/configs/sft_openmathinstruct2.yaml
  • examples/prompts/avqa_cot.txt
  • nemo_rl/data/collate_fn.py
  • nemo_rl/data/datasets/response_datasets/__init__.py
  • nemo_rl/data/datasets/response_datasets/aishell.py
  • nemo_rl/data/datasets/response_datasets/avqa.py
  • nemo_rl/data/multimodal_utils.py
  • nemo_rl/data/processors.py
  • nemo_rl/environments/utils.py
  • nemo_rl/experience/rollouts.py
  • nemo_rl/models/generation/vllm/utils.py
  • nemo_rl/models/megatron/setup.py
  • nemo_rl/utils/logger.py

Signed-off-by: root <zhangyuekai@foxmail.com>
Signed-off-by: root <zhangyuekai@foxmail.com>
@yuekaizhang yuekaizhang changed the title from "[WIP] support qwen-omni grpo training recipe" to "feat: support qwen-omni grpo training recipe" Mar 10, 2026
@yuekaizhang
Copy link
Author

yuekaizhang commented Mar 10, 2026

@snowmanwwg Hi, I was wondering if you know someone who could help review this PR. Many thanks.

I have verified the PR with the below training results:

Model                       MMAU (v05.15.25)
Qwen2.5-Omni-3B             69.8
+ HF GRPO                   71.6
+ Nemo-RL GRPO (This PR)    72.1
