Skip to content

[M4] feat: Add M4 end2end qwen3_vl example#2117

Merged
yaoyu-33 merged 2 commits intoNVIDIA-NeMo:mainfrom
shifangx:shifang/qwen3_vl_perf_m4
Jan 29, 2026
Merged

[M4] feat: Add M4 end2end qwen3_vl example#2117
yaoyu-33 merged 2 commits intoNVIDIA-NeMo:mainfrom
shifangx:shifang/qwen3_vl_perf_m4

Conversation

@shifangx
Copy link
Contributor

@shifangx shifangx commented Jan 29, 2026

What does this PR do ?

Add M4 end2end qwen3vl example

  1. Pass pg_collection from Qwen3VLMoEModelProvider(or Qwen3VLModelProvider) to Qwen3VLModel.
  2. pass pg_collection to track_moe_metrics for moe model, in train_utils.py.

How to Run

# 8 GPUs: EP8
uv run python -m torch.distributed.run --nproc_per_node=8 examples/recipes/decentralized_pg/pretrain_qwen3_vl_simple.py

Changelog

  • Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI sectionin the Contributing doc for how to trigger the CI. A Nvidia developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items you can still open "Draft" PR.

Additional Information

  • Related to # (issue)

@copy-pr-bot
Copy link

copy-pr-bot bot commented Jan 29, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 29, 2026

📝 Walkthrough

Walkthrough

This PR threads a ProcessGroupCollection parameter through the Qwen3VL model initialization hierarchy, from provider level through the main model, language model, and transformer blocks. The parameter is stored as an instance attribute and propagated downward without altering core logic.

Changes

Cohort / File(s) Summary
Model Hierarchy
src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/model.py, src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/text_model.py, src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/transformer_block.py
Added pg_collection: ProcessGroupCollection parameter to Qwen3VLModel and Qwen3VLGPTModel constructors; propagated through initialization chain to transformer blocks. Added import for ProcessGroupCollection type.
Model Provider
src/megatron/bridge/models/qwen_vl/qwen3_vl_provider.py
Updated Qwen3VLModel constructor calls in both standard and MoE provider paths to pass pg_collection=self._pg_collection.
Training Utilities
src/megatron/bridge/training/utils/train_utils.py
Passed pg_collection argument to track_moe_metrics() function call in training logging.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

Suggested reviewers

  • skyw
  • cuichenx
  • ananthsub
🚥 Pre-merge checks | ✅ 1 | ❌ 3
❌ Failed checks (3 warnings)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Results For Major Changes ⚠️ Warning PR adds M4 end-to-end Qwen3VL example with hundreds of files, but PR description lacks test results, performance metrics, or regression testing documentation required for major changes. Update PR description to include test results summary, relevant performance benchmarks for the M4 example, and confirmation of no regressions introduced.
Title check ⚠️ Warning The PR title '[M4] feat: Add M4 end2end qwen3_vl example' does not match the actual changeset, which focuses on adding ProcessGroupCollection support to Qwen models, not creating an end-to-end example. Update the title to reflect the actual changes, such as '[M4] feat: Add ProcessGroupCollection support to Qwen3VL models' or similar to accurately describe the infrastructure changes being made.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/megatron/bridge/training/utils/train_utils.py (1)

571-584: Remove pg_collection parameter from track_moe_metrics() call. The function does not accept this parameter and will raise TypeError at runtime. The pg_collection parameter is for other Megatron-Core APIs like TransformerModel and get_megatron_optimizer, not for track_moe_metrics().

🤖 Fix all issues with AI agents
In `@src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/transformer_block.py`:
- Line 32: Remove the unused import ProcessGroupCollection from
megatron.core.process_groups_config in transformer_block.py to fix the F401 lint
error; alternatively, if it was intended for typing, add it to a type-only
import (e.g., from typing import TYPE_CHECKING / inside an if TYPE_CHECKING
block) and reference it in relevant function/class annotations such as any
transformer block constructor or method that would accept process group types.
🧹 Nitpick comments (2)
src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/text_model.py (1)

63-64: Use ProcessGroupCollection | None for nullable pg_collection.

♻️ Proposed fix
-        pg_collection: ProcessGroupCollection = None,
+        pg_collection: ProcessGroupCollection | None = None,
As per coding guidelines: Use 'T | None' for nullable types instead of 'Optional[T]'.
src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/model.py (1)

63-64: Use ProcessGroupCollection | None for nullable pg_collection.

♻️ Proposed fix
-        pg_collection: ProcessGroupCollection = None,
+        pg_collection: ProcessGroupCollection | None = None,
As per coding guidelines: Use 'T | None' for nullable types instead of 'Optional[T]'.

from megatron.core.packed_seq_params import PackedSeqParams
from megatron.core.transformer.transformer_block import TransformerBlock
from megatron.core.utils import WrappedTensor, deprecate_inference_params, make_viewless_tensor
from megatron.core.process_groups_config import ProcessGroupCollection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove unused ProcessGroupCollection import (F401).
Static analysis flags this as unused; drop it or wire it into type hints.

🧹 Proposed fix
-from megatron.core.process_groups_config import ProcessGroupCollection
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from megatron.core.process_groups_config import ProcessGroupCollection
🧰 Tools
🪛 Flake8 (7.3.0)

[error] 32-32: 'megatron.core.process_groups_config.ProcessGroupCollection' imported but unused

(F401)

🤖 Prompt for AI Agents
In `@src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/transformer_block.py`
at line 32, Remove the unused import ProcessGroupCollection from
megatron.core.process_groups_config in transformer_block.py to fix the F401 lint
error; alternatively, if it was intended for typing, add it to a type-only
import (e.g., from typing import TYPE_CHECKING / inside an if TYPE_CHECKING
block) and reference it in relevant function/class annotations such as any
transformer block constructor or method that would accept process group types.

@shifangx shifangx changed the title [M4] feat: Add M4 end2end qwen3vl example [M4] feat: Add M4 end2end qwen3_vl example Jan 29, 2026
@shifangx shifangx force-pushed the shifang/qwen3_vl_perf_m4 branch 3 times, most recently from 2cce048 to ee07d70 Compare January 29, 2026 05:48
@shifangx
Copy link
Contributor Author

/ok to test 603f213

@shifangx shifangx force-pushed the shifang/qwen3_vl_perf_m4 branch from 22efc7d to 8fdb5a4 Compare January 29, 2026 07:19
@shifangx
Copy link
Contributor Author

/ok to test 8fdb5a4

@shifangx
Copy link
Contributor Author

/ok to test 49569e7

@shifangx shifangx force-pushed the shifang/qwen3_vl_perf_m4 branch from 49569e7 to cc9b85f Compare January 29, 2026 11:07
@shifangx
Copy link
Contributor Author

/ok to test cc9b85f

@yaoyu-33 yaoyu-33 merged commit 194ab6c into NVIDIA-NeMo:main Jan 29, 2026
49 checks passed
conver334 pushed a commit to conver334/Megatron-Bridge that referenced this pull request Jan 30, 2026
Signed-off-by: conver334 <conver334@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants