[QWen3_VL] pretrain performance optimization #1605

# Qwen3-VL pretraining performance optimization
This issue tracks pretraining performance optimization for [Qwen3-VL](https://github.com/QwenLM/Qwen3-VL).

## Functional Support
- model support 
- - https://github.com/NVIDIA-NeMo/Megatron-Bridge/pull/1174
- - https://github.com/NVIDIA-NeMo/Megatron-Bridge/pull/1533
- - https://github.com/NVIDIA-NeMo/Megatron-Bridge/pull/1769
- variable image resolutions and text lengths
- - [to be done]
- sequence packing
- - [to be done]
- other issues
- - [temporary fix for pipeline parallelism](https://github.com/shifangx/Megatron-Bridge/commit/8316e6138037e51df1e2620f49c3233e5231b7e4), addressing https://github.com/NVIDIA-NeMo/Megatron-Bridge/issues/1631
- - https://github.com/NVIDIA-NeMo/Megatron-Bridge/issues/1606
- - Open question: do we need to split `vision_embeds`? See https://github.com/NVIDIA-NeMo/Megatron-Bridge/pull/1174#discussion_r2587897250
- support video in dataset samples
- - [to be done]
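
For reference while sequence packing is still open, here is a minimal sketch of one common approach: greedy first-fit-decreasing packing of variable-length samples into fixed-size buffers, plus the cumulative sequence lengths that variable-length attention kernels typically expect. The names `pack_sequences`, `cu_seqlens`, and `max_len` are hypothetical and chosen for illustration only. This is not the Megatron-Bridge implementation, and the eventual design may differ.

```python
from itertools import accumulate


def pack_sequences(lengths, max_len):
    """Greedy first-fit-decreasing packing (illustrative, hypothetical helper).

    Returns a list of packs; each pack is a list of sample indices whose
    total token count does not exceed max_len.
    """
    # Visit samples from longest to shortest (ties keep their original order).
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    packs = []  # each entry: [remaining_capacity, [sample indices]]
    for i in order:
        n = lengths[i]
        if n > max_len:
            raise ValueError(f"sample {i} has {n} tokens, more than max_len={max_len}")
        for pack in packs:
            if pack[0] >= n:  # first pack with enough room
                pack[0] -= n
                pack[1].append(i)
                break
        else:
            packs.append([max_len - n, [i]])  # no pack fits: open a new one
    return [indices for _, indices in packs]


def cu_seqlens(lengths, pack):
    """Cumulative sequence boundaries for one pack, starting at 0."""
    return [0, *accumulate(lengths[i] for i in pack)]


lengths = [5, 3, 7, 2, 4]  # token counts per sample (text plus image tokens)
packs = pack_sequences(lengths, max_len=8)
boundaries = [cu_seqlens(lengths, p) for p in packs]
```

With these inputs, five samples fit into three packs of at most 8 tokens each. The boundaries let attention be masked per sample, so packed samples do not attend to one another.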

## Parallel optimization
- Baseline
- - PP + EP (pipeline and expert parallelism), with the encoder on the first PP stage.
- M-FSDP
- - M-FSDP for both the vision model and the LLM backbone
- DistTrain
- - https://github.com/NVIDIA-NeMo/Megatron-Bridge/issues/1632
- MDP
- - https://github.com/NVIDIA-NeMo/Megatron-Bridge/issues/1554
- Integrate MDP/DistTrain with other features
- - Integrate with M-FSDP. For example, use M-FSDP for the encoder and other 3-D parallelism for the LLM backbone.
- - Integrate with [Hybrid CP](https://github.com/NVIDIA/Megatron-LM/pull/2000) or [MagiAttention](https://github.com/SandAI-org/MagiAttention)
- - Integrate DistTrain with interleaved 1F1B overlap, which is essential for EP performance.

