Open
Labels: enhancement (New feature or request)
Description
Qwen3-VL pretraining performance optimization
This issue tracks performance optimization work for Qwen3-VL pretraining.
Functional Support
- model support
- various image resolution and text length
  - [to be done]
- sequence packing
  - [to be done]
- other issues
  - Do we need to split the vision_embeds? See "Add Qwen3VL support (dense and moe)" #1174 (comment)
- support video in dataset samples
  - [to be done]
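For discussion on the sequence packing item, here is a minimal sketch of one possible approach, not the planned implementation. It assumes each sample is reduced to a single total token count (vision tokens plus text tokens) and packs samples into fixed-capacity buffers with a first-fit-decreasing heuristic, emitting cumulative sequence boundaries in the `cu_seqlens` layout commonly consumed by variable-length attention kernels. `Pack`, `pack_samples`, and the capacity value are hypothetical names chosen for illustration and are not part of any existing API in this repository.

```python
from dataclasses import dataclass, field
from itertools import accumulate


@dataclass
class Pack:
    """One packed buffer holding several samples back to back (hypothetical type)."""
    capacity: int
    lengths: list[int] = field(default_factory=list)
    sample_ids: list[int] = field(default_factory=list)

    @property
    def used(self) -> int:
        return sum(self.lengths)

    def cu_seqlens(self) -> list[int]:
        # Cumulative boundaries of the samples inside this buffer,
        # e.g. lengths [3, 5] -> [0, 3, 8].
        return [0, *accumulate(self.lengths)]


def pack_samples(sample_lengths: list[int], capacity: int) -> list[Pack]:
    """Greedy first-fit-decreasing packing of samples into buffers.

    sample_lengths[i] is assumed to be the total token count of sample i
    (vision tokens + text tokens) after preprocessing.
    """
    packs: list[Pack] = []
    # Place the longest samples first; ties keep their original order.
    order = sorted(range(len(sample_lengths)), key=lambda i: -sample_lengths[i])
    for idx in order:
        length = sample_lengths[idx]
        if length > capacity:
            raise ValueError(f"sample {idx} ({length} tokens) exceeds capacity {capacity}")
        # First existing buffer with enough room, else open a new one.
        target = next((p for p in packs if p.used + length <= capacity), None)
        if target is None:
            target = Pack(capacity)
            packs.append(target)
        target.lengths.append(length)
        target.sample_ids.append(idx)
    return packs


if __name__ == "__main__":
    for pack in pack_samples([700, 300, 512, 200, 100], capacity=1024):
        print(pack.sample_ids, pack.cu_seqlens(), f"{pack.used}/{pack.capacity}")
```

Because image resolution changes the number of vision tokens per sample, packing by total token count is one way to keep padding low when lengths vary widely. Whether the vision encoder input should be packed separately from the LLM input is left open here.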
Parallel optimization
- Baseline
  - PP+EP, with the encoder on the first PP stage.
- M-FSDP
  - M-FSDP for both the vision model and the LLM backbone model
- DistTrain
- MDP
- Integrate MDP/DistTrain with other features
  - Integrate with M-FSDP. For example, use M-FSDP for the encoder and other 3-D parallelism for the LLM backbone.
  - Integrate with Hybrid CP or MagiAttention
  - Integrate DistTrain with interleaved 1F1B overlap, which is essential for EP's performance.