add Qwen 3.5 MoE model support with EP and VLM weight broadcast (#2026)
* feat: add Qwen 3.5 MoE model support with EP and CP integration
Adds a custom Qwen 3.5 MoE (GatedDeltaNet + MoE) implementation with:
- HF <-> PrimeRL weight conversion (fused/unfused expert formats)
- Expert Parallelism support (MoE layers auto-detected by apply_ep)
- Context Parallelism support (ring attention patching for flash attention layers)
- Router replay via routed_experts argument
- Unit tests for forward pass, weight roundtrip, router replay, and CP patching
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
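The router replay mentioned above can be sketched roughly as follows. This is a minimal illustration, not the PrimeRL implementation: the `routed_experts` argument name comes from the PR description, but the `TopKRouter` class and everything else here is hypothetical. The idea is that replay reuses recorded expert indices while recomputing gate weights so gradients still flow through the router.

```python
import torch
import torch.nn.functional as F

class TopKRouter(torch.nn.Module):
    """Hypothetical top-k MoE router supporting replay of recorded routes."""

    def __init__(self, hidden: int, n_experts: int, top_k: int):
        super().__init__()
        self.gate = torch.nn.Linear(hidden, n_experts, bias=False)
        self.top_k = top_k

    def forward(self, x, routed_experts=None):
        logits = self.gate(x)  # [tokens, n_experts]
        if routed_experts is None:
            # Fresh routing: pick top-k experts per token.
            weights, experts = logits.topk(self.top_k, dim=-1)
        else:
            # Replay: reuse expert indices recorded during rollout, but
            # recompute gate weights from the current logits so the
            # router still receives gradients.
            experts = routed_experts
            weights = logits.gather(-1, experts)
        weights = F.softmax(weights, dim=-1)
        return weights, experts
```

Calling the router a second time with the `experts` tensor from the first call reproduces the same expert assignment deterministically.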
* remove VLM warning
* feat: add custom VLM support for Qwen 3.5 MoE
Extend Qwen3_5MoeForCausalLM to handle both text-only and VLM configs.
When the config has a vision_config, the model creates a composite body
(HF frozen vision encoder + custom PrimeRL text model). Weight conversion
auto-detects VLM keys and remaps accordingly.
- Unified model class (no separate VLM file) driven by config
- Config-based VLM detection fallback for local model paths
- VLM dispatch in get_model() via _CUSTOM_VLM_MAPPING
- mini_moe.py preset for qwen3_5_moe_vlm testing
- 6 new GPU tests covering forward/backward/weights/roundtrip/router/meta
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
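The config-driven dispatch described above can be sketched like this. Only the `_CUSTOM_VLM_MAPPING` name and the "presence of `vision_config` means VLM" rule come from the PR description; the mapping contents and helper function are illustrative assumptions, not the real `get_model()` code.

```python
from types import SimpleNamespace

# Illustrative mapping only: the real registry contents are not shown
# in the PR description. One class handles both text-only and VLM configs.
_CUSTOM_VLM_MAPPING = {
    "qwen3_5_moe": "Qwen3_5MoeForCausalLM",
}

def resolve_custom_model(config):
    """Return (class name, is_vlm) for a config, or (None, False).

    A config counts as VLM when it carries a vision_config, which also
    covers local model paths where no hub metadata is available.
    """
    name = _CUSTOM_VLM_MAPPING.get(getattr(config, "model_type", None))
    is_vlm = getattr(config, "vision_config", None) is not None
    return name, is_vlm
```

With this shape, a text-only config and a VLM config resolve to the same class; only the `is_vlm` flag (and hence the composite vision body) differs.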
* chore: add debug SFT config for real Qwen3.5-35B-A3B VLM
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: support VLM layer key format in weight broadcast
VLM models use `model.language_model.layers.*` instead of
`model.layers.*`, which crashed get_max_layer_num and caused
filter_state_dict_by_layers to silently drop layer weights.
Also fixes off-by-one in filter_state_dict_by_layers that
skipped layer 0.
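The fix above amounts to two changes: parse the layer index out of both key formats, and make the layer-range filter inclusive so layer 0 is kept. A minimal sketch, assuming the function names from the commit message but not their real signatures:

```python
import re

# Matches the layer index in both `model.layers.N.*` and the VLM-style
# `model.language_model.layers.N.*` key formats.
_LAYER_RE = re.compile(r"\blayers\.(\d+)\.")

def get_max_layer_num(state_dict):
    """Largest layer index found in the state dict keys (0 if none)."""
    nums = [int(m.group(1)) for k in state_dict if (m := _LAYER_RE.search(k))]
    return max(nums) if nums else 0

def filter_state_dict_by_layers(state_dict, start, end):
    """Keep non-layer keys plus layers in [start, end].

    The bounds are inclusive, so a range starting at 0 keeps layer 0
    (the off-by-one was an exclusive lower bound dropping it).
    """
    out = {}
    for key, value in state_dict.items():
        m = _LAYER_RE.search(key)
        if m is None or start <= int(m.group(1)) <= end:
            out[key] = value
    return out
```

Because the regex anchors on `layers.N.` rather than a fixed `model.layers.` prefix, VLM checkpoints no longer crash `get_max_layer_num` or get their layer weights silently filtered out.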
* use registry-based approach
* run ruff
* chore: remove debug SFT configs for Qwen3.5 MoE
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: sami jaghouar <sami@primeintellect.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: sami jaghouar <sami.jaghouar@gmail.com>