v3.4.0
New Features
- Support for Megatron training (CPT/SFT) of Qwen3/Qwen2-MoE/Qwen3-MoE, with training on MoE models nearly 10 times faster than the Transformers implementation. For Qwen3-MoE training best practices, see #4030
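As a rough illustration of how the Megatron path is typically driven in ms-swift (convert the Hugging Face checkpoint to Megatron/mcore format, then launch training), the sketch below is a hypothetical invocation: every flag, dataset placeholder, and output path here is an assumption, not the verified recipe — the actual best-practice command for Qwen3-MoE is in #4030.

```shell
# Hypothetical sketch of the Megatron SFT workflow for a Qwen3-MoE model.
# All flags and paths are assumptions; see PR #4030 for the verified setup.

# 1) Convert the HF checkpoint to Megatron (mcore) format.
swift export \
    --model Qwen/Qwen3-30B-A3B \
    --to_mcore true \
    --output_dir Qwen3-30B-A3B-mcore

# 2) Run Megatron SFT on the converted checkpoint.
megatron sft \
    --load Qwen3-30B-A3B-mcore \
    --dataset <your-dataset> \
    --tensor_model_parallel_size 2 \
    --expert_model_parallel_size 4 \
    --save megatron_output/Qwen3-30B-A3B
```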
New Models
- Qwen/Qwen3-32B, Qwen/Qwen3-30B-A3B series
- Qwen/Qwen2.5-Omni-3B
What's Changed
- 🐛 fix: fix reward model train seq_cls by @gaohongkui in #3921
- Support vllm quantization by @tastelikefeet in #4003
- [megatron] Support Qwen3 by @Jintao-Huang in #3995
- Fix merge sentence transformers by @tastelikefeet in #4011
- Fix gte training and compatible with ds3 by @tastelikefeet in #4022
- fix truncation_strategy by @Jintao-Huang in #4025
- [Megatron] support MoE (Qwen2-Moe & Qwen3-MoE) by @Jintao-Huang in #4012
- Support Qwen3 series by @Jintao-Huang in #4029
- fix bugs by @Jintao-Huang in #4031
- fix grpo resume_from_checkpoint by @Jintao-Huang in #4035
- support qwen3_self_cognition by @Jintao-Huang in #4039
- Update readme & fix generate by @Jintao-Huang in #4041
- update wechat by @tastelikefeet in #4047
- support Qwen2.5-Omni-3B by @Jintao-Huang in #4052
- Update GRPOTrainer for compatibility with trl 0.17 by @hjh0119 in #3969
- fix rollout by @hjh0119 in #4055
New Contributors
- @gaohongkui made their first contribution in #3921
Full Changelog: v3.3.1...v3.4.0