v3.4.0

@Jintao-Huang Jintao-Huang released this 30 Apr 15:45
· 811 commits to main since this release

English Version

New Features

  1. Megatron training (CPT/SFT) is now supported for Qwen3, Qwen2-MoE, and Qwen3-MoE. On MoE models, training is nearly 10 times faster than with the Transformers implementation. For Qwen3-MoE training best practices, see #4030.

New Models

  1. Qwen/Qwen3-32B, Qwen/Qwen3-30B-A3B series
  2. Qwen/Qwen2.5-Omni-3B

Full Changelog: v3.3.1...v3.4.0