v3.3.1
English Version
New Features
- The Agent training and deployment module introduces agent templates. More than 10 are available, including hermes, glm4_0414, and llama4. With these templates, the same agent dataset can be used to train different models without conversion. For documentation, refer to here.
- GRPO training now supports calling an external vLLM server, which allows more flexible allocation of GPU memory between training and deployment. For the training script, refer to here.
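To illustrate the agent-template idea from the first feature above, here is a minimal sketch of how one model-agnostic tool-call record could be rendered into different model-specific formats. This is not ms-swift's implementation: the record fields, the `AGENT_TEMPLATES` registry, the `render_*` functions, and the `react` key are all hypothetical names chosen for illustration. Only the general shape of the Hermes-style (`<tool_call>` tags around JSON) and ReAct-style (`Action` / `Action Input` lines) outputs follows widely used conventions.

```python
import json

# A model-agnostic tool call, as it might be stored once in an agent dataset.
# The field names are illustrative, not ms-swift's actual dataset schema.
TOOL_CALL = {"name": "get_weather", "arguments": {"city": "Beijing"}}


def render_hermes(call: dict) -> str:
    """Hermes-style rendering: the call as JSON inside <tool_call> tags."""
    return "<tool_call>\n" + json.dumps(call, ensure_ascii=False) + "\n</tool_call>"


def render_react(call: dict) -> str:
    """ReAct-style rendering: 'Action' and 'Action Input' lines."""
    args = json.dumps(call["arguments"], ensure_ascii=False)
    return f"Action: {call['name']}\nAction Input: {args}"


# Hypothetical registry mapping a template name to its renderer.
AGENT_TEMPLATES = {"hermes": render_hermes, "react": render_react}


def render_tool_call(call: dict, template: str) -> str:
    """Render the same dataset record with the chosen agent template."""
    return AGENT_TEMPLATES[template](call)


if __name__ == "__main__":
    for name in AGENT_TEMPLATES:
        print(f"--- {name} ---")
        print(render_tool_call(TOOL_CALL, name))
```

In this sketch, switching the target model only changes the template name passed to `render_tool_call`, while the dataset record stays untouched. That is the property the release note describes.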
New Models
- OpenGVLab/InternVL3-1B series
- moonshotai/Kimi-VL-A3B-Instruct series
- ZhipuAI/GLM-4-9B-0414, ZhipuAI/GLM-Z1-9B-0414 series
What's Changed
- Fix sampling and rft by @tastelikefeet in #3847
- Fix incorrect retry count check in LazyLLMDataset.__getitem__ by @IamLihua in #3845
- support internvl3 by @hjh0119 in #3842
- fix grpo filter overlong by @hjh0119 in #3844
- dapo-bug by @Evilxya in #3846
- support agent packing by @Jintao-Huang in #3853
- Fix internvl2.5/3 deepspeed packing by @Jintao-Huang in #3855
- fix multimodal target_modules by @Jintao-Huang in #3856
- Fix multimodal target modules by @Jintao-Huang in #3858
- Update FAQ by @slin000111 in #3841
- fix grpo completion length equal zero by @hjh0119 in #3857
- support val_dataset_shuffle by @Jintao-Huang in #3860
- Update swift docker by @Jintao-Huang in #3866
- fix citest & minimax link by @Jintao-Huang in #3868
- fix grpo save checkpoint by @hjh0119 in #3865
- support glm4-z1 by @hjh0119 in #3862
- add paper link by @tastelikefeet in #3886
- refactor mm target_regex (compat peft/vllm) by @Jintao-Huang in #3879
- Support kimi-vl by @Jintao-Huang in #3884
- Fix glm4 z1 by @Jintao-Huang in #3889
- fix bugs by @Jintao-Huang in #3893
- fix typealias to be compatible with Python 3.9 by @hjh0119 in #3895
- Fix ui by @tastelikefeet in #3903
- Fix fp16 bf16 by @Jintao-Huang in #3909
- add rm center_rewards_coefficient argument by @hjh0119 in #3917
- revert swift_from_pretrained by @Jintao-Huang in #3914
- fix grpo doc by @hjh0119 in #3920
- update qwen2_5_omni by @Jintao-Huang in #3908
- Support qwen3 by @Jintao-Huang in #3945
- Decouple vLLM engine and GRPOTrainer. by @hjh0119 in #3911
- Refactor Agent Template by @Jintao-Huang in #3918
- update docs by @Jintao-Huang in #3961
- fix bugs by @Jintao-Huang in #3962
- Support hermes loss_scale by @Jintao-Huang in #3963
- fix parse tools by @Jintao-Huang in #3975
- Update unsloth compatibility by @tastelikefeet in #3970
- Fix qwen2.5-omni use_audio_in_video by @Jintao-Huang in #3987
- Fix web-ui by @tastelikefeet in #3997
- fix get_toolcall & fix ci by @Jintao-Huang in #3999
- fix bugs by @Jintao-Huang in #4001
- fix seq_cls by @Jintao-Huang in #4002
New Contributors
Full Changelog: v3.3.0...v3.3.1