Release v3.6.0 · modelscope/ms-swift

English Version

New Features

Megatron-SWIFT:
a. Support for more MoE model architectures, including: DeepseekV3ForCausalLM, Dots1ForCausalLM, and Ernie4_5_MoeForCausalLM. Training script reference: https://github.com/modelscope/ms-swift/tree/main/examples/train/megatron/moe
b. Support for more Dense model architectures, including: MiMoForCausalLM, InternLM3ForCausalLM, and Ernie4_5_ForCausalLM. Training script reference: https://github.com/modelscope/ms-swift/tree/main/examples/train/megatron/dense
c. DPO training supported. Training script reference: https://github.com/modelscope/ms-swift/tree/main/examples/train/megatron/rlhf/dpo
d. FP8 training supported.
e. More rope scaling types supported, including: default, linear, yarn, dynamic, longrope, llama3, etc.
f. --test_convert_precision parameter optimized for easier testing of weight conversion precision between mcore and huggingface models.
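
For readers unfamiliar with the rope scaling types named in item e, the sketch below contrasts only the two simplest ones, default and linear. It is a minimal illustration of the general technique, not the Megatron-SWIFT implementation; the function name and its parameters are hypothetical.

```python
from typing import List


def rope_inv_freq(head_dim: int, base: float = 10000.0, factor: float = 1.0) -> List[float]:
    """Hypothetical helper: inverse RoPE frequencies for one attention head.

    factor == 1.0 corresponds to "default" (no scaling); factor > 1.0
    corresponds to "linear" scaling, which divides every frequency by the
    same factor (equivalent to compressing the position indices).
    """
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")
    return [1.0 / (factor * base ** (2 * i / head_dim)) for i in range(head_dim // 2)]


default_freqs = rope_inv_freq(head_dim=8)
linear_freqs = rope_inv_freq(head_dim=8, factor=4.0)
print(default_freqs)
print(linear_freqs)
```

The other listed types (yarn, dynamic, longrope, llama3) adjust frequencies non-uniformly and are not covered by this sketch.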
GRPO:
a. GRPO multi-turn training refactored, supporting accelerated multi-turn inference with AsyncEngine. Documentation: https://swift.readthedocs.io/zh-cn/latest/Instruction/GRPO/DeveloperGuide/%E5%A4%9A%E8%BD%AE%E8%AE%AD%E7%BB%83.html
b. The offload_model parameter now also offloads the reference model.
c. Optimized GPU memory management under sleep_level and offload_model parameters.
d. Added trainer_state as an input parameter to reward_funcs, making it easier to obtain the current and total training steps.
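
As a rough illustration of item d, a reward function could use the training progress to change its behavior over time. The sketch below is self-contained and does not import ms-swift. The class name is hypothetical, and it assumes that trainer_state arrives as a keyword argument exposing global_step and max_steps attributes; check the GRPO documentation for the exact interface.

```python
from types import SimpleNamespace
from typing import List


class StepAwareLengthReward:
    """Hypothetical reward: penalize long completions more strongly late in training."""

    def __call__(self, completions: List[str], trainer_state=None, **kwargs) -> List[float]:
        # Fraction of training completed; stays at 0.0 if no state is passed.
        progress = 0.0
        if trainer_state is not None and trainer_state.max_steps > 0:
            progress = trainer_state.global_step / trainer_state.max_steps
        # The per-character penalty grows linearly from 0 to 1e-3 over training.
        weight = 1e-3 * progress
        return [1.0 - weight * len(c) for c in completions]


# Stand-in for the state object the trainer would pass in (an assumption).
state = SimpleNamespace(global_step=50, max_steps=100)
rewards = StepAwareLengthReward()(["short", "a much longer completion"], trainer_state=state)
print(rewards)
```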
Training:
a. Reranker training supported. Training script reference: https://github.com/modelscope/ms-swift/tree/main/examples/train/reranker
b. CPT/SFT/DPO/GRPO training of text-only large models supports ring-attention, which splits the sequence length to reduce GPU memory usage. Training script reference: https://github.com/modelscope/ms-swift/tree/main/examples/train/long_text/ring_attention
c. Channel loss in CPT/SFT training is compatible with padding_free and packing. Thanks to the technical team at China Merchants Bank for their contribution.
d. Optimized remove_unused_columns parameter. When set to False, extra dataset columns are passed to the Trainer for custom loss functions.
e. The default value for split_dataset_ratio changed from 0.01 to 0, so the validation set is not split by default. You now need to manually set --split_dataset_ratio or --val_dataset.
f. Fixed loss alignment issue between packing/padding_free for multimodal models. For details, see this PR: #4838
g. SwanLab now supports a Feishu (Lark) notification callback when training completes.
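
To make the effect of the split_dataset_ratio change in item e concrete, the sketch below shows what a ratio-based split does with the old and new default values. It is a hypothetical stand-alone helper for illustration only, not the ms-swift implementation; the function name, shuffling strategy, and seed are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def split_by_ratio(samples: Sequence, ratio: float, seed: int = 42) -> Tuple[List, List]:
    """Hypothetical helper: hold out `ratio` of the samples as a validation set."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_val = int(len(samples) * ratio)
    val = [samples[i] for i in indices[:n_val]]
    train = [samples[i] for i in indices[n_val:]]
    return train, val


data = list(range(1000))
train_new, val_new = split_by_ratio(data, 0.0)   # new default: nothing is held out
train_old, val_old = split_by_ratio(data, 0.01)  # previous default: 1% is held out
print(len(train_new), len(val_new))
print(len(train_old), len(val_old))
```

With the new default, evaluation during training needs an explicit --split_dataset_ratio or --val_dataset, as noted in item e.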
RLHF:
a. Pure-text and multimodal models support GKD training, with some scenarios supporting padding_free and packing. Training scripts:
i. Large models: https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/gkd.sh
ii. Multimodal large models: https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/rlhf/gkd.sh
b. Reward model training now supports the margin parameter. Documentation: https://swift.readthedocs.io/zh-cn/latest/Instruction/%E4%BA%BA%E7%B1%BB%E5%AF%B9%E9%BD%90.html#rm
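
For context on item b, a margin in pairwise reward modeling is commonly subtracted from the score difference inside the log-sigmoid loss, so the chosen response must beat the rejected one by at least the margin to reach a low loss. The sketch below shows that common formulation in plain Python. Whether ms-swift computes it exactly this way is an assumption; the function name is hypothetical.

```python
import math
from typing import Optional


def pairwise_reward_loss(chosen: float, rejected: float, margin: Optional[float] = None) -> float:
    """Return -log(sigmoid(chosen - rejected - margin)) for one preference pair."""
    diff = chosen - rejected - (margin or 0.0)
    # -log(sigmoid(d)) equals softplus(-d); this form avoids overflow for large |d|.
    x = -diff
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


print(pairwise_reward_loss(1.0, 1.0))              # equal scores, no margin
print(pairwise_reward_loss(1.0, 1.0, margin=1.0))  # same scores, margin raises the loss
print(pairwise_reward_loss(3.0, 1.0, margin=1.0))  # chosen clears the margin
```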
Full Pipeline:
a. SGLang inference engine can be used to accelerate ms-swift inference/deployment/evaluation/ui modules, by setting --infer_backend sglang. Inference script reference: https://github.com/modelscope/ms-swift/tree/main/examples/infer/sglang
b. FP8 quantization supported. Quantization script reference: https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize/fp8.sh
Web-UI:
a. SFT, RLHF, and GRPO training are available on separate tabs, and training command lines can be saved.
b. The Web-UI supports data sampling.

New Models

Multimodal Models:
a. ZhipuAI/GLM-4.1V-9B-Thinking series
b. Kwai-Keye/Keye-VL-8B-Preview
c. moonshotai/Kimi-VL-A3B-Thinking-2506
d. google/gemma-3n-E2B-it series
Pure Text Models:
a. PaddlePaddle/ERNIE-4.5-21B-A3B-PT series
b. rednote-hilab/dots.llm1.inst series
c. Tencent-Hunyuan/Hunyuan-A13B-Instruct
d. MiniMax/MiniMax-M1-80k series (inference)
e. moonshotai/Kimi-Dev-72B
f. cognitivecomputations/DeepSeek-R1-0528-AWQ

What's Changed

fix emb script and docs by @tastelikefeet in #4521
[grpo] update doc about move_model_batches by @hjh0119 in #4523
fix LoraModel by @Jintao-Huang in #4536
support cognitivecomputations/DeepSeek-R1-0528-AWQ by @Jintao-Huang in #4537
fix: handle INFONCE_HARD_NEGATIVES as integer if provided by @dlutwy in #4545
fix qwen3 embedding saving by @tastelikefeet in #4548
[megatron/dpo] fix megatron packing_cache & update DPOTrainer by @Jintao-Huang in #4556
[megatron] support DPO by @Jintao-Huang in #4193
support dots1 by @Jintao-Huang in #4560
[grpo] support offloading reference model by @hjh0119 in #4554
[grpo] fix the pickle data collator by @hjh0119 in #4562
[dataset] fix toolbench (local) by @Jintao-Huang in #4563
[Bug]Fix ulysses train steps, embedding negative sample length by @tastelikefeet in #4565
fix args.json by @Jintao-Huang in #4566
[model] fix ovis gradient_checkpointing vit no_grad by @Jintao-Huang in #4571
[megatron] Fix megatron all_reduce warning by @Jintao-Huang in #4568
[grpo] remove data collator to top-level to avoid pickle error in spawn mode by @hjh0119 in #4582
[grpo] model weight synchronization before first turn rollout with async generation by @hjh0119 in #4584
[megatron] support more rope_scaling & support deepseek-r1-qwen3-8b/internlm3/mimo-7b by @Jintao-Huang in #4576
[grpo] restore num_generations check by @hjh0119 in #4590
fix gc_kwargs by @Jintao-Huang in #4591
Fix UI llm_train by @slin000111 in #4592
[mirror] update swift mirror by @Jintao-Huang in #4601
[megatron] compat megatron-core main branch by @Jintao-Huang in #4606
[model] support minimax by @Jintao-Huang in #4610
Update FAQ by @slin000111 in #4612
[megatron] fix megatron pp max_epochs by @Jintao-Huang in #4608
Fix minimax & fix agent_template by @Jintao-Huang in #4618
[gkd] support gkd_trainer by @Jintao-Huang in #4587
[docs] remove Qwen3-32B-Base by @Jintao-Huang in #4621
[ppo] fix ppo by @Jintao-Huang in #4622
fix max_epochs tp by @Jintao-Huang in #4624
[loss_scale] support last_round_with_ignore_empty_think for rag by @sosofun in #4623
[rollout] swift rollout add template by @Jintao-Huang in #4626
[doc] LaTeX rendering by @hjh0119 in #4629
[infer/deploy/eval/app] support sglang engine by @Jintao-Huang in #3810
update docs & shell by @Jintao-Huang in #4637
update docs readme by @Jintao-Huang in #4639
[docs] update qwen3 best_practice by @Jintao-Huang in #4300
[template] optimize get_length by @Jintao-Huang in #4641
[model] fix model_meta by @Jintao-Huang in #4647
fix packing & load_from_cache_file by @Jintao-Huang in #4649
fix device_map & ddp rank0 by @Jintao-Huang in #4650
[megatron] fix eval data_collator by @Jintao-Huang in #4654
compat megatron-core 0.11 by @Jintao-Huang in #4655
[docs] update gkd by @Jintao-Huang in #4657
[gkd] support use_logits_to_keep/padding_free/packing & update gkd shell by @Jintao-Huang in #4658
[template] optimize remove_unused_columns by @Jintao-Huang in #4661
[grpo] refactor multi turn & support async engine & refactor grpo docs by @hjh0119 in #4380
[dataset] fix grounding_dataset by @Jintao-Huang in #4664
[docs] update docs by @Jintao-Huang in #4665
[channel loss]support packing & padding free by @kevssim in #4666
docs: correct typo "resonse" to "response" by @kv-chiu in #4672
[doc] fix image link by @hjh0119 in #4674
[doc] fix doc by @hjh0119 in #4675
[rollout] fix dp args by @hjh0119 in #4678
[grpo] fix grpo pt by @Jintao-Huang in #4683
[feat] support fine-tuning of reranker models by @0russwest0 in #4671
fix links by @tastelikefeet in #4690
[megatron] support DeepseekV2ForCausalLM and DeepseekV3ForCausalLM by @Jintao-Huang in #4659
[megatron] support rednote-hilab/dots.llm1.inst by @Jintao-Huang in #4707
[grpo] fix colocate seed by @hjh0119 in #4712
[doc] simplify environment variables & update best practices documentation by @0russwest0 in #4715
support Kimi-VL-A3B-Thinking-2506 & Kimi-Dev-72B by @Jintao-Huang in #4719
[quant] Support fp8 by @Jintao-Huang in #4729
[grpo] fix max_step for dataloader when applying sequence parallel by @0russwest0 in #4731
[grpo] check liger & sp by @hjh0119 in #4734
compat transformers==4.52 (vlm) by @Jintao-Huang in #4738
[grpo]Tool rl: add reward func for ToolRL by @tpx818 in #4694
[model] support Tencent-Hunyuan/Hunyuan-A13B-Instruct by @Jintao-Huang in #4745
[megatron] support fp8 by @Jintao-Huang in #4730
fix remove_unused_columns by @Jintao-Huang in #4749
[model] support ERNIE-4.5 by @Jintao-Huang in #4757
update wechat by @Jintao-Huang in #4769
update megatron shell by @Jintao-Huang in #4773
[grpo] update vllm weight sync & wake up by @hjh0119 in #4770
[docs] fix grpo docs by @hjh0119 in #4777
[grpo] pass trainer state to reward funcs by @hjh0119 in #4779
[grpo] check eval_dataset length by @hjh0119 in #4781
Fix media downloading from hf by @tastelikefeet in #4788
update resume from checkpoint & update timeout by @Jintao-Huang in #4774
update custom_dataset_docs by @Jintao-Huang in #4792
fix template bug for qwen3 reranker by @0russwest0 in #4795
[model] support GLM4.1V by @hjh0119 in #4804
[train] Update split_dataset_ratio by @Jintao-Huang in #4798
Refactor Web-UI by @slin000111 in #4687
Support ring attention for llm sft/dpo/grpo (packing/padding_free only). by @0russwest0 in #4814
[RM] support margin & update doc by @hjh0119 in #4817
[GITHUB WORKFLOW]add close stale issues workflow by @tastelikefeet in #4816
[rollout] fix external plugins by @hjh0119 in #4822
[rollout] Fix non-serializable torch.dtype bug in VLLM weight sync by @hjh0119 in #4825
[rollout] fix request from dict by @hjh0119 in #4826
[grpo] fix apply_chat_template by @hjh0119 in #4827
Support gemma3n by @0russwest0 in #4836
[train] fix multimodal packing & padding_free by @Jintao-Huang in #4838
fix multimodal padding_free prediction_step by @Jintao-Huang in #4839
[Feature] SwanLab Lark callback by @dykderrick in #4830
update stream & fix bugs by @Jintao-Huang in #4842
[megatron] Fix the display issue for train_type=lora by @Jintao-Huang in #4845
fix bug: grpo train error for deepseek model by @aacedar in #4833
[megatron] fix eval_iters -1 by @Jintao-Huang in #4847
[grpo] deprecated params for 3.6 by @hjh0119 in #4848
[grpo]Fix bug when repeatedly call inputs_to_rolloutrequest by @hrz394943230 in #4823
[grpo] fix offpolicy check by @hjh0119 in #4852
Fix test bug by @slin000111 in #4851
[grpo] update doc by @hjh0119 in #4853
[template] fix qwen3 remove '' by @Jintao-Huang in #4857
Support Kwai-Keye/Keye-VL-8B-Preview by @0russwest0 in #4856
[dataset] fix dataset ddp write conflict by @Jintao-Huang in #4860
[web-ui]Modify open parameter for Accordion by @slin000111 in #4859

New Contributors

@dlutwy made their first contribution in #4545
@sosofun made their first contribution in #4623
@kv-chiu made their first contribution in #4672
@0russwest0 made their first contribution in #4671
@tpx818 made their first contribution in #4694
@dykderrick made their first contribution in #4830
@aacedar made their first contribution in #4833
@hrz394943230 made their first contribution in #4823

Full Changelog: v3.5.0...v3.6.0
