v3.5.0
中文版
新特性
- GRPO:
a. 代码重构,使用参数vllm_mode指定。参数说明详见参考文档:https://swift.readthedocs.io/zh-cn/latest/Instruction/GRPO.html#id1:~:text=vllm_mode%20server%20%E5%8F%82%E6%95%B0,colocate%20mode%20%E7%94%9F%E6%95%88%E3%80%82
b. GRPO长文本优化,支持ulysses序列并行,显著降低长文本训练显存占用,训练脚本参考:https://github.com/modelscope/ms-swift/blob/main/examples/train/long_text/sequence_parallel_grpo.sh
c. 新增sync_ref_model参数,支持训练中同步参考模型权重。
d. 支持 liger kernel loss,使用参数 use_liger_kernel,降低显存占用。
e. External mode 支持 move_model_batches,降低zero3同步权重时的显存峰值。
f. 集成 INTELLECT-2 的 Two-Sided Clipping 算法,使用参数 delta。
g. 支持奖励函数返回 None,适用于多任务训练,参考文档:https://swift.readthedocs.io/zh-cn/latest/Instruction/GRPO.html#id7
h. Internal mode 支持 vllm_server_base_url,传入外部 vLLM 服务器url。
i. 插件拓展:支持 QwenLong-L1 奖励模型插件。
j. 新增 steps_per_generation/generation_batch_size 参数,支持自定义采样批量大小。
k. Web-UI支持GRPO训练。
l. 以下参数将在 v3.6 移除:tensor_parallel_size / vllm_device / vllm_max_num_seqs / num_infer_workers。
- 训练:
a. CPT/SFT/DPO/GRPO 支持 padding free。通过将批次数据展平避免数据填充(padding),显著降低显存并加速训练。训练脚本参考:https://github.com/modelscope/ms-swift/tree/main/examples/train/padding_free
b. 多模态训练增强。支持使用 vit_lr 和 aligner_lr 参数独立控制 ViT 和 Aligner 模块的学习率。支持通过 vit_gradient_checkpointing 参数单独控制 vit 模块的 gradient checkpointing,性能基准测试参考:https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/vit_gradient_checkpointing.sh
c. CPT/SFT支持使用 channel loss 对不同 channel 数据集分别统计损失值。感谢招商银行技术团队的贡献。
d. CPT/SFT/DPO支持 use_logits_to_keep参数,降低显存占用,提升训练速度。
e. Qwen2.5-VL/Omni 支持传入图像目录进行视频训练。
- 推理部署:
a. swift infer 批处理优化,新增 write_batch_size 参数,用于控制批处理推理结果写入 result_path 的间隔。
b. vllm 推理引擎默认使用 V1 engine,并支持TP和DP结合的推理模式,脚本参考:https://github.com/modelscope/ms-swift/blob/main/examples/infer/vllm/dp_tp.sh
- Megatron-SWIFT:
a. 非流式数据集支持通过 max_epochs 自动计算 train_iters。
b. 提供 extra_megatron_kwargs 参数,支持未写入ms-swift的megatron参数传入。
新模型
- Qwen/Qwen3-Embedding-0.6B系列,训练脚本参考:https://github.com/modelscope/ms-swift/blob/main/examples/train/embedding/train_emb.sh
- deepseek-ai/DeepSeek-R1-0528-Qwen3-8B系列,最佳实践参考:https://mp.weixin.qq.com/s/-hhfGiiGTqXUybwPH525gw
- iic/QwenLong-L1-32B
- XiaomiMiMo/MiMo-7B-RL-0530、XiaomiMiMo/MiMo-VL-7B-SFT系列
- OpenBMB/MiniCPM4-0.5B系列
English Version
New Features
- GRPO:
a. Code refactored; the vLLM mode (server or colocate) is now selected via the vllm_mode parameter. For details, refer to the documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO.html#arguments-and-execution-script:~:text=vllm_mode%20server%20parameter,in%20colocate%20mode.
b. GRPO long-text optimization with Ulysses sequence parallelism, significantly reducing GPU memory usage during long-text training. Training script: https://github.com/modelscope/ms-swift/blob/main/examples/train/long_text/sequence_parallel_grpo.sh
c. Added the sync_ref_model parameter to synchronize reference model weights during training.
d. Supports Liger Kernel loss via the use_liger_kernel parameter, reducing GPU memory consumption.
e. External mode supports move_model_batches to lower peak GPU memory during ZeRO-3 weight synchronization.
f. Integrated INTELLECT-2’s Two-Sided Clipping algorithm, enabled via the delta parameter.
g. Supports reward functions returning None, applicable for multi-task training. For details, refer to the documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO.html#multi-task-training
h. Internal mode supports vllm_server_base_url for passing the URL of an external vLLM server.
i. Plugin extension: Added QwenLong-L1 reward model plugin.
j. Added the steps_per_generation and generation_batch_size parameters for customizing the sampling batch size.
k. Web-UI supports GRPO training.
l. The following parameters will be removed in v3.6: tensor_parallel_size, vllm_device, vllm_max_num_seqs, num_infer_workers.
- Training:
a. CPT/SFT/DPO/GRPO support padding-free training. By flattening batch data to avoid padding, GPU memory usage is reduced and training speed is improved. Script: https://github.com/modelscope/ms-swift/tree/main/examples/train/padding_free
b. Multimodal training enhancements: supports separate learning rates for the ViT and Aligner modules via the vit_lr and aligner_lr parameters. Added vit_gradient_checkpointing to independently control gradient checkpointing for the ViT module. Benchmark: https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/vit_gradient_checkpointing.sh
c. CPT/SFT support channel loss, which reports loss values separately for datasets from different channels. Thanks to the technical team at China Merchants Bank for this contribution.
d. CPT/SFT/DPO support the use_logits_to_keep parameter to reduce GPU memory usage and accelerate training.
e. Qwen2.5-VL/Omni support video training by passing image directories.
- Inference & Deployment:
a. Optimized swift infer batching with a new write_batch_size parameter that controls how often batch inference results are written to result_path.
b. The vLLM inference engine now defaults to the V1 engine and supports combined Tensor Parallelism (TP) and Data Parallelism (DP). Script: https://github.com/modelscope/ms-swift/blob/main/examples/infer/vllm/dp_tp.sh
- Megatron-SWIFT:
a. For non-streaming datasets, train_iters is calculated automatically from max_epochs.
b. Added extra_megatron_kwargs to pass Megatron parameters that ms-swift does not expose directly.
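The padding-free item under Training describes flattening batch data so that no pad tokens are needed. The following is a minimal sketch of that general idea, not ms-swift's implementation: the names flatten_batch, padded_size, and cu_seqlens are illustrative assumptions, and the token ids are made up.

```python
from itertools import accumulate


def flatten_batch(sequences):
    """Pack variable-length token sequences into one row without pad tokens.

    Hypothetical helper for illustration only; not part of ms-swift.
    """
    # Concatenate all tokens back to back.
    input_ids = [tok for seq in sequences for tok in seq]
    # Position ids restart at 0 for every original sequence.
    position_ids = [pos for seq in sequences for pos in range(len(seq))]
    # Cumulative boundaries mark where each sequence starts and ends,
    # so attention can be restricted to tokens of the same sequence.
    cu_seqlens = [0] + list(accumulate(len(seq) for seq in sequences))
    return input_ids, position_ids, cu_seqlens


def padded_size(sequences):
    """Number of token slots a conventional padded batch would occupy."""
    return len(sequences) * max(len(seq) for seq in sequences)


batch = [[11, 12, 13, 14, 15], [21, 22], [31, 32, 33]]
input_ids, position_ids, cu_seqlens = flatten_batch(batch)
print(position_ids)  # [0, 1, 2, 3, 4, 0, 1, 0, 1, 2]
print(cu_seqlens)    # [0, 5, 7, 10]
print(padded_size(batch), len(input_ids))  # 15 10
```

In this toy batch the padded layout would hold 15 token slots while the flattened layout holds 10, which is where the memory and speed savings come from; the actual gain depends on how much sequence lengths vary within a batch.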
New Models
- Qwen/Qwen3-Embedding-0.6B series. Training script reference: https://github.com/modelscope/ms-swift/blob/main/examples/train/embedding/train_emb.sh
- deepseek-ai/DeepSeek-R1-0528-Qwen3-8B series. Best practices: https://mp.weixin.qq.com/s/-hhfGiiGTqXUybwPH525gw
- iic/QwenLong-L1-32B
- XiaomiMiMo/MiMo-7B-RL-0530 & XiaomiMiMo/MiMo-VL-7B-SFT series
- OpenBMB/MiniCPM4-0.5B series
What's Changed
- [grpo] code refactor by @hjh0119 in #4097
- support yarn by @tastelikefeet in #4197
- fix ppo init model by @hjh0119 in #4199
- fix ppo reward model by @hjh0119 in #4200
- [doc] remove vllm version warning in grpo by @hjh0119 in #4204
- [grpo] fix colocate + tp by @hjh0119 in #4209
- Refactor packing by @Jintao-Huang in #4207
- [grpo] set system in inputs by @hjh0119 in #4214
- fix mm packing by @Jintao-Huang in #4217
- fix packing multi_node by @Jintao-Huang in #4222
- fix get reward model by @hjh0119 in #4225
- fix val_dataset_shuffle by @Jintao-Huang in #4226
- fix task type judgement in rlhf by @hjh0119 in #4228
- fix eval extral args by @Yunnglin in #4227
- fix loss_scale by @Jintao-Huang in #4229
- update docs by @Jintao-Huang in #4235
- [rlhf] prepare_model for ref_model & reduce peak memory in dpo by @hjh0119 in #4232
- fix qwen2_5_vl VIDEO_TOTAL_PIXELS by @Jintao-Huang in #4236
- Support super long length sft by @tastelikefeet in #4237
- compat transformers 4.52 by @Jintao-Huang in #4238
- update liger_kernel docs by @Jintao-Huang in #4241
- [grpo] support synchronizing ref model by @hjh0119 in #4242
- optimize packing io by @Jintao-Huang in #4244
- fix register_post_encode_hook by @Jintao-Huang in #4247
- compat megatron-core 0.11 by @Jintao-Huang in #4250
- fix qwen2_5_omni by @Jintao-Huang in #4253
- fix readme by @Jintao-Huang in #4256
- [grpo] set v1 engine as default in external rollout by @hjh0119 in #4258
- fix ddp_timeout by @Jintao-Huang in #4259
- Add tqdm by @Jintao-Huang in #4260
- Fix is_master by @Jintao-Huang in #4262
- fix ppo zero3 by @Jintao-Huang in #4263
- test link valid by @Jintao-Huang in #4265
- update docs & fix quant by @Jintao-Huang in #4268
- [grpo] fix external mode&multi turn by @hjh0119 in #4255
- fix ulysses eval by @tastelikefeet in #4271
- support IndexedDataset shard by @Jintao-Huang in #4269
- Support vit_lr aligner_lr by @Jintao-Huang in #4273
- support padding_free CPT/SFT by @Jintao-Huang in #4274
- [grpo] fix num of reward_model > 1 by @hjh0119 in #4287
- fix n > 1 with vLLM V1 Engine by @hjh0119 in #4295
- update load_args by @Jintao-Huang in #4296
- update swift image by @Jintao-Huang in #4309
- Fix ulysses pending by @tastelikefeet in #4316
- GRPO Web-UI by @slin000111 in #4285
- Fix vLLM engine returning empty in stream generation by @wizyoung in #4303
- [grpo] support dp in external mode by @hjh0119 in #4279
- compat transformers==4.52 by @Jintao-Huang in #4308
- compat transformer_engine update by @Jintao-Huang in #4317
- grpo liger loss by @hjh0119 in #3781
- Update internvl.py, solve the exception when setting customized INPUT_SIZE. by @guanwei49 in #4320
- [megatron] Add extra args and provider support for easily customize megatron by @liuyanyi in #4240
- qwen2_5_vl support video use image_dir by @Jintao-Huang in #4326
- update link & update extra_megatron_kwargs by @Jintao-Huang in #4330
- refactor GC & support vit_gc by @Jintao-Huang in #4336
- [grpo] generation batch size & mini-batch update by @hjh0119 in #4322
- [doc] fix vl training doc by @hjh0119 in #4342
- Add template kwargs to Engine by @Jintao-Huang in #4343
- [grpo] fix batch size in dynamic sampling by @hjh0119 in #4344
- [infer] vllm remove Cached by @Jintao-Huang in #4354
- [infer/deploy] vllm use v1 engine & support write_batch_size & support vllm tp & dp by @Jintao-Huang in #4345
- [callback] fix logger by @Jintao-Huang in #4367
- [dataset] fix LazyLLMDataset random by @Jintao-Huang in #4369
- [dist] fix ddp_timeout by @Jintao-Huang in #4373
- [grpo] Refactor GRPOVllmEngine by @hjh0119 in #4375
- [megatron] fix save timeout & pp4 hang by @Jintao-Huang in #4381
- [grpo] QwenLong-L1 reward model plugin by @hjh0119 in #4385
- [megatron] fix split_dataset_ratio by @Jintao-Huang in #4391
- Standardize think templates by @hjh0119 in #4395
- update requirements by @hjh0119 in #4397
- [deploy] fix client timeout by @Jintao-Huang in #4399
- Support ulysses padding-free grpo by @tastelikefeet in #4377
- [dataset] Fix multinode packing by @Jintao-Huang in #4402
- fix ulysses by @tastelikefeet in #4404
- [pt/sft] support use_logits_to_keep & support DeepSeek-R1-0528 by @Jintao-Huang in #4409
- support DeepSeek-R1-0528-Qwen3-8B by @Jintao-Huang in #4417
- Fix cmdline parsing error on Windows system by @slin000111 in #4422
- fix transformers 4.52 device_map ddp by @Jintao-Huang in #4424
- [dataset] fix self-cognition & load_from_cache_file by @Jintao-Huang in #4426
- [dataset] add ms_logger_context by @Jintao-Huang in #4428
- [model] Support MiMo-VL by @Jintao-Huang in #4429
- fix model_meta by @tastelikefeet in #4431
- fix emb docs by @tastelikefeet in #4434
- [megatron] support megatron num_train_epochs by @Jintao-Huang in #4432
- fix qwen2_5_vl awq by @Jintao-Huang in #4436
- [template] fix vlm padding_free by @Jintao-Huang in #4444
- [grpo] support vllm_server_base_url for vLLMClient by @hjh0119 in #4449
- [grpo] Two-Sided Clipping for GRPO Trainer by @hjh0119 in #4450
- [grpo] fix hang in colocate lora settings by @hjh0119 in #4451
- [dpo] support dpo padding_free & dpo compat trl==0.18 by @Jintao-Huang in #4394
- [seq_parallel] fix sp compute_acc by @Jintao-Huang in #4456
- [train] Fix qwen2.5-vl use_cache by @Jintao-Huang in #4458
- [grpo] fix base url by @hjh0119 in #4463
- Fix create checkpoint symlink & grpo omni by @Jintao-Huang in #4468
- Fix omni grpo by @tastelikefeet in #4469
- [eval] fix eval dependence by @Yunnglin in #4472
- Support qwen3 embedding by @tastelikefeet in #4357
- [seq_parallel] fix sp compute_acc by @Jintao-Huang in #4474
- fix by @tastelikefeet in #4475
- [megatron] fix val_dataset by @Jintao-Huang in #4478
- [pt/sft] Feature channel loss by @kevssim in #4405
- [grpo] fix infer url by @hjh0119 in #4480
- [vlm] fix llm_lora vlm_full by @Jintao-Huang in #4482
- [infer] fix infer stream print by @Jintao-Huang in #4485
- Fix multi modal bugs in ulysses by @tastelikefeet in #4484
- [grpo] support move_model_batches for external mode by @hjh0119 in #4453
- [grpo] support None reward & multi-task doc & more profiling by @hjh0119 in #4459
- [grpo] update grpo check by @hjh0119 in #4493
- fix sft eval by @tastelikefeet in #4494
- [loss] fix vlm channel loss by @Jintao-Huang in #4497
- [Dataset]add stsb positive subset by @tastelikefeet in #4502
- [train] Fix vlm use_logits_to_keep by @Jintao-Huang in #4506
- Support minicpm4 by @Jintao-Huang in #4508
- [dataset] fix dpo emoji dataset by @Jintao-Huang in #4514
- [megatron] fix pp4 by @Jintao-Huang in #4516
- [qwen2.5-omni] Fix omni save checkpoint by @Jintao-Huang in #4517
- [qwen2.5-omni] Fix omni get_template by @Jintao-Huang in #4518
New Contributors
- @wizyoung made their first contribution in #4303
- @guanwei49 made their first contribution in #4320
- @liuyanyi made their first contribution in #4240
Full Changelog: v3.4.1...v3.5.0