Release v3.5.0 · modelscope/ms-swift

English Version

New Features

GRPO:
a. Code refactored; the vLLM integration mode (server or colocate) is now selected via the vllm_mode parameter. For parameter details, refer to the documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO.html#arguments-and-execution-script:~:text=vllm_mode%20server%20parameter,in%20colocate%20mode.
b. GRPO long-text optimization with Ulysses sequence parallelism, significantly reducing GPU memory usage during long-text training. Training script: https://github.com/modelscope/ms-swift/blob/main/examples/train/long_text/sequence_parallel_grpo.sh
c. Added sync_ref_model parameter to synchronize reference model weights during training.
d. Supports the Liger Kernel loss via the use_liger_kernel parameter, reducing GPU memory consumption.
e. External mode supports move_model_batches to lower peak GPU memory during ZeRO-3 weight synchronization.
f. Integrated INTELLECT-2’s Two-Sided Clipping algorithm using the delta parameter.
g. Supports reward functions returning None, applicable for multi-task training. For details, refer to the documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO.html#multi-task-training
h. Internal mode supports vllm_server_base_url for passing the URL of an external vLLM server.
i. Plugin extension: Added QwenLong-L1 reward model plugin.
j. Added steps_per_generation and generation_batch_size parameters for customizing sampling batch size.
k. Web-UI supports GRPO training.
l. The following parameters will be removed in v3.6: tensor_parallel_size, vllm_device, vllm_max_num_seqs, num_infer_workers.
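The Two-Sided Clipping from item f can be sketched per token as follows. This is a minimal illustration, assuming the formulation described for INTELLECT-2: the usual PPO-style clipped objective is kept, and the probability ratio is additionally capped at delta so that a negative advantage cannot produce an unbounded term. The function name and the default epsilon are hypothetical, and the ms-swift trainer works on tensors rather than scalars.

```python
def two_sided_clip_objective(ratio, advantage, epsilon=0.2, delta=None):
    """Per-token surrogate objective (to be maximized).

    ratio     -- pi_new(token) / pi_old(token)
    advantage -- group-relative advantage of the sample
    epsilon   -- usual clipping range (assumed default, not taken from ms-swift)
    delta     -- upper cap on the ratio; None disables the cap
    """
    # Extra upper bound: only has an effect when ratio > delta.
    capped = ratio if delta is None else min(ratio, delta)
    # Standard clip to [1 - epsilon, 1 + epsilon].
    clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Pessimistic choice between the two terms, as in PPO.
    return min(capped * advantage, clipped * advantage)


# A large ratio with a negative advantage is unbounded without the cap...
print(two_sided_clip_objective(10.0, -1.0))             # -10.0
# ...and bounded by delta with it.
print(two_sided_clip_objective(10.0, -1.0, delta=4.0))  # -4.0
```

For positive advantages the cap makes no difference in this sketch as long as delta is larger than 1 + epsilon, since the standard clip already bounds the term from above.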
Training:
a. CPT/SFT/DPO/GRPO support padding-free training. Flattening the batch data avoids padding, which significantly reduces GPU memory usage and speeds up training. Script: https://github.com/modelscope/ms-swift/tree/main/examples/train/padding_free
b. Multimodal training enhancements: Supports separate learning rates for ViT and Aligner modules via vit_lr and aligner_lr parameters. Added vit_gradient_checkpointing to independently control gradient checkpointing for ViT modules. Benchmark: https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/vit_gradient_checkpointing.sh
c. CPT/SFT support channel loss, which reports the loss separately for datasets from different channels. Thanks to the technical team at China Merchants Bank for this contribution.
d. CPT/SFT/DPO support use_logits_to_keep to reduce GPU memory usage and accelerate training.
e. Qwen2.5-VL/Omni support video training by passing image directories.
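The padding-free idea from item a can be illustrated with a small sketch. It assumes the common approach of concatenating all sequences of a batch into a single row and recording per-sequence position ids and cumulative lengths, so that attention can stay within sequence boundaries. The function name is hypothetical and this is not the ms-swift implementation.

```python
def flatten_batch(sequences):
    """Concatenate token-id lists into one padding-free row.

    Returns the flattened ids, position ids that restart at 0 for every
    sequence, and cumulative sequence lengths marking the boundaries.
    """
    input_ids, position_ids, cu_seqlens = [], [], [0]
    for seq in sequences:
        input_ids.extend(seq)
        position_ids.extend(range(len(seq)))
        cu_seqlens.append(cu_seqlens[-1] + len(seq))
    return input_ids, position_ids, cu_seqlens


batch = [[11, 12, 13], [21], [31, 32]]
input_ids, position_ids, cu_seqlens = flatten_batch(batch)
# Tokens a padded batch would hold: batch size times the longest sequence.
padded_tokens = len(batch) * max(len(seq) for seq in batch)

print(input_ids)                      # [11, 12, 13, 21, 31, 32]
print(position_ids)                   # [0, 1, 2, 0, 0, 1]
print(cu_seqlens)                     # [0, 3, 4, 6]
print(padded_tokens, len(input_ids))  # 9 6
```

In this toy batch the padded layout would hold 9 token slots while the flattened one holds 6; the gap grows with the spread of sequence lengths.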
Inference & Deployment:
a. Optimized batch processing in swift infer: the new write_batch_size parameter controls how often batched inference results are written to result_path.
b. vLLM inference engine now defaults to V1 engine and supports hybrid Tensor Parallelism (TP) and Data Parallelism (DP). Script: https://github.com/modelscope/ms-swift/blob/main/examples/infer/vllm/dp_tp.sh
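One way to picture write_batch_size: results are written to result_path in chunks rather than once at the end of the run. The sketch below is an assumption about that behavior, not ms-swift code; infer_and_write and run_batch are hypothetical names, and a file-like object stands in for result_path.

```python
import io
import json


def infer_and_write(requests, run_batch, out_file, write_batch_size):
    """Run inference chunk by chunk and write each chunk as JSON lines.

    Returns the number of write rounds performed.
    """
    writes = 0
    for start in range(0, len(requests), write_batch_size):
        chunk = requests[start:start + write_batch_size]
        for request, response in zip(chunk, run_batch(chunk)):
            record = {"query": request, "response": response}
            out_file.write(json.dumps(record) + "\n")
        out_file.flush()  # results so far are visible after every chunk
        writes += 1
    return writes


buffer = io.StringIO()
num_writes = infer_and_write(
    ["q1", "q2", "q3", "q4", "q5"],
    run_batch=lambda chunk: [q.upper() for q in chunk],  # stand-in for a model call
    out_file=buffer,
    write_batch_size=2,
)
print(num_writes)  # 3
```

A smaller interval makes partial results available sooner, at the cost of more frequent writes.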
Megatron-SWIFT:
a. For non-streaming datasets, train_iters is calculated automatically from max_epochs.
b. Added extra_megatron_kwargs for passing Megatron parameters that ms-swift does not yet expose.
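The relationship between max_epochs and train_iters in item a can be sketched with simple arithmetic. The sketch assumes that one iteration consumes one global batch and that the incomplete final batch of each epoch is dropped; the function name and this rounding choice are assumptions, not taken from the Megatron-SWIFT source.

```python
def compute_train_iters(num_samples, global_batch_size, max_epochs):
    """Estimate the number of training iterations for a fixed-size dataset."""
    # Assumption: the remainder of each epoch that does not fill a batch is dropped.
    iters_per_epoch = num_samples // global_batch_size
    return iters_per_epoch * max_epochs


print(compute_train_iters(num_samples=1000, global_batch_size=16, max_epochs=3))  # 186
```

A streaming dataset has no known length, which is presumably why the automatic calculation is limited to non-streaming datasets.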

New Models

Qwen/Qwen3-Embedding-0.6B series. Training script reference: https://github.com/modelscope/ms-swift/blob/main/examples/train/embedding/train_emb.sh
deepseek-ai/DeepSeek-R1-0528-Qwen3-8B series. Best practices: https://mp.weixin.qq.com/s/-hhfGiiGTqXUybwPH525gw
iic/QwenLong-L1-32B
XiaomiMiMo/MiMo-7B-RL-0530 & XiaomiMiMo/MiMo-VL-7B-SFT series
OpenBMB/MiniCPM4-0.5B series

What's Changed

[grpo] code refactor by @hjh0119 in #4097
support yarn by @tastelikefeet in #4197
fix ppo init model by @hjh0119 in #4199
fix ppo reward model by @hjh0119 in #4200
[doc] remove vllm version warning in grpo by @hjh0119 in #4204
[grpo] fix colocate + tp by @hjh0119 in #4209
Refactor packing by @Jintao-Huang in #4207
[grpo] set system in inputs by @hjh0119 in #4214
fix mm packing by @Jintao-Huang in #4217
fix packing multi_node by @Jintao-Huang in #4222
fix get reward model by @hjh0119 in #4225
fix val_dataset_shuffle by @Jintao-Huang in #4226
fix task type judgement in rlhf by @hjh0119 in #4228
fix eval extral args by @Yunnglin in #4227
fix loss_scale by @Jintao-Huang in #4229
update docs by @Jintao-Huang in #4235
[rlhf] prepare_model for ref_model & reduce peak memory in dpo by @hjh0119 in #4232
fix qwen2_5_vl VIDEO_TOTAL_PIXELS by @Jintao-Huang in #4236
Support super long length sft by @tastelikefeet in #4237
compat transformers 4.52 by @Jintao-Huang in #4238
update liger_kernel docs by @Jintao-Huang in #4241
[grpo] support synchronizing ref model by @hjh0119 in #4242
optimize packing io by @Jintao-Huang in #4244
fix register_post_encode_hook by @Jintao-Huang in #4247
compat megatron-core 0.11 by @Jintao-Huang in #4250
fix qwen2_5_omni by @Jintao-Huang in #4253
fix readme by @Jintao-Huang in #4256
[grpo] set v1 engine as default in external rollout by @hjh0119 in #4258
fix ddp_timeout by @Jintao-Huang in #4259
Add tqdm by @Jintao-Huang in #4260
Fix is_master by @Jintao-Huang in #4262
fix ppo zero3 by @Jintao-Huang in #4263
test link valid by @Jintao-Huang in #4265
update docs & fix quant by @Jintao-Huang in #4268
[grpo] fix external mode&multi turn by @hjh0119 in #4255
fix ulysses eval by @tastelikefeet in #4271
support IndexedDataset shard by @Jintao-Huang in #4269
Support vit_lr aligner_lr by @Jintao-Huang in #4273
support padding_free CPT/SFT by @Jintao-Huang in #4274
[grpo] fix num of reward_model > 1 by @hjh0119 in #4287
fix n > 1 with vLLM V1 Engine by @hjh0119 in #4295
update load_args by @Jintao-Huang in #4296
update swift image by @Jintao-Huang in #4309
Fix ulysses pending by @tastelikefeet in #4316
GRPO Web-UI by @slin000111 in #4285
Fix vLLM engine returning empty in stream generation by @wizyoung in #4303
[grpo] support dp in external mode by @hjh0119 in #4279
compat transformers==4.52 by @Jintao-Huang in #4308
compat transformer_engine update by @Jintao-Huang in #4317
grpo liger loss by @hjh0119 in #3781
Update internvl.py, solve the exception when setting customized INPUT_SIZE. by @guanwei49 in #4320
[megatron] Add extra args and provider support for easily customize megatron by @liuyanyi in #4240
qwen2_5_vl support video use image_dir by @Jintao-Huang in #4326
update link & update extra_megatron_kwargs by @Jintao-Huang in #4330
refactor GC & support vit_gc by @Jintao-Huang in #4336
[grpo] generation batch size & mini-batch update by @hjh0119 in #4322
[doc] fix vl training doc by @hjh0119 in #4342
Add template kwargs to Engine by @Jintao-Huang in #4343
[grpo] fix batch size in dynamic sampling by @hjh0119 in #4344
[infer] vllm remove Cached by @Jintao-Huang in #4354
[infer/deploy] vllm use v1 engine & support write_batch_size & support vllm tp & dp by @Jintao-Huang in #4345
[callback] fix logger by @Jintao-Huang in #4367
[dataset] fix LazyLLMDataset random by @Jintao-Huang in #4369
[dist] fix ddp_timeout by @Jintao-Huang in #4373
[grpo] Refactor GRPOVllmEngine by @hjh0119 in #4375
[megatron] fix save timeout & pp4 hang by @Jintao-Huang in #4381
[grpo] QwenLong-L1 reward model plugin by @hjh0119 in #4385
[megatron] fix split_dataset_ratio by @Jintao-Huang in #4391
Standardize think templates by @hjh0119 in #4395
update requirements by @hjh0119 in #4397
[deploy] fix client timeout by @Jintao-Huang in #4399
Support ulysses padding-free grpo by @tastelikefeet in #4377
[dataset] Fix multinode packing by @Jintao-Huang in #4402
fix ulysses by @tastelikefeet in #4404
[pt/sft] support use_logits_to_keep & support DeepSeek-R1-0528 by @Jintao-Huang in #4409
support DeepSeek-R1-0528-Qwen3-8B by @Jintao-Huang in #4417
Fix cmdline parsing error on Windows system by @slin000111 in #4422
fix transformers 4.52 device_map ddp by @Jintao-Huang in #4424
[dataset] fix self-cognition & load_from_cache_file by @Jintao-Huang in #4426
[dataset] add ms_logger_context by @Jintao-Huang in #4428
[model] Support MiMo-VL by @Jintao-Huang in #4429
fix model_meta by @tastelikefeet in #4431
fix emb docs by @tastelikefeet in #4434
[megatron] support megatron num_train_epochs by @Jintao-Huang in #4432
fix qwen2_5_vl awq by @Jintao-Huang in #4436
[template] fix vlm padding_free by @Jintao-Huang in #4444
[grpo] support vllm_server_base_url for vLLMClient by @hjh0119 in #4449
[grpo] Two-Sided Clipping for GRPO Trainer by @hjh0119 in #4450
[grpo] fix hang in colocate lora settings by @hjh0119 in #4451
[dpo] support dpo padding_free & dpo compat trl==0.18 by @Jintao-Huang in #4394
[seq_parallel] fix sp compute_acc by @Jintao-Huang in #4456
[train] Fix qwen2.5-vl use_cache by @Jintao-Huang in #4458
[grpo] fix base url by @hjh0119 in #4463
Fix create checkpoint symlink & grpo omni by @Jintao-Huang in #4468
Fix omni grpo by @tastelikefeet in #4469
[eval] fix eval dependence by @Yunnglin in #4472
Support qwen3 embedding by @tastelikefeet in #4357
[seq_parallel] fix sp compute_acc by @Jintao-Huang in #4474
fix by @tastelikefeet in #4475
[megatron] fix val_dataset by @Jintao-Huang in #4478
[pt/sft] Feature channel loss by @kevssim in #4405
[grpo] fix infer url by @hjh0119 in #4480
[vlm] fix llm_lora vlm_full by @Jintao-Huang in #4482
[infer] fix infer stream print by @Jintao-Huang in #4485
Fix multi modal bugs in ulysses by @tastelikefeet in #4484
[grpo] support move_model_batches for external mode by @hjh0119 in #4453
[grpo] support None reward & multi-task doc & more profiling by @hjh0119 in #4459
[grpo] update grpo check by @hjh0119 in #4493
fix sft eval by @tastelikefeet in #4494
[loss] fix vlm channel loss by @Jintao-Huang in #4497
[Dataset]add stsb positive subset by @tastelikefeet in #4502
[train] Fix vlm use_logits_to_keep by @Jintao-Huang in #4506
Support minicpm4 by @Jintao-Huang in #4508
[dataset] fix dpo emoji dataset by @Jintao-Huang in #4514
[megatron] fix pp4 by @Jintao-Huang in #4516
[qwen2.5-omni] Fix omni save checkpoint by @Jintao-Huang in #4517
[qwen2.5-omni] Fix omni get_template by @Jintao-Huang in #4518

New Contributors

@wizyoung made their first contribution in #4303
@guanwei49 made their first contribution in #4320
@liuyanyi made their first contribution in #4240

Full Changelog: v3.4.1...v3.5.0
