v3.2.0
New Features
- GRPO supports data-parallel sampling across multiple vLLM/lmdeploy instances, as well as asynchronous sampling. For more information, refer to here. Records of multimodal GRPO experiments can be found here.
- When swift deploy uses infer_backend pt, it supports dynamic batching; the streaming inference interface has also changed (breaking change).
- When swift infer uses infer_backend vllm or lmdeploy, it supports data parallelism. Refer to here.
- Supports the Muon optimizer. For more information, refer to here.
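As a rough illustration of what data-parallel sampling across several inference engines involves (the GRPO multi-engine sampling and the swift infer data parallelism listed above), the sketch below assigns prompts round-robin to engine replicas, runs them concurrently, and restores the original prompt order. This is a minimal sketch under stated assumptions, not ms-swift's implementation. The engines are hypothetical Python callables (`make_fake_engine`) standing in for vLLM/lmdeploy instances on separate GPUs, and threads stand in for separate processes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Hypothetical engine interface: takes a batch of prompts, returns one output per prompt.
Engine = Callable[[List[str]], List[str]]


def shard(items: Sequence[str], num_shards: int) -> List[List[int]]:
    """Round-robin assignment of item indices to shards."""
    return [list(range(i, len(items), num_shards)) for i in range(num_shards)]


def data_parallel_generate(prompts: Sequence[str], engines: Sequence[Engine]) -> List[str]:
    index_shards = shard(prompts, len(engines))
    results: List[str] = [""] * len(prompts)
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # Each engine receives its own sub-batch; all sub-batches run concurrently.
        futures = [
            pool.submit(engine, [prompts[i] for i in idx])
            for engine, idx in zip(engines, index_shards)
        ]
        # Scatter each engine's outputs back to the original prompt positions.
        for idx, fut in zip(index_shards, futures):
            for i, out in zip(idx, fut.result()):
                results[i] = out
    return results


def make_fake_engine(name: str) -> Engine:
    """Hypothetical stand-in for one vLLM/lmdeploy replica."""
    def run(batch: List[str]) -> List[str]:
        return [f"{name}:{p.upper()}" for p in batch]
    return run


engines = [make_fake_engine("gpu0"), make_fake_engine("gpu1")]
print(data_parallel_generate(["a", "b", "c"], engines))  # ['gpu0:A', 'gpu1:B', 'gpu0:C']
```

Round-robin keeps the replicas' loads within one prompt of each other. A real scheduler may instead balance by sequence length or by current queue depth.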
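For readers new to Muon, the sketch below follows the publicly described reference formulation. A momentum buffer produces a Nesterov-style update, and that update is approximately orthogonalized with a quintic Newton-Schulz iteration before it is applied to a 2-D weight matrix. It is written in NumPy for readability and is an illustration under assumptions, not the code ms-swift ships. The function names `newton_schulz5` and `muon_update` are illustrative. The coefficients, the defaults `lr=0.02` and `beta=0.95`, and the shape-based scaling come from that public reference. ms-swift's implementation may use different scaling and hyperparameters.

```python
import numpy as np


def newton_schulz5(g: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
    """Approximately orthogonalize a 2-D matrix (singular values pushed roughly toward 1)."""
    a, b, c = 3.4445, -4.7750, 2.0315  # quintic coefficients from the public Muon reference
    x = g / (np.linalg.norm(g) + eps)  # Frobenius normalization keeps every singular value <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:  # iterate on the wide orientation so x @ x.T is the smaller Gram matrix
        x = x.T
    for _ in range(steps):
        gram = x @ x.T
        # Maps each singular value s to a*s + b*s**3 + c*s**5.
        x = a * x + (b * gram + c * gram @ gram) @ x
    return x.T if transposed else x


def muon_update(param, grad, momentum_buf, lr=0.02, beta=0.95):
    """One Muon step for a single 2-D weight; returns (new_param, new_momentum_buf)."""
    momentum_buf = beta * momentum_buf + grad
    nesterov_grad = grad + beta * momentum_buf  # Nesterov-style lookahead
    update = newton_schulz5(nesterov_grad)
    scale = max(1.0, param.shape[0] / param.shape[1]) ** 0.5  # larger step for tall matrices
    return param - lr * scale * update, momentum_buf


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16))
g = rng.standard_normal((8, 16))
w, buf = muon_update(w, g, np.zeros_like(w))
```

In the reference formulation, Muon is applied only to 2-D hidden-layer weights. Embeddings, output heads, and 1-D parameters such as biases and norm weights are typically left to AdamW.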
New Models
- moonshotai/Moonlight-16B-A3B-Instruct
- LLM-Research/Phi-4-mini-instruct, LLM-Research/Phi-4-multimodal-instruct
- DeepSeek-V3-awq, deepseek-r1-awq
- Baichuan-M1-14B-Instruct
New Datasets
- Multimodal GRPO:
  - lmms-lab/multimodal-open-r1-8k-verified
  - okwinds/clevr_cogen_a_train
What's Changed
- fix setup.py by @Jintao-Huang in #3198
- support vllm dp by @Jintao-Huang in #3201
- update dataset & fix bugs by @Jintao-Huang in #3203
- Support multiple vllms by @tastelikefeet in #3202
- update distill docs by @tastelikefeet in #3216
- compatible with trl0.16 by @hjh0119 in #3209
- support r1 awq by @Jintao-Huang in #3206
- fix grpo old_per_token_logps by @hjh0119 in #3220
- Support the generation of JanusPro models by @DaozeZhang in #3218
- Update the JanusPro-generation by @DaozeZhang in #3221
- fix load args by @Jintao-Huang in #3226
- update docs by @Jintao-Huang in #3230
- Speed up GRPO by @tastelikefeet in #3229
- fix docs zh by @Jintao-Huang in #3231
- fix deepseek_vl2 by @Jintao-Huang in #3233
- support moonlight by @Jintao-Huang in #3232
- support muon optimizer by @Jintao-Huang in #3234
- update docs by @Jintao-Huang in #3243
- fix grpo npu vllm by @hjh0119 in #3242
- fix grpo single card by @tastelikefeet in #3246
- save val_dataset by @Jintao-Huang in #3248
- fix grpo compat transformers==4.47.* by @Jintao-Huang in #3252
- grpo_countdown & fix format reward by @mi804 in #3269
- Support the base64 format of generated images for JanusPro by @DaozeZhang in #3265
- Fix typos by @co63oc in #3266
- compat lmdeploy 0.7 by @Jintao-Huang in #3256
- fix lmdeploy by @Jintao-Huang in #3274
- GRPO+LMDeploy 0.7 by @tastelikefeet in #3277
- Support max memory by @Jintao-Huang in #3282
- add lmdeploy dp shell by @Jintao-Huang in #3284
- Support Baichuan-M1-14B-Instruct by @DaozeZhang in #3271
- fix grpo top_k by @Jintao-Huang in #3293
- fix lmdeploy mllm in grpo by @tastelikefeet in #3296
- Update FAQ by @slin000111 in #3289
- fix: error when uploading model to huggingface by @xavier-h-10 in #3297
- add multimodal clevr exp by @mi804 in #3301
- update docs by @Jintao-Huang in #3304
- [refactor] patch_vllm by @Jintao-Huang in #3306
- GRPO mllm script by @hjh0119 in #3305
- [refactor & feat] support pt dynamic batch by @Jintao-Huang in #3278
- Support ZeRO++ by @tastelikefeet in #3315
- Revert pt engine batch infer by @Jintao-Huang in #3316
- optimize model_type by @Jintao-Huang in #3318
- Fix bugs & Update docs/datasets by @Jintao-Huang in #3322
- fix grpo zero3 by @hjh0119 in #3324
- fix grpo zero3 by @hjh0119 in #3326
- compat vllm>=0.5.1 lmdeploy>=0.5.0 by @Jintao-Huang in #3332
- update external plugins by @Jintao-Huang in #3334
- fix generation_config by @Jintao-Huang in #3335
- fix check_model error by @Jintao-Huang in #3336
- update get_model_tokenizer_with_flash_attn by @Jintao-Huang in #3337
- add geoqa grpo experiment by @mi804 in #3344
- fix max_memory by @Jintao-Huang in #3347
- support phi4-multimodal by @Jintao-Huang in #3350
- fix:fix bugs in cosine reward of GRPO by @youyc22 in #3358
- Remove entry including invalid ROADMAP link from English & Chinese documentation by @3manifold in #3357
- update docs by @Jintao-Huang in #3349
- Support the
- update docs by @Jintao-Huang in #3365
- add grpo openr1 multimodal experiment by @mi804 in #3368
- fix swift app format by @Jintao-Huang in #3367
New Contributors
- @xavier-h-10 made their first contribution in #3297
- @youyc22 made their first contribution in #3358
- @3manifold made their first contribution in #3357
Full Changelog: v3.1.1...v3.2.0