v3.2.1
New Features
- GRPO supports vLLM's tensor parallel mode. Examples can be found here.
- GRPO supports co-locate mode with offloading of both the optimizer and the model, as well as loading weights in batches and merging LoRA, which saves GPU memory and enables training a 72B model on four A100 GPUs. Examples can be found here.
- GRPO supports code ORM (an outcome reward for generated code; see the sketch after this list). Best practices can be found here.
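
A code ORM scores a completion by running the generated code against test cases and returning a binary outcome reward. The following is a minimal, self-contained sketch of that idea only; the function name `code_reward`, its signature, and the plain `subprocess` execution (no sandboxing) are illustrative assumptions, not ms-swift's actual reward-plugin API.

```python
# Sketch of a code outcome reward (ORM): execute each completion together with
# its test cases in a subprocess and return 1.0 on success, 0.0 otherwise.
# Names and signature are hypothetical, not ms-swift's real interface.
import subprocess
import sys
from typing import List


def code_reward(completions: List[str], test_cases: List[str], timeout: float = 10.0) -> List[float]:
    """Return 1.0 for completions whose code passes its test case, else 0.0."""
    rewards = []
    for code, tests in zip(completions, test_cases):
        program = code + "\n" + tests  # append the asserts to the generated code
        try:
            result = subprocess.run(
                [sys.executable, "-c", program],
                capture_output=True,
                timeout=timeout,
            )
            rewards.append(1.0 if result.returncode == 0 else 0.0)
        except subprocess.TimeoutExpired:
            rewards.append(0.0)  # hanging code counts as a failure
    return rewards


if __name__ == "__main__":
    completion = "def add(a, b):\n    return a + b"
    tests = "assert add(1, 2) == 3"
    print(code_reward([completion], [tests]))  # -> [1.0]
```

In practice such a reward would be registered with the GRPO trainer and run inside a sandbox rather than the raw interpreter shown here.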
New Models
- Qwen/QwQ-32B series
- inclusionAI/Ling-lite series
What's Changed
- Support vllm LLMEngine by @Jintao-Huang in #3370
- update publish workflows by @Jintao-Huang in #3374
- support ling by @Jintao-Huang in #3379
- Support mp mode and hybrid mode of GRPO by @tastelikefeet in #3381
- fix name by @tastelikefeet in #3382
- fix web-ui infer by @Jintao-Huang in #3384
- fix bugs by @tastelikefeet in #3385
- fix bugs by @Jintao-Huang in #3386
- support Qwen/QwQ-32B by @Jintao-Huang in #3388
- support qwq-awq by @Jintao-Huang in #3391
- support lmdeploy qwen2_5_vl by @Jintao-Huang in #3394
- update infer_save by @Jintao-Huang in #3400
- update requirements by @Jintao-Huang in #3403
- fix ollama export by @Jintao-Huang in #3406
- Fix grpo engine by @tastelikefeet in #3412
- fix infer_stream by @Jintao-Huang in #3413
- Fix some comments, add dlc script by @tastelikefeet in #3419
- add comments and docs by @tastelikefeet in #3424
- fix issue 1663 by @Jintao-Huang in #3417
- Support GRPO model and optimizer offload, and split loading model by @tastelikefeet in #3427
- update wechat by @tastelikefeet in #3430
- Fix vllm random by @tastelikefeet in #3437
- fix seed by @Jintao-Huang in #3438
- fix_base_deploy by @Jintao-Huang in #3442
- fix GRPO device mismatch by @hjh0119 in #3440
- compat vllm==0.5.1 by @Jintao-Huang in #3444
- fix grpo multimodal doc by @mi804 in #3449
- support grpo code orm by @hjh0119 in #3431
- fix GRPO seed by @Jintao-Huang in #3458
- fix grpo multi nodes by @hjh0119 in #3462
- Fix tensor parallel hang by @tastelikefeet in #3464
- fix grpo trainer zero3 always gather parameters by @tcye in #3467
- fix grpo temperature inconsistency by @hjh0119 in #3468
- fix grad_norm nan by @Jintao-Huang in #3465
- fix grad_norm by @Jintao-Huang in #3469
- update minimax by @Jintao-Huang in #3471
- Support 72b script with 4 gpus by @tastelikefeet in #3472
- refactor packing by @Jintao-Huang in #3457
- Fix some docs by @tastelikefeet in #3475
- fix grpo ddp hang by @hjh0119 in #3476
- fix moe quant by @Jintao-Huang in #3478
- Delete duplicate parameters in train_72b_4gpu.sh by @Marquis03 in #3479
- fix image by @tastelikefeet in #3480
- fix infer gptq internvl2 by @Jintao-Huang in #3481
- Resume sample by @BC-A in #3460
- fix qwen2_vl flash_attn deepspeed by @Jintao-Huang in #3484
- Fix seed of tp=1 by @tastelikefeet in #3486
- fix use_cache by @Jintao-Huang in #3487
- Fix qwen2 5 vl grounding by @Jintao-Huang in #3491
- fix ovis2 device_map by @Jintao-Huang in #3496
- fix template.decode by @Jintao-Huang in #3497
New Contributors
- @tcye made their first contribution in #3467
- @Marquis03 made their first contribution in #3479
- @BC-A made their first contribution in #3460
Full Changelog: v3.2.0...v3.2.1