v3.2.1

@Jintao-Huang Jintao-Huang released this 14 Mar 07:07
· 994 commits to main since this release

New Features

  1. GRPO supports the tensor parallel mode of vLLM. Examples can be found here.
  2. GRPO supports co-locate mode and offloading of both the optimizer and the model, as well as loading weights and merging LoRA in batches. This saves GPU memory and makes it possible to train a 72B model on four A100 GPUs. Examples can be found here.
  3. GRPO supports code ORM. Best practices can be found here.
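
To illustrate the idea behind a code ORM (read here as an outcome reward model that scores generated code by whether it runs correctly), below is a minimal sketch in Python. It is not the implementation shipped in this release: the function name `code_reward`, its parameters, and the pass/fail scoring are assumptions made for illustration, and the sketch runs code in a plain subprocess with no sandboxing, so it should only be used with trusted inputs.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def code_reward(completion: str, test_code: str, timeout: float = 5.0) -> float:
    """Hypothetical outcome reward: 1.0 if the completion passes the tests, else 0.0.

    The generated code and the test code are written to one script and run in a
    separate Python process. A non-zero exit code or a timeout counts as failure.
    """
    program = completion + "\n\n" + test_code + "\n"
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(program, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # Code that does not finish in time earns no reward.
            return 0.0
    return 1.0 if result.returncode == 0 else 0.0


if __name__ == "__main__":
    tests = "assert add(2, 3) == 5"
    print(code_reward("def add(a, b):\n    return a + b", tests))
    print(code_reward("def add(a, b):\n    return a - b", tests))
```

A binary reward keeps the sketch simple. A practical reward function would more likely run in an isolated sandbox and might give partial credit per passed test case.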

New Models

  1. Qwen/QwQ-32B series
  2. inclusionAI/Ling-lite series

Full Changelog: v3.2.0...v3.2.1