Commit 6e982d7

Fix some docs (#3475)
1 parent 9b9fd89 commit 6e982d7

3 files changed: +3 -0 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -78,6 +78,7 @@ You can contact us and communicate with us by adding our group:
 
 
 ## 🎉 News
+- 🎁 2025.03.13: We provide a script of GRPO to train a 72B model with only 4 GPUs(4*80G), please check [here](examples/train/grpo/train_72b_4gpu.sh)
 - 🎁 2025.03.05: We support the hybrid mode of GRPO(rollout and actor on the same GPU, rollout sleep when actor training), meanwhile tensor parallel for GRPO, check[training script here](examples/train/grpo/multi_gpu_mp_colocate.sh)
 - 🎁 2025.02.21: We test the speed performance of GRPO,and with some tricks to [speed up to 300%](examples/train/grpo/full_lmdeploy.sh). WanDB charts can be found [here](https://wandb.ai/tastelikefeet/grpo_perf_test?nw=nwuseryuzezyz)
 - 🎁 2025.02.21: Support distill from LLM API,Please check[this example](examples/sampler/distill/distill.sh)

README_CN.md

Lines changed: 1 addition & 0 deletions
@@ -74,6 +74,7 @@
 - **Model quantization**: Supports quantized export with AWQ, GPTQ, and BNB; exported models can be used with vLLM/LmDeploy for accelerated inference and can also continue to be trained.
 
 ## 🎉 News
+- 🎁 2025.03.13: We provide a script that trains a 72B model using only 4 GPUs (4*80G); see [here](examples/train/grpo/train_72b_4gpu.sh)
 - 🎁 2025.03.05: Support for the hybrid mode of GRPO (rollout and actor on the same GPU; rollout can be offloaded), along with support for vllm tensor parallel; see the [training script](examples/train/grpo/multi_gpu_mp_colocate.sh)
 - 🎁 2025.02.21: We tested the performance of the GRPO algorithm and used some tricks to [raise training speed to 300%](examples/train/grpo/full_lmdeploy.sh). For the WanDB charts, see [here](https://wandb.ai/tastelikefeet/grpo_perf_test?nw=nwuseryuzezyz)
 - 🎁 2025.02.21: Support for distillation sampling from LLM APIs; see the [example](examples/sampler/distill/distill.sh)

examples/train/grpo/train_72b_4gpu.sh

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
+# 4*80G GPU
 CUDA_VISIBLE_DEVICES=0,1,2,3 \
 NPROC_PER_NODE=4 \
 swift rlhf \
