
Commit 348b11f

[grpo] update vllm weight sync & wake up (#4770)
* less memory during load and rollout
* deprecate gc_collect_after_offload
* offload context
* fix without optimizer
* rm gc_collect_after_offload in docs
* rm gc_collect_after_offload in scripts
1 parent 5712d6a commit 348b11f
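The "offload context" mentioned in the commit message can be pictured as a context manager that moves model and optimizer state off the accelerator before rollout and restores it afterwards, with a single garbage-collection pass folded into the exit path; that is presumably why the user-facing `gc_collect_after_offload` flag could be deprecated. The sketch below is a hypothetical illustration of that pattern only: `offload_context`, the `state` dict, and the device strings are invented names, not ms-swift's actual API.

```python
import gc
from contextlib import contextmanager

# Hypothetical sketch of the "offload context" pattern this commit alludes to:
# parameters are offloaded on entry and restored on exit, and garbage
# collection happens unconditionally at the end instead of being gated by a
# gc_collect_after_offload flag. Illustrative only -- not ms-swift's API.

@contextmanager
def offload_context(state, offload_model=True, offload_optimizer=True):
    moved = []
    if offload_model:
        state["model_device"] = "cpu"
        moved.append("model")
    # Tolerate training runs that have no optimizer state to offload
    # (compare the "fix without optimizer" item in the commit message).
    if offload_optimizer and state.get("optimizer_device") is not None:
        state["optimizer_device"] = "cpu"
        moved.append("optimizer")
    try:
        yield moved          # rollout (e.g. vLLM generation) runs here
    finally:
        for name in moved:   # load state back for the next training step
            state[f"{name}_device"] = "cuda"
        gc.collect()         # single collection pass, no flag needed
```

With this shape, callers wrap the vLLM rollout in `with offload_context(...)` and never manage GC themselves.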

File tree

16 files changed (+77, -55 lines)


docs/source/BestPractices/Qwen3最佳实践.md

Lines changed: 0 additions & 1 deletion
```diff
@@ -312,7 +312,6 @@ swift rlhf \
     --sleep_level 1 \
     --offload_model true \
     --offload_optimizer true \
-    --gc_collect_after_offload true \
     --deepspeed zero3 \
     --num_infer_workers 8 \
     --tensor_parallel_size 1 \
```

docs/source/Instruction/GRPO/DeveloperGuide/奖励模型.md

Lines changed: 0 additions & 1 deletion
````diff
@@ -80,7 +80,6 @@ swift rlhf \
     --sleep_level 1 \
     --offload_model true \
     --offload_optimizer true \
-    --gc_collect_after_offload true \
     --log_completions true \
     --deepspeed zero2
 ```
````

docs/source/Instruction/GRPO/GetStarted/GRPO.md

Lines changed: 0 additions & 1 deletion
````diff
@@ -154,7 +154,6 @@ The GRPO training framework supports integrating a high-performance inference engine (such as vLLM) to accelerate sampling
 ```bash
 --offload_optimizer true \
 --offload_model true \
---gc_collect_after_offload true \
 ```

 4. Use Tensor Parallelism in vLLM:
````

docs/source/Instruction/命令行参数.md

Lines changed: 0 additions & 1 deletion
```diff
@@ -476,7 +476,6 @@ Reward model parameters are used in PPO and GRPO.
 - sleep_level: release vLLM GPU memory while the model is training. Options are [0, 1]; default is 0 (no release).
 - offload_optimizer: whether to offload optimizer parameters during vLLM inference. Default is False.
 - offload_model: whether to offload the model during vLLM inference. Default is False.
-- gc_collect_after_offload: whether to run garbage collection (both Python GC and GPU GC) after offloading. Default is False.
 - completion_length_limit_scope: the scope of the `max_completion_length` limit in multi-turn conversations.
 `total` limits the total output length across all turns to `max_completion_length`; `per_round` limits each turn's output.
 - num_iterations: number of update iterations per batch. Default is 1.
```

docs/source_en/BestPractices/Qwen3-Best-Practice.md

Lines changed: 0 additions & 1 deletion
```diff
@@ -316,7 +316,6 @@ swift rlhf \
     --sleep_level 1 \
     --offload_model true \
     --offload_optimizer true \
-    --gc_collect_after_offload true \
     --deepspeed zero3 \
     --num_infer_workers 8 \
     --tensor_parallel_size 1 \
```

docs/source_en/Instruction/Command-line-parameters.md

Lines changed: 0 additions & 1 deletion
```diff
@@ -488,7 +488,6 @@ The meanings of the following parameters can be referenced [here](https://huggin
 - sleep_level: make vllm sleep when model is training. Options are 0 or 1, default is 0, no sleep
 - offload_optimizer: Whether to offload optimizer parameters during inference with vLLM. The default is `False`.
 - offload_model: Whether to offload the model during inference with vLLM. The default is `False`.
-- gc_collect_after_offload: Whether to perform garbage collection (both Python GC and GPU GC) after offloading. The default is `False`.
 - completion_length_limit_scope: Specifies the scope of the `max_completion_length` limit in multi-turn conversations.
 When set to `total`, the total output length across all turns must not exceed `max_completion_length`.
 When set to `per_round`, each individual turn's output length is limited separately.
```
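The `sleep_level` flag in the parameter list above maps onto vLLM's sleep mode. Per vLLM's documentation, sleep level 1 offloads model weights to CPU RAM and discards the KV cache, while level 2 discards the weights as well; `wake_up()` restores everything. The toy class below only simulates that state bookkeeping to make the semantics concrete; it is not the vLLM API (`ToySleepableEngine` and its string device markers are invented for illustration).

```python
# Toy bookkeeping model of vLLM sleep levels. In real vLLM (constructed with
# enable_sleep_mode=True), llm.sleep(level=1) offloads weights to CPU RAM and
# frees the KV cache; level=2 discards the weights too; llm.wake_up() restores
# both. This class only mimics those state transitions -- not the vLLM API.

class ToySleepableEngine:
    def __init__(self):
        self.weights = "gpu"
        self.kv_cache = "gpu"

    def sleep(self, level):
        assert level in (1, 2)
        self.kv_cache = None                          # KV cache always freed
        self.weights = "cpu" if level == 1 else None  # level 2 drops weights

    def wake_up(self):
        self.weights = "gpu"     # move back (level 1) or reload (level 2)
        self.kv_cache = "gpu"    # reallocate the KV cache
```

Level 1 trades CPU RAM for a faster wake-up; level 2 frees the most memory but must reload weights, which is why `sleep_level 1` is what the colocate-mode examples in this commit use.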

docs/source_en/Instruction/GRPO/DeveloperGuide/reward_model.md

Lines changed: 0 additions & 1 deletion
````diff
@@ -79,7 +79,6 @@ swift rlhf \
     --sleep_level 1 \
     --offload_model true \
     --offload_optimizer true \
-    --gc_collect_after_offload true \
     --log_completions true \
     --deepspeed zero2
 ```
````

docs/source_en/Instruction/GRPO/GetStarted/GRPO.md

Lines changed: 0 additions & 1 deletion
````diff
@@ -153,7 +153,6 @@ When running in Colocate mode, out-of-memory (OOM) issues may frequently occur.
 ```bash
 --offload_optimizer true \
 --offload_model true \
---gc_collect_after_offload true \
 ```

 4. Use Tensor Parallelism in vLLM:
````

examples/train/grpo/internal/vllm_72b_4gpu.sh

Lines changed: 1 addition & 2 deletions
```diff
@@ -8,7 +8,7 @@ swift rlhf \
     --train_type lora \
     --use_vllm true \
     --vllm_mode colocate \
-    --vllm_gpu_memory_utilization 0.5 \
+    --vllm_gpu_memory_utilization 0.55 \
     --vllm_max_model_len 2048 \
     --vllm_tensor_parallel_size 4 \
     --dataset AI-MO/NuminaMath-TIR#10000 \
@@ -39,5 +39,4 @@ swift rlhf \
     --move_model_batches 16 \
     --offload_optimizer true \
     --offload_model true \
-    --gc_collect_after_offload true \
     --sleep_level 1
```

examples/train/grpo/internal/vllm_lora_qwenvl72b.sh

Lines changed: 0 additions & 1 deletion
```diff
@@ -42,7 +42,6 @@ swift rlhf \
     --async_generate false \
     --offload_optimizer true \
     --offload_model true \
-    --gc_collect_after_offload true \
     --move_model_batches 40 \
     --sleep_level 1 \
     --report_to wandb \
```
