Commit fd893c7

[vllm] feat: retires vllm spmd mode in the codebase (verl-project#4411)
### What does this PR do?

Retires the legacy SPMD rollout path and standardizes the codebase on async-only rollout for vLLM (SGLang in the next PR). All Python modules, docs, workflows, and examples now reference the async server mode exclusively; the sync/SPMD runners, helpers, and CI jobs have been removed.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: _N/A (internal task to delete SPMD support)._
- [ ] Format the PR title as `[vllm, sglang, rollout, trainer, recipe, ci, doc] refactor: remove SPMD rollout`

### Test

Not run (SPMD suites deleted; async flow already covered by existing CI).

### API and Usage Example

All configs/scripts must now use `actor_rollout_ref.rollout.mode=async`. Example:

```bash
python -m verl.trainer.main_ppo \
    ... \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.mode=async \
    ...
```

### Design & Code Changes

- Deleted `verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py` and the entire SGLang SPMD engine, leaving only async implementations. Updated the `BaseRollout` registry, `RolloutConfig`, and `main_ppo` to error on `mode=sync`.
- Removed SPMD-specific docs, tests (`tests/workers/rollout/test_sglang_*`, `test_vllm_spmd`, `test_vllm_model_rope_scaling`), and CI steps (`.github/workflows/vllm.yml`, `sgl.yml`). Simplified lint exclusions and helper scripts accordingly.
- Cleaned recipes/examples to default `rollout_mode=async` and eliminated conditional sync branches (`examples/**`, `recipe/**`, e2e scripts). Added explicit validation in agent-loop utilities and the SFT runner to reject non-async requests.
- Updated documentation (FSDP/Megatron worker guides, hybrid flow, r1_ascend notes, FP8 guide) to describe async-only rollout and mention removal of the old SPMD pathway.

### Checklist Before Submitting

- [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting).
- [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: _Removed obsolete SPMD jobs; async coverage already exists._
- [ ] Once your PR is ready for CI, notify the `ci-request` channel (or Feishu group).
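Since `mode=sync` now errors out, any downstream script still pinned to the old default needs updating. A minimal sketch for locating stragglers; it only greps for the two spellings that appear in this PR's diffs, and assumes your scripts live under `examples/` and `recipe/` as in this repo:

```shell
#!/usr/bin/env bash
# Sketch: list shell scripts that still request the removed sync rollout.
# Patterns and directories mirror the ones touched in this PR; adjust for your tree.
set -euo pipefail

grep -rln --include='*.sh' \
    -e 'rollout_mode="sync"' \
    -e 'rollout.mode=sync' \
    examples/ recipe/ || echo "no sync-mode references found"
```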
1 parent ab07052 commit fd893c7

File tree

24 files changed: +331 −1,140 lines

.github/workflows/vllm.yml

Lines changed: 0 additions & 6 deletions
```diff
@@ -124,12 +124,6 @@ jobs:
       - name: Test the latest vLLM Rollout async with agent loop
         run: |
           ROLLOUT_NAME=vllm pytest -svvv tests/experimental/agent_loop
-      - name: Test the latest vLLM
-        run: |
-          torchrun --standalone --nnodes=1 --nproc_per_node=4 $(which pytest) -s tests/workers/rollout/rollout_vllm/test_vllm_spmd.py
-      - name: Test the latest vLLM on model with rope scaling
-        run: |
-          torchrun --standalone --nnodes=1 --nproc_per_node=4 $(which pytest) -s tests/workers/rollout/rollout_vllm/test_vllm_model_rope_scaling.py
       # Note(haibin.lin): for any new test, please update gpu_unit_tests.yaml to avoid repeated tests

   cleanup:
```

examples/grpo_trainer/run_qwen2-7b_math_megatron.sh

Lines changed: 3 additions & 5 deletions
```diff
@@ -2,11 +2,9 @@ set -x

 export CUDA_DEVICE_MAX_CONNECTIONS=1 # For megatron communication/computation overlapping

-rollout_mode="sync"
-if [ "$rollout_mode" = "async" ]; then
-    export VLLM_USE_V1=1
-    return_raw_chat="True"
-fi
+rollout_mode="async"
+export VLLM_USE_V1=1
+return_raw_chat="True"

 gsm8k_train_path=$HOME/data/gsm8k/train.parquet
 gsm8k_test_path=$HOME/data/gsm8k/test.parquet
```

examples/grpo_trainer/run_qwen2-7b_seq_balance.sh

Lines changed: 2 additions & 2 deletions
```diff
@@ -4,9 +4,9 @@ set -x
 # For async rollout mode, dataset should return raw chat.
 rollout_mode="async"
 rollout_name="sglang" # sglang or vllm
-if [ "$rollout_mode" = "async" ]; then
+return_raw_chat="True"
+if [ "$rollout_name" = "vllm" ]; then
     export VLLM_USE_V1=1
-    return_raw_chat="True"
 fi

 python3 -m verl.trainer.main_ppo \
```
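The example scripts in this commit converge on one pattern: always set `return_raw_chat` (async rollout needs the raw chat from the dataset), and gate `VLLM_USE_V1` on the engine name rather than on the now-fixed mode. A minimal sketch of that guard, assuming only the two engine names these examples use:

```shell
#!/usr/bin/env bash
# Sketch of the engine-gated setup shared by the updated examples
# (assumption: rollout_name is only ever "vllm" or "sglang").
rollout_mode="async"      # sync has been removed; async is the only valid mode
return_raw_chat="True"    # async rollout requires raw chat from the dataset

rollout_name="vllm"       # vllm or sglang
if [ "$rollout_name" = "vllm" ]; then
    export VLLM_USE_V1=1  # the async server path relies on the vLLM V1 engine
fi
```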

examples/grpo_trainer/run_qwen2_5_vl-7b-sglang.sh

Lines changed: 1 addition & 1 deletion
```diff
@@ -40,7 +40,7 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.rollout.n=5 \
     actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=20 \
     actor_rollout_ref.ref.fsdp_config.param_offload=True \
-    actor_rollout_ref.rollout.mode=sync \
+    actor_rollout_ref.rollout.mode=async \
     algorithm.use_kl_in_reward=False \
     trainer.critic_warmup=0 \
     trainer.logger='["console","wandb"]' \
```

examples/gspo_trainer/run_qwen30b_gspo.sh

Lines changed: 3 additions & 0 deletions
```diff
@@ -131,6 +131,9 @@ fi

 # ===================================== Inference =====================================
 rollout_name=vllm
+if [ "$rollout_name" = "vllm" ]; then
+    export VLLM_USE_V1=1
+fi
 infer_tp=4
 infer_dp=1
 infer_ep=1
```

examples/gspo_trainer/test_gspo_3b_math.sh

Lines changed: 6 additions & 2 deletions
```diff
@@ -47,7 +47,11 @@ loss_agg_mode="seq-mean-token-mean"
 MODEL_PATH=Qwen/Qwen2.5-3B-Instruct
 offload=false # it's a small model, offloading will just slow-down training
 rollout_engine=vllm
-rollout_mode=sync # can be async to speedup large scale xps
+rollout_mode=async
+return_raw_chat="True"
+if [ "$rollout_engine" = "vllm" ]; then
+    export VLLM_USE_V1=1
+fi
 gpu_memory_utilization=0.8
 reward_manager=dapo
 adv_estimator=grpo
@@ -121,6 +125,7 @@ python3 -m verl.trainer.main_ppo \
     data.prompt_key=prompt \
     data.truncation='error' \
     data.filter_overlong_prompts=true \
+    data.return_raw_chat=${return_raw_chat} \
     data.train_batch_size=${train_batch_size} \
     data.max_prompt_length=${max_prompt_length} \
     data.max_response_length=${max_response_length} \
@@ -138,7 +143,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.ppo_max_token_len_per_gpu=${actor_ppo_max_token_len} \
     actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=${infer_ppo_max_token_len} \
     actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=${infer_ppo_max_token_len} \
-    actor_rollout_ref.rollout.name=vllm \
     actor_rollout_ref.rollout.name=${rollout_engine} \
     actor_rollout_ref.rollout.mode=${rollout_mode} \
     actor_rollout_ref.model.path="${MODEL_PATH}" \
```

examples/gspo_trainer/test_gspo_3b_math_slurm.sh

Lines changed: 6 additions & 2 deletions
```diff
@@ -51,7 +51,11 @@ loss_agg_mode="seq-mean-token-mean"
 MODEL_PATH=Qwen/Qwen2.5-3B-Instruct
 offload=false # it's a small model, offloading will just slow-down training
 rollout_engine=vllm
-rollout_mode=sync # can be async to speedup large scale xps
+rollout_mode=async
+return_raw_chat="True"
+if [ "$rollout_engine" = "vllm" ]; then
+    export VLLM_USE_V1=1
+fi
 gpu_memory_utilization=0.8
 reward_manager=dapo
 adv_estimator=grpo
@@ -125,6 +129,7 @@ python3 -m verl.trainer.main_ppo \
     data.prompt_key=prompt \
     data.truncation='error' \
     data.filter_overlong_prompts=true \
+    data.return_raw_chat=${return_raw_chat} \
     data.train_batch_size=${train_batch_size} \
     data.max_prompt_length=${max_prompt_length} \
     data.max_response_length=${max_response_length} \
@@ -142,7 +147,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.ppo_max_token_len_per_gpu=${actor_ppo_max_token_len} \
     actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=${infer_ppo_max_token_len} \
     actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=${infer_ppo_max_token_len} \
-    actor_rollout_ref.rollout.name=vllm \
     actor_rollout_ref.rollout.name=${rollout_engine} \
     actor_rollout_ref.rollout.mode=${rollout_mode} \
     actor_rollout_ref.model.path="${MODEL_PATH}" \
```

examples/gspo_trainer/test_gspo_qwen30b_a3b_ep.sh

Lines changed: 3 additions & 0 deletions
```diff
@@ -64,6 +64,9 @@ offload=True

 # gen
 rollout_name=vllm # vllm or sglang
+if [ "$rollout_name" = "vllm" ]; then
+    export VLLM_USE_V1=1
+fi
 gen_tp=1
 gen_dp=4
 gen_ep=4
```

examples/ppo_trainer/run_qwen2-7b_seq_balance.sh

Lines changed: 2 additions & 4 deletions
```diff
@@ -9,10 +9,8 @@ train_files="['$gsm8k_train_path', '$math_train_path']"
 test_files="['$gsm8k_test_path', '$math_test_path']"

 # For async rollout mode, dataset should return raw chat.
-rollout_mode="sync"
-if [ "$rollout_mode" = "async" ]; then
-    return_raw_chat="True"
-fi
+rollout_mode="async"
+return_raw_chat="True"

 python3 -m verl.trainer.main_ppo \
     algorithm.adv_estimator=gae \
```

recipe/dapo/test_dapo_gptoss_20b_megatron.sh

Lines changed: 3 additions & 6 deletions
```diff
@@ -58,13 +58,10 @@ use_dynamic_bsz=False # recommended but not necessary

 ################################################### quick config ###################################################

-rollout_mode="sync"
+rollout_mode="async"
 rollout_name="vllm" # sglang or vllm
-return_raw_chat="False"
-if [ "$rollout_mode" = "async" ]; then
-    export VLLM_USE_V1=1
-    return_raw_chat="True"
-fi
+export VLLM_USE_V1=1
+return_raw_chat="True"
 dtype="bfloat16" # ["bfloat16", "float16"]

 project_name='DAPO'
```
