Commit 497961c

feat(fsdp): support pipeline mode with FSDP as training backend (RLinf#332)
Signed-off-by: Bo Dai <daibo@infini-ai.com>
1 parent de0280a commit 497961c

92 files changed, +2115 -425 lines changed

.github/workflows/agent-e2e-tests.yml

Lines changed: 14 additions & 0 deletions
@@ -62,6 +62,20 @@ jobs:
     source .venv/bin/activate
     bash tests/e2e_tests/reasoning/run.sh qwen2.5-1.5b-grpo-collocated-fsdp-vllm

+- name: FSDP SGLang Pipeline mode
+  timeout-minutes: 20
+  run: |
+    export REPO_PATH=$(pwd)
+    source .venv/bin/activate
+    bash tests/e2e_tests/reasoning/run.sh qwen2.5-1.5b-grpo-pipeline-fsdp-sgl
+
+- name: FSDP vLLM Pipeline mode
+  timeout-minutes: 20
+  run: |
+    export REPO_PATH=$(pwd)
+    source .venv/bin/activate
+    bash tests/e2e_tests/reasoning/run.sh qwen2.5-1.5b-grpo-pipeline-fsdp-vllm
+
 - name: Clean up
   run: |
     rm -rf .venv

examples/coding_online_rl/main_coding_online_rl.py

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ def main(cfg) -> None:
         placement_strategy=singleton_placement_strategy,
     )

-    rollout_worker_cls = get_rollout_backend_worker(cfg, component_placement)
+    rollout_worker_cls = get_rollout_backend_worker(cfg)

     # Rollout group
     rollout_placement_strategy = component_placement.get_strategy("rollout")

examples/coding_online_rl/main_coding_rl_llm_judge.py

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ def main(cfg) -> None:
     cluster = Cluster(cluster_cfg=cfg.cluster)
     component_placement = ModelParallelComponentPlacement(cfg, cluster)

-    rollout_worker_cls = get_rollout_backend_worker(cfg, component_placement)
+    rollout_worker_cls = get_rollout_backend_worker(cfg)

     # Rollout group
     rollout_placement_strategy = component_placement.get_strategy("rollout")
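
The two call-site changes above drop the `component_placement` argument, so the worker class is now chosen from `cfg` alone. The body of `get_rollout_backend_worker` is not part of the visible diff, so the following is only a minimal sketch of what a config-only dispatch could look like. The config key `cfg.rollout.rollout_backend`, the `_BACKENDS` registry, and both placeholder worker classes are assumptions made for illustration, not RLinf's actual code.

```python
from types import SimpleNamespace


class SGLangWorker:
    """Hypothetical placeholder for an SGLang-backed rollout worker."""


class VLLMWorker:
    """Hypothetical placeholder for a vLLM-backed rollout worker."""


# Hypothetical registry; the real mapping is not shown in this diff.
_BACKENDS = {"sglang": SGLangWorker, "vllm": VLLMWorker}


def get_rollout_backend_worker(cfg):
    """Stand-in with the new one-argument call shape: select a class from cfg only."""
    backend = cfg.rollout.rollout_backend  # assumed config key
    try:
        return _BACKENDS[backend]
    except KeyError:
        raise ValueError(f"unsupported rollout backend: {backend!r}") from None


# Call site in the shape used by the updated examples.
cfg = SimpleNamespace(rollout=SimpleNamespace(rollout_backend="vllm"))
rollout_worker_cls = get_rollout_backend_worker(cfg)
```

Under this sketch the placement object is only needed later, when `component_placement.get_strategy("rollout")` is called, which is consistent with the unchanged context lines in both hunks.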

examples/embodiment/config/behavior_openvlaoft_eval.yaml

Lines changed: 1 addition & 0 deletions
@@ -139,6 +139,7 @@ actor:
 adam_beta1: 0.9
 adam_beta2: 0.999
 adam_eps: 1.0e-05
+weight_decay: 0.01
 clip_grad: 10.0

 tokenizer:

examples/embodiment/config/behavior_ppo_openvlaoft.yaml

Lines changed: 1 addition & 0 deletions
@@ -146,6 +146,7 @@ actor:
 adam_beta2: 0.999
 adam_eps: 1.0e-05
 clip_grad: 10.0
+weight_decay: 0.01
 critic_warmup_steps: 0

 tokenizer:

examples/embodiment/config/isaaclab_ppo_gr00t_demo.yaml

Lines changed: 1 addition & 0 deletions
@@ -152,6 +152,7 @@ actor:
 adam_beta2: 0.95
 adam_eps: 1.0e-05
 clip_grad: 1.0
+weight_decay: 0.01
 critic_warmup_steps: 0

 # Override the default values in training_backend/fsdp

examples/embodiment/config/libero_10_grpo_openpi.yaml

Lines changed: 1 addition & 0 deletions
@@ -149,6 +149,7 @@ actor:
 adam_beta1: 0.9
 adam_beta2: 0.95
 adam_eps: 1.0e-05
+weight_decay: 0.01
 clip_grad: 2.0

 # Override the default values in training_backend/fsdp

examples/embodiment/config/libero_10_grpo_openpi_pi05.yaml

Lines changed: 1 addition & 0 deletions
@@ -151,6 +151,7 @@ actor:
 adam_beta2: 0.95
 adam_eps: 1.0e-05
 clip_grad: 1.0
+weight_decay: 0.01

 # Override the default values in training_backend/fsdp
 fsdp_config:

examples/embodiment/config/libero_10_grpo_openvlaoft.yaml

Lines changed: 1 addition & 0 deletions
@@ -151,6 +151,7 @@ actor:
 adam_beta1: 0.9
 adam_beta2: 0.999
 adam_eps: 1.0e-05
+weight_decay: 0.01
 clip_grad: 1.0

 tokenizer:

examples/embodiment/config/libero_10_grpo_openvlaoft_eval.yaml

Lines changed: 1 addition & 0 deletions
@@ -145,6 +145,7 @@ actor:
 adam_beta1: 0.9
 adam_beta2: 0.999
 adam_eps: 1.0e-05
+weight_decay: 0.01
 clip_grad: 1.0

 tokenizer:
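
Each embodiment config above gains `weight_decay: 0.01` next to the existing `adam_*` keys. The diff does not show which optimizer consumes these values, so the following is a sketch only, assuming an AdamW-style update with decoupled weight decay. The function `adamw_step` and the learning rate are illustrative assumptions; the other values come from the `libero_10_grpo_openvlaoft_eval.yaml` hunk.

```python
import math


def adamw_step(param, grad, m, v, step, *, lr, beta1, beta2, eps, weight_decay):
    """One AdamW-style update for a single scalar parameter; returns (param, m, v)."""
    # Decoupled weight decay: shrink the parameter directly, independent of the gradient.
    param = param * (1.0 - lr * weight_decay)
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction for the zero-initialised moments.
    m_hat = m / (1.0 - beta1 ** step)
    v_hat = v / (1.0 - beta2 ** step)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# adam_beta1, adam_beta2, adam_eps, weight_decay as in the hunk; lr is an assumed value.
optim = dict(lr=1.0e-4, beta1=0.9, beta2=0.999, eps=1.0e-05, weight_decay=0.01)

# With a zero gradient, only the decay term changes the parameter.
decayed, _, _ = adamw_step(1.0, 0.0, 0.0, 0.0, step=1, **optim)

# With a unit gradient, the decay and the Adam step both apply.
stepped, _, _ = adamw_step(1.0, 1.0, 0.0, 0.0, step=1, **optim)
```

In this formulation the decay shrinks each parameter by a factor of `1 - lr * weight_decay` per step regardless of the gradient, which is why it is kept as a separate setting from `clip_grad`.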
