Commit 001a6b6

enable rdma for weight sync
1 parent 8cb21be commit 001a6b6

2 files changed: +3 -2 lines changed

apps/grpo/qwen3_8b.yaml

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ policy:
 
 # Trainer configuration
 trainer:
-  use_dcp: true
+  use_dcp: false
   use_vllm_builtin_load: true
   model:
     name: qwen3

src/forge/actors/trainer.py

Lines changed: 2 additions & 1 deletion
@@ -403,7 +403,8 @@ async def push_weights(self, policy_version: int) -> None:
         else:
             for name, param in hf_state_dict.items():
                 key = get_param_key(policy_version, name)
-                await ts.put(key, param)
+                # RDMA is still broken on GPU, so we need to copy to CPU
+                await ts.put(key, param.detach().cpu())
         t.step("ts_save")
         t.stop()
         end_time = time.perf_counter()
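
The non-DCP branch above detaches each parameter and copies it to host memory before handing it to the store. A minimal sketch of that pattern follows. It is not the project's implementation: `FakeParam` is a hypothetical stand-in for a tensor so the example runs without PyTorch, `InMemoryStore` is a hypothetical stand-in for whatever sits behind `ts.put`, and the key layout in `get_param_key` is assumed, since the diff does not show the real helper.

```python
import asyncio


class FakeParam:
    """Hypothetical stand-in for a tensor; mimics only detach() and cpu()."""

    def __init__(self, values, device="cuda", requires_grad=True):
        self.values = list(values)
        self.device = device
        self.requires_grad = requires_grad

    def detach(self):
        # Same data, but no longer tracked for gradients.
        return FakeParam(self.values, self.device, requires_grad=False)

    def cpu(self):
        # Copy of the data that lives in host memory.
        return FakeParam(self.values, "cpu", self.requires_grad)


class InMemoryStore:
    """Hypothetical store with an async put(); not the real `ts` API."""

    def __init__(self):
        self.data = {}

    async def put(self, key, value):
        self.data[key] = value


def get_param_key(policy_version, name):
    # Assumed key layout, for illustration only.
    return f"v{policy_version}/{name}"


async def push_weights(store, state_dict, policy_version):
    for name, param in state_dict.items():
        key = get_param_key(policy_version, name)
        # Stage on the host first so the store never sees a device buffer.
        await store.put(key, param.detach().cpu())


store = InMemoryStore()
weights = {"layer.weight": FakeParam([1.0, 2.0])}
asyncio.run(push_weights(store, weights, policy_version=3))
stored = store.data["v3/layer.weight"]
```

In this sketch the original parameter stays on its device with gradient tracking intact; only the copy that reaches the store is detached and host-resident. The extra host copy costs time and memory per parameter, which is the trade-off the code comment in the diff accepts.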
