fix cpu offloading #1636

samsja · 2026-01-22T00:46:18Z

fix cpu offloading

Note

Improves correctness of model init/loading when fsdp_cpu_offload is enabled.

Use model.to_empty(device="cpu") when CPU offload is on; otherwise use CUDA (in load_dcp_from_hf and setup_model)
Add _init_buffers_post_meta() helper and call it consistently after meta-to-empty transitions
Add _move_buffers_to_cuda(model, config) to move buffers to CUDA, since FSDP offload only manages parameters, and invoke it after random init, DCP load, non-meta load, and delayed checkpoint init
No functional changes to FSDP/EP/LoRA logic aside from ensuring correct device/buffer placement
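
The pattern described in the list above can be illustrated with a small standalone sketch. It is not the code from this PR: `TinyModel`, `materialize`, and `move_buffers_to` are hypothetical names chosen for illustration, and only `fsdp_cpu_offload` and `model.to_empty(...)` come from the description. The sketch assumes the model is built on the meta device, materialized on CPU when offload is enabled, and has a non-persistent buffer that must be recomputed because it is not part of the state dict.

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Toy stand-in for a real model; not taken from the PR."""

    def __init__(self) -> None:
        super().__init__()
        self.linear = nn.Linear(4, 4)
        # Non-persistent buffers are absent from the state dict, so loading
        # a checkpoint cannot restore them; they must be recomputed.
        self.register_buffer("inv_freq", torch.empty(2), persistent=False)

    def init_buffers(self) -> None:
        exponents = torch.arange(0, 2, dtype=torch.float32) / 2
        self.inv_freq.copy_(1.0 / (10000.0 ** exponents))


def materialize(model: TinyModel, fsdp_cpu_offload: bool) -> TinyModel:
    """Hypothetical helper: allocate real storage, then rebuild buffers."""
    use_cpu = fsdp_cpu_offload or not torch.cuda.is_available()
    # to_empty allocates uninitialized storage for parameters and buffers.
    model.to_empty(device="cpu" if use_cpu else "cuda")
    # Buffers hold garbage after to_empty, so recompute them explicitly.
    model.init_buffers()
    return model


def move_buffers_to(model: nn.Module, device: str) -> None:
    """Hypothetical helper: move only buffers, leaving parameters in place."""
    for module in model.modules():
        for name, buf in list(module.named_buffers(recurse=False)):
            # Assigning a tensor to an existing buffer name replaces the buffer.
            setattr(module, name, buf.to(device))


# Build on the meta device so no real memory is allocated up front.
with torch.device("meta"):
    model = TinyModel()

model = materialize(model, fsdp_cpu_offload=True)

# Parameters stay on CPU; buffers go to the GPU when one is available.
target = "cuda" if torch.cuda.is_available() else "cpu"
move_buffers_to(model, target)
```

In this sketch the parameters are still uninitialized after `to_empty`; in a real run they would be filled by random init or a checkpoint load before training. The buffer move is kept separate from parameter placement to mirror the PR's observation that FSDP offload only manages parameters.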

Written by Cursor Bugbot for commit cc3c3a6. This will update automatically on new commits.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.


src/prime_rl/trainer/model.py

Fix buffer init in random init path
Co-authored-by: sami <sami@primeintellect.ai>

Fix buffer device move when meta load unavailable
Co-authored-by: sami <sami@primeintellect.ai>

fix cpu offloading

635adde

samsja marked this pull request as ready for review January 22, 2026 00:49

cursor bot reviewed Jan 23, 2026

src/prime_rl/trainer/model.py (outdated, resolved)

src/prime_rl/trainer/model.py (resolved)

fix offload for moe

cc3c3a6


samsja force-pushed the sami/fix-cpu-offloading branch from f566b5c to cc3c3a6 on January 23, 2026 19:08

samsja merged commit 82a8241 into main Jan 23, 2026
8 checks passed
