@samsja samsja commented Jan 22, 2026

fix cpu offloading


Note

Improves correctness of model init/loading when fsdp_cpu_offload is enabled.

  • Use model.to_empty(device="cpu") when CPU offload is on; otherwise use CUDA (in load_dcp_from_hf and setup_model)
  • Add _init_buffers_post_meta() helper and call it consistently after meta-to-empty transitions
  • Add _move_buffers_to_cuda(model, config) to move buffers to CUDA (FSDP offload only manages parameters) and invoke after random init, DCP load, non-meta load, and delayed checkpoint init
  • No functional changes to FSDP/EP/LoRA logic aside from ensuring correct device/buffer placement
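
The device selection and buffer handling described above can be illustrated with a minimal sketch. This is not the code from the PR: `materialize_from_meta` and `move_buffers` are hypothetical names, the `cpu_offload` flag stands in for `fsdp_cpu_offload`, and a single `nn.BatchNorm1d` stands in for the real model. It only shows the general PyTorch pattern of materializing a meta-device module with `to_empty`, re-initializing its values, and then moving buffers separately from parameters.

```python
import torch
from torch import nn


def materialize_from_meta(model: nn.Module, cpu_offload: bool) -> nn.Module:
    # to_empty allocates uninitialized storage on the target device;
    # parameter and buffer values must be filled in afterwards.
    device = "cpu" if cpu_offload else "cuda"
    return model.to_empty(device=device)


def move_buffers(model: nn.Module, device: str) -> None:
    # Hypothetical stand-in for the PR's _move_buffers_to_cuda:
    # moves buffers only and leaves parameters where they are.
    for module in model.modules():
        for name, buf in list(module.named_buffers(recurse=False)):
            setattr(module, name, buf.to(device))


# Build the module on the meta device, so no real memory is allocated yet.
with torch.device("meta"):
    model = nn.BatchNorm1d(4)

# With CPU offload enabled, parameters are materialized on the CPU.
model = materialize_from_meta(model, cpu_offload=True)

# Re-initialize weights and running statistics, since to_empty leaves garbage.
model.reset_parameters()

# Buffers go to the GPU when one is present; fall back to CPU so the sketch
# still runs on a machine without CUDA.
target = "cuda" if torch.cuda.is_available() else "cpu"
move_buffers(model, target)
```

In this sketch the parameters stay on the CPU while the buffers follow the compute device, which mirrors the split the summary describes (offload manages parameters, buffers are moved explicitly).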

Written by Cursor Bugbot for commit cc3c3a6.

@samsja samsja marked this pull request as ready for review January 22, 2026 00:49
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix buffer init in random init path

Co-authored-by: sami <sami@primeintellect.ai>

Fix buffer device move when meta load unavailable

Co-authored-by: sami <sami@primeintellect.ai>

@samsja samsja force-pushed the sami/fix-cpu-offloading branch from f566b5c to cc3c3a6 on January 23, 2026 19:08
@samsja samsja merged commit 82a8241 into main Jan 23, 2026
8 checks passed