allenwang28
Contributor

@allenwang28 allenwang28 commented Sep 22, 2025

Since #176, model_state_dict is kept in the local directory and is left for the user to clean up manually.

This does three things:

  1. Deletes all versions that are not this policy version
  2. Adds model_state_dict to gitignore
  3. Removes the reference actor test for now. These were good tests for compute_log_probs; unfortunately, that functionality now lives in apps/grpo/main, which tries to import vLLM. Getting that working on CPU CI is out of scope for this PR.

Tested manually on a single node and with a simple test.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Sep 22, 2025

# Delete old weight versions if they exist
if self.rank == 0:
    cleanup_old_weight_versions(
Member

How do we know the policy isn't currently reading from these weights?

Contributor

I think cleanup belongs to Policy, since the Policy actor will know if an earlier version is still needed.
We can implement cleanup as an endpoint of Policy and call it in the main loop to make sure everything is in sync.
(Later) once we figure out how to make the policy actors talk to each other after update_weights, we can potentially move it inside Policy and do it automatically.
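
To make the endpoint idea concrete, here is a minimal sketch, not the project's code. `PolicySketch`, `begin_request`, `end_request`, and the in-memory `stored` set are all hypothetical stand-ins; the real Policy actor, its endpoint decorator, and the on-disk layout are not shown in this thread. The point is only that the component serving requests can tell which versions are still being read, so it is a natural place to decide what is safe to delete.

```python
class PolicySketch:
    """Hypothetical stand-in for the Policy actor; not the real class."""

    def __init__(self) -> None:
        self.version = 0
        # version -> number of requests currently reading those weights
        self._in_flight: dict[int, int] = {}
        # versions that exist in storage (simulated here as a set)
        self.stored: set[int] = {0}

    def begin_request(self) -> int:
        """Record that a request started on the current version."""
        v = self.version
        self._in_flight[v] = self._in_flight.get(v, 0) + 1
        return v

    def end_request(self, version: int) -> None:
        """Record that a request on `version` finished."""
        self._in_flight[version] -= 1
        if self._in_flight[version] == 0:
            del self._in_flight[version]

    def update_weights(self, version: int) -> None:
        """Switch to a newer version (the trainer is assumed to have written it)."""
        self.stored.add(version)
        self.version = version

    def cleanup(self) -> list[int]:
        """Endpoint-style call: drop versions that are neither current nor in use."""
        removable = sorted(
            v for v in self.stored
            if v != self.version and v not in self._in_flight
        )
        self.stored.difference_update(removable)
        return removable


# Main-loop style usage: cleanup is called after each weight update.
policy = PolicySketch()
held = policy.begin_request()  # a request still reading version 0
policy.update_weights(1)
print(policy.cleanup())        # [] since version 0 is still in use
policy.end_request(held)
policy.update_weights(2)
print(policy.cleanup())        # [0, 1]
```

In a multi-actor setup each policy replica would need to report its own in-flight versions, which is the coordination problem the comment above defers.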

Contributor Author

Hmm, yeah, good point. I think this requires more consideration. I don't want to introduce tracking at this moment; maybe a fragile heuristic we can go with is "don't delete the last 2", since we're not going off-policy by more than 1? Can follow up more with #194, wdyt?

Contributor

Sounds good. Add the heuristic now just to make sure nothing breaks, and we can add proper eviction logic later.
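
The "don't delete the last 2" heuristic could look roughly like the sketch below. This is an illustration under assumptions, not the merged implementation: the function name `cleanup_keep_recent`, the `model_state_dict_v<N>` directory naming, and the `keep` parameter are all hypothetical, since the actual signature of `cleanup_old_weight_versions` and the storage layout are not shown in this thread.

```python
import re
import shutil
import tempfile
from pathlib import Path

# Hypothetical naming scheme: one directory per version, e.g. "model_state_dict_v3".
VERSION_DIR = re.compile(r"^model_state_dict_v(\d+)$")


def cleanup_keep_recent(base_dir: str, current_version: int, keep: int = 2) -> list[int]:
    """Delete version directories older than the `keep` most recent versions.

    With keep=2 the current version and the one before it survive, which
    matches being at most one step off-policy. Returns the removed versions.
    """
    oldest_kept = current_version - keep + 1
    removed = []
    for entry in Path(base_dir).iterdir():
        match = VERSION_DIR.match(entry.name)
        if not match or not entry.is_dir():
            continue  # leave unrelated files and directories alone
        version = int(match.group(1))
        if version < oldest_kept:
            shutil.rmtree(entry, ignore_errors=True)
            removed.append(version)
    return sorted(removed)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for v in range(5):
            (Path(tmp) / f"model_state_dict_v{v}").mkdir()
        # Current version is 4, so versions 3 and 4 are kept.
        print(cleanup_keep_recent(tmp, current_version=4))  # [0, 1, 2]
```

As noted in the thread, this is fragile: it is only safe while no reader lags more than one version behind, and it would need to be replaced by real eviction logic if that assumption changes.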

Contributor Author

ok done!

Contributor

@casteryh casteryh left a comment

lgtm

@allenwang28 allenwang28 merged commit 8ffe0bc into meta-pytorch:main Sep 23, 2025
5 checks passed
@allenwang28 allenwang28 deleted the dcp_delete branch September 23, 2025 14:47
