-
Notifications
You must be signed in to change notification settings - Fork 24
Delete DCP saved checkpoints used for weight sync #210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
# Delete old weight versions if they exist | ||
if self.rank == 0: | ||
cleanup_old_weight_versions( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we know the policy isn't currently reading from these weights?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think cleanup belongs to Policy since the Policy actor will know if an earlier version is still needed.
We can implement cleanup as an endpoint of Policy and we call it in the main loop to make sure everything is in sync.
(Later) once we figure out how to make the policy actors talk to each other after update_weights we can potentially move it to the inside of policy and do it automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HMM yeah good point. I think this requires more consideration. I don't want to introduce some tracking at this moment, maybe a fragile heuristic we can go with is "don't delete the last 2" since we're not going off policy more than 1? Can follow up more with #194 wdyt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HMM yeah good point. I think this requires more consideration. I don't want to introduce some tracking at this moment, maybe a fragile heuristic we can go with is "don't delete the last 2" since we're not going off policy more than 1? Can follow up more with #194 wdyt
Sounds good. add the heuristics now just to make sure nothing breaks and we can add proper evict logic later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
With #176 -
model_state_dict
is kept in the local directory and is left to the user to clean up manually.This does two things:
Tested manually on single node and with a simple test