-
Notifications
You must be signed in to change notification settings - Fork 16
[RFC] StoreInterface #77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary: This is a proposed interface that will be the backend of the replay buffer. I have included methods that would be useful from the point of view of making a replay buffer. This also doubles as a proposal for the actual torchstore api. Test Plan: n/a
Also @kaiyuan-li |
|
||
# TODO(yuxuanh): add this to torchstore. | ||
@abstractmethod | ||
async def release(self, key: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting idea: what's the inspiration here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Say the trainer is at step 10 and will no longer need stuff from step 5. A reasonable thing to do would be simply mark all keys starting with replay_buffer.step_10
as released and move on, instead of waiting it to be actually deleted.
While from the torchstore side it's probably easier to implement this as instant deletion right now, it would be nice to have this semantics, for if and when we hit a scale where this matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, on the other hand, since everything is implemented in Python. It's probably fast enough to just delete instantly since we don't deallocate memory when deleting. Indeed, currently all keys are held by a single process Controller actor in torchstore right now - so it makes less sense to reinvent GC ourself.
Things do get complicated if we need to shard the controller. And it's much easier to just not make any promises.
In this regard, we should probably remove the delete[_all]
methods all together, as it would be a nightmare to do it correctly in a distributed setting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There could be some perf gains here where we notify controller of delete and then let storage volumes garbage collect later.
|
||
# TODO(yuxuanh): add this to torchstore. | ||
@abstractmethod | ||
async def release_all(self, prefix: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the difference between release and delete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And delete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant the difference between two functions release
and delete
I like the ideas here but in general I think as a practice I think we should only add what we need as it's needed and used. Great work! |
Summary:
This is a proposed interface that will be the backend of the replay buffer. I have included methods that would be useful from the point of view of making a replay buffer.
This also doubles as a proposal for the actual torchstore api.
Test Plan:
n/a