-
Notifications
You must be signed in to change notification settings - Fork 124
Fix timing of removals from memory object maps #2980
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
| } | ||
| // If runtime executes graph mempool with VM, then VA can be mapped in space | ||
| // for graph validation logic during execution. And the reason it's not unmaped | ||
| // in graph itself because the app can have a graph without a free node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gandryey This is a hack. If the graph doesn't have a free node then the runtime should add one implicitly to avoid removing / unmapping here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't add anything implicitly, since the destruction can occur outside of the graph. In other words the allocations don't belong to the graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general the object must control the location inside the maps. Our current code has explicit Add/Remove outside of mem object, which isn't right.
gandryey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most likely runtime should remove the ref counting for hip with DD path.
| } | ||
| // If runtime executes graph mempool with VM, then VA can be mapped in space | ||
| // for graph validation logic during execution. And the reason it's not unmaped | ||
| // in graph itself because the app can have a graph without a free node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't add anything implicitly, since the destruction can occur outside of the graph. In other words the allocations don't belong to the graph.
| } | ||
| // If runtime executes graph mempool with VM, then VA can be mapped in space | ||
| // for graph validation logic during execution. And the reason it's not unmaped | ||
| // in graph itself because the app can have a graph without a free node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general the object must control the location inside the maps. Our current code has explicit Add/Remove outside of mem object, which isn't right.
Motivation
Currently the memory object map updates (MemObjMap_ and VirtualMemObjMap_) are handled in the memory destructor. This creates issues of incorrect memory object map updates when asynchronous commands complete and release memory. One such issus is if we hipMemAddressFree an address and later reserve the same address with hipMemAddressReserve before a command retaining the memory releases. In this case, the VirtualMemObjMap_ will be invalid once the command completes as hipMemAddressReserve will not add anything to the map (as key already exists), but the command release will later remove it from the map.
Technical Details
Remove memory object map updates in destructor. Add virtualMemObjMap_ removal in hipMemAddressFree. MemObjMap_ removal is already contained in hipMemUnmap.
JIRA ID
N/A
Test Plan
Test on pytorch expandable segments test which was originally encountering a segfault.
Test Result
Test now passes.
Submission Checklist