
@cadolphe-amd commented Jan 30, 2026

Motivation

Currently, the memory object maps (MemObjMap_ and VirtualMemObjMap_) are updated in the memory object's destructor. Asynchronous commands retain memory objects and release them only when they complete, so the destructor can run long after the API call that logically freed the memory, and the map removal happens too late. For example, suppose the application calls hipMemAddressFree on an address and then calls hipMemAddressReserve on the same address before a command that retains the original memory object has released it. hipMemAddressReserve adds nothing to VirtualMemObjMap_ because the key already exists. When the command later completes and releases the old object, its destructor removes the entry, so VirtualMemObjMap_ no longer tracks the new reservation.
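A minimal C++ sketch of the failure sequence described above, assuming VirtualMemObjMap_ behaves like a plain std::map keyed by virtual address and that the memory object uses a simple reference count. VaMap, VaObject, reserveVa, and oldOrderingKeepsSecondReservation are hypothetical names for illustration only; they are not the runtime's actual classes or APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for VirtualMemObjMap_: reserved VA -> owning object id.
using VaMap = std::map<std::uintptr_t, int>;

// Hypothetical ref-counted VA object; NOT the real HIP/ROCclr memory class.
struct VaObject {
  std::uintptr_t va;
  int id;
  VaMap* map;
  int refs = 1;

  void retain() { ++refs; }
  // Old behavior: the map entry is erased when the last reference is
  // released, i.e. from the "destructor", whenever that happens to run.
  void release() {
    if (--refs == 0) map->erase(va);
  }
};

// Hypothetical reserve path: std::map::insert does nothing if the key exists.
bool reserveVa(VaMap& m, const VaObject& obj) {
  return m.insert({obj.va, obj.id}).second;
}

// Replays the sequence from the Motivation section. Returns true only if the
// second reservation is still tracked in the map at the end.
bool oldOrderingKeepsSecondReservation() {
  VaMap map;
  const std::uintptr_t va = 0x200000;
  VaObject first{va, 1, &map};
  reserveVa(map, first);      // map: {va -> 1}
  first.retain();             // an in-flight command retains the object
  first.release();            // free: refs 2 -> 1, entry stays in the map
  VaObject second{va, 2, &map};
  reserveVa(map, second);     // returns false: key already present
  first.release();            // command completes: refs 1 -> 0, entry erased
  return map.count(va) == 1;  // false: the live reservation is untracked
}
```

Because std::map::insert leaves an existing key untouched, the second reservation is never recorded, and the deferred release then erases the only entry for that address.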

Technical Details

Remove the memory object map updates from the destructor. Add the VirtualMemObjMap_ removal to hipMemAddressFree. The MemObjMap_ removal is already performed in hipMemUnmap.
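The change can be sketched the same way, again with hypothetical types rather than the runtime's real ones: the map entry is erased synchronously in a free path standing in for hipMemAddressFree, and releasing the object no longer touches the map. VaMap, VaObject, reserveVa, freeVa, and fixedOrderingTrackedId are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for VirtualMemObjMap_: reserved VA -> owning object id.
using VaMap = std::map<std::uintptr_t, int>;

// Hypothetical ref-counted VA object; NOT the real HIP/ROCclr memory class.
struct VaObject {
  std::uintptr_t va;
  int id;
  int refs = 1;

  void retain() { ++refs; }
  // New behavior: releasing (and destroying) the object never touches the map.
  void release() { --refs; }
};

// Hypothetical reserve path: std::map::insert does nothing if the key exists.
bool reserveVa(VaMap& m, const VaObject& obj) {
  return m.insert({obj.va, obj.id}).second;
}

// Hypothetical free path: the map entry is removed here, synchronously with
// the API call, and only then is the caller's reference dropped.
void freeVa(VaMap& m, VaObject& obj) {
  m.erase(obj.va);
  obj.release();
}

// Same sequence as the Motivation example; returns the id tracked for the VA.
int fixedOrderingTrackedId() {
  VaMap map;
  const std::uintptr_t va = 0x200000;
  VaObject first{va, 1};
  reserveVa(map, first);  // map: {va -> 1}
  first.retain();         // an in-flight command retains the object
  freeVa(map, first);     // entry removed now, refs 2 -> 1
  VaObject second{va, 2};
  reserveVa(map, second); // key is free, so the insert succeeds: {va -> 2}
  first.release();        // command completes; the map is untouched
  return map.at(va);
}
```

With removal tied to the API call instead of to the last reference, a command that completes late can no longer erase an entry that now belongs to a new reservation.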

JIRA ID

N/A

Test Plan

Tested with the PyTorch expandable segments test, which originally hit a segfault.

Test Result

The test now passes.

Submission Checklist

@cadolphe-amd requested a review from a team as a code owner on Jan 30, 2026 at 02:09
@cadolphe-amd changed the title "Fix timing of updates to memory object maps" to "Fix timing of removals to memory object maps" on Jan 30, 2026
@cadolphe-amd changed the title "Fix timing of removals to memory object maps" to "Fix timing of removals from memory object maps" on Jan 30, 2026
}
// If the runtime executes a graph mempool with VM, the VA can be mapped in space
// for graph validation logic during execution. The reason it's not unmapped
// in the graph itself is that the app can have a graph without a free node

@gandryey This is a hack. If the graph doesn't have a free node then the runtime should add one implicitly to avoid removing / unmapping here.

We can't add anything implicitly, since the destruction can occur outside of the graph. In other words the allocations don't belong to the graph.

In general, the object must control its own location inside the maps. Our current code has explicit Add/Remove calls outside of the memory object, which isn't right.

@gandryey left a comment

Most likely, the runtime should remove the ref counting for HIP with the DD path.
