
[{roc,hip}fft] Revisions to accounting of system memory usage and its limit-enforcing logic#5590

Draft
regan-amd wants to merge 1 commit into develop from users/regan-amd/sys_mem_accounting


regan-amd commented Mar 19, 2026

Motivation

On platforms with integrated devices, "host" and "device" allocations share the same physical memory, yet only the former are monitored and accounted for. This makes the accounting of system memory usage performed by rocfft-test and/or hipfft-test unreliable on such platforms. Additionally, the arbitrarily defined chunk of the system's total memory that is "dedicated for device allocations" is misleading because

  1. there is no guarantee that the overall usage by "device" allocations is actually maintained within that chunk's size;
  2. as currently defined, that chunk's size is not even constant as it may depend on the observed free memory at any point.

As a result, a larger-than-acceptable amount of system memory may be made available for "host" allocations at some point in the application's lifetime, which can exhaust system memory during test runs.

Technical Details

The singleton host_memory structure is renamed system_memory and

  • modified to enforce const-ness of the system's total memory throughout an application's lifetime;
  • expanded with a used_bytes member intended to monitor the overall usage of system memory, either by "host" allocations or by "device" allocations on integrated devices;
  • revised so that its get_usable_bytes() member function returns a value consistent with both the limits to be enforced at any point (i.e., accounting for the memory already in use at the time of the query) and the observed free memory.
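
The accounting described above could look roughly like the following sketch. It is not the code in this PR: only the names system_memory, used_bytes and get_usable_bytes() come from the description, while limit_bytes, add_usage(), remove_usage() and the constructor arguments are hypothetical. For the sake of a deterministic example, the class is a plain object rather than a singleton, the observed free memory is passed in as a parameter instead of being queried from the operating system, and thread safety is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Illustrative only: everything except the names system_memory, used_bytes
// and get_usable_bytes() is an assumption made for this sketch.
class system_memory
{
public:
    // total: total system memory, fixed for the object's lifetime.
    // limit: hypothetical cap on the bytes that may be in use overall.
    system_memory(std::size_t total, std::size_t limit)
        : total_bytes(total)
        , limit_bytes(std::min(limit, total))
    {
    }

    std::size_t get_total_bytes() const
    {
        return total_bytes;
    }

    // Record an allocation ("host", or "device" on an integrated device).
    void add_usage(std::size_t bytes)
    {
        used_bytes += bytes;
    }

    // Record a deallocation previously recorded with add_usage().
    void remove_usage(std::size_t bytes)
    {
        assert(bytes <= used_bytes);
        used_bytes -= bytes;
    }

    // Bytes that may still be allocated: bounded both by what remains under
    // the limit and by the free memory observed at the time of the query.
    std::size_t get_usable_bytes(std::size_t observed_free_bytes) const
    {
        const std::size_t remaining
            = used_bytes < limit_bytes ? limit_bytes - used_bytes : 0;
        return std::min(remaining, observed_free_bytes);
    }

private:
    const std::size_t total_bytes; // never changes after construction
    const std::size_t limit_bytes; // never changes after construction
    std::size_t       used_bytes = 0;
};
```

With a limit of 800 bytes and 300 bytes already recorded, a query made while 900 bytes are observed free would return 500, and the same query made while only 200 bytes are free would return 200. Taking the minimum of the two bounds is what keeps the returned value consistent with both the enforced limit and the observed free memory.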

The template classes gpubuf_t and hostbuf_t are modified to leverage the above changes. The accounting that has been specific to (specializations of) the hostbuf_t class is replaced by the singleton system_memory structure's accounting.
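
As a rough illustration of how buffer classes might feed a shared usage count, the sketch below uses an RAII buffer that reports its size on allocation and on release. The names usage_counter and tracked_buffer and the `counted` flag are hypothetical and do not reflect the actual gpubuf_t or hostbuf_t interfaces; the assumption is that "host" buffers are always counted, while "device" buffers are counted only on integrated devices. Plain std::malloc stands in for the real allocation calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for the shared accounting: just a byte counter.
struct usage_counter
{
    std::size_t used_bytes = 0;
};

// Hypothetical RAII buffer that reports its size to the shared counter.
// `counted` is assumed to be true for "host" buffers and for "device"
// buffers on integrated devices, and false otherwise.
class tracked_buffer
{
public:
    tracked_buffer(usage_counter& counter, std::size_t bytes, bool counted)
        : counter_(counter)
        , bytes_(bytes)
        , counted_(counted)
        , ptr_(std::malloc(bytes))
    {
        if(ptr_ == nullptr)
            throw std::bad_alloc();
        if(counted_)
            counter_.used_bytes += bytes_;
    }

    ~tracked_buffer()
    {
        std::free(ptr_);
        if(counted_)
        {
            assert(counter_.used_bytes >= bytes_);
            counter_.used_bytes -= bytes_;
        }
    }

    // Non-copyable so that each allocation is counted exactly once.
    tracked_buffer(const tracked_buffer&)            = delete;
    tracked_buffer& operator=(const tracked_buffer&) = delete;

    void* data() const
    {
        return ptr_;
    }

private:
    usage_counter& counter_;
    std::size_t    bytes_;
    bool           counted_;
    void*          ptr_;
};
```

Tying the bookkeeping to the buffer's lifetime means the shared count returns to its previous value as soon as the buffer goes out of scope, without a separate per-class tally.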

Test Plan

The current tests suffice; results on platforms with integrated devices are of particular relevance.

Test Result

Tests pass.

Submission Checklist

regan-amd force-pushed the users/regan-amd/sys_mem_accounting branch from 0ee0292 to 0016bdb on March 19, 2026 01:03