Tracking memory resources by achirkin · Pull Request #2973 · rapidsai/raft

achirkin · 2026-03-04T17:44:33Z

Detailed tracking of (almost) all allocations on device and host.

  // optionally pass an existing resource handle
  raft::resources res;

  // The tracking handle is a child of resource handle; it wraps all memory resources with statistics adaptors
  raft::memory_tracking_resources tracked(res, "allocations.csv", std::chrono::milliseconds(1));

  // All allocations are logged to a .csv as long as `tracked` is alive
  cuvs::neighbors::cagra::build(tracked, ...);

This produces a CSV file of sampled allocation statistics, with a timestamp and the active NVTX range on each row:

timestamp_us,nvtx_depth,nvtx_range,host_current,host_total,pinned_current,pinned_total,managed_current,managed_total,device_current,device_total,workspace_current,workspace_total,large_workspace_current,large_workspace_total
198809,1,"hnsw::build<ACE>",20008,20008,0,0,0,0,148304,148304,0,0,0,0
199961,1,"hnsw::build<ACE>",20008,20008,0,0,0,0,15588304,15588304,0,0,0,0
201350,1,"hnsw::build<ACE>",0,20008,0,0,0,0,0,40385488,0,0,0,0
222216,3,"cagra::build_knn_graph<IVF-PQ>(5000000, 1536, 72)",1440000000,1440020008,0,0,0,0,0,40385488,0,0,0,0
273892,4,"ivf_pq::build(5000000, 1536)",1440020008,1440040016,0,0,0,0,40385488,80770976,0,0,0,0
304183,4,"ivf_pq::build(5000000, 1536)",1440020008,1440040016,0,0,0,0,40385488,80770976,0,0,4388567040,4388567040
309064,4,"ivf_pq::build(5000000, 1536)",1440020008,1440040016,0,0,0,0,53860384,94245872,0,0,4388567040,4388567040
334655,4,"ivf_pq::build(5000000, 1536)",1440020008,1440040016,0,0,0,0,67339295,107724783,0,0,4388567040,4388567040
385037,4,"ivf_pq::build(5000000, 1536)",1440020008,1440040016,0,0,0,0,74076743,114462231,0,0,4388567040,4388567040
386129,4,"ivf_pq::build(5000000, 1536)",1440020008,1440040016,0,0,0,0,80814199,121199687,0,0,4388567040,4388567040
402750,4,"ivf_pq::build(5000000, 1536)",1440020008,1440040016,0,0,0,0,46099768,126913967,0,0,4388567040,4388567040
...
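Since the visualization script is not part of the PR, here is a minimal sketch of how such a file could be post-processed in C++. It assumes the format shown above: a header row, comma-separated numeric columns, and a double-quoted `nvtx_range` field that may itself contain commas. The helper names `split_csv_row` and `peak_column` are hypothetical and not part of raft.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Split one CSV row into fields. Commas inside double quotes (as in the
// nvtx_range column) do not separate fields; the quote characters are dropped.
inline std::vector<std::string> split_csv_row(const std::string& line)
{
  std::vector<std::string> fields(1);
  bool in_quotes = false;
  for (char c : line) {
    if (c == '"') {
      in_quotes = !in_quotes;
    } else if (c == ',' && !in_quotes) {
      fields.emplace_back();
    } else {
      fields.back().push_back(c);
    }
  }
  return fields;
}

// Return the largest value seen in the named column, e.g. "device_current".
// Throws if the stream is empty, the column is missing, or a cell is not numeric.
inline std::uint64_t peak_column(std::istream& csv, const std::string& column)
{
  std::string line;
  if (!std::getline(csv, line)) { throw std::runtime_error("empty CSV"); }
  const auto header = split_csv_row(line);
  const auto it     = std::find(header.begin(), header.end(), column);
  if (it == header.end()) { throw std::runtime_error("no such column: " + column); }
  const auto idx = static_cast<std::size_t>(it - header.begin());

  std::uint64_t peak = 0;
  while (std::getline(csv, line)) {
    const auto row = split_csv_row(line);
    if (row.size() != header.size()) { continue; }  // skip truncated rows such as "..."
    assert(idx < row.size());
    peak = std::max<std::uint64_t>(peak, std::stoull(row[idx]));
  }
  return peak;
}
```

Looking columns up by header name rather than by position keeps the sketch usable if the set of columns changes.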

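This can later be visualized (the visualization script is not included in the PR):

[image: allocations (https://github.com/user-attachments/assets/3f0ab942-b49b-4e09-a0ea-9181725ae05e)]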

Implementation overview

NVTX

Added thread-local tracking of the NVTX range stack; the calling thread shares a handle with the sampling thread so that the NVTX range state can be correlated with allocations.
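The idea can be sketched as follows. This is not the code from the PR: the types `range_state` and `range_name_stack` and the getter `this_thread_range_stack` are hypothetical, and a mutex is assumed for publishing the current range to the reader.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Snapshot of the innermost open range of one thread, readable from another thread.
class range_state {
 public:
  void set(std::size_t depth, std::string name)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    depth_ = depth;
    name_  = std::move(name);
  }
  std::pair<std::size_t, std::string> get() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return {depth_, name_};
  }

 private:
  mutable std::mutex mutex_;
  std::size_t depth_ = 0;
  std::string name_;
};

// Per-thread stack of range names; every push/pop publishes the new top.
class range_name_stack {
 public:
  void push(std::string name)
  {
    names_.push_back(std::move(name));
    publish();
  }
  void pop()
  {
    assert(!names_.empty());
    names_.pop_back();
    publish();
  }
  // The handle a sampling thread keeps to read this thread's current range.
  std::shared_ptr<const range_state> handle() const { return state_; }

 private:
  void publish() { state_->set(names_.size(), names_.empty() ? std::string{} : names_.back()); }

  std::vector<std::string> names_;
  std::shared_ptr<range_state> state_ = std::make_shared<range_state>();
};

// One stack per thread.
inline range_name_stack& this_thread_range_stack()
{
  thread_local range_name_stack instance;
  return instance;
}
```

Because the handle is a `shared_ptr`, the sampler can keep reading the last published state even if the originating thread exits first.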

Memory resource adaptors

- statistics adaptor: atomically counts allocations/deallocations for any `cuda::mr`-compatible resource
- notifying adaptor: sets a shared "notifier" state on each event
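A minimal sketch of the two adaptors, stacked on top of each other. Everything here is an assumption made for illustration: the upstream is a plain `malloc`-based stand-in rather than a `cuda::mr` resource, the class names carry a `_sketch` suffix to avoid confusion with the real headers, and "total" is taken to mean cumulative bytes allocated, which is how the `*_total` columns in the sample above appear to behave.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <utility>

// Counters shared between an adaptor and whoever samples them.
struct resource_stats {
  std::atomic<std::int64_t> current_bytes{0};  // allocated and not yet freed
  std::atomic<std::int64_t> total_bytes{0};    // cumulative bytes ever allocated
};

// Flag raised on every allocation event and cleared by the sampler.
struct notifier_state {
  std::atomic<bool> flag{false};
  void notify() { flag.store(true, std::memory_order_release); }
  bool consume() { return flag.exchange(false, std::memory_order_acq_rel); }
};

// Hypothetical upstream resource: plain malloc/free.
struct malloc_resource {
  void* allocate(std::size_t bytes) { return std::malloc(bytes); }
  void deallocate(void* ptr, std::size_t) { std::free(ptr); }
};

template <typename Upstream>
class stats_adaptor_sketch {
 public:
  stats_adaptor_sketch(Upstream upstream, std::shared_ptr<resource_stats> stats)
    : upstream_(std::move(upstream)), stats_(std::move(stats))
  {
    assert(stats_ != nullptr);
  }
  void* allocate(std::size_t bytes)
  {
    void* ptr    = upstream_.allocate(bytes);
    const auto n = static_cast<std::int64_t>(bytes);
    stats_->current_bytes.fetch_add(n, std::memory_order_relaxed);
    stats_->total_bytes.fetch_add(n, std::memory_order_relaxed);
    return ptr;
  }
  void deallocate(void* ptr, std::size_t bytes)
  {
    upstream_.deallocate(ptr, bytes);
    stats_->current_bytes.fetch_sub(static_cast<std::int64_t>(bytes), std::memory_order_relaxed);
  }

 private:
  Upstream upstream_;
  std::shared_ptr<resource_stats> stats_;
};

template <typename Upstream>
class notify_adaptor_sketch {
 public:
  notify_adaptor_sketch(Upstream upstream, std::shared_ptr<notifier_state> state)
    : upstream_(std::move(upstream)), state_(std::move(state))
  {
    assert(state_ != nullptr);
  }
  void* allocate(std::size_t bytes)
  {
    void* ptr = upstream_.allocate(bytes);
    state_->notify();  // counters are already updated by the inner adaptor
    return ptr;
  }
  void deallocate(void* ptr, std::size_t bytes)
  {
    upstream_.deallocate(ptr, bytes);
    state_->notify();
  }

 private:
  Upstream upstream_;
  std::shared_ptr<notifier_state> state_;
};
```

Putting the notifying layer outermost means the flag is raised only after the counters have changed, so a sampler that wakes on the flag sees the updated values.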

Resource monitor

A resource monitor registers a collection of resource statistics objects, a single NVTX range handle, and a single notifier state. It spawns a new thread to sample the resource statistics at a given rate (but only when the notifier is triggered). This thread writes to a CSV output stream.
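A reduced sketch of that sampling loop, under several simplifying assumptions: it watches a single counter instead of a collection of statistics objects, leaves out the NVTX handle, and uses a hypothetical class name `monitor_sketch`. The final sample in `stop()` mirrors the commit "Make sure to record the last updates when stop() is called".

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>

class monitor_sketch {
 public:
  monitor_sketch(std::ostream& out,
                 const std::atomic<std::int64_t>& current_bytes,
                 std::atomic<bool>& notified,
                 std::chrono::microseconds period)
    : out_(out),
      current_(current_bytes),
      notified_(notified),
      period_(period),
      start_(std::chrono::steady_clock::now())
  {
    out_ << "timestamp_us,current\n";
  }
  ~monitor_sketch() { stop(); }

  // Spawn the sampling thread.
  void start()
  {
    assert(!worker_.joinable());
    running_.store(true);
    worker_ = std::thread([this] {
      while (running_.load()) {
        sample_once();
        std::this_thread::sleep_for(period_);
      }
    });
  }

  // Join the sampling thread, then record any event that arrived after its last sample.
  void stop()
  {
    if (!worker_.joinable()) { return; }
    running_.store(false);
    worker_.join();
    sample_once();
  }

  // Write one row only if an event was signalled since the previous sample.
  // Public so the logic can be exercised without a thread; do not call it
  // concurrently with a running worker.
  bool sample_once()
  {
    if (!notified_.exchange(false)) { return false; }
    const auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
      std::chrono::steady_clock::now() - start_);
    out_ << elapsed.count() << ',' << current_.load() << '\n';
    return true;
  }

 private:
  std::ostream& out_;
  const std::atomic<std::int64_t>& current_;
  std::atomic<bool>& notified_;
  std::chrono::microseconds period_;
  std::chrono::steady_clock::time_point start_;
  std::atomic<bool> running_{false};
  std::thread worker_;
};
```

Gating each row on the notifier keeps the output small during long phases without allocations, while the period bounds how often rows are written during allocation bursts.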

Memory tracking resources

raft::memory_tracking_resources is a child of raft::resources and can therefore be used as a drop-in replacement. It replaces all known memory resources for the duration of its lifetime and manages the output file or stream if necessary.
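The drop-in property follows from plain inheritance, which can be illustrated with a toy model. None of the types below exist in raft; they are hypothetical stand-ins with a single resource slot. The sketch also assumes the child copies the parent's state rather than modifying the parent, and it uses a non-atomic counter for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <utility>

// Hypothetical minimal resource interface.
struct memory_resource_sketch {
  virtual ~memory_resource_sketch()                     = default;
  virtual void* allocate(std::size_t bytes)             = 0;
  virtual void deallocate(void* ptr, std::size_t bytes) = 0;
};

struct malloc_resource_sketch : memory_resource_sketch {
  void* allocate(std::size_t bytes) override { return std::malloc(bytes); }
  void deallocate(void* ptr, std::size_t) override { std::free(ptr); }
};

// Counts cumulative bytes and forwards to the wrapped resource (not thread-safe).
struct counting_resource_sketch : memory_resource_sketch {
  explicit counting_resource_sketch(std::shared_ptr<memory_resource_sketch> up)
    : upstream(std::move(up))
  {
    assert(upstream != nullptr);
  }
  void* allocate(std::size_t bytes) override
  {
    total_bytes += bytes;
    return upstream->allocate(bytes);
  }
  void deallocate(void* ptr, std::size_t bytes) override { upstream->deallocate(ptr, bytes); }

  std::shared_ptr<memory_resource_sketch> upstream;
  std::size_t total_bytes = 0;
};

// Stand-in for a resource handle with one memory resource slot.
struct resources_sketch {
  std::shared_ptr<memory_resource_sketch> device_mr = std::make_shared<malloc_resource_sketch>();
};

// Child handle: copies the parent's state, then swaps in a counting wrapper.
struct tracking_resources_sketch : resources_sketch {
  explicit tracking_resources_sketch(const resources_sketch& parent)
    : resources_sketch(parent),
      counter(std::make_shared<counting_resource_sketch>(parent.device_mr))
  {
    device_mr = counter;
  }
  std::shared_ptr<counting_resource_sketch> counter;
};

// An algorithm written against the base handle accepts the tracking one unchanged.
inline void algorithm_sketch(const resources_sketch& res)
{
  void* p = res.device_mr->allocate(1024);
  res.device_mr->deallocate(p, 1024);
}
```

Calling `algorithm_sketch(tracked)` routes the allocation through the counting wrapper, whereas calling it with the original handle bypasses the wrapper.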

Depends on (and includes all changes of) #2968


tfeher

Thanks Artem for the PR! This is great. As we try to maximize memory utilization, we are prone to running out of memory. This PR will be very useful for debugging those issues and understanding the memory usage of various algorithms.

The extra memory usage tracking layer is only created if the user explicitly requests it. Therefore I do not see any issue merging this into raft. We should get this in 26.04.

I have a few comments below.

My wishlist of follow-up PRs:

- Python API to enable memory_tracking_resource
- Command line argument for cuvs-bench to enable memory tracking


tfeher

Thanks Artem for the updates, the PR looks good to me!


achirkin · 2026-03-18T07:35:54Z

Thanks Tamas for the review! Since the PR is non-breaking and the change to the existing logic is minimal (maintaining the NVTX name stack), I'll go ahead and merge it.
FYI, below are the benchmarks of the tracking overhead. sample_rate_us = -1 means no tracking. The code performs a large number of repeated allocations in the pool (very fast, no real CUDA context calls). Hence the overheads of the statistics adaptor (a few atomic ops) and of excessive sampling are visible:

---------------------------------------------------------------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations alloc_size      batch items_per_second sample_rate_us
---------------------------------------------------------------------------------------------------------------------------------
tracking_overhead/0/manual_time        1.60 ms         1.59 ms          776        256          0       12.5373M/s             -1
tracking_overhead/1/manual_time        2.01 ms         2.00 ms          700        256          0       9.96885M/s              0
tracking_overhead/2/manual_time        1.70 ms         1.70 ms          817        256          0       11.7602M/s              1
tracking_overhead/3/manual_time        1.70 ms         1.69 ms          819        256          0       11.7746M/s             10
tracking_overhead/4/manual_time        1.70 ms         1.69 ms          823        256          0       11.7727M/s            100
tracking_overhead/5/manual_time        1.62 ms         1.62 ms          875   1048.58k          0       12.3415M/s             -1
tracking_overhead/6/manual_time        1.81 ms         1.81 ms          609   1048.58k          0       11.0321M/s              0
tracking_overhead/7/manual_time        1.66 ms         1.66 ms          847   1048.58k          0        12.013M/s              1
tracking_overhead/8/manual_time        1.65 ms         1.65 ms          837   1048.58k          0       12.1312M/s             10
tracking_overhead/9/manual_time        1.69 ms         1.69 ms          856   1048.58k          0       11.8317M/s            100
tracking_overhead/10/manual_time      0.167 ms        0.163 ms         9088   67.1089M          0       11.9873M/s             -1
tracking_overhead/11/manual_time      0.219 ms        0.213 ms         6518   67.1089M          0        9.1249M/s              0
tracking_overhead/12/manual_time      0.177 ms        0.173 ms         7566   67.1089M          0       11.2846M/s              1
tracking_overhead/13/manual_time      0.168 ms        0.165 ms         8231   67.1089M          0       11.9152M/s             10
tracking_overhead/14/manual_time      0.167 ms        0.164 ms         8373   67.1089M          0       12.0072M/s            100
tracking_overhead/15/manual_time       1.50 ms         1.48 ms          926        256          1        13.313M/s             -1
tracking_overhead/16/manual_time       1.78 ms         1.76 ms          677        256          1       11.2217M/s              0
tracking_overhead/17/manual_time       1.59 ms         1.58 ms          858        256          1       12.5546M/s              1
tracking_overhead/18/manual_time       1.60 ms         1.59 ms          882        256          1       12.4699M/s             10
tracking_overhead/19/manual_time       1.63 ms         1.62 ms          812        256          1       12.2791M/s            100
tracking_overhead/20/manual_time      0.147 ms        0.146 ms         9466   1048.58k          1       13.6086M/s             -1
tracking_overhead/21/manual_time      0.213 ms        0.212 ms         7849   1048.58k          1       9.39716M/s              0
tracking_overhead/22/manual_time      0.160 ms        0.160 ms         8746   1048.58k          1       12.4864M/s              1
tracking_overhead/23/manual_time      0.161 ms        0.161 ms         8808   1048.58k          1       12.4082M/s             10
tracking_overhead/24/manual_time      0.158 ms        0.158 ms         8615   1048.58k          1       12.6409M/s            100
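To read the table as relative cost, one can compare `items_per_second` against the untracked row with the same `alloc_size` and `batch`. The helper below is a hypothetical convenience, not part of the benchmark code. For the first group (alloc_size 256, batch 0), 9.96885M/s against a baseline of 12.5373M/s works out to roughly 20% lower throughput at sample_rate_us = 0, and 11.7602M/s to roughly 6% at sample_rate_us = 1.

```cpp
#include <cassert>

// Fraction of throughput lost relative to the untracked baseline
// (0.0 means no overhead; negative values mean the tracked run was faster).
inline double relative_overhead(double baseline_items_per_s, double tracked_items_per_s)
{
  assert(baseline_items_per_s > 0.0);
  return 1.0 - tracked_items_per_s / baseline_items_per_s;
}
```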

achirkin · 2026-03-18T07:36:01Z

/merge

Authors:
- Artem M. Chirkin (https://github.com/achirkin)

Approvers:
- Tamas Bela Feher (https://github.com/tfeher)

URL: rapidsai#2973

Backport PRs that were mistakenly merged into `main`:
- #2968
- #2973

Authors:
- Artem M. Chirkin (https://github.com/achirkin)

Approvers:
- Tamas Bela Feher (https://github.com/tfeher)

URL: #2983

csadorf · 2026-03-20T15:28:50Z

Would it make sense to try to upstream this kind of tooling into nsight?

achirkin and others added 25 commits February 26, 2026 09:20

Rename device_uvector_policy -> device_container_policy and add non-initializing make_*_scalar overloads

45e2d49

Declare the new resources in raft handle

65d4570

Renamed managed policy

d86638f

Add raft::resources for pinned and managed resources and the type-erased host_resource and host_device_resource

d6788f6

Updated container policies

e7bea48

All but host memory resource are done

2514621

Simplify the implementation

49735a5

Make the host container policy use the resource concept

22b4048

Settle down with raft::mr::*et_default_host_resource()

557cc8c

Add some thread-safety

cc7a4b0

Merge branch 'main' into fea-unify-memory-resources

e77fe2a

C++17 backwards-compatibility

866211e

Merge branch 'main' into fea-unify-memory-resources

c171d84

newline

268eb1b

Add raft::mr::device_resource wrapper for cuda::mr::any_resource

5c718d6

Copy semantics and return resource refs

c5ab9c4

Rework workspace resources to avoid nesting bridge layers

6af142e

Fix the argument order in tests

ece1990

Merge branch 'main' into fea-unify-memory-resources

4dd256b

Add explicit conversion through cuda::mr refs to rmm ref

a26357d

Switch from rmm host and host_device resource reference wrappers to raw CCCL references

2a90680

Merge branch 'main' into fea-unify-memory-resources

59c3793

Prefer rmm::mr::get_current_device_resource_ref() over rmm::mr::get_current_device_resource()

3a40d22

Remove raft pinned and managed memory resources in favor of cuda::mr implementations

cce4f45

Tracking memory resources

bc2518c

achirkin self-assigned this Mar 4, 2026

achirkin requested review from a team as code owners March 4, 2026 17:44

achirkin added the "feature request" and "non-breaking" labels Mar 4, 2026

github-project-automation bot added this to Unstructured Data Processing Mar 4, 2026

This was referenced Mar 4, 2026

Tracking memory resources achirkin/raft#3

Closed

Unify memory resources #2968

Merged

achirkin moved this to In Progress in Unstructured Data Processing Mar 4, 2026

achirkin moved this from In Progress to Blocked in Unstructured Data Processing Mar 4, 2026

achirkin and others added 3 commits March 5, 2026 05:18

Avoid direct resource -> rmm ref conversion to fix CI errors

ea2d7bf

Merge branch 'main' into fea-tracking-memory-resources

b0c7b54

Merge branch 'main' into fea-tracking-memory-resources

550205b

tfeher requested changes Mar 13, 2026


achirkin and others added 6 commits March 14, 2026 11:22

Merge branch 'main' into fea-tracking-memory-resources

830ec3c

Update cpp/include/raft/mr/statistics_adaptor.hpp

e35c4bf

Co-authored-by: Tamas Bela Feher <tfeher@nvidia.com>

Make sure to record the last updates when stop() is called

1f7b67f

Improve clarity via docs and more explicit constructors

90683f7

Clarify the nvtx current_range definition

2abe848

Enhance memory reporting: add local peak usage and total alloc/free for throughput measurement

0f60d31

achirkin requested a review from tfeher March 16, 2026 13:33

tfeher approved these changes Mar 17, 2026


achirkin added 2 commits March 17, 2026 16:26

Move the thread_local range_name_stack_instance out of a getter functions

e10e02d

Benchmark tracking overhead

1650bcf

This was referenced Mar 18, 2026

[FEA] Python API for memory tracking resources #2982

Open

[FEA] Add memory tracking resources as a CLI option in ANN_BENCH rapidsai/cuvs#1930

Open

rapids-bot bot merged commit 1b20b78 into rapidsai:main Mar 18, 2026
147 of 149 checks passed

github-project-automation bot moved this from Blocked to Done in Unstructured Data Processing Mar 18, 2026

achirkin mentioned this pull request Mar 18, 2026

Backport memory resources PRs #2983

Merged

rapids-bot[bot] merged 36 commits into rapidsai:main from achirkin:fea-tracking-memory-resources
