Unify memory resources #2968
Conversation
template <typename T>
struct host_container {
template <typename T, typename MR>
#ifdef __cpp_concepts
I think RAFT is using C++20 now so it should be safe to use requires without the #ifdef guard?
Unfortunately, some components of cuvs still use C++17, and it breaks if I remove the #ifdef in this header. I figured I'd keep it here to keep cuvs passing CI without changes.
We should get cuVS updated to C++20; RMM will be requiring C++20 soon.
 * Provides CUDA unified (managed) memory accessible from both host and device.
 * Uses synchronous allocation (no stream). Binds to raft::mr::host_device_resource_ref.
 */
class managed_memory_resource {
This is implemented in CCCL already. Please do not introduce a new implementation of this since one already exists.
https://nvidia.github.io/cccl/unstable/libcudacxx/runtime/memory_pools.html#cuda-managed-memory-pool
https://nvidia.github.io/cccl/unstable/libcudacxx/runtime/legacy_resources.html#cuda-mr-legacy-managed-memory-resource
Use cuda::mr::legacy_managed_memory_resource on CUDA 12 and cuda::managed_memory_pool on CUDA 13 (it's considerably faster). Maybe write a factory that returns the correct resource type for your CUDA version.
Thanks for the pointer! Really nice, I replaced it with the cuda::mr::legacy_managed_memory_resource and it just worked with no other modifications. I'd prefer to keep the legacy resource for now to keep exactly the same behavior in cuVS as before this PR.
The user is already able to replace it with the CUDA 13 pool-based resource via `raft::resource::managed_memory_resource`, but we can also make it the default later.
The follow-up and motivation: tracking all memory allocations #2973

Testing cuvs CI against rapidsai/raft#2968

Testing the breaking changes:
class managed_memory_resource_factory : public resource_factory {
 public:
  managed_memory_resource_factory() : mr_(cuda::mr::legacy_managed_memory_resource{}) {}
I know you said it's out of scope for now, but I recommend a follow-up PR that uses the new managed pool on CUDA 13+. It's a worthwhile performance boost.
struct managed_container_policy {
  using element_type = ElementType;
  using container_type = host_container<element_type, raft::mr::host_device_resource_ref>;
Something to be aware of: It is possible for memory resources to be host-accessible and device-accessible but not have that known statically. For example, systems with HMM or ATS have device-accessibility for memory allocated with malloc. However, that can't be known by the type alone. You have to query the accessibility at runtime.
Some systems like DGX Spark with integrated memory may perform better with a host-device accessible resource that isn't a managed memory resource (but that would require some system knowledge at runtime).
All this to say, someday we might want to refactor this to use cuda::mr::synchronous_resource_ref<> and check the accessibility at runtime rather than using cuda::mr::synchronous_resource_ref<cuda::mr::host_accessible, cuda::mr::device_accessible> which requires that accessibility to be statically known.
Thanks, that's a very important point for cuVS - we've been experimenting using various memory types on Grace Hopper and DGX Spark. I actually hoped that I could use the new resources (defined in this PR as they are right now) to do more experiments by switching the memory resources.
I think the naming goes against the intention a little bit, since we decouple the memory resources, raft resource handles, and the containers (mdarrays).
On the algorithm implementation side:
- When I'm using `raft::managed_mdarray` and `raft::get_managed_memory_resource_ref` in algorithm code, I mean more of "some (probably paged, smart) memory resource with guaranteed host and device access" rather than specifically `cudaMallocManaged`.
- Same for the pinned - "some (probably low-level, not-paged) memory resource with guaranteed host and device access and limited support for host-device atomics".

These two allow me to implement atomic synchronization between the device and host, reduce copy overheads, or just simplify the code a little bit. I don't need/want to query the resource properties at runtime for this.

On the user side (e.g. in cuvs benchmarks), I want to be able to configure the program for the target device: query the device properties, check whether ATS is available, and select the most appropriate resource that fits the bill. Only then wrap it into `cuda::mr::synchronous_resource_ref<cuda::mr::host_accessible, cuda::mr::device_accessible>`, pass it using `raft::set_managed_memory_resource`, and benefit from the improved performance.
tfeher
left a comment
Thanks Artem for this PR, it looks good to me!
/merge
Detailed tracking of (almost) all allocations on device and host.

```C++
// optionally pass an existing resource handle
raft::resources res;
// The tracking handle is a child of the resource handle; it wraps all memory resources with statistics adaptors
raft::memory_tracking_resources tracked(res, "allocations.csv", std::chrono::milliseconds(1));
// All allocations are logged to a .csv as long as `tracked` is alive
cuvs::neighbors::cagra::build(tracked, ...);
```

This produces a CSV file with sampled allocations with a timeline and NVTX correlation:

```csv
timestamp_us,nvtx_depth,nvtx_range,host_current,host_total,pinned_current,pinned_total,managed_current,managed_total,device_current,device_total,workspace_current,workspace_total,large_workspace_current,large_workspace_total
198809,1,"hnsw::build<ACE>",20008,20008,0,0,0,0,148304,148304,0,0,0,0
199961,1,"hnsw::build<ACE>",20008,20008,0,0,0,0,15588304,15588304,0,0,0,0
201350,1,"hnsw::build<ACE>",0,20008,0,0,0,0,0,40385488,0,0,0,0
222216,3,"cagra::build_knn_graph<IVF-PQ>(5000000, 1536, 72)",1440000000,1440020008,0,0,0,0,0,40385488,0,0,0,0
273892,4,"ivf_pq::build(5000000, 1536)",1440020008,1440040016,0,0,0,0,40385488,80770976,0,0,0,0
304183,4,"ivf_pq::build(5000000, 1536)",1440020008,1440040016,0,0,0,0,40385488,80770976,0,0,4388567040,4388567040
309064,4,"ivf_pq::build(5000000, 1536)",1440020008,1440040016,0,0,0,0,53860384,94245872,0,0,4388567040,4388567040
334655,4,"ivf_pq::build(5000000, 1536)",1440020008,1440040016,0,0,0,0,67339295,107724783,0,0,4388567040,4388567040
385037,4,"ivf_pq::build(5000000, 1536)",1440020008,1440040016,0,0,0,0,74076743,114462231,0,0,4388567040,4388567040
386129,4,"ivf_pq::build(5000000, 1536)",1440020008,1440040016,0,0,0,0,80814199,121199687,0,0,4388567040,4388567040
402750,4,"ivf_pq::build(5000000, 1536)",1440020008,1440040016,0,0,0,0,46099768,126913967,0,0,4388567040,4388567040
...
```

This can later be visualized (the visualization script is not included in the PR):

<img width="2100" height="1350" alt="allocations" src="https://github.com/user-attachments/assets/3f0ab942-b49b-4e09-a0ea-9181725ae05e" />

#### Implementation overview

##### NVTX

Added thread-local tracking of the NVTX range stack; the calling thread shares a handle with the sampling thread to correlate the NVTX range state with allocations.

##### Memory resource adaptors

- statistics adaptor: atomically counts allocations/deallocations for any `cuda::mr`-compatible resource
- notifying adaptor: sets a shared "notifier" state on each event

##### Resource monitor

A resource monitor registers a collection of resource statistics objects, a single NVTX range handle, and a single notifier state. It spawns a new thread to sample the resource statistics at a given rate (but only when the notifier is triggered). This thread writes to a CSV output stream.

##### Memory tracking resources

`raft::memory_tracking_resources` is a child of `raft::resources`, thus it can be used as a drop-in replacement. It replaces all known memory resources for the duration of its lifetime and manages the output file or stream if necessary.

Depends on (and includes all changes of) #2968

Authors:
- Artem M. Chirkin (https://github.com/achirkin)

Approvers:
- Tamas Bela Feher (https://github.com/tfeher)

URL: #2973
Use `cuda::mr::any_synchronous_resource` for the host, pinned, and managed resource types, and give the user explicit control over these resources.
#### New
- `raft::resource::managed_memory_resource` and `raft::resource::pinned_memory_resource` are passed to managed and pinned mdarrays during construction via corresponding container policies. This allows the user to replace/modify these resources, for example, to add logging or memory pooling.
- `raft::mr::get_default_host_resource` and `raft::mr::set_default_host_resource` can be used by the user to alter the default host resource the same way. It is not stored in `raft::resources` handle like the other two for two reasons:
1. To mirror rmm default device resource getter/setter
2. To avoid breaking the `raft::make_host_mdarray` overloads that do not take `raft::resources` as an argument (many instances across raft and cuvs).
#### Changed
- Use `raft::mr::host_resource_ref` and `raft::mr::host_device_resource_ref` for the non-owning semantics (defined as `cuda::mr::synchronous_resource_ref` with appropriate access attributes)
- Use `raft::host_resource` and `raft::host_device_resource` for owning semantics (defined as `cuda::mr::any_synchronous_resource` with appropriate access attributes)
With these changes, raft fully switches to `cuda::mr` types for host and host-device resources, while still using `rmm` types for device async resources. Changing the latter would break a lot of cuVS and is not needed - `rmm` will eventually fully converge to `cuda::mr` anyway.
#### Breaking changes
- Rename container policies
- Reuse of a single `host_container` for the three types of resources.
- Switch to using `cuda::mr::any_synchronous_resource` from `std::pmr::memory_resource`
The effect of these changes should be limited, because the policies are hidden behind the mdarray templates and synonyms, and `std::pmr::memory_resource` was introduced recently and hasn't been used much.
Authors:
- Artem M. Chirkin (https://github.com/achirkin)
Approvers:
- Bradley Dice (https://github.com/bdice)
- Tamas Bela Feher (https://github.com/tfeher)
URL: rapidsai#2968
Detailed tracking of (almost) all allocations on device and host. ```C++ // optionally pass an existing resource handle raft::resources res; // The tracking handle is a child of resource handle; it wraps all memory resources with statistics adaptors raft::memory_tracking_resources tracked(res, "allocations.csv", std::chrono::milliseconds(1)); // All allocations are logged to a .csv as long as `tracked` is alive cuvs::neighbors::cagra::build(tracked, ...); ``` This produces a CSV file with sampled allocations with a timeline and NVTX correlation ```csv timestamp_us,nvtx_depth,nvtx_range,host_current,host_total,pinned_current,pinned_total,managed_current,managed_total,device_current,device_total,workspace_current,workspace_total,large_workspace_current,large_workspace_total 198809,1,"hnsw::build<ACE>",20008,20008,0,0,0,0,148304,148304,0,0,0,0 199961,1,"hnsw::build<ACE>",20008,20008,0,0,0,0,15588304,15588304,0,0,0,0 201350,1,"hnsw::build<ACE>",0,20008,0,0,0,0,0,40385488,0,0,0,0 222216,3,"cagra::build_knn_graph<IVF-PQ>(5000000, 1536, 72)",1440000000,1440020008,0,0,0,0,0,40385488,0,0,0,0 273892,4,"ivf_pq::build(5000000, 1536)",1440020008,1440040016,0,0,0,0,40385488,80770976,0,0,0,0 304183,4,"ivf_pq::build(5000000, 1536)",1440020008,1440040016,0,0,0,0,40385488,80770976,0,0,4388567040,4388567040 309064,4,"ivf_pq::build(5000000, 1536)",1440020008,1440040016,0,0,0,0,53860384,94245872,0,0,4388567040,4388567040 334655,4,"ivf_pq::build(5000000, 1536)",1440020008,1440040016,0,0,0,0,67339295,107724783,0,0,4388567040,4388567040 385037,4,"ivf_pq::build(5000000, 1536)",1440020008,1440040016,0,0,0,0,74076743,114462231,0,0,4388567040,4388567040 386129,4,"ivf_pq::build(5000000, 1536)",1440020008,1440040016,0,0,0,0,80814199,121199687,0,0,4388567040,4388567040 402750,4,"ivf_pq::build(5000000, 1536)",1440020008,1440040016,0,0,0,0,46099768,126913967,0,0,4388567040,4388567040 ... 
``` This can later be visualized (the visualization script is not included in the PR): <img width="2100" height="1350" alt="allocations" src="https://github.com/user-attachments/assets/3f0ab942-b49b-4e09-a0ea-9181725ae05e" /> #### Implementation overview ##### NVTX Added thread-local tracking of NVTX range stack; the calling thread shares a handle to the sampling thread to correlate the NVTX range state with allocations. ##### Memory resource adaptors - statistics adaptor: atomically counts allocations/deallocations for any `cuda::mr`-compatible resource - notifying adaptor: sets a shared "notifier" state on each event ##### Resource monitor A resource monitor registers a collection of resource statistics objects, a single NVTX range handle, and a single notifier state. It spawns a new thread to sample the resource statistics at a given rate (but only when the notifier is triggered). This thread writes to a CSV output stream. ##### Memory tracking resources `raft::memory_tracking_resources` is a child of `raft::resources`, thus can be used as a drop-in replacement. It replaces all known memory resource for the duration of its lifetime and manages the output file or stream if necessary. Depends on (and includes all changes of) rapidsai#2968 Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) URL: rapidsai#2973
Backport PRs that were mistakenly merged into `main`:
- #2968
- #2973

Authors:
- Artem M. Chirkin (https://github.com/achirkin)

Approvers:
- Tamas Bela Feher (https://github.com/tfeher)

URL: #2983