Skip to content

Remove device_memory_resource inheritance from all resources and adaptors#2301

Merged
bdice merged 14 commits intorapidsai:stagingfrom
bdice:remove-dmr-inheritance
Mar 24, 2026
Merged

Remove device_memory_resource inheritance from all resources and adaptors#2301
bdice merged 14 commits intorapidsai:stagingfrom
bdice:remove-dmr-inheritance

Conversation

@bdice
Copy link
Collaborator

@bdice bdice commented Mar 14, 2026

Summary

  • Remove device_memory_resource inheritance from all memory resources (stateless, stateful, and adaptors)
  • Remove do_allocate / do_deallocate / do_is_equal virtual overrides from all resources
  • Rewrite benchmark factory functions from shared_ptr<device_memory_resource> to any_device_resource
  • Convert simulated_memory_resource from DMR inheritance to CCCL concepts
  • Change copy/move from = delete to = default on cuda_async_memory_resource, cuda_async_managed_memory_resource, sam_headroom_memory_resource, and simulated_memory_resource (required for CCCL resource_ref copyability via shared_resource base)
  • Remove NullUpstream tests and DEVICE_MEMORY_RESOURCE_VIEW_TEST (no longer needed without DMR)

Closes #2295
Part of #2011

@copy-pr-bot
Copy link

copy-pr-bot bot commented Mar 14, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@bdice bdice force-pushed the remove-dmr-inheritance branch from 032b3a6 to 856395e Compare March 17, 2026 08:16
bdice added a commit that referenced this pull request Mar 17, 2026
…async_resource_ref (#2300)

## Summary

Replaces `shared_ptr[device_memory_resource]` with per-subclass
`unique_ptr[ConcreteType]` (owning) and
`optional[device_async_resource_ref]` (non-owning reference) across all
Python/Cython bindings. This is a part of #2011.

There are **significant** opportunities to make this Cython code better
over time but I have to get something that removes
`device_memory_resource` from the Python/Cython side before I can finish
migration on the C++ side (#2296). I welcome critique of this design,
and ideas for how it can be improved, particularly from @vyasr @wence-.
I would like to address any suggested improvements in follow-up PRs,
because this changeset is necessary to unblock #2301.

The changes in `cdef class DeviceMemoryResource` are perhaps the most
significant changes here from a design perspective.

The solution I'm going with for now is to keep the
`DeviceMemoryResource` class around, as a base class for the Cython MRs,
and let it handle allocate/deallocate. It owns a
`optional[device_async_resource_ref]` which is used for
allocation/deallocation. It's `optional` so that the class can be
default-constructed (Cython requires nullary constructors), but it
should never be `nullopt` except during initialization.

Then, each MR class owns a `c_obj` like
`unique_ptr[cuda_memory_resource]`. This is `unique_ptr` so it can be
default-constructed for Cython's requirements. I chose `unique_ptr` over
`optional` here to emphasize that this member is the thing that actually
owns the resource. As with the `c_ref`, this should never be `nullptr`
except during initialization. When an MR class is created, it
initializes its `c_obj` and then constructs a `c_ref` (a member
inherited from the `DeviceMemoryResource` base class).

"Special" methods for an MR like getting the statistics counts go
through `deref(self.c_obj)`, and "common" methods like
allocate/deallocate go through `self.c_ref.value()`.

### Changes

- **`.pxd` declarations**: Remove `device_memory_resource` class.
Declare `device_async_resource_ref` and a
`make_device_async_resource_ref()` inline C++ template that returns
`optional` to work around Cython generating default-constructed
temporaries for non-default-constructible types. All adaptor
constructors take `device_async_resource_ref` instead of
`device_memory_resource*`.
- **`.pxd` class definitions**: `DeviceMemoryResource` base holds
`optional[device_async_resource_ref] c_ref`; each concrete subclass
holds `unique_ptr[ConcreteType] c_obj`.
- **`.pyx` implementations**: All `__cinit__` methods construct via
`unique_ptr` then set `c_ref` via `make_device_async_resource_ref`.
Typed accessors (`pool_size`, `flush`, etc.) use `deref(self.c_obj)`.
Per-device functions use `set_per_device_resource_ref`.
- **`device_buffer.pyx`**: Passes `self.mr.c_ref.value()` instead of
`self.mr.get_mr()`.

Closes #2294
…tors

Remove the device_memory_resource virtual base class inheritance from all
production memory resources, adaptors, and stream_ordered_memory_resource.

Resources now derive publicly from cuda::mr::shared_resource<Impl> (for
stateful/adaptor types) or stand alone with direct CCCL concept methods
(for stateless types). The legacy do_allocate/do_deallocate/do_is_equal
virtual overrides and pointer-based per-device-resource APIs are removed.

stream_ordered_memory_resource provides allocate/deallocate/allocate_sync/
deallocate_sync directly instead of through the DMR virtual dispatch.

All 103 C++ tests and 1165 Python tests pass.
@bdice bdice force-pushed the remove-dmr-inheritance branch from 856395e to a06396a Compare March 17, 2026 23:46
@bdice bdice moved this to In Progress in RMM Project Board Mar 18, 2026
@bdice bdice self-assigned this Mar 18, 2026
@bdice bdice added breaking Breaking change improvement Improvement / enhancement to an existing function labels Mar 18, 2026
@bdice bdice marked this pull request as ready for review March 18, 2026 00:01
@bdice bdice requested review from a team as code owners March 18, 2026 00:01
@bdice bdice requested review from ttnghia and vyasr and removed request for a team March 18, 2026 00:01
::rmm::detail::cccl_resource_ref> and
not is_specialization_of_v<std::remove_cv_t<OtherResourceType>,
::rmm::detail::cccl_async_resource_ref> and
not shared_resource_cast<OtherResourceType>::value and
Copy link
Collaborator Author

@bdice bdice Mar 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't something we want to keep. I don't know how to work around the constraint satisfaction issues in NVIDIA/cccl#8037.

Minimal repro: https://godbolt.org/z/ddfn1sE8Y

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewers: I discussed this issue with @ericniebler. We are going to try NVIDIA/cccl#8121 as a workaround. For now, I want to get the current series of open PRs merged (#2301, #2324, #2325), and then try to reduce the workarounds that have been introduced.

void* ptr{nullptr};
if (i != 0) { std::this_thread::sleep_for(std::chrono::milliseconds{100}); }
EXPECT_NO_THROW(ptr = mr.allocate(stream, allocation_size));
EXPECT_NO_THROW(ptr = mr.allocate(stream, allocation_size, rmm::CUDA_ALLOCATION_ALIGNMENT));
Copy link
Collaborator Author

@bdice bdice Mar 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An explicit alignment argument is required because of the issues described in NVIDIA/cccl#8063. There is no 2-argument allocate(stream, size) overload for shared_resource<T> like there is for other CCCL MR classes. There is only allocation(stream, size, alignment) for shared_resource<T>, and it has no default alignment value.

@bdice bdice requested a review from a team as a code owner March 19, 2026 18:11
Rewrite benchmark factory functions from shared_ptr<device_memory_resource>
to any_device_resource, convert simulated_memory_resource from DMR
inheritance to CCCL concepts, and change copy/move from = delete to
= default on cuda_async_memory_resource, cuda_async_managed_memory_resource,
sam_headroom_memory_resource, and simulated_memory_resource to satisfy
CCCL resource_ref copyability requirements.
bdice added 2 commits March 21, 2026 05:48
GCC 13 incorrectly resolves the injected class name to the
fully-specialized type inside a class template body, even in
template-template parameter position. This causes a hard error
(not SFINAE) in is_specialization_of_v when checking against
cccl_resource_ref and cccl_async_resource_ref in their own
generic-resource constructors.

Replace is_specialization_of_v<T, cccl_*_resource_ref> with
dedicated is_cccl_*_resource_ref_v<T> traits defined outside
the class bodies, where the forward declarations are unambiguously
templates. This is correct on both GCC 13 and 14.
Inline allocation logic directly into the CCCL-compatible allocate/deallocate
methods and delete the legacy do_* methods, per task 4 of the DMR inheritance
removal plan.

/// MR factory functions
inline auto make_cuda() { return std::make_shared<rmm::mr::cuda_memory_resource>(); }
using any_device_resource = cuda::mr::any_resource<cuda::mr::device_accessible>;
Copy link
Member

@mhaseeb123 mhaseeb123 Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to define this in a common header. We seem to have it duplicated in a few places

Copy link
Collaborator Author

@bdice bdice Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to keep usage of the any_device_resource alias to a minimum, only for internal usage. I debated removing it entirely and requiring the explicit templated type everywhere. My reasoning for this is that we may want to allow other resource accessibility properties in the future (templating classes on Properties...) and I don't want to force the design into "device only" in too many places.

Comment on lines +50 to +55
std::void_t<
decltype(std::declval<std::remove_cv_t<T>&>().get()),
std::enable_if_t<std::is_base_of_v<cuda::mr::shared_resource<std::remove_reference_t<
decltype(std::declval<T&>().get())>>,
std::remove_cv_t<T>> and
not is_specialization_of_v<std::remove_cv_t<T>, cuda::mr::shared_resource>>>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to use requires here instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we have to support C++17 in RMM for now. I plan to migrate to C++20 sometime after finishing the CCCL MR migration. Also this code will be completely deleted in a follow-up PR so it's fine to keep as-is for now.

Copy link
Member

@mhaseeb123 mhaseeb123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few non-blocking comments

}

using MRFactoryFunc = std::function<std::shared_ptr<rmm::mr::device_memory_resource>()>;
using any_device_resource = cuda::mr::any_resource<cuda::mr::device_accessible>;
Copy link

@vuule vuule Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this include resources that allocate memory that's also host accessible? i.e. does this template param mean "only device accessible"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, a cuda::mr::any_resource<cuda::mr::device_accessible> can do a narrowing conversion of a host-device accessible resource.

In other words, managed and pinned memory resources are safe to store in a cuda::mr::any_resource<cuda::mr::device_accessible>, but you lose the ability to statically know that they are host-accessible once the property is downcasted away.

Copy link

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, a very straight-forward PR to review (with context) :)

Comment on lines +219 to +230
template <
typename OtherResourceType,
std::enable_if_t<
not is_specialization_of_v<std::remove_cv_t<OtherResourceType>,
cuda::mr::synchronous_resource_ref> and
not is_specialization_of_v<std::remove_cv_t<OtherResourceType>, cuda::mr::resource_ref> and
not is_cccl_resource_ref_v<std::remove_cv_t<OtherResourceType>> and
not is_cccl_async_resource_ref_v<std::remove_cv_t<OtherResourceType>> and
not shared_resource_cast<OtherResourceType>::value and
not std::is_base_of_v<rmm::mr::device_memory_resource,
std::remove_cv_t<OtherResourceType>> and
cuda::mr::synchronous_resource<OtherResourceType>>* = nullptr>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too verbose and error prone. Can we simplify this a bit? Such as introducing is_plain_synchronous_resource_v<T> and is_plain_async_resource_v<T> helper traits, then migrate to requires clauses.

Copy link
Collaborator Author

@bdice bdice Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all deleted code, two PRs from now. It's really gross but we won't have it in the final state. See #2325.

@bdice bdice merged commit d0f847f into rapidsai:staging Mar 24, 2026
84 of 85 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in RMM Project Board Mar 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking Breaking change improvement Improvement / enhancement to an existing function

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants