Remove device_memory_resource inheritance from all resources and adaptors#2301
Remove device_memory_resource inheritance from all resources and adaptors#2301bdice merged 14 commits intorapidsai:stagingfrom
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
032b3a6 to
856395e
Compare
…async_resource_ref (#2300) ## Summary Replaces `shared_ptr[device_memory_resource]` with per-subclass `unique_ptr[ConcreteType]` (owning) and `optional[device_async_resource_ref]` (non-owning reference) across all Python/Cython bindings. This is a part of #2011. There are **significant** opportunities to make this Cython code better over time but I have to get something that removes `device_memory_resource` from the Python/Cython side before I can finish migration on the C++ side (#2296). I welcome critique of this design, and ideas for how it can be improved, particularly from @vyasr @wence-. I would like to address any suggested improvements in follow-up PRs, because this changeset is necessary to unblock #2301. The changes in `cdef class DeviceMemoryResource` are perhaps the most significant changes here from a design perspective. The solution I'm going with for now is to keep the `DeviceMemoryResource` class around, as a base class for the Cython MRs, and let it handle allocate/deallocate. It owns a `optional[device_async_resource_ref]` which is used for allocation/deallocation. It's `optional` so that the class can be default-constructed (Cython requires nullary constructors), but it should never be `nullopt` except during initialization. Then, each MR class owns a `c_obj` like `unique_ptr[cuda_memory_resource]`. This is `unique_ptr` so it can be default-constructed for Cython's requirements. I chose `unique_ptr` over `optional` here to emphasize that this member is the thing that actually owns the resource. As with the `c_ref`, this should never be `nullptr` except during initialization. When an MR class is created, it initializes its `c_obj` and then constructs a `c_ref` (a member inherited from the `DeviceMemoryResource` base class). "Special" methods for an MR like getting the statistics counts go through `deref(self.c_obj)`, and "common" methods like allocate/deallocate go through `self.c_ref.value()`. ### Changes - **`.pxd` declarations**: Remove `device_memory_resource` class. Declare `device_async_resource_ref` and a `make_device_async_resource_ref()` inline C++ template that returns `optional` to work around Cython generating default-constructed temporaries for non-default-constructible types. All adaptor constructors take `device_async_resource_ref` instead of `device_memory_resource*`. - **`.pxd` class definitions**: `DeviceMemoryResource` base holds `optional[device_async_resource_ref] c_ref`; each concrete subclass holds `unique_ptr[ConcreteType] c_obj`. - **`.pyx` implementations**: All `__cinit__` methods construct via `unique_ptr` then set `c_ref` via `make_device_async_resource_ref`. Typed accessors (`pool_size`, `flush`, etc.) use `deref(self.c_obj)`. Per-device functions use `set_per_device_resource_ref`. - **`device_buffer.pyx`**: Passes `self.mr.c_ref.value()` instead of `self.mr.get_mr()`. Closes #2294
…tors Remove the device_memory_resource virtual base class inheritance from all production memory resources, adaptors, and stream_ordered_memory_resource. Resources now derive publicly from cuda::mr::shared_resource<Impl> (for stateful/adaptor types) or stand alone with direct CCCL concept methods (for stateless types). The legacy do_allocate/do_deallocate/do_is_equal virtual overrides and pointer-based per-device-resource APIs are removed. stream_ordered_memory_resource provides allocate/deallocate/allocate_sync/ deallocate_sync directly instead of through the DMR virtual dispatch. All 103 C++ tests and 1165 Python tests pass.
856395e to
a06396a
Compare
| ::rmm::detail::cccl_resource_ref> and | ||
| not is_specialization_of_v<std::remove_cv_t<OtherResourceType>, | ||
| ::rmm::detail::cccl_async_resource_ref> and | ||
| not shared_resource_cast<OtherResourceType>::value and |
There was a problem hiding this comment.
This isn't something we want to keep. I don't know how to work around the constraint satisfaction issues in NVIDIA/cccl#8037.
Minimal repro: https://godbolt.org/z/ddfn1sE8Y
There was a problem hiding this comment.
Reviewers: I discussed this issue with @ericniebler. We are going to try NVIDIA/cccl#8121 as a workaround. For now, I want to get the current series of open PRs merged (#2301, #2324, #2325), and then try to reduce the workarounds that have been introduced.
| void* ptr{nullptr}; | ||
| if (i != 0) { std::this_thread::sleep_for(std::chrono::milliseconds{100}); } | ||
| EXPECT_NO_THROW(ptr = mr.allocate(stream, allocation_size)); | ||
| EXPECT_NO_THROW(ptr = mr.allocate(stream, allocation_size, rmm::CUDA_ALLOCATION_ALIGNMENT)); |
There was a problem hiding this comment.
An explicit alignment argument is required because of the issues described in NVIDIA/cccl#8063. There is no 2-argument allocate(stream, size) overload for shared_resource<T> like there is for other CCCL MR classes. There is only allocation(stream, size, alignment) for shared_resource<T>, and it has no default alignment value.
Rewrite benchmark factory functions from shared_ptr<device_memory_resource> to any_device_resource, convert simulated_memory_resource from DMR inheritance to CCCL concepts, and change copy/move from = delete to = default on cuda_async_memory_resource, cuda_async_managed_memory_resource, sam_headroom_memory_resource, and simulated_memory_resource to satisfy CCCL resource_ref copyability requirements.
563d7d3 to
8385650
Compare
GCC 13 incorrectly resolves the injected class name to the fully-specialized type inside a class template body, even in template-template parameter position. This causes a hard error (not SFINAE) in is_specialization_of_v when checking against cccl_resource_ref and cccl_async_resource_ref in their own generic-resource constructors. Replace is_specialization_of_v<T, cccl_*_resource_ref> with dedicated is_cccl_*_resource_ref_v<T> traits defined outside the class bodies, where the forward declarations are unambiguously templates. This is correct on both GCC 13 and 14.
Inline allocation logic directly into the CCCL-compatible allocate/deallocate methods and delete the legacy do_* methods, per task 4 of the DMR inheritance removal plan.
|
|
||
| /// MR factory functions | ||
| inline auto make_cuda() { return std::make_shared<rmm::mr::cuda_memory_resource>(); } | ||
| using any_device_resource = cuda::mr::any_resource<cuda::mr::device_accessible>; |
There was a problem hiding this comment.
Is there a way to define this in a common header. We seem to have it duplicated in a few places
There was a problem hiding this comment.
I'm trying to keep usage of the any_device_resource alias to a minimum, only for internal usage. I debated removing it entirely and requiring the explicit templated type everywhere. My reasoning for this is that we may want to allow other resource accessibility properties in the future (templating classes on Properties...) and I don't want to force the design into "device only" in too many places.
| std::void_t< | ||
| decltype(std::declval<std::remove_cv_t<T>&>().get()), | ||
| std::enable_if_t<std::is_base_of_v<cuda::mr::shared_resource<std::remove_reference_t< | ||
| decltype(std::declval<T&>().get())>>, | ||
| std::remove_cv_t<T>> and | ||
| not is_specialization_of_v<std::remove_cv_t<T>, cuda::mr::shared_resource>>>> { |
There was a problem hiding this comment.
Is it possible to use requires here instead?
There was a problem hiding this comment.
No, we have to support C++17 in RMM for now. I plan to migrate to C++20 sometime after finishing the CCCL MR migration. Also this code will be completely deleted in a follow-up PR so it's fine to keep as-is for now.
| } | ||
|
|
||
| using MRFactoryFunc = std::function<std::shared_ptr<rmm::mr::device_memory_resource>()>; | ||
| using any_device_resource = cuda::mr::any_resource<cuda::mr::device_accessible>; |
There was a problem hiding this comment.
does this include resources that allocate memory that's also host accessible? i.e. does this template param mean "only device accessible"?
There was a problem hiding this comment.
No, a cuda::mr::any_resource<cuda::mr::device_accessible> can do a narrowing conversion of a host-device accessible resource.
In other words, managed and pinned memory resources are safe to store in a cuda::mr::any_resource<cuda::mr::device_accessible>, but you lose the ability to statically know that they are host-accessible once the property is downcasted away.
vuule
left a comment
There was a problem hiding this comment.
Looks good, a very straight-forward PR to review (with context) :)
| template < | ||
| typename OtherResourceType, | ||
| std::enable_if_t< | ||
| not is_specialization_of_v<std::remove_cv_t<OtherResourceType>, | ||
| cuda::mr::synchronous_resource_ref> and | ||
| not is_specialization_of_v<std::remove_cv_t<OtherResourceType>, cuda::mr::resource_ref> and | ||
| not is_cccl_resource_ref_v<std::remove_cv_t<OtherResourceType>> and | ||
| not is_cccl_async_resource_ref_v<std::remove_cv_t<OtherResourceType>> and | ||
| not shared_resource_cast<OtherResourceType>::value and | ||
| not std::is_base_of_v<rmm::mr::device_memory_resource, | ||
| std::remove_cv_t<OtherResourceType>> and | ||
| cuda::mr::synchronous_resource<OtherResourceType>>* = nullptr> |
There was a problem hiding this comment.
This is too verbose and error prone. Can we simplify this a bit? Such as introducing is_plain_synchronous_resource_v<T> and is_plain_async_resource_v<T> helper traits, then migrate to requires clauses.
There was a problem hiding this comment.
This is all deleted code, two PRs from now. It's really gross but we won't have it in the final state. See #2325.
Summary
device_memory_resourceinheritance from all memory resources (stateless, stateful, and adaptors)do_allocate/do_deallocate/do_is_equalvirtual overrides from all resourcesshared_ptr<device_memory_resource>toany_device_resourcesimulated_memory_resourcefrom DMR inheritance to CCCL concepts= deleteto= defaultoncuda_async_memory_resource,cuda_async_managed_memory_resource,sam_headroom_memory_resource, andsimulated_memory_resource(required for CCCLresource_refcopyability viashared_resourcebase)Closes #2295
Part of #2011