Memory reserve or wait #688
Merged
Commits (25)
013cd2b _KiB (madsbk)
894ae33 SetUpWithThreads(): accept memory_available (madsbk)
9928c60 Rename Context::get_options() => Context::options() (madsbk)
8d75bbe MemoryReserveOrWait (madsbk)
bacfbcc cleanup includes (madsbk)
65124fe tests (madsbk)
395be4f reserve_or_wait_or_overbook (madsbk)
253f54a Apply suggestions from code review (madsbk)
5e12d14 ResReq const& request (madsbk)
2796000 Request (madsbk)
0fb32c9 fix cast (madsbk)
98d7ead Request: clean up ordering (madsbk)
873b7b8 Request: clean up ordering (madsbk)
c62894c Merge branch 'memory_reserve_or_wait' of github.com:madsbk/rapidsmpf … (madsbk)
f980092 doc (madsbk)
d74a7c1 reserve_or_wait_or_overbook: use mem_type_ (madsbk)
bc32057 parameterized on multiple threads (madsbk)
09be78d cleanup (madsbk)
7c457d2 Apply suggestion from @nirandaperera (madsbk)
3ea48ac Merge branch 'memory_reserve_or_wait' of github.com:madsbk/rapidsmpf … (madsbk)
0c5c6c6 doc (madsbk)
3326ced CheckPriority: handle multiple threads (madsbk)
ae99954 cleanup (madsbk)
cb27f76 Merge branch 'main' of github.com:rapidsai/rapidsmpf into memory_rese… (madsbk)
f1ad5db shutdow use shutdown (madsbk)
175 changes: 175 additions & 0 deletions
cpp/include/rapidsmpf/streaming/core/memory_reserve_or_wait.hpp
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,175 @@ | ||
| /** | ||
| * SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. | ||
| * SPDX-License-Identifier: Apache-2.0 | ||
| */ | ||
|
|
||
| #pragma once | ||
|
|
||
| #include <optional> | ||
| #include <set> | ||
|
|
||
| #include <rapidsmpf/config.hpp> | ||
| #include <rapidsmpf/memory/memory_reservation.hpp> | ||
| #include <rapidsmpf/streaming/core/context.hpp> | ||
| #include <rapidsmpf/utils.hpp> | ||
|
|
||
| #include <coro/task.hpp> | ||
|
|
||
| namespace rapidsmpf::streaming { | ||
|
|
||
|
|
||
| /** | ||
| * @brief Asynchronous coordinator for memory reservation requests. | ||
| * | ||
| * `MemoryReserveOrWait` provides a coroutine-based mechanism for reserving | ||
| * memory with backpressure. Callers submit reservation requests via | ||
| * `reserve_or_wait()`, which suspends until enough memory is available or the | ||
| * request times out. | ||
| */ | ||
| class MemoryReserveOrWait { | ||
| public: | ||
| /** | ||
| * @brief Constructs a `MemoryReserveOrWait` instance. | ||
| * | ||
| * @param mem_type The memory type for which reservations are requested. | ||
| * @param ctx Streaming context. | ||
| * @param timeout Optional timeout duration. This timeout applies to how long pending | ||
| * requests may wait without making progress. If the timeout expires, a | ||
| * `reserve_or_wait()` call returns even if no memory became available. If not explicitly | ||
| * provided, the timeout is read from the option key `"memory_reserve_timeout_ms"`, | ||
| * which defaults to 100 ms. | ||
| */ | ||
| MemoryReserveOrWait( | ||
| MemoryType mem_type, | ||
| std::shared_ptr<Context> ctx, | ||
| std::optional<Duration> timeout = std::nullopt | ||
| ); | ||
|
|
||
| ~MemoryReserveOrWait() noexcept; | ||
|
|
||
| /** | ||
| * @brief Shuts down all pending memory reservation requests. | ||
| * | ||
| * @return A coroutine that completes only after all pending requests have been | ||
| * cancelled and the periodic memory check task has exited. | ||
| */ | ||
| Node shutdown(); | ||
|
|
||
| /** | ||
| * @brief Attempts to reserve memory or waits until the reservation can be satisfied. | ||
| * | ||
| * This coroutine submits a memory reservation request and then suspends until | ||
| * either sufficient memory becomes available or no progress is made within the | ||
| * configured timeout. | ||
| * | ||
| * If the timeout expires before the request can be fulfilled, an empty | ||
| * `MemoryReservation` is returned. | ||
| * | ||
| * @param size Number of bytes to reserve. | ||
| * @param future_release_potential Estimated number of bytes the requester may release | ||
| * in the future, used as a heuristic when selecting which eligible request to satisfy | ||
| * first. | ||
| * @return A `MemoryReservation` representing the allocated memory, or an empty | ||
| * reservation if the timeout expires. | ||
| * | ||
| * @throws std::runtime_error If shutdown occurs before the request can be processed. | ||
| */ | ||
| coro::task<MemoryReservation> reserve_or_wait( | ||
| std::size_t size, std::size_t future_release_potential | ||
| ); | ||
|
|
||
| /** | ||
| * @brief Returns the number of pending memory reservation requests. | ||
| * | ||
| * It may change concurrently as requests are added or fulfilled. | ||
| * | ||
| * @return The number of outstanding reservation requests. | ||
| */ | ||
| [[nodiscard]] std::size_t size() const; | ||
|
|
||
| /** | ||
| * @brief Returns the number of iterations performed by `periodic_memory_check()`. | ||
| * | ||
| * This counter is incremented once per loop iteration inside | ||
| * `periodic_memory_check()`, and can be useful for diagnostics or testing. | ||
| * | ||
| * @return The total number of memory-check iterations executed so far. | ||
| */ | ||
| [[nodiscard]] std::size_t periodic_memory_check_counter() const; | ||
|
|
||
| private: | ||
| /** | ||
| * @brief Represents a single memory reservation request. | ||
| * | ||
| * A `ResReq` is inserted into a sorted container and processed by | ||
| * `periodic_memory_check()`. Each request describes the amount of memory | ||
| * needed, an estimate of how much memory may be released in the future, and | ||
| * its submission order. A reference to the requester's queue is used to | ||
| * deliver the resulting `MemoryReservation` once the request is fulfilled. | ||
| * | ||
| * The ordering of `ResReq` instances is defined by `operator<`, which sorts | ||
| * lexicographically by `(size, future_release_potential, sequence_number)`. | ||
| */ | ||
| struct ResReq { | ||
| /// @brief The number of bytes requested. | ||
| std::size_t size; | ||
|
|
||
| /// @brief Estimated number of bytes expected to be released in the future. | ||
| std::size_t future_release_potential; | ||
|
|
||
| /// @brief Monotonically increasing identifier used to preserve submission order. | ||
| std::uint64_t sequence_number; | ||
|
|
||
| /// @brief Queue into which a reservation is pushed once the request is satisfied. | ||
| coro::queue<MemoryReservation>& queue; | ||
|
|
||
| /// @brief Lexicographic ordering. | ||
| friend bool operator<(ResReq const& a, ResReq const& b) { | ||
| return std::tie(a.size, a.future_release_potential, a.sequence_number) | ||
| < std::tie(b.size, b.future_release_potential, b.sequence_number); | ||
| } | ||
| }; | ||
|
|
||
| /** | ||
| * @brief Periodically processes pending memory reservation requests. | ||
| * | ||
| * This coroutine drives the asynchronous mechanism of `MemoryReserveOrWait`. | ||
| * It repeatedly: | ||
| * - Queries the currently available memory for the configured memory type. | ||
| * - Identifies all pending reservation requests whose `size` fits within the | ||
| * available memory. | ||
| * - Among those, selects the request with the largest `future_release_potential`. | ||
| * - Fulfills the request by creating a `MemoryReservation` and pushing it into | ||
| * the requester's queue. | ||
| * | ||
| * If no reservation request can be satisfied for longer than `timeout_`, the | ||
| * coroutine forces progress by selecting the smallest pending request and | ||
| * attempting a reservation for it. This may produce an empty reservation if the | ||
| * request still cannot be satisfied. | ||
| * | ||
| * Shutdown and lifetime coordination | ||
| * ---------------------------------- | ||
| * A periodic memory check task is spawned on demand when the first pending | ||
| * request is enqueued, and it exits once all requests have been extracted. | ||
| * | ||
| * The task is spawned as a joinable coroutine. `shutdown()` and the destructor | ||
| * await the joinable task (if present) to ensure `periodic_memory_check()` has | ||
| * fully exited before object teardown. This avoids dangling references to | ||
| * members accessed by the coroutine. | ||
| * | ||
| * @return A coroutine that completes only once all pending requests have been | ||
| * extracted and all in-flight work has finished. | ||
| */ | ||
| coro::task<void> periodic_memory_check(); | ||
|
|
||
| mutable std::mutex mutex_; | ||
| std::uint64_t sequence_counter{0}; | ||
| MemoryType const mem_type_; | ||
| std::shared_ptr<Context> ctx_; | ||
| Duration const timeout_; | ||
| std::set<ResReq> reservation_requests_; | ||
| std::atomic<std::uint64_t> periodic_memory_check_counter_{0}; | ||
| std::optional<coro::task<void>> periodic_memory_check_task_; | ||
| }; | ||
|
|
||
| } // namespace rapidsmpf::streaming | ||
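
The selection policy described in the `periodic_memory_check()` doc comment can be sketched in isolation. This is a minimal illustration, not the PR's implementation: `Request`, `select_request()`, and `smallest_request()` are hypothetical names, the queue member of `ResReq` is left out, and the tie-breaking between requests with equal `future_release_potential` (here: the smaller request wins) is an assumption the header does not specify.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <set>
#include <tuple>

// Hypothetical stand-in for the private `ResReq` struct (queue member omitted).
struct Request {
    std::size_t size;
    std::size_t future_release_potential;
    std::uint64_t sequence_number;

    // Same lexicographic ordering as documented in the header.
    friend bool operator<(Request const& a, Request const& b) {
        return std::tie(a.size, a.future_release_potential, a.sequence_number)
               < std::tie(b.size, b.future_release_potential, b.sequence_number);
    }
};

// Among the requests whose size fits in `available`, pick the one with the
// largest future_release_potential. Returns std::nullopt if nothing fits.
std::optional<Request> select_request(
    std::set<Request> const& pending, std::size_t available
) {
    std::optional<Request> best;
    for (auto const& req : pending) {
        if (req.size > available) {
            break;  // The set is sorted by size first, so no later request fits.
        }
        // Assumption: on equal potential, keep the earlier (smaller) request.
        if (!best || req.future_release_potential > best->future_release_potential) {
            best = req;
        }
    }
    return best;
}

// Timeout fallback: force progress with the smallest pending request.
std::optional<Request> smallest_request(std::set<Request> const& pending) {
    if (pending.empty()) {
        return std::nullopt;
    }
    return *pending.begin();
}
```

Sorting by `size` first is what lets the scan stop at the first request that does not fit, and it also makes the timeout fallback a simple `begin()` lookup.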
Now `options()` always returns a copy, rather than a const reference. Is there a particular reason for this?
Simplification; `config::Options` is always backed by a shared pointer internally, so the overhead is minimal.
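
To illustrate why returning such a handle by value is cheap, here is a rough sketch. The `Options` and `Context` classes below are hypothetical stand-ins, not the real `rapidsmpf::config::Options` or `rapidsmpf::streaming::Context`; the only property assumed from the discussion is that the options object wraps a shared pointer, so a copy bumps a reference count instead of duplicating the stored values.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical handle type: all state lives behind a shared pointer.
class Options {
  public:
    using Map = std::unordered_map<std::string, std::string>;

    explicit Options(Map values)
        : shared_{std::make_shared<Map>(std::move(values))} {}

    // Look up a key, returning `fallback` if it is not set.
    std::string get(std::string const& key, std::string const& fallback) const {
        auto it = shared_->find(key);
        return it == shared_->end() ? fallback : it->second;
    }

    // Number of handles sharing the same underlying map (for illustration).
    long use_count() const {
        return shared_.use_count();
    }

  private:
    std::shared_ptr<Map const> shared_;
};

// Hypothetical context that hands out its options by value.
class Context {
  public:
    explicit Context(Options options) : options_{std::move(options)} {}

    // Copying the handle only increments a reference count.
    Options options() const {
        return options_;
    }

  private:
    Options options_;
};
```

Returning by value also means a caller's copy stays valid regardless of the context's lifetime, which is one plausible reading of "simplification" here.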