[ntuple] Add support for efficient multi-stream reading #20257
Merged
jblomer merged 20 commits into root-project:master on Jan 26, 2026
Test Results: 22 files, 22 suites, 3d 11h 20m 26s ⏱️. Results for commit 29c23a4. ♻️ This comment has been updated with latest results.
ce3fb05 to ab99805
9e8f065 to d803490
hahnjo (Member) requested changes on Jan 22, 2026
This looks sound in terms of design and code; some comments inline. I think the only major point is the possible iterator invalidation of the keep container in RClusterPool::GetCluster (but maybe I'm missing something there).
d803490 to 7d3f4a5
hahnjo approved these changes on Jan 23, 2026
This is only used in unit tests. It should wait for all clusters that are scheduled for background loading. However, it should _not_ remove those clusters from the in-flight queue but just let the queue with the ready clusters sit there for pickup by GetCluster().
Pinned clusters and their successors won't be evicted from the cluster pool. This also means that the cluster pool cannot have a fixed size anymore.
Now that the pool is not fixed-size anymore, use a hash map instead of a vector.
When cleaning up entire preloaded clusters from the page pool, skip pinned clusters.
API extension to tell RNTuple about the lifetime of entries. Useful when multiple streams (threads) share a single reader. The active entry tokens are linked to the reader by a shared control block. Active entry tokens can be copied and moved; they maintain a per-cluster reference count of the active entry numbers, so that the corresponding clusters are pinned and unpinned as needed.
7d3f4a5 to 29c23a4
Enables a shared RNTupleReader to read multiple streams efficiently. On the page source layer, add the possibility to pin and unpin clusters. Data from pinned clusters will not be evicted from the cluster pool or the page pool.
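To illustrate the pinning idea, here is a minimal sketch of a pool that is keyed by cluster ID and skips pinned clusters on eviction. It is a toy model, not the RClusterPool code from this PR: `ToyClusterPool`, `Load`, `Pin`, `Unpin`, and `EvictUnpinned` are hypothetical names, and the sketch ignores cluster payloads, background loading, and the rule that successors of pinned clusters are also kept.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical toy model of a pool with pinning; not the ROOT implementation.
using ClusterId = std::uint64_t;

class ToyClusterPool {
   struct Slot {
      std::size_t fPinCount = 0; // number of outstanding pins on this cluster
   };
   // A hash map rather than a fixed-size vector: the number of resident
   // clusters can grow for as long as pins are held.
   std::unordered_map<ClusterId, Slot> fSlots;

public:
   // Make a cluster resident (no-op if it already is).
   void Load(ClusterId id) { fSlots.try_emplace(id); }

   // Pin a cluster, loading it first if necessary.
   void Pin(ClusterId id) { ++fSlots[id].fPinCount; }

   // Release one pin; the cluster must currently be pinned.
   void Unpin(ClusterId id)
   {
      auto it = fSlots.find(id);
      assert(it != fSlots.end() && it->second.fPinCount > 0);
      --it->second.fPinCount;
   }

   bool Contains(ClusterId id) const { return fSlots.count(id) > 0; }
   std::size_t Size() const { return fSlots.size(); }

   // Evict every unpinned cluster and return how many were evicted.
   // erase() returns the next valid iterator, so the loop never touches
   // an invalidated iterator.
   std::size_t EvictUnpinned()
   {
      std::size_t nEvicted = 0;
      for (auto it = fSlots.begin(); it != fSlots.end();) {
         if (it->second.fPinCount == 0) {
            it = fSlots.erase(it);
            ++nEvicted;
         } else {
            ++it;
         }
      }
      return nEvicted;
   }
};
```

With clusters 1, 2, and 3 loaded and cluster 2 pinned, `EvictUnpinned()` removes 1 and 3 and leaves 2 in place until it is unpinned.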
Extend the RNTupleReader API by "active entry tokens". Active entry tokens keep an entry alive in the cache. Internally, the active entries turn into a reference counter for the corresponding cluster, so that the clusters are pinned and unpinned correctly.
Active entry tokens should provide a flexible API not only to support multiple streams but also to keep, e.g., certain (past) reference events alive.
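The reference counting behind such tokens can be sketched as follows. This is an illustration under stated assumptions, not the API added by this PR: `ToyControlBlock`, `ToyActiveEntryToken`, `SetEntry`, and `Reset` are hypothetical names, and the mapping from entry number to cluster assumes fixed-size clusters, which real RNTuple files do not guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <optional>
#include <unordered_map>
#include <utility>

using EntryIndex = std::uint64_t;
using ClusterId = std::uint64_t;

// Hypothetical control block shared by a reader and all of its tokens.
struct ToyControlBlock {
   std::mutex fLock;
   std::uint64_t fEntriesPerCluster; // assumption: every cluster has the same size
   std::unordered_map<ClusterId, std::size_t> fPinCounts;

   explicit ToyControlBlock(std::uint64_t n) : fEntriesPerCluster(n) { assert(n > 0); }

   void Pin(EntryIndex e)
   {
      std::lock_guard<std::mutex> guard(fLock);
      ++fPinCounts[e / fEntriesPerCluster];
   }
   void Unpin(EntryIndex e)
   {
      std::lock_guard<std::mutex> guard(fLock);
      auto it = fPinCounts.find(e / fEntriesPerCluster);
      assert(it != fPinCounts.end());
      if (--it->second == 0)
         fPinCounts.erase(it); // last reference gone: cluster becomes evictable
   }
   bool IsPinned(ClusterId c)
   {
      std::lock_guard<std::mutex> guard(fLock);
      return fPinCounts.count(c) > 0;
   }
};

// Hypothetical token: holds at most one active entry and pins its cluster.
class ToyActiveEntryToken {
   std::shared_ptr<ToyControlBlock> fBlock;
   std::optional<EntryIndex> fEntry;

public:
   explicit ToyActiveEntryToken(std::shared_ptr<ToyControlBlock> b) : fBlock(std::move(b)) {}
   // Copying adds a reference for the copied entry.
   ToyActiveEntryToken(const ToyActiveEntryToken &o) : fBlock(o.fBlock), fEntry(o.fEntry)
   {
      if (fEntry)
         fBlock->Pin(*fEntry);
   }
   // Moving transfers the reference; the moved-from token holds nothing.
   ToyActiveEntryToken(ToyActiveEntryToken &&o) noexcept : fBlock(std::move(o.fBlock)), fEntry(o.fEntry)
   {
      o.fEntry.reset();
   }
   // Copy-and-swap covers both copy and move assignment.
   ToyActiveEntryToken &operator=(ToyActiveEntryToken o) noexcept
   {
      std::swap(fBlock, o.fBlock);
      std::swap(fEntry, o.fEntry);
      return *this;
   }
   ~ToyActiveEntryToken() { Reset(); }

   void SetEntry(EntryIndex e)
   {
      fBlock->Pin(e); // pin the new entry first so a shared cluster never drops to zero
      Reset();
      fEntry = e;
   }
   void Reset()
   {
      if (fEntry) {
         fBlock->Unpin(*fEntry);
         fEntry.reset();
      }
   }
};
```

In a multi-stream setup, each stream (thread) would hold its own token and move it forward entry by entry, while an extra copy of a token can keep a past reference event's cluster alive for as long as the copy exists.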
While this functionality is different from the description in #16325, it may be the flexibility that is actually needed.
@Dr15Jones FYI
Graphical output of the tutorial: