[ntuple] Add support for efficient multi-stream reading#20257

Merged
jblomer merged 20 commits into root-project:master from jblomer:ntuple-informed-cache
Jan 26, 2026

@jblomer jblomer commented Oct 31, 2025

Enables a shared RNTupleReader to read multiple streams efficiently. On the page source layer, add the possibility to pin and unpin clusters. Data from pinned clusters will not be evicted from the cluster pool or the page pool.
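The pin/unpin idea can be illustrated with a minimal, standalone sketch. This is not the code from this PR: `PinnedPoolSketch` and all of its members are hypothetical names, and the sketch does not use any ROOT classes. It also ignores the successors of pinned clusters and models only the core rule that a cluster with a non-zero pin count survives eviction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for a cluster cache; NOT ROOT's RClusterPool.
class PinnedPoolSketch {
   struct Slot {
      std::size_t fPinCount = 0; // number of outstanding pins on this cluster
   };
   // Keyed by cluster ID; a hash map because the pool has no fixed size.
   std::unordered_map<std::uint64_t, Slot> fSlots;

public:
   // Registers a cluster as cached (no-op if it is already present).
   void Load(std::uint64_t clusterId) { fSlots.try_emplace(clusterId); }

   // Adds one pin; also registers the cluster if it was not cached yet.
   void Pin(std::uint64_t clusterId) { ++fSlots[clusterId].fPinCount; }

   // Removes one pin; unknown or unpinned clusters are ignored.
   void Unpin(std::uint64_t clusterId)
   {
      auto it = fSlots.find(clusterId);
      if (it != fSlots.end() && it->second.fPinCount > 0)
         --it->second.fPinCount;
   }

   // Drops every cluster without pins and returns how many were dropped.
   std::size_t EvictUnpinned()
   {
      std::size_t nEvicted = 0;
      for (auto it = fSlots.begin(); it != fSlots.end();) {
         if (it->second.fPinCount == 0) {
            it = fSlots.erase(it); // erase() returns the next valid iterator
            ++nEvicted;
         } else {
            ++it;
         }
      }
      return nEvicted;
   }

   bool Contains(std::uint64_t clusterId) const { return fSlots.count(clusterId) > 0; }
};
```

In this sketch, a cluster that was pinned twice must be unpinned twice before `EvictUnpinned()` removes it.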

Extend the RNTupleReader API with "active entry tokens". An active entry token keeps an entry alive in the cache. Internally, the active entries act as a reference counter for the corresponding cluster, so that clusters are pinned and unpinned correctly.

Active entry tokens should provide a flexible API not only to support multiple streams but also to keep, e.g., certain (past) reference events alive.

While this functionality is different from the description in #16325, it may be the flexibility that is actually needed.

@Dr15Jones FYI

Graphical output of the tutorial:

image

@jblomer jblomer self-assigned this Oct 31, 2025
@jblomer jblomer requested a review from couet as a code owner October 31, 2025 10:54
@jblomer jblomer marked this pull request as draft October 31, 2025 10:55
github-actions bot commented Oct 31, 2025

Test Results

    22 files      22 suites   3d 11h 20m 26s ⏱️
 3 772 tests  3 772 ✅ 0 💤 0 ❌
75 027 runs  75 027 ✅ 0 💤 0 ❌

Results for commit 29c23a4.

♻️ This comment has been updated with latest results.

@jblomer jblomer force-pushed the ntuple-informed-cache branch 3 times, most recently from ce3fb05 to ab99805 Compare December 9, 2025 13:50
@jblomer jblomer marked this pull request as ready for review December 9, 2025 13:51
@jblomer jblomer requested a review from bellenot as a code owner December 9, 2025 13:51
@jblomer jblomer force-pushed the ntuple-informed-cache branch 4 times, most recently from 9e8f065 to d803490 Compare December 16, 2025 14:47
@bellenot bellenot removed their request for review December 16, 2025 16:01

@hahnjo hahnjo left a comment

This looks sound in terms of design and code; some comments inline. I think the only major point is the possible iterator invalidation of the keep container in RClusterPool::GetCluster (but maybe I'm missing something there).

@jblomer jblomer force-pushed the ntuple-informed-cache branch from d803490 to 7d3f4a5 Compare January 23, 2026 10:49
@jblomer jblomer requested a review from hahnjo January 23, 2026 10:52

This is only used in unit tests. It should wait for all clusters that
are scheduled for background loading. However, it should _not_ remove
those clusters from the in-flight queue but just let the queue with the
ready clusters sit there for pickup by GetCluster().

Pinned clusters and their successors won't be evicted from the cluster
pool. This also means that the cluster pool cannot have a fixed size
anymore.

Now that the pool is not fixed-size anymore, use a hash map instead of a
vector.

When cleaning up entire preloaded clusters from the page pool, skip
pinned clusters.

API extension to tell RNTuple about the lifetime of entries. Useful when
multiple streams (threads) share a single reader.

The active entry tokens are linked to the reader by a shared control
block. Active entry tokens can be copied and moved and take care of the
reference counting of active entry numbers to clusters, such that the
corresponding clusters are pinned and unpinned as needed.
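
The token mechanics described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the API added by this PR: `ControlBlockSketch`, `ActiveEntryTokenSketch`, `SetEntry`, `Reset`, and the fixed cluster size of 100 entries are all hypothetical. The sketch shows only how copyable and movable tokens can translate active entry numbers into per-cluster pin counts through a shared control block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <utility>

constexpr std::uint64_t kEntriesPerCluster = 100;          // assumption: fixed cluster size
constexpr std::uint64_t kInvalidEntry = std::uint64_t(-1); // marks "no active entry"

// Hypothetical control block shared by the reader and all of its tokens.
struct ControlBlockSketch {
   std::mutex fLock;                                // tokens may live on several threads
   std::map<std::uint64_t, std::size_t> fPinCounts; // cluster ID -> number of active entries

   void Pin(std::uint64_t entry)
   {
      std::lock_guard<std::mutex> guard(fLock);
      ++fPinCounts[entry / kEntriesPerCluster];
   }
   void Unpin(std::uint64_t entry)
   {
      std::lock_guard<std::mutex> guard(fLock);
      auto it = fPinCounts.find(entry / kEntriesPerCluster);
      if (it != fPinCounts.end() && --it->second == 0)
         fPinCounts.erase(it); // last active entry gone: the cluster is unpinned
   }
   bool IsPinned(std::uint64_t clusterId)
   {
      std::lock_guard<std::mutex> guard(fLock);
      return fPinCounts.count(clusterId) > 0;
   }
};

// Hypothetical token; a moved-from token may only be destroyed or assigned to.
class ActiveEntryTokenSketch {
   std::shared_ptr<ControlBlockSketch> fBlock;
   std::uint64_t fEntry = kInvalidEntry;

public:
   explicit ActiveEntryTokenSketch(std::shared_ptr<ControlBlockSketch> block) : fBlock(std::move(block)) {}
   // A copy keeps the same entry alive, so it adds its own pin.
   ActiveEntryTokenSketch(const ActiveEntryTokenSketch &other) : fBlock(other.fBlock), fEntry(other.fEntry)
   {
      if (fEntry != kInvalidEntry)
         fBlock->Pin(fEntry);
   }
   // A move transfers the existing pin; the source no longer holds an entry.
   ActiveEntryTokenSketch(ActiveEntryTokenSketch &&other) noexcept
      : fBlock(std::move(other.fBlock)), fEntry(other.fEntry)
   {
      other.fEntry = kInvalidEntry;
   }
   // Copy-and-swap covers both copy and move assignment.
   ActiveEntryTokenSketch &operator=(ActiveEntryTokenSketch other) noexcept
   {
      std::swap(fBlock, other.fBlock);
      std::swap(fEntry, other.fEntry);
      return *this;
   }
   ~ActiveEntryTokenSketch() { Reset(); }

   void SetEntry(std::uint64_t entry)
   {
      fBlock->Pin(entry); // pin first so a shared cluster never drops to zero in between
      Reset();
      fEntry = entry;
   }
   void Reset()
   {
      if (fEntry != kInvalidEntry) {
         fBlock->Unpin(fEntry);
         fEntry = kInvalidEntry;
      }
   }
};
```

With one token per stream, each thread would advance its own token as it moves through the entries, and a cluster stays pinned for as long as any token points into it.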
@jblomer jblomer force-pushed the ntuple-informed-cache branch from 7d3f4a5 to 29c23a4 Compare January 23, 2026 15:16
@jblomer jblomer merged commit a16cd17 into root-project:master Jan 26, 2026
30 checks passed
@jblomer jblomer deleted the ntuple-informed-cache branch January 26, 2026 16:30