Skip to content

Conversation

jblomer
Copy link
Contributor

@jblomer jblomer commented Aug 26, 2025

Replaces #17139 with comments in that PR incorporated.

@jblomer jblomer self-assigned this Aug 26, 2025
@jblomer jblomer requested a review from couet as a code owner August 26, 2025 08:51
@jblomer jblomer force-pushed the ntuple-tutorial-streaming-vector-v2 branch from acbebf5 to 311a905 Compare August 26, 2025 08:55
// so that never the entire vector needs to stay in memory.
// Note that we don't need to implement loading chunks of data explicitly. Simply by asking for a single vector element
// at every iteration step, the RNTuple views will take care of keeping only the currently required data pages
// in memory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful here to say how one can tune the maximum amount of memory used by the StreamingVector

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the important point is to turn off the cluster cache, otherwise we will load entire clusters anyway. Beyond that, the memory consumption should be that of a page, which is determined by the input file. The reading code doesn't have much control over it...

Copy link

Test Results

    21 files      21 suites   3d 17h 35m 53s ⏱️
 3 619 tests  3 477 ✅  0 💤 142 ❌
74 235 runs  74 074 ✅ 17 💤 144 ❌

For more details on these failures, see this check.

Results for commit 311a905.

Copy link
Member

@hahnjo hahnjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the other fixes to RNTupleLocalRange and RNTupleCollectionView - we may consider backporting these to the next release of v6.36. Some comments inline for the ntpl016_streaming_vector.C tutorial.

constexpr char const *kNTupleName = "ntpl";
constexpr char const *kFieldName = "LargeVector";
constexpr unsigned int kNEvents = 10;
constexpr unsigned int kVectorSize = 1000000;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alma9 modules_off runtime_cxxmodules=Off is slightly unhappy about this line because TGeometry.h defines const Int_t kVectorSize = 3; I guess we have to rename the constant here...

#include <vector>
#include <utility>

constexpr char const *kFileName = "ntpl015_streaming_vector.root";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
constexpr char const *kFileName = "ntpl015_streaming_vector.root";
constexpr char const *kFileName = "ntpl016_streaming_vector.root";

// A lightweight iterator used in StreamingVectorView::begin() and StreamingVectorView::end().
// Used to iterate over the elements of an RNTuple on-disk vector for a certain entry.
// Dereferencing the iterator returns the corresponding value of the item view.
class iterator {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably have the usual iterator using definitions, for std::iterator_traits

// so that never the entire vector needs to stay in memory.
// Note that we don't need to implement loading chunks of data explicitly. Simply by asking for a single vector element
// at every iteration step, the RNTuple views will take care of keeping only the currently required data pages
// in memory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the important point is to turn off the cluster cache, otherwise we will load entire clusters anyway. Beyond that, the memory consumption should be that of a page, which is determined by the input file. The reading code doesn't have much control over it...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants