@@ -228,8 +228,7 @@ public:
 The cluster descriptor is built in two phases. In a first phase, the descriptor has only an ID.
 In a second phase, the event range, column group, page locations and column ranges are added.
 Both phases are populated by the RClusterDescriptorBuilder.
-Clusters usually span across all available columns but in some cases they can describe only a subset of the columns,
-for instance when describing friend ntuples.
+Clusters span across all available columns in the ntuple.
 */
 // clang-format on
 class RClusterDescriptor final {
@@ -388,9 +387,11 @@ public:
  friend class Internal::RClusterDescriptorBuilder;
 
  private:
- /// Extend this RPageRange to fit the given RColumnRange, i.e. prepend as many synthetic RPageInfos as needed to
- /// cover the range in `columnRange`. `RPageInfo`s are constructed to contain as many elements of type `element`
- /// given a page size limit of `pageSize` (in bytes); the locator for the referenced pages is `kTypePageZero`.
+ /// \brief Extend this RPageRange to fit the given RColumnRange.
+ ///
+ /// To do so, prepend as many synthetic RPageInfos as needed to cover the range in `columnRange`.
+ /// `RPageInfo`s are constructed to contain as many elements of type `element` given a page size
+ /// limit of `pageSize` (in bytes); the locator for the referenced pages is `kTypePageZero`.
  /// This function is used to make up `RPageRange`s for clusters that contain deferred columns.
  /// \return The number of column elements covered by the synthesized RPageInfos
  std::size_t ExtendToFitColumnRange(const RColumnRange &columnRange,
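The extension step documented for `ExtendToFitColumnRange` can be sketched with simplified stand-in types. `PageInfo`, `PageRange` and `ExtendToFit` below are hypothetical and are not ROOT's `RPageInfo`/`RPageRange` API; the sketch assumes that each synthetic page holds at most `pageSize / elementSize` elements and marks synthetic pages with a flag in place of a `kTypePageZero` locator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for one entry of a cluster's page list.
struct PageInfo {
   std::uint64_t nElements = 0;
   bool isZeroPage = false; // stands in for a locator of type kTypePageZero
};

// Hypothetical, simplified stand-in for the per-column page range of a cluster.
struct PageRange {
   std::vector<PageInfo> pageInfos;

   // Prepends synthetic zero pages until the range covers nElementsRequired elements.
   // Returns the number of elements covered by the synthesized pages.
   std::uint64_t ExtendToFit(std::uint64_t nElementsRequired, std::size_t elementSize, std::size_t pageSize)
   {
      const std::uint64_t nElements =
         std::accumulate(pageInfos.begin(), pageInfos.end(), std::uint64_t{0},
                         [](std::uint64_t n, const PageInfo &p) { return n + p.nElements; });
      if (nElementsRequired <= nElements)
         return 0; // nothing to synthesize; this sketch never shrinks a range

      const std::uint64_t nElementsPerPage = pageSize / elementSize;
      assert(nElementsPerPage > 0 && "page size must fit at least one element");

      std::vector<PageInfo> synthesized;
      for (std::uint64_t remaining = nElementsRequired - nElements; remaining > 0;) {
         const std::uint64_t n = std::min(nElementsPerPage, remaining);
         synthesized.push_back(PageInfo{n, true});
         remaining -= n;
      }
      // Synthetic pages go in front of the existing (real) pages.
      synthesized.insert(synthesized.end(), pageInfos.begin(), pageInfos.end());
      pageInfos = std::move(synthesized);
      return nElementsRequired - nElements;
   }
};
```

Prepending keeps the synthetic zero pages ahead of any real pages, which matches the comment's wording that synthetic `RPageInfo`s are prepended to cover the column range.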
@@ -436,9 +437,8 @@ public:
 
 private:
  ROOT::DescriptorId_t fClusterId = ROOT::kInvalidDescriptorId;
- /// Clusters can be swapped by adjusting the entry offsets
+ /// Clusters can be swapped by adjusting the entry offsets of the cluster and all ranges
  ROOT::NTupleSize_t fFirstEntryIndex = ROOT::kInvalidNTupleIndex;
- // TODO(jblomer): change to std::uint64_t
  ROOT::NTupleSize_t fNEntries = ROOT::kInvalidNTupleIndex;
 
  std::unordered_map<ROOT::DescriptorId_t, RColumnRange> fColumnRanges;
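The updated comment on `fFirstEntryIndex` states that clusters can be swapped by adjusting the entry offsets of the cluster and all ranges. A minimal sketch of that bookkeeping for two adjacent clusters follows; `ClusterInfo`, `ColumnRangeInfo` and `SwapAdjacent` are hypothetical, and the sketch assumes that each column range is described by a first element index and an element count and that both clusters span the same columns.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical per-column range: which global column elements a cluster holds.
struct ColumnRangeInfo {
   std::uint64_t firstElementIndex = 0;
   std::uint64_t nElements = 0;
};

// Hypothetical cluster summary: entry range plus column ranges keyed by column ID.
struct ClusterInfo {
   std::uint64_t firstEntryIndex = 0;
   std::uint64_t nEntries = 0;
   std::map<std::uint64_t, ColumnRangeInfo> columnRanges;
};

// Swaps the logical order of cluster `a` and the cluster `b` directly following it
// by rewriting only their offsets; the page data itself is not touched.
void SwapAdjacent(ClusterInfo &a, ClusterInfo &b)
{
   assert(a.firstEntryIndex + a.nEntries == b.firstEntryIndex);
   b.firstEntryIndex = a.firstEntryIndex;
   a.firstEntryIndex = b.firstEntryIndex + b.nEntries;
   for (auto &[columnId, rangeB] : b.columnRanges) {
      // Assumes both clusters span the same columns.
      auto &rangeA = a.columnRanges.at(columnId);
      assert(rangeA.firstElementIndex + rangeA.nElements == rangeB.firstElementIndex);
      rangeB.firstElementIndex = rangeA.firstElementIndex;
      rangeA.firstElementIndex = rangeB.firstElementIndex + rangeB.nElements;
   }
}
```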
@@ -515,10 +515,9 @@ public:
 \ingroup NTuple
 \brief Clusters are bundled in cluster groups.
 
-Very large ntuples or combined ntuples (chains, friends) contain multiple cluster groups. The cluster groups
-may contain sharded clusters.
-Every ntuple has at least one cluster group. The clusters in a cluster group are ordered corresponding to
-the order of page locations in the page list envelope that belongs to the cluster group (see format specification)
+Very large ntuples can contain multiple cluster groups to organize cluster metadata.
+Every ntuple has at least one cluster group. The clusters in a cluster group are ordered
+corresponding to their first entry number.
 */
 // clang-format on
 class RClusterGroupDescriptor final {
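Since the clusters in a group are ordered by their first entry number, and the descriptor keeps cluster groups sorted by entry range in `fSortedClusterGroupIds`, locating the cluster or cluster group that contains a given entry amounts to a binary search. The sketch below uses a hypothetical `EntryRange` record and assumes contiguous, non-overlapping ranges; it illustrates the idea and is not ROOT's lookup code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical summary of a cluster (or cluster group): an ID and a contiguous entry range.
struct EntryRange {
   std::uint64_t id;
   std::uint64_t firstEntry;
   std::uint64_t nEntries;
};

// Returns the ID of the range containing `entry`, assuming `sorted` is ordered by
// firstEntry and the ranges do not overlap.
std::optional<std::uint64_t> FindRangeId(const std::vector<EntryRange> &sorted, std::uint64_t entry)
{
   // First range that starts strictly after `entry`; the candidate is the one before it.
   auto it = std::upper_bound(sorted.begin(), sorted.end(), entry,
                              [](std::uint64_t e, const EntryRange &r) { return e < r.firstEntry; });
   if (it == sorted.begin())
      return std::nullopt;
   --it;
   if (entry < it->firstEntry + it->nEntries)
      return it->id;
   return std::nullopt; // entry lies in a gap or past the last range
}
```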
@@ -548,7 +547,7 @@ public:
  RClusterGroupDescriptor &operator=(RClusterGroupDescriptor &&other) = default;
 
  RClusterGroupDescriptor Clone() const;
- // Creates a clone without the cluster IDs
+ /// Creates a clone without the cluster IDs
  RClusterGroupDescriptor CloneSummary() const;
 
  bool operator==(const RClusterGroupDescriptor &other) const;
@@ -616,18 +615,21 @@ public:
 \ingroup NTuple
 \brief The on-storage metadata of an ntuple
 
-Represents the on-disk (on storage) information about an ntuple. The metadata consists of a header and one or
-several footers. The header carries the ntuple schema, i.e. the fields and the associated columns and their
-relationships. The footer(s) carry information about one or several clusters. For every cluster, a footer stores
-its location and size, and for every column the range of element indexes as well as a list of pages and page
+Represents the on-disk (on storage) information about an ntuple. The metadata consists of a header, a footer, and
+potentially multiple page lists.
+The header carries the ntuple schema, i.e. the fields and the associated columns and their relationships.
+The footer carries information about one or several cluster groups and links to their page lists.
+For every cluster group, a page list envelope stores cluster summaries and page locations.
+For every cluster, it stores for every column the range of element indexes as well as a list of pages and page
 locations.
 
-The descriptor provide machine-independent (de-)serialization of headers and footers, and it provides lookup routines
+The descriptor provides machine-independent (de-)serialization of headers and footers, and it provides lookup routines
 for ntuple objects (pages, clusters, ...). It is supposed to be usable by all RPageStorage implementations.
 
 The serialization does not use standard ROOT streamers in order to not let it depend on libCore. The serialization uses
-the concept of frames: header, footer, and substructures have a preamble with version numbers and the size of the
-written struct. This allows for forward and backward compatibility when the metadata evolves.
+the concept of envelopes and frames: header, footer, and page list envelopes have a preamble with a type ID and length.
+Substructures are serialized in frames and have a size and number of items (for list frames). This allows for forward
+and backward compatibility when the metadata evolves.
 */
 // clang-format on
 class RNTupleDescriptor final {
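The envelope and frame concepts described above can be illustrated with a small encoder. The layout in this sketch is an assumption loosely modeled on the RNTuple binary format specification: a 64-bit envelope preamble with the type ID in the low 16 bits and the length in the upper 48 bits, and a signed 64-bit frame size whose negative sign marks a list frame followed by a 32-bit item count. Check the specification before relying on these widths; all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Assumed layout: type ID in the low 16 bits, envelope length in the upper 48 bits.
std::uint64_t PackEnvelopePreamble(std::uint16_t typeId, std::uint64_t length)
{
   if (length >= (std::uint64_t{1} << 48))
      throw std::length_error("envelope length does not fit into 48 bits");
   return (length << 16) | typeId;
}

std::uint16_t EnvelopeTypeId(std::uint64_t preamble) { return static_cast<std::uint16_t>(preamble & 0xFFFF); }
std::uint64_t EnvelopeLength(std::uint64_t preamble) { return preamble >> 16; }

// Appends the lowest `nBytes` bytes of `value` to `buf` in little-endian order.
void AppendLE(std::vector<unsigned char> &buf, std::uint64_t value, int nBytes)
{
   for (int i = 0; i < nBytes; ++i)
      buf.push_back(static_cast<unsigned char>((value >> (8 * i)) & 0xFF));
}

// Writes a frame header. Assumed convention: a positive size marks a record frame,
// a negative size marks a list frame, which is followed by a 32-bit item count.
void AppendFrameHeader(std::vector<unsigned char> &buf, std::int64_t frameSize, bool isList,
                       std::uint32_t nItems = 0)
{
   const std::int64_t signedSize = isList ? -frameSize : frameSize;
   AppendLE(buf, static_cast<std::uint64_t>(signedSize), 8); // two's complement bit pattern
   if (isList)
      AppendLE(buf, nItems, 4);
}
```

Because every envelope and frame announces its own size, a reader that meets a newer, larger frame can skip the bytes it does not understand, which is one way such a scheme supports the forward and backward compatibility mentioned in the comment.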
@@ -664,22 +666,21 @@ private:
  std::uint64_t fNEntries = 0; ///< Updated by the descriptor builder when the cluster groups are added
  std::uint64_t fNClusters = 0; ///< Updated by the descriptor builder when the cluster groups are added
 
- /**
-  * Once constructed by an RNTupleDescriptorBuilder, the descriptor is mostly immutable except for set of
-  * active the page locations. During the lifetime of the descriptor, page location information for clusters
-  * can be added or removed. When this happens, the generation should be increased, so that users of the
-  * descriptor know that the information changed. The generation is increased, e.g., by the page source's
-  * exclusive lock guard around the descriptor. It is used, e.g., by the descriptor cache in RNTupleReader.
-  */
+ /// \brief The generation of the descriptor
+ ///
+ /// Once constructed by an RNTupleDescriptorBuilder, the descriptor is mostly immutable except for the set of
+ /// active page locations. During the lifetime of the descriptor, page location information for clusters
+ /// can be added or removed. When this happens, the generation should be increased, so that users of the
+ /// descriptor know that the information changed. The generation is increased, e.g., by the page source's
+ /// exclusive lock guard around the descriptor. It is used, e.g., by the descriptor cache in RNTupleReader.
  std::uint64_t fGeneration = 0;
 
  std::unordered_map<ROOT::DescriptorId_t, RClusterGroupDescriptor> fClusterGroupDescriptors;
  /// References cluster groups sorted by entry range and thus allows for binary search.
  /// Note that this list is empty during the descriptor building process and will only be
  /// created when the final descriptor is extracted from the builder.
  std::vector<ROOT::DescriptorId_t> fSortedClusterGroupIds;
- /// May contain only a subset of all the available clusters, e.g. the clusters of the current file
- /// from a chain of files
+ /// Potentially a subset of all the available clusters
  std::unordered_map<ROOT::DescriptorId_t, RClusterDescriptor> fClusterDescriptors;
 
  // We don't expose this publicly because when we add sharded clusters, this interface does not make sense anymore
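The generation counter lets users of the descriptor detect that the set of active page locations changed without comparing descriptor contents. A hypothetical cache keyed on the generation is sketched below; `Descriptor` and `CachedSummary` are illustrative names, and this is not the descriptor cache in RNTupleReader.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical descriptor exposing only what the cache needs.
struct Descriptor {
   std::uint64_t generation = 0;
   std::uint64_t nActiveClusters = 0;
};

// Caches a value derived from a descriptor and recomputes it only when the generation changes.
class CachedSummary {
   std::function<std::uint64_t(const Descriptor &)> fCompute;
   std::uint64_t fCachedGeneration = 0;
   bool fValid = false;
   std::uint64_t fValue = 0;
   std::uint64_t fNRecomputations = 0;

public:
   explicit CachedSummary(std::function<std::uint64_t(const Descriptor &)> compute) : fCompute(std::move(compute)) {}

   std::uint64_t Get(const Descriptor &desc)
   {
      if (!fValid || desc.generation != fCachedGeneration) {
         fValue = fCompute(desc);
         fCachedGeneration = desc.generation;
         fValid = true;
         ++fNRecomputations;
      }
      return fValue;
   }

   std::uint64_t GetNRecomputations() const { return fNRecomputations; }
};
```

If the underlying data changes without the generation being increased, `Get` keeps returning the old value, which is why the comment requires bumping the generation whenever page locations are added or removed.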
@@ -705,9 +706,9 @@ public:
  /// If set to true, projected fields will be reconstructed as such. This will prevent the model to be used
  /// with an RNTupleReader, but it is useful, e.g., to accurately merge data.
  bool fReconstructProjections = false;
- /// Normally creating a model will fail if any of the reconstructed fields contains an unknown column type.
  /// If this option is enabled, the model will be created and all fields containing unknown data (directly
  /// or indirectly) will be skipped instead.
+ /// Normally creating a model will fail if any of the reconstructed fields contains an unknown column type.
  bool fForwardCompatible = false;
  /// If true, the model will be created without a default entry (bare model).
  bool fCreateBare = false;
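The `fForwardCompatible` option chooses between failing and skipping when a reconstructed field contains an unknown column type, directly or through a subfield. The sketch below models that decision on a toy field tree; `FieldNode`, `ContainsUnknown` and `BuildModel` are hypothetical and stand in for ROOT's actual model creation.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical field node: a name, whether its own column type is known, and subfields.
struct FieldNode {
   std::string name;
   bool hasKnownColumnType = true;
   std::vector<FieldNode> subfields;
};

// True if the field or any of its subfields (recursively) carries an unknown column type.
bool ContainsUnknown(const FieldNode &field)
{
   if (!field.hasKnownColumnType)
      return true;
   for (const auto &sub : field.subfields)
      if (ContainsUnknown(sub))
         return true;
   return false;
}

// Returns the names of the top-level fields kept in the (hypothetical) model.
std::vector<std::string> BuildModel(const std::vector<FieldNode> &topLevelFields, bool forwardCompatible)
{
   std::vector<std::string> kept;
   for (const auto &field : topLevelFields) {
      if (ContainsUnknown(field)) {
         if (!forwardCompatible)
            throw std::runtime_error("field '" + field.name + "' contains an unknown column type");
         continue; // skip the whole field, including its known subfields
      }
      kept.push_back(field.name);
   }
   return kept;
}
```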
@@ -902,7 +903,7 @@ class RNTupleDescriptor::RFieldDescriptorIterable {
 private:
  /// The associated NTuple for this range.
  const RNTupleDescriptor &fNTuple;
- /// The descriptor ids of the child fields. These may be sorted using
+ /// The descriptor IDs of the child fields. These may be sorted using
  /// a comparison function.
  std::vector<ROOT::DescriptorId_t> fFieldChildren = {};
 
@@ -958,8 +959,7 @@ public:
 \ingroup NTuple
 \brief Used to loop over all the cluster groups of an ntuple (in unspecified order)
 
-Enumerate all cluster group IDs from the cluster group descriptor. No specific order can be assumed, use
-FindNextClusterGroupId and FindPrevClusterGroupId to traverse clusters groups by entry number.
+Enumerate all cluster group IDs from the descriptor. No specific order can be assumed.
 */
 // clang-format on
 class RNTupleDescriptor::RClusterGroupDescriptorIterable {
@@ -1009,8 +1009,9 @@ public:
 \ingroup NTuple
 \brief Used to loop over all the clusters of an ntuple (in unspecified order)
 
-Enumerate all cluster IDs from the cluster descriptor. No specific order can be assumed, use
-FindNextClusterId and FindPrevClusterId to travers clusters by entry number.
+Enumerate all cluster IDs from all cluster descriptors. No specific order can be assumed, use
+RNTupleDescriptor::FindNextClusterId() and RNTupleDescriptor::FindPrevClusterId() to traverse
+clusters by entry number.
 */
 // clang-format on
 class RNTupleDescriptor::RClusterDescriptorIterable {
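The iterables enumerate IDs in unspecified order (for example, hash-map order), while `RNTupleDescriptor::FindNextClusterId()` walks clusters by entry number. The difference can be sketched with clusters stored in a `std::unordered_map`; `Cluster`, `FindNextId` and `TraverseByEntry` are hypothetical helpers that only mirror the idea of finding the cluster that starts where the current one ends, and they are not ROOT's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical cluster summary, stored in an ID-keyed hash map (iteration order unspecified).
struct Cluster {
   std::uint64_t firstEntry;
   std::uint64_t nEntries;
};

using ClusterMap = std::unordered_map<std::uint64_t, Cluster>;

// Returns the ID of the cluster that starts right where cluster `id` ends, if any.
// Linear scan for clarity; a real lookup would rather use a sorted index.
std::optional<std::uint64_t> FindNextId(const ClusterMap &clusters, std::uint64_t id)
{
   const auto &current = clusters.at(id);
   const std::uint64_t nextFirst = current.firstEntry + current.nEntries;
   for (const auto &[otherId, other] : clusters)
      if (otherId != id && other.firstEntry == nextFirst)
         return otherId;
   return std::nullopt;
}

// Collects cluster IDs in entry order, starting from cluster `startId`.
// Assumes non-empty clusters, so the walk cannot revisit a cluster.
std::vector<std::uint64_t> TraverseByEntry(const ClusterMap &clusters, std::uint64_t startId)
{
   std::vector<std::uint64_t> order;
   for (std::optional<std::uint64_t> id = startId; id; id = FindNextId(clusters, *id))
      order.push_back(*id);
   return order;
}
```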
@@ -1568,6 +1569,8 @@ public:
  /// annotated as begin part of the header extension.
  void BeginHeaderExtension();
 
+ /// \brief Shift column IDs of alias columns by `offset`
+ ///
  /// If the descriptor is constructed in pieces consisting of physical and alias columns
  /// (regular and projected fields), the natural column order would be
  /// - Physical and alias columns of piece one
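The new `\brief` describes shifting alias column IDs by an offset. The sketch below shows how such a shift lets pieces of physical and alias columns be appended while keeping all physical column IDs ahead of all alias column IDs. That target order is an assumption, since the rest of the comment lies outside this hunk; `Column`, `ShiftAliasIds` and `AddPiece` are hypothetical stand-ins operating on a plain vector, not the descriptor builder.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical column record: its logical ID and whether it is an alias (projected) column.
struct Column {
   std::uint32_t logicalId;
   bool isAlias;
};

// Adds `offset` to the logical IDs of all alias columns; physical columns are untouched.
void ShiftAliasIds(std::vector<Column> &columns, std::uint32_t offset)
{
   for (auto &c : columns)
      if (c.isAlias)
         c.logicalId += offset;
}

// Appends a piece of nPhysical physical and nAlias alias columns.
// Assumed target order: all physical IDs first, then all alias IDs.
void AddPiece(std::vector<Column> &columns, std::uint32_t nPhysical, std::uint32_t nAlias)
{
   std::uint32_t nOldPhysical = 0;
   for (const auto &c : columns)
      if (!c.isAlias)
         ++nOldPhysical;
   const auto nOldTotal = static_cast<std::uint32_t>(columns.size());

   // Make room for the new physical columns by moving existing aliases up.
   ShiftAliasIds(columns, nPhysical);
   for (std::uint32_t i = 0; i < nPhysical; ++i)
      columns.push_back({nOldPhysical + i, false});
   // New aliases go after everything that exists now.
   for (std::uint32_t i = 0; i < nAlias; ++i)
      columns.push_back({nOldTotal + nPhysical + i, true});
}
```

For example, adding a piece with two physical and one alias column and then a piece with three physical and two alias columns yields physical IDs 0 to 4 and alias IDs 5 to 7, with the first alias moved from ID 2 to ID 5.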