
Commit 1e0b7f9

[ntuple][NFC] Add late model extension documentation
1 parent 0cc7d61 commit 1e0b7f9

1 file changed

+17
-0
lines changed

tree/ntuple/v7/doc/architecture.md

Lines changed: 17 additions & 0 deletions
@@ -374,6 +374,18 @@ If the buffered sink is used (default), the pages of a cluster are buffered unti
374374
On committing the cluster, all pages are sealed and sent to a _persistent sink_ in one go (vector write).
375375
Pages are also reordered to ensure locality of pages of the same column.
376376
377+
#### Late model extension
378+
For fields added to the RNTupleModel after the RNTuple schema has been created (i.e., through `RNTupleWriter::CreateModelUpdater()`), the following steps are taken:
379+
380+
1. On calling `RUpdater::BeginUpdate()`, all `REntry` instances belonging to the underlying RNTupleModel are invalidated.
381+
2. After adding the desired additional fields, calling `RUpdater::CommitUpdate()` will add the relevant fields to the footer's [schema extension record frame](./specifications.md#schema-extensions-record-frame).
382+
   1. The principal columns of top-level fields and record subfields will have a non-zero first element index.
383+
      These columns are referred to as "deferred columns".
384+
      In particular, columns in a subfield tree of collections or variants are _not_ stored as deferred columns (see next point).
385+
   2. All other columns belonging to the added (sub)fields will be written as usual.
386+
3. `RNTuple(Writer|Model)::CreateEntry()` or `RNTupleModel::CreateBareEntry()` must be used to create an `REntry` matching the new model.
387+
4. Writing continues as described in steps 2-5 above.
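
The bookkeeping behind deferred columns can be illustrated with a small standalone model. This is only a sketch, not the RNTuple implementation: `ToyColumn` and `ToyWriter` are hypothetical names, and the model assumes that every column stores exactly one element per entry (as a principal column of a top-level field of a simple type would). Under that assumption, the first element index of a column added late equals the number of entries written before the extension.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy types for illustration; they are not part of the RNTuple API.
struct ToyColumn {
   std::string fName;
   std::uint64_t fFirstElementIndex = 0; // non-zero marks a deferred column
   std::vector<std::int32_t> fElements;  // only the elements that were actually written

   bool IsDeferred() const { return fFirstElementIndex > 0; }
};

class ToyWriter {
   std::vector<ToyColumn> fColumns;
   std::uint64_t fNEntries = 0;

public:
   // Adds a column. If entries were already written (late model extension),
   // the column starts at the current entry count instead of at zero.
   std::size_t AddColumn(const std::string &name)
   {
      ToyColumn column;
      column.fName = name;
      column.fFirstElementIndex = fNEntries;
      fColumns.push_back(column);
      return fColumns.size() - 1;
   }

   // Writes one entry; expects exactly one value per existing column.
   void Fill(const std::vector<std::int32_t> &values)
   {
      assert(values.size() == fColumns.size());
      for (std::size_t i = 0; i < values.size(); ++i)
         fColumns[i].fElements.push_back(values[i]);
      ++fNEntries;
   }

   const ToyColumn &GetColumn(std::size_t i) const { return fColumns.at(i); }
   std::uint64_t GetNEntries() const { return fNEntries; }
};
```

In this model, a column added after two calls to `Fill()` gets a first element index of 2 and stores only the elements written from then on; nothing is backfilled on the writing side.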
388+
377389
### Reading Case
378390
The reverse process is performed on reading (e.g. `RNTupleReader::LoadEntry()`, `RNTupleView` call operator).
379391
@@ -389,6 +401,11 @@ The page source can be restricted to a certain entry range.
389401
This allows for optimizing the page lists that are being read.
390402
Additionally, it allows for optimizing the cluster pool to not read-ahead beyond the limits.
391403
404+
#### Late model extension
405+
Reading an RNTuple with an extended model is transparent, i.e., no additional interface calls are required.
406+
Internally, columns that were created as part of late model extension will have synthesized zero-initialized column ranges for the clusters that were already written before the model was extended.
407+
In addition, pages made up of 0x00 bytes are synthesized for deferred columns in the clusters that were already (partially) filled before the model was extended.
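
The effect of the synthesized zero pages can be sketched with a standalone lookup function. The names `ToyStoredColumn` and `ReadElement` are hypothetical and the element type is fixed to a 32-bit integer for brevity; the actual page source works on pages and column ranges rather than on single elements.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy type for illustration; it is not part of the RNTuple API.
struct ToyStoredColumn {
   std::uint64_t fFirstElementIndex = 0; // non-zero for a deferred column
   std::vector<std::int32_t> fStored;    // elements written after the model extension
};

// Returns the element at a global element index. Indices before the first
// element index were never written, so a zero value is synthesized for them.
inline std::int32_t ReadElement(const ToyStoredColumn &column, std::uint64_t globalIndex)
{
   if (globalIndex < column.fFirstElementIndex)
      return 0; // stands in for a synthesized page of 0x00 bytes
   const std::uint64_t localIndex = globalIndex - column.fFirstElementIndex;
   assert(localIndex < column.fStored.size());
   return column.fStored[localIndex];
}
```

A reader built on such a lookup sees a value for every entry, which is why no additional interface calls are needed for entries that predate the extension.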
408+
392409
Storage Backends
393410
----------------
394411
