From d41ba5ef571baf6dd83146ba2cdd308735aa0e53 Mon Sep 17 00:00:00 2001
From: Jakob Blomer
Date: Mon, 6 Oct 2025 12:01:13 +0200
Subject: [PATCH] [NFC][ntuple] add schema evolution docs

---
 tree/ntuple/doc/SchemaEvolution.md | 213 +++++++++++++++++++++++++++++
 1 file changed, 213 insertions(+)
 create mode 100644 tree/ntuple/doc/SchemaEvolution.md

diff --git a/tree/ntuple/doc/SchemaEvolution.md b/tree/ntuple/doc/SchemaEvolution.md
new file mode 100644
index 0000000000000..764f0ca3900bd
--- /dev/null
+++ b/tree/ntuple/doc/SchemaEvolution.md
@@ -0,0 +1,213 @@
+# Schema Evolution
+
+Schema evolution is the capability of the ROOT I/O to read data
+into in-memory models that are different from but compatible with the on-disk schema.
+
+Schema evolution allows for data models to evolve over time
+such that old data can be read into current models ("backward compatibility")
+and old software can read newer data models ("forward compatibility").
+For instance, data model authors may over time add and reorder class members, change data types
+(e.g. `std::vector` --> `ROOT::RVec`), rename classes, etc.
+
+ROOT applies automatic schema evolution rules for common, safe and unambiguous cases.
+Users can complement the automatic rules by manual schema evolution ("I/O customization rules")
+where custom code snippets implement the transformation logic.
+In case neither the automatic rules nor any of the provided I/O customization rules suffice
+to transform the on-disk schema into the in-memory model, ROOT will error out and refrain from reading data.
+
+This document describes schema evolution support implemented in RNTuple.
+For the most part, schema evolution works identically across the different ROOT I/O systems (TFile, TTree, RNTuple).
+The exceptions are listed in the last section of this document.
+
+## Automatic schema evolution
+
+ROOT applies a number of rules to read data transparently into in-memory models
+that are not an exact match to the on-disk schema.
+The automatic rules apply recursively to compound types (classes, tuples, collections, etc.);
+the outer types are evolved before the inner types.
+
+Automatic schema evolution rules transform native _types_ as well as the _shape_ of user-defined classes
+as listed in the following, exhaustive tables.
+
+### Class shape transformations
+
+User-defined classes can automatically evolve their layout in the following ways.
+Note that users should increase the class version number when the layout changes.
+
+| Layout Change                            | Also supported in Untyped Records | Comment                                              |
+| ---------------------------------------- | --------------------------------- | ---------------------------------------------------- |
+| Remove member                            | Yes                               | Match by member name                                 |
+| Add member                               | Yes                               | Match by member name, new member default-initialized |
+| Reorder members                          | Yes                               | Match by member name                                 |
+| Remove all base classes                  | n/a                               |                                                      |
+| Add base class(es) where there were none | n/a                               | New base class members default-initialized           |
+
+Reordering and incremental addition or removal of base classes are currently unsupported
+but may be supported in future RNTuple versions.
+
+### Type transformations
+
+ROOT transparently reads into in-memory types that are different from but compatible with the on-disk type.
+In the following tables, `T'` denotes a type that is compatible with `T`.
+This includes user-defined types that are related via a renaming rule.
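Several integer conversions in the tables that follow are marked "with bounds check": the conversion is only accepted if the on-disk value is representable in the in-memory type. The block below is a minimal standalone sketch of that idea, assuming C++17. It is not ROOT's implementation, and `CheckedCast` is a hypothetical helper name introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper (not a ROOT API): convert an on-disk integer value to the
// in-memory integer type and throw if the value does not fit.
template <typename InMemoryT, typename OnDiskT>
InMemoryT CheckedCast(OnDiskT value)
{
   static_assert(std::is_integral_v<InMemoryT> && std::is_integral_v<OnDiskT>,
                 "sketch covers integer types only");
   using Limits = std::numeric_limits<InMemoryT>;

   bool inRange = true;
   if constexpr (std::is_signed_v<OnDiskT> == std::is_signed_v<InMemoryT>) {
      // Same signedness: the usual arithmetic conversions preserve both operands.
      inRange = (value >= Limits::min()) && (value <= Limits::max());
   } else if constexpr (std::is_signed_v<OnDiskT>) {
      // Signed on disk, unsigned in memory: reject negatives, then compare as unsigned.
      inRange = (value >= 0) &&
                (static_cast<std::make_unsigned_t<OnDiskT>>(value) <= Limits::max());
   } else {
      // Unsigned on disk, signed in memory: compare against the positive maximum as unsigned.
      inRange = value <= static_cast<std::make_unsigned_t<InMemoryT>>(Limits::max());
   }

   if (!inRange)
      throw std::out_of_range("on-disk value does not fit into the in-memory type");
   return static_cast<InMemoryT>(value);
}
```

For example, `CheckedCast<std::int8_t>(std::int64_t{100})` returns 100, whereas an on-disk value of 200 read into `std::int8_t` throws `std::out_of_range` in this sketch. How ROOT itself reports such a failure is not specified here.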
+
+#### Plain fields
+
+| In-memory type              | Compatible on-disk types    | Comment               |
+| --------------------------- | --------------------------- | --------------------- |
+| `bool`                      | `char`                      |                       |
+|                             | `std::[u]int[8,16,32,64]_t` |                       |
+|                             | enum                        |                       |
+|-----------------------------|-----------------------------|-----------------------|
+| `char`                      | `bool`                      |                       |
+|                             | `std::[u]int[8,16,32,64]_t` | with bounds check     |
+|                             | enum                        | with bounds check     |
+|-----------------------------|-----------------------------|-----------------------|
+| `std::[u]int[8,16,32,64]_t` | `bool`                      |                       |
+|                             | `char`                      |                       |
+|                             | `std::[u]int[8,16,32,64]_t` | with bounds check     |
+|                             | enum                        | with bounds check     |
+|-----------------------------|-----------------------------|-----------------------|
+| enum                        | enum of different type      | with bounds check     |
+|                             |                             | on underlying integer |
+|-----------------------------|-----------------------------|-----------------------|
+| `float`                     | `double`                    |                       |
+|-----------------------------|-----------------------------|-----------------------|
+| `double`                    | `float`                     |                       |
+|-----------------------------|-----------------------------|-----------------------|
+| `std::atomic<T>`            | `T'`                        |                       |
+
+
+#### Variable-length collections
+
+| In-memory type                   | Compatible on-disk types             | Comment                               |
+| -------------------------------- | ------------------------------------ | ------------------------------------- |
+| `std::vector<T>`                 | `ROOT::RVec<T'>`                     |                                       |
+|                                  | `std::array<T', N>`                  |                                       |
+|                                  | `std::[unordered_][multi]set<T'>`    |                                       |
+|                                  | `std::[unordered_][multi]map<K',V'>` | only `T` = `std::[pair,tuple]`        |
+|                                  | `std::optional<T'>`                  |                                       |
+|                                  | `std::unique_ptr<T'>`                |                                       |
+|                                  | User-defined collection of `T'`      |                                       |
+|                                  | Untyped collection of `T'`           |                                       |
+|----------------------------------|--------------------------------------|---------------------------------------|
+| `ROOT::RVec<T>`                  | `std::vector<T'>`                    | with size check                       |
+|                                  | `std::array<T', N>`                  | with size check                       |
+|                                  | `std::[unordered_][multi]set<T'>`    | with size check                       |
+|                                  | `std::[unordered_][multi]map<K',V'>` | only `T` = `std::[pair,tuple]`,       |
+|                                  |                                      | with size check                       |
+|                                  | `std::optional<T'>`                  |                                       |
+|                                  | `std::unique_ptr<T'>`                |                                       |
+|                                  | User-defined collection of `T'`      | with size check                       |
+|                                  | Untyped collection of `T'`           | with size check                       |
+|----------------------------------|--------------------------------------|---------------------------------------|
+| `std::[unordered_]set<T>`        | `std::[unordered_]set<T'>`           |                                       |
+|                                  | `std::[unordered_]map<K',V'>`        | only `T` = `std::[pair,tuple]`        |
+|----------------------------------|--------------------------------------|---------------------------------------|
+| `std::[unordered_]multiset<T>`   | `ROOT::RVec<T'>`                     |                                       |
+|                                  | `std::vector<T'>`                    |                                       |
+|                                  | `std::array<T', N>`                  |                                       |
+|                                  | `std::[unordered_][multi]set<T'>`    |                                       |
+|                                  | `std::[unordered_][multi]map<K',V'>` | only `T` = `std::[pair,tuple]`        |
+|                                  | User-defined collection of `T'`      |                                       |
+|                                  | Untyped collection of `T'`           |                                       |
+|----------------------------------|--------------------------------------|---------------------------------------|
+| `std::[unordered_]map<K,V>`      | `std::[unordered_]map<K',V'>`        |                                       |
+|                                  | `std::[unordered_]set<T>`            | only `T` = `std::[pair,tuple]`        |
+|----------------------------------|--------------------------------------|---------------------------------------|
+| `std::[unordered_]multimap<K,V>` | `ROOT::RVec<T>`                      | only `T` = `std::[pair,tuple]`        |
+|                                  | `std::vector<T>`                     | only `T` = `std::[pair,tuple]`        |
+|                                  | `std::array<T, N>`                   | only `T` = `std::[pair,tuple]`        |
+|                                  | `std::[unordered_][multi]set<T>`     | only `T` = `std::[pair,tuple]`        |
+|                                  | `std::[unordered_][multi]map<K',V'>` |                                       |
+|                                  | User-defined collection of `T`       | only `T` = `std::[pair,tuple]`        |
+|                                  | Untyped collection of `T`            | only `T` = `std::[pair,tuple]`        |
+
+#### Nullable fields
+
+| In-memory type       | Compatible on-disk types |
+| -------------------- | ------------------------ |
+| `std::optional<T>`   | `std::unique_ptr<T'>`    |
+|                      | `T'`                     |
+|----------------------|--------------------------|
+| `std::unique_ptr<T>` | `std::optional<T'>`      |
+|                      | `T'`                     |
+
+#### Records
+
+| In-memory type              | Compatible on-disk types               |
+| --------------------------- | -------------------------------------- |
+| `std::pair<T, U>`           | `std::tuple<T', U'>`                   |
+|-----------------------------|----------------------------------------|
+| `std::tuple<T, U>`          | `std::pair<T', U'>`                    |
+|-----------------------------|----------------------------------------|
+| Untyped record              | User-defined class of compatible shape |
+
+Note that for emulated classes, the in-memory untyped record is constructed from on-disk information.
+
+#### Additional rules
+
+All on-disk types `std::atomic<T'>` can be read into a `T` in-memory model.
+
+If a class property changes from using an RNTuple streamer field to using a regular RNTuple class field,
+existing files with on-disk streamer fields will continue to read as streamer fields.
+This can be seen as "schema evolution out of streamer fields".
+
+## Manual schema evolution (I/O customization rules)
+
+ROOT I/O customization rules allow for custom code handling the transformation
+from the on-disk schema to the in-memory model.
+Customization rules are part of the class dictionary.
+For the exact syntax of customization rules, please refer to the ROOT manual.
+
+Generally, customization rules consist of
+  - A target class.
+  - Target members of the target class, i.e. those class members whose value is set by the rule.
+    Target members must be direct members, i.e. not part of a base class.
+  - A source class (possibly having a different class name than the target class)
+    together with class versions or class checksums
+    that describe all the possible on-disk class versions the rule applies to.
+  - Source members of the source class; the given source members will be read as the given type.
+    The source member will undergo schema evolution before being passed to the rule's function.
+    Source members can also be from a base class.
+    Note that there is no way to specify a base class member that has the same name as a member in the derived class.
+  - The custom code snippet; the code snippet has access to the (whole) target object and to the given source members.
+
+At runtime, for any given target member there must be at most one applicable rule.
+A source member can be read into any type compatible with its on-disk type
+but any given source member can only be read into one type for a given target class
+(i.e. multiple rules for the same target/source class must not use different types for the same source member).
+
+There are two special types of rules:
+  1. Pure class rename rules consisting only of source and target class
+  2. Whole-object rules that have no target members
+
+Class rename rules (pure or not) are not transitive
+(if in-memory `A` can read from on-disk `B` and in-memory `B` can read from on-disk `C`,
+in-memory `A` cannot automatically read from on-disk `C`).
+
+Note that customization rules operate on partially read objects.
+Customization rules are executed after all members not subject to customization rules have been read from disk.
+Whole-object rules are executed after other rules.
+Otherwise, the scheduling of rules is unspecified.
+
+## Interplay between automatic and manual schema evolution
+
+The target members of I/O customization rules are exempt from automatic schema evolution
+(this applies to the corresponding field of the target member and all its subfields).
+Otherwise, automatic and manual schema evolution work side by side.
+For instance, a renamed class is still subject to automatic schema evolution.
+
+The source member of a customization rule is subject to the same automatic and manual schema evolution rules
+as if it were read normally, e.g. in an `RNTupleView`.
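To make the rule components from the manual schema evolution section more concrete, the following dictionary fragment sketches a pure class rename rule and a rule with one source and one target member, using the `#pragma read` form described in the ROOT manual. The class and member names (`OldEvent`, `Event`, `fEnergyMeV`, `fEnergyGeV`) are hypothetical, and the ROOT manual remains the authoritative reference for the exact syntax.

```cpp
// LinkDef.h fragment (illustrative sketch; all class and member names are hypothetical)
#ifdef __CLING__
#pragma link C++ class Event+;

// Pure class rename rule: on-disk `OldEvent` (class version 1 and later)
// is read into the in-memory class `Event`.
#pragma read sourceClass="OldEvent" targetClass="Event" version="[1-]"

// Rule with a source member and a target member:
// the on-disk member `fEnergyMeV` is read as `float` and made available via `onfile`;
// the code snippet sets the target member `fEnergyGeV` of the in-memory object.
#pragma read sourceClass="OldEvent" version="[1-2]" \
   source="float fEnergyMeV" \
   targetClass="Event" target="fEnergyGeV" \
   code="{ fEnergyGeV = onfile.fEnergyMeV / 1000.f; }"
#endif
```

In this sketch, `fEnergyGeV` is the only target member of the second rule, so it would be exempt from automatic schema evolution, while all other members of `Event` would still be matched by name as usual.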
+
+## Schema evolution differences between RNTuple and Classic I/O
+
+In contrast to RNTuple, TTree and TFile also apply the following automatic schema evolution rules:
+  - Conversion between floating point and integer types
+  - Conversion from `unique_ptr<T>` --> `T'`
+  - Complete conversion matrix of all collection types
+  - Insertion and removal of intermediate classes
+  - Move of a member between base class and derived class
+  - Reordering of base classes
+