Commit a8d2a1f

[NFC][ntuple] add schema evolution docs
1 parent c3e4cad commit a8d2a1f

tree/ntuple/doc/SchemaEvolution.md

Lines changed: 209 additions & 0 deletions
# Schema Evolution

Schema evolution is the capability of ROOT I/O to read data
into in-memory models that are different from but compatible with the on-disk schema.

Schema evolution allows for data models to evolve over time
such that old data can be read into current models ("backward compatibility")
and old software can read newer data models ("forward compatibility").
For instance, data model authors may over time add and reorder class members, change data types
(e.g. `std::vector<float>` --> `ROOT::RVec<double>`), rename classes, etc.

ROOT applies automatic schema evolution rules for common, safe and unambiguous cases.
Users can complement the automatic rules by manual schema evolution ("I/O read rules")
where custom code snippets implement the transformation logic.
In case neither automatic nor manual schema evolution rules suffice
to transform the on-disk schema into the in-memory model, ROOT will error out and refrain from reading data.

This document describes schema evolution in RNTuple.
For the most part, schema evolution works identically across the different ROOT I/O systems (TFile, TTree, RNTuple).
The exceptions are listed in the last section of this document.

## Automatic schema evolution

ROOT applies a number of rules to read data transparently into in-memory models
that are not an exact match to the on-disk schema.
The automatic rules apply recursively to compound types (classes, tuples, collections, etc.);
the outer types are evolved before the inner types.

Automatic schema evolution rules transform native _types_ as well as the _shape_ of user-defined classes
as listed in the following, exhaustive tables.

### Class shape transformations

User-defined classes can automatically evolve their layout in the following ways.
Note that users should increase the class version number when the layout changes.
However, for RNTuple automatic rules that is not mandatory;
RNTuple will always compare the current on-disk layout with the in-memory model.

| Layout Change                            | Also supported in Untyped Records | Comment              |
| ---------------------------------------- | --------------------------------- | -------------------- |
| Remove member                            | Yes                               | Match by member name |
| Add member                               | Yes                               | Match by member name |
| Reorder members                          | Yes                               | Match by member name |
| Remove all base classes                  | n/a                               |                      |
| Add base class(es) where there were none | n/a                               |                      |

Reordering and incremental addition or removal of base classes are currently unsupported
but may be supported in future RNTuple versions.

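The name-based matching in the table above can be illustrated with a small standalone model. This is only a sketch, not RNTuple code: `OnDiskRecord`, `MatchResult` and `MatchByName` are hypothetical names, every member is reduced to a `double`, and it is assumed that a member missing on disk keeps a default value.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one serialized object: member name -> value.
using OnDiskRecord = std::map<std::string, double>;

// Outcome of matching an in-memory member list against an on-disk record.
struct MatchResult {
   std::map<std::string, double> fValues; // one value per in-memory member
   std::vector<std::string> fDefaulted;   // in-memory members missing on disk ("add member")
   std::vector<std::string> fIgnored;     // on-disk members missing in memory ("remove member")
};

// Members are matched by name only, so their order on disk does not matter.
inline MatchResult MatchByName(const std::vector<std::string> &inMemoryMembers, const OnDiskRecord &onDisk)
{
   MatchResult result;
   for (const auto &name : inMemoryMembers) {
      const auto it = onDisk.find(name);
      if (it != onDisk.end()) {
         result.fValues[name] = it->second;
      } else {
         result.fValues[name] = 0.0; // assumption: an added member keeps its default value
         result.fDefaulted.push_back(name);
      }
   }
   for (const auto &item : onDisk) {
      if (result.fValues.count(item.first) == 0)
         result.fIgnored.push_back(item.first);
   }
   return result;
}
```

In this model, reading an on-disk record with members `fPt`, `fEta`, `fOld` into an in-memory class with members `fEta`, `fPt`, `fNew` fills `fPt` and `fEta` regardless of order, leaves `fNew` at its default, and skips `fOld`.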
### Type transformations

ROOT transparently reads into in-memory types that are different from but compatible with the on-disk type.
In the following tables, `T'` denotes a type that is compatible with `T`.

#### Plain fields

| In-memory type              | Compatible on-disk types    | Comment              |
| --------------------------- | --------------------------- | ---------------------|
| `bool`                      | `char`                      |                      |
|                             | `std::[u]int[8,16,32,64]_t` |                      |
|                             | enum                        |                      |
|-----------------------------|-----------------------------|----------------------|
| `char`                      | `bool`                      |                      |
|                             | `std::[u]int[8,16,32,64]_t` | with bounds check    |
|                             | enum                        | with bounds check    |
|-----------------------------|-----------------------------|----------------------|
| `std::[u]int[8,16,32,64]_t` | `bool`                      |                      |
|                             | `char`                      |                      |
|                             | `std::[u]int[8,16,32,64]_t` | with bounds check    |
|                             | enum                        | with bounds check    |
|-----------------------------|-----------------------------|----------------------|
| enum                        | enum of different type      | with bounds check    |
|-----------------------------|-----------------------------|----------------------|
| `float`                     | `double`                    |                      |
|-----------------------------|-----------------------------|----------------------|
| `double`                    | `float`                     |                      |
|-----------------------------|-----------------------------|----------------------|
| `std::atomic<T>`            | `T'`                        |                      |

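As an illustration of what "with bounds check" means for integer conversions, here is a minimal standalone sketch. It is not ROOT's implementation: `FitsInto` and `CheckedCast` are hypothetical helpers, and it is an assumption that an out-of-range on-disk value is reported as an error (modelled here as an exception).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Returns true if `value` is representable in the integer type To.
template <typename To, typename From>
constexpr bool FitsInto(From value)
{
   static_assert(std::is_integral_v<To> && std::is_integral_v<From>, "integer types only");
   static_assert(!std::is_same_v<To, bool> && !std::is_same_v<From, bool>, "bool is not covered by this sketch");
   using Limits = std::numeric_limits<To>;
   if constexpr (std::is_signed_v<From> && std::is_signed_v<To>) {
      // signed -> signed: compare in the widest signed type
      return static_cast<std::intmax_t>(value) >= static_cast<std::intmax_t>(Limits::min()) &&
             static_cast<std::intmax_t>(value) <= static_cast<std::intmax_t>(Limits::max());
   } else if constexpr (std::is_signed_v<From>) {
      // signed -> unsigned: negative values never fit
      return value >= 0 && static_cast<std::uintmax_t>(value) <= static_cast<std::uintmax_t>(Limits::max());
   } else {
      // unsigned -> signed or unsigned: only the upper bound matters
      return static_cast<std::uintmax_t>(value) <= static_cast<std::uintmax_t>(Limits::max());
   }
}

// Converts an on-disk integer into the in-memory type, rejecting out-of-range values.
template <typename To, typename From>
To CheckedCast(From value)
{
   if (!FitsInto<To>(value))
      throw std::range_error("on-disk value out of range for in-memory type");
   return static_cast<To>(value);
}
```

For example, `CheckedCast<std::int8_t>(std::int64_t{100})` succeeds, whereas `CheckedCast<std::int8_t>(std::int64_t{300})` throws in this sketch.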
#### Variable-length collections

| In-memory type                   | Compatible on-disk types             | Comment                               |
| -------------------------------- | ------------------------------------ | ------------------------------------- |
| `std::vector<T>`                 | `ROOT::RVec<T'>`                     |                                       |
|                                  | `std::array<T', N>`                  |                                       |
|                                  | `std::[unordered_][multi]set<T'>`    |                                       |
|                                  | `std::[unordered_][multi]map<K',V'>` | only `T` = `std::[pair,tuple]<K,V>`   |
|                                  | `std::optional<T'>`                  |                                       |
|                                  | `std::unique_ptr<T'>`                |                                       |
|                                  | User-defined collection of `T'`      |                                       |
|                                  | Untyped collection of `T'`           |                                       |
|----------------------------------|--------------------------------------|---------------------------------------|
| `ROOT::RVec<T>`                  | `std::vector<T'>`                    | with size check                       |
|                                  | `std::array<T', N>`                  | with size check                       |
|                                  | `std::[unordered_][multi]set<T'>`    | with size check                       |
|                                  | `std::[unordered_][multi]map<K',V'>` | only `T` = `std::[pair,tuple]<K,V>`,  |
|                                  |                                      | with size check                       |
|                                  | `std::optional<T'>`                  |                                       |
|                                  | `std::unique_ptr<T'>`                |                                       |
|                                  | User-defined collection of `T'`      | with size check                       |
|                                  | Untyped collection of `T'`           | with size check                       |
|----------------------------------|--------------------------------------|---------------------------------------|
| `std::[unordered_]set<T>`        | `std::[unordered_]set<T'>`           |                                       |
|                                  | `std::[unordered_]map<K',V'>`        | only `T` = `std::[pair,tuple]<K,V>`   |
|----------------------------------|--------------------------------------|---------------------------------------|
| `std::[unordered_]multiset<T>`   | `std::vector<T'>`                    |                                       |
|                                  | `std::array<T', N>`                  |                                       |
|                                  | `std::[unordered_][multi]set<T'>`    |                                       |
|                                  | `std::[unordered_][multi]map<K',V'>` | only `T` = `std::[pair,tuple]<K,V>`   |
|                                  | User-defined collection of `T'`      |                                       |
|                                  | Untyped collection of `T'`           |                                       |
|----------------------------------|--------------------------------------|---------------------------------------|
| `std::[unordered_]map<K,V>`      | `std::[unordered_]map<K',V'>`        |                                       |
|                                  | `std::[unordered_]set<T>`            | only `T` = `std::[pair,tuple]<K',V'>` |
|----------------------------------|--------------------------------------|---------------------------------------|
| `std::[unordered_]multimap<K,V>` | `std::vector<T>`                     | only `T` = `std::[pair,tuple]<K,V>`   |
|                                  | `std::array<T, N>`                   | only `T` = `std::[pair,tuple]<K,V>`   |
|                                  | `std::[unordered_][multi]set<T>`     | only `T` = `std::[pair,tuple]<K,V>`   |
|                                  | `std::[unordered_][multi]map<K',V'>` |                                       |
|                                  | User-defined collection of `T`       | only `T` = `std::[pair,tuple]<K,V>`   |
|                                  | Untyped collection of `T`            | only `T` = `std::[pair,tuple]<K,V>`   |

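The following standalone sketch models two rows of the table above: reading a collection of `T'` into a `std::vector<T>`, and reading a map into a vector of key/value pairs. It is not RNTuple code. `ReadAsVector`, `ReadMapAsPairs` and `FitsSizeType` are hypothetical helpers, and the size check assumes, purely for illustration, a target collection with a 32-bit signed size type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <utility>
#include <vector>

// On-disk collection of T' -> in-memory std::vector<T>, converting each element.
template <typename T, typename TSource>
std::vector<T> ReadAsVector(const std::vector<TSource> &onDisk)
{
   std::vector<T> result;
   result.reserve(onDisk.size());
   for (const auto &element : onDisk)
      result.push_back(static_cast<T>(element));
   return result;
}

// On-disk std::map<K', V'> -> in-memory std::vector<std::pair<K, V>>.
template <typename K, typename V, typename KSource, typename VSource>
std::vector<std::pair<K, V>> ReadMapAsPairs(const std::map<KSource, VSource> &onDisk)
{
   std::vector<std::pair<K, V>> result;
   result.reserve(onDisk.size());
   for (const auto &item : onDisk)
      result.emplace_back(static_cast<K>(item.first), static_cast<V>(item.second));
   return result;
}

// Size check under the ASSUMPTION of a 32-bit signed size type in the target collection.
inline bool FitsSizeType(std::size_t nElements)
{
   return nElements <= static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max());
}
```

The table has no row that reads a multi-collection into `std::[unordered_]set` or `std::[unordered_]map`; presumably this is because duplicate elements or keys could be silently dropped.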
#### Nullable fields

| In-memory type       | Compatible on-disk types |
| -------------------- | ------------------------ |
| `std::optional<T>`   | `std::unique_ptr<T'>`    |
|                      | `T'`                     |
|----------------------|--------------------------|
| `std::unique_ptr<T>` | `std::optional<T'>`      |
|                      | `T'`                     |

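The nullable conversions in the table can be pictured with plain standard-library code. This is only a sketch of the semantics, not of how RNTuple reads data: the three helper functions are hypothetical, and an empty on-disk value is assumed to map to an empty in-memory value.

```cpp
#include <cassert>
#include <memory>
#include <optional>

// On-disk std::unique_ptr<T'> -> in-memory std::optional<T>.
template <typename T, typename TSource>
std::optional<T> OptionalFromUniquePtr(const std::unique_ptr<TSource> &onDisk)
{
   if (!onDisk)
      return std::nullopt;
   return static_cast<T>(*onDisk);
}

// On-disk std::optional<T'> -> in-memory std::unique_ptr<T>.
template <typename T, typename TSource>
std::unique_ptr<T> UniquePtrFromOptional(const std::optional<TSource> &onDisk)
{
   if (!onDisk)
      return nullptr;
   return std::make_unique<T>(static_cast<T>(*onDisk));
}

// On-disk plain T' -> in-memory std::optional<T>; the result always holds a value.
template <typename T, typename TSource>
std::optional<T> OptionalFromValue(const TSource &onDisk)
{
   return static_cast<T>(onDisk);
}
```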
#### Records

| In-memory type              | Compatible on-disk types               |
| --------------------------- | -------------------------------------- |
| `std::pair<T,U>`            | `std::tuple<T',U'>`                    |
|-----------------------------|----------------------------------------|
| `std::tuple<T,U>`           | `std::pair<T',U'>`                     |
|-----------------------------|----------------------------------------|
| Untyped record              | User-defined class of compatible shape |

Note that for emulated classes, the in-memory untyped record is constructed from on-disk information.

#### Additional rules

All on-disk types `std::atomic<T'>` can be read into a `T` in-memory model.

If a class property changes from using an RNTuple streamer field to using a regular RNTuple class field,
existing files with on-disk streamer fields will continue to read as streamer fields.
This can be seen as "schema evolution out of streamer fields".

## Manual schema evolution (I/O read rules)

ROOT I/O read rules allow for custom code handling the transformation from the on-disk schema to the in-memory model.
Read rules are part of the class dictionary.
For the exact syntax of read rules, we refer to the ROOT manual.

Generally, read rules consist of
- A target class.
- Target members of the target class, i.e. those class members whose value is set by the rule.
  Target members must be direct members, i.e. not part of a base class.
- A source class (possibly having a different class name than the target class)
  together with class versions or class checksums
  that describe all the possible on-disk class versions the rule applies to.
- Source members of the source class; the given source members will be read as the given type.
  Source members can also be from a base class.
  Note that there is no way to specify a base class member that has the same name as a member in the derived class.
- The custom code snippet; the code snippet has access to the (whole) target object and to the given source members.

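As a rough illustration of how these components appear in a dictionary selection file, consider the fragment below. It is a sketch only: the class `Particle`, its members `fPx`, `fPy`, `fPt` and the version number are hypothetical, and the ROOT manual remains the reference for the exact syntax.

```cpp
// Hypothetical class: on-disk version 1 stored `float fPx; float fPy;`,
// the current in-memory version stores the transverse momentum `double fPt` instead.

#ifdef __CLING__
#pragma link C++ class Particle+;

// sourceClass + version: the on-disk class versions the rule applies to
// source:                source members and the types they are read as
// targetClass + target:  the in-memory class and the members set by the rule
// code:                  the snippet; source members are accessed through `onfile`
#pragma read sourceClass="Particle" version="[1]" \
   source="float fPx; float fPy" \
   targetClass="Particle" target="fPt" \
   include="cmath" \
   code="{ fPt = std::sqrt(onfile.fPx * onfile.fPx + onfile.fPy * onfile.fPy); }"
#endif
```

The `version` attribute assumes that the class carries a class version number; a rule can alternatively select on-disk classes by checksum.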
At runtime, for any given target member there must be at most one applicable rule.
A source member can be read into any type compatible with its on-disk type
but any given source member can only be read into one type for a given target class
(i.e. multiple rules for the same target/source class must not use different types for the same source member).

There are two special types of rules:
1. Pure class rename rules consisting only of source and target class
2. Whole-object rules consisting only of source and target class and a code snippet

Class rename rules (pure or not) are not transitive
(if in-memory `A` can read from on-disk `B` and in-memory `B` can read from on-disk `C`,
in-memory `A` cannot automatically read from on-disk `C`).

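The non-transitivity of rename rules can be made concrete with a toy lookup. This is a sketch under simplifying assumptions, not ROOT code: `RenameRules` and `CanRead` are hypothetical, and a rule is reduced to a pair of class names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical registry: in-memory (target) class -> on-disk (source) classes it has rename rules for.
using RenameRules = std::map<std::string, std::set<std::string>>;

// Only rules registered directly on the in-memory class are consulted; rules are never chained.
inline bool CanRead(const RenameRules &rules, const std::string &inMemory, const std::string &onDisk)
{
   if (inMemory == onDisk)
      return true;
   const auto it = rules.find(inMemory);
   return it != rules.end() && it->second.count(onDisk) > 0;
}
```

With rules `A <- B` and `B <- C` registered, `CanRead(rules, "A", "C")` is false in this model; reading on-disk `C` into `A` would require an explicit rule `A <- C`.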
Note that read rules operate on partially read objects.
Read rules are executed after all members not subject to read rules have been read from disk.
Whole-object rules are executed after other rules.
Otherwise, the scheduling of rules is unspecified.

## Interplay between automatic and manual schema evolution

The target members of I/O read rules are exempt from automatic schema evolution
(applies to the corresponding field of the target member and all its subfields).
Otherwise, automatic and manual schema evolution work side by side.
For instance, a renamed class is still subject to automatic schema evolution.

The source member of a read rule is subject to the same automatic and manual schema evolution rules
as if it were read normally, e.g. in an `RNTupleView`.

## Schema evolution differences between RNTuple and Classic I/O

In contrast to RNTuple, TTree and TFile also apply the following automatic schema evolution rules:
- Conversion between floating-point and integer types
- Conversion from `unique_ptr<T>` --> `T'`
- Complete conversion matrix of all collection types
- Insertion and removal of intermediate classes
- Move of a member between base class and derived class
- Reordering of base classes
