| 1 | +# Yuzu binary format |
| 2 | + |
| 3 | + * [Description](#description) |
| 4 | + * [Details](#details) |
| 5 | + |
| 6 | +## Description |
| 7 | + |
| 8 | +Yuzu binary format is optimized for fast serialization and deserialization, |
| 9 | +at the slight expense of density. |
| 10 | +If maximal density is required, it is recommended to additionally compress the serialized data. |
| 11 | + |
| 12 | +Although the format is not strictly tied to C#/.NET, some of its features correspond directly |
| 13 | +to C#/.NET ones and may require extra conversion on other platforms. |
| 14 | + |
| 15 | +The Yuzu binary format and the Yuzu library provide extensive support for data migration. |
| 16 | +In particular, a round trip is guaranteed even in the presence of unknown fields and unknown types |
| 17 | +(i.e. an older version of the code may read, modify and write data created by a newer version, |
| 18 | +preserving new fields unknown to the older version). |
| 19 | + |
| 20 | +Yuzu serialization and deserialization are (mostly) done in a single pass, both over the object graph and over the serialized data stream. |
| 21 | +Exceptions (such as the `CheckForEmptyCollections` option) are mentioned explicitly in the [reference](reference.md). |
| 22 | + |
| 23 | +Yuzu binary metadata is intermingled with data, with each new structured type described at its first occurrence. |
| 24 | +This avoids an extra pass over the object graph to gather metadata, but limits how the serialized stream can be reordered. |
| 25 | + |
| 26 | +A Yuzu binary data stream may be split at top-level object boundaries. |
| 27 | + |
| 28 | +Individual sub-trees of the object graph may be reordered between serialization and deserialization as long as |
| 29 | +they do not introduce any new metadata. |
| 30 | + |
| 31 | +## Details |
| 32 | + |
| 33 | +Yuzu binary consists of an optional signature followed by a serialized data item. |
| 34 | +An item contains a type header and data. |
| 35 | +The type header starts with a single-byte rough type: |
| 36 | + |
| 37 | +Value | Description |
| 38 | +---:| --- |
| 39 | + 1 | `sbyte` |
| 40 | + 2 | `byte` |
| 41 | + 3 | `short` |
| 42 | + 4 | `ushort` |
| 43 | + 5 | `int` |
| 44 | + 6 | `uint` |
| 45 | + 7 | `long` |
| 46 | + 8 | `ulong` |
| 47 | + 9 | `bool` |
| 48 | + 10 | `char` |
| 49 | + 11 | `float` |
| 50 | + 12 | `double` |
| 51 | + 13 | `decimal` |
| 52 | + 14 | `DateTime` |
| 53 | + 15 | `TimeSpan` |
| 54 | + 16 | `string` |
| 55 | + 17 | *Any* (`object`) |
| 56 | + 18 | `Nullable` |
| 57 | + 19 | `DateTimeOffset` |
| 58 | + 20 | `Guid` |
| 59 | + 32 | *Record* |
| 60 | + 33 | *Sequence* |
| 61 | + 34 | *Mapping* |
| 62 | + |
| 63 | +Basic types are immediately followed by the value: integers are stored in little-endian order, |
| 64 | +floating-point values in IEEE 754 representation, and `char` as UTF-8. |
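
As an illustration of the layout just described, here is a minimal Python sketch that writes and reads one basic item (rough type byte followed by the value). The rough type codes come from the table above and the value widths are the standard C# ones. The helper names `encode_basic_item` and `decode_basic_item` are hypothetical, and little-endian byte order for `float` and `double` is an assumption, since the text states the byte order only for integers.

```python
import struct

# Rough type codes from the table above, mapped to little-endian struct formats.
BASIC_FORMATS = {
    1: "<b",   # sbyte
    2: "<B",   # byte
    3: "<h",   # short
    4: "<H",   # ushort
    5: "<i",   # int
    6: "<I",   # uint
    7: "<q",   # long
    8: "<Q",   # ulong
    11: "<f",  # float, IEEE 754 single (byte order assumed)
    12: "<d",  # double, IEEE 754 double (byte order assumed)
}

def encode_basic_item(rough_type: int, value) -> bytes:
    """Return the rough type byte followed by the packed value."""
    return bytes([rough_type]) + struct.pack(BASIC_FORMATS[rough_type], value)

def decode_basic_item(data: bytes):
    """Return (rough_type, value) for a buffer holding exactly one basic item."""
    rough_type = data[0]
    (value,) = struct.unpack(BASIC_FORMATS[rough_type], data[1:])
    return rough_type, value
```

For example, `encode_basic_item(5, 258)` yields the five bytes `05 02 01 00 00`: the rough type `int`, then 258 in little-endian order.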
| 65 | + |
| 66 | +A string is serialized as a varint length followed by a sequence of UTF-8 bytes. |
| 67 | +If the length is zero, it is followed by either a zero byte indicating an empty string or a byte with value 1 indicating `null`. |
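
The string rules can be sketched in Python as follows. This is a non-authoritative sketch under two assumptions that the text does not spell out: the varint is the common 7-bits-per-byte encoding with a continuation flag in the high bit, and the length counts UTF-8 bytes rather than characters. The function names are hypothetical, and only the value part is handled (the rough type byte 16 is left out).

```python
def encode_varint(n: int) -> bytes:
    """7-bit varint, low group first (assumed encoding)."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low)
            return bytes(out)

def encode_string_value(s) -> bytes:
    """Encode the value part of a string item."""
    if s is None:
        return b"\x00\x01"  # zero length, then 1 marks null
    raw = s.encode("utf-8")
    if not raw:
        return b"\x00\x00"  # zero length, then 0 marks the empty string
    return encode_varint(len(raw)) + raw

def decode_string_value(data: bytes):
    """Decode from the start of data; return (string or None, bytes consumed)."""
    length, shift, pos = 0, 0, 0
    while True:
        b = data[pos]
        pos += 1
        length |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    if length == 0:
        marker = data[pos]  # extra byte distinguishes empty from null
        return (None if marker == 1 else ""), pos + 1
    return data[pos:pos + length].decode("utf-8"), pos + length
```

Under these assumptions, `"hi"` becomes `02 68 69`, the empty string becomes `00 00`, and `null` becomes `00 01`.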
| 68 | + |
| 69 | +`Nullable` is followed by the item type, then either a zero byte to represent `null` or a byte with value 1 followed by the item value. |
| 70 | + |
| 71 | +*Sequence* (denoting arrays and collections) is followed by the item type, then by a 4-byte item count and the item representations. |
| 72 | + |
| 73 | +*Mapping* (denoting dictionaries) is followed by the key type, then by the item type, then by a 4-byte entry count and the entry representations. |
| 74 | +Each entry consists of a key followed by a value. |
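
The compound headers above can be sketched in Python for the simplest case, where keys and items are `int`. The function names are hypothetical. The sketch assumes that the 4-byte count is a little-endian signed integer and that items inside a container are written as bare values without repeating the rough type byte; the text does not state either point explicitly.

```python
import struct

INT, NULLABLE, SEQUENCE, MAPPING = 5, 18, 33, 34  # rough types from the table

def encode_int_sequence(items) -> bytes:
    """Sequence of int: header (33, 5), 4-byte count, then the bare values."""
    header = bytes([SEQUENCE, INT])
    count = struct.pack("<i", len(items))  # count layout is an assumption
    return header + count + b"".join(struct.pack("<i", x) for x in items)

def encode_int_to_int_mapping(entries: dict) -> bytes:
    """Mapping int -> int: header (34, 5, 5), 4-byte count, then key/value pairs."""
    out = bytearray([MAPPING, INT, INT])
    out += struct.pack("<i", len(entries))
    for key, value in entries.items():
        out += struct.pack("<i", key)    # key first
        out += struct.pack("<i", value)  # then value
    return bytes(out)

def encode_nullable_int(value) -> bytes:
    """Nullable int: header (18, 5), then 0 for null or 1 followed by the value."""
    header = bytes([NULLABLE, INT])
    if value is None:
        return header + b"\x00"
    return header + b"\x01" + struct.pack("<i", value)
```

For example, `encode_int_sequence([1, 2])` produces `21 05 02 00 00 00 01 00 00 00 02 00 00 00` under these assumptions.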
| 75 | + |
| 76 | +*Record* denotes structured types (`class` and `struct`). It is followed by: |
| 77 | +1. A 2-byte type index. Indexes count from 1 upwards without gaps, in order of type appearance in the serialized stream. |
| 78 | +2. If this index is new (i.e. the type has not yet occurred in the current stream), a 2-byte number of fields, followed by field descriptions. |
| 79 | +3. For each field: |
| 80 | +    1. A 2-byte field index, starting with 1, with possible gaps (so some fields may be omitted). |
| 81 | +    2. If the field type is *Any* (17), the type of the specific value. |
| 82 | +    3. The value representation. |
| 83 | +4. 2 zero bytes. |
| 84 | + |
| 85 | +A field description consists of: |
| 86 | +1. A 2-byte field index. Indexes count from 1 upwards without gaps. |
| 87 | +2. The field name length, varint-encoded. |
| 88 | +3. The field name in UTF-8. |
| 89 | +4. The field type. |
| 89 | +4. Field type. |