Skip to content

Commit b6aee98

Browse files
committed
docs: Document binary format
1 parent 924d93b commit b6aee98

File tree

2 files changed

+90
-0
lines changed

2 files changed

+90
-0
lines changed

README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,4 +9,5 @@ Yet another C# serialization library.
99
## Documentation
1010

1111
* [Reference](Yuzu/docs/reference.md)
12+
* [Binary format](Yuzu/docs/binary.md)
1213

Yuzu/docs/binary.md

Lines changed: 89 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,89 @@
1+
# Yuzu binary format
2+
3+
* [Description](#description)
4+
* [Details](#details)
5+
6+
## Description
7+
8+
Yuzu binary format is optimized for fast serialization and deserialization,
9+
at the slight expense of density.
10+
If maximal density is required, it is recommended to additionaly compress serialized data.
11+
12+
Although format is not strictly tied to C#/.NET, some format features correspond directly
13+
to C#/.NET ones and may require some extra conversion on other platforms.
14+
15+
Yuzu binary format and Yuzu library provides extensive support for data migration.
16+
In particular, roundtrip is guaranteed even in the presence of unknown fields and unknown types
17+
(i.e. older version of code may read, modify and write data created by newer version,
18+
preserving new fields unknown to older version).
19+
20+
Yuzu serialization and deserialization is (mostly) done in a single pass, both on object graph and serialized data stream.
21+
Exceptions (such as `CheckForEmptyCollections` option) are mentioned explicitly in the [reference](reference.md).
22+
23+
Yuzu binary metadata is intermingled with data, with each new structured type described on the first occurrence.
24+
This avoids extra object graph pass to gather metadata, but creates some limits for reordering of serialized stream.
25+
26+
Yuzu binary data stream may be split at top-level object boundaries.
27+
28+
Individual sub-trees of object graph may be reordered between serialization and deserialization as long as
29+
they do not introduce any new metadata.
30+
31+
## Details
32+
33+
Yuzu binary consists of optional signature followed by a serialized data item.
34+
Item contains type header and data.
35+
Type header starts with a single-byte rough type:
36+
37+
Value | Description
38+
---:| ---
39+
1 | `sbyte`
40+
2 | `byte`
41+
3 | `short`
42+
4 | `ushort`
43+
5 | `int`
44+
6 | `uint`
45+
7 | `long`
46+
8 | `ulong`
47+
9 | `bool`
48+
10 | `char`
49+
11 | `float`
50+
12 | `double`
51+
13 | `decimal`
52+
14 | `DateTime`
53+
15 | `TimeSpan`
54+
16 | `string`
55+
17 | *Any* (`object`)
56+
18 | `Nullable`
57+
19 | `DateTimeOffset`
58+
20 | `Guid`
59+
32 | *Record*
60+
33 | *Sequence*
61+
34 | *Mapping*
62+
63+
Basic types are immediately followed by value, where integers are stored in little-endian order,
64+
floating point values in IEEE 754 representation, char as UTF-8.
65+
66+
String is serialized as a varint length followed by a sequence of UTF-8 bytes.
67+
If the length is zero, it is followed by either a zero byte indicating empty string or a byte with value 1 indicating `null`.
68+
69+
`Nullable` is followed by item type, then a zero byte to represent `null` or a byte with value 1 followed by item value.
70+
71+
*Sequence* (denoting arrays and collections) is followed by item type, then by 4-byte item count and item representations.
72+
73+
*Mapping* (denoting dictionaries) is followed by key type, then by item type, then by 4-byte entry count and entry representations.
74+
Each entry consists of key followed by value.
75+
76+
*Record* denotes structured types (`class` and `struct`). It is followed by:
77+
1. 2-byte type index. Indexes are counting from 1 upwards without gaps in order of type appearance in serialized stream.
78+
2. If this index is new (i.e. type did not yet occur in current stream), 2-byte number of fields, followed by field descriptions.
79+
3. For each field:
80+
1. 2-byte field index starting with 1 with possible gaps (so some fields may be omitted)
81+
2. If field type was Any (17), type of specific value.
82+
3. Value representation.
83+
4. 2 zero bytes.
84+
85+
Field description is:
86+
1. 2-byte field index. Indexes are counting from 1 upwards without gaps
87+
2. Field name length, varint-encoded.
88+
3. Field name in UTF-8.
89+
4. Field type.

0 commit comments

Comments
 (0)