Commit ea3aec1

Add serialization design doc
This adds a design doc for schema-based serialization. In the future this will be expanded with information about deserialization, codecs, and protocols.
1 parent e1d9c4f commit ea3aec1

1 file changed

+296
-0
lines changed

designs/serialization.md

Lines changed: 296 additions & 0 deletions
@@ -0,0 +1,296 @@
# Protocol Serialization and Deserialization

This document will describe how objects are serialized and deserialized
according to some protocol, such as
[AWS RestJson1](https://smithy.io/2.0/aws/protocols/aws-restjson1-protocol.html),
based on information from a Smithy model.

## Goals

* Shared - Protocols should be implemented as part of a shared library. If two
  clients using the same protocol are installed, they should use a shared
  implementation. These implementations should be as compact as possible while
  still being robust.
* Hot-swappable - Implementations should be flexible enough to be swapped at
  runtime if necessary. If a service supports more than one protocol, it should
  be trivially easy to swap between them, even at runtime.
* Flexible - Implementations should be usable for purposes other than as a
  component of making a request to a web service. Customers should be able to
  feed well-formed data from any source into a protocol and have it transform
  that data with no side effects.

## Schemas

The basic building block of Smithy is the "shape", a representation of data of a
given type with known properties called "members", additional constraints and
metadata called "traits", and an identifier.

For each shape contained in a service, a `Schema` object will be generated that
contains almost all of its information. Traits that are known to not affect
serialization or deserialization will be omitted from the generated `Schema` to
save space.

Schemas will form the backbone of serialization and deserialization, carrying
information that cannot be natively included in generated data classes.

The `Schema` class will be a read-only dataclass. The following shows its basic
definition, though the concrete definition may have a slightly different
implementation and/or additional helper methods.

```python
from collections.abc import Mapping
from dataclasses import dataclass, field
from typing import Self

# ShapeID, ShapeType, MemberSchema, and DocumentValue are not defined in this
# snippet.


@dataclass(kw_only=True, frozen=True)
class Schema:
    id: ShapeID
    shape_type: ShapeType
    traits: dict[ShapeID, "Trait"] = field(default_factory=dict)
    members: dict[str, "Schema"] = field(default_factory=dict)
    member_target: "Schema | None" = None
    member_index: int | None = None

    @classmethod
    def collection(
        cls,
        *,
        id: ShapeID,
        shape_type: ShapeType = ShapeType.STRUCTURE,
        traits: list["Trait"] | None = None,
        members: Mapping[str, "MemberSchema"] | None = None,
    ) -> Self:
        ...


@dataclass(kw_only=True, frozen=True)
class Trait:
    id: "ShapeID"
    value: "DocumentValue" = field(default_factory=dict)
```

Below is an example Smithy `structure` shape, followed by the `Schema` it would
generate.

```smithy
namespace com.example

structure ExampleStructure {
    member: Integer = 0
}
```

```python
EXAMPLE_STRUCTURE_SCHEMA = Schema.collection(
    id=ShapeID("com.example#ExampleStructure"),
    members={
        "member": {
            "target": INTEGER,
            "index": 0,
            "traits": [
                Trait(id=ShapeID("smithy.api#default"), value=0),
            ],
        },
    },
)
```

## Shape Serializers and Serializeable Shapes

Serialization works through the interaction of two interfaces:
`ShapeSerializer`s and `SerializeableShape`s.

A `ShapeSerializer` is a class that is capable of taking a `Schema` and an
associated shape value and serializing it in some way. For example, a
`JSONShapeSerializer` could be written in Python to convert the shape to JSON.

A `SerializeableShape` is a class that has a `serialize` method that takes a
`ShapeSerializer` and calls the serializer methods needed to serialize itself.
All generated shapes will implement the `SerializeableShape` interface, which
will then be the method by which all serialization is performed.

Using open interfaces in this way allows for great flexibility in the generated
Python code, which will be discussed more later.

In Python, these interfaces will be represented as `Protocol`s, as shown below:

```python
import datetime
from collections.abc import Callable
from contextlib import AbstractContextManager
from decimal import Decimal
from typing import Protocol, runtime_checkable


@runtime_checkable
class ShapeSerializer(Protocol):

    def begin_struct(
        self, schema: "Schema"
    ) -> AbstractContextManager["ShapeSerializer"]:
        ...

    def write_struct(self, schema: "Schema", struct: "SerializeableStruct") -> None:
        with self.begin_struct(schema=schema) as struct_serializer:
            struct.serialize_members(struct_serializer)

    def begin_list(self, schema: "Schema") -> AbstractContextManager["ShapeSerializer"]:
        ...

    def begin_map(self, schema: "Schema") -> AbstractContextManager["MapSerializer"]:
        ...

    def write_null(self, schema: "Schema") -> None:
        ...

    def write_boolean(self, schema: "Schema", value: bool) -> None:
        ...

    def write_byte(self, schema: "Schema", value: int) -> None:
        self.write_integer(schema, value)

    def write_short(self, schema: "Schema", value: int) -> None:
        self.write_integer(schema, value)

    def write_integer(self, schema: "Schema", value: int) -> None:
        ...

    def write_long(self, schema: "Schema", value: int) -> None:
        self.write_integer(schema, value)

    def write_float(self, schema: "Schema", value: float) -> None:
        ...

    def write_double(self, schema: "Schema", value: float) -> None:
        self.write_float(schema, value)

    def write_big_integer(self, schema: "Schema", value: int) -> None:
        self.write_integer(schema, value)

    def write_big_decimal(self, schema: "Schema", value: Decimal) -> None:
        ...

    def write_string(self, schema: "Schema", value: str) -> None:
        ...

    def write_blob(self, schema: "Schema", value: bytes) -> None:
        ...

    def write_timestamp(self, schema: "Schema", value: datetime.datetime) -> None:
        ...

    def write_document(self, schema: "Schema", value: "Document") -> None:
        ...


@runtime_checkable
class MapSerializer(Protocol):
    def entry(self, key: str, value_writer: Callable[[ShapeSerializer], None]) -> None:
        ...


@runtime_checkable
class SerializeableShape(Protocol):
    def serialize(self, serializer: ShapeSerializer) -> None:
        ...


@runtime_checkable
class SerializeableStruct(SerializeableShape, Protocol):
    def serialize_members(self, serializer: ShapeSerializer) -> None:
        ...
```

Below is an example Smithy `structure` shape, followed by the
`SerializeableShape` it would generate.

```smithy
namespace com.example

structure ExampleStructure {
    member: Integer = 0
}
```

```python
@dataclass(kw_only=True)
class ExampleStructure:
    member: int = 0

    def serialize(self, serializer: ShapeSerializer):
        serializer.write_struct(EXAMPLE_STRUCTURE_SCHEMA, self)

    def serialize_members(self, serializer: ShapeSerializer):
        serializer.write_integer(
            EXAMPLE_STRUCTURE_SCHEMA.members["member"], self.member
        )
```

### Performing Serialization

To serialize a shape, all that is needed is an instance of the shape and a
serializer. The following shows how one might serialize a shape to JSON bytes:

```python
>>> shape = ExampleStructure(member=9)
>>> serializer = JSONShapeSerializer()
>>> shape.serialize(serializer)
>>> print(serializer.get_result())
b'{"member":9}'
```

At a high level, the process for performing serialization never changes.
Different implementations (such as for XML, CBOR, etc.) will all interact with
the shape in exactly the same way. The same interface will be used to implement
HTTP bindings, event stream bindings, and any other sort of model-driven data
binding that may be needed.

These implementations can be swapped at any time without having to regenerate
the client, and can be used for purposes other than making client calls to a
service. A service could, for example, model its event structures and include
them in its client. A customer could then use the generated
`SerializeableShape`s to serialize those events without having to do so
manually.

### Composing Serializers

While simple `ShapeSerializer`s can exist, the need to bind data to multiple
locations or with conditional formatting may mean that a single
`ShapeSerializer` is not sufficient to implement a protocol, or even a
content type. Instead, more complex protocols should *compose* multiple
`ShapeSerializer`s to achieve their intended purpose. The
`InterceptingSerializer` class aims, in part, to make this easier.

```python
class InterceptingSerializer(ShapeSerializer, metaclass=ABCMeta):
    @abstractmethod
    def before(self, schema: Schema) -> ShapeSerializer: ...

    @abstractmethod
    def after(self, schema: Schema) -> None: ...

    def write_boolean(self, schema: Schema, value: bool) -> None:
        self.before(schema).write_boolean(schema, value)
        self.after(schema)

    [...]
```

The `before` method allows for dispatching to different serializers depending on
the schema. You may dispatch to different serializers depending on whether the
shape is bound to an HTTP header or query string, for example.

```python
class HTTPBindingSerializer(InterceptingSerializer):
    _header_serializer: ShapeSerializer
    _query_serializer: ShapeSerializer

    def before(self, schema: Schema) -> ShapeSerializer:
        if HTTP_HEADER_TRAIT in schema.traits:
            return self._header_serializer
        elif HTTP_QUERY_TRAIT in schema.traits:
            return self._query_serializer
        ...
```
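
To show the dispatch end to end, here is a self-contained toy version of the same pattern. It is a sketch under simplifying assumptions: `ToySchema` carries trait IDs as plain strings, only `write_string` is implemented, and raising `ValueError` for an unbound member is a choice made for this example rather than part of the design.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

HTTP_HEADER_TRAIT = "smithy.api#httpHeader"
HTTP_QUERY_TRAIT = "smithy.api#httpQuery"


@dataclass(frozen=True)
class ToySchema:
    """Hypothetical stand-in for Schema: a name plus a set of trait IDs."""
    name: str
    traits: frozenset[str] = frozenset()


class KeyValueSerializer:
    """Toy delegate that stores each written string under its schema name."""

    def __init__(self) -> None:
        self.values: dict[str, str] = {}

    def write_string(self, schema: ToySchema, value: str) -> None:
        self.values[schema.name] = value


class ToyInterceptingSerializer(ABC):
    @abstractmethod
    def before(self, schema: ToySchema) -> KeyValueSerializer: ...

    @abstractmethod
    def after(self, schema: ToySchema) -> None: ...

    def write_string(self, schema: ToySchema, value: str) -> None:
        # Pick a delegate, let it write, then run any post-write hook.
        self.before(schema).write_string(schema, value)
        self.after(schema)


class ToyHTTPBindingSerializer(ToyInterceptingSerializer):
    def __init__(self) -> None:
        self.headers = KeyValueSerializer()
        self.query = KeyValueSerializer()

    def before(self, schema: ToySchema) -> KeyValueSerializer:
        if HTTP_HEADER_TRAIT in schema.traits:
            return self.headers
        if HTTP_QUERY_TRAIT in schema.traits:
            return self.query
        raise ValueError(f"No HTTP binding for member {schema.name!r}")

    def after(self, schema: ToySchema) -> None:
        pass  # nothing to finalize in this toy example


serializer = ToyHTTPBindingSerializer()
serializer.write_string(
    ToySchema("X-Request-Id", frozenset({HTTP_HEADER_TRAIT})), "abc"
)
serializer.write_string(ToySchema("page", frozenset({HTTP_QUERY_TRAIT})), "2")
print(serializer.headers.values)  # {'X-Request-Id': 'abc'}
print(serializer.query.values)    # {'page': '2'}
```

The caller only ever sees one serializer. Routing by trait happens inside `before`, so supporting another binding location means adding a delegate and one more branch.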

Since each of these sub-serializers may only be able to handle shapes of a
certain type, they may want to inherit from `SpecificShapeSerializer`, which
raises an error by default for shape types whose `write_*` method is not
overridden.

```python
class HTTPHeaderSerializer(SpecificShapeSerializer):
    def write_boolean(self, schema: "Schema", value: bool) -> None:
        ...

    [...]
```
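
`SpecificShapeSerializer` itself is not shown in this document. One plausible shape for it, sketched here with hypothetical `Toy*` names, is a base class whose write methods all raise and a subclass that overrides only what it supports. The use of `NotImplementedError`, the reduced set of methods, and the plain-string schemas are assumptions made for this example.

```python
class ToySpecificShapeSerializer:
    """Every write method raises unless a subclass overrides it."""

    def _unsupported(self, kind: str, schema: str) -> NotImplementedError:
        return NotImplementedError(
            f"{type(self).__name__} cannot serialize {kind} member {schema!r}"
        )

    def write_boolean(self, schema: str, value: bool) -> None:
        raise self._unsupported("boolean", schema)

    def write_integer(self, schema: str, value: int) -> None:
        raise self._unsupported("integer", schema)

    def write_string(self, schema: str, value: str) -> None:
        raise self._unsupported("string", schema)


class ToyHeaderSerializer(ToySpecificShapeSerializer):
    """Overrides only the shape types this toy header format supports."""

    def __init__(self) -> None:
        self.headers: dict[str, str] = {}

    def write_boolean(self, schema: str, value: bool) -> None:
        self.headers[schema] = "true" if value else "false"

    def write_string(self, schema: str, value: str) -> None:
        self.headers[schema] = value


serializer = ToyHeaderSerializer()
serializer.write_boolean("X-Dry-Run", True)
print(serializer.headers)  # {'X-Dry-Run': 'true'}

try:
    serializer.write_integer("X-Count", 3)  # not overridden, so it raises
except NotImplementedError as error:
    message = str(error)
    print(message)
```

Failing loudly in the base class means an unsupported binding surfaces as an error at serialization time instead of silently dropping the member.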
