Commit 821eef9

Add deserialization design doc
1 parent 133df1e commit 821eef9

1 file changed: +183 -0 lines changed

designs/serialization.md

[...]
```

## Shape Deserializers and Deserializeable Shapes

Deserialization will function very similarly to serialization, through the
interaction of two interfaces: `ShapeDeserializer` and `DeserializeableShape`.

A `ShapeDeserializer` is a class that wraps a data source and, given a schema,
provides methods to extract typed data from it. For example, a
`JSONShapeDeserializer` could be constructed from JSON bytes and let a caller
convert that data into a shape.

A `DeserializeableShape` is a class with a `deserialize` class method that takes a
`ShapeDeserializer` and calls the methods needed to deserialize itself. All
generated shapes will implement the `DeserializeableShape` interface, which will
then be the entry point for all deserialization.

In Python, these interfaces will be represented as `Protocol`s, as shown below:

```python
import datetime
from collections.abc import Callable
from decimal import Decimal
from typing import Protocol, Self, runtime_checkable

# "Schema" and "Document" are defined elsewhere in this design and are used
# here as forward references.


@runtime_checkable
class ShapeDeserializer(Protocol):

    def read_struct(
        self,
        schema: "Schema",
        consumer: Callable[["Schema", "ShapeDeserializer"], None],
    ) -> None:
        # Invokes `consumer` with each member's schema and a deserializer for
        # that member's value, in the order the members appear in the source.
        ...

    def read_list(
        self, schema: "Schema", consumer: Callable[["ShapeDeserializer"], None]
    ) -> None:
        ...

    def read_map(
        self,
        schema: "Schema",
        consumer: Callable[[str, "ShapeDeserializer"], None],
    ) -> None:
        ...

    def is_null(self) -> bool:
        ...

    def read_null(self) -> None:
        ...

    def read_boolean(self, schema: "Schema") -> bool:
        ...

    def read_blob(self, schema: "Schema") -> bytes:
        ...

    # Python's `int` covers every Smithy integer type, so the narrower and
    # wider integer readers default to `read_integer`.
    def read_byte(self, schema: "Schema") -> int:
        return self.read_integer(schema)

    def read_short(self, schema: "Schema") -> int:
        return self.read_integer(schema)

    def read_integer(self, schema: "Schema") -> int:
        ...

    def read_long(self, schema: "Schema") -> int:
        return self.read_integer(schema)

    def read_float(self, schema: "Schema") -> float:
        ...

    # Python's `float` is double precision, so doubles default to `read_float`.
    def read_double(self, schema: "Schema") -> float:
        return self.read_float(schema)

    def read_big_integer(self, schema: "Schema") -> int:
        return self.read_integer(schema)

    def read_big_decimal(self, schema: "Schema") -> Decimal:
        ...

    def read_string(self, schema: "Schema") -> str:
        ...

    def read_document(self, schema: "Schema") -> "Document":
        ...

    def read_timestamp(self, schema: "Schema") -> datetime.datetime:
        ...


@runtime_checkable
class DeserializeableShape(Protocol):
    @classmethod
    def deserialize(cls, deserializer: ShapeDeserializer) -> Self:
        ...
```
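
Giving some protocol methods default bodies has a consequence worth spelling out. In Python, a class inherits those defaults only if it explicitly subclasses the protocol. A class that merely matches the protocol structurally inherits nothing, and because `@runtime_checkable` `isinstance` checks look for every protocol member (without checking signatures), such a class also fails `isinstance(obj, ShapeDeserializer)` unless it defines `read_byte`, `read_long`, and the other defaulted methods itself. The sketch below shows this with a hypothetical, trimmed-down `NumericReader` protocol rather than the full `ShapeDeserializer`.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class NumericReader(Protocol):
    # Hypothetical slice of ShapeDeserializer: one required method plus one
    # method with a default body.
    def read_integer(self, schema: object) -> int: ...

    def read_long(self, schema: object) -> int:
        return self.read_integer(schema)


class ExplicitReader(NumericReader):
    # Explicitly subclassing the protocol inherits the default read_long.
    def read_integer(self, schema: object) -> int:
        return 42


class StructuralReader:
    # Matches only structurally, so nothing is inherited from the protocol.
    def read_integer(self, schema: object) -> int:
        return 42


print(ExplicitReader().read_long(None))               # 42
print(isinstance(ExplicitReader(), NumericReader))    # True
print(isinstance(StructuralReader(), NumericReader))  # False: no read_long
```

This suggests that concrete deserializers should probably subclass `ShapeDeserializer` explicitly, although that is an implementation choice rather than something stated by this design.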

Below is an example Smithy `structure` shape, followed by the
`DeserializeableShape` it would generate.

```smithy
$version: "2"

namespace com.example

structure ExampleStructure {
    member: Integer = 0
}
```

```python
import logging
from dataclasses import dataclass
from typing import Any, Self

# `Schema`, `ShapeDeserializer`, and the generated schema constant
# `_SCHEMA_CLIENT_OPTIONAL_DEFAULTS` are imported from elsewhere in the
# generated package.
logger = logging.getLogger(__name__)


@dataclass(kw_only=True)
class ExampleStructure:
    member: int = 0

    @classmethod
    def deserialize(cls, deserializer: ShapeDeserializer) -> Self:
        return cls(**cls.deserialize_kwargs(deserializer))

    @classmethod
    def deserialize_kwargs(cls, deserializer: ShapeDeserializer) -> dict[str, Any]:
        kwargs: dict[str, Any] = {}

        # Called by the deserializer once for each member it encounters.
        def _consumer(schema: Schema, de: ShapeDeserializer) -> None:
            match schema.expect_member_index():
                case 0:
                    kwargs["member"] = de.read_integer(
                        _SCHEMA_CLIENT_OPTIONAL_DEFAULTS.members["member"]
                    )

                case _:
                    logger.debug(f"Unexpected member schema: {schema}")

        deserializer.read_struct(_SCHEMA_CLIENT_OPTIONAL_DEFAULTS, consumer=_consumer)
        return kwargs
```

For structures, arguments are built up in a `kwargs` dictionary, which is then
expanded to construct the final type. Other languages might use a builder
pattern instead, but builders are atypical in Python, so this is a middle-ground
approach that should be familiar to Python users.

Member dispatch is currently based on the "member index", which is a
representation of the member's position on the shape in the Smithy model itself.
(Note that this is not always the same as the ordering of the members in the
`members` dictionary: recursive members are added at the end, regardless of where
they appear in the model.)
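
To make the distinction concrete, here is a hypothetical illustration (not the generator's actual code) of a `members` dictionary built with recursive members moved to the end. The `MemberSchema` class and its `recursive` flag are stand-ins invented for this sketch. Because dictionary order and model order diverge, dispatch has to use each member's stored index rather than its position in the dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemberSchema:
    # Hypothetical stand-in with only the fields this illustration needs.
    name: str
    member_index: int  # position of the member in the Smithy model
    recursive: bool = False


# Members in model order, e.g. a linked-list-like structure whose `next`
# member targets the containing shape.
model_order = [
    MemberSchema("value", 0),
    MemberSchema("next", 1, recursive=True),
    MemberSchema("label", 2),
]

# Non-recursive members first, recursive members appended last.
members = {m.name: m for m in model_order if not m.recursive}
members |= {m.name: m for m in model_order if m.recursive}

print(list(members))                               # ['value', 'label', 'next']
print([m.member_index for m in members.values()])  # [0, 2, 1]
```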

Dispatching on the member index is a micro-optimization brought over from Java,
where relatively cheap integer comparison replaces the more expensive string
comparison needed to dispatch on member names. Further testing is needed in
Python to determine whether the performance gain justifies the extra artifact
size. In Java, the compiler can also turn an integer switch into a jump table,
which CPython does not do. This further reduces the potential performance
benefit of the approach in Python.

It is important to note that the general approach to handling members differs
from serialization. Serialization needs no callback functions, but
deserialization does. The reason is that deserializers must handle members in
the order they are presented in the data source, without any intermediate
structure to pull members from. The shape class can't simply iterate through its
members in whatever order it likes and check whether each one is present,
because the only member it ever knows about is the *next* one.
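
A stripped-down sketch of that pull-based flow follows, assuming a hypothetical `read_struct` function over a stream of `(member name, raw value)` pairs. The deserializer drives the loop and hands each member to the shape's callback as it arrives, so the callback order follows the data source, and a member that never appears simply never triggers a call.

```python
from collections.abc import Callable, Iterator
from typing import Any


def read_struct(
    stream: Iterator[tuple[str, Any]], consumer: Callable[[str, Any], None]
) -> None:
    # Hypothetical: the deserializer only ever sees the next pair in the
    # stream, so it forwards each member to the callback as soon as it is read.
    for name, raw in stream:
        consumer(name, raw)


seen: list[str] = []
kwargs: dict[str, Any] = {}


def consumer(name: str, raw: Any) -> None:
    # The shape reacts to whichever member arrives; it cannot request one.
    seen.append(name)
    kwargs[name] = raw


read_struct(iter([("zeta", 2), ("alpha", 1)]), consumer)
print(seen)    # ['zeta', 'alpha']
print(kwargs)  # {'zeta': 2, 'alpha': 1}
```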

### Performing Deserialization

Deserialization works much like serialization: all that is needed is a
deserializer and a class to deserialize into. The following shows how one might
deserialize a shape from JSON bytes:

```python
>>> deserializer = JSONShapeDeserializer(b'{"member":9}')
>>> print(ExampleStructure.deserialize(deserializer))
ExampleStructure(member=9)
```

Just like with serialization, the high-level process for performing
deserialization never changes. Different implementations all interact with the
shape in exactly the same way. The same interface will be used for HTTP
bindings, event stream bindings, and any other model-driven data binding that
may be needed.

These implementations can be swapped at any time without regenerating the
client, and they can be used for purposes other than receiving responses from a
client call to a service. A service could, for example, model its event
structures and include them in its client. A customer could then use the
generated `DeserializeableShape`s to deserialize those events into Python types
when they're received, without having to do so manually.
