Commit fa0c97c

Add deserialization design doc

designs/serialization.md

[...]
```

## Shape Deserializers and Deserializeable Shapes

Deserialization will function very similarly to serialization, through the
interaction of two interfaces: `ShapeDeserializer` and `DeserializeableShape`.

A `ShapeDeserializer` is a class that is given a data source and provides
methods to extract typed data from it when given a schema. For example, a
`JSONShapeDeserializer` could be constructed from JSON bytes and would allow a
caller to convert them into a shape.

A `DeserializeableShape` is a class that has a `deserialize` method that takes a
`ShapeDeserializer` and calls the relevant methods needed to deserialize it. All
generated shapes will implement the `DeserializeableShape` interface, which will
then be the mechanism through which all deserialization is performed.

In Python, these interfaces will be represented as shown below:

```python
import datetime
from decimal import Decimal
from typing import Any, Callable, Protocol, Self, runtime_checkable

# `Schema` and `Document` are written as forward references; they are not
# defined in this snippet.


@runtime_checkable
class ShapeDeserializer(Protocol):

    def read_struct(
        self,
        schema: "Schema",
        state: dict[str, Any],
        consumer: Callable[["Schema", "ShapeDeserializer", dict[str, Any]], None],
    ) -> None:
        ...

    def read_list(
        self,
        schema: "Schema",
        state: list[Any],
        consumer: Callable[["ShapeDeserializer"], None],
    ) -> None:
        ...

    def read_map(
        self,
        schema: "Schema",
        state: dict[str, Any],
        consumer: Callable[["ShapeDeserializer"], None],
    ) -> None:
        ...

    def is_null(self) -> bool:
        ...

    def read_null(self) -> None:
        ...

    def read_boolean(self, schema: "Schema") -> bool:
        ...

    def read_blob(self, schema: "Schema") -> bytes:
        ...

    # Python has a single arbitrary-precision int type, so the sized integer
    # readers default to read_integer.
    def read_byte(self, schema: "Schema") -> int:
        return self.read_integer(schema)

    def read_short(self, schema: "Schema") -> int:
        return self.read_integer(schema)

    def read_integer(self, schema: "Schema") -> int:
        ...

    def read_long(self, schema: "Schema") -> int:
        return self.read_integer(schema)

    def read_float(self, schema: "Schema") -> float:
        ...

    # Python's float is already double precision, so read_double defaults to
    # read_float.
    def read_double(self, schema: "Schema") -> float:
        return self.read_float(schema)

    def read_big_integer(self, schema: "Schema") -> int:
        return self.read_integer(schema)

    def read_big_decimal(self, schema: "Schema") -> Decimal:
        ...

    def read_string(self, schema: "Schema") -> str:
        ...

    def read_document(self, schema: "Schema") -> "Document":
        ...

    def read_timestamp(self, schema: "Schema") -> datetime.datetime:
        ...


@runtime_checkable
class DeserializeableShape(Protocol):
    @classmethod
    def deserialize(cls, deserializer: ShapeDeserializer) -> Self:
        ...
```

Below is an example Smithy `structure` shape, followed by the
`DeserializeableShape` it would generate.

```smithy
$version: "2"

namespace com.example

structure ExampleStructure {
    member: Integer = 0
}
```

```python
import logging
from dataclasses import dataclass
from typing import Any, Self

# `Schema`, `ShapeDeserializer`, and `_SCHEMA_CLIENT_OPTIONAL_DEFAULTS` are
# assumed to be defined elsewhere in the generated module.
logger = logging.getLogger(__name__)


@dataclass(kw_only=True)
class ExampleStructure:
    member: int = 0

    @classmethod
    def deserialize(cls, deserializer: ShapeDeserializer) -> Self:
        # Member values are collected here, then expanded into the constructor.
        kwargs: dict[str, Any] = {}
        deserializer.read_struct(
            _SCHEMA_CLIENT_OPTIONAL_DEFAULTS,
            state=kwargs,
            consumer=cls._deserialize_kwargs,
        )
        return cls(**kwargs)

    @classmethod
    def _deserialize_kwargs(
        cls,
        schema: Schema,
        de: ShapeDeserializer,
        kwargs: dict[str, Any],
    ) -> None:
        # Called once per member, in the order the data source presents them.
        match schema.expect_member_index():
            case 0:
                kwargs["member"] = de.read_integer(
                    _SCHEMA_CLIENT_OPTIONAL_DEFAULTS.members["member"]
                )

            case _:
                logger.debug(f"Unexpected member schema: {schema}")
```

For structures, arguments are built up in a `kwargs` dictionary, which is later
expanded to construct the final type. Other languages might use a builder
pattern instead, but builders are atypical in Python, so this is a midway
approach that should be familiar to Python users.

The `kwargs` dictionary is passed through the deserializer in order to avoid
having to allocate an anonymous function or use `functools.partial` (which would
need to allocate a `functools.partial` object each time). Lists and maps pass in
pre-constructed containers for the same reason, though, unlike with structures,
the containers are not passed into the value consumer because there is no need
to map value keys in those cases.
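
The allocation that this design avoids can be shown by wiring a consumer to its `kwargs` dictionary in three ways. The two `read_struct_*` functions are hypothetical stand-ins that feed fixed `(name, value)` pairs. Only the third style matches the protocol's approach, in which the state travels as an argument and the consumer can be reused without wrapping.

```python
import functools
from typing import Any, Callable

# Hypothetical (member name, value) pairs produced by a data source.
PAIRS: list[tuple[str, Any]] = [("member", 9)]


def read_struct_without_state(consumer: Callable[[str, Any], None]) -> None:
    for name, value in PAIRS:
        consumer(name, value)


def read_struct_with_state(
    state: dict[str, Any],
    consumer: Callable[[str, Any, dict[str, Any]], None],
) -> None:
    for name, value in PAIRS:
        consumer(name, value, state)


def consume(name: str, value: Any, kwargs: dict[str, Any]) -> None:
    kwargs[name] = value


# 1. Closure: a new function object is created for every struct read.
closure_kwargs: dict[str, Any] = {}
read_struct_without_state(lambda name, value: consume(name, value, closure_kwargs))

# 2. functools.partial: a new partial object is created for every struct read.
partial_kwargs: dict[str, Any] = {}
read_struct_without_state(functools.partial(consume, kwargs=partial_kwargs))

# 3. State passed through: the same function object is reused every time.
state_kwargs: dict[str, Any] = {}
read_struct_with_state(state_kwargs, consume)
```

All three styles produce the same result. They differ only in the per-read allocation that styles 1 and 2 require.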

Member dispatch is currently based on the "member index", which represents the
member's position within the shape in the Smithy model itself. (Note that this
is not always the same as the ordering of the members in the `members`
dictionary: recursive members are added at the end, regardless of where they
appear in the model.)

Dispatching members this way is an optimization: it uses relatively cheap
integer comparison instead of the comparatively more expensive string comparison
that dispatching on the member name would require. Further testing is needed in
Python to determine whether the performance gain justifies the extra artifact
size. In other languages, the compiler can also turn an integer switch into a
jump table, which CPython does not do (though it could in theory).

It is important to note that the general approach to handling members differs
from serialization. No callback functions are needed in serialization, but they
are needed for deserialization. The reason is that deserializers must handle
members in the order they are presented in the data source, without any sort of
intermediate structure to pull members from. The shape class can't simply
iterate through its members in whatever order it likes and check whether each
one is present, because the only member that is ever known about is the *next*
one.

### Performing Deserialization

Deserialization works much like serialization: all that is needed is a
deserializer and a class to deserialize into. The following shows how one might
deserialize a shape from JSON bytes:

```python
>>> deserializer = JSONShapeDeserializer(b'{"member":9}')
>>> print(ExampleStructure.deserialize(deserializer))
ExampleStructure(member=9)
```

Just like with serialization, the process for performing deserialization never
changes at a high level. Different implementations will all interact with the
shape in exactly the same way. The same interface will be used for HTTP
bindings, event stream bindings, and any other sort of model-driven data binding
that may be needed.

These implementations can be swapped at any time without having to regenerate
the client, and they can be used for purposes other than receiving responses
from a client call to a service. A service could, for example, model its event
structures and include them in its client. A customer could then use the
generated `DeserializeableShape`s to deserialize those events into Python types
when they are received, without having to do so manually.
