Commit 94ed292 (parent 08bf50d): Add deserialization design doc

designs/serialization.md: 183 additions, 0 deletions

@@ -340,3 +340,186 @@ class HTTPHeaderSerializer(SpecificShapeSerializer):

[...]
```

## Shape Deserializers and Deserializeable Shapes

Deserialization will work much like serialization, through the interaction of
two interfaces: `ShapeDeserializer` and `DeserializeableShape`.

A `ShapeDeserializer` is a class that is given a data source and provides
methods to extract typed data from it when given a schema. For example, a
`JSONShapeDeserializer` could be constructed with JSON bytes and would allow a
caller to convert them into a shape.

A `DeserializeableShape` is a class that has a `deserialize` class method that
takes a `ShapeDeserializer` and calls the deserializer methods needed to
populate the shape. All generated shapes will implement the
`DeserializeableShape` interface, which will then be the mechanism through
which all deserialization is performed.

In Python these interfaces will be represented as shown below:

```python
import datetime
from decimal import Decimal
from typing import Callable, Protocol, Self, runtime_checkable

# `Schema` and `Document` are defined elsewhere in the runtime and are
# referenced here as forward (string) annotations.


@runtime_checkable
class ShapeDeserializer(Protocol):

    # Calls `consumer` for each member, in the order the data source presents it.
    def read_struct(
        self,
        schema: "Schema",
        consumer: Callable[["Schema", "ShapeDeserializer"], None],
    ) -> None:
        ...

    def read_list(
        self, schema: "Schema", consumer: Callable[["ShapeDeserializer"], None]
    ) -> None:
        ...

    def read_map(
        self,
        schema: "Schema",
        consumer: Callable[[str, "ShapeDeserializer"], None],
    ) -> None:
        ...

    def is_null(self) -> bool:
        ...

    def read_null(self) -> None:
        ...

    def read_boolean(self, schema: "Schema") -> bool:
        ...

    def read_blob(self, schema: "Schema") -> bytes:
        ...

    # Python has a single arbitrary-precision `int`, so the other integer
    # types default to `read_integer`. These default bodies are inherited only
    # by classes that explicitly subclass `ShapeDeserializer`.
    def read_byte(self, schema: "Schema") -> int:
        return self.read_integer(schema)

    def read_short(self, schema: "Schema") -> int:
        return self.read_integer(schema)

    def read_integer(self, schema: "Schema") -> int:
        ...

    def read_long(self, schema: "Schema") -> int:
        return self.read_integer(schema)

    def read_float(self, schema: "Schema") -> float:
        ...

    # Python's `float` is double precision, so doubles reuse `read_float`.
    def read_double(self, schema: "Schema") -> float:
        return self.read_float(schema)

    def read_big_integer(self, schema: "Schema") -> int:
        return self.read_integer(schema)

    def read_big_decimal(self, schema: "Schema") -> Decimal:
        ...

    def read_string(self, schema: "Schema") -> str:
        ...

    def read_document(self, schema: "Schema") -> "Document":
        ...

    def read_timestamp(self, schema: "Schema") -> datetime.datetime:
        ...


@runtime_checkable
class DeserializeableShape(Protocol):
    @classmethod
    def deserialize(cls, deserializer: ShapeDeserializer) -> Self:
        ...
```

Below is an example Smithy `structure` shape, followed by the
`DeserializeableShape` it would generate.

```smithy
$version: "2"

namespace com.example

structure ExampleStructure {
    member: Integer = 0
}
```

```python
import logging
from dataclasses import dataclass
from typing import Any, Self

# `Schema`, `ShapeDeserializer`, and the generated schema constant
# `_SCHEMA_CLIENT_OPTIONAL_DEFAULTS` are assumed to be defined elsewhere in
# the generated module and runtime.

logger = logging.getLogger(__name__)


@dataclass(kw_only=True)
class ExampleStructure:
    member: int = 0

    @classmethod
    def deserialize(cls, deserializer: ShapeDeserializer) -> Self:
        return cls(**cls.deserialize_kwargs(deserializer))

    @classmethod
    def deserialize_kwargs(cls, deserializer: ShapeDeserializer) -> dict[str, Any]:
        kwargs: dict[str, Any] = {}

        # Called by the deserializer once for each member found in the data.
        def _consumer(schema: Schema, de: ShapeDeserializer) -> None:
            match schema.expect_member_index():
                case 0:
                    kwargs["member"] = de.read_integer(
                        _SCHEMA_CLIENT_OPTIONAL_DEFAULTS.members["member"]
                    )

                case _:
                    logger.debug(f"Unexpected member schema: {schema}")

        deserializer.read_struct(_SCHEMA_CLIENT_OPTIONAL_DEFAULTS, consumer=_consumer)
        return kwargs
```

For structures, arguments are built up in a `kwargs` dictionary, which is then
expanded to construct the final type. Other languages might use a builder
pattern instead, but builders are uncommon in Python, so this is a middle-ground
approach that should be familiar to Python users.

Member dispatch is currently based on the "member index", which represents the
member's position within the shape as defined in the Smithy model. (Note that
this is not always the same as the order of the members in the schema's
`members` dictionary: recursive members are added at the end, regardless of
where they appear in the model.)

Dispatching on the member index is a micro-optimization carried over from Java,
where comparing integers is relatively cheap compared to the string comparison
needed to dispatch on the member name. Further testing is needed in Python to
determine whether the performance gain justifies the extra artifact size. In
Java, the compiler is also capable of turning an integer `switch` into a jump
table, which CPython does not do. This further reduces the potential
performance benefit of the approach in Python.

Note that the general approach to handling members differs from
serialization. Serialization needs no callback functions, but deserialization
does. Deserializers must handle members in the order they are presented in the
data source, without any intermediate structure to pull members from. The shape
class can't simply iterate through its members in whatever order it likes and
check whether each one is present, because the only member the deserializer
ever knows about is the *next* one.
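
The following sketch contrasts the two styles using a one-pass generator as a stand-in for a streaming parser (`member_stream` and `read_struct` are hypothetical). Pushing each member to a callback works in any order, while pulling a specific member first consumes, and loses, whatever came before it.

```python
from collections.abc import Callable, Iterator
from typing import Any


def member_stream(data: list[tuple[str, Any]]) -> Iterator[tuple[str, Any]]:
    """Hypothetical one-pass source: each member can be read exactly once."""
    yield from data


def read_struct(
    stream: Iterator[tuple[str, Any]], consumer: Callable[[str, Any], None]
) -> None:
    # Push style: hand each member to the consumer in the order it arrives.
    for name, value in stream:
        consumer(name, value)


received: dict[str, Any] = {}
read_struct(member_stream([("b", 2), ("a", 1)]), received.__setitem__)
print(received)  # {'b': 2, 'a': 1}

# Pull style: asking for "a" first walks past "b", which is consumed and lost.
stream = member_stream([("b", 2), ("a", 1)])
found_a = next(value for name, value in stream if name == "a")
leftover = list(stream)
print(found_a, leftover)  # 1 []
```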

### Performing Deserialization

Deserialization works much like serialization: all that is needed is a
deserializer and a class to deserialize into. The following shows how one might
deserialize a shape from JSON bytes:

```python
>>> deserializer = JSONShapeDeserializer(b'{"member":9}')
>>> print(ExampleStructure.deserialize(deserializer))
ExampleStructure(member=9)
```

Just as with serialization, the high-level process for performing
deserialization never changes. Different implementations all interact with the
shape in exactly the same way. The same interface will be used for HTTP
bindings, event stream bindings, and any other model-driven data binding that
may be needed.

These implementations can be swapped at any time without regenerating the
client, and they can be used for purposes other than receiving responses from a
client call to a service. A service could, for example, model its event
structures and include them in its client. A customer could then use the
generated `DeserializeableShape`s to deserialize those events into Python types
as they are received, without having to do so manually.
