Commit 7152431

Add deserialization design doc
1 parent 25bfb64 commit 7152431

1 file changed (+183, -0 lines)

designs/serialization.md

@@ -302,3 +302,186 @@ class HTTPHeaderSerializer(SpecificShapeSerializer):
[...]
```

## Shape Deserializers and Deserializeable Shapes

Deserialization will function very similarly to serialization, through the
interaction of two interfaces: `ShapeDeserializer` and `DeserializeableShape`.

A `ShapeDeserializer` is a class that is given a data source and provides
methods to extract typed data from it when given a schema. For example, a
`JSONShapeDeserializer` could be written that is constructed with JSON bytes and
allows a caller to convert them into a shape.

A `DeserializeableShape` is a class that has a `deserialize` class method that
takes a `ShapeDeserializer` and calls the methods needed to construct an
instance of itself. All generated shapes will implement the
`DeserializeableShape` interface, making it the single mechanism through which
all deserialization is performed.

In Python, these interfaces will be represented as `Protocol`s, as shown below:

```python
import datetime
from decimal import Decimal
from typing import Callable, Protocol, Self, runtime_checkable

# "Schema" and "Document" are quoted forward references to types defined
# elsewhere in the package.


@runtime_checkable
class ShapeDeserializer(Protocol):

    def read_struct(
        self,
        schema: "Schema",
        consumer: Callable[["Schema", "ShapeDeserializer"], None],
    ) -> None:
        ...

    def read_list(
        self, schema: "Schema", consumer: Callable[["ShapeDeserializer"], None]
    ) -> None:
        ...

    def read_map(
        self,
        schema: "Schema",
        consumer: Callable[[str, "ShapeDeserializer"], None],
    ) -> None:
        ...

    def is_null(self) -> bool:
        ...

    def read_null(self) -> None:
        ...

    def read_boolean(self, schema: "Schema") -> bool:
        ...

    def read_blob(self, schema: "Schema") -> bytes:
        ...

    # Python has a single arbitrary-precision int type, so byte, short, long,
    # and big integer reads all default to read_integer.
    def read_byte(self, schema: "Schema") -> int:
        return self.read_integer(schema)

    def read_short(self, schema: "Schema") -> int:
        return self.read_integer(schema)

    def read_integer(self, schema: "Schema") -> int:
        ...

    def read_long(self, schema: "Schema") -> int:
        return self.read_integer(schema)

    def read_float(self, schema: "Schema") -> float:
        ...

    # Python's float is double precision, so doubles default to read_float.
    def read_double(self, schema: "Schema") -> float:
        return self.read_float(schema)

    def read_big_integer(self, schema: "Schema") -> int:
        return self.read_integer(schema)

    def read_big_decimal(self, schema: "Schema") -> Decimal:
        ...

    def read_string(self, schema: "Schema") -> str:
        ...

    def read_document(self, schema: "Schema") -> "Document":
        ...

    def read_timestamp(self, schema: "Schema") -> datetime.datetime:
        ...


@runtime_checkable
class DeserializeableShape(Protocol):
    @classmethod
    def deserialize(cls, deserializer: ShapeDeserializer) -> Self:
        ...
```

Below is an example Smithy `structure` shape, followed by the
`DeserializeableShape` it would generate.

```smithy
$version: "2"

namespace com.example

structure ExampleStructure {
    member: Integer = 0
}
```

```python
import logging
from dataclasses import dataclass
from typing import Any, Self

# Schema, ShapeDeserializer, and the generated schema constant come from
# elsewhere in the generated module.
logger = logging.getLogger(__name__)


@dataclass(kw_only=True)
class ExampleStructure:
    member: int = 0

    @classmethod
    def deserialize(cls, deserializer: ShapeDeserializer) -> Self:
        return cls(**cls.deserialize_kwargs(deserializer))

    @classmethod
    def deserialize_kwargs(cls, deserializer: ShapeDeserializer) -> dict[str, Any]:
        # Only members present in the data source end up here, so absent
        # members fall back to the dataclass defaults.
        kwargs: dict[str, Any] = {}

        # Called by the deserializer once per member, in source order.
        def _consumer(schema: Schema, de: ShapeDeserializer) -> None:
            match schema.expect_member_index():
                case 0:
                    kwargs["member"] = de.read_integer(
                        _SCHEMA_CLIENT_OPTIONAL_DEFAULTS.members["member"]
                    )

                case _:
                    logger.debug(f"Unexpected member schema: {schema}")

        deserializer.read_struct(_SCHEMA_CLIENT_OPTIONAL_DEFAULTS, consumer=_consumer)
        return kwargs
```

For structures, arguments are built up in a `kwargs` dictionary, which is later
expanded to construct the final type. Other languages might use a builder
pattern instead, but builders are atypical in Python, so this is a middle-ground
approach that should be familiar to Python users.

Member dispatch is currently based on the "member index", which represents the
member's position on the shape in the Smithy model itself. (Note that this is
not always the same as the ordering of the members in the `members` dictionary:
recursive members are added at the end, regardless of where they appear in the
model.)
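
As a rough illustration (the member names and the way the dictionary is assembled here are assumptions, not the generator's actual code), consider a structure whose second member in the model is recursive. Its member index still reflects its model position even though it ends up last in the members dictionary:

```python
# Members in the order they are declared in a hypothetical Smithy model.
model_order = ["name", "child", "count"]
# Assume "child" targets the containing structure, making it recursive.
recursive = {"child"}

# Member index: the position of each member in the model.
member_index = {name: i for i, name in enumerate(model_order)}

# Members dictionary: non-recursive members first, recursive ones appended.
members = {
    name: member_index[name]
    for name in [m for m in model_order if m not in recursive]
    + [m for m in model_order if m in recursive]
}

for position, (name, index) in enumerate(members.items()):
    print(f"{name}: dict position {position}, member index {index}")
```

Dispatching on `members` position would send `child` to the wrong branch; dispatching on the member index does not depend on dictionary order at all.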

Doing member dispatch this way is a micro-optimization brought over from Java,
which uses a relatively cheap integer comparison instead of the more expensive
string comparison needed to dispatch on the member name. Further testing is
needed in Python to determine whether the performance benefit justifies the
extra artifact size. In Java, the compiler can also turn an integer switch into
a jump table, which CPython does not do. This further reduces the potential
performance benefit of the approach in Python.

It is important to note that the general approach to handling members differs
from serialization. Serialization needs no callback functions, but
deserialization does. The reason is that deserializers must handle members in
the order they are presented in the data source, without any intermediate
structure to pull members from. The shape class can't simply iterate through
its members in whatever order it likes and check whether each one is present,
because the only member it can ever know about is the *next* one.
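
The constraint can be seen with a hypothetical forward-only reader, where members arrive as an iterator of `(name, value)` pairs that can only be consumed once. The shape has no way to ask whether a given member is present; it can only react to each member as the reader hands it over, which is what a callback allows. `ForwardOnlyStructReader` and its interface are illustrative assumptions, not part of the proposed API.

```python
from typing import Any, Callable, Iterator


class ForwardOnlyStructReader:
    """Hypothetical reader over a one-pass stream of (member name, value) pairs."""

    def __init__(self, pairs: Iterator[tuple[str, Any]]) -> None:
        self._pairs = pairs

    def read_struct(self, consumer: Callable[[str, Any], None]) -> None:
        # Each member is seen exactly once, in the order the data source emits it.
        for name, value in self._pairs:
            consumer(name, value)


# Suppose the shape declares members "a" then "b", but the data emits "b" first.
stream = iter([("b", 2), ("a", 1)])
reader = ForwardOnlyStructReader(stream)

seen: list[str] = []
values: dict[str, Any] = {}


def consumer(name: str, value: Any) -> None:
    seen.append(name)
    values[name] = value


reader.read_struct(consumer)
print(seen)  # ['b', 'a']: source order, not declaration order

# The stream is now exhausted; a second pass looking for "a" finds nothing.
print(next(stream, None))  # None
```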

### Performing Deserialization

Deserialization works much like serialization: all that is needed is a
deserializer and a class to deserialize into. The following shows how one might
deserialize a shape from JSON bytes:

```python
>>> deserializer = JSONShapeDeserializer(b'{"member":9}')
>>> print(ExampleStructure.deserialize(deserializer))
ExampleStructure(member=9)
```

Just like with serialization, the process for performing deserialization never
changes at a high level. Different implementations all interact with the shape
in exactly the same way. The same interface will be used for HTTP bindings,
event stream bindings, and any other model-driven data binding that may be
needed.

These implementations can be swapped at any time without regenerating the
client, and they can be used for purposes other than receiving responses from a
client call to a service. A service could, for example, model its event
structures and include them in its client. A customer could then use the
generated `DeserializeableShape`s to deserialize those events into Python types
when they're received, without having to do so manually.
