Excessive attempts at nested union serialization #1861

@davidhewitt

When serializing unions, there is currently a pattern where we first attempt to serialize with "strict" type checking, and then use "lax" type checking on a second pass. As I understand it, this is a way to make subclasses such as bool (subclass of int) serialize properly when in unions containing their parent type.
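
As a rough illustration of the two-pass idea, here is a toy model in plain Python. The helpers `matches` and `pick_union_member` and the class `MyInt` are hypothetical names invented for this sketch. They are not pydantic-core's implementation, and the sketch assumes that "strict" means an exact type match while "lax" also accepts subclasses.

```python
# Toy model only: these helpers are hypothetical, not pydantic-core's code.

def matches(value, tp, strict):
    """Strict: exact type match. Lax: subclasses are accepted too."""
    return type(value) is tp if strict else isinstance(value, tp)


def pick_union_member(value, members):
    """Return the first member that matches, trying strict before lax."""
    for strict in (True, False):
        for tp in members:
            if matches(value, tp, strict):
                return tp
    return None  # a real serializer would fall back to inference here


class MyInt(int):
    """An int subclass that is not itself a member of the union."""


# bool is a subclass of int, so a lax-only check would stop at `int`.
chosen_two_pass = pick_union_member(True, [int, bool])
chosen_lax_only = next(tp for tp in [int, bool] if matches(True, tp, strict=False))

# A subclass with no exact match is only picked up on the lax pass.
chosen_subclass = pick_union_member(MyInt(3), [str, int])

print(chosen_two_pass, chosen_lax_only, chosen_subclass)
```

In this model the strict pass is what lets `True` reach the `bool` member even though `int` is listed first.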

With nested unions, we apply this pattern recursively. I think this leads to redundant serialization attempts, because every level of the union repeats the strict-then-lax sequence on the levels beneath it.
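
To make the cost concrete, the recursion can be modelled as a small counting function. This is only a sketch under the assumptions stated in this issue: a union entered during a strict pass makes one strict attempt, a union entered during a lax pass makes a strict attempt followed by a lax attempt, and every attempt on the innermost value fails. The name `count_leaf_attempts` is hypothetical.

```python
# Toy model only: hypothetical names, not pydantic-core's implementation.

def count_leaf_attempts(depth, parent_is_strict=False):
    """Count how often the innermost (always failing) value is attempted.

    `depth` is the number of nested unions still to be entered.
    """
    if depth == 0:
        return 1  # one attempt at the leaf, e.g. one field serializer call
    if parent_is_strict:
        # Already inside a strict pass: this union only tries strict.
        return count_leaf_attempts(depth - 1, parent_is_strict=True)
    # Inside a lax pass: this union tries strict first, then lax.
    strict_pass = count_leaf_attempts(depth - 1, parent_is_strict=True)
    lax_pass = count_leaf_attempts(depth - 1, parent_is_strict=False)
    return strict_pass + lax_pass


attempts_by_depth = {depth: count_leaf_attempts(depth) for depth in range(1, 5)}
print(attempts_by_depth)  # {1: 2, 2: 3, 3: 4, 4: 5}
```

Under these assumptions the count grows by one per extra level of nesting, so three nested unions give 4 attempts.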

Here's a more complete code example:

import pydantic
from typing import TypedDict

FIELD_A_SERIALIZER_CALLS = 0

# typed dict which contains a field with a serializer
class MyDict(TypedDict):
    a: int
    b: int

    @pydantic.field_serializer('a')
    def serialize_my_field(self, value: int) -> str:
        global FIELD_A_SERIALIZER_CALLS
        FIELD_A_SERIALIZER_CALLS += 1
        return str(value)
    
# set up this model within a union, where the `b` value is wrongly typed so
# it'll fail checks against MyDict
#
# set it up within a NESTED union, so that the strict-then-lax pattern is applied recursively

class DictContainer(TypedDict):
    my_dict: MyDict | int

class DictContainer2(TypedDict):
    my_dict: DictContainer | int

ta = pydantic.TypeAdapter(DictContainer2 | int)


print(ta.dump_json(DictContainer2(my_dict=DictContainer(my_dict=MyDict(a=1, b='abc')))))

# We get 4 calls. Why:
# 
# Top level attempt to serialize the outer union
# - sets up STRICT checks for the mid union
# - sets up STRICT checks for the bottom union
# - bottom union attempts to strict serialize MyDict, which calls the field serializer for `a` (1)
# - fails MyDict due to `b` being wrong type 
#
# - Repeat with top level at lax checking
# - mid union attempts with strict checking
# - bottom union attempts with strict checking, fails (2)
#
# - mid union attempts with lax checking
# - bottom union AGAIN attempts with strict checking, fails (3)
# - bottom union finally attempts with lax checking, fails (4)
#
# - everything then falls back to inference, which finally succeeds
#
# Each additional layer adds one more strict attempt which will fail
print(FIELD_A_SERIALIZER_CALLS)
