Fix #66: dataclass from Generic now serializes #68

Open
wants to merge 6 commits into
base: develop
Changes from 3 commits
5 changes: 5 additions & 0 deletions .changelog/_unreleased.toml
@@ -0,0 +1,5 @@
[[entries]]
id = "693674ea-b2b2-4733-bce6-4d5bae59b164"
type = "fix"
description = "Fix #66: dataclasses inheriting from uninstantiated Generic did not get all their fields serialized"
author = "@rhaps0dy"
2 changes: 1 addition & 1 deletion databind/src/databind/core/schema.py
@@ -245,7 +245,7 @@ class A(Generic[T]):
pass

# Continue with the base classes.
- for base in hint.bases or hint.type.__bases__:
+ for base in (*hint.bases, *hint.type.__bases__):
Owner

I'm not 100% sure atm. whether the order plays a role here, i.e. if maybe the two unrolls should be the other way around. 🤔 Will dig a bit when I get the time, unless you want to hash out what it means.

Contributor Author

So, quickly: this comes from typeapi's ClassTypeHint.bases, which uses get_type_hint_original_bases. That function deliberately accesses __orig_bases__, which seems to be a CPython class attribute that is not very well documented.

According to this PEP, __orig_bases__ contains the instantiated generics for a particular type (e.g. A[int]), as opposed to just the bare types (e.g. A). The former is useful because it gives Databind more information about how to de/serialize a particular field. However, subclasses override methods, and it's not clear that iterating over __orig_bases__ visits the bases in the correct order.

We can get the correct method resolution order by calling .mro() on the class that we're dealing with. Given the test case with inheritance class A(Generic[T]) -> class B(A[int]) -> class C(B), calling C.mro() would give us [C, B, A, Generic, object], which correctly tells us where fields come from. But we would lose the type information.
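To make the difference concrete, here is a minimal sketch using the same A/B/C hierarchy. It relies only on the standard library and is meant as an illustration, not as a description of what Databind or typeapi do internally.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class A(Generic[T]):
    pass


class B(A[int]):
    pass


class C(B):
    pass


# __bases__ only knows the bare class; the type argument is gone.
print(B.__bases__)

# __orig_bases__ keeps the parameterized alias as written in the class statement.
print(B.__orig_bases__)

# The MRO gives the resolution order, again with bare classes only.
print(C.mro())
```

The first print shows only A, the second shows A[int], and the MRO lists C, B, A, Generic and object in that order.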

So, my recommendation is:

  • We should just call Class.mro() before this entire loop to get the correct iteration order
  • For every typehint we get, we get its __orig_bases__ to get the instantiated-Generic types. If TypeHint(t).type matches anything in the MRO list, we use the instantiated-generic type instead of the original bare type.
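The two steps above could be sketched roughly as follows. The helper name typed_mro is hypothetical and not part of Databind or typeapi; the sketch uses only typing.get_origin from the standard library and assumes each bare class is parameterized at most once in the hierarchy.

```python
from typing import Generic, TypeVar, get_origin

T = TypeVar("T")


class A(Generic[T]):
    pass


class B(A[int]):
    pass


class C(B):
    pass


def typed_mro(cls):
    """Return cls.mro() with bare classes replaced by their parameterized form.

    Hypothetical helper for illustration only.
    """
    typed = {}
    for klass in cls.mro():
        # Look only at the class's own namespace so that an inherited
        # __orig_bases__ is not mistaken for the class's own bases.
        for base in vars(klass).get("__orig_bases__", ()):
            origin = get_origin(base)
            if origin is not None:
                # Keep the first (most derived) parameterization we see.
                typed.setdefault(origin, base)
    return [typed.get(klass, klass) for klass in cls.mro()]


result = typed_mro(C)
print(result)
```

The iteration order comes from the MRO, while the entry for A carries the int argument taken from B's own __orig_bases__.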

Contributor Author

If I were to commit a solution like this, would you merge it?

Owner

@NiklasRosenstein NiklasRosenstein May 31, 2024

Hi @rhaps0dy, I'm sorry for the late response.

Thanks for explaining your train of thought so thoroughly. I'm following you, but I think that if this should be addressed, it should probably be at the typeapi level instead. That is because recursing through ClassTypeHint.bases should allow you to reconstruct the MRO while at the same time getting access to the typed bases. (Just as you can reconstruct the MRO manually by recursing through type.__bases__, only that you don't have the additional type information.)

The behaviour we're seeing seems to stem from the fact that __orig_bases__ does not get set on a subclass that inherits from a generic without parameterizing it.

from typing import Generic, TypeVar

T = TypeVar("T")


class Base(Generic[T]):
    a: int


class Correct(Base[T]):
    b: str


class Incorrect(Base):
    b: str


print(Correct.__orig_bases__)
assert "__orig_bases__" in vars(Correct)

print(Incorrect.__orig_bases__)
assert "__orig_bases__" not in vars(Incorrect)
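As a hedged illustration of why this matters for an "a or b" style fallback: the sketch below assumes the typed bases are read with a plain attribute lookup, which may not be what typeapi actually does. Both helper functions are hypothetical and exist only for this example.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class Base(Generic[T]):
    a: int


class Incorrect(Base):
    b: str


def bases_with_inherited_lookup(cls):
    # Hypothetical: getattr() follows inheritance, so Incorrect reports
    # Base's __orig_bases__ and the fallback to __bases__ never runs.
    return getattr(cls, "__orig_bases__", ()) or cls.__bases__


def bases_with_own_lookup(cls):
    # Hypothetical: only trust __orig_bases__ set on the class itself.
    return vars(cls).get("__orig_bases__") or cls.__bases__


print(bases_with_inherited_lookup(Incorrect))
print(bases_with_own_lookup(Incorrect))
```

With the inherited lookup, Base itself never shows up among the bases of Incorrect, which would be consistent with the parent's fields going missing.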

The issue linked below is relatively old (so the requirement to always add Generic[T] into the mix is no longer present), but it hints at the fact that inheriting from a generic should be done with parameterization (even if that is with the same or another type variable). It seems there are no clear semantics assigned to inheriting from a generic without parameterizing it.

python/typing#85

This leads me to consider that this is not necessarily a bug in Databind or Typeapi but in your code. :)

Owner

FWIW I think the need to add # type: ignore[type-arg] is a pretty good hint that the way the inheritance is defined is actually invalid in Python's (aka. Mypy's) type system, and thus I don't think we should support the case.
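For comparison, a minimal sketch of the parameterized form of the same hierarchy, which does not need the type: ignore comment. It uses only dataclasses and typing from the standard library and does not exercise Databind itself, so it only shows that the class definition carries the typed base.

```python
import dataclasses
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclasses.dataclass
class GenericClass(Generic[T]):
    a_field: int


# Re-parameterizing with the same type variable keeps the class generic
# and makes Python record __orig_bases__ on the subclass itself.
@dataclasses.dataclass
class InheritGeneric(GenericClass[T]):
    b_field: str


field_names = [f.name for f in dataclasses.fields(InheritGeneric)]
print(field_names)
print(InheritGeneric.__orig_bases__)
```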

base_hint = TypeHint(base, source=hint.type).evaluate().parameterize(parameter_map)
assert isinstance(base_hint, ClassTypeHint), f"nani? {base_hint}"
if dataclasses.is_dataclass(base_hint.type):
24 changes: 24 additions & 0 deletions databind/src/databind/core/tests/schema_test.py
@@ -456,3 +456,27 @@ def test_parse_dataclass_with_forward_ref() -> None:
ClassWithForwardRef,
ClassWithForwardRef,
)


UnboundGeneric = t.TypeVar("UnboundGeneric")


@dataclasses.dataclass
class GenericClass(t.Generic[UnboundGeneric]):
    a_field: int


@dataclasses.dataclass
class InheritGeneric(GenericClass):  # type: ignore[type-arg]
    b_field: str


def test_schema_generic_dataclass() -> None:
    """Regression test for #66: dataclasses inheriting from Generic with an uninstantiated TypeVar don't get their
    parents' fields.
    """
    assert convert_dataclass_to_schema(InheritGeneric) == Schema(
        {"a_field": Field(TypeHint(int), True), "b_field": Field(TypeHint(str), True)},
        InheritGeneric,
        InheritGeneric,
    )
22 changes: 22 additions & 0 deletions databind/src/databind/json/tests/converters_test.py
@@ -713,3 +713,25 @@ def of(cls, v: str) -> "MyCls":
mapper = make_mapper([JsonConverterSupport()])
assert mapper.serialize(MyCls(), MyCls) == "MyCls"
assert mapper.deserialize("MyCls", MyCls) == MyCls()


UnboundGeneric = t.TypeVar("UnboundGeneric")


@dataclasses.dataclass
class GenericClass(t.Generic[UnboundGeneric]):
    a_field: int


@dataclasses.dataclass
class InheritGeneric(GenericClass):  # type: ignore[type-arg]
    b_field: str


def test_serialize_generic_dataclass() -> None:
    """Regression test for #66: dataclasses inheriting from Generic with an uninstantiated TypeVar don't get their
    parents' fields.
    """
    obj = InheritGeneric(2, "hi")
    mapper = make_mapper([SchemaConverter(), PlainDatatypeConverter()])
    assert mapper.serialize(obj, InheritGeneric) == {"a_field": obj.a_field, "b_field": obj.b_field}