from_flat breaks with aliased fields pre-pydantic 2.11 #88

@d-v-b

Right now, if you try to define an attributes model that uses a field with an alias, e.g.

class AliasedAttrs(BaseModel):
    array_dimensions: tuple[str, ...] = Field(alias="_ARRAY_DIMENSIONS")

then calling GroupSpec.from_flat on a dict whose values use this attributes model will fail, because the aliased field is serialized to dict by its field name rather than by its alias. In the above example, the dict would be {"array_dimensions": ...} instead of {"_ARRAY_DIMENSIONS": ...}.

See this example:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "pydantic-zarr @ git+https://github.com/zarr-developers/pydantic-zarr.git",
#     "pytest"
# ]
# ///

from typing import Any
from pydantic import BaseModel, ValidationError, Field
from pydantic_zarr.v2 import GroupSpec
import pytest


def test_from_flat_kwargs() -> None:
    """
    Test that from_flat takes keyword arguments for `BaseModel.model_dump`. This is necessary for
    handling fields with aliases properly.
    """

    class AliasedAttrs(BaseModel):
        array_dimensions: tuple[str, ...] = Field(alias="_ARRAY_DIMENSIONS")

    class AliasedGroup(GroupSpec[AliasedAttrs, Any]): ...

    flat = {"": AliasedGroup(attributes=AliasedAttrs(_ARRAY_DIMENSIONS=("a", "b")))}

    msg = r"Field required \[type=missing, input_value=\{'array_dimensions': \('a', 'b'\)}, input_type=dict]"

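    # Current behavior: the intermediate dump is keyed by field name, so re-validation fails.
    with pytest.raises(ValidationError, match=msg):
        AliasedGroup.from_flat(flat)
    # Desired behavior: until this issue is fixed, this call raises the same
    # ValidationError, so the test fails here.
    assert AliasedGroup.from_flat(flat) == flat[""]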

if __name__ == "__main__":
    pytest.main([__file__, '-c', 'foo.py'])
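
The failure in the script above comes down to how pydantic dumps aliased fields. The following is a minimal sketch using only pydantic (v2 assumed), without pydantic-zarr, showing that the default `model_dump()` output cannot be fed back into the same model, while `model_dump(by_alias=True)` can. The variable names are illustrative only.

```python
from pydantic import BaseModel, Field, ValidationError


class AliasedAttrs(BaseModel):
    array_dimensions: tuple[str, ...] = Field(alias="_ARRAY_DIMENSIONS")


attrs = AliasedAttrs(_ARRAY_DIMENSIONS=("a", "b"))

# The default dump is keyed by the Python field name, not the alias.
dumped_by_name = attrs.model_dump()

# Opting in to aliases produces the key that validation expects.
dumped_by_alias = attrs.model_dump(by_alias=True)

# Round-tripping the default dump fails because the alias key is missing.
try:
    AliasedAttrs(**dumped_by_name)
    round_trip_failed = False
except ValidationError:
    round_trip_failed = True

# Round-tripping the alias-keyed dump succeeds.
round_tripped = AliasedAttrs(**dumped_by_alias)
```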

There are two possible fixes for the code we have today:

  • Define the serialization behavior of aliases in the model_config of the attribute class. This is only possible for newer versions of pydantic (2.11 and later, which added the serialize_by_alias setting).
  • Allow from_flat to take keyword arguments for model_dump; then you can call from_flat(data, by_alias=True), and this will work.

It's weird, though, for from_flat, which is about deserializing an object, to take keyword arguments for serializing an object. The need arises because GroupSpec.from_flat looks like this:

@classmethod
def from_flat(cls, data) -> Self:
    # This creates an untyped GroupSpec, which we then have to dump before
    # creating our typed GroupSpec.
    from_flated = from_flat_group(data)
    # Here is where we could insert kwargs for model_dump.
    return cls(**from_flated.model_dump())

There's an extra GroupSpec in there that only exists because of our need to call from_flat_group. It would be nice to refactor this function so that it no longer requires that intermediate GroupSpec.
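
The dump-then-revalidate round trip can be imitated with plain pydantic models. In the sketch below, `UntypedGroup`, `TypedGroup`, and `from_untyped` are hypothetical stand-ins, not pydantic-zarr APIs: `UntypedGroup` plays the role of the intermediate spec, and `from_untyped` forwards keyword arguments to `model_dump` as the second fix proposes. The last line shows one possible way to skip the dump entirely by handing the already-validated attributes object to the typed model; whether that generalizes to nested members in pydantic-zarr is an open question.

```python
from typing import Any

from pydantic import BaseModel, Field, ValidationError


class AliasedAttrs(BaseModel):
    array_dimensions: tuple[str, ...] = Field(alias="_ARRAY_DIMENSIONS")


class UntypedGroup(BaseModel):
    # Hypothetical stand-in for the untyped spec built by from_flat_group.
    attributes: Any


class TypedGroup(BaseModel):
    # Hypothetical stand-in for a typed GroupSpec subclass.
    attributes: AliasedAttrs

    @classmethod
    def from_untyped(cls, untyped: UntypedGroup, **dump_kwargs: Any) -> "TypedGroup":
        # Same shape as the current from_flat: dump, then re-validate.
        return cls(**untyped.model_dump(**dump_kwargs))


untyped = UntypedGroup(attributes=AliasedAttrs(_ARRAY_DIMENSIONS=("a", "b")))

# With the default dump, the nested attributes are keyed by field name
# and re-validation fails.
try:
    TypedGroup.from_untyped(untyped)
    default_dump_failed = False
except ValidationError:
    default_dump_failed = True

# Forwarding by_alias=True to model_dump makes the round trip succeed.
typed = TypedGroup.from_untyped(untyped, by_alias=True)

# Skipping the dump: pass the existing attributes instance directly.
direct = TypedGroup(attributes=untyped.attributes)
```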
