-
Notifications
You must be signed in to change notification settings - Fork 44
feat: pydantic v1 serialization and validation #399
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 7 commits
9009c46
b99e933
09a5f76
c6ffee7
03f7d9a
52e2a18
ea1e65e
4158fdd
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
from __future__ import annotations | ||
|
||
import os | ||
import pathlib | ||
import sys | ||
import warnings | ||
from abc import ABCMeta | ||
|
@@ -14,8 +15,10 @@ | |
from typing import TYPE_CHECKING | ||
from typing import Any | ||
from typing import BinaryIO | ||
from typing import Callable | ||
from typing import Literal | ||
from typing import TextIO | ||
from typing import TypedDict | ||
from typing import overload | ||
from urllib.parse import SplitResult | ||
from urllib.parse import urlsplit | ||
|
@@ -40,8 +43,10 @@ | |
|
||
if TYPE_CHECKING: | ||
if sys.version_info >= (3, 11): | ||
from typing import NotRequired | ||
from typing import Self | ||
else: | ||
from typing_extensions import NotRequired | ||
from typing_extensions import Self | ||
|
||
from pydantic import GetCoreSchemaHandler | ||
|
@@ -107,6 +112,14 @@ def __getitem__(cls, key): | |
return cls | ||
|
||
|
||
class SerializedUPath(TypedDict): | ||
"""Serialized format for a UPath object""" | ||
|
||
path: str | ||
protocol: NotRequired[str] | ||
storage_options: NotRequired[dict[str, Any]] | ||
|
||
|
||
class _UPathMixin(metaclass=_UPathMeta): | ||
__slots__ = () | ||
|
||
|
@@ -179,6 +192,13 @@ def path(self) -> str: | |
"""The path that a fsspec filesystem can use.""" | ||
return self.parser.strip_protocol(self.__str__()) | ||
|
||
def to_dict(self) -> SerializedUPath: | ||
return { | ||
"path": self.path, | ||
"protocol": self.protocol, | ||
"storage_options": dict(self.storage_options), | ||
} | ||
|
||
Comment on lines
+195
to
+201
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is Let's not extend the public UPath class API for pydantic v1 support There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah thats fair enough. I wanted to include a convenience method for enabling JSON serialization in pydantic v1 models. At present, there would not be a way to serialize this class that a V1 model would utilize - for example doing:
whereas if I do:
So its mainly a way to expose an easier serialization method that can be used with v1 models - unfortunately the serialization format for v2 is not used with v1 models (and you can't define a hook method on the custom class to provide a standardized serialization method for the class 😭, its solely the responsibility of the v1 model that uses the custom class), so I thought it might be a useful idea to include a serialization method to the class to provide an easier method to use as a json encoder hook. Totally okay with removing it though if you think this is too much over-reach of this PR. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmmm... We could introduce a general serializer helper function in universal-pathlib, that could then be used. In principle this would be: serialize_upath = upath.UPath.__get_pydantic_core_schema__(None, **None)['serialization']['function'] Thinking about this, that might be the best idea. Something similar will land with chaining support: https://github.com/fsspec/universal_pathlib/pull/346/files#diff-79e86edc0bf259fa57cede9d23833d37576186abbdfe51591786da661c2620dfR28-R31 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. but wouldn't this enforce that the serialization method requires installation of pydantic >= 2 to execute? Since the I agree I would like But perhaps thats not a big deal - if you want to serialize a UPath - you need |
||
def joinuri(self, uri: JoinablePathLike) -> UPath: | ||
"""Join with urljoin behavior for UPath instances""" | ||
# short circuit if the new uri uses a different protocol | ||
|
@@ -946,9 +966,7 @@ def __get_pydantic_core_schema__( | |
|
||
deserialization_schema = core_schema.chain_schema( | ||
[ | ||
core_schema.no_info_plain_validator_function( | ||
lambda v: {"path": v} if isinstance(v, str) else v, | ||
), | ||
core_schema.no_info_plain_validator_function(cls._to_serialized_format), | ||
core_schema.typed_dict_schema( | ||
{ | ||
"path": core_schema.typed_dict_field( | ||
|
@@ -973,13 +991,7 @@ def __get_pydantic_core_schema__( | |
}, | ||
extra_behavior="forbid", | ||
), | ||
core_schema.no_info_plain_validator_function( | ||
lambda dct: cls( | ||
dct.pop("path"), | ||
protocol=dct.pop("protocol"), | ||
**dct["storage_options"], | ||
) | ||
), | ||
core_schema.no_info_plain_validator_function(cls._validate), | ||
] | ||
) | ||
|
||
|
@@ -998,3 +1010,39 @@ def __get_pydantic_core_schema__( | |
), | ||
serialization=serialization_schema, | ||
) | ||
|
||
@classmethod | ||
def __get_validators__(cls) -> Iterator[Callable]: | ||
yield cls._validate | ||
Comment on lines
+1014
to
+1016
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could support for v1 be added by only introducing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. mentioned in a comment below but linked again for good measure - ultimately |
||
|
||
@staticmethod | ||
def _to_serialized_format( | ||
v: str | pathlib.Path | _UPathMixin | dict[str, Any] | ||
) -> SerializedUPath: | ||
if isinstance(v, _UPathMixin): | ||
return v.to_dict() | ||
if isinstance(v, dict): | ||
return { | ||
"path": v["path"], | ||
"protocol": v.get("protocol", ""), | ||
"storage_options": v.get("storage_options", {}), | ||
} | ||
if isinstance(v, pathlib.Path): | ||
return {"path": v.as_posix(), "protocol": ""} | ||
if isinstance(v, str): | ||
return { | ||
"path": v, | ||
} | ||
raise TypeError(f"Invalid path: {v!r}") | ||
|
||
@classmethod | ||
def _validate(cls, v: Any) -> UPath: | ||
if not isinstance(v, UPath): | ||
v = cls._to_serialized_format(v) | ||
|
||
return cls( | ||
v["path"], | ||
protocol=v.get("protocol"), | ||
**v.get("storage_options", {}), # type: ignore[arg-type] | ||
) | ||
return v | ||
Comment on lines
+1038
to
+1048
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why is this classmethod needed specifically? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is what is actually executed during v1 validation - I added this as a function such that now both v1 and v2 actually run this method when executing validation. I believe the v2 validation used LMK what you think though - I can change it back to having two separate procedures for v1 and v2 if need be. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let me rephrase: Why does this have to be attached to the class interface. I.e. this could be a separate helper function for
I'd prefer that. It'll make maintenance easier down the line, when v1 support is dropped eventually. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah I guess my point is that encapsulating this logic within one method that is used by both v1 ( But okay, so you would prefer having a separate function that is not a classmethod that undertakes this validation and have it separate to the v2 validation? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not predict the future. It's better if these tests break once pydantiv 3 actually breaks the api we would rely on in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is fair - I can remove the pin but might keep the comment such that when tests inevitably fail when pydantic v3 is released there is something to note why this might be the case? Does that sound suitable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI here is the version policy indicating the intention to remove
pydantic.v1
from v3 onwards.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I follow. Where does is say in there that
pydantic.v1
won't be available in v3 ?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah sorry good point - I might have misread there - but here is the comment from Sam Colvin about intended removal of the V1 shim.