Skip to content

Commit 4f02298

Browse files
oxkevinjqliu
andauthored
Add serializer for AssertRefSnapshotId allowing null json value (#2375)
Closes #2342 # Rationale for this change The last attempt at fixing this (#2343) did not actually end up serializing None values to `null`. This was b/c the `AssertRefSnapshotId` requirement was often a child object of some other `Request` object, which was actually getting the `model_dump_json()` call. This meant that the `exclude_none=True` override would remove any `None`s _before_ any of the `Field`-specific configurations would be considered (like `exclude=False` or the model's own `model_dump_json` method). I then investigated setting some `ConfigDict` settings on the model, but that also did not have any useful configurations. I looked into setting `exclude_none=True` at all of the `model_json_dump()` callsites but there were too many and it would be easy to forget to set that value every time. The issue is that many values are _supposed to be absent_ to instruct the REST Catalog about what the client wants (things like filtering, for instance). ~The solution I ended up with was allowing `snapshot_id` to be a required `int` with a default value of `-1`, and adding a json-specific field serializer to write out `null` when dumping to json. This seems to work~ The solution that worked and allowed for `snapshot_id` to be set to `None` (as it's done in Ray Data), was to define a `model_serializer` method and always return the fields. This was based on conversations in pydantic/pydantic#7326. ## Are these changes tested? Yes ## Are there any user-facing changes? Actually allow `snapshot-id` to be null for `assert-ref-snapshot-id` requirements --------- Co-authored-by: Kevin Liu <[email protected]>
1 parent 07f3453 commit 4f02298

File tree

2 files changed

+23
-10
lines changed

2 files changed

+23
-10
lines changed

pyiceberg/table/update/__init__.py

Lines changed: 10 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -21,9 +21,9 @@
2121
from abc import ABC, abstractmethod
2222
from datetime import datetime
2323
from functools import singledispatch
24-
from typing import TYPE_CHECKING, Annotated, Any, Dict, Generic, List, Literal, Optional, Set, Tuple, TypeVar, Union, cast
24+
from typing import TYPE_CHECKING, Annotated, Any, Dict, Generic, List, Literal, Optional, Tuple, TypeVar, Union, cast
2525

26-
from pydantic import Field, field_validator, model_validator
26+
from pydantic import Field, field_validator, model_serializer, model_validator
2727

2828
from pyiceberg.exceptions import CommitFailedException
2929
from pyiceberg.partitioning import PARTITION_FIELD_ID_START, PartitionSpec
@@ -52,6 +52,8 @@
5252
from pyiceberg.utils.properties import property_as_int
5353

5454
if TYPE_CHECKING:
55+
from pydantic.functional_serializers import ModelWrapSerializerWithoutInfo
56+
5557
from pyiceberg.table import Transaction
5658

5759
U = TypeVar("U")
@@ -727,6 +729,12 @@ class AssertRefSnapshotId(ValidatableTableRequirement):
727729
ref: str = Field(...)
728730
snapshot_id: Optional[int] = Field(default=None, alias="snapshot-id")
729731

732+
@model_serializer(mode="wrap")
733+
def serialize_model(self, handler: ModelWrapSerializerWithoutInfo) -> dict[str, Any]:
734+
partial_result = handler(self)
735+
# Ensure "snapshot-id" is always present, even if value is None
736+
return {**partial_result, "snapshot-id": self.snapshot_id}
737+
730738
def validate(self, base_metadata: Optional[TableMetadata]) -> None:
731739
if base_metadata is None:
732740
raise CommitFailedException("Requirement failed: current table metadata is missing")
@@ -745,13 +753,6 @@ def validate(self, base_metadata: Optional[TableMetadata]) -> None:
745753
elif self.snapshot_id is not None:
746754
raise CommitFailedException(f"Requirement failed: branch or tag {self.ref} is missing, expected {self.snapshot_id}")
747755

748-
# override the override method, allowing None to serialize to `null` instead of being omitted.
749-
def model_dump_json(
750-
self, exclude_none: bool = False, exclude: Optional[Set[str]] = None, by_alias: bool = True, **kwargs: Any
751-
) -> str:
752-
# `snapshot-id` is required in json response, even if null
753-
return super().model_dump_json(exclude_none=False)
754-
755756

756757
class AssertLastAssignedFieldId(ValidatableTableRequirement):
757758
"""The table's last assigned column id must match the requirement's `last-assigned-field-id`."""

tests/test_serializers.py

Lines changed: 13 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,14 +18,16 @@
1818
import json
1919
import os
2020
import uuid
21-
from typing import Any, Dict
21+
from typing import Any, Dict, Tuple
2222

2323
import pytest
2424
from pytest_mock import MockFixture
2525

2626
from pyiceberg.serializers import ToOutputFile
2727
from pyiceberg.table import StaticTable
2828
from pyiceberg.table.metadata import TableMetadataV1
29+
from pyiceberg.table.update import AssertRefSnapshotId, TableRequirement
30+
from pyiceberg.typedef import IcebergBaseModel
2931

3032

3133
def test_legacy_current_snapshot_id(
@@ -48,3 +50,13 @@ def test_legacy_current_snapshot_id(
4850
backwards_compatible_static_table = StaticTable.from_metadata(metadata_location)
4951
assert backwards_compatible_static_table.metadata.current_snapshot_id is None
5052
assert backwards_compatible_static_table.metadata == static_table.metadata
53+
54+
55+
def test_null_serializer_field() -> None:
56+
class ExampleRequest(IcebergBaseModel):
57+
requirements: Tuple[TableRequirement, ...]
58+
59+
request = ExampleRequest(requirements=(AssertRefSnapshotId(ref="main", snapshot_id=None),))
60+
dumped_json = request.model_dump_json()
61+
expected_json = """{"type":"assert-ref-snapshot-id","ref":"main","snapshot-id":null}"""
62+
assert expected_json in dumped_json

0 commit comments

Comments
 (0)