
Commit f288493

smaheshwar-pltr and Sreesh Maheshwar authored and committed
Fix TableScan.update to exclude cached properties (apache#2178)
# Rationale for this change

`TableScan.update` rebuilt the scan by passing `self.__dict__` to the constructor, but `__dict__` can also hold attributes that are not constructor parameters, such as the values of cached properties. Updating such a scan (for example, filtering it after calling `to_arrow()`) could therefore fail. Closes apache#2179.

# Are these changes tested?

Yes.

# Are there any user-facing changes?

Yes. The scenario described in the issue and exercised by the new test (filtering a scan after reading it with `to_arrow()`) now works.

---------

Co-authored-by: Sreesh Maheshwar <[email protected]>
1 parent fcfcd70 commit f288493

2 files changed (+21, -1 lines)


pyiceberg/table/__init__.py

Lines changed: 8 additions & 1 deletion
```diff
@@ -1702,7 +1702,14 @@ def to_polars(self) -> pl.DataFrame: ...
 
     def update(self: S, **overrides: Any) -> S:
         """Create a copy of this table scan with updated fields."""
-        return type(self)(**{**self.__dict__, **overrides})
+        from inspect import signature
+
+        # Extract those attributes that are constructor parameters. We don't use self.__dict__ as the kwargs to the
+        # constructors because it may contain additional attributes that are not part of the constructor signature.
+        params = signature(type(self).__init__).parameters.keys() - {"self"}  # Skip "self" parameter
+        kwargs = {param: getattr(self, param) for param in params}  # Assume parameters are attributes
+
+        return type(self)(**{**kwargs, **overrides})
 
     def use_ref(self: S, name: str) -> S:
         if self.snapshot_id:
```
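The failure mode named in the commit title can be reproduced without pyiceberg. The sketch below is a minimal illustration under stated assumptions: `Scan`, `plan`, and `update_via_dict` are hypothetical names, not pyiceberg APIs, and it assumes the offending attribute is a `functools.cached_property`, which stores its computed value in the instance `__dict__` on first access. Once that happens, the pre-fix `type(self)(**self.__dict__)` forwards the cached value as an unexpected keyword argument, while the signature-based copy forwards only constructor parameters.

```python
from functools import cached_property
from inspect import signature
from typing import Any, Optional


class Scan:
    """Hypothetical stand-in for TableScan; not the pyiceberg class."""

    def __init__(self, row_filter: str = "True", limit: Optional[int] = None) -> None:
        self.row_filter = row_filter
        self.limit = limit

    @cached_property
    def plan(self) -> str:
        # On first access, cached_property stores the result in self.__dict__["plan"].
        return f"plan({self.row_filter})"

    def update_via_dict(self, **overrides: Any) -> "Scan":
        # Pre-fix approach: forwards every instance attribute, including cached values.
        return type(self)(**{**self.__dict__, **overrides})

    def update(self, **overrides: Any) -> "Scan":
        # Post-fix approach: forward only the attributes named in the constructor signature.
        params = signature(type(self).__init__).parameters.keys() - {"self"}
        kwargs = {param: getattr(self, param) for param in params}
        return type(self)(**{**kwargs, **overrides})


scan = Scan()
_ = scan.plan  # populates the cache, adding "plan" to scan.__dict__

try:
    scan.update_via_dict(row_filter="x > 1")
    old_failed = False
except TypeError as exc:
    # __init__ does not accept a "plan" keyword argument.
    old_failed = True
    print(f"old behaviour: TypeError: {exc}")

filtered = scan.update(row_filter="x > 1")
print(filtered.row_filter)  # x > 1
```

Like the patched code, the signature-based copy assumes every constructor parameter is stored under an attribute of the same name; a parameter saved under a different name would make `getattr` raise `AttributeError`.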

tests/integration/test_reads.py

Lines changed: 13 additions & 0 deletions
```diff
@@ -1057,3 +1057,16 @@ def test_initial_default(catalog: Catalog, spark: SparkSession) -> None:
     result_table = tbl.scan().filter("so_true == True").to_arrow()
 
     assert len(result_table) == 10
+
+
+@pytest.mark.integration
+@pytest.mark.parametrize("catalog", [pytest.lazy_fixture("session_catalog_hive"), pytest.lazy_fixture("session_catalog")])
+def test_filter_after_arrow_scan(catalog: Catalog) -> None:
+    identifier = "test_partitioned_by_hours"
+    table = catalog.load_table(f"default.{identifier}")
+
+    scan = table.scan()
+    assert len(scan.to_arrow()) > 0
+
+    scan = scan.filter("ts >= '2023-03-05T00:00:00+00:00'")
+    assert len(scan.to_arrow()) > 0
```
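The integration test needs a Hive or REST catalog that already contains the `test_partitioned_by_hours` table. A dependency-free analogue of the same sequence (read the scan, then filter it) might look like the sketch below. `InMemoryScan`, `to_list`, and `_materialized` are hypothetical names, and as a simplification `filter` replaces the predicate outright through `update` instead of combining it with an existing filter.

```python
from functools import cached_property
from inspect import signature
from typing import Any, Callable, List, Optional


class InMemoryScan:
    """Hypothetical scan over a Python list; mimics the read-then-filter flow."""

    def __init__(self, rows: List[int], predicate: Optional[Callable[[int], bool]] = None) -> None:
        self.rows = rows
        self.predicate = predicate

    @cached_property
    def _materialized(self) -> List[int]:
        # Computed once; the value is stored in self.__dict__["_materialized"].
        if self.predicate is None:
            return list(self.rows)
        return [r for r in self.rows if self.predicate(r)]

    def to_list(self) -> List[int]:
        return self._materialized

    def update(self, **overrides: Any) -> "InMemoryScan":
        # Same idea as the patched TableScan.update: copy only constructor parameters.
        params = signature(type(self).__init__).parameters.keys() - {"self"}
        kwargs = {param: getattr(self, param) for param in params}
        return type(self)(**{**kwargs, **overrides})

    def filter(self, predicate: Callable[[int], bool]) -> "InMemoryScan":
        return self.update(predicate=predicate)


def test_filter_after_materialize() -> None:
    scan = InMemoryScan([1, 5, 10, 20])
    assert len(scan.to_list()) > 0  # populates the cached property

    scan = scan.filter(lambda r: r >= 10)
    assert scan.to_list() == [10, 20]


test_filter_after_materialize()
```

With a `self.__dict__`-based `update`, the second step would fail because `_materialized` would be passed to `__init__` as an unexpected keyword argument.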
