Commit 2252e71

Merge branch 'main' of github.com:apache/iceberg-python into fd-add-ability-to-delete-full-data-files
2 parents 5cdb363 + b8c5bb7 commit 2252e71

28 files changed (+1836, -557 lines)

.github/workflows/python-release.yml

Lines changed: 1 addition & 1 deletion
@@ -59,7 +59,7 @@ jobs:
         if: startsWith(matrix.os, 'ubuntu')
 
       - name: Build wheels
-        uses: pypa/cibuildwheel@v2.18.1
+        uses: pypa/cibuildwheel@v2.19.1
         with:
           output-dir: wheelhouse
           config-file: "pyproject.toml"

dev/provision.py

Lines changed: 47 additions & 0 deletions
@@ -342,3 +342,50 @@
                (array(), map(), array(struct(1)))
         """
     )
+
+    spark.sql(
+        f"""
+        CREATE OR REPLACE TABLE {catalog_name}.default.test_table_snapshot_operations (
+            number integer
+        )
+        USING iceberg
+        TBLPROPERTIES (
+            'format-version'='2'
+        );
+        """
+    )
+
+    spark.sql(
+        f"""
+        INSERT INTO {catalog_name}.default.test_table_snapshot_operations
+        VALUES (1)
+        """
+    )
+
+    spark.sql(
+        f"""
+        INSERT INTO {catalog_name}.default.test_table_snapshot_operations
+        VALUES (2)
+        """
+    )
+
+    spark.sql(
+        f"""
+        DELETE FROM {catalog_name}.default.test_table_snapshot_operations
+        WHERE number = 2
+        """
+    )
+
+    spark.sql(
+        f"""
+        INSERT INTO {catalog_name}.default.test_table_snapshot_operations
+        VALUES (3)
+        """
+    )
+
+    spark.sql(
+        f"""
+        INSERT INTO {catalog_name}.default.test_table_snapshot_operations
+        VALUES (4)
+        """
+    )

mkdocs/docs/api.md

Lines changed: 31 additions & 0 deletions

````diff
@@ -926,6 +926,28 @@ tbl.overwrite(df, snapshot_properties={"abc": "def"})
 assert tbl.metadata.snapshots[-1].summary["abc"] == "def"
 ```
 
+## Snapshot Management
+
+Manage snapshots with operations through the `Table` API:
+
+```python
+# To run a specific operation
+table.manage_snapshots().create_tag(snapshot_id, "tag123").commit()
+# To run multiple operations
+(table.manage_snapshots()
+    .create_tag(snapshot_id1, "tag123")
+    .create_tag(snapshot_id2, "tag456")
+    .commit())
+# Operations are applied on commit.
+```
+
+You can also use context managers to make more changes:
+
+```python
+with table.manage_snapshots() as ms:
+    ms.create_branch(snapshot_id1, "Branch_A").create_tag(snapshot_id2, "tag789")
+```
+
 ## Query the data
 
 To query a table, a table scan is needed. A table scan accepts a filter, columns, optionally a limit and a snapshot ID:
@@ -994,6 +1016,15 @@ tpep_dropoff_datetime: [[2021-04-01 00:47:59.000000,...,2021-05-01 00:14:47.0000
 
 This will only pull in the files that might contain matching rows.
 
+One can also return a PyArrow RecordBatchReader, if reading one record batch at a time is preferred:
+
+```python
+table.scan(
+    row_filter=GreaterThanOrEqual("trip_distance", 10.0),
+    selected_fields=("VendorID", "tpep_pickup_datetime", "tpep_dropoff_datetime"),
+).to_arrow_batch_reader()
+```
+
 ### Pandas
 
 <!-- prettier-ignore-start -->
````
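
The builder flow described in the Snapshot Management docs (queue operations, apply them on `commit()`, or let a `with` block commit on exit) can be illustrated with a small stand-in. The `FakeManageSnapshots` class below is hypothetical and is not PyIceberg's implementation; it only mimics the method names from the docs so the chaining pattern can run without a catalog. The snapshot IDs `1` and `2` are placeholders. Note that a method chain split across several lines has to be wrapped in parentheses to be valid Python.

```python
from typing import Dict, List, Tuple


class FakeManageSnapshots:
    """Hypothetical stand-in that mimics the chaining style of manage_snapshots()."""

    def __init__(self, refs: Dict[str, Tuple[str, int]]) -> None:
        self._refs = refs  # committed refs: name -> (kind, snapshot_id)
        self._pending: List[Tuple[str, str, int]] = []  # queued, not yet applied

    def create_tag(self, snapshot_id: int, tag_name: str) -> "FakeManageSnapshots":
        self._pending.append(("tag", tag_name, snapshot_id))
        return self  # returning self is what makes chaining possible

    def create_branch(self, snapshot_id: int, branch_name: str) -> "FakeManageSnapshots":
        self._pending.append(("branch", branch_name, snapshot_id))
        return self

    def commit(self) -> None:
        # Queued operations only take effect here.
        for kind, name, snapshot_id in self._pending:
            self._refs[name] = (kind, snapshot_id)
        self._pending.clear()

    def __enter__(self) -> "FakeManageSnapshots":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Assumption for this sketch: commit on a clean exit, skip if the block raised.
        if exc_type is None:
            self.commit()


refs: Dict[str, Tuple[str, int]] = {}

# Multi-line chain: the enclosing parentheses are required.
(
    FakeManageSnapshots(refs)
    .create_tag(1, "tag123")
    .create_tag(2, "tag456")
    .commit()
)

# Context-manager form: the commit happens when the block exits.
with FakeManageSnapshots(refs) as ms:
    ms.create_branch(1, "Branch_A").create_tag(2, "tag789")

print(sorted(refs))
```

Nothing is written to `refs` until `commit()` runs, which matches the "operations are applied on commit" note in the docs.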

mkdocs/docs/configuration.md

Lines changed: 1 addition & 0 deletions
@@ -89,6 +89,7 @@ For the FileIO there are several configuration options available:
 | s3.access-key-id | admin | Configure the static access key ID used to access the FileIO. |
 | s3.secret-access-key | password | Configure the static secret access key used to access the FileIO. |
 | s3.signer | bearer | Configure the signature version of the FileIO. |
+| s3.signer.uri | http://my.signer:8080/s3 | Configure the remote signing URI if it differs from the catalog URI. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/v1/aws/s3/sign`. |
 | s3.region | us-west-2 | Sets the region of the bucket |
 | s3.proxy-uri | http://my.proxy.com:8080 | Configure the proxy server to be used by the FileIO. |
 | s3.connect-timeout | 60.0 | Configure socket connection timeout, in seconds. |
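
As a rough illustration of the `s3.signer.uri` row, the signing endpoint could be derived from the properties along these lines. The helper name `resolve_signer_endpoint` and the catalog URL `http://my.catalog:8181` are hypothetical, and the fallback to a `uri` catalog property is an assumption based on the row's wording ("if it differs from the catalog URI"); PyIceberg's actual code may differ.

```python
from typing import Dict

# Path suffix quoted in the configuration table.
SIGN_PATH = "v1/aws/s3/sign"


def resolve_signer_endpoint(properties: Dict[str, str]) -> str:
    """Hypothetical helper: build the URL that a signing request would be sent to."""
    # Assumption: prefer the dedicated signer URI, otherwise fall back to the catalog URI.
    base = properties.get("s3.signer.uri", properties.get("uri"))
    if base is None:
        raise ValueError("Neither 's3.signer.uri' nor 'uri' is set")
    # Strip a trailing slash so the joined URL has exactly one separator.
    return f"{base.rstrip('/')}/{SIGN_PATH}"


explicit = resolve_signer_endpoint(
    {"uri": "http://my.catalog:8181", "s3.signer.uri": "http://my.signer:8080/s3"}
)
fallback = resolve_signer_endpoint({"uri": "http://my.catalog:8181/"})

print(explicit)
print(fallback)
```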

mkdocs/requirements.txt

Lines changed: 3 additions & 3 deletions
@@ -16,13 +16,13 @@
 # under the License.
 
 mkdocs==1.6.0
-griffe==0.45.2
+griffe==0.47.0
 jinja2==3.1.4
 mkdocstrings==0.25.1
-mkdocstrings-python==1.10.3
+mkdocstrings-python==1.10.5
 mkdocs-literate-nav==0.6.1
 mkdocs-autorefs==1.0.1
 mkdocs-gen-files==0.5.0
-mkdocs-material==9.5.25
+mkdocs-material==9.5.27
 mkdocs-material-extensions==1.3.1
 mkdocs-section-index==0.3.9
