Commit b0bb556

Merge branch 'apache:main' into main
2 parents e57cf77 + 9850290 commit b0bb556

50 files changed: +3477 -645 lines changed

.github/workflows/python-ci-docs.yml

Lines changed: 5 additions & 5 deletions
@@ -36,12 +36,12 @@ jobs:

     steps:
       - uses: actions/checkout@v4
+      - name: Install poetry
+        run: make install-poetry
       - uses: actions/setup-python@v5
         with:
           python-version: 3.12
       - name: Install
-        working-directory: ./mkdocs
-        run: pip install -r requirements.txt
-      - name: Build
-        working-directory: ./mkdocs
-        run: mkdocs build --strict
+        run: make docs-install
+      - name: Build docs
+        run: make docs-build

.github/workflows/python-release-docs.yml

Lines changed: 6 additions & 6 deletions
@@ -31,15 +31,15 @@ jobs:

     steps:
       - uses: actions/checkout@v4
+      - name: Install poetry
+        run: make install-poetry
       - uses: actions/setup-python@v5
         with:
           python-version: ${{ matrix.python }}
-      - name: Install
-        working-directory: ./mkdocs
-        run: pip install -r requirements.txt
-      - name: Build
-        working-directory: ./mkdocs
-        run: mkdocs build --strict
+      - name: Install docs
+        run: make docs-install
+      - name: Build docs
+        run: make docs-build
       - name: Copy
         working-directory: ./mkdocs
         run: mv ./site /tmp/site

.github/workflows/stale.yml

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ jobs:
     if: github.repository_owner == 'apache'
     runs-on: ubuntu-22.04
     steps:
-      - uses: actions/stale@v9.0.0
+      - uses: actions/stale@v9.1.0
         with:
           stale-issue-label: 'stale'
           exempt-issue-labels: 'not-stale'

.pre-commit-config.yaml

Lines changed: 0 additions & 1 deletion
@@ -23,7 +23,6 @@ repos:
     hooks:
       - id: trailing-whitespace
       - id: end-of-file-fixer
-      - id: check-docstring-first
       - id: debug-statements
       - id: check-yaml
       - id: check-ast

Makefile

Lines changed: 11 additions & 2 deletions
@@ -22,12 +22,12 @@ help: ## Display this help
 install-poetry: ## Install poetry if the user has not done that yet.
 	@if ! command -v poetry &> /dev/null; then \
 		echo "Poetry could not be found. Installing..."; \
-		pip install --user poetry==1.8.5; \
+		pip install --user poetry==2.0.1; \
 	else \
 		echo "Poetry is already installed."; \
 	fi

-install-dependencies: ## Install dependencies including dev and all extras
+install-dependencies: ## Install dependencies including dev, docs, and all extras
 	poetry install --all-extras

 install: | install-poetry install-dependencies
@@ -97,3 +97,12 @@ clean: ## Clean up the project Python working environment
 	@find . -name "*.pyd" -exec echo Deleting {} \; -delete
 	@find . -name "*.pyo" -exec echo Deleting {} \; -delete
 	@echo "Cleanup complete"
+
+docs-install:
+	poetry install --with docs
+
+docs-serve:
+	poetry run mkdocs serve -f mkdocs/mkdocs.yml
+
+docs-build:
+	poetry run mkdocs build -f mkdocs/mkdocs.yml --strict

NOTICE

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@

 Apache Iceberg
-Copyright 2017-2024 The Apache Software Foundation
+Copyright 2017-2025 The Apache Software Foundation

 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).

dev/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ ENV ICEBERG_SPARK_RUNTIME_VERSION=3.5_2.12
 ENV ICEBERG_VERSION=1.6.0
 ENV PYICEBERG_VERSION=0.8.1

-RUN curl --retry 5 -s -C - https://dlcdn.apache.org/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop3.tgz -o spark-${SPARK_VERSION}-bin-hadoop3.tgz \
+RUN curl --retry 5 -s -C - https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop3.tgz -o spark-${SPARK_VERSION}-bin-hadoop3.tgz \
  && tar xzf spark-${SPARK_VERSION}-bin-hadoop3.tgz --directory /opt/spark --strip-components 1 \
  && rm -rf spark-${SPARK_VERSION}-bin-hadoop3.tgz


mkdocs/README.md

Lines changed: 2 additions & 3 deletions
@@ -22,7 +22,6 @@ The pyiceberg docs are stored in `docs/`.
 ## Running docs locally

 ```sh
-pip3 install -r requirements.txt
-mkdocs serve
-open http://localhost:8000/
+make docs-install
+make docs-serve
 ```

mkdocs/docs/api.md

Lines changed: 37 additions & 6 deletions
@@ -1072,30 +1072,36 @@ Using `add_column` you can add a column, without having to worry about the field
 with table.update_schema() as update:
     update.add_column("retries", IntegerType(), "Number of retries to place the bid")
     # In a struct
-    update.add_column("details.confirmed_by", StringType(), "Name of the exchange")
+    update.add_column("details", StructType())
+
+with table.update_schema() as update:
+    update.add_column(("details", "confirmed_by"), StringType(), "Name of the exchange")
 ```

+A complex type must exist before columns can be added to it. Fields in complex types are added in a tuple.
+
 ### Rename column

 Renaming a field in an Iceberg table is simple:

 ```python
 with table.update_schema() as update:
     update.rename_column("retries", "num_retries")
-    # This will rename `confirmed_by` to `exchange`
-    update.rename_column("properties.confirmed_by", "exchange")
+    # This will rename `confirmed_by` to `processed_by` in the `details` struct
+    update.rename_column(("details", "confirmed_by"), "processed_by")
 ```

 ### Move column

-Move a field inside of struct:
+Move order of fields:

 ```python
 with table.update_schema() as update:
     update.move_first("symbol")
+    # This will move `bid` after `ask`
     update.move_after("bid", "ask")
-    # This will move `confirmed_by` before `exchange`
-    update.move_before("details.created_by", "details.exchange")
+    # This will move `confirmed_by` before `exchange` in the `details` struct
+    update.move_before(("details", "confirmed_by"), ("details", "exchange"))
 ```

 ### Update column
@@ -1127,6 +1133,8 @@ Delete a field, careful this is a incompatible change (readers/writers might exp
 ```python
 with table.update_schema(allow_incompatible_changes=True) as update:
     update.delete_column("some_field")
+    # In a struct
+    update.delete_column(("details", "confirmed_by"))
 ```

 ## Partition evolution
@@ -1250,6 +1258,29 @@ with table.manage_snapshots() as ms:
     ms.create_branch(snapshot_id1, "Branch_A").create_tag(snapshot_id2, "tag789")
 ```

+## Table Statistics Management
+
+Manage table statistics with operations through the `Table` API:
+
+```python
+# To run a specific operation
+table.update_statistics().set_statistics(statistics_file=statistics_file).commit()
+# To run multiple operations
+table.update_statistics()
+  .set_statistics(statistics_file1)
+  .remove_statistics(snapshot_id2)
+  .commit()
+# Operations are applied on commit.
+```
+
+You can also use context managers to make more changes:
+
+```python
+with table.update_statistics() as update:
+    update.set_statistics(statistics_file)
+    update.remove_statistics(snapshot_id2)
+```
+
 ## Query the data

 To query a table, a table scan is needed. A table scan accepts a filter, columns, optionally a limit and a snapshot ID:
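
Review note on the schema-evolution changes in `mkdocs/docs/api.md`: the new text says a complex type must exist before columns can be added to it, and that nested fields are addressed with a tuple. The sketch below illustrates those two rules with a toy model only. The nested-dict schema and the `add_column` helper are hypothetical stand-ins written for this note; they are not pyiceberg code and do not reproduce its behaviour beyond the two rules quoted.

```python
from typing import Tuple, Union

# Hypothetical toy model: a struct is a dict, a primitive is a type name string.
Path = Union[str, Tuple[str, ...]]


def add_column(schema: dict, path: Path, field_type) -> None:
    """Add a field at `path`; every parent struct must already exist."""
    parts = (path,) if isinstance(path, str) else tuple(path)
    *parents, name = parts
    node = schema
    for parent in parents:
        child = node.get(parent)
        if not isinstance(child, dict):
            raise ValueError(
                f"parent struct {parent!r} must exist before adding {name!r}"
            )
        node = child
    if name in node:
        raise ValueError(f"field {name!r} already exists")
    node[name] = field_type


schema = {"retries": "int"}

# Adding into a struct that does not exist yet is rejected.
try:
    add_column(schema, ("details", "confirmed_by"), "string")
    rejected = False
except ValueError:
    rejected = True

# Create the struct first, then add the nested field via a tuple path.
add_column(schema, "details", {})
add_column(schema, ("details", "confirmed_by"), "string")
```

A tuple keeps each path component separate, so the helper never has to guess whether a dot in a string is a separator or part of a field name.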

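
Review note on the "Table Statistics Management" section added to `mkdocs/docs/api.md`: the docs state that operations are applied on commit and that a context manager can be used instead of calling `commit()`. The multi-line chain in the added example has no enclosing parentheses, so pasted into Python as written it would fail to parse. The sketch below shows the staged-then-commit pattern with a parenthesized chain. `ToyUpdateStatistics`, its `(snapshot_id, path)` arguments, and the file names are hypothetical; this is a minimal stand-in, not the pyiceberg implementation.

```python
from typing import Callable, Dict, List


class ToyUpdateStatistics:
    """Hypothetical stand-in: stages operations and applies them on commit."""

    def __init__(self, stats: Dict[int, str]) -> None:
        self._stats = stats  # snapshot_id -> statistics file path
        self._pending: List[Callable[[Dict[int, str]], object]] = []

    def set_statistics(self, snapshot_id: int, path: str) -> "ToyUpdateStatistics":
        self._pending.append(lambda stats: stats.__setitem__(snapshot_id, path))
        return self  # returning self is what allows chaining

    def remove_statistics(self, snapshot_id: int) -> "ToyUpdateStatistics":
        self._pending.append(lambda stats: stats.pop(snapshot_id, None))
        return self

    def commit(self) -> None:
        # Apply the staged operations in the order they were added.
        for apply in self._pending:
            apply(self._stats)
        self._pending.clear()

    def __enter__(self) -> "ToyUpdateStatistics":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Toy design choice: commit only if the block raised no exception.
        if exc_type is None:
            self.commit()


stats = {2: "old-stats.puffin"}

# A chain split across lines must sit inside parentheses to be valid Python.
update = (
    ToyUpdateStatistics(stats)
    .set_statistics(1, "stats-1.puffin")
    .remove_statistics(2)
)
staged_only = dict(stats)  # nothing has been applied yet
update.commit()
after_commit = dict(stats)

# The context manager commits when the block exits.
with ToyUpdateStatistics(stats) as update:
    update.set_statistics(3, "stats-3.puffin")
after_with = dict(stats)
```

Until `commit()` runs, the toy leaves the underlying mapping untouched, which mirrors the "operations are applied on commit" statement in the added docs.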