
Commit 284d05a

Merge branch 'main' of github.com:apache/iceberg-python into fd-add-ability-to-delete-full-data-files

2 parents bc9c83e + 4148edb

28 files changed: +2259 -462 lines changed

mkdocs/docs/api.md

Lines changed: 178 additions & 1 deletion
@@ -165,6 +165,25 @@ catalog.create_table(
)
```

To create a table and apply subsequent changes to it atomically in a single transaction:

```python
with catalog.create_table_transaction(
    identifier="docs_example.bids",
    schema=schema,
    location="s3://pyiceberg",
    partition_spec=partition_spec,
    sort_order=sort_order,
) as txn:
    with txn.update_schema() as update_schema:
        update_schema.add_column(path="new_column", field_type=StringType())

    with txn.update_spec() as update_spec:
        update_spec.add_identity("symbol")

    txn.set_properties(test_a="test_aa", test_b="test_b", test_c="test_c")
```

## Load a table

### Catalog table
@@ -351,7 +370,165 @@ manifest_list: [["s3://warehouse/default/table_metadata_snapshots/metadata/snap-
summary: [[keys:["added-files-size","added-data-files","added-records","total-data-files","total-delete-files","total-records","total-files-size","total-position-deletes","total-equality-deletes"]values:["5459","1","3","1","0","3","5459","0","0"],keys:["added-files-size","added-data-files","added-records","total-data-files","total-records",...,"total-equality-deletes","total-files-size","deleted-data-files","deleted-records","removed-files-size"]values:["5459","1","3","1","3",...,"0","5459","1","3","5459"],keys:["added-files-size","added-data-files","added-records","total-data-files","total-delete-files","total-records","total-files-size","total-position-deletes","total-equality-deletes"]values:["5459","1","3","2","0","6","10918","0","0"]]]
```

-### Add Files
+### Entries

To show all of the table's current manifest entries for both data and delete files:

```python
table.inspect.entries()
```

```
pyarrow.Table
status: int8 not null
snapshot_id: int64 not null
sequence_number: int64 not null
file_sequence_number: int64 not null
data_file: struct<content: int8 not null, file_path: string not null, file_format: string not null, partition: struct<> not null, record_count: int64 not null, file_size_in_bytes: int64 not null, column_sizes: map<int32, int64>, value_counts: map<int32, int64>, null_value_counts: map<int32, int64>, nan_value_counts: map<int32, int64>, lower_bounds: map<int32, binary>, upper_bounds: map<int32, binary>, key_metadata: binary, split_offsets: list<item: int64>, equality_ids: list<item: int32>, sort_order_id: int32> not null
child 0, content: int8 not null
child 1, file_path: string not null
child 2, file_format: string not null
child 3, partition: struct<> not null
child 4, record_count: int64 not null
child 5, file_size_in_bytes: int64 not null
child 6, column_sizes: map<int32, int64>
child 0, entries: struct<key: int32 not null, value: int64> not null
child 0, key: int32 not null
child 1, value: int64
child 7, value_counts: map<int32, int64>
child 0, entries: struct<key: int32 not null, value: int64> not null
child 0, key: int32 not null
child 1, value: int64
child 8, null_value_counts: map<int32, int64>
child 0, entries: struct<key: int32 not null, value: int64> not null
child 0, key: int32 not null
child 1, value: int64
child 9, nan_value_counts: map<int32, int64>
child 0, entries: struct<key: int32 not null, value: int64> not null
child 0, key: int32 not null
child 1, value: int64
child 10, lower_bounds: map<int32, binary>
child 0, entries: struct<key: int32 not null, value: binary> not null
child 0, key: int32 not null
child 1, value: binary
child 11, upper_bounds: map<int32, binary>
child 0, entries: struct<key: int32 not null, value: binary> not null
child 0, key: int32 not null
child 1, value: binary
child 12, key_metadata: binary
child 13, split_offsets: list<item: int64>
child 0, item: int64
child 14, equality_ids: list<item: int32>
child 0, item: int32
child 15, sort_order_id: int32
readable_metrics: struct<city: struct<column_size: int64, value_count: int64, null_value_count: int64, nan_value_count: int64, lower_bound: string, upper_bound: string> not null, lat: struct<column_size: int64, value_count: int64, null_value_count: int64, nan_value_count: int64, lower_bound: double, upper_bound: double> not null, long: struct<column_size: int64, value_count: int64, null_value_count: int64, nan_value_count: int64, lower_bound: double, upper_bound: double> not null>
child 0, city: struct<column_size: int64, value_count: int64, null_value_count: int64, nan_value_count: int64, lower_bound: string, upper_bound: string> not null
child 0, column_size: int64
child 1, value_count: int64
child 2, null_value_count: int64
child 3, nan_value_count: int64
child 4, lower_bound: string
child 5, upper_bound: string
child 1, lat: struct<column_size: int64, value_count: int64, null_value_count: int64, nan_value_count: int64, lower_bound: double, upper_bound: double> not null
child 0, column_size: int64
child 1, value_count: int64
child 2, null_value_count: int64
child 3, nan_value_count: int64
child 4, lower_bound: double
child 5, upper_bound: double
child 2, long: struct<column_size: int64, value_count: int64, null_value_count: int64, nan_value_count: int64, lower_bound: double, upper_bound: double> not null
child 0, column_size: int64
child 1, value_count: int64
child 2, null_value_count: int64
child 3, nan_value_count: int64
child 4, lower_bound: double
child 5, upper_bound: double
----
status: [[1]]
snapshot_id: [[6245626162224016531]]
sequence_number: [[1]]
file_sequence_number: [[1]]
data_file: [
-- is_valid: all not null
-- child 0 type: int8
[0]
-- child 1 type: string
["s3://warehouse/default/cities/data/00000-0-80766b66-e558-4150-a5cf-85e4c609b9fe.parquet"]
-- child 2 type: string
["PARQUET"]
-- child 3 type: struct<>
-- is_valid: all not null
-- child 4 type: int64
[4]
-- child 5 type: int64
[1656]
-- child 6 type: map<int32, int64>
[keys:[1,2,3]values:[140,135,135]]
-- child 7 type: map<int32, int64>
[keys:[1,2,3]values:[4,4,4]]
-- child 8 type: map<int32, int64>
[keys:[1,2,3]values:[0,0,0]]
-- child 9 type: map<int32, int64>
[keys:[]values:[]]
-- child 10 type: map<int32, binary>
[keys:[1,2,3]values:[416D7374657264616D,8602B68311E34240,3A77BB5E9A9B5EC0]]
-- child 11 type: map<int32, binary>
[keys:[1,2,3]values:[53616E204672616E636973636F,F5BEF1B5678E4A40,304CA60A46651840]]
-- child 12 type: binary
[null]
-- child 13 type: list<item: int64>
[[4]]
-- child 14 type: list<item: int32>
[null]
-- child 15 type: int32
[null]]
readable_metrics: [
-- is_valid: all not null
-- child 0 type: struct<column_size: int64, value_count: int64, null_value_count: int64, nan_value_count: int64, lower_bound: string, upper_bound: string>
-- is_valid: all not null
-- child 0 type: int64
[140]
-- child 1 type: int64
[4]
-- child 2 type: int64
[0]
-- child 3 type: int64
[null]
-- child 4 type: string
["Amsterdam"]
-- child 5 type: string
["San Francisco"]
-- child 1 type: struct<column_size: int64, value_count: int64, null_value_count: int64, nan_value_count: int64, lower_bound: double, upper_bound: double>
-- is_valid: all not null
-- child 0 type: int64
[135]
-- child 1 type: int64
[4]
-- child 2 type: int64
[0]
-- child 3 type: int64
[null]
-- child 4 type: double
[37.773972]
-- child 5 type: double
[53.11254]
-- child 2 type: struct<column_size: int64, value_count: int64, null_value_count: int64, nan_value_count: int64, lower_bound: double, upper_bound: double>
-- is_valid: all not null
-- child 0 type: int64
[135]
-- child 1 type: int64
[4]
-- child 2 type: int64
[0]
-- child 3 type: int64
[null]
-- child 4 type: double
[-122.431297]
-- child 5 type: double
[6.0989]]
```

## Add Files

Expert Iceberg users may choose to commit existing parquet files to the Iceberg table as data files, without rewriting them.

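A minimal sketch of how committing existing Parquet files might look with the `add_files` API is shown below; the catalog name, table identifier, and file paths are placeholders, and the files are assumed to already exist in object storage and to match the table's schema.

```python
from pyiceberg.catalog import load_catalog

# Placeholder catalog and table names, for illustration only.
catalog = load_catalog("default")
table = catalog.load_table("docs_example.bids")

# Register existing Parquet files as Iceberg data files without rewriting them.
# The paths are placeholders; the files must already match the table's schema.
table.add_files(
    file_paths=[
        "s3://warehouse/existing/data-00000.parquet",
        "s3://warehouse/existing/data-00001.parquet",
    ]
)
```

The call commits the listed files to the table in a new snapshot, so data that is already in Parquet format does not need to be rewritten.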
mkdocs/docs/how-to-release.md

Lines changed: 13 additions & 3 deletions
@@ -56,7 +56,17 @@ export GIT_TAG_HASH=${GIT_TAG_REF:0:40}
export LAST_COMMIT_ID=$(git rev-list ${GIT_TAG} 2> /dev/null | head -n 1)
```

The `-s` option will sign the commit. If you don't have a key yet, you can find the instructions [here](http://www.apache.org/dev/openpgp.html#key-gen-generate-key). To install gpg on an M1-based Mac, a couple of additional steps are required: https://gist.github.com/phortuin/cf24b1cca3258720c71ad42977e1ba57.
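For illustration, a signed tag is created with something like the following; the tag name and message are placeholders here, and the real values come from the variables set earlier in this guide.

```bash
# Illustration with placeholder values: create a GPG-signed tag, then verify it.
git tag -s example-0.1.0rc1 -m "Example release candidate"
git tag -v example-0.1.0rc1
```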
If you have not published your GPG key in [KEYS](https://dist.apache.org/repos/dist/dev/iceberg/KEYS) yet, you must publish it before sending the vote email by doing:

```bash
svn co https://dist.apache.org/repos/dist/dev/iceberg icebergsvn
cd icebergsvn
echo "" >> KEYS # append a newline
gpg --list-sigs <YOUR KEY ID HERE> >> KEYS # append signatures
gpg --armor --export <YOUR KEY ID HERE> >> KEYS # append public key block
svn commit -m "add key for <YOUR NAME HERE>"
```

### Upload to Apache SVN
@@ -96,7 +106,7 @@ Go to Github Actions and run the `Python release` action. Set the version of the
The next step is to upload them to PyPI. Please keep in mind that this **won't** bump the version for everyone who hasn't pinned their version, since it is set to an RC [pre-release and those are ignored](https://packaging.python.org/en/latest/guides/distributing-packages-using-setuptools/#pre-release-versioning).

```bash
-twine upload -s release-0.1.0rc1/*
+twine upload release-0.1.0rc1/*
```

The final step is to generate the email to the dev mail list:
@@ -176,7 +186,7 @@ svn ci -m "PyIceberg <VERSION>" /tmp/iceberg-dist-release/
The latest version can be pushed to PyPI. Check out the Apache SVN and make sure to publish the right version with `twine`:

```bash
-twine upload -s /tmp/iceberg-dist-release/pyiceberg-<VERSION>/*
+twine upload /tmp/iceberg-dist-release/pyiceberg-<VERSION>/*
```

Send out an announcement on the dev mail list:
