Releases: delta-io/delta-rs
python-v1.5.0: faster writes, log compaction, spill config in MERGE
Breaking changes!
get_add_actions returns Arro3Table instead of Arro3RecordBatch
Main changes
- perf: parallel partition writers via per-stream JoinSet by @fvaleye in #4193
- refactor(python): get add action return arrow table by @vsmanish1772 in #4204
- feat: added disk spilling for merge by @thomasfrederikhoeck in #4219
- feat: log compaction by @ion-elgreco in #4210
- feat: vacuum lite mode to avoid storage listing by @khalidmammadov in #4227
- feat: implement batched deletion in delete_dir function by @nabobery in #4244
- chore: python datafusion 52 upgrade by @ethan-tyler in #4226
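Log compaction (#4210) relates to the Delta protocol's log compaction files, which gather the actions of a contiguous range of commits x through y into a single file named <x>.<y>.compacted.json in _delta_log, with both versions zero-padded to 20 digits. The minimal sketch below only illustrates that naming convention. The helper names are hypothetical, and the code is not the delta-rs implementation.

```python
import re

# Hypothetical helpers illustrating the Delta protocol naming convention for
# log compaction files: <start>.<end>.compacted.json, versions padded to 20 digits.
# A sketch only; this is not how delta-rs implements log compaction.
_COMPACTED_RE = re.compile(r"^(\d{20})\.(\d{20})\.compacted\.json$")


def compacted_file_name(start_version: int, end_version: int) -> str:
    """Return the compaction file name covering commits start..end (inclusive)."""
    if start_version < 0 or end_version < start_version:
        raise ValueError("expected 0 <= start_version <= end_version")
    return f"{start_version:020d}.{end_version:020d}.compacted.json"


def parse_compacted_file_name(name: str):
    """Return (start, end) for a compaction file name, or None for any other file."""
    match = _COMPACTED_RE.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


name = compacted_file_name(4, 6)
print(name)  # 00000000000000000004.00000000000000000006.compacted.json
print(parse_compacted_file_name(name))  # (4, 6)
print(parse_compacted_file_name("00000000000000000007.json"))  # None
```

A reader that lists _delta_log can use a parser like this to tell compaction files apart from ordinary commit files.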
What's Changed
- chore: bump version from 1.4.1 to 1.4.2 by @ion-elgreco in #4182
- refactor: use FileSelection for matched file scans by @ethan-tyler in #4188
- feat: add DeltaScan insert_into with runtime log_store by @ethan-tyler in #4187
- fix: make session the first argument of update_datafusion_session by @pauldouane in #4192
- fix: preserve generated column metadata during schema merge by @ethan-tyler in #4191
- refactor: use BatchAdapterFactory for scan adaptation by @ethan-tyler in #4195
- fix: clarify vacuum command documentation for DeltaTable by @khalidmammadov in #4196
- fix(delete): use Add metadata for partition only DELETE by @ethan-tyler in #4150
- chore: harden scan adapter caching and DV mask edge cases by @ethan-tyler in #4199
- fix(datafusion): avoid overflow when scanning add actions by @vsmanish1772 in #4197
- chore: drop unused ReceiverStreamBuilder spawn by @ethan-tyler in #4201
- fix: propagate session config through Delta factory path by @ethan-tyler in #4202
- fix: enforce file-id filter semantics in scan planning by @ethan-tyler in #4206
- chore: set compression for partition optimization as well by @rtyler in #4208
- chore: enable snappy compression on checkpoints by @rtyler in #4209
- chore: change the versions for the next "majorish" release of 🦀 by @rtyler in #4207
- fix: code block indenting and fencing by @plaindocs in #4212
- fix: delete partition fallback batching and add action coalescing by @ethan-tyler in #4211
- fix: pad DV keep masks to numRecords by @ethan-tyler in #4236
- feat: route DeltaDataSink through shared write_streams by @ethan-tyler in #4194
- fix: generated column expr with SchemaMode::Merge handles missing columns by @veeceey in #4223
- docs: minor updates to the readme and contributing by @plaindocs in #4238
- feat: improve rust code samples in docs by @khalidmammadov in #4242
- docs: fix several nit issues in docs by @anshulbaliga7 in #4245
- feat(python): add post_commithook_properties to alter metadata apis by @vsmanish1772 in #4249
- docs: home page refactor by @plaindocs in #4234
- fix: move table builder local path check guard to open_table by @khalidmammadov in #4248
- chore: allow easier running of Azure integration tests in Python by @rtyler in #4255
- fix: create_write_transaction works again, now with 100% more coverage by @rtyler in #4260
- fix(warnings): change const to static for extension planners and reduce warnings by @khalidmammadov in #4259
- refactor: add insert_into and file selection write path to DeltaScan by @ethan-tyler in #4250
- fix: coerce decimal literals in target subset filters by @ethan-tyler in #4267
- fix: add a central arrow delta type normalization by @fvaleye in #4254
- fix: change visibility of add_action method to public by @lizardoluis in #4232
- refactor: reduce clippy warnings in core and in LogicalPlanBuilder and DeltaScanStream by @khalidmammadov in #4270
- docs: update contributing docs to include DCO more explicitly by @rtyler in #4271
- feat: consolidate target_file_size and allow unbounded writes by @abhiaagarwal in #4257
- fix: warn on lossy nanosecond timestamp truncation during normalization by @fvaleye in #4272
- refactor: migrate merge target scan to DeltaScanNext by @ethan-tyler in #4266
- fix: remove unsupported create_add from public API by @ethan-tyler in #4274
- refactor: reduce clippy warnings in core crate by @khalidmammadov in #4275
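Delta timestamps are stored with microsecond precision. Normalizing Arrow nanosecond timestamps to microseconds therefore discards any sub-microsecond part, and #4272 adds a warning when that happens. The minimal pure-Python sketch below detects lossy truncation on raw epoch-nanosecond integers. The function name and warning text are hypothetical, not delta-rs's. Floor division is an assumption of this sketch: it rounds pre-1970 values toward the earlier microsecond.

```python
import warnings


def ns_to_us(values_ns):
    """Convert epoch-nanosecond integers to microseconds, warning if precision is lost.

    Hypothetical helper for illustration; None entries stand in for nulls.
    """
    lossy = 0
    out = []
    for v in values_ns:
        if v is None:
            out.append(None)
            continue
        # Any non-zero remainder is sub-microsecond data that will be discarded.
        if v % 1_000 != 0:
            lossy += 1
        # Floor division: negative (pre-1970) values round toward negative infinity.
        out.append(v // 1_000)
    if lossy:
        warnings.warn(
            f"{lossy} timestamp value(s) lost sub-microsecond precision during normalization",
            UserWarning,
        )
    return out


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = ns_to_us([1_000, 1_500, None])
print(result)  # [1, 1, None]
print(len(caught))  # 1
```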
New Contributors
- @pauldouane made their first contribution in #4192
- @plaindocs made their first contribution in #4212
- @veeceey made their first contribution in #4223
- @nabobery made their first contribution in #4244
- @anshulbaliga7 made their first contribution in #4245
- @lizardoluis made their first contribution in #4232
Full Changelog: python-v1.4.2...python-v1.5.0
python-v1.3.3
What's Changed
- fix(rust): backport pr4197 python v1.3.3 by @vsmanish1772 in #4205
Full Changelog: python-v1.3.2...python-v1.3.3
python-v1.4.2
What's Changed
- chore: upgrade python version for a patch release by @rtyler in #4141
- fix(python): guard DataFusion FFI export on datafusion major version by @ethan-tyler in #4142
- fix: align stats to view typed schema + harden parquet predicate pushdown by @ethan-tyler in #4144
- fix: nested runtimes in stream adapter by @ion-elgreco in #4148
- feat: session-first DataFusion integration + session resolution policies by @ethan-tyler in #4145
- refactor: avoid mutable updates of inner snapshot by @roeap in #4151
- perf: cache schema per stream instead of per batch by @fvaleye in #4152
- fix(docs): open_table examples to use URLs instead of str by @khalidmammadov in #4154
- fix: unblock schema merge appends with generated columns by @ethan-tyler in #4162
- fix: preserve kernel column segments by @ethan-tyler in #4164
- chore: add the unity catalog dependency back by @hntd187 in #4165
- fix: align file_id with DataFusion UInt16 dictionary by @ethan-tyler in #4167
- chore: use default-https-client for aws sdk to avoid deps on hyper 0.14 by @BugenZhao in #4163
- fix(cdf): make cdf builders build function accessible by @khalidmammadov in #4161
- refactor: add FileSelection to next provider for DeltaTableProvider removal by @ethan-tyler in #4172
- feat: expose DV metadata and payloads as Arrow streams by @ethan-tyler in #4168
- fix: is_deltatable must not create paths for nonexistent table paths by @khalidmammadov in #4176
- docs: add storage backend configuration reference tables by @immohamedadhil in #4173
- refactor: remove table level stats on TableProvider by @roeap in #4174
New Contributors
- @khalidmammadov made their first contribution in #4154
- @BugenZhao made their first contribution in #4163
- @immohamedadhil made their first contribution in #4173
Full Changelog: python-v1.4.1...python-v1.4.2
python-v1.4.1
What's Changed
- fix(core): align file stats with parquet read schema by @roeap in #4130
- docs: incorporate some AI guidance for contributors by @rtyler in #4131
- fix(datafusion): resolve DML predicates against execution scan schema by @ethan-tyler in #4127
- feat: update asserted nullability in DataValidation output schema by @roeap in #4132
Full Changelog: python-v1.4.0...python-v1.4.1
python-v1.4.0
What's Changed
- fix: report failed data in data checks by @roeap in #4083
- refactor!: more logical writes by @roeap in #4090
- chore(deps): update foyer requirement from 0.20.0 to 0.22.2 by @dependabot[bot] in #4095
- fix: properly simplify delete predicate expressions for Datafusion by @rtyler in #4098
- fix: add support for user names in azure URLs by @sebbegg in #4100
- feat: enable deletion vector features for working with tables by @rtyler in #4101
- fix(python): disable ident normalization in merge by @bellshun in #4102
- feat: improve logical planning and migrate update op by @roeap in #4096
- fix(python): object store registration missing in session by @JonatanMartens in #4105
- refactor: avoid batch concatenation in write workers by @roeap in #4107
- feat: centralize predicate parsing with literal coercion by @roeap in #4106
- refactor: move normal and cdc writes into separate functions by @roeap in #4108
- fix(datafusion): handle coalesced multi-file batches in next-scan by @ethan-tyler in #4112
- refactor: move files scan to separate function by @roeap in #4111
- feat: migrate delete by @roeap in #4117
- chore: upgrade azurite and purge the need for a local az CLI to run tests by @rtyler in #4121
- chore: allow integration tests to be run in parallel with nextest by @rtyler in #4122
- chore(deps): upgrade datafusion to 52.0.0 by @ethan-tyler in #4092
- chore: upgrade python version for the next release by @rtyler in #4124
New Contributors
- @sebbegg made their first contribution in #4100
- @bellshun made their first contribution in #4102
- @JonatanMartens made their first contribution in #4105
Full Changelog: python-v1.3.2...python-v1.4.0
python-v1.3.2
What's Changed
- ci: correct maturin invocations to call publish by @rtyler in #4064
- fix: ensure that delete on an empty table works by @rtyler in #4066
- fix: avoid unnecessary reload of file data by file_views() by @rtyler in #4067
- feat: expose new table provider in query builder by @ion-elgreco in #4061
- fix: wrap table provider with block_on in scan for python by @ion-elgreco in #4072
- feat: migrate table scans by @roeap in #4048
- chore: clippy by @roeap in #4073
- feat: move data validation on a stream by @roeap in #4050
- fix: handle DV mask exhaustion and short masks in batch_project by @ethan-tyler in #4058
- fix: use LTO thin for linux ARM release by @ion-elgreco in #4077
- chore: clean up older ignored tests for deletion vectors and column mapping by @rtyler in #4075
- feat: integrate new table provider with DataSink by @roeap in #4049
- refactor: remove unused error variants by @roeap in #4078
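A deletion vector records which row indexes of a data file are deleted, so any row the vector does not mention is kept. For this reason, a keep mask that is shorter than the file's row count can safely be padded with True. This is the idea behind handling short masks here (#4058) and padding DV keep masks to numRecords in 1.5.0 (#4236). The minimal pure-Python sketch below uses hypothetical function names. Its list-based masks stand in for the Arrow boolean arrays delta-rs works with.

```python
def keep_mask(deleted_rows, num_records):
    """Build a file-level keep mask from the row indexes listed in a deletion vector."""
    mask = [True] * num_records  # rows not in the deletion vector are kept
    for idx in deleted_rows:
        if not 0 <= idx < num_records:
            raise ValueError(f"deleted row index {idx} outside 0..{num_records - 1}")
        mask[idx] = False
    return mask


def pad_keep_mask(mask, num_records):
    """Extend a short keep mask to num_records, keeping every missing tail row."""
    if len(mask) > num_records:
        raise ValueError("mask is longer than the file's row count")
    return mask + [True] * (num_records - len(mask))


def batch_slices(mask, batch_sizes):
    """Split a file-level mask into one slice per record batch, in file order."""
    if sum(batch_sizes) != len(mask):
        raise ValueError("batch sizes must cover the mask exactly")
    out, offset = [], 0
    for size in batch_sizes:
        out.append(mask[offset:offset + size])
        offset += size
    return out


padded = pad_keep_mask([True, False], 5)
print(padded)  # [True, False, True, True, True]
print(batch_slices(padded, [3, 2]))  # [[True, False, True], [True, True]]
```

Padding before slicing guarantees that a later batch never runs past the end of the mask.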
Full Changelog: python-v1.3.1...python-v1.3.2
python-v1.3.1: read support deletion vectors, column mapping
What's Changed
- docs: update the changelog for the last couple releases by @rtyler in #4028
- chore: tidy up the python release by @rtyler in #4031
- ci: configure python version for windows by @abhiaagarwal in #4033
- feat: improve kernel engine by @roeap in #4035
- feat: kernel based table scans by @roeap in #4036
- feat: push predicates into parquet scans by @roeap in #4039
- fix(catalog-unity): improve error messages for temporary credentials failures by @saivineel in #4038
- docs: explicitly describe that filesystem_check fixes issues rather than only checking for them by @SG5 in #3941
- refactor: remove DataFrame usage in delete operation by @roeap in #4047
- feat: improve schema and predicate handling in scan planning by @roeap in #4044
- chore: bump the patch version for a release of catalog-unity by @rtyler in #4042
- fix: decode paths only during scan_memory_table by @ion-elgreco in #4056
- chore: improve lakefs error msg with unknown errors by @ion-elgreco in #4055
- feat: expose newest table provider to python by @ion-elgreco in #4057
- chore!: remove peek_next_commit from LogStore by @roeap in #4059
- chore: 1.3.1 release by @ion-elgreco in #4060
- chore(ci): use ubuntu-arm images for linux arm builds by @abhiaagarwal in #4043
New Contributors
- @saivineel made their first contribution in #4038
- @SG5 made their first contribution in #3941
Full Changelog: python-v1.3.0...python-v1.3.1
rust-v0.30.0
⚠️ There are a number of API changes between 0.30.x and 0.29.4 ⚠️
This release integrates delta_kernel, which brings some performance improvements around stats parsing. The 0.30.x release line is expected to have a number of patch releases that incorporate further performance improvements from our delta_kernel integration.
Merged pull requests:
- refactor: remove log_data call sites in find_files #4026 (roeap)
- chore: remove wildcard dependency for publishing #4025 (rtyler)
- refactor: use logical type ref when getting stats #4019 (roeap)
- fix: handle stats config in data sink #4016 (roeap)
- fix: null handling when extracting scalars #4014 (roeap)
- fix: between range handling in expression translations #4013 (roeap)
- chore: fix windows uri test #4011 (hntd187)
- refactor: towards lazier snapshots #4010 (roeap)
- fix: pin pyspark and clear disk space in runners #4007 (ion-elgreco)
- test: add utilities for asserting DAT scan results #4005 (roeap)
- chore: update delta-kernel to 0.19 #4004 (roeap)
- refactor: simplify kernel extensions #4003 (roeap)
- chore: clippy #4002 (roeap)
- refactor: handle target version when resolving snapshot #4001 (roeap)
- refactor: use rstest for running DAT tests #4000 (roeap)
- feat: kernel expression conversion #3998 (roeap)
- chore: add easier local coverage reporting #3995 (rtyler)
- feat: expose operations on DeltaTable #3987 (roeap)
- chore: remove some warnings #3986 (roeap)
- chore: normalize Url going into logstore and update everything to take references #3985 (rtyler)
- fix: add missing field to snapshot serde #3984 (roeap)
- feat: allow for concurrent deletes in conflict checker if data_change is false #3982 (abhiaagarwal)
- fix: remove 3.9 from ci matrix #3978 (ion-elgreco)
- fix: decode path before lookup #3976 (ion-elgreco)
- chore: remove deprecated pyo3 methods #3975 (ion-elgreco)
- chore: removing APIs and deprecation warnings: 0.30.x here we come #3962 (rtyler)
- feat: update to DataFusion 51, arrow 57, delta-kernel 0.18.0, pyo3 26, pyo3-arrow 0.14 #3949 (hntd187)
- fix: schema evolution for merge operation #3945 (JustinRush80)
- chore: remove Python 3.9 from our build infrastructure #3937 (rtyler)
- docs: fix small typo issue #3935 (bmoreau8)
- chore: removing references to using partition_filters for partition overwrite #3912 (zyd14)
- feat(datafusion): add max_temp_directory_size parameter for z-order and compact operations for DataFusion #3847 (fvaleye)
Fixed bugs:
- Asked to increase max_temp_directory_size in the disk manager configuration when optimizing large table #3833
Closed issues:
- [Bug]: Count / get_add_actions exception for an empty table #4023
- [Bug]: MERGE with schema evolution does not add new columns #4009
- [Bug]: vacuum does not respect retention_hours when full=True #3989
- [Bug]: write table by FFI call from go may memory leak? #3973
- [Bug]: Table merging fails with merge_schema=True #3943
- [Bug]: _internal.DeltaError: Generic DeltaTable error: Unable to map __delta_rs_path to action during overwrite with predicate #3939
- [Feature]: update to DataFusion 51.0.0 #3920
- [Bug]: get_add_actions() panics with "index out of bounds" when table has no data files #3918
- [Bug]: Docs describe partition_filters parameter to write_deltalake that doesn't exist #3904
- [Feature]: split delta-rs into multiple crates #3899
- [Feature]: Drop python 3.9 support once EOL #3886
- [Bug]: PyPi storage limit hit for deltalake [python releases blocked for time-being] #3876
python-v1.3.0
What's Changed
- fix: remove manylinux 217 builds for aarch64 by @ion-elgreco in #3880
- fix: display kernel-rs errors better by @ion-elgreco in #3883
- fix: needs ci release python by @ion-elgreco in #3885
- feat: add multiple constraints at once by @JustinRush80 in #3879
- chore: create the next release of the rust core package by @rtyler in #3887
- fix: surface the correct kernel objectstore error by @ion-elgreco in #3888
- feat(memory): optimize collection preallocation where capacity is known by @fvaleye in #3895
- feat: tracing spans across threadpool by @ion-elgreco in #3894
- chore: reduce wheel size by @abhiaagarwal in #3878
- fix: use the default features of aws-config by @rtyler in #3898
- perf(snapshot): minor memory allocation and usage reduction without cloning by @fvaleye in #3903
- feat: generate a Symlink Manifest for External Engines by @JustinRush80 in #3889
- feat(typed-builder): adopt typed-builder for safer builder pattern in non-core crates by @fvaleye in #3902
- chore: cleaning up warnings and preparing 0.29.3 by @rtyler in #3910
- fix: update stats serialization logic for scale-0 decimals by @DrakeLin in #3916
- feat: add GCS auto-registration via ctor hooks by @ethan-tyler in #3923
- chore: bump the patch version to release fixes by @rtyler in #3919
- chore(cargo): unify cargo profiles by @fvaleye in #3924
- fix: correctly rectify Urls with dots in DeltaTableBuilder by @rtyler in #3929
- chore(deps): update ctor requirement from 0.2 to 0.6 by @dependabot[bot] in #3927
- chore: remove proofs/ which are no longer used by @rtyler in #3930
- chore(deps): update convert_case requirement from 0.8.0 to 0.9.0 by @dependabot[bot] in #3926
- chore: adding more test coverage to the Gcp crate by @rtyler in #3931
- fix: handle empty tables in get_add_actions() by @vsmanish1772 in #3922
- chore: removing references to using partition_filters for partition overwrite by @zyd14 in #3912
- chore: remove Python 3.9 from our build infrastructure by @rtyler in #3937
- feat: update to DataFusion 51, arrow 57, delta-kernel 0.18.0, pyo3 26, pyo3-arrow 0.14 by @hntd187 in #3949
- docs: fix small typo issue by @bmoreau8 in #3935
- fix: retry advancement of PreparedCommit into PostCommit in case version 0 already exists (created by another writer) by @danielgafni in #3513
- fix: remove 3.9 from ci matrix by @ion-elgreco in #3978
- chore: remove deprecated pyo3 methods by @ion-elgreco in #3975
- fix: schema evolution for merge operation by @JustinRush80 in #3945
- chore: removing APIs and deprecation warnings: 0.30.x here we come by @rtyler in #3962
- fix: add missing field to snapshot serde by @roeap in #3984
- chore: remove some warnings by @roeap in #3986
- feat: expose operations on DeltaTable by @roeap in #3987
- feat(datafusion): add max_temp_directory_size parameter for z-order and compact operations for DataFusion by @fvaleye in #3847
- chore: normalize Url going into logstore and update everything to take references by @rtyler in #3985
- chore: add easier local coverage reporting by @rtyler in #3995
- refactor: use rstest for running DAT tests by @roeap in #4000
- refactor: handle target version when resolving snapshot by @roeap in #4001
- chore: clippy by @roeap in #4002
- refactor: simplify kernel extensions by @roeap in #4003
- chore: update delta-kernel to 0.19 by @roeap in #4004
- test: add utilities for asserting DAT scan results by @roeap in #4005
- feat: allow for concurrent deletes in conflict checker if data_change is false by @abhiaagarwal in #3982
- feat: kernel expression conversion by @roeap in #3998
- fix: decode path before lookup by @ion-elgreco in #3976
- fix: pin pyspark and clear disk space in runners by @ion-elgreco in #4007
- refactor: towards lazier snapshots by @roeap in #4010
- chore: fix windows uri test by @hntd187 in #4011
- fix: between range handling in expression translations by @roeap in #4013
- fix: null handling when extracting scalars by @roeap in #4014
- fix: handle stats config in data sink by @roeap in #4016
- refactor: use logical type ref when getting stats by @roeap in #4019
- chore: remove wildcard dependency for publishing by @rtyler in #4025
- refactor: remove log_data call sites in find_files by @roeap in #4026
New Contributors
- @DrakeLin made their first contribution in #3916
- @ethan-tyler made their first contribution in #3923
- @zyd14 made their first contribution in #3912
- @bmoreau8 made their first contribution in #3935
- @danielgafni made their first contribution in #3513
Full Changelog: python-v1.2.1...python-v1.3.0
python-v1.2.1: lazy writes
Performance improvements
- feat: in-flight, streaming PartitionWriter by @abhiaagarwal in #3857
- fix: use single writer for all partition streams by @ion-elgreco in #3870
What's Changed
- feat: datafusion based kernel engine by @roeap in #3831
- fix: update pyproject.toml by @wagenrace in #3854
- chore: upgrade datafusion, arrow and parquet by @dentiny in #3856
- feat: allow RecordBatchWriter to pass through pass-through-commit-properties by @rtyler in #3858
- perf: support pushing physical filters down through DeltaScan by @alexwilcoxson-rel in #3859
- chore: remove some deprecated methods by @roeap in #3861
- fix: resolve some warnings by @roeap in #3862
- chore: deprecate file_actions on state by @roeap in #3863
- refactor: consolidate datafusion session setup by @roeap in #3860
- fix(core): handle Result type after get_actions sync conversion by @yousefsaad12 in #3846
- chore: change the core and meta crate versions for release by @rtyler in #3864
- chore: use form based issue templates by @roeap in #3865
- chore: add python deprecation warnings by @roeap in #3869
- feat(bench): add TPC-DS benchmarks by @abhiaagarwal in #3845
- fix: add regression test for working with dotted-named columns in Python by @rtyler in #3873
- fix: add a regression test while I'm tooting around by @rtyler in #3874
- feat: allow for lazy loading files in operations by @roeap in #3872
New Contributors
- @wagenrace made their first contribution in #3854
- @dentiny made their first contribution in #3856
- @yousefsaad12 made their first contribution in #3846
Full Changelog: python-v1.2.0...python-v1.2.1