Releases: delta-io/delta-rs

python-v1.5.0: faster writes, log compaction, spill config in MERGE

12 Mar 14:32

Breaking changes!

get_add_actions returns Arro3Table instead of Arro3RecordBatch

Main changes

What's Changed

New Contributors

Full Changelog: python-v1.4.2...python-v1.5.0

python-v1.3.3

16 Feb 11:54

What's Changed

Full Changelog: python-v1.3.2...python-v1.3.3

python-v1.4.2

09 Feb 02:51
27a6fb6

What's Changed

  • chore: upgrade python version for a patch release by @rtyler in #4141
  • fix(python): guard DataFusion FFI export on datafusion major version by @ethan-tyler in #4142
  • fix: align stats to view typed schema + harden parquet predicate pushdown by @ethan-tyler in #4144
  • fix: nested runtimes in stream adapter by @ion-elgreco in #4148
  • feat: session-first DataFusion integration + session resolution policies by @ethan-tyler in #4145
  • refactor: avoid mutable updates of inner snapshot by @roeap in #4151
  • perf: cache schema per stream instead of per batch by @fvaleye in #4152
  • fix(docs): open_table examples to use URLs instead of str by @khalidmammadov in #4154
  • fix: unblock schema merge appends with generated columns by @ethan-tyler in #4162
  • fix: preserve kernel column segments by @ethan-tyler in #4164
  • chore: add the unity catalog dependency back by @hntd187 in #4165
  • fix: align file_id with DataFusion UInt16 dictionary by @ethan-tyler in #4167
  • chore: use default-https-client for aws sdk to avoid deps on hyper 0.14 by @BugenZhao in #4163
  • fix(cdf): make cdf builders build function accessible by @khalidmammadov in #4161
  • refactor: add FileSelection to next provider for DeltaTableProvider removal by @ethan-tyler in #4172
  • feat: expose DV metadata and payloads as Arrow streams by @ethan-tyler in #4168
  • fix: is_deltatable must not to create paths for not existing tables paths by @khalidmammadov in #4176
  • docs: add storage backend configuration reference tables by @immohamedadhil in #4173
  • refactor: remove table level stats on TableProvider by @roeap in #4174

New Contributors

Full Changelog: python-v1.4.1...python-v1.4.2

python-v1.4.1

09 Feb 02:49

What's Changed

  • fix(core): align file stats with parquet read schema by @roeap in #4130
  • docs: incorporate some AI guidance for contributors by @rtyler in #4131
  • fix(datafusion): resolve DML predicates against execution scan schema by @ethan-tyler in #4127
  • feat: update asserted nullability in DataValidation output schema by @roeap in #4132

Full Changelog: python-v1.4.0...python-v1.4.1

python-v1.4.0

26 Jan 16:48

What's Changed

  • fix: report failed data in data checks by @roeap in #4083
  • refactor!: more logical writes by @roeap in #4090
  • chore(deps): update foyer requirement from 0.20.0 to 0.22.2 by @dependabot[bot] in #4095
  • fix: properly simplify delete predicate expressions for Datafusion by @rtyler in #4098
  • fix: add support for user names in azure URLs by @sebbegg in #4100
  • feat: enable deletion vector features for working with tables by @rtyler in #4101
  • fix(python): disable ident normalization in merge by @bellshun in #4102
  • feat: improve logical planning and migrate update op by @roeap in #4096
  • fix(python): object store registration missing in session by @JonatanMartens in #4105
  • refactor: avoid batch concatenation in write workers by @roeap in #4107
  • feat: centralize predicate parsing with literal coercion by @roeap in #4106
  • refactor: move normal and cdc writes into separate functions by @roeap in #4108
  • fix(datafusion): handle coalesced multi-file batches in next-scan by @ethan-tyler in #4112
  • refactor: move files scan to separate function by @roeap in #4111
  • feat: migrate delete by @roeap in #4117
  • chore: upgrade azurite and purge the need for a local az CLI to run tests by @rtyler in #4121
  • chore: allow integration tests to be run in parallel with nextest by @rtyler in #4122
  • chore(deps): upgrade datafusion to 52.0.0 by @ethan-tyler in #4092
  • chore: upgrade python version for the next release by @rtyler in #4124

New Contributors

Full Changelog: python-v1.3.2...python-v1.4.0

python-v1.3.2

14 Jan 10:12
f5ed490

What's Changed

  • ci: correct maturin invocations to call publish by @rtyler in #4064
  • fix: ensure that delete on an empty table works by @rtyler in #4066
  • fix: avoid unnecessary reload of file data by file_views() by @rtyler in #4067
  • feat: expose new table provider in query builder by @ion-elgreco in #4061
  • fix: wrap table provider with block_on in scan for python by @ion-elgreco in #4072
  • feat: migrate table scans by @roeap in #4048
  • chore: clippy by @roeap in #4073
  • feat: move data validation on a stream by @roeap in #4050
  • fix: handle DV mask exhaustion and short masks in batch_project by @ethan-tyler in #4058
  • fix: use LTO thin for linux ARM release by @ion-elgreco in #4077
  • chore: clean up older ignored tests for deletion vectors and column mapping by @rtyler in #4075
  • feat: integrate new table provider with DataSink by @roeap in #4049
  • refactor: remove unused error variants by @roeap in #4078

Full Changelog: python-v1.3.1...python-v1.3.2

python-v1.3.1: read support deletion vectors, column mapping

09 Jan 21:29
642febb

What's Changed

  • docs: update the changelog for the last couple releases by @rtyler in #4028
  • chore: tidy up the python release by @rtyler in #4031
  • ci: configure python version for windows by @abhiaagarwal in #4033
  • feat: improve kernel engine by @roeap in #4035
  • feat: kernel based table scans by @roeap in #4036
  • feat: push predicates into parquet scans by @roeap in #4039
  • fix(catalog-unity): improve error messages for temporary credentials failures by @saivineel in #4038
  • docs: explicitly describe filesystem_check does fix and not only checks it by @SG5 in #3941
  • refactor: remove DataFrame usage in delete operation by @roeap in #4047
  • feat: improve schema and predicate handling in scan planning by @roeap in #4044
  • chore: bump the patch version for a release of catalog-unity by @rtyler in #4042
  • fix: decode paths only during scan_memory_table by @ion-elgreco in #4056
  • chore: improve lakefs error msg with unknown errors by @ion-elgreco in #4055
  • feat: expose newest table provider to python by @ion-elgreco in #4057
  • chore!: remove peek_next_commit from LogStore by @roeap in #4059
  • chore: 1.3.1 release by @ion-elgreco in #4060
  • chore(ci): use ubuntu-arm images for linux arm builds by @abhiaagarwal in #4043

New Contributors

Full Changelog: python-v1.3.0...python-v1.3.1

rust-v0.30.0

31 Dec 18:58

⚠️ There are a number of API changes between 0.30.x and 0.29.4 ⚠️

This release includes delta_kernel, which brings some performance improvements around stats parsing. The 0.30.x release line is expected to have a number of patch releases that incorporate further performance improvements from our delta_kernel integration.

Full Changelog

Merged pull requests:

  • refactor: remove log_data call sites in find_files #4026 (roeap)
  • chore: remove wildcard dependency for publishing #4025 (rtyler)
  • refactor: use logical type ref when getting stats #4019 (roeap)
  • fix: handle stats config in data sink #4016 (roeap)
  • fix: null handling when extracting scalars #4014 (roeap)
  • fix: between range handling in expression translations #4013 (roeap)
  • chore: fix windows uri test #4011 (hntd187)
  • refactor: towards lazier snapshots #4010 (roeap)
  • fix: pin pyspark and clear disk space in runners #4007 (ion-elgreco)
  • test: add utilities for asserting DAT scan results #4005 (roeap)
  • chore: update delta-kernel to 0.19 #4004 (roeap)
  • refactor: simplify kernel extensions #4003 (roeap)
  • chore: clippy #4002 (roeap)
  • refactor: handle target version when resolving snapshot #4001 (roeap)
  • refactor: use rstest for running DAT tests #4000 (roeap)
  • feat: kernel expression conversion #3998 (roeap)
  • chore: add easier local coverage reporting #3995 (rtyler)
  • feat: expose operations on DeltaTable #3987 (roeap)
  • chore: remove some warnings #3986 (roeap)
  • chore: normalize Url going into logstore and update everything to take references #3985 (rtyler)
  • fix: add missing field to snapshot serde #3984 (roeap)
  • feat: allow for concurrent deletes in conflict checker if data_change is false #3982 (abhiaagarwal)
  • fix: remove 3.9 from ci matrix #3978 (ion-elgreco)
  • fix: decode path before lookup #3976 (ion-elgreco)
  • chore: remove deprecated pyo3 methods #3975 (ion-elgreco)
  • chore: removing APIs and deprecation warnings: 0.30.x here we come #3962 (rtyler)
  • feat: update to DataFusion 51, arrow 57, delta-kernel 0.18.0, pyo3 26, pyo3-arrow 0.14 #3949 (hntd187)
  • fix: schema evolution for merge operation #3945 (JustinRush80)
  • chore: remove Python 3.9 from our build infrastructure #3937 (rtyler)
  • docs: fix small typo issue #3935 (bmoreau8)
  • chore: removing references to using partition_filters for partition overwrite #3912 (zyd14)
  • feat(datafusion): add max_temp_directory_size parameter for z-order and compact operations for DataFusion #3847 (fvaleye)

Fixed bugs:

  • Asked to increase max_temp_directory_size in the disk manager configuration when optimizing large table #3833

Closed issues:

  • [Bug]: Count / get_add_actions exception for an empty table #4023
  • [Bug]: MERGE with schema evolution does not add new columns #4009
  • [Bug]: vacuum does not respect retention_hours when full=True #3989
  • [Bug]: write table by FFI call from go may memory leak? #3973
  • [Bug]: Table merging fails with merge_schema=True #3943
  • [Bug]: _internal.DeltaError: Generic DeltaTable error: Unable to map __delta_rs_path to action during overwrite with predicate #3939
  • [Feature]: update to DataFusion 51.0.0 #3920
  • [Bug]: get_add_actions() panics with "index out of bounds" when table has no data files #3918
  • [Bug]: Docs describe partition_filters parameter to write_deltalake that doesn't exist #3904
  • [Feature]: split delta-rs into multiple crates #3899
  • [Feature]: Drop python 3.9 support once EOL #3886
  • [Bug]: PyPi storage limit hit for deltalake [python releases blocked for time-being] #3876

python-v1.3.0

09 Jan 21:23
98b1335

What's Changed

  • fix: remove manylinux 217 builds for aarch64 by @ion-elgreco in #3880
  • fix: display kernel-rs errors better by @ion-elgreco in #3883
  • fix: needs ci release python by @ion-elgreco in #3885
  • feat: add multiple constraints at once by @JustinRush80 in #3879
  • chore: create the next release of the rust core package by @rtyler in #3887
  • fix: surface the correct kernel objectstore error by @ion-elgreco in #3888
  • feat(memory): optimize collection preallocation where capacity is known by @fvaleye in #3895
  • feat: tracing spans across threadpool by @ion-elgreco in #3894
  • chore: reduce wheel size by @abhiaagarwal in #3878
  • fix: use the default features of aws-config by @rtyler in #3898
  • perf(snapshot): minor memory allocation and usage reduction without cloning by @fvaleye in #3903
  • feat: generate an Symlink Manifest for External Engines by @JustinRush80 in #3889
  • feat(typed-builder): adopt typed-builder for safer builder pattern in non-core crates by @fvaleye in #3902
  • chore: cleaning up warnings and preparing 0.29.3 by @rtyler in #3910
  • fix: update stats serialization logic for scale-0 decimals by @DrakeLin in #3916
  • feat: add GCS auto-registration via ctor hooks by @ethan-tyler in #3923
  • chore: bump the patch version to release fixes by @rtyler in #3919
  • chore(cargo): unify cargo profiles by @fvaleye in #3924
  • fix: correctly rectify Urls with dots in DeltaTableBuilder by @rtyler in #3929
  • chore(deps): update ctor requirement from 0.2 to 0.6 by @dependabot[bot] in #3927
  • chore: remove proofs/ which are no longer used by @rtyler in #3930
  • chore(deps): update convert_case requirement from 0.8.0 to 0.9.0 by @dependabot[bot] in #3926
  • chore: adding more test coverage to the Gcp crate by @rtyler in #3931
  • fix: handle empty tables in get_add_actions() by @vsmanish1772 in #3922
  • chore: removing references to using partition_filters for partition overwrite by @zyd14 in #3912
  • chore: remove Python 3.9 from our build infrastructure by @rtyler in #3937
  • feat: update to DataFusion 51, arrow 57, delta-kernel 0.18.0, pyo3 26, pyo3-arrow 0.14 by @hntd187 in #3949
  • docs: fix small typo issue by @bmoreau8 in #3935
  • fix: retry advancement of PreparedCommit into PostCommit in case version 0 already exists (created by another writer) by @danielgafni in #3513
  • fix: remove 3.9 from ci matrix by @ion-elgreco in #3978
  • chore: remove deprecated pyo3 methods by @ion-elgreco in #3975
  • fix: schema evolution for merge operation by @JustinRush80 in #3945
  • chore: removing APIs and deprecation warnings: 0.30.x here we come by @rtyler in #3962
  • fix: add missing field to snapshot serde by @roeap in #3984
  • chore: remove some warnings by @roeap in #3986
  • feat: expose operations on DeltaTable by @roeap in #3987
  • feat(datafusion): add max_temp_directory_size parameter for z-order and compact operations for DataFusion by @fvaleye in #3847
  • chore: normalize Url going into logstore and update everything to take references by @rtyler in #3985
  • chore: add easier local coverage reporting by @rtyler in #3995
  • refactor: use rstest for running DAT tests by @roeap in #4000
  • refactor: handle target version when resolving snapshot by @roeap in #4001
  • chore: clippy by @roeap in #4002
  • refactor: simplify kernel extensions by @roeap in #4003
  • chore: update delta-kernel to 0.19 by @roeap in #4004
  • test: add utilities for asserting DAT scan results by @roeap in #4005
  • feat: allow for concurrent deletes in conflict checker if data_change is false by @abhiaagarwal in #3982
  • feat: kernel expression conversion by @roeap in #3998
  • fix: decode path before lookup by @ion-elgreco in #3976
  • fix: pin pyspark and clear disk space in runners by @ion-elgreco in #4007
  • refactor: towards lazier snapshots by @roeap in #4010
  • chore: fix windows uri test by @hntd187 in #4011
  • fix: between range handling in expression translations by @roeap in #4013
  • fix: null handling when extracting scalars by @roeap in #4014
  • fix: handle stats config in data sink by @roeap in #4016
  • refactor: use logical type ref when getting stats by @roeap in #4019
  • chore: remove wildcard dependency for publishing by @rtyler in #4025
  • refactor: remove log_data call sites in find_files by @roeap in #4026

New Contributors

Full Changelog: python-v1.2.1...python-v1.3.0

python-v1.2.1: lazy writes

20 Oct 06:46
cb672ac

Performance improvements

What's Changed

  • feat: datafusion based kernel engine by @roeap in #3831
  • fix: update pyproject.toml by @wagenrace in #3854
  • chore: upgrade datafusion, arrow and parquet by @dentiny in #3856
  • feat: allow RecordBatchWriter to pass through pass-through-commit-properties by @rtyler in #3858
  • perf: support pushing physical filters down through DeltaScan by @alexwilcoxson-rel in #3859
  • chore: remove some deprecated methods by @roeap in #3861
  • fix: resolve some warnings by @roeap in #3862
  • chore: deprecate file_actions on state by @roeap in #3863
  • refactor: consolidate datafusion session setup by @roeap in #3860
  • fix(core): handle Result type after get_actions sync conversion by @yousefsaad12 in #3846
  • chore: change the core and meta crate versions for release by @rtyler in #3864
  • chore: use form based issue templates by @roeap in #3865
  • chore: add python deprecation warnings by @roeap in #3869
  • feat(bench): add TPC-DS benchmarks by @abhiaagarwal in #3845
  • fix: add regression test for working with dotted-named columns in Python by @rtyler in #3873
  • fix: add a regression test while I'm tooting around by @rtyler in #3874
  • feat: allow for lazy loading files in operations by @roeap in #3872

New Contributors

Full Changelog: python-v1.2.0...python-v1.2.1