Skip to content

Latest commit

 

History

History
2104 lines (1729 loc) · 109 KB

File metadata and controls

2104 lines (1729 loc) · 109 KB

Changelog

v0.20.0 (2026-02-26)

Full Changelog

🏗️ Breaking changes

  1. Remove DefaultEngine::new (#1583)
    • Use DefaultEngineBuilder instead like: DefaultEngineBuilder::new(store).build()
  2. Add ParseJson expression (#1586)
    • Implementors of the ExpressionHandler trait now need to handle this expression
  3. Change CommitResponse::Committed to return a FileMeta (#1599)
    • Committer implementations must now return a FileMeta of the written file after each commit, instead of only returning the committed version
  4. Add stats_columns to ParquetHandler (#1668)
    • Add stat_columns to write_parquet_file engine implementation, which specifies the columns to collect Delta stats on
  5. Add StatisticsCollector core with numRecords (#1662)
    • Renames _stat_columns above to stat_columns
  6. Return updated Snapshot from Snapshot::publish (#1694)
    • Snapshot::publish now takes self: Arc and returns DeltaResult instead of ()
  7. Pass engine to Snapshot::transaction() for domain metadata access (#1707)
    • Snapshot::transaction() now requires an engine: &dyn Engine parameter to read domain metadata
  8. Add tracing instrumentation to transaction and snapshot operations (#1772)
    • snapshot and transaction have both stopped implementing auto traits UnwindSafe and RefUnwindSafe due to storing new instrumentation span fields
  9. Use physical stats column names in WriteContext (#1836)
  10. Generate physical_schema in WriteContext w.r.t column mapping and materializePartitionColumns (#1837)
  11. Fix get_app_id_version to take &self (#1770)
    • If you are calling get_app_id pass a reference to the Snapshot not Arc<Snapshot>
  12. Add ability to 'enter' the runtime to the default engine (#1847)
    • Implementors of the TaskExecutor trait now need to support this

🚀 Features / new APIs

  1. Add doctests for IntoEngineData derive macro (#1580)
  2. Create DefaultEngineBuilder to build DefaultEngine (#1582)
  3. Implement Scalar::From<HashMap<K, V>> (#1541)
  4. Add logSegment.new_with_commit_appended API (#1602)
  5. snapshot.new_post_commit (#1604)
    • Creates a new Snapshot reflecting a just-committed transaction without re-reading the log
  6. Enable Arrow to convert nullable StructArray to RecordBatch (#1635)
  7. Add snapshot.checkpoint() for all-in-one checkpointing (#1600)
  8. Add a tracing statement to print table configuration for each version (#1634)
  9. Add CheckpointDeduplicator for checkpoint phase of distributed log replay (#1538)
  10. Add CreateTable API with simplified single-stage flow (#1629)
  11. Add with_table_properties method to CreateTableTransactionBuilder (#1649)
  12. Add post-commit Snapshot to txn (#1633)
  13. Add CDF tracing for Phase 1 of Change Data feed (#1654)
  14. Make Sequential phase schema only contain add and remove actions (#1679)
  15. Add executor for distributed log replay (#1539)
  16. Transaction stats API (#1658)
  17. Snapshot::publish API with e2e in-memory UC test (#1628)
  18. Expose a Snapshot::get_domain_metadata_internal API, guarded by internal-api feature flag (#1692)
  19. Add nullCount support to StatisticsCollector (#1663)
  20. Add minValues and maxValues support to StatisticsCollector (#1664)
  21. Enable NullCount collection for complex data types (#1706)
  22. Implement schema diffing for flat schemas (2/5]) (#1478)
  23. Add API on Scan to perform 2-phase log replay (#1547)
  24. Enable distributed log replay serde serialization for serializable scan state (#1549)
  25. Add InCommitTimestamp support to ChangeDataFeed (#1670)
  26. Add include_stats_columns API and output_stats_schema field (#1728)
  27. Add write support for clustered tables behind feature flag (#1704)
  28. Add snapshot load instrumentation (#1750)
  29. Create table builder and domain metadata handling (#1762)
  30. Add crc module with schema, visitor, reader, and lazy loader (#1780)
  31. Add clustering support for CREATE TABLE (#1763)
  32. Support owned runtime in TokioMultiThreadExecutor (#1719)
  33. (transaction) Support blind append commit metadata (#1783)
    • Adds set_is_blind_append() API to Transaction, includes isBlindAppend in generated CommitInfo, and validates blind-append semantics (add-only, no removals/DV updates, dataChange must be true) before commit.
  34. Add stats transform module for checkpoint stats population (#1646)
  35. Refactor data skipping to use stats_parsed directly (#1715)
  36. Support using stats_columns and predicate together in scans (#1691)
  37. Support creation of DefaultEngine with TokioMultiThreadExecutor in FFI (#1755)
  38. Add column mapping support for CREATE TABLE (#1764)
  39. Write parsed stats in checkpoints (#1643)
  40. Implement ReadConfig for Benchmark Framework (#1758)
  41. Implement TableInfo Deserialization for Benchmark Framework (#1759)
  42. Implement Read Spec Deserialization for Benchmark Framework (#1760)
  43. Allow visitors to visit REE Arrow columns. (#1829)
  44. (committer) Add tracing instrumentation to FileSystemCommitter::commit (#1811)
  45. Try and cache brew packages to speed up CI (#1909)
  46. Extend GetData with float, double, date, timestamp, decimal types (#1901)
  47. Define and use constants for protocol (3,7]) (#1917)
  48. Generate transform in WriteContext w.r.t column mapping (#1862)
  49. Support v2 checkpoints in create_table API (#1864)
  50. Expand add files schema to include all stats fields (#1748)
  51. Support write with both partition columns and column mapping in DefaultEngine (#1870)
  52. Feat: support scanning for multiple specific domains in domain metadata replay (#1881)
    • Allows callers to request multiple domain names in a single metadata replay pass, with early termination once all requested domains are found. Includes optimized skip of domain metadata fields when a domain has already been seen in a newer commit.
  53. Allow ffi for uc_catalog stuff (#1711)
  54. Support column mapping on writes (#1863)
  55. Coerce parquet read nullability to match table schema (#1903)
  56. Relax clustering column constraints to align with Delta protocol (#1913)
  57. Auto-enable variantType feature during CREATE TABLE ([#1922]) (#1949)
  58. Add type validation for evaluate_expression (#1575)
  59. Use ReaderBuilder::with_coerce_primitive when parsing JSON (#1651)
  60. Allow to change tracing level and callback more than once (#1111)
  61. Simplify checkpoint-table with Snapshot::checkpoint (#1813)
  62. Add size metadata to the CdfScanFile (#1935)
  63. Add deletion vector APIs to transaction (#1430)
  64. Include max known published commit version inside of LogSegment (#1587)
  65. Use CRC for In-Commit-Timestamp reading (#1806)
  66. Refactor ListedLogFiles::try_new to be more extensible and with default values by using builder pattern (#1585)
  67. Implement the read metadata workload runner (#1919)
  68. Provide expected stats schema (#1592)
  69. Add checkpoint schema discovery for stats_parsed detection (#1550)
  70. Add function to check if schema supports parsed stats (#1573)
  71. Read parsed-stats from checkpoint (#1638)
  72. feat: add get clustering columns in transactions (#1693)
  73. Change expected_stats_schema to return logical schema + physical schema (#1749)
  74. Add support for outputting parsed file statistics to scan batches (#1720)
  75. Checkpoint and sidecar row group skipping via stats_parsed (#1853)
  76. Add serialization/deserialization support for Predicates and Expressions (#1543)
  77. Distributed Log Replay serialization/deserialization (#1503)
  78. Introduce Deduplicator trait to unify mutable and immutable deduplication (#1537)
  79. Add ffi api to perform a checkpoint (#1619)

🐛 Bug Fixes

  1. Make parquet read actually use the executor (#1596)
  2. Deadlock for TokioMultiThreadExecutor (#1606)
  3. Remove breaking-change tag after semver passes (#1621)
  4. Enable arrow conversion from Int96 (#1653)
  5. Preserve null bitmap in nested transform expressions (#1645)
  6. Include domain metadata in checkpoints (#1718)
  • Domain metadata was not being written to checkpoint files, causing it to be lost after checkpoints
  1. Propagate struct-level nulls when computing nested column stats (#1745)
  2. Express One Zone URLs do not support lexicographical ordering (#1753)
  3. Preserve non-commit files (CRC, checkpoints, compactions) at log tail versions (#1817)
    • Fixes list_log_files to no longer discard CRC, checkpoint, and compaction files at the log tail boundary, ensuring these auxiliary files are preserved alongside their commit files.
  4. Fix Miri CI failure by cleaning stale Miri artifacts before test run (#1845)
  5. Strip parquet field IDs from physical stats schema for checkpoint reading (#1839)
  6. Unify v2 checkpoint batch schemas (#1833)
  7. Improve performance and correctness of EngineMap implementation in default engine (#1785)
  8. Parquet footer skipping cannot trust nullcount=0 stat (#1914)
  9. Column extraction for visitors should not rely on schema order (#1818)
  10. Ensure consistent usage of parquet.field.id and conversion to PARQUET:field_id in kernel/default engine (#1850)
  11. Make log segment merging in Snapshot::try_new_from deduplicate compaction files (#1954)

⚡ Performance

  1. Pre-allocate Vecs and HashSets when size is known (#1676)
  2. Add skip_stats option to skip reading file statistics (#1738)
  3. Use CRC in Protocol + Metadata log replay (#1790)

🚜 Refactor

  1. Move doctest into mods (#1574)
  2. Deny panics in ffi crate (#1576)
  3. Extract shared HTTP utilities to http.rs (#1590)
  4. Rename Snapshot.checkpoint (#1608)
  5. Extract stats from ActionReconciliationIterator (#1618)
  6. Cleanup repeated schema definitions in kernel/tests/write.rs (#1637)
  7. Split committer.rs into multiple files (#1622)
  8. Consolidate nullable stat transforms (#1636)
  9. Add Expression::coalesce helper method (#1648)
  10. Add checkpoint info to ScanLogReplayProcessor (#1752)
  11. Extract protocol & metadata replay into log_segment submodule (#1782)
  12. Define constants for table property keys (#1797)
    • Replaces scattered string literals for Delta table property keys (e.g. delta.appendOnly, delta.enableChangeDataFeed) with named constants, improving maintainability and preventing typos.
  13. Update metadata schema to be a SchemaRef and add appropriate Arcs (#1802)
  14. Rename set_is_blind_append to with_blind_append, returning Self (#1838)
    • Adopts builder-style API for the blind append flag, allowing method chaining (e.g. txn.with_blind_append(true).commit(...)).
  15. Extract clustering tests into sub-module (#1828)
  16. Split UCCommitsClient into UCCommitClient and UCGetCommitsClient (#1854)
    • Separates the Unity Catalog commits client into two focused traits — one for committing and one for reading commits — enabling cleaner dependency boundaries and testability.
  17. Use type-state pattern for CreateTableTransaction compile-time API safety (#1842)
    • Encodes the create-table workflow states (building → ready → committed) in the type system, so invalid transitions (e.g. committing before setting schema) are caught at compile time. Reorganizes create-table code and moves tests to integration tests.
  18. Simplify table feature parsing (#1878)
  19. Define and use new TableConfiguration methods (#1905)
  20. Improve Protocol::try_new and make tests call it reliably (#1907)
  21. Simplify GetData impls with bool::then() (#1918)
  22. Split transaction module into mod.rs and update.rs (#1877)
    • Breaks the growing transaction module into separate files: core transaction logic in mod.rs and update/DV-related logic in update.rs, improving navigability.
  23. Rename FeatureType::Writer as WriterOnly (#1934)
  24. Clean up TableConfiguration validation and unit tests (#1947)
  25. StructType modification method and stat_transform schema boilerplate code refactor. (#1872)

🧪 Testing

  1. In-Memory UC-Commits-Client (#1644)
  2. Add test for post_commit_snapshot with create table API (#1680)
  3. Add rs-test support (#1708)
  4. Add test validating collect_stats() output against Spark (#1778)
  5. Add test for parquet id when CM enabled (#1946)
  6. [Test Only] Minor refactor to log_segment tests (#1581
  7. Add file size to the unit test of Engine's ParquetReader (#1921)

⚙️ Chores/CI

  1. Remove unnecessary spaces in PR description (#1598)
  2. Upgrade to reqwest 0.13 and rustls as default (#1588)
  3. Stats-schema improvements (#1642)
  4. Add Rust caching to build and test jobs (#1672)
  5. Use cargo-nextest for parallel test execution (#1673)
  • ~19x faster locally via per-test process isolation
  1. Fix ffi_test cache miss by using consistent toolchain action (#1702)
  2. Add caching and optimize tool installation across all jobs (#1674)
  3. Remove unused remove metadata (#1732)
  4. Prefer append_value_n over append_value (#1868)
  5. Pin native-tls to 0.2.16 due to upstream breakage (#1880)
  6. Fix unit tests with bad protocol versions (#1879)
  7. Add nextest support for miri tests (#1685)
  8. Unpin Miri nightly toolchain (#1900)
  9. Bring 0.19.1 changes into main (#1632)
  10. Remove comfy-table dependency declaration (#1860)
  11. Update review policy in CONTRIBUTING.md (#1945)
  12. Revert "chore: pin native-tls to 0.2.16 due to upstream breakage" (#1915)

Other

  1. Remove comments and text from pull_request_template.md (#1589)

v0.19.1 (2026-01-20)

Full Changelog

🐛 Bug Fixes

  1. fix: deadlock for TokioMultiThreadExecutor (#1606) (see #1605 for a description of the issue)

v0.19.0 (2025-12-19)

Full Changelog

🏗️ Breaking changes

  1. Error on surplus columns in output schema (#1528)
  2. Remove arrow-55 support (upgrate to arrow 56 or 57 required) (#1507)
  3. Add a new read_parquet_schema function to the ParquetHandler trait (#1498)
  4. Add a new write_parquet_file function to the ParquetHandler trait (#1392)
  5. Make PartialEq for Scalar a physical comparison (#1554)

Caution

Note this is a breaking behavior change. Code that previously relied on PartialEq as a logical comparison will still compile, but its runtime behavior will silently change to perform structural comparisons.

This change moves the current definition of PartialEq for Scalar to a new Scalar::logical_eq method, and derives PartialEq (= physical comparison).

We also remove PartialOrd for Scalar because it, too, would become physical (required to match PartialEq), and the result would be largely nonsensical. The logical comparison moves to Scalar::logical_partial_cmp instead.

These changes are needed because today there's no reliable way to physically compare two scalars, and most comparisons are physical in practice. Only predicate evaluation needs logical comparisons, and that code already has a narrow waist.

  1. Expose mod time in scan metadata callbacks: users must change the scan callback function to take a struct which has all the previous arguments as members (and the mod time). See an example of the needed change here. For FFI code, your callback function needs an extra argument. See an example of the change needed here. (#1565)

🚀 Features / new APIs

  1. Initial Metrics implementation (#1448)
  2. Build TableConfiguration for each version of change data feed (#1531)
  3. Add ability for engines to specify a scan schema (#1463)
  4. Add bidirectional expression round-trip test with visitor functions (#1467)
  5. Add support for the materializePartitionColumns writer feature (#1476)
  6. Allow comfy-table 7.2.x (#1545)
  7. Rustls for uc-client (#1533)
  8. Add file name metadata column to parquet reading. (#1512)
  9. Add checkpoint example (#1544)
  10. Commit Reader for processing commit actions (#1499)
  11. Add CheckpointManifestReader to process sidecar files (#1500)
  12. Distributed Log Replay Sequential Phase (#1502)
  13. Passing schema from C, plus example/tests in C (#1535)
  14. Support sidecar in inspect-table (#1566)
  15. short-circuit coalesce evaluation when array has no nulls (#1568)

🐛 Bug Fixes

  1. Force usage of ListedLogFiles::try_new() (#1562)
  2. Improve parse_json performance by removing line-by-line parsing (#1561)

🚜 Refactor

  1. Move ensure_read_support/ensure_write_support to operation entry points (#1518)
  2. Migrate custom feature functions to generic is_feature_enabled/is_feature_supported (#1519)
  3. Separate async handler logic from sync bridge logic (#1435)

🧪 Testing

  1. Migrated protocol validation tests to table_configuration (#1517)
  2. Move scan/mod.rs to scan/tests.rs and scan/test_utils.rs (#1485)

⚙️ Chores/CI

  1. Remove macOS metadata from test data tarballs (#1534)
  2. Make tests async if they rely on async (#1438)
  3. Cleanup scalar eq workaround (#1560)

Other

  1. Remove architecture.md from readme (#1551)

v0.18.2 (2025-12-03)

Full Changelog

🐛 Bug Fixes

  1. Address column mapping edge case in protocol validation (#1513)

🧪 Testing

  1. Remove arrow error message dependency from test (#1529)

v0.18.1 (2025-11-24)

Full Changelog

🚀 Features / new APIs

  1. Scan::execute no longer requires lifetime bound (#1515)
  2. Migrate protocol validation to table_configuration (#1411)
  3. Add Display for StructType, StructField, and MetadataColumnSpec (#1494)
  4. Add EngineDataArrowExt and use it everywhere (#1516)
  5. Implement builder for StructType (#1492)
  6. Enable CDF for column-mapped tables (#1510)

🧪 Testing

  1. Extract File Action tests (#1365)

v0.18.0 (2025-11-19)

Full Changelog

🏗️ Breaking changes

  1. New Engine StorageHandler head API (#1465)
    • Engine API implementers must add the head API to StorageHandler which fetches metadata about a file in storage
  2. Add remove_files API (#1353)
    • The schema for scan rows (from Scan::scan_metadata) has been updated to include two new fields: fileConstantValues.tags and fileConstantValues.defaultRowCommitVersion.

🚀 Features / new APIs

  1. Add parser for iceberg compat properties (#1466)
  2. Pass ColumnMappingMode to physical_name (#1403)
  3. Allow visiting entire domain metadata (#1384)
  4. Add Table Feature Info (#1462)
  5. (FFI) Snapshot log tail FFI (#1379)
  6. Add generic is_feature_supported and is_feature_enabled methods to TableConfiguration (#1405)
  7. Un-deprecate ArrayData.array_elements() (#1493)
  8. Allow writes to CDF tables for add-only, remove-only, and non-data-change transactions (#1490)
  9. (catalog-managed) UCCommitter (#1418)

🐛 Bug Fixes

  1. Eliminate endless busy looping in read_json_files on failed read (#1489)
  2. Handle array/map types in ffi schema example and test (#1497)

📚 Documentation

  1. Fix docs for rustc 1.92+ (#1470)

🚜 Refactor

  1. Harmonize checkpoint and log compaction iterators (#1436)
  2. Avoid overly complex itertools methods in log listing code (#1434)
  3. Simplify creation of default engine in tests (#1437)

🧪 Testing

  1. Add tests for StructField.physical_name (#1469)

v0.17.1 (2025-11-13)

Full Changelog

📚 Documentation

  1. Fix docs for rustc 1.92+ (#1470)

v0.17.0 (2025-11-10)

Full Changelog

🏗️ Breaking changes

  1. (catalog-managed): New copy_atomic StorageHandler method (#1400)
    • StorageHandler implementers must implement the copy_atomic method.
  2. Make expression and predicate evaluator constructors fallible (#1452)
    • Predicate and expression evaluator constructors return DeltaResult.
  3. (catalog-managed): add log_tail to SnapshotBuilder (#1290)
    • into_scan_builder() no longer exists on Snapshot. Must create an Arc<Snapshot>
  4. Arrow 57, MSRV 1.85+ (#1424)
    • The Minimum Required Rust Version to use kernel-rs is now 1.85.
  5. Add ffi for idempotent write primitives (#1191)
    • get_transform_for_row now returns new FFI-safe OptionalValue instead of Option
  6. Rearchitect CommitResult (#1343)
    • CommitResult is now an enum containing CommittedTransaction, ConflictedTransaction, and RetryableTransaction
  7. Add with_data_change to transaction (#1281)
    • Engines must use with_data_change on the transaction level instead of passing it to the method. add_files_schema is moved to be scoped on a the transaction.
  8. (catalog-managed) Introduce Committer (with FileSystemCommitter) (#1349)
    • Constructing a transaction now requires a committer. Ex: FileSystemCommitter
  9. Switch scan.execute to return pre-filtered data (#1429)
    • Connectors no longer need to filter data that is returned from scan.execute()

🚀 Features / new APIs

  1. Add visit_string_map to the ffi (#1342)
  2. Add tags field to LastCheckpointHint (#1455)
  3. Support writing domain metadata (1/2]) (#1274)
  4. Change input to write_json_file to be FilteredEngineData (#1312)
  5. Convert DV storage_type to enum (#1366)
  6. Add latest_commit_file field to LogSegment (#1364)
  7. No staged commits in checkpoint/compaction (#1374)
  8. Generate In Commit Timestamp on write (#1314)
  9. (catalog-managed) Add uc-catalog crate with load_table (#1324)
  10. Snapshot should not expose delta implementation details (#1339)
  11. (catalog-managed) Uc-client commit API (#1399)
  12. Add row tracking support (#1375)
  13. Support writing domain metadata (2/2]) (#1275)
  14. Add parser for enableTypeWidening table property (#1456)
  15. Implement From trait EngineData into FilteredEngineData (#1397)
  16. Unify TableFeatures followups (#1404)
  17. Accept nullable values in "tags" HashMap in Add action (#1395)
  18. Enable writes to CDF enabled tables only if append only is supported (#1449)
  19. Add deletion vector file writer (#1425)
  20. Allow converting bytes::Bytes into a Binary Scalar (#1373)
  21. CDF API for FFI (#1335)
  22. Add optional stats field to remove action (#1390)
  23. Modify read_actions to not require callers to know details about checkpoints. (#1407)
  24. Add Accessor for Binary data (#1383)

🐛 Bug Fixes

  1. Change InCommitTimestamp enablement getter function (#1357)
  2. Be adaptive to the log schema changing in inspect-table (#1368)
  3. Typo on variable name for ScanTransformFieldClassifierieldClassifier (#1394)
  4. Pin cbindgen to 0.29.0 (#1412)
  5. Unpin cbindgen (#1414)
  6. Don't return errors from ParsedLogPath::try_from (#1433)
  7. Doc issue, stray ' (#1445)
  8. Replace todo!() with proper error handling in deletion vector (#1447)

📚 Documentation

  1. Fix scan_metadata docs (#1450)

🚜 Refactor

  1. Pull out transform spec utils and definitions (#1326)
  2. Use expression transforms in change data feed (#1330)
  3. Remove raw pointer indexing and add unit tests for RowIndexBuilder (#1334)
  4. Make Metadata fields private (#1347)
  5. Remove storing UUID in LogPathFileType::UuidCheckpoint (#1317)
  6. Consolidate physical/logical info into StateInfo (#1350)
  7. Consolidate regular scan and CDF scan field handling (#1359)
  8. Make get_cdf_transform_expr return Option (#1401)
  9. Separate domain metadata additions and removals (#1421)
  10. Unify Reader/WriterFeature into a single TableFeature (#1345)
  11. Put DataFileMetadata::as_record_batch under #[internal_api] (#1409)
  12. Create static variables for magic values in deletion vector (#1446)

🧪 Testing

  1. E2e test for log compaction (#1308)
  2. Tombstone expiration e2e test for log compaction (#1341)
  3. Add memory tests (via DHAT) (#1009)
  4. One liner to skip read_table_version_hdfs (#1428)

⚙️ Chores/CI

  1. Add CI for examples (#1393)
  2. Small typo's in log_segment.rs (#1396)
  3. Reduce log verbosity when encountering non-standard files in _delta_log (#1416)
  4. Follow up on TODO in log_replay.rs (#1408)
  5. Remove a stray comment in the kernel visitor (#1457)
  6. Allow passing more on the command line for all the cli examples (#1352)
  7. add back arrow-55 support (#1458)
  8. Rename log_schema to commit_schema (#1419)

v0.16.0 (2025-09-19)

Full Changelog

🏗️ Breaking changes

  1. New expression variants: UnaryExpression and ToJson expression (#1192)
  2. New SnapshotBuilder API: Snapshot::try_new(...) replaced with Snapshot::builder(...) and its associated methods. TLDR, you make a builder and call build to construct a Snapshot. (#1189)
  3. Simplify the Expr::Transform API, add FFI support:
    • Reworks the pub members of Transform used by Expr::Transform and introduce a new FieldTransform struct. Also, rework Transform::new (constructor) and Transform::with_input_path (method) into a pair of constructors, new_top_level and new_nested.
    • Adds two new members to the FFI EngineExpressionVisitor struct -- visit_transform_expression and visit_field_transform, which also changes the ordering of existing fields. (#1243)
  4. Add numRecords to ADD_FILES_SCHEMA (#1235)
  5. New EngineData trait required method: try_append_columns (#1190)
  6. Make ColumnType private (#1258)
  7. Add row tracking writer feature: updates ADD_FILES_SCHEMA (see PR for details) (#1239)
  8. Migrate Snapshot::try_new_from into SnapshotBuilder::new_from (#1289)
  9. (FFI) Add CDvInfo struct: The CScanCallback now takes a &CDvInfo and not a &DvInfo. (#1286)
  10. (FFI) Add explicit numbers for each KernelError enum variants. (see PR for details) (#1313)
  11. (more) new expression variants: Expression::Variadic and Coalesce expressions (#1198)
  12. All new/modified StructType constructors, see PR for details (#1278)
  13. Introduce metadata column API: StructType has new private field (#1266)
  14. (FFI) engine_data::get_engine_data now takes an AllocateErrorFn instead of an engine. (#1325)
  15. StructType::into_fields returns DoubleEndedIterator + FusedIterator (#1327)

🚀 Features / new APIs

  1. (catalog-managed) Add log_tail to list_log_files (#1194)
  2. CommitInfo sets a txnId (#1262)
  3. Allow LargeUTF8 -> String and LargeBinary -> Binary in arrow conversion (#1294)
  4. Implement log compaction (#1234)
  5. Disallow equal version in log compaction (#1309)
  6. Add Iterable to StructType (#1287)
  7. ParsedLogPath for staged commits (#1305)
  8. Default expression eval supports nested transforms (#1247)
  9. Introduce row index metadata column (#1272)

📚 Documentation

  1. Update README.md to enhance FFI documentation (#1237)

⚡ Performance

  1. Make checkpoint visitor more efficient using short circuiting (#1203)

🚜 Refactor

  1. Factor out a method for LastCheckpointHint path generation (#1228)
  2. Do not guess Vec size for checkpoints (#1263)
  3. Introduce current_time_ms() helper (#1256)
  4. Retention calculation into a new trait (#1264)
  5. Minor Refactoring in Log Compaction (#1301)
  6. Rename SnapshotBuilder::new to new_for (#1306)
  7. Move log replay into the action reconciliation module (#1295)
  8. Introduce SnapshotRef type alias (#1299)
  9. Row tracking write cleanup (#1291)

🧪 Testing

  1. Update invalid-handle tests for rustc 1.90 (#1321)
  2. Create expression benchmark for default engine (#1220)

⚙️ Chores/CI

  1. Update changelog for 0.15.1 release (#1227)
  2. Sync changelog for 0.15.2 (#1251)
  3. Update data types test to validate full Arrow error message (#1259)
  4. Add better panic message when not OK (#1293)
  5. Add test for empty commits and clean up test error types (#1252)
  6. Update contributing.md (#1206)

v0.15.2 (2025-09-03)

Full Changelog

🐛 Bug Fixes

  1. pin comfy-table at 7.1.4 to restore kernel MSRV (#1231)
  2. Arrow json decoder fix for breakage on long json string (#1244)

v0.15.1 (2025-08-28)

Full Changelog

🐛 Bug Fixes

  1. Make ListedLogFiles::try_new internal-api (again) (#1226)

v0.15.0 (2025-08-28)

Full Changelog

🏗️ Breaking changes

  1. Rename default-engine feature to default-engine-native-tls (#1100)
  2. Add arrow 56 support, drop arrow 54 (#1141)
  3. Add catalogManaged (and catalogOwned-preview) table features + catalog-managed experimental feature flag (#1165)
  4. ExpressionRef instead of owned Expression for transforms (#1171): Expression::Struct now takes a Vec<ExpressionRef> instead of Vec<Expression>
  5. Add support for Column Mapping Id Mode (#1056): significantly changes the semantics (Engine trait requirements) of the parquet handler in column mapping id mode. See ParquetHandler::read_parquet_files docs for details.
  6. StructField.physical_name is no longer public (internal-api) (#1186)
  7. Add support for sparse transform expressions (#1199): adds a new Expression::Transform variant.
  8. Expression evaluators take ExpressionRef as input (#1221):
    • EvaluationHandler::new_expression_evaluator and EvaluationHandler::new_predicate_evaluator take Arc instead of owned expression/predicate.
    • scan::state::transform_to_logical takes owned Option<ExpressionRef> instead of a borrowed reference.
    • transaction::WriteContext::logical_to_physical returns an Arc instead of a borrowed reference

🚀 Features / new APIs

  1. Impl IntoEngineData for Protocol action (#1136)
  2. Add txnId to commit info (#1148)
  3. (catalog-managed) Experimental uc client (#1164)
  4. Implement IntoEngineData for DomainMetadata (#1169)
  5. Add example for table writes (#1119)
  6. (ffi) Add visit_expression_literal_date (#1096)

🐛 Bug Fixes

  1. Match arrow versions in examples (#1166)
  2. Support arrow views in ensure_data_types (#1028)
  3. Make ListedLogFiles internal-api again (#1209)
  4. Provide accurate error when evaluating a different type in LiteralExpressionTransform (#1207)
  5. Fix failing test and improve indentation test error message (#1135)

🚜 Refactor

  1. Contiguous commit file checking inside ListedLogFiles::try_new() (#1107)
  2. New listed_log_files module (#1150)
  3. Move LastCheckpointHint to separate module (#1154)
  4. (catalog-managed) Push down _last_checkpoint read into LogSegment (#1204)

🧪 Testing

  1. Add metadata-only regression test (#1183)
  2. Parameterize column mapping tests to check different modes (#1176)
  3. Add apply_schema mismatch test (#1210)

⚙️ Chores/CI

  1. Appease clippy in rustc 1.89 (#1151)
  2. Bump MSRV to 1.84 (#1142)
  3. Remove object store versioning (#1161)
  4. Remove unused deps from examples (#1175)
  5. Update deps (#1181)

v0.14.0 (2025-08-01)

Full Changelog

🏗️ Breaking changes

  1. Removed Table APIs: instead use Snapshot and Transaction directly. (#976)
  2. Add support for Variant type and the variantType table feature (new DataType::Variant enum variant and new variantType-preview and variantShredding Reader/Writer features) (#1015)
  3. Expose post commit stats. Now, in Transaction::commit the Committed variant of the enum includes a post_commit_stats field with info about the commits since checkpoint and log compaction. (#1079)
  4. Replace Transaction::with_commit_info() API with with_engine_info() API (#997)
  5. Removed DataType::decimal_unchecked API (#1087)
  6. make_physical takes column mapping and sets parquet field ids. breaking: (1) StructField::make_physical is now an internal_api instead of a public function. Its signature has also changed. And (2) If ColumnMappingMode is None, then the physical schema's name is the logical name. Previously, kernel would unconditionally use the column mapping physical name, even if column mapping mode is none. (#1082)

🚀 Features / new APIs

  1. (ffi) Added default-engine-rustls feature and extern "C" for .h file (#1023)
  2. Add log segment constructor for timestamp to version conversion (#895)
  3. Expose unshredded variant type as DataType::unshredded_variant() (#1086)
  4. New ffi API for get_domain_metadata() (#1041)
  5. Add append functions to ffi (#962)
  6. Add try_new and IntoEngineData for Metadata action (#1122)

🐛 Bug Fixes

  1. Rename object_store PutMultipartOpts (#1071, #1090)
  2. Use object_store >= 0.12.3 for arrow 55 feature (#1117)
  3. VARIANT follow-ups for SchemaTransform etc (#1106)

🚜 Refactor

  1. Downgrade stale _last_checkpoint log from warn! to info! (#777)
  2. Exclude tests/data from release (#1092)
  3. Deny panics in prod code (#1113)

🧪 Testing

  1. Add derive macro tests (#514)
  2. Add unshredded variant read test (#1088)
  3. (ffi) AllocateErrorFn should be able to allocate a nullptr (#1105)
  4. Assert tests on error message instead of is_err() (#1110)

⚙️ Chores/CI

  1. Expose Snapshot and ListedLogFiles constructors behind internal api flag (#1076)
  2. Only semver check released crates (#1101)

Other

  1. Fix typos in README (#1093)
  2. Fix typos in docstrings (#1118)

v0.13.0 (2025-07-11)

Full Changelog

🏗️ Breaking changes

  1. Add support for opaque engine expressions. Includes a number of changes: new ExpressionTypes (OpaqueExpression, OpaquePredicate, Unknown) and Expression/Predicate variants (Opaque, Unknown), and visitors, transforms, and evaluators changed to support opaque/unknown expressions/predicate. (#686)
  2. Rename Transaction::add_write_metadata to Transaction::add_files (#1019)

🚀 Features / new APIs

  1. add ability to only retain SetTransaction actions <= SetTransactionRetentionDuration (#1013)
  2. (ffi) Add timetravel by version number (#1044)
  3. Introduce a crate for args that are common between examples (#1046)
  4. Support reordering structs that are inside maps in default parquet reader (#1060)
  5. Add default engine support for arrow eval of opaque expressions (#980)
  6. Expose descriptive fields on Metadata action (#1051)

🐛 Bug Fixes

  1. Clippy fmt cleanup (#1042)
  2. Examples: move logic into the thread::scope call so examples don't hang (#1040)
  3. Remove panic from read_last_checkpoint (#1022)
  4. Always write _last_checkpoint with parts = None (#1053)
  5. Don't release common crate (used only by example programs) (#1065)

🚜 Refactor

  1. Move various test util functions to test-utils crate (#985)
  2. Define and use a cow helper for transforms (#1057)
  3. Expand capability and usage of Cow helper for transforms (#1061)

v0.12.1 (2025-06-05)

Full Changelog

🐛 Bug Fixes

  1. Remove azure suffix range request (#1006)

v0.12.0 (2025-06-04)

Full Changelog

🏗️ Breaking changes

  1. Remove GlobalScanState: instead use new Scan APIs directly (logical_schema, physical_schema, etc.) (#947)
  2. table feature enums are now internal_api (not public, unless internal-api flag is set) (#998)

🚀 Features / new APIs

  1. Use compacted log files in log-replay (#950)
  2. New #[derive(IntoEngineData)] proc macro (#830)
  3. Add support for kernel default expression evaluation (#979)
  4. New: panic in debug builds if ListedLogFiles breaks invariants (#986)
  5. Create visitor for getting In-commit Timestamp (#897)
  6. Binary searching utility function for timestamp to version conversion (#896)
  7. Enable "TimestampWithoutTimezone" table feature and add protocol validation for it (#988)
  8. add missing reader/writer features (variantType/clustered) (#998)

🐛 Bug Fixes

  1. Disable timestamp column's maxValues for data skipping (#1003)

🚜 Refactor

  1. Make KernelPredicateEvaluator trait dyn-compatible (#994)

v0.11.0 (2025-05-27)

Full Changelog

🏗️ Breaking changes

  1. Add in-commit timestamp table feature (#894)
  2. Make Error non_exhaustive (will reduce future breaking changes!) (#913)
  3. Scalar::Map support (#881)
    • New Scalar::Map(MapData) variant and MapData struct to describe Scalar maps.
    • New visit_literal_map FFI
  4. Split out predicates as different from expressions (#775): pervasive change which moves some expressions to new predicate type.
  5. Bump MSRV from 1.81 to 1.82 (#942)
  6. DataSkippingPredicateEvaluator's associated types TypedStat and IntStat combined into one ColumnStat type (#939)
  7. Code movement in FFI crate (#940):
    • Rename ffi::expressions::engine mod as kernel_visitor
    • Rename ffi::expressions::kernel mod as engine_visitor
    • Move the free_kernel_[expression|predicate] functions to the expressions mod
    • Move the EnginePredicate struct to the ffi::scan module
  8. Fix timestamp ntz in physical to logical cdf (#948): now TableChangesScan::execute returns a schema with _commit_timestamp of type Timestamp (UTC) instead of TimestampNtz.
  9. Add TryIntoKernel/Arrow traits (#946): Removes old From/Into implementations for kernel schema types, replaces with TryFromKernel/TryIntoKernel/TryFromArrow/TryIntoArrow. Migration should be as simple as changing a .try_into() to a .try_into_kernel() or .try_into_arrow().
  10. Remove SyncEngine (now test-only), use DefaultEngine everywhere else (#957)

🚀 Features / new APIs

  1. Add Snapshot::checkpoint() & Table::checkpoint() API (#797)
  2. Add CRC ParsedLogPath (#889)
  3. Use arrow array builders in Scalar::to_array (#905)
  4. Add domainMetadata read support (#875)
  5. Support maps and arrays in literal_expression_transform (#882)
  6. Add CheckpointWriter::finalize() API (#851)
  7. DataSkippingPredicate dyn compatible (#939): finish_eval_pred_junction now takes &dyn Iterator
  8. Store compacted log files in LogSegment (#936)
  9. Add CRC, FileSizeHistogram, and DeletedRecordCountsHistogram schemas (#917)
  10. Scan from previous result (#829)
  11. Include latest CRC in LogSegment (#964)
  12. CRC protocol+metadata visitor (#972)
  13. Make several types/function pub and fix their doc comments (#977)
    • KernelPredicateEvaluator and KernelPredicateEvaluatorDefaults are now pub.
    • DataSkippingPredicateEvaluator is now pub.
    • add new type aliases DirectDataSkippingPredicateEvaluator and IndirectDataSkippingPredicateEvaluator
    • Arrow engine evaluate_expression and evaluate_predicate are now pub.
    • Expression::predicate renamed to Expression::from_pred

🐛 Bug Fixes

  1. Fix incorrect results for Scalar::Array::to_array (#905)
  2. Use object_store::Path::from_url_path when appropriate (#924)
  3. Don't include modules via a macro (#935)
  4. Rustc 1.87 clippy fixes (#955)
  5. Allow CheckpointDataIterator to be used across await (#961)
  6. Remove target-cpu=native rustflags (#960)
  7. Rename drop_null_container_values to allow_null_container_values (#965)
  8. Make ActionsBatch fields pub for internal-api (#983)

📚 Documentation

  1. Add readme badges (#904)

🚜 Refactor

  1. Combine actions counts in CheckpointVisitor (#883)
  2. Simplify Display for Expression and Predicate (#938)
  3. Macro traits cleanup (#967)
  4. Remove redundant binary predicate operations (#949)
  5. Make arrow predicate eval directly invertible (#956)
  6. Add ActionsBatch (#974)

⚙️ Chores/CI

  1. Remove abs_diff since we have rust 1.81 (#909)
  2. Conditional compilation instead of suppressing clippy warnings (#945)
  3. Expose some more arrow utils via internal-api (#971)
  4. Use consistent naming of kernel data type in arrow eval tests (#978)
  5. Cargo doc workspace + all-features (#981)

v0.10.0 (2025-04-28)

Full Changelog

🏗️ Breaking changes

  1. Updated dependencies, breaking updates: itertools 0.14, thiserror 2, and strum 0.27 (#814)
  2. Rename developer-visibility feature flag to internal-api (#834)
  3. Tidy up AND/OR/NOT API and usage (#842)
  4. Rename VariadicExpression to JunctionExpression (#841)
  5. Enforce precision/scale correctness of Decimal types and values (#857)
  6. Expression system refactors
    • Make literal expressions more strict (removed Into trait impl) (#867)
    • Remove nearly-unused expression lt_eq/gt_eq overloads (#871)
    • Move expression transforms (ExpressionTransform and ExpressionDepthChecker) to own module (#878)
    • Code movement in expression-related code (Reordered variants of the BinaryExpressionOp enum) (#879)
  7. Introduce the ability for consumers to add ObjectStore url handlers (#873)
  8. Update to arrow 55, drop arrow 53 support (#885, #903)

🚀 Features / new APIs

  1. Add CheckpointVisitor in new checkpoint mod (#738)
  2. Add CheckpointLogReplayProcessor in new checkpoints mod (#744)
  3. Add transaction.with_transaction_id() API (#824)
  4. Add snapshot.get_app_id_version(app_id, engine) (#862)
  5. Overwrite logic in write_json_file for default & sync engine (#849)

🐛 Bug Fixes

  1. default engine: Sort list results based on URL scheme (#820)
  2. impl AllocateError for T: ExternEngine (#856)
  3. Disable predicate pushdown in Scan::execute (#861)

📚 Documentation

  1. Correct docstring for DefaultEngine::new (#821)
  2. Remove acceptance from rust-analyzer.cargo.features in README (#858)

🚜 Refactor

  1. Rename predicates mod to kernel_predicates (#822)
  2. Code movement to tidy up ffi (#840)
  3. Grab bag of cosmetic tweaks and comment updates (#848)
  4. New #[internal_api] macro instead of visibility crate (#835)
  5. Expression transforms use new recurse_into_children helper (#869)
  6. Minor test improvements (#872)

⚙️ Chores/CI

  1. Remove unused dependencies (#863)
  2. Test code uses Expr shorthand for Expression (#866)
  3. Arrow DefaultExpressionEvaluator need not box its inner expression (#868)

v0.9.0 (2025-04-08)

Full Changelog

🏗️ Breaking changes

  1. Change MetadataValue::Number(i32) to MetadataValue::Number(i64) (#733)
  2. Get prefix from offset path: DefaultEngine::new no longer requires a table_root parameter and list_from consistently returns keys greater than the offset (#699)
  3. Make snapshot.schema() return a SchemaRef (#751)
  4. Make visit_expression_internal private, and unwrap_kernel_expression pub(crate) (#767)
  5. Make actions types pub(crate) instead of pub (#405)
  6. New null_row ExpressionHandler API (#662)
  7. Rename enums ReaderFeatures -> ReaderFeature and WriterFeatures -> WriterFeature (#802)
  8. Remove get_ prefix from engine getters (#804)
  9. Rename FileSystemClient to StorageHandler (#805)
  10. Adopt types for table features (New ReadFeature::Unknown(String) and (WriterFeature::Unknown(String)) (#684)
  11. Renamed ScanData to ScanMetadata (#817)
    • rename ScanData to ScanMetadata
    • rename Scan::scan_data() to Scan::scan_metadata()
    • (ffi) rename free_kernel_scan_data() to free_scan_metadata_iter()
    • (ffi) rename kernel_scan_data_next() to scan_metadata_next()
    • (ffi) rename visit_scan_data() to visit_scan_metadata()
    • (ffi) rename kernel_scan_data_init() to scan_metadata_iter_init()
    • (ffi) rename KernelScanDataIterator to ScanMetadataIterator
    • (ffi) rename SharedScanDataIterator to SharedScanMetadataIterator
  12. ScanMetadata is now a struct (instead of tuple) with new FiltereEngineData type (#768)

🚀 Features / new APIs

  1. (v2Checkpoint) Extract & insert sidecar batches in replay's action iterator (#679)
  2. Support the v2Checkpoint reader/writer feature (#685)
  3. Add check for whether appendOnly table feature is supported or enabled (#664)
  4. Add basic partition pruning support (#713)
  5. Add DeletionVectors to supported writer features (#735)
  6. Add writer version 2/invariant table feature support (#734)
  7. Improved pre-signed URL checks (#760)
  8. Add CheckpointMetadata action (#781)
  9. Add classic and uuid parquet checkpoint path generation (#782)
  10. New Snapshot::try_new_from() API (#549)

🐛 Bug Fixes

  1. Return Error::unsupported instead of panic in Scalar::to_array(MapType) (#757)
  2. Remove 'default-members' in workspace, default to all crates (#752)
  3. Update compilation error and clippy lints for rustc 1.86 (#800)

🚜 Refactor

  1. Split up arrow_expression module (#750)
  2. Flatten deeply nested match statement (#756)
  3. Simplify predicate evaluation by supporting inversion (#761)
  4. Rename LogSegment::replay to LogSegment::read_actions (#766)
  5. Extract deduplication logic from AddRemoveDedupVisitor into embeddable FileActionsDeduplicator (#769)
  6. Move testing helper function to test_utils mod (#794)
  7. Rename _last_checkpoint from CheckpointMetadata to LastCheckpointHint (#789)
  8. Use ExpressionTransform instead of adhoc expression traversals (#803)
  9. Extract log replay processing structure into LogReplayProcessor trait (#774)

🧪 Testing

  1. Add V2 checkpoint read support integration tests (#690)

⚙️ Chores/CI

  1. Use maintained action to setup rust toolchain (#585)

Other

  1. Update HDFS dependencies (#689)
  2. Add .cargo/config.toml with native instruction codegen (#772)

v0.8.0 (2025-03-04)

Full Changelog

🏗️ Breaking changes

  1. ffi: get_partition_column_count and get_partition_columns now take a Snapshot instead of a Scan (#697)
  2. ffi: expression visitor callback visit_literal_decimal now takes i64 for the upper half of a 128-bit int value (#724)
    • DefaultJsonHandler::with_readahead() renamed to DefaultJsonHandler::with_buffer_size() (#711)
  3. DefaultJsonHandler's defaults changed:
  • default buffer size: 10 => 1000 requests/files
  • default batch size: 1024 => 1000 rows
  1. Bump MSRV to rustc 1.81 (#725)

🐛 Bug Fixes

  1. Pin chrono version to fix arrow compilation failure (#719)

⚡ Performance

  1. Replace default engine JSON reader's FileStream with concurrent futures (#711)

v0.7.0 (2025-02-24)

Full Changelog

🏗️ Breaking changes

  1. Read transforms are now communicated via expressions (#607, #612, #613, #614) This includes:
    • ScanData now includes a third tuple field: a row-indexed vector of transforms to apply to the EngineData.
    • Adds a new scan::state::transform_to_logical function that encapsulates the boilerplate of applying the transform expression
    • Removes scan_action_iter API and logical_to_physical API
    • Removes column_mapping_mode from GlobalScanState
    • ffi: exposes methods to get an expression evaluator and evaluate an expression from c
    • read-table example: Removes add_partition_columns in arrow.c
    • read-table example: adds an apply_transform function in arrow.c
  2. ffi: support field nullability in schema visitor (#656)
  3. ffi: expose metadata in SchemaEngineVisitor ffi api (#659)
  4. ffi: new visit_schema FFI now operates on a Schema instead of a Snapshot (#683, #709)
  5. Introduced feature flags (arrow_54 and arrow_53) to select major arrow versions (#654, #708, #717)

🚀 Features / new APIs

  1. Read partition_values in RemoveVisitor and remove break in RowVisitor for RemoveVisitor (#633)
  2. Add the in-commit timestamp field to CommitInfo (#581)
  3. Support NOT and column expressions in eval_sql_where (#653)
  4. Add check for schema read compatibility (#554)
  5. Introduce TableConfiguration to jointly manage metadata, protocol, and table properties (#644)
  6. Add visitor SidecarVisitor and Sidecar action struct (#673)
  7. Add in-commit timestamps table properties (#558)
  8. Support writing to writer version 1 (#693)
  9. ffi: new logical_schema FFI to get the logical schema of a snapshot (#709)

🐛 Bug Fixes

  1. Incomplete multi-part checkpoint handling when no hint is provided (#641)
  2. Consistent PartialEq for Scalar (#677)
  3. Cargo fmt does not handle mods defined in macros (#676)
  4. Ensure properly nested null masks for parquet reads (#692)
  5. Handle predicates on non-nullable columns without stats (#700)

📚 Documentation

  1. Update readme to reflect tracing feature is needed for read-table (#619)
  2. Clarify JsonHandler semantics on EngineData ordering (#635)

🚜 Refactor

  1. Make [non] nullable struct fields easier to create (#646)
  2. Make eval_sql_where available to DefaultPredicateEvaluator (#627)

🧪 Testing

  1. Port cdf tests from delta-spark to kernel (#611)

⚙️ Chores/CI

  1. Fix some typos (#643)
  2. Release script publishing fixes (#638)

v0.6.1 (2025-01-10)

Full Changelog

🚀 Features / new APIs

  1. New feature flag default-engine-rustls (#572)

🐛 Bug Fixes

  1. Allow partition value timestamp to be ISO8601 formatted string (#622)
  2. Fix stderr output for handle tests (#630)

⚙️ Chores/CI

  1. Expand the arrow version range to allow arrow v54 (#616)
  2. Update to CodeCov @v5 (#608)

Other

  1. Fix msrv check by pinning home dependency (#605)
  2. Add release script (#636)

v0.6.0 (2024-12-17)

Full Changelog

API Changes

Breaking

  1. Scan::execute takes an Arc<dyn EngineData> now (#553)
  2. StructField::physical_name no longer takes a ColumnMapping argument (#543)
  3. removed ColumnMappingMode Default implementation (#562)
  4. Remove lifetime requirement on Scan::execute (#588)
  5. scan::Scan::predicate renamed as physical_predicate to eliminate ambiguity (#512)
  6. scan::log_replay::scan_action_iter now takes fewer (and different) params. (#512)
  7. Expression::Unary, Expression::Binary, and Expression::Variadic now wrap a struct of the same name containing their fields (#530)
  8. Moved delta_kernel::engine::parquet_stats_skipping module to delta_kernel::predicate::parquet_stats_skipping (#602)
  9. New Error variants Error::ChangeDataFeedIncompatibleSchema and Error::InvalidCheckpoint (#593)

Additions

  1. Ability to read a table's change data feed with new TableChanges API! See new table_changes module as well as the 'read-table-changes' example (#597). Changes include:
  • Implement Log Replay for Change Data Feed (#540)
  • ScanFile expression and visitor for CDF (#546)
  • Resolve deletion vectors to find inserted and removed rows for CDF (#568)
  • Helper methods for CDF Physical to Logical Transformation (#579)
  • TableChangesScan::execute and end to end testing for CDF (#580)
  • TableChangesScan::schema method to get logical schema (#589)
  1. Enable relaying log events via FFI (#542)

Implemented enhancements:

  • Define an ExpressionTransform trait (#530)
  • [chore] appease clippy in rustc 1.83 (#557)
  • Simplify column mapping mode handling (#543)
  • Adding some more miri tests (#503)
  • Data skipping correctly handles nested columns and column mapping (#512)
  • Engines now return FileMeta with correct millisecond timestamps (#565)

Fixed bugs:

  • don't use std abs_diff, put it in test_utils instead, run tests with msrv in action (#596)
  • (CDF) Add fix for sv extension (#591)
  • minimal CI fixes in arrow integration test and semver check (#548)

v0.5.0 (2024-11-26)

Full Changelog

API Changes

Breaking

  1. Expression::Column(String) is now Expression::Column(ColumnName) #400
  2. delta_kernel_ffi::expressions moved into two modules: delta_kernel_ffi::expressions::engine and delta_kernel_ffi::expressions::kernel #363
  3. FFI: removed (hazardous) impl From for KernelStringSlize and added unsafe constructor instead #441
  4. Moved LogSegment into its own module (log_segment::LogSegment) #438
  5. Renamed EngineData::length as EngineData::len #471
  6. New AsAny trait: AsAny: Any + Send + Sync required bound on all engine traits #450
  7. Rename mod features to mod table_features #454
  8. LogSegment fields renamed: commit_files -> ascending_commit_files and checkpoint_files -> checkpoint_parts #495
  9. Added minimum-supported rust version: currenly rust 1.80 #504
  10. Improved row visitor API: renamed EngineData::extract as EngineData::visit_rows, and DataVisitor trait renamed as RowVisitor #481
  11. FFI: New mod engine_data and mod error (moved Error to error::Error) #537
  12. new error types: InvalidProtocol, InvalidCommitInfo, MissingCommitInfo, FileAlreadyExists, Unsupported, ParseIntervalError, ChangeDataFeedUnsupported

Additions

  1. New ColumnName, column_name!, column_expr! for structured column name parsing. #400 #467
  2. New Engine API write_json_file() for atomically writing JSON #370
  3. New Transaction API for creating transactions, adding commit info and write metadata, and commiting the transaction to the table. Includes Table.new_transaction(), Transaction.write_context(), Transaction.with_commit_info, Transaction.with_operation(), Transaction.with_write_metadata(), and Transaction.commit() #370 #393
  4. FFI: Visitor for converting kernel expressions to engine expressions. See the new example at ffi/examples/visit-expression/ #363
  5. FFI: New TryFromStringSlice trait and kernel_string_slice macro #441
  6. New DefaultEngine engine implementation for writing parquet: write_parquet_file() #393
  7. Added support for parsing comma-separated column name lists: ColumnName::parse_column_name_list() #458
  8. New VacuumProtocolCheck table feature #454
  9. DvInfo now implements Clone, PartialEq, and Eq #468
  10. Stats now implements Debug, Clone, PartialEq, and Eq #468
  11. Added Cdc action support #506
  12. (early CDF read support) New TableChanges type to read CDF from a table between versions #505
  13. (early CDF read support) Builder for scans on TableChanges #521
  14. New TableProperties struct which can parse tables' metadata.configuration #453 #536

Implemented enhancements:

  • FFI examples now use AddressSanitizer #447
  • ColumnName now tracks a path of field names instead of a simple string #445
  • use ParsedLogPaths for files in LogSegment #472
  • FFI: added Miri support for tests #470
  • check table URI has trailing slash #432
  • build cargo docs in CI #479
  • new test-utils crate #477
  • added proper protocol validation (both parsing correctness and semantic correctness) #454 #493
  • harmonize predicate evaluation between delta stats and parquet footer stats #420
  • more log path tests #485
  • ensure_read_supported and ensure_write_supported APIs #518
  • include NOTICE and LICENSE in published crates #520
  • FFI: factored out read_table kernel utils into kernel_utils.h/c #539
  • simplified log replay visitor and avoid materializing Add/Remove actions #494
  • simplified schema transform API #531
  • support arrow view types in conversion from ArrowDataType to kernel's DataType #533

Fixed bugs:

  • Disabled missing-column row group skipping: The optimization to treat a physically missing column as all-null is unsound, if the schema was not already verified to prove that the table's logical schema actually includes the missing column. We disable it until we can add the necessary validation. #435
  • fixed leaks in read_table FFI example #449
  • fixed read_table compilation on windows #455
  • fixed various predicate eval bugs #420

v0.4.1 (2024-10-28)

Full Changelog

API Changes

None.

Fixed bugs:

  • Disabled missing-column row group skipping: The optimization to treat a physically missing column as all-null is unsound, if the schema was not already verified to prove that the table's logical schema actually includes the missing column. We disable it until we can add the necessary validation. #435

v0.4.0 (2024-10-23)

Full Changelog

API Changes

Breaking

  1. pub ScanResult.mask field made private and only accessible as ScanResult.raw_mask() method #374
  2. new ReaderFeatures enum variant: TypeWidening and TypeWideningPreview #335
  3. new WriterFeatures enum variant: TypeWidening and TypeWideningPreview #335
  4. new Error enum variant: InvalidLogPath when kernel is unable to parse the name of a log path #347
  5. Module moved: mod delta_kernel::transaction -> mod delta_kernel::actions::set_transaction #386
  6. change default-feature to be none (removed sync-engine by default. If downstream users relied on this, turn on sync-engine feature or specific arrow-related feature flags to pull in the pieces needed) #339
  7. Scan's execute(..) method now returns a lazy iterator instead of materializing a Vec<ScanResult>. You can trivially migrate to the new API (and force eager materialization by using .collect() or the like on the returned iterator) #340
  8. schema and expression FFI moved to their own mod delta_kernel_ffi::schema and mod delta_kernel_ffi::expressions #360
  9. Parquet and JSON readers in Engine trait now take Arc<Expression> (aliased to ExpressionRef) instead of Expression #364
  10. StructType::new(..) now takes an impl IntoIterator<Item = StructField> instead of Vec<StructField> #385
  11. DataType::struct_type(..) now takes an impl IntoIterator<Item = StructField> instead of Vec<StructField> #385
  12. removed DataType::array_type(..) API: there is already an impl From<ArrayType> for DataType #385
  13. Expression::struct_expr(..) renamed to Expression::struct_from(..) #399
  14. lots of expressions take impl Into<Self> or impl Into<Expression> instead of just Self/Expression now #399
  15. remove log_replay_iter and process_batch APIs in scan::log_replay #402

Additions

  1. remove feature flag requirement for impl GetData on () #334
  2. new full_mask() method on ScanResult #374
  3. StructType::try_new(fields: impl IntoIterator<Item = StructField>) #385
  4. DataType::try_struct_type(fields: impl IntoIterator<Item = StructField>) #385
  5. StructField.metadata_with_string_values(&self) -> HashMap<String, String> to materialize and return our metadata into a hashmap #331

Implemented enhancements:

  • support reading tables with type widening in default engine #335
  • add predicate to protocol and metadata log replay for pushdown #336 and #343
  • support annotation (macro) for nullable values in a container (for #[derive(Schema)]) #342
  • new ParsedLogPath type for better log path parsing #347
  • implemented row group skipping for default engine parquet readers and new utility trait for stats-based skipping logic #357, #362, #381
  • depend on wider arrow versions and add arrow integration testing #366 and #413
  • added semver testing to CI #369, #383, #384
  • new SchemaTransform trait and usage in column mapping and data skipping #395 and #398
  • arrow expression evaluation improvements #401
  • replace panics with to_compiler_error in macros #409

Fixed bugs:

  • output of arrow expression evaluation now applies/validates output schema in default arrow expression handler #331
  • add arrow-buffer to arrow-expression feature #332
  • fix bug with out-of-date last checkpoint #354
  • fixed broken sync engine json parsing and harmonized sync/async json parsing #373
  • filesystem client now always returns a sorted list #344

v0.3.1 (2024-09-10)

Full Changelog

API Changes

Additions

  1. Two new binary expressions: In and NotIn, as well as a new Scalar::Array variant to represent arrays in the expression framework #270 NOTE: exact API for these expressions is still evolving.

Implemented enhancements:

  • Enabled more golden table tests #301

Fixed bugs:

  • Allow kernel to read tables with invalid _last_checkpoint #311
  • List log files with checkpoint hint when constructing latest snapshot (when version requested is None) #312
  • Fix incorrect offset value when computing list offsets #327
  • Fix metadata string conversion in default engine arrow conversion #328

v0.3.0 (2024-08-07)

Full Changelog

API Changes

Breaking

  1. delta_kernel::column_mapping module moved to delta_kernel::features::column_mapping #222

Additions

  1. New deletion vector API row_indexes (and accompanying FFI) to get row indexes instead of seletion vector of deleted rows. This can be more efficient for sparse DVs. #215
  2. Typed table features: ReaderFeatures, WriterFeatures enums and has_reader_feature/has_writer_feature API #222

Implemented enhancements:

  • Add --limit option to example read-table-multi-threaded #297
  • FFI now built with cmake. Move to using the read-test example as an ffi-test. And building on macos. #288
  • Golden table tests migrated from delta-spark/delta-kernel java #295
  • Code coverage implemented via cargo-llvm-cov and reported with codecov #287
  • All tests enabled to run in CI #284
  • Updated DAT to 0.3 #290

Fixed bugs:

  • Evaluate timestamps as "UTC" instead of "+00:00" for timezone #295
  • Make Map arrow type field naming consistent with parquet field naming #299

v0.2.0 (2024-07-17)

Full Changelog

API Changes

Breaking

  1. The scan callback if using visit_scan_files now takes an extra Option<Stats> argument, holding top level stats for associated scan file. You will need to add this argument to your callback.

    Likewise, the callback in the ffi code also needs to take a new argument which is a pointer to a Stats struct, and which can be null if no stats are present.

Additions

  1. You can call scan_builder() directly on a snapshot, for more convenience.
  2. You can pass a URL starting with "hdfs" or "viewfs" to the default client to read using hdfs_native_store

Implemented enhancements:

  • Handle nested structs in schemaString (allows reading iceberg compat tables) #257
  • Expose top level stats in scans #227
  • Hugely expanded C-FFI example #203
  • Add scan_builder function to Snapshot #273
  • Add hdfs_native_store support #273
  • Proper reading of Parquet files, including only reading requested leaves, type casting, and reordering #271
  • Allow building the package if you are behind an https proxy #282

Fixed bugs:

  • Don't error if more fields exist than expected in a struct expression #267
  • Handle cases where the deletion vector length is less than the total number of rows in the chunk #276
  • Fix partition map indexing if column mapping is in effect #278

v0.1.1 (2024-06-03)

Full Changelog

Implemented enhancements:

  • Support unary NOT and IsNull for data skipping #231
  • Add unary visitors to c ffi #247
  • Minor other QOL improvements

v0.1.0 (2024-06-12)

Initial public release