0.8.0
Pre-release
DataFusion Comet 0.8.0 Changelog
This release consists of 81 commits from 11 contributors. See credits at the end of this changelog for more information.
Fixed bugs:
- fix: remove code duplication in native_datafusion and native_iceberg_compat implementations #1443 (parthchandra)
- fix: Refactor CometScanRule and fix bugs #1483 (andygrove)
- fix: check if handle has been initialized before closing #1554 (wForget)
- fix: Taking slicing into account when writing BooleanBuffers as fast-encoding format #1522 (Kontinuation)
- fix: isCometEnabled name conflict #1569 (kazuyukitanimura)
- fix: make register_object_store use same session_env as file scan #1555 (wForget)
- fix: adjust CometNativeScan's doCanonicalize and hashCode for AQE, use DataSourceScanExec trait #1578 (mbutrovich)
- fix: corrected the logic of eliminating CometSparkToColumnarExec #1597 (wForget)
- fix: avoid panic caused by close null handle of parquet reader #1604 (wForget)
- fix: Make AQE capable of converting Comet shuffled joins to Comet broadcast hash joins #1605 (Kontinuation)
- fix: Making shuffle files generated in native shuffle mode reclaimable #1568 (Kontinuation)
- fix: Support per-task shuffle write rows and shuffle write time metrics #1617 (Kontinuation)
- fix: Modify Spark SQL core 2 tests for `native_datafusion` reader, change 3.5.5 diff hash length to 11 #1641 (mbutrovich)
- fix: fix spark/sql test failures in native_iceberg_compat #1593 (parthchandra)
- fix: handle missing field correctly in native_iceberg_compat #1656 (parthchandra)
- fix: better int96 support for experimental native scans #1652 (mbutrovich)
- fix: respect `ignoreNulls` flag in `first_value` and `last_value` #1626 (andygrove)
- fix: update row groups count in internal metrics accumulator #1658 (parthchandra)
- fix: Shuffle should maintain insertion order #1660 (EmilyMatt)
Performance related:
- perf: Use a global tokio runtime #1614 (andygrove)
- perf: Respect Spark's PARQUET_FILTER_PUSHDOWN_ENABLED config #1619 (andygrove)
- perf: Experimental fix to avoid join strategy regression #1674 (andygrove)
Implemented enhancements:
- feat: add read array support #1456 (comphead)
- feat: introduce hadoop mini cluster to test native scan on hdfs #1556 (wForget)
- feat: make parquet native scan schema case insensitive #1575 (wForget)
- feat: enable iceberg compat tests, more tests for complex types #1550 (comphead)
- feat: pushdown filter for native_iceberg_compat #1566 (wForget)
- feat: Fix struct of arrays schema issue #1592 (comphead)
- feat: adding more struct/arrays tests #1594 (comphead)
- feat: respect `batchSize`/`workerThreads`/`blockingThreads` configurations for native_iceberg_compat scan #1587 (wForget)
- feat: add MAP type support for first level #1603 (comphead)
- feat: Add more tests for nested types combinations for `native_datafusion` #1632 (comphead)
- feat: Override MapBuilder values field with expected schema #1643 (comphead)
- feat: track unified memory pool #1651 (wForget)
- feat: Add support for complex types in native shuffle #1655 (andygrove)
Documentation updates:
- docs: Update configuration guide to show optional configs #1524 (andygrove)
- docs: Add changelog for 0.7.0 release #1527 (andygrove)
- docs: Use a shallow clone for Spark SQL test instructions #1547 (mbutrovich)
- docs: Update benchmark results for 0.7.0 release #1548 (andygrove)
- doc: Renew `kubernetes.md` #1549 (comphead)
- docs: various improvements to tuning guide #1525 (andygrove)
- docs: Update supported Spark versions #1580 (andygrove)
- docs: change OSX/OS X to macOS #1584 (mbutrovich)
- docs: docs for benchmarking in aws ec2 #1601 (andygrove)
- docs: Update compatibility docs for new native scans #1657 (andygrove)
- doc: Document local HDFS setup #1673 (comphead)
Other:
- chore: fix issue in release process #1528 (andygrove)
- chore: Remove all subdependencies #1514 (EmilyMatt)
- chore: Drop support for Spark 3.3 (EOL) #1529 (andygrove)
- chore: Prepare for 0.8.0 development #1530 (andygrove)
- chore: Re-enable GitHub discussions #1535 (andygrove)
- chore: [FOLLOWUP] Drop support for Spark 3.3 (EOL) #1534 (kazuyukitanimura)
- build: Use unique name for surefire artifacts #1544 (andygrove)
- chore: Update links for released version #1540 (andygrove)
- chore: Enable Comet explicitly in `CometTPCDSQueryTestSuite` #1559 (andygrove)
- chore: Fix some inconsistencies in memory pool configuration #1561 (andygrove)
- upgraded spark 3.5.4 to 3.5.5 #1565 (YanivKunda)
- minor: fix typo #1570 (wForget)
- Chore: simplify array related functions impl #1490 (kazantsev-maksim)
- added fallback using reflection for backward-compatibility #1573 (YanivKunda)
- chore: Override node name for CometSparkToColumnar #1577 (l0kr)
- chore: Reimplement ShuffleWriterExec using interleave_record_batch #1511 (Kontinuation)
- chore: Run Comet tests for more Spark versions #1582 (andygrove)
- Feat: support array_except function #1343 (kazantsev-maksim)
- minor: Fix clippy warnings #1606 (Kontinuation)
- chore: Remove some unwraps in hashing code #1600 (andygrove)
- chore: Remove redundant shims for getFailOnError #1608 (andygrove)
- chore: Making comet native operators write spill files to spark local dir #1581 (Kontinuation)
- chore: Refactor QueryPlanSerde to use idiomatic Scala and reduce verbosity #1609 (andygrove)
- chore: Create simple fuzz test as part of test suite #1610 (andygrove)
- chore: Document `testSingleLineQuery` test method #1628 (comphead)
- chore: Parquet fuzz testing #1623 (andygrove)
- chore: Change default Spark version to 3.5 #1620 (andygrove)
- chore: Add manually-triggered CI jobs for testing Spark SQL with native scans #1624 (andygrove)
- chore: refactor v2 scan conversion #1621 (andygrove)
- chore: clean up `planner.rs` #1650 (comphead)
- chore: correct name of pipelines for native_datafusion ci workflow #1653 (parthchandra)
- chore: Upgrade to datafusion 47.0.0-rc1 and arrow-rs 55.0.0 #1563 (andygrove)
- chore: Upgrade to datafusion 47.0.0 #1663 (YanivKunda)
- chore: Enable CometFuzzTestSuite int96 test for experimental native scans (without complex types) #1664 (mbutrovich)
- chore: Refactor Memory Pools #1662 (EmilyMatt)
Credits
Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
31 Andy Grove
11 Oleks V
10 Zhen Wang
7 Kristin Cowalcijk
6 Matt Butrovich
5 Parth Chandra
3 Emily Matheys
3 Yaniv Kunda
2 KAZUYUKI TANIMURA
2 Kazantsev Maksim
1 Łukasz
Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.