Add changelog for 0.8.0 (#1675)

andygrove · web-flow · commit 07c7ef55c6f3 · 2025-04-23T11:45:14.000-06:00
diff --git a/dev/changelog/0.8.0.md b/dev/changelog/0.8.0.md
@@ -0,0 +1,139 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# DataFusion Comet 0.8.0 Changelog
+
+This release consists of 81 commits from 11 contributors. See credits at the end of this changelog for more information.
+
+**Fixed bugs:**
+
+- fix: remove code duplication in native_datafusion and native_iceberg_compat implementations [#1443](https://github.com/apache/datafusion-comet/pull/1443) (parthchandra)
+- fix: Refactor CometScanRule and fix bugs [#1483](https://github.com/apache/datafusion-comet/pull/1483) (andygrove)
+- fix: check if handle has been initialized before closing [#1554](https://github.com/apache/datafusion-comet/pull/1554) (wForget)
+- fix: Taking slicing into account when writing BooleanBuffers as fast-encoding format [#1522](https://github.com/apache/datafusion-comet/pull/1522) (Kontinuation)
+- fix: isCometEnabled name conflict [#1569](https://github.com/apache/datafusion-comet/pull/1569) (kazuyukitanimura)
+- fix: make register_object_store use same session_env as file scan [#1555](https://github.com/apache/datafusion-comet/pull/1555) (wForget)
+- fix: adjust CometNativeScan's doCanonicalize and hashCode for AQE, use DataSourceScanExec trait [#1578](https://github.com/apache/datafusion-comet/pull/1578) (mbutrovich)
+- fix: corrected the logic of eliminating CometSparkToColumnarExec [#1597](https://github.com/apache/datafusion-comet/pull/1597) (wForget)
+- fix: avoid panic caused by close null handle of parquet reader [#1604](https://github.com/apache/datafusion-comet/pull/1604) (wForget)
+- fix: Make AQE capable of converting Comet shuffled joins to Comet broadcast hash joins [#1605](https://github.com/apache/datafusion-comet/pull/1605) (Kontinuation)
+- fix: Making shuffle files generated in native shuffle mode reclaimable [#1568](https://github.com/apache/datafusion-comet/pull/1568) (Kontinuation)
+- fix: Support per-task shuffle write rows and shuffle write time metrics [#1617](https://github.com/apache/datafusion-comet/pull/1617) (Kontinuation)
+- fix: Modify Spark SQL core 2 tests for `native_datafusion` reader, change 3.5.5 diff hash length to 11 [#1641](https://github.com/apache/datafusion-comet/pull/1641) (mbutrovich)
+- fix: fix spark/sql test failures in native_iceberg_compat [#1593](https://github.com/apache/datafusion-comet/pull/1593) (parthchandra)
+- fix: handle missing field correctly in native_iceberg_compat [#1656](https://github.com/apache/datafusion-comet/pull/1656) (parthchandra)
+- fix: better int96 support for experimental native scans [#1652](https://github.com/apache/datafusion-comet/pull/1652) (mbutrovich)
+- fix: respect `ignoreNulls` flag in `first_value` and `last_value` [#1626](https://github.com/apache/datafusion-comet/pull/1626) (andygrove)
+- fix: update row groups count in internal metrics accumulator [#1658](https://github.com/apache/datafusion-comet/pull/1658) (parthchandra)
+- fix: Shuffle should maintain insertion order [#1660](https://github.com/apache/datafusion-comet/pull/1660) (EmilyMatt)
+
+**Performance related:**
+
+- perf: Use a global tokio runtime [#1614](https://github.com/apache/datafusion-comet/pull/1614) (andygrove)
+- perf: Respect Spark's PARQUET_FILTER_PUSHDOWN_ENABLED config [#1619](https://github.com/apache/datafusion-comet/pull/1619) (andygrove)
+- perf: Experimental fix to avoid join strategy regression [#1674](https://github.com/apache/datafusion-comet/pull/1674) (andygrove)
+
+**Implemented enhancements:**
+
+- feat: add read array support [#1456](https://github.com/apache/datafusion-comet/pull/1456) (comphead)
+- feat: introduce hadoop mini cluster to test native scan on hdfs [#1556](https://github.com/apache/datafusion-comet/pull/1556) (wForget)
+- feat: make parquet native scan schema case insensitive [#1575](https://github.com/apache/datafusion-comet/pull/1575) (wForget)
+- feat: enable iceberg compat tests, more tests for complex types [#1550](https://github.com/apache/datafusion-comet/pull/1550) (comphead)
+- feat: pushdown filter for native_iceberg_compat [#1566](https://github.com/apache/datafusion-comet/pull/1566) (wForget)
+- feat: Fix struct of arrays schema issue [#1592](https://github.com/apache/datafusion-comet/pull/1592) (comphead)
+- feat: adding more struct/arrays tests [#1594](https://github.com/apache/datafusion-comet/pull/1594) (comphead)
+- feat: respect `batchSize/workerThreads/blockingThreads` configurations for native_iceberg_compat scan [#1587](https://github.com/apache/datafusion-comet/pull/1587) (wForget)
+- feat: add MAP type support for first level [#1603](https://github.com/apache/datafusion-comet/pull/1603) (comphead)
+- feat: Add more tests for nested types combinations for `native_datafusion` [#1632](https://github.com/apache/datafusion-comet/pull/1632) (comphead)
+- feat: Override MapBuilder values field with expected schema [#1643](https://github.com/apache/datafusion-comet/pull/1643) (comphead)
+- feat: track unified memory pool [#1651](https://github.com/apache/datafusion-comet/pull/1651) (wForget)
+- feat: Add support for complex types in native shuffle [#1655](https://github.com/apache/datafusion-comet/pull/1655) (andygrove)
+
+**Documentation updates:**
+
+- docs: Update configuration guide to show optional configs [#1524](https://github.com/apache/datafusion-comet/pull/1524) (andygrove)
+- docs: Add changelog for 0.7.0 release [#1527](https://github.com/apache/datafusion-comet/pull/1527) (andygrove)
+- docs: Use a shallow clone for Spark SQL test instructions [#1547](https://github.com/apache/datafusion-comet/pull/1547) (mbutrovich)
+- docs: Update benchmark results for 0.7.0 release [#1548](https://github.com/apache/datafusion-comet/pull/1548) (andygrove)
+- doc: Renew `kubernetes.md` [#1549](https://github.com/apache/datafusion-comet/pull/1549) (comphead)
+- docs: various improvements to tuning guide [#1525](https://github.com/apache/datafusion-comet/pull/1525) (andygrove)
+- docs: Update supported Spark versions [#1580](https://github.com/apache/datafusion-comet/pull/1580) (andygrove)
+- docs: change OSX/OS X to macOS [#1584](https://github.com/apache/datafusion-comet/pull/1584) (mbutrovich)
+- docs: docs for benchmarking in aws ec2 [#1601](https://github.com/apache/datafusion-comet/pull/1601) (andygrove)
+- docs: Update compatibility docs for new native scans [#1657](https://github.com/apache/datafusion-comet/pull/1657) (andygrove)
+- doc: Document local HDFS setup [#1673](https://github.com/apache/datafusion-comet/pull/1673) (comphead)
+
+**Other:**
+
+- chore: fix issue in release process [#1528](https://github.com/apache/datafusion-comet/pull/1528) (andygrove)
+- chore: Remove all subdependencies [#1514](https://github.com/apache/datafusion-comet/pull/1514) (EmilyMatt)
+- chore: Drop support for Spark 3.3 (EOL) [#1529](https://github.com/apache/datafusion-comet/pull/1529) (andygrove)
+- chore: Prepare for 0.8.0 development [#1530](https://github.com/apache/datafusion-comet/pull/1530) (andygrove)
+- chore: Re-enable GitHub discussions [#1535](https://github.com/apache/datafusion-comet/pull/1535) (andygrove)
+- chore: [FOLLOWUP] Drop support for Spark 3.3 (EOL) [#1534](https://github.com/apache/datafusion-comet/pull/1534) (kazuyukitanimura)
+- build: Use unique name for surefire artifacts [#1544](https://github.com/apache/datafusion-comet/pull/1544) (andygrove)
+- chore: Update links for released version [#1540](https://github.com/apache/datafusion-comet/pull/1540) (andygrove)
+- chore: Enable Comet explicitly in `CometTPCDSQueryTestSuite` [#1559](https://github.com/apache/datafusion-comet/pull/1559) (andygrove)
+- chore: Fix some inconsistencies in memory pool configuration [#1561](https://github.com/apache/datafusion-comet/pull/1561) (andygrove)
+- upgraded spark 3.5.4 to 3.5.5 [#1565](https://github.com/apache/datafusion-comet/pull/1565) (YanivKunda)
+- minor: fix typo [#1570](https://github.com/apache/datafusion-comet/pull/1570) (wForget)
+- Chore: simplify array related functions impl [#1490](https://github.com/apache/datafusion-comet/pull/1490) (kazantsev-maksim)
+- added fallback using reflection for backward-compatibility [#1573](https://github.com/apache/datafusion-comet/pull/1573) (YanivKunda)
+- chore: Override node name for CometSparkToColumnar [#1577](https://github.com/apache/datafusion-comet/pull/1577) (l0kr)
+- chore: Reimplement ShuffleWriterExec using interleave_record_batch [#1511](https://github.com/apache/datafusion-comet/pull/1511) (Kontinuation)
+- chore: Run Comet tests for more Spark versions [#1582](https://github.com/apache/datafusion-comet/pull/1582) (andygrove)
+- Feat: support array_except function [#1343](https://github.com/apache/datafusion-comet/pull/1343) (kazantsev-maksim)
+- minor: Fix clippy warnings [#1606](https://github.com/apache/datafusion-comet/pull/1606) (Kontinuation)
+- chore: Remove some unwraps in hashing code [#1600](https://github.com/apache/datafusion-comet/pull/1600) (andygrove)
+- chore: Remove redundant shims for getFailOnError [#1608](https://github.com/apache/datafusion-comet/pull/1608) (andygrove)
+- chore: Making comet native operators write spill files to spark local dir [#1581](https://github.com/apache/datafusion-comet/pull/1581) (Kontinuation)
+- chore: Refactor QueryPlanSerde to use idiomatic Scala and reduce verbosity [#1609](https://github.com/apache/datafusion-comet/pull/1609) (andygrove)
+- chore: Create simple fuzz test as part of test suite [#1610](https://github.com/apache/datafusion-comet/pull/1610) (andygrove)
+- chore: Document `testSingleLineQuery` test method [#1628](https://github.com/apache/datafusion-comet/pull/1628) (comphead)
+- chore: Parquet fuzz testing [#1623](https://github.com/apache/datafusion-comet/pull/1623) (andygrove)
+- chore: Change default Spark version to 3.5 [#1620](https://github.com/apache/datafusion-comet/pull/1620) (andygrove)
+- chore: Add manually-triggered CI jobs for testing Spark SQL with native scans [#1624](https://github.com/apache/datafusion-comet/pull/1624) (andygrove)
+- chore: refactor v2 scan conversion [#1621](https://github.com/apache/datafusion-comet/pull/1621) (andygrove)
+- chore: clean up `planner.rs` [#1650](https://github.com/apache/datafusion-comet/pull/1650) (comphead)
+- chore: correct name of pipelines for native_datafusion ci workflow [#1653](https://github.com/apache/datafusion-comet/pull/1653) (parthchandra)
+- chore: Upgrade to datafusion 47.0.0-rc1 and arrow-rs 55.0.0 [#1563](https://github.com/apache/datafusion-comet/pull/1563) (andygrove)
+- chore: Upgrade to datafusion 47.0.0 [#1663](https://github.com/apache/datafusion-comet/pull/1663) (YanivKunda)
+- chore: Enable CometFuzzTestSuite int96 test for experimental native scans (without complex types) [#1664](https://github.com/apache/datafusion-comet/pull/1664) (mbutrovich)
+- chore: Refactor Memory Pools [#1662](https://github.com/apache/datafusion-comet/pull/1662) (EmilyMatt)
+
+## Credits
+
+Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
+
+```
+    31	Andy Grove
+    11	Oleks V
+    10	Zhen Wang
+     7	Kristin Cowalcijk
+     6	Matt Butrovich
+     5	Parth Chandra
+     3	Emily Matheys
+     3	Yaniv Kunda
+     2	KAZUYUKI TANIMURA
+     2	Kazantsev Maksim
+     1	Łukasz
+```
+
+Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.
+