Add changelog for 0.9.0 (#1955)

andygrove · web-flow · commit ae7a1abb6630 · 2025-07-01T06:39:10.000-06:00
diff --git a/dev/changelog/0.9.0.md b/dev/changelog/0.9.0.md
@@ -0,0 +1,210 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# DataFusion Comet 0.9.0 Changelog
+
+This release consists of 139 commits from 24 contributors. See credits at the end of this changelog for more information.
+
+**Fixed bugs:**
+
+- fix: typo for `instr` in fuzz testing [#1686](https://github.com/apache/datafusion-comet/pull/1686) (mbutrovich)
+- fix: Bucketed scan fallback for native_datafusion Parquet scan [#1720](https://github.com/apache/datafusion-comet/pull/1720) (mbutrovich)
+- fix: Skip row index Spark SQL tests for native_datafusion Parquet scan [#1724](https://github.com/apache/datafusion-comet/pull/1724) (mbutrovich)
+- fix: Check acquired memory when CometMemoryPool grows [#1732](https://github.com/apache/datafusion-comet/pull/1732) (wForget)
+- fix: Fix data race in memory profiling [#1727](https://github.com/apache/datafusion-comet/pull/1727) (andygrove)
+- fix: Enable some DPP Spark SQL tests [#1734](https://github.com/apache/datafusion-comet/pull/1734) (andygrove)
+- fix: support literal null list and map [#1742](https://github.com/apache/datafusion-comet/pull/1742) (kazuyukitanimura)
+- fix: get_struct field is incorrect when struct in array [#1687](https://github.com/apache/datafusion-comet/pull/1687) (comphead)
+- fix: cast map types correctly in schema adapter [#1771](https://github.com/apache/datafusion-comet/pull/1771) (parthchandra)
+- fix: correct schema type checking in native_iceberg_compat [#1755](https://github.com/apache/datafusion-comet/pull/1755) (parthchandra)
+- fix: default values for native_datafusion scan [#1756](https://github.com/apache/datafusion-comet/pull/1756) (mbutrovich)
+- fix: [native_scans] Support `CASE_SENSITIVE` when reading Parquet [#1782](https://github.com/apache/datafusion-comet/pull/1782) (andygrove)
+- fix: cargo install tpchgen-cli in benchmark doc [#1797](https://github.com/apache/datafusion-comet/pull/1797) (zhuqi-lucas)
+- fix: support `map_keys` [#1788](https://github.com/apache/datafusion-comet/pull/1788) (comphead)
+- fix: fall back on nested types for default values [#1799](https://github.com/apache/datafusion-comet/pull/1799) (mbutrovich)
+- fix: Re-enable Spark 4 tests on Linux [#1806](https://github.com/apache/datafusion-comet/pull/1806) (andygrove)
+- fix: fallback to Spark scan if encryption is enabled (native_datafusion/native_iceberg_compat) [#1785](https://github.com/apache/datafusion-comet/pull/1785) (parthchandra)
+- fix: native_iceberg_compat: move checking parquet types above fetching batch [#1809](https://github.com/apache/datafusion-comet/pull/1809) (mbutrovich)
+- fix: translate missing or corrupt file exceptions, fall back if asked to ignore [#1765](https://github.com/apache/datafusion-comet/pull/1765) (mbutrovich)
+- fix: Fix Spark SQL AQE exchange reuse test failures [#1811](https://github.com/apache/datafusion-comet/pull/1811) (coderfender)
+- fix: Enable more Spark SQL tests [#1834](https://github.com/apache/datafusion-comet/pull/1834) (andygrove)
+- fix: support `map_values` [#1835](https://github.com/apache/datafusion-comet/pull/1835) (comphead)
+- fix: Handle case where num_cols == 0 in native execution [#1840](https://github.com/apache/datafusion-comet/pull/1840) (andygrove)
+- fix: Fix shuffle writing rows containing null struct fields [#1845](https://github.com/apache/datafusion-comet/pull/1845) (Kontinuation)
+- fix: Fall back to Spark for `RANGE BETWEEN` window expressions [#1848](https://github.com/apache/datafusion-comet/pull/1848) (andygrove)
+- fix: Remove COMET_SHUFFLE_FALLBACK_TO_COLUMNAR hack [#1865](https://github.com/apache/datafusion-comet/pull/1865) (andygrove)
+- fix: support reading Struct by user schema [#1860](https://github.com/apache/datafusion-comet/pull/1860) (comphead)
+- fix: map parquet field_id correctly (native_iceberg_compat) [#1815](https://github.com/apache/datafusion-comet/pull/1815) (parthchandra)
+- fix: cast_struct_to_struct aligns to Spark behavior [#1879](https://github.com/apache/datafusion-comet/pull/1879) (mbutrovich)
+- fix: correctly handle schemas with nested array of struct (native_iceberg_compat) [#1883](https://github.com/apache/datafusion-comet/pull/1883) (parthchandra)
+- fix: set RangePartitioning for native shuffle default to false [#1907](https://github.com/apache/datafusion-comet/pull/1907) (mbutrovich)
+- fix: conflict between #1905 and #1892. [#1919](https://github.com/apache/datafusion-comet/pull/1919) (mbutrovich)
+- fix: Add overflow check to evaluate of sum decimal accumulator [#1922](https://github.com/apache/datafusion-comet/pull/1922) (leung-ming)
+- fix: Fix overflow handling when casting float to decimal [#1914](https://github.com/apache/datafusion-comet/pull/1914) (leung-ming)
+- fix: Ignore a test case that fails on Miri [#1951](https://github.com/apache/datafusion-comet/pull/1951) (leung-ming)
+
+**Performance related:**
+
+- perf: Add memory profiling [#1702](https://github.com/apache/datafusion-comet/pull/1702) (andygrove)
+- perf: Add performance tracing capability [#1706](https://github.com/apache/datafusion-comet/pull/1706) (andygrove)
+- perf: Add `COMET_RESPECT_PARQUET_FILTER_PUSHDOWN` config [#1936](https://github.com/apache/datafusion-comet/pull/1936) (andygrove)
+
+**Implemented enhancements:**
+
+- feat: add jemalloc as optional custom allocator [#1679](https://github.com/apache/datafusion-comet/pull/1679) (mbutrovich)
+- feat: support `array_repeat` [#1680](https://github.com/apache/datafusion-comet/pull/1680) (comphead)
+- feat: More warning info for users [#1667](https://github.com/apache/datafusion-comet/pull/1667) (hsiang-c)
+- feat: decode() expression when using 'utf-8' encoding [#1697](https://github.com/apache/datafusion-comet/pull/1697) (mbutrovich)
+- feat: regexp_replace() expression with no starting offset [#1700](https://github.com/apache/datafusion-comet/pull/1700) (mbutrovich)
+- feat: Improve performance tracing feature [#1730](https://github.com/apache/datafusion-comet/pull/1730) (andygrove)
+- feat: Set/cancel with job tag and make max broadcast table size configurable [#1693](https://github.com/apache/datafusion-comet/pull/1693) (wForget)
+- feat: Add support for `expm1` expression from `datafusion-spark` crate [#1711](https://github.com/apache/datafusion-comet/pull/1711) (andygrove)
+- feat: Add config option for showing all Comet plan transformations [#1780](https://github.com/apache/datafusion-comet/pull/1780) (andygrove)
+- feat: Support Type widening: byte → short/int/long, short → int/long [#1770](https://github.com/apache/datafusion-comet/pull/1770) (huaxingao)
+- feat: Translate Hadoop S3A configurations to object_store configurations [#1817](https://github.com/apache/datafusion-comet/pull/1817) (Kontinuation)
+- feat: Upgrade to official DataFusion 48.0.0 release [#1877](https://github.com/apache/datafusion-comet/pull/1877) (andygrove)
+- feat: Add experimental auto mode for `COMET_PARQUET_SCAN_IMPL` [#1747](https://github.com/apache/datafusion-comet/pull/1747) (andygrove)
+- feat: support RangePartitioning with native shuffle [#1862](https://github.com/apache/datafusion-comet/pull/1862) (mbutrovich)
+- feat: Add support for signum expression [#1889](https://github.com/apache/datafusion-comet/pull/1889) (andygrove)
+- feat: Add support to lookup map by key [#1898](https://github.com/apache/datafusion-comet/pull/1898) (comphead)
+- feat: support array_max [#1892](https://github.com/apache/datafusion-comet/pull/1892) (drexler-sky)
+- feat: pass ignore_nulls flag to first and last [#1866](https://github.com/apache/datafusion-comet/pull/1866) (rluvaton)
+- feat: Implement ToPrettyString [#1921](https://github.com/apache/datafusion-comet/pull/1921) (andygrove)
+- feat: Support hadoop s3a config in native_iceberg_compat [#1925](https://github.com/apache/datafusion-comet/pull/1925) (parthchandra)
+- feat: rand expression support [#1199](https://github.com/apache/datafusion-comet/pull/1199) (akupchinskiy)
+- feat: supports array_distinct [#1923](https://github.com/apache/datafusion-comet/pull/1923) (drexler-sky)
+- feat: `auto` scan mode should check for supported file location [#1930](https://github.com/apache/datafusion-comet/pull/1930) (andygrove)
+- feat: Encapsulate Parquet objects [#1920](https://github.com/apache/datafusion-comet/pull/1920) (huaxingao)
+- feat: Change default value of `COMET_NATIVE_SCAN_IMPL` to `auto` [#1933](https://github.com/apache/datafusion-comet/pull/1933) (andygrove)
+- feat: Supports array_union [#1945](https://github.com/apache/datafusion-comet/pull/1945) (drexler-sky)
+
+**Documentation updates:**
+
+- docs: Add changelog for 0.8.0 [#1675](https://github.com/apache/datafusion-comet/pull/1675) (andygrove)
+- docs: Add instructions on running TPC-H on macOS [#1647](https://github.com/apache/datafusion-comet/pull/1647) (andygrove)
+- docs: Add documentation for accelerating Iceberg Parquet scans with Comet [#1683](https://github.com/apache/datafusion-comet/pull/1683) (andygrove)
+- docs: Add note on setting `core.abbrev` when generating diffs [#1735](https://github.com/apache/datafusion-comet/pull/1735) (andygrove)
+- docs: Remove outdated param in macos bench guide [#1748](https://github.com/apache/datafusion-comet/pull/1748) (ding-young)
+- docs: Add instructions for running individual Spark SQL tests from sbt [#1752](https://github.com/apache/datafusion-comet/pull/1752) (coderfender)
+- docs: Add documentation for native_datafusion Parquet scanner's S3 support [#1832](https://github.com/apache/datafusion-comet/pull/1832) (Kontinuation)
+- docs: Add docs stating that Comet does not support reading decimals encoded in Parquet BINARY format [#1895](https://github.com/apache/datafusion-comet/pull/1895) (andygrove)
+
+**Other:**
+
+- chore: Start 0.9.0 development [#1676](https://github.com/apache/datafusion-comet/pull/1676) (andygrove)
+- chore: Update viable crates [#1677](https://github.com/apache/datafusion-comet/pull/1677) (EmilyMatt)
+- chore: match Maven plugin versions with Spark 3.5 [#1668](https://github.com/apache/datafusion-comet/pull/1668) (hsiang-c)
+- chore: Remove fallback reason "because the children were not native" [#1672](https://github.com/apache/datafusion-comet/pull/1672) (andygrove)
+- chore: Rename `scalarExprToProto` to `scalarFunctionExprToProto` [#1688](https://github.com/apache/datafusion-comet/pull/1688) (comphead)
+- chore: fix build errors [#1690](https://github.com/apache/datafusion-comet/pull/1690) (comphead)
+- chore: Make Aggregate transformation more compact [#1670](https://github.com/apache/datafusion-comet/pull/1670) (EmilyMatt)
+- chore: update dev/release/rat_exclude_files.txt [#1689](https://github.com/apache/datafusion-comet/pull/1689) (hsiang-c)
+- chore: Move Comet rules into their own files [#1695](https://github.com/apache/datafusion-comet/pull/1695) (andygrove)
+- chore: Remove fast encoding option [#1703](https://github.com/apache/datafusion-comet/pull/1703) (andygrove)
+- chore: fix CI job name [#1712](https://github.com/apache/datafusion-comet/pull/1712) (hsiang-c)
+- minor: Warn if memory pool is dropped with bytes still reserved [#1721](https://github.com/apache/datafusion-comet/pull/1721) (andygrove)
+- chore: Correct memory acquired size in unified memory pool [#1738](https://github.com/apache/datafusion-comet/pull/1738) (zuston)
+- chore: allow large errors for Clippy [#1743](https://github.com/apache/datafusion-comet/pull/1743) (comphead)
+- chore: Refactor DataTypeSupport [#1741](https://github.com/apache/datafusion-comet/pull/1741) (andygrove)
+- chore: More refactoring of type checking logic [#1744](https://github.com/apache/datafusion-comet/pull/1744) (andygrove)
+- chore: Enable more complex type tests [#1753](https://github.com/apache/datafusion-comet/pull/1753) (andygrove)
+- chore: Add `scanImpl` attribute to `CometScanExec` [#1746](https://github.com/apache/datafusion-comet/pull/1746) (andygrove)
+- chore: Prepare for DataFusion 48.0.0 [#1710](https://github.com/apache/datafusion-comet/pull/1710) (andygrove)
+- Docs: Setup Comet on IntelliJ [#1760](https://github.com/apache/datafusion-comet/pull/1760) (coderfender)
+- chore: Reenable nested types for CometFuzzTestSuite with int96 [#1761](https://github.com/apache/datafusion-comet/pull/1761) (mbutrovich)
+- chore: Enable partial Spark SQL tests for `native_iceberg_compat` scan [#1762](https://github.com/apache/datafusion-comet/pull/1762) (andygrove)
+- chore: [native_iceberg_compat / native_datafusion] Ignore Spark SQL Parquet encryption tests [#1763](https://github.com/apache/datafusion-comet/pull/1763) (andygrove)
+- build: Ignore array_repeat test to fix CI issues [#1774](https://github.com/apache/datafusion-comet/pull/1774) (andygrove)
+- chore: Upload crash logs if Java tests fail [#1779](https://github.com/apache/datafusion-comet/pull/1779) (andygrove)
+- chore: Drop support for Java 8 [#1777](https://github.com/apache/datafusion-comet/pull/1777) (andygrove)
+- chore: Bump arrow to 18.3.0 [#1773](https://github.com/apache/datafusion-comet/pull/1773) (Kontinuation)
+- build: Stop running Comet's Spark 4 tests on Linux for PR builds [#1802](https://github.com/apache/datafusion-comet/pull/1802) (andygrove)
+- Chore: Moved strings expressions to separate file [#1792](https://github.com/apache/datafusion-comet/pull/1792) (kazantsev-maksim)
+- chore: Speed up "PR Builds" CI workflows [#1807](https://github.com/apache/datafusion-comet/pull/1807) (andygrove)
+- chore: [native scans] Ignore Spark SQL test for string predicate pushdown [#1768](https://github.com/apache/datafusion-comet/pull/1768) (andygrove)
+- chore: Bump DataFusion to git rev 2c2f225 [#1814](https://github.com/apache/datafusion-comet/pull/1814) (andygrove)
+- Feat: support bit_count function [#1602](https://github.com/apache/datafusion-comet/pull/1602) (kazantsev-maksim)
+- Chore: implement bit_not as ScalarUDFImpl [#1825](https://github.com/apache/datafusion-comet/pull/1825) (kazantsev-maksim)
+- build: Specify -Dsbt.log.noformat=true in sbt CI runs [#1822](https://github.com/apache/datafusion-comet/pull/1822) (andygrove)
+- chore: Use unique artifact names in Java test run [#1818](https://github.com/apache/datafusion-comet/pull/1818) (andygrove)
+- minor: Refactor PhysicalPlanner::default() to avoid duplicate code [#1821](https://github.com/apache/datafusion-comet/pull/1821) (andygrove)
+- Chore: implement bit_count as ScalarUDFImpl [#1826](https://github.com/apache/datafusion-comet/pull/1826) (kazantsev-maksim)
+- chore: IgnoreCometNativeScan on a few more Spark SQL tests [#1837](https://github.com/apache/datafusion-comet/pull/1837) (mbutrovich)
+- chore: Enable tests in RemoveRedundantProjectsSuite.scala related to issue #242 [#1838](https://github.com/apache/datafusion-comet/pull/1838) (rishvin)
+- minor: Replace many instances of `checkSparkAnswer` with `checkSparkAnswerAndOperator` [#1851](https://github.com/apache/datafusion-comet/pull/1851) (andygrove)
+- chore: Update documentation and ignore Spark SQL tests for known issue with count distinct on NaN in aggregate [#1847](https://github.com/apache/datafusion-comet/pull/1847) (andygrove)
+- chore: Ignore Spark SQL WholeStageCodegenSuite tests [#1859](https://github.com/apache/datafusion-comet/pull/1859) (andygrove)
+- chore: Upgrade to DataFusion 48.0.0-rc3 [#1863](https://github.com/apache/datafusion-comet/pull/1863) (andygrove)
+- upgraded spark 3.5.5 to 3.5.6 [#1861](https://github.com/apache/datafusion-comet/pull/1861) (YanivKunda)
+- build: Disable some rounding tests when miri is enabled [#1873](https://github.com/apache/datafusion-comet/pull/1873) (andygrove)
+- chore: Enable Spark SQL tests for `native_iceberg_compat` [#1876](https://github.com/apache/datafusion-comet/pull/1876) (andygrove)
+- chore: Enable more Spark SQL tests [#1869](https://github.com/apache/datafusion-comet/pull/1869) (andygrove)
+- chore: refactor planner read schema tests [#1886](https://github.com/apache/datafusion-comet/pull/1886) (comphead)
+- chore: Implement date_trunc as ScalarUDFImpl [#1880](https://github.com/apache/datafusion-comet/pull/1880) (leung-ming)
+- Chore: implement datetime funcs as ScalarUDFImpl [#1874](https://github.com/apache/datafusion-comet/pull/1874) (trompa)
+- minor: Improve testing of math scalar functions [#1896](https://github.com/apache/datafusion-comet/pull/1896) (andygrove)
+- minor: Avoid rewriting join to unsupported join [#1888](https://github.com/apache/datafusion-comet/pull/1888) (andygrove)
+- chore: Enable `native_iceberg_compat` Spark SQL tests (for real, this time) [#1910](https://github.com/apache/datafusion-comet/pull/1910) (andygrove)
+- chore: rename makeParquetFileAllTypes to makeParquetFileAllPrimitiveTypes [#1905](https://github.com/apache/datafusion-comet/pull/1905) (parthchandra)
+- chore: add a test case to read from an arbitrarily complex type schema [#1911](https://github.com/apache/datafusion-comet/pull/1911) (parthchandra)
+- test: Trigger Spark 3.4.3 SQL tests for iceberg-compat [#1912](https://github.com/apache/datafusion-comet/pull/1912) (kazuyukitanimura)
+- build: Fix conflict between #1910 and #1912 [#1924](https://github.com/apache/datafusion-comet/pull/1924) (andygrove)
+- minor: fix kube/Dockerfile build failure [#1918](https://github.com/apache/datafusion-comet/pull/1918) (zhangxffff)
+- chore: Improve reporting of fallback reasons for CollectLimit [#1694](https://github.com/apache/datafusion-comet/pull/1694) (andygrove)
+- chore: move udf registration to better place [#1899](https://github.com/apache/datafusion-comet/pull/1899) (rluvaton)
+- chore: Comet + Iceberg (1.8.1) CI [#1715](https://github.com/apache/datafusion-comet/pull/1715) (hsiang-c)
+- chore: Introduce `exprHandlers` map in QueryPlanSerde [#1903](https://github.com/apache/datafusion-comet/pull/1903) (andygrove)
+- chore: Enable Spark SQL tests for auto scan mode [#1885](https://github.com/apache/datafusion-comet/pull/1885) (andygrove)
+- Feat: support bit_get function [#1713](https://github.com/apache/datafusion-comet/pull/1713) (kazantsev-maksim)
+- chore: Clippy fixes for Rust 1.88 [#1939](https://github.com/apache/datafusion-comet/pull/1939) (andygrove)
+- Minor: Add unit tests for `ceil`/`floor` functions [#1728](https://github.com/apache/datafusion-comet/pull/1728) (tlm365)
+
+## Credits
+
+Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
+
+```
+    62	Andy Grove
+    16	Matt Butrovich
+    10	Oleks V
+     8	Parth Chandra
+     5	Kazantsev Maksim
+     5	hsiang-c
+     4	Kristin Cowalcijk
+     4	Leung Ming
+     3	B Vadlamani
+     3	drexler-sky
+     2	Emily Matheys
+     2	Huaxin Gao
+     2	KAZUYUKI TANIMURA
+     2	Raz Luvaton
+     2	Zhen Wang
+     1	Artem Kupchinskiy
+     1	Junfan Zhang
+     1	Qi Zhu
+     1	Rishab Joshi
+     1	Tai Le Manh
+     1	Yaniv Kunda
+     1	Zhang Xiaofeng
+     1	ding-young
+     1	trompa
+```
+
+Thank you also to everyone who contributed in other ways, such as filing issues, reviewing PRs, and providing feedback on this release.
+