| 1 | +<!-- |
| 2 | +Licensed to the Apache Software Foundation (ASF) under one |
| 3 | +or more contributor license agreements. See the NOTICE file |
| 4 | +distributed with this work for additional information |
| 5 | +regarding copyright ownership. The ASF licenses this file |
| 6 | +to you under the Apache License, Version 2.0 (the |
| 7 | +"License"); you may not use this file except in compliance |
| 8 | +with the License. You may obtain a copy of the License at |
| 9 | + |
| 10 | + http://www.apache.org/licenses/LICENSE-2.0 |
| 11 | + |
| 12 | +Unless required by applicable law or agreed to in writing, |
| 13 | +software distributed under the License is distributed on an |
| 14 | +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| 15 | +KIND, either express or implied. See the License for the |
| 16 | +specific language governing permissions and limitations |
| 17 | +under the License. |
| 18 | +--> |
| 19 | + |
| 20 | +# DataFusion Comet 0.9.0 Changelog |
| 21 | + |
| 22 | +This release consists of 139 commits from 24 contributors. See credits at the end of this changelog for more information. |
| 23 | + |
| 24 | +**Fixed bugs:** |
| 25 | + |
| 26 | +- fix: typo for `instr` in fuzz testing [#1686](https://github.com/apache/datafusion-comet/pull/1686) (mbutrovich) |
| 27 | +- fix: Bucketed scan fallback for native_datafusion Parquet scan [#1720](https://github.com/apache/datafusion-comet/pull/1720) (mbutrovich) |
| 28 | +- fix: Skip row index Spark SQL tests for native_datafusion Parquet scan [#1724](https://github.com/apache/datafusion-comet/pull/1724) (mbutrovich) |
| 29 | +- fix: Check acquired memory when CometMemoryPool grows [#1732](https://github.com/apache/datafusion-comet/pull/1732) (wForget) |
| 30 | +- fix: Fix data race in memory profiling [#1727](https://github.com/apache/datafusion-comet/pull/1727) (andygrove) |
| 31 | +- fix: Enable some DPP Spark SQL tests [#1734](https://github.com/apache/datafusion-comet/pull/1734) (andygrove) |
| 32 | +- fix: support literal null list and map [#1742](https://github.com/apache/datafusion-comet/pull/1742) (kazuyukitanimura) |
| 33 | +- fix: get_struct field is incorrect when struct in array [#1687](https://github.com/apache/datafusion-comet/pull/1687) (comphead) |
| 34 | +- fix: cast map types correctly in schema adapter [#1771](https://github.com/apache/datafusion-comet/pull/1771) (parthchandra) |
| 35 | +- fix: correct schema type checking in native_iceberg_compat [#1755](https://github.com/apache/datafusion-comet/pull/1755) (parthchandra) |
| 36 | +- fix: default values for native_datafusion scan [#1756](https://github.com/apache/datafusion-comet/pull/1756) (mbutrovich) |
| 37 | +- fix: [native_scans] Support `CASE_SENSITIVE` when reading Parquet [#1782](https://github.com/apache/datafusion-comet/pull/1782) (andygrove) |
| 38 | +- fix: cargo install tpchgen-cli in benchmark doc [#1797](https://github.com/apache/datafusion-comet/pull/1797) (zhuqi-lucas) |
| 39 | +- fix: support `map_keys` [#1788](https://github.com/apache/datafusion-comet/pull/1788) (comphead) |
| 40 | +- fix: fall back on nested types for default values [#1799](https://github.com/apache/datafusion-comet/pull/1799) (mbutrovich) |
| 41 | +- fix: Re-enable Spark 4 tests on Linux [#1806](https://github.com/apache/datafusion-comet/pull/1806) (andygrove) |
| 42 | +- fix: fallback to Spark scan if encryption is enabled (native_datafusion/native_iceberg_compat) [#1785](https://github.com/apache/datafusion-comet/pull/1785) (parthchandra) |
| 43 | +- fix: native_iceberg_compat: move checking parquet types above fetching batch [#1809](https://github.com/apache/datafusion-comet/pull/1809) (mbutrovich) |
| 44 | +- fix: translate missing or corrupt file exceptions, fall back if asked to ignore [#1765](https://github.com/apache/datafusion-comet/pull/1765) (mbutrovich) |
| 45 | +- fix: Fix Spark SQL AQE exchange reuse test failures [#1811](https://github.com/apache/datafusion-comet/pull/1811) (coderfender) |
| 46 | +- fix: Enable more Spark SQL tests [#1834](https://github.com/apache/datafusion-comet/pull/1834) (andygrove) |
| 47 | +- fix: support `map_values` [#1835](https://github.com/apache/datafusion-comet/pull/1835) (comphead) |
| 48 | +- fix: Handle case where num_cols == 0 in native execution [#1840](https://github.com/apache/datafusion-comet/pull/1840) (andygrove) |
| 49 | +- fix: Fix shuffle writing rows containing null struct fields [#1845](https://github.com/apache/datafusion-comet/pull/1845) (Kontinuation) |
| 50 | +- fix: Fall back to Spark for `RANGE BETWEEN` window expressions [#1848](https://github.com/apache/datafusion-comet/pull/1848) (andygrove) |
| 51 | +- fix: Remove COMET_SHUFFLE_FALLBACK_TO_COLUMNAR hack [#1865](https://github.com/apache/datafusion-comet/pull/1865) (andygrove) |
| 52 | +- fix: support read Struct by user schema [#1860](https://github.com/apache/datafusion-comet/pull/1860) (comphead) |
| 53 | +- fix: map parquet field_id correctly (native_iceberg_compat) [#1815](https://github.com/apache/datafusion-comet/pull/1815) (parthchandra) |
| 54 | +- fix: cast_struct_to_struct aligns to Spark behavior [#1879](https://github.com/apache/datafusion-comet/pull/1879) (mbutrovich) |
| 55 | +- fix: correctly handle schemas with nested array of struct (native_iceberg_compat) [#1883](https://github.com/apache/datafusion-comet/pull/1883) (parthchandra) |
| 56 | +- fix: set RangePartitioning for native shuffle default to false [#1907](https://github.com/apache/datafusion-comet/pull/1907) (mbutrovich) |
| 57 | +- fix: conflict between #1905 and #1892. [#1919](https://github.com/apache/datafusion-comet/pull/1919) (mbutrovich) |
| 58 | +- fix: Add overflow check to evaluate of sum decimal accumulator [#1922](https://github.com/apache/datafusion-comet/pull/1922) (leung-ming) |
| 59 | +- fix: Fix overflow handling when casting float to decimal [#1914](https://github.com/apache/datafusion-comet/pull/1914) (leung-ming) |
| 60 | +- fix: Ignore a test case fails on Miri [#1951](https://github.com/apache/datafusion-comet/pull/1951) (leung-ming) |
| 61 | + |
| 62 | +**Performance related:** |
| 63 | + |
| 64 | +- perf: Add memory profiling [#1702](https://github.com/apache/datafusion-comet/pull/1702) (andygrove) |
| 65 | +- perf: Add performance tracing capability [#1706](https://github.com/apache/datafusion-comet/pull/1706) (andygrove) |
| 66 | +- perf: Add `COMET_RESPECT_PARQUET_FILTER_PUSHDOWN` config [#1936](https://github.com/apache/datafusion-comet/pull/1936) (andygrove) |
| 67 | + |
| 68 | +**Implemented enhancements:** |
| 69 | + |
| 70 | +- feat: add jemalloc as optional custom allocator [#1679](https://github.com/apache/datafusion-comet/pull/1679) (mbutrovich) |
| 71 | +- feat: support `array_repeat` [#1680](https://github.com/apache/datafusion-comet/pull/1680) (comphead) |
| 72 | +- feat: More warning info for users [#1667](https://github.com/apache/datafusion-comet/pull/1667) (hsiang-c) |
| 73 | +- feat: decode() expression when using 'utf-8' encoding [#1697](https://github.com/apache/datafusion-comet/pull/1697) (mbutrovich) |
| 74 | +- feat: regexp_replace() expression with no starting offset [#1700](https://github.com/apache/datafusion-comet/pull/1700) (mbutrovich) |
| 75 | +- feat: Improve performance tracing feature [#1730](https://github.com/apache/datafusion-comet/pull/1730) (andygrove) |
| 76 | +- feat: Set/cancel with job tag and make max broadcast table size configurable [#1693](https://github.com/apache/datafusion-comet/pull/1693) (wForget) |
| 77 | +- feat: Add support for `expm1` expression from `datafusion-spark` crate [#1711](https://github.com/apache/datafusion-comet/pull/1711) (andygrove) |
| 78 | +- feat: Add config option for showing all Comet plan transformations [#1780](https://github.com/apache/datafusion-comet/pull/1780) (andygrove) |
| 79 | +- feat: Support Type widening: byte → short/int/long, short → int/long [#1770](https://github.com/apache/datafusion-comet/pull/1770) (huaxingao) |
| 80 | +- feat: Translate Hadoop S3A configurations to object_store configurations [#1817](https://github.com/apache/datafusion-comet/pull/1817) (Kontinuation) |
| 81 | +- feat: Upgrade to official DataFusion 48.0.0 release [#1877](https://github.com/apache/datafusion-comet/pull/1877) (andygrove) |
| 82 | +- feat: Add experimental auto mode for `COMET_PARQUET_SCAN_IMPL` [#1747](https://github.com/apache/datafusion-comet/pull/1747) (andygrove) |
| 83 | +- feat: support RangePartitioning with native shuffle [#1862](https://github.com/apache/datafusion-comet/pull/1862) (mbutrovich) |
| 84 | +- feat: Add support for signum expression [#1889](https://github.com/apache/datafusion-comet/pull/1889) (andygrove) |
| 85 | +- feat: Add support to lookup map by key [#1898](https://github.com/apache/datafusion-comet/pull/1898) (comphead) |
| 86 | +- feat: support array_max [#1892](https://github.com/apache/datafusion-comet/pull/1892) (drexler-sky) |
| 87 | +- feat: pass ignore_nulls flag to first and last [#1866](https://github.com/apache/datafusion-comet/pull/1866) (rluvaton) |
| 88 | +- feat: Implement ToPrettyString [#1921](https://github.com/apache/datafusion-comet/pull/1921) (andygrove) |
| 89 | +- feat: Support hadoop s3a config in native_iceberg_compat [#1925](https://github.com/apache/datafusion-comet/pull/1925) (parthchandra) |
| 90 | +- feat: rand expression support [#1199](https://github.com/apache/datafusion-comet/pull/1199) (akupchinskiy) |
| 91 | +- feat: supports array_distinct [#1923](https://github.com/apache/datafusion-comet/pull/1923) (drexler-sky) |
| 92 | +- feat: `auto` scan mode should check for supported file location [#1930](https://github.com/apache/datafusion-comet/pull/1930) (andygrove) |
| 93 | +- feat: Encapsulate Parquet objects [#1920](https://github.com/apache/datafusion-comet/pull/1920) (huaxingao) |
| 94 | +- feat: Change default value of `COMET_NATIVE_SCAN_IMPL` to `auto` [#1933](https://github.com/apache/datafusion-comet/pull/1933) (andygrove) |
| 95 | +- feat: Supports array_union [#1945](https://github.com/apache/datafusion-comet/pull/1945) (drexler-sky) |
| 96 | + |
| 97 | +**Documentation updates:** |
| 98 | + |
| 99 | +- docs: Add changelog for 0.8.0 [#1675](https://github.com/apache/datafusion-comet/pull/1675) (andygrove) |
| 100 | +- docs: Add instructions on running TPC-H on macOS [#1647](https://github.com/apache/datafusion-comet/pull/1647) (andygrove) |
| 101 | +- docs: Add documentation for accelerating Iceberg Parquet scans with Comet [#1683](https://github.com/apache/datafusion-comet/pull/1683) (andygrove) |
| 102 | +- docs: Add note on setting `core.abbrev` when generating diffs [#1735](https://github.com/apache/datafusion-comet/pull/1735) (andygrove) |
| 103 | +- docs: Remove outdated param in macos bench guide [#1748](https://github.com/apache/datafusion-comet/pull/1748) (ding-young) |
| 104 | +- docs: Add instructions for running individual Spark SQL tests from sbt [#1752](https://github.com/apache/datafusion-comet/pull/1752) (coderfender) |
| 105 | +- docs: Add documentation for native_datafusion Parquet scanner's S3 support [#1832](https://github.com/apache/datafusion-comet/pull/1832) (Kontinuation) |
| 106 | +- docs: Add docs stating that Comet does not support reading decimals encoded in Parquet BINARY format [#1895](https://github.com/apache/datafusion-comet/pull/1895) (andygrove) |
| 107 | + |
| 108 | +**Other:** |
| 109 | + |
| 110 | +- chore: Start 0.9.0 development [#1676](https://github.com/apache/datafusion-comet/pull/1676) (andygrove) |
| 111 | +- chore: Update viable crates [#1677](https://github.com/apache/datafusion-comet/pull/1677) (EmilyMatt) |
| 112 | +- chore: match Maven plugin versions with Spark 3.5 [#1668](https://github.com/apache/datafusion-comet/pull/1668) (hsiang-c) |
| 113 | +- chore: Remove fallback reason "because the children were not native" [#1672](https://github.com/apache/datafusion-comet/pull/1672) (andygrove) |
| 114 | +- chore: Rename `scalarExprToProto` to `scalarFunctionExprToProto` [#1688](https://github.com/apache/datafusion-comet/pull/1688) (comphead) |
| 115 | +- chore: fix build errors [#1690](https://github.com/apache/datafusion-comet/pull/1690) (comphead) |
| 116 | +- chore: Make Aggregate transformation more compact [#1670](https://github.com/apache/datafusion-comet/pull/1670) (EmilyMatt) |
| 117 | +- chore: update dev/release/rat_exclude_files.txt [#1689](https://github.com/apache/datafusion-comet/pull/1689) (hsiang-c) |
| 118 | +- chore: Move Comet rules into their own files [#1695](https://github.com/apache/datafusion-comet/pull/1695) (andygrove) |
| 119 | +- chore: Remove fast encoding option [#1703](https://github.com/apache/datafusion-comet/pull/1703) (andygrove) |
| 120 | +- chore: fix CI job name [#1712](https://github.com/apache/datafusion-comet/pull/1712) (hsiang-c) |
| 121 | +- minor: Warn if memory pool is dropped with bytes still reserved [#1721](https://github.com/apache/datafusion-comet/pull/1721) (andygrove) |
| 122 | +- chore: Correct memory acquired size in unified memory pool [#1738](https://github.com/apache/datafusion-comet/pull/1738) (zuston) |
| 123 | +- chore: allow large errors for Clippy [#1743](https://github.com/apache/datafusion-comet/pull/1743) (comphead) |
| 124 | +- chore: Refactor DataTypeSupport [#1741](https://github.com/apache/datafusion-comet/pull/1741) (andygrove) |
| 125 | +- chore: More refactoring of type checking logic [#1744](https://github.com/apache/datafusion-comet/pull/1744) (andygrove) |
| 126 | +- chore: Enable more complex type tests [#1753](https://github.com/apache/datafusion-comet/pull/1753) (andygrove) |
| 127 | +- chore: Add `scanImpl` attribute to `CometScanExec` [#1746](https://github.com/apache/datafusion-comet/pull/1746) (andygrove) |
| 128 | +- chore: Prepare for DataFusion 48.0.0 [#1710](https://github.com/apache/datafusion-comet/pull/1710) (andygrove) |
| 129 | +- Docs: Setup Comet on IntelliJ [#1760](https://github.com/apache/datafusion-comet/pull/1760) (coderfender) |
| 130 | +- chore: Reenable nested types for CometFuzzTestSuite with int96 [#1761](https://github.com/apache/datafusion-comet/pull/1761) (mbutrovich) |
| 131 | +- chore: Enable partial Spark SQL tests for `native_iceberg_compat` scan [#1762](https://github.com/apache/datafusion-comet/pull/1762) (andygrove) |
| 132 | +- chore: [native_iceberg_compat / native_datafusion] Ignore Spark SQL Parquet encryption tests [#1763](https://github.com/apache/datafusion-comet/pull/1763) (andygrove) |
| 133 | +- build: Ignore array_repeat test to fix CI issues [#1774](https://github.com/apache/datafusion-comet/pull/1774) (andygrove) |
| 134 | +- chore: Upload crash logs if Java tests fail [#1779](https://github.com/apache/datafusion-comet/pull/1779) (andygrove) |
| 135 | +- chore: Drop support for Java 8 [#1777](https://github.com/apache/datafusion-comet/pull/1777) (andygrove) |
| 136 | +- chore: Bump arrow to 18.3.0 [#1773](https://github.com/apache/datafusion-comet/pull/1773) (Kontinuation) |
| 137 | +- build: Stop running Comet's Spark 4 tests on Linux for PR builds [#1802](https://github.com/apache/datafusion-comet/pull/1802) (andygrove) |
| 138 | +- Chore: Moved strings expressions to separate file [#1792](https://github.com/apache/datafusion-comet/pull/1792) (kazantsev-maksim) |
| 139 | +- chore: Speed up "PR Builds" CI workflows [#1807](https://github.com/apache/datafusion-comet/pull/1807) (andygrove) |
| 140 | +- chore: [native scans] Ignore Spark SQL test for string predicate pushdown [#1768](https://github.com/apache/datafusion-comet/pull/1768) (andygrove) |
| 141 | +- chore: Bump DataFusion to git rev 2c2f225 [#1814](https://github.com/apache/datafusion-comet/pull/1814) (andygrove) |
| 142 | +- Feat: support bit_count function [#1602](https://github.com/apache/datafusion-comet/pull/1602) (kazantsev-maksim) |
| 143 | +- Chore: implement bit_not as ScalarUDFImpl [#1825](https://github.com/apache/datafusion-comet/pull/1825) (kazantsev-maksim) |
| 144 | +- build: Specify -Dsbt.log.noformat=true in sbt CI runs [#1822](https://github.com/apache/datafusion-comet/pull/1822) (andygrove) |
| 145 | +- chore: Use unique artifact names in Java test run [#1818](https://github.com/apache/datafusion-comet/pull/1818) (andygrove) |
| 146 | +- minor: Refactor PhysicalPlanner::default() to avoid duplicate code [#1821](https://github.com/apache/datafusion-comet/pull/1821) (andygrove) |
| 147 | +- Chore: implement bit_count as ScalarUDFImpl [#1826](https://github.com/apache/datafusion-comet/pull/1826) (kazantsev-maksim) |
| 148 | +- chore: IgnoreCometNativeScan on a few more Spark SQL tests [#1837](https://github.com/apache/datafusion-comet/pull/1837) (mbutrovich) |
| 149 | +- chore: Enable tests in RemoveRedundantProjectsSuite.scala related to issue #242 [#1838](https://github.com/apache/datafusion-comet/pull/1838) (rishvin) |
| 150 | +- minor: Replace many instances of `checkSparkAnswer` with `checkSparkAnswerAndOperator` [#1851](https://github.com/apache/datafusion-comet/pull/1851) (andygrove) |
| 151 | +- chore: Update documentation and ignore Spark SQL tests for known issue with count distinct on NaN in aggregate [#1847](https://github.com/apache/datafusion-comet/pull/1847) (andygrove) |
| 152 | +- chore: Ignore Spark SQL WholeStageCodegenSuite tests [#1859](https://github.com/apache/datafusion-comet/pull/1859) (andygrove) |
| 153 | +- chore: Upgrade to DataFusion 48.0.0-rc3 [#1863](https://github.com/apache/datafusion-comet/pull/1863) (andygrove) |
| 154 | +- upgraded spark 3.5.5 to 3.5.6 [#1861](https://github.com/apache/datafusion-comet/pull/1861) (YanivKunda) |
| 155 | +- build: Disable some rounding tests when miri is enabled [#1873](https://github.com/apache/datafusion-comet/pull/1873) (andygrove) |
| 156 | +- chore: Enable Spark SQL tests for `native_iceberg_compat` [#1876](https://github.com/apache/datafusion-comet/pull/1876) (andygrove) |
| 157 | +- chore: Enable more Spark SQL tests [#1869](https://github.com/apache/datafusion-comet/pull/1869) (andygrove) |
| 158 | +- chore: refactor planner read schema tests [#1886](https://github.com/apache/datafusion-comet/pull/1886) (comphead) |
| 159 | +- chore: Implement date_trunc as ScalarUDFImpl [#1880](https://github.com/apache/datafusion-comet/pull/1880) (leung-ming) |
| 160 | +- Chore: implement datetime funcs as ScalarUDFImpl [#1874](https://github.com/apache/datafusion-comet/pull/1874) (trompa) |
| 161 | +- minor: Improve testing of math scalar functions [#1896](https://github.com/apache/datafusion-comet/pull/1896) (andygrove) |
| 162 | +- minor: Avoid rewriting join to unsupported join [#1888](https://github.com/apache/datafusion-comet/pull/1888) (andygrove) |
| 163 | +- chore: Enable `native_iceberg_compat` Spark SQL tests (for real, this time) [#1910](https://github.com/apache/datafusion-comet/pull/1910) (andygrove) |
| 164 | +- chore: rename makeParquetFileAllTypes to makeParquetFileAllPrimitiveTypes [#1905](https://github.com/apache/datafusion-comet/pull/1905) (parthchandra) |
| 165 | +- chore: add a test case to read from an arbitrarily complex type schema [#1911](https://github.com/apache/datafusion-comet/pull/1911) (parthchandra) |
| 166 | +- test: Trigger Spark 3.4.3 SQL tests for iceberg-compat [#1912](https://github.com/apache/datafusion-comet/pull/1912) (kazuyukitanimura) |
| 167 | +- build: Fix conflict between #1910 and #1912 [#1924](https://github.com/apache/datafusion-comet/pull/1924) (andygrove) |
| 168 | +- minor: fix kube/Dockerfile build failed [#1918](https://github.com/apache/datafusion-comet/pull/1918) (zhangxffff) |
| 169 | +- chore: Improve reporting of fallback reasons for CollectLimit [#1694](https://github.com/apache/datafusion-comet/pull/1694) (andygrove) |
| 170 | +- chore: move udf registration to better place [#1899](https://github.com/apache/datafusion-comet/pull/1899) (rluvaton) |
| 171 | +- chore: Comet + Iceberg (1.8.1) CI [#1715](https://github.com/apache/datafusion-comet/pull/1715) (hsiang-c) |
| 172 | +- chore: Introduce `exprHandlers` map in QueryPlanSerde [#1903](https://github.com/apache/datafusion-comet/pull/1903) (andygrove) |
| 173 | +- chore: Enable Spark SQL tests for auto scan mode [#1885](https://github.com/apache/datafusion-comet/pull/1885) (andygrove) |
| 174 | +- Feat: support bit_get function [#1713](https://github.com/apache/datafusion-comet/pull/1713) (kazantsev-maksim) |
| 175 | +- chore: Clippy fixes for Rust 1.88 [#1939](https://github.com/apache/datafusion-comet/pull/1939) (andygrove) |
| 176 | +- Minor: Add unit tests for `ceil`/`floor` functions [#1728](https://github.com/apache/datafusion-comet/pull/1728) (tlm365) |
| 177 | + |
| 178 | +## Credits |
| 179 | + |
| 180 | +Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor. |
| 181 | + |
| 182 | +``` |
| 183 | + 62 Andy Grove |
| 184 | + 16 Matt Butrovich |
| 185 | + 10 Oleks V |
| 186 | + 8 Parth Chandra |
| 187 | + 5 Kazantsev Maksim |
| 188 | + 5 hsiang-c |
| 189 | + 4 Kristin Cowalcijk |
| 190 | + 4 Leung Ming |
| 191 | + 3 B Vadlamani |
| 192 | + 3 drexler-sky |
| 193 | + 2 Emily Matheys |
| 194 | + 2 Huaxin Gao |
| 195 | + 2 KAZUYUKI TANIMURA |
| 196 | + 2 Raz Luvaton |
| 197 | + 2 Zhen Wang |
| 198 | + 1 Artem Kupchinskiy |
| 199 | + 1 Junfan Zhang |
| 200 | + 1 Qi Zhu |
| 201 | + 1 Rishab Joshi |
| 202 | + 1 Tai Le Manh |
| 203 | + 1 Yaniv Kunda |
| 204 | + 1 Zhang Xiaofeng |
| 205 | + 1 ding-young |
| 206 | + 1 trompa |
| 207 | +``` |
| 208 | + |
| 209 | +Thank you also to everyone who contributed in other ways, such as filing issues, reviewing PRs, and providing feedback on this release. |
| 210 | + |