Skip to content

Commit ae7a1ab

Browse files
authored
Add changelog for 0.9.0 (#1955)
1 parent 547eb9c commit ae7a1ab

File tree

1 file changed

+210
-0
lines changed

1 file changed

+210
-0
lines changed

dev/changelog/0.9.0.md

Lines changed: 210 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,210 @@
1+
<!--
2+
Licensed to the Apache Software Foundation (ASF) under one
3+
or more contributor license agreements. See the NOTICE file
4+
distributed with this work for additional information
5+
regarding copyright ownership. The ASF licenses this file
6+
to you under the Apache License, Version 2.0 (the
7+
"License"); you may not use this file except in compliance
8+
with the License. You may obtain a copy of the License at
9+
10+
http://www.apache.org/licenses/LICENSE-2.0
11+
12+
Unless required by applicable law or agreed to in writing,
13+
software distributed under the License is distributed on an
14+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15+
KIND, either express or implied. See the License for the
16+
specific language governing permissions and limitations
17+
under the License.
18+
-->
19+
20+
# DataFusion Comet 0.9.0 Changelog
21+
22+
This release consists of 139 commits from 24 contributors. See credits at the end of this changelog for more information.
23+
24+
**Fixed bugs:**
25+
26+
- fix: typo for `instr` in fuzz testing [#1686](https://github.com/apache/datafusion-comet/pull/1686) (mbutrovich)
27+
- fix: Bucketed scan fallback for native_datafusion Parquet scan [#1720](https://github.com/apache/datafusion-comet/pull/1720) (mbutrovich)
28+
- fix: Skip row index Spark SQL tests for native_datafusion Parquet scan [#1724](https://github.com/apache/datafusion-comet/pull/1724) (mbutrovich)
29+
- fix: Check acquired memory when CometMemoryPool grows [#1732](https://github.com/apache/datafusion-comet/pull/1732) (wForget)
30+
- fix: Fix data race in memory profiling [#1727](https://github.com/apache/datafusion-comet/pull/1727) (andygrove)
31+
- fix: Enable some DPP Spark SQL tests [#1734](https://github.com/apache/datafusion-comet/pull/1734) (andygrove)
32+
- fix: support literal null list and map [#1742](https://github.com/apache/datafusion-comet/pull/1742) (kazuyukitanimura)
33+
- fix: get_struct field is incorrect when struct in array [#1687](https://github.com/apache/datafusion-comet/pull/1687) (comphead)
34+
- fix: cast map types correctly in schema adapter [#1771](https://github.com/apache/datafusion-comet/pull/1771) (parthchandra)
35+
- fix: correct schema type checking in native_iceberg_compat [#1755](https://github.com/apache/datafusion-comet/pull/1755) (parthchandra)
36+
- fix: default values for native_datafusion scan [#1756](https://github.com/apache/datafusion-comet/pull/1756) (mbutrovich)
37+
- fix: [native_scans] Support `CASE_SENSITIVE` when reading Parquet [#1782](https://github.com/apache/datafusion-comet/pull/1782) (andygrove)
38+
- fix: cargo install tpchgen-cli in benchmark doc [#1797](https://github.com/apache/datafusion-comet/pull/1797) (zhuqi-lucas)
39+
- fix: support `map_keys` [#1788](https://github.com/apache/datafusion-comet/pull/1788) (comphead)
40+
- fix: fall back on nested types for default values [#1799](https://github.com/apache/datafusion-comet/pull/1799) (mbutrovich)
41+
- fix: Re-enable Spark 4 tests on Linux [#1806](https://github.com/apache/datafusion-comet/pull/1806) (andygrove)
42+
- fix: fallback to Spark scan if encryption is enabled (native_datafusion/native_iceberg_compat) [#1785](https://github.com/apache/datafusion-comet/pull/1785) (parthchandra)
43+
- fix: native_iceberg_compat: move checking parquet types above fetching batch [#1809](https://github.com/apache/datafusion-comet/pull/1809) (mbutrovich)
44+
- fix: translate missing or corrupt file exceptions, fall back if asked to ignore [#1765](https://github.com/apache/datafusion-comet/pull/1765) (mbutrovich)
45+
- fix: Fix Spark SQL AQE exchange reuse test failures [#1811](https://github.com/apache/datafusion-comet/pull/1811) (coderfender)
46+
- fix: Enable more Spark SQL tests [#1834](https://github.com/apache/datafusion-comet/pull/1834) (andygrove)
47+
- fix: support `map_values` [#1835](https://github.com/apache/datafusion-comet/pull/1835) (comphead)
48+
- fix: Handle case where num_cols == 0 in native execution [#1840](https://github.com/apache/datafusion-comet/pull/1840) (andygrove)
49+
- fix: Fix shuffle writing rows containing null struct fields [#1845](https://github.com/apache/datafusion-comet/pull/1845) (Kontinuation)
50+
- fix: Fall back to Spark for `RANGE BETWEEN` window expressions [#1848](https://github.com/apache/datafusion-comet/pull/1848) (andygrove)
51+
- fix: Remove COMET_SHUFFLE_FALLBACK_TO_COLUMNAR hack [#1865](https://github.com/apache/datafusion-comet/pull/1865) (andygrove)
52+
- fix: support read Struct by user schema [#1860](https://github.com/apache/datafusion-comet/pull/1860) (comphead)
53+
- fix: map parquet field_id correctly (native_iceberg_compat) [#1815](https://github.com/apache/datafusion-comet/pull/1815) (parthchandra)
54+
- fix: cast_struct_to_struct aligns to Spark behavior [#1879](https://github.com/apache/datafusion-comet/pull/1879) (mbutrovich)
55+
- fix: correctly handle schemas with nested array of struct (native_iceberg_compat) [#1883](https://github.com/apache/datafusion-comet/pull/1883) (parthchandra)
56+
- fix: set RangePartitioning for native shuffle default to false [#1907](https://github.com/apache/datafusion-comet/pull/1907) (mbutrovich)
57+
- fix: conflict between #1905 and #1892. [#1919](https://github.com/apache/datafusion-comet/pull/1919) (mbutrovich)
58+
- fix: Add overflow check to evaluate of sum decimal accumulator [#1922](https://github.com/apache/datafusion-comet/pull/1922) (leung-ming)
59+
- fix: Fix overflow handling when casting float to decimal [#1914](https://github.com/apache/datafusion-comet/pull/1914) (leung-ming)
60+
- fix: Ignore a test case fails on Miri [#1951](https://github.com/apache/datafusion-comet/pull/1951) (leung-ming)
61+
62+
**Performance related:**
63+
64+
- perf: Add memory profiling [#1702](https://github.com/apache/datafusion-comet/pull/1702) (andygrove)
65+
- perf: Add performance tracing capability [#1706](https://github.com/apache/datafusion-comet/pull/1706) (andygrove)
66+
- perf: Add `COMET_RESPECT_PARQUET_FILTER_PUSHDOWN` config [#1936](https://github.com/apache/datafusion-comet/pull/1936) (andygrove)
67+
68+
**Implemented enhancements:**
69+
70+
- feat: add jemalloc as optional custom allocator [#1679](https://github.com/apache/datafusion-comet/pull/1679) (mbutrovich)
71+
- feat: support `array_repeat` [#1680](https://github.com/apache/datafusion-comet/pull/1680) (comphead)
72+
- feat: More warning info for users [#1667](https://github.com/apache/datafusion-comet/pull/1667) (hsiang-c)
73+
- feat: decode() expression when using 'utf-8' encoding [#1697](https://github.com/apache/datafusion-comet/pull/1697) (mbutrovich)
74+
- feat: regexp_replace() expression with no starting offset [#1700](https://github.com/apache/datafusion-comet/pull/1700) (mbutrovich)
75+
- feat: Improve performance tracing feature [#1730](https://github.com/apache/datafusion-comet/pull/1730) (andygrove)
76+
- feat: Set/cancel with job tag and make max broadcast table size configurable [#1693](https://github.com/apache/datafusion-comet/pull/1693) (wForget)
77+
- feat: Add support for `expm1` expression from `datafusion-spark` crate [#1711](https://github.com/apache/datafusion-comet/pull/1711) (andygrove)
78+
- feat: Add config option for showing all Comet plan transformations [#1780](https://github.com/apache/datafusion-comet/pull/1780) (andygrove)
79+
- feat: Support Type widening: byte → short/int/long, short → int/long [#1770](https://github.com/apache/datafusion-comet/pull/1770) (huaxingao)
80+
- feat: Translate Hadoop S3A configurations to object_store configurations [#1817](https://github.com/apache/datafusion-comet/pull/1817) (Kontinuation)
81+
- feat: Upgrade to official DataFusion 48.0.0 release [#1877](https://github.com/apache/datafusion-comet/pull/1877) (andygrove)
82+
- feat: Add experimental auto mode for `COMET_PARQUET_SCAN_IMPL` [#1747](https://github.com/apache/datafusion-comet/pull/1747) (andygrove)
83+
- feat: support RangePartitioning with native shuffle [#1862](https://github.com/apache/datafusion-comet/pull/1862) (mbutrovich)
84+
- feat: Add support for signum expression [#1889](https://github.com/apache/datafusion-comet/pull/1889) (andygrove)
85+
- feat: Add support to lookup map by key [#1898](https://github.com/apache/datafusion-comet/pull/1898) (comphead)
86+
- feat: support array_max [#1892](https://github.com/apache/datafusion-comet/pull/1892) (drexler-sky)
87+
- feat: pass ignore_nulls flag to first and last [#1866](https://github.com/apache/datafusion-comet/pull/1866) (rluvaton)
88+
- feat: Implement ToPrettyString [#1921](https://github.com/apache/datafusion-comet/pull/1921) (andygrove)
89+
- feat: Support hadoop s3a config in native_iceberg_compat [#1925](https://github.com/apache/datafusion-comet/pull/1925) (parthchandra)
90+
- feat: rand expression support [#1199](https://github.com/apache/datafusion-comet/pull/1199) (akupchinskiy)
91+
- feat: supports array_distinct [#1923](https://github.com/apache/datafusion-comet/pull/1923) (drexler-sky)
92+
- feat: `auto` scan mode should check for supported file location [#1930](https://github.com/apache/datafusion-comet/pull/1930) (andygrove)
93+
- feat: Encapsulate Parquet objects [#1920](https://github.com/apache/datafusion-comet/pull/1920) (huaxingao)
94+
- feat: Change default value of `COMET_NATIVE_SCAN_IMPL` to `auto` [#1933](https://github.com/apache/datafusion-comet/pull/1933) (andygrove)
95+
- feat: Supports array_union [#1945](https://github.com/apache/datafusion-comet/pull/1945) (drexler-sky)
96+
97+
**Documentation updates:**
98+
99+
- docs: Add changelog for 0.8.0 [#1675](https://github.com/apache/datafusion-comet/pull/1675) (andygrove)
100+
- docs: Add instructions on running TPC-H on macOS [#1647](https://github.com/apache/datafusion-comet/pull/1647) (andygrove)
101+
- docs: Add documentation for accelerating Iceberg Parquet scans with Comet [#1683](https://github.com/apache/datafusion-comet/pull/1683) (andygrove)
102+
- docs: Add note on setting `core.abbrev` when generating diffs [#1735](https://github.com/apache/datafusion-comet/pull/1735) (andygrove)
103+
- docs: Remove outdated param in macos bench guide [#1748](https://github.com/apache/datafusion-comet/pull/1748) (ding-young)
104+
- docs: Add instructions for running individual Spark SQL tests from sbt [#1752](https://github.com/apache/datafusion-comet/pull/1752) (coderfender)
105+
- docs: Add documentation for native_datafusion Parquet scanner's S3 support [#1832](https://github.com/apache/datafusion-comet/pull/1832) (Kontinuation)
106+
- docs: Add docs stating that Comet does not support reading decimals encoded in Parquet BINARY format [#1895](https://github.com/apache/datafusion-comet/pull/1895) (andygrove)
107+
108+
**Other:**
109+
110+
- chore: Start 0.9.0 development [#1676](https://github.com/apache/datafusion-comet/pull/1676) (andygrove)
111+
- chore: Update viable crates [#1677](https://github.com/apache/datafusion-comet/pull/1677) (EmilyMatt)
112+
- chore: match Maven plugin versions with Spark 3.5 [#1668](https://github.com/apache/datafusion-comet/pull/1668) (hsiang-c)
113+
- chore: Remove fallback reason "because the children were not native" [#1672](https://github.com/apache/datafusion-comet/pull/1672) (andygrove)
114+
- chore: Rename `scalarExprToProto` to `scalarFunctionExprToProto` [#1688](https://github.com/apache/datafusion-comet/pull/1688) (comphead)
115+
- chore: fix build errors [#1690](https://github.com/apache/datafusion-comet/pull/1690) (comphead)
116+
- chore: Make Aggregate transformation more compact [#1670](https://github.com/apache/datafusion-comet/pull/1670) (EmilyMatt)
117+
- chore: update dev/release/rat_exclude_files.txt [#1689](https://github.com/apache/datafusion-comet/pull/1689) (hsiang-c)
118+
- chore: Move Comet rules into their own files [#1695](https://github.com/apache/datafusion-comet/pull/1695) (andygrove)
119+
- chore: Remove fast encoding option [#1703](https://github.com/apache/datafusion-comet/pull/1703) (andygrove)
120+
- chore: fix CI job name [#1712](https://github.com/apache/datafusion-comet/pull/1712) (hsiang-c)
121+
- minor: Warn if memory pool is dropped with bytes still reserved [#1721](https://github.com/apache/datafusion-comet/pull/1721) (andygrove)
122+
- chore: Correct memory acquired size in unified memory pool [#1738](https://github.com/apache/datafusion-comet/pull/1738) (zuston)
123+
- chore: allow large errors for Clippy [#1743](https://github.com/apache/datafusion-comet/pull/1743) (comphead)
124+
- chore: Refactor DataTypeSupport [#1741](https://github.com/apache/datafusion-comet/pull/1741) (andygrove)
125+
- chore: More refactoring of type checking logic [#1744](https://github.com/apache/datafusion-comet/pull/1744) (andygrove)
126+
- chore: Enable more complex type tests [#1753](https://github.com/apache/datafusion-comet/pull/1753) (andygrove)
127+
- chore: Add `scanImpl` attribute to `CometScanExec` [#1746](https://github.com/apache/datafusion-comet/pull/1746) (andygrove)
128+
- chore: Prepare for DataFusion 48.0.0 [#1710](https://github.com/apache/datafusion-comet/pull/1710) (andygrove)
129+
- Docs: Setup Comet on IntelliJ [#1760](https://github.com/apache/datafusion-comet/pull/1760) (coderfender)
130+
- chore: Reenable nested types for CometFuzzTestSuite with int96 [#1761](https://github.com/apache/datafusion-comet/pull/1761) (mbutrovich)
131+
- chore: Enable partial Spark SQL tests for `native_iceberg_compat` scan [#1762](https://github.com/apache/datafusion-comet/pull/1762) (andygrove)
132+
- chore: [native_iceberg_compat / native_datafusion] Ignore Spark SQL Parquet encryption tests [#1763](https://github.com/apache/datafusion-comet/pull/1763) (andygrove)
133+
- build: Ignore array_repeat test to fix CI issues [#1774](https://github.com/apache/datafusion-comet/pull/1774) (andygrove)
134+
- chore: Upload crash logs if Java tests fail [#1779](https://github.com/apache/datafusion-comet/pull/1779) (andygrove)
135+
- chore: Drop support for Java 8 [#1777](https://github.com/apache/datafusion-comet/pull/1777) (andygrove)
136+
- chore: Bump arrow to 18.3.0 [#1773](https://github.com/apache/datafusion-comet/pull/1773) (Kontinuation)
137+
- build: Stop running Comet's Spark 4 tests on Linux for PR builds [#1802](https://github.com/apache/datafusion-comet/pull/1802) (andygrove)
138+
- Chore: Moved strings expressions to separate file [#1792](https://github.com/apache/datafusion-comet/pull/1792) (kazantsev-maksim)
139+
- chore: Speed up "PR Builds" CI workflows [#1807](https://github.com/apache/datafusion-comet/pull/1807) (andygrove)
140+
- chore: [native scans] Ignore Spark SQL test for string predicate pushdown [#1768](https://github.com/apache/datafusion-comet/pull/1768) (andygrove)
141+
- chore: Bump DataFusion to git rev 2c2f225 [#1814](https://github.com/apache/datafusion-comet/pull/1814) (andygrove)
142+
- Feat: support bit_count function [#1602](https://github.com/apache/datafusion-comet/pull/1602) (kazantsev-maksim)
143+
- Chore: implement bit_not as ScalarUDFImpl [#1825](https://github.com/apache/datafusion-comet/pull/1825) (kazantsev-maksim)
144+
- build: Specify -Dsbt.log.noformat=true in sbt CI runs [#1822](https://github.com/apache/datafusion-comet/pull/1822) (andygrove)
145+
- chore: Use unique artifact names in Java test run [#1818](https://github.com/apache/datafusion-comet/pull/1818) (andygrove)
146+
- minor: Refactor PhysicalPlanner::default() to avoid duplicate code [#1821](https://github.com/apache/datafusion-comet/pull/1821) (andygrove)
147+
- Chore: implement bit_count as ScalarUDFImpl [#1826](https://github.com/apache/datafusion-comet/pull/1826) (kazantsev-maksim)
148+
- chore: IgnoreCometNativeScan on a few more Spark SQL tests [#1837](https://github.com/apache/datafusion-comet/pull/1837) (mbutrovich)
149+
- chore: Enable tests in RemoveRedundantProjectsSuite.scala related to issue #242 [#1838](https://github.com/apache/datafusion-comet/pull/1838) (rishvin)
150+
- minor: Replace many instances of `checkSparkAnswer` with `checkSparkAnswerAndOperator` [#1851](https://github.com/apache/datafusion-comet/pull/1851) (andygrove)
151+
- chore: Update documentation and ignore Spark SQL tests for known issue with count distinct on NaN in aggregate [#1847](https://github.com/apache/datafusion-comet/pull/1847) (andygrove)
152+
- chore: Ignore Spark SQL WholeStageCodegenSuite tests [#1859](https://github.com/apache/datafusion-comet/pull/1859) (andygrove)
153+
- chore: Upgrade to DataFusion 48.0.0-rc3 [#1863](https://github.com/apache/datafusion-comet/pull/1863) (andygrove)
154+
- upgraded spark 3.5.5 to 3.5.6 [#1861](https://github.com/apache/datafusion-comet/pull/1861) (YanivKunda)
155+
- build: Disable some rounding tests when miri is enabled [#1873](https://github.com/apache/datafusion-comet/pull/1873) (andygrove)
156+
- chore: Enable Spark SQL tests for `native_iceberg_compat` [#1876](https://github.com/apache/datafusion-comet/pull/1876) (andygrove)
157+
- chore: Enable more Spark SQL tests [#1869](https://github.com/apache/datafusion-comet/pull/1869) (andygrove)
158+
- chore: refactor planner read schema tests [#1886](https://github.com/apache/datafusion-comet/pull/1886) (comphead)
159+
- chore: Implement date_trunc as ScalarUDFImpl [#1880](https://github.com/apache/datafusion-comet/pull/1880) (leung-ming)
160+
- Chore: implement datetime funcs as ScalarUDFImpl [#1874](https://github.com/apache/datafusion-comet/pull/1874) (trompa)
161+
- minor: Improve testing of math scalar functions [#1896](https://github.com/apache/datafusion-comet/pull/1896) (andygrove)
162+
- minor: Avoid rewriting join to unsupported join [#1888](https://github.com/apache/datafusion-comet/pull/1888) (andygrove)
163+
- chore: Enable `native_iceberg_compat` Spark SQL tests (for real, this time) [#1910](https://github.com/apache/datafusion-comet/pull/1910) (andygrove)
164+
- chore: rename makeParquetFileAllTypes to makeParquetFileAllPrimitiveTypes [#1905](https://github.com/apache/datafusion-comet/pull/1905) (parthchandra)
165+
- chore: add a test case to read from an arbitrarily complex type schema [#1911](https://github.com/apache/datafusion-comet/pull/1911) (parthchandra)
166+
- test: Trigger Spark 3.4.3 SQL tests for iceberg-compat [#1912](https://github.com/apache/datafusion-comet/pull/1912) (kazuyukitanimura)
167+
- build: Fix conflict between #1910 and #1912 [#1924](https://github.com/apache/datafusion-comet/pull/1924) (andygrove)
168+
- minor: fix kube/Dockerfile build failed [#1918](https://github.com/apache/datafusion-comet/pull/1918) (zhangxffff)
169+
- chore: Improve reporting of fallback reasons for CollectLimit [#1694](https://github.com/apache/datafusion-comet/pull/1694) (andygrove)
170+
- chore: move udf registration to better place [#1899](https://github.com/apache/datafusion-comet/pull/1899) (rluvaton)
171+
- chore: Comet + Iceberg (1.8.1) CI [#1715](https://github.com/apache/datafusion-comet/pull/1715) (hsiang-c)
172+
- chore: Introduce `exprHandlers` map in QueryPlanSerde [#1903](https://github.com/apache/datafusion-comet/pull/1903) (andygrove)
173+
- chore: Enable Spark SQL tests for auto scan mode [#1885](https://github.com/apache/datafusion-comet/pull/1885) (andygrove)
174+
- Feat: support bit_get function [#1713](https://github.com/apache/datafusion-comet/pull/1713) (kazantsev-maksim)
175+
- chore: Clippy fixes for Rust 1.88 [#1939](https://github.com/apache/datafusion-comet/pull/1939) (andygrove)
176+
- Minor: Add unit tests for `ceil`/`floor` functions [#1728](https://github.com/apache/datafusion-comet/pull/1728) (tlm365)
177+
178+
## Credits
179+
180+
Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
181+
182+
```
183+
62 Andy Grove
184+
16 Matt Butrovich
185+
10 Oleks V
186+
8 Parth Chandra
187+
5 Kazantsev Maksim
188+
5 hsiang-c
189+
4 Kristin Cowalcijk
190+
4 Leung Ming
191+
3 B Vadlamani
192+
3 drexler-sky
193+
2 Emily Matheys
194+
2 Huaxin Gao
195+
2 KAZUYUKI TANIMURA
196+
2 Raz Luvaton
197+
2 Zhen Wang
198+
1 Artem Kupchinskiy
199+
1 Junfan Zhang
200+
1 Qi Zhu
201+
1 Rishab Joshi
202+
1 Tai Le Manh
203+
1 Yaniv Kunda
204+
1 Zhang Xiaofeng
205+
1 ding-young
206+
1 trompa
207+
```
208+
209+
Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.
210+

0 commit comments

Comments
 (0)