<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->

# DataFusion Comet 0.8.0 Changelog

This release consists of 81 commits from 11 contributors. See credits at the end of this changelog for more information.

**Fixed bugs:**

- fix: remove code duplication in native_datafusion and native_iceberg_compat implementations [#1443](https://github.com/apache/datafusion-comet/pull/1443) (parthchandra)
- fix: Refactor CometScanRule and fix bugs [#1483](https://github.com/apache/datafusion-comet/pull/1483) (andygrove)
- fix: check if handle has been initialized before closing [#1554](https://github.com/apache/datafusion-comet/pull/1554) (wForget)
- fix: Taking slicing into account when writing BooleanBuffers as fast-encoding format [#1522](https://github.com/apache/datafusion-comet/pull/1522) (Kontinuation)
- fix: isCometEnabled name conflict [#1569](https://github.com/apache/datafusion-comet/pull/1569) (kazuyukitanimura)
- fix: make register_object_store use same session_env as file scan [#1555](https://github.com/apache/datafusion-comet/pull/1555) (wForget)
- fix: adjust CometNativeScan's doCanonicalize and hashCode for AQE, use DataSourceScanExec trait [#1578](https://github.com/apache/datafusion-comet/pull/1578) (mbutrovich)
- fix: corrected the logic of eliminating CometSparkToColumnarExec [#1597](https://github.com/apache/datafusion-comet/pull/1597) (wForget)
- fix: avoid panic caused by close null handle of parquet reader [#1604](https://github.com/apache/datafusion-comet/pull/1604) (wForget)
- fix: Make AQE capable of converting Comet shuffled joins to Comet broadcast hash joins [#1605](https://github.com/apache/datafusion-comet/pull/1605) (Kontinuation)
- fix: Making shuffle files generated in native shuffle mode reclaimable [#1568](https://github.com/apache/datafusion-comet/pull/1568) (Kontinuation)
- fix: Support per-task shuffle write rows and shuffle write time metrics [#1617](https://github.com/apache/datafusion-comet/pull/1617) (Kontinuation)
- fix: Modify Spark SQL core 2 tests for `native_datafusion` reader, change 3.5.5 diff hash length to 11 [#1641](https://github.com/apache/datafusion-comet/pull/1641) (mbutrovich)
- fix: fix spark/sql test failures in native_iceberg_compat [#1593](https://github.com/apache/datafusion-comet/pull/1593) (parthchandra)
- fix: handle missing field correctly in native_iceberg_compat [#1656](https://github.com/apache/datafusion-comet/pull/1656) (parthchandra)
- fix: better int96 support for experimental native scans [#1652](https://github.com/apache/datafusion-comet/pull/1652) (mbutrovich)
- fix: respect `ignoreNulls` flag in `first_value` and `last_value` [#1626](https://github.com/apache/datafusion-comet/pull/1626) (andygrove)
- fix: update row groups count in internal metrics accumulator [#1658](https://github.com/apache/datafusion-comet/pull/1658) (parthchandra)
- fix: Shuffle should maintain insertion order [#1660](https://github.com/apache/datafusion-comet/pull/1660) (EmilyMatt)

**Performance related:**

- perf: Use a global tokio runtime [#1614](https://github.com/apache/datafusion-comet/pull/1614) (andygrove)
- perf: Respect Spark's PARQUET_FILTER_PUSHDOWN_ENABLED config [#1619](https://github.com/apache/datafusion-comet/pull/1619) (andygrove)
- perf: Experimental fix to avoid join strategy regression [#1674](https://github.com/apache/datafusion-comet/pull/1674) (andygrove)

**Implemented enhancements:**

- feat: add read array support [#1456](https://github.com/apache/datafusion-comet/pull/1456) (comphead)
- feat: introduce hadoop mini cluster to test native scan on hdfs [#1556](https://github.com/apache/datafusion-comet/pull/1556) (wForget)
- feat: make parquet native scan schema case insensitive [#1575](https://github.com/apache/datafusion-comet/pull/1575) (wForget)
- feat: enable iceberg compat tests, more tests for complex types [#1550](https://github.com/apache/datafusion-comet/pull/1550) (comphead)
- feat: pushdown filter for native_iceberg_compat [#1566](https://github.com/apache/datafusion-comet/pull/1566) (wForget)
- feat: Fix struct of arrays schema issue [#1592](https://github.com/apache/datafusion-comet/pull/1592) (comphead)
- feat: adding more struct/arrays tests [#1594](https://github.com/apache/datafusion-comet/pull/1594) (comphead)
- feat: respect `batchSize/workerThreads/blockingThreads` configurations for native_iceberg_compat scan [#1587](https://github.com/apache/datafusion-comet/pull/1587) (wForget)
- feat: add MAP type support for first level [#1603](https://github.com/apache/datafusion-comet/pull/1603) (comphead)
- feat: Add more tests for nested types combinations for `native_datafusion` [#1632](https://github.com/apache/datafusion-comet/pull/1632) (comphead)
- feat: Override MapBuilder values field with expected schema [#1643](https://github.com/apache/datafusion-comet/pull/1643) (comphead)
- feat: track unified memory pool [#1651](https://github.com/apache/datafusion-comet/pull/1651) (wForget)
- feat: Add support for complex types in native shuffle [#1655](https://github.com/apache/datafusion-comet/pull/1655) (andygrove)

**Documentation updates:**

- docs: Update configuration guide to show optional configs [#1524](https://github.com/apache/datafusion-comet/pull/1524) (andygrove)
- docs: Add changelog for 0.7.0 release [#1527](https://github.com/apache/datafusion-comet/pull/1527) (andygrove)
- docs: Use a shallow clone for Spark SQL test instructions [#1547](https://github.com/apache/datafusion-comet/pull/1547) (mbutrovich)
- docs: Update benchmark results for 0.7.0 release [#1548](https://github.com/apache/datafusion-comet/pull/1548) (andygrove)
- doc: Renew `kubernetes.md` [#1549](https://github.com/apache/datafusion-comet/pull/1549) (comphead)
- docs: various improvements to tuning guide [#1525](https://github.com/apache/datafusion-comet/pull/1525) (andygrove)
- docs: Update supported Spark versions [#1580](https://github.com/apache/datafusion-comet/pull/1580) (andygrove)
- docs: change OSX/OS X to macOS [#1584](https://github.com/apache/datafusion-comet/pull/1584) (mbutrovich)
- docs: docs for benchmarking in aws ec2 [#1601](https://github.com/apache/datafusion-comet/pull/1601) (andygrove)
- docs: Update compatibility docs for new native scans [#1657](https://github.com/apache/datafusion-comet/pull/1657) (andygrove)
- doc: Document local HDFS setup [#1673](https://github.com/apache/datafusion-comet/pull/1673) (comphead)

**Other:**

- chore: fix issue in release process [#1528](https://github.com/apache/datafusion-comet/pull/1528) (andygrove)
- chore: Remove all subdependencies [#1514](https://github.com/apache/datafusion-comet/pull/1514) (EmilyMatt)
- chore: Drop support for Spark 3.3 (EOL) [#1529](https://github.com/apache/datafusion-comet/pull/1529) (andygrove)
- chore: Prepare for 0.8.0 development [#1530](https://github.com/apache/datafusion-comet/pull/1530) (andygrove)
- chore: Re-enable GitHub discussions [#1535](https://github.com/apache/datafusion-comet/pull/1535) (andygrove)
- chore: [FOLLOWUP] Drop support for Spark 3.3 (EOL) [#1534](https://github.com/apache/datafusion-comet/pull/1534) (kazuyukitanimura)
- build: Use unique name for surefire artifacts [#1544](https://github.com/apache/datafusion-comet/pull/1544) (andygrove)
- chore: Update links for released version [#1540](https://github.com/apache/datafusion-comet/pull/1540) (andygrove)
- chore: Enable Comet explicitly in `CometTPCDSQueryTestSuite` [#1559](https://github.com/apache/datafusion-comet/pull/1559) (andygrove)
- chore: Fix some inconsistencies in memory pool configuration [#1561](https://github.com/apache/datafusion-comet/pull/1561) (andygrove)
- upgraded spark 3.5.4 to 3.5.5 [#1565](https://github.com/apache/datafusion-comet/pull/1565) (YanivKunda)
- minor: fix typo [#1570](https://github.com/apache/datafusion-comet/pull/1570) (wForget)
- Chore: simplify array related functions impl [#1490](https://github.com/apache/datafusion-comet/pull/1490) (kazantsev-maksim)
- added fallback using reflection for backward-compatibility [#1573](https://github.com/apache/datafusion-comet/pull/1573) (YanivKunda)
- chore: Override node name for CometSparkToColumnar [#1577](https://github.com/apache/datafusion-comet/pull/1577) (l0kr)
- chore: Reimplement ShuffleWriterExec using interleave_record_batch [#1511](https://github.com/apache/datafusion-comet/pull/1511) (Kontinuation)
- chore: Run Comet tests for more Spark versions [#1582](https://github.com/apache/datafusion-comet/pull/1582) (andygrove)
- Feat: support array_except function [#1343](https://github.com/apache/datafusion-comet/pull/1343) (kazantsev-maksim)
- minor: Fix clippy warnings [#1606](https://github.com/apache/datafusion-comet/pull/1606) (Kontinuation)
- chore: Remove some unwraps in hashing code [#1600](https://github.com/apache/datafusion-comet/pull/1600) (andygrove)
- chore: Remove redundant shims for getFailOnError [#1608](https://github.com/apache/datafusion-comet/pull/1608) (andygrove)
- chore: Making comet native operators write spill files to spark local dir [#1581](https://github.com/apache/datafusion-comet/pull/1581) (Kontinuation)
- chore: Refactor QueryPlanSerde to use idiomatic Scala and reduce verbosity [#1609](https://github.com/apache/datafusion-comet/pull/1609) (andygrove)
- chore: Create simple fuzz test as part of test suite [#1610](https://github.com/apache/datafusion-comet/pull/1610) (andygrove)
- chore: Document `testSingleLineQuery` test method [#1628](https://github.com/apache/datafusion-comet/pull/1628) (comphead)
- chore: Parquet fuzz testing [#1623](https://github.com/apache/datafusion-comet/pull/1623) (andygrove)
- chore: Change default Spark version to 3.5 [#1620](https://github.com/apache/datafusion-comet/pull/1620) (andygrove)
- chore: Add manually-triggered CI jobs for testing Spark SQL with native scans [#1624](https://github.com/apache/datafusion-comet/pull/1624) (andygrove)
- chore: refactor v2 scan conversion [#1621](https://github.com/apache/datafusion-comet/pull/1621) (andygrove)
- chore: clean up `planner.rs` [#1650](https://github.com/apache/datafusion-comet/pull/1650) (comphead)
- chore: correct name of pipelines for native_datafusion ci workflow [#1653](https://github.com/apache/datafusion-comet/pull/1653) (parthchandra)
- chore: Upgrade to datafusion 47.0.0-rc1 and arrow-rs 55.0.0 [#1563](https://github.com/apache/datafusion-comet/pull/1563) (andygrove)
- chore: Upgrade to datafusion 47.0.0 [#1663](https://github.com/apache/datafusion-comet/pull/1663) (YanivKunda)
- chore: Enable CometFuzzTestSuite int96 test for experimental native scans (without complex types) [#1664](https://github.com/apache/datafusion-comet/pull/1664) (mbutrovich)
- chore: Refactor Memory Pools [#1662](https://github.com/apache/datafusion-comet/pull/1662) (EmilyMatt)

## Credits

Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.

```
    31  Andy Grove
    11  Oleks V
    10  Zhen Wang
     7  Kristin Cowalcijk
     6  Matt Butrovich
     5  Parth Chandra
     3  Emily Matheys
     3  Yaniv Kunda
     2  KAZUYUKI TANIMURA
     2  Kazantsev Maksim
     1  Łukasz
```

Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.