|
| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: Apache DataFusion Comet 0.12.0 Release |
| 4 | +date: 2025-12-04 |
| 5 | +author: pmc |
| 6 | +categories: [subprojects] |
| 7 | +--- |
| 8 | + |
| 9 | +<!-- |
| 10 | +{% comment %} |
| 11 | +Licensed to the Apache Software Foundation (ASF) under one or more |
| 12 | +contributor license agreements. See the NOTICE file distributed with |
| 13 | +this work for additional information regarding copyright ownership. |
| 14 | +The ASF licenses this file to you under the Apache License, Version 2.0 |
| 15 | +(the "License"); you may not use this file except in compliance with |
| 16 | +the License. You may obtain a copy of the License at |
| 17 | +
|
| 18 | +http://www.apache.org/licenses/LICENSE-2.0 |
| 19 | +
|
| 20 | +Unless required by applicable law or agreed to in writing, software |
| 21 | +distributed under the License is distributed on an "AS IS" BASIS, |
| 22 | +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| 23 | +See the License for the specific language governing permissions and |
| 24 | +limitations under the License. |
| 25 | +{% endcomment %} |
| 26 | +--> |
| 27 | + |
| 28 | +[TOC] |
| 29 | + |
| 30 | +The Apache DataFusion PMC is pleased to announce version 0.12.0 of the [Comet](https://datafusion.apache.org/comet/) subproject. |
| 31 | + |
| 32 | +Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for |
| 33 | +improved performance and efficiency without requiring any code changes. |
| 34 | + |
| 35 | +This release covers approximately four weeks of development work and is the result of merging 105 PRs from 13 |
| 36 | +contributors. See the [change log] for more information. |
| 37 | + |
| 38 | +[change log]: https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.12.0.md |
| 39 | + |
| 40 | +## Release Highlights |
| 41 | + |
| 42 | +### Experimental Native Apache Iceberg Scan Support |
| 43 | + |
| 44 | +Comet has a new, experimental, [native Iceberg scan](https://github.com/apache/datafusion-comet/pull/2528). This work relies on [iceberg-rust](https://github.com/apache/iceberg-rust) and the Parquet reader from [arrow-rs](https://github.com/apache/arrow-rs) that Comet already uses to great effect. Comet’s [existing Iceberg integration](https://datafusion.apache.org/comet/user-guide/0.12/iceberg.html) relies on a modified Iceberg Java build to accelerate Parquet decoding. This new approach allows unmodified Iceberg Java to handle query planning (*i.e.*, catalog access, partition pruning, etc.), then Comet serializes Iceberg `FileScanTask` objects directly to iceberg-rust, enabling native execution of Iceberg table scans through DataFusion. |
| 45 | + |
| 46 | +This represents a significant step forward in Comet's support for data lakehouse architectures and expands the range of workloads that can benefit from native acceleration. Please take a look at the PR and Comet’s documentation to understand the current limitations and try it on your workloads! We are eager for feedback on this approach. |
| 47 | + |
| 48 | +### Code Architecture Improvements |
| 49 | + |
| 50 | +This release includes significant refactoring to improve code maintainability and extensibility, and we will continue those efforts into 0.13.0 development: |
| 51 | + |
| 52 | +- **Unified operator serialization**: The [CometExecRule refactor](https://github.com/apache/datafusion-comet/pull/2768) unifies CometNativeExec creation with serialization through the new `CometOperatorSerde` trait |
| 53 | +- **Expression serde refactoring**: Multiple PRs ([#2738](https://github.com/apache/datafusion-comet/pull/2738), [#2741](https://github.com/apache/datafusion-comet/pull/2741), [#2791](https://github.com/apache/datafusion-comet/pull/2791)) moved expression serialization logic out of `QueryPlanSerde` into specialized traits |
| 54 | +- **Aggregate expression improvements**: [Added getSupportLevel to CometAggregateExpressionSerde trait](https://github.com/apache/datafusion-comet/pull/2777) for better aggregate function handling |
| 55 | + |
| 56 | +These architectural improvements make it easier for contributors to add new operators and expressions while reducing code complexity. |
| 57 | + |
| 58 | +### New SQL Functions |
| 59 | + |
| 60 | +The following SQL functions are now supported: |
| 61 | + |
| 62 | +- [`concat`](https://github.com/apache/datafusion-comet/pull/2604) - String concatenation |
| 63 | +- [`abs`](https://github.com/apache/datafusion-comet/pull/2689) - Absolute value |
| 64 | +- [`sha1`](https://github.com/apache/datafusion-comet/pull/2471) - SHA-1 hash function |
| 65 | +- [`cot`](https://github.com/apache/datafusion-comet/pull/2755) - Cotangent function |
| 66 | +- [Hyperbolic trigonometric functions](https://github.com/apache/datafusion-comet/pull/2784) - sinh, cosh, tanh, and their inverse functions |
| 67 | + |
| 68 | +### New Operators |
| 69 | + |
| 70 | +- [`CometLocalTableScanExec`](https://github.com/apache/datafusion-comet/pull/2735) - Native support for local table scans, eliminating fallback to Spark for small, in-memory datasets |
| 71 | + |
| 72 | +### Configuration and Usability Improvements |
| 73 | + |
| 74 | +- **Simplified on-heap configuration**: [Simplified on-heap memory configuration](https://github.com/apache/datafusion-comet/pull/2599) for easier setup |
| 75 | +- **Extended explain format**: [Renamed and improved COMET_EXTENDED_EXPLAIN_FORMAT](https://github.com/apache/datafusion-comet/pull/2644) with better defaults |
| 76 | +- **Environment variable support**: [Improved framework for setting configs with environment variables](https://github.com/apache/datafusion-comet/pull/2722) |
| 77 | +- **Native config passing**: [All Comet configs now passed to native plan](https://github.com/apache/datafusion-comet/pull/2801) |
| 78 | +- **Config categorization**: [Categorized testing configs](https://github.com/apache/datafusion-comet/pull/2740) and added notes about known timezone issues |
| 79 | +- **Removed legacy configs**: [Removed COMET_EXPR_ALLOW_INCOMPATIBLE config](https://github.com/apache/datafusion-comet/pull/2786) to simplify configuration |
| 80 | + |
| 81 | +### Bug Fixes |
| 82 | + |
| 83 | +This release includes numerous bug fixes: |
| 84 | + |
| 85 | +- [Fixed None.get in stringDecode](https://github.com/apache/datafusion-comet/pull/2606) when binary child cannot be converted |
| 86 | +- [Proper fallback for lpad/rpad with unsupported arguments](https://github.com/apache/datafusion-comet/pull/2630) |
| 87 | +- [Fixed trunc/date_trunc with unsupported format strings](https://github.com/apache/datafusion-comet/pull/2634) |
| 88 | +- [Corrected single partition handling in native_datafusion](https://github.com/apache/datafusion-comet/pull/2675) |
| 89 | +- [Fixed LeftSemi join handling](https://github.com/apache/datafusion-comet/pull/2687) - do not replace SMJ with HJ |
| 90 | +- [Fixed CometLiteral class cast exception with arrays](https://github.com/apache/datafusion-comet/pull/2718) |
| 91 | +- [Fixed missing SortOrder fallback reason in range partitioning](https://github.com/apache/datafusion-comet/pull/2716) |
| 92 | +- [Improved checkSparkMaybeThrows to compare results in success case](https://github.com/apache/datafusion-comet/pull/2728) |
| 93 | +- [Fixed null handling in CometVector implementations](https://github.com/apache/datafusion-comet/pull/2643) |
| 94 | + |
| 95 | +### Documentation Improvements |
| 96 | + |
| 97 | +- [Added FFI documentation to contributor guide](https://github.com/apache/datafusion-comet/pull/2668) |
| 98 | +- [Updated contributor guide for adding new expressions](https://github.com/apache/datafusion-comet/pull/2704) and [operators](https://github.com/apache/datafusion-comet/pull/2758) |
| 99 | +- [Improved documentation layout](https://github.com/apache/datafusion-comet/pull/2587) and [navigation](https://github.com/apache/datafusion-comet/pull/2597) |
| 100 | +- [Added prettier enforcement](https://github.com/apache/datafusion-comet/pull/2783) for consistent markdown formatting |
| 101 | +- [CI check to ensure generated docs are in sync](https://github.com/apache/datafusion-comet/pull/2779) |
| 102 | +- Various documentation updates for [SortOrder expressions](https://github.com/apache/datafusion-comet/pull/2694), [LocalTableScan and WindowExec](https://github.com/apache/datafusion-comet/pull/2742), and [Spark SQL tests](https://github.com/apache/datafusion-comet/pull/2712) |
| 103 | + |
| 104 | +### Dependency Updates |
| 105 | + |
| 106 | +- [Upgraded to Spark 3.5.7](https://github.com/apache/datafusion-comet/pull/2574) |
| 107 | +- [Upgraded to DataFusion 50.3.0](https://github.com/apache/datafusion-comet/pull/2605) |
| 108 | +- [Upgraded Parquet from 56.0.0 to 56.2.0](https://github.com/apache/datafusion-comet/pull/2608) |
| 109 | +- Various other dependency updates via Dependabot |
| 110 | + |
| 111 | +### Spark Compatibility |
| 112 | + |
| 113 | +- Spark 3.4.3 with JDK 11 & 17, Scala 2.12 & 2.13 |
| 114 | +- Spark 3.5.4 through 3.5.7 with JDK 11 & 17, Scala 2.12 & 2.13 |
| 115 | +- Spark 4.0.1 with JDK 17, Scala 2.13 |
| 116 | + |
| 117 | +We are looking for help from the community to fully support Spark 4.0.1. See [EPIC: Support 4.0.0] for more information. |
| 118 | + |
| 119 | +[EPIC: Support 4.0.0]: https://github.com/apache/datafusion-comet/issues/1637 |
| 120 | + |
| 121 | +## Getting Involved |
| 122 | + |
| 123 | +The Comet project welcomes new contributors. We use the same [Slack and Discord] channels as the main DataFusion |
| 124 | +project and have a weekly [DataFusion video call]. |
| 125 | + |
| 126 | +[Slack and Discord]: https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord |
| 127 | +[DataFusion video call]: https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit?usp=sharing |
| 128 | + |
| 129 | +The easiest way to get involved is to test Comet with your current Spark jobs and file issues for any bugs or |
| 130 | +performance regressions that you find. See the [Getting Started] guide for instructions on downloading and installing |
| 131 | +Comet. |
| 132 | + |
| 133 | +[Getting Started]: https://datafusion.apache.org/comet/user-guide/installation.html |
| 134 | + |
| 135 | +There are also many [good first issues] waiting for contributions. |
| 136 | + |
| 137 | +[good first issues]: https://github.com/apache/datafusion-comet/contribute |
0 commit comments