Commit 646e014

Add Comet 0.12.0 blog post (#125)
1 parent e7a3738 commit 646e014

1 file changed: +137 -0 lines changed

@@ -0,0 +1,137 @@
---
layout: post
title: Apache DataFusion Comet 0.12.0 Release
date: 2025-12-04
author: pmc
categories: [subprojects]
---

<!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
-->

[TOC]

The Apache DataFusion PMC is pleased to announce version 0.12.0 of the [Comet](https://datafusion.apache.org/comet/) subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for
improved performance and efficiency without requiring any code changes.

This release covers approximately four weeks of development work and is the result of merging 105 PRs from 13
contributors. See the [change log] for more information.

[change log]: https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.12.0.md

## Release Highlights

### Experimental Native Apache Iceberg Scan Support

Comet has a new, experimental [native Iceberg scan](https://github.com/apache/datafusion-comet/pull/2528). This work relies on [iceberg-rust](https://github.com/apache/iceberg-rust) and the Parquet reader from [arrow-rs](https://github.com/apache/arrow-rs) that Comet already uses to great effect. Comet’s [existing Iceberg integration](https://datafusion.apache.org/comet/user-guide/0.12/iceberg.html) relies on a modified Iceberg Java build to accelerate Parquet decoding. The new approach lets unmodified Iceberg Java handle query planning (catalog access, partition pruning, and so on). Comet then serializes the Iceberg `FileScanTask` objects directly to iceberg-rust, which enables native execution of Iceberg table scans through DataFusion.

This is a significant step forward in Comet's support for data lakehouse architectures, and it expands the range of workloads that can benefit from native acceleration. Please take a look at the PR and Comet’s documentation to understand the current limitations, and try it on your workloads! We are eager for feedback on this approach.

### Code Architecture Improvements

This release includes significant refactoring to improve code maintainability and extensibility, and we will continue those efforts into 0.13.0 development:

- **Unified operator serialization**: The [CometExecRule refactor](https://github.com/apache/datafusion-comet/pull/2768) unifies CometNativeExec creation with serialization through the new `CometOperatorSerde` trait
- **Expression serde refactoring**: Multiple PRs ([#2738](https://github.com/apache/datafusion-comet/pull/2738), [#2741](https://github.com/apache/datafusion-comet/pull/2741), [#2791](https://github.com/apache/datafusion-comet/pull/2791)) moved expression serialization logic out of `QueryPlanSerde` into specialized traits
- **Aggregate expression improvements**: [Added getSupportLevel to CometAggregateExpressionSerde trait](https://github.com/apache/datafusion-comet/pull/2777) for better aggregate function handling

These architectural improvements make it easier for contributors to add new operators and expressions while reducing code complexity.

### New SQL Functions

The following SQL functions are now supported:

- [`concat`](https://github.com/apache/datafusion-comet/pull/2604) - String concatenation
- [`abs`](https://github.com/apache/datafusion-comet/pull/2689) - Absolute value
- [`sha1`](https://github.com/apache/datafusion-comet/pull/2471) - SHA-1 hash function
- [`cot`](https://github.com/apache/datafusion-comet/pull/2755) - Cotangent function
- [Hyperbolic trigonometric functions](https://github.com/apache/datafusion-comet/pull/2784) - sinh, cosh, tanh, and their inverse functions

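For readers who want an independent baseline when comparing results, the following is a minimal Python sketch of what the functions listed above compute. It is based on our understanding of standard Spark SQL semantics, not on Comet's native implementation, and it ignores edge cases such as `cot(0)` and integer overflow in `abs`. The helper names (`spark_concat`, `spark_sha1`, `cot`) are hypothetical and exist only for this illustration; `None` stands in for SQL `NULL`.

```python
import hashlib
import math
from typing import Optional


def spark_concat(*parts: Optional[str]) -> Optional[str]:
    """concat: joins its string inputs; the result is NULL if any input is NULL."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)


def spark_sha1(data: Optional[bytes]) -> Optional[str]:
    """sha1: SHA-1 digest rendered as a 40-character lowercase hex string."""
    if data is None:
        return None
    return hashlib.sha1(data).hexdigest()


def cot(x: float) -> float:
    """cot: cotangent, the reciprocal of the tangent (x == 0 is not handled here)."""
    return 1.0 / math.tan(x)


# Illustrative calls only; abs and the hyperbolic functions map directly
# onto Python built-ins and the math module.
print(spark_concat("Comet", " ", "0.12.0"))
print(spark_sha1(b"Comet"))
print(abs(-3), cot(math.pi / 4))
print(math.sinh(0.5), math.cosh(0.5), math.tanh(0.5))
print(math.asinh(0.5), math.acosh(1.5), math.atanh(0.5))
```

Running the same inputs through Spark with and without Comet enabled and comparing against a baseline like this is one way to spot discrepancies worth reporting.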
### New Operators

- [`CometLocalTableScanExec`](https://github.com/apache/datafusion-comet/pull/2735) - Native support for local table scans, eliminating fallback to Spark for small, in-memory datasets

### Configuration and Usability Improvements

- **Simplified on-heap configuration**: [Simplified on-heap memory configuration](https://github.com/apache/datafusion-comet/pull/2599) for easier setup
- **Extended explain format**: [Renamed and improved COMET_EXTENDED_EXPLAIN_FORMAT](https://github.com/apache/datafusion-comet/pull/2644) with better defaults
- **Environment variable support**: [Improved framework for setting configs with environment variables](https://github.com/apache/datafusion-comet/pull/2722)
- **Native config passing**: [All Comet configs now passed to native plan](https://github.com/apache/datafusion-comet/pull/2801)
- **Config categorization**: [Categorized testing configs](https://github.com/apache/datafusion-comet/pull/2740) and added notes about known timezone issues
- **Removed legacy configs**: [Removed COMET_EXPR_ALLOW_INCOMPATIBLE config](https://github.com/apache/datafusion-comet/pull/2786) to simplify configuration

### Bug Fixes

This release includes numerous bug fixes:

- [Fixed None.get in stringDecode](https://github.com/apache/datafusion-comet/pull/2606) when binary child cannot be converted
- [Proper fallback for lpad/rpad with unsupported arguments](https://github.com/apache/datafusion-comet/pull/2630)
- [Fixed trunc/date_trunc with unsupported format strings](https://github.com/apache/datafusion-comet/pull/2634)
- [Corrected single partition handling in native_datafusion](https://github.com/apache/datafusion-comet/pull/2675)
- [Fixed LeftSemi join handling](https://github.com/apache/datafusion-comet/pull/2687) - do not replace SMJ with HJ
- [Fixed CometLiteral class cast exception with arrays](https://github.com/apache/datafusion-comet/pull/2718)
- [Fixed missing SortOrder fallback reason in range partitioning](https://github.com/apache/datafusion-comet/pull/2716)
- [Improved checkSparkMaybeThrows to compare results in success case](https://github.com/apache/datafusion-comet/pull/2728)
- [Fixed null handling in CometVector implementations](https://github.com/apache/datafusion-comet/pull/2643)

### Documentation Improvements

- [Added FFI documentation to contributor guide](https://github.com/apache/datafusion-comet/pull/2668)
- [Updated contributor guide for adding new expressions](https://github.com/apache/datafusion-comet/pull/2704) and [operators](https://github.com/apache/datafusion-comet/pull/2758)
- [Improved documentation layout](https://github.com/apache/datafusion-comet/pull/2587) and [navigation](https://github.com/apache/datafusion-comet/pull/2597)
- [Added prettier enforcement](https://github.com/apache/datafusion-comet/pull/2783) for consistent markdown formatting
- [CI check to ensure generated docs are in sync](https://github.com/apache/datafusion-comet/pull/2779)
- Various documentation updates for [SortOrder expressions](https://github.com/apache/datafusion-comet/pull/2694), [LocalTableScan and WindowExec](https://github.com/apache/datafusion-comet/pull/2742), and [Spark SQL tests](https://github.com/apache/datafusion-comet/pull/2712)

### Dependency Updates

- [Upgraded to Spark 3.5.7](https://github.com/apache/datafusion-comet/pull/2574)
- [Upgraded to DataFusion 50.3.0](https://github.com/apache/datafusion-comet/pull/2605)
- [Upgraded Parquet from 56.0.0 to 56.2.0](https://github.com/apache/datafusion-comet/pull/2608)
- Various other dependency updates via Dependabot

### Spark Compatibility

- Spark 3.4.3 with JDK 11 & 17, Scala 2.12 & 2.13
- Spark 3.5.4 through 3.5.7 with JDK 11 & 17, Scala 2.12 & 2.13
- Spark 4.0.1 with JDK 17, Scala 2.13

We are looking for help from the community to fully support Spark 4.0.1. See [EPIC: Support 4.0.0] for more information.

[EPIC: Support 4.0.0]: https://github.com/apache/datafusion-comet/issues/1637

## Getting Involved

The Comet project welcomes new contributors. We use the same [Slack and Discord] channels as the main DataFusion
project and have a weekly [DataFusion video call].

[Slack and Discord]: https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord
[DataFusion video call]: https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit?usp=sharing

The easiest way to get involved is to test Comet with your current Spark jobs and file issues for any bugs or
performance regressions that you find. See the [Getting Started] guide for instructions on downloading and installing
Comet.

[Getting Started]: https://datafusion.apache.org/comet/user-guide/installation.html

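As a starting point for such a test, the sketch below assembles the `--conf` flags that enable Comet for an existing `spark-submit` command. The property names follow the Comet installation guide as we understand it for recent releases. Treat them, the off-heap size, and the helper name `comet_submit_args` as assumptions, and confirm them against the guide for the version you install. The Comet jar itself still has to be supplied (for example via `--jars`), which this sketch leaves out.

```python
from typing import Dict, List

# Assumed property names and values, based on the Comet installation guide.
# Verify them against the documentation for your Comet version.
COMET_CONF: Dict[str, str] = {
    "spark.plugins": "org.apache.spark.CometPlugin",
    "spark.shuffle.manager": (
        "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager"
    ),
    # Log the reasons when a query falls back to Spark.
    "spark.comet.explainFallback.enabled": "true",
    "spark.memory.offHeap.enabled": "true",
}


def comet_submit_args(off_heap_size: str = "16g") -> List[str]:
    """Return spark-submit arguments that turn on Comet (hypothetical helper)."""
    conf = dict(COMET_CONF)
    conf["spark.memory.offHeap.size"] = off_heap_size
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Print a command prefix; append your usual jar and job arguments after it.
print(" ".join(["spark-submit", *comet_submit_args("8g")]))
```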
There are also many [good first issues] waiting for contributions.

[good first issues]: https://github.com/apache/datafusion-comet/contribute
