Commit 6261205

Add guide showing comparison between Comet and Gluten (#2012)
1 parent 496cad9 commit 6261205

4 files changed

+125
-3
lines changed

Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
1+
<!---
2+
Licensed to the Apache Software Foundation (ASF) under one
3+
or more contributor license agreements. See the NOTICE file
4+
distributed with this work for additional information
5+
regarding copyright ownership. The ASF licenses this file
6+
to you under the Apache License, Version 2.0 (the
7+
"License"); you may not use this file except in compliance
8+
with the License. You may obtain a copy of the License at
9+
10+
http://www.apache.org/licenses/LICENSE-2.0
11+
12+
Unless required by applicable law or agreed to in writing,
13+
software distributed under the License is distributed on an
14+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15+
KIND, either express or implied. See the License for the
16+
specific language governing permissions and limitations
17+
under the License.
18+
-->
19+
20+
# Comparison of Comet and Gluten
21+
22+
This document compares the Comet and Gluten projects to help users choose between them. Because it is maintained
23+
by the Comet community, it is likely biased.
24+
25+
We recommend trying out both Comet and Gluten to see which is the best fit for your needs.
26+
27+
This document is based on Comet 0.9.0 and Gluten 1.4.0.
28+
29+
## Architecture
30+
31+
Comet and Gluten have very similar architectures. Both are Spark plugins that translate Spark physical plans to
32+
a serialized representation and pass them to native code for execution.
33+
34+
Gluten serializes the plans using the Substrait format and has an extensible architecture that supports execution
35+
against multiple engines. Velox and ClickHouse are currently supported, but Velox is more widely used.
36+
37+
Comet serializes the plans in a Comet-specific Protocol Buffer format. Execution is delegated to Apache DataFusion. Comet
38+
does not plan to support multiple engines and instead focuses on tight integration between Spark and DataFusion.
39+
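Because both projects are loaded as Spark plugins, switching between them is largely a matter of configuration. The following is a minimal sketch, not an official recipe: the plugin and shuffle manager class names are assumptions recalled from each project's installation documentation, and the helper `spark_submit_args` is hypothetical. Verify the settings against the release you deploy.

```python
# Illustrative only: class names are assumptions taken from each project's
# install docs and may differ between releases.
ACCELERATOR_CONFS = {
    "comet": {
        "spark.plugins": "org.apache.spark.CometPlugin",
        "spark.shuffle.manager": (
            "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager"
        ),
        "spark.memory.offHeap.enabled": "true",
    },
    "gluten": {
        "spark.plugins": "org.apache.gluten.GlutenPlugin",
        "spark.shuffle.manager": (
            "org.apache.spark.shuffle.sort.ColumnarShuffleManager"
        ),
        "spark.memory.offHeap.enabled": "true",
    },
}


def spark_submit_args(accelerator: str) -> list[str]:
    """Build the --conf arguments for spark-submit (hypothetical helper)."""
    confs = ACCELERATOR_CONFS[accelerator]
    args: list[str] = []
    # Sort keys so the generated command line is deterministic.
    for key, value in sorted(confs.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(spark_submit_args("comet")))
```

The accelerator jar still has to be on the classpath (for example via `--jars`), and off-heap memory needs a size; both are omitted here because they depend on the deployment.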
40+
## Underlying Execution Engine: DataFusion vs Velox
41+
42+
One of the main differences between Comet and Gluten is the choice of native execution engine.
43+
44+
Gluten uses Velox, which is a vectorized query engine implemented in C++ and is maintained by Meta.
45+
46+
Comet uses DataFusion, which is a vectorized query engine implemented in Rust and is maintained by the
47+
Apache Software Foundation.
48+
49+
Velox and DataFusion are both mature query engines that are growing in popularity.
50+
51+
Comet may be a better choice for users with plans for integrating with other Rust software in the future, and
52+
Gluten+Velox may be a better choice for users with plans for integrating with other C++ code.
53+
54+
![github-stars-datafusion-velox.png](../_static/images/github-stars-datafusion-velox.png)
55+
56+
## Compatibility
57+
58+
Comet relies on the full Spark SQL test suite (consisting of more than 24,000 tests) as well as its own unit and
59+
integration tests to ensure compatibility with Spark. Features that are known to have compatibility differences with
60+
Spark are disabled by default, but users can opt in. See the [Comet Compatibility Guide] for more information.
61+
62+
[Comet Compatibility Guide]: compatibility.md
63+
64+
Gluten also aims to provide compatibility with Spark, and includes a subset of the Spark SQL tests in its own test
65+
suite. See the [Gluten Compatibility Guide] for more information.
66+
67+
[Gluten Compatibility Guide]: https://apache.github.io/incubator-gluten-site/archives/v1.3.0/velox-backend/limitations/
68+
69+
## Performance
70+
71+
When running a benchmark derived from TPC-H on a single node against local Parquet files, we see that both Comet
72+
and Gluten provide a good speedup when compared to Spark. Gluten is currently slightly faster than Comet, but we
73+
expect to close that gap over time.
74+
75+
![tpch_allqueries_comet_gluten.png](../_static/images/tpch_allqueries_comet_gluten.png)
76+
77+
## Ease of Development
78+
79+
Comet has a much smaller codebase than Gluten. A fresh clone of the respective repositories shows that Comet has ~41k
80+
lines of Scala+Java code and ~40k lines of Rust code. Gluten has ~206k lines of Scala+Java code and ~88k lines of C++
81+
code (including headers).
82+
83+
Setting up a local development environment with Comet is generally easier than with Gluten due to Rust's package
84+
management capabilities vs the complexities around installing C++ dependencies.
85+
86+
### Comet Lines of Code
87+
88+
```
89+
-------------------------------------------------------------------------------
90+
Language                     files          blank        comment           code
91+
-------------------------------------------------------------------------------
92+
Rust                           159           4870           5388          39989
93+
Scala                          171           4849           6277          32538
94+
Java                            66           1556           2619           8724
95+
```
96+
97+
### Gluten Lines of Code
98+
99+
```
100+
--------------------------------------------------------------------------------
101+
Language                      files          blank        comment           code
102+
--------------------------------------------------------------------------------
103+
Scala                          1312          23264          37534         179664
104+
C++                             421           9841          10245          64554
105+
Java                            328           5063           6726          26520
106+
C/C++ Header                    304           4875           6255          23527
107+
```
108+
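The per-language counts can be combined into the totals quoted in the prose. The sketch below uses only the `code` column values from the two tables; the grouping into "JVM" and "native" code is an illustrative choice, and the variable names are hypothetical.

```python
# "code" column values copied from the two cloc tables.
COMET = {"Rust": 39989, "Scala": 32538, "Java": 8724}
GLUTEN = {"Scala": 179664, "C++": 64554, "Java": 26520, "C/C++ Header": 23527}


def total(counts: dict, languages: list) -> int:
    """Sum the lines of code for the given languages."""
    return sum(counts[lang] for lang in languages)


comet_jvm = total(COMET, ["Scala", "Java"])               # 41262
comet_native = total(COMET, ["Rust"])                     # 39989
gluten_jvm = total(GLUTEN, ["Scala", "Java"])             # 206184
gluten_native = total(GLUTEN, ["C++", "C/C++ Header"])    # 88081

# Overall size of Gluten relative to Comet.
ratio = (gluten_jvm + gluten_native) / (comet_jvm + comet_native)

if __name__ == "__main__":
    print(f"Comet:  {comet_jvm} JVM + {comet_native} native")
    print(f"Gluten: {gluten_jvm} JVM + {gluten_native} native")
    print(f"Gluten/Comet ratio: {ratio:.1f}")
```

By these counts Gluten's codebase is roughly 3.6 times the size of Comet's. The raw sums for Gluten are 206,184 and 88,081 lines, so the rounded figures quoted in the prose are approximate.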
109+
## Summary
110+
111+
Comet and Gluten are both good solutions for accelerating Spark jobs. We recommend trying both to see which is the
112+
best fit for your needs.

docs/source/user-guide/overview.md

Lines changed: 13 additions & 3 deletions
@@ -44,9 +44,19 @@ query results, etc) with Comet turned on or turned off in their Spark
4444
jobs. In addition, the Comet extension should automatically detect unsupported
4545
features and fall back to the Spark engine.
4646

47-
To achieve this, besides unit tests within Comet itself, we also re-use
48-
Spark SQL tests and make sure they all pass with Comet extension
49-
enabled.
47+
## Comparison with other open-source Spark accelerators
48+
49+
There are two other major open-source Spark accelerators:
50+
51+
- [Apache Gluten (incubating)](https://github.com/apache/incubator-gluten)
52+
- [NVIDIA Spark RAPIDS](https://github.com/NVIDIA/spark-rapids)
53+
54+
We have a detailed guide [comparing Apache DataFusion Comet with Apache Gluten].
55+
56+
Spark RAPIDS is a solution that provides hardware acceleration on NVIDIA GPUs. Comet does not require specialized
57+
hardware.
58+
59+
[comparing Apache DataFusion Comet with Apache Gluten]: gluten_comparison.md
5060

5161
## Getting Started
5262
