<!---
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements. See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership. The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either express or implied. See the License for the
  specific language governing permissions and limitations
  under the License.
-->

# Comparison of Comet and Gluten

This document compares the Comet and Gluten projects to help users who are choosing between them. Because it is
maintained by the Comet community, it is likely to be biased.

We recommend trying both Comet and Gluten to see which best fits your needs.

This document is based on Comet 0.9.0 and Gluten 1.4.0.

## Architecture

Comet and Gluten have very similar architectures. Both are Spark plugins that translate Spark physical plans into
a serialized representation and pass them to native code for execution.
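
Because both projects are Spark plugins, you enable either one through ordinary Spark configuration. The sketch below collects the settings for each plugin and flattens them into `spark-submit --conf` arguments. It is a minimal sketch with several assumptions. The plugin and shuffle-manager class names follow our reading of the Comet 0.9 and Gluten 1.4 installation guides, so check the guide for your release. The off-heap sizes are placeholders. The `to_conf_args` helper is hypothetical.

```python
# Minimal sketch: Spark settings that enable each accelerator plugin.
# Class names and keys are assumptions based on the Comet 0.9 and Gluten 1.4
# installation guides; the off-heap sizes are placeholder values.
COMET_CONF = {
    "spark.plugins": "org.apache.spark.CometPlugin",
    "spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager",
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "16g",
}

GLUTEN_CONF = {
    "spark.plugins": "org.apache.gluten.GlutenPlugin",
    "spark.shuffle.manager": "org.apache.spark.shuffle.sort.ColumnarShuffleManager",
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "16g",
}


def to_conf_args(conf: dict[str, str]) -> list[str]:
    """Hypothetical helper: flatten a settings dict into spark-submit --conf arguments."""
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_conf_args(COMET_CONF)))
    print(" ".join(to_conf_args(GLUTEN_CONF)))
```

Both dictionaries set the same Spark-level settings: a plugin class, a shuffle manager, and off-heap memory. This reflects how similarly the two projects hook into Spark. In practice, the project's JAR must also be on the driver and executor classpaths.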

Gluten serializes plans using the Substrait format and has an extensible architecture that supports execution
against multiple engines. Velox and ClickHouse are currently supported, but Velox is more widely used.

Comet serializes plans in its own Protocol Buffer format and delegates execution to Apache DataFusion. Comet
does not plan to support multiple engines; instead, it focuses on tight integration between Spark and DataFusion.

## Underlying Execution Engine: DataFusion vs Velox

One of the main differences between Comet and Gluten is the choice of native execution engine.

Gluten uses Velox, a vectorized query engine implemented in C++ and maintained by Meta.

Comet uses DataFusion, a vectorized query engine implemented in Rust and maintained by the Apache Software
Foundation.

Velox and DataFusion are both mature query engines that are growing in popularity.

Comet may be a better choice for users who plan to integrate with other Rust software in the future, while
Gluten with Velox may be a better choice for users who plan to integrate with other C++ code.

## Compatibility

Comet relies on the full Spark SQL test suite (more than 24,000 tests), as well as its own unit and
integration tests, to ensure compatibility with Spark. Features with known compatibility differences from
Spark are disabled by default, but users can opt in to them. See the [Comet Compatibility Guide] for more information.
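
Opting in is done through Spark configuration. The sketch below is minimal and assumes two Comet setting names. `spark.comet.expression.allowIncompatible` allows expressions that have known differences from Spark. `spark.comet.explainFallback.enabled` logs the reason when an operator falls back to Spark. Verify both names against the Comet configuration reference for your version. The `with_incompatible_opt_in` helper is hypothetical.

```python
# Minimal sketch: opting in to Comet features that are disabled by default
# because of known differences from Spark. Setting names are assumptions
# based on the Comet configuration reference; check them for your version.
BASE_CONF = {"spark.plugins": "org.apache.spark.CometPlugin"}

OPT_IN = {
    # Allow expressions that Comet marks as not fully Spark-compatible.
    "spark.comet.expression.allowIncompatible": "true",
    # Log the reason whenever an operator falls back to Spark execution.
    "spark.comet.explainFallback.enabled": "true",
}


def with_incompatible_opt_in(conf: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: return a copy of conf with the opt-in settings added."""
    merged = dict(conf)
    merged.update(OPT_IN)
    return merged


if __name__ == "__main__":
    for key, value in with_incompatible_opt_in(BASE_CONF).items():
        print(f"--conf {key}={value}")
```

The helper returns a copy, so the base configuration stays unchanged. This makes it easy to run the same job with and without the opt-in and compare the results.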

[Comet Compatibility Guide]: compatibility.md

Gluten also aims to provide compatibility with Spark, and includes a subset of the Spark SQL tests in its own test
suite. See the [Gluten Compatibility Guide] for more information.

[Gluten Compatibility Guide]: https://apache.github.io/incubator-gluten-site/archives/v1.3.0/velox-backend/limitations/

## Performance

When running a benchmark derived from TPC-H on a single node against local Parquet files, both Comet and Gluten
provide a good speedup compared to Spark. Gluten is currently slightly faster than Comet, but we expect to close
that gap over time.

## Ease of Development

Comet has a much smaller codebase than Gluten. Counting lines in fresh clones of the two repositories shows that
Comet has ~41k lines of Scala and Java code and ~40k lines of Rust code, whereas Gluten has ~206k lines of Scala
and Java code and ~88k lines of C++ code (including headers).

Setting up a local development environment is generally easier with Comet than with Gluten. Rust's package manager,
Cargo, fetches and builds Comet's native dependencies, whereas installing Gluten's C++ dependencies is more complex.

### Comet Lines of Code

```
----------------------------------------------------------------------------
Language                     files         blank       comment          code
----------------------------------------------------------------------------
Rust                           159          4870          5388         39989
Scala                          171          4849          6277         32538
Java                            66          1556          2619          8724
```

### Gluten Lines of Code

```
----------------------------------------------------------------------------
Language                     files         blank       comment          code
----------------------------------------------------------------------------
Scala                         1312         23264         37534        179664
C++                            421          9841         10245         64554
Java                           328          5063          6726         26520
C/C++ Header                   304          4875          6255         23527
```
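
You can re-derive the approximate totals quoted in the prose by summing the `code` column of the two tables. The sketch below does this. It assumes that the C++ figure in the prose counts C/C++ header files together with C++ sources.

```python
# Sum the "code" column of the line-count tables above, grouped the way
# the prose groups languages (C/C++ headers are counted as C++ here).
COMET = {"Rust": 39989, "Scala": 32538, "Java": 8724}
GLUTEN = {"Scala": 179664, "C++": 64554, "Java": 26520, "C/C++ Header": 23527}


def total(counts: dict[str, int], *languages: str) -> int:
    """Sum the code lines for the given languages."""
    return sum(counts[lang] for lang in languages)


if __name__ == "__main__":
    print("Comet Scala+Java:", total(COMET, "Scala", "Java"))                # 41262
    print("Comet Rust:", total(COMET, "Rust"))                               # 39989
    print("Gluten Scala+Java:", total(GLUTEN, "Scala", "Java"))              # 206184
    print("Gluten C++ + headers:", total(GLUTEN, "C++", "C/C++ Header"))     # 88081
```

By these sums, Gluten's Scala and Java code is roughly five times the size of Comet's (206,184 vs 41,262 lines).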

## Summary

Comet and Gluten are both good solutions for accelerating Spark jobs. We recommend trying both to see which is the
best fit for your needs.