|
| 1 | +<!-- |
| 2 | +Licensed to the Apache Software Foundation (ASF) under one |
| 3 | +or more contributor license agreements. See the NOTICE file |
| 4 | +distributed with this work for additional information |
| 5 | +regarding copyright ownership. The ASF licenses this file |
| 6 | +to you under the Apache License, Version 2.0 (the |
| 7 | +"License"); you may not use this file except in compliance |
| 8 | +with the License. You may obtain a copy of the License at |
| 9 | +
|
| 10 | + http://www.apache.org/licenses/LICENSE-2.0 |
| 11 | +
|
| 12 | +Unless required by applicable law or agreed to in writing, |
| 13 | +software distributed under the License is distributed on an |
| 14 | +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| 15 | +KIND, either express or implied. See the License for the |
| 16 | +specific language governing permissions and limitations |
| 17 | +under the License. |
| 18 | +--> |
| 19 | + |
| 20 | +# Comet Accelerator for Apache Spark and Apache Iceberg |
| 21 | + |
| 22 | +<!-- Code from https://buttons.github.io/ --> |
| 23 | +<p> |
| 24 | + <!-- Place this tag where you want the button to render. --> |
| 25 | + <a class="github-button" href="https://github.com/apache/datafusion-comet" data-size="large" data-show-count="true" aria-label="Star apache/datafusion-comet on GitHub">Star</a> |
| 26 | + <!-- Place this tag where you want the button to render. --> |
| 27 | + <a class="github-button" href="https://github.com/apache/datafusion-comet/fork" data-size="large" data-show-count="true" aria-label="Fork apache/datafusion-comet on GitHub">Fork</a> |
| 28 | +</p> |
| 29 | + |
| 30 | +Apache DataFusion Comet is a high-performance accelerator for Apache Spark, built on top of the powerful |
| 31 | +[Apache DataFusion] query engine. Comet is designed to significantly enhance the |
| 32 | +performance of Apache Spark workloads while leveraging commodity hardware and seamlessly integrating with the |
| 33 | +Spark ecosystem without requiring any code changes. |
| 34 | + |
| 35 | +Comet also accelerates Apache Iceberg when performing Parquet scans from Spark. |
| 36 | + |
| 37 | +[Apache DataFusion]: https://datafusion.apache.org |
| 38 | + |
| 39 | +## Run Spark Queries at DataFusion Speeds |
| 40 | + |
| 41 | +Comet delivers a performance speedup for many queries, enabling faster data processing and shorter time to insight. |
| 42 | + |
| 43 | +The following chart shows the time it takes to run the 22 TPC-H queries against 100 GB of data in Parquet format |
| 44 | +using a single executor with 8 cores. See the [Comet Benchmarking Guide](https://datafusion.apache.org/comet/contributor-guide/benchmarking.html) |
| 45 | +for details of the environment used for these benchmarks. |
| 46 | + |
| 47 | +When using Comet, the overall run time is reduced from 652 seconds to 268 seconds, a 2.4x speedup. |
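As a quick check of the figures quoted above, the speedup is the ratio of the two run times. The sketch below uses only the 652 and 268 second totals from the text; the function name is illustrative.

```python
def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    return baseline_seconds / accelerated_seconds


spark_seconds = 652  # total TPC-H run time with Spark alone (from the text)
comet_seconds = 268  # total TPC-H run time with Comet enabled (from the text)

ratio = speedup(spark_seconds, comet_seconds)
saved = spark_seconds - comet_seconds

print(f"speedup: {ratio:.1f}x")      # speedup: 2.4x
print(f"time saved: {saved} seconds")  # time saved: 384 seconds
```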
| 48 | + |
| 49 | + |
| 50 | + |
| 51 | +Here is a breakdown showing relative performance of Spark and Comet for each TPC-H query. |
| 52 | + |
| 53 | + |
| 54 | + |
| 55 | +These benchmarks can be reproduced in any environment using the documentation in the |
| 56 | +[Comet Benchmarking Guide](/contributor-guide/benchmarking.md). We encourage |
| 57 | +you to run your own benchmarks. |
| 58 | + |
| 59 | +## Use Commodity Hardware |
| 60 | + |
| 61 | +Comet leverages commodity hardware, eliminating the need for costly hardware upgrades or |
| 62 | +specialized hardware accelerators, such as GPUs or FPGAs. By maximizing the utilization of commodity hardware, Comet |
| 63 | +ensures cost-effectiveness and scalability for your Spark deployments. |
| 64 | + |
| 65 | +## Spark Compatibility |
| 66 | + |
| 67 | +Comet aims for 100% compatibility with all supported versions of Apache Spark, allowing you to integrate Comet into |
| 68 | +your existing Spark deployments and workflows seamlessly. With no code changes required, you can immediately harness |
| 69 | +the benefits of Comet's acceleration capabilities without disrupting your Spark applications. |
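"No code changes" means Comet is typically enabled through Spark configuration rather than through application code. The following is a minimal sketch of what that configuration might look like, assembled as `spark-submit` arguments. The setting names are assumptions based on the Comet installation guide and may differ between Comet versions, the helper names and the off-heap size are hypothetical, and the Comet JAR must also be made available to Spark (not shown here). Treat the installation instructions as the authoritative source.

```python
def comet_spark_conf(offheap_size: str = "4g") -> dict[str, str]:
    """Build an illustrative set of Spark settings for enabling Comet.

    Assumption: these keys and class names follow the Comet installation
    guide; verify them against the guide for the Comet version in use.
    """
    return {
        # Load Comet as a Spark plugin.
        "spark.plugins": "org.apache.spark.CometPlugin",
        # Use Comet's shuffle manager.
        "spark.shuffle.manager": (
            "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager"
        ),
        # Off-heap memory for native execution; the size is a placeholder.
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": offheap_size,
    }


def to_submit_args(conf: dict[str, str]) -> list[str]:
    """Turn a settings dict into `--conf key=value` arguments for spark-submit."""
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(comet_spark_conf("8g"))))
```

Because everything lives in configuration, the same application JAR or script can be run with or without these settings, which makes it straightforward to compare results and run times.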
| 70 | + |
| 71 | +## Tight Integration with Apache DataFusion |
| 72 | + |
| 73 | +Comet tightly integrates with the core Apache DataFusion project, leveraging its powerful execution engine. With |
| 74 | +seamless interoperability between Comet and DataFusion, you can achieve optimal performance and efficiency in your |
| 75 | +Spark workloads. |
| 76 | + |
| 77 | +## Active Community |
| 78 | + |
| 79 | +Comet boasts a vibrant and active community of developers, contributors, and users dedicated to advancing the |
| 80 | +capabilities of Apache DataFusion and accelerating the performance of Apache Spark. |
| 81 | + |
| 82 | +## Getting Started |
| 83 | + |
| 84 | +To get started with Apache DataFusion Comet, follow the |
| 85 | +[installation instructions](https://datafusion.apache.org/comet/user-guide/installation.html). Join the |
| 86 | +[DataFusion Slack and Discord channels](https://datafusion.apache.org/contributor-guide/communication.html) to connect |
| 87 | +with other users, ask questions, and share your experiences with Comet. |
| 88 | + |
| 89 | +See the [Apache DataFusion Comet Overview](https://datafusion.apache.org/comet/user-guide/overview.html) for more detailed information. |
| 90 | + |
| 91 | +## Contributing |
| 92 | + |
| 93 | +We welcome contributions from the community to help improve and enhance Apache DataFusion Comet. Whether it's fixing |
| 94 | +bugs, adding new features, writing documentation, or optimizing performance, your contributions are invaluable in |
| 95 | +shaping the future of Comet. Check out our |
| 96 | +[contributor guide](https://datafusion.apache.org/comet/contributor-guide/contributing.html) to get started. |
| 97 | + |
| 98 | +```{toctree} |
| 99 | +:maxdepth: 1 |
| 100 | +:caption: Overview |
| 101 | +:hidden: |
| 102 | +
|
| 103 | +Comet Overview <overview> |
| 104 | +Comparison with Gluten <gluten_comparison> |
| 105 | +``` |
| 106 | + |
| 107 | +```{toctree} |
| 108 | +:maxdepth: 2 |
| 109 | +:caption: User Guides |
| 110 | +:hidden: |
| 111 | +
|
| 112 | +0.12.0-SNAPSHOT <user-guide/latest/index> |
| 113 | +0.11.x <user-guide/0.11/index> |
| 114 | +0.10.x <user-guide/0.10/index> |
| 115 | +0.9.x <user-guide/0.9/index> |
| 116 | +0.8.x <user-guide/0.8/index> |
| 117 | +``` |
| 118 | + |
| 119 | +```{toctree} |
| 120 | +:maxdepth: 1 |
| 121 | +:caption: Contributor Guide |
| 122 | +:hidden: |
| 123 | +
|
| 124 | +Getting Started <contributor-guide/contributing> |
| 125 | +Comet Plugin Overview <contributor-guide/plugin_overview> |
| 126 | +Development Guide <contributor-guide/development> |
| 127 | +Debugging Guide <contributor-guide/debugging> |
| 128 | +Benchmarking Guide <contributor-guide/benchmarking> |
| 129 | +Adding a New Expression <contributor-guide/adding_a_new_expression> |
| 130 | +Tracing <contributor-guide/tracing> |
| 131 | +Profiling Native Code <contributor-guide/profiling_native_code> |
| 132 | +Spark SQL Tests <contributor-guide/spark-sql-tests.md> |
| 133 | +Roadmap <contributor-guide/roadmap.md> |
| 134 | +GitHub and Issue Tracker <https://github.com/apache/datafusion-comet> |
| 135 | +``` |
| 136 | + |
| 137 | +```{toctree} |
| 138 | +:maxdepth: 1 |
| 139 | +:caption: ASF Links |
| 140 | +:hidden: |
| 141 | +
|
| 142 | +Apache Software Foundation <https://apache.org> |
| 143 | +License <https://www.apache.org/licenses/> |
| 144 | +Donate <https://www.apache.org/foundation/sponsorship.html> |
| 145 | +Thanks <https://www.apache.org/foundation/thanks.html> |
| 146 | +Security <https://www.apache.org/security/> |
| 147 | +Code of conduct <https://www.apache.org/foundation/policies/conduct.html> |
| 148 | +``` |