DataFusion is an extensible query engine written in Rust that uses Apache Arrow as its in-memory format.
The documentation on this site is for the core DataFusion project, which contains libraries and binaries for developers building fast, feature-rich database and analytic systems customized to particular workloads. See the use cases for examples.
The following related subprojects target end users and have separate documentation.
- DataFusion Python offers a Python interface for SQL and DataFrame queries.
- DataFusion Ray provides a distributed version of DataFusion that scales out on Ray clusters.
- DataFusion Comet is an accelerator for Apache Spark based on DataFusion.
"Out of the box," DataFusion offers SQL and DataFrame APIs, excellent performance, built-in support for CSV, Parquet, JSON, and Avro, extensive customization, and a great community. Python bindings are also available.
DataFusion features a full query planner, a columnar, streaming, multi-threaded, vectorized execution engine, and partitioned data sources. You can customize DataFusion at almost every point, including additional data sources, query languages, functions, custom operators, and more. See the Architecture section for more details.
To get started, see:
- The example usage section of the user guide and the datafusion-examples directory.
- The library user guide for examples of using DataFusion's extension APIs.
- The developer’s guide for contributing, and the communication page for getting in touch with us.
.. toctree::
   :maxdepth: 1
   :caption: ASF Links

   Apache Software Foundation <https://apache.org>
   License <https://www.apache.org/licenses/>
   Donate <https://www.apache.org/foundation/sponsorship.html>
   Thanks <https://www.apache.org/foundation/thanks.html>
   Security <https://www.apache.org/security/>

.. toctree::
   :maxdepth: 1
   :caption: Links

   GitHub and Issue Tracker <https://github.com/apache/datafusion>
   crates.io <https://crates.io/crates/datafusion>
   API Docs <https://docs.rs/datafusion/latest/datafusion/>
   Blog <https://datafusion.apache.org/blog/>
   Code of conduct <https://github.com/apache/datafusion/blob/main/CODE_OF_CONDUCT.md>
   Download <download>

.. toctree::
   :maxdepth: 1
   :caption: User Guide

   user-guide/introduction
   user-guide/example-usage
   user-guide/features
   user-guide/concepts-readings-events
   user-guide/crate-configuration
   user-guide/cli/index
   user-guide/dataframe
   user-guide/expressions
   user-guide/sql/index
   user-guide/configs
   user-guide/explain-usage
   user-guide/faq

.. toctree::
   :maxdepth: 1
   :caption: Library User Guide

   library-user-guide/index
   library-user-guide/extensions
   library-user-guide/using-the-sql-api
   library-user-guide/working-with-exprs
   library-user-guide/using-the-dataframe-api
   library-user-guide/building-logical-plans
   library-user-guide/catalogs
   library-user-guide/adding-udfs
   library-user-guide/custom-table-providers
   library-user-guide/extending-operators
   library-user-guide/profiling
   library-user-guide/query-optimizer
   library-user-guide/upgrading

.. toctree::
   :maxdepth: 1
   :caption: Contributor Guide

   contributor-guide/index
   contributor-guide/communication
   contributor-guide/development_environment
   contributor-guide/architecture
   contributor-guide/testing
   contributor-guide/api-health
   contributor-guide/howtos
   contributor-guide/roadmap
   contributor-guide/governance
   contributor-guide/inviting
   contributor-guide/specification/index
   contributor-guide/gsoc_application_guidelines
   contributor-guide/gsoc_project_ideas

.. toctree::
   :maxdepth: 1
   :caption: DataFusion Subprojects

   DataFusion Ballista <https://arrow.apache.org/ballista/>
   DataFusion Comet <https://datafusion.apache.org/comet/>
   DataFusion Python <https://datafusion.apache.org/python/>
