1 | | -# DataFusion Federation |
2 | | - |
3 | | -[](https://crates.io/crates/datafusion-federation) |
4 | | -[](https://docs.rs/datafusion-federation) |
5 | | - |
6 | | -DataFusion Federation allows |
7 | | -[DataFusion](https://github.com/apache/arrow-datafusion) to delegate (part of) a |
8 | | -query plan to a remote execution engine. |
9 | | - |
10 | | - ┌────────────────┐ |
11 | | - ┌────────────┐ │ Remote DBMS(s) │ |
12 | | - SQL Query ───> │ DataFusion │ ───> │ ( execution │ |
13 | | - └────────────┘ │ happens here ) │ |
14 | | - └────────────────┘ |
15 | | - |
16 | | -The goal is to allow resolving queries across remote query engines while |
17 | | -pushing down as much compute as possible to the remote database(s). This allows |
18 | | -execution to happen as close to the storage as possible. This concept is |
19 | | -referred to as 'query federation'. |
20 | | - |
21 | | -> [!TIP] |
22 | | -> This repository implements the federation framework itself. If you want to |
23 | | -> connect to a specific database, check out the compatible providers available |
24 | | -> in |
25 | | -> [datafusion-contrib/datafusion-table-providers](https://github.com/datafusion-contrib/datafusion-table-providers/). |
26 | | - |
27 | | -## Usage |
28 | | - |
29 | | -Check out the [examples](./datafusion-federation/examples/) to get a feel for |
30 | | -how it works. |
31 | | - |
32 | | -For a complete step-by-step example of how federation works, you can check the |
33 | | -example [here](./datafusion-federation/examples/df-csv-advanced.rs). |
34 | | - |
35 | | -## Potential use-cases: |
36 | | - |
37 | | -- Querying across SQLite, MySQL, PostgreSQL, ... |
38 | | -- Pushing down SQL or [Substrait](https://substrait.io/) plans. |
39 | | -- DataFusion -> Flight SQL -> DataFusion |
40 | | -- ... |
41 | | - |
42 | | -## Design concept |
43 | | - |
44 | | -Say you have a query plan as follows: |
45 | | - |
46 | | - ┌────────────┐ |
47 | | - │ Join │ |
48 | | - └────────────┘ |
49 | | - ▲ |
50 | | - ┌───────┴────────┐ |
51 | | - ┌────────────┐ ┌────────────┐ |
52 | | - │ Scan A │ │ Join │ |
53 | | - └────────────┘ └────────────┘ |
54 | | - ▲ |
55 | | - ┌───────┴────────┐ |
56 | | - ┌────────────┐ ┌────────────┐ |
57 | | - │ Scan B │ │ Scan C │ |
58 | | - └────────────┘ └────────────┘ |
59 | | - |
60 | | -DataFusion Federation will identify the largest possible sub-plans that |
61 | | -can be executed by an external database: |
62 | | - |
63 | | - ┌────────────┐ Optimizer recognizes |
64 | | - │ Join │ that B and C are |
65 | | - └────────────┘ available in an |
66 | | - ▲ external database |
67 | | - ┌──────────────┴────────┐ |
68 | | - │ ┌ ─ ─ ─ ─ ─ ─ ┴ ─ ── ─ ─ ─ ─ ─┐ |
69 | | - ┌────────────┐ ┌────────────┐ │ |
70 | | - │ Scan A │ │ │ Join │ |
71 | | - └────────────┘ └────────────┘ │ |
72 | | - │ ▲ |
73 | | - ┌───────┴────────┐ │ |
74 | | - ┌────────────┐ ┌────────────┐ │ |
75 | | - ││ Scan B │ │ Scan C │ |
76 | | - └────────────┘ └────────────┘ │ |
77 | | - ─ ── ─ ─ ── ─ ─ ─ ─ ─ ─ ─ ── ─ ┘ |
78 | | - |
79 | | -The sub-plans are cut out and replaced by an opaque federation node in the plan: |
80 | | - |
81 | | - ┌────────────┐ |
82 | | - │ Join │ |
83 | | - └────────────┘ Rewritten Plan |
84 | | - ▲ |
85 | | - ┌────────┴───────────┐ |
86 | | - │ │ |
87 | | - ┌────────────┐ ┏━━━━━━━━━━━━━━━━━━┓ |
88 | | - │ Scan A │ ┃ Scan B+C ┃ |
89 | | - └────────────┘ ┃ (TableProvider ┃ |
90 | | - ┃ that can execute ┃ |
91 | | - ┃ sub-plan in an ┃ |
92 | | - ┃external database)┃ |
93 | | - ┗━━━━━━━━━━━━━━━━━━┛ |
94 | | - |
95 | | -Different databases may have different query languages and execution |
96 | | -capabilities. To accommodate this, we allow each 'federation provider' to |
97 | | -self-determine what part of a sub-plan it will actually federate. This is done |
98 | | -by letting each federation provider define its own optimizer rule. When a |
99 | | -sub-plan is 'cut out' of the overall plan, it is first passed to the federation |
100 | | -provider's optimizer rule. This optimizer rule determines the part of the plan |
101 | | -that is cut out, based on the execution capabilities of the database it |
102 | | -represents. |
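
The rewrite described above can be sketched with a toy plan tree. This is only an illustration under stated assumptions: the `Plan` enum, `sole_provider`, `federate`, and the provider label `"pg"` are hypothetical names invented for this sketch and are not the crate's actual types, which operate on DataFusion logical plans.

```rust
// Toy model of the federation rewrite; NOT the crate's actual types.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Table scan. `provider` names the (hypothetical) remote database that
    /// owns the table, or is `None` for a local table.
    Scan { table: String, provider: Option<String> },
    Join(Box<Plan>, Box<Plan>),
    /// Opaque node standing in for a sub-plan that is executed remotely.
    Federated { provider: String, sub_plan: Box<Plan> },
}

/// Returns the single provider that can run the whole sub-plan, if there is one.
fn sole_provider(plan: &Plan) -> Option<String> {
    match plan {
        Plan::Scan { provider, .. } => provider.clone(),
        Plan::Join(left, right) => {
            let (lp, rp) = (sole_provider(left)?, sole_provider(right)?);
            if lp == rp { Some(lp) } else { None }
        }
        // Already rewritten; do not wrap it again.
        Plan::Federated { .. } => None,
    }
}

/// Replaces each largest sub-plan owned by one provider with a `Federated` node.
fn federate(plan: Plan) -> Plan {
    if let Some(provider) = sole_provider(&plan) {
        return Plan::Federated { provider, sub_plan: Box::new(plan) };
    }
    match plan {
        Plan::Join(left, right) => {
            Plan::Join(Box::new(federate(*left)), Box::new(federate(*right)))
        }
        other => other,
    }
}

/// Small helper to build a scan node.
fn scan(table: &str, provider: Option<&str>) -> Plan {
    Plan::Scan { table: table.to_string(), provider: provider.map(str::to_string) }
}

fn main() {
    // Join(Scan A, Join(Scan B, Scan C)), where B and C live in the same database.
    let plan = Plan::Join(
        Box::new(scan("A", None)),
        Box::new(Plan::Join(
            Box::new(scan("B", Some("pg"))),
            Box::new(scan("C", Some("pg"))),
        )),
    );
    println!("{:#?}", federate(plan));
}
```

In this sketch the top-level join stays local because `Scan A` has no provider, while the inner join over B and C collapses into a single `Federated` node, mirroring the diagrams above.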
103 | | - |
104 | | -## Implementation |
105 | | - |
106 | | -A remote database is represented by the `FederationProvider` trait. To identify |
107 | | -table scans that are available in the same database, table sources implement the |
108 | | -`FederatedTableSource` trait. This trait allows lookup of the corresponding |
109 | | -`FederationProvider`. |
110 | | - |
111 | | -Identifying sub-plans to federate is done by the `FederationOptimizerRule`. |
112 | | -This rule needs to be registered in your DataFusion `SessionState`. One easy way |
113 | | -to do this is using `default_session_state`. To do its job, the |
114 | | -`FederationOptimizerRule` currently requires that all `TableProvider`s that need |
115 | | -to be federated are `FederatedTableProviderAdaptor`s. The |
116 | | -`FederatedTableProviderAdaptor` also has a fallback mechanism that allows |
117 | | -implementations to fall back to a 'vanilla' `TableProvider` in case the |
118 | | -`FederationOptimizerRule` isn't registered. |
119 | | - |
120 | | -The `FederationProvider` can provide a `compute_context`. This allows it to |
121 | | -differentiate between multiple remote execution contexts of the same type, for |
122 | | -example two different MySQL instances, database schemas, access levels, etc. The |
123 | | -`FederationProvider` also returns the `Optimizer` that allows it to |
124 | | -self-determine what part of a sub-plan it can federate. |
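
As a rough illustration of why a `compute_context` is useful, the sketch below compares two providers of the same type. The `Provider` trait, the `MySql` struct, the `same_engine` helper, and the connection strings are simplified stand-ins assumed for this example; the real `FederationProvider` trait has more methods, and its exact signatures should be checked in the crate documentation.

```rust
/// Simplified stand-in for a federation provider (hypothetical, for illustration).
trait Provider {
    /// Kind of engine, e.g. "mysql".
    fn name(&self) -> &str;
    /// Distinguishes execution contexts of the same kind (instance, schema, ...).
    fn compute_context(&self) -> Option<String>;
}

/// Hypothetical provider identified by its connection string.
struct MySql {
    dsn: String,
}

impl Provider for MySql {
    fn name(&self) -> &str {
        "mysql"
    }

    fn compute_context(&self) -> Option<String> {
        Some(self.dsn.clone())
    }
}

/// Two scans may share one federated sub-plan only if both the engine kind
/// and the compute context match.
fn same_engine(a: &dyn Provider, b: &dyn Provider) -> bool {
    a.name() == b.name() && a.compute_context() == b.compute_context()
}

fn main() {
    let orders = MySql { dsn: "mysql://host-a/shop".to_string() };
    let users = MySql { dsn: "mysql://host-a/shop".to_string() };
    let audit = MySql { dsn: "mysql://host-b/audit".to_string() };

    println!("orders + users: {}", same_engine(&orders, &users));
    println!("orders + audit: {}", same_engine(&orders, &audit));
}
```

Without the context, both MySQL instances would look identical by name alone, and a join across them could wrongly be pushed down to a single server.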
125 | | - |
126 | | -The `sql` module implements a generic `FederationProvider` for SQL execution |
127 | | -engines. A specific SQL engine implements the `SQLExecutor` trait for its |
128 | | -engine-specific execution. There are a number of compatible providers available |
129 | | -in |
130 | | -[datafusion-contrib/datafusion-table-providers](https://github.com/datafusion-contrib/datafusion-table-providers/). |
131 | | - |
132 | | -## Status |
133 | | - |
134 | | -The project is in alpha status. Contributions welcome; land a PR = commit |
135 | | -access. |
136 | | - |
137 | | -- [Docs (release)](https://docs.rs/datafusion-federation) |
138 | | -- [Docs (main)](https://datafusion-contrib.github.io/datafusion-federation/) |
| 1 | +# DataFusion Flight SQL Server |
| 2 | + |
| 3 | +The `datafusion-flight-sql-server` is a Flight SQL server that implements the |
| 4 | +necessary endpoints to use DataFusion as the query engine. |
| 5 | + |
| 6 | +## Getting Started |
| 7 | + |
| 8 | +To use `datafusion-flight-sql-server` in your Rust project, run: |
| 9 | + |
| 10 | +```sh |
| 11 | +$ cargo add datafusion-flight-sql-server |
| 12 | +``` |
| 13 | + |
| 14 | +## Example |
| 15 | + |
| 16 | +Here's a basic example of setting up a Flight SQL server: |
| 17 | + |
| 18 | +```rust |
| 19 | +use datafusion_flight_sql_server::service::FlightSqlService; |
| 20 | +use datafusion::{ |
| 21 | +    execution::{ |
| 22 | +        context::SessionContext, |
| 23 | +        options::CsvReadOptions, |
| 24 | +    }, |
| 25 | +}; |
| 26 | + |
| 27 | +async { |
| 28 | +    // Bind address: `0.0.0.0` listens on all IPv4 interfaces. |
| 29 | +    let dsn: String = "0.0.0.0:50051".to_string(); |
| 30 | +    let remote_ctx = SessionContext::new(); |
| 31 | +    remote_ctx |
| 32 | +        .register_csv("test", "./examples/test.csv", CsvReadOptions::new()) |
| 33 | +        .await.expect("Register csv"); |
| 34 | +    // Start the Flight SQL service backed by this session's state. |
| 35 | +    FlightSqlService::new(remote_ctx.state()).serve(dsn) |
| 36 | +        .await |
| 37 | +        .expect("Run flight sql service"); |
| 38 | +}; |
| 39 | +``` |
| 40 | + |
| 41 | +This example sets up a Flight SQL server listening on `0.0.0.0:50051`. |
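
The choice of bind address matters: `0.0.0.0` accepts connections on every IPv4 interface, whereas `127.0.0.1` only accepts local ones. The standard-library sketch below classifies an address string before it is handed to the server. The `bind_scope` helper is a hypothetical name introduced here, and the sketch assumes the server binds the address exactly as given.

```rust
use std::net::{AddrParseError, SocketAddr};

/// Classifies a bind address as reachable from other hosts or local only.
fn bind_scope(addr: &str) -> Result<&'static str, AddrParseError> {
    let socket: SocketAddr = addr.parse()?;
    Ok(if socket.ip().is_unspecified() {
        "all interfaces"
    } else if socket.ip().is_loopback() {
        "loopback only"
    } else {
        "one specific interface"
    })
}

fn main() {
    for addr in ["0.0.0.0:50051", "127.0.0.1:50051"] {
        match bind_scope(addr) {
            Ok(scope) => println!("{addr} -> {scope}"),
            Err(err) => eprintln!("{addr} is not a valid socket address: {err}"),
        }
    }
}
```

For local experiments, binding to `127.0.0.1:50051` keeps the server off the network; `0.0.0.0:50051` is the usual choice inside a container or when remote clients must connect.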
| 42 | + |
| 43 | + |
| 44 | +## Acknowledgments |
| 45 | + |
| 46 | +This crate was originally developed as part of the |
| 47 | +[datafusion-federation](https://github.com/datafusion-contrib/datafusion-federation/) |
| 48 | +repository. |
| 49 | + |
| 50 | +For more details about the original repository, please visit |
| 51 | +[datafusion-federation](https://github.com/datafusion-contrib/datafusion-federation/). |
| 52 | + |