Skip to content

Commit 75c0286

Browse files
committed
Migrate the code from the datafusion-federation repository.
`filter-repo` could be used instead of bringing all the commits from the federation repository, but since the flight-sql-server directory has been renamed a few times, it's hard to distinguish the commits, and we don't want to lose the original commits.
1 parent 55db1a6 commit 75c0286

File tree

26 files changed

+58
-4115
lines changed

26 files changed

+58
-4115
lines changed

.github/workflows/docs.yml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -23,10 +23,10 @@ jobs:
2323
- uses: arduino/setup-protoc@v3
2424
with:
2525
repo-token: ${{ secrets.GITHUB_TOKEN }}
26-
- run: cargo rustdoc -p datafusion-federation -- --cfg docsrs
26+
- run: cargo rustdoc -p datafusion-flight-sql-server -- --cfg docsrs
2727
- run: chmod -c -R +rX "target/doc"
2828
- run: touch target/doc/index.html
29-
- run: echo "<meta http-equiv=refresh content=0;url=datafusion_federation>" > target/doc/index.html
29+
- run: echo "<meta http-equiv=refresh content=0;url=datafusion_flight_sql_server>" > target/doc/index.html
3030
- if: github.event_name == 'push' && github.ref == 'refs/heads/main'
3131
uses: actions/upload-pages-artifact@v3
3232
with:

.github/workflows/test.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -70,4 +70,4 @@ jobs:
7070
with:
7171
repo-token: ${{ secrets.GITHUB_TOKEN }}
7272
- run: cargo build --all
73-
- run: cargo package -p datafusion-federation --allow-dirty
73+
- run: cargo package -p datafusion-flight-sql-server --allow-dirty

Cargo.toml

Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,6 @@
22
resolver = "2"
33

44
members = [
5-
"datafusion-federation",
65
"datafusion-flight-sql-server",
76
"datafusion-flight-sql-table-provider",
87
]
@@ -12,16 +11,15 @@ version = "0.3.5"
1211
edition = "2021"
1312
license = "Apache-2.0"
1413
readme = "README.md"
15-
repository = "https://github.com/datafusion-contrib/datafusion-federation"
14+
repository = "https://github.com/datafusion-contrib/datafusion-flight-sql-server"
1615

1716
[workspace.dependencies]
1817
arrow = "53.3"
1918
arrow-flight = { version = "53.3", features = ["flight-sql-experimental"] }
2019
arrow-json = "53.3"
21-
async-stream = "0.3.5"
2220
async-trait = "0.1.83"
2321
datafusion = "44.0.0"
24-
datafusion-federation = { path = "./datafusion-federation", version = "0.3.5" }
22+
datafusion-federation = { version = "0.3.5" }
2523
datafusion-substrait = "44.0.0"
2624
futures = "0.3.31"
2725
tokio = { version = "1.41", features = ["full"] }

README.md

Lines changed: 52 additions & 138 deletions
Original file line numberDiff line numberDiff line change
@@ -1,138 +1,52 @@
1-
# DataFusion Federation
2-
3-
[![crates.io](https://img.shields.io/crates/v/datafusion-federation.svg)](https://crates.io/crates/datafusion-federation)
4-
[![docs.rs](https://docs.rs/datafusion-federation/badge.svg)](https://docs.rs/datafusion-federation)
5-
6-
DataFusion Federation allows
7-
[DataFusion](https://github.com/apache/arrow-datafusion) to execute (part of) a
8-
query plan by a remote execution engine.
9-
10-
┌────────────────┐
11-
┌────────────┐ │ Remote DBMS(s) │
12-
SQL Query ───> │ DataFusion │ ───> │ ( execution │
13-
└────────────┘ │ happens here ) │
14-
└────────────────┘
15-
16-
The goal is to allow resolving queries across remote query engines while
17-
pushing down as much compute as possible to the remote database(s). This allows
18-
execution to happen as close to the storage as possible. This concept is
19-
referred to as 'query federation'.
20-
21-
> [!TIP]
22-
> This repository implements the federation framework itself. If you want to
23-
> connect to a specific database, check out the compatible providers available
24-
> in
25-
> [datafusion-contrib/datafusion-table-providers](https://github.com/datafusion-contrib/datafusion-table-providers/).
26-
27-
## Usage
28-
29-
Check out the [examples](./datafusion-federation/examples/) to get a feel for
30-
how it works.
31-
32-
For a complete step-by-step example of how federation works, you can check the
33-
example [here](./datafusion-federation/examples/df-csv-advanced.rs).
34-
35-
## Potential use-cases:
36-
37-
- Querying across SQLite, MySQL, PostgreSQL, ...
38-
- Pushing down SQL or [Substrait](https://substrait.io/) plans.
39-
- DataFusion -> Flight SQL -> DataFusion
40-
- ..
41-
42-
## Design concept
43-
44-
Say you have a query plan as follows:
45-
46-
┌────────────┐
47-
│ Join │
48-
└────────────┘
49-
50-
┌───────┴────────┐
51-
┌────────────┐ ┌────────────┐
52-
│ Scan A │ │ Join │
53-
└────────────┘ └────────────┘
54-
55-
┌───────┴────────┐
56-
┌────────────┐ ┌────────────┐
57-
│ Scan B │ │ Scan C │
58-
└────────────┘ └────────────┘
59-
60-
DataFusion Federation will identify the largest possible sub-plans that
61-
can be executed by an external database:
62-
63-
┌────────────┐ Optimizer recognizes
64-
│ Join │ that B and C are
65-
└────────────┘ available in an
66-
▲ external database
67-
┌──────────────┴────────┐
68-
│ ┌ ─ ─ ─ ─ ─ ─ ┴ ─ ── ─ ─ ─ ─ ─┐
69-
┌────────────┐ ┌────────────┐ │
70-
│ Scan A │ │ │ Join │
71-
└────────────┘ └────────────┘ │
72-
│ ▲
73-
┌───────┴────────┐ │
74-
┌────────────┐ ┌────────────┐ │
75-
││ Scan B │ │ Scan C │
76-
└────────────┘ └────────────┘ │
77-
─ ── ─ ─ ── ─ ─ ─ ─ ─ ─ ─ ── ─ ┘
78-
79-
The sub-plans are cut out and replaced by an opaque federation node in the plan:
80-
81-
┌────────────┐
82-
│ Join │
83-
└────────────┘ Rewritten Plan
84-
85-
┌────────┴───────────┐
86-
│ │
87-
┌────────────┐ ┏━━━━━━━━━━━━━━━━━━┓
88-
│ Scan A │ ┃ Scan B+C ┃
89-
└────────────┘ ┃ (TableProvider ┃
90-
┃ that can execute ┃
91-
┃ sub-plan in an ┃
92-
┃external database)┃
93-
┗━━━━━━━━━━━━━━━━━━┛
94-
95-
Different databases may have different query languages and execution
96-
capabilities. To accommodate for this, we allow each 'federation provider' to
97-
self-determine what part of a sub-plan it will actually federate. This is done
98-
by letting each federation provider define its own optimizer rule. When a
99-
sub-plan is 'cut out' of the overall plan, it is first passed the federation
100-
provider's optimizer rule. This optimizer rule determines the part of the plan
101-
that is cut out, based on the execution capabilities of the database it
102-
represents.
103-
104-
## Implementation
105-
106-
A remote database is represented by the `FederationProvider` trait. To identify
107-
table scans that are available in the same database, they implement
108-
`FederatedTableSource` trait. This trait allows lookup of the corresponding
109-
`FederationProvider`.
110-
111-
Identifying sub-plans to federate is done by the `FederationOptimizerRule`.
112-
This rule needs to be registered in your DataFusion SessionState. One easy way
113-
to do this is using `default_session_state`. To do its job, the
114-
`FederationOptimizerRule` currently requires that all TableProviders that need
115-
to be federated are `FederatedTableProviderAdaptor`s. The
116-
`FederatedTableProviderAdaptor` also has a fallback mechanism that allows
117-
implementations to fallback to a 'vanilla' TableProvider in case the
118-
`FederationOptimizerRule` isn't registered.
119-
120-
The `FederationProvider` can provide a `compute_context`. This allows it to
121-
differentiate between multiple remote execution context of the same type. For
122-
example two different mysql instances, database schemas, access level, etc. The
123-
`FederationProvider` also returns the `Optimizer` that is allows it to
124-
self-determine what part of a sub-plan it can federate.
125-
126-
The `sql` module implements a generic `FederationProvider` for SQL execution
127-
engines. A specific SQL engine implements the `SQLExecutor` trait for its
128-
engine specific execution. There are a number of compatible providers available
129-
in
130-
[datafusion-contrib/datafusion-table-providers](https://github.com/datafusion-contrib/datafusion-table-providers/).
131-
132-
## Status
133-
134-
The project is in alpha status. Contributions welcome; land a PR = commit
135-
access.
136-
137-
- [Docs (release)](https://docs.rs/datafusion-federation)
138-
- [Docs (main)](https://datafusion-contrib.github.io/datafusion-federation/)
1+
# DataFusion Flight SQL Server
2+
3+
The `datafusion-flight-sql-server` is a Flight SQL server that implements the
4+
necessary endpoints to use DataFusion as the query engine.
5+
6+
## Getting Started
7+
8+
To use `datafusion-flight-sql-server` in your Rust project, run:
9+
10+
```sh
11+
$ cargo add datafusion-flight-sql-server
12+
```
13+
14+
## Example
15+
16+
Here's a basic example of setting up a Flight SQL server:
17+
18+
```rust
19+
use datafusion_flight_sql_server::service::FlightSqlService;
20+
use datafusion::{
21+
execution::{
22+
context::SessionContext,
23+
options::CsvReadOptions,
24+
},
25+
};
26+
27+
async {
28+
let dsn: String = "0.0.0.0:50051".to_string();
29+
let remote_ctx = SessionContext::new();
30+
remote_ctx
31+
.register_csv("test", "./examples/test.csv", CsvReadOptions::new())
32+
.await.expect("Register csv");
33+
34+
FlightSqlService::new(remote_ctx.state()).serve(dsn.clone())
35+
.await
36+
.expect("Run flight sql service");
37+
38+
};
39+
```
40+
41+
This example sets up a Flight SQL server listening on `127.0.0.1:50051`.
42+
43+
44+
# Acknowledgments
45+
46+
This repository was a Rust crate that was first built as a part of
47+
[datafusion-federation](https://github.com/datafusion-contrib/datafusion-federation/)
48+
repository.
49+
50+
For more details about the original repository, please visit
51+
[datafusion-federation](https://github.com/datafusion-contrib/datafusion-federation/).
52+

datafusion-federation/CHANGELOG.md

Lines changed: 0 additions & 32 deletions
This file was deleted.

datafusion-federation/Cargo.toml

Lines changed: 0 additions & 43 deletions
This file was deleted.

datafusion-federation/examples/data/test.csv

Lines changed: 0 additions & 4 deletions
This file was deleted.

datafusion-federation/examples/data/test2.csv

Lines changed: 0 additions & 7 deletions
This file was deleted.

0 commit comments

Comments
 (0)