|
| 1 | +# Localhost workers example |
| 2 | + |
| 3 | +This example executes a SQL query in a distributed context. |
| 4 | + |
| 5 | +For this example to work, it's necessary to spawn some localhost workers with the `localhost_worker.rs` example: |
| 6 | + |
| 7 | +## Preparation |
| 8 | + |
| 9 | +This example queries a couple of test parquet we have for integration tests, and those files are stored using `git lfs`, |
| 10 | +so pulling the first is necessary. |
| 11 | + |
| 12 | +```shell |
| 13 | +git lfs checkout |
| 14 | +``` |
| 15 | + |
| 16 | +### Spawning the workers |
| 17 | + |
| 18 | +In two different terminals spawn two ArrowFlightEndpoints |
| 19 | + |
| 20 | +```shell |
| 21 | +cargo run --example localhost_worker -- 8080 --cluster-ports 8080,8081 |
| 22 | +``` |
| 23 | + |
| 24 | +```shell |
| 25 | +cargo run --example localhost_worker -- 8081 --cluster-ports 8080,8081 |
| 26 | +``` |
| 27 | + |
| 28 | +- The positional numeric argument is the port in which each Arrow Flight endpoint will listen |
| 29 | +- The `--cluster-ports` parameter tells the Arrow Flight endpoint all the available localhost workers in the cluster |
| 30 | + |
| 31 | +### Issuing a distributed SQL query |
| 32 | + |
| 33 | +Now, DataFusion queries can be issued using these workers as part of the cluster. |
| 34 | + |
| 35 | +```shell |
| 36 | +cargo run --example localhost_run -- 'SELECT count(*), "MinTemp" FROM weather GROUP BY "MinTemp"' --cluster-ports 8080,8081 |
| 37 | +``` |
| 38 | + |
| 39 | +The head stage will be executed locally in the same process as that `cargo run` command, but further stages will be |
| 40 | +delegated to the workers running on ports 8080 and 8081. |
| 41 | + |
| 42 | +Additionally, the `--explain` flag can be passed to render the distributed plan: |
| 43 | + |
| 44 | +```shell |
| 45 | +cargo run --example localhost_run -- 'SELECT count(*), "MinTemp" FROM weather GROUP BY "MinTemp"' --cluster-ports 8080,8081 --explain |
| 46 | +``` |
| 47 | + |
| 48 | +### Available tables |
| 49 | + |
| 50 | +Two tables are available in this example: |
| 51 | + |
| 52 | +- `flights_1m`: Flight data with 1m rows |
| 53 | +- `weather`: Small dataset of weather data |
0 commit comments