Add localhost_run.rs and localhost_worker.rs examples
#111
NGA-TRAN left a comment
Super cool. I have tested them on the branch. Thanks, Gabriel.
```shell
cargo run --example localhost_run -- 'SELECT count(*), "MinTemp" FROM weather GROUP BY "MinTemp"' --cluster-ports 8080,8081 --explain
```
These commands are so cool. For near-future work, do you think we are ready to start supporting the distributed-datafusion-cli described in #4?
Maybe we could add a new distributed-datafusion-cli folder, similar to datafusion-cli, to support this?
🤔 I'm not sure what value distributed-datafusion-cli would bring on top of the normal datafusion-cli. As this is just a library for distributing queries, the concept of a CLI becomes less relevant in this context.
If people want to use a CLI anyway, hopefully we can just reuse the normal datafusion-cli rather than building our own thing.
Reusing datafusion-cli is a good option, as long as we provide a sensible default (e.g., 3 workers) and an easy way to customize the distributed settings.
    cluster_ports: Vec<u16>,

    #[structopt(long)]
    explain: bool,
Nice
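For readers without the full diff open, here is a minimal std-only sketch of what the `--cluster-ports` and `--explain` flags amount to. The PR itself derives its arguments with structopt; the `Args` struct layout, the `query` field, and the `parse_args` helper below are hypothetical stand-ins written for illustration, not the example's actual code.

```rust
/// Hypothetical stand-in for the structopt-derived arguments (not the PR's code).
#[derive(Debug, Default, PartialEq)]
struct Args {
    /// Positional SQL query (assumed name).
    query: String,
    /// Ports of the workers, given as a comma-separated list.
    cluster_ports: Vec<u16>,
    /// Whether to print the plan instead of the query results.
    explain: bool,
}

/// Parses arguments such as:
///   'SELECT 1' --cluster-ports 8080,8081 --explain
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Args, String> {
    let mut parsed = Args::default();
    let mut iter = args.into_iter();
    while let Some(arg) = iter.next() {
        match arg.as_str() {
            "--explain" => parsed.explain = true,
            "--cluster-ports" => {
                let value = iter.next().ok_or("--cluster-ports needs a value")?;
                // Split "8080,8081" and parse every piece as a u16 port.
                parsed.cluster_ports = value
                    .split(',')
                    .map(|p| {
                        p.trim()
                            .parse::<u16>()
                            .map_err(|e| format!("bad port {p:?}: {e}"))
                    })
                    .collect::<Result<_, _>>()?;
            }
            other if other.starts_with("--") => {
                return Err(format!("unknown flag {other}"));
            }
            // Anything else is treated as the positional query.
            other => parsed.query = other.to_string(),
        }
    }
    Ok(parsed)
}

fn main() {
    let args = parse_args(std::env::args().skip(1)).expect("invalid arguments");
    println!("{args:?}");
}
```

A comma-separated list keeps the worker set on one flag, which matches how the command in this thread passes `8080,8081`.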
examples/localhost.md
Outdated
    The head stage will be executed locally in the same process as that `cargo run` command, but further stages will be
    delegated to the workers running on ports 8080 and 8081.
Can you define "head stage"? As written, it leaves me wondering which part of the plan runs in the head stage.
If you visualize a plan as a tree of stages, the "head" or "root" stage is the top-level one, the first one counting from top to bottom. Added a clarifying comment.
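To make the "tree of stages" picture concrete, here is a small sketch under stated assumptions. The `Stage` and `Placement` types and the `place` function are hypothetical and are not the library's data structures; the round-robin assignment of non-head stages to worker ports is also an assumption made for illustration, since the thread only says that further stages are delegated to the workers.

```rust
use std::collections::VecDeque;

/// Hypothetical stage node: a plan is modeled as a tree of stages.
struct Stage {
    name: &'static str,
    children: Vec<Stage>,
}

/// Where a stage runs in this sketch.
#[derive(Debug, PartialEq)]
enum Placement {
    /// In the same process that issued the query.
    Local,
    /// On the worker listening on this port.
    Worker(u16),
}

/// Walks the tree from top to bottom. The root ("head") stage stays local;
/// every other stage is assigned to a worker port in round-robin order
/// (assumed policy). With no ports, everything falls back to local.
fn place(root: &Stage, ports: &[u16]) -> Vec<(&'static str, Placement)> {
    let mut out = vec![(root.name, Placement::Local)];
    let mut queue: VecDeque<&Stage> = root.children.iter().collect();
    let mut next = 0;
    while let Some(stage) = queue.pop_front() {
        let placement = match ports.get(next % ports.len().max(1)) {
            Some(port) => Placement::Worker(*port),
            None => Placement::Local,
        };
        next += 1;
        out.push((stage.name, placement));
        queue.extend(stage.children.iter());
    }
    out
}

fn main() {
    let plan = Stage {
        name: "head",
        children: vec![Stage {
            name: "stage 1",
            children: vec![Stage { name: "stage 2", children: vec![] }],
        }],
    };
    for (name, placement) in place(&plan, &[8080, 8081]) {
        println!("{name}: {placement:?}");
    }
}
```

In this model only the first entry is ever `Local` when worker ports are supplied, which is the property the "head stage" wording describes.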
    @@ -0,0 +1,53 @@
    # Localhost workers example
I checked out your branch, ran the commands below, and got both the query results and the explain output back. Super cool!
Adds a basic example for running distributed queries on top of a couple of Parquet files, the same ones we use for tests.