Benchmarking infrastructure with a proper setup, for example one based on AWS EC2, could prove valuable.
It would be valuable to have a CLI that can instantiate distributed DataFusion servers on a collection of remote workers.
It should be able to:
- install fixtures on each server (e.g., tell each server to download the TPC-H Parquet tables)
- issue SQL to a server for distributed execution
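The two CLI capabilities above amount to a small command protocol between the CLI and each worker. The sketch below is a minimal illustration under stated assumptions: `WorkerCommand`, its variants, and the tab-separated line format are all hypothetical and not part of DataFusion or any existing tool. A real implementation would more likely use gRPC or Arrow Flight.

```rust
/// Hypothetical commands the benchmarking CLI could send to each worker.
/// Names, fields, and the wire format are illustrative assumptions only.
#[derive(Debug, Clone, PartialEq)]
pub enum WorkerCommand {
    /// Download a fixture (e.g. a TPC-H Parquet table) into the worker's data directory.
    InstallFixture { name: String, url: String },
    /// Run a SQL query with distributed execution, optionally streaming rows back.
    ExecuteSql { sql: String, stream_results: bool },
}

impl WorkerCommand {
    /// Encode as one tab-separated line (a toy format for this sketch; it assumes
    /// names and URLs contain no tabs and the SQL contains no newlines).
    pub fn encode(&self) -> String {
        match self {
            WorkerCommand::InstallFixture { name, url } => format!("INSTALL\t{name}\t{url}"),
            WorkerCommand::ExecuteSql { sql, stream_results } => {
                format!("SQL\t{stream_results}\t{sql}")
            }
        }
    }

    /// Decode a line produced by `encode`; returns `None` on malformed input.
    pub fn decode(line: &str) -> Option<Self> {
        // At most 3 fields, so the trailing SQL text may itself contain tabs.
        let mut parts = line.splitn(3, '\t');
        match parts.next()? {
            "INSTALL" => Some(WorkerCommand::InstallFixture {
                name: parts.next()?.to_string(),
                url: parts.next()?.to_string(),
            }),
            "SQL" => Some(WorkerCommand::ExecuteSql {
                // Fields are evaluated in source order: the flag, then the SQL.
                stream_results: parts.next()?.parse().ok()?,
                sql: parts.next()?.to_string(),
            }),
            _ => None,
        }
    }
}
```

Keeping the command set as a closed enum means the CLI and workers agree on exactly which operations exist, which makes it easy to add new ones (such as a fixture cleanup step) later.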
Each server should be able to:
- recognize fixtures in a directory as tables
- execute distributed queries, measure metrics, and report these metrics back
- optionally stream the results back to the user
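On the server side, two of the capabilities above can be sketched with the standard library alone: discovering fixtures in a directory and timing a query. This is a minimal sketch, assuming fixtures are `*.parquet` files whose file stem becomes the table name; `discover_tables`, `QueryMetrics`, and `measure` are hypothetical names. In DataFusion, each discovered file could then be registered, for example with `SessionContext::register_parquet`, and the closure passed to `measure` stands in for actual distributed execution.

```rust
use std::collections::BTreeMap;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{Duration, Instant};

/// Map every `*.parquet` file in `dir` to a table name derived from its
/// file stem, e.g. `lineitem.parquet` -> table `lineitem`.
pub fn discover_tables(dir: &Path) -> io::Result<BTreeMap<String, PathBuf>> {
    let mut tables = BTreeMap::new();
    for entry in std::fs::read_dir(dir)? {
        let path = entry?.path();
        let is_parquet = path.extension().map_or(false, |ext| ext == "parquet");
        if !path.is_file() || !is_parquet {
            continue; // skip subdirectories and non-Parquet files
        }
        // Skip files whose names are not valid UTF-8.
        let name = match path.file_stem().and_then(|s| s.to_str()) {
            Some(stem) => stem.to_string(),
            None => continue,
        };
        tables.insert(name, path);
    }
    Ok(tables)
}

/// Hypothetical per-query metrics a worker could report back to the CLI.
#[derive(Debug, Clone, PartialEq)]
pub struct QueryMetrics {
    pub sql: String,
    pub rows: usize,
    pub elapsed: Duration,
}

/// Time a query-execution closure (which returns the row count) and
/// package the outcome as metrics.
pub fn measure<F: FnOnce() -> usize>(sql: &str, run: F) -> QueryMetrics {
    let start = Instant::now();
    let rows = run();
    QueryMetrics { sql: sql.to_string(), rows, elapsed: start.elapsed() }
}
```

A `BTreeMap` keeps table names sorted, so every worker registers the same fixtures in the same order, which keeps benchmark runs comparable across machines.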
Aside: it would be awesome to extend datafusion-cli to support distributed execution, but this would require a lot more effort and may not be worth it.
We can borrow many ideas from:
- https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachprod
- https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest
These tools do exactly what we need in this issue.