Distributed CLI / Benchmarking Tool #214

@jayshrivastava

Benchmarking infrastructure with a proper setup, based on AWS EC2 for example, could prove valuable.

It would be valuable to have a CLI that can instantiate distributed DataFusion servers on a collection of remote workers.

It should be able to

  • install fixtures on each server (e.g. tell each server to download the TPC-H Parquet tables)
  • issue SQL to a server for distributed execution
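
To make the two CLI capabilities above concrete, here is a minimal sketch of what the command surface could look like. Everything in it is an assumption for illustration: the tool name `dfbench`, the `install-fixtures` and `query` subcommands, and every flag are hypothetical and not part of any existing DataFusion tooling. It only parses arguments; it does not talk to any worker.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical CLI with one subcommand per capability listed above."""
    parser = argparse.ArgumentParser(prog="dfbench")  # hypothetical tool name
    parser.add_argument(
        "--workers",
        required=True,
        help="comma-separated host:port list of remote workers",
    )
    sub = parser.add_subparsers(dest="command", required=True)

    # Hypothetical subcommand: ask every worker to download a dataset.
    fixtures = sub.add_parser("install-fixtures")
    fixtures.add_argument("--dataset", default="tpch")
    fixtures.add_argument("--scale-factor", type=int, default=1)

    # Hypothetical subcommand: send SQL to one worker for distributed execution.
    query = sub.add_parser("query")
    query.add_argument("sql")
    query.add_argument("--stream-results", action="store_true")
    return parser


def parse_workers(value: str) -> list[str]:
    """Split the --workers value into a list of non-empty host:port strings."""
    return [w.strip() for w in value.split(",") if w.strip()]
```

With this shape, an invocation might look like `dfbench --workers 10.0.0.1:8080,10.0.0.2:8080 query "SELECT 1" --stream-results`; the addresses are placeholders.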

Each server should be able to

  • recognize fixtures from a directory as tables
  • execute distributed queries, measure metrics, and report these metrics back
  • optionally stream the results back to the user
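
As a rough illustration of the server-side points above, the sketch below shows one way a worker could map a fixture directory to table names and time a query. The directory layout (`<dir>/<table>.parquet` or `<dir>/<table>/*.parquet`), the function names `discover_tables` and `timed`, and the shape of the returned metrics are all assumptions; the real server would register the files with DataFusion and collect richer metrics than wall-clock time.

```python
import time
from pathlib import Path


def discover_tables(fixture_dir: str) -> dict[str, list[str]]:
    """Map table name -> sorted Parquet file paths found under fixture_dir.

    Assumed layout: either <dir>/<table>.parquet or <dir>/<table>/*.parquet.
    """
    tables: dict[str, list[str]] = {}
    for entry in sorted(Path(fixture_dir).iterdir()):
        if entry.is_file() and entry.suffix == ".parquet":
            # Single-file table: the file stem is the table name.
            tables[entry.stem] = [str(entry)]
        elif entry.is_dir():
            # Partitioned table: the directory name is the table name.
            files = sorted(str(p) for p in entry.glob("*.parquet"))
            if files:
                tables[entry.name] = files
    return tables


def timed(run_query, sql: str) -> dict:
    """Run a caller-supplied query function and report its wall-clock time."""
    start = time.perf_counter()
    result = run_query(sql)
    return {"sql": sql, "elapsed_s": time.perf_counter() - start, "result": result}
```

Here `run_query` stands in for whatever actually executes the distributed plan; passing it in keeps the timing wrapper independent of the execution engine.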

Aside: It would be awesome to extend datafusion-cli to be distributed, but this would require a lot more effort and may not be worth it.

We can borrow many ideas from
