@jayshrivastava
Collaborator

@jayshrivastava jayshrivastava commented Nov 2, 2025

This change relates to #214. It adds a new package to this crate called `datafusion-distributed-controller`, which will contain control-plane tooling for running a distributed DataFusion cluster. This utility will be used for large-scale integration tests and benchmarking.

The proposed architecture is:

  • a local CLI "controller" that interfaces with agents running on workers
  • agents that can run code on their machine (e.g. download data from a blob store, start an Arrow Flight server, execute SQL on the cluster)

This change adds an agent. The controller will be added in a future commit. Usually an agent listens on a port and is driven via RPC, but since this utility is just for testing, this change proposes that the agent be driven via a CLI instead.
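To make the CLI-driven agent idea concrete, here is a minimal sketch of how an agent binary might map command-line arguments to the actions listed above. The subcommand names (`download`, `serve`, `sql`), the `AgentCommand` enum, and `parse_command` are all hypothetical and not taken from this PR; the sketch only parses and prints, and uses the standard library alone.

```rust
use std::fmt;

/// Hypothetical agent actions, mirroring the examples in the description.
#[derive(Debug, PartialEq)]
enum AgentCommand {
    /// Download data from a blob store to a local path.
    Download { source: String, dest: String },
    /// Start an Arrow Flight server on the given port.
    Serve { port: u16 },
    /// Execute a SQL statement on the cluster.
    Sql { query: String },
}

#[derive(Debug, PartialEq)]
enum ParseError {
    MissingCommand,
    UnknownCommand(String),
    MissingArgument(&'static str),
    InvalidPort(String),
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::MissingCommand => write!(f, "no command given"),
            ParseError::UnknownCommand(name) => write!(f, "unknown command: {}", name),
            ParseError::MissingArgument(label) => write!(f, "missing argument: {}", label),
            ParseError::InvalidPort(raw) => write!(f, "invalid port: {}", raw),
        }
    }
}

/// Parse the arguments that follow the binary name into an agent command.
fn parse_command(args: &[String]) -> Result<AgentCommand, ParseError> {
    let (name, rest) = args.split_first().ok_or(ParseError::MissingCommand)?;
    // Fetch the i-th positional argument or report which one is missing.
    let arg = |i: usize, label: &'static str| {
        rest.get(i).cloned().ok_or(ParseError::MissingArgument(label))
    };
    match name.as_str() {
        "download" => Ok(AgentCommand::Download {
            source: arg(0, "source")?,
            dest: arg(1, "dest")?,
        }),
        "serve" => {
            let raw = arg(0, "port")?;
            let port = raw.parse::<u16>().map_err(|_| ParseError::InvalidPort(raw))?;
            Ok(AgentCommand::Serve { port })
        }
        "sql" => Ok(AgentCommand::Sql { query: arg(0, "query")? }),
        other => Err(ParseError::UnknownCommand(other.to_string())),
    }
}

fn main() {
    let args: Vec<String> = std::env::args().skip(1).collect();
    match parse_command(&args) {
        // A real agent would dispatch to the action here; this sketch only prints.
        Ok(cmd) => println!("would run: {:?}", cmd),
        Err(err) => {
            eprintln!("error: {}", err);
            std::process::exit(2);
        }
    }
}
```

With this shape, a controller could drive a worker by invoking the agent binary remotely (for example over SSH) with one subcommand per step, rather than talking to a long-running RPC endpoint.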

@jayshrivastava jayshrivastava force-pushed the js/benchmarking branch 3 times, most recently from 80dae61 to d63b805 Compare November 3, 2025 02:43
@jayshrivastava jayshrivastava changed the title from "checkpoint" to "add controller package and agent CLI" Nov 3, 2025
@jayshrivastava jayshrivastava force-pushed the js/benchmarking branch 3 times, most recently from 6858376 to 542eaf2 Compare November 3, 2025 02:51
This package does not necessarily need to contain Rust code. We can
@jayshrivastava
Collaborator Author

Since the new plan is to implement the controller in the AWS CDK (TypeScript), we should refactor this so it's just `distributed-datafusion-server`, containing only the server CLI.
