Skip to content

Conversation

@robtandy
Copy link
Collaborator

@robtandy robtandy commented Aug 5, 2025

Ported over the concepts from the previous version into this structure. The tests/distributed_execution.rs is now a failing test due to a proto serialization bug around the custom codec.

I don't have time to further look into it today. I would like this PR merged though if possible, and the fix for the failing test as a subsequent PR. This way it unblocks adding examples, tests, and fixes or reworks.

Brief summary of changes, docs and readme to follow in subsequent smaller PRs

  • ExecutionStage now has an assign method which will recursively add worker urls to all tasks in the tree. It is currently done for the entire tree when ExecutionStage::execute() is called, but it can be improved by lazily doing it before sending out a child stage to workers.
  • The DoGet proto consists of an ExecutionStageProto and the partition to execute.
  • PartitionIsolatorExec node was ported over to allow for using RepartitionExecs unchanged, only showing them their allowed partitions
  • CombinedRecordBatchStream ported over for scatter gather from child stages

Challenges:

  • I could not get the ChannelManager to work with the FlightClient. The high level flight client is a lot easier to use and it seems to want to take tonic Channels. Happy to have this fixed or refactored.
  • I believe the codec work is structured well, but it is not working. I did not find the bug, which is halting distributed execution.

@gabotechs gabotechs merged commit ccf36a1 into main Aug 5, 2025
2 of 3 checks passed
@gabotechs gabotechs deleted the robtandy/port_and_refactor_execution_code branch August 5, 2025 10:58
@gabotechs gabotechs mentioned this pull request Aug 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants