Distributed plans do not support further optimization passes #177

@gabotechs

Description

As mentioned in #163 (comment), this project leaves the distributed DataFusion plan in a state where it does not play well with other optimization rules in the DataFusion ecosystem.

The truth is that we are not playing by the DataFusion rules: after distributed planning, we pretty much render the execution plan useless for anything besides execution and display:

  • The plan becomes just a tree of stages, making it impossible to traverse the inside of a stage using only DataFusion tools.
  • Our execution plans cannot be rebuilt with new arbitrary children. In our nodes, the .with_new_children() call is either not supported or does not accept arbitrary plans.
  • Our nodes need to be prepared to take any arbitrary node as a child. Some crates wrap nodes in passthrough wrapper nodes, so downcasting children to specific types will fail.
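
To make the last two points concrete, here is a minimal, std-only sketch. It does not use DataFusion: `Plan` is a hypothetical, simplified stand-in for the `ExecutionPlan` trait, and `Scan`, `Stage`, `Traced`, `is_stage`, `count_nodes` and `wrap_all` are illustrative names, not anything from this project or from datafusion-tracing. It shows what a third-party rule that wraps every node needs from our nodes (children exposed, `with_new_children` accepting any child) and why a direct downcast stops matching once a wrapper is in place.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical, simplified stand-in for DataFusion's `ExecutionPlan` trait.
trait Plan {
    fn name(&self) -> String;
    fn as_any(&self) -> &dyn Any;
    fn children(&self) -> Vec<Arc<dyn Plan>>;
    fn with_new_children(
        self: Arc<Self>,
        children: Vec<Arc<dyn Plan>>,
    ) -> Result<Arc<dyn Plan>, String>;
}

// Leaf node with no children.
struct Scan;

impl Plan for Scan {
    fn name(&self) -> String { "Scan".to_string() }
    fn as_any(&self) -> &dyn Any { self }
    fn children(&self) -> Vec<Arc<dyn Plan>> { vec![] }
    fn with_new_children(
        self: Arc<Self>,
        children: Vec<Arc<dyn Plan>>,
    ) -> Result<Arc<dyn Plan>, String> {
        if children.is_empty() {
            Ok(self as Arc<dyn Plan>)
        } else {
            Err("Scan takes no children".to_string())
        }
    }
}

// Illustrative "distributed" node that exposes its child and accepts
// any plan as a replacement, without downcasting it.
struct Stage { child: Arc<dyn Plan> }

impl Plan for Stage {
    fn name(&self) -> String { "Stage".to_string() }
    fn as_any(&self) -> &dyn Any { self }
    fn children(&self) -> Vec<Arc<dyn Plan>> { vec![Arc::clone(&self.child)] }
    fn with_new_children(
        self: Arc<Self>,
        mut children: Vec<Arc<dyn Plan>>,
    ) -> Result<Arc<dyn Plan>, String> {
        if children.len() != 1 {
            return Err("Stage takes exactly one child".to_string());
        }
        Ok(Arc::new(Stage { child: children.remove(0) }) as Arc<dyn Plan>)
    }
}

// Passthrough wrapper, similar in spirit to what an instrumentation
// crate might insert around every node.
struct Traced { inner: Arc<dyn Plan> }

impl Plan for Traced {
    fn name(&self) -> String { format!("Traced({})", self.inner.name()) }
    fn as_any(&self) -> &dyn Any { self }
    fn children(&self) -> Vec<Arc<dyn Plan>> { vec![Arc::clone(&self.inner)] }
    fn with_new_children(
        self: Arc<Self>,
        mut children: Vec<Arc<dyn Plan>>,
    ) -> Result<Arc<dyn Plan>, String> {
        if children.len() != 1 {
            return Err("Traced takes exactly one child".to_string());
        }
        Ok(Arc::new(Traced { inner: children.remove(0) }) as Arc<dyn Plan>)
    }
}

// Fragile check: matches only when the node itself is a `Stage`.
fn is_stage(plan: &Arc<dyn Plan>) -> bool {
    plan.as_any().is::<Stage>()
}

// Generic traversal: works only because every node exposes its children.
fn count_nodes(plan: &Arc<dyn Plan>) -> usize {
    1 + plan.children().iter().map(count_nodes).sum::<usize>()
}

// What a wrapping rule does: rebuild each node over wrapped children,
// then wrap the node itself. Fails if any node rejects arbitrary children.
fn wrap_all(plan: Arc<dyn Plan>) -> Result<Arc<dyn Plan>, String> {
    let new_children = plan
        .children()
        .into_iter()
        .map(wrap_all)
        .collect::<Result<Vec<_>, _>>()?;
    let rebuilt = plan.with_new_children(new_children)?;
    Ok(Arc::new(Traced { inner: rebuilt }) as Arc<dyn Plan>)
}
```

In this sketch, building `Stage { child: Arc::new(Scan) }` and passing it through `wrap_all` succeeds only because `Stage::with_new_children` takes whatever child it is given. After wrapping, `is_stage` no longer matches the root, since the root is now a `Traced` node. That is the same failure mode as downcasting children to specific types.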

This prevents the project from working well with other crates like https://github.com/datafusion-contrib/datafusion-tracing.

Ideally, the produced distributed plan should allow traversals and operations just like any other non-distributed plan.
