
Design and Implement EXPLAIN ANALYZE for Distributed Queries #7

@NGA-TRAN

Description

EXPLAIN ANALYZE is incredibly useful for understanding query performance, but there’s currently no defined behavior for distributed queries in DataFusion. This gives us the opportunity to define how it should work from the ground up.

Proposed approach:

  • In each DFRayProcessor, generate an instrumented plan as if it were running EXPLAIN ANALYZE locally.
  • While streaming results back across the network, attach the instrumented plan to the response as an opaque payload, so that the head node can collect the plans from all workers for final formatting.
  • Investigate the use of opaque fields in the Arrow Flight protocol to carry this metadata.
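To make the proposed flow concrete, here is a rough, library-free sketch of it. Everything in it is an assumption for illustration: `PlanNode`, `Message`, `worker_stream`, `collect`, and `explain_analyze` are hypothetical names, the operator names are placeholders, JSON stands in for whatever serialization ends up being used, and the only metric tracked is `output_rows` (timings are left out so the output stays deterministic). The `app_metadata` field is modeled loosely on the field of the same name in Arrow Flight's `FlightData` message, which is one candidate for the opaque payload; whether it is the right carrier is still the open question in the last bullet.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class PlanNode:
    """Hypothetical stand-in for one operator of an instrumented plan."""
    name: str
    output_rows: int
    children: List["PlanNode"] = field(default_factory=list)

    def to_dict(self) -> dict:
        return {
            "name": self.name,
            "output_rows": self.output_rows,
            "children": [c.to_dict() for c in self.children],
        }

    @staticmethod
    def from_dict(d: dict) -> "PlanNode":
        children = [PlanNode.from_dict(c) for c in d["children"]]
        return PlanNode(d["name"], d["output_rows"], children)


@dataclass
class Message:
    """Stand-in for one streamed response: row data plus optional opaque bytes."""
    data: List[int]
    app_metadata: Optional[bytes] = None


def worker_stream(worker_id: str, batches: List[List[int]]) -> Iterator[Message]:
    """Worker side: stream batches, then one trailing message with the plan."""
    rows = 0
    for batch in batches:
        rows += len(batch)
        yield Message(data=batch)
    # Metrics are only complete once execution has finished, so the
    # instrumented plan travels in a final, data-free message.
    plan = PlanNode("ProjectionExec", rows, [PlanNode("ScanExec", rows)])
    payload = json.dumps({"worker": worker_id, "plan": plan.to_dict()})
    yield Message(data=[], app_metadata=payload.encode("utf-8"))


def collect(streams: List[Iterator[Message]]) -> Tuple[List[int], Dict[str, PlanNode]]:
    """Head node side: gather rows and decode any opaque plan payloads."""
    rows: List[int] = []
    plans: Dict[str, PlanNode] = {}
    for stream in streams:
        for msg in stream:
            rows.extend(msg.data)
            if msg.app_metadata is not None:
                decoded = json.loads(msg.app_metadata.decode("utf-8"))
                plans[decoded["worker"]] = PlanNode.from_dict(decoded["plan"])
    return rows, plans


def format_plan(node: PlanNode, indent: int = 0) -> List[str]:
    """Render one plan tree as indented lines."""
    lines = ["  " * indent + f"{node.name}: output_rows={node.output_rows}"]
    for child in node.children:
        lines.extend(format_plan(child, indent + 1))
    return lines


def explain_analyze(plans: Dict[str, PlanNode]) -> str:
    """Final formatting at the head node, one section per worker."""
    out: List[str] = []
    for worker in sorted(plans):
        out.append(f"worker {worker}:")
        out.extend(format_plan(plans[worker], indent=1))
    return "\n".join(out)


streams = [
    worker_stream("w0", [[1, 2], [3]]),
    worker_stream("w1", [[4, 5, 6, 7]]),
]
rows, plans = collect(streams)
print(explain_analyze(plans))
```

Sending the plan in a trailing message is a design choice in this sketch, not something the proposal fixes: it avoids holding back result batches while still letting the payload reflect final metrics. The head node treats the payload as bytes until the formatting step, which is what keeps it opaque to the transport.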

This will give developers deep insight into execution performance across stages and workers in a distributed setup.
