Propagate execution metrics and user-provided trailing data back to the head of the plan

Each physical node in DataFusion is capable of gathering execution metrics that are only fully known after all work has finished.

In normal DataFusion, these metrics are available just by inspecting the nodes themselves. In distributed DataFusion, however, only the nodes belonging to the head stage have execution metrics: all other nodes are executed on different machines, so their metrics are not known to the head of the plan, which is executed locally.

The gRPC specification has a notion of "trailing" metadata: metadata that, instead of being sent in the headers of the response (before the body), is sent in the trailers (after the body). We should probably use it to thread the execution metrics back and assign them to the appropriate nodes in the head stage, which runs on the user's machine.

In addition, users might want to propagate arbitrary metadata once all the results have been fully streamed.
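To make the idea above more concrete, here is a minimal, std-only sketch of how the head could consume a worker's stream whose last frame carries both execution metrics and user-provided metadata. Everything in it is an assumption made for illustration: the `Frame`, `Trailer`, `NodeMetrics` and `HeadState` types, the `usize` node identifier, and the choice to sum metrics when several workers report the same node. It does not use the real gRPC trailers or DataFusion's `MetricsSet`; it only models the "data first, trailing data last" shape.

```rust
use std::collections::HashMap;

/// Hypothetical per-node metrics (a stand-in for DataFusion's real metrics types).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct NodeMetrics {
    pub output_rows: u64,
    pub elapsed_compute_ns: u64,
}

/// Hypothetical trailing payload, sent only after all batches were streamed.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Trailer {
    /// Metrics keyed by an assumed plan-wide node identifier.
    pub metrics: HashMap<usize, NodeMetrics>,
    /// Arbitrary user-provided key/value pairs.
    pub user_metadata: HashMap<String, String>,
}

/// One frame of a worker's response stream.
#[derive(Debug, Clone, PartialEq)]
pub enum Frame {
    /// Opaque bytes standing in for an encoded record batch.
    Batch(Vec<u8>),
    /// Trailing data that closes the stream.
    Trailer(Trailer),
}

/// State kept by the head of the plan while it drains worker streams.
#[derive(Debug, Default)]
pub struct HeadState {
    pub batches: Vec<Vec<u8>>,
    pub metrics: HashMap<usize, NodeMetrics>,
    pub user_metadata: HashMap<String, String>,
}

impl HeadState {
    /// Drains one worker stream, collecting batches and merging trailing data.
    pub fn consume(&mut self, stream: impl IntoIterator<Item = Frame>) {
        for frame in stream {
            match frame {
                Frame::Batch(bytes) => self.batches.push(bytes),
                Frame::Trailer(trailer) => {
                    // Assumption: metrics for the same node reported by
                    // different workers are summed.
                    for (node_id, m) in trailer.metrics {
                        let entry = self.metrics.entry(node_id).or_default();
                        entry.output_rows += m.output_rows;
                        entry.elapsed_compute_ns += m.elapsed_compute_ns;
                    }
                    // Later workers overwrite earlier values for the same key.
                    self.user_metadata.extend(trailer.user_metadata);
                }
            }
        }
    }
}
```

In this sketch the head would call `consume` once per worker stream, then look up `metrics` by node identifier to attach the values to the matching nodes of the locally held plan. How nodes are actually identified across stages is left open here.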

Propagate execution metrics and user-provided trailing data back to the head of the plan #94

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Propagate execution metrics and user-provided trailing data back to the head of the plan #94

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions