Skip to content

Propagate execution metrics and user-provided trailing data back to the head of the plan #94

@gabotechs

Description

@gabotechs

Each physical node in DataFusion is capable of gathering execution metrics that are only fully known after all work has finished.

In normal DataFusion, this metrics are available just by inspecting the nodes themselves, but in distributed DataFusion, only the nodes belonging to the head stage will have execution metrics, as all others are executed in different machines, and therefore not known by the head of the plan that gets executed locally.

The gRPC specification has this notion of "trailing" data, which is metadata that instead of being part of the headers of the request (before the body), are part of the trailers of the request (after the body). We probably should use that for threading the execution metrics and assigning them to the appropriate nodes in the head stage run in the user's machine.

Additionally to this, users might want to be able to propagate arbitrary metadata once all the results are fully streamed.

Metadata

Metadata

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions