Skip to content
Frank McSherry edited this page Aug 1, 2015 · 13 revisions

The main feature timely dataflow provides, in addition to moving data along dataflow edges, is the indication of progress through the stream of data.

The system alerts each recipient of timestamped data once it can determine that some previously active timestamp will never be seen again. This allows the operator to react with final messages and actions, for example sending aggregates, flushing state, or releasing held resources. The same mechanisms also alert the user as output data emerge from the dataflow graph.

Language of progress

We first need to develop timely dataflow's progress vocabulary.

A timely dataflow graph consists of operators, whose inputs and outputs are connected by channels. Each operator input has one associated channel, but an operator output may have many channels leading from it: multiple other operators may want to see the data it produces. Each group of records transmitted in timely dataflow have an associated timestamp.

Clone this wiki locally