Runtime: Allow a step to execute only when all upstream steps have completed #850

@josephjclark

Description

Right now, when a step completes, the runtime checks whether it has any next (downstream) steps. If so, each downstream step is added to the queue to be executed immediately. If a step has multiple upstream edges, it will run multiple times (once after each upstream step completes).

Basically, if a node has multiple upstream edges, they are treated as logical ORs: each time an upstream step completes, the node is executed again.
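To make the current behaviour concrete, here is a minimal sketch of OR-style queueing. The `Plan` shape and `runOrder` function are hypothetical stand-ins for illustration and are not the runtime's actual API.

```typescript
// Hypothetical plan shape: each step lists the ids of its downstream steps.
type Plan = Record<string, { next?: string[] }>;

// OR semantics: every completed step immediately queues all of its
// downstream steps, with no check on whether they have already run.
function runOrder(plan: Plan, start: string): string[] {
  const queue: string[] = [start];
  const executed: string[] = [];
  while (queue.length > 0) {
    const id = queue.shift() as string;
    executed.push(id); // stand-in for actually executing the step
    for (const next of plan[id]?.next ?? []) {
      queue.push(next);
    }
  }
  return executed;
}

// Diamond-shaped workflow: a -> b -> d and a -> c -> d
const plan: Plan = {
  a: { next: ['b', 'c'] },
  b: { next: ['d'] },
  c: { next: ['d'] },
  d: {},
};

const order = runOrder(plan, 'a');
console.log(order); // [ 'a', 'b', 'c', 'd', 'd' ]
```

Step `d` appears twice because both `b` and `c` queue it on completion.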

This is often useful, but we also want to support a mode where a step will not execute until ALL upstream steps have completed. Like a logical AND.
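As a rough sketch of what an AND mode could look like, the queue loop can refuse to run a step until every direct upstream step has completed. Everything here (`Plan`, `parentsOf`, `runOrderAll`) is a hypothetical illustration, not the runtime's API, and it assumes every upstream step eventually runs; conditional edges and skipped branches are deliberately not handled.

```typescript
// Hypothetical plan shape: each step lists the ids of its downstream steps.
type Plan = Record<string, { next?: string[] }>;

// Find the direct upstream steps of a given step.
function parentsOf(plan: Plan, id: string): string[] {
  return Object.entries(plan)
    .filter(([, step]) => (step.next ?? []).includes(id))
    .map(([parentId]) => parentId);
}

// AND semantics: a step only runs once all direct upstream steps are done.
function runOrderAll(plan: Plan, start: string): string[] {
  const done = new Set<string>();
  const queue: string[] = [start];
  const executed: string[] = [];
  while (queue.length > 0) {
    const id = queue.shift() as string;
    if (done.has(id)) continue; // already ran once, never re-run
    const ready = parentsOf(plan, id).every((p) => done.has(p));
    // Not ready: drop it. The last upstream step to finish will queue it again.
    if (!ready) continue;
    executed.push(id); // stand-in for actually executing the step
    done.add(id);
    for (const next of plan[id]?.next ?? []) {
      queue.push(next);
    }
  }
  return executed;
}

// a -> b -> d and a -> c -> e -> d: d is first queued before e has run
const plan: Plan = {
  a: { next: ['b', 'c'] },
  b: { next: ['d'] },
  c: { next: ['e'] },
  e: { next: ['d'] },
  d: {},
};

const order = runOrderAll(plan, 'a');
console.log(order); // [ 'a', 'b', 'c', 'e', 'd' ]
```

Here `d` is queued by `b` while `e` is still pending, so it is dropped and only runs once `e` has completed and queued it again.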

See https://community.openfn.org/t/allow-a-step-to-run-only-when-all-upstream-ancestor-steps-have-run/738

Things to consider:

  • The runtime needs to be more aware of the hierarchy of steps. A step cannot be executed unless all upstream edges have been tested (or all upstream branches have been executed)
  • In other words, a step now has dependencies and cannot run until all of them have had a chance to run. Does this mean looking ahead in the queue to see if any upstream (including indirect upstream) steps are waiting and, if so, deferring the step to the back of the queue? I think so - but it may be more complex than this
  • Do we toggle this behaviour per edge, per node, or globally? Does it make sense that some branches are ORs and some are ANDs? I kind of hope not, because that's overcomplicated and hard to explain visually.
  • How to reconcile state. Three upstream steps will have three different state objects. What state does the downstream step receive? By default we should do a shallow first-to-last merge - just squash it all down. But we also need to allow a reconcile function that takes all state objects as arguments and returns a single state.
  • Don't get blocked if some upstream steps don't execute. The runtime needs to know if all upstream edges have had a chance to run, and when they've all been tried, we can run the downstream step.
  • In other words, if two upstream steps say "execute x" and one upstream step says "don't execute x", who wins? I'd suggest that as soon as any step allows step x to run, then step x MUST run. We just have to wait for any other ancestors to run first.
  • Remember that when referring to "upstream" steps, the upstream step may be indirect. Consider the whole branch.
  • Instead of a reconcile function, should we instead have a reconcile strategy, deep vs shallow? If deep, then we'll recursively traverse all state objects and arrays and merge them. Otherwise we just spread/assign keys at the top level.
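The state reconciliation options above could be sketched roughly as follows. All names (`State`, `Reconcile`, `shallowMerge`, `deepMerge`) are hypothetical, and the choice to concatenate arrays in the deep merge is an assumption: the issue does not say how arrays should be combined.

```typescript
type State = Record<string, unknown>;

// A reconcile function takes every upstream state and returns a single state.
type Reconcile = (...states: State[]) => State;

const isPlainObject = (v: unknown): v is State =>
  typeof v === 'object' && v !== null && !Array.isArray(v);

// Shallow first-to-last merge: later states overwrite earlier top-level keys.
const shallowMerge: Reconcile = (...states) => Object.assign({}, ...states);

// Deep merge of two states: recurse into plain objects, concatenate arrays
// (an assumption), and otherwise let the later value win.
function deepMergeTwo(a: State, b: State): State {
  const out: State = { ...a };
  for (const [key, value] of Object.entries(b)) {
    const prev = out[key];
    if (isPlainObject(prev) && isPlainObject(value)) {
      out[key] = deepMergeTwo(prev, value);
    } else if (Array.isArray(prev) && Array.isArray(value)) {
      out[key] = [...prev, ...value];
    } else {
      out[key] = value;
    }
  }
  return out;
}

const deepMerge: Reconcile = (...states) =>
  states.reduce((acc, s) => deepMergeTwo(acc, s), {} as State);

// A user-supplied reconcile function, e.g. collect each branch's data in a list.
const collectData: Reconcile = (...states) => ({
  data: states.map((s) => s['data']),
});

const fromB: State = { data: { x: 1 }, references: [1] };
const fromC: State = { data: { y: 2 }, references: [2] };

const shallow = shallowMerge(fromB, fromC);
const deep = deepMerge(fromB, fromC);
const custom = collectData(fromB, fromC);

console.log(shallow); // { data: { y: 2 }, references: [ 2 ] }
console.log(deep); // { data: { x: 1, y: 2 }, references: [ 1, 2 ] }
console.log(custom); // { data: [ { x: 1 }, { y: 2 } ] }
```

With the shallow merge the last upstream step's `data` replaces the earlier one wholesale, which is cheap and predictable but loses `x`. The deep strategy keeps both, at the cost of having to decide how arrays and conflicting scalar values are combined.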

Metadata

  • Assignees: No one assigned
  • Labels: No labels
  • Type: No type
  • Projects: Status: DevX Backlog
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests