
Concept


The Workflow Engine is the logic responsible for the interpretation and execution of YAML-based workflow definitions. The engine's repository has two distinct responsibilities: the definition of the workflow schema (a decoder) and the implementation of the workflow state machine (the workflow engine itself).

Original design requirements: -

A little like our Job Testing framework (Jote), it was considered desirable that: -

  1. The code would be external to the Data Manager (DM)
  2. The code would be testable without the need for Kubernetes

These requirements, although appearing to be of great benefit to developers of workflows, introduce significant design considerations.

Running Jobs outside of the DM is a relatively simple task - Jote creates a docker-compose file and launches a container image. But Jobs (Instances) do not have the complex operational state that workflows would undoubtedly have. A Job results in a single instance, easily handled by one docker-compose file, whereas it was clear from the outset that a Workflow would result in a (potentially) large number of instances, some running in parallel, introducing a complex state model that would need to be tracked in a separate (persistent) object - one that we would call a Running Workflow. Each running workflow could result in a number of Jobs (that we'd call Steps) that would also have to be tracked.
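To make the shape of that state a little more concrete, here is a minimal, purely illustrative sketch (the class and field names are assumptions made for this page, not the engine's actual records) of the kind of information a Running Workflow and its Steps need to carry:

```python
# A minimal illustration (not the engine's actual records) of the kind of
# persistent state a Running Workflow and its Steps might need to carry.
from dataclasses import dataclass, field
from enum import Enum


class StepStatus(Enum):
    """Hypothetical life-cycle states for a single Step (Job instance)."""
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class Step:
    """One Job instance launched on behalf of a running workflow."""
    name: str
    instance_id: str | None = None
    status: StepStatus = StepStatus.PENDING


@dataclass
class RunningWorkflow:
    """The persistent record tracking a single execution of a workflow."""
    workflow_name: str
    running_workflow_id: str
    steps: list[Step] = field(default_factory=list)
    done: bool = False
```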

Now that I write this section of the documentation I do wonder whether developing code that would run "inside" the DM, but outside of the DM environment, was the best approach. Since the original concept it has resulted in a significant amount of custom logic that is required to _emulate_ the ability to launch Jobs and manage a workflow-related database. We could have developed a workflow state machine entirely in a client application - one that would rely solely on the DM REST API in order to run Workflow Jobs (steps). And, with the simplicity of deploying Squonk2 to a local Kubernetes cluster (with Docker Desktop or Minikube) at least, during testing, we'd be executing much, much more of the real code.

But, to coin a phrase ... we are where we are.

External to the DM

As the code is to be independent of the DM repository, and in order to follow a sensible loosely coupled, highly cohesive design paradigm, the ability to launch (run) Job Instances and to access the Data Manager database would be encapsulated in two interfaces, two abstract base classes (a simplified sketch follows the list): -

  • The InstanceLauncher would provide an abstract interface that would allow the engine to run Jobs
  • The WorkflowAPI would provide an abstract interface that would allow access to and modification of workflow-related database records.
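The sketch below shows what the two declarations might look like. The class names come from the design above, but the method names and signatures are illustrative assumptions - the real declarations live in the workflow engine repository:

```python
# A simplified sketch of the two abstract interfaces. The class names are
# taken from the design; the methods shown here are illustrative only.
from abc import ABC, abstractmethod
from typing import Any


class InstanceLauncher(ABC):
    """Launches Job instances (workflow Steps) on behalf of the engine."""

    @abstractmethod
    def launch(self, running_workflow_id: str, step_name: str,
               specification: dict[str, Any]) -> str:
        """Launch a Job for the given step and return its instance ID."""


class WorkflowAPI(ABC):
    """Read/write access to the workflow-related database records."""

    @abstractmethod
    def get_running_workflow(self, running_workflow_id: str) -> dict[str, Any]:
        """Return the Running Workflow record."""

    @abstractmethod
    def set_step_status(self, running_workflow_id: str, step_name: str,
                        status: str) -> None:
        """Record a change of state for one of the workflow's Steps."""
```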

When running (imported) in the DM, the DM would provide its own concrete implementations of these interfaces as it creates instances of the WorkflowEngine class. Under test, the workflow engine repository would provide its own (mock) implementations.

Testing without Kubernetes

This is a significant challenge - the ability to run workflows without a real launcher or database. To do this we need functional implementations of both. This is achieved with classes that implement the above interface declarations, located in the tests directory - along with a messaging framework that allows us to run the code asynchronously, as it is expected to run.
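As an illustration only (the repository's real test implementations are more capable), a mock launcher might simply record what the engine asked it to launch and hand back a fabricated instance identifier, allowing the engine's state machine to be exercised without Kubernetes:

```python
# A purely illustrative mock (not the repository's actual test code):
# it records every launch request and returns a fake instance ID,
# so the engine's state machine can be driven without Kubernetes.
from typing import Any


class MockInstanceLauncher:
    """Stands in for the real InstanceLauncher during unit tests."""

    def __init__(self) -> None:
        self.launched: list[dict[str, Any]] = []

    def launch(self, running_workflow_id: str, step_name: str,
               specification: dict[str, Any]) -> str:
        """Record the request and return a fabricated instance ID."""
        self.launched.append(
            {
                "running_workflow_id": running_workflow_id,
                "step_name": step_name,
                "specification": specification,
            }
        )
        return f"instance-{len(self.launched)}"
```

A companion mock of the WorkflowAPI can keep its workflow and step records in simple in-memory structures, so a test can assert on exactly what the engine launched and recorded.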

A more detailed explanation of the approach to testing can be found on the Test framework page.
