Workflow definitions (2.0)

Workflows are defined using YAML.

The schema for a workflow can be found in the engine's repository (workflow/workflow-schema.yaml). You can use it as a schema in an editor by pointing the editor at a suitable URL. As an example, here's the URL for this version: -

https://raw.githubusercontent.com/InformaticsMatters/squonk2-data-manager-workflow-engine/refs/heads/2.0/workflow/workflow-schema.yaml

What follows is a discussion of workflows for version 2.0. You can find a number of example workflows in the engine repository's tests/workflow-definitions directory.

In general, a workflow definition requires the provision of the following: -

An Information block
Workflow variables
Steps - Job specifications and "plumbing" (routing of variables)

Information

All workflows need an information section. The supported root-level properties are described in the following table: -

Property	Type	Description
kind	String	A constant value that must be set to `DataManagerWorkflow`. It will not change in future engine versions
kind-version	String	An enum string value that needs to be set to `2025.2` for version 2.0
name	String	The workflow name. An RFC1035 label name compliant string. Essentially up to 64 lower-case letters, digits and hyphens (not ending with `-`)
description	String	An optional free-format text property that provides the user with a high-level description of the workflow

Here's an example: -

---
kind: DataManagerWorkflow
kind-version: "2025.2"
name: nop-fail
description: >-
  A workflow with one step that fails
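
The rules in the table can be sketched as a small check over the parsed workflow (a plain dictionary here). This is an illustration, not the engine's validation code: the function name `check_information` is hypothetical, and the name pattern encodes only what the table states, so the real schema may be stricter.

```python
import re

# Assumption: this pattern encodes only what the table states (up to 64
# lower-case letters, digits and hyphens, not ending with a hyphen).
# The pattern used by the real schema may be stricter.
NAME_PATTERN = re.compile(r"[a-z0-9-]{1,64}(?<!-)")


def check_information(workflow: dict) -> list[str]:
    """Return a list of problems found in the root-level information properties."""
    problems = []
    if workflow.get("kind") != "DataManagerWorkflow":
        problems.append("kind must be 'DataManagerWorkflow'")
    # The value must be a string, which is why the YAML example quotes it.
    if workflow.get("kind-version") != "2025.2":
        problems.append("kind-version must be the string '2025.2'")
    name = workflow.get("name")
    if not isinstance(name, str) or NAME_PATTERN.fullmatch(name) is None:
        problems.append("name must be an RFC1035-style label")
    description = workflow.get("description")
    if description is not None and not isinstance(description, str):
        problems.append("description, if present, must be a string")
    return problems


# The information block from the example above, as a parsed dictionary.
example = {
    "kind": "DataManagerWorkflow",
    "kind-version": "2025.2",
    "name": "nop-fail",
    "description": "A workflow with one step that fails",
}
print(check_information(example))
```

Note that an unquoted 2025.2 in YAML is parsed as a floating-point number rather than a string, so the quotes around the kind-version value matter.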

Workflow variables

variables is an optional (although almost always present) root-level property. It declares all the inputs, outputs, and options a user is expected to provide when they execute the workflow.

The workflow 2.0 schema does not cover this section. Its structure is defined by the DM Job definition schema located in our squonk2-data-manager-job-decoder repository. The workflow engine currently does not use this section. It is present to simplify UI development: the UI can reuse the logic it already has for setting variables when launching Jobs to set variables when launching workflows.

Steps

The Jobs that are run by a workflow are defined in a root-level array called steps. Each entry in the array describes one step: a Job "specification" and its "plumbing".

The 2.0 engine executes steps in the order they are defined in the workflow YAML file.

Every step needs a name and a specification. The name identifies the step and has to be unique within the workflow. The specification provides a structured reference to a Squonk2 Job and can include pre-defined variables (names and values) that do not need to be set when the workflow is run. It behaves exactly like the specification you provide when launching a DM Job via the API.

The plumbing provides a mechanism to connect variables together. More on that later.

In the following example the workflow declares a step that will run the rdkit-molprops Job with two pre-defined variables: name, set to col1, and value, set to 123: -

steps:
- name: step1
  description: Add column 1
  specification:
    collection: workflow-engine-unit-test-jobs
    job: rdkit-molprops
    version: "1.0.0"
    variables:
      name: "col1"
      value: 123
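
The two requirements on a step (a name that is unique within the workflow, and a specification) can be sketched as a check over the parsed steps array. The function name `check_steps` is hypothetical and the code is an illustration of the rules described above, not the engine's own validation.

```python
def check_steps(steps: list[dict]) -> list[str]:
    """Return a list of problems with step names and specifications."""
    problems = []
    seen = set()
    for index, step in enumerate(steps):
        name = step.get("name")
        if not name:
            problems.append(f"step {index} has no name")
        elif name in seen:
            problems.append(f"step name '{name}' is not unique")
        else:
            seen.add(name)
        if "specification" not in step:
            problems.append(f"step {index} has no specification")
    return problems


# The steps array from the example above, as parsed YAML.
steps = [
    {
        "name": "step1",
        "description": "Add column 1",
        "specification": {
            "collection": "workflow-engine-unit-test-jobs",
            "job": "rdkit-molprops",
            "version": "1.0.0",
            "variables": {"name": "col1", "value": 123},
        },
    }
]
print(check_steps(steps))
```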

Assigning workflow variables to steps

Workflow variables specialise the workflow. Typically they require the user to identify the project-level files that are to be processed and to name the outputs the workflow is expected to create (with optional defaults).

To connect a workflow variable value to a step variable (input, option or output) we use the step's "plumbing" block.

In the following example we declare that the step's inputFile variable value is to be set from the value of the workflow variable called candidateMolecules: -

steps:
- name: step-1
  [...]
  plumbing:
  - variable: inputFile
    from-workflow:
      variable: candidateMolecules

We do not care what the variable value is, or expect any particular type (string, integer, float) - we are simply saying the value of step variable inputFile is to be set from the workflow variable candidateMolecules.

Variables in the workflow step's plumbing block have no concept of the variable's function. They are not inputs, outputs or options, they are just variables. It is assumed that every variable's name is unique within a step - the workflow engine is not interested in the variable's function or type. All variables are essentially treated as strings.
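
As a rough sketch of what a from-workflow connection means, the function below copies workflow variable values into a step's variables without looking at their type or function. The name `apply_workflow_plumbing` is hypothetical, and starting from the specification's pre-defined variables and letting plumbing overwrite them is an assumption made for this illustration, not documented engine behaviour.

```python
def apply_workflow_plumbing(step: dict, workflow_values: dict) -> dict:
    """Return the step's variables after applying 'from-workflow' plumbing."""
    # Assumption: begin with any variables pre-defined in the specification.
    variables = dict(step.get("specification", {}).get("variables", {}))
    for item in step.get("plumbing", []):
        source = item.get("from-workflow")
        if source is None:
            continue  # some other kind of plumbing entry, ignored here
        # The value is copied as-is; its type and function are not inspected.
        variables[item["variable"]] = workflow_values[source["variable"]]
    return variables


step = {
    "name": "step-1",
    "specification": {"variables": {"name": "col1"}},
    "plumbing": [
        {"variable": "inputFile", "from-workflow": {"variable": "candidateMolecules"}}
    ],
}
# The file name is a made-up value a user might supply when running the workflow.
print(apply_workflow_plumbing(step, {"candidateMolecules": "input.smi"}))
```

In this sketch a plumbing entry that names a workflow variable the user has not supplied raises a KeyError.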

Connecting step variables

You often need to connect one step's output to another step's input. Again, to do this we use the step's "plumbing" block.

Here the step's inputFile variable value is to be set from the value of the variable outputFile that was used in the prior step named step-1: -

steps:
- name: step-2
  [...]
  plumbing:
  - variable: inputFile
    from-step:
      name: step-1
      variable: outputFile

You can only refer to a prior step. Variables cannot be set from values of variables in steps that have not already run.
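
Because steps run in the order they are defined, the prior-step rule can be checked by walking the steps array once and remembering the names already seen. The function name `check_from_step` is hypothetical; this is an illustration of the rule, not the engine's code.

```python
def check_from_step(steps: list[dict]) -> list[str]:
    """Report 'from-step' plumbing that does not refer to a prior step."""
    problems = []
    prior = set()
    for step in steps:
        for item in step.get("plumbing", []):
            source = item.get("from-step")
            if source is not None and source["name"] not in prior:
                problems.append(
                    f"step '{step['name']}' variable '{item['variable']}' "
                    f"refers to '{source['name']}', which is not a prior step"
                )
        # A step only becomes "prior" once its own plumbing has been checked,
        # so a step that refers to itself is also reported.
        prior.add(step["name"])
    return problems


steps = [
    {"name": "step-1"},
    {
        "name": "step-2",
        "plumbing": [
            {
                "variable": "inputFile",
                "from-step": {"name": "step-1", "variable": "outputFile"},
            }
        ],
    },
]
print(check_from_step(steps))
```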

Identifying project files (workflow inputs)

As well as declaring connections between step variables and workflow (or prior step) variables, the step also names those variables that are expected to be files (or directories) in the Project directory. These need to be copied/linked into the step's execution directory. This is done with the in property. In the following we identify two such variables: -

steps:
- name: step-1
  [...]
  in:
  - inputFileA
  - inputFileB

inputFileA and inputFileB are expected to be variables that are either in the step's specification or its "plumbing" block.
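
That expectation can be sketched as a check that every name listed under in is known to the step, either as a pre-defined specification variable or as the target of a plumbing entry. The function name `unknown_in_variables`, the file name and the workflow variable name are hypothetical and used only for illustration.

```python
def unknown_in_variables(step: dict) -> list[str]:
    """Return the names listed under 'in' that the step does not define."""
    # Variables pre-defined in the specification.
    known = set(step.get("specification", {}).get("variables", {}))
    # Variables that are set through the step's plumbing block.
    known.update(item["variable"] for item in step.get("plumbing", []))
    return [name for name in step.get("in", []) if name not in known]


step = {
    "name": "step-1",
    "specification": {"variables": {"inputFileA": "a.sdf"}},
    "plumbing": [
        {"variable": "inputFileB", "from-workflow": {"variable": "candidateMolecules"}}
    ],
    "in": ["inputFileA", "inputFileB"],
}
print(unknown_in_variables(step))
```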

Declaring workflow outputs

Workflow steps execute in sub-directories of the chosen Project, which is also where all their files are located. If you want a file that is generated by a step to be propagated to the Project directory you need to declare this in the step's "plumbing" block.

Any step can produce outputs; it does not have to be the final step. The decision is yours.

In the following we identify two variables (normally outputs of the job) as representing files that the engine needs to copy to the Project directory when the workflow finishes (successfully): -

steps:
- name: step-2
  [...]
  plumbing:
  - variable: outputOne
    to-project:
  - variable: outputTwo
    to-project:
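
To see which variables a workflow marks as project outputs, you can collect every plumbing entry that carries a to-project key. The function name `project_outputs` is hypothetical. The sketch tests for the presence of the key rather than its value because an empty `to-project:` parses to a null value in YAML.

```python
def project_outputs(steps: list[dict]) -> list[tuple[str, str]]:
    """Return (step name, variable name) pairs marked with 'to-project'."""
    outputs = []
    for step in steps:
        for item in step.get("plumbing", []):
            # An empty 'to-project:' becomes None when parsed, so test the key.
            if "to-project" in item:
                outputs.append((step["name"], item["variable"]))
    return outputs


steps = [
    {"name": "step-1"},
    {
        "name": "step-2",
        "plumbing": [
            {"variable": "outputOne", "to-project": None},
            {"variable": "outputTwo", "to-project": None},
        ],
    },
]
print(project_outputs(steps))
```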
