
Workflow definitions (2.0)

Alan B. Christie edited this page Sep 19, 2025 · 37 revisions

Workflows are defined using YAML.

The schema for a workflow can be found in the engine's repository (workflow/workflow-schema.yaml). You can use it as a schema in an editor by referring to it with a suitable URL. As an example, here's the URL for this version: -

  • https://raw.githubusercontent.com/InformaticsMatters/squonk2-data-manager-workflow-engine/refs/heads/2.0/workflow/workflow-schema.yaml

    What follows is a discussion of workflows for version 2.0. You can find a number of example workflows in the engine repository's tests/workflow-definitions directory.
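As a sketch of how the schema URL might be wired into an editor: editors that use the Red Hat YAML language server (an assumption about your tooling, for example the VS Code YAML extension) accept a modeline comment at the top of the file. Whether your editor accepts a schema that is itself written in YAML is worth checking. The workflow content below is taken from the example later on this page.

```yaml
# yaml-language-server: $schema=https://raw.githubusercontent.com/InformaticsMatters/squonk2-data-manager-workflow-engine/refs/heads/2.0/workflow/workflow-schema.yaml
---
kind: DataManagerWorkflow
kind-version: "2025.2"
name: nop-fail
description: >-
  A workflow with one step that fails
# The rest of the workflow (variables and steps) follows here
```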

In general, a workflow definition requires the provision of the following: -

  1. An Information block
  2. Workflow variables
  3. Steps - Job specifications and "plumbing" (routing of variables)

Information

All workflows need an information section. The supported root-level properties are described in the following table: -

| Property | Type | Description |
|---|---|---|
| kind | String | A constant value that must be set to DataManagerWorkflow. It will not change in future engine versions |
| kind-version | String | An enum string value that must be set to 2025.2 for version 2.0 |
| name | String | The workflow name. An RFC1035 label name compliant string. Essentially up to 64 lower-case letters, digits and hyphens (not ending with -) |
| description | String | An optional free-format text property that provides the user with a high-level description of the workflow |

Here's an example: -

---
kind: DataManagerWorkflow
kind-version: "2025.2"
name: nop-fail
description: >-
  A workflow with one step that fails

Workflow variables

The root-level variables property is optional (although almost always present). It declares all the inputs, outputs, and options a user is expected to provide when they execute the workflow.

The workflow 2.0 schema does not cover this section. Its structure is defined by the DM Job definition schema located in our squonk2-data-manager-job-decoder repository. The workflow engine currently does not use this section. It is present to simplify UI development: the UI can reuse the logic it already uses to set variables when launching Jobs when it launches workflows.
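For orientation, a variables section might look something like the following. This is a sketch only, assuming the inputs/outputs/options layout used in DM Job definitions. The variable names, titles, MIME type, and default value are illustrative, and the Job definition schema in the job-decoder repository remains the authority on what is allowed.

```yaml
variables:
  inputs:
    type: object
    required:
    - candidateMolecules
    properties:
      candidateMolecules:        # illustrative workflow variable name
        title: Molecules to process
        type: file
        mime-types:
        - chemical/x-mdl-sdfile  # illustrative MIME type
  options:
    type: object
    properties:
      outputFileName:            # hypothetical option name
        title: Name of the output file
        type: string
        default: results.sdf     # hypothetical default
```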

Steps

The Jobs that are run by a workflow are defined in the root-level steps array. Each entry in the array consists of a Job "specification" and "plumbing".

The 2.0 engine executes steps in the order they are defined in the workflow YAML file.

Every step needs a name and a specification. The name identifies the step and has to be unique within the workflow. The specification provides a structured reference to a Squonk2 Job and can also include pre-defined variables (names and values) that do not need to be set when the workflow is run. It behaves exactly like the specification you provide when launching a DM Job via the API.

The plumbing provides a mechanism to connect variables together. More on that later.

In the following example the workflow declares a step that will run the rdkit-molprops Job with two pre-defined variables: name, with a value of col1, and value, with a value of 123: -

steps:
- name: step1
  description: Add column 1
  specification:
    collection: workflow-engine-unit-test-jobs
    job: rdkit-molprops
    version: "1.0.0"
    variables:
      name: "col1"
      value: 123

Assigning workflow variables to steps

Workflow variables specialise the workflow and typically require a user to identify project-level files that are to be processed along with the name of the output the workflow is expected to create (with optional defaults).

To connect a workflow variable value to a step variable (input, option or output) we use the step's "plumbing" block.

In the following example we declare that the step's inputFile variable value is to be set from the value of the workflow variable called candidateMolecules: -

steps:
- name: step-1
  [...]
  plumbing:
  - variable: inputFile
    from-workflow:
      variable: candidateMolecules

We do not care what the variable value is, or expect any particular type (string, integer, float) - we are simply saying the value of step variable inputFile is to be set from the workflow variable candidateMolecules.

Variables in the workflow step's plumbing block have no concept of the variable's function. They are not inputs, outputs or options, they are just variables. It is assumed that every variable's name is unique within a step - the workflow engine is not interested in the variable's function or type. All variables are essentially treated as strings.
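To illustrate that the plumbing block treats every variable alike, here is a sketch of a step that sets what the Job would consider an input and what it would consider an output in exactly the same way. The step variable outputFile and the workflow variable clusteredMolecules are hypothetical names chosen for illustration, and whether the rdkit-molprops Job actually exposes these variables is an assumption.

```yaml
steps:
- name: step-1
  specification:
    collection: workflow-engine-unit-test-jobs
    job: rdkit-molprops
    version: "1.0.0"
  plumbing:
  # An "input" from the Job's point of view ...
  - variable: inputFile
    from-workflow:
      variable: candidateMolecules
  # ... and an "output", declared in exactly the same way
  - variable: outputFile            # hypothetical step variable
    from-workflow:
      variable: clusteredMolecules  # hypothetical workflow variable
```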

Connecting step variables

You often need to connect one step's output to another step's input. Again, to do this we use the step's "plumbing" block.

Here the step's inputFile variable value is to be set from the value of the variable outputFile that was used in the prior step named step-1: -

steps:
- name: step-2
  [...]
  plumbing:
  - variable: inputFile
    from-step:
      name: step-1
      variable: outputFile

You can only refer to a prior step. Variables cannot be set from values of variables in steps that have not already run.
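Putting the pieces together, a two-step workflow might be sketched as follows. The workflow name, the second Job (cluster-butina), the file name given to outputFile, and the variable names exposed by each Job are all assumptions made for illustration. The root-level variables section that would declare candidateMolecules is omitted for brevity.

```yaml
---
kind: DataManagerWorkflow
kind-version: "2025.2"
name: two-step-example            # hypothetical workflow name
description: >-
  A sketch of a workflow whose second step consumes the first step's output
steps:
# Steps run in the order they are listed, so step-1 runs first
- name: step-1
  specification:
    collection: workflow-engine-unit-test-jobs
    job: rdkit-molprops
    version: "1.0.0"
    variables:
      outputFile: step-1.sdf      # hypothetical pre-defined value
  plumbing:
  - variable: inputFile
    from-workflow:
      variable: candidateMolecules
# step-2 may refer to step-1 because step-1 is a prior step
- name: step-2
  specification:
    collection: workflow-engine-unit-test-jobs
    job: cluster-butina           # hypothetical Job name
    version: "1.0.0"
  plumbing:
  - variable: inputFile
    from-step:
      name: step-1
      variable: outputFile
```

Swapping the order of the two steps would make the from-step reference invalid, because step-2 would then refer to a step that has not yet run.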

Pre-defined (built-in) engine variables

The workflow engine provides a number of pre-defined (built-in) variables, available to all workflows. To use their values in a step variable you can use the from-predefined type in the "plumbing" block.

Here we set a step's inputDirPrefix to the value of the pre-defined variable link-glob: -

steps:
- name: step-1
  [...]
  plumbing:
  - variable: inputDirPrefix
    from-predefined:
      variable: link-glob

The following variables are defined and are available to all steps: -

| Variable | Description |
|---|---|
| link-glob | The directory glob used for prior step directories linked into the existing step filesystem |