Workflow definitions (2.0)
Workflows are defined using YAML.
The schema for a workflow can be found in the engine's repository (`workflow/workflow-schema.yaml`). It can be used as a schema in an editor by pointing the editor at a suitable URL. As an example, here's the URL for this version: -
https://raw.githubusercontent.com/InformaticsMatters/squonk2-data-manager-workflow-engine/refs/heads/2.0/workflow/workflow-schema.yaml
What follows is a discussion of workflows for version 2.0. You can find a number of example workflows in the engine repository's `tests/workflow-definitions` directory.
In general, a workflow definition requires the provision of the following: -
- An Information block
- Workflow variables
- Steps - Job specifications and "plumbing" (routing of variables)
All workflows need an information section. The supported root-level properties are described in the following table: -
Property | Type | Description |
---|---|---|
kind | String | A constant value that needs to be set to `DataManagerWorkflow`. It will not change in future engine versions |
kind-version | String | An enum string value that needs to be set to `2025.2` for version 2.0 |
name | String | The workflow name. An RFC1035 label name compliant string. Essentially up to 64 lower-case letters, digits and hyphens (not ending with `-`) |
description | String | An optional free-format text property that provides the user with a high-level description of the workflow |
Here's an example: -
```yaml
---
kind: DataManagerWorkflow
kind-version: "2025.2"
name: nop-fail
description: >-
  A workflow with one step that fails
```
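
The rules in the table can be sketched as a small check on a parsed workflow. This is only an illustration, not the engine's validation (the schema is authoritative). The function name `check_information` and the regular expression are assumptions that approximate the `name` rule as described above. The workflow is written as a Python dictionary, as a YAML loader would return it.

```python
import re

# Assumption: approximates the "name" rule from the table above, i.e.
# 1 to 64 lower-case letters, digits and hyphens, not ending with a hyphen.
NAME_PATTERN = re.compile(r"[a-z0-9-]{0,63}[a-z0-9]")


def check_information(workflow: dict) -> list[str]:
    """Return a list of problems found in the root-level information properties."""
    problems = []
    if workflow.get("kind") != "DataManagerWorkflow":
        problems.append("kind must be DataManagerWorkflow")
    # The comparison is against a string: an unquoted 2025.2 in YAML
    # is loaded as a float and would be reported here.
    if workflow.get("kind-version") != "2025.2":
        problems.append("kind-version must be the string 2025.2")
    name = workflow.get("name")
    if not isinstance(name, str) or not NAME_PATTERN.fullmatch(name):
        problems.append("name is missing or not a valid label")
    description = workflow.get("description")
    if description is not None and not isinstance(description, str):
        problems.append("description must be a string")
    return problems


workflow = {
    "kind": "DataManagerWorkflow",
    "kind-version": "2025.2",
    "name": "nop-fail",
    "description": "A workflow with one step that fails",
}
print(check_information(workflow))  # prints []
```

The `kind-version` check shows why the example quotes the value: without the quotes a YAML loader would produce a number rather than the expected string.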
An optional (although typically always present) root-level property, `variables`, declares all the `inputs`, `outputs`, and `options` a user is expected to provide when they execute the workflow.
The workflow 2.0 schema does not cover this section. Its structure is defined by the DM Job definition schema located in our squonk2-data-manager-job-decoder repository. The workflow engine currently does not use this section. It is present to simplify UI development: when launching workflows, the UI can reuse the logic it already uses to set variables when launching Jobs.
The Jobs that are run by a workflow are defined in a root-level array of `steps`, an array of Job "specifications" and "plumbing".
The 2.0 engine executes steps in the order they are defined in the workflow YAML file.
Every step needs a `name` and a `specification`. The `name` identifies the step within the workflow and has to be unique within the workflow. The `specification` provides a structured reference to a Squonk2 Job and can also include pre-defined `variables` (names and values) that do not need to be set when the workflow is run. This `specification` behaves exactly as it does when launching a DM Job via the API.
The `plumbing` provides a mechanism to connect variables together. More on that later.
In the following example the workflow declares a step that will run the `rdkit-molprops` Job with a pre-defined variable `col1` with a value of 123: -
```yaml
steps:
- name: step1
  description: Add column 1
  specification:
    collection: workflow-engine-unit-test-jobs
    job: rdkit-molprops
    version: "1.0.0"
    variables:
      name: "col1"
      value: 123
```
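
Because step names must be unique within a workflow, a duplicate is easy to detect before a workflow is submitted. The following is a minimal sketch, assuming the `steps` array has already been loaded into a list of dictionaries. The helper `duplicate_step_names` is hypothetical and is not part of the engine.

```python
from collections import Counter


def duplicate_step_names(steps: list[dict]) -> list[str]:
    """Return the step names that appear more than once, sorted alphabetically."""
    counts = Counter(step["name"] for step in steps)
    return sorted(name for name, count in counts.items() if count > 1)


# Two steps that (incorrectly) share the same name
steps = [
    {"name": "step1", "specification": {"job": "rdkit-molprops"}},
    {"name": "step1", "specification": {"job": "rdkit-molprops"}},
]
print(duplicate_step_names(steps))  # prints ['step1']
```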
Workflow variables specialise the workflow and typically require a user to identify project-level files that are to be processed along with the name of the output the workflow is expected to create (with optional defaults).
To connect a workflow variable value to a step variable (input, option or output) we use the step's "plumbing" block.
In the following example we declare that the step's `inputFile` variable value is to be set from the value of the workflow variable called `candidateMolecules`: -
```yaml
steps:
- name: step-1
  [...]
  plumbing:
  - variable: inputFile
    from-workflow:
      variable: candidateMolecules
```
We do not care what the variable value is, or expect any particular type (string, integer, float) - we are simply saying the value of step variable inputFile is to be set from the workflow variable candidateMolecules.
Variables in the workflow step's plumbing block have no concept of the variable's function. They are not inputs, outputs or options, they are just variables. It is assumed that every variable's name is unique within a step - the workflow engine is not interested in the variable's function or type. All variables are essentially treated as strings.
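
To make the "everything is a string" behaviour concrete, here is a rough sketch of how `from-workflow` plumbing could be resolved. It is an illustration under stated assumptions, not the engine's code: the function `values_from_workflow`, the file name `molecules.smi`, and the choice to raise a `KeyError` for a missing workflow value are all hypothetical.

```python
def values_from_workflow(step: dict, workflow_values: dict) -> dict:
    """Return the step variables that are set from workflow variables."""
    resolved = {}
    for item in step.get("plumbing", []):
        source = item.get("from-workflow")
        if source is None:
            continue  # Some other kind of plumbing, ignored in this sketch
        name = source["variable"]
        if name not in workflow_values:
            raise KeyError(f"workflow variable '{name}' has no value")
        # No interest in type or function: the value is simply passed on as a string
        resolved[item["variable"]] = str(workflow_values[name])
    return resolved


step = {
    "name": "step-1",
    "plumbing": [
        {"variable": "inputFile", "from-workflow": {"variable": "candidateMolecules"}},
    ],
}
print(values_from_workflow(step, {"candidateMolecules": "molecules.smi"}))
# prints {'inputFile': 'molecules.smi'}
```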
You often need to connect one step's output to another step's input. Again, to do this we use the step's "plumbing" block.
Here the step's `inputFile` variable value is to be set from the value of the variable `outputFile` that was used in the prior step named `step-1`: -
```yaml
steps:
- name: step-2
  [...]
  plumbing:
  - variable: inputFile
    from-step:
      name: step-1
      variable: outputFile
```
You can only refer to a prior step. Variables cannot be set from values of variables in steps that have not already run.
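
Since steps run in the order they are defined, the "prior step only" rule can be checked with a single pass over the `steps` array. The sketch below assumes the steps have been loaded into a list of dictionaries. The function `check_step_references` is a hypothetical helper, not an engine API.

```python
def check_step_references(steps: list[dict]) -> list[str]:
    """Report from-step plumbing that refers to a step that has not already run."""
    problems = []
    seen = set()  # Names of the steps defined (and so run) before the current one
    for step in steps:
        for item in step.get("plumbing", []):
            source = item.get("from-step")
            if source is not None and source["name"] not in seen:
                problems.append(
                    f"{step['name']}: variable '{item['variable']}' refers to "
                    f"step '{source['name']}', which is not a prior step"
                )
        seen.add(step["name"])
    return problems


steps = [
    {"name": "step-1"},
    {
        "name": "step-2",
        "plumbing": [
            {"variable": "inputFile", "from-step": {"name": "step-1", "variable": "outputFile"}},
        ],
    },
]
print(check_step_references(steps))  # prints []
```

Reversing the order of the two steps makes the same plumbing invalid, because `step-1` would then run after `step-2`.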
The workflow engine provides a number of pre-defined (built-in) variables, available to all workflows. To use their values in a step variable you can use the `from-predefined` type in the "plumbing" block.
Here we set a step's `inputDirPrefix` to the value of the pre-defined variable `link-glob`: -
```yaml
steps:
- name: step-1
  [...]
  plumbing:
  - variable: inputDirPrefix
    from-predefined:
      variable: link-glob
```
The following variables are defined and are available to all steps: -
Variable | Description |
---|---|
link-glob | The directory glob used for prior step directories linked into the existing step filesystem |
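
Putting the three plumbing types together, the following sketch shows one way a step's plumbed variables could be resolved. It is an assumption-laden illustration rather than the engine's implementation: the function `resolve_plumbing` is hypothetical, and the value given to `link-glob` is a made-up placeholder, since the real value is supplied by the engine.

```python
def resolve_plumbing(step, workflow_values, prior_step_values, predefined):
    """Resolve every plumbing entry of a step into a variable name and string value."""
    resolved = {}
    for item in step.get("plumbing", []):
        if "from-workflow" in item:
            value = workflow_values[item["from-workflow"]["variable"]]
        elif "from-step" in item:
            source = item["from-step"]
            value = prior_step_values[source["name"]][source["variable"]]
        elif "from-predefined" in item:
            value = predefined[item["from-predefined"]["variable"]]
        else:
            raise ValueError(f"unsupported plumbing for '{item['variable']}'")
        resolved[item["variable"]] = str(value)
    return resolved


step = {
    "name": "step-2",
    "plumbing": [
        {"variable": "inputFile", "from-step": {"name": "step-1", "variable": "outputFile"}},
        {"variable": "inputDirPrefix", "from-predefined": {"variable": "link-glob"}},
    ],
}
resolved = resolve_plumbing(
    step,
    workflow_values={},
    prior_step_values={"step-1": {"outputFile": "step-1-output.smi"}},  # hypothetical file name
    predefined={"link-glob": "example-prefix-*"},  # placeholder, NOT the engine's real value
)
print(resolved)
# prints {'inputFile': 'step-1-output.smi', 'inputDirPrefix': 'example-prefix-*'}
```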