Workflow definitions (2.0)
Workflows are defined using YAML.
The schema for a workflow can be found in the engine's repository (`workflow/workflow-schema.yaml`) and can be used as a schema in an editor via a suitable URL. As an example, here's the URL for this version: -
- https://raw.githubusercontent.com/InformaticsMatters/squonk2-data-manager-workflow-engine/refs/heads/2.0/workflow/workflow-schema.yaml
What follows is a discussion of workflows for version 2.0. You can find a number of example workflows in the engine repository's `tests/workflow-definitions` directory.
In general, a workflow definition requires the provision of the following: -
- An Information block
- Workflow variables
- Steps - Job specifications and "plumbing" (routing of variables)
All workflows need an information section. The supported root-level properties are described in the following table: -
Property | Type | Description |
---|---|---|
`kind` | String | A constant value that needs to be set to `DataManagerWorkflow` and that will not change in future engine versions |
`kind-version` | String | An enum string value that needs to be set to `2025.2` for version 2.0 |
`name` | String | The workflow name. An RFC1035 label name compliant string. Essentially up to 64 lower-case letters, digits and hyphens (not ending with `-`) |
`description` | String | An optional free-format text property that provides the user with a high-level description of the workflow |
Here's an example: -
```yaml
---
kind: DataManagerWorkflow
kind-version: "2025.2"
name: nop-fail
description: >-
  A workflow with one step that fails
```
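The `name` rule in the table can be checked before a workflow is submitted. The following is a minimal sketch, not the engine's own validation: the function name `is_valid_workflow_name` is hypothetical, the 64-character default is taken from the table above, and the requirement that a name starts with a letter comes from RFC 1035 rather than from this page.

```python
import re

# Assumption: lower-case letters, digits and hyphens only, starting with a
# letter (per RFC 1035) and not ending with a hyphen (per the table above).
_LABEL = re.compile(r"[a-z]([a-z0-9-]*[a-z0-9])?")


def is_valid_workflow_name(name: str, max_length: int = 64) -> bool:
    """Return True if 'name' looks like an RFC 1035 style label."""
    return len(name) <= max_length and _LABEL.fullmatch(name) is not None
```

RFC 1035 itself limits a label to 63 characters, so pass `max_length=63` if strict compliance with the RFC matters more than the limit quoted here.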
An optional (although almost always present) root-level property, `variables`, declares all the `inputs`, `outputs`, and `options` a user is expected to provide when they execute the workflow.
The workflow 2.0 schema does not cover this section; its structure is defined by the DM Job definition schema located in our squonk2-data-manager-job-decoder repository. The workflow engine currently does not use this section. It is present to simplify UI development: the UI can reuse the logic it already uses to set variables when launching Jobs when it launches workflows.
The Jobs that are run by a workflow are defined in the root-level `steps` property, an array of Job "specifications" and "plumbing".
The 2.0 engine executes steps in the order they are defined in the workflow YAML file.
Every step needs a `name` and a `specification`. The `name` identifies the step and has to be unique within the workflow. The `specification` provides a structured reference to a Squonk2 Job and can also include pre-defined `variables` (names and values) that do not need to be set when the workflow is run. This `specification` behaves exactly as it does when launching a DM Job via the API.
The `plumbing` provides a mechanism to connect variables together. More on that later.
In the following example the workflow declares a step that will run the `rdkit-molprops` Job with a pre-defined variable `col1` with a value of 123: -
```yaml
steps:
- name: step1
  description: Add column 1
  specification:
    collection: workflow-engine-unit-test-jobs
    job: rdkit-molprops
    version: "1.0.0"
    variables:
      name: "col1"
      value: 123
```
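The two step rules described above (every step has a `name` and a `specification`, and names are unique within the workflow) lend themselves to a quick pre-flight check. This is an illustrative sketch only: `check_steps` is a hypothetical helper, not an engine function, and it assumes the workflow YAML has already been loaded into plain Python lists and dictionaries.

```python
from collections import Counter


def check_steps(steps: list[dict]) -> list[str]:
    """Return a list of problems found in a workflow's steps (empty if none)."""
    problems = []
    # Every step must provide both a name and a specification
    for index, step in enumerate(steps):
        for key in ("name", "specification"):
            if key not in step:
                problems.append(f"step {index} is missing '{key}'")
    # Step names must be unique within the workflow
    counts = Counter(step["name"] for step in steps if "name" in step)
    for name, count in counts.items():
        if count > 1:
            problems.append(f"step name '{name}' is used {count} times")
    return problems
```

Returning a list of messages rather than raising on the first problem lets a caller report everything that is wrong with a definition in one pass.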
Workflow variables specialise the workflow and typically require a user to identify project-level files that are to be processed along with the name of the output the workflow is expected to create (with optional defaults).
To connect a workflow variable value to a step variable (input, option or output) we use the step's "plumbing" block.
In the following example we declare that the step's `inputFile` variable value is to be set from the value of the workflow variable called `candidateMolecules`: -
```yaml
steps:
- name: step-1
  [...]
  plumbing:
  - variable: inputFile
    from-workflow:
      variable: candidateMolecules
```
We do not care what the variable value is, or expect any particular type (string, integer, float) - we are simply saying the value of step variable inputFile is to be set from the workflow variable candidateMolecules.
Variables in the workflow step's plumbing block have no concept of the variable's function. They are not inputs, outputs or options, they are just variables. It is assumed that every variable's name is unique within a step - the workflow engine is not interested in the variable's function or type. All variables are essentially treated as strings.
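To make the "everything is a string" behaviour concrete, here is a small sketch of how `from-workflow` plumbing could be resolved. It is an assumption about the general idea, not the engine's code: `resolve_from_workflow` is a hypothetical name, and the step is assumed to be a dictionary loaded from the YAML shown above.

```python
def resolve_from_workflow(step: dict, workflow_values: dict) -> dict[str, str]:
    """Map step variable names to values taken from workflow variables."""
    resolved = {}
    for item in step.get("plumbing", []):
        source = item.get("from-workflow")
        if source is None:
            continue  # not a from-workflow connection
        # No type checking: the value is passed through as a string
        resolved[item["variable"]] = str(workflow_values[source["variable"]])
    return resolved
```

In this sketch a plumbing entry that names a workflow variable the user has not provided raises a `KeyError`; how the engine itself reports that case is not covered here.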
You often need to connect one step's output to another step's input. Again, to do this we use the step's "plumbing" block.
Here the step's `inputFile` variable value is to be set from the value of the variable `outputFile` that was used in the prior step named `step-1`: -
```yaml
steps:
- name: step-2
  [...]
  plumbing:
  - variable: inputFile
    from-step:
      name: step-1
      variable: outputFile
```
You can only refer to a prior step. Variables cannot be set from values of variables in steps that have not already run.
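Because steps run in the order they are defined, the "prior step only" rule can be checked with a single pass over the `steps` array. The sketch below is hypothetical (`check_from_step_order` is not an engine function) and assumes each step has a `name` and that the YAML has been loaded into plain dictionaries.

```python
def check_from_step_order(steps: list[dict]) -> list[str]:
    """Report from-step references that do not point at an earlier step."""
    problems = []
    seen = set()  # names of the steps defined before the current one
    for step in steps:
        for item in step.get("plumbing", []):
            source = item.get("from-step")
            if source is not None and source["name"] not in seen:
                problems.append(
                    f"step '{step['name']}' variable '{item['variable']}' "
                    f"refers to '{source['name']}', which is not a prior step"
                )
        # Only add the step after checking it, so a step cannot refer to itself
        seen.add(step["name"])
    return problems
```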
As well as declaring connections between step variables and workflow (or prior step) variables, for convenience the step also names those variables that are expected to be files (or directories) in the Project directory, and therefore need to be copied/linked into the step's execution directory. This is done with the `in` property. In the following we identify two such variables: -
```yaml
steps:
- name: step-1
  [...]
  in:
  - inputFileA
  - inputFileB
```
`inputFileA` and `inputFileB` are expected to be variables that are either in the step's `specification` or its `variable-mapping` block.
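A check that every name listed under `in` is actually declared somewhere in the step might look like the sketch below. It rests on two assumptions that are not confirmed by this page: that the "variable-mapping" block mentioned above corresponds to the step's `plumbing` entries, and that the `specification` variables are a mapping keyed by variable name. The function name `undeclared_in_variables` is hypothetical.

```python
def undeclared_in_variables(step: dict) -> list[str]:
    """Return names listed under 'in' that the step does not declare."""
    # Assumption: specification variables are a mapping keyed by variable name
    declared = set(step.get("specification", {}).get("variables", {}))
    # Assumption: the "variable-mapping" block is the step's plumbing list
    declared.update(item["variable"] for item in step.get("plumbing", []))
    return [name for name in step.get("in", []) if name not in declared]
```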
Workflow steps execute in sub-directories of the chosen Project, which is also where all their files are located. If you want a file that is generated by a step to be propagated to the Project directory you need to declare this in the step's "plumbing" block.
Any step can produce outputs; it does not have to be the final step. The decision is yours.
In the following we identify two variables (normally outputs of the job) as representing files that the engine needs to copy to the Project directory when the workflow finishes (successfully): -
```yaml
steps:
- name: step-2
  [...]
  plumbing:
  - variable: outputOne
    to-project:
  - variable: outputTwo
    to-project:
```
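To see which files a workflow would propagate to the Project directory, the `to-project` entries can be gathered across all steps. This is a sketch under the assumption that the YAML has been loaded into plain dictionaries; `project_outputs` is a hypothetical helper and not part of the engine.

```python
def project_outputs(steps: list[dict]) -> list[tuple[str, str]]:
    """Return (step name, variable name) pairs marked with 'to-project'."""
    outputs = []
    for step in steps:
        for item in step.get("plumbing", []):
            # 'to-project:' has no value in the YAML, so a loader yields the
            # key with a null (None) value; test for the key, not the value.
            if "to-project" in item:
                outputs.append((step["name"], item["variable"]))
    return outputs
```

Testing for the presence of the key matters here: a check such as `item.get("to-project")` would treat the empty value as absent and miss every output.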