
Workflow definitions (2.0)

Alan B. Christie edited this page Aug 19, 2025 · 37 revisions

Workflows are defined in YAML. The workflow schema is in the engine's repository (workflow/workflow-schema.yaml), and an editor can validate against it when given a suitable URL. As an example, here's the URL for this version: -

  • https://raw.githubusercontent.com/InformaticsMatters/squonk2-data-manager-workflow-engine/refs/heads/2.0/workflow/workflow-schema.yaml

What follows is a discussion of workflows for version 2.0 of the workflow engine. You can find a number of example workflows in the engine repository's test/workflow-definitions directory.

In general, a workflow definition requires the provision of the following: -

  1. An Information block
  2. Variables
  3. Steps (Job specifications)

Information

All workflows need an information section. The supported root-level properties are described in the following table: -

| Property | Type | Description |
|---|---|---|
| kind | String | A constant that must be set to DataManagerWorkflow. It will not change in future engine versions |
| kind-version | String | An enum string that must be set to 2025.2 for version 2.0 |
| name | String | The workflow name. An RFC1035 label name compliant string. Essentially up to 64 lower-case letters, digits and hyphens (not ending with -) |
| description | String | An optional free-format text property that provides the user with a high-level description of the workflow |

Here's an example: -

---
kind: DataManagerWorkflow
kind-version: "2025.2"
name: nop-fail
description: >-
  A workflow with one step that fails
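
The rules in the table can be sketched as a small check. This is only an illustration, not the engine's validation (which is driven by the schema): it assumes the YAML has already been loaded into a Python dict, `check_information` is a hypothetical helper, and the name pattern is derived from the prose above plus RFC 1035's rule that a label starts with a letter.

```python
import re

# Lower-case letters, digits and hyphens, not ending with a hyphen.
# The leading-letter rule comes from RFC 1035; the engine's schema may differ.
NAME_PATTERN = re.compile(r"[a-z]([a-z0-9-]*[a-z0-9])?")
MAX_NAME_LENGTH = 64  # the limit quoted in the property table


def check_information(workflow: dict) -> list[str]:
    """Return a list of problems found in the root-level information properties."""
    problems = []
    if workflow.get("kind") != "DataManagerWorkflow":
        problems.append("kind must be DataManagerWorkflow")
    if workflow.get("kind-version") != "2025.2":
        problems.append("kind-version must be the string 2025.2")
    name = workflow.get("name")
    if (
        not isinstance(name, str)
        or len(name) > MAX_NAME_LENGTH
        or not NAME_PATTERN.fullmatch(name)
    ):
        problems.append("name is not a valid label")
    return problems


# The example workflow above, as it would look once loaded from YAML.
example = {
    "kind": "DataManagerWorkflow",
    "kind-version": "2025.2",
    "name": "nop-fail",
    "description": "A workflow with one step that fails",
}
print(check_information(example))
```

Note that kind-version is quoted in the YAML example. An unquoted 2025.2 would be loaded as a floating-point number rather than a string, and this check would report it.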

Variables

An optional (although almost always present) root-level property, variables declares all the inputs, outputs, and options a user is expected to provide when they execute the workflow.

The workflow schema does not cover this section; its structure is defined by the DM Job definition schema located in our squonk2-data-manager-job-decoder repository. The workflow engine currently does not use this section. It is present to simplify UI development by allowing the UI to reuse the variable logic it already uses when launching Jobs.

Steps

Jobs that are run in the workflow are defined in the root-level array property steps. It is an array of Job specifications and step variables. The 2.0 engine executes steps in the order they are defined in the workflow YAML file.

Every step requires a name and a specification. The name identifies the step and must be unique within the workflow. The specification provides a structured reference to a Squonk2 Job, and can also include pre-defined variables (names and values) that do not need to be set by the workflow when the workflow is run. It behaves exactly like the specification provided when launching a DM Job via the API.

In the following example the workflow declares a step that will run the rdkit-molprops Job with a pre-defined variable col1 with a value of 123: -

steps:
- name: step1
  description: Add column 1
  specification:
    collection: workflow-engine-unit-test-jobs
    job: rdkit-molprops
    version: "1.0.0"
    variables:
      name: "col1"
      value: 123
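
Because step names must be unique within the workflow, a definition can be checked for clashes before it is submitted. A minimal sketch, assuming the workflow has been loaded into a Python dict; `duplicate_step_names` is a hypothetical helper and not part of the engine:

```python
from collections import Counter


def duplicate_step_names(workflow: dict) -> list[str]:
    """Return the step names that appear more than once, in sorted order."""
    names = [step["name"] for step in workflow.get("steps", [])]
    return sorted(name for name, count in Counter(names).items() if count > 1)


# Two steps accidentally share the name "step1".
workflow = {
    "steps": [
        {"name": "step1", "description": "Add column 1"},
        {"name": "step2", "description": "Add column 2"},
        {"name": "step1", "description": "Add column 3"},
    ]
}
print(duplicate_step_names(workflow))
```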

Step variable mapping

In addition to the specification's built-in variables, any step variable can be _connected_ to a workflow variable or to a variable used in a prior step. To make these connections, a step provides a variable-mapping block. In this example we declare that the step's inputFile variable is to be set to the value of the workflow variable candidateMolecules: -

steps:
- name: step-1
  [...]
  variable-mapping:
  - variable: inputFile
    from-workflow:
      variable: candidateMolecules

We do not care what the variable value is, or expect any particular type (string, integer, float) - we simply say the value of step variable inputFile is to be set from the workflow variable candidateMolecules. The value will be set by the user when they run the workflow.

In an alternative example, we declare that the step's inputFile variable is to be set to the value of the variable outputFile that can be obtained from a prior step: -

steps:
- name: step-2
  [...]
  variable-mapping:
  - variable: inputFile
    from-step:
      name: step-1
      variable: outputFile

You can only refer to a prior step. Variables cannot be set from values of variables in steps that have yet to run.
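
The prior-step rule follows from the engine running steps in definition order, and it can be sketched as a single pass over the steps. This is an illustration only, assuming the workflow has been loaded into a Python dict shaped like the examples above; `forward_references` is a hypothetical helper, not an engine function.

```python
def forward_references(workflow: dict) -> list[tuple[str, str]]:
    """Return (step, referenced step) pairs whose from-step is not an earlier step."""
    seen = set()  # names of the steps defined before the current one
    problems = []
    for step in workflow.get("steps", []):
        for mapping in step.get("variable-mapping", []):
            source = mapping.get("from-step")
            if source is not None and source["name"] not in seen:
                problems.append((step["name"], source["name"]))
        # Added after the check, so a step referring to itself is also reported.
        seen.add(step["name"])
    return problems


workflow = {
    "steps": [
        {
            "name": "step-1",
            "variable-mapping": [
                {"variable": "inputFile",
                 "from-workflow": {"variable": "candidateMolecules"}},
            ],
        },
        {
            "name": "step-2",
            "variable-mapping": [
                {"variable": "inputFile",
                 "from-step": {"name": "step-1", "variable": "outputFile"}},
            ],
        },
    ]
}
print(forward_references(workflow))
```

Swapping the order of the two steps would make step-2 refer to a step that has yet to run, and the helper would report that pair.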

Step project inputs

As well as declaring connections between step variables and workflow (or prior step) variables, for convenience the step also names those variables that are expected to be files (or directories) in the Project directory, and therefore need to be copied/linked into the step's execution directory. This is done with the in property. In the following we identify two such variables: -

steps:
- name: step-1
  [...]
  in:
  - inputFileA
  - inputFileB

inputFileA and inputFileB are expected to be variables that are either in the step's specification or its variable-mapping block.
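
That expectation can be sketched as a check on a single step. The helper below is hypothetical and makes one assumption explicit: the caller supplies the names of the variables pre-defined in the step's specification, since this sketch does not try to interpret the specification itself.

```python
def unknown_inputs(step: dict, specification_variable_names: set[str]) -> list[str]:
    """Return the names listed under 'in' that are neither mapped nor pre-defined."""
    mapped = {mapping["variable"] for mapping in step.get("variable-mapping", [])}
    known = mapped | specification_variable_names
    return [name for name in step.get("in", []) if name not in known]


step = {
    "name": "step-1",
    "variable-mapping": [
        {"variable": "inputFileA",
         "from-workflow": {"variable": "candidateMolecules"}},
    ],
    "in": ["inputFileA", "inputFileB"],
}

# inputFileB is not mapped, so it must come from the specification.
print(unknown_inputs(step, specification_variable_names={"inputFileB"}))
print(unknown_inputs(step, specification_variable_names=set()))
```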

Step project outputs

A step declares outputs when they are expected to be delivered to the workflow Project directory. This is done with the out property. In the following we name one such variable: -

steps:
- name: step-2
  [...]
  out:
  - outputFile