
Workflow definitions (2.0)

Alan B. Christie edited this page Aug 18, 2025 · 37 revisions

Workflows are defined using YAML. The schema for a workflow can be found in the engine's repository (workflow/workflow-schema.yaml). It can be used as a schema in an editor using a suitable URL. As an example, here's the URL for this version: -

  • https://raw.githubusercontent.com/InformaticsMatters/squonk2-data-manager-workflow-engine/refs/heads/2.0/workflow/workflow-schema.yaml

    What follows is a discussion of workflows for version 2.0 of the workflow engine. You can find a number of example workflows in the engine repository's test/workflow-definitions directory.

In general, a workflow definition requires the provision of the following: -

  1. An Information block
  2. Variables
  3. Variable Mapping
  4. Steps (Job specifications)

Information

All workflows need an information section. The supported root-level properties are described in the following table: -

| Property | Type | Description |
| --- | --- | --- |
| kind | String | A constant that must be set to DataManagerWorkflow. It will not change in future engine versions |
| kind-version | String | An enum string that must be set to 2025.2 for version 2.0 |
| name | String | The workflow name. An RFC1035 label name compliant string. Essentially up to 64 lower-case letters, digits and hyphens (not ending with -) |
| description | String | An optional free-format text property that provides the user with a high-level description of the workflow |

Here's an example: -

---
kind: DataManagerWorkflow
kind-version: "2025.2"
name: nop-fail
description: >-
  A workflow with one step that fails
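The naming rule in the table can be checked before a workflow is submitted. What follows is a minimal sketch, not the engine's own validation: the function name, the constant names and the regular expression are assumptions derived only from the table's wording. A strict RFC 1035 label is limited to 63 characters and must start with a letter, so treat the workflow schema as the authority.

```python
import re

# Assumption: pattern derived from the table's wording (lower-case letters,
# digits and hyphens, not ending with "-", at most 64 characters).
# The authoritative rule is whatever workflow-schema.yaml specifies.
_NAME_PATTERN = re.compile(r"[a-z0-9-]*[a-z0-9]")
_MAX_NAME_LENGTH = 64


def is_plausible_workflow_name(name: str) -> bool:
    """Return True if the name matches the rule described in the table."""
    if len(name) > _MAX_NAME_LENGTH:
        return False
    return _NAME_PATTERN.fullmatch(name) is not None


print(is_plausible_workflow_name("nop-fail"))   # True
print(is_plausible_workflow_name("nop-fail-"))  # False (ends with a hyphen)
print(is_plausible_workflow_name("Nop_Fail"))   # False (upper-case and underscore)
```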

Variables

An optional (although almost always present) root-level property, variables declares all the inputs, outputs, and options a user is expected to provide when they execute the workflow.

The workflow schema does not cover this section. Its structure is defined by the DM Job definition schema located in our squonk2-data-manager-job-decoder repository. The workflow engine currently does not use this section. It is present to simplify UI development by allowing the UI to reuse the variable logic that is used when launching Jobs.

Variable Mapping

A workflow that contains variables (almost all of them do) must contain a root-level variable-mapping property. This is a structure that defines the mapping of the workflow variables to the variables of individual steps within the workflow. There are sections for inputs, outputs, and options.

Inputs

We use an input to describe files expected by Jobs (Steps).

Jobs in a workflow are defined in a step, which will be covered later in the steps section.

inputs is an array that identifies input variables that are used somewhere within the workflow. It's a temporary structure in 2.0 to simplify UI development by separating workflow semantics from Job execution.

Here's an example: -

variable-mapping:
  inputs:
  - name: input-1.sdf
  - name: input-2.sdf

Options

We use options to describe the non-file parameters (numerical values, strings etc.) used in Steps, i.e. configuration values that are not input or output files.

The options section of the variable-mapping block is an array that declares all the options of the workflow. Each option has an as array that lists the Steps that use it and the name of the Job variable it sets in each of them.

In the following example the workflow declares two options: option-a whose value will be used for the Job variable threshold in step-1, and option-b which will be used for Job variable floor in step-1, and Job variable minimum in step-2: -

variable-mapping:
  options:
  - name: option-a
    as:
    - option: threshold
      step: step-1
  - name: option-b
    as:
    - option: floor
      step: step-1
    - option: minimum
      step: step-2
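To make the fan-out concrete, here is a minimal sketch of how option values supplied by a user could be distributed to the steps. It is an illustration only, not the engine's implementation: expand_options is a hypothetical helper, the mapping is written as the dictionary a YAML parser would produce, and the values 0.5 and 10 are invented for the example.

```python
from collections import defaultdict


def expand_options(variable_mapping: dict, values: dict) -> dict:
    """Map workflow option values onto per-step Job variables.

    Returns {step name: {job variable: value}}. Options without a supplied
    value are skipped; how the real engine treats them is outside the
    scope of this sketch.
    """
    per_step = defaultdict(dict)
    for option in variable_mapping.get("options", []):
        if option["name"] not in values:
            continue
        for target in option.get("as", []):
            per_step[target["step"]][target["option"]] = values[option["name"]]
    return dict(per_step)


# The mapping from the example above, as a parsed structure.
mapping = {
    "options": [
        {"name": "option-a", "as": [{"option": "threshold", "step": "step-1"}]},
        {
            "name": "option-b",
            "as": [
                {"option": "floor", "step": "step-1"},
                {"option": "minimum", "step": "step-2"},
            ],
        },
    ]
}

print(expand_options(mapping, {"option-a": 0.5, "option-b": 10}))
# {'step-1': {'threshold': 0.5, 'floor': 10}, 'step-2': {'minimum': 10}}
```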

Outputs

We use an output to describe files created by Jobs (Steps).

The outputs section of the variable-mapping block is an array that declares all the outputs of the workflow. These will typically be the outputs of one (or more) of the Steps the workflow executes. Each output declaration contains the name of a workflow output variable and, in its from block, the step and Job output variable that is its source.

In the following example the workflow declares two outputs: output-a will be a copy of the file named in the Job variable output-file used in step-4, and output-b, which will be a copy of the file named in the Job variable output-file used in step-5: -

variable-mapping:
  outputs:
  - name: output-a
    from:
      step: step-4
      output: output-file
  - name: output-b
    from:
      step: step-5
      output: output-file

The name of a workflow output file is set by the variable in the variables section whose name is provided as the variable-mapping->outputs->name value.

Warning
The workflow engine does not currently read the variables section, so it cannot yet satisfy this part of the design. Instead, the name of the file will be the value of the corresponding step output variable.
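The current behaviour can be sketched as a lookup from each workflow output to the value of its source step's output variable. This is an illustration under stated assumptions rather than engine code: resolve_output_files is a hypothetical helper, it assumes the output variable's value is set in the step's specification variables, and the file names a.sdf and b.sdf are invented.

```python
def resolve_output_files(workflow: dict) -> dict:
    """Return {workflow output name: file name}.

    The file name is taken to be the value of the source step's output
    variable (assumed here to live in specification -> variables).
    """
    steps = {step["name"]: step for step in workflow.get("steps", [])}
    resolved = {}
    for output in workflow.get("variable-mapping", {}).get("outputs", []):
        source = output["from"]
        variables = steps[source["step"]]["specification"].get("variables", {})
        resolved[output["name"]] = variables[source["output"]]
    return resolved


# A parsed workflow fragment; only the parts this sketch needs are shown.
workflow = {
    "variable-mapping": {
        "outputs": [
            {"name": "output-a", "from": {"step": "step-4", "output": "output-file"}},
            {"name": "output-b", "from": {"step": "step-5", "output": "output-file"}},
        ]
    },
    "steps": [
        {"name": "step-4", "specification": {"variables": {"output-file": "a.sdf"}}},
        {"name": "step-5", "specification": {"variables": {"output-file": "b.sdf"}}},
    ],
}

print(resolve_output_files(workflow))
# {'output-a': 'a.sdf', 'output-b': 'b.sdf'}
```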

Steps

Jobs that are run in the workflow are defined in the root-level array property steps. It is an array of Job specifications, inputs, options, and outputs. The 2.0 engine executes steps in the order they are defined in the workflow YAML file.

Every step requires a name and a specification. The name identifies the step and must be unique within the workflow. The specification provides a structured reference to a Squonk2 Job and includes the variables (names and values) that must be set when the workflow is run. It behaves exactly like the specification provided when launching a DM Job via the API.

In the following example the workflow declares a step that will run the rdkit-molprops Job exposing the variable col1 with a default value of 123: -

steps:
- name: step1
  description: Add column 1
  specification:
    collection: workflow-engine-unit-test-jobs
    job: rdkit-molprops
    version: "1.0.0"
    variables:
      name: "col1"
      value: 123
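Because step names must be unique within the workflow, a quick check on the parsed steps array can catch mistakes before the workflow is run. This is a minimal sketch: duplicate_step_names is a hypothetical helper and is not part of the engine.

```python
from collections import Counter


def duplicate_step_names(steps: list) -> list:
    """Return the sorted step names that appear more than once."""
    counts = Counter(step["name"] for step in steps)
    return sorted(name for name, count in counts.items() if count > 1)


# A parsed steps array with a deliberate duplicate.
steps = [{"name": "step1"}, {"name": "step2"}, {"name": "step1"}]

print(duplicate_step_names(steps))
# ['step1']
```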

Inputs

Step inputs come either from a workflow variable, which has to be named: -

steps:
- name: step-1
  inputs:
  - input: inputFile
    from:
      workflow-input: candidateMolecules

...or from a prior step's output: -

steps:
- name: step-2
  inputs:
  - input: inputFile
    from:
      step: step-1
      output: outputFile
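Since the 2.0 engine executes steps in the order they are defined, an input that refers to a step should refer to one defined earlier. The sketch below checks a parsed steps array for references that break this. It is an illustration only: find_bad_step_references is a hypothetical helper, and the engine may apply different or additional checks.

```python
def find_bad_step_references(steps: list) -> list:
    """Return (step, input, referenced step) tuples for every input whose
    source step is not defined earlier in the steps array."""
    seen = set()
    problems = []
    for step in steps:
        for step_input in step.get("inputs", []):
            source = step_input["from"]
            # Inputs taken from a workflow-input have no step to check.
            if "step" in source and source["step"] not in seen:
                problems.append((step["name"], step_input["input"], source["step"]))
        seen.add(step["name"])
    return problems


# The two examples above combined, as a parsed structure.
steps = [
    {
        "name": "step-1",
        "inputs": [
            {"input": "inputFile", "from": {"workflow-input": "candidateMolecules"}}
        ],
    },
    {
        "name": "step-2",
        "inputs": [
            {"input": "inputFile", "from": {"step": "step-1", "output": "outputFile"}}
        ],
    },
]

print(find_bad_step_references(steps))                  # []
print(find_bad_step_references(list(reversed(steps))))  # [('step-2', 'inputFile', 'step-1')]
```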

Outputs

A step declares outputs when they are expected to be delivered to the workflow Project directory. If not named, all outputs of a step remain in the Step Job Instance directory on completion. They can still be used as input to a following step. An output only needs to be named if the workflow developer wants it in the Project directory: -

steps:
- name: step-1
  outputs:
  - output: outputFile