
Input variable values

Alan B. Christie edited this page Sep 19, 2025 · 4 revisions

As we've seen in the Accessing files section, step input files are created from either a from-workflow or from-step declaration. How does the step deal with Project files (which are file-links) and step files (which are directory-links)?

The answer is that the step doesn't know the difference. The workflow engine ensures this by prefixing the filename with an instance sub-directory when the step is using a file from a prior step. Let's look at two variable values in a step that uses both a Project file and a prior step's file, using the following workflow excerpt: -

- name: step-3
  plumbing:
  - variable: inputFile
    from-workflow:
      variable: candidateMolecules
  - variable: propertyFile
    from-step:
      name: step-2
      variable: outputFile

For the sake of this example, and to help illustrate how the engine sets the values of step-3's variables, let's assume the value of the candidateMolecules variable is the name of a Project file called molecules.smi and step-2's outputFile is properties.txt (a file in its own instance directory). Let's also assume step-2's instance directory is called .instance-123435678.

In this scenario the workflow engine will launch step-3 with the following variables and values: -

{
  "inputFile": "molecules.smi",
  "propertyFile": ".instance-123435678/properties.txt"
}

The DM will hard-link the file molecules.smi and the instance directory .instance-123435678 into step-3's own instance directory.
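The prefixing rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the engine's actual code: the function name resolve_input_value and its parameters are hypothetical.

```python
import posixpath
from typing import Optional


def resolve_input_value(filename: str, prior_instance_dir: Optional[str] = None) -> str:
    """Return the value a step variable might receive (hypothetical helper)."""
    if prior_instance_dir is None:
        # from-workflow: a Project file, linked directly into the step's directory.
        return filename
    # from-step: the file sits inside the prior step's linked instance directory.
    return posixpath.join(prior_instance_dir, filename)


# Reproduce the step-3 values from the example above.
step_3_variables = {
    "inputFile": resolve_input_value("molecules.smi"),
    "propertyFile": resolve_input_value("properties.txt", ".instance-123435678"),
}
print(step_3_variables)
```

Either way the step receives a plain relative path, which is why it can open both files in exactly the same manner.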

Special considerations for "combining" steps

In our terminology, a combining step is one that processes the files generated by a series of concurrent (parallel) prior steps.

Each concurrent step is expected to produce an output file, each using the same file name, with each file in its own instance directory. How do we pass the directories and the names of these files into the combiner? Depending on the workflow there could be a very large number of steps, and passing 8,000 or so paths to the step is likely to exceed command-line length limits. For this special case the engine avoids the problem by requiring the combining step to accept a directory glob and a filename.

Here's a workflow excerpt to illustrate: -

- name: combine
  plumbing:
  - variable: inputFile
    from-step:
      name: parallel
      variable: outputFile
  - variable: inputDirGlob
    from-predefined:
      variable: instance-link-glob

If we assume the file generated by the parallel step (its outputFile) is results.sdf then the workflow engine might set the following variables and values for the combine step: -

{
  "inputFile": "results.sdf",
  "inputDirGlob": ".instance-*"
}

Essentially the combining step is expected to find the input files using the glob it is given.
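As a rough sketch of what a combining step might do with these two values, the Python below joins the directory glob and the filename into a single pattern and expands it. The function name find_combiner_inputs, the base_dir parameter and the instance directory names are all made up for illustration, and we assume the linked instance directories sit directly inside the combining step's own directory.

```python
import glob
import os
import tempfile
from typing import List


def find_combiner_inputs(input_dir_glob: str, input_file: str, base_dir: str = ".") -> List[str]:
    """Return every path matching <base_dir>/<input_dir_glob>/<input_file>."""
    # A pattern with a literal leading dot (".instance-*") does match
    # hidden directories when expanded by Python's glob module.
    pattern = os.path.join(base_dir, input_dir_glob, input_file)
    return sorted(glob.glob(pattern))


# Demonstration using a throwaway directory that stands in for the
# combine step's instance directory (instance names are invented).
with tempfile.TemporaryDirectory() as step_dir:
    for name in (".instance-aaa", ".instance-bbb"):
        os.makedirs(os.path.join(step_dir, name))
        with open(os.path.join(step_dir, name, "results.sdf"), "w") as handle:
            handle.write("")
    found = find_combiner_inputs(".instance-*", "results.sdf", base_dir=step_dir)
    relative = [os.path.relpath(path, step_dir) for path in found]
    print(relative)
```

Because the step expands the glob itself, the command line stays the same length no matter how many parallel steps ran.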
