Input variable values
As we've seen in the Accessing files section, step input files are created from either a from-workflow or from-step declaration. How does a step deal with Project files (which are file-links) and prior step files (which are directory-links)?
The answer is that the step doesn't know the difference. The workflow engine ensures this by prefixing the filename with an instance sub-directory when the step uses a file from a prior step. Let's look at two variable values in a step that uses both a Project file and a prior step file, using the following workflow excerpt: -
- name: step-3
  plumbing:
  - variable: inputFile
    from-workflow:
      variable: candidateMolecules
  - variable: propertyFile
    from-step:
      name: step-2
      variable: outputFile
For the sake of this example, and to help illustrate how the engine sets the values of step-3's variables, let's assume the value of the candidateMolecules variable is the name of a Project file called molecules.smi and step-2's outputFile is properties.txt (a file in its own instance directory). Let's also assume step-2's instance directory is called .instance-123435678.
In this scenario the workflow engine will launch step-3 with the following variables and values: -
{
  "inputFile": "molecules.smi",
  "propertyFile": ".instance-123435678/properties.txt"
}
The DM will hard-link the file molecules.smi and the instance directory .instance-123435678 into step-3's own instance directory.
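The prefixing rule can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the engine's actual code: the function resolve_step_variables and its arguments (workflow_variables, step_outputs, step_instance_dirs) are hypothetical names invented for this example.

```python
from typing import Any


def resolve_step_variables(
    plumbing: list[dict[str, Any]],
    workflow_variables: dict[str, str],
    step_outputs: dict[str, dict[str, str]],
    step_instance_dirs: dict[str, str],
) -> dict[str, str]:
    """Return the variable values a step would be launched with.

    Hypothetical helper: it mirrors the rule described in the text,
    not the real workflow engine implementation.
    """
    values: dict[str, str] = {}
    for item in plumbing:
        name = item["variable"]
        if "from-workflow" in item:
            # Project file: the workflow variable's value is used as-is.
            values[name] = workflow_variables[item["from-workflow"]["variable"]]
        elif "from-step" in item:
            source = item["from-step"]
            filename = step_outputs[source["name"]][source["variable"]]
            # Prior step file: prefix with that step's instance sub-directory.
            values[name] = f"{step_instance_dirs[source['name']]}/{filename}"
    return values


# The step-3 plumbing from the excerpt above, expressed as Python data.
plumbing = [
    {"variable": "inputFile", "from-workflow": {"variable": "candidateMolecules"}},
    {"variable": "propertyFile", "from-step": {"name": "step-2", "variable": "outputFile"}},
]
values = resolve_step_variables(
    plumbing,
    workflow_variables={"candidateMolecules": "molecules.smi"},
    step_outputs={"step-2": {"outputFile": "properties.txt"}},
    step_instance_dirs={"step-2": ".instance-123435678"},
)
print(values)
```

Because both values are plain relative paths, the step can open either one in the same way from its own instance directory.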
In our terminology, a combining step is one that processes the files generated by a series of concurrent (parallel) prior steps.
Each concurrent step is expected to produce an output file with the same name, each in its own instance directory. How do we pass the directories and the name of these files to the combining step? Depending on the workflow there could be a very large number of concurrent steps, and passing 8,000 or so paths to the step is likely to exceed command-line length limits. For this special case the engine avoids the problem by requiring the combining step to accept a directory glob and a filename.
Here's a workflow excerpt to illustrate: -
- name: combine
  plumbing:
  - variable: inputFile
    from-step:
      name: parallel
      variable: outputFile
  - variable: inputDirGlob
    from-predefined:
      variable: instance-link-glob
If we assume the file generated by the parallel step (its outputFile) is results.sdf then the workflow engine might set the following variables and values for the combine step: -
{
  "inputFile": "results.sdf",
  "inputDirGlob": ".instance-*"
}
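Essentially the combining step is expected to find the input files itself, by expanding the directory glob it is given and looking for the named file in each matching directory (here, .instance-*/results.sdf).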
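As a minimal sketch of what a combining step might do with these two values, the Python below expands the glob and concatenates the files it finds. The function names find_input_files and combine, the base_dir argument, the combined.sdf output name and the simple concatenation are all assumptions made for illustration. A real combining step would apply whatever merge logic suits its file format.

```python
import glob
import os
import tempfile


def find_input_files(input_dir_glob: str, input_file: str, base_dir: str = ".") -> list[str]:
    """Expand the directory glob and return every matching input file, sorted."""
    # Escape base_dir so only the glob supplied by the engine is treated as a pattern.
    pattern = os.path.join(glob.escape(base_dir), input_dir_glob, input_file)
    return sorted(glob.glob(pattern))


def combine(paths: list[str], output_path: str) -> int:
    """Concatenate the given files into output_path; return how many were combined."""
    with open(output_path, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8") as src:
                out.write(src.read())
    return len(paths)


# Demonstration in a throwaway directory standing in for the combine step's
# instance directory, with three linked instance directories inside it.
with tempfile.TemporaryDirectory() as instance_dir:
    for n in range(3):
        link_dir = os.path.join(instance_dir, f".instance-{n}")
        os.mkdir(link_dir)
        with open(os.path.join(link_dir, "results.sdf"), "w", encoding="utf-8") as f:
            f.write(f"record {n}\n")

    found = find_input_files(".instance-*", "results.sdf", base_dir=instance_dir)
    combined_count = combine(found, os.path.join(instance_dir, "combined.sdf"))
    with open(os.path.join(instance_dir, "combined.sdf"), encoding="utf-8") as f:
        combined_text = f.read()
```

The step receives only two short arguments however many parallel instances ran, so the command line stays the same length whether there are three input files or several thousand.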