Combining
Combining, or "fan in", refers to a step Job that expects multiple input files and creates a single output file. Simple concatenation of the files is one example. "Combiner" Jobs typically join a large number of files (produced by a prior parallel step) in a single step instance.
The Workflow Engine determines that a step is combining multiple files by inspecting the plumbing that refers to a prior step's output. If a step input variable is (according to the Job Definition) of type files, then the step is assumed to be a combiner of files generated by multiple instances of a prior step.
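The check described above can be sketched in Python. This is only an illustration, not the engine's actual code: the function name `is_combining_step` and the shape of the `job_definition` dictionary (a `variables` mapping with a `type` per variable) are assumptions made for the example. Only the `plumbing`, `variable` and `from-step` keys come from the workflow syntax shown on this page.

```python
def is_combining_step(step: dict, job_definition: dict) -> bool:
    """Return True if an input plumbed from a prior step is of type 'files'.

    The layout of job_definition is hypothetical:
    {"variables": {"<name>": {"type": "file" | "files" | ...}}}
    """
    variable_types = job_definition.get("variables", {})
    for plumb in step.get("plumbing", []):
        # Only plumbing that refers to a prior step's output is relevant.
        if "from-step" not in plumb:
            continue
        declared = variable_types.get(plumb["variable"], {})
        if declared.get("type") == "files":
            return True
    return False


# A step shaped like the "combine" step used as the example on this page.
combine_step = {
    "name": "combine",
    "plumbing": [
        {
            "variable": "inputFile",
            "from-step": {"name": "parallel", "variable": "outputFile"},
        },
        {
            "variable": "inputDirPrefix",
            "from-predefined": {"variable": "link-glob"},
        },
    ],
}
```

With this sketch, a Job Definition that declares inputFile as type files marks the step as a combiner, while one that declares it as a single file does not.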
Here's an example workflow excerpt:
- name: parallel
  description: Add some params
  specification:
    collection: demo
    job: append-col
    version: "1.0.0"
- name: combine
  description: Combine the parallel files
  specification:
    collection: demo
    job: concatenate
    version: "1.0.0"
  plumbing:
  - variable: inputFile
    from-step:
      name: parallel
      variable: outputFile
  - variable: inputDirPrefix
    from-predefined:
      variable: link-glob
In the above example, the combine step uses an inputFile variable (whose value is the value of the outputFile variable of the parallel step). When the workflow engine decides to run the combine step it inspects the step's Job Definition (version 1.0.0 of the concatenate job in the demo collection). The engine looks specifically for the definition of the job's inputFile variable. If the variable is found to be of type files, then the concatenate step will be launched once and given a glob (here through the inputDirPrefix variable, plumbed from the predefined link-glob variable) so the step can find the instance directories (hard-linked into its instance directory) where all the outputFile files can be found (one in each incoming instance directory).
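From the Job's side, locating and joining the files might look like the following Python sketch. It is not the implementation of the concatenate job. The function name `concatenate_inputs`, the output path argument, and the assumption that the glob value matches the incoming instance directories directly are all made up for illustration.

```python
import glob
import os


def concatenate_inputs(input_file: str, input_dir_glob: str, output_path: str) -> list:
    """Concatenate input_file from every directory matching input_dir_glob.

    input_file corresponds to the inputFile variable (a filename) and
    input_dir_glob to the inputDirPrefix variable (a filesystem glob).
    Returns the list of source files that were joined, in order.
    """
    # Sort the matches so the order of the combined output is deterministic.
    directories = sorted(d for d in glob.glob(input_dir_glob) if os.path.isdir(d))
    candidates = [os.path.join(d, input_file) for d in directories]
    # Skip directories that lack the file; a stricter job might raise instead.
    sources = [path for path in candidates if os.path.isfile(path)]
    with open(output_path, "wb") as combined:
        for source in sources:
            with open(source, "rb") as handle:
                combined.write(handle.read())
    return sources
```

A call such as `concatenate_inputs("out.txt", "instance-*", "combined.txt")` would join every out.txt found in directories matching the glob; both the filename and the glob pattern here are hypothetical values.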
- When the workflow engine discovers a combining step, it does nothing about launching the step until all of the prior (parallel) steps have successfully completed. If a parallel step fails, the workflow will not progress.
- A combining step must use the pre-defined (built-in) variable exposed by the workflow engine (a filesystem glob) to locate the directories where the files to be combined can be found.
- Combining Jobs must provide two variables: one to accept the output filename of the prior step and another that accepts a filesystem glob value to identify the instance directories where each file can be found.
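
The launch rule in the first point above can be expressed as a small decision function. This is a sketch under assumptions: the state names "SUCCESS" and "FAILED", the `Decision` enum and the function name `combine_decision` are hypothetical and are not taken from the workflow engine.

```python
from enum import Enum


class Decision(Enum):
    WAIT = "wait"      # some prior instances are still running
    LAUNCH = "launch"  # every prior instance succeeded
    HALT = "halt"      # a prior instance failed, the workflow cannot progress


def combine_decision(prior_instance_states: list) -> Decision:
    """Decide what to do with a combining step given the prior instances' states."""
    if any(state == "FAILED" for state in prior_instance_states):
        return Decision.HALT
    # An empty list means nothing has run yet, so keep waiting.
    if prior_instance_states and all(state == "SUCCESS" for state in prior_instance_states):
        return Decision.LAUNCH
    return Decision.WAIT
```

The combining step is launched once, and only when the function returns `Decision.LAUNCH`.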