Accessing files

Alan B. Christie edited this page Sep 19, 2025 · 8 revisions

Workflow Steps are simply Jobs. Steps, like Jobs, run in a subdirectory of a Project. Neither Jobs nor Steps have general access to Project files.

If a step needs to access Project files, or files generated by other steps, it declares this need in the step's "plumbing".

Accessing project files

For a Step to access a Project file, two declarations are required.

  • The Workflow must declare an input Workflow Variable. When a user runs the Workflow they are required to provide a value for the input: a file in the Project volume. This is enforced by the DM, so the Workflow Engine can assume that all input variables have been given a value.
  • Any Step that wishes to use this named file must declare it in its plumbing.

Here's an example of a Workflow input variable: -

variables:
  inputs:
    type: object
    properties:
      candidateMolecules:
        title: Molecules
        type: file

When the user runs the workflow they will be required to provide a value for the variable candidateMolecules, the name of a file in the Project.

Project files are not presented to every step, so a step that expects a project file must also declare this. This is done in the plumbing section, illustrated in this workflow excerpt: -

step:
- name: step-1
  plumbing:
  - variable: inputFile
    from-workflow:
      variable: candidateMolecules

The plumbing reveals two important facts about the step: -

  1. The step's Job has a variable called inputFile
  2. The step expects the inputFile value to be set to the value of the workflow variable candidateMolecules
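
The mapping described by the plumbing can be sketched in a few lines of Python. This is only an illustration, not the engine's implementation: the function name resolve_from_workflow and the file name molecules.sdf are hypothetical, and the step is assumed to have been loaded from YAML into a plain dictionary.

```python
from typing import Any


def resolve_from_workflow(
    step: dict[str, Any], workflow_values: dict[str, str]
) -> dict[str, str]:
    """Map a step's Job variables to the values of workflow variables.

    Hypothetical helper: only 'from-workflow' plumbing is handled here.
    """
    resolved: dict[str, str] = {}
    for item in step.get("plumbing", []):
        source = item.get("from-workflow")
        if source is None:
            continue  # other kinds of plumbing are ignored in this sketch
        name = source["variable"]
        if name not in workflow_values:
            raise KeyError(f"workflow variable '{name}' has no value")
        # The Job variable takes the value the user gave the workflow variable
        resolved[item["variable"]] = workflow_values[name]
    return resolved


# The step from the excerpt above, as it might look once parsed from YAML
step = {
    "name": "step-1",
    "plumbing": [
        {"variable": "inputFile", "from-workflow": {"variable": "candidateMolecules"}},
    ],
}
# 'molecules.sdf' stands in for the Project file chosen by the user
job_variables = resolve_from_workflow(step, {"candidateMolecules": "molecules.sdf"})
print(job_variables)
```

Here the Job variable inputFile ends up with whatever file name the user supplied for candidateMolecules.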

Accessing files in prior steps

Steps can also use files that are expected to have been created by other (prior) steps. The step's plumbing is used to declare this relationship.

In the following workflow excerpt, step-2 uses a file from step-1 whose name is expected to be in step-1's outputFile variable. The from-step tells the workflow which variable to use and the name of the step to get it from.

- name: step-2
  plumbing:
  - variable: inputFile
    from-step:
      name: step-1
      variable: outputFile
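
A from-step declaration can be pictured as a lookup into the variables of an earlier step. The Python below is a minimal sketch under stated assumptions: resolve_from_step, the step_variables dictionary and the file name filtered.sdf are all hypothetical, and the real engine may track step state quite differently.

```python
from typing import Any


def resolve_from_step(
    step: dict[str, Any], step_variables: dict[str, dict[str, str]]
) -> dict[str, tuple[str, str]]:
    """Map each Job variable to (prior step name, file name).

    Hypothetical helper: only 'from-step' plumbing is handled here.
    """
    resolved: dict[str, tuple[str, str]] = {}
    for item in step.get("plumbing", []):
        source = item.get("from-step")
        if source is None:
            continue
        prior_name = source["name"]
        if prior_name not in step_variables:
            raise KeyError(f"step '{prior_name}' has not run")
        # The file name is whatever the prior step's variable holds
        file_name = step_variables[prior_name][source["variable"]]
        resolved[item["variable"]] = (prior_name, file_name)
    return resolved


step_2 = {
    "name": "step-2",
    "plumbing": [
        {"variable": "inputFile", "from-step": {"name": "step-1", "variable": "outputFile"}},
    ],
}
# Assumed record of what step-1 was run with; 'filtered.sdf' is illustrative
step_variables = {"step-1": {"outputFile": "filtered.sdf"}}
inputs = resolve_from_step(step_2, step_variables)
print(inputs)
```

Returning the prior step's name alongside the file name reflects that the file lives in that step's instance directory rather than in the Project.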

Files are hard-links

What process puts files into a step's instance directory?

It is the DM that places files into a step's instance directory, with help from the engine, which passes the values of the selected workflow files and step instances in a LaunchParameters object that is passed to the InstanceLauncher launch() command. Files are not copied. Dependent files (or directories) are hard-linked into a step's instance directory, illustrated by the following diagram: -

https://github.com/user-attachments/assets/f0e6f0f0-f551-4818-ade3-01237679a8f0
  • Project files are hard-linked into the step's instance directory
  • Prior step files are made available by linking the prior step's entire instance directory

By hard-linking, the DM saves on file-space. A step is not expected to modify an input file (or directory), but the file-system does not prevent it. Modifying an input file is generally discouraged as a pattern. A user who does so must understand the consequences, and may even need to lock the file should multiple steps want to access it concurrently.
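
The consequence of modifying a hard-linked file can be seen with a short, self-contained Python sketch. It uses only the standard library and temporary directories. The directory and file names are illustrative and are not the paths the DM uses.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Stand-ins for a Project directory and a step's instance directory
    project_dir = os.path.join(root, "project")
    instance_dir = os.path.join(root, "instance-step-1")
    os.makedirs(project_dir)
    os.makedirs(instance_dir)

    project_file = os.path.join(project_dir, "molecules.sdf")
    with open(project_file, "w") as f:
        f.write("original\n")

    # Hard-link (not copy) the Project file into the instance directory
    linked_file = os.path.join(instance_dir, "molecules.sdf")
    os.link(project_file, linked_file)

    # Both names refer to the same inode, so no extra space is used
    same_inode = os.stat(project_file).st_ino == os.stat(linked_file).st_ino
    link_count = os.stat(project_file).st_nlink

    # Writing through the step's name changes the Project file too
    with open(linked_file, "a") as f:
        f.write("modified by step\n")
    with open(project_file) as f:
        project_content = f.read()

print(same_inode, link_count)
print(project_content)
```

Because both names refer to the same underlying file, text appended through the instance directory is visible through the Project path as well, which is why in-place modification of inputs needs care.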